Содержание
## Hash Function Efficiency(This article is also available in Russian) In article of Arash Partow "General Purpose Hash Function Algorithms" several 32-bit algorithms are reviewed: - rs — simple hash function from Robert Sedgwicks book 'Algorithms in C'
- js — bitwise hash function by Justin Sobel
- pjw — algorithm based on work by Peter J. Weinberger
- bkdr — hash function from Brian Kernighan and Dennis Ritchie’s book 'The C Programming Language'
- sdbm — algorithm of choice used in SDBM project
- djb — algorithm produced by Professor Daniel J. Bernstein
- dek — algorithm proposed by Donald E. Knuth in 'The Art Of Computer Programming'
- ap — algorithm produced by Arash Partow
Another five variants: - faq6 — number 6 from FAQ by Bob Jenkins
- lookup3 — author Bob Jenkins
- ly — proposed by Leonid Yuriev (congruential generator)
- rot13 — simple algorithm with circular shift, by Serge Vakulenko
- crc32 — standard checksum with polynom x
^{32}+x^{26}+x^{23}+x^{22}+x^{16}+x^{12}+x^{11}+x^{10}+x^{8}+x^{7}+x^{5}+x^{4}+x^{2}+x+1
Here are the C sources. ## Test 1To measure the efficiency of hash functions I prepared the following test data: - American dictionary from Ispell project, 62075 words.
- Russian dictionary from Ispell project, 128900 words.
- A list of symbols, extracted from all libs on my linux workstation (libc.a and others), 136073 words.
Total volume after merging is For every word a 32-bit hash value was computed, and counted a number of collisions.
A list of collisions for rs, bkdr, sdbm, ap, ly, rot13, faq6, lookup3 and crc32 is available here. Algorithms with minimal collisions: - rot13 — one circular shift (rotation) and addition
- lookup3 — one shift and addition (or more)
- ly — one multiplication and addition
- rs — two multiplications
Results are presented on picture. Two outsiders — pjw and dek — exceed the limits of Y axis. ## Test 2In previous test, all data had MSB unchanged. For the second test another data set was selected: - German dictionary from Ispell project, 39612 words.
- Hungarian dictionary from Ispell project, 211880 words.
- Italian dictionary from Ispell project, 37268 words.
- Swedish dictionary from Ispell project, 24019 words.
Total volume after merging is Algorithms js, pjw, djb и dek were excluded from testing.
Algorithms with minimal collisions: - rs — two multiplications
- ly — one multiplication and addition
- rot13 — one circular shift (rotation) and addition
Results are presented on picture. |
||||||||||||||||||||||||||||||||||||||||||||||||||