Homework 2: Hash functions

This assignment is to give you some practice with hash functions. You are required to submit the programming part of the assignment in softcopy, and the descriptive parts (Problem 1 and results of Problem 2) on paper.

Problem 1: Compression Maps (CLR 12.3-3)

Consider a version of the division method in which h(k) = k mod m, where m = (2^p) - 1 and k is a character string interpreted in radix 2^p. Show that if string x can be derived from string y by permuting its characters, then x and y hash to the same value.

Problem 2: Expected Number of Collisions

If you have N buckets and a good (pseudorandom) hash function, the probability of any two keys colliding is 1/N. So when you have k keys in the table and insert key k + 1, the probability that the new key does NOT collide with any old key is (1 - 1/N)^k. If you insert n distinct items, the expected number that WON'T collide is:

n-1
sum (1 - 1/N)^k = N - N(1 - 1/N)^n
k=0

so the expected number of collisions is
n - N + N(1 - 1/N)^n.

We require you to implement a hash table (see support files). Your hashtable should count the number of collisions for n elements. You should then compare the number of collisions you get with the formula given above. For example, if you have N=100 buckets and n=100 keys, expect about 36.6 collisions. In your written submission, report the number of collisions for the following combinations, and compare them with the expected value. You should clearly state the hash function you used. If you used different hash functions and observed different behavior, report that too.
  1. N=100 buckets, n=100 keys
  2. N=100 buckets, n=50 keys
  3. N=100 buckets, n=200 keys

Support files:
  1. Dictionary.java
  2. Entry.java
  3. HashTableChained.java
  4. words (input words : first 1k words from /usr/share/dict/words)
Compile outputs: outfile.compile
Run outputs: outfile.run