Assignment 5: Bloom Filter

Due: Oct 21, 11.55PM

(Submit at moodle.)

In this assignment we will implement a Bloom Filter which is a data structure for storing an unordered set using hash functions.

Suppose we are given a set S, drawn from a universe U and queries of the form "is x in S?", we define a Bloom filter as follows:

Membership Queries: Given a query "is x in S?" we look at A[h1(x)], A[h2(x)] and A[h3(x)]. If they are all 1 then we return "yes" as the answer, otherwise we return "no."

Note that it is possible that x is not in S but A[h1(x)], A[h2(x)] and A[h3(x)] are all set to 1 (due to some other elements being in S which have set those positions to 1). This phenomena is called a false positive i.e. the query returns a yes answer although the element is not in the set.

For this assignment you will implement a Bloom filter on the universe U = {0, 1, 2, ..., 99}. Your array should have size 100. You will choose any three hash functions of the from h(x) = ((ax + b) mod c) mod 100 where a, b and c are large positive integers. You can hardcode the hash functions, but your program should also give the TA the option of changing the hash functions at the beginning.

For your demo, you should do the following:

Assumption:

We will assume that there is no deletion for the purposes of this assignment i.e. only insertion and membership queries have to be implemented

Notes: