Monthly Archives: November 2011

Nov19

Bloom filter – A probabilistic datastructure in c

Posted on November 19, 2011 by neethutv

One of my project in c datastructures is to implement a bloom filter. Bloom filter is a space efficient probabilistic datastructure in c. like a set datastructure , the main function of bloom filter is to test whether a element is the member of a set. Probabilistic data structure means it tell us an element definitely not in the set or may be in the set. Bloom filter is an application of bitvector . When using bloom filter false positives are possible and false negative does’nt.

The main datastructure of bloom filter is a bit array. Implementation of bloom filter contain hash functions to get the array positions. An empty bloom filter contain all bits in the bit array zero .There are many hash functions in c which can be used to implement bloom filter. Each hash function maps some elements in the array. To add an element in the array ,element must pass through each hash functions and each function returns a integer corresponding to the element. Then using this integer as the index in the array we set the corresponding bit as do in the bit vector implementation .

Let we can take a simple example of idea behind bloom filter.

We insert and search on a bloom filter of size m=10 and the number of hash functions will be k=3.

Let H(x) denote the result of three hash functions and we can write it as a set { h1(x) ,h2(x) ,h3(x)}

This is our empty array

Then we inserting an element x0. The element must pass through 3 hash function and we obtain values as follows,

H(x0)= { 1, 4, 9}

Then array like this

Insert another element x1 , it also pass through hash function and we got a set as follows and we set the bit in bitarray

H(x1)={ 4, 5, 8}

Next we query for an element y0 in bloom filter it again pass through same hash function and we get

H(y0)= { 0 ,4, 8 } here the element is not in the set because of one of its bit is zero .

then H(y1)= { 1, 5, 8} here the element is present and it is a false positive

In my program i select two hash functions FNV hash function and DJB hash functions. These are two of the most efficient general purpose hash functions in c.

We can start from a dynamic array named bloom and allocate memory using malloc() function in c. To add an element to this array we just pass the element through the two hash functions and set the bit in the bit vector at the index of those hashes to 1. At first we pass the string to FNV hash function ,it return a specific hash value using this we set the bit as 1 in bit vector ,second we pass through DJB hash function. This hash function seems to have a good over-all distribution for many different data sets. Again using the hash value we set the another bit in the bloom array . Here is a simple code of my program

To test the membership of the element then hash the element with same hash function and check those values in the bitvector is set to 1. Element is pass through the same hash function used for insertion and we check those values in the bloom is 1 if it is true then element may be present otherwise definitely not in the set . In the below code if both r1 1nd r2 true then element may be present.

Source code is available here https://bitbucket.org/neethuedappal/c-data-structures/changeset/21b72694a7f0

Nov17

Bit vector implementation in C

Posted on November 17, 2011 by neethutv

Standard

Bit vector implementation provide space optimization by using a re-sizable character array . This article will help you to understand implementation of bitvector in c programming.

What is bit vector ?

Bit vector is composed of two words bit and vector. We already know bit and vector , bit is smallest unit of data in computer and it has single value either 0 or 1. Here vector defining a structure which is basically a re-sizable array . So bit vector is mean to pack values into an array so that no space is wasted. Bit vectors are array of bits. When we defining an integer array which can hold 10 elements then each block in the array contain 32 bit and total size will be 320 bit but we does’nt use entire 320 bit . So remaining space will be waste. Bit vector implementation is a solution to this problem . Here we declare a character array of some size and initialize it as zero .

In my program i declare an array named bit vector and initilize as zero . To insert an element into this dynamic array some steps are there Insertion means only setting a bit as true not actual insertion .

Let we take element ‘n ‘ as ‘6’

To insert 6 we need a block in the array and a bit position .
To find the block were 6 belongs , dividing 6 by the block size. Here array bitvector contain array of character so the block size will be 8 bits. 6 /8 is 0 and 6 must be belongs to 0th block.
To find bit position of 6 in the 0’th block take reminder of 6 %8 . So we get the bit position as 6
Here no insertion of element ‘6’ only setting the bit in the corresponding bit position as 1.
To set a bit ,left shifted 1 p times, where p is bitposition, and OR it with the value in the block

bitvector[bitpostion] |= 1 << bit position

Now we set 6’th bit in 0’th block as 1.

To check whether a number is present or not in bit array we need the block which number belongs and also bitpostion.

Find the block as do in the insertion
Find the bitposition also.
If the bit in the bitposition is zero no element is there
If the value returned by the bitwise AND with the value in the block and left shifing of 1 p times ,were p is bitposition will true then number is present in bit array.

To clear a bit without affecting the all other bit in bit array the bit must be change to 0.

So again shift a 1 left with bitposition
Apply unary operator ~ to reverse all bits , ~(1 <<bitposition)
Then binary AND operator with value in the block size

These are the main steps to implement a bit vector and bit vector implement a set data structre to store individual bits . Bit vector implementation in many languages provide space optimization.

Nov10

Avl Tree- a self balanced binary search tree

Posted on November 10, 2011 by neethutv

Standard

We are familiar with binary search tree , an efficient datastructure contain data as nodes. Avl tree is also binary search tree. What is difference between binary search tree and avl tree ? The answer is avl tree is self balanced , and it was first data structure which is self balanced. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. In avl tree height of two child subtrees of any node differ by atmost one .The balance factor of a node is the height of left subtree minus height of right subtree and a node with balance factor 1,0,-1 considered as balanced. Below an avl tree with 9 nodes and each node have a balance factor of 1 ,-1 or 0.

When inserting a new node or deleting en existing node , the resulting tree will disturb the balance of tree. An unbalanced state is defind as a state in which any subtree have a balance factor greater than 1 or less than one . Tree rotations are necessary to retain the balanced state. A tree rotation move one node up and one node down in the tree.

Insertion

Every time inserting a node in tree ,it is necessary to check each of the nodes ancestors for consistency with the rules of avl. If the balance factor of each node remains -1,0 ,+1 then no rotations are necessary . In my program a function named balance(p) return the balance factor of the node p in the tree . The following code will check the balance condition.If a node in tree violate avl rule ,then rotation will necessary then we get a balanced tree.

There are four cases to be considered in which two are symmetric to the other two. Let p be the root and we can observe different cases.

Right right case and right left case

If the balance factor of p is less than or equal to -2 then right subtree of p overweight the left subtree ,then balance factor of right child of p will be checked.
If the balance factor of right child is -1 then single left rotation needed.

Left rotation can be done in following steps. Select a new node called pivot it will be the new root. Set right child of p as root and p takes the ownership of pivot’s left child , then p will be the left child of new root pivot.

Here a tree with 3 as root node and it is right heavy. Its balance factor is -2 so and balance factor of its right child is -1 then a single left rotation is takes place and the resulting tree must be balanced.

If the balance factor of right child is +1 then two rotations are needed .Sometimes single rotation not sufficient to balance a unbalanced tree .Such cases two rotations necessory. First is right rotation with p’s right child as root and second is left rotation with p as root. After right rotations we get a unbalanced tree with right subtree left heavy .Then a left rotation is necessory to balance the tree. The tree below need two rotations ( double right rotations ) because balance factor of root’s right child is +1 ,after two rotation resulting tree will balanced.

Left-Left case and Left-Right case

If the balance factor of p is greater than or equal to +2 then left subtree of p overweight the right subtree. Then balance factor of left child will be checked .In these cases we use right rotations

If the balance factor of left child is +1 ,a single right rotations is needed.

Right rotation is just opposite of left rotation. Here set p’s left child to pivot and it will be the new root . p takes ownership of the pivot’s right child as its left child . Pivot take ownership of p as its right child .The code as follows

Then below tree is left heavy and it need a single right rotation and resulting tree obey rules of avl .

If the balance factor of p’s left child is -1 ,then two rotations are necessory , First is left rotation with p’s left child as root and second is right rotation with p as root . The following tree is left heavy with root as 5 and its left child is right heavy. So two rotations are needed to balance the tree.

Following above rules you shold be able to balance an avl tree following an insert or delete every time. After all rotation the balance factor must be recalculated and check again the rules of avl .

Source code is available here https://bitbucket.org/neethuedappal/c-programming-exercises/changeset/e39e9d9727a2