Hash and Bitmap Indexes

Hash Index

A hash index is a type of index structure that uses a hash function to map key values to specific locations in the index. It is designed for fast data retrieval and is particularly efficient when searching for exact match queries.

Here's how a hash index works:

  1. Hash Function: A hash index employs a hash function to generate a hash value for each indexed key. The hash function takes the key value as input and calculates a fixed-size hash code, which is used as the index's address.

  2. Hash Buckets: The index consists of a collection of hash buckets or slots, where each bucket corresponds to a specific hash value or address. The hash buckets store pointers to the actual data or disk locations where the indexed records are stored.

  3. Hash Collision: Since multiple keys may produce the same hash value (hash collision), hash indexes typically include collision resolution mechanisms to handle such situations. The two common techniques used for collision resolution are chaining and open addressing.

    • Chaining: In chaining, each bucket contains a linked list of records with the same hash value. When a collision occurs, the new record is simply added to the linked list in the corresponding bucket.

    • Open Addressing: In open addressing, when a collision happens, the index searches for the next available empty bucket (or a pre-defined sequence of buckets) to store the new record.

  4. No Ordering: Hash indexes do not maintain any particular order of the keys, unlike B-trees or B+trees. As a result, they are best suited for exact match queries rather than range queries or other types of searches.

  5. Hash Function Performance: The efficiency of a hash index depends on the performance of the hash function. A good hash function should evenly distribute the keys across the hash buckets to minimize collisions and provide efficient access to the data.

Hash indexes offer fast data retrieval for exact match queries, making them suitable for scenarios where rapid access to specific records is critical. However, they may not be as efficient for range queries or non-exact match searches since hash functions do not preserve any ordering.

Bitmap Index

A bitmap index is a specialized type of index structure that uses bitmaps to efficiently represent the presence or absence of values for each row in a table. Unlike traditional index structures that point to the locations of data, a bitmap index stores a bitmap for each distinct value in an indexed column, indicating which rows contain that value.

Here's how a bitmap index works:

  1. Indexed Column: A bitmap index is created on a single column in a table, often a column with low cardinality (a small number of distinct values). Examples of such columns include gender (with values "male" and "female") or status (with values "active" and "inactive").

  2. Bitmap Representation: For each distinct value in the indexed column, the bitmap index creates a bitmap, which is a sequence of bits. Each bit in the bitmap corresponds to a row in the table. If a bit is set to 1, it means the value is present in that row; if it is set to 0, the value is absent.

  3. Bitmap Operations: Bitmap indexes are particularly efficient for certain types of queries, such as exact match queries and conjunctions (AND operations) and disjunctions (OR operations) of multiple conditions. These operations can be performed quickly using bitwise logical operations (AND, OR, NOT) on the bitmaps.

  4. Space Efficiency: Bitmap indexes can be highly space-efficient, especially for columns with low cardinality. Since each bitmap represents only one value, the space required is proportional to the number of distinct values, not the number of rows in the table.

  5. Updates: Bitmap indexes are well-suited for read-intensive workloads but may not be as efficient for write-intensive operations or frequent updates. Updating a bitmap index involves updating the corresponding bitmap(s), which can be computationally expensive.

Bitmap indexes are most effective when used on columns with low cardinality and for queries that involve filtering on multiple conditions. They can significantly speed up such queries by performing bitwise operations on the bitmaps rather than scanning the entire table. Bitmap indexes might not be suitable for columns with high cardinality, such as columns with many distinct values, as the bitmaps would be too large and memory-intensive. In such cases, traditional index structures like B-trees or B+trees are more appropriate.