Linear probing overview

Linear probing is a collision resolution technique used in hash tables. When a hash function maps two or more keys to the same index in the array, a collision occurs. In linear probing, when a collision occurs, the hash table looks for the next available index in the array and stores the element there, instead of creating a linked list like in chaining. The search for the next available index is done by sequentially scanning the array, starting from the collision index and wrapping around to the beginning of the array if necessary.

While it's a simple and easy-to-implement method, its effectiveness depends on certain conditions and use cases:

  1. Low Load Factor: Linear probing works best when the hash table has a low load factor, typically well below 0.7. The load factor is the ratio of the number of stored elements to the total number of slots. A lower load factor means fewer collisions, which is ideal for linear probing since it reduces the likelihood of long sequences of occupied slots (clusters).

  2. Good Hash Function: A hash function that distributes keys uniformly across the hash table is crucial. If the hash function is poor and causes many collisions, linear probing can lead to large clusters, significantly degrading performance.

  3. Small Data Sets: For smaller datasets, where the cost of managing a more complex collision resolution strategy outweighs its benefits, linear probing can be a good choice due to its simplicity and lower overhead.

  4. Cache Efficiency: Linear probing can be cache-friendly since it accesses consecutive memory locations. This is beneficial in scenarios where cache performance is critical, as it can lead to fewer cache misses compared to methods like chaining or double hashing.

  5. Simplicity of Implementation: In situations where simplicity and ease of implementation are important, linear probing is advantageous. It requires less additional memory than separate chaining (another common collision resolution technique) and is generally easier to implement than more complex schemes like double hashing or cuckoo hashing.

  6. Non-Uniform Data: If the data is not uniformly distributed, linear probing can lead to clustering, which is a scenario where a sequence of adjacent slots gets filled. This can significantly slow down both insertion and search operations.

  7. Use in Real-Time Systems: In real-time systems where predictability is more important than average performance, linear probing can be beneficial. It offers consistent and predictable performance, as the maximum number of probes is fixed.

  8. Workloads with Few Deletions: Linear probing is more suitable for scenarios where deletions are infrequent. Deletions in linear probing hash tables can be complex, as simply emptying a slot could break the probe sequence. Special care (like using a "deleted" marker) is needed to handle deletions properly.

Linear probing is a good choice for hash tables when dealing with small to moderately-sized datasets, low load factors, a need for cache efficiency, and simplicity of implementation is a priority. However, it's less suitable for large datasets, high load factors, or datasets with a high frequency of deletions.

Here is an example implementation of a hash table using linear probing in C++:

// main.cpp // LinearProbing // Created by Kevin Roark on 8/8/23. #include <iostream> #include <vector> using namespace std; class HashTable { private: int capacity; vector<int> keys; vector<int> values; int hash(int key) { return key % capacity; } public: HashTable(int capacity) : capacity(capacity), keys(capacity, -1), values(capacity, -1) {} // Initialize keys and values with -1, indicating empty slots void insert(int key, int value) { int index = hash(key); while (keys[index] != -1 && keys[index] != key) { index = (index + 1) % capacity; // Linear probing step } keys[index] = key; values[index] = value; } bool find(int key, int &value) { int index = hash(key); while (keys[index] != -1) { if (keys[index] == key) { value = values[index]; return true; } index = (index + 1) % capacity; // Linear probing step } return false; } void remove(int key) { int index = hash(key); while (keys[index] != -1) { if (keys[index] == key) { keys[index] = -1; values[index] = -1; return; } index = (index + 1) % capacity; // Linear probing step } } void print() { for (int i = 0; i < capacity; i++) { cout << "[" << i << "]: " << keys[i] << " -> " << values[i] << "\n"; } } }; int main() { HashTable hashTable(10); hashTable.insert(15, 150); hashTable.insert(25, 250); hashTable.insert(35, 350); hashTable.print(); int value; if (hashTable.find(25, value)) { cout << "Key 25 found with value: " << value << "\n"; } else { cout << "Key 25 not found\n"; } hashTable.remove(25); hashTable.print(); if (hashTable.find(25, value)) { cout << "Key 25 found with value: " << value << "\n"; } else { cout << "Key 25 not found\n"; } return 0; }

Output:

 

Here, the linear probing is achieved by incrementing the index by 1 ((index + 1) % capacity) each time a collision is detected until an empty slot or a slot with the same key is found.

This approach makes sure that the table gets filled efficiently, but it can lead to clustering, where groups of adjacent slots are filled, slowing down the search, insert, and delete operations. Various probing sequences and rehashing techniques can mitigate these issues.

Class Definition

Private Members

  • capacity: The size of the hash table.

  • keys: A vector of integers that stores the keys.

  • values: A vector of integers that stores the corresponding values.

  • hash(int key): A simple hash function that returns the remainder of the key divided by the capacity.

Public Members

  • Constructor: Initializes the capacity and sets all keys and values to -1, indicating empty slots.

  • insert(int key, int value): Inserts a key-value pair into the hash table.

    1. It calculates the hash index using the hash function.

    2. It then checks if the slot at the index is occupied (if keys[index] != -1).

    3. If the slot is occupied, it performs linear probing by incrementing the index by one in a loop (index = (index + 1) % capacity) until an empty slot or a slot with the same key is found.

    4. It then inserts the key and value into the slots.

  • find(int key, int &value): Searches for a key in the hash table.

    1. It calculates the hash index using the hash function.

    2. It then searches for the key using linear probing, incrementing the index in a loop until the key is found or an empty slot (keys[index] == -1) is reached.

    3. If the key is found, it returns true and sets the corresponding value. If not found, it returns false.

  • remove(int key): Removes a key-value pair from the hash table.

    1. It calculates the hash index using the hash function.

    2. It then searches for the key using linear probing, as in the find method.

    3. If the key is found, it sets the key and value at that index to -1, indicating an empty slot.

  • print(): Prints the contents of the hash table, showing both keys and values.

main Function

In the main function, a HashTable object is created with a capacity of 10. Three key-value pairs are inserted, and the contents of the hash table are printed.

The code then demonstrates finding a key (25) and prints whether it's found along with its value. Afterward, the code removes the key (25) and prints the updated hash table contents, and again checks for the key.

This code implements a hash table with linear probing in C++. The linear probing resolves collisions by searching linearly in the array for the next empty slot. This implementation uses two parallel vectors to represent the keys and values, respectively, and employs basic operations like insert, find, remove, and print.

2024 - Programming 3 / Data Structures - Author: Dr. Kevin Roark