Hash table resizing using a vector

A hash table (also called a hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found. C++ does not have a built-in hash table - however, we can implement one.

Hash table resizing is an important technique used to maintain the efficiency of hash table operations as the number of elements in the hash table grows. When a hash table becomes too full, the cost of searching for an element can increase, because collisions become more likely. Resizing the hash table involves creating a new, larger array and rehashing all of the elements from the old array into the new array. This process ensures that the new hash table has a lower load factor (the ratio of the number of elements in the hash table to the size of the array), reducing the likelihood of collisions and maintaining the constant-time average-case complexity of hash table operations.

Here is an example implementation of a hash table with resizing in C++:

// HashNode.h // HashTableResize // Created by Kevin Roark // #ifndef HashNode_h #define HashNode_h // HashNode represents a node in a chained hash table template <typename K, typename V> class HashNode { public: K key; // the key of this node V value; // the value associated with this key HashNode* next; // pointer to the next node in the chain HashNode(K key, V value) { this->key = key; this->value = value; this->next = nullptr; } }; #endif /* HashNode_h */
// HashTable.h // StacksAndQueuesDemo // Created by Kevin Roark on 7/21/23. // #ifndef HashTable_h #define HashTable_h #include <iostream> #include <vector> #include "HashNode.h" using namespace std; template <typename K, typename V> class HashTable { public: const int INITIAL_SIZE = 100; // initial size of the hash table const double LOAD_FACTOR_THRESHOLD = 0.75; // threshold for resizing the table int size; // the number of entries in the hash table std::vector<HashNode<K, V>*> table; // the hash table, represented as a vector of pointers to HashNodes // constructor HashTable() { size = 0; table = std::vector<HashNode<K, V>*>(INITIAL_SIZE, nullptr); } // destructor ~HashTable() { for (int i = 0; i < table.size(); i++) { HashNode<K, V>* node = table[i]; while (node) { HashNode<K, V>* nextNode = node->next; delete node; node = nextNode; } table[i] = nullptr; } } // hash function int hashFunction(K key) { std::hash<K> hashFunction; int index = hashFunction(key) % table.size(); return index; } // resize the hash table void resize() { std::vector<HashNode<K, V>*> oldTable = table; table = std::vector<HashNode<K, V>*>(2 * oldTable.size(), nullptr); size = 0; for (int i = 0; i < oldTable.size(); i++) { HashNode<K, V>* node = oldTable[i]; while (node) { put(node->key, node->value); HashNode<K, V>* nextNode = node->next; delete node; node = nextNode; } } } // insert a key-value pair into the hash table void put(K key, V value) { int index = hashFunction(key); HashNode<K, V>* node = table[index]; while (node) { if (node->key == key) { node->value = value; return; } node = node->next; } HashNode<K, V>* newNode = new HashNode<K, V>(key, value); newNode->next = table[index]; table[index] = newNode; size++; if ((double)size / table.size() >= LOAD_FACTOR_THRESHOLD) { resize(); } } // get the value associated with a key V get(K key) { int index = hashFunction(key); HashNode<K, V>* node = table[index]; while (node) { if (node->key == key) { return node->value; } node = node->next; } throw std::out_of_range("Key not found"); } // check if a key is in the hash table bool containsKey(K key) { int index = hashFunction(key); HashNode<K, V>* node = table[index]; while (node) { if (node->key == key) { return true; } node = node->next; } return false; } }; #endif /* HashTable_h */

Driver

// main.cpp // HashTableResize // Created by Kevin Roark // #include <iostream> #include "HashTable.h" using namespace std; int main() { HashTable<string, int> hashTable; hashTable.put("Apple", 1); hashTable.put("Banana", 2); hashTable.put("Cherry", 3); cout << "Apple: " << hashTable.get("Apple") << endl; cout << "Banana: " << hashTable.get("Banana") << endl; cout << "Cherry: " << hashTable.get("Cherry") << endl; try { cout << "Orange: " << hashTable.get("Orange") << endl; } catch (const out_of_range& e) { cout << e.what() << endl; } return 0; }

The code is a simple implementation of a hash table (or hash map) in C++. A hash table is a data structure that implements an associative array abstract data type, a structure that can map keys to values.

Let's break it down:

  1. HashNode: This is a basic node that includes a key-value pair and a pointer to the next node. It's used to handle collisions in the hash table by chaining, where multiple keys that hash to the same index are linked together in a linked list.

  2. HashTable: This is the primary hash table class. It's implemented as an array of HashNode pointers.

    • The INITIAL_SIZE and LOAD_FACTOR_THRESHOLD are constants that define the initial size of the hash table and the load factor at which the table will be resized, respectively.

    • The hashFunction function is used to generate a hash for a given key. In this case, it uses the STL's std::hash function to generate a hash and then finds the modulus by the size of the array to fit it into the current size of the array.

    • resize is a function that is used to increase the size of the hash table when the load factor (number of entries / size of table) exceeds the threshold. When resizing, the hash table is doubled in size, and all existing entries are rehashed and put into the new larger table.

    • put is the function used to insert key-value pairs into the hash table. If a collision occurs (that is, two different keys have the same hash), the new node is added to the existing linked list at that index.

    • get is the function to retrieve the value for a given key. It finds the appropriate index for the key using the hash function and then searches the linked list at that index to find the desired key.

    • containsKey is a utility function that checks if a given key exists in the hash table or not. It uses the same process as get to find the key, but simply returns true or false rather than the associated value.

The destructor (~HashTable) is used to clean up the memory. It deletes all the nodes that were created in the hash table.

This implementation provides an efficient way to implement a map or a dictionary where insertion, deletion, and retrieval operations can be done in approximately O(1) time complexity on average. It's a fundamental concept in computer science and used widely in many algorithms and systems.

2024 - Programming 3 / Data Structures - Author: Dr. Kevin Roark