Hash Tables - Hash Map
A hashtable, also known as a hash map, is a data structure that implements an associative array, a structure that can map keys to values. In C++, hashtables are typically implemented using the standard container unordered_map
.
Imagine a hashtable as a big set of drawers, where each drawer is labeled with a unique number. You can store and find your items (like documents or keys) in these drawers quickly. Determining which drawer to use for a particular item involves calculating a number based on the item's name or some identifying feature. This is what the hash function does—it converts the item's name into a drawer number.
Simple Explanation
Hash Function: A method to turn the item's name into a drawer number.
Bucket: Each drawer where items are stored.
Insert, Search, Delete: Putting an item in a drawer, finding it, or removing it.
Use Case: Classroom Attendance
Imagine you are a teacher with a classroom of students, and you want to keep track of their attendance quickly. You decide to use a hashtable to do this, where each student's name (or you could use the student ID - BannerID) is the key, and their attendance record is the value.
Set Up the Hashtable: You create a set of drawers (the hashtable) where each drawer will represent a student and hold their attendance record.
Adding Attendance Records: When a student attends a class, you calculate the drawer number (using the hash function) from the student's name (or ID) and put a mark in that drawer.
Checking Attendance: To find out if a student was present, you use their name to open the correct drawer and see if there’s a mark.
Updating Records: If a student attends another class, you again use their name to find the right drawer and add another mark.
In C++ code, using unordered_map
, this could look like this:
#include <unordered_map>
#include <string>
using namespace std;
unordered_map<string, int> attendance;
// Mark attendance for a student
attendance["John Doe"]++;
// Check attendance
int daysAttended = attendance["John Doe"];
In this example:
"John Doe"
is the key (the student's name).The value (
int
) represents the number of days attended.When you increment
attendance["John Doe"]
, you are essentially adding a mark in John Doe’s “drawer” in the hashtable.
So, a hashtable allows you to efficiently store and retrieve information (like attendance records) using keys (like student names). It's like having a well-organized set of drawers where you can quickly find what you're looking for.
Key Concepts
Hash Function: A function that computes an index (hash code) from a key. This function maps keys to buckets in the hashtable.
Bucket: A storage space in the hashtable where elements with the same hash code are stored. Each bucket can contain zero, one, or more elements.
Collisions: Occur when two keys hash to the same index. These are usually resolved by chaining (storing multiple elements in the same bucket) or by open addressing (finding another bucket for the second element).
Basic Operations
Insertion: To insert a key-value pair, the hash function calculates the index for the key, and the pair is stored in the corresponding bucket.
Search: To find a value by its key, the hash function calculates the key's index, and the bucket at that index is searched for the element.
Deletion: To delete a pair, the hash function finds the correct bucket, and the pair is removed from it.
C++ unordered_map
In C++, unordered_map
is the standard hashtable implementation. Here's how you can use it:
Include the Header: Include the
unordered_map
header file.#include <unordered_map>
Declare an Unordered Map: Create an
unordered_map
with the desired key and value types.unordered_map<string, int> map;
This creates a hashtable where keys are strings and values are integers.
Insert Elements: Use the
insert
method or the subscript operator[]
.Access Elements: Use the subscript operator
[]
or theat
method.Check for Existence: Use the
find
method to check if a key is present.Delete Elements: Use the
erase
method.Iterate over Elements: Use iterators to loop through the
unordered_map
.
Performance
Time Complexity: Average case for search, insert, and delete operations is O(1). In the worst case (e.g., many collisions), it can become O(n).
Space Complexity: O(n), where n is the number of key-value pairs in the map.
Best Practices
Choose a good hash function to minimize collisions.
Resize the hashtable appropriately to balance between time and space efficiency.
Use custom hash functions for complex types if necessary to improve distribution and performance.
2024 - Programming 3 / Data Structures - Author: Dr. Kevin Roark