Understanding the Hashtable Data Structure

Data structures are vital in computer science for efficient data organization and manipulation. Today, we will explore one of the most efficient data structures for key-value pair storage: the Hashtable.

What is a Hashtable?

A Hashtable, also known as a hashmap, is a data structure that maps keys to values for highly efficient data retrieval. It uses a hash function to compute an index into an array of buckets or slots. Its amazing superpower is fast reading, with time complexity of O(1) when finding an element and inserting it.

Core concepts of Hashtable

  • Hash Function: A function that convers a given key into an index in the hashtable. A good hash function distributes keys uniformly across the buckets.
  • Buckets: Array slots where the key-value pairs are stored. Each bucket can store multiple key-value pairs in case of collision.
  • Collision Handling: Techniques to handle situations where multiple keys hash to the same index. Common methods include chaining and open addressing.

Hashing with hash functions

We can image how hash function work with simple example like this:

Map letters to numbers:

  • A = 1
  • B = 2
  • C = 3
  • D = 4
  • E = 5
  • ABC converts to 123
  • ABE converts to 125
  • BCD converts to 234

The process of converting characters into numbers is called hashing. There are many other hashing functions besides this one. We can convert characters into numbers, then return the sum of them as an index to store key-value pairs.

For example, we have a key-value pair: ABC – “this is the value that needs to be stored.”

  1. Convert ABC -> 123
  2. Sum is : 1 + 2 + 3 = 6
  3. The value is being stored in an array with index 6.

Collision Handling Techniques

So what cause Collisions, with example above if we want to store key-value: BB:”value” what will happen ? The index return is 6 that is the same as ABC key, how to deal with it

  • Chaining: Each bucket contains a list (or another structure) of all key-value pairs that hash to the same index.
  • Open Addressing: When a collision occurs, the algorithms searches for the next available bucket using method like linear probing, quadratic probing, or double hashing

Use Cases of Hashtable

Hashtables are widely used due to their efficiency and versatility. Some common use cases include:

  • Database indexing: Fast data retrieval in databases.
  • Caches: Implementing caches for quick data access.
  • Counting frequencies: Counting occurrences of items (like words in a document).

Implementing a Hashtable in Python

def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]

    def _hash(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self._hash(key)
        for kvp in self.table[index]:
            if kvp[0] == key:
                kvp[1] = value
                return
        self.table[index].append([key, value])

    def lookup(self, key):
        index = self._hash(key)
        for kvp in self.table[index]:
            if kvp[0] == key:
                return kvp[1]
        return None

    def delete(self, key):
        index = self._hash(key)
        for i, kvp in enumerate(self.table[index]):
            if kvp[0] == key:
                del self.table[index][i]
                return

Source Code: https://github.com/hongquan080799/data-structure-and-algorithms

Conclusion

Hashtables are a powerful data structure providing efficient key-value pair storage and retrieval. Their average-case time complexity of O(1) for insertion and lookup makes them invaluable in many applications, from database indexing to implementing caches. Understanding the fundamentals of hashtables, including hash functions and collision handling, is essential for optimizing data storage and access in your programs.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top