Understanding the HashSet Data Structure

In the realm of data structures, sets are fundamental collections that store unique elements. This blog will delve into the concepts of sets, focusing on the `HashSet` in Java and `set` in Python. We’ll explore their characteristics, benefits, and provide practical examples in both languages.

What is a Set ?

A set is a collection that contains no duplicate elements. It is an abstract data type that supports operations such as adding, removing, and checking for membership. Sets are typically implemented using hash tables, which provide efficient average time complexity for these operations.

Characteristic of a Set

  1. Uniqueness: Ensures no duplicate elements.
  2. Unordered: Does not maintain any particular order of elements.
  3. Efficient operations: Supports fast membership tests and operations like union, intersection, and difference.

How Set being implemented

As we know Hashmap or Hashtable using hash to hash key of element to a number and store in array with the index have been hash. The Set is the same, its used Hashmap inside with key is element of Set and value is None (or new Object() in Java)

Here is the implementation of HashSet in java, we can see it uses HashMap inside
Here the implementation of add function, we can see it uses map to store with value is new Object()

Implementing a set in python

class MySet:
    def __init__(self, bucket_size=10):
        self.bucket_size = bucket_size
        self.buckets = [[] for _ in range(bucket_size)]
    
    def _hash(self, element):
        return hash(element) % self.bucket_size
    
    def add(self, element):
        bucket_index = self._hash(element)
        bucket = self.buckets[bucket_index]
        if element not in bucket:
            bucket.append(element)
    
    def remove(self, element):
        bucket_index = self._hash(element)
        bucket = self.buckets[bucket_index]
        if element in bucket:
            bucket.remove(element)
        else:
            raise KeyError(f"{element} not found in set")
    
    def contains(self, element):
        bucket_index = self._hash(element)
        bucket = self.buckets[bucket_index]
        return element in bucket
    
    def __iter__(self):
        for bucket in self.buckets:
            for element in bucket:
                yield element
    
    def __len__(self):
        return sum(len(bucket) for bucket in self.buckets)
    
    def __str__(self):
        elements = [element for bucket in self.buckets for element in bucket]
        return f"MySet({elements})"

Source code: https://github.com/hongquan080799/data-structure-and-algorithms/tree/master/Set

Conclusion

Sets, and their specific implementations like HashSet in Java, set in python are invaluable tools for managing collections of unique elements. They provide efficient operations for adding, removing, and checking membership, as well as powerful set operations like union, intersection, and difference.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top