DEV Community

Cover image for Set Implementation In JavaScript
draczihper
draczihper

Posted on

Set Implementation In JavaScript

Introduction

Do you know the popular data structure Set in Javascript? Or you might have seen the below snippet in some of Javascript codes.

const mySet = new Set();
mySet.add("apple");
console.log(mySet.has("apple")); // OUTPUT: True
mySet.delete("apple");
console.log(mySet.has(apple)); // OUTPUT: False
Enter fullscreen mode Exit fullscreen mode

Set is a built-in data structure that allows you to store unique values of any type, whether they are primitive values of object references. Unlike arrays, which allow duplicate values, a Set guarantees that all elements are unique.

Key Characteristics of a Javascript Set

  1. Uniqueness: A Set has no duplicates and if you try to add a duplicate value it is ignored.
  2. Order: A Set maintains insertion order, so values are iterated in the order they were inserted.
  3. Data types: A Set can hold any type, including objects, primitives and even other set.
  4. No Key-Value pairs: A Set only has values unlike key-value pairs in Map.

Advantages of using a Javascript Set

  • Automatic uniqueness: Every element in the set is unique.
  • Fast lookups: Checking if an element exists is faster than in array due to it's constant time complexity O(1).

It is worth noting that, there are many advantages of using Set such as in mathematics and complex database systems but let's stick with these two for now.

Implementation

Normally a Set in Javascript is implemented by hashing the value(due to absence of key-value pairs, we just regard it to key) passed as an argument in a hashing function. The hashing implementation is not explicity described in Javascript documentation, so we will implement our own hashing function (🤞-fingers crossed, it outperforms our considerations which we will talk about later). It's also worth noting that, we will implement a sort of Linked List through chaining to avoid collisions if two different keys have them same hash code after hashing. If you don't know what is hashing? what does it do? and how it's done?, please check out this article.

Node class

We will create a class called Node to store/add our keys. This class will create an object that has key and next property.

class Node {
    constructor(key){
        this.key = key;
        this.next = next;
    }
}
Enter fullscreen mode Exit fullscreen mode

HashSet class

We will create a HashSet class that implements our hashing function and other useful Set methods. In our constructor we declare a variable initialCapacity equal to 1, to store the initial length of buckets array.
Buckets are useful storage location of keys which will be hashed. We will use these buckets to do operations (like delete, insert, lookup) on that key by using the hash generated.
Size property is initialized to track the size of our hash set and load factor of 0.75 is set in order to expand our buckets when 75% of elements fill up the buckets.

class HashSet {
    constructor(initialCapacity = 1) {
        this.buckets = new Array(initialCapacity);
        this.size = 0;
        this.loadFactor = 0.75;

    }
}
Enter fullscreen mode Exit fullscreen mode

Hash function

We define a method (object function) that takes as a parameter the key that is supposed to be hashed and return the hash code which will be used as index for the bucket to store the key.

// Define hash function inside the HashSet class

hash(key) {
    let primeNumber = 31;
    let hashCode = 0;

    for(let i = 0; i < key.length; i++) {
        hashCode = (primeNumber * hashCode + key.charCodeAt(i)) % this.buckets.length;
    }
    return hashCode;
}
Enter fullscreen mode Exit fullscreen mode

We multiply by prime number 31 because it helps reduce the possibility of hash code being evenly divisible buckets length and avoid collision (though not at 100%). Prime number 31 is a choice in this case as it one of the prime number that does not lead to many collisions. You can read this stackoverflow question to get more context. We also find the UTF-16 code of the key by iterating through every character in the key and later modulo the whole operation by our buckets length to avoid hash codes outside our buckets length.

Add method

Inside the hash set class and below the hash function we add our first method to put keys inside our hash set. Add method takes the key as a parameter. In the first if statement we check if size has reached the load factor in order to resize our hash set.
If the hash set is empty, we then find the index in the buckets array, where our hashed key will stay. Check the bucket at the index if it is empty and then put our key inside else we return that the key already exists, and if it does exist we set the next property equal to the new key.

add(key) {
        if (this.size >= this.buckets * this.loadFactor) {
            this.resize() // Resize our hash set as elements increase
        }

        const index = this.hash(key);
        if (!this.buckets[index]) {
            this.buckets[index] = new Node(key);
            this.size++;
            return true;
        } else {
            let current = this.buckets[index];
            while (current) {
                if (current.key === key) {
                    return false; // Key already exists
                }
                if (!current.next) {
                    current.next = new Node(key);
                    this.size++;
                    return true;
                }
                current = current.next;
            }
        }

}
Enter fullscreen mode Exit fullscreen mode

Has method

The has method return a boolean, if the key is available in the hash set. It takes a key parameter and hashes it to find the index of the key in the buckets array. It then iterates throught the hash set and return true if current index key is equal to the key parameter.

has(key){
    const index = this.hash(key);
    let current = this.buckets[index];
    while(current){
        if(current.key === key){
            return true;
        }
        current = current.next;
    }
    return false;
}
Enter fullscreen mode Exit fullscreen mode

Remove method

The remove method, removes a key passed as a parameter. It hashes the key and gives the index in the bucket array. Keeps track of the previous node by defining let prev = null. After that we iterate the hash set and check if our current key of iteration is equal to the key. If it is equal and it is not the first element because in the first element the previous node is null, we remove that node else we remove the first node. Later we update the previous node equal to current node and current node equal to the next node to ensure our iterations goes through all elements.

remove(key) {
    const index = this.hash(key);
    let current = this.buckets[index];
    let prev = null

    while (current) {
        if (current.key === key){
            if (prev){
                prev.next = current.next;
            }else {
                this.bucket[index] = current.next
            }
            this.size--;
            return true;
        }
        prev = current;
        current = current.next;
    }
    return false;
}
Enter fullscreen mode Exit fullscreen mode

Resize method

Resize methods expands the length of our buckets array when elements in the bucket reach the load factor. In other words when we have three elements(keys) or above and four buckets, we double our buckets array. Resize method also acts as an add method on the new capacity of our hash set. It rehashes previous available elements and put them in buckets array again. This helps to reduce collision and evenly distribute elements.

resize() {
    const newCapacity = this.buckets.length * 2;
    const newBuckets = new Array(newCapacity);

    for(let i = 0; i < this.buckets.length; i++) {
        let current = this.buckets[i];
        while(current){
            const newIndex = this.hash(current.key) % newCapacity;
            if(!newBuckets[newIndex]){
                newBuckets[newIndex] = new Node(current.key);
            } else {
                let newCurrent = newBuckets[newIndex];
                while(newCurrent.next){ 
                    newCurrent = newCurrent.next;
                }
                newCurrent.next = new Node(current.key);
            }
            current = current.next
        }
    }
    this.buckets = newBuckets;
}
Enter fullscreen mode Exit fullscreen mode

Size method

The size method return the size of our hash set.

size(){
    return this.size;
}
Enter fullscreen mode Exit fullscreen mode

Clear method

Clear method, clear our hash set and set the size equal to 0.

clear(){
    this.buckets = new Array(1);
    this.size = 0;
}
Enter fullscreen mode Exit fullscreen mode

Keys method

Keys method create an array and iterates over the hash set to push all available keys one by one including keys that were hashed to the same index.

keys() {
        const allKeys = [];
        for (let i = 0; i < this.buckets.length; i++) {
            let current = this.buckets[i];
            while (current) {
                allKeys.push(current.key);
                current = current.next;
            }
        }
        return allKeys;
    }
Enter fullscreen mode Exit fullscreen mode

There are other useful methods like bucketCount() to see how many buckets are in our hash set and collisions to view all the collisions in our hash set but that's something for later on. If your interested to see how they are implemented, go checkout this github repo.

Common Considerations

  • Hash Collisions: In practice, hash functions can result in the same index for different keys. This implementation resolves collisions using chaining, where multiple values are stored in an array at the same index.
  • Performance Considerations: A poorly distributed hash function can cause excessive collisions, making the time complexity closer to O(n) instead of O(1).

Common Use Cases

  • Filtering Duplicates: For example, when you need to remove duplicates from a list of items.
  • Membership Testing: Checking if a value exists in a collection.
  • Efficient Data Lookups: For quick existence checks in algorithms that require this (like graph algorithms or caching).

That's it for this tutorial, I hope you have learned something. If you have any suggestions, corrections, opinions don't hesitate to say something.

Top comments (0)