During my Data-Structure and Algorithm learning, I was introduced to a new data structure Bloom Filter. Which is simply amazing and I was very curious to see it in action. So I tried to put the things in Javascript and the result blew my mind 🤯. This thing is crazy.
Thanks to @Tech-Dummies for introducing and explaining about it.
Here is the simple implementation of Bloom Filter
const names = ['Abhin', 'Pai', ......]; // n names
const noOfHashFunction = 6; // number of hash functions
const storage = Array(Math.pow(2, 22) - 1).fill(0); // Bllom filter bit
const hash = (key) => {
let hashNumbers = [];
for (let i = 1; i <= noOfHashFunction; i++) {
hashNumbers.push(
Math.abs(
key.split("").reduce((a, b) => ((a << i) - a + b.charCodeAt(0)) | 0, 0)
)
);
}
return hashNumbers;
};
// Initilizing bloom filter bit for a hash index
names.forEach((name) => {
let indexes = hash(name);
indexes.forEach((index) => (storage[index] = 1));
});
// Traditional single name search
console.time("Single Traditional Search");
const isValueContain = (searchString) => {
let result;
names.forEach((name) => {
if (name === searchString) {
result = true;
return;
}
});
return result ? true : false;
};
console.log(isValueContain("Pai"));
console.timeEnd("Single Traditional Search");
// End of traditional Search
let result = [];
// Bloom filter single name search
console.time("Single Bloom Filter Search");
const isValueContainInBloom = (searchString) => {
let hashes = hash(searchString);
let result = hashes.filter((index) => !storage[index]);
return result.length > 0 ? false : true;
};
console.log(isValueContainInBloom("Pai"));
console.timeEnd("Single Bloom Filter Search");
// End of Bloom Filter Search
// Tranditional Search for 1000 names
console.time("Traditional Search");
names.forEach((name) => {
result.push(isValueContain(name));
});
console.log(result.filter((res) => !res));
console.timeEnd("Traditional Search");
// End of tranditional Search for 1000 names
// Boolm filter search for 1000 names
console.time("Bloom Filter");
names.forEach((name) => {
result.push(isValueContainInBloom(name));
});
console.log(result.filter((res) => !res));
console.timeEnd("Bloom Filter");
// End of Boolm filter search for 1000 names
And here is the proof of how fast Bloom Filter is 😍
- Single Traditional Search: 19.371ms
- Single Bloom Filter Search: 0.239ms
- Traditional Search: 18.968ms
- Bloom Filter: 5.940ms
Note: This code ran on Macbook Pro 2017 2.3 GHz Dual-Core Intel Core i5 and 8 GB RAM it may vary on different machine
According to the formula given in this post
The probability of getting a false positive is 1 for every 1000 entries but it's subjective to the number of hash function and Bloom Bit Size
Top comments (2)
Never heard of a bloom filter before but its certainly interesting to learn about something new =)
However since constructing the structure there has a cost I think it would be more fair to include that in the timed section, in that scenario things don't look as good sadly. But it could certainly be useful if you have to construct it only once and need to do a lot more than 1000 checks.
I ran some local benhmarks here (Win 10, Chrome)
Traditional Search: 21.74609375 ms
Bloom Filter: 659.390869140625 ms (including setup)
Bloom Filter: 2.97900390625 ms (excluding setup)
Index Of: 1.644287109375 ms (your traditional search loop replaced with array.indexOf...)
For Loop: 4.094970703125 ms (your traditional search, but with a normal for loop instead of forEach)
Cool, honestly I tried neither foreach nor indexOf but these benchmarks seem to be quite interesting.
While digging into some more articles and posts I came to know that Bloom filter is more about saving the memory, I found one post in which he explained the purpose of it very neatly post
But as far as a construction concern, I guess it might be either prerequisite or cons of Bloom Filter but along with it, Bloom filter also says that we are not allowed to remove the data once it's fed into it according to wiki