Introduction
In v1.24, Go replaced its map implementation with a new hash table design called the Swiss Table, known from implementations like Abseil's flat_hash_map.
What is a Swiss Table?
A Swiss Table is a map that uses a cache-friendly, more efficient approach (with a smaller memory footprint) that makes comparisons and insertions faster.
It also uses a different strategy for resolving collisions: linear probing on steroids, replacing the previous strategy, chaining.
When working with hash tables, one thing is certain: there will be collisions.
The Old Map
The previous implementation was highly tuned for memory efficiency and performance.
The Go team is awesome.
Chaining
How does it work? It pre-allocates memory in the form of buckets, where each bucket can hold up to 8 key-value pairs. When a bucket fills up, the algorithm allocates a new overflow bucket and links it to the full one, in a process known as chaining.
As the table approaches a high load factor*, the runtime moves all entries to a new memory block (usually twice as big as the previous one); this is known as rehashing.
* Load Factor = used positions / total capacity. For example, a table with 6 of its 8 positions used has a load factor of 0.75.
In traditional chaining, whenever there is a collision, the runtime allocates memory for a new node (key-value pair) and stores it in a linked list.
I'll focus on how chaining works for nodes and leave the buckets strategy explanation for the Go 1.23 Map Implementation.
Practical Example
Imagine a hash function that, based on a string, returns an index (offset value) to a fixed memory address:
hash(x) = len(x)
Usually, these functions try to produce an avalanche effect to distribute the results "evenly" over the map memory interval.
Our hash function is much more vulnerable to collisions, as it only uses the length of the string to define the address.
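In Go, that toy hash function would look like this (purely illustrative; real hash functions are far more elaborate and seeded per map):

```go
package main

import "fmt"

// hash is our deliberately naive toy hash function: it just
// returns the length of the key, so any two keys of the same
// length will always collide.
func hash(key string) int {
	return len(key)
}

func main() {
	fmt.Println(hash("C"))   // 1
	fmt.Println(hash("JS"))  // 2
	fmt.Println(hash("PHP")) // 3
}
```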
Now, let's assume we add the following keys:
key | index |
---|---|
C | 1 |
JS | 2 |
PHP | 3 |
PERL | 4 |
And this table:
index | key | memory | next |
---|---|---|---|
0 | nil | 0 | nil |
1 | C | 1000 | nil |
2 | JS | 1008 | nil |
3 | PHP | 1016 | nil |
4 | PERL | 1024 | nil |
When we add a new "Go" key, we cause a collision.
As the index of "Go" is 2 and "JS" is already at position 2, the strategy allocates a new node in available memory and points the next field of "JS" to "Go".
index | key | memory | next |
---|---|---|---|
0 | nil | 0 | nil |
1 | C | 1000 | nil |
2 | JS | 1008 | 2064 |
3 | PHP | 1016 | nil |
4 | PERL | 1024 | nil |
__ | Go | 2064 | nil |
So: keys are used as inputs to a hash function, which generates an index (an offset from a fixed memory address, the start of the list), and colliding keys (or nodes, or buckets) are linked together as linked lists.
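Here's a minimal sketch of that idea in Go. This is not the runtime's actual bucket-based code, just plain per-node chaining with our toy length-based hash:

```go
package main

import "fmt"

// node is one key-value pair in a chain.
type node struct {
	key   string
	value int
	next  *node // next node in the chain on collision
}

// chainedMap is a toy chained hash table.
type chainedMap struct {
	slots []*node
}

// insert hashes the key and, on collision, allocates a new node
// on the heap and links it at the head of the chain.
func (m *chainedMap) insert(key string, value int) {
	i := len(key) % len(m.slots) // our toy length-based hash
	for n := m.slots[i]; n != nil; n = n.next {
		if n.key == key {
			n.value = value // key already present: just update
			return
		}
	}
	// new allocation, likely far away from the slot array in memory
	m.slots[i] = &node{key: key, value: value, next: m.slots[i]}
}

func main() {
	m := &chainedMap{slots: make([]*node, 8)}
	m.insert("JS", 1)
	m.insert("Go", 2) // also hashes to 2: chained with "JS"
	fmt.Println(m.slots[2].key, "->", m.slots[2].next.key) // Go -> JS
}
```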
Problem
As you can see, each new node is placed far from its colliding key in memory.
This means that, when searching for a key that has conflicts (Go, for example), the processor will fetch sparse addresses in memory.
Even though Go's previous implementation used many performance techniques (such as 8-slot buckets instead of single nodes, and partially comparing hashes, one byte instead of the full 64 bits), this chaining approach is not cache-friendly.
You can check the specific details here.
Swiss Table
It's a clever implementation that uses 1 byte of metadata per slot and linear probing on steroids.
Linear Probing
No more dynamic memory allocation for buckets with linked lists.
Consider this table:
index | key | memory |
---|---|---|
0 | nil | 1000 |
1 | C | 1008 |
2 | JS | 1016 |
3 | PHP | 1024 |
4 | PERL | 1032 |
5 | nil | 1040 |
... | ... | ... |
N | nil | N_ADDRESS |
A better visualization in this case is to look at the hash table as a, well, _flat_hash_map_.
When a conflict occurs, the algorithm will linearly search for the next position, one by one.
For adding "Go", we have:
- hash("Go") = 2
- See slot 1016 and realize it is filled.
- See slot 1024 and realize it its filled.
- See slot 1032 and realize it its filled.
- Slot 1040 is open, use it.
It's only on slot 1040 that will find an open spot, this is where the "Go" key goes.
The important thing to notice is that this time, collisions are located near each other, making them cache-friendly, which speeds things up a little bit.
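A minimal linear-probing sketch in Go (again a toy, not the runtime's code; it assumes the table always has a free slot):

```go
package main

import "fmt"

// slot stores the key directly in the table array, so colliding
// keys end up in neighboring (cache-friendly) positions.
type slot struct {
	key  string
	used bool
}

type flatMap struct {
	slots []slot
}

// insert probes linearly from the hashed index until it finds the
// key or a free slot (it assumes the table is never completely full).
func (m *flatMap) insert(key string) int {
	i := len(key) % len(m.slots) // toy length-based hash
	for m.slots[i].used {
		if m.slots[i].key == key {
			return i // already present
		}
		i = (i + 1) % len(m.slots) // linear probe: just try the next slot
	}
	m.slots[i] = slot{key: key, used: true}
	return i
}

func main() {
	m := &flatMap{slots: make([]slot, 10)}
	for _, k := range []string{"C", "JS", "PHP", "PERL"} {
		m.insert(k)
	}
	// hash("Go") = 2, but 2, 3 and 4 are taken, so "Go" lands at 5
	fmt.Println(`"Go" landed at index`, m.insert("Go"))
}
```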
Steroids (SSE3)
Streaming SIMD Extensions 3 is a set of processor instructions that, combined with careful engineering techniques such as bit manipulation, provides a way to read and compare multiple memory positions with a single instruction.
This means that, when probing for collisions, a Swiss Table can perform up to 16 checks at once. Go seems to use groups of 8, but that's still way better than checking one by one!
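To get a feel for the trick, here's a hedged sketch of probing 8 control bytes at once using plain 64-bit arithmetic (SWAR). The runtime's real code and the SSE instructions differ in detail, but the spirit is the same:

```go
package main

import "fmt"

// matchH2 compares 8 one-byte control values (packed into a uint64)
// against a hash fragment h2 in a handful of instructions, using the
// classic SWAR "find the zero byte" trick. A set high bit in the
// result marks each matching byte. This can report a rare false
// positive, which is fine: full keys are always compared after a
// metadata match anyway.
func matchH2(group uint64, h2 uint8) uint64 {
	const lo = 0x0101010101010101
	const hi = 0x8080808080808080
	x := group ^ (lo * uint64(h2)) // matching bytes become zero
	return (x - lo) & ^x & hi      // high bit set where a byte was zero
}

func main() {
	// pretend slots 1 and 6 of a group hold the control byte 0x2a
	group := uint64(0x2a)<<(1*8) | uint64(0x2a)<<(6*8)
	fmt.Printf("%016x\n", matchH2(group, 0x2a)) // high bits mark bytes 1 and 6
}
```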
Metadata
The Swiss Table uses a metadata byte to partially store the hashed key, enabling quick comparisons (7 bits instead of 64).
With open addressing (linear probing) + metadata chunking + SSE3, comparisons and inserts are faster and consume less memory than in the previous implementation.
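The hash split usually looks something like this. I'm mirroring Abseil's scheme here; the exact bit layout in Go's runtime is an implementation detail:

```go
package main

import "fmt"

// splitHash divides a 64-bit hash into the two parts a Swiss Table
// needs: h1 selects which group to probe, and h2 (7 bits) is what
// gets stored in the metadata byte for cheap comparisons. The bit
// layout mirrors Abseil's scheme; Go's runtime may differ in detail.
func splitHash(h uint64) (h1 uint64, h2 uint8) {
	return h >> 7, uint8(h & 0x7f)
}

func main() {
	h1, h2 := splitHash(0xdeadbeef)
	fmt.Printf("h1=%x h2=%02x\n", h1, h2)
}
```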
1.23.4 vs 1.24
I ran this article's benchmark script on my laptop.
The test was:
- Create and populate a 1_000_000-item map.
- Perform 10_000_000 lookups using mod indexing.
- Insert 1_000_000 new entries into the map.
- Remove the first 1_000_000 entries from the map.
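For reference, here's a minimal sketch of that benchmark's shape (not the article's exact script):

```go
package main

import (
	"fmt"
	"time"
)

const n = 1_000_000

func main() {
	// 1. create and populate a 1_000_000-item map
	m := make(map[int]int)
	for i := 0; i < n; i++ {
		m[i] = i
	}

	// 2. perform 10_000_000 lookups using mod indexing
	start := time.Now()
	sum := 0
	for i := 0; i < 10*n; i++ {
		sum += m[i%n]
	}
	fmt.Println("lookup:", time.Since(start), sum)

	// 3. insert 1_000_000 new entries
	start = time.Now()
	for i := n; i < 2*n; i++ {
		m[i] = i
	}
	fmt.Println("insert:", time.Since(start))

	// 4. remove the first 1_000_000 entries
	start = time.Now()
	for i := 0; i < n; i++ {
		delete(m, i)
	}
	fmt.Println("delete:", time.Since(start))
}
```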
Results
Operation | Go 1.23 (ms) | Go 1.24 (ms) | Improvement (%) |
---|---|---|---|
Lookup | 287.340375 | 184.787167 | 35.67% faster |
Insertion | 118.564333 | 66.4095 | 43.99% faster |
Deletion | 39.893875 | 61.364875 | 53.85% slower |
In the Go 1.24’s Swiss Tables: The Silver Bullet or Just a Shiny New Gadget? article, we can also see the efficiency of Swiss Tables in memory usage.
Problem
Even though the Go Swiss Table uses buckets and a sparse hash function (for the avalanche effect) to distribute keys evenly over the pre-allocated memory block, as the table gets full, collisions will generate clustered sections that increase search time.
It's easy to see this happening with a 10-position table:
Table: [nil, C, JS, PHP, PERL, nil, nil, nil, nil, nil]
Adding a new "B" key would cause:
- hash("B") = len("B") = 1
- position 1 is occupied by "C", look at the next
- position 2 is occupied by "JS", look at the next
- position 3 is occupied by "PHP", look at the next
- position 4 is occupied by "PERL", look at the next
- position 5 is free, add "B"
Table: [nil, C, JS, PHP, PERL, B, nil, nil, nil, nil]
Index: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Now, any hash call that returns 1, 2, 3, 4, or 5 may require searching up to 4 extra positions. A piece of cake for SSE3, I admit, but what if there were a better way?
Elastic Hashing
In January 2025, a new paper about open addressing was published suggesting a new approach, theoretically better than linear probing, called Elastic Hashing.
Feel free to read the news article or the paper to get the full context.
It claims to beat a 40-year-old conjecture (Yao's conjecture), which defended this style of probing as a simple, near-optimal approach that doesn't degrade catastrophically as load increases.
Krapivin discovered the new strategy while unaware of Yao's conjecture, which suggests we should challenge known constraints more often.
What is it?
Basically, instead of checking positions one by one, or 16 by 16 (as our beloved Swiss Tables do), it introduces a new two-dimensional strategy to calculate the insertion address using virtual overflow buckets.
The idea is that there's a new function, φ(i, j), that returns the position of a node, where:
- i = Primary Bucket (Hash Result)
- j = Overflow Count
So, using our previous hash function, both keys "JS" and "Go" would return 2 as their "Primary Bucket".
The insertion order would determine if the key would be placed at position φ(2, 1) = 2 or maybe φ(2, 2) = 7.
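To make that interface concrete, here's a purely hypothetical stand-in for φ; the paper's actual mapping is far more sophisticated, this only shows the two-dimensional (i, j) shape:

```go
package main

import "fmt"

// phi is a hypothetical stand-in, NOT the paper's actual function:
// it only illustrates the two-dimensional interface, where i is the
// primary bucket and j is the overflow count (j = 0 on first try).
func phi(i, j, capacity int) int {
	// jump further away as j grows, instead of stepping one by one
	return (i + j*j) % capacity
}

func main() {
	// insertion tries phi(i, 0), phi(i, 1), ... until a slot is free
	for j := 0; j < 3; j++ {
		fmt.Println("candidate slot:", phi(2, j, 10))
	}
}
```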
Magic
The magic lies in the φ function, which is able to virtually create overflow buckets for collisions. It outperforms linear probing's complexity in both worst and average cases: with these jumps, or wormholes, it probes fewer addresses, leading to better theoretical insert performance.
Will Elastic Hashing be applied to Swiss Tables?
Maybe. I hope so. Time will tell.
I still have open questions:
- Will Elastic Hashing prove to be faster than linear probing + SSE3?
- Can Elastic Hashing benefit from the parallel reads of SIMD?
- Which language will be the first to implement this new algorithm?
If you, by any chance, happen to come across some of these answers out there, leave a comment!
Conclusion
It's nice to experience these latest improvements to Hash Table efficiency as they happen, in real time.
A one-month-old paper improving the complexity of one aspect of a core data structure's algorithm is great news!
It shows that there are no boundaries for performance improvements; not even decades-old rules are safe. No matter where you are, there is always room for improvement.
Cheers.
References
- Maps are faster in Go 1.24 - Matt Boyle
- Go 1.24’s Swiss Tables: The Silver Bullet or Just a Shiny New Gadget? - David Lee
- Undergraduate Upends a 40-Year-Old Data Science Conjecture - Quanta Magazine
- Optimal Bounds for Open Addressing Without Reordering - Martin Farach-Colton, Andrew Krapivin, William Kuszmaul