Introduction
In v1.24, Go replaced its map implementation with a new hash table design called the Swiss Table, known from implementations like Abseil's flat_hash_map.
What is a Swiss Table?
A Swiss Table is a map that uses a cache-friendly, more efficient approach (with a smaller memory footprint) that makes comparisons and insertions faster.
It also uses a different strategy for resolving collisions: linear probing on steroids, replacing the previous strategy, chaining.
When working with hash tables, one thing is certain: there will be collisions.
The Old Map
The previous implementation was highly tuned for memory efficiency and performance.
The Go team is awesome.
Chaining
How does it work? It pre-allocates memory in the form of buckets, where each bucket can hold up to 8 key-value pairs. When a bucket fills up, the algorithm allocates a new overflow bucket and links it to the full one, in a process known as chaining.
As the table approaches a high load factor*, the runtime moves all entries to a new memory block (usually twice as big as the previous one); this is known as rehashing.
* Load Factor = used positions / total capacity. For example, a table with 6 of its 8 positions used has a load factor of 0.75.
In traditional chaining, whenever there is a collision, the runtime allocates memory for a new node (key-value pair) and stores it in a linked list.
I'll focus on how chaining works for nodes and leave the buckets strategy explanation for the Go 1.23 Map Implementation.
Practical Example
Imagine a hash function that, based on a string, returns an index (offset value) to a fixed memory address:
hash(x) = len(x)
Usually, these functions try to produce an avalanche effect to distribute the results "evenly" over the map memory interval.
Our hash function is much more vulnerable to collisions, as it only uses the length of the string to define the address.
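In Go, that toy hash function would look like this (purely illustrative; real hash functions are far more elaborate and seeded per map):

```go
package main

import "fmt"

// hash is our deliberately naive toy hash function: it just
// returns the length of the key, so any two keys of the same
// length will always collide.
func hash(key string) int {
	return len(key)
}

func main() {
	fmt.Println(hash("C"))   // 1
	fmt.Println(hash("JS"))  // 2
	fmt.Println(hash("PHP")) // 3
}
```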
Now, let's assume we add the following keys:
key | index |
---|---|
C | 1 |
JS | 2 |
PHP | 3 |
PERL | 4 |
And this table:
index | key | memory | next |
---|---|---|---|
0 | nil | 0 | nil |
1 | C | 1000 | nil |
2 | JS | 1008 | nil |
3 | PHP | 1016 | nil |
4 | PERL | 1024 | nil |
When we add a new "Go" key, we cause a collision.
As the index of "Go" is 2 and "JS" is already at position 2, the strategy allocates a new node in available memory and points the next field of "JS" to "Go".
index | key | memory | next |
---|---|---|---|
0 | nil | 0 | nil |
1 | C | 1000 | nil |
2 | JS | 1008 | 2064 |
3 | PHP | 1016 | nil |
4 | PERL | 1024 | nil |
__ | Go | 2064 | nil |
So: keys are used as inputs to a hash function, which generates an index (an offset from a fixed memory address, the start of the list), and colliding keys (or nodes, or buckets) are linked together as linked lists.
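Here's a minimal sketch of that idea in Go. This is not the runtime's actual bucket-based code, just plain per-node chaining with our toy length-based hash:

```go
package main

import "fmt"

// node is one key-value pair in a chain.
type node struct {
	key   string
	value int
	next  *node // next node in the chain on collision
}

// chainedMap is a toy chained hash table.
type chainedMap struct {
	slots []*node
}

// insert hashes the key and, on collision, allocates a new node
// on the heap and links it at the head of the chain.
func (m *chainedMap) insert(key string, value int) {
	i := len(key) % len(m.slots) // our toy length-based hash
	for n := m.slots[i]; n != nil; n = n.next {
		if n.key == key {
			n.value = value // key already present: just update
			return
		}
	}
	// new allocation, likely far away from the slot array in memory
	m.slots[i] = &node{key: key, value: value, next: m.slots[i]}
}

func main() {
	m := &chainedMap{slots: make([]*node, 8)}
	m.insert("JS", 1)
	m.insert("Go", 2) // also hashes to 2: chained with "JS"
	fmt.Println(m.slots[2].key, "->", m.slots[2].next.key) // Go -> JS
}
```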
Problem
As you can see, each new node is placed far from its colliding key in memory.
This means that, when searching for a key that has conflicts (Go, for example), the processor will fetch sparse addresses in memory.
Even though Go's previous implementation used many performance techniques (such as 8-slot buckets instead of single nodes, and partially comparing hashes, one byte instead of the full 64 bits), this chaining approach is not cache-friendly.
You can check the specific details here.
Swiss Table
It's a clever implementation that uses 1 byte of metadata per slot and linear probing on steroids.
Linear Probing
No more dynamic memory allocation for buckets with linked lists.
Consider this table:
index | key | memory |
---|---|---|
0 | nil | 1000 |
1 | C | 1008 |
2 | JS | 1016 |
3 | PHP | 1024 |
4 | PERL | 1032 |
5 | nil | 1040 |
... | ... | ... |
N | nil | N_ADDRESS |
A better visualization in this case is to look at the hash table as a, well, _flat_hash_map_.
When a conflict occurs, the algorithm will linearly search for the next position, one by one.
For adding "Go", we have:
- hash("Go") = 2
- See slot 1016 and realize it is filled.
- See slot 1024 and realize it its filled.
- See slot 1032 and realize it its filled.
- Slot 1040 is open, use it.
It's only on slot 1040 that will find an open spot, this is where the "Go" key goes.
The important thing to notice is that this time, collisions are located near each other, making them cache-friendly, which speeds things up a little bit.
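A minimal linear-probing sketch in Go (again a toy, not the runtime's code; it assumes the table always has a free slot):

```go
package main

import "fmt"

// slot stores the key directly in the table array, so colliding
// keys end up in neighboring (cache-friendly) positions.
type slot struct {
	key  string
	used bool
}

type flatMap struct {
	slots []slot
}

// insert probes linearly from the hashed index until it finds the
// key or a free slot (it assumes the table is never completely full).
func (m *flatMap) insert(key string) int {
	i := len(key) % len(m.slots) // toy length-based hash
	for m.slots[i].used {
		if m.slots[i].key == key {
			return i // already present
		}
		i = (i + 1) % len(m.slots) // linear probe: just try the next slot
	}
	m.slots[i] = slot{key: key, used: true}
	return i
}

func main() {
	m := &flatMap{slots: make([]slot, 10)}
	for _, k := range []string{"C", "JS", "PHP", "PERL"} {
		m.insert(k)
	}
	// hash("Go") = 2, but 2, 3 and 4 are taken, so "Go" lands at 5
	fmt.Println(`"Go" landed at index`, m.insert("Go"))
}
```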
Steroids (SSE3)
Streaming SIMD Extensions 3 is a set of processor instructions that, combined with careful engineering techniques such as bit manipulation, provides a way to read and compare multiple memory positions with a single instruction.
This means that, when probing for collisions, a Swiss Table can perform up to 16 checks at once. Go seems to use groups of 8, but that's still way better than checking one by one!
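To get a feel for the trick, here's a hedged sketch of probing 8 control bytes at once using plain 64-bit arithmetic (SWAR). The runtime's real code and the SSE instructions differ in detail, but the spirit is the same:

```go
package main

import "fmt"

// matchH2 compares 8 one-byte control values (packed into a uint64)
// against a hash fragment h2 in a handful of instructions, using the
// classic SWAR "find the zero byte" trick. A set high bit in the
// result marks each matching byte. This can report a rare false
// positive, which is fine: full keys are always compared after a
// metadata match anyway.
func matchH2(group uint64, h2 uint8) uint64 {
	const lo = 0x0101010101010101
	const hi = 0x8080808080808080
	x := group ^ (lo * uint64(h2)) // matching bytes become zero
	return (x - lo) & ^x & hi      // high bit set where a byte was zero
}

func main() {
	// pretend slots 1 and 6 of a group hold the control byte 0x2a
	group := uint64(0x2a)<<(1*8) | uint64(0x2a)<<(6*8)
	fmt.Printf("%016x\n", matchH2(group, 0x2a)) // high bits mark bytes 1 and 6
}
```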
Metadata
The Swiss Table uses a metadata byte to partially store the hashed key, enabling quick comparisons (7 bits instead of 64).
With open addressing (linear probing) + metadata chunking + SSE3, comparisons and inserts are faster and consume less memory than in the previous implementation.
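The hash split usually looks something like this. I'm mirroring Abseil's scheme here; the exact bit layout in Go's runtime is an implementation detail:

```go
package main

import "fmt"

// splitHash divides a 64-bit hash into the two parts a Swiss Table
// needs: h1 selects which group to probe, and h2 (7 bits) is what
// gets stored in the metadata byte for cheap comparisons. The bit
// layout mirrors Abseil's scheme; Go's runtime may differ in detail.
func splitHash(h uint64) (h1 uint64, h2 uint8) {
	return h >> 7, uint8(h & 0x7f)
}

func main() {
	h1, h2 := splitHash(0xdeadbeef)
	fmt.Printf("h1=%x h2=%02x\n", h1, h2)
}
```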
1.23.4 vs 1.24
I ran this article's benchmark script on my laptop.
The test was:
- Create and populate a 1_000_000-item map.
- Perform 10_000_000 lookups using mod indexing.
- Insert 1_000_000 new entries into the map.
- Remove the first 1_000_000 entries from the map.
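For reference, here's a minimal sketch of that benchmark's shape (not the article's exact script):

```go
package main

import (
	"fmt"
	"time"
)

const n = 1_000_000

func main() {
	// 1. create and populate a 1_000_000-item map
	m := make(map[int]int)
	for i := 0; i < n; i++ {
		m[i] = i
	}

	// 2. perform 10_000_000 lookups using mod indexing
	start := time.Now()
	sum := 0
	for i := 0; i < 10*n; i++ {
		sum += m[i%n]
	}
	fmt.Println("lookup:", time.Since(start), sum)

	// 3. insert 1_000_000 new entries
	start = time.Now()
	for i := n; i < 2*n; i++ {
		m[i] = i
	}
	fmt.Println("insert:", time.Since(start))

	// 4. remove the first 1_000_000 entries
	start = time.Now()
	for i := 0; i < n; i++ {
		delete(m, i)
	}
	fmt.Println("delete:", time.Since(start))
}
```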
Results
Operation | Go 1.23 (ms) | Go 1.24 (ms) | Improvement (%) |
---|---|---|---|
Lookup | 287.340375 | 184.787167 | 35.67% faster |
Insertion | 118.564333 | 66.4095 | 43.99% faster |
Deletion | 39.893875 | 61.364875 | 53.85% slower |
In the Go 1.24’s Swiss Tables: The Silver Bullet or Just a Shiny New Gadget? article, we can also see the efficiency of Swiss Tables in memory usage.
Problem
Even though the Go Swiss Table uses buckets and a sparse hash function (for the avalanche effect) to distribute keys evenly over the pre-allocated memory block, as the table gets full, collisions will generate clustered sections that increase search time.
It's easy to see this happening with a 10-position table:
Table: [nil, C, JS, PHP, PERL, nil, nil, nil, nil, nil]
Adding a new "B" key would cause:
- hash("B") = len("B") = 1
- position 1 is occupied by "C", look at the next
- position 2 is occupied by "JS", look at the next
- position 3 is occupied by "PHP", look at the next
- position 4 is occupied by "PERL", look at the next
- position 5 is free, add "B"
Table: [nil, C, JS, PHP, PERL, B, nil, nil, nil, nil]
Index: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Now, any hash call that returns 1, 2, 3, 4, or 5 may require searching up to 4 extra positions. A piece of cake for SSE3, I admit, but what if there were a better way?
Elastic Hashing
In January 2025, a new paper about open addressing was published suggesting a new approach, theoretically better than linear probing, called Elastic Hashing.
Feel free to read the news article or the paper to get the full context.
It claims to beat a 40-year-old conjecture (Yao's conjecture), which defended this style of probing as a simple, near-optimal approach that doesn't degrade catastrophically as load increases.
Krapivin discovered the new strategy while unaware of Yao's conjecture, which suggests we should challenge known constraints more often.
What is it?
Basically, instead of checking positions one by one, or 16 by 16 (as our beloved Swiss Tables do), it introduces a new two-dimensional strategy to calculate the insertion address using virtual overflow buckets.
The idea is that there's a new function, φ(i, j), that returns the position of a node, where:
- i = Primary Bucket (Hash Result)
- j = Overflow Count
So, using our previous hash function, both keys "JS" and "Go" would return 2 as their "Primary Bucket".
The insertion order would determine if the key would be placed at position φ(2, 1) = 2 or maybe φ(2, 2) = 7.
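To make that interface concrete, here's a purely hypothetical stand-in for φ; the paper's actual mapping is far more sophisticated, this only shows the two-dimensional (i, j) shape:

```go
package main

import "fmt"

// phi is a hypothetical stand-in, NOT the paper's actual function:
// it only illustrates the two-dimensional interface, where i is the
// primary bucket and j is the overflow count (j = 0 on first try).
func phi(i, j, capacity int) int {
	// jump further away as j grows, instead of stepping one by one
	return (i + j*j) % capacity
}

func main() {
	// insertion tries phi(i, 0), phi(i, 1), ... until a slot is free
	for j := 0; j < 3; j++ {
		fmt.Println("candidate slot:", phi(2, j, 10))
	}
}
```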
Magic
The magic lies in the φ function, which is able to virtually create overflow buckets for collisions. It outperforms linear probing's complexity in both worst and average cases: with these jumps, or wormholes, it probes fewer addresses, leading to better theoretical insert performance.
Will Elastic Hashing be applied to Swiss Tables?
Maybe. I hope so. Time will tell.
I still have open questions:
- Will Elastic Hashing prove to be faster than linear probing + SSE3?
- Can Elastic Hashing benefit from the parallel reads of SIMD?
- Which language will be the first to implement this new algorithm?
If you, by any chance, happen to come across some of these answers out there, leave a comment!
Conclusion
It's nice to experience these latest improvements to Hash Table efficiency as they happen, in real time.
A one-month-old paper improving the complexity of one aspect of a core data structure's algorithm is great news!
It shows that there are no boundaries for performance improvements; not even decades-old rules are safe. No matter where you are, there is always room for improvement.
Cheers.
References
- Maps are faster in Go 1.24 - Matt Boyle
- Go 1.24’s Swiss Tables: The Silver Bullet or Just a Shiny New Gadget? - David Lee
- Undergraduate Upends a 40-Year-Old Data Science Conjecture - Quanta Magazine
- Optimal Bounds for Open Addressing Without Reordering - Martin Farach-Colton, Andrew Krapivin, William Kuszmaul