Caio Borghi

Go 1.24 uses Swiss Tables, what are they?

Introduction

In v1.24, Go replaced its map implementation with a new kind of hash table called a Swiss Table (also known as a flat_hash_map).

What is a Swiss Table?
A Swiss Table is a hash map that uses a cache-friendly, more efficient layout (with a smaller memory footprint), making comparisons and insertions faster.

It also uses a different strategy for resolving collisions: linear probing on steroids, instead of the previous strategy, chaining.

When working with hash tables, one thing is certain: there will be collisions.


The Old Map

The previous implementation was highly tuned for memory efficiency and performance.

The Go team is awesome.

Chaining

How does it work? The runtime pre-allocates memory in the form of buckets, where each bucket holds up to 8 key-value pairs. When a bucket is full, the algorithm allocates a new overflow bucket and links it to the full one, in a process known as chaining.

As the table's load factor* gets too high, the runtime moves all entries to a new memory block (usually twice as big as the previous one); this is known as rehashing.

* Load Factor = used positions / total capacity
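As a quick sketch in Go (the 0.8 threshold here is made up for illustration; the runtime uses its own tuned constant):

```go
// Illustrative only: grow the table once the load factor
// passes a threshold. Go's runtime tunes its own constant.
const maxLoadFactor = 0.8 // hypothetical threshold

func needsGrow(used, capacity int) bool {
	return float64(used)/float64(capacity) > maxLoadFactor
}
```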

In traditional chaining, whenever there is a collision, the runtime allocates memory for a new node (key-value pair) and appends it to a linked list.

I'll focus on how chaining works for nodes and leave the bucket-strategy explanation to the Go 1.23 Map Implementation.

Practical Example

Imagine a hash function that, given a string, returns an index (an offset from a fixed base memory address):

hash(x) = len(x)

Usually, these functions aim for an avalanche effect to distribute results evenly over the map's memory range.

Our hash function is far more vulnerable to collisions, as it only uses the string's length to determine the address.
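In Go, this toy hash could look like this (`hash` is just my placeholder name; real map hash functions are far more elaborate):

```go
// Toy hash: the "index" is simply the length of the key.
// Real hash functions mix bits to spread keys evenly.
func hash(key string) int {
	return len(key)
}
```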

Now, let's assume we add the following keys:

| key  | index |
| ---- | ----- |
| C    | 1     |
| JS   | 2     |
| PHP  | 3     |
| PERL | 4     |

And this table:

| index | key  | memory | next |
| ----- | ---- | ------ | ---- |
| 0     | nil  | 0      | nil  |
| 1     | C    | 1000   | nil  |
| 2     | JS   | 1008   | nil  |
| 3     | PHP  | 1016   | nil  |
| 4     | PERL | 1024   | nil  |

When we add a new "Go" key, we cause a collision.

Since the index of "Go" is 2 and "JS" already occupies position 2, the strategy allocates a new node in available memory and points the next field of "JS" to "Go".

| index | key  | memory | next |
| ----- | ---- | ------ | ---- |
| 0     | nil  | 0      | nil  |
| 1     | C    | 1000   | nil  |
| 2     | JS   | 1008   | 2064 |
| 3     | PHP  | 1016   | nil  |
| 4     | PERL | 1024   | nil  |
| —     | Go   | 2064   | nil  |

So: keys are used as inputs to a hash function that generates an index (an offset from the start of the table's memory), and colliding entries are linked together as linked lists.
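Here's a minimal chaining sketch in Go, reusing the toy hash from above. It's illustrative only: Go 1.23's real map chains 8-slot buckets, not single nodes.

```go
// node is one key-value pair plus a pointer to the next
// entry in its collision chain.
type node struct {
	key   string
	value int
	next  *node
}

type chainedMap struct {
	slots []*node // one chain head per index
}

func (m *chainedMap) insert(key string, value int) {
	i := hash(key) % len(m.slots)
	n := &node{key: key, value: value}
	if m.slots[i] == nil {
		m.slots[i] = n
		return
	}
	// Walk to the end of the chain and link the new node,
	// just like pointing JS's next field to Go above. The new
	// node may live far from the slot array in memory.
	tail := m.slots[i]
	for tail.next != nil {
		tail = tail.next
	}
	tail.next = n
}

func (m *chainedMap) get(key string) (int, bool) {
	// Walk the chain: every hop can be a cache miss.
	for n := m.slots[hash(key)%len(m.slots)]; n != nil; n = n.next {
		if n.key == key {
			return n.value, true
		}
	}
	return 0, false
}
```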

*Figure: chaining table representation*

Problem

As you can see, each new node is placed far from its conflicting key in memory.

This means that, when searching for a key that has collisions ("Go", for example), the processor has to fetch scattered addresses in memory.

Even though Go's previous implementation used many performance techniques (such as 8-slot buckets instead of single nodes, and partial key comparison: 7 bits instead of 64), this chaining approach is not cache-friendly.

You can check the specific details here.

Swiss Table

It's a clever implementation that uses 1 byte of metadata per slot and linear probing on steroids.

Linear Probing

No more dynamic memory allocation for buckets with linked lists.

Consider this table:

| index | key  | memory    |
| ----- | ---- | --------- |
| 0     | nil  | 1000      |
| 1     | C    | 1008      |
| 2     | JS   | 1016      |
| 3     | PHP  | 1024      |
| 4     | PERL | 1032      |
| 5     | nil  | 1040      |
| ...   | ...  | ...       |
| N     | nil  | N_ADDRESS |

A better way to visualize this is to look at the hash table as, well, a flat hash map.

*Figure: flat hash map visualization*

When a collision occurs, the algorithm linearly searches for the next free position, one slot at a time.

For adding "Go", we have:

  • hash("Go") = 2
  • Check slot 1016: it is filled.
  • Check slot 1024: it is filled.
  • Check slot 1032: it is filled.
  • Slot 1040 is open: use it.

It's only at slot 1040 that we find an open spot; this is where the "Go" key goes.

The important thing to notice is that, this time, collisions are located near each other, making them cache-friendly, which speeds things up a bit.
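Here's a minimal linear-probing sketch in Go, again with the toy hash. Illustrative only: it omits growth and assumes the table never fills up completely.

```go
// slot is one position in a flat, pre-allocated array:
// no pointers, no per-entry allocation.
type slot struct {
	key  string
	used bool
}

type flatMap struct {
	slots []slot
}

func (m *flatMap) insert(key string) {
	i := hash(key) % len(m.slots)
	// Probe forward one slot at a time; colliding keys end up
	// adjacent in memory, which is what makes this cache-friendly.
	for m.slots[i].used {
		i = (i + 1) % len(m.slots)
	}
	m.slots[i] = slot{key: key, used: true}
}
```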

Steroids (SSE3)

Streaming SIMD Extensions 3 is a set of processor instructions that, combined with careful software engineering techniques such as bit manipulation, makes it possible to compare multiple memory positions with a single instruction.

This means that, when probing for collisions, a Swiss Table can perform up to 16 checks at once. Even though Go seems to use groups of 8, it's way better than checking one by one!

Metadata

Swiss Tables use one metadata byte per slot to partially store the hashed key, enabling quick comparisons (7 bits instead of 64).

With linear probing (a form of open addressing) + metadata chunks + SSE3, comparisons and inserts are faster and consume less memory than in the previous implementation.
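Here's a portable sketch of that idea in Go, using the classic bit-manipulation trick instead of SSE instructions (`matchByte` is my name for it, not the runtime's):

```go
const (
	lsbs = 0x0101010101010101 // low bit of every byte
	msbs = 0x8080808080808080 // high bit of every byte
)

// matchByte checks 8 metadata bytes (packed into one uint64)
// against the 7-bit hash fragment h2 in a handful of ALU ops.
// Every matching byte produces 0x80 in the result; with SSE,
// the same idea covers 16 bytes in one instruction. The trick
// can yield rare false positives next to a real match, which
// the map filters out by comparing full keys afterwards.
func matchByte(group uint64, h2 uint8) uint64 {
	x := group ^ (lsbs * uint64(h2))
	return (x - lsbs) &^ x & msbs
}

// Example: matchByte(0x42_17_05_42_7f_01_33_29, 0x42)
// returns 0x8000008000000000: bytes 7 and 4 match.
```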

1.23.4 vs 1.24

I ran this article's benchmark script on my laptop.

The test was (a minimal sketch follows the list):

  1. Create and populate a 1_000_000-item map.
  2. Perform 10_000_000 lookups using mod indexing.
  3. Insert 1_000_000 new entries into the map.
  4. Remove the first 1_000_000 entries from the map.
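```go
// A minimal sketch of the benchmark described above (my own
// reconstruction, not the article's exact script).
package main

import (
	"fmt"
	"time"
)

const n = 1_000_000

func main() {
	m := make(map[int]int, n)
	for i := 0; i < n; i++ {
		m[i] = i
	}

	start := time.Now()
	sum := 0
	for i := 0; i < 10_000_000; i++ {
		sum += m[i%n] // mod indexing keeps every lookup a hit
	}
	fmt.Println("lookup:", time.Since(start), sum)

	start = time.Now()
	for i := n; i < 2*n; i++ {
		m[i] = i
	}
	fmt.Println("insert:", time.Since(start))

	start = time.Now()
	for i := 0; i < n; i++ {
		delete(m, i)
	}
	fmt.Println("delete:", time.Since(start))
}
```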

Results

| Operation | Go 1.23 (ms) | Go 1.24 (ms) | Change        |
| --------- | ------------ | ------------ | ------------- |
| Lookup    | 287.340375   | 184.787167   | 35.67% faster |
| Insertion | 118.564333   | 66.4095      | 43.99% faster |
| Deletion  | 39.893875    | 61.364875    | 53.85% slower |

In the article Go 1.24’s Swiss Tables: The Silver Bullet or Just a Shiny New Gadget?, we can also see the efficiency of Swiss Tables in memory usage.

*Figure: memory usage comparison chart from the article above*

Problem

Even though the Go Swiss Table uses buckets and a well-distributed hash function (for the avalanche effect) to spread keys evenly over the pre-allocated memory block, as the table gets full, collisions generate clustered sections that increase search time.

It's easy to see this happening with a 10-position table:

Table: [nil, C, JS, PHP, PERL, nil, nil, nil, nil, nil]

Adding a new "B" key would cause:
- len("B") = 1
- position 1 is occupied by "C", look next
- position 2 is occupied by "JS", look next
- position 3 is occupied by "PHP", look next
- position 4 is occupied by "PERL", look next
- position 5 is free, add "B"

Table: [nil, C, JS, PHP, PERL, B, nil, nil, nil, nil]
Index:   0,  1,  2,  3,   4,   5,   6,   7,   8,   9

Now, any hash that returns 1, 2, 3, 4, or 5 would require scanning several positions before finding a free slot. A piece of cake for SSE3, I admit, but what if there were a better way?

Elastic Hashing

In January 2025, a new paper on open addressing was published, suggesting a new approach, theoretically better than linear probing, called Elastic Hashing.

Feel free to read the news article or the paper to get the full context.

It claims to beat a 40-year-old conjecture (Yao's conjecture) that defended this style of probing as simple and near-optimal, without catastrophic degradation as the load increases.

Krapivin discovered the new strategy while unaware of Yao's conjecture, which suggests we should challenge known constraints more often.

What is it?

Basically, instead of checking positions one by one, or 16 at a time (as our beloved Swiss Tables do), it introduces a two-dimensional strategy that calculates the insert address using virtual overflow buckets.

The idea is that there is a function named φ(i, j) that returns the position of a node, where:

  1. i = Primary Bucket (Hash Result)
  2. j = Overflow Count

So, using our previous hash function, both keys "JS" and "Go" would return 2 as their "Primary Bucket".

The insertion order would then determine whether the key is placed at position φ(2, 1) = 2 or maybe φ(2, 2) = 7.

Magic

The magic lies in the φ function, which is able to virtually create overflow buckets for collisions. It outperforms linear probing's complexity in both the worst and average cases: with these jumps, or wormholes, it probes fewer addresses, leading to better theoretical insert performance.
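To make the shape of the idea concrete, here's a toy φ in Go. This is not the paper's actual function, which is far more sophisticated; it only shows how a retry count j can jump to a scattered slot instead of the adjacent one.

```go
// phi maps (primary bucket i, overflow count j) to a slot.
// The constants are arbitrary odd multipliers, used here only
// to scatter the probe sequence; the real paper derives its
// probe positions very differently.
func phi(i, j, tableSize int) int {
	h := uint64(i)*0x9E3779B97F4A7C15 + uint64(j)*0xC2B2AE3D27D4EB4F
	h ^= h >> 33 // mix the high bits into the low bits
	return int(h % uint64(tableSize))
}
```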


Will Elastic Hashing be applied to Swiss Tables?
Maybe. I hope so. Time will tell.

I still have open questions:

  • Will Elastic Hashing prove to be faster than linear probing + SSE3 in practice?
  • Can Elastic Hashing benefit from SIMD's parallel reads?
  • Which language will be the first to implement this new algorithm?

If you, by any chance, happen to come across any of these answers out there, leave a comment!

Conclusion

It's nice to experience these latest improvements to hash table efficiency as they happen, in real time.

A one-month-old paper improving the complexity of one aspect of a core data structure is great news!

It shows that there are no boundaries for performance improvements: not even decades-old rules are safe. No matter where you are, there is always room for improvement.

Cheers.

