Muhammetberdi Jepbarov

Posted on Feb 25

Merging Redis Serialized HyperLogLog Sets in Golang (Without Redis Commands)

#go #redis #serialize #programming

I was looking through questions on Stack Overflow and noticed someone who needed help with a problem. He wanted to merge two HyperLogLog sets stored in Redis, but instead of using Redis commands like PFMERGE, he wanted to do it inside his Golang application. The challenge was retrieving the serialized HyperLogLog data, deserializing it, merging the sets, and then using the merged result.

At first, this seemed straightforward. Redis provides PFADD for adding elements and PFCOUNT for getting an estimated count. But the real issue? Redis stores a HyperLogLog as a string in its own internal encoding, not in the format a Go HyperLogLog library expects. That means you can't simply use GET to retrieve a HyperLogLog and hand the bytes to a Go library for merging.

Understanding the Problem

When you use HyperLogLog in Redis, the data is stored in a special format optimized for cardinality estimation. Running GET hll1 on a key created with PFADD does return bytes, but they are Redis's own sparse or dense register encoding behind a header that starts with HYLL. A Go HyperLogLog library has its own binary format and cannot decode those bytes.

A naive approach would be to GET the Redis value, pass it to the Go library's decoder, and merge the result. That will not work, because the two encodings are incompatible. Instead, a more effective solution is to manage the HyperLogLog instances inside the Go application and store them in Redis as serialized byte arrays.
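To tell the two kinds of value apart, it can help to look at the first bytes of whatever GET returns. The sketch below is a minimal illustration that uses only the standard library. It assumes the Redis header layout of 4 magic bytes (HYLL), 1 encoding byte (0 for dense, 1 for sparse), 3 unused bytes, and 8 bytes of cached cardinality. The function name redisHLLEncoding is hypothetical, and the sample bytes are hand-built rather than fetched from a server.

```go
package main

import (
	"bytes"
	"fmt"
)

// redisHLLHeaderLen is the assumed size of the header Redis places in front
// of the registers: 4 magic bytes, 1 encoding byte, 3 unused bytes, and
// 8 bytes of cached cardinality.
const redisHLLHeaderLen = 16

// redisHLLEncoding reports whether data looks like a Redis-native
// HyperLogLog value and, if so, which register encoding it declares.
// The name and the returned labels are illustrative only.
func redisHLLEncoding(data []byte) (string, bool) {
	if len(data) < redisHLLHeaderLen || !bytes.HasPrefix(data, []byte("HYLL")) {
		return "", false
	}
	switch data[4] {
	case 0:
		return "dense", true
	case 1:
		return "sparse", true
	default:
		return "", false
	}
}

func main() {
	// A hand-built 16-byte header standing in for the bytes GET would return.
	sample := append([]byte("HYLL"), 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
	enc, ok := redisHLLEncoding(sample)
	fmt.Println(enc, ok)

	// Bytes written by another encoder (for example gob) fail the check.
	_, ok = redisHLLEncoding([]byte("not a redis hll"))
	fmt.Println(ok)
}
```

A check like this only identifies the format. It does not make the Redis registers usable by a Go library, which is why the rest of this post keeps the sketches in the library's own format.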

The Correct Approach: Managing HyperLogLog in Go

Instead of relying on Redis to handle HyperLogLog merging, we should:

Create HyperLogLog instances in Go.
Serialize them and store the serialized data in Redis.
Retrieve them later, deserialize them, and merge multiple instances.
Estimate the final count in Go.
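Why merging is cheap once both sketches are in memory: a HyperLogLog is an array of small registers, and the union of two sketches is the element-wise maximum of their registers. The toy below is a sketch under stated assumptions, not the library's implementation. The toyHLL type, the precision of 10 bits, and the use of FNV hashing (chosen only because it is in the standard library) are all illustrative, and it omits the estimation step entirely.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
)

const (
	precision    = 10             // index bits; an arbitrary choice for this toy
	numRegisters = 1 << precision // 1024 registers
)

// toyHLL is a deliberately minimal stand-in for a real sketch type.
type toyHLL struct {
	regs [numRegisters]uint8
}

// insert hashes the item, uses the top bits to pick a register, and records
// the position of the first 1 bit in the remaining bits if it is a new maximum.
func (h *toyHLL) insert(item []byte) {
	hasher := fnv.New64a()
	hasher.Write(item)
	x := hasher.Sum64()

	idx := x >> (64 - precision)
	// Shift the index bits out and set a guard bit so the rank is bounded.
	rest := x<<precision | 1<<(precision-1)
	rank := uint8(bits.LeadingZeros64(rest)) + 1
	if rank > h.regs[idx] {
		h.regs[idx] = rank
	}
}

// merge folds other into h by taking the register-wise maximum.
func (h *toyHLL) merge(other *toyHLL) {
	for i, r := range other.regs {
		if r > h.regs[i] {
			h.regs[i] = r
		}
	}
}

func main() {
	var a, b, union toyHLL
	for i := 0; i < 1000; i++ {
		a.insert([]byte(fmt.Sprintf("user-%d", i)))
	}
	for i := 500; i < 1500; i++ {
		b.insert([]byte(fmt.Sprintf("user-%d", i)))
	}
	for i := 0; i < 1500; i++ {
		union.insert([]byte(fmt.Sprintf("user-%d", i)))
	}

	a.merge(&b)
	// Merging two sketches gives the same registers as inserting every item
	// into a single sketch, so overlapping items are not double counted.
	fmt.Println("merged equals union:", a.regs == union.regs)
}
```

This property only holds when both sketches use the same hash function and the same number of registers, which is one more reason to create every sketch with the same library and settings.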

Step 1: Serializing a HyperLogLog and Storing it in Redis

To properly store a HyperLogLog instance, we need to serialize it before saving it in Redis. We use encoding/gob to convert the object into a byte slice:

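// Imports: bytes, encoding/gob, and a HyperLogLog package that exports Sketch
// (assumed here to be github.com/axiomhq/hyperloglog; the post does not name it).
func serializeHLL(hll *hyperloglog.Sketch) ([]byte, error) {
    var buf bytes.Buffer
    // Return nil bytes on failure so callers never store a partial encoding.
    if err := gob.NewEncoder(&buf).Encode(hll); err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}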

In the main() function, we create a HyperLogLog instance, insert some values, serialize it, and store it in Redis:

func main() {
    ctx := context.Background()
    rdb := redis.NewClient(&redis.Options{
        Addr: "localhost:6379",
    })

    hll1 := hyperloglog.New()
    hll1.Insert([]byte("foo"))
    hll1.Insert([]byte("bar"))

    data1, err := serializeHLL(hll1)
    if err != nil {
        log.Fatal("Error serializing hll1:", err)
    }
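    // Check the SET result; otherwise a failed write goes unnoticed.
    if err := rdb.Set(ctx, "hll1", data1, 0).Err(); err != nil {
        log.Fatal("Error storing hll1:", err)
    }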
}

Because the application writes these bytes itself, it knows exactly which format it will read back later.

Step 2: Retrieving and Merging HyperLogLog Sets

To merge two HyperLogLog sets, we first need to deserialize them from the stored byte data in Redis.

// deserializeHLL decodes bytes produced by serializeHLL back into a sketch.
func deserializeHLL(data []byte) (*hyperloglog.Sketch, error) {
    var hll hyperloglog.Sketch
    if err := gob.NewDecoder(bytes.NewReader(data)).Decode(&hll); err != nil {
        return nil, err
    }
    return &hll, nil
}
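One detail worth knowing about encoding/gob: it cannot encode a struct field by field when the struct has no exported fields, but it will delegate to MarshalBinary and UnmarshalBinary when a type implements encoding.BinaryMarshaler and encoding.BinaryUnmarshaler. The serialize and deserialize helpers here presumably rely on the library's Sketch type providing such methods, which is an assumption the post does not confirm. The standard-library sketch below shows the mechanism with a hypothetical toySketch type and a made-up one-byte version prefix.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"errors"
	"fmt"
)

// toySketch is hypothetical. It has only unexported state, so gob needs the
// binary marshaling methods below in order to encode it.
type toySketch struct {
	regs []uint8
}

// MarshalBinary satisfies encoding.BinaryMarshaler. The leading version
// byte is an invented format for this example.
func (s *toySketch) MarshalBinary() ([]byte, error) {
	return append([]byte{1}, s.regs...), nil
}

// UnmarshalBinary satisfies encoding.BinaryUnmarshaler and rejects data
// that does not carry the expected version byte.
func (s *toySketch) UnmarshalBinary(data []byte) error {
	if len(data) == 0 || data[0] != 1 {
		return errors.New("toySketch: unknown format")
	}
	s.regs = append([]uint8(nil), data[1:]...)
	return nil
}

// roundTrip gob-encodes a sketch and decodes it into a fresh value,
// mirroring a serialize step followed by a deserialize step.
func roundTrip(in *toySketch) (*toySketch, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return nil, err
	}
	var out toySketch
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return nil, err
	}
	return &out, nil
}

func main() {
	out, err := roundTrip(&toySketch{regs: []uint8{3, 0, 5}})
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println("registers after round trip:", out.regs)
}
```

If the type already exposes MarshalBinary, calling it directly is a reasonable alternative to gob and produces a slightly smaller payload, since gob adds its own type information around the bytes.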

Now, we retrieve the stored HyperLogLog sets, deserialize them, and merge them. This assumes a second sketch was serialized and stored under the key hll2 in the same way as hll1:

func main() {
    ctx := context.Background()
    rdb := redis.NewClient(&redis.Options{
        Addr: "localhost:6379",
    })

    raw1, err := rdb.Get(ctx, "hll1").Bytes()
    if err != nil {
        log.Fatal("Error fetching hll1:", err)
    }

    raw2, err := rdb.Get(ctx, "hll2").Bytes()
    if err != nil {
        log.Fatal("Error fetching hll2:", err)
    }

    hll1Deserialized, err := deserializeHLL(raw1)
    if err != nil {
        log.Fatal("Error deserializing hll1:", err)
    }

    hll2Deserialized, err := deserializeHLL(raw2)
    if err != nil {
        log.Fatal("Error deserializing hll2:", err)
    }

    err = hll1Deserialized.Merge(hll2Deserialized)
    if err != nil {
        log.Fatal("Error merging HyperLogLog sets:", err)
    }

    fmt.Println("Estimated count after merge:", hll1Deserialized.Estimate())
}

Merge modifies hll1Deserialized in place, so the merged sketch exists only in memory. To keep it, serialize it again and write it back to Redis with SET under a key of your choice.

Why This Works and Other Approaches Fail

Passing a Redis-native HyperLogLog (one created with PFADD) to a Go library does not work because Redis uses its own encoding, which the library cannot decode.
Using PFMERGE in Redis works, but the merge runs on the server, so the merge logic cannot live in the application.
Manually managing HyperLogLog instances in Go ensures better control, allowing serialization, merging, and estimation without depending on Redis-specific operations.

Conclusion

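Instead of relying on Redis HyperLogLog commands, we can build the sketches in Go, store them in Redis as serialized bytes, retrieve them, merge them, and estimate the cardinality of the union. The result is still an approximation, as with any HyperLogLog, but the merge logic now lives in the application, which gives us more control and flexibility when working with approximate counting in a distributed system.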

This approach is beneficial when you need HyperLogLog merging logic outside of Redis, such as in microservices, offline processing, or custom caching layers.

Next time you work with HyperLogLog in Go, try managing your own serialized instances—it might save you a lot of trouble! 🚀
