Siddhant Khare

Posted on Jun 16

Identifiers 101: Understanding and Implementing UUIDs and ULIDs

#database #programming #security #computerscience

UUIDs (Universally Unique Identifiers) and ULIDs (Universally Unique Lexicographically Sortable Identifiers) are widely used identifiers in databases and distributed systems. Each has characteristics that suit different scenarios. In this article, we’ll look at the features of UUIDs and ULIDs and discuss when to use each. If you are currently using an auto-increment primary key without much consideration, this article might give you some valuable insights.

Comparison Table

Feature	Auto Increment	UUID v4	UUID v7	ULID
Data Type (MySQL)	INT, BIGINT	CHAR(36)	CHAR(36)	CHAR(26)
Sortable by creation time	✅	❌	✅	✅
Size (binary)	4 bytes (INT), 8 bytes (BIGINT)	16 bytes	16 bytes	16 bytes
Example	1, 2, 3, ...	d61f91c3-d3bf-4b34-9894-e21bfa277ca4	019020e0-cd2a-730a-a8ea-11ec3ddc847f	01J0GCBEEDPE3VDR0NBJ8TM8NQ

If You Don't Want to Use Auto Increment Type

Auto Increment is a mechanism that automatically generates a unique identifier in the database, typically a numeric column that increments with each new record. However, there are significant security and privacy concerns:

Predictability: Since Auto Increment IDs are sequential, it is easy to predict the next ID. This increases the risk that an attacker could infer the internal structure of the system and attempt unauthorized access.
Risk of Information Leakage: Sequential IDs can reveal patterns in the company’s activities. For example, a competitor might analyze the sequential IDs to infer the frequency of product releases or user registrations.

Example:

A competitor figured out how often a company releases new products by analyzing the sequential IDs. This allowed them to predict release timings and adjust their strategy accordingly.
The sequential IDs used to manage payments could reveal the number of user registrations and paid subscriptions if exposed.

UUID (Universally Unique Identifier)

A UUID is a 128-bit identifier used widely in distributed systems, with multiple versions available, each having a different generation method.

UUID v4

UUID v4 is commonly used because it is simple to generate and collisions are extremely unlikely. Of its 128 bits, 122 are random; the remaining 6 are the fixed version and variant bits.

Generation Method:

Generate Random Bits: Fill all 128 bits with random values.
Set Version Bits: Overwrite the 4-bit version field with 0100.
Set Variant Bits: Overwrite the 2-bit variant field with 10.

Here’s a code snippet to generate a UUID v4 in Go:

package main

import (
    "fmt"
    "github.com/google/uuid"
)

func main() {
    uuidV4 := uuid.New()
    fmt.Println(uuidV4)
}

Example Output:

d61f91c3-d3bf-4b34-9894-e21bfa277ca4
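For comparison, the generation steps listed above can be sketched with only the standard library. This is an illustration under stated assumptions, not how github.com/google/uuid is implemented; the helper names `newUUIDv4` and `formatUUID` are hypothetical.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 is a hypothetical helper (not part of any library) that
// follows the three steps: random bits, version field, variant field.
func newUUIDv4() ([16]byte, error) {
	var u [16]byte
	// Step 1: fill all 128 bits with cryptographically secure random data.
	if _, err := rand.Read(u[:]); err != nil {
		return u, err
	}
	// Step 2: version field (high 4 bits of byte 6) = 0100.
	u[6] = (u[6] & 0x0f) | 0x40
	// Step 3: variant field (high 2 bits of byte 8) = 10.
	u[8] = (u[8] & 0x3f) | 0x80
	return u, nil
}

// formatUUID renders the 16 bytes in the usual 8-4-4-4-12 hex layout.
func formatUUID(u [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func main() {
	u, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(formatUUID(u))
}
```

In the printed string, the first character of the third group is always 4, and the first character of the fourth group is always 8, 9, a, or b.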

UUID v7

UUID v7 is a newer version, standardized in RFC 9562 (May 2024), that is sortable by creation time because it places a timestamp in the most significant bits of the identifier.

Generation Method:

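Get Timestamp: Obtain the current Unix timestamp in milliseconds and store it as a 48-bit value in the most significant bits.
Generate Random Bits: Fill the remaining 80 bits with random values.
Set Version Bits: Set the 4-bit version field to 0111.
Set Variant Bits: Set the 2-bit variant field to 10, which leaves 74 random bits.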

Here’s how to generate a UUID v7 in Go:

package main

import (
    "crypto/rand"
    "fmt"
    "time"
)

type UUID [16]byte

func NewUUIDv7() UUID {
    var uuid UUID
    timestamp := uint64(time.Now().UnixNano() / int64(time.Millisecond))
    uuid[0] = byte(timestamp >> 40)
    uuid[1] = byte(timestamp >> 32)
    uuid[2] = byte(timestamp >> 24)
    uuid[3] = byte(timestamp >> 16)
    uuid[4] = byte(timestamp >> 8)
    uuid[5] = byte(timestamp)

    randomBytes := make([]byte, 10)
    if _, err := rand.Read(randomBytes); err != nil {
        panic(err)
    }
    copy(uuid[6:], randomBytes)

    // Set version (7) and variant bits (2 MSB as 10)
    uuid[6] = (uuid[6] & 0x0f) | (7 << 4)
    uuid[8] = (uuid[8] & 0x3f) | 0x80

    return uuid
}

func main() {
    uuidV7 := NewUUIDv7()
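    // Print in the standard hyphenated 8-4-4-4-12 layout
    fmt.Printf("%x-%x-%x-%x-%x\n", uuidV7[0:4], uuidV7[4:6], uuidV7[6:8], uuidV7[8:10], uuidV7[10:16])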
}

Example Output:

019020e0-cd2a-730a-a8ea-11ec3ddc847f

Extracting Timestamps from UUID v7:

package main

import (
    "crypto/rand"
    "fmt"
    "time"
)

type UUID [16]byte

func NewUUIDv7() UUID {
    var uuid UUID
    timestamp := uint64(time.Now().UnixNano() / int64(time.Millisecond))
    uuid[0] = byte(timestamp >> 40)
    uuid[1] = byte(timestamp >> 32)
    uuid[2] = byte(timestamp >> 24)
    uuid[3] = byte(timestamp >> 16)
    uuid[4] = byte(timestamp >> 8)
    uuid[5] = byte(timestamp)

    randomBytes := make([]byte, 10)
    if _, err := rand.Read(randomBytes); err != nil {
        panic(err)
    }
    copy(uuid[6:], randomBytes)

    // Set version (7) and variant bits (2 MSB as 10)
    uuid[6] = (uuid[6] & 0x0f) | (7 << 4)
    uuid[8] = (uuid[8] & 0x3f) | 0x80

    return uuid
}

func ExtractTimestampFromUUIDv7(uuid UUID) time.Time {
    timestamp := uint64(uuid[0])<<40 |
        uint64(uuid[1])<<32 |
        uint64(uuid[2])<<24 |
        uint64(uuid[3])<<16 |
        uint64(uuid[4])<<8 |
        uint64(uuid[5])
    return time.Unix(0, int64(timestamp)*int64(time.Millisecond))
}

func (uuid UUID) String() string {
    return fmt.Sprintf("%08x-%04x-%04x-%04x-%012x",
        uuid[0:4],
        uuid[4:6],
        uuid[6:8],
        uuid[8:10],
        uuid[10:16])
}

func main() {
    uuid := NewUUIDv7()
    fmt.Println(uuid.String())

    timestamp := ExtractTimestampFromUUIDv7(uuid)
    fmt.Println(timestamp.UTC()) // print in UTC rather than the local time zone
}

Example Output:

019020e0-cd2a-730a-a8ea-11ec3ddc847f
2024-06-16 11:48:41.898 +0000 UTC

ULID (Universally Unique Lexicographically Sortable Identifier)

ULID is designed to be a sortable and human-readable alternative to UUIDs, with a focus on chronological order.

Generation Method:

Get Timestamp: Obtain the current Unix timestamp in milliseconds and store it as a 48-bit value.
Generate Random Values: Fill the remaining 80 bits with random values.
Encoding: Encode the resulting 128 bits as a 26-character string using Crockford’s Base32.

Here’s how to generate a ULID in Go:

package main

import (
    "fmt"
    "github.com/oklog/ulid/v2"
    "math/rand"
    "time"
)

func main() {
    // math/rand is enough for a demo; use crypto/rand if IDs must be unpredictable
    entropy := ulid.Monotonic(rand.New(rand.NewSource(time.Now().UnixNano())), 0)
    ulidInstance := ulid.MustNew(ulid.Timestamp(time.Now()), entropy)
    fmt.Println(ulidInstance)

    // Extracting the timestamp and printing it in UTC
    timestamp := time.Unix(0, int64(ulidInstance.Time())*int64(time.Millisecond))
    fmt.Println(timestamp.UTC())
}

Example Output:

01HZYC2028WMB3NJ16WCV9Z9E0
2024-06-09 11:27:38.056 +0000 UTC
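To show what the encoding step does, here is a minimal standard-library sketch that packs a 48-bit timestamp and 80 bits of entropy into 26 Crockford Base32 characters. It is an illustration, not the oklog/ulid implementation, and `encodeULID` is a hypothetical name.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// Crockford's Base32 alphabet: digits and uppercase letters without I, L, O, U.
const crockford = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// encodeULID is a hypothetical helper that lays out 48 bits of timestamp
// followed by 80 bits of entropy, then emits 5 bits per character.
func encodeULID(ms uint64, entropy [10]byte) string {
	var b [16]byte
	b[0] = byte(ms >> 40)
	b[1] = byte(ms >> 32)
	b[2] = byte(ms >> 24)
	b[3] = byte(ms >> 16)
	b[4] = byte(ms >> 8)
	b[5] = byte(ms)
	copy(b[6:], entropy[:])

	// Treat the 16 bytes as one 128-bit number split into two halves.
	hi := binary.BigEndian.Uint64(b[0:8])
	lo := binary.BigEndian.Uint64(b[8:16])

	var out [26]byte
	for i := 25; i >= 0; i-- {
		out[i] = crockford[lo&0x1f] // lowest 5 bits become one character
		lo = lo>>5 | hi<<59         // shift the 128-bit value right by 5
		hi >>= 5
	}
	return string(out[:])
}

func main() {
	var entropy [10]byte
	if _, err := rand.Read(entropy[:]); err != nil {
		panic(err)
	}
	fmt.Println(encodeULID(uint64(time.Now().UnixMilli()), entropy))
}
```

The first 10 characters carry the timestamp and the last 16 carry the random part, which is why ULIDs created later sort after earlier ones as plain strings.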

Performance Considerations and Recommendations

UUID v4 is purely random and cannot be sorted by creation time, whereas UUID v7 and ULID are sortable because they lead with a timestamp. All three, however, cost more than an auto-incrementing integer: each takes 16 bytes in binary form (36 or 26 bytes when stored as text) compared with 4 or 8 bytes, which makes indexes larger.

If You Do Not Want to Use UUID or ULID

Moving to UUIDs or ULIDs addresses the problems of auto increment, but, as noted above, they bring problems of their own. To summarize:

UUID v4:
- Completely random values cannot be sorted, which degrades insert and index performance.
UUID v7 / ULID:
- Slower than auto-increment integers because the keys are larger.
- Leakage of generation time (timestamp).

To illustrate a concrete example, let's take the case of a large-scale e-commerce site that handles millions of products.

Background:

The database stores product details, user purchase history, reviews, and more. More data is added every day, and query performance is critical.

Challenges:

Performance: Database performance is critical because of the large amount of data being added. Typical queries search for products and fetch the purchase history of users.

Privacy: Leaking a user's purchase history or review timestamps can identify patterns of user behavior.

UUID v4 Issues:

New rows land at random positions in the index rather than in insertion order, leading to index fragmentation and poor query performance.

UUID v7/ULID Issues:

The insertion order is preserved, but the ID of the string type is larger than the numeric type, increasing the size of the index.
Because it includes a timestamp, the time at which the data was generated is deducible, which is risky from a user privacy perspective.

Performance Concerns:

UUID v4: Random writes can degrade performance due to reduced cache hit rates.
UUID v7/ULID: Slightly better performance than UUID v4 but still less efficient than auto-increment numbers. Timestamps in UUID v7 and ULID can leak generation times.

Recommendation:
For large-scale applications, consider using auto-increment numeric types for primary keys to ensure optimal performance. For public-facing identifiers, generate a separate random string (UUID or ULID) to enhance security and privacy.
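As a rough sketch of this recommendation, the example below keeps an auto-increment integer as the internal key and attaches a separate random public identifier to each record. The in-memory `Store`, the `Product` type, and the function names are all hypothetical; in a real system the integer would be the database primary key and the public ID a uniquely indexed column.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// Product pairs an internal sequential key with a random public identifier.
type Product struct {
	ID       int64  // internal auto-increment key, never exposed
	PublicID string // random identifier that is safe to expose
	Name     string
}

// Store is a hypothetical in-memory stand-in for a database table.
type Store struct {
	nextID   int64
	byPublic map[string]*Product
}

func NewStore() *Store {
	return &Store{nextID: 1, byPublic: make(map[string]*Product)}
}

// newPublicID returns 128 random bits as a 32-character hex string.
func newPublicID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// Create assigns the next sequential ID and a fresh random public ID.
func (s *Store) Create(name string) (*Product, error) {
	pub, err := newPublicID()
	if err != nil {
		return nil, err
	}
	p := &Product{ID: s.nextID, PublicID: pub, Name: name}
	s.nextID++
	s.byPublic[pub] = p
	return p, nil
}

// Lookup resolves a public ID, as an API handler would do for a URL.
func (s *Store) Lookup(publicID string) (*Product, error) {
	p, ok := s.byPublic[publicID]
	if !ok {
		return nil, errors.New("product not found")
	}
	return p, nil
}

func main() {
	store := NewStore()
	p, err := store.Create("keyboard")
	if err != nil {
		panic(err)
	}
	fmt.Println("internal ID:", p.ID)
	fmt.Println("public ID:  ", p.PublicID)
}
```

Joins and foreign keys can use the compact integer, while URLs and API responses carry only the public ID, which reveals neither record counts nor creation times.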

Conclusion

Choosing the right identifier depends on your specific use case. While UUIDs and ULIDs offer unique advantages, they also come with performance and privacy trade-offs. By understanding these trade-offs, you can make informed decisions that balance security, performance, and usability.

For further reading and implementation details, refer to the official documentation and libraries for UUIDs and ULIDs. Implementing these identifiers thoughtfully can significantly enhance the robustness and security of your systems.

