Hello everyone!
Welcome to the second part of my series about tnfy.link — yet another URL shortener! In this post, we’ll dive into the fascinating process of generating short links. While this might sound straightforward, choosing the right method for link generation comes with unique challenges.
At its core, generating a short link involves creating a short, unique ID for each long URL. The ID should meet several criteria:
- Be unique to avoid conflicts.
- Be short enough for practical use.
- Be easy to type without introducing errors.
- Be unpredictable to prevent guessing.
After researching various approaches, I identified four main methods for generating short links. Let’s explore them in detail.
1. Random Bytes
The simplest method involves generating random bytes and encoding them. However, it’s important to understand the difference between pseudo-random and cryptographically secure random numbers.
Pseudo-Random Numbers
The math/rand package in Go provides a pseudo-random number generator (PRNG). With the same seed (an initial value), it produces the same sequence of numbers. While this is sufficient for many applications, it’s not suitable for secure or unpredictable link generation.
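To make the seed behaviour concrete, here is a minimal sketch. The helper name `sequence` and the seed value 42 are only for illustration; they are not part of tnfy.link.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sequence returns the first n values produced by a PRNG seeded with seed.
// (Hypothetical helper, used only to demonstrate determinism.)
func sequence(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = r.Intn(1000)
	}
	return out
}

func main() {
	a := sequence(42, 5)
	b := sequence(42, 5)
	fmt.Println(a)
	fmt.Println(b) // same seed, so the same numbers as the line above
}
```

Anyone who knows or can guess the seed can reproduce every "random" ID, which is why this generator is a poor fit for short links.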
Cryptographically Secure Random Numbers
For more secure random numbers, the crypto/rand package is ideal. It reads from the operating system’s cryptographically secure random number generator, which is seeded from system entropy sources (for example, electromagnetic noise captured at the physical level), so its output is unpredictable in practice. Keep in mind that virtual machines may rely on their host for random data, which could slow down generation in high-load environments.
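A minimal sketch of reading secure random bytes follows. The helper name `randomBytes` and the length of 6 bytes are assumptions chosen for the example, not values taken from tnfy.link.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// randomBytes returns n bytes from the operating system's secure generator.
// (Hypothetical helper; the name is not from tnfy.link.)
func randomBytes(n int) ([]byte, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}

func main() {
	b, err := randomBytes(6) // 6 bytes = 48 bits of randomness
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", b) // different on every run
}
```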
Encoding Random Bytes
Random bytes alone aren’t suitable for URLs, so they must be encoded. Here are the most common encoding methods:
- Integer: Converts bytes to an integer. Easy to type but may result in longer IDs.
- HEX: Encodes bytes in hexadecimal (0-9, A-F). Case-insensitive and typo-resistant.
- Base64: Encodes bytes with characters A-Z, a-z, 0-9, +, /, and =. However, it’s case-sensitive and prone to typos.
- Base58: Similar to Base64 but excludes confusing characters (e.g., I, l, O, 0). This makes it more user-friendly. Examples include the implementations by Bitcoin, Ripple, and Flickr.
For user-friendly short links, Base58 is often the best choice due to its balance of compactness and error resistance.
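To compare the encodings side by side, here is a rough sketch that encodes the same six bytes four ways. Go’s standard library has no Base58 package, so `encodeBase58` below is a small hand-written version using the Bitcoin alphabet. It is an illustration, not the encoder tnfy.link uses.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"math/big"
)

// Bitcoin-style Base58 alphabet: no 0, O, I, or l.
const base58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// encodeBase58 treats b as a big-endian integer and writes it in base 58.
// Leading zero bytes become leading '1' characters, as in Bitcoin's variant.
func encodeBase58(b []byte) string {
	n := new(big.Int).SetBytes(b)
	base := big.NewInt(58)
	zero := big.NewInt(0)
	mod := new(big.Int)

	var out []byte
	for n.Cmp(zero) > 0 {
		n.DivMod(n, base, mod) // n = n / 58, mod = n % 58
		out = append(out, base58Alphabet[mod.Int64()])
	}
	for _, v := range b {
		if v != 0 {
			break
		}
		out = append(out, base58Alphabet[0])
	}
	// Digits were produced least-significant first, so reverse them.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

func main() {
	// Fixed bytes so the output is reproducible; real IDs would use crypto/rand.
	b := []byte{0x00, 0x00, 0x28, 0x7f, 0xb4, 0xcd}

	fmt.Println("integer:", new(big.Int).SetBytes(b).String())
	fmt.Println("hex:    ", hex.EncodeToString(b))
	fmt.Println("base64: ", base64.StdEncoding.EncodeToString(b))
	fmt.Println("base58: ", encodeBase58(b)) // 11233QC4
}
```

Note that the integer form silently drops leading zero bytes, which is one more reason a fixed-alphabet encoding is easier to work with.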
Key Takeaways:
- Random bytes are unpredictable and, with enough length, very unlikely to collide.
- Encoding methods like Base58 enhance usability.
- Cryptographically secure randomness keeps IDs hard to guess.
2. Hashing
Hashing involves generating a fixed-length value based on the input (e.g., the long URL). While it guarantees consistency—hashing the same input always produces the same output—it lacks randomness. This means multiple requests to shorten the same URL will yield identical IDs, which doesn’t meet the unpredictability requirement.
Adding a random salt to the input before hashing can introduce variability, but at that point, using raw random bytes becomes simpler and more efficient.
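Both variants can be sketched in a few lines. The choice of SHA-256, the 4-byte truncation, the 8-byte salt, and the function names `hashID` and `saltedHashID` are all assumptions made for this example.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashID derives an ID from the URL alone: same input, same output.
func hashID(url string) string {
	sum := sha256.Sum256([]byte(url))
	return hex.EncodeToString(sum[:4]) // keep 4 bytes = 8 hex characters
}

// saltedHashID mixes a random salt into the input so repeated requests differ.
func saltedHashID(url string) (string, error) {
	salt := make([]byte, 8)
	if _, err := rand.Read(salt); err != nil {
		return "", err
	}
	h := sha256.New()
	h.Write(salt)
	h.Write([]byte(url))
	return hex.EncodeToString(h.Sum(nil)[:4]), nil
}

func main() {
	url := "https://example.com/some/long/path"
	fmt.Println(hashID(url), hashID(url)) // identical IDs

	id, err := saltedHashID(url)
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // varies per call
}
```

The salted version already needs crypto/rand, so the hash adds work without adding anything the random bytes don’t provide on their own.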
3. UUID
UUIDs (Universally Unique Identifiers) are widely used for generating unique values. While they are effective, their default format is too long for short links. However, re-encoding UUIDs (e.g., in Base58) can reduce their size.
An alternative to UUID is NanoID, which generates shorter strings (21 characters by default) by using a customizable alphabet. This allows you to optimize IDs for readability and error resistance.
Why Not Use UUID?
The commonly used version 4 UUIDs are essentially 122 random bits plus a few fixed version and variant bits, so there’s no significant advantage over generating raw random values directly.
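To show how little a UUID adds, here is a sketch that builds a version 4 UUID by hand from crypto/rand. The standard library has no UUID package, so `newUUIDv4` and `canonical` are hypothetical helpers. For the compact form I use URL-safe Base64 instead of Base58, only because it ships with Go.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// newUUIDv4 fills 16 bytes from crypto/rand and sets the version and
// variant bits as RFC 4122 describes.
func newUUIDv4() ([16]byte, error) {
	var u [16]byte
	if _, err := rand.Read(u[:]); err != nil {
		return u, err
	}
	u[6] = (u[6] & 0x0f) | 0x40 // version 4
	u[8] = (u[8] & 0x3f) | 0x80 // variant bits 10
	return u, nil
}

// canonical formats the UUID in the usual 8-4-4-4-12 hex layout.
func canonical(u [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func main() {
	u, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	text := canonical(u)
	short := base64.RawURLEncoding.EncodeToString(u[:])
	fmt.Println(text, len(text))   // 36 characters
	fmt.Println(short, len(short)) // 22 characters
}
```

Even re-encoded, 16 bytes is far more than a short link needs, so drawing fewer random bytes directly gives the same guarantees with a shorter ID.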
4. Sequence
Random values can occasionally result in duplicates, especially under high load or with shorter IDs. While tnfy.link isn’t designed for high-load scenarios, it’s still worth considering potential issues.
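How often duplicates occur can be estimated with the birthday approximation, p ≈ 1 - exp(-n(n-1) / 2N), where n is the number of links and N the number of possible IDs. The sketch below assumes IDs are drawn uniformly at random; the function name and the sample sizes are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance of at least one duplicate
// among n IDs drawn uniformly from alphabetSize^length possibilities,
// using the birthday approximation 1 - exp(-n(n-1) / (2N)).
func collisionProbability(n float64, alphabetSize, length int) float64 {
	space := math.Pow(float64(alphabetSize), float64(length))
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	// Base58 IDs of 6 and 8 characters at a few link counts.
	for _, length := range []int{6, 8} {
		for _, n := range []float64{1e3, 1e5, 1e6} {
			p := collisionProbability(n, 58, length)
			fmt.Printf("length=%d links=%.0f p=%.6f\n", length, n, p)
		}
	}
}
```

The probability grows with the square of the number of links, so each extra character buys a lot of headroom.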
Using a sequential counter ensures uniqueness by design. Tools like Redis can implement a distributed counter with the INCR command. However, sequential IDs are predictable. Combining a sequence with random bytes addresses this issue, ensuring both uniqueness and unpredictability.
For example:
- Random Value + Incrementing Sequence: If two instances generate the same random value, the sequence ensures uniqueness.
Note: Including a sequential component in your IDs might reveal the total number of links generated, which could be undesirable in some contexts.
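Here is a rough sketch of the combined approach. An in-process atomic counter stands in for a shared sequence such as a Redis INCR key, and the layout (hex random prefix, base-36 sequence suffix) is an assumption for illustration rather than tnfy.link’s format.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strconv"
	"sync/atomic"
)

// counter stands in for a shared sequence such as a Redis INCR key (assumption).
var counter uint64

// nextID returns randomLen random bytes in hex followed by the next
// sequence number in base 36. The random prefix has a fixed width, so two
// IDs can only be equal if their sequence numbers are equal.
func nextID(randomLen int) (string, error) {
	b := make([]byte, randomLen)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	seq := atomic.AddUint64(&counter, 1)
	return hex.EncodeToString(b) + strconv.FormatUint(seq, 36), nil
}

func main() {
	for i := 0; i < 3; i++ {
		id, err := nextID(3)
		if err != nil {
			panic(err)
		}
		fmt.Println(id)
	}
}
```

The suffix is exactly the part that leaks the link count mentioned in the note above, so whether this trade-off is acceptable depends on the service.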
Conclusion
In this post, we explored various methods to generate short links:
- Random bytes: Simple and effective, especially with a user-friendly encoding like Base58.
- Hashing: Reliable but lacks randomness for this use case.
- UUID/NanoID: Great alternatives but add unnecessary complexity compared to raw random bytes.
- Sequence: Solves collisions but increases ID length.
For most use cases, random bytes with Base58 encoding are sufficient. To handle collisions in high-load scenarios, combining random bytes with a sequential component is a robust option. While this isn’t yet implemented in the current version of tnfy.link’s backend, I plan to add it as an optional feature in the future.
Thanks for reading! I’d love to hear your thoughts and experiences with link generation. Share your feedback in the comments below!
Related Post
If you’re interested in learning more about my projects, check out my article on SMS Gateway for Android.