Wycliffe A. Onyango

Posted on Feb 15

🌍 UTF-8

#utf8 #go #programming

Have you ever opened a file or webpage and seen something like this?

â€œHello, world!â€

Instead of this?

“Hello, world!”

That’s an encoding issue, and if you’ve been coding long enough, you’ve probably run into it at some point.

But why does this happen? Why do some characters get replaced with weird symbols? And most importantly—how do we fix it?

The answer is UTF-8, the encoding that powers almost everything today. Let's talk about what it is, why it matters, and how to use it properly in Go (Golang).

🔥 The Problem UTF-8 Solves

Back in the early days of computing, ASCII was the standard way to represent text. It used 7 bits per character, meaning it could only represent 128 characters (A-Z, a-z, 0-9, and some symbols).

That was fine—until computers went global.

Suddenly, people needed to store and display languages like Chinese (汉字), Arabic (العربية), Hindi (हिन्दी), and more. ASCII just couldn’t handle it.

So different regions and vendors created their own encodings:

ISO-8859-1 for Western Europe
Shift JIS for Japanese
Windows-1252 for Western European text on Windows

💀 The result? Encoding chaos. A file written in one system might be unreadable in another.
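
To see how that chaos produces garbage on screen, here is a minimal sketch that takes the UTF-8 bytes of "café" and deliberately misreads them as ISO-8859-1, where every byte maps to the code point with the same value. The helper name latin1ToString is made up for this illustration; it is not a standard library function.

```go
package main

import "fmt"

// latin1ToString (hypothetical helper) interprets each byte as one
// ISO-8859-1 character, i.e. the code point with the same numeric value.
func latin1ToString(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

func main() {
	raw := []byte("café") // Go string literals are UTF-8: é is the two bytes c3 a9

	fmt.Printf("UTF-8 bytes: % x\n", raw)                   // 63 61 66 c3 a9
	fmt.Println("Read as UTF-8:     ", string(raw))         // café
	fmt.Println("Read as ISO-8859-1:", latin1ToString(raw)) // cafÃ©
}
```

The bytes never changed. Only the reader's assumption about the encoding did, and that mismatch is what shows up as garbled text.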

Enter UTF-8, the hero of our story.

🏆 What Makes UTF-8 Special?

UTF-8 was designed in 1992 by Ken Thompson and Rob Pike (yes, the same Rob Pike who helped create Go!). It solved the encoding mess by being:

✅ Backwards-compatible with ASCII
✅ Compact for common characters (English stays at 1 byte per character)
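✅ Capable of encoding every Unicode code point, so it covers every script and symbol that Unicode defines
✅ Self-synchronizing (lead bytes and continuation bytes use distinct bit patterns, so a decoder can recover after a corrupted or truncated byte)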
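
The properties above can be observed directly with the standard unicode/utf8 package. The sketch below is only an illustration: byteLengths is a hypothetical helper, while utf8.RuneLen and utf8.RuneStart are real library functions.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLengths (hypothetical helper) returns how many bytes UTF-8 needs
// for each rune in s.
func byteLengths(s string) []int {
	var out []int
	for _, r := range s {
		out = append(out, utf8.RuneLen(r))
	}
	return out
}

func main() {
	// ASCII, Latin with accent, CJK, emoji.
	fmt.Println(byteLengths("Aé世😀")) // [1 2 3 4]

	// Lead bytes and continuation bytes have different bit patterns.
	// Continuation bytes always look like 10xxxxxx.
	for _, b := range []byte("世") {
		fmt.Printf("%08b lead=%v\n", b, utf8.RuneStart(b))
	}
}
```

Because a continuation byte can never be mistaken for the start of a character, a decoder that lands in the middle of a sequence can skip forward to the next lead byte.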

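This is why UTF-8 is now used by the overwhelming majority of websites (W3Techs surveys put it above 97%) and is the default text encoding in many modern languages, including Go, whose source files and string literals are UTF-8.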

💻 UTF-8 in Action (With Go Examples)

Go source files are UTF-8, and string literals hold UTF-8 bytes, so most of the time you don't need to do anything special. Let's dig into some examples to see it in action.

1️⃣ Encoding a String as UTF-8 Bytes

package main

import (
    "fmt"
)

func main() {
    text := "Hello, 世界"      // ASCII plus multi-byte characters
    utf8Bytes := []byte(text) // Copy the string's underlying UTF-8 bytes

    fmt.Println("Original String:", text)
    fmt.Println("UTF-8 Bytes:", utf8Bytes) // Raw byte representation
    fmt.Printf("UTF-8 Bytes in Hex: %x\n", utf8Bytes)
}

📝 Output:

Original String: Hello, 世界  
UTF-8 Bytes: [72 101 108 108 111 44 32 228 184 150 231 149 140]  
UTF-8 Bytes in Hex: 48656c6c6f2c20e4b896e7958c

💡 Notice:

English characters (Hello, ) are 1 byte each.
Chinese characters (世界) are 3 bytes each.

This variable-length encoding is why UTF-8 is so efficient!
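
To make the variable-length layout concrete, here is a rough sketch that builds the 3-byte form by hand and compares it with the standard library. encode3 is a hypothetical helper written for this post, and it assumes the rune lies in the U+0800 to U+FFFF range (excluding surrogates), which is where 3-byte sequences apply.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode3 (hypothetical helper) packs a rune in the range U+0800..U+FFFF
// into the 3-byte UTF-8 layout: 1110xxxx 10xxxxxx 10xxxxxx.
func encode3(r rune) [3]byte {
	return [3]byte{
		0xE0 | byte(r>>12),       // lead byte: top 4 bits
		0x80 | byte((r>>6)&0x3F), // continuation: middle 6 bits
		0x80 | byte(r&0x3F),      // continuation: low 6 bits
	}
}

func main() {
	r := '世' // U+4E16

	manual := encode3(r)

	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)

	fmt.Printf("manual: % x\n", manual[:]) // e4 b8 96
	fmt.Printf("stdlib: % x\n", buf[:n])   // e4 b8 96
}
```

In real code you would use utf8.EncodeRune or a plain string conversion. The manual version is only here to show where the bits go.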

2️⃣ Decoding UTF-8 Bytes Back to a String

package main

import (
    "fmt"
)

func main() {
    utf8Bytes := []byte{72, 101, 108, 108, 111, 44, 32, 228, 184, 150, 231, 149, 140}
    text := string(utf8Bytes) // Convert back to string

    fmt.Println("Decoded String:", text)
}

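💡 No extra libraries needed. The conversion simply copies the bytes, and Go interprets them as UTF-8 whenever you print the string or range over it. Keep in mind that string(bytes) does not validate anything, so invalid bytes pass through unchanged.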
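
If you want to see the decoding step itself, the following sketch walks a byte slice one rune at a time with utf8.DecodeRune. decodeRunes is a hypothetical helper. Invalid bytes come back as utf8.RuneError (U+FFFD) with a width of 1.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeRunes (hypothetical helper) decodes b one rune at a time.
// Each invalid byte yields utf8.RuneError and advances by one byte.
func decodeRunes(b []byte) []rune {
	var out []rune
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		out = append(out, r)
		b = b[size:]
	}
	return out
}

func main() {
	// 'H', 'i', one invalid byte, then the three bytes of 世.
	data := []byte{'H', 'i', 0xff, 0xe4, 0xb8, 0x96}

	for _, r := range decodeRunes(data) {
		fmt.Printf("%U ", r) // U+0048 U+0069 U+FFFD U+4E16
	}
	fmt.Println()
}
```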

3️⃣ Handling UTF-8 in Web Applications

If you're building a web app, always specify UTF-8 in your response headers:

package main

import (
    "fmt"
    "log"
    "net/http"
)

// handler declares the charset explicitly so browsers decode the body as UTF-8.
func handler(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    fmt.Fprint(w, "<h1>Hello, 世界!</h1>")
}

func main() {
    http.HandleFunc("/", handler)
    fmt.Println("Server running at http://localhost:8080")
    // ListenAndServe only returns on failure (for example, the port is already in use).
    log.Fatal(http.ListenAndServe(":8080", nil))
}

💡 Without charset=utf-8, some browsers might misinterpret the text and display garbage characters.
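
One way to confirm the header is really being sent, without starting a server, is to run the handler against an in-memory recorder from net/http/httptest. This is a small sketch: contentTypeOf is a hypothetical helper, and the handler is repeated here only so the example is self-contained.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	fmt.Fprint(w, "<h1>Hello, 世界!</h1>")
}

// contentTypeOf (hypothetical helper) calls h with a fake GET request
// and returns the Content-Type header it produced.
func contentTypeOf(h http.HandlerFunc) string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	h(rec, req)
	return rec.Result().Header.Get("Content-Type")
}

func main() {
	fmt.Println(contentTypeOf(handler)) // text/html; charset=utf-8
}
```

The same helper fits naturally into a unit test if you want to guard against someone dropping the charset later.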

4️⃣ Validating UTF-8 Data

Not every byte sequence is valid UTF-8. You can check strings with utf8.ValidString() and byte slices with utf8.Valid():

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    valid := "Hello, 世界"
    invalid := []byte{0xff, 0xfe, 0xfd} // Invalid UTF-8

    fmt.Println("String is valid UTF-8?", utf8.ValidString(valid))
    fmt.Println("Byte slice is valid UTF-8?", utf8.Valid(invalid))
}

📝 Output:

String is valid UTF-8? true
Byte slice is valid UTF-8? false

✅ Great for validating user input before processing it!
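
Rejecting bad input is one option. Another is to repair it. The sketch below assumes you would rather replace invalid bytes with the Unicode replacement character than fail. sanitize is a hypothetical helper built on strings.ToValidUTF8, which is in the standard library since Go 1.13.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sanitize (hypothetical helper) returns input unchanged when it is valid
// UTF-8; otherwise each run of invalid bytes becomes a single U+FFFD.
func sanitize(input string) string {
	if utf8.ValidString(input) {
		return input
	}
	return strings.ToValidUTF8(input, "\uFFFD")
}

func main() {
	clean := "Hello, 世界"
	broken := "caf\xc3" // truncated 2-byte sequence: the lead byte of é with nothing after it

	fmt.Println(sanitize(clean))
	fmt.Println(utf8.ValidString(sanitize(broken))) // true
}
```

Whether to reject or repair depends on the application. Repairing silently can hide bugs upstream, so logging the event is usually worth it.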

5️⃣ Counting Unicode Characters (Runes) in a String

Go strings are byte sequences, not necessarily character sequences, so len() reports the number of bytes rather than the number of characters.

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    text := "Hello, 世界"
    fmt.Println("Byte length:", len(text))                   // Counts bytes
    fmt.Println("Rune count:", utf8.RuneCountInString(text)) // Counts runes (Unicode code points)
}

📝 Output:

Byte length: 13  
Rune count: 9

❗ Why the difference? Because 世 and 界 take 3 bytes each, len(text) == 13 (7 ASCII bytes plus 6), but there are only 9 runes.
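
The byte/rune gap matters most when you cut strings. Here is a hedged sketch of truncating to a number of runes instead of bytes. truncateRunes is a hypothetical helper, not a library function.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateRunes (hypothetical helper) returns at most n runes of s,
// always cutting on a rune boundary.
func truncateRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s { // i is the byte offset where each rune starts
		if count == n {
			return s[:i]
		}
		count++
	}
	return s // s has n runes or fewer
}

func main() {
	text := "Hello, 世界"

	byteCut := text[:8] // slices through the middle of 世
	runeCut := truncateRunes(text, 8)

	fmt.Println(utf8.ValidString(byteCut)) // false
	fmt.Println(runeCut)                   // Hello, 世
}
```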

6️⃣ Iterating Over Unicode Characters

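Since some characters take more than 1 byte, indexing with text[i] returns a single byte, not a character. Use range, which decodes one rune per iteration and reports the byte offset where it starts: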

package main

import (
    "fmt"
)

func main() {
    text := "Hello, 世界"

    for i, r := range text {
        fmt.Printf("Index: %d, Rune: %c, Unicode: U+%04X\n", i, r, r)
    }
}

📝 Output:

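Index: 0, Rune: H, Unicode: U+0048
Index: 1, Rune: e, Unicode: U+0065
Index: 2, Rune: l, Unicode: U+006C
Index: 3, Rune: l, Unicode: U+006C
Index: 4, Rune: o, Unicode: U+006F
Index: 5, Rune: ,, Unicode: U+002C
Index: 6, Rune:  , Unicode: U+0020
Index: 7, Rune: 世, Unicode: U+4E16
Index: 10, Rune: 界, Unicode: U+754C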

❗ Notice how 界 starts at index 10, not 8. The index from range is a byte offset, and 世 occupies bytes 7 through 9.
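
If you need to index by character position instead of byte offset, one common approach is to convert the string to a []rune first. The sketch below assumes that the extra allocation is acceptable. runeAt is a hypothetical helper.

```go
package main

import "fmt"

// runeAt (hypothetical helper) returns the n-th rune of s (zero-based).
// The bool is false when n is out of range.
func runeAt(s string, n int) (rune, bool) {
	runes := []rune(s) // decodes the whole string: one element per code point
	if n < 0 || n >= len(runes) {
		return 0, false
	}
	return runes[n], true
}

func main() {
	text := "Hello, 世界"

	fmt.Println(text[8]) // 184: the second byte of 世, not a character

	if r, ok := runeAt(text, 8); ok {
		fmt.Printf("%c\n", r) // 界
	}
}
```

Converting to []rune costs time and memory proportional to the string length, so for a single pass over the text, range is usually the better fit.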

🚀 Why UTF-8 is the Default Encoding

Before UTF-8:
❌ Confusing mess of different encodings
❌ Text corruption between systems
❌ Websites needed to support multiple charsets

After UTF-8:
✅ One encoding for everything
✅ Far less garbled text (mojibake), as long as every layer agrees on UTF-8
✅ Supported everywhere—from databases to web APIs

🌍 That’s why UTF-8 won.

🎯 Final Thoughts

If you’re dealing with text in Go (or any language), understanding UTF-8 is essential. It ensures your applications work worldwide without encoding issues.
