Wycliffe A. Onyango

🌍 UTF-8

Have you ever opened a file or webpage and seen something like this?

â€œHello, world!â€�

Instead of this?

“Hello, world!”

That’s an encoding issue, and if you’ve been coding long enough, you’ve probably run into it at some point.

But why does this happen? Why do some characters get replaced with weird symbols? And most importantly—how do we fix it?

The answer is UTF-8, the encoding that powers almost everything today. Let's talk about what it is, why it matters, and how to use it properly in Go (Golang).

🔥 The Problem UTF-8 Solves

Back in the early days of computing, ASCII was the standard way to represent text. It used 7 bits per character, meaning it could only represent 128 characters (A-Z, a-z, 0-9, punctuation, and a set of control codes).

That was fine—until computers went global.

Suddenly, people needed to store and display languages like Chinese (汉字), Arabic (العربية), Hindi (हिन्दी), and more. ASCII just couldn’t handle it.

So vendors and standards bodies created their own regional encodings:

  • ISO-8859-1 for Western European languages

  • Shift JIS for Japanese

  • Windows-1252 for Western European languages on Microsoft Windows

💀 The result? Encoding chaos. A file written in one system might be unreadable in another.
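To make the chaos concrete, here is a minimal sketch of how garbled text appears when UTF-8 bytes are read with the wrong decoder. The helpers `decodeLatin1` and `encodeLatin1` are hypothetical names written for this illustration; they rely on the fact that ISO-8859-1 maps each byte value directly to the code point with the same number.

```go
package main

import "fmt"

// decodeLatin1 (hypothetical helper) interprets every byte as one
// ISO-8859-1 character: byte value N becomes code point U+00NN.
func decodeLatin1(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// encodeLatin1 (hypothetical helper) reverses decodeLatin1.
// ok is false if s contains a character outside U+0000..U+00FF.
func encodeLatin1(s string) (out []byte, ok bool) {
	out = make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return nil, false
		}
		out = append(out, byte(r))
	}
	return out, true
}

func main() {
	original := "café" // é is stored as the two UTF-8 bytes C3 A9

	// Wrong decoder: one character per byte.
	garbled := decodeLatin1([]byte(original))
	fmt.Println(garbled) // cafÃ©

	// Undo the damage: map the characters back to bytes, then read as UTF-8.
	if raw, ok := encodeLatin1(garbled); ok {
		fmt.Println(string(raw)) // café
	}
}
```

The bytes never changed; only the interpretation did. That is why this kind of corruption can often be reversed, as long as nothing was lost along the way.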

Enter UTF-8, the hero of our story.

🏆 What Makes UTF-8 Special?

UTF-8 was designed in 1992 by Ken Thompson and Rob Pike (yes, the same Rob Pike who helped create Go!). It solved the encoding mess by being:

✅ Backwards-compatible with ASCII
✅ Compact for common characters (English stays at 1 byte per character)
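✅ Capable of encoding every Unicode code point, so it covers every script and symbol Unicode defines
✅ Self-synchronizing (lead bytes and continuation bytes never look alike, so a decoder can recover after a corrupted byte)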

This is why UTF-8 is now used by roughly 97% of websites and is the default or assumed encoding in many modern programming languages. Go goes a step further: Go source files are defined to be UTF-8.
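The properties listed above can be observed with the standard `unicode/utf8` package. The sketch below is only an illustration: `encodedLen` and `nextRuneStart` are hypothetical helper names, thin wrappers around `utf8.RuneLen` and `utf8.RuneStart`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen (hypothetical helper) reports how many bytes UTF-8 needs for r.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

// nextRuneStart (hypothetical helper) returns the index of the first byte
// at or after i that can begin a character. Continuation bytes always look
// like 10xxxxxx, so a decoder can skip them to get back in sync.
func nextRuneStart(b []byte, i int) int {
	for i < len(b) && !utf8.RuneStart(b[i]) {
		i++
	}
	return i
}

func main() {
	// ASCII stays at 1 byte; other characters take 2 to 4 bytes.
	for _, r := range []rune{'A', 'é', '世', '😀'} {
		fmt.Printf("%c U+%04X -> %d byte(s)\n", r, r, encodedLen(r))
	}

	// Start in the middle of 世 (bytes 1..3) and find the next character.
	b := []byte("a世b")
	fmt.Println(nextRuneStart(b, 2)) // 4
}
```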

💻 UTF-8 in Action (With Go Examples)

Since Go natively supports UTF-8, you don’t need to do anything special—it just works. But let’s dig into some examples to see it in action.

1️⃣ Encoding a String as UTF-8 Bytes

package main

import (
    "fmt"
)

func main() {
    text := "Hello, 世界" // ASCII + Unicode
    utf8Bytes := []byte(text) // Convert to UTF-8 bytes

    fmt.Println("Original String:", text)
    fmt.Println("UTF-8 Bytes:", utf8Bytes) // Raw byte representation
    fmt.Printf("UTF-8 Bytes in Hex: %x\n", utf8Bytes)
}


📝 Output:

Original String: Hello, 世界  
UTF-8 Bytes: [72 101 108 108 111 44 32 228 184 150 231 149 140]  
UTF-8 Bytes in Hex: 48656c6c6f2c20e4b896e7958c  


💡 Notice:

  • English characters (Hello, ) are 1 byte each.

  • Chinese characters (世界) are 3 bytes each.

This variable-length encoding is why UTF-8 is so efficient!
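To see where those three bytes come from, here is a rough sketch of the 3-byte layout (`1110xxxx 10xxxxxx 10xxxxxx`) done by hand and compared with the standard library. `encode3` is a hypothetical helper for illustration only; it assumes the code point is in the range U+0800 to U+FFFF and does no validation.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// encode3 (hypothetical helper) packs a code point from U+0800..U+FFFF
// into the 3-byte UTF-8 layout. It does not check the range or reject
// surrogates, so treat it as an illustration, not a real encoder.
func encode3(r rune) []byte {
	return []byte{
		0xE0 | byte(r>>12),          // 1110xxxx: top 4 bits
		0x80 | (byte(r>>6) & 0x3F),  // 10xxxxxx: middle 6 bits
		0x80 | (byte(r) & 0x3F),     // 10xxxxxx: low 6 bits
	}
}

func main() {
	manual := encode3('世') // U+4E16
	fmt.Printf("% x\n", manual) // e4 b8 96

	// Compare with the standard library encoder.
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, '世')
	fmt.Println(bytes.Equal(manual, buf[:n])) // true
}
```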

2️⃣ Decoding UTF-8 Bytes Back to a String

package main

import (
    "fmt"
)

func main() {
    utf8Bytes := []byte{72, 101, 108, 108, 111, 44, 32, 228, 184, 150, 231, 149, 140}
    text := string(utf8Bytes) // Convert back to string

    fmt.Println("Decoded String:", text)
}

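💡 No extra libraries are needed. One caveat: string(utf8Bytes) copies the bytes exactly as they are and does not check that they form valid UTF-8, which is why validation (section 4) matters for untrusted input.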
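If you want to see what decoding actually does, you can walk the bytes yourself with `utf8.DecodeRune`. This is a small sketch, with `decodeAll` as a hypothetical helper name; it counts bytes that are not part of any valid sequence.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll (hypothetical helper) decodes b one rune at a time.
// utf8.DecodeRune returns (utf8.RuneError, 1) for a byte that does not
// start a valid sequence, so we count those as invalid.
func decodeAll(b []byte) (runes []rune, invalid int) {
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		if r == utf8.RuneError && size == 1 {
			invalid++
		}
		runes = append(runes, r)
		b = b[size:]
	}
	return runes, invalid
}

func main() {
	raw := []byte{72, 105, 0xff, 228, 184, 150} // "Hi", a stray 0xff, then 世

	// The conversion keeps every byte, including the invalid one.
	s := string(raw)
	fmt.Println(len(s) == len(raw)) // true

	runes, invalid := decodeAll(raw)
	fmt.Println(len(runes), invalid) // 4 1
}
```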

3️⃣ Handling UTF-8 in Web Applications

If you're building a web app, always specify UTF-8 in your response headers:

package main

import (
    "fmt"
    "log"
    "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
    // Declare the charset explicitly so clients decode the body as UTF-8.
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    fmt.Fprint(w, "<h1>Hello, 世界!</h1>")
}

func main() {
    http.HandleFunc("/", handler)
    fmt.Println("Server running at http://localhost:8080")
    // ListenAndServe only returns on failure (for example, the port is already in use).
    log.Fatal(http.ListenAndServe(":8080", nil))
}

💡 Without charset=utf-8, some browsers might misinterpret the text and display garbage characters.
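One way to confirm the header is really being sent, without starting a server, is the standard `net/http/httptest` package. The sketch below reuses the handler from above; `contentTypeOf` is a hypothetical helper name for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	fmt.Fprint(w, "<h1>Hello, 世界!</h1>")
}

// contentTypeOf (hypothetical helper) runs h against an in-memory
// GET / request and returns the Content-Type header of the response.
func contentTypeOf(h http.HandlerFunc) string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	h(rec, req)
	return rec.Result().Header.Get("Content-Type")
}

func main() {
	fmt.Println(contentTypeOf(handler)) // text/html; charset=utf-8
}
```

A check like this fits naturally into a unit test, so a later refactor cannot silently drop the charset.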

4️⃣ Validating UTF-8 Data

Not every byte sequence is valid UTF-8. You can check with utf8.ValidString() for strings and utf8.Valid() for byte slices:

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    valid := "Hello, 世界"
    invalid := []byte{0xff, 0xfe, 0xfd} // Invalid UTF-8

    fmt.Println("Is the string valid UTF-8?", utf8.ValidString(valid))
    fmt.Println("Is the byte slice valid UTF-8?", utf8.Valid(invalid))
}

📝 Output:

Is the string valid UTF-8? true
Is the byte slice valid UTF-8? false

✅ Great for validating user input before processing it!
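Rejecting bad input is one option; another is to repair it. As a sketch, the standard `strings.ToValidUTF8` function replaces each run of invalid bytes with a string of your choice. The `sanitize` wrapper below is a hypothetical helper, and using U+FFFD (the Unicode replacement character) is just one reasonable choice.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sanitize (hypothetical helper) returns input unchanged if it is valid
// UTF-8. Otherwise it replaces each run of invalid bytes with U+FFFD
// and reports that the input had to be repaired.
func sanitize(input string) (clean string, wasValid bool) {
	if utf8.ValidString(input) {
		return input, true
	}
	return strings.ToValidUTF8(input, "\uFFFD"), false
}

func main() {
	clean, ok := sanitize("ab\xff\xfecd") // two invalid bytes in the middle
	fmt.Printf("%q %v\n", clean, ok)      // "ab�cd" false
}
```

Whether to reject or repair depends on the application: repairing is friendlier for display, while rejecting is safer for identifiers and anything security-sensitive.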

5️⃣ Counting Unicode Characters (Runes) in a String

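A Go string is a read-only sequence of bytes, so len() reports the number of bytes, not the number of characters. To count characters (runes, in Go terms), use utf8.RuneCountInString():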

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    text := "Hello, 世界"
    fmt.Println("Byte length:", len(text)) // Counts bytes
    fmt.Println("Rune count:", utf8.RuneCountInString(text)) // Counts characters
}

📝 Output:

Byte length: 13  
Rune count: 9 

❗ Why the difference? The 7 ASCII characters in "Hello, " take 1 byte each, while 世 and 界 take 3 bytes each, so len(text) == 7 + 3 + 3 == 13, but there are only 9 runes.

6️⃣ Iterating Over Unicode Characters

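Since some characters take more than 1 byte, indexing a string with text[i] gives you a single byte, not a character. To walk a string one character at a time, use range, which decodes one rune per iteration and reports the byte offset where it starts: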

package main

import (
    "fmt"
)

func main() {
    text := "Hello, 世界"

    for i, r := range text {
        fmt.Printf("Index: %d, Rune: %c, Unicode: U+%04X\n", i, r, r)
    }
}

📝 Output:

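Index: 0, Rune: H, Unicode: U+0048
Index: 1, Rune: e, Unicode: U+0065
Index: 2, Rune: l, Unicode: U+006C
Index: 3, Rune: l, Unicode: U+006C
Index: 4, Rune: o, Unicode: U+006F
Index: 5, Rune: ,, Unicode: U+002C
Index: 6, Rune:  , Unicode: U+0020
Index: 7, Rune: 世, Unicode: U+4E16
Index: 10, Rune: 界, Unicode: U+754C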

❗ Notice how 界 is at index 10, not 8. The index is a byte offset, and 世 occupies bytes 7 through 9 because it takes 3 bytes.
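If you need the character at position n rather than the byte at offset n, one simple approach is to convert the string to a `[]rune` first. The sketch below uses a hypothetical helper, `runeAt`; note that the conversion copies and decodes the whole string, so for long strings or hot loops a single `range` pass is usually cheaper.

```go
package main

import "fmt"

// runeAt (hypothetical helper) returns the n-th rune of s (0-based).
// ok is false if n is out of range.
func runeAt(s string, n int) (r rune, ok bool) {
	runes := []rune(s) // decodes the whole string into code points
	if n < 0 || n >= len(runes) {
		return 0, false
	}
	return runes[n], true
}

func main() {
	text := "Hello, 世界"

	// Byte indexing: the first byte of 世, not the character itself.
	fmt.Println(text[7]) // 228

	// Rune indexing: the eighth character.
	if r, ok := runeAt(text, 7); ok {
		fmt.Printf("%c\n", r) // 世
	}
}
```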

🚀 Why UTF-8 is the Default Encoding

Before UTF-8:
❌ Confusing mess of different encodings
❌ Text corruption between systems
❌ Websites needed to support multiple charsets

After UTF-8:
✅ One encoding for everything
✅ No more garbled text (mojibake)
✅ Supported everywhere—from databases to web APIs

🌍 That’s why UTF-8 won.

🎯 Final Thoughts

If you’re dealing with text in Go (or any language), understanding UTF-8 is essential. It ensures your applications work worldwide without encoding issues.
