Have you ever opened a file or webpage and seen something like this?
“Hello, world!â€
Instead of this?
“Hello, world!”
That’s an encoding issue, and if you’ve been coding long enough, you’ve probably run into it at some point.
But why does this happen? Why do some characters get replaced with weird symbols? And most importantly—how do we fix it?
The answer is UTF-8, the encoding that powers almost everything today. Let's talk about what it is, why it matters, and how to use it properly in Go (Golang).
🔥 The Problem UTF-8 Solves
Back in the early days of computing, ASCII was the standard way to represent text. It used 7 bits per character, meaning it could only represent 128 characters (A-Z, a-z, 0-9, and some symbols).
That was fine—until computers went global.
Suddenly, people needed to store and display languages like Chinese (汉字), Arabic (العربية), Hindi (हिन्दी), and more. ASCII just couldn’t handle it.
So different countries created their own encodings:
ISO-8859-1 for Western Europe
Shift JIS for Japanese
Windows-1252 for Microsoft systems
💀 The result? Encoding chaos. A file written in one system might be unreadable in another.
Enter UTF-8, the hero of our story.
🏆 What Makes UTF-8 Special?
UTF-8 was designed in 1992 by Ken Thompson and Rob Pike (yes, the same Rob Pike who helped create Go!). It solved the encoding mess by being:
✅ Backwards-compatible with ASCII
✅ Compact for common characters (English stays at 1 byte per character)
✅ Capable of encoding every language and symbol
✅ Error-resistant (invalid bytes won’t accidentally form valid characters)
This is why UTF-8 is now used by 97% of websites and is the default encoding for most programming languages, including Go.
💻 UTF-8 in Action (With Go Examples)
Since Go natively supports UTF-8, you don’t need to do anything special—it just works. But let’s dig into some examples to see it in action.
1️⃣ Encoding a String as UTF-8 Bytes
package main

import (
	"fmt"
)

func main() {
	text := "Hello, 世界"      // ASCII + Unicode
	utf8Bytes := []byte(text) // Convert to UTF-8 bytes

	fmt.Println("Original String:", text)
	fmt.Println("UTF-8 Bytes:", utf8Bytes) // Raw byte representation
	fmt.Printf("UTF-8 Bytes in Hex: %x\n", utf8Bytes)
}
📝 Output:
Original String: Hello, 世界
UTF-8 Bytes: [72 101 108 108 111 44 32 228 184 150 231 149 140]
UTF-8 Bytes in Hex: 48656c6c6f2c20e4b896e7958c
💡 Notice:
English characters (Hello,) are 1 byte each.
Chinese characters (世界) are 3 bytes each.
This variable-length encoding is why UTF-8 is so efficient!
2️⃣ Decoding UTF-8 Bytes Back to a String
package main

import (
	"fmt"
)

func main() {
	utf8Bytes := []byte{72, 101, 108, 108, 111, 44, 32, 228, 184, 150, 231, 149, 140}
	text := string(utf8Bytes) // Convert back to string
	fmt.Println("Decoded String:", text)
}
💡 No extra libraries—Go just handles it. That’s one of the nice things about UTF-8 in Go.
3️⃣ Handling UTF-8 in Web Applications
If you're building a web app, always specify UTF-8 in your response headers:
package main

import (
	"fmt"
	"log"
	"net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/html; charset=utf-8") // Ensure UTF-8
	fmt.Fprint(w, "<h1>Hello, 世界!</h1>")
}

func main() {
	http.HandleFunc("/", handler)
	fmt.Println("Server running at http://localhost:8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
💡 Without charset=utf-8, some browsers might misinterpret the text and display garbage characters.
4️⃣ Validating UTF-8 Data
Not every byte sequence is valid UTF-8. You can check with utf8.ValidString():
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	valid := "Hello, 世界"
	invalid := []byte{0xff, 0xfe, 0xfd} // Not a valid UTF-8 sequence

	fmt.Println("Valid string is valid UTF-8?", utf8.ValidString(valid))
	fmt.Println("Invalid bytes are valid UTF-8?", utf8.Valid(invalid))
}
📝 Output:
Valid string is valid UTF-8? true
Invalid bytes are valid UTF-8? false
✅ Great for validating user input before processing it!
5️⃣ Counting Unicode Characters (Runes) in a String
Go strings are byte sequences, not necessarily character sequences.
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	text := "Hello, 世界"
	fmt.Println("Byte length:", len(text))                   // Counts bytes
	fmt.Println("Rune count:", utf8.RuneCountInString(text)) // Counts characters
}
📝 Output:
Byte length: 13
Rune count: 9
❗ Why the difference? Because 世 and 界 take 3 bytes each, len(text) counts 7 ASCII bytes plus 6 bytes for the two Chinese characters (13 total), while there are only 9 characters.
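When you need character-based indexing, converting the string to `[]rune` gives one element per character, at the cost of an extra allocation. A small sketch:

```go
package main

import "fmt"

func main() {
	text := "Hello, 世界"
	runes := []rune(text) // one element per character

	fmt.Println("Characters:", len(runes)) // 9
	fmt.Printf("8th character: %c\n", runes[7]) // 世
	fmt.Printf("9th character: %c\n", runes[8]) // 界
}
```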
6️⃣ Iterating Over Unicode Characters
Since some characters take more than 1 byte, normal indexing won’t work. Use range:
package main

import (
	"fmt"
)

func main() {
	text := "Hello, 世界"
	for i, r := range text {
		fmt.Printf("Index: %d, Rune: %c, Unicode: U+%04X\n", i, r, r)
	}
}
📝 Output (abridged):
Index: 0, Rune: H, Unicode: U+0048
Index: 1, Rune: e, Unicode: U+0065
Index: 2, Rune: l, Unicode: U+006C
...
Index: 7, Rune: 世, Unicode: U+4E16
Index: 10, Rune: 界, Unicode: U+754C
❗ Notice how 界 starts at index 10, not 8: range yields byte offsets, not character positions, and 世 occupies bytes 7 through 9.
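This also means that indexing a string with text[i] yields a single byte, not a character, while slicing by byte offsets only works when the offsets fall on rune boundaries. A sketch:

```go
package main

import "fmt"

func main() {
	text := "Hello, 世界"

	fmt.Println(text[7:10]) // slicing on rune boundaries yields 世
	fmt.Println(text[7])    // indexing yields a single byte (228), not 世
}
```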
🚀 Why UTF-8 is the Default Encoding
Before UTF-8:
❌ Confusing mess of different encodings
❌ Text corruption between systems
❌ Websites needed to support multiple charsets
After UTF-8:
✅ One encoding for everything
✅ No more garbled text (mojibake)
✅ Supported everywhere—from databases to web APIs
🌍 That’s why UTF-8 won.
🎯 Final Thoughts
If you’re dealing with text in Go (or any language), understanding UTF-8 is essential. It ensures your applications work worldwide without encoding issues.