MySQL’s utf8 and utf8mb4 character sets are often misunderstood. The key difference is the maximum number of bytes each can use to store a single character. This quick guide explains why utf8mb4 is the better choice for full Unicode and emoji support.
utf8 vs utf8mb4 in MySQL
Here’s a simple breakdown of MySQL’s utf8 and utf8mb4:
utf8 (an alias for utf8mb3) - Supports at most 3 bytes per character, so it covers only the Basic Multilingual Plane. Emojis won’t work.
utf8mb4 - Supports up to 4 bytes per character, enabling emojis and supplementary Unicode symbols.
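The byte counts above can be checked outside MySQL. The following is a minimal Python sketch, not MySQL code: it encodes a few sample characters as UTF-8 and prints how many bytes each needs. The helper name `utf8_length` and the sample characters are illustrative choices, not part of MySQL.

```python
# Illustrative sketch: count the bytes each character needs in UTF-8.
# utf8_length is a hypothetical helper name, not a MySQL or library function.

def utf8_length(ch: str) -> int:
    """Return the number of bytes `ch` occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

# Latin letter, accented letter, euro sign, grinning-face emoji
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X} needs {utf8_length(ch)} byte(s) in UTF-8")
```

Only the last sample needs 4 bytes, so it is the only one of the four that a utf8 (utf8mb3) column cannot hold.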
Why MySQL uses utf8mb4
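MySQL added utf8mb4, and made it the default character set in MySQL 8.0, to solve a major problem: utf8 cannot store characters that need 4 bytes in UTF-8. With strict SQL mode enabled, trying to store one in a utf8 column triggers an error like this (without strict mode, the value is truncated and a warning is raised instead):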
Incorrect string value: '\xF0\x9F\x98\x80' for column 'column_name_here' at row 1
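If an application still writes to a utf8 (utf8mb3) column, it can help to detect problem characters before the insert. The sketch below is one possible pre-check in Python, relying on the fact that any code point above U+FFFF takes 4 bytes in UTF-8. The function names `four_byte_chars` and `fits_utf8mb3` are hypothetical, and this only approximates what the server checks.

```python
# Illustrative pre-check: find characters a utf8mb3 column would reject.
# Both function names are hypothetical helpers, not part of any MySQL driver.

def four_byte_chars(text: str) -> list:
    """Return the characters in `text` that need 4 bytes in UTF-8."""
    # Code points above U+FFFF lie outside the Basic Multilingual Plane
    # and always encode to exactly 4 bytes in UTF-8.
    return [ch for ch in text if ord(ch) > 0xFFFF]

def fits_utf8mb3(text: str) -> bool:
    """Return True if every character in `text` fits in 3 bytes or fewer."""
    return not four_byte_chars(text)

print(fits_utf8mb3("caf\u00e9 \u20ac5"))       # only 1- to 3-byte characters
print(fits_utf8mb3("hello \U0001F600"))        # contains a 4-byte emoji
print("\U0001F600".encode("utf-8"))            # the raw bytes of that emoji
```

A check like this is only a stopgap. Converting the column to utf8mb4 removes the need for it.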
FAQ
What is UTF-8?
UTF-8 is a variable-width encoding that represents each Unicode character as a sequence of 1 to 4 bytes.
Why use utf8mb4?
It supports characters that need 4 bytes, like emojis, while utf8 supports at most 3 bytes per character.
How do I create a table with utf8mb4?
CREATE TABLE test_table (col VARCHAR(100)) CHARACTER SET utf8mb4;
Is utf8 the same as utf8mb4?
No. In MySQL, utf8 is an alias for utf8mb3 and supports at most 3 bytes per character, while utf8mb4 supports up to 4.
Conclusion
To avoid issues with Unicode characters and emojis, always use utf8mb4. For a more detailed look, check out the article MySQL’s UTF-8: Is It Real?