Here is just a quick little reminder: if you are ever parsing usernames or other user-generated content, check whether your code can handle non-Latin text.
The Problem
Recently I ran into an issue where the regex in my parsing code simply did not work on non-Latin alphabets. For example, say I wanted to parse the display-name from this string: display-name=CoalTheTroll;emotes=;flags=;id=3ceab6bd-de3f-4d05-8038-5cebdb2af1c7; :tmi.twitch.tv USERNOTICE #cohhcarnage
The typical code would look like this:
fun userNoticeParsing(text: String): String {
    // Only matches ASCII letters, digits and underscores
    val displayNamePattern = "display-name=([a-zA-Z0-9_]+)".toRegex()
    val displayNameMatch = displayNamePattern.find(text)
    // !! throws if no match was found
    return displayNameMatch?.groupValues?.get(1)!!
}
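The code above works for names like CoalTheTroll. However, there is a problem when the display name is not Latin-based. For example, a display name written in Chinese characters will not be parsed: find returns null, and the !! operator then throws a NullPointerException. So a display-name of 不橋小結 will cause the code to crash.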
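To see where the crash comes from, here is a minimal sketch that runs the ASCII-only pattern against both kinds of names. The helper name strictDisplayName and the shortened sample strings are made up for illustration; they are not part of my actual parsing code.

```kotlin
// Hypothetical helper: the same ASCII-only pattern, but returning null instead of using !!
fun strictDisplayName(text: String): String? {
    val pattern = "display-name=([a-zA-Z0-9_]+)".toRegex()
    // find() returns null when nothing matches
    return pattern.find(text)?.groupValues?.get(1)
}

fun main() {
    println(strictDisplayName("display-name=CoalTheTroll;emotes=;")) // CoalTheTroll
    println(strictDisplayName("display-name=不橋小結;emotes=;"))       // null

    // Applying !! to that null is what throws
    val result = runCatching { strictDisplayName("display-name=不橋小結;emotes=;")!! }
    println(result.exceptionOrNull() is NullPointerException) // true
}
```

So the regex itself never throws. It just finds nothing, and the !! turns that into a crash.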
The Solution
A simple solution (some might say lazy) is to stop worrying about which characters are allowed. With regex, we simply say: match every character after display-name= up to the next ;. The code would look like this:
fun userNoticeParsing(text: String): String {
    val displayNamePattern = "display-name=([^;]+)".toRegex()
    val displayNameMatch = displayNamePattern.find(text)
    return displayNameMatch?.groupValues?.get(1) ?: "username"
}
With the regex above, display-name=([^;]+), we are saying: match display-name= and then one or more characters that are not a ;, and stop matching once you find a ;. The () brackets create a capture group, which makes it easy to retrieve just the part we actually want. Lastly, we use the ?: operator to say: if no match is found, return "username".
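If it helps to see those pieces separately, here is a small sketch that prints the whole match, the first group, and the fallback. The shortened sample strings are assumptions for illustration, and the empty display-name= case is one I made up to trigger the fallback.

```kotlin
// Same function as in the post
fun userNoticeParsing(text: String): String {
    val displayNamePattern = "display-name=([^;]+)".toRegex()
    val displayNameMatch = displayNamePattern.find(text)
    return displayNameMatch?.groupValues?.get(1) ?: "username"
}

fun main() {
    val pattern = "display-name=([^;]+)".toRegex()
    val match = pattern.find("display-name=不橋小結;emotes=;flags=;")

    println(match?.groupValues?.get(0)) // display-name=不橋小結 (the whole match)
    println(match?.groupValues?.get(1)) // 不橋小結 (the first group)

    // An empty value leaves nothing for [^;]+ to match, so the fallback is used
    println(userNoticeParsing("display-name=;emotes=;flags=;")) // username
}
```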
Now, even with non-Latin display names, such as ones written in Chinese characters, our code will work:
val text = "display-name=不橋小結;emotes=;flags=;id=3ceab6bd-de3f-4d05-8038-5cebdb2af1c7; :tmi.twitch.tv USERNOTICE #cohhcarnage"
fun userNoticeParsing(text: String): String {
    val displayNamePattern = "display-name=([^;]+)".toRegex()
    val displayNameMatch = displayNamePattern.find(text)
    return displayNameMatch?.groupValues?.get(1) ?: "username"
}
val expectedUsername = "不橋小結"
val actualUsername = userNoticeParsing(text)
check(expectedUsername == actualUsername) // passes: the parsed name matches
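If matching everything up to the ; feels too loose, one possible alternative is to keep an allow-list but make it Unicode-aware. This is only a sketch: the function name strictUnicodeDisplayName is hypothetical, it assumes display names contain only letters, digits and underscores, and it relies on the \p{L} and \p{N} Unicode classes, which I am only assuming you are running on the JVM for.

```kotlin
// Hypothetical stricter variant: letters and digits from any script, plus underscore
fun strictUnicodeDisplayName(text: String): String {
    // \p{L} = any Unicode letter, \p{N} = any Unicode number
    val pattern = """display-name=([\p{L}\p{N}_]+)""".toRegex()
    return pattern.find(text)?.groupValues?.get(1) ?: "username"
}

fun main() {
    println(strictUnicodeDisplayName("display-name=不橋小結;emotes=;"))     // 不橋小結
    println(strictUnicodeDisplayName("display-name=CoalTheTroll;emotes=;")) // CoalTheTroll
}
```

The trade-off is that any character outside the allow-list, such as a space, would cut the name short, which is why I went with the simpler [^;]+ version.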
Conclusion
Thank you for taking the time out of your day to read this blog post of mine. If you have any questions or concerns please comment below or reach out to me on Twitter.