I love RegEx, I use it every day and I will show you how to use it to easily get some smaller and larger tasks done.
But...
Don’t use it in production
Ok, first things first: Be very careful using RegEx for anything in production code if you're not absolutely certain it's actually necessary.
This is an example of what could happen. In 95% of cases, simple loops over the data are much safer and easier to understand, using something like String.contains() or String.split(delimiter) to search and break strings up in a simple and readable way.
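To make that advice a bit more concrete, here is a minimal Python sketch that pulls the level and message out of a log line with nothing but a substring test and a split. Python's `in` operator and `str.split` stand in for String.contains() and String.split(delimiter); the log format and the function name `parse_line` are made up for this illustration.

```python
def parse_line(line, delimiter="|"):
    """Split one log line into (level, message) without any regex.

    Returns None when the line does not contain the delimiter.
    """
    if delimiter not in line:  # plain substring test, like String.contains()
        return None
    level, message = line.split(delimiter, 1)  # split on the first delimiter only
    return level.strip(), message.strip()


# Hypothetical sample data, including one line that does not fit the format.
lines = ["ERROR | disk full", "INFO | started", "garbage"]
parsed = [parse_line(line) for line in lines]
errors = [p for p in parsed if p is not None and p[0] == "ERROR"]
```

Every step here can be read top to bottom, which is the main argument for preferring it over a pattern in production code.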
[EDIT] To be very clear: I mean what I said above. Don’t use anything I show you here in production. I personally only use it on log files, test data and manual data creation.
Tools
There is actually no special tool I use. Every more or less sophisticated text editor or IDE supports RegEx in search and replace. I personally do most of the work in Sublime Text, sometimes in IntelliJ.
Useful RegEx
This is how I most often use RegEx in my day-to-day life.
Replace start or end of line
Suppose you have the following text:
Flour
Eggs
Milk
Salt
Maple sirup
And you want to make a bulleted list. You could obviously enter a * in front of every line manually. But, you can use RegEx, of course.
Search | Replace by |
---|---|
^ | * |
This will result in:
* Flour
* Eggs
* Milk
* Salt
* Maple sirup
The ^ is a special character that matches the beginning of a line. Replacing this with one or more characters will prefix each line.
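Outside an editor, the same replacement can be sketched in Python. This is only an illustration, not part of the editor workflow described here: it assumes Python's `re` module, where `^` matches at every line start only when the `re.MULTILINE` flag is set, and it assumes the replacement text is an asterisk followed by a space.

```python
import re

text = "Flour\nEggs\nMilk\nSalt\nMaple sirup"

# re.MULTILINE makes ^ match at the start of every line,
# not only at the start of the whole string.
bulleted = re.sub(r"^", "* ", text, flags=re.MULTILINE)
print(bulleted)
```

Note that the sample text has no trailing newline. With one, Python would also match `^` after that final newline and add a stray bullet at the end.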
The same goes for end of a line. Let's say you need to add a comma at the end of each line.
"Foo"
"Bar"
"Baz"
Search | Replace by |
---|---|
$ | , |
"Foo",
"Bar",
"Baz",
The last comma might be unnecessary and thus must be removed manually. There is a more sophisticated search to fix this but most of the time it's not worth the effort. It's always good to let RegEx do the heavy lifting and fix the resulting 2% manually.
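For comparison, here is a small Python sketch of the end-of-line replacement, assuming the `re` module with `re.MULTILINE`. The second pattern is just one possible stricter search (a lookahead that requires a following newline), not necessarily the one alluded to above.

```python
import re

text = '"Foo"\n"Bar"\n"Baz"'

# With re.MULTILINE, $ matches at the end of every line.
with_commas = re.sub(r"$", ",", text, flags=re.MULTILINE)

# One possible stricter search: only match line ends that are followed
# by a newline, so the last line is left alone.
without_last = re.sub(r"$(?=\n)", ",", text, flags=re.MULTILINE)

print(with_commas)
print(without_last)
```

The lookahead `(?=\n)` checks for the newline without consuming it, so the line breaks themselves stay untouched.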
Swapping Columns
Assume we have the following data:
"foo":8,
"bar":42,
"baz":13,
Search | Replace by |
---|---|
"(\w+)":(\d+), | "$2":"$1", |
"8":"foo",
"42":"bar",
"13":"baz",
What's happening here? We are using groups. A group is delimited by parentheses, so a pattern can look like (group1)(group2)(group3). The cool thing about groups is that you can reference them later in the replacement. In Sublime, $n is used, where n is the group index starting with 1. Notice that we did not include the , and " inside the groups. Inside the groups, I am using \d, which matches a single digit, and \w, which matches a word character like a-z, A-Z, 0-9 and _, but not -, for example. The + matches one or more characters of the preceding kind.
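The column swap can also be tried out in a few lines of Python. This is a sketch under the assumption that Python's `re` module is used, where group references in the replacement are written `\1` and `\2` rather than `$1` and `$2`.

```python
import re

text = '"foo":8,\n"bar":42,\n"baz":13,'

# Group 1 captures the key (\w+), group 2 captures the number (\d+).
# The quotes, colon and comma sit outside the groups, so they are
# matched but not captured.
swapped = re.sub(r'"(\w+)":(\d+),', r'"\2":"\1",', text)
print(swapped)
```

Because the punctuation is written out again in the replacement, the groups can be reordered freely without losing the surrounding quotes and commas.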
Convert CSV to JSON
Let's assume we have the following CSV:
1,35,"Bob"
2,42,"Eric"
3,27,"Jimi"
Search | Replace by |
---|---|
(\d+),(\d+),"(\w+)" | {"id":$1,"age":$2,"name":"$3"}, |
Result:
{"id":1,"age":35,"name":"Bob"},
{"id":2,"age":42,"name":"Eric"},
{"id":3,"age":27,"name":"Jimi"},
Again, we're using groups and digit or word matchers.
The transformed result could easily be turned into valid JSON by adding a wrapper object and arrays as well as removing the last comma. But the heavy lifting is done by RegEx.
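As a rough end-to-end sketch in Python, the same search and replace can be followed by the manual cleanup step. It assumes the `re` and `json` modules, uses `\1` to `\3` in place of `$1` to `$3`, and wraps the rows in a plain array only (a wrapper object could be added the same way). The variable names are made up for this example.

```python
import json
import re

csv_text = '1,35,"Bob"\n2,42,"Eric"\n3,27,"Jimi"'

# Same search and replace as in the table, with \1..\3 instead of $1..$3.
rows = re.sub(
    r'(\d+),(\d+),"(\w+)"',
    r'{"id":\1,"age":\2,"name":"\3"},',
    csv_text,
)

# Drop the trailing comma and wrap the rows in an array to get valid JSON.
document = "[" + rows.rstrip(",") + "]"
people = json.loads(document)  # raises an error if the result is not valid JSON
print(people[0]["name"])
```

Parsing the result with `json.loads` is a cheap way to confirm that the cleanup actually produced valid JSON.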
Create Test Data
Sometimes I need test data, and a lot of it.
What I usually do, is to create a sequence of numbers using...Excel. Yep, Excel. Excel is pretty smart when it comes to sequences. E.g. you can enter something like:
# |
---|
10 |
20 |
Then select both and drag the bottom right corner to fill the cells below. Excel is able to determine that the next number is 30. Based on that, copy the rows into Sublime:
10
20
30
40
Then I apply the same strategy as before:
Search | Replace by |
---|---|
(\d+) | {"id":$1,"username":"user$1"}, |
{"id":10,"username":"user10"},
{"id":20,"username":"user20"},
{"id":30,"username":"user30"},
{"id":40,"username":"user40"},
Learning
RegEx101
There is RegEx101, where you can test whether a RegEx matches. Modern editors like Sublime and IntelliJ will dynamically highlight matches in your current window. However, this page is also great for finding errors and for learning what actually matches and why, using hover and the explanation section.
RegEx Golf
Then, you can use RegEx Golf as a fun way to learn RegEx.
And of course, here on dev.to
A regex cheatsheet for all those regex haters (and lovers) 👀
catherine ・ Jan 10 '19
Summary
As you can see there are plenty of use cases for RegEx to help you with small and larger tasks that would manually take hours, especially with large data sets.
Top comments (18)
Great article, good topic. If you’re an expert, there’s no reason not to add regexes to your bag of tricks. The key is to understand not only what happens logically, but also the runtime consequences. For example, take PCRE2, a ubiquitously available library. In this flavor of extended regex, you can use possessive matching (e.g., \d++). Used right, along with other constructs, you can judiciously avoid backtracking by the regex state machine and make your regexes fast and lean. I would advise not to be afraid of them, but like swords, to respect them and understand how to work with them. So it is often with powerful things. :)
Thanks 🙏
Since I would not consider myself an expert, I would not do it :)
I would still vote against it if there is any more readable alternative. Strive for readability/maintainability and only optimize for speed if it’s necessary.
Of course! Makes sense.
As an alternative to RegEx Golf, I've found Regex Crossword to be pretty fun!
Good!
Thanks, I will have a look!
Looks like you’ve started a markdown link but missed the url... ;)
lol oops: regexcrossword.com/
I really have to learn more regex. I use the online tools to figure what I need, but I really need to learn more on it, so it's more ingrained. Especially on search and replace in editors.
Thanks for the article.
I love regular expressions! I was able to circumvent using two whole different APIs by employing some very clever regex string manipulation in one of my projects. The speed improvement is unparalleled.
It depends on the circumstances and requirements but I’d still reply with:
dev.to/stealthmusic/comment/cnm2
I'd say that regex should be used if they can make a significant difference and you're aware of the scope of the problem being solved by it. That's where Cloudflare went wrong, I'd say. I use it for url formatting so even if it goes wrong, all I get is a 404 hopefully :P
Thanks for your advice. I absolutely share your views, so I have to ask if you actually read the first section about “not to use it in production”? ;)
I even use the results of such an operation only for testing purposes.
BTW, I just added a disclaimer, just in case what I wrote here could be misunderstood. I don’t mean something like “using regex is dangerous but I will show you how to do it right”. That’s absolutely not what I intended.
I read this, and fatefully was given the task of taking two Excel columns of 6,000 zip codes and turning them into arrays. Made incredibly quick work of that, so thanks!
Haha, I’m glad my article could help! 😊
Thanks @stealthmusic . It was very helpful. I tried it, and it is amazing !
Glad it helps. There are certainly more things to explore and learn. :)
Nice article. Another alternative, an online visual regex tester: extendsclass.com/regex-tester.html