Have you ever been in a situation in school or college where your teacher gives you two or three different ways to solve a particular problem, depending on how it is structured? He or she will say something like "if the question has an even number, use solution A; if it has a prime number, use solution C", and so on. Then, after laying out about three different approaches, the teacher drops a fourth one and calls it the almighty formula (no matter the type of number, this solution will solve everything). This happened to me quite a few times back in school, and honestly it was annoying.
Coming back to JavaScript, or programming in general, we have our own almighty formula for strings. It can handle almost everything related to strings, from matching to testing and more. It is called a regular expression, or regex.
So what is a regular expression?
Regular expressions are patterns used to match character combinations in strings. There are two ways to create one:
- Using a regular expression literal, which is the pattern enclosed between two forward slashes, for example `/regex/`.
- Calling the constructor function of the `RegExp` object, for example `new RegExp("abc+d")`.
The literal is best when you already know the pattern you want to match while writing the code. The constructor is the one to reach for when the pattern is dynamic, for example when it is built from a string stored in a variable or taken from user input.
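Here is a minimal sketch of both styles side by side. `userInput` is a hypothetical variable standing in for text typed by a user; `source` is the standard property that holds a regex's pattern text.

```javascript
// 1. Literal: the pattern is fixed when you write the code.
const literal = /abc+d/;

// 2. Constructor: the pattern is built from a string at runtime.
//    `userInput` is hypothetical, e.g. a value read from a search box.
const userInput = "abc+d";
const fromConstructor = new RegExp(userInput);

// Both objects describe the same pattern.
console.log(literal.source);         // abc+d
console.log(fromConstructor.source); // abc+d

// They behave identically (the + means "one or more c", covered later).
console.log(literal.test("abcccd"));         // true
console.log(fromConstructor.test("abcccd")); // true
console.log(literal.test("abd"));            // false: no "c" at all
```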
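Regular expressions come with a few built-in methods for checking strings against a pattern. Strictly speaking, `test` is a method of the regex itself, while `match`, `replace` and `split` are string methods that accept a regex. Here are the ones we will be looking at today:
- Test
- Match
- Replace
- Split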
Don't worry about how the expressions themselves are constructed yet; we will talk about that too.
Test
The `test` method is one of the most common methods you will use. You call it on a regular expression and pass in the text you want to check. It returns true if any part of the text matches the regular expression, and false otherwise.
/abcd/.test("abcd") // returns true.
A literal pattern matches letter for letter unless it contains special characters (which we will get to). So in this case, `/abcd/` matches "abcd" exactly: a for a, b for b, and so on.
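Because `test` returns true when any part of the string matches, the pattern does not have to cover the whole string. A short sketch with made-up inputs:

```javascript
const abcdPattern = /abcd/;

console.log(abcdPattern.test("abcd"));     // true:  exact match
console.log(abcdPattern.test("xxabcdyy")); // true:  "abcd" appears in the middle
console.log(abcdPattern.test("abdc"));     // false: letters out of order
console.log(abcdPattern.test("ABCD"));     // false: matching is case-sensitive by default
```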
Match
The `match` method is called on the string and takes the regex as its argument. Instead of true or false, it returns what it found, as an array. Without the `g` flag (which you will learn about below), the array holds the first match plus some extra information: the `index` where the match was found and the `input` string that was searched. With the `g` flag, it returns an array of all the matched strings and nothing else.
"abcd".match(/abc/) // returns ["abc", index: 0, input: "abcd", groups: undefined]
"abcd".match(/abc/g) // returns ["abc"]
It searches the same way the `test` method does, but instead of returning true or false, it returns an array, or `null` if nothing matches.
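A minimal sketch, with a made-up sample string, of what comes back in each case. It also shows a detail worth knowing in practice: when nothing matches, `match` returns `null`, not an empty array.

```javascript
const matchSample = "abcd abc";

// Without g: the first match, plus index/input properties on the array.
const firstMatch = matchSample.match(/abc/);
console.log(firstMatch[0]);    // "abc"
console.log(firstMatch.index); // 0
console.log(firstMatch.input); // "abcd abc"

// With g: every match, without the extra properties.
const allMatches = matchSample.match(/abc/g);
console.log(allMatches);       // ["abc", "abc"]

// No match at all: null.
const noMatch = matchSample.match(/xyz/);
console.log(noMatch);          // null
```

Because of that `null`, it is safer to guard the result before using it, for example `(matchSample.match(/xyz/g) || []).length`.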
Replace
The `replace` method does exactly what its name says. Like `match` and `test`, it searches the string, but instead of returning true or an array, it replaces the matched text with another string you pass in and returns the new string. Without the `g` flag, only the first match is replaced.
"Java is awesome".replace(/Java/,"JavaScript") // returns "JavaScript is awesome"
Split
If you have worked with JavaScript for a while, you are probably familiar with the `split` method. It takes a string and breaks it into an array based on the value you pass to it. That value is called the separator.
"JavaScript is awesome guys".split(" ") // ["JavaScript","is","awesome","guys"]
The `split` method goes through the string and cuts it at every place it finds the separator (in this case, a space), returning the pieces as an array. It also accepts a regex as the separator, which we will see later.
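As a sketch of a regex separator, here is a comma-separated string with inconsistent spacing, made up for illustration. The `\s*` piece means "zero or more whitespace characters"; both `\s` and `*` are explained further down.

```javascript
const colors = "red, green,blue ,yellow";

// A plain string separator must match exactly, so stray spaces survive.
const byString = colors.split(",");
console.log(byString); // ["red", " green", "blue ", "yellow"]

// A regex separator can absorb optional spaces around each comma.
const byRegex = colors.split(/\s*,\s*/);
console.log(byRegex);  // ["red", "green", "blue", "yellow"]
```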
Flags
Before we move on to constructing regexes, let's take a detour and talk about flags.
Flags are optional when writing a regex, but they help a great deal. We are going to talk about two of the most important ones in JavaScript:
- `i`: The `i` flag makes searches case-insensitive, so there is no difference between `a` and `A`.
- `g`: The `g` (global) flag looks through the whole string and finds every match. Without it, the regex stops at the first match.
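A small sketch of both flags, alone and together; the sample string is made up for illustration. Flags go after the closing slash of a literal, or in the second argument of the `RegExp` constructor.

```javascript
const flagText = "Java, JAVA and java";

// No flags: the first match with exactly the same case.
const caseSensitive = flagText.match(/java/);
console.log(caseSensitive[0], caseSensitive.index); // "java" 15

// i only: case is ignored, but the search still stops at the first match.
const firstAnyCase = flagText.match(/java/i);
console.log(firstAnyCase[0], firstAnyCase.index);   // "Java" 0

// g and i together: every match, in any case.
const allAnyCase = flagText.match(/java/gi);
console.log(allAnyCase); // ["Java", "JAVA", "java"]

// Constructor form: flags are the second argument.
const fromCtor = new RegExp("java", "gi");
console.log(flagText.match(fromCtor)); // ["Java", "JAVA", "java"]

// g also affects replace(): every occurrence is replaced, not just the first.
console.log(flagText.replace(/java/gi, "JS")); // "JS, JS and JS"
```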
Now that we have covered some methods and flags, let's write different kinds of regex. Each example uses whichever method or flag fits; the choice is arbitrary, and in practice you pick whatever suits what you want to do.
- As I said earlier, we can test for an exact string.
let regex=/Quick/
let string1 ="Quick"
let string2="quick"
regex.test(string1) // returns true
regex.test(string2) // returns false
As you can see, the regex matches the exact word with the exact case (uppercase for uppercase), so "quick" does not match `/Quick/`.
- You can search for one of several strings using the alternation (or) operator `|`.
let regex =/quick|brown|lean/
console.log(regex.test("the quick fox")) // returns true
let string ="the quick brown fox"
console.log(string.match(regex)) // returns ["quick", index: 4, input: "the quick brown fox", groups: undefined]
This returns the first match found. With the global flag, it returns all the matches:
console.log(string.match(/quick|brown/g)) // returns ["quick", "brown"]
- The dot (period) `.` is called a wildcard. It matches any single character (letter, digit, symbol, space and so on) except line breaks, but only one character.
let regex =/hu./g
let string = "This are words with hu, hug, hum, hub and huh"
console.log(string.match(regex)) // returns ["hu,","hug","hum","hub","huh"]
Remember the flags? Normally the regex would find the first match and stop, but because of the global flag it goes through the whole string.
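Two details about the wildcard are easier to see in code: it needs exactly one character to be present, and to match a literal period you escape it as `\.`. The inputs are made up for illustration:

```javascript
// The dot needs exactly one character after "hu".
const nothingAfter = "hu".match(/hu./g);
console.log(nothingAfter); // null: nothing follows "hu"

// Unescaped, . matches any character; escaped, \. only matches a period.
console.log(/3.14/.test("3x14"));  // true:  . matched the "x"
console.log(/3\.14/.test("3x14")); // false: \. requires a real "."
console.log(/3\.14/.test("3.14")); // true
```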
- A character class `[]` lets you define a set of characters you want to match. It matches exactly one character, which can be any of the characters inside the brackets.
let string="this are b_g strings e.g bOg big bAg bug"
console.log(string.match(/b[oui_]g/gi)) // returns ["b_g", "bOg", "big", "bug"]
Without the `i` flag, bOg would not be matched, because in regex `O` and `o` are different characters; the `i` flag makes the regex ignore case. bAg is not matched either way, because `a` is not in the class `[oui_]`.
- The hyphen `-`, when used inside a character class, lets you define a range of letters or digits instead of listing them all out.
console.log("bay bby bcy bdy".match(/b[a-c]y/g)) // returns ["bay", "bby", "bcy"]
console.log("123456789".match(/[5-8]/g)) // returns ["5", "6", "7", "8"]
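Several ranges, and single characters, can share one character class. A short sketch with made-up strings:

```javascript
// Uppercase letters and digits in one class.
const agentChars = "Agent 007 and Q".match(/[A-Z0-9]/g);
console.log(agentChars); // ["A", "0", "0", "7", "Q"]

// Letters of either case, written without relying on the i flag.
const letters = "R2-D2!".match(/[a-zA-Z]/g);
console.log(letters); // ["R", "D"]
```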
- The caret `^`, when used as the first character inside a character class, negates the class: it matches any single character that is not in the set. It works with both listed characters and ranges.
console.log("bay bby bcy bdy".match(/b[^a-c]y/g)) // returns ["bdy"]
- The caret `^`, when used at the beginning of a regular expression outside a character class, means that the string must start with the pattern that follows it.
console.log("123456789".match(/^[5-8]/g))// returns null
Here we are saying the string must start with 5, 6, 7 or 8. It starts with 1, so `match` returns `null`.
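Since `^` means different things inside and outside a class, here is a small sketch contrasting the two (sample strings are illustrative):

```javascript
const digitString = "123456789";

// ^ outside a class anchors the match to the start of the string.
console.log(/^[5-8]/.test(digitString)); // false: the string starts with 1
console.log(/^[1-4]/.test(digitString)); // true

// ^ inside the class negates it. Combined with the anchor, this asks:
// "does the string start with something that is NOT a digit?"
console.log(/^[^0-9]/.test("abc1")); // true
console.log(/^[^0-9]/.test("1abc")); // false
```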
- The dollar sign `$` makes sure that the string ends with a particular character or characters. It works like the caret, just at the other end.
console.log(/JavaScript$/i.test("I love javascript")) // returns true
console.log(/JavaScript$/i.test("javascript is what I love")) // returns false
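Putting `^` and `$` together forces the pattern to cover the whole string, which is the usual way to validate an input rather than search inside it. A minimal sketch with made-up inputs:

```javascript
// No anchors: "hello" may appear anywhere.
const anywhere = /hello/.test("say hello world");
// Only $: the string must end with "hello".
const atEnd = /hello$/.test("say hello");
// Both anchors: the whole string must be exactly "hello".
const exact = /^hello$/.test("hello");
const notExact = /^hello$/.test("say hello");

console.log(anywhere, atEnd, exact, notExact); // true true true false
```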
- The `+` symbol matches the preceding character one or more times.
console.log(/huh+/.test("huhhhhhhhhhhh")) // returns true
- The `*` symbol also lets you match the preceding character multiple times, but while `+` means one or more times, `*` means zero or more times. In other words, with `+` the character must appear at least once, but with `*` it may or may not appear.
console.log(/huh*/.test("hu")) // returns true
- `?` makes the preceding character optional: it may appear once or not at all.
console.log(/colou?r/.test("color")) // returns true
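To compare `+`, `*` and `?` directly, here is a sketch that runs the three patterns over the same made-up samples. The `^` and `$` anchors from above make each pattern describe the whole string:

```javascript
const huSamples = ["hu", "huh", "huhhh"];

const withPlus = huSamples.map((s) => /^huh+$/.test(s));     // final h one or more times
const withStar = huSamples.map((s) => /^huh*$/.test(s));     // final h zero or more times
const withOptional = huSamples.map((s) => /^huh?$/.test(s)); // final h zero or one time

console.log(withPlus);     // [false, true, true]
console.log(withStar);     // [true, true, true]
console.log(withOptional); // [true, true, false]
```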
- A lookahead checks what comes after the current position without including it in the match. A positive lookahead `(?=...)`, for example `(?=u)`, makes sure the specified text follows, and a negative lookahead `(?!...)` makes sure it does not.
console.log("yes!".match(/yes(?=!)/g)) // returns ["yes"]
In the example above we only want to match yes if it is followed by an exclamation mark.
console.log("yes?".match(/yes(?=\?)/g)) // returns ["yes"]
`?` is a special character in regex, as we have seen above, so to match a literal question mark you need to escape it with a backslash, the same way you escape a quotation mark inside a string.
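The negative lookahead `(?!...)` is described above but not shown, so here is a small sketch of it on a made-up string, plus a check that a lookahead is not part of the match:

```javascript
// Negative lookahead: replace "yes" only when it is NOT followed by "!".
const replyText = "yes! yes?";
const swapped = replyText.replace(/yes(?!!)/g, "no");
console.log(swapped); // "yes! no?"

// Positive lookahead: the "!" is checked but not included in the match.
const matchedText = "yes!".match(/yes(?=!)/)[0];
console.log(matchedText); // "yes"
```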
- We can also group characters using parentheses `()`. Here the group limits the alternation to the ending, so the pattern matches either "expect" or "except".
console.log(/ex(pect|cept)/.test("expect")) // returns true
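Parentheses do more than group: the text each group matched is also returned by `match`, and a quantifier placed after a group repeats the whole group. A minimal sketch with made-up inputs:

```javascript
// The group's match is available at index 1 of the result.
const groupResult = "except".match(/ex(pect|cept)/);
console.log(groupResult[0]); // "except": the full match
console.log(groupResult[1]); // "cept":   what the group matched

// + after a group repeats the whole "ha", not just the "a".
console.log(/^(ha)+$/.test("hahaha")); // true
console.log(/^(ha)+$/.test("hahah"));  // false
```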
The next set of characters are ordinary letters that take on a special meaning when you put a backslash `\` in front of them. It is the same backslash we used to escape `?` above, but here it works the other way round: instead of removing a special meaning, it adds one.
- `\w` matches any uppercase letter, lowercase letter, digit or underscore. It is the same as `[a-zA-Z0-9_]`, just much shorter.
console.log("JAVASCRIPT _ react = 5 and 6 :)".match(/\w/g)) // ["J", "A", "V", "A", "S", "C", "R", "I", "P", "T", "_", "r", "e", "a", "c", "t", "5", "a", "n", "d", "6"]
- `\W` matches anything that is not a letter, digit or underscore, the same as `[^a-zA-Z0-9_]`.
console.log("JAVASCRIPT _ react = 5 and 6 :)".match(/\W/g)) // returns [" ", " ", " ", "=", " ", " ", " ", " ", ":", ")"]
It basically matched the spaces, =, : and )
- `\d` matches any digit, the same as `[0-9]`.
console.log("JAVASCRIPT _ react = 5 and 6 :)".match(/\d/g)) // returns ["5","6"]
- `\D` matches anything that is not a digit, the same as `[^0-9]`.
console.log("JAVASCRIPT _ react = 5 and 6 :)".match(/\D/g)) // returns ["J", "A", "V", "A", "S", "C", "R", "I", "P", "T", " ", "_", " ",...]
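In practice `\d` and `\D` are often combined with `+` or the `g` flag. A sketch with made-up inputs:

```javascript
// \d+ grabs whole numbers instead of single digits.
const numbers = "Order 66 shipped 3 items".match(/\d+/g);
console.log(numbers); // ["66", "3"]

// Replacing every non-digit (\D) with "" keeps only the digits.
const phoneDigits = "(555) 123-4567".replace(/\D/g, "");
console.log(phoneDigits); // "5551234567"
```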
- `\s` matches a single whitespace character: a space, tab, new line (line feed), carriage return, form feed or vertical tab.
console.log("JAVASCRIPT _ react = 5 and 6 :)".split(/\s/g)) // returns ["JAVASCRIPT", "_", "react", "=", "5", "and", "6", ":)"]
- `\S` matches any single character that is not whitespace.
console.log("JAVASCRIPT _ react = 5 and 6 :)".match(/\S/g)) // returns ["J", "A", "V", "A", "S", "C", "R"...]
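A common use of `\s` is splitting text on whitespace. Because `\s` matches a single character, a sketch comparing it with `\s+` (one or more whitespace characters) on a made-up string shows why the `+` usually helps:

```javascript
const spaced = "a  b\tc\nd"; // two spaces, a tab and a newline

// \s: each whitespace character is its own separator, so the double space
// leaves an empty string between "a" and "b".
const singleSplit = spaced.split(/\s/);
console.log(singleSplit); // ["a", "", "b", "c", "d"]

// \s+: any run of whitespace counts as one separator.
const runSplit = spaced.split(/\s+/);
console.log(runSplit); // ["a", "b", "c", "d"]
```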
Quantifiers
Curly-brace quantifiers `{}` match the preceding character a specific number of times. You write them as `{x,y}`, where x is the lower bound and y is the upper bound, so the character must appear at least x and at most y times.
console.log(/huh{2,5}/.test("huhhh")) //returns true
console.log(/huh{2,5}/.test("huh")) //returns false
You can also leave out the upper bound: `{x,}` (note the comma) means at least x times, with no upper limit.
console.log("huhhhhhhhhhhhhhhh".replace(/h{2,}/,"g")) // returns "hug"
You can also put a single number in the braces, `{x}`, to match the character exactly x times:
console.log("huhhhh".replace(/h{4}/,"g")) // returns "hug"
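One subtlety with `{x,y}`: `test` is satisfied by any matching part of the string, so the upper bound only holds for the whole string when the pattern is anchored with `^` and `$`. A small sketch with made-up inputs:

```javascript
const longHuh = "huhhhhhhh"; // "hu" followed by 7 h's

// Unanchored: "hu" plus the first 5 h's is a valid match, so this passes.
const unanchored = /huh{2,5}/.test(longHuh);
// Anchored: all 7 h's must fit in {2,5}, so this fails.
const anchored = /^huh{2,5}$/.test(longHuh);
// Anchored, with 4 h's: within the bounds.
const withinBounds = /^huh{2,5}$/.test("huhhhh");

console.log(unanchored, anchored, withinBounds); // true false true
```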
If you have read this far, congratulations. I know this is a long and exhausting article, but that is regex for you. I hope you have learnt a lot from it.
There are a few characters and combinations I left out; I felt these are the ones you will use most.
It is OK if you feel overwhelmed. When I first learnt regex, I was confused, mostly because I did not know when to use it and because it looked really difficult to construct.
So in my next article, we are going to go through a couple of coding problems that should help you get comfortable with regex. I am also going to mix in a few non-regex questions so that you can see when you can or cannot use it (I might be wrong about this). See you next time, bye for now.
Thank you.