Glob Patterns: A Hilarious Journey through Slashes and Asterisks!
(Cover image from pexels.com by cottonbro studio)
ACT 1: EXPOSITION
I don't know about you, but I almost never manage to nail down a Glob Pattern URL on the first try when using it in cy.intercept().
Sure, it seems easy enough at first glance! But let's be honest—most of the time, you don't get it right on the first attempt, either!
And that's Okay. I promise I won't judge you if you don't judge me. 😉
So, I decided to unravel this enigma once and for all, crafting the definitive solution with strategies and even a cheat sheet that will help us nail down these URL glob patterns in our Cypress interceptions—making this article the ultimate Baba Yaga of glob patterns, striking fear into the heart of any misplaced "path segment"!
ACT 2: CONFRONTATION
THE THEORY
Alright, in Cypress, URLs used in an intercept can be:
- Full URL (e.g. https://example.com/users/admin)
- Relative URL (e.g. /users/admin or users/admin, which both represent the same)
Also, a URL can be specified as a String, a Glob, or a RegExp. In this article, we will focus on Glob patterns.
⚠️ IMPORTANT: Relative URLs in an intercept are resolved against the baseUrl Cypress configuration parameter, unless you specify the url and the API hostname in a RouteMatcher object.
According to the Cypress documentation for URL glob patterns: "Under the hood, cy.intercept uses the minimatch library with the { matchBase: true } option".
If you're like me, you might be tempted to immediately explore the minimatch library. However, let's set this library aside, at least for now, and focus on the basics first. That's exactly why I haven't included a hyperlink to the library. 🤗
Before we continue, it's important to define and understand a key concept: path segment.
A path segment in a URL is the portion of the URL path located between slashes (/). In a relative URL like /users/admin, users is the first path segment, and admin is the second path segment.
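If you want to see path segments with your own eyes, here is a minimal sketch in plain JavaScript. The pathSegments helper is hypothetical (it is not part of Cypress), and it assumes https://example.com as the baseUrl; it simply leans on the built-in URL class to resolve relative URLs.

```javascript
// Hypothetical helper (not part of Cypress): list the path segments of a URL.
// The default baseUrl is an assumption made for this example.
function pathSegments(url, baseUrl = 'https://example.com') {
  // The built-in URL class resolves relative URLs against the given base.
  const { pathname } = new URL(url, baseUrl);
  // Drop the empty strings produced by a leading or trailing slash.
  return pathname.split('/').filter((segment) => segment !== '');
}

console.log(pathSegments('/users/admin')); // [ 'users', 'admin' ]
console.log(pathSegments('users/admin')); // [ 'users', 'admin' ]
console.log(pathSegments('https://example.com/users/admin')); // [ 'users', 'admin' ]
```

All three spellings of the URL end up with the same two segments, which is why the relative forms with and without the leading slash can be treated as equivalent.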
And now, let's confront the two most notorious offenders in the shadowy realm of glob patterns: the elusive single *, and the daunting double **. 👹
In URL glob patterns:
- *: Matches any characters except / within a single path segment.
  ⚠️ IMPORTANT:
  - /*/ signifies exactly one path segment.
  - /*/*/ indicates precisely two path segments, and so forth.
  - You can use * multiple times within a single path segment, such as in /*-us*/.
  Examples:
  - /images/*.jpg will match /images/photo.jpg, but not /images/nature/photo.jpg
  - /images/*/photo.jpg will match /images/nature/photo.jpg, but not /images/photo.jpg
  - /*ag*/photo.jpg will match /images/photo.jpg
- **: Matches any characters including / across multiple path segments, allowing for a broader match.
  ⚠️ IMPORTANT:
  - ** has special significance only when it is the sole content of a path part (as in /**/); otherwise, it behaves exactly like *.
  - /**/ indicates zero or any number of path segments.
  - /**/**/ is treated as /**/.
  - Note that if you specify /** at the end of a pattern, there may be zero path segments after it, but in that case the URL must still end with a / (e.g., api/users/** will match api/users/admin and api/users/, but will not match api/users).
  Examples:
  - ab/**/cd will match ab/wx/yz/cd, but ab/**cd will not
  - ab/w**/**z/cd will match ab/wx/yz/cd, since ** behaves like * in this case, as it is not the only thing between two /
  - /images/**/photo.jpg will match both /images/photo.jpg and /images/nature/gallery/photo.jpg
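To make the * and ** rules tangible, here is a rough sketch of a tiny matcher in plain JavaScript. Please read it as a simplified stand-in written for this article, not as minimatch and not as what Cypress runs: the names partToRegExp, matchParts and globMatches are hypothetical, and braces, extglobs, dotfile rules and other edge cases are deliberately ignored.

```javascript
// Simplified stand-in for URL glob matching (NOT minimatch itself).
// It only understands * and **.

// Turn one pattern part into a RegExp. Inside a part, every run of
// asterisks means "any characters except /".
function partToRegExp(part) {
  const escaped = part.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*+/g, '[^/]*') + '$');
}

// Match pattern parts against path segments, left to right.
function matchParts(parts, segments) {
  if (parts.length === 0) return segments.length === 0;
  const [head, ...rest] = parts;
  if (head === '**') {
    // A trailing ** swallows whatever is left, but the slash before it
    // must be present, so at least one (possibly empty) segment remains.
    if (rest.length === 0) return segments.length > 0;
    // Otherwise let ** absorb zero, one, two, ... whole segments.
    for (let skip = 0; skip <= segments.length; skip++) {
      if (matchParts(rest, segments.slice(skip))) return true;
    }
    return false;
  }
  if (segments.length === 0) return false;
  return partToRegExp(head).test(segments[0]) && matchParts(rest, segments.slice(1));
}

function globMatches(pattern, path) {
  return matchParts(pattern.split('/'), path.split('/'));
}

console.log(globMatches('/images/*.jpg', '/images/photo.jpg')); // true
console.log(globMatches('/images/*.jpg', '/images/nature/photo.jpg')); // false
console.log(globMatches('ab/**/cd', 'ab/wx/yz/cd')); // true
console.log(globMatches('ab/**cd', 'ab/wx/yz/cd')); // false
console.log(globMatches('api/users/**', 'api/users/')); // true
console.log(globMatches('api/users/**', 'api/users')); // false
```

Splitting on / first is a design choice that keeps the "sole content of a path part" rule easy to see: ** only gets its superpowers when it is an entire part, and everywhere else it is handled exactly like *.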
Easy peasy, right? Yet, like trying to track John Wick's next move, pinpointing those elusive URL paths remains a challenge when we try to intercept a request in our tests.
TEST YOUR GLOB ABILITIES: AN EXAMPLE
Suppose that in our cypress.config.js we have defined baseUrl: 'https://example.com', and our glob pattern is */v2/**/images/*/*umb*. Which of the following URL requests would be intercepted successfully?
1. https://example.com/api/files/v2/project/2022/gallery/images/small/thumb.png
2. https://example.com/v2/project/2022/gallery/images/large/thumbs
3. https://example.com/files/v2/project/2021/gallery/images/snapshot/small/thumb.png
4. https://example.com/api/v2/project/2023/gallery/images/long_thumbnail
5. https://example.com/files/v2/project/2022/images/small/my_thumb/pics
6. https://example.com/files/v2/project/2022/gallery/images/04/umb
7. https://example.com/api/v2/project/images/large/thumbnail
🤔...
🤔...
🤔...
Let's break down what the URL glob pattern */v2/**/images/*/*umb* signifies:
- It's a relative URL, with the root path being baseUrl: 'https://example.com'.
- Between the root path and the v2 path segment, there must be exactly one path segment.
- Between the v2 path segment and the images path segment, there can be any number of path segments (including none).
- Following the images path segment, there must be exactly one path segment, followed by a final path segment that contains the string umb.
So the answer is: Requests from 1 to 5 will not be intercepted; however, requests 6 and 7 will. Did you get them right on your first try?
We will analyze each of these URLs one by one to determine why they do or do not match the glob pattern */v2/**/images/*/*umb*, based on our understanding of * and **. Remember we defined baseUrl as 'https://example.com'. Additionally, for clarity, we will stack the URL and the glob pattern vertically to unravel the mystery.
CASE 1
The * before v2 in the glob indicates that exactly one path segment is expected between baseUrl and v2. However, the URL contains two path segments: api and files.
So, interception failed! ❌
CASE 2
The * before v2 in the glob indicates that exactly one path segment is expected between baseUrl and v2. However, the URL does not contain any path segments in that position.
As a result, interception failed! ❌
CASE 3
The * between images and *umb* in the glob indicates that exactly one path segment is expected between them. However, the URL contains two path segments: snapshot and small.
Hence, interception failed! ❌
CASE 4
The * between images and *umb* in the glob indicates that exactly one path segment is expected between them. However, the URL does not contain any path segments in that position.
Meaning, interception failed! ❌
CASE 5
The *umb* at the end of the glob pattern indicates that it must be the last path segment and must contain the string umb. However, in the URL, the path segment my_thumb is not the final path segment.
As a consequence, interception failed! ❌
CASE 6
Bingo, interception successful! ✔️
CASE 7
We're on a roll, interception successful! ✔️
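If you would rather let the machine double-check the seven verdicts, here is a small sketch that runs the quiz URLs against a hand-written RegExp. The RegExp is my own rough translation of */v2/**/images/*/*umb* with baseUrl 'https://example.com' baked in; it is an approximation for illustration, not the expression that Cypress or minimatch builds internally, and the names quizPattern, requests and verdicts are made up for this example.

```javascript
// Hand-written RegExp counterpart of */v2/**/images/*/*umb* (an approximation).
const quizPattern = new RegExp(
  '^https://example\\.com/' + // baseUrl assumed in the quiz
  '[^/]+/' +                  // *     exactly one segment before v2
  'v2/' +
  '(?:[^/]+/)*' +             // **    zero or more whole segments
  'images/' +
  '[^/]+/' +                  // *     exactly one segment after images
  '[^/]*umb[^/]*$'            // *umb* last segment containing "umb"
);

const requests = [
  'https://example.com/api/files/v2/project/2022/gallery/images/small/thumb.png',
  'https://example.com/v2/project/2022/gallery/images/large/thumbs',
  'https://example.com/files/v2/project/2021/gallery/images/snapshot/small/thumb.png',
  'https://example.com/api/v2/project/2023/gallery/images/long_thumbnail',
  'https://example.com/files/v2/project/2022/images/small/my_thumb/pics',
  'https://example.com/files/v2/project/2022/gallery/images/04/umb',
  'https://example.com/api/v2/project/images/large/thumbnail',
];

const verdicts = requests.map((url) => quizPattern.test(url));

verdicts.forEach((intercepted, index) => {
  console.log(`Request ${index + 1}: ${intercepted ? 'intercepted' : 'not intercepted'}`);
});
// Only requests 6 and 7 are reported as intercepted.
```

Since cy.intercept() also accepts a RegExp, spelling a glob out like this can be a handy way to reason about it, as long as you remember that the translation above is a simplification.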
Okay, maybe things are a bit clearer now. But how about we create a cheat sheet so we can get this stuff spot on?! 🎯
THE CHEAT SHEET
I have created a table where the first column lists the main use cases for URL Glob Patterns. The second column contains examples of URLs that match each glob pattern, with the matching portions highlighted in green. The third column provides examples of URLs that do not match, with the cause of the mismatch highlighted in red.
From these primary glob patterns, you can construct the rest and confidently apply what you've learned so far, along with the cheat sheet!
BONUS CONTENT: EXTGLOB PATTERNS
Remember when I mentioned at the beginning of the article to "set aside" the minimatch library for now?
Well, if you're feeling adventurous, we can now explore some concepts from the minimatch library. Specifically, we will focus on extglob patterns, adding yet another enlightening twist to our URL glob knowledge. 🦉
An extglob pattern is an extension of the standard glob patterns that provides additional pattern matching capabilities in Unix-like environments such as Bash. Extglobs allow for more complex matching conditions by using specific syntax to define patterns for inclusion and exclusion. These patterns enhance the flexibility of path matching, making them more powerful than the simpler glob patterns.
Some of the most useful extglob patterns are:
- !(pattern): Matches anything except the specified pattern.
- @(pattern): Matches exactly one of the specified patterns.
- +(pattern): Matches one or more occurrences of the specified pattern.
Let's look at a few examples of extglob patterns. Suppose we want to intercept requests to the URL https://owasp.org/www--site-theme/assets/sitedata/menus.json.
Will the glob pattern www--site-theme/**/menus.!(jpg) intercept that request?
The answer is yes, because it will intercept files named menus whose extension is NOT jpg. ✔️
How about the glob pattern www--site-theme/**/menus.+(json|png)?
In this case, the answer is also yes, because it will intercept files with the name menus and an extension made of one or more occurrences of json or png. ✔️
You get the idea! 👍👍👍
ACT 3: RESOLUTION
Ah, we've journeyed together through the labyrinth of slashes and asterisks, becoming glob masters along the way! With our newfound knowledge and cheat sheet in hand, we can confidently commandeer those elusive URLs in cy.intercept(), leaving no stray path segment unturned.
Remember, with each glob pattern conquered, you are an inch closer to becoming the Baba Yaga of your test suites!
Revel in your mastery and share your triumph — follow, react, or comment if these insights have sparked joy or illuminated your path! ❤️ 🦄 🤯 🙌 🔥