Ever wondered how to validate URLs in JavaScript? The process is surprisingly simple. On the internet, web pages, images, and videos are identified by their URLs (Uniform Resource Locators). URLs are also used to send emails, transfer files, and perform a variety of other tasks. However, untrusted URLs can be risky since they can direct users away from safe websites and toward hazardous ones. They can also open up attack paths such as server-side request forgery (SSRF) and malicious script injection. This article takes an in-depth look at how to validate URLs in JavaScript using two methods: the URL constructor and regex.
What is a validated URL?
When you create an HTML input with its type attribute set to url, the browser automatically validates that the entered text is in the proper form to be a legitimate URL. This reduces the likelihood of users mistyping their website address or providing an invalid one.
A URL needs to be well-formed, which means it must follow URL syntax rules: a scheme such as http or https, followed by the parts that scheme requires. Ideally it also points to a real resource, although a syntax check alone cannot confirm that. Most web browsers place an address bar above the page showing a website's URL. An example of a standard URL is http://www.example.com/index.html, which denotes a protocol (http), a hostname (www.example.com), and a file name (index.html).
An application may impose requirements beyond a URL being "valid", for instance accepting only http: or https: and rejecting file: and other schemes. Whether that is appropriate depends on the application, and deciding it takes some deliberate design work. It also helps to know why a URL failed validation so that you can resolve user difficulties and suggest corrections.
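To make the "why did validation fail" idea concrete, here is a minimal sketch that returns a reason alongside the result. The helper name checkUrl, its allowedProtocols parameter, and the wording of the reasons are assumptions made for illustration; it relies on the URL constructor covered in the next section.

```javascript
// Hypothetical helper: reports whether a URL is acceptable and, if not, why.
function checkUrl(input, allowedProtocols = ['http:', 'https:']) {
  let url;
  try {
    // Throws a TypeError when the string cannot be parsed as a URL.
    url = new URL(input);
  } catch (err) {
    return { ok: false, reason: 'not a parseable URL' };
  }
  // Application-specific rule: only accept the listed protocols.
  if (!allowedProtocols.includes(url.protocol)) {
    return { ok: false, reason: `protocol ${url.protocol} is not allowed` };
  }
  return { ok: true, reason: null };
}

console.log(checkUrl('https://www.example.com/')); // ok: true
console.log(checkUrl('file:///etc/passwd')); // ok: false, protocol not allowed
console.log(checkUrl('example')); // ok: false, not parseable
```

Returning an object instead of a boolean lets a form show a specific message, such as asking the user to add https:// to the address.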
Validating URLs with URL constructor
Browsers provide a URL constructor that parses a string and creates a URL object with structured access to its parts. We can use the constructor to validate URLs in JavaScript because it throws an error if the URL is invalid. With the structured access, we can then accept or reject a URL based on application-specific components.
A string is passed to the URL constructor with the new keyword. If the string is a valid URL, the constructor returns a new URL object containing properties such as host, hostname, href, origin, and protocol. If not, it throws a TypeError:
const isUrlCorrect = new URL("https://www.example.com/");
console.log(isUrlCorrect)
When we log isUrlCorrect to the console, we get the following response.
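The logged object exposes the parsed pieces of the URL as properties. As a small illustration (the variable name parsed is used only for this sketch), reading a few of them individually looks like this:

```javascript
const parsed = new URL('https://www.example.com/');

console.log(parsed.href); // "https://www.example.com/"
console.log(parsed.protocol); // "https:" (note the trailing colon)
console.log(parsed.host); // "www.example.com"
console.log(parsed.hostname); // "www.example.com"
console.log(parsed.origin); // "https://www.example.com"
console.log(parsed.pathname); // "/"
```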
Let's see the response we get when we pass in an invalid URL string.
const isUrlCorrect = new URL("example");
console.log(isUrlCorrect)
We get a TypeError because "example" is not a valid URL.
Creating a URL validator function with the URL constructor
We can build a URL validator by wrapping the URL constructor in a try...catch statement inside a custom isUrlValid function.
function isUrlValid(string) {
  try {
    new URL(string);
    return true;
  } catch (err) {
    return false;
  }
}
When the string passed as an argument to the isUrlValid function is a valid URL, it returns true. If not, it returns false.
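A few sample calls show what this check does and does not catch. The inputs below are illustrative choices, not from the original article. The sketch also uses URL.canParse, a static method available in recent browsers and Node.js releases, behind a feature check in case the runtime lacks it; canParseUrl is a hypothetical wrapper name.

```javascript
function isUrlValid(string) {
  try {
    new URL(string);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isUrlValid('https://www.example.com/')); // true
console.log(isUrlValid('www.example.com')); // false: no scheme, so parsing fails
console.log(isUrlValid('mailto:mail@example.com')); // true: a valid URL, just not HTTP
console.log(isUrlValid('')); // false

// Hypothetical wrapper: prefer the built-in URL.canParse when the runtime has it.
function canParseUrl(string) {
  return typeof URL.canParse === 'function'
    ? URL.canParse(string)
    : isUrlValid(string);
}

console.log(canParseUrl('https://www.example.com/')); // true
```

Note that a bare hostname such as www.example.com is rejected, so a form that wants to accept it would need to prepend a scheme before validating.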
Validating only HTTP URLs with the URL constructor
Apart from checking whether a URL is valid, we might want to determine whether the string is specifically an HTTP URL and reject other legitimate URLs such as "mailto:mail@example.com".
In the URL object we logged earlier, the value of the protocol property is "https:". We can check the protocol property of the URL object to determine whether a string is a legitimate HTTP URL.
So, using the URL constructor and a try...catch statement as before, we can write a custom isHttpValid function.
function isHttpValid(str) {
  try {
    const newUrl = new URL(str);
    return newUrl.protocol === 'http:' || newUrl.protocol === 'https:';
  } catch (err) {
    return false;
  }
}
console.log(isHttpValid('https://www.example.com/')); // true
console.log(isHttpValid('mailto://example.com')); // false
What did we do here? We checked whether the value of the protocol property equals "http:" or "https:", returning true if it does and false if it does not.
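The URL parser is deliberately lenient about hostnames, so inputs such as http://.com can slip through a protocol check. As a possible tightening, sketched here under assumptions that are not part of the original article, we can also inspect the hostname labels. The function name isPlausibleHttpUrl and the label rule (letters, digits, and inner hyphens, with at least two labels) are illustrative choices; this version intentionally rejects single-label hosts such as localhost, IPv6 literals, and hostnames with a trailing dot.

```javascript
// Hypothetical stricter check: HTTP(S) protocol plus a conventional hostname.
function isPlausibleHttpUrl(str) {
  let url;
  try {
    url = new URL(str);
  } catch (err) {
    return false;
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return false;
  }
  // Assumption: require at least two dot-separated labels (e.g. example.com).
  const labels = url.hostname.split('.');
  if (labels.length < 2) {
    return false;
  }
  // Each label: 1 to 63 characters, alphanumeric at both ends, hyphens inside.
  const labelPattern = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;
  return labels.every((label) => labelPattern.test(label));
}

console.log(isPlausibleHttpUrl('https://www.example.com/')); // true
console.log(isPlausibleHttpUrl('http://.com')); // false: empty label
console.log(isPlausibleHttpUrl('http://-apple-.com')); // false: label starts with a hyphen
console.log(isPlausibleHttpUrl('www.aa')); // false: no scheme
```

Whether rules like these are right depends on the application; an internal tool, for example, may need to accept localhost.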
Validating URLs using regex
Regular expressions, or regex, offer another way to validate a URL in JavaScript. Typical web URLs follow a recognizable pattern: protocol, domain name, and path. Sometimes a query string or fragment locator follows the path. If you know the pattern that makes up the URLs you expect, you can use a regex to test a string against it. The string passes the regex test if it matches the pattern. If not, it fails.
The pattern below makes the protocol optional and, when one is present, accepts only http:// or https://:
function isUrlValid(str) {
  const pattern = new RegExp(
    '^(https?:\\/\\/)?' + // protocol
    '((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.)+[a-z]{2,}|' + // domain name
    '((\\d{1,3}\\.){3}\\d{1,3}))' + // OR IP (v4) address
    '(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*' + // port and path
    '(\\?[;&a-z\\d%_.~+=-]*)?' + // query string
    '(\\#[-a-z\\d_]*)?$', // fragment locator
    'i'
  );
  return pattern.test(str);
}
The regular expression in the isUrlValid function above determines whether a string is a valid URL.
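To see how the pattern behaves, the sketch below repeats it under the hypothetical name isUrlValidRegex (renamed only to keep it apart from the constructor-based function) and runs it against a few illustrative inputs that are not from the original article.

```javascript
// Same pattern as above, renamed for this sketch.
function isUrlValidRegex(str) {
  const pattern = new RegExp(
    '^(https?:\\/\\/)?' + // protocol
    '((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.)+[a-z]{2,}|' + // domain name
    '((\\d{1,3}\\.){3}\\d{1,3}))' + // OR IP (v4) address
    '(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*' + // port and path
    '(\\?[;&a-z\\d%_.~+=-]*)?' + // query string
    '(\\#[-a-z\\d_]*)?$', // fragment locator
    'i'
  );
  return pattern.test(str);
}

console.log(isUrlValidRegex('https://www.example.com/')); // true
console.log(isUrlValidRegex('www.example.com')); // true: the protocol is optional
console.log(isUrlValidRegex('example')); // false: no dot-separated domain
console.log(isUrlValidRegex('ftp://example.com')); // false: only http and https match
console.log(isUrlValidRegex('https://www.example.com/page#section=1')); // false: "=" is not allowed in the fragment
```

The last case is a legitimate URL that the pattern rejects, which hints at how easily a hand-written regex can drift from real-world URL syntax.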
A regular expression allows you to detect invalid URLs in input strings. However, a regex like this is hard to read, extend, and debug, and it can easily reject legitimate URLs; the pattern above, for example, does not allow = in the fragment. For real-world use, libraries are often a better option.
Conclusion
In this article, we discussed what a validated URL is and what it looks like. We also discussed the risks that untrusted URLs pose to a web application. We began by using the URL constructor for simple URL validation and showed how to restrict the check to HTTP and HTTPS URLs. Then, we demonstrated how to use a regular expression to check a URL for its basic parts, such as a protocol, domain name, and path. Finally, we looked at the complexity and difficulty of using regular expressions to validate URLs.
Please leave a comment below to ask me anything! I'm always happy to talk and help.
Kindly connect with me on Twitter and on LinkedIn.
Thanks for reading!
Originally published on Turing.
Top comments (2)
Your first function also says https://.. is a valid URL. Regex one says that a google image result (such as https://www.google.com/search?sca_esv=569111345&rlz=1C5GCEM_en&q=google+image+result&tbm=isch&source=lnms&sa=X&ved=2ahUKEwi8stHkic2BAxXEXvEDHbo-ACwQ0pQJegQICxAB&biw=1720&bih=1241&dpr=1#imgrc=RpR54mzdGQ1LpM) is faulty. Still looking for a working solution..
Did you find anything?
new URL() solution is a bit only focused on coding, I mean while coding you probably wanna play with the URL so http://-apple-.com, http://.com, www.aa or https://we@.com are probably valid URL. But in our case they are not. do @davidemaye have any idea?