Hey coders!!
You have probably heard a famous term in web development: web scraping.
Web scraping is the process of collecting unstructured and structured data in an automated manner.
Main use cases include price monitoring, price intelligence, news monitoring, lead generation, market research and more.
Let's see how to write a small script that reads the current temperature from a popular website.
Before continuing, you have to install the dotnet CLI.
For more details, see the following reference:
https://learn.microsoft.com/en-us/dotnet/core/install/macos
We are going to build the script in VS Code with C#.
- Open a folder in VS Code and open a terminal.
- Next, run this command to create a C# console app:
dotnet new console -o webscrapping
- After that, install the HtmlAgilityPack package from NuGet. The steps below use a VS Code extension; running dotnet add package HtmlAgilityPack inside the webscrapping folder also works.
- Add the NuGet Package Manager extension to VS Code from the Extensions section.
- Press Ctrl+Shift+P (on Windows/Linux) or Cmd+Shift+P (on macOS).
- Type NuGet: Add Package. It will open a new window where you search for the package name.
- Search for "HtmlAgilityPack" and select the version you want to install.
- Then we build a small app. Paste this code into Program.cs:
using HtmlAgilityPack;
using System;
using System.Net.Http;
using System.Threading.Tasks;

namespace WebScrapping
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // Send a GET request to weather.com and read the response body as a string
            string url = "https://weather.com/weather/today/l/ad585d4294c07f7d8e78c8e7d6d3945a4e7e67f135f13f6c013df05e0d6f728e";
            using var httpClient = new HttpClient();
            string html = await httpClient.GetStringAsync(url);

            // Parse the downloaded HTML
            var htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(html);

            // Get the element that holds the temperature, selected by its class name
            var temperatureElement = htmlDocument.DocumentNode.SelectSingleNode("//span[@class='CurrentConditions--tempValue--MHmYY']");

            // SelectSingleNode returns null when nothing matches, for example if the class name has changed
            if (temperatureElement == null)
            {
                Console.WriteLine("Temperature element not found. Check the class name.");
                return;
            }

            Console.WriteLine(temperatureElement.InnerText);
        }
    }
}
- Finally, in the terminal, move into the webscrapping folder (cd webscrapping) if you are not already there, and run the command
dotnet run
You should see a result like this in your console:
60°
It should match the temperature shown on the website.
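The script prints the temperature as text, such as 60°. If you want to compare it or do math with it, you need a number. Here is a minimal sketch of one way to do that. TemperatureText and TryParseDegrees are hypothetical names used only for this example; they are not part of HtmlAgilityPack. The sketch assumes the text starts with an optional ASCII minus sign followed by digits.

```csharp
using System;
using System.Globalization;

public static class TemperatureText
{
    // Hypothetical helper: reads the leading integer from scraped text
    // such as "60°" or "-3°". Returns false when the text has no leading number.
    public static bool TryParseDegrees(string text, out int degrees)
    {
        degrees = 0;
        if (string.IsNullOrWhiteSpace(text))
        {
            return false;
        }

        string trimmed = text.Trim();

        // Accept one optional leading minus sign, then ASCII digits only.
        int length = trimmed[0] == '-' ? 1 : 0;
        while (length < trimmed.Length && trimmed[length] >= '0' && trimmed[length] <= '9')
        {
            length++;
        }

        // Everything after the digits (the degree sign, units) is ignored.
        return int.TryParse(
            trimmed.Substring(0, length),
            NumberStyles.AllowLeadingSign,
            CultureInfo.InvariantCulture,
            out degrees);
    }
}
```

In the script you could call TemperatureText.TryParseDegrees(temperatureElement.InnerText, out int degrees) and use degrees only when the call returns true.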
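Note:
To verify the class name, open your browser's dev tools and inspect the HTML to find the tag and class that hold the temperature. The class name in the code above may change over time, so update the XPath if the script stops finding the element.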
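The class name used above ends in what looks like a generated suffix (MHmYY), so an exact match can break easily. Here is a minimal sketch of a more tolerant lookup that matches on the start of the class name with the XPath contains() function. TemperatureSelector and TryFind are hypothetical names for this example, and the guess that only the suffix changes is an assumption, not something the site guarantees.

```csharp
using HtmlAgilityPack;

public static class TemperatureSelector
{
    // Hypothetical helper: finds the first span whose class attribute contains
    // classPrefix and returns its trimmed text through the out parameter.
    // classPrefix is placed inside single quotes in the XPath, so it must not
    // contain a single quote itself.
    public static bool TryFind(string html, string classPrefix, out string text)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // contains() matches "CurrentConditions--tempValue--MHmYY" as well as
        // the same name with a different suffix.
        var node = document.DocumentNode.SelectSingleNode(
            $"//span[contains(@class, '{classPrefix}')]");

        // SelectSingleNode returns null when no element matches.
        text = node?.InnerText.Trim() ?? string.Empty;
        return node != null;
    }
}
```

In the script this would be called as TemperatureSelector.TryFind(html, "CurrentConditions--tempValue", out string temperature). HttpClient downloads the HTML as the server sends it and does not run JavaScript, so if this returns false on the downloaded page, either the class name changed or the element is only added later by scripts in the browser.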
Top comments (2)
A strong tip though: use View Source instead of dev tools in your web browser. Why? Because dev tools show the current state of the page after JavaScript has already run. For all you know, the site might begin with almost no HTML at all and only get filled in by some React application because there is no server-side rendering. View Source would reveal that.
Nice, thanks for your comment. I will keep it in mind.