DEV Community

joedev090

How to code a web scraper in C# with .NET 8

Hey coders!!

Web scraping is a well-known term in web development.

Web scraping is the process of collecting unstructured and structured data in an automated manner.

Main use cases include price monitoring, price intelligence, news monitoring, lead generation, market research and more.

Let's create a small script that reads the current temperature from a popular website.

https://weather.com/

Before continuing, you need to install the .NET CLI.

For more details, see the following reference:

https://learn.microsoft.com/en-us/dotnet/core/install/macos

We are going to build the script in VS Code with C#.

  • Open a folder in VS Code and open a terminal

  • Next, run this command to create a C# console app

dotnet new console -o webscrapping

  • After that, install the HtmlAgilityPack package from NuGet
  1. Install the NuGet Package Manager extension from the Extensions view in VS Code.
  2. Press Ctrl+Shift+P (on Windows/Linux) or Cmd+Shift+P (on macOS).
  3. Type NuGet: Add Package. A new window opens where you can search for the package name.
  4. Search for "HtmlAgilityPack" and select the version you want to install.
  • Then we build a small app. Paste the following code into Program.cs
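using HtmlAgilityPack;
using System;
using System.Net.Http;
using System.Threading.Tasks;

namespace WebScrapping
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // Send a GET request to weather.com and await the response
            // instead of blocking on .Result
            string url = "https://weather.com/weather/today/l/ad585d4294c07f7d8e78c8e7d6d3945a4e7e67f135f13f6c013df05e0d6f728e";
            using var httpClient = new HttpClient();
            var html = await httpClient.GetStringAsync(url);

            // Parse the downloaded HTML
            var htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(html);

            // Get the temperature element by its class name
            var temperatureElement = htmlDocument.DocumentNode.SelectSingleNode("//span[@class='CurrentConditions--tempValue--MHmYY']");
            if (temperatureElement == null)
            {
                // SelectSingleNode returns null when nothing matches,
                // for example if the site has changed its class names
                Console.WriteLine("Temperature element not found.");
                return;
            }

            var temperatureValue = temperatureElement.InnerText;
            Console.WriteLine(temperatureValue);
        }
    }
}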

  • Finally, in the terminal, move into the project folder if you are not already there (cd webscrapping) and run the command dotnet run

You should see a result like this in your console:

60°

It should match the temperature shown on the website.
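For use cases like the monitoring ones mentioned earlier, the scraped text usually has to become a number before you can compare or store it. Here is a minimal sketch of one way to do that, assuming the value looks like "60°". The names TemperatureParser, TryParseDegrees and ParserDemo are made up for this example and are not part of HtmlAgilityPack or the script above.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class TemperatureParser
{
    // Hypothetical helper: keeps ASCII digits and minus signs, drops everything
    // else (such as the degree symbol), then tries to parse what is left.
    public static bool TryParseDegrees(string text, out int degrees)
    {
        var kept = text.Where(c => c == '-' || (c >= '0' && c <= '9')).ToArray();
        return int.TryParse(new string(kept), NumberStyles.Integer,
                            CultureInfo.InvariantCulture, out degrees);
    }
}

static class ParserDemo
{
    static void Main()
    {
        // "60°" stands in for the InnerText value read by the scraper.
        if (TemperatureParser.TryParseDegrees("60°", out var degrees))
        {
            Console.WriteLine(degrees); // prints 60
        }
        else
        {
            Console.WriteLine("Could not parse the temperature.");
        }
    }
}
```

The Try pattern is used so that an unexpected value on the page results in a false return rather than an exception.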

Note:
To verify the class name, open the dev tools in your browser and inspect the temperature element in the HTML view, where you can check its tag and class.
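The class name used in the script ends in MHmYY, which looks like a generated suffix. That is only an assumption, but if it is right, an exact match on the full class name would stop working whenever the suffix changes. A minimal sketch of a more tolerant XPath, matching on the first part of the class name with contains(), could look like this. TemperatureSelector, FindTemperature, SelectorDemo and the sample markup are made up for the example; it runs against an inline HTML string, not the live site.

```csharp
using System;
using HtmlAgilityPack;

static class TemperatureSelector
{
    // Assumption: only this prefix is stable; the trailing part
    // (e.g. "MHmYY") is treated as a suffix that may change.
    private const string ClassPrefix = "CurrentConditions--tempValue";

    // Returns the text of the first matching <span>, or null if none matches.
    public static string? FindTemperature(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // contains() matches any class attribute that includes the prefix.
        var node = document.DocumentNode.SelectSingleNode(
            $"//span[contains(@class, '{ClassPrefix}')]");

        return node?.InnerText.Trim();
    }
}

static class SelectorDemo
{
    static void Main()
    {
        // Made-up sample markup with a different suffix than the one in the script.
        const string sampleHtml =
            "<html><body><span class='CurrentConditions--tempValue--AbC12'>60°</span></body></html>";

        Console.WriteLine(TemperatureSelector.FindTemperature(sampleHtml) ?? "not found");
    }
}
```

The trade-off is that a looser match can also pick up other elements whose class contains the same prefix, so it is worth checking the page markup before relying on it.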

Top comments (2)

sinni800

A strong tip though: use View Source instead of dev tools in your web browser. Why? Because dev tools show the current state of the page after JavaScript has already run. For all you know, the site might begin with almost no HTML at all and only get filled in by some React application because it doesn't have any server-side rendering.

View Source would reveal that.

joedev090

Nice, thanks for your comment. I will keep it in mind.