Serhii Korol

Posted on Feb 14

Say Goodbye to WebDriver: Modern Alternatives for Browser Automation – Part 1

#csharp #dotnet #automation

Most developers and automation QA engineers are familiar with Selenium WebDriver, the go-to library for browser automation. WebDriver lets you programmatically launch browsers, open websites, locate elements, and parse HTML. A popular alternative is PuppeteerSharp, a .NET port of the JavaScript library Puppeteer, which offers similar capabilities. However, I propose a different approach: dropping both libraries and talking to the browser directly. In this article, I’ll show you how to manage and interact with a browser without a browser automation library by using the Chrome DevTools Protocol (CDP) from .NET 9.

The Chrome DevTools Protocol (CDP), developed by Google, enables direct communication with the Chrome browser via WebSockets. This approach provides greater flexibility and control compared to traditional tools like Selenium or Puppeteer. Ready to dive in? Let’s get started!

Preconditions

To follow along, you’ll need:

A console project set up in .NET.
The Google Chrome browser installed.

Here’s the basic structure of our project:

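// Implicit usings (the console template default) cover System, System.IO, System.Linq, System.Net.Http, and System.Threading.
using System.Diagnostics;
using System.Net.WebSockets;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;
using Newtonsoft.Json.Linq;
internal static class Program
{
    private static async Task Main()
    {
        // We'll fill this in step by step
    }
}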

Step 1. Launch the Browser and Open a Page

First, we need to configure and launch the Chrome browser. We’ll specify the port, the browser path (the one below is the macOS location), and a unique directory for user data. The random directory name ensures that each session starts with a fresh profile, without retaining tabs or data from previous runs.

internal static class Program
{
    private static async Task Main()
    {
        const int port = 9222;
        // macOS location of Chrome; adjust the path on Windows or Linux
        string chromePath = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome";
        string userDataDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());

        Console.WriteLine("Starting a new Chrome instance...");
        Directory.CreateDirectory(userDataDir);
    }
}
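
The hard-coded path above only works on macOS. As a rough sketch, the path could be picked per operating system. ChromeLocator and GetDefaultChromePath are hypothetical names, and the Windows and Linux paths are only typical default install locations that I'm assuming here, so check them on your machine.

```csharp
using System.Runtime.InteropServices;

static class ChromeLocator
{
    // Hypothetical helper: returns a typical default Chrome path for the current OS.
    // The Windows and Linux locations are assumptions, not guaranteed install paths.
    public static string GetDefaultChromePath()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            return @"C:\Program Files\Google\Chrome\Application\chrome.exe";

        if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX))
            return "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome";

        // Fallback for Linux and other Unix-like systems
        return "/usr/bin/google-chrome";
    }
}
```

With this in place, chromePath could be assigned from ChromeLocator.GetDefaultChromePath() instead of a literal.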

Next, we define the arguments for launching Chrome:

--remote-debugging-port: Specifies the port for communication.
--no-first-run: Skips the first-run prompts (setting Chrome as the default browser and sending usage statistics).
--user-data-dir: Sets the directory for user data.
The URL to open (in this case, a site that detects bots).

var psi = new ProcessStartInfo
{
    FileName = chromePath,
    Arguments = string.Join(" ",
        $"--remote-debugging-port={port}",
        "--no-first-run",
        // Quote the directory in case the temp path contains spaces
        $"--user-data-dir=\"{userDataDir}\"",
        "https://deviceandbrowserinfo.com/are_you_a_bot")
};

Now let's start the browser. I added a delay to give the page time to render. I intentionally chose a site that runs a fingerprint test for bot detection. If you run the code, you should see a label saying that you are a human. When I opened the same site using PuppeteerSharp with the same settings, the fingerprint test detected a bot. This matters when a site uses anti-bot protection: when the system suspects that you are a bot, it asks you to confirm that you are a human, typically with a CAPTCHA. I didn't run into this problem when launching Chrome myself and talking to it over CDP. PuppeteerSharp does have a third-party plugin, Stealth, that helps it pass the fingerprint test, but in practice it is harder to set up. The plain CDP approach is much easier.

var chromeProcess = Process.Start(psi);
if (chromeProcess == null)
{
    Console.WriteLine("❌Failed to start Chrome.");
    return;
}

Console.WriteLine("🚀Chrome started. Waiting for initialization...");
await Task.Delay(10000);
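
A fixed ten-second delay is simple, but it is either too long or too short depending on the machine. As a sketch of an alternative, you could poll the debugging port until it answers. DevToolsProbe and WaitForDevToolsAsync are hypothetical names of mine; the /json/version path is served by Chrome's remote debugging HTTP endpoint. Note that this only tells you the endpoint is up, not that the page has finished rendering, so a short extra delay may still be needed.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class DevToolsProbe
{
    // Hypothetical helper: polls the DevTools HTTP endpoint until it responds
    // successfully or the timeout elapses. Returns true when the endpoint is reachable.
    public static async Task<bool> WaitForDevToolsAsync(int port, TimeSpan timeout)
    {
        using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(2) };
        DateTime deadline = DateTime.UtcNow + timeout;

        while (DateTime.UtcNow < deadline)
        {
            try
            {
                using HttpResponseMessage response =
                    await http.GetAsync($"http://localhost:{port}/json/version");
                if (response.IsSuccessStatusCode)
                {
                    return true;
                }
            }
            catch (HttpRequestException)
            {
                // Chrome is not listening yet; retry below
            }
            catch (TaskCanceledException)
            {
                // The single request timed out; retry below
            }

            await Task.Delay(250);
        }

        return false;
    }
}
```

In Main, this could stand in for part of the delay: call WaitForDevToolsAsync(port, TimeSpan.FromSeconds(15)) and stop if it returns false.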

Step 2. Retrieve the WebSocket Debugger URL

To communicate with the browser, we need a WebSocket debugger URL, which Chrome exposes through the HTTP endpoint on the debugging port. Such a URL looks like this:

ws://127.0.0.1:9222/devtools/browser/13eacb3d-a775-4a9f-8e42-eebb857f9a58

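We request http://localhost:9222/json over HTTP. The endpoint returns a JSON array of debugging targets. Since the browser has an open page, the list may also include that page's service workers, and yes, we can communicate with those too. In our case, we need only the target whose type is page; its URL contains /devtools/page/ followed by the target ID rather than /devtools/browser/. The code below uses the Newtonsoft.Json NuGet package, but that's optional: you can parse the JSON another way.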

static async Task<string?> GetPageWebSocketUrl()
{
    using var httpClient = new HttpClient();
    string json = await httpClient.GetStringAsync("http://localhost:9222/json");
    JArray tabs = JArray.Parse(json);
    var targetTab = tabs.FirstOrDefault(t => t["type"]?.ToString() == "page");
    return targetTab?["webSocketDebuggerUrl"]?.ToString();
}
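
If you'd rather avoid the extra NuGet package, the same lookup can be sketched with System.Text.Json, which ships with .NET. TargetList and FindPageWebSocketUrl are hypothetical names; the method takes the JSON text returned by the endpoint and looks for the first target of type page.

```csharp
using System.Text.Json;

static class TargetList
{
    // Hypothetical helper: returns the webSocketDebuggerUrl of the first "page"
    // target in the JSON array, or null when no such target is present.
    public static string? FindPageWebSocketUrl(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);

        foreach (JsonElement target in doc.RootElement.EnumerateArray())
        {
            bool isPage =
                target.TryGetProperty("type", out JsonElement type) &&
                type.ValueKind == JsonValueKind.String &&
                type.GetString() == "page";

            if (isPage &&
                target.TryGetProperty("webSocketDebuggerUrl", out JsonElement url) &&
                url.ValueKind == JsonValueKind.String)
            {
                return url.GetString();
            }
        }

        return null;
    }
}
```

GetPageWebSocketUrl could then download the JSON as before and return TargetList.FindPageWebSocketUrl(json).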

Let’s fetch the URL and handle any errors:

string? debuggerUrl = await GetPageWebSocketUrl();
Console.WriteLine(debuggerUrl);
if (string.IsNullOrEmpty(debuggerUrl))
{
    Console.WriteLine("❌Failed to retrieve WebSocket Debugger URL.");
    return;
}

Step 3. Establish a WebSocket Connection

With the WebSocket URL, we can now establish a connection to the browser.

using var ws = new ClientWebSocket();
await ws.ConnectAsync(new Uri(debuggerUrl), CancellationToken.None);

Step 4. Query HTML Content

Here, we want to get a specific HTML element. CDP lets you execute JavaScript in the page, and more complicated scenarios will simply need more complicated scripts. A CDP command has an id, a method, and params, whose contents depend on the chosen method. The id is echoed back in the response so that you can match the two. In our case, we use the Runtime.evaluate method, which executes a script and returns its result. In the second part of the article, I'll show other methods for different tasks.

var command = new
{
    id = 1,
    method = "Runtime.evaluate",
    @params = new { expression = "document.querySelector('section.content')?.outerHTML || ''" }
};
string message = JsonSerializer.Serialize(command);
byte[] buffer = Encoding.UTF8.GetBytes(message);
await ws.SendAsync(new ArraySegment<byte>(buffer), WebSocketMessageType.Text, true,
                    CancellationToken.None);
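
Hard-coding id = 1 is fine for a single command, but it gets error-prone once you send several. A minimal sketch of a helper that hands out increasing ids and builds the JSON could look like this. CdpCommand and Create are hypothetical names, not part of any library.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Threading;

static class CdpCommand
{
    private static int _nextId;

    // Hypothetical helper: builds the JSON for a CDP command and returns it
    // together with the id that the browser will echo back in its reply.
    public static (int Id, string Json) Create(string method, object? parameters = null)
    {
        int id = Interlocked.Increment(ref _nextId);

        var payload = new Dictionary<string, object?>
        {
            ["id"] = id,
            ["method"] = method
        };

        // "params" is omitted entirely when the method takes no parameters
        if (parameters is not null)
        {
            payload["params"] = parameters;
        }

        return (id, JsonSerializer.Serialize(payload));
    }
}
```

For example, CdpCommand.Create("Runtime.evaluate", new { expression = "document.title" }) returns the id to wait for and the text to pass to SendAsync.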

Step 5. Receive and Parse Data

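After sending the command, we’ll receive the response and parse the HTML content. A large response can arrive in several WebSocket frames, so we keep reading until EndOfMessage is set.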

Console.WriteLine("✅Receiving data...");
using var memoryStream = new MemoryStream();
var receiveBuffer = new byte[65536];
WebSocketReceiveResult result;
do
{
    result = await ws.ReceiveAsync(new ArraySegment<byte>(receiveBuffer), CancellationToken.None);
    memoryStream.Write(receiveBuffer, 0, result.Count);
} while (!result.EndOfMessage);

memoryStream.Position = 0;
using var reader = new StreamReader(memoryStream, Encoding.UTF8);
string responseText = await reader.ReadToEndAsync();
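
The loop above treats the next message as the reply to our command. That holds here because we haven't enabled any CDP domains, so the browser has no events to push. Once you enable a domain such as Page, events start arriving on the same socket, and you need to check the id. A small sketch of that check, with hypothetical names CdpMessages and IsResponseTo:

```csharp
using System.Text.Json;

static class CdpMessages
{
    // Hypothetical helper: returns true when the raw message is the reply to the
    // command with the given id. Events pushed by the browser have a "method"
    // field and no "id", so they are reported as not matching.
    public static bool IsResponseTo(string rawMessage, int commandId)
    {
        using JsonDocument doc = JsonDocument.Parse(rawMessage);

        return doc.RootElement.TryGetProperty("id", out JsonElement id)
            && id.ValueKind == JsonValueKind.Number
            && id.TryGetInt32(out int value)
            && value == commandId;
    }
}
```

A receive loop could read whole messages as before and keep going until IsResponseTo(responseText, 1) returns true.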

Step 6. Extract HTML

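We’ll deserialize the response and extract the HTML content using a simple model. In the Runtime.evaluate reply, the value we need is nested at result.result.value, so the model mirrors that shape.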

public class CdpResponse
{
    [JsonPropertyName("id")]
    public int Id { get; set; }

    [JsonPropertyName("result")]
    public ResultWrapper? Result { get; set; }

    public class ResultWrapper
    {
        [JsonPropertyName("result")]
        public InnerResult? HtmlResult { get; set; }
    }

    public class InnerResult
    {
        [JsonPropertyName("value")]
        public string? Value { get; set; }
    }
}

Console.WriteLine("📝Extracting data...");
var jsonResponse = JsonSerializer.Deserialize<CdpResponse>(responseText);
string htmlContent = jsonResponse?.Result?.HtmlResult?.Value ?? "Failed to extract HTML.";
Console.WriteLine("📄 Page HTML:");
Console.WriteLine(htmlContent);
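
The model above only covers the happy path. A reply can also carry a top-level error object when the command itself is rejected, or an exceptionDetails object inside result when the script throws. Here is a sketch that distinguishes those cases; EvaluateResult and TryGetStringValue are hypothetical names, and the fallback messages are mine.

```csharp
using System.Text.Json;

static class EvaluateResult
{
    // Hypothetical helper: returns the string value from a Runtime.evaluate reply,
    // or null (with a message in 'error') when the reply reports a failure.
    public static string? TryGetStringValue(string responseText, out string? error)
    {
        error = null;
        using JsonDocument doc = JsonDocument.Parse(responseText);
        JsonElement root = doc.RootElement;

        // Protocol-level failure: { "id": 1, "error": { "code": ..., "message": "..." } }
        if (root.TryGetProperty("error", out JsonElement protocolError))
        {
            error = protocolError.TryGetProperty("message", out JsonElement message)
                ? message.GetString()
                : "Unknown protocol error";
            return null;
        }

        if (!root.TryGetProperty("result", out JsonElement outer))
        {
            error = "Reply has no result";
            return null;
        }

        // Script-level failure: the evaluated expression threw an exception
        if (outer.TryGetProperty("exceptionDetails", out JsonElement details))
        {
            error = details.TryGetProperty("text", out JsonElement text)
                ? text.GetString()
                : "Script threw an exception";
            return null;
        }

        if (outer.TryGetProperty("result", out JsonElement inner) &&
            inner.TryGetProperty("value", out JsonElement value) &&
            value.ValueKind == JsonValueKind.String)
        {
            return value.GetString();
        }

        error = "Reply does not contain a string value";
        return null;
    }
}
```

This way a failed extraction produces a message you can log, instead of the placeholder text ending up in the saved file.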

Step 7. Save HTML

Next, we save the extracted HTML to a file.

Console.WriteLine("📄 Saving to file...");
await File.WriteAllTextAsync("site.html", htmlContent, Encoding.UTF8);

Step 8. Open the Saved File in Chrome

We have the file, but I don't want to find it and open it manually; I want to open it in the same browser. For this, we need to send another command to the browser, this time using a different CDP method, Page.navigate. We convert the file path to a file URI and navigate the tab to it.

static async Task NavigateToSavedFileInChrome(ClientWebSocket ws, string filePath)
{
    try
    {
        if (!File.Exists(filePath))
        {
            Console.WriteLine($"❌ File not found: {filePath}");
            return;
        }

        // 📝Convert the file path to a file URI
        string fileUri = GetValidFileUri(filePath);

        // 🚀Send the navigation command to the existing Chrome tab
        var navigationCommand = new
        {
            id = 2,
            method = "Page.navigate",
            @params = new
            {
                url = fileUri
            }
        };
        string message = JsonSerializer.Serialize(navigationCommand);
        byte[] buffer = Encoding.UTF8.GetBytes(message);
        await ws.SendAsync(new ArraySegment<byte>(buffer), WebSocketMessageType.Text, true,
            CancellationToken.None);

        Console.WriteLine($"✅ Navigating to file: {fileUri}");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"❌ Error navigating to file in Chrome: {ex.Message}");
    }
}

private static string GetValidFileUri(string filePath)
{
    string absolutePath = Path.GetFullPath(filePath);

    if (Environment.OSVersion.Platform == PlatformID.Win32NT)
    {
        // Windows: file:///C:/path/to/file.html
        return "file:///" + absolutePath.Replace("\\", "/");
    }
    else
    {
        // macOS/Linux: file:///path/to/file.html
        return "file://" + absolutePath;
    }
}
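
The manual string concatenation works for simple paths but does not escape characters such as spaces. As an alternative sketch, the System.Uri class can build the file URI from an absolute path. FileUris and ToFileUri are hypothetical names; I'm assuming a modern .NET runtime, where absolute Unix paths are also accepted by the Uri constructor.

```csharp
using System;
using System.IO;

static class FileUris
{
    // Hypothetical helper: resolves the path against the current directory and
    // lets System.Uri produce an escaped file URI from the absolute path.
    public static string ToFileUri(string filePath)
    {
        string absolutePath = Path.GetFullPath(filePath);
        return new Uri(absolutePath).AbsoluteUri;
    }
}
```

GetValidFileUri could delegate to this helper without any change to the calling code.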

Run it, passing the WebSocket connection and the file path.

Console.WriteLine("🌐 Opening file in the default browser...");
await NavigateToSavedFileInChrome(ws,"site.html");

Step 9. Clean Up

Finally, we’ll close the WebSocket connection and terminate the Chrome process.

Console.WriteLine("🚪Press Enter to close...");
Console.ReadLine();
await ws.CloseAsync(WebSocketCloseStatus.NormalClosure, "", CancellationToken.None);
chromeProcess.Kill();
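
One thing the cleanup above leaves behind is the temporary profile directory created in Step 1. A sketch of removing it once Chrome has exited is shown below. ProfileCleanup and TryDeleteDirectory are hypothetical names, and the retry count and pause are arbitrary values I picked because Chrome may hold on to files for a moment after it is killed.

```csharp
using System;
using System.IO;
using System.Threading;

static class ProfileCleanup
{
    // Hypothetical helper: deletes the directory recursively, retrying a few times
    // if files are still locked. Returns true when the directory no longer exists.
    public static bool TryDeleteDirectory(string path, int attempts = 5)
    {
        for (int i = 0; i < attempts; i++)
        {
            try
            {
                if (Directory.Exists(path))
                {
                    Directory.Delete(path, recursive: true);
                }
                return true;
            }
            catch (IOException)
            {
                Thread.Sleep(200); // files still in use; wait and retry
            }
            catch (UnauthorizedAccessException)
            {
                Thread.Sleep(200); // files still locked; wait and retry
            }
        }

        return false;
    }
}
```

After chromeProcess.Kill(), calling chromeProcess.WaitForExit() and then ProfileCleanup.TryDeleteDirectory(userDataDir) would keep temp folders from piling up between runs.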

Final Code

Here’s the complete implementation:

        static async Task ParseHtml()
        {
            const int port = 9222;
            string chromePath = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome";
            string userDataDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());

            //🚀Step 1: Start Chrome
            Console.WriteLine("Starting a new Chrome instance...");
            Directory.CreateDirectory(userDataDir);

            var psi = new ProcessStartInfo
            {
                FileName = chromePath,
                Arguments = string.Join(" ",
                    $"--remote-debugging-port={port}",
                    "--no-first-run",
                    // Quote the directory in case the temp path contains spaces
                    $"--user-data-dir=\"{userDataDir}\"",
                    "https://deviceandbrowserinfo.com/are_you_a_bot")
            };

            var chromeProcess = Process.Start(psi);
            if (chromeProcess == null)
            {
                Console.WriteLine("❌Failed to start Chrome.");
                return;
            }

            Console.WriteLine("🚀Chrome started. Waiting for initialization...");
            await Task.Delay(10000);

            try
            {
                //✅Step 2: Get WebSocket Debugger URL
                string? debuggerUrl = await GetPageWebSocketUrl();
                Console.WriteLine(debuggerUrl);
                if (string.IsNullOrEmpty(debuggerUrl))
                {
                    Console.WriteLine("❌Failed to retrieve WebSocket Debugger URL.");
                    return;
                }

                // ⚙️Step 3: Connect to WebSocket
                using var ws = new ClientWebSocket();
                await ws.ConnectAsync(new Uri(debuggerUrl), CancellationToken.None);

                // 🚀Step 4: Send command to retrieve HTML
                var command = new
                {
                    id = 1,
                    method = "Runtime.evaluate",
                    @params = new { expression = "document.querySelector('section.content')?.outerHTML || ''" }
                };
                string message = JsonSerializer.Serialize(command);
                byte[] buffer = Encoding.UTF8.GetBytes(message);
                await ws.SendAsync(new ArraySegment<byte>(buffer), WebSocketMessageType.Text, true,
                    CancellationToken.None);

                // ✅Step 5: Receive and parse response
                Console.WriteLine("✅Receiving data...");
                using var memoryStream = new MemoryStream();
                var receiveBuffer = new byte[65536];
                WebSocketReceiveResult result;
                do
                {
                    result = await ws.ReceiveAsync(new ArraySegment<byte>(receiveBuffer), CancellationToken.None);
                    memoryStream.Write(receiveBuffer, 0, result.Count);
                } while (!result.EndOfMessage);

                memoryStream.Position = 0;
                using var reader = new StreamReader(memoryStream, Encoding.UTF8);
                string responseText = await reader.ReadToEndAsync();

                // 📝Step 6: Output HTML content
                Console.WriteLine("📝Extracting data...");
                var jsonResponse = JsonSerializer.Deserialize<CdpResponse>(responseText);
                string htmlContent = jsonResponse?.Result?.HtmlResult?.Value ?? "Failed to extract HTML.";

                Console.WriteLine("📄 Page HTML:");
                Console.WriteLine(htmlContent);

                // 📄Step 7: Save HTML to file
                Console.WriteLine("📄 Saving to file...");
                await File.WriteAllTextAsync("site.html", htmlContent, Encoding.UTF8);

                // 🌐Step 8: Open the file in the default browser
                Console.WriteLine("🌐 Opening file in the default browser...");
                await NavigateToSavedFileInChrome(ws,"site.html");

                //🚪Step 9: Close
                Console.WriteLine("🚪Press Enter to close...");
                Console.ReadLine();
                await ws.CloseAsync(WebSocketCloseStatus.NormalClosure, "", CancellationToken.None);
            }
            catch (Exception ex)
            {
                Console.WriteLine($"❌ Error: {ex.Message}");
            }
            finally
            {
                if (!chromeProcess.HasExited)
                {
                    chromeProcess.Kill();
                }
            }
        }

Conclusion

The Chrome DevTools Protocol offers a powerful and flexible alternative to traditional browser automation tools like Selenium and Puppeteer. While it lacks the convenience of high-level APIs, it provides unparalleled control and customization.

In Part 2, we’ll dive deeper into interacting with the DOM, manipulating elements, and handling more complex scenarios. Stay tuned!

You can find the full source code here.

I hope you found this guide helpful. Happy coding, and see you in the next part!