Automating 44 E2E Tests with AI-Powered Browser Control for Under $1

Introduction

End-to-end (E2E) testing is crucial for ensuring software quality, yet it is often costly. Writing and maintaining test scripts is time-consuming, and even minor DOM changes can cause test failures.

While some engineers enjoy writing tests, few specialize in E2E testing. The tech industry has responded with numerous automated and no-code E2E testing solutions, but these tools are often expensive and lack precision.

browser-use offers a new approach: an LLM-driven agent that operates the browser directly from natural-language instructions. Given those capabilities, I experimented with using it for E2E test automation.

GitHub - browser-use

Overview

In this experiment, I used the following tools and services:

  • Tools: Python, browser-use, Playwright, Jest
  • Test site: Sauce Demo
    • A mock e-commerce site widely used for testing
    • Fully mocked backend with publicly available test accounts
  • Goal: Minimize manual effort in E2E testing

Key Takeaway

It performs well, but the prompts and workflow need optimization for stability and cost.

The complete source code is available in this repository:

GitHub - browser-use E2E test automation

Strategy

I initially expected to fully automate E2E testing with a single prompt like "Test this site with E2E!" However, the process required structured steps (the full pipeline is sketched after the list):

  1. Extract the site structure using browser-use
  2. Generate test scenarios for each page
  3. Review and refine the generated scenarios manually
  4. Generate test code based on the reviewed scenarios
  5. Execute the generated test code
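
Wired together, the pipeline looks roughly like this. This is a sketch only: run_browser_task, build_scenario_task, and generate_test_code are illustrative helper names of mine, fleshed out in the step sections below, and url and site_structure_task are assumed to be defined as in Step 1.

import asyncio
import json
from pathlib import Path

async def pipeline() -> None:
    # Step 1: map the site with browser-use (prompt shown in Step 1)
    pages = json.loads(await run_browser_task(site_structure_task))
    # Step 2: one natural-language scenario set per page
    scenarios = [
        await run_browser_task(build_scenario_task(p["path"], p["purpose"]))
        for p in pages
    ]
    # Step 3 happens here, outside the code: manual review of the scenarios.
    # Step 4: emit one Jest + Playwright test file per reviewed scenario
    Path("tests").mkdir(exist_ok=True)
    for i, scenario in enumerate(scenarios):
        Path(f"tests/generated_{i:02d}.test.js").write_text(
            generate_test_code(url, scenario)
        )

asyncio.run(pipeline())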

Step 1: Extracting the Site Structure

The first step was extracting the site's structure with browser-use to create a list of pages for testing.

Processing the entire site at once did not work effectively, likely due to LLM processing constraints. Handling each page separately improved stability.

Prompt Example

site_structure_task = f"""
Analyze the website starting from {url}. Identify and output:
1. All accessible pages and subpages within the domain ({url}), including dynamically loaded content.
2. Each page's purpose in concise terms.
3. Include:
   - Static links
   - JavaScript-driven links
   - Form submissions
   - API endpoints (if visible)
4. Group similar structured pages (e.g., query parameters like ?id=).

## Output JSON Format:
[
  { "path": "<path or URL>", "purpose": "<brief description>" },
  ...
]

## Login Information
- id: {user_id}
- password: {password}
"""

Step 2: Generating Test Scenarios

Once the site structure was extracted, test scenarios were generated for each page. Generating the scenarios in natural language first, before any code, gave me:

  • Easier manual review

Prompt Example

Since I read Japanese more accurately than English, I set scenario_language to Japanese.

scenario_task = f"""
Generate exhaustive test scenarios for the following page:
- Page: {page_path}
  Purpose: {page_purpose}

For this page, include all possible user actions, such as:
  - Form submissions
  - Button clicks
  - Dropdown selections
  - Interactions with modals or dynamic elements

Test both expected behaviors and edge cases for each action.
Output format:
path: {page_path},
actions:
  - test: <description of action>,
    expect: <expected result>,
  - test: <description of action>,
    expect: <expected result>,

The output must be written in {scenario_language}.

## Root URL
{url}

## Login Information
- id: {user_id}
- password: {password}
"""

Sample Output

path: /,
actions:
  - test: Enter correct username and password, then click login,
    expect: Redirect to user dashboard,
  - test: Leave username blank and attempt login,
    expect: Show error message,
  - test: Enter invalid username and attempt login,
    expect: Show error message,
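
Feeding Step 1's output through this prompt is a simple loop. A sketch, reusing run_browser_task from Step 1; build_scenario_task is an illustrative wrapper of mine that re-evaluates the prompt above for each page.

import asyncio
import json

def build_scenario_task(page_path: str, page_purpose: str) -> str:
    # Returns the scenario prompt shown above, filled in for one page
    return f"""Generate exhaustive test scenarios for the following page:
- Page: {page_path}
  Purpose: {page_purpose}
...(rest of the prompt above, unchanged)..."""

pages = json.loads(structure_json)  # Step 1 output
scenarios = [
    asyncio.run(run_browser_task(build_scenario_task(p["path"], p["purpose"])))
    for p in pages
]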

Step 3: Generating Test Code

Using the reviewed scenarios, Jest and Playwright-based test code was generated. Although browser-use could run each test directly through the LLM every time, generating deterministic test code once is more reliable and far cheaper to re-run.

Prompt Example

task = f"""
Generate Jest + Playwright test code for:
- URL: {url}
- Scenario: {scenario}

Ensure the output is fully executable without modification.
"""

Generated Test Code Example

const { test, expect } = require('@playwright/test');

test.describe('Login Tests', () => {
  test('Valid login', async ({ page }) => {
    await page.goto('https://www.saucedemo.com/');
    await page.fill('input[name="user-name"]', 'standard_user');
    await page.fill('input[name="password"]', 'secret_sauce');
    await page.click('input[name="login-button"]');
    await expect(page).toHaveURL('https://www.saucedemo.com/inventory.html');
  });
});

In the following file, a beforeEach hook for login is also generated correctly:

const { test, expect } = require('@playwright/test');

test.describe('Checkout Step One Tests', () => {
  test.beforeEach(async ({ page }) => {
    await page.goto('https://www.saucedemo.com/');
    await page.fill('#user-name', 'standard_user');
    await page.fill('#password', 'secret_sauce');
    await page.click('#login-button');
    await page.goto('https://www.saucedemo.com/checkout-step-one.html');
  });

  test('User fills in all fields correctly and clicks the Continue button', async ({ page }) => {
    await page.fill('#first-name', 'John');
    await page.fill('#last-name', 'Doe');
    await page.fill('#postal-code', '12345');
    await page.click('#continue');
    await expect(page).toHaveURL('https://www.saucedemo.com/checkout-step-two.html');
  });
});

Test Execution & Results

When the generated tests were executed, 44 ran and 18 failed. Most of the failures, however, had understandable causes:

  1. Incorrect Expectations (5 tests)

    • Example: Expected an error when entering numbers in the name field, but the site allowed it.
    • The test scenarios were generated based on ideal behavior, but the actual behavior of the test site differed.
    • This is not an issue with the test code itself but rather a mismatch between the expected functionality and the actual implementation of the site.
  2. Test Runner Mismatch (12 tests)

    • The generated tests assumed Playwright’s runner (@playwright/test), but they were executed with Jest Circus, causing failures. This can be fixed by specifying the intended runner in the generation prompt, as sketched below.
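
For example, the generation prompt from Step 3 can pin the runner explicitly; the wording below is illustrative, not the exact fix from the repository.

# Appended to the Step 3 prompt so the emitted code matches the runner
# that will actually execute it:
task += """
The tests will be executed with @playwright/test (npx playwright test).
Use only @playwright/test APIs; do not rely on Jest globals or jest-circus.
"""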

Cost Analysis

Running the experiment repeatedly against OpenAI’s GPT-4o API cost about $7 in total. With optimized prompts, however, a single run of the entire pipeline (site structure analysis → scenario generation → test code generation) costs under $1.

Conclusion

Manual E2E test writing is becoming obsolete. For now, though, programming languages remain the best way to give an AI precise instructions.
