Automating 44 E2E Tests with AI-Powered Browser Control for Under $1

Introduction

End-to-end (E2E) testing is crucial for ensuring software quality, yet it is often costly. Writing and maintaining test scripts is time-consuming, and even minor DOM changes can cause test failures.

While some engineers enjoy writing tests, few specialize in E2E testing. The tech industry has responded with numerous automated and no-code E2E testing solutions, but these tools are often expensive and lack precision.

browser-use offers a new approach: an LLM-driven agent that operates the browser directly from natural-language instructions. Given those capabilities, I experimented with using it for E2E test automation.

GitHub - browser-use

Overview

In this experiment, I used the following tools and services:

  • Tools: Python, browser-use, Playwright, Jest
  • Test site: Sauce Demo
    • A mock e-commerce site widely used for testing
    • Fully mocked backend with publicly available test accounts
  • Goal: Minimize manual effort in E2E testing

Key Takeaway

It performs well, but the prompts and workflow need optimization for stability and cost.

The complete source code is available in this repository:

GitHub - browser-use E2E test automation

Strategy

I initially expected to fully automate E2E testing with a single prompt like "Test this site with E2E!" However, the process required structured steps (the full pipeline is sketched after the list):

  1. Extract the site structure using browser-use
  2. Generate test scenarios for each page
  3. Review and refine the generated scenarios manually
  4. Generate test code based on the reviewed scenarios
  5. Execute the generated test code
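
Wired together, the pipeline looks roughly like this. This is a sketch only: run_browser_task, build_scenario_task, and generate_test_code are illustrative helper names of mine, fleshed out in the step sections below, and url and site_structure_task are assumed to be defined as in Step 1.

import asyncio
import json
from pathlib import Path

async def pipeline() -> None:
    # Step 1: map the site with browser-use (prompt shown in Step 1)
    pages = json.loads(await run_browser_task(site_structure_task))
    # Step 2: one natural-language scenario set per page
    scenarios = [
        await run_browser_task(build_scenario_task(p["path"], p["purpose"]))
        for p in pages
    ]
    # Step 3 happens here, outside the code: manual review of the scenarios.
    # Step 4: emit one Jest + Playwright test file per reviewed scenario
    Path("tests").mkdir(exist_ok=True)
    for i, scenario in enumerate(scenarios):
        Path(f"tests/generated_{i:02d}.test.js").write_text(
            generate_test_code(url, scenario)
        )

asyncio.run(pipeline())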

Step 1: Extracting the Site Structure

The first step was extracting the site's structure with browser-use to create a list of pages for testing.

Processing the entire site at once did not work effectively, likely due to LLM processing constraints. Handling each page separately improved stability.

Prompt Example

site_structure_task = f"""
Analyze the website starting from {url}. Identify and output:
1. All accessible pages and subpages within the domain ({url}), including dynamically loaded content.
2. Each page's purpose in concise terms.
3. Include:
   - Static links
   - JavaScript-driven links
   - Form submissions
   - API endpoints (if visible)
4. Group similar structured pages (e.g., query parameters like ?id=).

## Output JSON Format:
[
  { "path": "<path or URL>", "purpose": "<brief description>" },
  ...
]

## Login Information
- id: {user_id}
- password: {password}
"""

Step 2: Generating Test Scenarios

Once the site structure was extracted, test scenarios were generated for each page. Generating the scenarios in natural language first, before any code, gave me:

  • Easier manual review

Prompt Example

Since I read Japanese more accurately than English, I set scenario_language to Japanese.

scenario_task = f"""
Generate exhaustive test scenarios for the following page:
- Page: {page_path}
  Purpose: {page_purpose}

For this page, include all possible user actions, such as:
  - Form submissions
  - Button clicks
  - Dropdown selections
  - Interactions with modals or dynamic elements

Test both expected behaviors and edge cases for each action.
Output format:
path: {page_path},
actions:
  - test: <description of action>,
    expect: <expected result>,
  - test: <description of action>,
    expect: <expected result>,

The output must be written in {scenario_language}.

## Root URL
{url}

## Login Information
- id: {user_id}
- password: {password}
"""

Sample Output

path: /,
actions:
  - test: Enter correct username and password, then click login,
    expect: Redirect to user dashboard,
  - test: Leave username blank and attempt login,
    expect: Show error message,
  - test: Enter invalid username and attempt login,
    expect: Show error message,
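
Feeding Step 1's output through this prompt is a simple loop. A sketch, reusing run_browser_task from Step 1; build_scenario_task is an illustrative wrapper of mine that re-evaluates the prompt above for each page.

import asyncio
import json

def build_scenario_task(page_path: str, page_purpose: str) -> str:
    # Returns the scenario prompt shown above, filled in for one page
    return f"""Generate exhaustive test scenarios for the following page:
- Page: {page_path}
  Purpose: {page_purpose}
...(rest of the prompt above, unchanged)..."""

pages = json.loads(structure_json)  # Step 1 output
scenarios = [
    asyncio.run(run_browser_task(build_scenario_task(p["path"], p["purpose"])))
    for p in pages
]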

Step 3: Generating Test Code

Using the reviewed scenarios, Jest and Playwright-based test code was generated. Although browser-use could run each test directly through the LLM every time, generating deterministic test code once is more reliable and far cheaper to re-run.

Prompt Example

task = f"""
Generate Jest + Playwright test code for:
- URL: {url}
- Scenario: {scenario}

Ensure the output is fully executable without modification.
"""

Generated Test Code Example

const { test, expect } = require('@playwright/test');

test.describe('Login Tests', () => {
  test('Valid login', async ({ page }) => {
    await page.goto('https://www.saucedemo.com/');
    await page.fill('input[name="user-name"]', 'standard_user');
    await page.fill('input[name="password"]', 'secret_sauce');
    await page.click('input[name="login-button"]');
    await expect(page).toHaveURL('https://www.saucedemo.com/inventory.html');
  });
});

In the following file, a beforeEach hook for login is also generated correctly:

const { test, expect } = require('@playwright/test');

test.describe('Checkout Step One Tests', () => {
  test.beforeEach(async ({ page }) => {
    await page.goto('https://www.saucedemo.com/');
    await page.fill('#user-name', 'standard_user');
    await page.fill('#password', 'secret_sauce');
    await page.click('#login-button');
    await page.goto('https://www.saucedemo.com/checkout-step-one.html');
  });

  test('User fills in all fields correctly and clicks the Continue button', async ({ page }) => {
    await page.fill('#first-name', 'John');
    await page.fill('#last-name', 'Doe');
    await page.fill('#postal-code', '12345');
    await page.click('#continue');
    await expect(page).toHaveURL('https://www.saucedemo.com/checkout-step-two.html');
  });
});

Test Execution & Results

When the generated tests were executed, 44 ran and 18 failed. Most of the failures, however, had understandable causes:

  1. Incorrect Expectations (5 tests)

    • Example: Expected an error when entering numbers in the name field, but the site allowed it.
    • The test scenarios were generated based on ideal behavior, but the actual behavior of the test site differed.
    • This is not an issue with the test code itself but rather a mismatch between the expected functionality and the actual implementation of the site.
  2. Test Runner Mismatch (12 tests)

    • The generated tests assumed Playwright’s runner (@playwright/test), but they were executed with Jest Circus, causing failures. This can be fixed by specifying the intended runner in the generation prompt, as sketched below.
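
For example, the generation prompt from Step 3 can pin the runner explicitly; the wording below is illustrative, not the exact fix from the repository.

# Appended to the Step 3 prompt so the emitted code matches the runner
# that will actually execute it:
task += """
The tests will be executed with @playwright/test (npx playwright test).
Use only @playwright/test APIs; do not rely on Jest globals or jest-circus.
"""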

Cost Analysis

Running the experiment repeatedly against OpenAI’s GPT-4o API cost about $7 in total. With optimized prompts, however, a single run of the entire pipeline (site structure analysis → scenario generation → test code generation) costs under $1.

Conclusion

Manual E2E test writing is becoming obsolete. For now, though, programming languages remain the best way to give an AI precise instructions.
