Introduction
End-to-end (E2E) testing is crucial for ensuring software quality, yet it is often costly. Writing and maintaining test scripts is time-consuming, and even minor DOM changes can cause test failures.
While some engineers enjoy writing tests, few specialize in E2E testing. The tech industry has responded with numerous automated and no-code E2E testing solutions, but these tools are often expensive and lack precision.
browser-use offers a new approach by automating browser interactions. Given its capabilities, I experimented with using it for E2E test automation.
Overview
In this experiment, I used the following tools and services:
- Tools: Python, browser-use, Playwright, Jest
- Test site: Sauce Demo
  - A mock e-commerce site widely used for testing
  - Fully mocked backend with publicly available test accounts
- Goal: Minimize manual effort in E2E testing
Key Takeaway
browser-use performs well for this task, but the workflow requires optimization.
The complete source code is available in this repository:
GitHub - browser-use E2E test automation
Strategy
I initially expected to fully automate E2E testing with a single prompt like "Test this site with E2E!" However, the process required structured steps:
- Extract the site structure using browser-use
- Generate test scenarios for each page
- Review and refine the generated scenarios manually
- Generate test code based on the reviewed scenarios
- Execute the generated test code
Step 1: Extracting the Site Structure
The first step was extracting the site's structure with browser-use to create a list of pages for testing.
Processing the entire site at once did not work effectively, likely due to LLM processing constraints. Handling each page separately improved stability.
Prompt Example
site_structure_task = f"""
Analyze the website starting from {url}. Identify and output:
1. All accessible pages and subpages within the domain ({url}), including dynamically loaded content.
2. Each page's purpose in concise terms.
3. Include:
- Static links
- JavaScript-driven links
- Form submissions
- API endpoints (if visible)
4. Group similar structured pages (e.g., query parameters like ?id=).
## Output JSON Format:
[
{ "path": "<path or URL>", "purpose": "<brief description>" },
...
]
## Login Information
- id: {user_id}
- password: {password}
"""
Step 2: Generating Test Scenarios
Once the site structure was extracted, test scenarios were generated for each page. By generating scenarios in natural language first, I achieved:
- Better manual review
- More stable test code generation
Prompt Example
Since I can review Japanese output more accurately, I set scenario_language to Japanese.
scenario_task = f"""
Generate exhaustive test scenarios for the following page:
- Page: {page_path}
Purpose: {page_purpose}
For this page, include all possible user actions, such as:
- Form submissions
- Button clicks
- Dropdown selections
- Interactions with modals or dynamic elements
Test both expected behaviors and edge cases for each action.
Output format:
path: {page_path},
actions:
- test: <description of action>,
expect: <expected result>,
- test: <description of action>,
expect: <expected result>,
The output must be written in {scenario_language}.
## Root URL
{url}
## Login Information
- id: {user_id}
- password: {password}
"""
Sample Output
path: /,
actions:
- test: Enter correct username and password, then click login,
expect: Redirect to user dashboard,
- test: Leave username blank and attempt login,
expect: Show error message,
- test: Enter invalid username and attempt login,
expect: Show error message,
Step 3: Generating Test Code
Using the reviewed scenarios, Jest and Playwright-based test code was generated. Although an LLM agent can execute tests directly, generating structured test code is more reliable and cost-effective.
Prompt Example
task = f"""
Generate Jest + Playwright test code for:
- URL: {url}
- Scenario: {scenario}
Ensure the output is fully executable without modification.
"""
Generated Test Code Example
const { test, expect } = require('@playwright/test');

test.describe('Login Tests', () => {
  test('Valid login', async ({ page }) => {
    await page.goto('https://www.saucedemo.com/');
    await page.fill('input[name="user-name"]', 'standard_user');
    await page.fill('input[name="password"]', 'secret_sauce');
    await page.click('input[name="login-button"]');
    await expect(page).toHaveURL('https://www.saucedemo.com/inventory.html');
  });
});
For pages behind login, the generated files also correctly include a beforeEach hook:
const { test, expect } = require('@playwright/test');

test.describe('Checkout Step One Tests', () => {
  test.beforeEach(async ({ page }) => {
    await page.goto('https://www.saucedemo.com/');
    await page.fill('#user-name', 'standard_user');
    await page.fill('#password', 'secret_sauce');
    await page.click('#login-button');
    await page.goto('https://www.saucedemo.com/checkout-step-one.html');
  });

  test('User fills in all fields correctly and clicks the Continue button', async ({ page }) => {
    await page.fill('#first-name', 'John');
    await page.fill('#last-name', 'Doe');
    await page.fill('#postal-code', '12345');
    await page.click('#continue');
    await expect(page).toHaveURL('https://www.saucedemo.com/checkout-step-two.html');
  });
});
Test Execution & Results
Executing the generated suite ran 44 tests, 18 of which failed. However, many of the failures were reasonable:
- Incorrect Expectations (5 tests)
  - Example: Expected an error when entering numbers in the name field, but the site allowed it.
  - The test scenarios were generated based on ideal behavior, but the actual behavior of the test site differed.
  - This is not an issue with the test code itself but rather a mismatch between the expected functionality and the site's actual implementation.
- Test Runner Mismatch (12 tests)
  - The tests assumed Playwright's runner, but Jest Circus was used, causing failures. This can be fixed by specifying the correct runner in the prompt, as sketched below.
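For example, a single extra constraint appended to the generation prompt would pin the runner (the wording below is mine, not from the article):

task += (
    "\nTarget the @playwright/test runner only; do not use Jest-specific APIs."
    "\nThe tests will be executed with `npx playwright test`."
)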
Cost Analysis
Using OpenAI’s GPT-4o API, I ran multiple tests, totaling $7. However, with optimized prompts, the entire pipeline (site structure analysis → scenario generation → test code generation) costs under $1.
Conclusion
Manual E2E test writing is becoming obsolete. For now, however, programming languages remain the best way to give an AI precise instructions.