Emil Anker

Starter Guide: Browser Agents

Overview

A browser agent receives an instruction (e.g., “Open example.com and get the page title”) and then performs browser actions—click, type, navigate, scrape—in a remote or headless browser. The steps below let you:

  1. Use Vercel AI SDK + Anthropic or OpenAI to interpret the user’s instruction.

  2. Call “Browser Use (Open)” or Multi-On to automate a browser session in the cloud, without installing heavy local dependencies.

  3. Return the browser’s output (like scraped text) to the user in a Next.js front end.

We’ll also highlight a Vercel Template to help you get the UI running fast, plus an example using Llama-2-7b-chat on Replicate if you prefer open-source.
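Whichever LLM or browser service you pick, the whole setup converges on one small contract: the LLM must produce a JSON instruction object that the browser layer can act on. As a preview, here is a minimal sketch of that contract as a defensive normalizer (`normalizeInstructions` is my own name, not part of any SDK):

```javascript
// Hypothetical helper: coerce whatever the LLM returned into a safe
// instruction object. Unknown or missing actions fall back to "none".
function normalizeInstructions(raw) {
  const allowed = ['scrapeText', 'click', 'fillForm', 'none'];
  let parsed;
  try {
    parsed = typeof raw === 'string' ? JSON.parse(raw) : (raw ?? {});
  } catch {
    parsed = {};
  }
  return {
    url: typeof parsed.url === 'string' ? parsed.url : '',
    action: allowed.includes(parsed.action) ? parsed.action : 'none',
    selector: typeof parsed.selector === 'string' ? parsed.selector : '',
    inputText: typeof parsed.inputText === 'string' ? parsed.inputText : '',
  };
}
```

Keeping this contract narrow is what makes the agent safe to run: anything the model emits outside the allowed shape degrades to a no-op instead of an arbitrary browser action.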


Ingredients (Est. 15 min)

  1. Next.js (13+ recommended)
  2. Vercel AI SDK
  3. An LLM provider (Anthropic, OpenAI, or an open-source model like Llama-2-7b-chat on Replicate)
  4. A browser-automation service (Browser Use or Multi-On)

The estimate covers signing up for an LLM service, selecting a Next.js template on Vercel, and installing dependencies.


Step 1: Set Up Your Next.js Project (Est. 15 min)

Option A: Create from scratch

npx create-next-app browser-agent
cd browser-agent

Option B: Use a Vercel Template

  1. Visit vercel.com/templates.
  2. Select a Next.js starter (e.g. “Next.js 13 Minimal Starter”).
  3. Click “Deploy,” then clone it locally (or directly code in the Vercel environment).

Install the Vercel AI SDK

The SDK's core package is ai; provider bindings such as @ai-sdk/anthropic or @ai-sdk/openai are installed alongside it:

npm install ai @ai-sdk/anthropic

Choose Your Browser Framework

  • Browser Use (Open):
  npm install browser-use
  • Or Multi-On Cookbook (replace references accordingly in the code below).

Add Environment Variables

Create .env.local:

ANTHROPIC_API_KEY=your_anthropic_key
# OR
OPENAI_API_KEY=your_openai_key

# If using Llama-2-7b-chat on Replicate:
REPLICATE_API_TOKEN=...

(Est. 15 min) for setting up the template, installing deps, and adding environment variables.
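Since the API route will silently pick a provider based on which key is present, it can help to fail fast at startup when nothing is configured. A small sketch (`requireLlmKey` is a hypothetical helper, not part of any SDK):

```javascript
// Hypothetical guard: throw a clear error early if no LLM credential is set,
// instead of failing deep inside a request handler at runtime.
function requireLlmKey(env) {
  const key =
    env.ANTHROPIC_API_KEY || env.OPENAI_API_KEY || env.REPLICATE_API_TOKEN;
  if (!key) {
    throw new Error(
      'Missing LLM credentials: set ANTHROPIC_API_KEY, OPENAI_API_KEY, or REPLICATE_API_TOKEN'
    );
  }
  return key;
}
```

Call it once, e.g. at the top of your route module, with `requireLlmKey(process.env)`.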


Step 2: Browser Agent API Route (Est. 30 min)

Create a route in Next.js 13 at app/api/agent/route.js (or pages/api/agent.js if you're on the Pages Router). If you used a Vercel Template, just add this file:

// app/api/agent/route.js
import { NextResponse } from 'next/server';
import { launch } from 'browser-use'; 

//-------------------------------------------------------------------
// 1. CHOOSE YOUR LLM OPTION
//-------------------------------------------------------------------

// Option A: Anthropic/OpenAI via the Vercel AI SDK
// (the SDK's core package is `ai`; providers live in @ai-sdk/anthropic, @ai-sdk/openai)
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

// Option B: Llama-2-7b-chat via Replicate (custom fetch)
async function fetchLlama2Replicate(userPrompt) {
  const REPLICATE_URL = 'https://api.replicate.com/v1/predictions';
  const response = await fetch(REPLICATE_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Token ${process.env.REPLICATE_API_TOKEN}`,
    },
    body: JSON.stringify({
      // Replicate expects a specific version hash; look it up on the model page
      version: 'replicate/llama-2-7b-chat:<version-hash>',
      input: {
        prompt: userPrompt,
      },
    }),
  });
  const replicateData = await response.json();
  // Replicate predictions are asynchronous: in practice you must poll the
  // prediction URL until status is "succeeded" before output is available.
  const text = replicateData?.output ?? 'No response yet';
  return text;
}

//-------------------------------------------------------------------
// 2. MAIN HANDLER
//-------------------------------------------------------------------

export async function POST(request) {
  try {
    const { userPrompt, useReplicate } = await request.json();

    // If user wants to use Llama-2-7b-chat on Replicate:
    if (useReplicate) {
      const replicateResponse = await fetchLlama2Replicate(`
        The user says: "${userPrompt}".
        Please output ONLY a JSON object:
        {"url":"https://...","action":"scrapeText","selector":"..."}
      `);
      // Parse the response; fall back to a safe no-op on invalid JSON
      let instructions;
      try {
        instructions = JSON.parse(String(replicateResponse).trim());
      } catch {
        instructions = { url: "", action: "none", selector: "", inputText: "" };
      }

      return await handleBrowserActions(instructions);
    } else {
      // Otherwise, Anthropic/OpenAI via the Vercel AI SDK
      const { text: llmResponse } = await generateText({
        model: anthropic('claude-3-haiku-20240307'), // or an OpenAI model via @ai-sdk/openai
        prompt: `
          The user says: "${userPrompt}".
          Please output ONLY a JSON object:
          {
            "url": "https://...",
            "action": "scrapeText" | "click" | "fillForm",
            "selector": "...",
            "inputText": "..."
          }
          If not sure, set "action": "none".
        `,
        maxTokens: 150,
      });

      // Parse the LLM response; fall back to a safe no-op on invalid JSON
      let instructions;
      try {
        instructions = JSON.parse(llmResponse.trim());
      } catch {
        instructions = { url: "", action: "none", selector: "", inputText: "" };
      }

      return await handleBrowserActions(instructions);
    }
  } catch (error) {
    console.error(error);
    return NextResponse.json({ success: false, error: error.message }, { status: 500 });
  }
}

//-------------------------------------------------------------------
// 3. HELPER to launch browser & perform actions
//-------------------------------------------------------------------
async function handleBrowserActions(instructions) {
  // NOTE: `launch()` here assumes a Playwright-style API that returns
  // { page, browser }; adapt this to whatever your automation framework exposes.
  const { page, browser } = await launch();
  try {
    await page.goto(instructions.url || 'https://example.com');

    let result;
    switch (instructions.action) {
      case 'scrapeText':
        result = await page.textContent(instructions.selector || 'h1');
        break;
      case 'click':
        await page.click(instructions.selector || 'body');
        result = 'Clicked the element!';
        break;
      case 'fillForm':
        if (instructions.inputText) {
          await page.fill(instructions.selector, instructions.inputText);
          result = `Filled form with: ${instructions.inputText}`;
        } else {
          result = 'No inputText provided.';
        }
        break;
      default:
        result = 'No recognized action or action was "none". Did nothing.';
        break;
    }

    return NextResponse.json({ success: true, instructions, result });
  } finally {
    // Always close the browser, even if an action throws
    await browser.close();
  }
}

Notes

  • useReplicate: We added this field so you can toggle between a Replicate Llama-2 call or the default Anthropic/OpenAI approach with minimal code changes.
  • If you only want Llama-2-7b-chat on Replicate, remove the Anthropic/OpenAI code and rely on fetchLlama2Replicate.

(Est. 30 min) to write, test, and debug this route.
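One practical wrinkle: models often wrap their JSON in prose or markdown fences, so a bare JSON.parse on the trimmed response fails more often than it should. A more forgiving sketch that extracts the first {...} block (`extractJson` is my own name, not a library function):

```javascript
// Hypothetical helper: pull the first {...} block out of an LLM reply
// that may be wrapped in explanation text or markdown fences.
function extractJson(text) {
  const match = String(text).match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    return JSON.parse(match[0]);
  } catch {
    return null;
  }
}
```

Swap this in for the try/catch around JSON.parse in the route if you find the model chatty; a null return can map to the same `{ action: "none" }` fallback.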


Step 3: Frontend UI (Est. 15 min)

If you used a Vercel Template with a default homepage, you can replace or edit that page. For Next.js 13:

"use client";
import { useState } from "react";

export default function Home() {
  const [userPrompt, setUserPrompt] = useState("");
  const [agentOutput, setAgentOutput] = useState("");
  const [useReplicate, setUseReplicate] = useState(false);

  async function handleSubmit() {
    setAgentOutput("Loading...");
    const res = await fetch("/api/agent", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ userPrompt, useReplicate }),
    });
    const data = await res.json();
    if (data.success) {
      setAgentOutput(
        `Action done!\n\nInstructions: ${JSON.stringify(
          data.instructions
        )}\nResult: ${data.result}`
      );
    } else {
      setAgentOutput(`Error: ${data.error}`);
    }
  }

  return (
    <main style={{ padding: 20 }}>
      <h1>Browser Agent Demo</h1>
      <p>Try: "Go to https://example.com and scrapeText at 'h1'."</p>
      <input
        style={{ width: 300 }}
        value={userPrompt}
        onChange={(e) => setUserPrompt(e.target.value)}
      />
      <div style={{ marginTop: 10 }}>
        <label>
          <input
            type="checkbox"
            checked={useReplicate}
            onChange={(e) => setUseReplicate(e.target.checked)}
          />
          Use Llama-2-7b-chat on Replicate
        </label>
      </div>
      <button style={{ marginTop: 10 }} onClick={handleSubmit}>Run</button>
      <pre style={{ marginTop: 20 }}>{agentOutput}</pre>
    </main>
  );
}

(Est. 15 min) to build a simple input form, fetch the API, and display results.
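As the UI grows, the string-building inside handleSubmit is a natural thing to pull into a pure helper, which also makes it unit-testable (`formatAgentOutput` is a hypothetical name, mirroring the inline branching above):

```javascript
// Hypothetical helper: turn the API response into the display string,
// matching the success/error branching done inline in handleSubmit.
function formatAgentOutput(data) {
  if (!data.success) {
    return `Error: ${data.error}`;
  }
  return `Action done!\n\nInstructions: ${JSON.stringify(
    data.instructions
  )}\nResult: ${data.result}`;
}
```

Then handleSubmit reduces to fetch, parse, and `setAgentOutput(formatAgentOutput(data))`.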


Step 4: Deploy to Vercel (Est. 15 min)

  1. Push to GitHub (or GitLab).
  2. Create a New Project on Vercel (if you didn’t do so at the template stage).
  3. In Project Settings → Environment Variables, add:
    • ANTHROPIC_API_KEY or OPENAI_API_KEY
    • If using Llama-2-7b-chat on Replicate, add REPLICATE_API_TOKEN.
  4. Deploy.

You’ll get a production URL like https://browser-agent.vercel.app.

(Est. 15 min) to finalize environment variables and deployment.


Step 5: Example Use Case — Get Headline from Hacker News

Now that your Browser Agent is running, let’s try a real-world example. We’ll scrape the latest headline from the front page of Hacker News.

  1. Go to Your Deployed Site

    • e.g., https://browser-agent.vercel.app
  2. Enter a Prompt:

   Go to https://news.ycombinator.com and scrapeText at '.titleline a'

This instructs the LLM to generate JSON instructions like:

   {
     "url": "https://news.ycombinator.com",
     "action": "scrapeText",
     "selector": ".titleline a"
   }
  3. Agent Executes

    • The server route interprets instructions, launches a remote browser, navigates to Hacker News, and scrapes .titleline a.
  4. Response

    • The JSON returned might look like:
     {
       "success": true,
       "instructions": {
         "url": "https://news.ycombinator.com",
         "action": "scrapeText",
         "selector": ".titleline a"
       },
       "result": "Example Headline from HN"
     }
    
  • Your UI shows “Action done!” plus the scraped headline.

More Potential Browser Agent Use Cases

Once you have a Browser Agent, you can easily add new tasks. For example:

  1. Auto-Form Filling

    “Go to example.com/login, fillForm with username: myUser, password: myPass, then click on ‘#submitBtn’.”

  2. Price Comparison

    Scrape prices from multiple e-commerce sites to find the best deal, then combine the results.

  3. Content Extraction

    Collect blog post titles, meta descriptions, or images across a list of websites.

  4. Email Reading (in Webmail)

    Navigate to Gmail/Outlook web UI, log in, parse unread messages, maybe even respond.

  5. Automated Testing

    Provide a test scenario, e.g., “Go to my staging URL, fill the form with test data, confirm success message.”
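All of these use cases decompose into sequences of the same three primitives the route already handles. A multi-step runner is a small extension; here is a sketch against a Playwright-style page object (`runPlan` is my own name, and the page API is an assumption, not guaranteed by any particular framework):

```javascript
// Hypothetical multi-step runner: execute an ordered plan of instruction
// objects against a page that exposes textContent/click/fill.
async function runPlan(page, steps) {
  const results = [];
  for (const step of steps) {
    switch (step.action) {
      case 'scrapeText':
        results.push(await page.textContent(step.selector));
        break;
      case 'click':
        await page.click(step.selector);
        results.push(`clicked ${step.selector}`);
        break;
      case 'fillForm':
        await page.fill(step.selector, step.inputText);
        results.push(`filled ${step.selector}`);
        break;
      default:
        results.push('skipped unknown action');
    }
  }
  return results;
}
```

To use it, have the LLM emit an array of instruction objects instead of a single one, then replace the switch in handleBrowserActions with a call to runPlan.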


Time Summary & Final Notes

  • Initial Setup: (Est. 15 min)
  • API Route: (Est. 30 min)
  • Front End: (Est. 15 min)
  • Deployment: (Est. 15 min)

Total: ~1 hour 15 minutes for a basic functional version!

Congrats! You now have a Browser Agent that can interpret user prompts, run tasks in a headless browser, and return results—all from a Next.js front end deployed on Vercel. You’ve even tested scraping headlines from Hacker News and seen how easy it is to integrate Llama-2-7b-chat on Replicate. Next, explore multi-step flows, advanced UI, or a different use case to take your agent even further.
