Overview
A browser agent receives an instruction (e.g., “Open example.com and get the page title”) and then performs browser actions—click, type, navigate, scrape—on a remote or headless browser. The steps below let you:
Use Vercel AI SDK + Anthropic or OpenAI to interpret the user’s instruction.
Call “Browser Use (Open)” or Multi-On to automate a browser session in the cloud, without installing heavy local dependencies.
Return the browser’s output (like scraped text) to the user in a Next.js front end.
We’ll also highlight a Vercel Template to help you get the UI running fast, plus an example using Llama-2-7b-chat on Replicate if you prefer open-source.
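Concretely, the agent works by having the LLM translate the user’s free-form request into a small JSON command, which the browser layer then executes. This is the shape the API route in Step 2 asks the model to produce (the action can also be "click", "fillForm", or "none"):
{
  "url": "https://example.com",
  "action": "scrapeText",
  "selector": "h1",
  "inputText": ""
}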
Ingredients (Est. 15 min)
- Next.js (13+ recommended)
  - Quick Start: create-next-app
  - Or use a Vercel Template
- Vercel AI SDK
  - Docs: sdk.vercel.ai
- LLM
  - Default: Anthropic or OpenAI
  - Alternative (Open Source): Llama-2-7b-chat on Replicate
- Browser Automation
  - Browser Use (Open) or Multi-On
(Est. 15 min) to sign up for an LLM service, select a Next.js template on Vercel, and install dependencies.
Step 1: Set Up Your Next.js Project (Est. 15 min)
Option A: Create from scratch
npx create-next-app browser-agent
cd browser-agent
Option B: Use a Vercel Template
- Visit vercel.com/templates.
- Select a Next.js starter (e.g. “Next.js 13 Minimal Starter”).
- Click “Deploy,” then clone it locally (or directly code in the Vercel environment).
Install the Vercel AI SDK
npm install ai @ai-sdk/anthropic
# or, if you prefer OpenAI: npm install ai @ai-sdk/openai
Choose Your Browser Framework
- Browser Use (Open):
npm install browser-use
- Or Multi-On Cookbook (replace references accordingly in the code below).
Add Environment Variables
Create a .env.local file:
ANTHROPIC_API_KEY=your_anthropic_key
# OR
OPENAI_API_KEY=your_openai_key
# If using Llama-2-7b-chat on Replicate:
REPLICATE_API_TOKEN=...
(Est. 15 min) for setting up the template, installing deps, and adding environment variables.
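If you want the route to fail fast with a clear message when a key is missing, a small guard like this can help (a hypothetical helper — adjust the variable names to whichever provider you chose):
// lib/check-env.js (hypothetical) — call this at the top of your API route
export function checkEnv() {
  const hasLLMKey =
    process.env.ANTHROPIC_API_KEY ||
    process.env.OPENAI_API_KEY ||
    process.env.REPLICATE_API_TOKEN;
  if (!hasLLMKey) {
    throw new Error(
      'Missing LLM credentials: set ANTHROPIC_API_KEY, OPENAI_API_KEY, or REPLICATE_API_TOKEN in .env.local'
    );
  }
}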
Step 2: Browser Agent API Route (Est. 30 min)
Create a route in Next.js 13 at app/api/agent/route.js (or pages/api/agent.js if using Next.js 12). If you used a Vercel Template, just add this file:
// app/api/agent/route.js
import { NextResponse } from 'next/server';
import { launch } from 'browser-use';
//-------------------------------------------------------------------
// 1. CHOOSE YOUR LLM OPTION
//-------------------------------------------------------------------
// Option A: Anthropic/OpenAI via the Vercel AI SDK
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic'; // or: import { openai } from '@ai-sdk/openai';
// Option B: Llama-2-7b-chat via Replicate (custom fetch)
async function fetchLlama2Replicate(userPrompt) {
const REPLICATE_URL = 'https://api.replicate.com/v1/predictions';
const response = await fetch(REPLICATE_URL, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Token ${process.env.REPLICATE_API_TOKEN}`,
},
body: JSON.stringify({
version: '<llama-2-7b-chat version hash>', // Replicate needs a full version hash; copy it from the model page
input: {
prompt: userPrompt,
},
}),
});
  const replicateData = await response.json();
  // Replicate predictions are asynchronous: the first response is often still
  // "starting"/"processing", so you may need to poll for completion
  // (see the optional pollReplicatePrediction sketch below).
  // Llama-2 returns its output as an array of string chunks once it finishes.
  const output = replicateData?.output;
  return Array.isArray(output) ? output.join('') : (output ?? 'No response yet');
}
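// (Optional) Hypothetical polling helper, sketched under the assumption that
// you want to wait for the prediction instead of returning early. Replicate
// predictions expose a status plus a urls.get endpoint you can re-fetch until
// the status becomes "succeeded" or "failed".
async function pollReplicatePrediction(prediction, { intervalMs = 1000, maxTries = 30 } = {}) {
  let current = prediction;
  for (let i = 0; i < maxTries; i++) {
    if (current.status === 'succeeded' || current.status === 'failed') break;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    const res = await fetch(current.urls.get, {
      headers: { Authorization: `Token ${process.env.REPLICATE_API_TOKEN}` },
    });
    current = await res.json();
  }
  return current;
}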
//-------------------------------------------------------------------
// 2. MAIN HANDLER
//-------------------------------------------------------------------
export async function POST(request) {
try {
const { userPrompt, useReplicate } = await request.json();
// If user wants to use Llama-2-7b-chat on Replicate:
if (useReplicate) {
const replicateResponse = await fetchLlama2Replicate(`
The user says: "${userPrompt}".
Please output a JSON with:
{"url":"https://...","action":"scrapeText","selector":"..."}
`);
// parse replicateResponse to get instructions
let instructions;
try {
instructions = JSON.parse(replicateResponse.trim());
} catch {
instructions = { url: "", action: "none", selector: "", inputText: "" };
}
return await handleBrowserActions(instructions);
    } else {
      // Otherwise, call Anthropic (or OpenAI) through the Vercel AI SDK
      const { text: llmText } = await generateText({
        // anthropic() reads ANTHROPIC_API_KEY from the environment
        model: anthropic('claude-3-haiku-20240307'), // or: openai('gpt-3.5-turbo')
        prompt: `
          The user says: "${userPrompt}".
          Please output a JSON object with:
          {
            "url": "https://...",
            "action": "scrapeText" or "click" or "fillForm",
            "selector": "...",
            "inputText": "..."
          }
          If not sure, set "action":"none".
        `,
      });
      // Parse the LLM response into browser instructions
      // (the optional extractJson helper below is more forgiving)
      let instructions;
      try {
        instructions = JSON.parse(llmText.trim());
      } catch {
        instructions = { url: "", action: "none", selector: "", inputText: "" };
      }
      return await handleBrowserActions(instructions);
    }
} catch (error) {
console.error(error);
return NextResponse.json({ success: false, error: error.message }, { status: 500 });
}
}
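// (Optional) Hypothetical helper: models sometimes wrap their JSON in prose or
// code fences, so pulling out the first {...} block before parsing is more
// forgiving than calling JSON.parse on the raw completion.
function extractJson(text) {
  const fallback = { url: '', action: 'none', selector: '', inputText: '' };
  const match = typeof text === 'string' ? text.match(/\{[\s\S]*\}/) : null;
  if (!match) return fallback;
  try {
    return JSON.parse(match[0]);
  } catch {
    return fallback;
  }
}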
//-------------------------------------------------------------------
// 3. HELPER to launch browser & perform actions
//-------------------------------------------------------------------
async function handleBrowserActions(instructions) {
  // Launch a headless browser session (Browser Use is assumed to expose a
  // Playwright-style page object here)
  const { page, browser } = await launch();
  try {
    await page.goto(instructions.url || 'https://example.com');
    let result;
    switch (instructions.action) {
      case 'scrapeText':
        result = await page.textContent(instructions.selector || 'h1');
        break;
      case 'click':
        await page.click(instructions.selector || 'body');
        result = 'Clicked the element!';
        break;
      case 'fillForm':
        if (instructions.inputText) {
          await page.fill(instructions.selector, instructions.inputText);
          result = `Filled form with: ${instructions.inputText}`;
        } else {
          result = 'No inputText provided.';
        }
        break;
      default:
        result = 'No recognized action or action was "none". Did nothing.';
        break;
    }
    return NextResponse.json({ success: true, instructions, result });
  } finally {
    // Always close the browser, even if an action throws
    await browser.close();
  }
}
Notes
- useReplicate: We added this field so you can toggle between the Replicate Llama-2 call and the default Anthropic/OpenAI approach with minimal code changes.
- If you only want Llama-2-7b-chat on Replicate, remove the Anthropic/OpenAI code and rely on fetchLlama2Replicate.
(Est. 30 min) to write, test, and debug this route.
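Before building the UI, you can sanity-check the route from a small Node script (a quick sketch; it assumes npm run dev is serving the app on http://localhost:3000 and Node 18+ for the built-in fetch):
// test-agent.mjs — run with: node test-agent.mjs
const res = await fetch('http://localhost:3000/api/agent', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    userPrompt: 'Go to https://example.com and scrapeText at "h1"',
    useReplicate: false,
  }),
});
console.log(await res.json());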
Step 3: Frontend UI (Est. 15 min)
If you used a Vercel Template with a default homepage, you can replace or edit that page. For Next.js 13:
"use client";
import { useState } from "react";
export default function Home() {
const [userPrompt, setUserPrompt] = useState("");
const [agentOutput, setAgentOutput] = useState("");
const [useReplicate, setUseReplicate] = useState(false);
async function handleSubmit() {
setAgentOutput("Loading...");
const res = await fetch("/api/agent", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ userPrompt, useReplicate }),
});
const data = await res.json();
if (data.success) {
setAgentOutput(
`Action done!\n\nInstructions: ${JSON.stringify(
data.instructions
)}\nResult: ${data.result}`
);
} else {
setAgentOutput(`Error: ${data.error}`);
}
}
return (
<main style={{ padding: 20 }}>
<h1>Browser Agent Demo</h1>
<p>Try: "Go to https://example.com and scrapeText at 'h1'."</p>
<input
style={{ width: 300 }}
value={userPrompt}
onChange={(e) => setUserPrompt(e.target.value)}
/>
<div style={{ marginTop: 10 }}>
<label>
<input
type="checkbox"
checked={useReplicate}
onChange={(e) => setUseReplicate(e.target.checked)}
/>
Use Llama-2-7b-chat on Replicate
</label>
</div>
<button style={{ marginTop: 10 }} onClick={handleSubmit}>Run</button>
<pre style={{ marginTop: 20 }}>{agentOutput}</pre>
</main>
);
}
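If you want the page to fail gracefully when the request itself errors out (network issues, a crashed route), you can wrap the fetch in a try/catch; this is a small optional variant of the handleSubmit above:
async function handleSubmit() {
  setAgentOutput("Loading...");
  try {
    const res = await fetch("/api/agent", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ userPrompt, useReplicate }),
    });
    const data = await res.json();
    setAgentOutput(
      data.success
        ? `Action done!\n\nInstructions: ${JSON.stringify(data.instructions)}\nResult: ${data.result}`
        : `Error: ${data.error}`
    );
  } catch (err) {
    setAgentOutput(`Request failed: ${err.message}`);
  }
}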
(Est. 15 min) to build a simple input form, fetch the API, and display results.
Step 4: Deploy to Vercel (Est. 15 min)
- Push to GitHub (or GitLab).
- Create a New Project on Vercel (if you didn’t do so at the template stage).
- In Project Settings → Environment Variables, add:
  - ANTHROPIC_API_KEY or OPENAI_API_KEY
  - REPLICATE_API_TOKEN, if using Llama-2-7b-chat on Replicate
- Deploy.
You’ll get a production URL like https://browser-agent.vercel.app.
(Est. 15 min) to finalize environment variables and deployment.
Step 5: Example Use Case — Get Headline from Hacker News
Now that your Browser Agent is running, let’s try a real-world example. We’ll scrape the latest headline from the front page of Hacker News.
- Go to your deployed site, e.g. https://browser-agent.vercel.app.
- Enter a prompt:
Go to https://news.ycombinator.com and scrapeText at '.titleline a'
This instructs the LLM to generate JSON instructions like:
{
  "url": "https://news.ycombinator.com",
  "action": "scrapeText",
  "selector": ".titleline a"
}
- Agent executes: the server route interprets the instructions, launches a remote browser, navigates to Hacker News, and scrapes .titleline a.
- Response: the JSON returned might look like:
{ "success": true, "instructions": { "url": "https://news.ycombinator.com", "action": "scrapeText", "selector": ".titleline a" }, "result": "Example Headline from HN" }
Your UI shows “Action done!” plus the scraped headline (a sketch for grabbing every headline, not just the first, follows below).
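The scrapeText action returns only the first match for the selector. If you want every front-page title, one option is to add an extra case to handleBrowserActions; this is a hypothetical sketch that assumes the Browser Use page object supports a Playwright-style $$eval:
case 'scrapeAllText': {
  // Collect the text of every element matching the selector
  const texts = await page.$$eval(
    instructions.selector || '.titleline a',
    (els) => els.map((el) => el.textContent.trim())
  );
  result = texts.join('\n');
  break;
}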
More Potential Browser Agent Use Cases
Once you have a Browser Agent, you can easily add new tasks. For example:
- Auto-Form Filling: “Go to example.com/login, fillForm with username: myUser, password: myPass, then click on ‘#submitBtn’.” (A multi-step sketch follows this list.)
- Price Comparison: Scrape prices from multiple e-commerce sites to find the best deal, then combine the results.
- Content Extraction: Collect blog post titles, meta descriptions, or images across a list of websites.
- Email Reading (in Webmail): Navigate to the Gmail/Outlook web UI, log in, parse unread messages, maybe even respond.
- Automated Testing: Provide a test scenario, e.g., “Go to my staging URL, fill the form with test data, confirm the success message.”
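Several of these tasks need more than one browser action per request. One way to support that, sketched here under the assumption that you change the prompt so the LLM returns a steps array instead of a single action, is a loop-based variant of handleBrowserActions:
// Hypothetical multi-step variant: expects instructions shaped like
// { "steps": [ { "action": "fillForm", ... }, { "action": "click", ... } ] }
async function handleBrowserSteps(steps) {
  const { page, browser } = await launch();
  const results = [];
  try {
    for (const step of steps) {
      if (step.url) await page.goto(step.url); // only navigate when a URL is given
      if (step.action === 'fillForm') {
        await page.fill(step.selector, step.inputText);
        results.push(`Filled ${step.selector}`);
      } else if (step.action === 'click') {
        await page.click(step.selector);
        results.push(`Clicked ${step.selector}`);
      } else if (step.action === 'scrapeText') {
        results.push(await page.textContent(step.selector));
      }
    }
    return NextResponse.json({ success: true, results });
  } finally {
    await browser.close();
  }
}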
Time Summary & Final Notes
- Initial Setup: (Est. 15 min)
- API Route: (Est. 30 min)
- Front End: (Est. 15 min)
- Deployment: (Est. 15 min)
Total: ~1.5 hours for a basic functional version!
Congrats! You now have a Browser Agent that can interpret user prompts, run tasks in a headless browser, and return results—all from a Next.js front end deployed on Vercel. You’ve even tested scraping headlines from Hacker News and seen how easy it is to integrate Llama-2-7b-chat on Replicate. Next, explore multi-step flows, advanced UI, or a different use case to take your agent even further.