
Alain Airom

Troubleshooting steps for “Build OpenAI-like web agents using IBM watsonx.ai”

This article describes my hands-on experience following “Build OpenAI-like web agents using IBM watsonx.ai”, an article from the IBM Developer site, and the fixes I needed to get its sample code running.

Disclaimer: the initial code and content are from the article “Build OpenAI-like web agents using IBM watsonx.ai” on the IBM Developer site.

Introduction

Being a very curious and hands-on person, I tried to run the code from the article mentioned above on my own. I do not expand on the article’s content here; instead, I focus on how I got it running in my environment and how to overcome the technical problems, in case someone else runs into them as well.

The key point of the article is (quoting the original author):

**“Operator is OpenAI’s new AI-driven web agent designed to perform repetitive and mundane web browsing tasks.

Operator have the following features:

  • Task automation: Handles travel bookings, shopping, and reservations.
  • Browser emulation: Operates in a dedicated browser window for safety and privacy.
  • Human-like interaction: Uses buttons, menus, and forms to mimic human behavior.
  • Model integration: Powered by GPT-4 Vision for reasoning and understanding.
  • Ethical boundaries: Asks for user confirmation before completing external tasks.
  • Limitations: Acknowledges when it’s stuck or cannot complete a sensitive task.”**

To explore it on my own, I tried to reproduce the sample code provided. Since the code would not run in my environment, I proceeded as follows.

Code troubleshooting and execution

First, I created my own “.env” file with the required watsonx.ai keys.

WATSONX_API_KEY="your_api_key"
WATSONX_PROJECT_ID="your_project_id"
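Before starting the agent, it can be worth confirming that both keys are actually visible to Python. The following is a minimal sketch using only the standard library. `find_missing` and `REQUIRED_VARS` are names made up for illustration, and the check assumes the variables have already been loaded into the environment (for example by `load_dotenv()` from python-dotenv, or by exporting them in the shell).

```python
import os

# Variable names taken from the .env file above.
REQUIRED_VARS = ("WATSONX_API_KEY", "WATSONX_PROJECT_ID")


def find_missing(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All watsonx.ai variables are set.")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without touching the real environment.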

Installing the required Python libraries requires Python 3.12.

# 
pip install langchain-ibm python-dotenv browser-use
#

Because I had recently upgraded my global Python from 3.10 to 3.13, I had to create a virtual Python environment dedicated to this code. Here are my steps 👇.

brew install pyenv-virtualenv
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
source ~/.zshrc  # or: source ~/.bashrc

pyenv install 3.12.0
pyenv virtualenv --help
pyenv virtualenv 3.12.0 my-langchain-env-312
pyenv activate my-langchain-env-312

pip install langchain-ibm
pip install python-dotenv
pip install browser-use
# or pip install -r requirements.txt (if you make one)
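To verify that the activated environment is the one you expect, a small sanity check can help. The sketch below is not part of the original article: `EXPECTED_VERSION`, `version_matches` and `missing_modules` are hypothetical names, and the module names listed are simply the import names used by the script in this post.

```python
import importlib.util
import sys

# Assumption: the Python version used for the virtual environment above.
EXPECTED_VERSION = (3, 12)
# Import names of the packages installed with pip above.
MODULES = ("langchain_ibm", "browser_use", "dotenv")


def version_matches(version=None, expected=EXPECTED_VERSION):
    """True when the major.minor part of the version equals the expected one."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) == tuple(expected)


def missing_modules(names=MODULES):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    status = "OK" if version_matches() else "unexpected version"
    print("Python:", sys.version.split()[0], status)
    missing = missing_modules()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it with the virtual environment activated; if it reports a different Python version, the shell is probably still using the global interpreter.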

I made some slight modifications to the original code👇.

import os
import asyncio
from dotenv import load_dotenv
from langchain_ibm import ChatWatsonx
from browser_use import Agent

load_dotenv()

WATSONX_URL = "https://us-south.ml.cloud.ibm.com"
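WATSONX_API_KEY = os.getenv("WATSONX_API_KEY")
WATSONX_PROJECT_ID = os.getenv("WATSONX_PROJECT_ID")
if not WATSONX_API_KEY or not WATSONX_PROJECT_ID:
    raise RuntimeError("Set WATSONX_API_KEY and WATSONX_PROJECT_ID in your .env file")
# langchain-ibm reads the API key from the WATSONX_APIKEY environment variable
os.environ["WATSONX_APIKEY"] = WATSONX_API_KEY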

def initialize_chat_model():
    return ChatWatsonx(
        model_id="meta-llama/llama-3-2-90b-vision-instruct",
        url=WATSONX_URL,
        project_id=WATSONX_PROJECT_ID,
    )

async def run_web_agent(task, llm):
    try:
        agent = Agent(
            task=task,
            llm=llm,
        )
        result = await agent.run()
        print("Extracted Content:")
        # extracted_content() is a method on the history object returned by agent.run()
        print(result.extracted_content())
        return result
    except Exception as e:
        print(f"Error running the web agent: {e}")

async def main():
    llm = initialize_chat_model()
    task = "Google search OpenAI Operator, click on the first link, and return key features."
    await run_web_agent(task, llm)  # You can await here, inside the async function

if __name__ == "__main__":
    asyncio.run(main())  # Correct way to run the async main() function
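When something fails, it helps to know whether the problem is in the async plumbing or in the libraries. One way to isolate this is a dry run with stand-in classes that need no credentials and open no browser. The sketch below is only an illustration: `FakeAgent` and `FakeHistory` are hypothetical stubs, not part of browser-use, and the `agent_cls` parameter is added here purely so that the stub can be swapped in.

```python
import asyncio


class FakeHistory:
    """Hypothetical stand-in for the object returned by agent.run()."""

    def extracted_content(self):
        return ["stub content"]


class FakeAgent:
    """Hypothetical stand-in for an agent class; it opens no browser."""

    def __init__(self, task, llm):
        self.task = task
        self.llm = llm

    async def run(self):
        if not self.task:
            raise ValueError("empty task")
        return FakeHistory()


async def run_web_agent(task, llm, agent_cls=FakeAgent):
    """Same try/except shape as the script above, with a swappable agent class."""
    try:
        result = await agent_cls(task=task, llm=llm).run()
        return result.extracted_content()
    except Exception as e:
        print(f"Error running the web agent: {e}")
        return None


if __name__ == "__main__":
    print(asyncio.run(run_web_agent("demo task", llm=None)))
```

If this dry run behaves as expected but the real script does not, the issue is more likely in the credentials, the model configuration, or the browser setup than in the async code.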

With these changes, the code ran like a charm 😉

INFO     [agent] 🚀 Starting task: Google search OpenAI Operator, click on the first link, and return key features.
INFO     [agent] 📍 Step 1
INFO     [ibm_watsonx_ai.wml_resource] Successfully finished chat for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-02-05'
INFO     [agent] 👍 Eval: Success openid
INFO     [agent] 🧠 Memory: Created new tab with new link
INFO     [agent] 🎯 Next goal: Click all weird popups and close them. The popup should be easy to click. After that go to https://www.openai.com and click on the first link of the result
INFO     [agent] 🛠️  Action 1/1: {"open_tab":{"url":"https://www.google.com"}}
INFO     [controller] 🔗  Opened new tab with https://www.google.com
...

Conclusion

In this article, I walked through running and troubleshooting an OpenAI-like web agent using watsonx.ai!

Thanks for reading 😎
