foxgem

Code Explanation: "OpenManus: An Autonomous Agent Platform"

Disclaimer: this is a report generated with my tool: https://github.com/DTeam-Top/tsw-cli. See it as an experiment, not formal research. 😄


Summary

The OpenManus repository provides a platform for developing and running autonomous agents that can perform tasks by utilizing various tools. It includes several agent implementations, a flexible tool system, planning capabilities, and different execution flows. The core idea is to create agents that can "think" (plan), "act" (execute tools), and "observe" (process results) to achieve complex goals.

Modules

  • app: Contains the core application logic, including agent definitions, tool implementations, configuration, and flow management.
  • config (app/config.py): Handles application configuration, including LLM settings.
  • agent: Defines the base agent class and specific agent implementations such as ReActAgent, ToolCallAgent, SWEAgent, Manus, and PlanningAgent.
  • llm: Manages interactions with Large Language Models (LLMs) like OpenAI or Azure OpenAI.
  • tool: Implements various tools that agents can use, such as a bash shell, web browser, Google Search, and file saver.
  • prompt: Contains prompt templates used by the agents for different tasks.
  • schema: Defines data schemas for messages, tool calls, and agent states.
  • flow: Defines execution flows for agents, including planning flows.
  • examples: Provides example use cases and associated files.
  • config (top-level directory): Holds the configuration files (config.toml, config.example.toml) that app/config.py loads.
  • .github: Stores GitHub metadata such as issue and pull request templates.

Code Structure

Configuration (app/config.py)

This module handles the application's configuration using a singleton pattern. It loads settings from a TOML file (config.toml or config.example.toml) and defines data classes (LLMSettings, AppConfig) to structure the configuration.

import threading


class Config:
    _instance = None
    _lock = threading.Lock()
    _initialized = False

    def __new__(cls):
        # Double-checked locking: only one thread ever creates the instance.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # Load the TOML configuration only on the first initialization.
        if not self._initialized:
            with self._lock:
                if not self._initialized:
                    self._config = None
                    self._load_initial_config()
                    self._initialized = True

This implementation ensures that only one instance of the Config class exists throughout the application, preventing inconsistent configurations. The threading.Lock ensures thread-safe initialization in concurrent environments. The configuration is loaded only once during the first initialization. The LLM configurations support overrides, allowing different LLM settings for different agents or tasks. The PROJECT_ROOT and WORKSPACE_ROOT constants define the project's directory structure, which is essential for file operations performed by various tools.
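
For orientation, a config.toml for this setup might look like the sketch below. The field names mirror the LLMSettings data class and the override mechanism described above; the [llm.vision] section is an illustrative assumption, so treat config.example.toml in the repository as the authoritative reference.

# Sketch of a possible config.toml; field names and values are illustrative
[llm]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..."        # replace with a real key
max_tokens = 4096
temperature = 0.0

# Hypothetical named override, e.g. for a vision-capable model
[llm.vision]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..."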

Large Language Model Management (app/llm.py)

The LLM class manages interactions with Large Language Models (LLMs). It supports both OpenAI and Azure OpenAI, handling authentication, API calls, and message formatting.

class LLM:
    _instances: Dict[str, "LLM"] = {}

    def __new__(
        cls, config_name: str = "default", llm_config: Optional[LLMSettings] = None
    ):
        # Keep one cached LLM instance per configuration name.
        if config_name not in cls._instances:
            instance = super().__new__(cls)
            instance.__init__(config_name, llm_config)
            cls._instances[config_name] = instance
        return cls._instances[config_name]

It uses a singleton pattern to ensure that only one instance of the LLM class exists for each configuration name. It also includes retry logic with exponential backoff to handle API errors and rate limits, improving the reliability of LLM calls. The format_messages method ensures that messages are correctly formatted for the LLM API. It handles both dictionary and Message objects. The ask and ask_tool methods are the primary interfaces for interacting with the LLM. ask is used for standard text-based prompts, while ask_tool is used when tools/functions are involved.
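
As a rough usage sketch (the keyword names messages and tools, and the .tool_calls attribute on the returned message, are assumptions; check app/llm.py for the exact signatures):

# Hedged usage sketch -- argument names and return types are assumptions
import asyncio

from app.llm import LLM


async def demo():
    llm = LLM()  # shared "default" instance, configured from config.toml

    # Plain text completion
    answer = await llm.ask(
        messages=[{"role": "user", "content": "Summarize the ReAct paradigm."}]
    )
    print(answer)

    # Completion that may propose tool calls (OpenAI function-calling format)
    response = await llm.ask_tool(
        messages=[{"role": "user", "content": "List the files in the workspace."}],
        tools=[{
            "type": "function",
            "function": {
                "name": "bash",
                "description": "Run a shell command",
                "parameters": {
                    "type": "object",
                    "properties": {"command": {"type": "string"}},
                    "required": ["command"],
                },
            },
        }],
    )
    print(response.tool_calls)  # assumed: an OpenAI-style message with tool calls


asyncio.run(demo())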

Agents (app/agent/*.py)

The agent directory contains the base agent class (BaseAgent) and several agent implementations.

  • BaseAgent (app/agent/base.py): Defines the basic structure and functionality of an agent, including state management, memory management, and the execution loop.
    • The state_context method uses a context manager to handle agent state transitions safely.
    • The update_memory method adds messages to the agent's memory.
    • The run method executes the agent's main loop, calling the step method repeatedly.
    • The is_stuck method checks if the agent is stuck in a loop by detecting duplicate responses.
  • ReActAgent (app/agent/react.py): Extends BaseAgent and introduces the ReAct (Reasoning and Acting) paradigm. It requires subclasses to implement think and act methods.
  • ToolCallAgent (app/agent/toolcall.py): Extends ReActAgent and adds support for tool/function calls. It includes logic for selecting and executing tools based on the LLM's response.
    • The think method uses llm.ask_tool to get a response from the LLM with tool call suggestions.
    • The act method executes the selected tools and handles their results.
    • The execute_tool method executes a single tool call with error handling.
  • SWEAgent (app/agent/swe.py): A specialized agent for software engineering tasks, extending ToolCallAgent. It includes tools like Bash and StrReplaceEditor for interacting with the file system and executing commands.
  • PlanningAgent (app/agent/planning.py): Extends ToolCallAgent and implements a planning mechanism. It uses a PlanningTool to create and manage plans, tracking progress through individual steps.
    • The think method integrates the current plan status into the prompt.
    • The act method updates the plan status after executing a step.
    • The _get_current_step_index method parses the current plan to identify the next step.
  • Manus (app/agent/manus.py): A versatile general-purpose agent that uses planning to solve various tasks. It extends PlanningAgent with a comprehensive set of tools.

The agents combine inheritance and composition: ToolCallAgent extends ReActAgent, and PlanningAgent in turn extends ToolCallAgent, so each layer inherits the basic think/act loop while composing in its own tools and memory. A minimal sketch of the think/act split follows.
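
The snippet below is a minimal, hypothetical ReActAgent subclass. The name/description fields, the update_memory call, and the memory.messages access follow the BaseAgent description above, but the exact base-class fields and signatures should be verified against app/agent/*.py.

# Hypothetical sketch of the ReAct pattern -- not an agent from the repository
from app.agent.react import ReActAgent


class EchoAgent(ReActAgent):
    name: str = "echo"
    description: str = "Toy agent that restates the last user message."

    async def think(self) -> bool:
        # "Think": decide whether an action is needed; this toy agent always acts.
        self.update_memory("assistant", "I will restate the user's request.")
        return True

    async def act(self) -> str:
        # "Act": a real agent would execute tools here and observe the results.
        last_user = next(
            (m for m in reversed(self.memory.messages) if m.role == "user"), None
        )
        return f"You asked: {last_user.content if last_user else '(nothing yet)'}"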

Tools (app/tool/*.py)

The tool directory contains implementations of various tools that agents can use. Each tool inherits from BaseTool and implements the execute method; a minimal custom-tool sketch follows the list below.

  • BaseTool (app/tool/base.py): Defines the base class for all tools.
  • Bash (app/tool/bash.py): Executes bash commands in a sandboxed environment.
    • The _BashSession class manages a single bash session, handling input, output, and timeouts.
    • The execute method runs a command in the bash session.
  • CreateChatCompletion (app/tool/create_chat_completion.py): Creates a structured completion with specified output formatting.
  • GoogleSearch (app/tool/google_search.py): Performs a Google search and returns a list of relevant links using the googlesearch-python library.
  • Terminate (app/tool/terminate.py): Terminates the interaction.
  • PlanningTool (app/tool/planning.py): Allows agents to create and manage plans.
    • It provides commands for creating, updating, listing, getting, and deleting plans, as well as setting the active plan and marking step statuses.
    • Plans are stored in memory within the PlanningTool instance.
  • PythonExecute (app/tool/python_execute.py): Executes Python code in a sandboxed environment using a thread with a timeout.
  • FileSaver (app/tool/file_saver.py): Saves content to a local file.
  • StrReplaceEditor (app/tool/str_replace_editor.py): Custom editing tool for viewing, creating and editing files with string replacement and insertion.
  • BrowserUseTool (app/tool/browser_use_tool.py): Interacts with a web browser to perform actions such as navigation, element interaction, and content extraction. It uses the browser-use library.
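
As referenced above, here is a rough sketch of how a custom tool could be added. The name, description, and parameters fields mirror the pattern of the built-in tools, but the exact BaseTool interface should be checked against app/tool/base.py.

# Hypothetical custom tool -- a sketch, not part of the repository
from app.tool.base import BaseTool


class WordCount(BaseTool):
    name: str = "word_count"
    description: str = "Count the words in a piece of text."
    parameters: dict = {
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to count"}},
        "required": ["text"],
    }

    async def execute(self, text: str) -> str:
        # The agent awaits execute(); the return value becomes the observation.
        return f"{len(text.split())} words"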

The ToolCollection class simplifies tool management by providing a way to group and access tools.

class ToolCollection:
    """A collection of defined tools."""

    def __init__(self, *tools: BaseTool):
        self.tools = tools
        self.tool_map = {tool.name: tool for tool in tools}

    def __iter__(self):
        return iter(self.tools)

    def to_params(self) -> List[Dict[str, Any]]:
        return [tool.to_param() for tool in self.tools]

    async def execute(
        self, *, name: str, tool_input: Dict[str, Any] = None
    ) -> ToolResult:
        tool = self.tool_map.get(name)
        if not tool:
            return ToolFailure(error=f"Tool {name} is invalid")
        try:
            result = await tool(**tool_input)
            return result
        except ToolError as e:
            return ToolFailure(error=e.message)

It allows tools to be added dynamically and provides a convenient way to execute tools by name. The to_params method converts tools into the format expected by the OpenAI function calling API.
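
A hedged usage sketch follows; the import path app.tool, the tool name "bash", and the "command" parameter are assumptions based on the tool descriptions above.

# Hedged usage sketch -- import paths, tool names, and parameters are assumptions
import asyncio

from app.tool import Bash, GoogleSearch, ToolCollection


async def demo():
    tools = ToolCollection(Bash(), GoogleSearch())

    # Convert the tools to the OpenAI function-calling parameter format
    print(tools.to_params())

    # Execute a tool by name, as ToolCallAgent.act() does with LLM tool calls
    result = await tools.execute(name="bash", tool_input={"command": "echo hello"})
    print(result)


asyncio.run(demo())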

Flows (app/flow/*.py)

The flow directory defines execution flows for agents.

  • BaseFlow (app/flow/base.py): Defines the base class for all flows.
  • PlanningFlow (app/flow/planning.py): Implements a flow that manages planning and execution of tasks using agents.
    • The execute method orchestrates the planning and execution process.
    • The _create_initial_plan method creates an initial plan using the LLM and PlanningTool.
    • The _get_current_step_info method parses the current plan to identify the next step.
    • The _execute_step method executes the current step with the specified agent.
    • The _mark_step_completed method marks the current step as completed.
    • The _get_plan_text method retrieves the current plan as formatted text.
    • The _finalize_plan method finalizes the plan and provides a summary.

The FlowFactory class uses a factory pattern to create different types of flows.

class FlowFactory:
    """Factory for creating different types of flows with support for multiple agents"""

    @staticmethod
    def create_flow(
        flow_type: FlowType,
        agents: Union[BaseAgent, List[BaseAgent], Dict[str, BaseAgent]],
        **kwargs,
    ) -> BaseFlow:
        flows = {
            FlowType.PLANNING: PlanningFlow,
        }

        flow_class = flows.get(flow_type)
        if not flow_class:
            raise ValueError(f"Unknown flow type: {flow_type}")

        return flow_class(agents, **kwargs)

This pattern allows new flows to be added easily without modifying existing code.
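
Below is a hedged sketch of wiring an agent into a planning flow. The import paths and the argument passed to execute are assumptions; see the repository's entry scripts for the actual wiring.

# Hedged usage sketch -- import paths and the execute() argument are assumptions
import asyncio

from app.agent.manus import Manus
from app.flow.flow_factory import FlowFactory, FlowType


async def demo():
    flow = FlowFactory.create_flow(FlowType.PLANNING, agents=Manus())
    # PlanningFlow creates an initial plan, executes it step by step with the
    # agent, and finalizes the plan with a summary.
    summary = await flow.execute("Research three open-source agent frameworks")
    print(summary)


asyncio.run(demo())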

Db Schema

There is no explicit database schema defined in the code. However, the PlanningTool stores plans in memory within a dictionary called plans. This dictionary maps plan IDs to plan data, which includes the plan title, steps, step statuses, and step notes.
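
Based on that description, the in-memory structure likely resembles the sketch below; the key names are assumptions inferred from the fields listed above.

# Hypothetical shape of PlanningTool.plans -- key names are assumptions
plans = {
    "plan_123": {
        "title": "Write a weekly report",
        "steps": ["Collect metrics", "Draft summary", "Proofread"],
        "step_statuses": ["completed", "in_progress", "not_started"],
        "step_notes": ["pulled from dashboard", "", ""],
    }
}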

External API Calls

  • OpenAI / Azure OpenAI: The LLM class makes calls to the OpenAI or Azure OpenAI API to generate text and tool call suggestions.
  • Google Search (via googlesearch-python): The GoogleSearch tool uses the googlesearch-python library to query Google and return a list of relevant result links.
  • BrowserUseTool: The BrowserUseTool relies on the browser-use library, which automates a real web browser to navigate pages, interact with elements, and extract content.

Insights

  • Modularity and Extensibility: The codebase is highly modular, with clear separation of concerns. This makes it easy to add new agents, tools, and flows.
  • Flexibility: The platform supports different agent implementations, tool configurations, and execution flows, providing a high degree of flexibility.
  • Error Handling: The code includes robust error handling, with retry logic for API calls and exception handling for tool execution.
  • Singleton Pattern for Configuration: The use of a singleton pattern for configuration ensures that all modules use the same configuration settings.
  • Context Managers for State Management: The use of context managers for agent state transitions ensures that the agent's state is managed safely and consistently (a simplified sketch follows this list).
  • Factory Pattern for Flow Creation: The use of a factory pattern for flow creation allows new flows to be added easily without modifying existing code.
  • In-Memory Planning: The PlanningTool uses in-memory storage for plans, which is suitable for simple use cases but may not be scalable for more complex scenarios.
  • Tool-Based Architecture: The architecture emphasizes the use of tools for performing specific tasks. This allows the agents to leverage external resources and capabilities.
  • ReAct Paradigm: The use of the ReAct paradigm allows agents to reason about their actions and adapt their behavior based on observations.
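
A simplified illustration of the context-manager idea mentioned above (not the repository's exact state_context implementation):

# Simplified illustration of a state-transition context manager
from contextlib import asynccontextmanager
from enum import Enum


class AgentState(str, Enum):
    IDLE = "idle"
    RUNNING = "running"
    ERROR = "error"


class Agent:
    def __init__(self):
        self.state = AgentState.IDLE

    @asynccontextmanager
    async def state_context(self, new_state: AgentState):
        previous = self.state
        self.state = new_state
        try:
            yield
        except Exception:
            self.state = AgentState.ERROR  # surface failures via the agent state
            raise
        finally:
            if self.state != AgentState.ERROR:
                self.state = previous  # restore the previous state on normal exit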

The platform is designed to be extended and customized. New agents, tools, and flows can be added to support different tasks and use cases. The use of configuration files and environment variables allows the platform to be easily deployed in different environments. The platform has the potential to be used in a variety of applications, such as:

  • Autonomous Task Automation: Automating complex tasks that require planning, reasoning, and tool use.
  • Software Engineering: Assisting developers with tasks such as code generation, testing, and debugging.
  • Information Retrieval: Gathering and processing information from the web.
  • Personal Assistants: Building intelligent personal assistants that can perform tasks on behalf of users.

Report generated by TSW-X
Advanced Research Systems Division
Date: 2025-03-10 10:10:10.474678
