Sebastian

Posted on Sep 30

LLM Agents: Custom Tools in Autogen

#llm #autogen

Large Language Models (LLMs) used as agents promise to solve tasks automatically and to take LLM usage to the next level. In practice, an agent is created with a specific, refined prompt that details task types, expected solution behavior, constraints, and even linguistic tone. Tools are the ingredient that makes an agent effective at its tasks. But what are these tools? And how can they be added to an agent?

This article explores tool definition and invocation in the context of the Autogen framework. You will learn how to define custom tools and how to configure them so that an agent or user proxy can execute them. You will also learn why system prompting matters for making the agent an effective tool user. Finally, you will see the essential tricks for steering tool usage in a conversation, preventing agents from responding with the same tool invocation over and over.

Note: The following content is the result of hours of trying to get a local LLM ready for function calling and to have the function actually executed too. To put it another way: although the Autogen documentation shows plenty of examples for function calling, they almost exclusively use an OpenAI ChatGPT connection, which "just works". Understanding what’s going on required equal measures of extensive trial and error and digging into the Autogen code base.

The technical context of this article is Python v3.11 and autogen v0.2.27. All code examples should work with newer library versions too, but may require code updates.

This article originally appeared at my blog admantium.com.

Required Libraries

You need to install the autogen library with the following command:

pip install autogen==0.2.27

Autogen supports many different LLM engines; see the compatible LLM documentation for further information. The essential requirement is an OpenAI-compatible API. My specific solution is to start an Ollama process as the LLM engine and to wrap it with a LiteLLM server, which exposes an OpenAI-compatible API.

To start both services, run the following commands. Because ollama serve keeps running in the foreground, start litellm in a second terminal:

ollama serve
litellm --model ollama_chat/llama3
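
Before wiring the endpoint into Autogen, it can help to confirm that the proxy answers OpenAI-style requests. The following is a minimal sketch that uses only the standard library. It assumes that the proxy listens on http://localhost:4000 (the address used later in this article) and exposes a /chat/completions route. The function names build_chat_request and ask are illustrative helpers, not part of any library.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str,
                       api_key: str = "ollama") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def ask(base_url: str, model: str, prompt: str) -> str:
    """Send the request and return the text of the first choice."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With both processes running, a call such as ask("http://localhost:4000", "ollama_chat/llama3", "Say hello") should return a short reply. A connection error at this point indicates a problem with the servers rather than with Autogen.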

Tool Implementation

A tool is a Python function that exposes type information and is added to an agent's definition together with a tool description. Such a function can be defined anywhere in the source code, or alternatively be marked with a decorator that Autogen picks up at runtime.

Here is an example: a Python function that fetches a Wikipedia article and returns its content as plain text with the wiki markup stripped.

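import mwparserfromhell
import requests

def wikipedia_search(title: str) -> str:
  # Query the MediaWiki API for the current revision of the article
  response = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
      "action": "query",
      "format": "json",
      "titles": title,
      "prop": "revisions",
      "rvprop": "content",
    },
  ).json()
  # The result is keyed by page ID, so take the first (and only) page
  page = next(iter(response["query"]["pages"].values()))
  wikicode = page["revisions"][0]["*"]
  # Strip templates, links, and other markup to get plain text
  parsed_wikicode = mwparserfromhell.parse(wikicode)
  content = parsed_wikicode.strip_code()

  return content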
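
The function above raises a KeyError when the title does not exist, because the API then returns a page entry without a "revisions" key. Since the return value goes back to the LLM, a readable error string is often more useful than an exception. Here is a minimal sketch, assuming the response shape used above; extract_wikitext is a hypothetical helper and not part of the original code.

```python
from typing import Any


def extract_wikitext(response: dict[str, Any]) -> str:
    """Return the raw wikitext of the first page, or a readable error message."""
    pages = response.get("query", {}).get("pages", {})
    if not pages:
        return "Error: the response contains no pages."
    page = next(iter(pages.values()))
    revisions = page.get("revisions")
    if not revisions:
        # Nonexistent titles come back as a page entry without revisions.
        title = page.get("title", "unknown")
        return f"Error: no Wikipedia article found for '{title}'."
    return revisions[0]["*"]
```

The returned wikitext can then be handed to mwparserfromhell as before, while an error message gives the LLM a chance to retry with a different title.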

However, the mere presence of a Python function in your code does not tell the agents whether and how to use it. Two more additions need to be made.

Tool Configuration for Agents

The crucial point for understanding function calling is this: the LLM only returns structured JSON containing the arguments to a function; the code execution needs to be done by another component. In Autogen, both agents and the user proxy object can be given the ability to execute code and/or call functions. This means that the Autogen process itself executes the code and returns its result. As a security best practice, this ability is usually given to the user proxy object and can be further safeguarded by requiring human input before any code is executed.
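
This division of labor can be sketched without any framework: the model only produces a function name and JSON arguments, and a separate component looks up the function and runs it. The following is a simplified illustration, not Autogen's actual implementation; TOOLS, execute_tool_call, and the echo function are made up for this example.

```python
import json
from typing import Callable

# Registry of functions the executor is allowed to run (illustrative).
TOOLS: dict[str, Callable[..., str]] = {}


def echo(text: str) -> str:
    """Stand-in tool; a real tool would do actual work."""
    return f"echo: {text}"


TOOLS["echo"] = echo


def execute_tool_call(name: str, arguments_json: str) -> str:
    """Run the named tool with JSON-encoded arguments, as an executor would."""
    if name not in TOOLS:
        return f"Error: unknown function '{name}'"
    try:
        arguments = json.loads(arguments_json)
    except json.JSONDecodeError as error:
        return f"Error: arguments are not valid JSON ({error.msg})"
    return TOOLS[name](**arguments)


# What an LLM typically returns: a function name plus arguments as a JSON string.
suggestion = {"name": "echo", "arguments": '{"text": "BattleTech"}'}
result = execute_tool_call(suggestion["name"], suggestion["arguments"])
print(result)  # echo: BattleTech
```

Returning errors as strings instead of raising exceptions keeps the conversation going, since the result is sent back to the model as the next message.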

To actually make Python functions accessible to Autogen agents as tools, two steps are necessary and a third is optional:

Specification: Add a formal definition of the tool as part of the llm_config for an agent.
Registration: Define which functions serve as tools, and detail which components can execute code.
LLM awareness: An optional step that may be necessary for some LLMs: inform the LLM about the tools in its system prompt.

Specification

The specification is a JSON-like definition of the function in the style of the OpenAI API, detailing its name, description, and input parameters with their types. The Wikipedia search function requires this:

function_list = [
 {
  "name": "wikipedia_search",
  "description": "Perform a search on Wikipedia",
  "parameters": {
   "type": "object",
   "properties": {
     "title": {
      "type": "string",
      "description": "Name of the article to search for",
     }
   },
   "required": ["title"],
  },
 }
]

This list needs to be passed to an agent definition as shown:

agent = AssistantAgent(
 name="librarian",
 system_message=SYSTEM_PROMPT,
 human_input_mode="NEVER",
 llm_config={
  "functions": function_list,
  "config_list": config_list
 },
)
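
Writing the specification by hand duplicates information that is already present in the function signature. As a rough sketch, the same structure can be derived from type hints with the standard library. The helper build_function_spec and its type mapping are assumptions made for illustration and cover only simple scalar parameters; parameter descriptions still have to be supplied separately.

```python
import inspect
from typing import Callable

# Minimal mapping from Python types to JSON Schema type names (illustrative).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_function_spec(func: Callable, description: str,
                        param_descriptions: dict[str, str]) -> dict:
    """Build an OpenAI-style function specification from a function signature."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {
            "type": JSON_TYPES.get(param.annotation, "string"),
            "description": param_descriptions.get(name, ""),
        }
        # Parameters without a default value must be provided by the caller.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def wikipedia_search(title: str) -> str:
    """Placeholder with the same signature as the tool in this article."""
    return ""


function_list = [
    build_function_spec(
        wikipedia_search,
        "Perform a search on Wikipedia",
        {"title": "Name of the article to search for"},
    )
]
```

The resulting function_list has the same shape as the handwritten one and stays in sync when the function signature changes.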

Registration

The second step is to register the function with the Autogen framework itself. In the following example, the Wikipedia search function is registered with the agent as the caller, meaning that it suggests function calls, and with the user proxy as the executor, meaning that it runs the function.

register_function(
 wikipedia_search,
 caller=agent,
 executor=user,
 name="wikipedia_search",
 description="Perform a search on Wikipedia"
)

LLM Awareness (Optional)

An optional third step is to add the tool definition to an LLM's system prompt in a very specific format so that the model is "aware" of the tool. The following two examples cover Ollama LLMs that are specifically suited for function calling.

The natural-functions LLM uses the function definition format of the OpenAI API:

SYSTEM_PROMPT = """
You are a knowledgeable librarian that answers questions from your supervisor.

Functions:
{
 "name": "wikipedia_search",
 "description": "Perform a search on Wikipedia",
 "parameters": {
  "type": "object",
  "properties": {
   "title": {
    "type": "string",
    "description": "Name of the article to search for",
   }
  },
  "required": ["title"],
 },
}
"""

In contrast, the nexusraven LLM requires a Python-like syntax:

SYSTEM_PROMPT = """
You are a knowledgeable librarian that answers questions from your supervisor.

Function:
def wikipedia_search(title):
    '''
    Returns the content of a Wikipedia article.

    Args:
    title (str): The name of the article.

    Returns:
    str: The content of the article.
    '''
"""
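
Keeping the tool description in the system prompt in sync with the specification is easier when the prompt is generated. The sketch below renders one specification in both styles shown above. The helper names are hypothetical, and the exact prompt layout that each model expects should be checked against its model card.

```python
import json

spec = {
    "name": "wikipedia_search",
    "description": "Perform a search on Wikipedia",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {
                "type": "string",
                "description": "Name of the article to search for",
            }
        },
        "required": ["title"],
    },
}

ROLE = "You are a knowledgeable librarian that answers questions from your supervisor."


def json_style_prompt(role: str, spec: dict) -> str:
    """Embed the specification as JSON, as in the first example above."""
    return f"{role}\n\nFunctions:\n{json.dumps(spec, indent=1)}\n"


def python_style_prompt(role: str, spec: dict) -> str:
    """Embed the specification as a Python-like stub, as in the second example."""
    properties = spec["parameters"]["properties"]
    signature = ", ".join(properties)
    args = "\n".join(
        f"    {name} ({info['type']}): {info['description']}"
        for name, info in properties.items()
    )
    return (
        f"{role}\n\nFunction:\n"
        f"def {spec['name']}({signature}):\n"
        f"    '''\n"
        f"    {spec['description']}\n\n"
        f"    Args:\n{args}\n"
        f"    '''\n"
    )


print(python_style_prompt(ROLE, spec))
```

Either result can be assigned to SYSTEM_PROMPT, optionally followed by further instructions and constraints.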

Complete Source Code

The complete source code for an agent-to-user chat with a Wikipedia search tool, using LiteLLM and Ollama as the backend service, is shown here:

import tempfile

from autogen.agentchat import AssistantAgent, UserProxyAgent
from autogen.coding import LocalCommandLineCodeExecutor

import json

config_list = [
  {
    "model": "",
    "base_url": "http://localhost:4000", # lite-llm
    "api_key": "ollama",
  }
]

function_list = [
  {
    "name": "wikipedia_search",
    "description": "Perform a search on Wikipedia",
    "parameters": {
      "type": "object",
      "properties": {
        "title": {
          "type": "string",
          "description": "Name of the article to search for",
        }
      },
      "required": ["title"],
    },
  }
]

SYSTEM_PROMPT = """
You are a knowledgeable librarian that answers questions from your supervisor.

For research tasks, only use the functions provided to you. Check the functions output and make your answer.

Constraints:
- Think step by step.
- Be accurate and precise.
- Answer briefly, in few words.
- Reflect on your answer, and if you think you are hallucinating, reformulate the answer.
- When you receive the result of a tool call, use it to respond to the supervisor, and then add the word "TERMINATE"
- Do not repeat yourself
"""

system_message = {"role": "system", "content": SYSTEM_PROMPT}

temp_dir = tempfile.TemporaryDirectory()

code_executor_config = LocalCommandLineCodeExecutor(
  timeout=30,
  work_dir=temp_dir.name,
)

agent = AssistantAgent(
  name="librarian",
  system_message=SYSTEM_PROMPT,
  human_input_mode="NEVER",
  llm_config={
    "functions": function_list,
    "config_list": config_list,
    "timeout": 280,
    "temperature": 0.2,
  },
)

user = UserProxyAgent(
  name="supervisor",
  human_input_mode="ALWAYS",
  max_consecutive_auto_reply=1,
  code_execution_config={"executor": code_executor_config},
)

## Tools

import requests
import mwparserfromhell

def wikipedia_search(title: str) -> str:
  response = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
      "action": "query",
      "format": "json",
      "titles": title,
      "prop": "revisions",
      "rvprop": "content",
    },
  ).json()
  page = next(iter(response["query"]["pages"].values()))
  wikicode = page["revisions"][0]["*"]
  parsed_wikicode = mwparserfromhell.parse(wikicode)
  content =  parsed_wikicode.strip_code()
  return json.dumps({"name": "wikipedia_search", "content": content})

from autogen.agentchat import register_function

register_function(
  wikipedia_search,
  caller=agent,
  executor=user,
  name="wikipedia_search",
  description="Perform a search on Wikipedia",
)

chat_result = user.initiate_chat(
  agent,
  message="Get the content of the Wikipedia page for 'BattleTech'. Then, summarize the page content.",
)

print(chat_result)

Invocation Example

Here is a complete conversation example:

supervisor (to librarian):

Get the content of the Wikipedia page for 'BattleTech'. Then, summarize the page content.

--------------------------------------------------------------------------------
librarian (to supervisor):

***** Suggested tool call (call_fbe96532-42ee-492c-976a-a183416cc998): wikipedia_search *****
Arguments:
{"title": "BattleTech"}
*********************************************************************************************

--------------------------------------------------------------------------------
Provide feedback to librarian. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:

>>>>>>>> NO HUMAN INPUT RECEIVED.

>>>>>>>> USING AUTO REPLY...

>>>>>>>> EXECUTING FUNCTION wikipedia_search...
supervisor (to librarian):

supervisor (to librarian):

***** Response from calling tool (call_fbe96532-42ee-492c-976a-a183416cc998) *****
"thumb|Former BattleTech logo\n\nBattleTech is a wargaming and military science fiction franchise launched by FASA Corporation in 1984, acquired by WizKids in 2001, which was in turn acquired by Topps in 2003; and published since 2007 by Catalyst Game Labs. The trademark is currently owned by Topps and, for video games, Microsoft's Xbox Game Studios; Catalyst Game Studios licenses the franchise from Topps.\n\nThe series began with FASA's debut of the board game BattleTech (originally named Battledroids) by Jordan Weisman and L. Ross Babcock III and has since grown to include numerous expansions to the original game, several board games, role playing games, video games, a collectible card game, a series of more than 100 novels, and an animated television series.\n\n Gameplay \n\nIn its most basic form, BattleTech is played on a map sheet composed of hexagonal terrain tiles.  The combat units are roughly  humanoid armored combat units called BattleMechs, powered by fusion reactors and armed with a variety of weapons. [...]"

librarian (to supervisor):
BattleTech is a science fiction franchise that began as a board game in 1984 and has since expanded to include various games, novels, and a TV series. Set in a distant future where humanity has colonized space, BattleTech revolves around conflicts between factions using giant humanoid combat units called BattleMechs. The franchise features a detailed timeline that diverges from real-world history in 1984. With feudalism widespread and technology blending futuristic and modern elements, BattleTech emphasizes conflict and warfare in its storytelling. Despite the advanced technology, humans remain the only sentient species in the universe.

Debugging Tool Usage

It is a tremendous challenge to get an Autogen agent with a local LLM to use tools consistently and only when necessary. Here is a list of frequent errors that I encountered, with hints on how to solve them:

Unfounded tool suggestion: In some cases, an LLM suggests a tool invocation even when the task has nothing to do with a tool. This can happen when you pass the function_list to the llm_config; it appears as if Autogen internally "motivates" the LLM to call a tool. The only remedy I found is a very thorough system prompt.
Tool suggestion and return ping-pong: The agent and the user proxy might suggest and run functions indefinitely. This happens specifically when the user proxy object executes the code and returns the function result as-is to the agent. To counter this, you can lower the max_consecutive_auto_reply value, or change the human input mode of the object that executes the function so that you can stop the conversation manually.
Message parsing errors: This category covers errors between Autogen and LiteLLM. One error reads "Expecting ':' delimiter: line 33 column 1 (char 44)"; it can be solved by formatting the tool output as JSON. Another error reads "Invalid control character at: line 1 column 13"; here, check the string encoding and convert it to UTF-8. You might also consult these two GitHub issues: Ollama KeyError ‘name’ and Ollama functions in conversations.
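
Two of these remedies can be sketched in code. The first helper strips control characters from the tool output and wraps it in JSON, which targets the parsing errors above. The second detects the "TERMINATE" keyword that the system prompt asks the agent to append; it could be passed to an agent's is_termination_msg parameter to end the ping-pong. Both function names are hypothetical, and whether they resolve a given error depends on the model and library versions in use.

```python
import json
import re

# ASCII control characters except tab, newline, and carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def to_json_tool_output(name: str, content: str) -> str:
    """Remove control characters and serialize the tool result as JSON."""
    cleaned = CONTROL_CHARS.sub("", content)
    return json.dumps({"name": name, "content": cleaned})


def is_termination_msg(message: dict) -> bool:
    """Return True when a message ends with the TERMINATE keyword."""
    # Tool call messages may carry no text content at all.
    content = message.get("content") or ""
    return content.rstrip().endswith("TERMINATE")
```

For example, the user proxy could be created with UserProxyAgent(..., is_termination_msg=is_termination_msg), and wikipedia_search could return to_json_tool_output("wikipedia_search", content).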

Summary

This article gave an overview of tools in the Autogen framework and showed how to use them in practice. You saw code snippets for these aspects: a) defining tools as Python functions, b) configuring the functions to be accessible and executable, c) making an agent aware of available tools, d) crafting effective prompts to ensure consistent and dependable tool usage, and e) debugging common tool usage errors. With this knowledge, you should be able to turn agents into effective tool users and to design agents with exactly the tools you require.
