Recently, AI agents have gained significant attention for their ability to autonomously perceive their environment and act upon it to achieve specific goals. AI agents are software systems that leverage AI models to make decisions, plan actions, and adapt to changing conditions. They can handle tasks ranging from simple queries to complex problem-solving and interact with external systems to accomplish their objectives.
In this article, we provide a guide on how to build an AI agent from scratch that performs basic arithmetic operations and string formatting using the AWS Bedrock service. It covers the following key sections:
- the key components that would be developed,
- the options available for each component,
- and the considerations to keep in mind while building an AI agent.
The complete code is available at this GitHub repository.
Steps to Build an AI Agent
1. Define the Agent's Purpose and Objectives
Before building an AI agent, it's essential to define its purpose and objectives. This involves identifying the tasks the agent will perform, the problems it will solve, and the goals it will achieve. By clearly defining the agent's purpose, you can ensure that it aligns with your organization's needs and objectives. According to the blog post AI Agents — A Software Engineer’s Overview, AI agents are not silver bullets that can solve all problems. They are tools that can be used to solve specific problems. There are some use cases where AI agents can be particularly effective, such as customer support handling unique requests or conversational shopping assistants, which need to handle a wide range of user queries and provide personalized responses. On the other hand, there are also use cases where AI agents may not be the best solution, such as simple, repetitive tasks that can be easily automated using traditional programming techniques, like regular automated data downloading or file processing.
2. Build Core AI Agent Components
An AI agent usually consists of four fundamental components: model, tools, memory, and planning, as shown in the diagram below:
Figure: AI Agent Key Components
With the AI agent architecture in mind, let us build the core components of an AI agent step by step.
Environment Setup
First, we need to set up the environment and install the necessary libraries. We will use the AWS Bedrock Claude 3.5 Haiku model as an example. Install the required libraries using pip:
pip install boto3 python-dotenv
Then we store the AWS credentials in a .env file:
AWS_ACCESS_KEY_ID="your aws access key id"
AWS_SECRET_ACCESS_KEY="your aws secret access key"
AWS_REGION="your aws region or default of us-east-1"
And then set up the Python environment by loading the necessary libraries and the AWS credentials (note that os is also imported here, since the credential lookups later in the article rely on it):
from dotenv import load_dotenv
import ast
import boto3
import json
import logging
import os
import pprint
import re
from botocore.exceptions import ClientError

load_dotenv()
Model Selection
An AI model acts as the central decision-maker for agent processes. It can be one or multiple language models (LMs) capable of understanding and generating text, images, or other data types. In this article, we use the Claude 3.5 Haiku model hosted on AWS Bedrock for its strong capabilities, privacy, and relatively low cost. We can define the model name variable with the following Python code:
MODEL_NAME = "us.anthropic.claude-3-5-haiku-20241022-v1:0"
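As a quick sanity check of the setup, the snippet below builds the request payload that Bedrock's invoke_model expects for Claude models (the Anthropic Messages API format used throughout this article). This is a minimal sketch: the payload can be constructed and inspected without calling the service, and the actual invocation (shown commented out) assumes your AWS credentials are configured.

```python
import json

# The model identifier used throughout this article
MODEL_NAME = "us.anthropic.claude-3-5-haiku-20241022-v1:0"

# Request payload in the Anthropic Messages API format that Bedrock's
# invoke_model expects for Claude models
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello, Claude!"}],
})

# With AWS credentials configured, the actual call would be:
# client = boto3.client("bedrock-runtime")
# response = client.invoke_model(body=body, modelId=MODEL_NAME)

print(body)
```

The same "anthropic_version" / "messages" structure appears again later when the agent builds its requests.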
There are many other AI models available on the market, such as GPT-4o, Llama, and others. The main factors to consider when selecting an AI model are:
- Capabilities: the tasks the model can perform, such as question answering, translation, and summarization.
- Performance: the model's accuracy, consistency, and reliability in completing tasks.
- Context length: the maximum number of tokens the model can process in a single input.
- Speed: the model's latency, throughput, and response time.
- Pricing: the cost of using the model, which varies by provider, usage, and features.
- Availability: the model's accessibility, support, and maintenance.
The following figure shows the market share, which can be considered a big factor in model availability. The top three model providers are OpenAI, Google, and Anthropic, all of which provide closed-source models with different capabilities and pricing. The top open-source model provider is Meta, followed by Mistral and Cohere; their models can be easily accessed via Huggingface.co for free.
Figure: AI Agent Model Market Share (Source: State of AI Agents)
And the following figure shows the model usage based on industry use cases.
Figure: LLM Model Use Selection (Source: State of AI Agents)
Based on the benchmark results, both Anthropic's and OpenAI's models demonstrate strong capabilities in technical tasks. Both show particularly impressive performance in mathematical reasoning (MATH) and question-answering tasks (SimpleQA), with scores ranging from 67-70% for MATH and 88-90% for SimpleQA. While both models perform similarly on these tasks, they show different patterns in their performance consistency, as indicated by their varying standard deviations. Google's Gemini, meanwhile, shows strength in translation tasks and in healthcare and marketing use cases.
The figure below shows the AI agent model effective context length vs. claimed context length.
Figure: AI Agent Model Context Length Effective vs. Claimed (Source: Deeplearning.ai)
The AI21 Labs Jamba 1.5 models have the highest context length in the figure, 256K tokens, roughly equivalent to a 200-page book with an average page length of 1,280 words. The Anthropic Claude 3.5 Sonnet model has a context length of 200K tokens and is among the most cost-effective models on the market. The OpenAI GPT-4o model has a context length of 128K tokens.
Beyond this figure, Google Gemini 1.5 Pro can take 2 million tokens, roughly 40 books with an average length of 50,000 words each. According to the report on the LLMs with the largest context windows, Magic.dev's LTM-2-Mini can take 100 million tokens as input, currently among the largest context windows on the market.
The following figure shows AI agent model speed vs. model size. Generally, the larger the model, the slower its speed. The fastest model shown is the Jamba 1.5 Mini model from AI21 Labs, which contains 12 billion active parameters (and 52 billion total parameters).
Figure: AI Agent Model Speed vs. Model Size (Source: Deeplearning.ai)
With all of those on-line AI models available, one can easily pick the right model for their AI agent based on the model's capabilities, popularity, context length, pricing, and performance to meet their specific needs.
Moreover, besides leveraging these online AI model providers, Nvidia has announced a new AI supercomputer initiative called Project Digits, targeted at democratizing local AI deployment. Expected to launch in May 2025, this system promises to support models with up to 200 billion parameters at an unprecedented price point of around $3,000 USD. For context, this would enable running two models comparable in size to Llama 3 70B (70 billion parameters) locally. If successful, Project Digits could represent a significant breakthrough in making enterprise-grade AI infrastructure accessible to smaller organizations, offering a cost-effective alternative to pay-per-token cloud services for high-volume inference workloads.
Tools Implementation
Tools are the agent's interface to external systems. A tool has two major components: the tool definition and the tool execution. The tool definition describes the tool's name, description, and input schema, while the tool execution specifies how the tool accomplishes its task. The following JSON provides the tool definitions for the arithmetic and string-formatting tools.
tools = [
    {
        "name": "calculator",
        "description": "A simple calculator that performs basic arithmetic operations.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "The mathematical expression to evaluate (e.g., '2 + 3 * 4')."
                }
            },
            "required": ["expression"]
        }
    },
    {
        "name": "formatter",
        "description": "A simple string formatter that formats the input string based on the indicated type.",
        "input_schema": {
            "type": "object",
            "properties": {
                "input": {
                    "type": "string",
                    "description": "The input string (e.g., '123', '123.00')."
                },
                "type": {
                    "type": "string",
                    "description": "The type of the output (e.g., 'integer', 'float', 'double', 'real', 'decimal')."
                }
            },
            "required": ["input", "type"]
        }
    }
]
Based on these definitions, the AI model can understand what the tools are, when to use them, and what input parameters they require. The following Python code provides the tool executions.
def calculate(expression):
    # Remove any characters other than digits, operators, and parentheses
    expression = re.sub(r'[^0-9+\-*/().]', '', expression)
    try:
        # Validate the expression with ast before evaluating it
        node = ast.parse(expression, mode='eval')
        result = eval(compile(node, '<string>', 'eval'))
        return str(result)
    except (SyntaxError, ZeroDivisionError, NameError, TypeError, OverflowError):
        return "Error: Invalid expression"

def special_format(input, type):
    try:
        if type.lower() == "integer":
            return "{:d}".format(int(float(input)))
        elif type.lower() in ("float", "double", "real", "decimal"):
            return "{:.2f}".format(float(input))
        else:
            return str(input)
    except (ValueError, TypeError, OverflowError) as e:
        return f"Error: Invalid conversion - {e}"
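Before wiring these helpers into the agent, it is worth exercising them directly. The snippet below repeats both functions in compact form (with slightly renamed parameters to avoid shadowing Python built-ins) so it runs on its own:

```python
import ast
import re

def calculate(expression):
    # Keep only digits, arithmetic operators, and parentheses before evaluating
    expression = re.sub(r'[^0-9+\-*/().]', '', expression)
    try:
        node = ast.parse(expression, mode='eval')
        return str(eval(compile(node, '<string>', 'eval')))
    except (SyntaxError, ZeroDivisionError, NameError, TypeError, OverflowError):
        return "Error: Invalid expression"

def special_format(value, kind):
    # Format the input string according to the requested numeric type
    try:
        if kind.lower() == "integer":
            return "{:d}".format(int(float(value)))
        elif kind.lower() in ("float", "double", "real", "decimal"):
            return "{:.2f}".format(float(value))
        return str(value)
    except (ValueError, TypeError) as e:
        return f"Error: Invalid conversion - {e}"

print(calculate("2 + 3 * 4"))               # → 14
print(calculate("1 / 0"))                   # → Error: Invalid expression
print(special_format("123.00", "integer"))  # → 123
print(special_format("123", "decimal"))     # → 123.00
```

Checks like these catch edge cases (division by zero, malformed expressions, bad numeric strings) before the model starts routing real tool calls through the functions.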
We can also use tool definitions to integrate the agent with other systems, such as databases, APIs, Retrieval-Augmented Generation (RAG) services, or other AI agents, to enhance its capabilities and functionality. For example, we can use a RAG service as a tool to retrieve relevant information from a project knowledge base, allowing the agent to answer questions about project use case studies. The following Python code provides the RAG tool definition.
project_rag_def = {
    "name": "Use_case_study_rag",
    "description": "Queries the project use case study information at the company based on the given query and returns the result, including the project name, client name, description, and technology used.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The user prompt to query use case project information (e.g., which projects use .NET technology)."
            }
        },
        "required": ["query"]
    }
}
We can then define the RAG tool function as shown below.
def project_rag_tool(query):
    """
    Queries the knowledge base for the given query.
    input: query as string
    output: use case info as string
    """
    try:
        boto3_session = boto3.session.Session(
            aws_access_key_id=os.environ.get('AWS_ACCESS_KEY_ID'),
            aws_secret_access_key=os.environ.get('AWS_SECRET_ACCESS_KEY'),
            region_name=os.environ.get('AWS_REGION', 'us-east-1')
        )
        bedrock_agent_runtime_client = boto3.client(
            "bedrock-agent-runtime", region_name=boto3_session.region_name
        )
        model_arn = f'arn:aws:bedrock:{boto3_session.region_name}::foundation-model/{MODEL_NAME}'
        response = bedrock_agent_runtime_client.retrieve_and_generate(
            input={
                'text': query
            },
            retrieveAndGenerateConfiguration={
                'type': 'KNOWLEDGE_BASE',
                'knowledgeBaseConfiguration': {
                    'knowledgeBaseId': "your knowledge base id",
                    'modelArn': model_arn
                }
            },
        )
        generated_text = response['output']['text']
        return {"status": "success", "use_case_info": generated_text}
    except Exception as e:
        logging.error(f"Error: {str(e)}")
        return {"error": str(e)}
Note that before using the RAG functionality, you must first set up a Knowledge Base in AWS Bedrock. This involves:
- Creating a Knowledge Base in the AWS Bedrock console
- Ingesting your documents/data into the Knowledge Base
- Obtaining the Knowledge Base ID (found in the Knowledge Base details page)
You'll need this Knowledge Base ID to configure the RAG tool as shown in the code above. For a step-by-step guide on setting up a Knowledge Base, refer to the AWS Bedrock Knowledge Base documentation.
AI Agent Implementation with Memory
With the model and tools in place, we can define an AI agent that can now process user requests, execute tasks by using available tools, and generate responses. The following Python code provides the details.
class Agent:
    # Initialize the agent with a model, a Bedrock client, tools, and an optional reasoning prompt
    def __init__(self, model_name=None, client=None, custom_tools=None, reasoning_prompt=None):
        self.model_name = model_name if model_name else MODEL_NAME
        self.client = client if client else boto3.client(service_name='bedrock-runtime')
        self.tools = custom_tools if custom_tools else tools
        self.reasoning_prompt = reasoning_prompt
        self.memory = []

    # Build the request body, including the system prompt when one is set
    def build_body(self, messages):
        request = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 10000,
            "messages": messages,
            "tools": self.tools,
        }
        if self.reasoning_prompt:
            request["system"] = self.reasoning_prompt
        return json.dumps(request)

    # Dispatch a tool call to the matching tool execution function
    def process_tool_call(self, tool_name, tool_input):
        if tool_name == "calculator":
            return calculate(tool_input["expression"])
        elif tool_name == "formatter":
            return special_format(tool_input["input"], tool_input["type"])
        return f"Error: Unknown tool '{tool_name}'"

    # Iteratively process tool use until the model stops requesting tools
    def invoke_tool_use(self, init_user_message, response_messages, tool_use_body):
        try:
            # Accumulate the tool results produced so far
            tool_use_result_content = []
            # Accumulate the assistant messages, including the current tool use information
            response_messages_content = response_messages
            while True:
                tool_use_id = tool_use_body.get("id")
                tool_name = tool_use_body.get("name")
                tool_input = tool_use_body.get("input")
                print(f"\nTool Used: {tool_name}")
                print(f"Tool Input: {tool_input}")
                tool_result = self.process_tool_call(tool_name, tool_input)
                print(f"Tool Result: {tool_result}")
                # Append the current tool result; the model needs to know
                # what tool results have been produced
                tool_use_result_content.append(
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use_id,
                        "content": tool_result,
                    }
                )
                # Tool results go back as a user message; tool-use requests as an assistant message
                tool_use_result_message = {"role": "user", "content": tool_use_result_content}
                tool_use_message = {"role": "assistant", "content": response_messages_content}
                # Invoke the model with the updated conversation
                body = self.build_body([init_user_message, tool_use_message, tool_use_result_message])
                response = self.client.invoke_model(body=body, modelId=self.model_name)
                response_body = json.loads(response.get("body").read())
                response_messages = response_body.get('content')
                # Keep the assistant content aligned with the accumulated tool results;
                # the model needs to know which tools have been and will be used
                response_messages_content.append(response_messages[-1])
                # Reset before inspecting the new response; otherwise the loop may never terminate
                tool_use_body = None
                assistant_response = ""
                for message in response_messages:
                    if message.get("type") == "tool_use":
                        tool_use_body = message
                    elif message.get("type") == "text":
                        assistant_response = message.get("text")
                # Continue only if the model requested another tool use
                if tool_use_body is None:
                    break
            # Prepare the final response and the memory of this exchange
            final_response = assistant_response
            final_memory = [init_user_message, tool_use_message, tool_use_result_message]
            return final_response, final_memory
        except ClientError as e:
            logging.error(e)
            return None, []

    # Chat with the model using a user input prompt
    def chat_with_model(self, prompt):
        print(f"\n{'='*50}\nUser Message: {prompt}\n{'='*50}")
        tool_use_body = None
        assistant_response = ""
        user_message = {"role": "user", "content": prompt}
        body = self.build_body([user_message])
        response = self.client.invoke_model(body=body, modelId=self.model_name)
        response_body = json.loads(response.get("body").read())
        response_messages = response_body.get('content')
        for message in response_messages:
            if message.get("type") == "tool_use":
                tool_use_body = message
            elif message.get("type") == "text":
                assistant_response = message.get("text")
        print(f"\nInitial Response: {assistant_response}")
        if tool_use_body is not None:
            final_response, final_memory = self.invoke_tool_use(user_message, response_messages, tool_use_body)
        else:
            final_response = assistant_response
            final_memory = []
        # Save the intermediate results in short-term memory
        self.memory = final_memory
        return final_response
In this Python code block, we define the Agent class, which includes the __init__ method to initialize the agent, the process_tool_call method to process tool calls, the invoke_tool_use method to iteratively process tool uses, and the chat_with_model method to chat with the model. The agent's memory stores information, context, and state across interactions. In this example, we only use short-term memory (STM), which stores information relevant to the current task or interaction.
The agent's memory is a crucial component because it allows the agent to learn and perform more efficiently and effectively over time. LLMs are typically stateless, meaning they lack inherent memory, which limits their capabilities. An AI agent's memory enables several key capabilities:
- Contextual Awareness: Short-term memory helps AI agents maintain context within a conversation, enabling them to remember recent user inputs and provide more coherent responses. This is similar to human working memory.
- Long-Term Learning: Long-term memory allows AI agents to retain knowledge from past interactions and apply it to future ones. This facilitates learning and adaptation.
- Reduced Redundancy: By recalling previously encountered information, memory systems help avoid redundant processing, improving response times and resource utilization.
- Adaptive Interactions: AI agents with memory can learn user preferences over time and tailor their responses accordingly.
Generally, AI agents can implement different types of memory, each serving distinct functions:
- Short-Term Memory (STM): Similar to working memory, STM handles temporary information storage. In an AI agent, this includes:
  - Context Windows: The limited space holding the current prompt and recent conversation history.
  - Conversation Buffers: Storing a set number of past interactions or a fixed token length.
  - Conversation Summaries: Periodically summarizing conversation history for longer context retention.
- Long-Term Memory (LTM): Encompasses various types of memory, including:
  - Episodic Memory: Stores specific past events and experiences, enabling learning from successes and failures.
  - Semantic Memory: Holds general knowledge, concepts, and facts about the world, aiding in context understanding and effective responses.
  - Procedural Memory: Stores "how-to" knowledge like skills and step-by-step instructions, allowing AI agents to learn strategies and procedures.
Long-term memory can be implemented using external databases, knowledge graphs, or retrieval-augmented generation (RAG) systems.
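As an illustration of the simplest of these options, a conversation buffer can be sketched as a fixed-length window over past turns. This is a minimal, hypothetical implementation (the class name and interface are not part of the agent code above), just to make the idea concrete:

```python
from collections import deque

class ConversationBuffer:
    """Short-term memory that keeps only the most recent conversation turns."""

    def __init__(self, max_turns=5):
        # Older turns are evicted automatically once the buffer is full
        self.turns = deque(maxlen=max_turns)

    def add(self, role, content):
        self.turns.append({"role": role, "content": content})

    def as_messages(self):
        # Return the buffered turns in the messages format used by the model
        return list(self.turns)

buffer = ConversationBuffer(max_turns=2)
buffer.add("user", "What is 2 + 2?")
buffer.add("assistant", "4")
buffer.add("user", "And times 3?")
print(buffer.as_messages())  # only the last two turns remain
```

A production agent would typically pair a buffer like this with periodic summarization, so that evicted turns are condensed rather than lost entirely.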
The choice of memory implementation depends on the specific use case, complexity, and requirements of the AI agent, with more advanced agents often combining multiple solutions for enhanced performance and capabilities. Consider an industry example: an AI-agent-based virtual assistant helping you plan a trip:
- Short-term memory might hold your desired travel dates and budget.
- Working memory would use this information to search for destinations and activities that fit your needs.
- Semantic memory provides the knowledge base of cities, attractions, and travel costs.
- Procedural memory could involve strategies for finding the best deals on flights and hotels.
- Episodic memory could potentially store details from past trips to personalize future recommendations.
Implementing human-like memory in LLMs remains a complex and evolving field. However, as research progresses, we can anticipate AI agents with increasingly sophisticated memory capabilities, leading to more intelligent and adaptable interactions.
Reasoning and Planning
Reasoning and planning are essential components that manage the agent's operations, including information processing, reasoning, action decision-making, and state management. In this example, we use the reasoning_prompt parameter to provide additional context or constraints to the model. The following Python code provides an example of the reasoning prompt.
reasoning_instruction = """
You are a helpful assistant that:
1. Performs arithmetic operations using the calculator tool
2. Formats results appropriately using the formatter tool
3. Explains your process clearly to the user
Always verify calculations and format results according to the user's needs.
"""
agent = Agent(reasoning_prompt=reasoning_instruction)
In this reasoning prompt, we tell the AI model to follow the reasoning and planning instructions and to perform operations in the appropriate sequence using only the provided tools. This prevents the AI model from generating arbitrary responses. The reasoning prompt can be customized based on the agent's objectives, tasks, and constraints. To achieve good reasoning and planning results, we recommend following established reasoning and logic frameworks, such as ReAct, Chain-of-Thought (CoT), or Tree-of-Thoughts, each offering unique capabilities and design patterns.
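For instance, a ReAct-style reasoning prompt interleaves explicit Thought/Action/Observation steps. The sketch below is illustrative only, not tuned for any particular model; the exact wording is an assumption, not part of the ReAct paper:

```python
# A hypothetical ReAct-style system prompt (illustrative only, not tuned
# for any particular model)
react_prompt = """You are a helpful assistant. For each user request, reason step by step:

Thought: consider what the request requires and which tool (if any) applies.
Action: call exactly one of the provided tools with the required input.
Observation: read the tool result before deciding the next step.

Repeat Thought/Action/Observation until the task is complete, then give a final answer.
Use only the provided tools; never compute results yourself."""

# It would be passed to the agent the same way as the reasoning prompt above:
# agent = Agent(reasoning_prompt=react_prompt)

print(react_prompt)
```

Structuring the prompt this way nudges the model to verbalize a plan before each tool call, which tends to make multi-step tool use more reliable and easier to debug from the transcript.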
3. Test and Validate the AI Agent
After building the AI agent, let us test and validate its performance. The first test case can be very simple as shown below.
# Test the AI model capabilities to call one function
agent = Agent()
final_response = agent.chat_with_model("What is the result of 1,984,135 * 9,343,116?")
print(f"\nFinal Response: {final_response}")
The output of the test case should look like:
==================================================
User Message: What is the result of 1,984,135 * 9,343,116?
==================================================
Initial Response: I'll help you calculate that multiplication. I'll use the calculator function to perform this operation.
Tool Used: calculator
Tool Input: {'expression': '1984135 * 9343116'}
Tool Result: 18538003464660
Final Response: The result of 1,984,135 * 9,343,116 is 18,538,003,464,660.
As a verification:
- First number: 1,984,135
- Second number: 9,343,116
- Product: 18,538,003,464,660
And let us take a look at the memory content of the agent.
pprint.pprint(agent.memory)
The output of the memory content should look like:
[{'content': 'What is the result of 1,984,135 * 9,343,116?', 'role': 'user'},
{'content': [{'text': "I'll help you calculate that multiplication. I'll use "
'the calculator function to perform this operation.',
'type': 'text'},
{'id': 'toolu_bdrk_01C8FwSembdE3TGFdNS6Gceu',
'input': {'expression': '1984135 * 9343116'},
'name': 'calculator',
'type': 'tool_use'},
{'text': 'The result of 1,984,135 * 9,343,116 is '
'18,538,003,464,660.\n'
'\n'
'As a verification:\n'
'- First number: 1,984,135\n'
'- Second number: 9,343,116\n'
'- Product: 18,538,003,464,660',
'type': 'text'}],
'role': 'assistant'},
{'content': [{'content': '18538003464660',
'tool_use_id': 'toolu_bdrk_01C8FwSembdE3TGFdNS6Gceu',
'type': 'tool_result'}],
'role': 'user'}]
From the agent's response and memory outputs, we can see that the AI agent successfully calculated the multiplication of the two numbers and provided the correct result, leveraging the calculator tool to do the math work.
The second test case can be checking the custom reasoning capability as shown below.
# Test the AI agent with a custom reasoning instruction
reasoning_instruction = """
You are a helpful assistant that:
1. Performs arithmetic operations using the calculator tool
2. Formats results appropriately using the formatter tool
3. Explains your process clearly to the user
Always verify calculations and format results according to the user's needs.
"""
agent = Agent(reasoning_prompt=reasoning_instruction)
final_response = agent.chat_with_model("What is the result of 1,984,135 * 9,343,116?")
print(f"\nFinal Response: {final_response}")
The output of the test case should look like:
==================================================
User Message: What is the result of 1,984,135 * 9,343,116?
==================================================
Initial Response: I'll help you calculate the multiplication of 1,984,135 and 9,343,116. I'll use the calculator tool to perform this calculation.
Tool Used: calculator
Tool Input: {'expression': '1984135 * 9343116'}
Tool Result: 18538003464660
Tool Used: formatter
Tool Input: {'input': '18538003464660', 'type': 'decimal'}
Tool Result: 18538003464660.00
Final Response: The result of 1,984,135 * 9,343,116 is 18,538,003,464,660
I broke this down into two steps:
1. Used the calculator to multiply the two numbers
2. Used the formatter to display the result as a decimal for clarity
And the memory should look like:
pprint.pprint(agent.memory)
[{'content': 'What is the result of 1,984,135 * 9,343,116?', 'role': 'user'},
{'content': [{'text': "I'll help you calculate the multiplication of "
"1,984,135 and 9,343,116. I'll use the calculator tool "
'to perform this calculation.',
'type': 'text'},
{'id': 'toolu_bdrk_01UbKb27Xo8FmBAWimaQ9HxM',
'input': {'expression': '1984135 * 9343116'},
'name': 'calculator',
'type': 'tool_use'},
{'id': 'toolu_bdrk_0113YX4mHcc7wBPYdF1fid3C',
'input': {'input': '18538003464660', 'type': 'decimal'},
'name': 'formatter',
'type': 'tool_use'},
{'text': 'The result of 1,984,135 * 9,343,116 is '
'18,538,003,464,660.\n'
'\n'
'I broke this down into two steps:\n'
'1. Used the calculator to multiply the two numbers\n'
'2. Used the formatter to display the result as a '
'decimal for clarity\n'
'\n'
'Is there anything else I can help you with?',
'type': 'text'}],
'role': 'assistant'},
{'content': [{'content': '18538003464660',
...
{'content': '18538003464660.00',
'tool_use_id': 'toolu_bdrk_0113YX4mHcc7wBPYdF1fid3C',
'type': 'tool_result'}],
'role': 'user'}]
From the outputs above, we can see that the custom reasoning prompt helps the AI agent follow the instructions and perform the operations in the correct sequence, even though the user input doesn't explicitly require the two-step operation. It first uses the calculator tool to compute the multiplication and then uses the formatter tool to format the result properly. The agent's response is clear and informative, indicating the tools used and the results obtained.
The third test case can be checking the AI agent's ability to handle a more complex expression as shown below.
# Test the AI model capabilities to do reasoning/planning and then call two functions without explicit sequence instruction.
final_response = agent.chat_with_model("Format the calculation result of (12851 - 593) * 301 + 76 as a decimal.")
print(f"\nFinal Response: {final_response}")
The output would look like:
==================================================
User Message: Format the calculation result of (12851 - 593) * 301 + 76 as a decimal.
==================================================
Initial Response: I'll solve this step by step:
1. First, I'll use the calculator to perform the calculation.
2. Then, I'll use the formatter to convert the result to a decimal.
Tool Used: calculator
Tool Input: {'expression': '(12851 - 593) * 301 + 76'}
Tool Result: 3689734
Tool Used: formatter
Tool Input: {'input': '3689734', 'type': 'decimal'}
Tool Result: 3689734.00
Final Response: The result of the calculation (12851 - 593) * 301 + 76 is 3,689,734, and when formatted as a decimal, it becomes 3,689,734.00.
The memory output would look like:
pprint.pprint(agent.memory)
[{'content': 'Format the calculation result of (12851 - 593) * 301 + 76 as a '
'decimal.',
'role': 'user'},
{'content': [{'text': "I'll solve this step by step:\n"
'\n'
"1. First, I'll use the calculator to perform the "
'calculation.\n'
"2. Then, I'll use the formatter to convert the result "
'to a decimal.',
'type': 'text'},
{'id': 'toolu_bdrk_019YZc41jDa1Ce1GU9sFqRuW',
'input': {'expression': '(12851 - 593) * 301 + 76'},
'name': 'calculator',
'type': 'tool_use'},
{'id': 'toolu_bdrk_01VgQ4fGiQUfF9FogujPo49Y',
'input': {'input': '3689734', 'type': 'decimal'},
'name': 'formatter',
'type': 'tool_use'},
{'text': 'The result of the calculation (12851 - 593) * 301 + 76 '
'is 3,689,734, and when formatted as a decimal, it '
'becomes 3,689,734.00.',
'type': 'text'}],
'role': 'assistant'},
{'content': [{'content': '3689734',
'tool_use_id': 'toolu_bdrk_019YZc41jDa1Ce1GU9sFqRuW',
'type': 'tool_result'},
{'content': '3689734.00',
'tool_use_id': 'toolu_bdrk_01VgQ4fGiQUfF9FogujPo49Y',
'type': 'tool_result'}],
'role': 'user'}]
From the outputs above, we can see that the AI agent successfully handled the complex expression, performed the calculation, and formatted the result as requested. The agent's response is clear and informative, indicating the tools used, the results obtained, and the steps taken to solve the problem.
Conclusion
In this article, we have provided a guide on how to build an AI agent from scratch, covering the key components, options, and considerations. By following the steps outlined in this guide, you can create an AI agent that can perform specific tasks, interact with users, and achieve defined objectives. Building an AI agent requires careful planning, design, and implementation to ensure that it meets the desired requirements and delivers the expected results. With the right tools, models, and strategies, you can build an effective AI agent that can help you solve problems, automate tasks, and improve user experiences.
In the next article, we will provide a guide on how to build an AI agent with some popular frameworks, such as CrewAI, Smolagent, and Swarm. Stay tuned!