Part 2/3
Welcome back to our three-part series on AI Agents. In the first article, "AI Agents Explained: Architecture, Benefits, and Real-World Applications," we established a solid understanding of what AI Agents are, their internal components, and the advantages they offer.
Now, we move into the practical realm. In this second article, we'll explore how to build and deploy AI Agents locally using popular frameworks and tools.
We'll also present a step-by-step guide and a simple code example to get you started. The final article will dive into real-world examples with in-depth explanations and sample code. This hands-on guide will empower you to create your own AI Agents and leverage their capabilities on your own hardware.
Introduction to Local AI Agent Deployment
Deploying AI Agents locally offers several key benefits:
Privacy: Data is processed on your machine, keeping sensitive information under your control.
Low Latency: Local processing removes network delays, allowing for quicker response times, which is crucial for real-time applications.
Offline Access: Agents can work without an internet connection, making them well suited to remote or air-gapped environments.
Cost Savings: You can save on ongoing cloud computing costs by running agents on your own setup.
This article highlights practical tools and techniques to take advantage of these benefits.
Frameworks and Tools for Local AI Agent Development
Several frameworks and tools simplify the process of building and deploying AI Agents locally. We'll focus on three prominent options:
Ollama: Ollama simplifies the process of deploying and running Large Language Models (LLMs) locally. It handles the complexities of model management, allowing you to quickly deploy and experiment with different LLMs without worrying about the underlying infrastructure. Ollama works with a wide array of open models and exposes them through a simple local API, so it integrates easily with the other options below.
LangChain: LangChain is a powerful framework for building applications powered by language models. It provides a modular and flexible architecture for model integration, data connection, agent creation, and more. Its modular design allows you to customize every aspect of your agent's behavior.
AutoGen (Microsoft): AutoGen enables the development of LLM applications with multiple agents that can converse with each other to solve tasks. It simplifies the orchestration, optimization, and automation of complex workflows involving multiple LLMs and tools.
Choosing the right framework depends on your specific needs and the complexity of your project. For single-agent applications, LangChain might suffice, while AutoGen is better suited for multi-agent systems. LangChain and AutoGen are application frameworks, whereas Ollama serves the models themselves, so it can act as the local LLM backend for either, as the sketch below illustrates.
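To make that relationship concrete, here is a minimal multi-agent sketch using AutoGen with Ollama as the backend. It assumes the pyautogen package is installed and that your Ollama build exposes its OpenAI-compatible endpoint at /v1; the model name and prompt are placeholders.
import autogen
# Point AutoGen at Ollama's OpenAI-compatible endpoint (assumed available at /v1)
config_list = [{
    "model": "llama2",                        # any model you have pulled with Ollama
    "base_url": "http://localhost:11434/v1",  # Ollama's local OpenAI-compatible API
    "api_key": "ollama",                      # placeholder; Ollama ignores the key
}]
assistant = autogen.AssistantAgent(name="assistant", llm_config={"config_list": config_list})
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",        # run fully automated, no human in the loop
    code_execution_config=False,     # disable code execution for this sketch
    max_consecutive_auto_reply=1,    # limit the back-and-forth for this demo
)
# The two agents converse to resolve the task
user_proxy.initiate_chat(assistant, message="Summarize why local LLM deployment helps privacy.")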
Setting up Your Environment
Let's walk through the process of setting up your development environment using LangChain, as it's very powerful for deploying agents. These instructions assume you have Python 3.7+ installed.
1. Create a Virtual Environment:
python3 -m venv venv
source venv/bin/activate # On Linux/macOS
venv\Scripts\activate # On Windows
Explanation: A virtual environment isolates your project's dependencies from the system-wide Python installation, preventing conflicts and ensuring reproducibility.
2. Install Dependencies:
pip install langchain openai chromadb python-dotenv
Explanation: pip is Python's package installer. This command installs the following packages:
langchain: The LangChain framework.
openai: OpenAI's Python library, used for interacting with OpenAI models (you'll need an API key).
chromadb: A lightweight vector database for storing embeddings, easy to use for local development.
python-dotenv: For loading environment variables from a .env file.
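Note that the import paths used later in this article follow LangChain's classic (pre-0.1) API; on newer releases the same classes live in the langchain-community package. A quick way to check which version you have installed:
python -c "import langchain; print(langchain.__version__)"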
3. Install Ollama:
Follow the installation instructions for your operating system at ollama.com.
Make sure to download and test your local LLM before continuing.
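For example, assuming you want the llama2 model used later in this article, you can pull and smoke-test it from the command line:
ollama pull llama2            # download the model weights
ollama run llama2 "Hello!"    # one-off prompt to verify the model responds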
4. Set up API Keys:
Create a .env file in your project directory. This file will store sensitive information like API keys separately from your code.
Add your OpenAI API key and Ollama URL:
OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
OLLAMA_BASE_URL="http://localhost:11434"
5. Load Environment Variables:
import os
from dotenv import load_dotenv
# Read the variables from the .env file into the process environment
load_dotenv()
# Retrieve the values for later use
openai_api_key = os.getenv("OPENAI_API_KEY")
ollama_base_url = os.getenv("OLLAMA_BASE_URL")
Explanation: This code snippet loads the environment variables from the .env file into your Python script, making them accessible for use.
Note: You will need an OpenAI account and API key to use its hosted models. If you'd rather use a local model you have already downloaded with Ollama, the Ollama URL is all you need.
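Optionally, you can fail fast if neither backend is configured. A small, hypothetical guard:
# Hypothetical guard: stop early if neither backend is configured
if not openai_api_key and not ollama_base_url:
    raise RuntimeError("Set OPENAI_API_KEY or OLLAMA_BASE_URL in your .env file")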
Designing Your Agent's Architecture
Before diving into code, let's outline the key architectural considerations for our simple AI Agent:
Defining Goals: Our agent's goal is to answer questions based on a local document. This is a common use case for information retrieval and knowledge management.
Choosing an appropriate LLM: We will use OpenAI's gpt-3.5-turbo model or a local model, such as llama2, depending on your configuration. Consider the model's capabilities, cost, and latency when making your choice.
Choosing Memory/Storage: We will load a document and create vector embeddings to answer questions against it. A vector embedding is a numerical representation of text that captures its semantic meaning (see the short sketch after this list). ChromaDB will be used to store these embeddings.
Selecting Tools: We do not need any external tools for this basic example. However, in more complex scenarios, agents might require access to tools like web search, calculators, or external databases.
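To make the embedding idea concrete, here is a minimal sketch, assuming the OpenAI embeddings backend used below and the openai_api_key variable loaded earlier:
from langchain.embeddings import OpenAIEmbeddings
emb = OpenAIEmbeddings(openai_api_key=openai_api_key)
vector = emb.embed_query("AI agents can act autonomously.")
# A fixed-length list of floats, e.g. 1536 dimensions for text-embedding-ada-002
print(len(vector))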
Basic Code Example: Question Answering Agent
import os
from dotenv import load_dotenv
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI, Ollama
from langchain.chains import RetrievalQA
load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")
ollama_base_url = os.getenv("OLLAMA_BASE_URL")
# 1. Load the Document
loader = TextLoader("your_document.txt") # Replace with your document path
documents = loader.load()
# 2. Create Embeddings and Store in ChromaDB
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key) # or HuggingFaceEmbeddings
db = Chroma.from_documents(documents, embeddings)
# 3. Choose LLM and Create RetrievalQA Chain
use_local_model = True  # Set to False to use the OpenAI model
if use_local_model:
    llm = Ollama(base_url=ollama_base_url, model="llama2")  # Replace with the model you pulled
else:
    llm = OpenAI(openai_api_key=openai_api_key)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=db.as_retriever())
# 4. Ask Questions
query = "What is the main topic of this document?"
result = qa.run(query)
print(result)
Explanation:
Load the Document: This code loads a text document using TextLoader. Replace "your_document.txt" with the path to your local file.
Create Embeddings: It creates embeddings from the document using OpenAI's embedding model (you can substitute another embedding backend). Embeddings are numerical representations of the text that let the agent capture the document's semantic meaning. They are stored in a ChromaDB vector database.
Choose LLM and Create QA Chain: This is where we switch between an OpenAI model and a local model served by Ollama: set the use_local_model variable to True or False, and change the local model name to one you have available. The RetrievalQA chain combines the LLM with the ChromaDB retriever.
Ask Questions: The code prompts the agent with a question. The qa.run(query) method retrieves relevant information from ChromaDB and uses the LLM to generate an answer.
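If you want the whole pipeline to stay local, you can also swap the embeddings for Ollama's. A minimal sketch, assuming the classic LangChain OllamaEmbeddings class and the llama2 model you pulled earlier:
from langchain.embeddings import OllamaEmbeddings
# Generate embeddings locally via Ollama instead of the OpenAI API
embeddings = OllamaEmbeddings(base_url=ollama_base_url, model="llama2")
db = Chroma.from_documents(documents, embeddings)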
Before running:
Replace "YOUR_OPENAI_API_KEY" in your .env file with your actual OpenAI API key or use local models with Ollama.
Replace "your_document.txt" with the path to a local text file.
Ensure Ollama is running and your local model is downloaded.
Troubleshooting Tips
API Key Errors: Double-check that your OpenAI API key is correctly set in the .env file. Common errors include missing keys, incorrect formatting, or expired keys. Note that even in local-model mode, the example's OpenAIEmbeddings call still needs a valid key unless you switch to local embeddings as shown above.
Dependency Issues: If you encounter ModuleNotFoundError errors, ensure that all required packages are installed with pip install. If you recently updated Python or pip, try upgrading pip with pip install --upgrade pip and then reinstalling the dependencies.
Model Loading Errors: If you're using a local LLM and encounter errors related to loading the model, verify that Ollama is running correctly and that the specified model (llama2 in the example) is downloaded. Check Ollama's logs for detailed error messages.
Out of Memory Errors: LLMs can be memory-intensive. If you encounter "out of memory" errors, try reducing the size of the document you're loading, using a smaller LLM, or increasing your system's RAM.
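One practical way to reduce memory pressure is to split the document into smaller chunks before embedding. A sketch using LangChain's CharacterTextSplitter:
from langchain.text_splitter import CharacterTextSplitter
# Split the loaded documents into ~1000-character chunks with some overlap
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)
db = Chroma.from_documents(chunks, embeddings)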
Conclusion
In this article, we've explored the practical steps for building and deploying AI Agents locally. We've discussed key frameworks, walked through environment setup, and provided a basic code example. With these tools, you can begin creating your own AI Agents and adapting them to your own needs. In the next and final article, we'll look into advanced use cases and techniques to optimize AI Agent performance.