Multi-agent AI systems represent a significant shift in how we approach complex problem-solving and task automation. At the core of this is OpenAI's Swarm framework, which offers a lightweight approach to orchestrating multiple AI agents. Imagine a team of AI helpers, each with its own job, working together to make life easier. That's the basic idea behind Swarm: it helps different AI agents work together, like a team, to tackle big tasks. This setup is excellent for a wide range of tasks, from handling customer questions (where one AI might greet the customer while another solves their problem) to analyzing big data (where one AI collects information, another processes the numbers, and a third turns it all into easy-to-understand charts). The possibilities are endless.
Although Swarm is designed to work with OpenAI's API key by default, many developers would rather not depend on an API key for this system. To solve this problem, this article walks you through building a multi-agent AI system with the Swarm framework using a locally installed LLM served through Ollama instead of an API key.
Prerequisites
The minimum system requirements for this use case are:
GPUs: A100 or RTX 4090 (for smooth execution).
Disk Space: 100-300 GB
RAM: At least 16 GB.
Jupyter Notebook installed.
Note: The prerequisites for this are highly variable across use cases. A high-end configuration could be used for a large-scale deployment.
Step-by-step process to build a multi-agent AI system with OpenAI Swarm
For the purpose of this tutorial, we’ll use a GPU-powered Virtual Machine by NodeShift, which provides high-compute Virtual Machines at very affordable cost while meeting GDPR, SOC 2, and ISO 27001 compliance requirements. It also offers an intuitive and user-friendly interface, making it easier for beginners to get started with cloud deployments. However, feel free to use any cloud provider of your choice and follow the same steps for the rest of the tutorial.
Step 1: Setting up a NodeShift Account
Visit app.nodeshift.com and create an account by filling in basic details, or continue signing up with your Google/GitHub account.
If you already have an account, log in and go straight to your dashboard.
Step 2: Create a GPU Node
After accessing your account, you should see a dashboard. Now:
1) Navigate to the menu on the left side.
2) Click on the GPU Nodes option.
3) Click on Start to start creating your very first GPU node.
These GPU nodes are GPU-powered virtual machines by NodeShift. They are highly customizable and let you configure the GPU (ranging from H100s to A100s), CPUs, RAM, and storage according to your needs.
Step 3: Selecting configuration for GPU (model, region, storage)
1) For this tutorial, we’ll be using the RTX 4090 GPU; however, you can choose any GPU of your choice based on your needs.
2) Similarly, we’ll opt for 100GB storage by sliding the bar. You can also select the region where you want your GPU to reside from the available ones.
Step 4: Choose GPU Configuration and Authentication method
1) After selecting your required configuration options, you'll see the available VMs in your region and according to (or very close to) your configuration. In our case, we'll choose a 1x RTX 4090 GPU node with 8 vCPUs/32GB RAM/100 GB SSD.
2) Next, you'll need to select an authentication method. Two methods are available: Password and SSH Key. We recommend using SSH keys, as they are a more secure option. To create one, head over to our official documentation.
Step 5: Choose an Image
The final step is to choose an image for the VM, which in our case is Jupyter Notebook, where we'll deploy and run our swarm.
That's it! You are now ready to deploy the node. Finalize the configuration summary, and if it looks good, click Create to deploy the node.
Step 6: Connect to active Compute Node using SSH
1) As soon as you create the node, it will be deployed in a few seconds or a minute. Once deployed, you will see a status Running in green, meaning that our Compute node is ready to use!
2) Once your GPU shows this status, navigate to the three dots on the right and click on Connect with SSH. This will open a new tab with a Jupyter Notebook session in which we can run our model.
Step 7: Install a model locally with Ollama
Since we'll use a locally installed LLM instead of the OpenAI API to power the multi-agent system, we'll install and set up a local Ollama server from which the model will be served.
1) Open a terminal window inside Jupyter.
2) Install Ollama.
curl -fsSL https://ollama.com/install.sh | sh
As the installation completes, you may get the following warning:
WARNING: Unable to detect NVIDIA/AMD GPU. Install lspci or lshw to automatically detect and install GPU dependencies.
This happens when Ollama cannot automatically detect the GPU in your system. To resolve it, install the PCI detection utilities with the following command:
sudo apt install pciutils lshw
After this, rerun the Ollama installation command, and this time, it should successfully detect and use the GPU.
3) Confirm the installation by running the ollama command, e.g.:
ollama --version
4) Start Ollama.
ollama serve
5) Open a new terminal window and run the following command to download the model.
For the scope of this tutorial, we'll download and use the Llama3.3 model as the AI brain of our multi-agent system. You can download any model of your choice by swapping the model name in the command below.
ollama pull llama3.3
Note that we are just pulling the model into our local system, not running it. This is because we only need the model to be present on the system (we'll invoke it from within our code); it does not need to be running in the background.
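To double-check that the pull succeeded, you can optionally query the local server's OpenAI-compatible model listing. This is a minimal sketch using only the Python standard library; it assumes ollama serve from the previous step is still running on its default port 11434:
import json
import urllib.request

# Ask the local Ollama server which models are available locally;
# the pulled model (e.g. llama3.3) should appear in the returned list.
with urllib.request.urlopen("http://localhost:11434/v1/models") as resp:
    models = json.load(resp)

print([m["id"] for m in models["data"]])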
Step 8: Build & run a simple multi-agent system
1) To write the code for our system and run the model, we'll open a new Python Notebook.
2) Install the Swarm package directly from OpenAI's GitHub repository, along with the ollama and openai Python packages, using pip.
!pip install git+https://github.com/openai/swarm.git --quiet
!pip install ollama openai --quiet
3) Configure the Ollama client. We point the standard OpenAI client at the local Ollama server's OpenAI-compatible endpoint; the api_key value is a required placeholder and is not validated by Ollama.
import ollama
from openai import OpenAI

model = "llama3.3"

ollama_client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # placeholder; Ollama does not check this value
)
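Before wiring this client into Swarm, you can optionally send one direct chat completion to verify everything works end to end. This is a minimal sketch; it assumes ollama serve is running and llama3.3 has been pulled:
# Optional smoke test: call the local model once through the
# OpenAI-compatible endpoint, bypassing Swarm entirely.
test = ollama_client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
)
print(test.choices[0].message.content)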
4) Write the code for the multi-agent AI system.
from swarm import Swarm, Agent

# Initialize Swarm with the local Ollama-backed client instead of the
# default OpenAI client.
client = Swarm(client=ollama_client)

# Handoff function: when the model calls this, Swarm switches the
# active agent to the one returned here.
def escalate_to_tech_support():
    return tech_support_agent

# Define the sales agent
sales_agent = Agent(
    name="Sales Agent",
    model=model,
    instructions="You only handle inquiries related to product pricing and offers.",
    functions=[escalate_to_tech_support],
)

# Define the technical support agent
tech_support_agent = Agent(
    name="Tech Support Agent",
    model=model,
    instructions="You only handle technical issues related to product troubleshooting.",
)

# Simulate a conversation that starts with the sales agent
response = client.run(
    agent=sales_agent,
    messages=[{"role": "user", "content": "I need some help from the Tech support. Are you the tech support agent?"}],
)

print(response.messages[-1]["content"])
In the above code, we have defined two agents:
Sales Agent: Instructed to handle inquiries related to products, like pricing and offers.
Tech Support Agent: Instructed to help users troubleshoot their product issues.
After that, we simulate a conversation to check whether the agents can perform handoffs. In Swarm, a handoff happens when an agent's function returns another Agent object; Swarm then makes that agent the active one for the rest of the turn. To test this, we start the conversation with the Sales Agent and intentionally send a query that should be resolved by the Tech Support Agent.
This way, if we receive the expected response, we can confirm that the Sales Agent handed the conversation over to the Tech Support Agent.
As we run the above code asking the following question to the Sales Agent:
"I need some help from the Tech support. Are you the tech support agent?"
The system responds as the Tech Support Agent. Although we originally sent our query to the Sales Agent, rather than simply replying that the request was outside its scope, it handed the conversation over to the relevant agent in the swarm (the expected behavior), and the Tech Support Agent generated the final response.
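You can also confirm the handoff programmatically: Swarm's Response object (returned by client.run()) records the agent that was active at the end of the turn, so after a successful handoff it should be the Tech Support Agent rather than the Sales Agent.
# The agent recorded on the response should be the one that produced
# the final message.
print(response.agent.name)  # expected: Tech Support Agent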
A basic workflow of how the agents communicate internally in the above code example: the user's message goes to the Sales Agent; Swarm exposes the agent's functions to the model as tools; the model calls escalate_to_tech_support; the function returns the Tech Support Agent, so Swarm switches the active agent; the Tech Support Agent then produces the final reply.
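If you want to extend the example, the same pattern works in the other direction. Below is a small sketch (not part of the original example; escalate_to_sales is a hypothetical helper we're introducing) that gives the Tech Support Agent a way to hand the conversation back:
# Hypothetical extension: let the Tech Support Agent hand pricing
# questions back to the Sales Agent.
def escalate_to_sales():
    return sales_agent

tech_support_agent.functions = [escalate_to_sales]

response = client.run(
    agent=tech_support_agent,
    messages=[{"role": "user", "content": "How much does the premium plan cost?"}],
)
print(response.messages[-1]["content"])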
Conclusion
Building multi-agent AI systems with OpenAI Swarm opens up new possibilities for automating complex workflows and enabling AI-driven collaboration across various domains. With the Swarm framework, developers can create intelligent systems that work together to solve complex problems with greater efficiency and adaptability. In this tutorial, we've explored how quickly you can put together more than one agent to build a robust AI system, and how NodeShift makes deploying and scaling such systems even more effortless. NodeShift Cloud provides reliable infrastructure for running and managing AI workloads, ensuring smooth operations, optimized performance, and simplified deployment workflows, so developers can focus on exploring the endless possibilities of the AI era.
For more information about NodeShift, visit app.nodeshift.com.