“Agents are not only going to change how everyone interacts with computers. They’re also going to upend the software industry, bringing about the biggest revolution in computing since we went from typing commands to tapping on icons. Agents won’t simply make recommendations; they’ll help you act on them.” – Bill Gates
Many people have claimed that 2024 is the year of the agent in AI. In this blog, we'll walk through what AI agents are, the current state of agentic AI, and what the future may look like as agents become more common. But first...
What is an AI Agent?
AI agents, in simplest terms, are programs that can use tools, carry out tasks, and work with or without a human to achieve a goal. These tasks span a wide range of complexity, from answering basic questions to executing intricate actions that require a deep understanding of the external environment. From a programming perspective, we're running LLMs in a loop and giving them access to APIs. By that definition, ChatGPT on its own isn't an agent. If you loop ChatGPT and give it tools, a goal, and access to information, it becomes an intelligent agent.
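To make that loop concrete, here's a minimal sketch using the OpenAI Python SDK's tool-calling interface. The model name and the `get_weather` tool are illustrative assumptions, not part of any particular framework:

```python
import json
from openai import OpenAI  # assumes the openai>=1.0 Python SDK

client = OpenAI()

# An illustrative tool the model is allowed to call.
def get_weather(city: str) -> str:
    return f"It is sunny in {city}."  # stand-in for a real weather API

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "Should I bring an umbrella in Paris?"}]

# The agent loop: call the model, execute any tool calls, feed results back,
# and stop once the model answers without asking for a tool.
while True:
    reply = client.chat.completions.create(
        model="gpt-4-turbo", messages=messages, tools=tools
    ).choices[0].message
    messages.append(reply)
    if not reply.tool_calls:
        print(reply.content)
        break
    for call in reply.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_weather(**args)
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": result}
        )
```

The loop plus tools plus a goal is the whole trick; everything the frameworks below add is structure around this pattern.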
Recent History
The history of intelligent agents in artificial intelligence is tough to chart. What seemed impossible a year ago now happens in months or weeks. Futurist Ray Kurzweil uses a chart of the price-performance of computation to illustrate Moore's Law ("the principle that the speed and capability of computers can be expected to double every two years, as a result of increases in the number of transistors a microchip can contain," according to Oxford Languages) and, while experts have claimed for years that Moore's Law is over, the price-performance of computation is now outpacing it.
As of April 16, 2024, AI's ability to process and synthesize vast amounts of data already far exceeds human capability in areas such as data analysis and high-level coding. As self-improving AI agents hit the market, many of the flaws present today are likely to vanish.
In recent history (meaning the past year), several papers have led the industry to shift towards model-based agents. Most notably, the paper "Generative Agents: Interactive Simulacra of Human Behavior" by researchers from Stanford University and Google introduced a process to simulate human-like behaviors in a video game-type environment. These agents are built on an architectural framework that extends large language models, enabling them to store experiences, synthesize memories over time, and dynamically retrieve them to inform behavior planning.
This capability allows the agents to perform a diverse range of actions, from daily activities like cooking to creative tasks such as painting and writing. By remembering past interactions and forming opinions, these intelligent agents in AI create a more vibrant and believable simulation of human interaction.
Many agent frameworks emerged in the early days, such as AutoGen, ChatDev, and AgentGPT, but they were quickly surpassed in most regards by newer frameworks. Still, it's worth having a basic understanding of these early AI agent examples and what they offer.
AutoGen
AutoGen is an AI framework by Microsoft designed to streamline multi-agent conversations. AutoGen allows agents to communicate, share information, and make collective decisions. This setup enhances the responsiveness and dynamism of conversations. Developers use AutoGen to tailor agents to specific roles, such as programmer, content writer, CEO, etc. This enhances their ability to handle tasks from simple queries to intricate problem-solving.
AutoGen is modular, and as a Microsoft-published, open-source initiative, it can be credited with popularizing the idea of agentic frameworks among developers and making the industry take the concept seriously.
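As a rough illustration, here's a minimal two-agent setup with the open-source pyautogen package; the model name, API key placeholder, and task are assumptions for the sketch:

```python
import autogen  # pip install pyautogen

# Model configuration; the model name and key placeholder are assumptions.
config_list = [{"model": "gpt-4", "api_key": "YOUR_OPENAI_API_KEY"}]

# The assistant plays the "programmer" role and proposes code.
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)

# The user proxy executes the assistant's code locally and reports results back.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # fully autonomous for this sketch
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# Kick off the multi-agent conversation with a goal.
user_proxy.initiate_chat(
    assistant,
    message="Write a Python script that prints the first 10 Fibonacci numbers.",
)
```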
ChatDev
ChatDev was launched as a Software as a Service (SaaS) platform in late 2023 and aims to make it easy for businesses to put intelligent agents to work on software development tasks. ChatDev is customizable and extendable, ideal for harnessing collective intelligence in software creation. Hallucinations are limited by structuring the process into designing, coding, testing, and documentation phases, each managed by specialized agent teams.
The framework's architecture, which is based on the Chat Chain model, breaks down tasks into manageable subtasks, enhancing agent collaboration and communication. Additional innovative features include Git integration, Human-Agent-Interaction mode, and an Art mode for generating software-related visuals, further fostering community involvement and enhancing the development process.
While it offers unique advantages in hosted agent environments and transparency, it lacks some features, like visual builders, that are present in competitors like BondAI and drag-and-drop solutions like SmythOS. It’s a good artificial intelligence agent option for software development teams and project managers who prioritize an AI-driven workflow.
AgentGPT
AgentGPT was an early agent framework designed to create, configure, and deploy autonomous AI agents. It mostly relies on looping OpenAI's GPT models like GPT-3.5 and GPT-4. AgentGPT allows users to set a goal for the AI, which autonomously plans, executes, and refines strategies to achieve it. The platform can be accessed through a web browser, run locally via Docker, or deployed to a server.
Like ChatDev, AgentGPT breaks down complex tasks into manageable sub-tasks and iteratively prompts itself, showcasing advanced capabilities in autonomy. As an open-source project, it fosters a community-driven development approach, supporting features like user management, authentication, agent run-saving, dynamic translations, and AI model customization.
Upcoming enhancements include advanced web browsing, backend migration to Python, and long-term memory integration.
Advancements
After the initial release of frameworks like AutoGen, more refined systems began popping up. CrewAI, TaskWeaver, and Aider are a few examples of different types of agents that have taken a significant step up in the past several months.
CrewAI
CrewAI is similar to the agent frameworks that came before it and is structured around principles of modularity and simplicity. The approach divides a simulated AI world into manageable segments such as agents, tools, tasks, processes, and crews, which enhances system approachability and operational efficiency. Built on the LangChain platform, CrewAI champions a multi-agent system where agents function like team members, collaborating to achieve sophisticated levels of decision-making, creativity, and problem-solving.
CrewAI enables the automation of complex tasks, such as crafting a landing page from a basic idea, by coordinating a crew of specialized agents responsible for different project facets. The setup process is straightforward, allowing for significant customization and integration of various AI models to suit specific needs.
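For a sense of how agents, tasks, and crews fit together, here's a minimal sketch with the crewai package (which reads your LLM credentials from the environment); the roles, goals, and tasks are illustrative assumptions:

```python
from crewai import Agent, Task, Crew  # pip install crewai

# Two specialized agents acting as "team members".
researcher = Agent(
    role="Researcher",
    goal="Gather key points about AI agent frameworks",
    backstory="An analyst who summarizes technical topics clearly.",
)
writer = Agent(
    role="Writer",
    goal="Turn research notes into a short landing-page draft",
    backstory="A copywriter focused on concise, accurate prose.",
)

# Tasks assigned to each agent.
research_task = Task(
    description="List three selling points of multi-agent frameworks.",
    expected_output="A bulleted list of three points.",
    agent=researcher,
)
writing_task = Task(
    description="Write a 100-word landing-page blurb from the research notes.",
    expected_output="A single 100-word paragraph.",
    agent=writer,
)

# The crew runs the tasks in sequence and returns the final result.
crew = Crew(agents=[researcher, writer], tasks=[research_task, writing_task])
print(crew.kickoff())
```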
Looking forward, CrewAI aims to introduce more intricate process types and expand its adaptability, continually driven by community-focused development. For those who prefer working with LangChain, CrewAI is a natural choice.
TaskWeaver
TaskWeaver is an open-source project by Microsoft focused on transforming data analytics and domain-specific tasks. This framework excels in planning and executing complex tasks through a blend of agentic AI and user-defined plugins. TaskWeaver operates on a code-first principle, translating user requests into executable code that utilizes various plugins for data analytics, supporting stateful conversations, and handling complex data structures, like Pandas DataFrames, directly in memory. Its architecture includes a Planner to break down tasks, a Code Generator to convert these into executable code, and a Code Executor to maintain the execution state throughout interactions.
This setup allows for the incorporation of rich data structures and domain-specific knowledge, enhancing flexibility and applicability across different domains. TaskWeaver is a step beyond AutoGen and seeks to drive innovation in AI applications from healthcare to finance. It also demonstrates Microsoft's commitment to open source.
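The Planner / Code Generator / Code Executor split can be sketched conceptually like this. Note that this is a simplified illustration of the pattern, not TaskWeaver's actual API; every name in it is made up:

```python
# Conceptual sketch of a plan -> generate code -> execute loop.
# This illustrates the pattern TaskWeaver describes, NOT its real API.

def plan(user_request: str) -> list[str]:
    # In TaskWeaver the Planner is an LLM that splits a request into subtasks.
    return [
        "load the sales CSV into a DataFrame",
        "compute monthly revenue totals",
        "report the three best months",
    ]

def generate_code(subtask: str) -> str:
    # The Code Generator is an LLM that turns a subtask into executable
    # Python, optionally calling user-defined plugins.
    return f"state['log'].append('ran step: {subtask}')"

def execute(code: str, state: dict) -> dict:
    # The Code Executor runs generated code while keeping state (for example
    # in-memory DataFrames) alive across the whole conversation.
    exec(code, {"state": state})
    return state

state = {"log": []}
for subtask in plan("Analyze last year's sales data"):
    state = execute(generate_code(subtask), state)
print("\n".join(state["log"]))
```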
Aider
Aider was developed by Paul Gauthier and focuses on giving developers a pair-programming experience directly from the terminal. This command-line tool edits code in real time based on a user prompt in the terminal. As of writing, it only supports OpenAI's API but can write, edit, and refine code across multiple languages including Python, JavaScript, and HTML. Developers can use Aider for code generation, debugging, and understanding complex projects.
Concerns
Despite the potential of AI agents, we’re still early in the process and no one knows where this may lead. Along with the technical challenges associated with shipping production-quality agents, there may be major security and ethical concerns.
Ethical Implications and Social Impact
“Your scientists were so preoccupied with whether they could, they didn't stop to think if they should.” - Jurassic Park (1993)
The rise of the agent in artificial intelligence is almost a foregone conclusion. While some skeptics believe that agents will fail to add real economic value, others believe that they threaten knowledge work as we know it. With the assumption that agents will continue to improve and will add economic value, these AI agents will likely become integrated into our daily lives. What effects will an AI that can actually perform tasks for you have on your daily life and the economy as a whole?
One major concern is bias. AI can perpetuate and amplify existing societal inequalities, and the governance of AI agents is complex, to say the least. Robust legal frameworks will need to be established to minimize harm, but history has shown that politics tends to lag behind even the most slow-moving technologies, let alone one on an exponential curve.
Socially, the deployment of agentic AI will likely impact employment, with certain jobs being augmented or replaced by AI. Many argue that society as a whole will have to shift to new ways of working while others believe that the end of what people typically consider “work” may be near.
Technical Challenges and Solutions
Deploying AI agents on a large scale is not a simple feat. While current technology may allow agents to complete complex tasks, self-correct, and self-improve, edge cases remain a major issue. A major difference between computers and humans is that computers do exactly what they're programmed to do, every time. Humans can adapt to many situations but lack the ability to complete repetitive tasks with the speed and accuracy of a machine. AI agents are a step towards making machines more human, but making them more adaptable to generalized tasks means they also lose part of their advantage as predictable, deterministic machines. A major challenge lies in error correction and having a machine know when it has committed an error. There is no try-catch block for the trolley problem, and the further we push agentic AI to stand in for human decisions, the more complex the scenarios they'll be tasked with handling.
Resources also become an issue. Sam Altman has reportedly suggested that as much as $7 trillion may be needed to build out the infrastructure for AI to scale. As the demand for more sophisticated agents grows, the underlying infrastructure must evolve to support their advanced capabilities.
While the largest models continue to push the edge of what's possible, many issues with AI agents may be solved by stacking smaller models on local hardware instead of relying on cutting-edge systems in the cloud. Having multiple small, specialized models may provide some of that streamlined control flow without the need for hyper-generalization. Mixture of Experts (MoE) models have already shown that using a gating mechanism to route tokens to specialized experts can produce strong results with far less compute.
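As an illustration of the routing idea, here's a toy top-k gating layer in PyTorch; the dimensions and experts are made up for the sketch and don't reflect any production MoE:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """Minimal top-k mixture-of-experts layer (toy dimensions, for illustration)."""

    def __init__(self, dim=64, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)  # router scores each expert per token
        self.top_k = top_k

    def forward(self, x):                         # x: (tokens, dim)
        scores = self.gate(x)                     # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)      # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Each token only runs through its top-k experts, saving compute
        # compared with one dense layer of equivalent total capacity.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([8, 64])
```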
Security Considerations
As intelligent agents gain autonomy and access to sensitive data, security becomes a paramount concern. Cloud models are routinely jailbroken to leak sensitive information and perform actions against their programmed guardrails.
Air-gapped, on-device processing for AI agents seems to be the best way forward for privacy and security. By keeping data local and reducing the vulnerability associated with data transmission over networks, not only is data safer, but the system only has to worry about a local user.
However, this shift is slowed by the local compute available and the development of security protocols to prevent on-device breaches. AI agents could be hamstrung by the lack of trust businesses and individuals are willing to put in them.
Devin Mania
In the last month, Cognition Labs announced an AI Agent called Devin. After several tech demos were shown off, Devin became the symbol of AI Agent promise and doom in many programmers’ eyes. After its announcement, other open-source projects attempted to capitalize on its marketing by revealing their own examples of intelligent agents. Let’s do a quick breakdown of each, starting with Devin.
Devin
Devin, developed by Cognition Labs, is an autonomous AI software engineer designed to perform a broad spectrum of engineering tasks independently, such as coding, debugging, and deploying applications. Influenced by the necessity to streamline repetitive coding processes, Devin advances beyond traditional AI tools like ChatGPT and agent frameworks like AutoGen by not just assisting but fully managing software projects, as evidenced by its performance on platforms like GitHub and Upwork.
Devin's emergence marks a significant shift in AI's role within software engineering, promising increased productivity and a democratization of development. However, it also raises concerns such as job displacement and the need for rigorous oversight to ensure transparency and human control over the agent's actions.
Open Devin
Open Devin is an innovative open-source project that builds on the ideas and capabilities of Devin. Employing technologies like Docker and Kubernetes for secure code execution and featuring a user-friendly interface designed with React, Open Devin supports real-time developer interactions and adjustments. Its versatility is supported by a community-driven approach, encouraging developer contributions and rapid adaptation to new technologies.
Despite its potential, Open Devin faces challenges such as instability during rapid development and the need for a solid programming foundation among its users. As it continues to evolve, Open Devin focuses on research and development to improve foundational models and evaluation methods, aiming to integrate AI more deeply into software development and transform how developers interact with coding environments.
Devika
Devika is an open-source AI software engineer designed as an alternative to Cognition AI's Devin. Devika integrates with large language models like Claude 3 and GPT-4, facilitating complex tasks and natural interactions. Its modular, agent-based architecture supports a spectrum of software development activities from planning to debugging, emphasizing community-driven improvements and accessibility.
Devika aims to deeply integrate with development tools and specialize in domains like web development and machine learning, transforming the tech job market by making development skills accessible to a wider audience.
The Future of AI Agents
Almost exactly one year ago, the paper “Generative Agents: Interactive Simulacra of Human Behavior” was released, showing the real abilities and promise of AI agents. AI critics have pointed to hallucinations, a lack of reasoning abilities, and the limited nature of LLMs to downplay the significance AI will have on society. Intelligent agents aim to address all of these issues.
Here’s a quick look at how this field may advance in the coming weeks and months.
Google AI Agents
In recent weeks, Google has unveiled significant advancements in AI development with the introduction of Google AI Studio and Vertex AI. These platforms are crafted to enhance developers' capabilities and streamline the process of creating production-ready AI agents.
Google AI Studio serves as a hub for integrating Google's advanced AI models, including the newly launched Gemini series. It offers a streamlined interface and features like the Gemini 1.5 model, which boasts a 1-million-token context window for handling complex datasets and queries efficiently.
Vertex AI, integrated with Google Cloud, allows for the customization and scaling of AI applications, supporting high standards of security and governance. It introduces Gemini Pro, available at no cost initially, with functionalities like function calling and custom knowledge grounding.
These platforms make sophisticated AI tools like Gemini Pro accessible, facilitating a range of AI functionalities such as chat and semantic retrieval across varied data types, including text and images. Google’s continuous enhancements, including the release of CodeGemma and RecurrentGemma for coding and recurrent tasks, and Imagen 2 for high-quality image generation, further expand the possibilities for AI application development. With Google AI Studio and Vertex AI, Google is setting a new standard for AI integration, offering developers the tools to innovate and improve user experiences.
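For the developer-facing side, a Gemini model can be called in a few lines through the google-generativeai Python SDK; the exact model identifier used here is an assumption and may differ:

```python
import google.generativeai as genai  # pip install google-generativeai

genai.configure(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")

# The long-context Gemini 1.5 model; the identifier may vary by date and region.
model = genai.GenerativeModel("gemini-1.5-pro-latest")

response = model.generate_content(
    "Summarize the main trade-offs between cloud and on-device AI agents."
)
print(response.text)
```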
What will Llama 3, GPT 4.5, or GPT 5 bring?
During the agent revolution, OpenAI has been somewhat silent. Many consider Claude 3 Opus to be the most intelligent LLM currently available, and since the launch of the GPT store, OpenAI has released very little.
As we anticipate the arrival of AI's next frontier with OpenAI's GPT-5 and Meta's Llama 3, the capabilities and scope of AI agents are poised for a dramatic transformation. GPT-5 is rumored to expand its parameters into the trillions, enhancing its reasoning abilities and introducing robust multimodality, potentially integrating video and image understanding alongside text. This leap is expected to pave the way for more nuanced interactions and deeper contextual awareness, possibly incorporating personal data like emails and calendars to offer a more tailored user experience.
Another angle may be that it simply allows GPT-4 to “think” longer or brings AI agents to a chat interface for anyone to use. Meanwhile, Llama 3’s larger model may focus on efficiency and scalability, adhering to Meta's ongoing strategy.
Both models are part of the broader movement towards achieving artificial general intelligence (AGI). This evolution in AI not only promises enhanced capabilities but also points toward AI agents that are democratized and broadly accessible.
Open Source
In the rapidly evolving landscape of artificial intelligence, open-source AI agents like AutoGen, Devika, and CrewAI offer compelling alternatives to proprietary models such as those from OpenAI, including the hypothetical GPT-5. These platforms stand out by providing flexibility, customization, and a collaborative development environment that potentially matches or exceeds proprietary models in innovation and engagement.
AutoGen offers a customizable agent framework that integrates seamlessly with existing tools and human input, enhancing its usability in complex multi-agent interactions. Devika excels as an AI software engineer capable of autonomous complex coding tasks, making it a robust tool for software development projects. CrewAI facilitates sophisticated multi-agent collaborations, allowing for diverse applications from content generation to data analysis.
The open-source nature of these tools ensures transparency and fosters rapid innovation through community contributions, providing a level of customization and adaptability that closed-source systems often lack. This model of development can lead to open-source agents being formidable competitors in the AI space by leveraging community-driven improvements and specific adaptability.
The open-source community continues to thrive and with the release of Llama 3, developers are even more excited. It’s hard to say what the next year will look like, but it seems that AI agents are the path to the next level of developer productivity.
Pieces for Developers
Pieces has several advantages as we head into an AI agent future. Thanks to its on-device, air-gapped nature, security concerns are almost non-existent. Its commitment to open source ensures that developers get a chance to adapt systems at scale. The speed of local inference running on Pieces OS feels faster than many data-center-driven APIs.
Something new and exciting – Pieces’ Workstream Pattern Engine is launching and is a strong base for AI Agents to stand on. The engine knows everything you’re working on, everything you’ve worked on in the past, and one day will solve problems straight from your command line. Coupling this with a Devin-like AI Agent would allow for a massive acceleration in work.
If you’re excited about building on-device AI agents without the security concerns of cloud-based providers, check out the Pieces Python CLI open-source project. It’s currently in the early stages of development and with open-source contributions, seeks to be the central AI operating system for a world of AI agents.