Building a Multi-Source AI Agent: Bridging Databases, APIs, and AI Models

We are experimenting with building a POC that generates raw chart data automatically from a data source, without frameworks like LangChain, LangGraph, etc. Under the hood, it's just a simple workflow involving a few processes: inserting the raw data and metadata, fetching both back, interacting with LLM APIs, and returning the response.

We are using Gemini (Vertex AI) as the LLM and Python (FastAPI) as the main weapon.

What is a multi-source AI agent?

An AI agent is a system that runs a series of processes to perform one or more tasks automatically, with minimal human intervention.

"Multi-source" here refers to the entities involved in my workflow, whether in the insertion or the fetch process.

Here are the entities used in my experiment:

  • MySQL for storing metadata
  • MySQL for storing raw data
  • Vertex AI
  • Our own FastAPI endpoints

The Architecture

Diagram: the multi-source workflow

Let's assume we have two separate workflows: the uploader API uploads raw data such as CSV files, while the fetcher API returns chart-ready data that has already been shaped around an AI-suggested insight.

Showing data in a chart needs a predefined purpose; it makes no sense to throw random raw data at a chart without knowing what insight we might get from it.

An insight is something like top sales in a month, or GMV per month over a year. Those things will be determined by the AI agent.

Uploader API

This API is responsible for interacting with the LLM API and the two databases to ingest raw data into an internal data source. This approach has drawbacks, such as cost and table/database maintenance, but loading the external data into my own database makes it easier to deal with huge datasets and to query them with SQL.

Alternatives to MySQL include a columnar DB, NoSQL, or an in-memory database (but I need the data to be persisted). Each choice has its own drawbacks.

Basically, after a CSV is uploaded, the system creates a dedicated table in MySQL for that CSV. It first determines the data schema of the file and uses it as the reference to create the table.

Each created table has a different schema based on the uploaded CSV file, so we can filter rows and columns at the database level.

- table_csv_1 
- table_csv_2
- table_csv_3
etc

Another option is storing the whole CSV in a single column, either as raw text or as JSON. We didn't see this as a good approach: it becomes difficult to maintain, particularly with large datasets, because processing such as filtering rows and columns would have to happen at the application level.

| ID | entity_name        | raw_data      | created_at |
|----|--------------------|---------------|------------|
| 1  | the name of entity | {...raw data} | 2024-11-12 |

In short, the process is (a rough sketch follows the list):

  • Upload a file
  • Take a few rows as a sample of the data
  • Throw the sample at the LLM to determine the schema for this CSV
  • Save the schema in MySQL as metadata; later we can use it in LLM prompts
  • Perform CREATE TABLE based on the schema
  • Insert the data into the newly created table
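
Here is a minimal sketch of that flow in Python. It assumes the Vertex AI Python SDK, DB-API connections (e.g., pymysql) passed in by the caller, and a hypothetical `dataset_meta` table for the metadata; the project, model name, and table layout are placeholders, not the exact POC code.

```python
import csv
import json

import vertexai
from vertexai.generative_models import GenerativeModel

# Assumption: project, location, and model name are placeholders for your setup.
vertexai.init(project="my-gcp-project", location="us-central1")
model = GenerativeModel("gemini-1.5-pro")


def call_gemini(prompt: str) -> str:
    return model.generate_content(prompt).text


def infer_schema(sample_rows: list[dict]) -> dict:
    """Ask the LLM to map each CSV column to a MySQL column type."""
    prompt = (
        "Given these sample CSV rows, return only a JSON object that maps "
        "each column name to a MySQL column type:\n"
        f"{json.dumps(sample_rows)}"
    )
    # Assumes the model returns bare JSON; real code should handle fenced output.
    return json.loads(call_gemini(prompt))


def ingest_csv(path: str, table_name: str, raw_conn, meta_conn) -> None:
    # raw_conn / meta_conn are DB-API connections (e.g., pymysql) to the
    # raw-data and metadata databases.
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))

    schema = infer_schema(rows[:5])  # a few rows are enough as a sample

    # Save the schema as metadata so the fetcher can reuse it in prompts.
    # Assumption: a `dataset_meta` table already exists in the metadata DB.
    with meta_conn.cursor() as cur:
        cur.execute(
            "INSERT INTO dataset_meta (table_name, schema_json) VALUES (%s, %s)",
            (table_name, json.dumps(schema)),
        )
    meta_conn.commit()

    # CREATE TABLE from the inferred schema, then bulk-insert every row.
    columns = ", ".join(f"`{col}` {ctype}" for col, ctype in schema.items())
    col_names = ", ".join(f"`{c}`" for c in schema)
    placeholders = ", ".join(["%s"] * len(schema))
    with raw_conn.cursor() as cur:
        cur.execute(f"CREATE TABLE `{table_name}` ({columns})")
        cur.executemany(
            f"INSERT INTO `{table_name}` ({col_names}) VALUES ({placeholders})",
            [tuple(r[c] for c in schema) for r in rows],
        )
    raw_conn.commit()
```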

Fetcher API

While the uploader handles CSV uploads, the fetcher is responsible for generating data that is ready to be consumed by the frontend chart.

The process involves steps like determining an insight from the dataset schema and generating the query for that insight.

Process (sketched below):

  • Fetch metadata by entity ID; this gives us the schema, table name, and other metadata
  • Call the LLM API with the data schema included in the prompt
  • The LLM returns a query that we can run against our raw data
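
A minimal sketch of the fetcher, reusing the hypothetical `dataset_meta` table and `call_gemini` helper from the uploader sketch (again, names and table layout are assumptions, not the exact POC code):

```python
import json

import pymysql.cursors  # assumption: pymysql for the raw-data connection


def fetch_chart_data(entity_id: int, meta_conn, raw_conn) -> dict:
    # 1. Fetch the metadata (table name + schema) saved by the uploader.
    with meta_conn.cursor() as cur:
        cur.execute(
            "SELECT table_name, schema_json FROM dataset_meta WHERE id = %s",
            (entity_id,),
        )
        table_name, schema_json = cur.fetchone()

    # 2. Ask the LLM for an insight plus a query that produces it.
    prompt = (
        f"Table `{table_name}` has this schema: {schema_json}. "
        "Suggest one useful chart insight and return only JSON with keys "
        "'insight', 'summary', and 'query' (a MySQL SELECT statement)."
    )
    suggestion = json.loads(call_gemini(prompt))

    # 3. Run the suggested query against the raw-data database.
    with raw_conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute(suggestion["query"])
        rows = cur.fetchall()

    return {
        "data": rows,
        "insight": suggestion["insight"],
        "summary": suggestion["summary"],
    }
```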

Diagram: the fetcher flow

Result and challenges

In the end, the fetcher API responds with data like this:

{
  "data": [
    {
      "date": "2024-01-05",
      "total_sales": 10000
    },
    {
      "date": "2024-01-10",
      "total_sales": 90000
    },
    {
      "date": "2024-01-17",
      "total_sales": 92000
    }
  ],
  "insight": "sales per day",
  "summary": "Lorem imsum, lorem ipsum"
}

Later, you can use this response to generate a chart with whatever chart library you like.
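
For example, a quick way to eyeball the payload from the Python side is matplotlib (a frontend would typically use a JS chart library instead):

```python
import matplotlib.pyplot as plt

# `response` is the fetcher API payload shown above.
response = {
    "data": [
        {"date": "2024-01-05", "total_sales": 10000},
        {"date": "2024-01-10", "total_sales": 90000},
        {"date": "2024-01-17", "total_sales": 92000},
    ],
    "insight": "sales per day",
}

dates = [row["date"] for row in response["data"]]
totals = [row["total_sales"] for row in response["data"]]

plt.plot(dates, totals, marker="o")
plt.title(response["insight"])
plt.xlabel("date")
plt.ylabel("total_sales")
plt.show()
```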

AI can hallucinate and provide incorrect information. A misleading data schema or SQL query is the most common mistake in LLM responses, and it often causes a runtime error such as a missing column or a wrong data format.
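
One cheap guardrail is validating the generated SQL before executing it. This sketch is my own addition rather than part of the original POC: it rejects anything that isn't a single SELECT, and runs EXPLAIN first so missing columns or tables fail before any data is fetched.

```python
import re


def is_safe_select(query: str) -> bool:
    """Accept only a single read-only SELECT statement."""
    stripped = query.strip().rstrip(";")
    if ";" in stripped:  # reject stacked statements
        return False
    return re.match(r"(?i)^select\b", stripped) is not None


def run_llm_query(conn, query: str):
    if not is_safe_select(query):
        raise ValueError("LLM returned a non-SELECT query; refusing to run it")
    with conn.cursor() as cur:
        # EXPLAIN surfaces missing columns/tables before we fetch real data.
        cur.execute(f"EXPLAIN {query}")
        cur.execute(query)
        return cur.fetchall()
```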

Hello, my name is Burhan, a software engineer who loves tinkering with product building, software performance, and Islamic finance.

Let's connect on LinkedIn: https://www.linkedin.com/in/burhanahmeed/
