Naoto Shibata for Morph


Tutorial: Building an SQL Agent with LangChain, OpenAI, and DuckDB

We will explain how to implement an SQL Agent using LangChain, the OpenAI API, and DuckDB, and how to turn it into an application with Morph. The app will generate SQL queries with an LLM, execute them in DuckDB, and use the results to answer user questions.

LangChain is an excellent framework equipped with components and third-party integrations for developing applications that leverage LLMs.

In this article, we will use LangChain to construct an AI agent workflow that issues SQL queries against CSV data with DuckDB.

This time, we will implement an agent that performs SQL-based Q&A on demo data from the following CSV file, which contains web advertisement traffic and order performance.

Traffic_Orders_Demo_Data.csv

The final output of this tutorial is available as the SQL Agent App in the following repository.

https://github.com/morph-data/morph-sample-apps

Setup

Use the morph new command to create a new project. It is recommended to select Poetry as the package manager.

morph new sql-agent-app

In pyproject.toml, set the dependencies as follows. Since DuckDB execution is handled by the execute_sql function included in morph-data, there is no need to install DuckDB separately.

[tool.poetry.dependencies]
python = "<3.13,>=3.9"
morph-data = "0.2.0"
langchain = "0.3.16"
langchain-core = "0.3.32"
langchain-openai = "0.3.2"

Since this tutorial uses the OpenAI API, set your API key in .env.

OPENAI_API_KEY=[Put your API KEY]

Save the demo CSV file as data/Traffic_Orders_Demo_Data.csv.
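If you want to sanity-check the file before wiring it into the agent, a quick preview is enough. The following is a minimal sketch (not part of the tutorial code) that assumes pandas is available, which it normally is since execute_sql returns pandas DataFrames.

import pandas as pd

# Preview the demo data and confirm the columns match the schema
# described later in the system prompt (date, source, traffic, orders).
df = pd.read_csv("data/Traffic_Orders_Demo_Data.csv")
print(df.head())
print(df.dtypes)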

Building the frontend

With Morph, you can build the frontend using MDX files in src/pages.

This time, we will use the <Chat /> component to create a chat interface built with LangChain. Since we want to view the data while asking questions, we display it side by side in a table using the <DataTable /> component.

We also adjust the layout using <Grid> and Tailwind CSS utility classes.

# Q&A Agent over SQL

You can ask questions to the SQL Agent about the data and get the answer in SQL.
Only admin users can ask questions to the SQL Agent.
<Grid cols={2} className="py-4">
    <div>
        <DataTable loadData="example_data" height={400} />
    </div>
    <div className="p-4 bg-gray-50 rounded-lg shadow-sm border border-gray-200 hover:shadow-md transition-shadow">
        <Chat postData="sql_agent" />
    </div>
</Grid>
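The DataTable above loads its data from a Morph function named example_data. The sample repository already provides one; as a rough sketch (assuming a @morph.func function that returns a pandas DataFrame can be referenced by name from loadData), it could look like this:

import pandas as pd

import morph
from morph import MorphGlobalContext

@morph.func
def example_data(context: MorphGlobalContext) -> pd.DataFrame:
    # Load the demo CSV so <DataTable loadData="example_data" /> can render it.
    return pd.read_csv("data/Traffic_Orders_Demo_Data.csv")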

You can check the page by accessing http://localhost:8080 after running morph serve in the terminal.

(Screenshot: frontend)

Building Q&A Agent with Text-to-SQL Using LangChain

We will use LangChain’s Runnable API and StructuredOutputParser to generate the SQL queries needed to answer user questions. We will then execute these queries with DuckDB and, leveraging the Runnable API again, build an AI agent that uses the results to answer the user’s question.

In MDX, the <Chat /> component automatically sends prompt and thread_id:

  • prompt : The message entered by the user.
  • thread_id : A unique identifier for the thread displayed in the chat UI (not used in this implementation).

By receiving the prompt and passing it to LangChain’s invoke function, we connect the AI agent to the front-end.
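As a minimal illustration (using the same context object that appears in the code below), the two values can be read like this:

# Inside the @morph.func function: values sent by the <Chat /> component
# are exposed on context.vars with dict-style access.
user_prompt = context.vars["prompt"]    # the user's message
thread_id = context.vars["thread_id"]   # thread identifier (not used in this implementation)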

Generating SQL Queries and Retrieving Data Based on User Questions in LangChain

First, we create a function decorated with @morph.func so that it can be handled within the Morph framework. We then initialize an instance of LangChain’s OpenAI model.

from typing import Generator

import morph
from morph import MorphGlobalContext
from morph_lib.types import MorphChatStreamChunk
from langchain_openai import ChatOpenAI

@morph.func
def sql_agent(
    context: MorphGlobalContext,
) -> Generator[MorphChatStreamChunk, None, None]:
    chat = ChatOpenAI(
        model="gpt-4o",
        temperature=0,
        streaming=False
    )

In the system prompt, we specify the file path and a description of the CSV data used in this implementation. By including format_instructions, we let StructuredOutputParser define the expected format of the SQL output.

SYSTEM_TEMPLATE = """Please execute SQL queries on a table named `./data/Traffic_Orders_Demo_Data.csv` in DuckDB with the following schema:
date: text - date
source: text - traffic source (Coupon, Google Organic, etc.)
traffic: int - traffic count
orders: int - order count

This table contains traffic and order data for the marketing campaigns.

As a source, you have the following data:
- Coupon
- Google Organic
- Google Paid
- TikTok Ads
- Meta Ads
- Referral

Generate a SQL query to answer the user's question.
{format_instructions}
"""

We use StructuredOutputParser to make the generated text parsable as structured data (JSON).

Then, leveraging LangChain’s Runnable API, we build a data flow for generating SQL queries.

from langchain.prompts import ChatPromptTemplate
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

# Setup the SQL query output parser
sql_schema = ResponseSchema(name="sql", description="The SQL query to execute")
output_parser = StructuredOutputParser.from_response_schemas([sql_schema])
format_instructions = output_parser.get_format_instructions()

prompt = ChatPromptTemplate.from_messages([
    ("system", SYSTEM_TEMPLATE),
    ("human", "{question}")
])
# Create the chain for SQL generation
chain = (
    {"question": RunnablePassthrough(), "format_instructions": lambda _: format_instructions}
    | prompt
    | chat
    | StrOutputParser()
    | output_parser
)

Execute the created data flow to generate SQL. The generated SQL is streamed back to the chat using the stream_chat function.

from morph_lib.stream import stream_chat

# Generate SQL query
result = chain.invoke(context.vars["prompt"])
sql = result["sql"]
# display generated sql
yield stream_chat(f"""
### SQL
{sql}
""")

Next, execute the generated SQL. By specifying “DUCKDB” as the second argument of execute_sql, we can use DuckDB as the engine.

The execution result is returned as a DataFrame, which is then converted into a Markdown string to be passed to the LLM.

from morph_lib.database import execute_sql

# Execute SQL and get results
data = execute_sql(sql, "DUCKDB")
data_md = data.to_markdown(index=False)
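If the generated query happens to return no rows, the Markdown table will be empty and the follow-up answer may be confusing. A small guard like the following (an optional sketch, not part of the original code) can short-circuit that case inside sql_agent:

# Optional: stop early when the query returns no rows,
# instead of asking the LLM to reason over an empty table.
if data.empty:
    yield stream_chat("The generated SQL returned no rows. Try rephrasing your question.")
    return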

Generating Responses Using SQL Results

Using the retrieved results, we leverage LangChain’s Runnable API again to create an agent that constructs prompts, processes questions, and streams responses to answer user queries.

# Create analysis prompt with results
analysis_prompt = ChatPromptTemplate.from_messages([
    ("system", f"""Please answer in markdown format.
You can use the following data:
{data_md}
The data is from the following SQL query:
{sql}
"""),
    ("human", "{question}")
])

# Create analysis chain
analysis_chain = (
    {"question": RunnablePassthrough()}
    | analysis_prompt
    | chat
)
# stream analysis result
for chunk in analysis_chain.stream(context.vars["prompt"]):
    if chunk.content:
        yield stream_chat(chunk.content)

All Python code

from typing import Generator

from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from morph_lib.database import execute_sql
from morph_lib.stream import stream_chat
from morph_lib.types import MorphChatStreamChunk

import morph
from morph import MorphGlobalContext

SYSTEM_TEMPLATE = """Please execute SQL queries on a table named `./data/Traffic_Orders_Demo_Data.csv` in DuckDB with the following schema:
date: text - date
source: text - traffic source (Coupon, Google Organic, etc.)
traffic: int - traffic count
orders: int - order count

This table contains traffic and order data for the marketing campaigns.

As a source, you have the following data:
- Coupon
- Google Organic
- Google Paid
- TikTok Ads
- Meta Ads
- Referral

Generate a SQL query to answer the user's question.
{format_instructions}
"""

@morph.func
def sql_agent(
    context: MorphGlobalContext,
) -> Generator[MorphChatStreamChunk, None, None]:
    chat = ChatOpenAI(
        model="gpt-4o",
        temperature=0,
        streaming=False,
    )

    # Setup the SQL query output parser
    sql_schema = ResponseSchema(name="sql", description="The SQL query to execute")
    output_parser = StructuredOutputParser.from_response_schemas([sql_schema])
    format_instructions = output_parser.get_format_instructions()

    prompt = ChatPromptTemplate.from_messages([
        ("system", SYSTEM_TEMPLATE),
        ("human", "{question}")
    ])

    # Create the chain for SQL generation
    chain = (
        {"question": RunnablePassthrough(), "format_instructions": lambda _: format_instructions}
        | prompt
        | chat
        | StrOutputParser()
        | output_parser
    )

    # Generate SQL query
    result = chain.invoke(context.vars["prompt"])
    sql = result["sql"]
    # display generated sql
    yield stream_chat(f"""
### SQL
{sql}
""")

    # Execute SQL and get results
    data = execute_sql(sql, "DUCKDB")
    data_md = data.to_markdown(index=False)
    # Create analysis prompt with results
    analysis_prompt = ChatPromptTemplate.from_messages([
        ("system", f"""Please answer in markdown format.
You can use the following data:
{data_md}

The data is from the following SQL query:
{sql}
"""),
        ("human", "{question}")
    ])

    # Create analysis chain
    analysis_chain = (
        {"question": RunnablePassthrough()}
        | analysis_prompt
        | chat
    )

    # stream analysis result
    for chunk in analysis_chain.stream(context.vars["prompt"]):
        if chunk.content:
            yield stream_chat(chunk.content)

Deploying and Sharing the App

Morph’s dashboard supports automated deployment using GitHub integration.

By pushing your code to GitHub and selecting the repository from the dashboard, you can deploy the app as an internal application with authentication enabled automatically.

Reference:

https://docs.morph-data.io/docs/en/quickstart/deploy

Once the deployment is complete, access the app from the “Open” button in the dashboard. You can easily share the SQL Agent app with your team!

(Screenshot: frontend-result)

