We will explain how to implement an SQL Agent using LangChain, the OpenAI API, and DuckDB, and how to turn it into an application with Morph. The app will generate SQL queries with an LLM, execute them in DuckDB, and use the results to answer user questions.
LangChain is an excellent framework equipped with components and third-party integrations for developing applications that leverage LLMs.
In this article, we will use LangChain to build an AI agent workflow that answers questions by issuing SQL queries against CSV data with DuckDB.
Here, we will implement an agent that performs SQL-based Q&A on demo data containing web advertisement traffic and order performance, provided as the following CSV file.
The final output of this tutorial is available in the SQL Agent App in the following repository.
https://github.com/morph-data/morph-sample-apps
Setup
Use the `morph new` command to create a new project. It is recommended to select Poetry as the package manager.
morph new sql-agent-app
In pyproject.toml, set the dependencies as follows. Since DuckDB execution is handled by `execute_sql`, which is included in `morph-data`, there is no need to install DuckDB separately.
[tool.poetry.dependencies]
python = "<3.13,>=3.9"
morph-data = "0.2.0"
langchain = "0.3.16"
langchain-core = "0.3.32"
langchain-openai = "0.3.2"
In this tutorial, set your OpenAI API key in `.env`:
OPENAI_API_KEY=[Put your API KEY]
Save the demo CSV data as `data/Traffic_Orders_Demo_Data.csv`.
Building the frontend
With Morph, you can build the frontend using MDX files in `src/pages`.
This time, we will use the `<Chat />` component to create a chat interface backed by LangChain. Since we want to view the data while asking questions, we display the data side by side in table format using the `<DataTable />` component.
We also adjust the layout using `<Grid>` and Tailwind CSS utility classes.
# Q&A Agent over SQL
You can ask questions to the SQL Agent about the data and get the answer in SQL.
Only admin users can ask questions to the SQL Agent.
<Grid cols={2} className="py-4">
<div>
<DataTable loadData="example_data" height={400} />
</div>
<div className="p-4 bg-gray-50 rounded-lg shadow-sm border border-gray-200 hover:shadow-md transition-shadow">
<Chat postData="sql_agent" />
</div>
</Grid>
You can view the page at http://localhost:8080 after running `morph serve` in the terminal.
Building Q&A Agent with Text-to-SQL Using LangChain
We will use LangChain’s Runnable API and StructuredOutputParser to generate the necessary SQL queries to answer user questions. By executing these SQL queries with DuckDB and using the results, we will build an AI agent that answers user questions by leveraging the Runnable API again.
In MDX, the `<Chat />` component automatically sends `prompt` and `thread_id`:
- `prompt`: the message entered by the user.
- `thread_id`: a unique identifier for the thread shown in the chat UI (not used in this implementation).
By receiving the prompt and passing it as the message to LangChain’s invoke function, we connect the AI agent to the front end.
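To make the data flow concrete, here is a hypothetical sketch (not Morph’s actual implementation) of how the payload sent by the `<Chat />` component maps to `context.vars`; a plain dict stands in for `MorphGlobalContext.vars`:

```python
# Hypothetical stand-in for the payload the <Chat /> component POSTs.
# In a real Morph handler, these values arrive as context.vars.
payload = {
    "prompt": "Which traffic source had the most orders?",  # user's message
    "thread_id": "thread-123",  # chat thread id (unused in this tutorial)
}

def handle(vars: dict) -> str:
    # The SQL agent only needs the user's question; thread_id is ignored.
    return vars["prompt"]

print(handle(payload))  # the user's question string
```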
Generating SQL Queries and Retrieving Data Based on User Questions in LangChain
First, we create a function decorated with @morph.func so that it can be handled within the Morph framework. We then initialize an instance of LangChain’s OpenAI model.
from typing import Generator

import morph
from morph import MorphGlobalContext
from morph_lib.types import MorphChatStreamChunk
from langchain_openai import ChatOpenAI
@morph.func
def sql_agent(
context: MorphGlobalContext,
) -> Generator[MorphChatStreamChunk, None, None]:
chat = ChatOpenAI(
model="gpt-4o",
temperature=0,
streaming=False
)
For the CSV file used in this implementation, we specify the file path and a description of the data in the system prompt. The `{format_instructions}` placeholder is later filled with the format instructions produced by StructuredOutputParser, which tell the model how to structure the SQL output.
SYSTEM_TEMPLATE = """Please execute SQL queries on a table named `./data/Traffic_Orders_Demo_Data.csv` in DuckDB with the following schema:
date: text - date
source: text - traffic source (Coupon, Google Organic, etc.)
traffic: int - traffic count
orders: int - order count
This table contains traffic and order data for the marketing campaigns.
As a source, you have the following data:
- Coupon
- Google Organic
- Google Paid
- TikTok Ads
- Meta Ads
- Referral
Generate a SQL query to answer the user's question.
{format_instructions}
"""
We use StructuredOutputParser to make the generated text parsable as structured data (JSON).
Then, leveraging LangChain’s Runnable API , we build a data flow for generating SQL queries.
# Setup the SQL query output parser
sql_schema = ResponseSchema(name="sql", description="The SQL query to execute")
output_parser = StructuredOutputParser.from_response_schemas([sql_schema])
format_instructions = output_parser.get_format_instructions()
prompt = ChatPromptTemplate.from_messages([
("system", SYSTEM_TEMPLATE),
("human", "{question}")
])
# Create the chain for SQL generation
chain = (
{"question": RunnablePassthrough(), "format_instructions": lambda _: format_instructions}
| prompt
| chat
| StrOutputParser()
| output_parser
)
Execute the created data flow to generate SQL. The generated SQL is streamed back to the chat using the stream_chat function.
from morph_lib.stream import stream_chat
# Generate SQL query
result = chain.invoke(context.vars["prompt"])
sql = result["sql"]
# display generated sql
yield stream_chat(f"""
### SQL
{sql}
""")
Next, execute the generated SQL. By specifying `"DUCKDB"` as the second argument of `execute_sql`, we can use DuckDB as the engine.
The execution result is returned as a DataFrame, which is then converted into a Markdown string to be passed to the LLM.
from morph_lib.database import execute_sql
# Execute SQL and get results
data = execute_sql(sql, "DUCKDB")
data_md = data.to_markdown(index=False)
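`to_markdown` (which requires the `tabulate` package) renders the DataFrame as a GitHub-style Markdown table. To illustrate the shape of the text handed to the LLM, here is a dependency-free sketch of the same idea (`rows_to_markdown` is a hypothetical helper, not part of Morph or pandas):

```python
def rows_to_markdown(headers: list[str], rows: list[list]) -> str:
    # Build a simple GitHub-style Markdown table, similar in spirit to
    # pandas.DataFrame.to_markdown(index=False).
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(v) for v in row) + " |")
    return "\n".join(lines)

md = rows_to_markdown(["source", "orders"], [["Coupon", 120], ["Referral", 45]])
print(md)
```

Markdown tables like this are a compact, unambiguous way to put tabular query results into an LLM prompt.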
Generating Responses Using SQL Results
Using the retrieved results, we leverage LangChain’s Runnable API again to create an agent that constructs prompts, processes questions, and streams responses to answer user queries.
# Create analysis prompt with results
analysis_prompt = ChatPromptTemplate.from_messages([
("system", f"""Please answer in markdown format.
You can use the following data:
{data_md}
The data is from the following SQL query:
{sql}
"""),
("human", "{question}")
])
# Create analysis chain
analysis_chain = (
{"question": RunnablePassthrough()}
| analysis_prompt
| chat
)
# stream analysis result
for chunk in analysis_chain.stream(context.vars["prompt"]):
if chunk.content:
yield stream_chat(chunk.content)
Full Python code
from typing import Generator
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from morph_lib.database import execute_sql
from morph_lib.stream import stream_chat
from morph_lib.types import MorphChatStreamChunk
import morph
from morph import MorphGlobalContext
SYSTEM_TEMPLATE = """Please execute SQL queries on a table named `./data/Traffic_Orders_Demo_Data.csv` in DuckDB with the following schema:
date: text - date
source: text - traffic source (Coupon, Google Organic, etc.)
traffic: int - traffic count
orders: int - order count
This table contains traffic and order data for the marketing campaigns.
As a source, you have the following data:
- Coupon
- Google Organic
- Google Paid
- TikTok Ads
- Meta Ads
- Referral
Generate a SQL query to answer the user's question.
{format_instructions}
"""
@morph.func
def sql_agent(
context: MorphGlobalContext,
) -> Generator[MorphChatStreamChunk, None, None]:
chat = ChatOpenAI(
model="gpt-4o",
temperature=0,
streaming=False,
)
# Setup the SQL query output parser
sql_schema = ResponseSchema(name="sql", description="The SQL query to execute")
output_parser = StructuredOutputParser.from_response_schemas([sql_schema])
format_instructions = output_parser.get_format_instructions()
prompt = ChatPromptTemplate.from_messages([
("system", SYSTEM_TEMPLATE),
("human", "{question}")
])
# Create the chain for SQL generation
chain = (
{"question": RunnablePassthrough(), "format_instructions": lambda _: format_instructions}
| prompt
| chat
| StrOutputParser()
| output_parser
)
# Generate SQL query
result = chain.invoke(context.vars["prompt"])
sql = result["sql"]
# display generated sql
yield stream_chat(f"""
### SQL
{sql}
""")
# Execute SQL and get results
data = execute_sql(sql, "DUCKDB")
data_md = data.to_markdown(index=False)
# Create analysis prompt with results
analysis_prompt = ChatPromptTemplate.from_messages([
("system", f"""Please answer in markdown format.
You can use the following data:
{data_md}
The data is from the following SQL query:
{sql}
"""),
("human", "{question}")
])
# Create analysis chain
analysis_chain = (
{"question": RunnablePassthrough()}
| analysis_prompt
| chat
)
# stream analysis result
for chunk in analysis_chain.stream(context.vars["prompt"]):
if chunk.content:
yield stream_chat(chunk.content)
Deploying and Sharing the App
Morph’s dashboard supports automated deployment using GitHub integration.
By pushing your code to GitHub and selecting the repository from the dashboard, you can deploy the app as an internal application with authentication enabled automatically.
Reference:
https://docs.morph-data.io/docs/en/quickstart/deploy
Once the deployment is complete, access the app from the “Open” button in the dashboard. You can easily share the SQL Agent app with your team!