Imagine effortlessly conversing with your database as if it were a close friend, asking questions and receiving instant, accurate responses. Welcome...
Wow, this is really cool and seems like it'd be quite helpful, especially for someone non-technical (like myself!) who wants to query the database but doesn't know anything about writing queries. Thanks for sharing, Ngonidzashe!
Glad you find it helpful! LangChain simplifies database interactions for everyone, including non-technical users. If you need any help, just ask. Enjoy querying!
Great article. It works perfectly for a small number of tables.
Comments from my side:
SQLQuery: SELECT "studentname", MAX("marks") FROM myschema.student_mark
INNER JOIN myschema.student ON myschema.student_mark."student_id" = myschema.student."student_id"
LIMIT 5;

Error: (psycopg2.errors.GroupingError) column "student.studentname" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "studentname", MAX("marks") FROM myschema.student_mar...
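For context, the generated query fails because "studentname" is selected alongside the aggregate MAX("marks") without being grouped. A corrected query would look something like this (a sketch, assuming the schema shown in the error above):

-- Group by the non-aggregated column so MAX() is computed per student
SELECT s."studentname", MAX(sm."marks")
FROM myschema.student_mark sm
INNER JOIN myschema.student s ON sm."student_id" = s."student_id"
GROUP BY s."studentname"
LIMIT 5;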
Thank you for your feedback and questions. I'm glad you found the article helpful. I understand you encountered some issues when working with larger datasets. Let me attempt to address your concerns.
By default, when you initialize your OpenAI language model, it uses the text-davinci-003 model. When I ran some of the queries with this model, I would get syntax errors, but they disappeared the moment I specified the model name as gpt-3.5-turbo. I have updated the article to use ChatOpenAI, which uses gpt-3.5-turbo by default. You can simply change the llm variable to:

llm = ChatOpenAI(temperature=0, openai_api_key=API_KEY, model_name='gpt-3.5-turbo')
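For completeness, here is a minimal sketch of the updated setup, assuming the langchain package layout from around the time of the article (where SQLDatabase and SQLDatabaseChain are importable from the top-level package; DATABASE_URI is a placeholder):

from langchain import SQLDatabase, SQLDatabaseChain
from langchain.chat_models import ChatOpenAI

# Connect to the database and build the chain on top of the chat model
db = SQLDatabase.from_uri(DATABASE_URI)  # placeholder connection string
llm = ChatOpenAI(temperature=0, openai_api_key=API_KEY, model_name='gpt-3.5-turbo')
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)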
A system with a large amount of RAM and a strong CPU is likely required for a dataset with 1,200+ tables and 10+ years of business ERP data. You may also consider a distributed database system, which can manage huge volumes of data.
I hope this helps.
For people who are interested in how this is done, here are the prompts used to design the chain: github.com/hwchase17/langchain/blo...
Great article, very helpful. Thanks very much!
I'm glad you liked it!
Hi, have you done the same with an MS SQL database? I'm trying to do it using the AdventureWorksLT demo db, and everything goes well until I try to run the chain.
Here is my code:
from sqlalchemy.engine import URL
from langchain import SQLDatabase, SQLDatabaseChain
from langchain.chat_models import ChatOpenAI

# Build an ODBC connection string and wrap it in a SQLAlchemy URL
connection_string = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=server.database.windows.net;DATABASE=AdventureWorksLT;UID=user;PWD=password"
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})

db = SQLDatabase.from_uri(connection_url)
llm = ChatOpenAI(openai_api_key=key, temperature=0, verbose=True, model_name="gpt-3.5-turbo")
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
response = db_chain.run("Describe the Category table")
I got an InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 5190 tokens. Please reduce the length of the messages.
AdventureWorksLT is a relatively small db; I'm not sure why it's using that many tokens.
Any advice will be greatly appreciated.
Best!
I have the same problem.
I was able to resolve the issue by including only a few tables. For example, in the connection below, use include_tables to resolve it:
db = SQLDatabase.from_uri(uri, include_tables=["table_name"])
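The token count grows with every table whose schema (plus sample rows) gets injected into the prompt, which is why even a small db can blow past the limit. As a sketch of how to trim it further, include_tables restricts which tables are described and sample_rows_in_table_info controls how many sample rows are appended per table (the table names below are illustrative, not from the original article):

# Only describe the tables the question actually needs,
# and skip the sample rows that are included by default
db = SQLDatabase.from_uri(
    uri,
    include_tables=["Product", "ProductCategory"],  # hypothetical table names
    sample_rows_in_table_info=0,
)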
Hi,
Have you been able to include views as well?
Hey,
I'm getting the following error; could anyone please help me understand the reason?
(pyodbc.OperationalError) ('08001', '[08001] [Microsoft][ODBC SQL Server Driver][DBNETLIB]SQL Server does not exist or access denied. (17) (SQLDriverConnect); [08001] [Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionOpen (Connect()). (53)')
Below is my code.
db = SQLDatabase.from_uri(f"mssql+pyodbc://{username}:{password}@{server}/{database}?driver={driver}")
Try building the URI with the curly braces stripped from the driver name; check if this works for you:

uri = f'mssql+pyodbc://{username}:{password}@{server}/{database}?driver={driver.replace("{", "").replace("}", "")}'
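Alternatively, the URL.create pattern used earlier in this thread sidesteps the driver-name quoting problem entirely. A sketch, assuming the same sqlalchemy import and the variables from the code above:

from sqlalchemy.engine import URL

# Pass the raw ODBC connection string through odbc_connect so the
# driver name never needs URL-escaping
connection_string = f"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
db = SQLDatabase.from_uri(connection_url)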
SQLQuery: This model's maximum context length is 4097 tokens. However, your messages resulted in 23032 tokens. Please reduce the length of the messages.

I got the above error while asking a simple question, like the count of a table.
Great work! Have you tried adding memory to the database chain as well, so that it can refer back to the user's previous queries and answer from a combination of the chat history and the new question? For me, memory does not seem to work properly in the context of a database, which is why I was curious.
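One workaround, since wiring a Memory object into SQLDatabaseChain can be fiddly, is to keep a small rolling history yourself and prepend it to each question. A minimal sketch (the ask helper and history list are illustrative, not part of langchain):

# Keep the last few Q/A pairs and feed them back as context
history = []

def ask(db_chain, question, window=3):
    context = " ".join(history[-window:])
    answer = db_chain.run(f"{context} New question: {question}".strip())
    history.append(f"Q: {question} A: {answer}")
    return answer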
Hey Ngonidzashe,
I am using the SQL agent from langchain; for some context, I have a large Postgres data source. When I ask questions that involve joining tables, ordering, and filtering, the agent hits its maximum iterations. I am currently testing on my laptop, which has 8 GB of RAM and a decent CPU. Can you suggest anything that could help me avoid these errors?
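One knob worth trying is raising the agent's iteration cap while narrowing what it sees. A sketch, assuming the langchain SQL agent API of the time (create_sql_agent exposes top_k for limiting how many rows queries request, and max_iterations is forwarded to the underlying executor):

from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit

toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent = create_sql_agent(
    llm=llm,
    toolkit=toolkit,
    verbose=True,
    top_k=5,            # ask for fewer rows per query
    max_iterations=15,  # give complex joins more steps before stopping
)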
Can we do this with an Oracle database? Can you please guide me on this, as I have a use case for one of our clients?