DEV Community

Speak Your Queries: How Langchain Lets You Chat with Your Database

Ngonidzashe Nzenze on May 01, 2023

Imagine effortlessly conversing with your database as if it were a close friend, asking questions and receiving instant, accurate responses. Welcom...
Collapse
 
michaeltharrington profile image
Michael Tharrington

Wow, this is really cool and seems like it'd be quite helpful, especially for someone non-technical (like mysef!) who wants to query the database but doesn't know anything about writing queries. Thanks for sharing, Ngonidzashe!

Collapse
 
ngonidzashe profile image
Ngonidzashe Nzenze

Glad you find it helpful! LangChain simplifies database interactions for everyone, including non-technical users. If you need any help, just ask. Enjoy querying! ๐Ÿ˜Š

Collapse
 
sajeevsahadev profile image
sajeev

Great article. Its perfectly working for less number of tables.
Comments from my side:

  • For a small DB of students and mark - I got this error - group by syntax issue `which student scored the highest mark?

SQLQuery:SELECT "studentname", MAX("marks") FROM myschema.student_mark
INNER JOIN myschema.student ON myschema.student_mark."student_id" = myschema.student."student_id"
LIMIT 5;(psycopg2.errors.GroupingError) column "student.studentname" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "studentname", MAX("marks") FROM myschema.student_mar...
^
SQL: SELECT "studentname", MAX("marks") FROM myschema.student_mark
INNER JOIN myschema.student ON myschema.student_mark."student_id" = myschema.student."student_id"
LIMIT 5;
`

  • For huge data set, _"# Setup database" _ was taking time(was waiting more than 3 hrs for a DB with 1200+ tables and 10+ years if enterprise ERP data). What is the best recommended system spec for huge dataset?
Collapse
 
ngonidzashe profile image
Ngonidzashe Nzenze

Thank you for your feedback and questions. I'm glad you found the article helpful. I understand you encountered some issues when working with larger datasets. Let me attempt to address your concerns.

By default, when you initialize your open AI language model, it uses the text-davinci-003 model. When I have been running some of the queries using this model, I would get syntax errors but they disappeared the moment I specified the model name to gpt-3.5-turbo:

I have updated the article to use ChatOpenAI which uses gpt-3.5-turbo by default. You can simply change the llm variable to:
llm = ChatOpenAI(temperature=0, openai_api_key=API_KEY, model_name='gpt-3.5-turbo')

A system with a large amount of RAM and a strong CPU is likely required for a dataset with 1200+ tables and 10+ years of business ERP data. You may also think about utilizing a distributed database system, which can manage huge amounts of data.

I hope this helps.

Collapse
 
yuhuishishishi profile image
I love integers

For people who are interested how this is done, there are the prompts used to design the chain: github.com/hwchase17/langchain/blo...

Collapse
 
saliouseck2009 profile image
seck saliou

great article very helpfull Thank's very much

Collapse
 
ngonidzashe profile image
Ngonidzashe Nzenze

I'm glad you liked it!

Collapse
 
danielvillacis profile image
Daniel Villacis

Hi, have you done the same but using a MS SQL Database? I'm trying to do it using the AdventureWorksLT demo db, and everything go well until I try to run the chain.

Here is my code:

connection_string = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=server.database.windows.net;DATABASE=AdventureWorksLT;UID=user;PWD=password"
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
db = SQLDatabase.from_uri(connection_url)
llm = ChatOpenAI(openai_api_key=key,temperature=0, verbose=True, model_name="gpt-3.5-turbo")
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
response = db_chain.run("Describe the Category table")

I got an InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 5190 tokens. Please reduce the length of the messages.

The AdventureWorksLT is a relative small db, not sure why is using that amount of tokens.

Any advice will be greatly appreciated.
Best!

Collapse
 
kaushik_ram_g profile image
Kaushik Ganesan

I have the same problem

Collapse
 
kaushik_ram_g profile image
Kaushik Ganesan • Edited

I was able to resolve the issue by including only a few tables. For example in the below connection use include_tables to resolve this issue.

db = SQLDatabase.from_uri(uri, include_tables=["table_name"])

Thread Thread
 
nikk189 profile image
Nikk

Hi,
Have you been able to include views anyhow?

Collapse
 
raksh19 profile image
Rakshith Raj N

Hey,
I'm getting the following error, could any one please help me knowing the reason.

(pyodbc.OperationalError) ('08001', '[08001] [Microsoft][ODBC SQL Server Driver][DBNETLIB]SQL Server does not exist or access denied. (17) (SQLDriverConnect); [08001] [Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionOpen (Connect()). (53)')

Below is my code.
bd=SQLDatabase.from_uri(f"mssql+pyodbc://{username}:{password}@{server}/{database}?driver={driver}")

Collapse
 
dhanashree_mhatre_96a668d profile image
Dhanashree Mhatre

uri = f'mssql+pyodbc://{username}:{password}@{server}/{database}?driver={driver.replace("{", "").replace("}", "")}'
this works for check if it works

Collapse
 
prameela1610 profile image
Prameela1610

SQLQuery:This model's maximum context length is 4097 tokens. However, your messages resulted in 23032 tokens. Please reduce the length of the messages.
I go the above error while asking simple question like count of table .

Collapse
 
saptarshitiger profile image
saptarshitiger

Great work! Have you tried adding memory to the database chain as well ? So that it can refer back to the previous queries which the user asked and create a result combination of the chat history and the new question asked, for me in context of a database memory does not seem to work properly, that is why was curious about the same

Collapse
 
deepakkashyap3013 profile image
Deepak Kashyap

Hey Ngonidzashe,
I am using SQL agent from langchain, for some context I have a large postgres data source. Upon asking questions that might involve joining tables, ordering and filtering. The agent gets hit to its maximum iterations. I am currently testing on my laptop that has 8GB of Ram and a decent CPU. Can you suggest me something that could help me to avoid these errors

Collapse
 
shahn33 profile image
Nirav

can we do this with Oracle database? can you please guide me on this as I have one use case for one of our client