Generate SQL queries based on natural language input
In this blog post, we'll explore how to build a service that interacts with a PostgreSQL database and uses OpenAI's GPT-4 model to generate SQL queries based on natural language input. This service, called NorthwindServicefromDB
, is designed to make it easier for users to query a database without needing to write SQL themselves. We'll walk through the code step-by-step, explaining each component and how it fits into the overall architecture.
Overview
The NorthwindServicefromDB
service is a .NET class that provides a method called AnswerFromDB
. This method takes a natural language query as input, generates a corresponding SQL query using OpenAI's GPT-4 model, executes the SQL query against a PostgreSQL database, and returns the results. The service is designed to work with the Northwind database, a sample database often used for learning and testing.
Key Features:
- Natural Language to SQL Conversion: The service uses OpenAI's GPT-4 model to convert natural language queries into SQL queries.
- Database Schema Retrieval: The service dynamically retrieves the schema of all tables in the database to provide context for the GPT-4 model.
- SQL Query Execution: The service executes the generated SQL query and returns the results in a structured format.
-
Security: The service only allows
SELECT
queries to be executed, preventing any modifications to the database.
Code Walkthrough
Let's dive into the code and understand how each part works.
1. Setting Up the Environment
The service uses environment variables to securely store sensitive information, such as the API key for OpenAI. The DotNetEnv
package is used to load these variables from a .env
file.
Env.Load(".env");
string githubKey = Env.GetString("GITHUB_KEY");
2. Initializing the OpenAI Chat Client
The service uses the AzureOpenAIClient
to interact with OpenAI's GPT-4 model. The client is initialized with the API endpoint and the API key.
IChatClient client =
new AzureOpenAIClient(
new Uri("<https://models.inference.ai.azure.com>"),
new AzureKeyCredential(githubKey))
.AsChatClient(modelId: "gpt-4o-mini");
3. Retrieving Database Schema
To generate accurate SQL queries, the service needs to know the structure of the database. The GetAllTableSchemas
method retrieves the schema of all tables in the database.
static async Task<string> GetAllTableSchemas()
{
using var connection = new NpgsqlConnection(_connectionString);
await connection.OpenAsync();
var tableNames = new List<string>();
using (var command = new NpgsqlCommand("SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'", connection))
using (var reader = await command.ExecuteReaderAsync())
{
while (await reader.ReadAsync())
{
tableNames.Add(reader["table_name"].ToString());
}
}
var allSchemas = new StringBuilder();
foreach (var tableName in tableNames)
{
allSchemas.AppendLine(await GetTableSchema(tableName));
}
return allSchemas.ToString();
}
The GetTableSchema
method retrieves the schema for a specific table, including the column names and data types.
static async Task<string> GetTableSchema(string tableName)
{
using var connection = new NpgsqlConnection(_connectionString);
await connection.OpenAsync();
using var command = new NpgsqlCommand($"SELECT column_name, data_type FROM information_schema.columns WHERE table_name = '{tableName}'", connection);
using var reader = await command.ExecuteReaderAsync();
var schema = new StringBuilder();
schema.AppendLine($"Table: {tableName}");
schema.AppendLine("Columns:");
while (await reader.ReadAsync())
{
schema.AppendLine($"- {reader["column_name"]} ({reader["data_type"]})");
}
return schema.ToString();
}
4. Generating SQL Queries with OpenAI
The AnswerFromDB
method combines the database schema with the user's natural language query and sends it to the GPT-4 model to generate a SQL query.
var response = await client.CompleteAsync($"{allTableSchemas}\\n{query}");
var sqlQuery = ExtractSqlQuery(response.Message.Text);
Console.WriteLine("Generated Query: " + sqlQuery);
The ExtractSqlQuery
method extracts the SQL query from the model's response, which is expected to be enclosed in triple backticks.
static string ExtractSqlQuery(string response)
{
var startIndex = response.IndexOf("```
{% endraw %}
sql", StringComparison.OrdinalIgnoreCase);
if (startIndex == -1) return "";
startIndex += 7; // Move past "
{% raw %}
```sql"
var endIndex = response.IndexOf("```
{% endraw %}
", startIndex, StringComparison.OrdinalIgnoreCase);
if (endIndex == -1) return "";
return response.Substring(startIndex, endIndex - startIndex).Trim();
}
{% raw %}
5. Executing the SQL Query
Once the SQL query is generated, the service checks if it is a SELECT
query (to prevent any modifications to the database) and then executes it.
if (sqlQuery.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase))
{
var result = await ExecuteQuery(sqlQuery);
return $"Query result: {JsonSerializer.Serialize(result)}";
}
else
{
return "Only SELECT queries are supported.";
}
The ExecuteQuery
method executes the SQL query and returns the results as a list of dictionaries, where each dictionary represents a row in the result set.
static async Task<List<Dictionary<string, object>>> ExecuteQuery(string query)
{
using var connection = new NpgsqlConnection(_connectionString);
await connection.OpenAsync();
using var command = new NpgsqlCommand(query, connection);
using var reader = await command.ExecuteReaderAsync();
var results = new List<Dictionary<string, object>>();
while (await reader.ReadAsync())
{
var row = new Dictionary<string, object>();
for (int i = 0; i < reader.FieldCount; i++)
{
row[reader.GetName(i)] = reader.GetValue(i);
}
results.Add(row);
}
return results;
}
Conclusion
The NorthwindServicefromDB
service is a powerful tool that bridges the gap between natural language and database queries. By leveraging OpenAI's GPT-4 model, it allows users to interact with a database using plain English, making it accessible to non-technical users. The service is designed with security in mind, ensuring that only read-only queries are executed.
Potential Enhancements:
- Error Handling: Add more robust error handling to manage cases where the GPT-4 model generates invalid SQL queries.
- Caching: Implement caching for the database schema to reduce the number of database calls.
- User Authentication: Add user authentication and authorization to restrict access to the service.
This service is a great example of how AI can be integrated into traditional software development to create more intuitive and user-friendly applications. Whether you're building a business intelligence tool or a data exploration platform, this approach can significantly enhance the user experience.
Top comments (0)