DEV Community

baraneetharan
baraneetharan

Posted on

Building a Database Query Service with OpenAI and PostgreSQL in .NET

Generate SQL queries based on natural language input

In this blog post, we'll explore how to build a service that interacts with a PostgreSQL database and uses OpenAI's GPT-4 model to generate SQL queries based on natural language input. This service, called NorthwindServicefromDB, is designed to make it easier for users to query a database without needing to write SQL themselves. We'll walk through the code step-by-step, explaining each component and how it fits into the overall architecture.

Overview

The NorthwindServicefromDB service is a .NET class that provides a method called AnswerFromDB. This method takes a natural language query as input, generates a corresponding SQL query using OpenAI's GPT-4 model, executes the SQL query against a PostgreSQL database, and returns the results. The service is designed to work with the Northwind database, a sample database often used for learning and testing.

Key Features:

  • Natural Language to SQL Conversion: The service uses OpenAI's GPT-4 model to convert natural language queries into SQL queries.
  • Database Schema Retrieval: The service dynamically retrieves the schema of all tables in the database to provide context for the GPT-4 model.
  • SQL Query Execution: The service executes the generated SQL query and returns the results in a structured format.
  • Security: The service only allows SELECT queries to be executed, preventing any modifications to the database.

Code Walkthrough

Let's dive into the code and understand how each part works.

1. Setting Up the Environment

The service uses environment variables to securely store sensitive information, such as the API key for OpenAI. The DotNetEnv package is used to load these variables from a .env file.

Env.Load(".env");
string githubKey = Env.GetString("GITHUB_KEY");

Enter fullscreen mode Exit fullscreen mode

2. Initializing the OpenAI Chat Client

The service uses the AzureOpenAIClient to interact with OpenAI's GPT-4 model. The client is initialized with the API endpoint and the API key.

IChatClient client =
    new AzureOpenAIClient(
        new Uri("<https://models.inference.ai.azure.com>"),
        new AzureKeyCredential(githubKey))
    .AsChatClient(modelId: "gpt-4o-mini");

Enter fullscreen mode Exit fullscreen mode

3. Retrieving Database Schema

To generate accurate SQL queries, the service needs to know the structure of the database. The GetAllTableSchemas method retrieves the schema of all tables in the database.

static async Task<string> GetAllTableSchemas()
{
    using var connection = new NpgsqlConnection(_connectionString);
    await connection.OpenAsync();

    var tableNames = new List<string>();
    using (var command = new NpgsqlCommand("SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'", connection))
    using (var reader = await command.ExecuteReaderAsync())
    {
        while (await reader.ReadAsync())
        {
            tableNames.Add(reader["table_name"].ToString());
        }
    }

    var allSchemas = new StringBuilder();
    foreach (var tableName in tableNames)
    {
        allSchemas.AppendLine(await GetTableSchema(tableName));
    }

    return allSchemas.ToString();
}

Enter fullscreen mode Exit fullscreen mode

The GetTableSchema method retrieves the schema for a specific table, including the column names and data types.

static async Task<string> GetTableSchema(string tableName)
{
    using var connection = new NpgsqlConnection(_connectionString);
    await connection.OpenAsync();
    using var command = new NpgsqlCommand($"SELECT column_name, data_type FROM information_schema.columns WHERE table_name = '{tableName}'", connection);
    using var reader = await command.ExecuteReaderAsync();
    var schema = new StringBuilder();
    schema.AppendLine($"Table: {tableName}");
    schema.AppendLine("Columns:");
    while (await reader.ReadAsync())
    {
        schema.AppendLine($"- {reader["column_name"]} ({reader["data_type"]})");
    }
    return schema.ToString();
}

Enter fullscreen mode Exit fullscreen mode

4. Generating SQL Queries with OpenAI

The AnswerFromDB method combines the database schema with the user's natural language query and sends it to the GPT-4 model to generate a SQL query.

var response = await client.CompleteAsync($"{allTableSchemas}\\n{query}");
var sqlQuery = ExtractSqlQuery(response.Message.Text);
Console.WriteLine("Generated Query: " + sqlQuery);

Enter fullscreen mode Exit fullscreen mode

The ExtractSqlQuery method extracts the SQL query from the model's response, which is expected to be enclosed in triple backticks.

static string ExtractSqlQuery(string response)
{
    var startIndex = response.IndexOf("```
{% endraw %}
sql", StringComparison.OrdinalIgnoreCase);
    if (startIndex == -1) return "";
    startIndex += 7; // Move past "
{% raw %}
```sql"
    var endIndex = response.IndexOf("```
{% endraw %}
", startIndex, StringComparison.OrdinalIgnoreCase);
    if (endIndex == -1) return "";
    return response.Substring(startIndex, endIndex - startIndex).Trim();
}
{% raw %}


Enter fullscreen mode Exit fullscreen mode

5. Executing the SQL Query

Once the SQL query is generated, the service checks if it is a SELECT query (to prevent any modifications to the database) and then executes it.



if (sqlQuery.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase))
{
    var result = await ExecuteQuery(sqlQuery);
    return $"Query result: {JsonSerializer.Serialize(result)}";
}
else
{
    return "Only SELECT queries are supported.";
}


Enter fullscreen mode Exit fullscreen mode

The ExecuteQuery method executes the SQL query and returns the results as a list of dictionaries, where each dictionary represents a row in the result set.



static async Task<List<Dictionary<string, object>>> ExecuteQuery(string query)
{
    using var connection = new NpgsqlConnection(_connectionString);
    await connection.OpenAsync();
    using var command = new NpgsqlCommand(query, connection);
    using var reader = await command.ExecuteReaderAsync();
    var results = new List<Dictionary<string, object>>();
    while (await reader.ReadAsync())
    {
        var row = new Dictionary<string, object>();
        for (int i = 0; i < reader.FieldCount; i++)
        {
            row[reader.GetName(i)] = reader.GetValue(i);
        }
        results.Add(row);
    }
    return results;
}



Enter fullscreen mode Exit fullscreen mode

Super Shop Bot

Conclusion

The NorthwindServicefromDB service is a powerful tool that bridges the gap between natural language and database queries. By leveraging OpenAI's GPT-4 model, it allows users to interact with a database using plain English, making it accessible to non-technical users. The service is designed with security in mind, ensuring that only read-only queries are executed.

Potential Enhancements:

  • Error Handling: Add more robust error handling to manage cases where the GPT-4 model generates invalid SQL queries.
  • Caching: Implement caching for the database schema to reduce the number of database calls.
  • User Authentication: Add user authentication and authorization to restrict access to the service.

This service is a great example of how AI can be integrated into traditional software development to create more intuitive and user-friendly applications. Whether you're building a business intelligence tool or a data exploration platform, this approach can significantly enhance the user experience.

Source code GitHub Link

Top comments (0)