Patrick Chan for Gentoro

Originally published at gentoro.com

Function-based RAG: Extending LLMs Beyond Static Knowledge Bases

RAG Defined

Retrieval-Augmented Generation (RAG) effectively overcomes a significant limitation in the field of Large Language Models (LLMs). Traditional LLMs are restricted to the knowledge contained within their training data. However, RAG enables these models to connect with external data sources, expanding their knowledge. This integration is crucial because it ensures that LLM responses are not limited to pre-trained information but also include up-to-date external data. This capability is especially valuable in fields where knowledge evolves quickly and staying current is essential.

Basic RAG Process

The key steps of a RAG process are as follows (a minimal code sketch follows the list):

  1. Document Chunking: The first step involves breaking down extensive documents into smaller, more manageable chunks. This is essential because LLMs have size constraints on the amount of data they can process at one time. By dividing documents into smaller parts, it ensures that the LLM can handle the data without being overwhelmed. This process requires careful consideration to ensure that the integrity and context of the information in the documents are maintained even after they are chunked.
  2. Vector Database Embeddings: Once the documents are chunked, the next step is to create embeddings for each chunk using the LLM. Embeddings are essentially numerical representations of text data that capture the contextual meanings of words or phrases. The embeddings are then stored in a vector database, which makes it possible to find chunks that are relevant to an incoming prompt.
  3. Vector Database Lookup: In this step, the LLM generates an embedding for the input prompt. This prompt embedding is then used to search the vector database for matching document chunks. The vector database contains the embeddings of all the document chunks created in the previous step. By comparing the prompt embedding with the embeddings in the vector database, the system can identify which chunks of the document are most relevant to the prompt.
  4. Response Integration: The final step involves integrating the most relevant document chunks, identified in the vector database lookup, into the original prompt. This integration is done in a way that maintains the coherence and context of the response. The integrated prompt, now enriched with information from the external documents, is then processed by the LLM to formulate a comprehensive and informative response. This step is vital in ensuring that the response generated by the LLM is not only based on its pre-trained knowledge but also supplemented with up-to-date, external information.
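To make the four steps concrete, here is a minimal Python sketch. The `embed` function is a hypothetical stand-in for a real embedding model, and the in-memory list plus cosine similarity stands in for a real vector database:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding call; in practice this wraps an embedding API."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)  # stand-in vector

# 1. Document chunking: naive fixed-size split (real systems respect sentence
#    and section boundaries to preserve context)
def chunk(document: str, size: int = 500) -> list[str]:
    return [document[i:i + size] for i in range(0, len(document), size)]

# 2. Vector database embeddings: store (chunk, embedding) pairs
def index_chunks(chunks: list[str]) -> list[tuple[str, np.ndarray]]:
    return [(c, embed(c)) for c in chunks]

# 3. Vector database lookup: rank chunks by cosine similarity to the prompt
def top_k(prompt: str, index: list[tuple[str, np.ndarray]], k: int = 3) -> list[str]:
    q = embed(prompt)
    scored = sorted(index, key=lambda item: -np.dot(q, item[1]) /
                    (np.linalg.norm(q) * np.linalg.norm(item[1])))
    return [c for c, _ in scored[:k]]

# 4. Response integration: prepend the retrieved chunks to the original prompt
def build_prompt(prompt: str, index: list[tuple[str, np.ndarray]]) -> str:
    context = "\n---\n".join(top_k(prompt, index))
    return f"Use the following context to answer.\n{context}\n\nQuestion: {prompt}"
```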

Document-based RAG vs. Function-based RAG

Document-based RAG utilizes static documents as its information source. This method is suitable for scenarios where the information is relatively constant and does not require frequent updates. For example:

What are the symptoms of high blood pressure?

However, document-based RAG cannot handle requests that necessitate real-time data. For example:

What are the most recent blood pressure readings for patient 30046822?

The system’s inability to interact with dynamic data sources limits its applicability in scenarios where up-to-date information is critical. In contrast, function-based RAG systems are specifically designed to excel in real-time operations by interfacing with various information systems, such as databases and data lakes. This capability allows them to access and process current data, making them highly effective in scenarios requiring up-to-the-minute information. The function-based RAG process encompasses several key components (a code sketch follows the list):

  1. Function Description: This involves writing comprehensive descriptions for each function that the system might need to execute. These descriptions are crucial as they provide a clear understanding of what each function does, its inputs, and expected outputs. Once these descriptions are created, their embeddings are generated using an LLM. These embeddings effectively capture the essence of the function in a format that the system can understand and utilize. These function descriptions and their embeddings are then stored in a vector database, creating a repository of functions that the system can draw upon.
  2. Function Matching: When a prompt is received, the system searches the vector database to find functions that match the prompt. This matching process is critical as it determines how the system will respond to the query. The system uses the embeddings of the functions stored in the vector database to find the best match for the prompt, ensuring that the response is as accurate and relevant as possible.
  3. Function Execution: Once a matching function is identified, the system then proceeds to execute it. This step involves using the information provided in the prompt to determine how the function should be executed. This often includes extracting specific parameters from the prompt that are necessary for the function’s execution. For instance, if the prompt is asking for the latest blood pressure readings, the function execution step would involve extracting the patient’s identifier and the time range for which the readings are required.
  4. Response Integration: After the function is executed, the results are then integrated back into the original prompt. This integration is a crucial step as it ensures that the response generated by the system is both relevant and contextually appropriate. The LLM processes this integrated information to generate a response that not only answers the query but also incorporates the results of the function call.
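The following sketch illustrates steps 1 through 3 under the same assumptions as the earlier example: `embed` is a stand-in for a real embedding model, and `get_blood_pressure_readings` is a hypothetical function rather than part of any actual system. In a full system, the function-aware LLM would also extract the parameters from the prompt; here they are supplied by hand:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding, as in the earlier sketch."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

# 1. Function description: register each callable with a natural-language description
def get_blood_pressure_readings(patient_id: str, days: int = 7) -> list[dict]:
    """Hypothetical function; a real implementation would query a clinical database."""
    return [{"patient": patient_id, "systolic": 120, "diastolic": 80}]

REGISTRY = {
    "get_blood_pressure_readings": {
        "fn": get_blood_pressure_readings,
        "description": "Return the most recent blood pressure readings for a patient, "
                       "given the patient identifier and an optional time range in days.",
    },
}

# Embed each description once and keep it alongside the registry entry
for entry in REGISTRY.values():
    entry["embedding"] = embed(entry["description"])

# 2. Function matching: pick the function whose description is closest to the prompt
def match_function(prompt: str) -> str:
    q = embed(prompt)
    def score(name: str) -> float:
        e = REGISTRY[name]["embedding"]
        return np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e))
    return max(REGISTRY, key=score)

# 3. Function execution: the parameters would normally be extracted by the LLM
name = match_function("What are the most recent blood pressure readings for patient 30046822?")
result = REGISTRY[name]["fn"](patient_id="30046822")
```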

In summary, while document-based RAG is suitable for static data scenarios, function-based RAG systems offer a more dynamic and interactive solution, especially suited for real-time data processing and operations. The detailed process of function description, matching, execution, and response integration enables function-based RAG systems to handle complex queries and provide accurate, up-to-date responses.

The diagram illustrates the various components involved in a RAG system. Components highlighted in yellow are utilized specifically in document-based RAG systems, whereas those in blue are indicative of the components necessary for a function-based RAG system.

Challenges in Function Definition Generation

In function-based RAG systems, the quality of outcomes heavily depends on the accuracy of function definitions. This can be challenging, especially in complex corporate database schemas with numerous, cryptically named tables and columns. In such environments, manual documentation for each element becomes impractical.

A critical element in these systems is the Function Generation Engine. Its primary task is to extract as much metadata as possible from connected systems and use this information, along with an LLM, to create function definitions for storage in the vector database. However, the success of this process varies. For example, mature corporate schemas might have hundreds of tables and thousands of cryptically named columns, often including undocumented constants that are difficult for both LLMs and humans to interpret. In these cases, manual annotation might be necessary to enhance the accuracy of function definitions.

Enhancing function definition quality can be achieved by incorporating additional information sources. For example, analyzing SQL query logs provides deeper insight into how the data is actually queried, leading to improved function definitions. Incorporating a knowledge graph can also be extremely valuable. Hence, the Function Definition Engine needs the versatility to extract useful data from a broad spectrum of sources.
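As a rough illustration of what a Function Generation Engine might do, the sketch below turns schema metadata, sample rows, and query-log snippets into a prompt for an LLM; `call_llm` and the table names are purely hypothetical:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; in practice this wraps a chat/completions API."""
    return "Return the most recent vital-sign readings for a patient given a patient id."

def describe_table(table: str, columns: list[str], sample_rows: list[dict],
                   recent_queries: list[str]) -> str:
    """Combine schema metadata, sample values, and SQL query logs into a prompt,
    then ask the LLM to produce a function description for the vector database."""
    prompt = (
        f"Table: {table}\n"
        f"Columns: {', '.join(columns)}\n"
        f"Sample rows: {sample_rows}\n"
        f"Queries seen in logs: {recent_queries}\n"
        "Write a one-sentence description of a lookup function over this table, "
        "including its inputs and outputs."
    )
    return call_llm(prompt)

description = describe_table(
    table="PT_VTL_RDG",                      # cryptic legacy name
    columns=["PT_ID", "SYS_BP", "DIA_BP", "RDG_TS"],
    sample_rows=[{"PT_ID": "30046822", "SYS_BP": 120, "DIA_BP": 80}],
    recent_queries=["SELECT SYS_BP, DIA_BP FROM PT_VTL_RDG WHERE PT_ID = ? ORDER BY RDG_TS DESC"],
)
```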

Additionally, the system’s workbench, discussed later, continually improves function definition generation. Access to extra information simply boosts the initial accuracy level.

Function-based RAG Requires Function-capable LLMs

In a document-based RAG system, the chunks consist of natural language entities, making it compatible with any foundational LLM. However, in function-based RAG systems, specialized LLMs trained to handle function entities are essential. Two widely recognized LLMs, ChatGPT and GorillaLLM, excel in this domain. These models are fine-tuned to go beyond textual prompts; they are trained to match these prompts against a list of functions and appropriately populate these functions with the required parameters.

Moreover, these specialized LLMs must possess a versatile range of output capabilities. Traditional LLM outputs are typically limited to standard text formats, which are adequate for general queries and responses. However, function-based RAG often necessitates conveying complex function information and parameters that go beyond plain text. For instance, these models might need to emit a structured JSON function-call message, as implemented in OpenAI's function-calling API. This format allows for structured and precise communication of function calls, making it easier for the connected systems to interpret and execute these calls accurately.
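For illustration only, a structured function-call message of this kind might look roughly as follows (shown as a Python literal; field names are abbreviated and not an exact reproduction of any vendor schema):

```python
# Roughly the shape of a structured function-call message returned by a
# function-aware LLM instead of a free-text answer.
function_call = {
    "role": "assistant",
    "content": None,  # no prose answer; the model chose to call a function
    "function_call": {
        "name": "get_blood_pressure_readings",
        "arguments": '{"patient_id": "30046822", "days": 7}',  # arguments arrive as a JSON string
    },
}
```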

The need for such specialized output formats stems from the diverse and intricate nature of the tasks that function-based RAG systems are expected to perform. In many cases, these tasks involve interacting with systems that require detailed, structured data inputs — such as querying a database for specific records or sending a command to a software application. The ability of function-aware LLMs to generate these structured outputs ensures seamless integration and communication between the LLM and the various systems it interacts with.

Furthermore, the development of function-aware LLMs involves extensive training on a wide range of functions and systems. This training includes not only the syntax and structure of different programming languages and database query languages but also an understanding of the logical flow and practical application of these functions. As a result, these models can accurately interpret the intent behind a prompt and generate responses that are not only contextually appropriate but also technically correct and executable by the target system.

In summary, function-based RAG necessitates specialized, function-aware LLMs capable of understanding and interacting with a variety of complex functions and systems. These models represent a critical leap forward in enabling LLMs to perform a broader range of tasks, extending their utility beyond simple text generation to more sophisticated and impactful applications in various technological and computational fields.

AI-powered Orchestration

The AI-powered Orchestrator in a RAG system is much more than a mere conduit for data flow. It is an intelligent, dynamic manager that plays an indispensable role in ensuring the seamless and effective operation of the system. This orchestrator is responsible for guiding a prompt through all the intricate components of the system to produce a response that is not only accurate and relevant but also of high quality.

In a typical workflow, the orchestrator first directs the prompt to a function-aware LLM. This is a crucial step, as the LLM needs to understand and interpret the prompt to generate an appropriate function call. Once the function call is made, the orchestrator then forwards this call to the execution engine. The execution engine, another critical component of the system, is where the actual processing happens — be it data retrieval, computation, or any other specific action required by the function. The orchestrator then waits for the results from the execution engine and ensures that these results are correctly formatted and returned as a coherent response to the initial prompt.
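A hedged sketch of that loop is shown below; `llm_propose_call`, `execution_engine`, and `llm_compose_answer` are hypothetical stand-ins for the function-aware LLM, the execution engine, and the final response-integration step:

```python
import json

def llm_propose_call(prompt: str) -> dict:
    """Hypothetical function-aware LLM call returning a structured function call."""
    return {"name": "get_blood_pressure_readings",
            "arguments": json.dumps({"patient_id": "30046822"})}

def execution_engine(name: str, arguments: dict) -> list[dict]:
    """Hypothetical execution engine dispatching to the matched function."""
    return [{"systolic": 120, "diastolic": 80}]

def llm_compose_answer(prompt: str, results: list[dict]) -> str:
    """Hypothetical LLM call that integrates the function results into the response."""
    return f"Results for your question ({prompt!r}): {results}"

def orchestrate(prompt: str) -> str:
    # 1. Route the prompt to the function-aware LLM to obtain a function call
    call = llm_propose_call(prompt)
    # 2. Forward the call to the execution engine
    results = execution_engine(call["name"], json.loads(call["arguments"]))
    # 3. Integrate the results and return a coherent response
    return llm_compose_answer(prompt, results)
```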

A leading example in the development of such orchestration workflows is LangChain, which has been instrumental in coding and refining these processes. However, the field is rapidly evolving with ongoing research aimed at enhancing the reliability and sophistication of these systems. A significant area of focus is the reduction of errors such as ‘hallucinations’, a term used to describe instances where LLMs generate incorrect or nonsensical responses. One promising approach to mitigate this issue is the Chain-of-Thought method. This method involves the LLM itself in a sort of self-reflection or internal dialogue to evaluate the quality and logic of its responses before they are forwarded.

Such advanced techniques highlight the need for more flexible and dynamic orchestration methods. Traditional, rigid workflows are ill-suited to these innovative approaches. In response to this challenge, LLMs themselves are being leveraged to articulate complex workflows in plain, understandable language. This capability enables the orchestrator to adapt these workflows dynamically, ensuring that the system can handle a wide array of tasks and scenarios with increased efficiency and reliability.

Moreover, the sophistication of an LLM-powered orchestrator allows for the integration of context-aware modifications within the application. For example, it can be programmed to recognize emotional cues in prompts, such as expressions of frustration or urgency. Upon detecting such cues, the orchestrator could automatically initiate specific actions, like escalating the prompt to human customer service representatives or triggering an alert in the system. This level of responsiveness not only enhances the user experience but also contributes to the overall effectiveness and efficiency of the system.

In conclusion, the AI-powered Orchestrator represents a significant advancement in the management of complex RAG systems. Its ability to intelligently guide prompts through various system components, adapt to advanced processing techniques, and respond dynamically to the context of prompts, marks a leap forward in the capabilities of automated data processing and response generation systems. As these technologies continue to evolve, they promise to revolutionize the way we interact with and leverage the power of LLMs in various applications.

Reducing Costs by Training a Smaller Internal LLM

The use of advanced models like ChatGPT-4 in RAG systems, while powerful, can be financially burdensome, especially under conditions of heavy or continuous usage. To mitigate these costs without compromising on functionality, a strategic approach involves training a smaller, more cost-effective internal LLM.

This smaller LLM can be trained on prompt-response pairs generated by the more advanced models. This training method allows the smaller model to learn and mimic the response patterns and decision-making processes of its more sophisticated counterparts. Over time, as this internal model becomes more proficient, it can gradually take on an increasing number of tasks, such as function matching, which are traditionally handled by the larger, more expensive models. This shift not only reduces operational costs but also leverages the learning and adaptability inherent in LLMs.

However, it’s important to recognize that certain tasks, particularly those requiring deep language understanding and complex integrations, such as response integration and nuanced context processing, may still require the capabilities of the more advanced models like ChatGPT-4. The smaller LLM, while efficient and improving, may not yet possess the sophistication needed for these complex tasks.

To balance cost and performance, the RAG system can be configured to initially employ the smaller model for basic tasks and preliminary evaluations. The system can use this model to assess the accuracy and relevance of its responses and decisions. If the performance of the smaller model meets predefined standards of accuracy and reliability, it can continue to handle the task. However, if the output falls below a certain quality threshold, the system can seamlessly transition the task to the more advanced model. This approach ensures that the quality of output is not compromised while still maximizing cost efficiency.
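One way to express that tiered strategy is sketched below; `small_llm`, `large_llm`, `quality_score`, and the threshold value are all hypothetical placeholders that a real deployment would replace and tune:

```python
QUALITY_THRESHOLD = 0.8  # assumed value, tuned per deployment

def small_llm(prompt: str) -> str:
    """Hypothetical cheaper internal model."""
    return "draft answer"

def large_llm(prompt: str) -> str:
    """Hypothetical advanced model (e.g. a hosted frontier LLM)."""
    return "high-quality answer"

def quality_score(prompt: str, answer: str) -> float:
    """Hypothetical evaluator; could itself be an LLM-as-judge or a heuristic."""
    return 0.9

def answer(prompt: str) -> str:
    # Try the inexpensive internal model first
    draft = small_llm(prompt)
    # Escalate to the advanced model only when the draft falls below the quality bar
    if quality_score(prompt, draft) >= QUALITY_THRESHOLD:
        return draft
    return large_llm(prompt)
```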

The smaller model’s learning capabilities are a key asset in this approach. As it continues to process and learn from various prompts and responses, its ability to effectively handle similar or related tasks is expected to improve over time. This means that with each interaction and learning opportunity, the smaller model becomes progressively more capable, potentially taking on more complex tasks and reducing the frequency of escalations to the more advanced models.

Furthermore, the smaller internal LLM can be customized and fine-tuned to the specific needs and nuances of the organization’s operations. This customization allows for a more targeted approach to learning and response generation, potentially increasing efficiency and relevance in the specific contexts in which the organization operates.

In summary, the strategic use of a smaller internal LLM in RAG systems represents a smart balancing act between cost efficiency and performance optimization. By leveraging the learning capabilities of LLMs and intelligently allocating tasks between models based on complexity and required expertise, organizations can significantly reduce operational costs while maintaining, and in some cases even enhancing, the quality of output and responsiveness of their RAG systems.

AI-powered Workbench

In the practical application of RAG systems within production environments, various challenges can impede the successful execution of prompts. These include:

  1. Missing functions.
  2. Functions that cannot be found or are incorrectly matched.
  3. Incorrectly assigned parameters.
  4. Too many results returned.
  5. Systemic failures during function execution.

To navigate and resolve these issues efficiently and precisely, the integration of an AI-powered workbench is not just beneficial but essential.

The traditional approach to addressing these issues often required manual intervention, where users had to painstakingly analyze and rectify each problem. This process was not only time-consuming but also prone to human error, especially in complex scenarios. The advent of an LLM-powered workbench marks a significant evolution in this domain. This advanced workbench harnesses the analytical prowess of LLMs to swiftly identify and diagnose problems within the RAG system.

More than just identifying issues, the LLM-powered workbench is capable of suggesting intelligent, context-aware solutions. Leveraging the extensive knowledge base and analytical capabilities of LLMs, it can propose fixes that are not only relevant but also optimized for the specific context of the encountered problem. Once a solution is proposed, it empowers the user with the decision to approve or modify the suggested fix, ensuring that human oversight and control remain integral to the process.

Upon user approval, the system takes a proactive stance, automatically implementing the chosen solution. This automation streamlines the rectification process, significantly reducing the time and effort involved in manual troubleshooting. Furthermore, the workbench incorporates a sophisticated retesting mechanism. After implementing a solution, it systematically retests the prompt to ensure that the fix has not only resolved the specific issue but also that it hasn’t led to any regressions or new problems within the system.

This AI-powered workbench, therefore, represents a critical component in the management and optimization of RAG systems. It not only enhances the efficiency and effectiveness of problem resolution but also contributes to the overall robustness and reliability of the system. The integration of such an advanced tool in RAG systems paves the way for smoother, more efficient operations and a significant reduction in downtime, making it a cornerstone of modern, AI-enhanced data processing environments.

Execution Engine

The execution engine serves as a vital component of a data processing system, primarily responsible for managing various connections to external information systems. This includes handling credentials, overseeing key management, and efficiently maintaining connection pools. One of its key functions is to execute SQL queries or other types of code that are associated with specific functions within the system.

Apart from executing predefined code, the execution engine plays a crucial role in the process of function generation. It assists the function generation engine by retrieving sample values from connected information systems. These sample values are instrumental in defining and refining the functions, as they provide real data contexts for testing and optimization.

The flexibility of the execution engine is of paramount importance. In a rapidly evolving technological landscape, the ability to seamlessly integrate new types of connections is essential. This adaptability ensures that the system remains relevant and efficient in the face of changing data sources and evolving information system technologies. It must be designed with a modular architecture, allowing for the easy addition and integration of new connection types. This could involve supporting a wide range of database types, API protocols, and other data access methodologies.
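The sketch below shows one possible shape for such a modular connector interface; the `Connector` protocol and registry are illustrative, not a specific product API:

```python
from typing import Any, Protocol

class Connector(Protocol):
    """Minimal interface every connection type must implement."""
    def connect(self, credentials: dict) -> None: ...
    def execute(self, code: str, params: dict) -> list[dict]: ...
    def sample_values(self, entity: str, limit: int = 5) -> list[Any]: ...

CONNECTORS: dict[str, Connector] = {}

def register_connector(kind: str, connector: Connector) -> None:
    """New connection types (databases, APIs, data lakes) plug in here
    without changes to the rest of the execution engine."""
    CONNECTORS[kind] = connector

def run(kind: str, code: str, params: dict) -> list[dict]:
    """Execute function code (e.g. a SQL query) against the chosen connection."""
    return CONNECTORS[kind].execute(code, params)
```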

Furthermore, the execution engine should also ensure secure and efficient data handling. This includes implementing robust encryption standards for data in transit and at rest, adhering to compliance and data protection regulations, and optimizing query execution for speed and resource utilization. Advanced features like connection pooling and intelligent cache management can significantly improve the performance and scalability of the system.

Overall, the execution engine is a critical component that not only ensures effective data interaction and processing but also enables the system to adapt and expand its capabilities in line with emerging data sources and evolving business needs.

Privacy Filter

The privacy component is a pivotal aspect of any system dealing with sensitive data, particularly when interfacing with external LLMs such as ChatGPT. In the context of enterprise systems, which often handle confidential and personally identifiable information, this component plays a dual role.

Firstly, it ensures the pseudo-anonymization of data before it is transmitted to the LLM. This process involves altering data in a way that the original values are not directly exposed but are still meaningful enough for the LLM to process and generate relevant responses. Techniques like tokenization, where sensitive elements are replaced with non-sensitive equivalents, or data masking, where specific data fields are obscured, can be employed.

Secondly, upon receiving responses from the LLM, the privacy component is responsible for restoring the original values from the anonymized data. This process is essential to maintain the integrity and applicability of the LLM’s responses within the context of the enterprise’s operations.
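A minimal sketch of this tokenize-then-restore round trip appears below; the regular expression for detecting identifiers and the token format are assumptions, and a real privacy filter would use configurable detectors and protection levels:

```python
import re

def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace patient identifiers with opaque tokens before sending text to the LLM."""
    mapping: dict[str, str] = {}
    def repl(match: re.Match) -> str:
        token = f"<ID_{len(mapping)}>"
        mapping[token] = match.group(0)
        return token
    masked = re.sub(r"\b\d{8}\b", repl, text)  # assumed pattern for patient ids
    return masked, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the LLM's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

masked, mapping = pseudonymize("Latest readings for patient 30046822?")
# ... the masked prompt is sent to the external LLM ...
response = restore(f"Readings for {list(mapping)[0]}: 120/80", mapping)
```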

Even in scenarios involving internal LLMs — for instance, within a large organization with multiple departments — maintaining privacy is crucial. Different departments may handle various levels of sensitive information, and it’s imperative to ensure that data privacy is upheld when such information is shared or processed by LLMs.

An advanced privacy system in this context goes beyond basic anonymization. It automatically anonymizes all results flowing through the system. This system should be flexible, allowing users to adjust protection levels based on the sensitivity of the data and the requirements of the task at hand. For instance, a department dealing with highly sensitive customer data may require stricter anonymization compared to a department handling less sensitive, internal operational data.

Additionally, this privacy system should be built with compliance in mind, adhering to relevant data protection regulations like GDPR, HIPAA, or CCPA. It should have the capability to track and audit data processing activities, ensuring transparency and accountability in how data is handled.

In summary, the privacy component is not just a passive filter but an active, dynamic system crucial for maintaining data integrity, confidentiality, and compliance, especially in environments utilizing LLMs like ChatGPT. Its ability to adapt to varying levels of privacy needs and comply with legal standards makes it an indispensable part of any data-driven, AI-augmented enterprise.

Conclusion

In conclusion, the exploration of RAG within this article highlights its pivotal role in elevating the capabilities of LLMs like ChatGPT. RAG’s innovative approach allows LLMs to access and integrate recent data from both static documents and dynamic, complex information systems. This integration significantly broadens the application scope of LLMs, enabling them to tap into vast and varied data sources.

The distinction between document-based and function-based RAG systems is particularly noteworthy. While the former facilitates interaction with static data, the latter, albeit more challenging to implement, unlocks the potential for LLMs to interface with intricate, real-time data systems. This capability is a game-changer, enabling LLMs to not only generate more informed and contextually relevant responses but also to delve deeper into data analysis and insight generation.

As LLMs continue to advance in sophistication and power, their ability to synthesize and interpret large volumes of data will only increase. This progression will further enhance their utility in various sectors, ranging from healthcare to finance, where real-time data analysis and response generation are critical. RAG, in this context, acts as a catalyst, transforming LLMs from mere conversational agents into powerful analytical tools capable of providing valuable insights and aiding in decision-making processes.

This article underscores the transformative impact of RAG on the future of LLMs. By bridging the gap between LLMs and real-time, dynamic data sources, RAG paves the way for more intelligent, responsive, and data-informed AI systems. The potential applications of such enhanced LLMs are vast and varied, heralding a new era of AI-driven data interaction and analysis in our increasingly digital world.

References

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. https://arxiv.org/abs/2005.11401

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. https://arxiv.org/abs/2201.11903