Retrieval Augmented Generation (RAG) is one of the most popular patterns for building GenAI applications. RAG enables businesses to leverage the power of LLMs on their internal knowledge base, such as documents, reports, handbooks, etc., to answer users' queries. However, implementing RAG is not straightforward, as it requires some technical knowledge to get a RAG system smoothly up and running.
In this article, we're going to show you how you can easily build a RAG system with the help of two technologies: Dify and Milvus. Dify will act as the orchestration platform to get our RAG system up and running in a split second, while Milvus will serve as a vector database that stores the internal knowledge or documents for our RAG system. So, without further ado, let's get started with a brief introduction to RAG!
The Fundamentals of RAG
In essence, RAG is an approach designed to mitigate the risk of LLM hallucination. As you might already know, LLMs are trained and fine-tuned on massive datasets with specific cut-off dates. They will most likely answer well if we ask general questions about information that was available on the internet before their cut-off dates. However, if we ask them questions that require internal knowledge, i.e., information from internal documents, our LLMs will likely fail to give us correct answers.
The problem is, when LLMs give us incorrect information, they do it convincingly, and it's difficult to spot errors if we're asking questions in domains we're not familiar with. RAG helps mitigate this issue by providing our LLMs with relevant contexts that might help answer users' queries. As the name suggests, there are three components in a RAG system: retrieval, augmentation, and generation.
RAG workflow.
In the retrieval component, the top-k most relevant contexts according to a given user query are fetched. Next, these relevant contexts are reranked based on their similarity score with the user query. In the augmentation component, the most relevant context after the reranking process is integrated into the final prompt together with the original user query as input to our LLM. Finally, in the generation component, the LLM generates the final answer to the user query using the relevant contexts fetched in the previous components.
Before we can implement RAG as described above, we need to set up a few things first. For example, a typical implementation of RAG involves converting raw internal data into their embedding representations so that we can perform similarity searches on them. Therefore, we need an embedding model for that. Also, we need a storage system like vector databases to store all of the embeddings of those internal data and an LLM to generate a response back to the user.
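To make this concrete, here is a minimal query-time sketch in Python. It is only an illustration of the three components using the OpenAI SDK and the pymilvus client, not how Dify implements RAG internally; the collection name internal_docs and the model names are placeholders, and the sketch assumes the collection has already been populated with embedded text chunks.

from openai import OpenAI
from pymilvus import MilvusClient

openai_client = OpenAI()                              # embedding model + LLM (reads OPENAI_API_KEY)
milvus = MilvusClient(uri="http://localhost:19530")   # vector database

def rag_answer(query: str, collection: str = "internal_docs", top_k: int = 3) -> str:
    # Retrieval: embed the query and fetch the top-k most similar chunks from Milvus.
    query_vector = openai_client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding
    hits = milvus.search(
        collection_name=collection, data=[query_vector],
        limit=top_k, output_fields=["text"],
    )[0]
    context = "\n\n".join(hit["entity"]["text"] for hit in hits)

    # Augmentation: combine the retrieved context with the original query in one prompt.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

    # Generation: the LLM produces the final answer from the augmented prompt.
    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
    )
    return completion.choices[0].message.content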
As you can see, there are many things that we need to set up before we can get our RAG system up and running. RAG applications can be complex to implement without an orchestration platform that encapsulates all of the RAG workflow described above. This is where we need Dify.
What is Dify?
Dify, which originates from the words "Define" and "Modify," is an open-source platform that enables us to build various GenAI applications without the hassle of setting up various components. This is because Dify offers a combination of Backend-as-a-Service and LLMOps to orchestrate the workflow of popular GenAI applications, including RAG. In the next section, we'll see how straightforward it is for us to implement RAG with Dify.
Dify provides a low-code workflow that enables us to skip the complexity of setting up different RAG components described above. This means that we don't need to be an expert to get our own RAG system smoothly up and running. Dify as a platform also offers several advantages, such as easy integration with popular LLM providers, a flexible AI agent framework, an intuitive and easy-to-use UI and APIs, and high-quality RAG engines.
Below is a table that explains how Dify simplifies the development process of sophisticated GenAI applications:
How Dify simplifies the development of GenAI apps. Source.
One key aspect when developing a GenAI app is the ability to self-host the app within our own environment. This is important because we want to retain full control over our private data and its security. Self-hosting our GenAI app also gives developers the flexibility to maintain or customize its infrastructure.
There are two ways to self-host Dify: with a Docker Compose deployment or by starting it from the local source code. Of the two, the Docker Compose deployment is the simpler one to set up.
In the next section, we'll host Dify with Docker Compose deployment. But before we get into the implementation details, let's talk about another system that we'll use in this article, which is Milvus.
What is Milvus?
While Dify is very useful for orchestrating the workflow of a RAG application, there is another crucial component that we also need to pay attention to. As explained in the previous section about the fundamentals of RAG, a vector database plays a big role in storing as well as performing similarity searches to fetch promising contexts for a given user query. Therefore, choosing the right vector database is crucial for us.
Milvus is an open-source vector database that's perfectly suitable for GenAI applications, including RAG. This is because Milvus offers many advanced features to optimize the implementation of RAG, such as advanced data indexing methods, easy integration with popular orchestration tools, and the ability to use advanced searching methods such as hybrid search.
The workflow of transforming unstructured data into embeddings and storing them in Milvus.
In terms of indexing methods, we can choose the FLAT index to perform an exhaustive similarity search and obtain the most relevant contexts for a user query. To speed up the similarity search without sacrificing result quality too much, we can opt for approximate indexing methods such as IVF_FLAT, HNSW, or SCANN. To further reduce the memory footprint of our data, we can also apply product quantization alongside IVF or HNSW indexes.
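As a rough illustration of how the index choice shows up in code when using the pymilvus client directly (Dify handles this for you), the sketch below creates a small collection with an HNSW index. It assumes pymilvus 2.4+; the collection name, field names, dimension, and index parameters are placeholders.

from pymilvus import DataType, MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Define a simple schema: an auto-generated primary key, a 1536-dimensional vector, and the raw text.
schema = client.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=1536)
schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=2048)

# Pick the index type here: "FLAT" for exhaustive search, or "IVF_FLAT" / "HNSW" / "SCANN"
# for faster approximate search; "IVF_PQ" adds product quantization to save memory.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="HNSW",
    metric_type="COSINE",
    params={"M": 16, "efConstruction": 200},
)

client.create_collection(
    collection_name="paper_chunks",
    schema=schema,
    index_params=index_params,
)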
Milvus can also be easily integrated with popular embedding models, LLMs, and orchestration platforms like Dify, as you'll see in the detailed implementation in the next section.
During the similarity search operation, we can use a hybrid search with Milvus to improve the overall quality of contexts and, subsequently, the generation quality of our LLM as well. With hybrid search, we have the option to combine dense and sparse embeddings to find similar contexts, and we can further refine the top-k most relevant contexts with the help of metadata filtering.
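Below is a hedged sketch of what hybrid search looks like when calling pymilvus directly (Dify abstracts this away). It assumes pymilvus 2.4+ and a collection named hybrid_docs that already has dense_vector and sparse_vector fields plus a source scalar field; all of these names are placeholders.

from pymilvus import AnnSearchRequest, MilvusClient, RRFRanker

client = MilvusClient(uri="http://localhost:19530")

def hybrid_retrieve(dense_query_vector, sparse_query_vector, top_k: int = 5):
    # One ANN request per embedding type.
    dense_request = AnnSearchRequest(
        data=[dense_query_vector],
        anns_field="dense_vector",
        param={"metric_type": "IP"},
        limit=10,
        expr='source == "attention_paper"',   # metadata filtering narrows the candidate set
    )
    sparse_request = AnnSearchRequest(
        data=[sparse_query_vector],
        anns_field="sparse_vector",
        param={"metric_type": "IP"},
        limit=10,
        expr='source == "attention_paper"',
    )
    # Merge both result lists with Reciprocal Rank Fusion and keep the top-k contexts.
    return client.hybrid_search(
        collection_name="hybrid_docs",
        reqs=[dense_request, sparse_request],
        ranker=RRFRanker(),
        limit=top_k,
        output_fields=["text"],
    )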
RAG with Dify and Milvus
In this section, we're going to build a simple RAG app with Dify where we can ask questions about the information contained in a research paper. You can use any paper you want, but in this case, we're going to use the famous paper that introduced the Transformer architecture, "Attention is All You Need".
As mentioned in the previous sections, we need to set up at least three important components before creating a RAG app: a vector storage, an embedding model, and an LLM.
We're going to use Milvus as our vector storage, where we'll store all of the necessary contexts. In our case, this would be a collection of text chunks from the "Attention is All You Need" paper. For the embedding model and the LLM, we'll use models from OpenAI, so we need to set up an OpenAI API key first. You can learn more about how to set it up here.
Step 1: Starting Dify and Milvus Containers
In this example, we'll self-host Dify with Docker Compose. Therefore, before we start, make sure that you have Docker installed on your local machine. If you haven't, install Docker by referring to its installation page.
Once we have Docker installed, we need to clone the Dify source code into our local machine with the following command:
git clone https://github.com/langgenius/dify.git
Next, go to the docker directory inside the source code that you've just cloned. There, create the .env file by copying the example file with the following commands:
cd dify/docker
cp .env.example .env
In a nutshell, the .env file contains the configurations needed to get your Dify app up and running, such as the choice of vector database, the credentials needed to access that vector database, the address of your Dify app, and so on.
Since we're going to use Milvus as our vector database, we need to change the value of the VECTOR_STORE variable inside the .env file to milvus. We also need to set the MILVUS_URI variable to http://host.docker.internal:19530 to ensure that there are no communication issues between the Docker containers later on after deployment.
VECTOR_STORE=milvus
MILVUS_URI=http://host.docker.internal:19530
Now we are ready to start the Docker containers. To do so, all we need to do is run the docker compose up -d command. After it finishes, you'll see output in your terminal similar to the one below:
docker compose up -d
We can check the status of all containers and see whether they're up and running healthily with the docker compose ps command. If they're all healthy, you'll see an output like the one below:
docker compose ps
And finally, if you head over to http://localhost/install, you'll see the Dify landing page where you can sign up and start building your RAG application in no time.
Once you've signed up, you can simply log into Dify with your credentials.
Step 2: Setting Up OpenAI API Key
The first thing we need to do after signing up for Dify is to set up our API keys that we'll use to call the embedding model as well as the LLM. Since we're going to use models from OpenAI, we need to insert our OpenAI API key into our profile. To do so, go to "Settings" by hovering your cursor over your profile on the top right of the UI, as you can see in the screenshot below:
Next, go to "Model Provider," hover your cursor on OpenAI, and then click "Setup." You'll then see a pop-up screen where you're prompted to enter your OpenAI API key. Once we're done, we're ready to use models from OpenAI as our embedding model and LLM.
Step 3: Inserting Documents into Knowledge Base
Now let's store the knowledge base for our RAG app. The knowledge base consists of a collection of internal documents or texts that can be used as relevant contexts to help the LLM generate more accurate responses.
In our use case, our knowledge base is essentially the "Attention is All You Need" paper. However, we can't store the paper as it is, for two reasons. First, the paper is too long, and giving an overly long context to the LLM wouldn't help, as the context would be too broad. Second, we can't perform similarity searches on raw text; we need embeddings for that.
Therefore, we need to take a few steps before storing our paper in the knowledge base: first, divide the paper into text chunks; next, transform each chunk into an embedding via an embedding model; and finally, store these embeddings in Milvus, our vector database.
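Dify automates all of this through its UI, but conceptually the ingestion flow boils down to something like the sketch below. The file path, collection name, and naive word-based chunking are placeholders for illustration only, not what Dify does internally.

from openai import OpenAI
from pymilvus import MilvusClient

openai_client = OpenAI()                             # reads OPENAI_API_KEY from the environment
milvus = MilvusClient(uri="http://localhost:19530")

# Naive word-based chunking; Dify's chunk settings are more sophisticated than this.
def chunk_text(text: str, max_words: int = 100) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

with open("attention_is_all_you_need.txt") as f:     # extracted text of the paper (placeholder path)
    chunks = chunk_text(f.read())

# text-embedding-3-small produces 1536-dimensional vectors.
milvus.create_collection(collection_name="attention_paper", dimension=1536)

response = openai_client.embeddings.create(model="text-embedding-3-small", input=chunks)
rows = [
    {"id": i, "vector": item.embedding, "text": chunks[i]}
    for i, item in enumerate(response.data)
]
milvus.insert(collection_name="attention_paper", data=rows)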
Dify makes it easy for us to split the text of the paper into chunks and turn them into embeddings. All we need to do is upload the PDF file of the paper, set the chunk length, and choose the embedding model. To do all these steps, go to "Knowledge" and then click "Create Knowledge". Next, you'll be prompted to upload the PDF file from your local computer, so it's best to download the paper from arXiv and save it on your computer first.
Once we've uploaded the file, we can set the chunk length, indexing method, the embedding model we want to use, and retrieval settings.
In the "Chunk Setting" area, you can choose any number as the maximum chunk length (in our use case, we'll set it to 100). Next, for "Index Method," we need to choose the "High Quality" option as it'll enable us to perform similarity searches to find relevant contexts. For "Embedding Model," you can choose any embedding model from OpenAI you want, but in this example, we're going to use the text-embedding-3-small model. Lastly, for "Retrieval Setting," we need to choose "Vector Search" as we want to perform similarity searches to find the most relevant contexts.
Now if you click on "Save & Process" and everything goes well, you'll see a green tick appear as shown in the following screenshot:
Step 4: Creating the RAG App
Up until this point, we have successfully created a knowledge base and stored it inside our Milvus database. Now we're ready to create the RAG app.
Creating the RAG app with Dify is very straightforward. This time, we go to "Studio" instead of "Knowledge", and then click on "Create from Blank." Next, choose "Chatbot" as the app type and give your app a name in the provided field. Once you're done, click "Create." You'll then see the following page:
Under the "Instruction" field, we can write a system prompt such as "Answer the query from the user concisely." Next, under "Context," we need to click the "Add" button and add the knowledge base that we've just created. This way, our RAG app will fetch candidate contexts from this knowledge base to answer the user's query.
Now that we've added the knowledge base to our RAG app, the last thing we need to do is choose the LLM from OpenAI. To do so, you can click on the model list available in the upper right corner, as you can see in the screenshot below:
And now we're ready to publish our RAG application! Click "Publish" in the upper right-hand corner, and you'll find several ways to publish the app: we can simply run it in a browser, embed it on our website, or access the app via API. In this example, we'll just run the app in a browser, so click "Run App".
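If you'd rather go the API route, the sketch below shows roughly what a request to a published chat app looks like. It assumes a self-hosted Dify instance at localhost exposing the chat-messages endpoint and an app API key taken from the app's API access page; double-check the endpoint and payload against the API reference shown in your Dify version.

import requests

DIFY_API_KEY = "app-..."                      # the app's API key (placeholder)
DIFY_API_URL = "http://localhost/v1/chat-messages"

response = requests.post(
    DIFY_API_URL,
    headers={
        "Authorization": f"Bearer {DIFY_API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "inputs": {},
        "query": "What is multi-head attention?",
        "response_mode": "blocking",          # wait for the full answer instead of streaming
        "user": "demo-user",                  # an identifier for the end user
    },
)
print(response.json()["answer"])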
And that's it! Now you can ask the LLM anything related to the "Attention is All You Need" paper or any documents included in our knowledge base.
Conclusion
In this article, we’ve seen how we can build a RAG app easily with Dify and Milvus. By using both Dify and Milvus, implementing a RAG system becomes significantly more accessible, even for those without deep technical expertise. Dify streamlines the orchestration of various RAG components, while Milvus provides efficient vector storage and retrieval capabilities.
With just a few clicks, your RAG-powered chatbot can now deliver contextualized answers based on the internal documents stored in your knowledge base. This, in turn, improves the reliability of LLM-generated responses.