Vasilis Drosatos

Building a Retrieval-Augmented Generation (RAG) System with Langchain, LangGraph, Tavily and LangSmith in TypeScript

In this blog post, we'll guide you through creating a Retrieval-Augmented Generation (RAG) system using TypeScript, leveraging Langchain, LangGraph, LangSmith and Tavily. RAG systems combine a retrieval step with a generative model, allowing you to fetch relevant documents and generate responses grounded in them.

Prerequisites
You should be somewhat familiar with the following concepts and tools to follow along with this tutorial:
Embeddings: Representations of text in a vector space.
Retrieval-based models: Models that retrieve relevant documents based on user queries.
Langchain: A library for building language model applications.
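
To make the first two concepts concrete, here is a minimal sketch of similarity-based retrieval in plain TypeScript. The three-dimensional vectors are made up for illustration (a real embedding model returns vectors with far more dimensions), and cosineSimilarity and mostSimilar are hypothetical helpers, not part of Langchain.

```typescript
// Toy example: rank documents by cosine similarity to a query vector.
// The vectors below are invented for illustration, not real embeddings.
type EmbeddedDoc = { text: string; vector: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the document whose vector is closest to the query vector.
function mostSimilar(
  query: number[],
  docs: EmbeddedDoc[]
): EmbeddedDoc | undefined {
  let best: EmbeddedDoc | undefined;
  let bestScore = -Infinity;
  for (const doc of docs) {
    const score = cosineSimilarity(query, doc.vector);
    if (score > bestScore) {
      best = doc;
      bestScore = score;
    }
  }
  return best;
}

const toyDocs: EmbeddedDoc[] = [
  { text: 'JavaScript is used for web development.', vector: [0.9, 0.1, 0.0] },
  { text: 'RAG combines retrieval and generation.', vector: [0.1, 0.8, 0.3] },
];

// A query vector pointing in roughly the same direction as the second document
console.log(mostSimilar([0.2, 0.7, 0.4], toyDocs)?.text);
```

A vector store does essentially this at scale: it embeds the query, compares it with the stored document vectors, and returns the closest matches.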

Before we start coding, make sure you have Node.js installed. Then, install the following npm packages:

Note: We're using pnpm for this tutorial, but feel free to use your preferred package manager.

pnpm add langchain @langchain/core @langchain/langgraph @langchain/openai @langchain/community

Since we will be writing our RAG in TypeScript, we'll use the tsx package, which is a quick and easy way to run TypeScript files in Node.js.
We will also need to store our OpenAI API key in an environment variable. A convenient way to manage local environment variables is the dotenv package, so let's install it as well.

pnpm add -D tsx dotenv @types/node

Next, create a file named graph.ts in the root of your project with the following content:

console.log('Hello langgraph');

Let's also create a .env file (do not forget to add it to your .gitignore) and put your OpenAI API key in it:

OPENAI_API_KEY="your-key"

Add a start script to your package.json and set the type to "module":

"type": "module",  
"scripts": {
    "start": "tsx -r dotenv/config graph.ts"
  }

Now, let's run it in the terminal:

pnpm start

You should see the following output:

> tsx -r dotenv/config graph.ts

Hello langgraph

With that, we're ready to start building our RAG!

Overview of What We'll Build

In this tutorial, we'll develop a RAG (Retrieval-Augmented Generation) system that consists of the following components:

  1. Vector Store: This will hold embeddings of sample documents. We'll utilize the MemoryVectorStore from Langchain for simplicity. However, in a production setting, it's recommended to use a more robust vector store provider, such as Chroma or Pinecone.

  2. Retrieval Node: This node retrieves relevant documents from our vector store based on user queries.

  3. Web Search Node: This node fetches relevant documents from the web in response to user queries.

  4. Generation Node: This node takes the documents retrieved and generates a response.

  5. LangSmith Integration: This tool will be integrated to aid in debugging and monitoring our RAG system during both development and production phases.

You can find the complete code for this tutorial in the GitHub repo.

Step 1: Setting up the Vector Store and the Embeddings
Embeddings are crucial in our RAG system because the retriever relies on them to judge which documents are relevant to a question. For this tutorial, we'll use the OpenAI embeddings with the text-embedding-3-small model. However, you should explore and select the embeddings best suited to your needs (e.g., multilingual support, vector size, etc.).

Add the following code to your graph.ts file:

import { OpenAIEmbeddings } from '@langchain/openai';
import { MemoryVectorStore } from 'langchain/vectorstores/memory';
import type { DocumentInterface } from '@langchain/core/documents';

// Sample documents for the vector store
const documents: DocumentInterface[] = [
  {
    pageContent:
      'JavaScript is a versatile programming language primarily used for web development.',
    metadata: {
      id: '1',
    },
  },
  {
    pageContent:
      'Langchain is a powerful library for building language model applications.',
    metadata: {
      id: '2',
    },
  },
  {
    pageContent:
      'Retrieval-Augmented Generation combines retrieval-based and generative models.',
    metadata: {
      id: '3',
    },
  },
  {
    pageContent:
      'Langsmith is a tool that aids in the development and debugging of language model applications.',
    metadata: {
      id: '4',
    },
  },
];

// Create the embeddings model used to embed the documents and the queries
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});

// Create a new vector store
const vectorStore = new MemoryVectorStore(embeddings);

// Add the documents to the vector store (this is where they get embedded)
await vectorStore.addDocuments(documents);

In the code above, we define some sample documents, create an in-memory vector store backed by the embeddings model, and add the documents to it, which converts them into embeddings.

Step 2: Setting Up LangGraph for Node-Based Processing
LangGraph allows us to structure the retrieval and generation process into different nodes. Also, it gives us the flexibility to define the graph state.

Let's continue in the graph.ts file by writing the state of our graph.

import { Annotation } from '@langchain/langgraph';

// Represents the state of our graph.
const GraphState = Annotation.Root({
  documents: Annotation<DocumentInterface[]>({
    reducer: (x, y) => (y ? y.concat(x ?? []) : []),
  }),
  question: Annotation<string>({
    reducer: (x, y) => y ?? x ?? '',
  }),
  generation: Annotation<string>({
    reducer: (x, y) => y ?? x,
  }),
});

In the code above we defined the state of our graph consisting of:

  1. documents: The retrieved documents from the nodes below
  2. question: The user's query
  3. generation: The response from the generation node below
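
To see how these reducers behave, the sketch below applies them by hand to a few node updates. It is a plain TypeScript simulation that assumes a reducer receives the current value first and the update second; applyUpdate and the Doc and State types are hypothetical helpers for illustration and do not reflect LangGraph's internals.

```typescript
// Hand-rolled simulation of the three reducers from GraphState.
type Doc = { pageContent: string };
type State = { documents: Doc[]; question: string; generation?: string };

const reducers = {
  // New documents are placed in front of any existing ones
  documents: (x: Doc[] | undefined, y: Doc[] | undefined): Doc[] =>
    y ? y.concat(x ?? []) : [],
  // The latest question wins; fall back to the current one, then to ''
  question: (x: string | undefined, y: string | undefined): string =>
    y ?? x ?? '',
  // The latest generation wins; otherwise keep the current one
  generation: (
    x: string | undefined,
    y: string | undefined
  ): string | undefined => y ?? x,
};

// Merge a partial update returned by a node into the current state.
// Only keys present in the update are passed through their reducer.
function applyUpdate(state: State, update: Partial<State>): State {
  const next: State = { ...state };
  if ('documents' in update) {
    next.documents = reducers.documents(state.documents, update.documents);
  }
  if ('question' in update) {
    next.question = reducers.question(state.question, update.question);
  }
  if ('generation' in update) {
    next.generation = reducers.generation(state.generation, update.generation);
  }
  return next;
}

let state: State = { documents: [], question: 'What is Langchain?' };
state = applyUpdate(state, { documents: [{ pageContent: 'from the vector store' }] });
state = applyUpdate(state, { documents: [{ pageContent: 'from the web' }] });
state = applyUpdate(state, { generation: 'An answer' });
console.log(state);
```

Note that the documents reducer accumulates: a second update adds to the list instead of replacing it, while question and generation simply keep the most recent value.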

Retrieval Node
Let's continue by defining the retrieval node in our graph.ts file. This node will retrieve documents from our MemoryVectorStore defined earlier.

import type { RunnableConfig } from '@langchain/core/runnables';
import { ScoreThresholdRetriever } from 'langchain/retrievers/score_threshold';

/**
 * Retrieve documents
 *
 * @param {typeof GraphState.State} state The current state of the graph.
 * @param {RunnableConfig | undefined} config The configuration object for tracing.
 * @returns {Promise<Partial<typeof GraphState.State>>} The new state object.
 */
async function retrieve(
  state: typeof GraphState.State,
  config?: RunnableConfig
): Promise<Partial<typeof GraphState.State>> {
  console.log('---RETRIEVE---');

  const retriever = ScoreThresholdRetriever.fromVectorStore(vectorStore, {
    minSimilarityScore: 0.3, // Finds results with at least this similarity score
    maxK: 1, // Maximum number of results to return
    kIncrement: 1, // Increment the number of results by this amount
  });

  const relatedDocuments = await retriever
    // Optional: Set the run name for tracing - useful for debugging
    .withConfig({ runName: 'FetchRelevantDocuments' })
    .invoke(state.question, config);

  return {
    documents: relatedDocuments,
  };
}

In the code above, we created a ScoreThresholdRetriever from the vector store defined earlier. You can also use any other retriever that Langchain provides, or even create a custom one.
Then, we invoke the retriever, passing the user's question from the state.

Note: We will explain the config parameter later.
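
The effect of minSimilarityScore and maxK can be sketched without any library code. The function below is a simplified stand-in, not the actual ScoreThresholdRetriever implementation: it ignores kIncrement and works on precomputed scores. The name filterByScore and the sample scores are invented for illustration.

```typescript
type ScoredDoc = { pageContent: string; score: number };

// Keep documents scoring at least minSimilarityScore, best first, at most maxK.
function filterByScore(
  scored: ScoredDoc[],
  options: { minSimilarityScore: number; maxK: number }
): ScoredDoc[] {
  return scored
    .filter((doc) => doc.score >= options.minSimilarityScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, options.maxK);
}

const scoredDocs: ScoredDoc[] = [
  { pageContent: 'JavaScript is a versatile programming language.', score: 0.12 },
  { pageContent: 'Langchain is a powerful library.', score: 0.71 },
  { pageContent: 'Langsmith aids in debugging.', score: 0.35 },
];

// Same settings as the retrieve node: only the best match above 0.3 survives
const relevant = filterByScore(scoredDocs, { minSimilarityScore: 0.3, maxK: 1 });
console.log(relevant);

// When every score is below the threshold, the result is an empty array
const unrelated = filterByScore(
  [{ pageContent: 'Langchain is a powerful library.', score: 0.05 }],
  { minSimilarityScore: 0.3, maxK: 1 }
);
console.log(unrelated);
```

An empty result matters for this tutorial: the graph uses it to decide whether the vector store could answer the question at all.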

Web Search Node
In this node, we will connect our LLM to the web using Tavily. You can register and get a free API key (1,000 API calls per month). Store it in the .env file as TAVILY_API_KEY.

Let's add the node to the graph.ts file:

import { TavilySearchAPIRetriever } from '@langchain/community/retrievers/tavily_search_api';

/**
 * Web search based on the question using Tavily API.
 *
 * @param {typeof GraphState.State} state The current state of the graph.
 * @param {RunnableConfig | undefined} config The configuration object for tracing.
 * @returns {Promise<Partial<typeof GraphState.State>>} The new state object.
 */
async function webSearch(
  state: typeof GraphState.State,
  config?: RunnableConfig
): Promise<Partial<typeof GraphState.State>> {
  console.log('---WEB SEARCH---');

  const retriever = new TavilySearchAPIRetriever({
    apiKey: process.env.TAVILY_API_KEY,
    k: 1,
  });

  const webDocuments = await retriever
    // Optional: Set the run name for tracing - useful for debugging
    .withConfig({ runName: 'FetchRelevantDocuments' })
    .invoke(state.question, config);

  return {
    documents: webDocuments,
  };
}

We created a retriever using TavilySearchAPIRetriever, imported from the @langchain/community package, and invoked it with the user's question from the state.

Generation Node
Finally, we define the node to generate the final answer based on the retrieved documents.

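Let's add the code below to our graph.ts file:

import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { pull } from 'langchain/hub';
import { formatDocumentsAsString } from 'langchain/util/document';

/**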
 * Generate answer
 *
 * @param {typeof GraphState.State} state The current state of the graph.
 * @param {RunnableConfig | undefined} config The configuration object for tracing.
 * @returns {Promise<Partial<typeof GraphState.State>>} The new state object.
 */
async function generate(
  state: typeof GraphState.State,
  config?: RunnableConfig
): Promise<Partial<typeof GraphState.State>> {
  console.log('---GENERATE---');

  // Define the LLM
  const model = new ChatOpenAI({
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini',
    temperature: 0,
  });

  // Pull the RAG prompt from the hub - https://smith.langchain.com/hub/rlm/rag-prompt
  const prompt = await pull<ChatPromptTemplate>('rlm/rag-prompt');
  // Construct the RAG chain by piping the prompt, model, and output parser
  const ragChain = prompt.pipe(model).pipe(new StringOutputParser());

  const generation = await ragChain
    // Optional: Set the run name for tracing - useful for debugging
    .withConfig({ runName: 'GenerateAnswer' })
    .invoke(
      {
        context: formatDocumentsAsString(state.documents),
        question: state.question,
      },
      config
    );

  return {
    generation,
  };
}

In the code above, we define a model using ChatOpenAI from the @langchain/openai package. We pull a RAG prompt template from the hub and create a new runnable sequence containing our prompt, the model, and the StringOutputParser, which turns the model's output into the final answer string. Finally, we invoke the chain, passing the context (the documents retrieved before) and the user's question.
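
To illustrate what the chain receives, the sketch below builds the context string and fills a prompt by hand. It assumes that formatDocumentsAsString joins each document's pageContent with a blank line. The template text is a made-up stand-in, not the actual rlm/rag-prompt from the hub, and joinPageContents and fillTemplate are hypothetical helpers.

```typescript
type ContextDoc = { pageContent: string };

// Assumed behavior of formatDocumentsAsString: page contents separated by blank lines
function joinPageContents(docs: ContextDoc[]): string {
  return docs.map((doc) => doc.pageContent).join('\n\n');
}

// Hypothetical template with the same two input variables the chain is invoked with
const template =
  'Answer the question using only the context.\n' +
  'Question: {question}\n' +
  'Context: {context}\n' +
  'Answer:';

// Replace each {name} placeholder; unknown names are left untouched
function fillTemplate(tpl: string, values: Record<string, string>): string {
  return tpl.replace(/\{(\w+)\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match
  );
}

const retrieved: ContextDoc[] = [
  { pageContent: 'Langchain is a powerful library for building language model applications.' },
];

const promptText = fillTemplate(template, {
  context: joinPageContents(retrieved),
  question: 'What is Langchain?',
});
console.log(promptText);
```

The model only sees this filled-in text, which is why the quality of the retrieved documents directly limits the quality of the answer.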

Define the Graph nodes
Let's add our graph nodes to the graph.ts file:

import { StateGraph } from '@langchain/langgraph';

// Define the workflow and add the nodes
const workflow = new StateGraph(GraphState)
  .addNode('retrieve', retrieve)
  .addNode('webSearch', webSearch)
  .addNode('generate', generate);

Step 3: Define the edges and build the Graph
In the previous steps, we defined the different nodes that our graph will contain. Next, we need to define our graph edges.

Let's continue in our graph.ts file:

import { START, END } from '@langchain/langgraph';

// Define the edges
workflow.addEdge(START, 'retrieve');
// - If no documents are retrieved, go to web search
// - If documents are retrieved, go to generate
workflow.addConditionalEdges(
  'retrieve',
  (state: typeof GraphState.State) =>
    state.documents.length === 0 ? 'webSearch' : 'generate',
  {
    webSearch: 'webSearch',
    generate: 'generate',
  }
);
workflow.addEdge('webSearch', 'generate');
workflow.addEdge('generate', END);

Here we define a simple graph as illustrated below:

(Figure: visualization of the graph)

We start by retrieving the documents from our vector store based on the user's question. Then we define a conditional edge based on the documents retrieved. If no documents were retrieved, we go to the web search node; otherwise, we go to the generation node, skipping the web search. Finally, we generate the answer and finish the graph.
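
The routing decision can also be written as a small named function, which makes it easy to test in isolation. The sketch below is plain TypeScript; routeAfterRetrieve, RoutingState and NextNode are illustrative names, and the returned strings match the node names registered earlier.

```typescript
type RoutingState = { documents: { pageContent: string }[] };
type NextNode = 'webSearch' | 'generate';

// Decide which node runs after 'retrieve'
function routeAfterRetrieve(state: RoutingState): NextNode {
  return state.documents.length === 0 ? 'webSearch' : 'generate';
}

// No documents found in the vector store: fall back to the web
console.log(routeAfterRetrieve({ documents: [] }));

// At least one document found: answer from it directly
console.log(
  routeAfterRetrieve({ documents: [{ pageContent: 'Langchain is a library.' }] })
);
```

A function like this could be passed as the second argument of addConditionalEdges in place of the inline arrow function.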

Step 4: Integrating LangSmith for Debugging
LangSmith is a tool for debugging and monitoring the execution of your Langchain/Langgraph workflows. Let’s integrate it to help debug the chain.

You need an account to use LangSmith. If you don't have one, you can sign up here. They offer a free developer plan with 5000 traces per month, which should be more than enough for development purposes. Then you need to create an API key and store it in the .env file.

# This will enable tracing for LangSmith by default when using Langchain/Langgraph
LANGCHAIN_TRACING_V2=true
# The API key created on LangSmith
LANGCHAIN_API_KEY="your-key"
# The project name on LangSmith to store the traces
LANGCHAIN_PROJECT="langgraph-rag-demo"

# Optional: Enable background callbacks for tracing - Use it when you are not using a serverless environment
LANGCHAIN_CALLBACKS_BACKGROUND=true

Step 5: Compile and Run the graph

Let's add the code below to our graph.ts file:

import fs from 'node:fs';

// Compile the workflow
const app = workflow.compile();

// Visualize the graph
const graphPng = await app.getGraph().drawMermaidPng();
const buffer = Buffer.from(await graphPng.arrayBuffer());
// Save the graph to a file
fs.writeFileSync('graph.png', buffer);

// Invoke the graph
const question = process.argv[2] ?? 'What is Langchain?'; // Get the question from the command line
const output = await app.invoke({ question }); // Invoke the graph with the question
console.log(output); // Print the final state of the graph

Next, run the graph in the terminal with a question that our vector store can answer:

pnpm start "What is Langchain?"

You should see the following output (or similar):

---RETRIEVE---
---GENERATE---
{
  documents: [
    Document {
      pageContent: 'Langchain is a powerful library for building language model applications.',
      metadata: [Object],
      id: undefined
    }
  ],
  question: 'What is Langchain?',
  generation: 'Langchain is a powerful library designed for creating applications that utilize language models. It provides tools and frameworks to facilitate the development of these applications.'
}

As you can see, the graph successfully retrieved the relevant document from the vector store and generated a response based on the user's question.

You can also check the trace of the execution in the LangSmith dashboard.

Langgraph trace 1

In the screenshot above, you can see all the nodes executed, the time taken by each one, and the final output. The config argument that we passed to each invoke call lets the nested runs attach to the graph's trace, and the runName values we set (for example, GenerateAnswer) appear as the names of those runs.
Lastly, you can see the cost and the number of tokens used by the OpenAI model.

Let's try a question that our vector store cannot answer:

pnpm start "What is the capital of Greece?"

You should see the following output (or similar):

---RETRIEVE---
---WEB SEARCH---
---GENERATE---
{
  documents: [
    Document {
      pageContent: 'Recognizing the importance of the past in maintaining national identity, the government focused on efforts to restore and preserve monuments and temples like the Parthenon as well as ancient locales like the agora. Today, Athens is the capital of Greece and among the most often visited and highly regarded cultural centers in the world.',
      metadata: [Object],
      id: undefined
    }
  ],
  question: 'What is the capital of Greece?',
  generation: 'The capital of Greece is Athens. It is a highly regarded cultural center and is known for its historical significance.'
}

The graph successfully fetched the relevant document from the web search node and generated a response based on the user's question.

Check the trace of the execution in the LangSmith dashboard.

Langgraph trace 2

Troubleshooting

If you encounter any issues while running the script, ensure that you have set up the environment variables correctly and that the API keys are valid. Also, check the console logs for any error messages that may help identify the problem. Lastly, check the versions of the libraries you're using to ensure compatibility.
If you're still facing issues, feel free to write a comment below or create an issue in the GitHub repo, and we'll be happy to help you out.
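
One quick way to rule out configuration problems is to check for missing variables before building the graph. The helper below is a hypothetical sketch, not part of any of the libraries used here; it takes the environment as a parameter so it can be tried without touching process.env.

```typescript
// Return the names of required variables that are unset or blank.
function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}

// Variable names used in this tutorial
const requiredVars = ['OPENAI_API_KEY', 'TAVILY_API_KEY', 'LANGCHAIN_API_KEY'];

// Example with a hand-written environment object instead of process.env
const exampleEnv: Record<string, string | undefined> = {
  OPENAI_API_KEY: 'sk-example',
  TAVILY_API_KEY: '',
};

const missing = findMissingEnvVars(exampleEnv, requiredVars);
console.log(missing);
```

In graph.ts, you could call findMissingEnvVars(process.env, requiredVars) near the top of the file and throw an error listing the missing names if the returned array is not empty.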

Conclusion
In this post, we demonstrated how to use Langchain, LangGraph, Tavily, and LangSmith to build a Retrieval-Augmented Generation (RAG) system in TypeScript. By combining these tools, you can create workflows that use both retrieval and generative models, with added debugging and monitoring capabilities.

Feel free to experiment with different nodes and workflows to explore the full potential of these libraries. You can also integrate other tools and services to enhance your RAG system further. We hope this tutorial has been helpful in getting you started with building your own RAG system.


If you found this tutorial helpful or have any questions, drop a comment below or connect with me on Dev.to.

Additional Resources
Github Repo
Langchain Documentation
LangGraph Documentation
LangSmith Documentation

Happy building! 🚀
