Anush for OpenSauced

Moving from Typescript and Langchain to Rust and Loops

Embarking on your open-source journey can feel like exploring uncharted territory. Imagine being a newcomer, eager to contribute but struggling to navigate the codebase. That's where Repo-Query comes in. Let's explore the journey of Repo-Query and how it facilitates easier contributions to open-source projects.

The Evolution of Repo-Query

Repo-Query is a REST service that indexes public repositories and answers user queries about them, all within your browser through the OpenSaucedAI browser extension.

Repo-Query in Action

Adding a Chat Window to the browser extension #226

From Inception to Prototype

The prototype of Repo-Query was quickly put together in about a day or two. Using modern web development tools and technologies like Typescript and Langchain, the initial version of the service was created. The sandbox for this initial experiment was the repository gh-answering-proto, and the results of the semantic search closely matched what a human would find when asked to locate a relevant code snippet.

> example@1.0.0 start
> tsx ask.ts
What is your query?
How is the AI PR description being generated?
response {
text: 'The AI PR description is being generated by leveraging the getAiDescription function, which takes in a pull request URL and uses it to retrieve the pull request API URL. It then retrieves the description configuration and checks if it is empty. Finally, it uses the `getDescriptionContext` function to get the diff and commit messages based on the configuration source and generates the description using AI. The generated description is then displayed using the `setContent` function.'
}
What is your query?
How are the project releases made?
response {
text: 'The project releases are made using their configuration for semantic-release, which allows them to automatically generate changelogs and releases for their projects based on the commit messages. The beta branch is the default branch, and they squash & merge PRs to the beta branch. They never commit directly to `main`. A merge to `beta` will trigger a beta release, while a merge to `main` will trigger a full release.'
}
What is your query?
How is the project preventing duplicate button injections? 
response {
text: 'The project is preventing duplicate button injections by checking if the button already exists on the page before injecting it. This is done inside the injection script, where it checks if the button DOM element already exists using `document.getElementById("ai-description-button")`. If the element already exists, the injection script returns without injecting the button again.'
}
What is your query?
How is authentication being done in the project?
response {
text: 'Authentication in the project is being done through the `checkAuthentication` function imported from the `../utils/checkAuthentication` module. This function takes in several parameters such as `hasOptedLogOut`, `getCookie`, `checkTokenValidity`, `setAccessTokenInChromeStorage`, `removeAuthTokenFromStorage`, and `logError`. These parameters are used to check if the user has opted out, get the authentication cookie, check the validity of the token, set the access token in Chrome storage, remove the auth token from storage, and log any errors that may occur. The checkAuthentication function is triggered by the chrome.cookies.onChanged event listener.'
}

Unveiling Challenges

As we continued hacking, the first significant hurdle was performance. Testing on the much larger insights.opensauced.pizza repository showed that generating embeddings, a critical step, took an excruciatingly long time. That was not acceptable for users looking for quick answers, and it made the need for a more efficient solution clear.

The Langchain Q&A retrieval system, while powerful, posed a unique challenge. It operated in a one-shot manner, lacking the capability to receive feedback from answers and further explore the knowledge base. This limitation resulted in incomplete answers.

The poor performance stemmed from several factors:

  • We found that using OpenAI embeddings for codebases was inefficient and impractical. It took over 15 minutes to generate embeddings for the insights.opensauced.pizza repository.

  • During the prototype's development, the Langchain GitHub loader would send one request per file, resulting in long download times. For the insights.opensauced.pizza repository, it took about 2 minutes. However, this issue was later resolved in hwchase17/langchainjs#2224 by enabling parallel requests for faster retrieval.

  • The process of chunking the codebase using Langchain's recursive splitting strategy required optimization.

Performance Woes
No one would wait for such a long time to get an answer. I'd rather watch a YesTheory video and finally get off my desk.

Embracing Rust

In the pursuit of a more efficient solution, the ONNX runtime stood out for its performance. The decision to shift from Typescript to Rust was unorthodox, yet crucial. By combining Rust's parallel processing through Rayon with ONNX integration through the ort crate, Repo-Query became far more efficient. The outcome? A transition from slow processing to - dare I say it - blazing-fast performance.

The Dual Acts:

Let's dissect Repo-Query's two key acts:

Act 1: /embed

The /embed endpoint downloads a GitHub repository and generates embeddings for it. Instead of requesting files one by one, as Langchain's GitHub document loader did, Repo-Query fetches the whole repository as a zip archive from GitHub's /archive endpoint. This condenses the download into a single request per repository. As a result, the download time for the repository at https://github.com/open-sauced/app was reduced to about 5 seconds on my 50 Mbps connection.

src/github/mod.rs

pub async fn fetch_repo_files(repository: &Repository) -> Result<Vec<File>> {
    let Repository {
        owner,
        name,
        branch,
    } = repository;

    let url = format!("https://github.com/{owner}/{name}/archive/{branch}.zip");
    let response = reqwest::get(url).await?.bytes().await?;
    ...
}

The embedding step is much faster thanks to the combination of ONNX Runtime, used through the ort crate, and Rayon. The insights.opensauced.pizza repository can now be embedded in about 30 seconds, a significant improvement from the previous 15+ minutes, and truly blazing-fast.

src/github/mod.rs

use rayon::prelude::*; // brings `into_par_iter` into scope

pub async fn embed_repo<M: EmbeddingsModel + Send + Sync>(
    repository: &Repository,
    files: Vec<File>,
    model: &M,
) -> Result<RepositoryEmbeddings> {
    // Embed the files in parallel across Rayon's thread pool.
    let file_embeddings: Vec<FileEmbeddings> = files
        .into_par_iter()
        .filter_map(|file| {
            let embed_content = file.to_string();
            // Note: `unwrap` panics if the model fails to embed a file.
            let embeddings = model.embed(&embed_content).unwrap();
            Some(FileEmbeddings {
                path: file.path,
                embeddings,
            })
        })
        .collect();
    ...
}
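
To illustrate the data-parallel pattern without any external crates, here is a minimal sketch that swaps Rayon for scoped threads from the Rust standard library. The names `toy_embed` and `embed_parallel` are hypothetical, and `toy_embed` is only a placeholder for real ONNX inference; it is not the model Repo-Query uses.

```rust
use std::thread;

/// Hypothetical stand-in for an embedding model: folds the bytes of `text`
/// into a tiny 4-dimensional vector. A real model would run inference here.
fn toy_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 4];
    for (i, b) in text.bytes().enumerate() {
        v[i % 4] += b as f32;
    }
    v
}

/// Embed every (path, content) pair, splitting the work across up to
/// `workers` threads. Output order matches input order.
fn embed_parallel(files: &[(String, String)], workers: usize) -> Vec<(String, Vec<f32>)> {
    if files.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so every file lands in exactly one chunk.
    let chunk_size = (files.len() + workers - 1) / workers;

    thread::scope(|scope| {
        // One scoped thread per chunk; scoped threads may borrow `files`.
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|(path, content)| (path.clone(), toy_embed(content)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();

        // Join in spawn order so results stay aligned with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("embedding worker panicked"))
            .collect()
    })
}
```

Rayon's `into_par_iter` does this more conveniently, with work stealing to balance uneven file sizes; the fixed chunks here are only meant to make the idea visible.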

Act 2: /query

At the centre of the /query endpoint, there is a feedback loop. Unlike conventional loops that follow a predetermined path, this loop is a journey into the unknown. It responds and evolves based on context and interactions, as determined by GPT-3.5.

OpenAI's function calling entered the stage opportunely. Within the heart of the loop, some semantic search functions are exposed. GPT-3.5 leverages these functions to gather pertinent information from the codebase, aligning with the user's query and intent.

The magic happens when these functions interweave. GPT-3.5 dynamically chooses which function to invoke based on the conversation. As GPT-3.5 traverses through the functions, the loop adapts, collecting more information, refining its understanding, and ultimately crafting responses that are not just relevant, but insightful.
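
The post does not show how the semantic search functions rank their results. One common approach, assumed here purely for illustration, is to compare the query embedding with each file embedding by cosine similarity and return the best matches. The names `cosine_similarity` and `top_k` are hypothetical and not taken from Repo-Query.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Return the paths of the `k` files whose embeddings are most similar
/// to `query`, best match first.
fn top_k<'a>(query: &[f32], files: &'a [(String, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = files
        .iter()
        .map(|(path, emb)| (cosine_similarity(query, emb), path.as_str()))
        .collect();
    // Sort by descending score; `total_cmp` gives a total order on floats.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, path)| path).collect()
}
```

A function exposed to the model could embed the search text, call something like `top_k`, and hand the matching file paths or contents back as the function result.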

src/conversation/mod.rs

pub async fn generate(&mut self) -> Result<()> {
    'conversation: loop {
        // Build the next completion request from the conversation so far.
        let request = generate_completion_request(self.messages.clone(), "auto");
        ...
        // Dispatch on the function the model asked to call.
        match parsed_function_call.name {
            Function::SearchCodebase => ...
            Function::SearchFile => ...
            Function::SearchPath => ...
            Function::Done => ...
        }
    }
    ...
}
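
The shape of this loop can be sketched with the model mocked out. Everything below is an assumption made for illustration: the string payloads on the `Function` variants, the `run_conversation` helper, the `max_steps` cap, and the closures standing in for GPT-3.5 and for the search functions are not part of Repo-Query's actual code.

```rust
/// The actions the model may request. Variant names mirror the excerpt;
/// the string payloads are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Function {
    SearchCodebase(String),
    SearchFile(String),
    SearchPath(String),
    Done,
}

/// Drive the feedback loop: ask `model` for the next call given the
/// transcript so far, run `search` for it, append the result, and stop
/// when the model answers `Done` or after `max_steps` calls.
fn run_conversation<M, S>(
    question: &str,
    mut model: M,
    mut search: S,
    max_steps: usize,
) -> Vec<String>
where
    M: FnMut(&[String]) -> Function,
    S: FnMut(&Function) -> String,
{
    let mut messages = vec![format!("user: {question}")];
    for _ in 0..max_steps {
        match model(&messages) {
            Function::Done => break,
            call => {
                // Feed the search result back so the next decision can use it.
                let result = search(&call);
                messages.push(format!("function {call:?}: {result}"));
            }
        }
    }
    messages
}
```

The key difference from a one-shot retrieval chain is that each search result is appended to the transcript before the model is asked again, so later calls can build on earlier findings. The step cap is a defensive choice in this sketch to keep a misbehaving model from looping forever.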

The Epilogue: Rust and GPT-3.5 – An Unconventional Symbiosis

Transitioning to Rust for crafting an AI application may not align with convention, sure. But hey, who said innovation was all about following the recipe?

You can try out the service at https://opensauced.ai/.
For the source code, visit Repo-Query's GitHub repository.

Stay Saucy🍕

Top comments (7)

AbhishekBose

Great blog. Wanted to understand what is the ONNX runtime being used here for if embeddings are being created using OpenAI apis?

Anush

To generate the embeddings. OpenAI is used for completion and function calling.

Fadhli

So essentially, the embeddings are generated locally via ONNX inference (which is using OpenAI model) right?

Anush

Yes. Specifically using github.com/Anush008/fastembed-rs.

Sean Lee

Just to clarify, you didn't use openai embedding model right? Instead used fastembed-rs locally and used OpenAI model for generating answers. Correct?
