Josh Mo

Local Embeddings with Fastembed, Rig & Rust

Introduction

Hello world! Today we're going to be talking about embeddings. Used in recommendation systems, semantic search and Retrieval Augmented Generation (RAG) pipelines, embeddings are at the core of retrieving relevant contextual information for a given query. That context can then be plugged into an LLM to generate a response - or alternatively, the embeddings themselves can be stored in a database for later retrieval.

In this article, we're going to get into why you should use local embeddings rather than embeddings from managed models, and then walk through the technical implementation: how to create and use your own local embeddings.

Interested in checking out the code for this example? Have a look at the example on our GitHub repo.

Why local embeddings?

Embeddings, while an important part of the RAG ecosystem, can be relatively expensive to create, especially as you start generating and storing embeddings for larger and larger numbers of documents. Some simple use cases where this can happen might be:

  • If you're running a directory that aggregates a list of businesses
  • If you have a large amount of products that you sell (a supermarket or large grocery store being a good example of this)
  • If you're storing user-submitted uploads

For all the convenience that managed models provide, there are also data concerns. If you're working on a commercial project, you may not want to hand your data over to OpenAI - especially in industries where data privacy is absolutely paramount. In this case, a local model gives you exactly what you need: text embeddings, generated in the knowledge that your data stays entirely on your own infrastructure.

Building your project

Without further ado, let's get to building! Before we start, make sure you have the Rust programming language installed.

Getting started

To get started, we'll spin up a new project:

cargo init local-embeddings-rig
cd local-embeddings-rig

Next, we'll need to add our dependencies:

cargo add rig-core serde rig-fastembed tokio anyhow -F rig-core/derive,serde/derive,tokio/macros,tokio/rt-multi-thread

Note that this one-line command adds the following dependencies:

  • rig-core: The rig crate which houses all the traits (interfaces) we need to create the embeddings. We've added the derive feature to get the Embed derive macro, which will make it much easier to embed whatever we need to.
  • rig-fastembed: The fastembed-rs integration for rig.
  • serde: A de/serialization library. We've added the derive feature for the Serialize and Deserialize macros, both of which are extremely helpful for serializing and deserializing simple structs without hassle.
  • tokio: An async runtime. We've added the macros and rt-multi-thread features so we can use the #[tokio::main] attribute on our main function.
  • anyhow: Flexible error handling, which we'll use for the return type of our main function.

Let's get embedding with Rig

Next, we'll go into main.rs and define a struct which will serve as the payload we want to associate with our embeddings.

// EmbeddingsBuilder and the vector store imports are used further down in the example.
use rig::embeddings::EmbeddingsBuilder;
use rig::vector_store::{in_memory_store::InMemoryVectorStore, VectorStoreIndex};
use rig::Embed;
use rig_fastembed::FastembedModel;
use serde::{Deserialize, Serialize};

// Shape of data that needs to be RAG'ed.
// The `definitions` field will be used to generate embeddings.
#[derive(Embed, Clone, Deserialize, Debug, Serialize, Eq, PartialEq, Default)]
struct WordDefinition {
    id: String,
    word: String,
    #[embed]
    definitions: Vec<String>,
}

Once done, we'll define our main function (which will use the Tokio runtime macro to allow for easy async) as well as a new rig_fastembed::Client, which we'll be using to create our embedding model.

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Create Fastembed client
    let fastembed_client = rig_fastembed::Client::new();

    let embedding_model = fastembed_client.embedding_model(&FastembedModel::AllMiniLML6V2Q);

    // .. some more code goes here

    Ok(())
}

The rig-core library exposes an EmbeddingsBuilder struct which we can use to collect our documents and generate embeddings for them. Note that the field we annotated previously with #[embed] will be used as the text source for embedding; if you want to change this, you can create custom behavior by implementing rig::Embed manually.
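To give you an idea of what that looks like, here's a minimal sketch of a manual implementation, assuming a hypothetical WordDefinitionRaw struct that keeps its definitions in a single comma-separated String. It's based on the Embed trait as exposed by rig-core (the exact module path and signature may vary slightly between versions):

use rig::embeddings::embed::{Embed, EmbedError, TextEmbedder};

// Hypothetical variant of our struct with comma-separated definitions.
struct WordDefinitionRaw {
    id: String,
    word: String,
    definitions: String,
}

impl Embed for WordDefinitionRaw {
    fn embed(&self, embedder: &mut TextEmbedder) -> Result<(), EmbedError> {
        // Only the `definitions` field is used as the text source:
        // each comma-separated definition gets its own embedding.
        self.definitions
            .split(',')
            .for_each(|definition| embedder.embed(definition.trim().to_string()));

        Ok(())
    }
}

This is roughly what the #[embed] attribute derives for us on the definitions: Vec<String> field, so for simple structs the derive macro is usually all you need.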

let documents = vec![
    WordDefinition {
        id: "doc0".to_string(),
        word: "flurbo".to_string(),
        definitions: vec![
            "A green alien that lives on cold planets.".to_string(),
            "A fictional digital currency that originated in the animated series Rick and Morty.".to_string()
        ]
    },
    WordDefinition {
        id: "doc1".to_string(),
        word: "glarb-glarb".to_string(),
        definitions: vec![
            "An ancient tool used by the ancestors of the inhabitants of planet Jiro to farm the land.".to_string(),
            "A fictional creature found in the distant, swampy marshlands of the planet Glibbo in the Andromeda galaxy.".to_string()
        ]
    },
    WordDefinition {
        id: "doc2".to_string(),
        word: "linglingdong".to_string(),
        definitions: vec![
            "A term used by inhabitants of the sombrero galaxy to describe humans.".to_string(),
            "A rare, mystical instrument crafted by the ancient monks of the Nebulon Mountain Ranges on the planet Quarm.".to_string()
        ]
    },
];

let embeddings = EmbeddingsBuilder::new(embedding_model.clone())
    .documents(documents)?
    .build()
    .await?;


And that's it! The embeddings variable now holds (document, embeddings) pairs - one entry per WordDefinition, with one embedding per embedded definition. You can use this however you want. For example:

for (document, doc_embeddings) in &embeddings {
    println!("{document:?}");
    println!("{doc_embeddings:?}");
}

Using embeddings with vector stores in Rig

Next, we create our vector store and add the embeddings to it. Note that each entry in embeddings is a (document, embeddings) tuple. The from_documents_with_id_f function takes these tuples, as well as a closure (an anonymous function) that produces the document ID. We also have other integrations for vector stores if you'd like to use production-ready alternatives such as Qdrant, LanceDB and MongoDB. Check them out on our GitHub repo.

// Create vector store with the embeddings
let vector_store = InMemoryVectorStore::from_documents_with_id_f(embeddings, |doc| doc.id.clone());

// Create vector store index
let index = vector_store.index(embedding_model);

Finally, to query our vector store we use top_n to grab the results, annotating the call with the type we want the documents deserialized into.

Note that top_n returns (score, id, document) tuples of type (f64, String, T) (where T here is the WordDefinition struct); we then map each entry down to just the word, so results ends up as a Vec of (score, id, word) tuples. Because the compiler can infer the element type, we've left it out of the collect annotation (Vec<_>).

let results = index
    .top_n::<WordDefinition>("I need to buy something in a fictional universe. What type of money can I use for this?", 1)
    .await?
    .into_iter()
    .map(|(score, id, doc)| (score, id, doc.word))
    .collect::<Vec<_>>();

    println!("Results: {:?}", results);

let id_results = index
    .top_n_ids("I need to buy something in a fictional universe. What type of money can I use for this?", 1)
    .await?
    .into_iter()
    .collect::<Vec<_>>();

println!("ID results: {:?}", id_results);

So what's next?

Thanks for reading! If you have any thoughts, feel free to let us know in the comments.

For additional Rig resources and community engagement, check out our GitHub repo.
