DEV Community

sagaruprety

Posted on • Edited on • Originally published at Medium

Music Video Search with Qdrant

Introduction

In a world inundated with content, discovering new music videos can feel like stumbling through a maze of endless options. As music enthusiasts, we often find ourselves craving a more intuitive and personalized way to explore the vast sea of melodies.

In this blog post, we’ll discuss how to build a music video search and discovery application in Python, powered by the Qdrant vector store. Qdrant provides vector indexing and similarity search, which makes it a good fit for retrieving and exploring large datasets quickly and accurately.

For this blog, we will be using the search and discovery functionalities of Qdrant, so let’s first go through the relatively new Discovery API. All the code in this blog can also be accessed at: https://github.com/sagaruprety/youtube_video_search

Qdrant Discovery API

In the Discovery API, Qdrant introduces the concept of “context,” utilized for partitioning the space. Context comprises positive-negative pairs of data points, each pair delineating the space into positive and negative zones. During its usage, the search prioritizes points based on their presence in positive zones or avoidance of negative ones.

Context can be supplied either as the IDs of stored data points or as their embeddings. In both cases, it must be provided as positive-negative pairs.

The Discovery API facilitates two novel search types:

  1. Discovery Search: Utilizes the context (positive-negative vector pairs) and a target to retrieve points most akin to the target while adhering to the context’s constraints.

A figure showing scattered data points grouped into positive and negative contexts, along with a target data point.

Source: https://qdrant.tech/documentation/concepts/explore/#discovery-search

  2. Context Search: Solely employs the context pairs to fetch points residing in the optimal zone, where loss is minimized. No need for specifying a target embedding. This can be used for recommendation when one has obtained a few data points about a user’s likes and dislikes.

A figure containing scattered data points which are grouped into positive and negative contexts
Source: https://qdrant.tech/documentation/concepts/explore/#context-search
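
To make the two search types more concrete, here is a minimal pure-Python sketch of how a candidate point could be scored, based on our reading of the scoring formulas in the Qdrant documentation linked above. The function names and the toy 2-D vectors are our own illustrative assumptions; Qdrant computes these scores internally on its index, so this is not its actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def context_score(point, pairs):
    """Context search: sum of min(s(v+) - s(v-), 0) over all pairs.

    The best possible value is 0.0, reached when the point is at least
    as similar to the positive as to the negative in every pair.
    """
    return sum(
        min(cosine(point, pos) - cosine(point, neg), 0.0)
        for pos, neg in pairs
    )


def discovery_score(point, target, pairs):
    """Discovery search: pair ranks dominate, target similarity breaks ties."""
    rank = sum(
        1 if cosine(point, pos) >= cosine(point, neg) else -1
        for pos, neg in pairs
    )
    sigmoid = 1.0 / (1.0 + math.exp(-cosine(point, target)))
    return rank + sigmoid


# Toy 2-D example (made-up vectors, not real embeddings)
pairs = [([1.0, 0.0], [0.0, 1.0])]  # one (positive, negative) pair
near_positive = [0.9, 0.1]
near_negative = [0.1, 0.9]

print(context_score(near_positive, pairs))      # 0.0: inside the positive zone
print(context_score(near_negative, pairs) < 0)  # True: penalized
```

Because the sigmoid term always lies between 0 and 1, a point that satisfies one more context pair always outranks a point that is merely closer to the target, which is why the context acts as a constraint rather than a soft preference.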

Installations

You first need to start a local Qdrant server. The easiest way to do this is via Docker. Ensure you have Docker installed on your system. Then, open a terminal and run the following commands:

docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant

You can then go to your browser at http://localhost:6333/dashboard and see the Qdrant dashboard.

We also need to install the Qdrant Python client, the Sentence-Transformers library (which provides the vector embedding models), and Pandas for data preprocessing:

pip install qdrant-client pandas sentence-transformers

Dataset

We use an openly available YouTube videos dataset from Kaggle. This dataset consists of the most trending YouTube videos and was scraped using YouTube’s API. It is essentially a CSV file with a video URL and some metadata about the videos. Specifically, there are 4 fields: Title, Videourl, Category, and Description.

Create Collection

We first download the above-mentioned dataset, load it using the Pandas library, and pre-process it. We filter out all videos which do not belong to the category — ‘Art&Music’.

import pandas as pd

# Load the CSV file into a Pandas DataFrame
csv_file = './data/Youtube_Video_Dataset.csv'
df = pd.read_csv(csv_file)

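# filter out all other categories; .copy() gives an independent DataFrame,
# so the column assignments below do not trigger a SettingWithCopyWarning
only_music = df[df['Category'] == 'Art&Music'].copy()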

# convert all values into string type
only_music['Title'] = only_music['Title'].astype(str)
only_music['Description'] = only_music['Description'].astype(str)
only_music['Category'] = only_music['Category'].astype(str)
only_music.head()
Title Videourl Category Description
9446 FINE ART Music and Painting PEACEFUL SELECTION... /watch?v=13E5azGDK1k Art&Music CALM MELODIES AND BEAUTIFUL PICTURES\nDebussy,...
9447 Improvised Piano Music and Emotional Art Thera... /watch?v=5mWjq2BsD9Q Art&Music When watching this special episode of The Perf...
9448 babyfirst art and music /watch?v=rrJbuF6zOIk Art&Music nan
9449 Art: music & painting - Van Gogh on Caggiano, ... /watch?v=1b8xiXKd9Kk Art&Music ♫ Buy “Art: Music & Painting - Van Gogh on on ...
9450 The Great Masterpieces of Art & Music /watch?v=tsKlRF2Gw1s Art&Music Skip the art museum and come experience “Great...

Next, we create a Qdrant collection. We instantiate a Qdrant client and connect it to the local Qdrant server running on port 6333. The recreate_collection function takes a collection_name argument, which is the name you want to give your collection, and a vectors_config argument, where we define the size of the vector embeddings (our embedding model produces 384-dimensional vectors) and the similarity metric (here, cosine similarity). Note that recreate_collection deletes any existing collection with the same name before creating a new one. One can also use the create_collection function, but it throws an error if a collection with that name already exists.

from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient("localhost", port=6333)

client.recreate_collection(
   collection_name="youtube_music_videos",
   vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
)
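
If you would rather keep existing data when re-running the notebook, one option is to create the collection only when it is missing. The helper below is a hypothetical sketch (ensure_collection is our own name, not part of the client), and it assumes a qdrant-client release recent enough to provide collection_exists. It takes the client and the vector configuration as arguments, so it works with the objects defined above.

```python
def ensure_collection(client, collection_name, vectors_config):
    """Create the collection only if it is missing.

    `client` is expected to be a QdrantClient and `vectors_config` a
    models.VectorParams, as in the snippet above. Returns True when a
    new collection was created, False when it already existed.
    """
    if client.collection_exists(collection_name):
        return False
    client.create_collection(
        collection_name=collection_name,
        vectors_config=vectors_config,
    )
    return True
```

With the objects from this post, it could be called as ensure_collection(client, "youtube_music_videos", models.VectorParams(size=384, distance=models.Distance.COSINE)).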

We also initialize the embedding model. Here we use the Sentence-Transformers library and the all-MiniLM-L6-v2 model, a lightweight embedding model that is adequate for general-purpose English text.

# Initialize SentenceTransformer model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")

We also need to convert the Pandas dataframe to a list of record dictionaries, one per video, to insert into the Qdrant collection.

# convert pandas dataframe to a list of record dictionaries for inserting into Qdrant collection
music_videos_dict = only_music.to_dict(orient='records')
music_videos_dict
[{'Title': 'FINE ART Music and Painting PEACEFUL SELECTION (Calm Melodies and Beautiful Pictures)',
 'Videourl': '/watch?v=13E5azGDK1k',
 'Category': 'Art&Music',
 'Description': 'CALM MELODIES AND BEAUTIFUL PICTURES\nDebussy, Milena Stanisic,\nPiano, Flute, Harp,\nFlowers, Sailing, Mediterranean, Lavender,',
 {'Title': 'Improvised Piano Music and Emotional Art Therapy - Featuring Erica Orth',
 'Videourl': '/watch?v=5mWjq2BsD9Q',
 'Category': 'Art&Music',
 'Description': 'When watching this special episode of The Perfect Note, keep in mind, every single note heard and stroke of paint seen in this video is completely improvised…'},
 {'Title': 'babyfirst art and music',
 'Videourl': '/watch?v=rrJbuF6zOIk',
 'Category': 'Art&Music',
 'Description': 'nan',…]

Finally, we insert the records into the collection, converting the text in the Title column to embeddings as we go:

# upload the records to the Qdrant collection, creating a vector embedding from the Title column
# (newer qdrant-client releases deprecate upload_records/Record in favor of upload_points/PointStruct)
for idx, doc in enumerate(music_videos_dict):
    client.upload_records(
        collection_name="youtube_music_videos",
        records=[
            models.Record(
                id=idx, vector=model.encode(doc["Title"]).tolist(), payload=doc
            )
        ],
    )
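
The loop above encodes and uploads one record per call, which can be slow on larger datasets. As a hedged sketch of an alternative, the hypothetical helper below (iter_batches is our own name) groups the records into batches and encodes each batch of titles in one go. It takes the encoder as a plain callable, so it does not depend on a particular model.

```python
def iter_batches(docs, encode, batch_size=64):
    """Yield (ids, vectors, payloads) for consecutive slices of `docs`.

    `encode` is any callable that maps a list of strings to a list of
    vectors, e.g. lambda texts: model.encode(texts).tolist().
    """
    for start in range(0, len(docs), batch_size):
        chunk = docs[start:start + batch_size]
        # ids continue the running index, matching enumerate() in the loop above
        ids = list(range(start, start + len(chunk)))
        vectors = encode([doc["Title"] for doc in chunk])
        yield ids, vectors, chunk
```

Each (ids, vectors, payloads) triple could then be sent in a single request, for example with client.upsert and models.Batch(ids=ids, vectors=vectors, payloads=payloads). We have not benchmarked this on the dataset used here, so treat the batch size as a starting point.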

Now that we have the data in our collection, let’s do some semantic search on it.

# perform semantic search for a given query in the collection
def search_video(query: str) -> list[dict]:
    collection_name = "youtube_music_videos"
    # Convert text query into vector
    vector = model.encode(query).tolist()

    # Use `vector` to search for the closest vectors in the collection
    search_results = client.search(
        collection_name=collection_name,
        query_vector=vector,
        query_filter=None,  # If you don't want any other filters
        limit=10,  # get 10 most similar results
    )
    # `search_results` contains found vector ids with similarity scores along with the stored payload
    results = []
    for hit in search_results:
        item = {}
        item['score'] = hit.score
        item['Title'] = hit.payload['Title']
        url = hit.payload['Videourl']
        item['URL'] = f'youtube.com{url}'
        results.append(item)
    return results

We have the search function ready. All we need is a query:

# query the collection
query = 'dua lipa'
search_video(query)
[{'score': 0.8309551,
  'Title': 'Dua Lipa - New Rules (Official Music Video)',
  'URL': 'youtube.com/watch?v=k2qgadSvNyU'},
 {'score': 0.8116781,
  'Title': 'Dua Lipa - IDGAF (Official Music Video)',
  'URL': 'youtube.com/watch?v=Mgfe5tIwOj0'},
 {'score': 0.80936086,
  'Title': 'Dua Lipa - Be The One (Official Music Video)',
  'URL': 'youtube.com/watch?v=-rey3m8SWQI'},
 {'score': 0.55487275,
  'Title': 'Sean Paul - No Lie ft. Dua Lipa (Krajnc Remix) (Baywatch Official Music Video)',
  'URL': 'youtube.com/watch?v=hMiHGkzr3ZQ'},
 {'score': 0.49306965,
  'Title': 'Lana Del Rey - Music To Watch Boys To (Official Music Video)',
  'URL': 'youtube.com/watch?v=5kYsxoWfjCg'},
 {'score': 0.48478898,
  'Title': 'Smash Mouth - All Star (Official Music Video)',
  'URL': 'youtube.com/watch?v=L_jWHffIx5E'},
 {'score': 0.47906196,
  'Title': 'Iggy Azalea - Fancy ft. Charli XCX (Official Music Video)',
  'URL': 'youtube.com/watch?v=O-zpOMYRi0w'},
 {'score': 0.47792414,
  'Title': 'ZAYN - PILLOWTALK (Official Music Video)',
  'URL': 'youtube.com/watch?v=C_3d6GntKbk'},
 {'score': 0.46913695,
  'Title': 'ZAYN - Dusk Till Dawn ft. Sia (Official Music Video)',
  'URL': 'youtube.com/watch?v=tt2k8PGm-TI'},
 {'score': 0.46150804,
  'Title': 'Sia - Chandelier (Official Music Video)',
  'URL': 'youtube.com/watch?v=2vjPBrBU-TM'}]

We see that the search ranks the Dua Lipa videos at the top. Since our search limit is 10, other videos are returned as well, but their scores are much lower: the first three results score above 0.8 and the Sean Paul track featuring Dua Lipa scores about 0.55, while everything else falls below 0.5. For the cosine similarity metric that we have used, the higher the score the better.

We can set the score_threshold parameter to 0.5 in the Qdrant search function to filter out results below that score. We then get only the top 4 results, even though the maximum limit is set to 10.
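
As a minimal sketch of that change, the variant below passes score_threshold through to client.search. The function name search_above_threshold is our own, and the client and model are passed in as arguments so the function does not rely on globals; otherwise it mirrors search_video.

```python
def search_above_threshold(client, model, query, score_threshold=0.5, limit=10):
    """Semantic search that drops hits scoring below `score_threshold`."""
    hits = client.search(
        collection_name="youtube_music_videos",
        query_vector=model.encode(query).tolist(),
        limit=limit,
        score_threshold=score_threshold,  # filtering happens on the server
    )
    return [
        {
            "score": hit.score,
            "Title": hit.payload["Title"],
            "URL": f"youtube.com{hit.payload['Videourl']}",
        }
        for hit in hits
    ]
```

Calling search_above_threshold(client, model, 'dua lipa') against the collection built above should return only the hits whose score is at least 0.5.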

Video Discovery

Let’s now use Qdrant’s Discovery API to discover some music videos without explicitly searching with a query. Assume that your music search website has captured a user’s preferences, either by asking them directly or from their search history.

As mentioned above, the discovery API takes in a context of positive and negative data points and searches for new points that are far away from the negative points and close to the positive points.

Let’s assume a given user likes classical and instrumental music and dislikes heavy metal or rock music. As we don’t have an explicit target query, we use the context search functionality of Qdrant.

# specify likes and dislikes as positive and negative queries
negative_1 = 'heavy metal'
positive_1 = 'piano music'

negative_2 = 'rock music'
positive_2 = 'classical music'

# only used when a target query is available (not passed to the context search below)
target_embedding = model.encode(query).tolist()

# calculate embeddings for the first pair of positive and negative points
positive_embedding_1 = model.encode(positive_1).tolist()
negative_embedding_1 = model.encode(negative_1).tolist()

# calculate embeddings for the second pair of positive and negative points
positive_embedding_2 = model.encode(positive_2).tolist()
negative_embedding_2 = model.encode(negative_2).tolist()

# create the context example pairs
context = [
    models.ContextExamplePair(positive=positive_embedding_1, negative=negative_embedding_1),
    models.ContextExamplePair(positive=positive_embedding_2, negative=negative_embedding_2),
]

# call the discover api
discover = client.discover(
    collection_name="youtube_music_videos",
    context=context,
    limit=5,
)

# organize the results from the discover api
results = []
for hit in discover:
   item = {}
   item['Title'] = hit.payload['Title']
   url = hit.payload['Videourl']
   item['URL'] = f'youtube.com{url}'
   results.append(item)

display(results)
[{'Title': 'The computer as artist: AI art and music',
  'URL': 'youtube.com/watch?v=ZDcaDv0U8yw'},
 {'Title': 'Arts For Healing: Music and Art Therapy',
  'URL': 'youtube.com/watch?v=6By9oTQIQxQ'},
 {'Title': 'Elephants, Art and Music on the River Kwai',
  'URL': 'youtube.com/watch?v=r1uDNRzcAV0'},
 {'Title': "Art: music & painting - Van Gogh on Caggiano, Floridia, Boito, Mahler and Brahms' music",
  'URL': 'youtube.com/watch?v=1b8xiXKd9Kk'},
 {'Title': 'The Artist Who Paints What She Hears',
  'URL': 'youtube.com/watch?v=zbh7tAnwLCY'}]

We see the results are not perfect, but we still get some results related to music for relaxing or healing purposes. The imperfect matches may be because the dataset simply does not contain many videos of the kind we asked for. Also, the vectors were generated from the video titles alone, and the titles do not carry much information about the videos. We could use the Description column to create the embeddings instead, but the descriptions also contain many irrelevant details, which could further distort the vector space. Nevertheless, the context search still surfaces the available videos closest to our interests and far from the negative examples. We thus see the power of Qdrant’s Discovery API to find data points of interest in a large multidimensional space of YouTube music videos.
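
The call above uses context only. When a target query is also available (for example, the target_embedding computed earlier), the same discover method can be given a target, which turns the context search into a discovery search. The wrapper below is a hypothetical sketch (discover_videos is our own name) that covers both cases and takes the client as an argument.

```python
def discover_videos(client, context, target=None, limit=5):
    """Run context search (target=None) or discovery search (target given)."""
    hits = client.discover(
        collection_name="youtube_music_videos",
        target=target,  # None falls back to context-only search
        context=context,
        limit=limit,
    )
    return [
        {
            "Title": hit.payload["Title"],
            "URL": f"youtube.com{hit.payload['Videourl']}",
        }
        for hit in hits
    ]
```

For example, discover_videos(client, context, target=target_embedding) would look for videos similar to the target query while still respecting the likes and dislikes encoded in the context pairs. We have not run this variant on the dataset, so no results are shown for it.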

Conclusion

In this blog, we saw how we can leverage Qdrant to build a music search and discovery system. Qdrant abstracts a lot of the hard stuff and makes it easy to implement such a system with a few lines of code. For this specific example, we can improve the search further by using a better embedding model, or by embedding the videos themselves.

Qdrant also offers robust support for filters, which restrict searches based on certain metadata, but utilizing discovery search allows for additional constraints within the vector space where the search is conducted.
