DEV Community

Khumbo Klein Chilamwa

Posted on • Edited on

How to Build a YouTube Video Search Voice Assistant

Voice assistants are software programs that can understand and respond to voice commands. With the release of Amazon’s Echo and Google Home, voice assistants are becoming more popular than ever. Searching for a video on YouTube can be a hassle, especially if you don’t know what you’re looking for. But what if you could have a voice assistant that could search through YouTube videos for you? That would be pretty cool, right? In this article, I will show you how to build a YouTube video search voice assistant using SpeechRecognition and the pyttsx3 Python packages. So if you’re ready to get started, let’s dive right in!

Here is the Table of Contents:

  • Creating a Virtual Environment
  • Installing the Required Packages
  • Building the YouTube Video Search Voice Assistant
    • Creating the speak() Function
    • Creating the voice_assistant() Function
    • Creating the search() Function
  • Testing the YouTube Video Search Voice Assistant
  • Conclusion

Creating a Virtual Environment

First things first, let us create a virtual environment for the project. In your terminal, run this command:
$ python -m venv project
You can call your virtual environment any name, but just make sure the name is meaningful, not cryptic.

Installing the Required Packages

Now that the virtual environment is taken care of, it is time to install the required packages. First, activate the virtual environment; on Windows, run this command:
$ .\project\Scripts\activate

And on Linux or macOS use:
$ source project/bin/activate

With the virtual environment activated, we are ready to install the packages. Run this command:
$ pip install pyttsx3 SpeechRecognition pyaudio

Installing packages inside a virtual environment means that they are not global but local to the project they are installed for, which makes managing each project's packages much easier.
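If you are ever unsure whether the environment is active, a small check like the one below can help. It is a sketch that relies on the fact that, inside an environment created with the venv module, sys.prefix points at the environment while sys.base_prefix still points at the base Python installation; the function name in_virtualenv() is just a name chosen for this example.

```python
import sys


def in_virtualenv() -> bool:
    # venv sets sys.prefix to the environment's directory, while
    # sys.base_prefix keeps pointing at the base Python installation
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
```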

Building the YouTube Video Search Voice Assistant

In this section, we will build the actual YouTube Video Search Voice Assistant from the ground up, so roll up your sleeves, because we are about to start coding.

The first task is to create a new Python file; let us call it video_search_assistant.py. As usual, be careful when naming your Python files: make sure the names are meaningful and that they do not clash with the packages used in your project (for example, a file named speech_recognition.py would shadow the real package). Open the file and add the following imports:

# standard-library module for controlling the web browser
import webbrowser
# this package is for speech recognition
import speech_recognition
# this package converts text to speech
import pyttsx3
# standard-library module, used here to get the current user's login name
import os

Let us break down the above code. Our first import is webbrowser, a module from Python's standard library that controls the web browser; we will use it to open YouTube. The speech_recognition package handles speech recognition, and it works hand in hand with the pyaudio package, which it uses to capture audio from the microphone.

The next import is pyttsx3, which converts a given text to speech, and the final one is the os module, which we will use to get the login name of the current user.
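Before writing any more code, it can save some debugging time to confirm that these modules can actually be imported. The sketch below uses importlib.util.find_spec() from the standard library; the REQUIRED_MODULES list and the missing_modules() helper are illustrative names, not part of any of the packages above. Note that the import names differ from the pip names: SpeechRecognition is imported as speech_recognition and PyAudio as pyaudio.

```python
import importlib.util

# import names (not pip names) of the modules this project relies on
REQUIRED_MODULES = ["speech_recognition", "pyttsx3", "pyaudio", "webbrowser", "os"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```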

With the imports taken care of, our next task is to create the speak() function, which turns text into audio. Just below the imports, paste the following code:

# function to turn textual data into audio data
def speak(command):
    # initializing the pyttsx3 engine
    engine = pyttsx3.init()
    # gets the current speaking rate (words per minute)
    rate = engine.getProperty('rate')
    # setting the speaking rate to 125 words per minute
    engine.setProperty('rate', 125)
    # getting the available voices
    voices = engine.getProperty('voices')
    # setting the second voice (usually a female voice on Windows),
    # but only if more than one voice is installed
    if len(voices) > 1:
        engine.setProperty('voice', voices[1].id)
    # this function queues the text to be spoken
    engine.say(command)
    # this function processes the queued commands and waits until they are spoken
    engine.runAndWait()

In the code above, we create a function speak() that takes command as an argument. Inside this function, we initialize the pyttsx3 engine and use it to get and set the speaking rate. We also get the list of available voices and select one; in our case, we use the second voice, which on Windows is typically a female voice.

After this, we pass the command to say(), which queues the text to be spoken, and runAndWait() processes the queue and blocks until everything has been spoken.
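One caveat with voices[1] is that it raises an IndexError on a machine that has only one voice installed. A hypothetical helper like pick_voice_id() below, which falls back to the first voice, is one way to guard against that. It is shown with stand-in objects instead of real pyttsx3 voices (which expose an id attribute) so that it runs without a speaker.

```python
from types import SimpleNamespace


def pick_voice_id(voices, preferred_index=1):
    """Return the id of voices[preferred_index], falling back to the first
    voice if that index does not exist, or None if there are no voices."""
    if not voices:
        return None
    if 0 <= preferred_index < len(voices):
        return voices[preferred_index].id
    return voices[0].id


# stand-ins for pyttsx3 voice objects, which expose an `id` attribute
fake_voices = [SimpleNamespace(id="voice-a"), SimpleNamespace(id="voice-b")]
print(pick_voice_id(fake_voices))      # voice-b
print(pick_voice_id(fake_voices[:1]))  # voice-a

# inside speak(), this could replace the voice selection lines:
#     voice_id = pick_voice_id(engine.getProperty('voices'))
#     if voice_id is not None:
#         engine.setProperty('voice', voice_id)
```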

Our next task is to create the voice_assistant() function. In the file, just below the speak() function, paste the following code:

# creating a Recognizer object
recognizer = speech_recognition.Recognizer()
# prints the message to let the user know to start speaking
print('Say something, am listening')
# the function for recognizing speech and processing it accordingly
def voice_assistant():
    # the try statement will execute the speech recognizing code
    try:
        # opening the default microphone as the audio source; the with block
        # closes the microphone stream again when it is done
        with speech_recognition.Microphone() as mic:
            # this listens to the ambient noise for the specified duration
            recognizer.adjust_for_ambient_noise(mic, duration=0.5)
            # capturing speech from the default microphone
            audio = recognizer.listen(mic)
            # recognizing speech using google speech recognition API 
            text = recognizer.recognize_google(audio)
            # this converts the text to lowercase text
            text = str(text.lower())
            # calling the speak function, it takes text as an argument
            speak(f'Be patient while I search for {text} videos')
            # this will print the captured word
            print(f'Searching {text}')

    # this except statement catches an error when the assistant fails to recognize the said word   
    except speech_recognition.UnknownValueError:
        # calling the speak function, it takes text as an argument
        speak(f'Sorry i did not hear you!!!')
        print(f'Sorry i did not hear you!!!')

    # this except statement catches the error raised when the speech recognition service cannot be reached
    except speech_recognition.RequestError:
        # calling the speak function, it takes text as an argument
        speak(f'Make sure you have a stable internet connection!!!')
        print(f'Make sure you have a stable internet connection!!!')

# this is a function call 
voice_assistant()

Before we move any further, let us break this code down so that we are on the same page. We create a Recognizer object via speech_recognition.Recognizer(), and below it we print a message telling the user to start speaking. Then we define a function called voice_assistant() that contains a try/except block: the first except clause catches UnknownValueError, which is raised when the speech cannot be understood, and the second catches RequestError, which is raised when the request to the speech recognition service fails, for example because of an unstable or missing internet connection.

Notice that inside both except clauses we call the speak() function with a message and also print the same message with print(). Doing both at once seems repetitive, but it is for testing purposes only.
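If you want to keep both the spoken and the printed message without repeating yourself, one option is a small helper that does both. The announce() function below is a hypothetical name used for illustration; it takes the speaking function as a parameter, so here it is exercised with a list's append method in place of the real speak().

```python
def announce(message, speak_fn, print_fn=print):
    """Say `message` out loud with speak_fn and echo it with print_fn."""
    speak_fn(message)
    print_fn(message)


# a stand-in for speak() so the example runs without a speaker
spoken = []
announce('Sorry i did not hear you!!!', speak_fn=spoken.append)

# in the article's code, each except clause could then shrink to:
#     announce('Sorry i did not hear you!!!', speak)
```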

Inside the try block, we open the microphone as the source of input, and this line:
recognizer.adjust_for_ambient_noise(mic, duration=0.5)
makes the recognizer listen to the surrounding noise for 0.5 seconds and adjust its energy threshold accordingly, so that background sound is less likely to be mistaken for speech. According to the SpeechRecognition documentation, this duration should be at least 0.5 seconds to get a representative sample of the ambient noise.
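To get a feel for what an energy threshold is, the sketch below computes the root-mean-square (RMS) energy of a few chunks of ambient audio and scales the average up. This is a simplified illustration of the idea, not SpeechRecognition's actual algorithm (the library adjusts the recognizer's energy_threshold attribute dynamically); the sample values, the 1.5 multiplier and the function names are assumptions made for the example.

```python
import math
from array import array


def rms(samples):
    """Root-mean-square energy of a sequence of 16-bit PCM samples."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_threshold(ambient_chunks, multiplier=1.5):
    """Average the RMS energy of the ambient chunks and scale it up, so only
    sounds clearly louder than the background count as speech."""
    if not ambient_chunks:
        raise ValueError('need at least one chunk of ambient audio')
    energies = [rms(chunk) for chunk in ambient_chunks]
    return multiplier * sum(energies) / len(energies)


# two made-up chunks of quiet background noise (signed 16-bit samples)
ambient = [array('h', [100, -100, 100, -100]), array('h', [200, -200, 200, -200])]
threshold = estimate_threshold(ambient)
print(threshold)  # 225.0

speech_chunk = array('h', [3000, -3000, 3000, -3000])
print(rms(speech_chunk) > threshold)  # True
```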

After adjusting for the ambient noise, we let the recognizer capture audio from the microphone using this code:
audio = recognizer.listen(mic)

This audio data is then sent to Google's speech recognition service and converted to text using this line of code:
text = recognizer.recognize_google(audio)

Finally, we call the speak() function with a message that includes the recognized text, and we print the text as well.

We will now test the program. Make sure your computer's microphone is working and that your surroundings are quiet: in a quiet environment the voice assistant can recognize speech without problems, but in a noisy one it will have difficulty distinguishing your speech from the background noise.

In your terminal run this command:
$ python video_search_assistant.py

Say a word when you are prompted to. You will hear the assistant's voice, and the terminal will show output like this (here the spoken word was "programming"):

Say something, am listening
Searching programming

It seems the program can capture the speech and process it accordingly.

We will now move on to creating the search() function, which will let the voice assistant search for YouTube videos. Paste this code just below the imports, above the speak() function:

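# used to encode the query so spaces and symbols are safe in a URL
from urllib.parse import quote_plus

# function for searching youtube videos
def search(query):
    # creating the search url, encoding the query (e.g. spaces become +)
    url = f'https://www.youtube.com/results?search_query={quote_plus(query)}'
    # opening the url in the default web browser
    webbrowser.get().open(url)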

Here we create a function called search() that takes query as an argument and builds a YouTube video search URL from it. Finally, webbrowser.get().open(url) opens that URL in the default web browser. Remember, we will not type the query ourselves; it will come from the microphone.
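Keeping the URL construction separate from opening the browser makes it easy to check what URL a given query produces without launching anything. The build_search_url() function below is a hypothetical refactoring, not part of the original code; it uses urllib.parse.urlencode(), which encodes spaces as + and symbols such as + as %2B, so that a query like "c++ tutorials" reaches YouTube intact.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://www.youtube.com/results'


def build_search_url(query):
    """Build a YouTube results URL with the query safely encoded."""
    return f'{SEARCH_ENDPOINT}?{urlencode({"search_query": query})}'


print(build_search_url('programming'))
# https://www.youtube.com/results?search_query=programming
print(build_search_url('c++ tutorials'))
# https://www.youtube.com/results?search_query=c%2B%2B+tutorials
```

With this helper in place, search() could simply call webbrowser.get().open(build_search_url(query)).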

Now, to perform a YouTube video search, we need to call the search() function. We will call it in the try block of the voice_assistant() function, since we want to search for the text captured from the microphone. Below this line of code:
text = str(text.lower())
Paste this line of code:
search(text)

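Finally, add the following code between the voice_assistant() function definition and the voice_assistant() call at the bottom of the file, so that the assistant greets you and asks its question before it starts listening: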

# this gets the login name of the current user
username = os.getlogin()
# calling the speak function to welcome the user by the username
speak(f'Welcome {username}')
# asking the user what to search for
speak('What do you want to search for?')
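Be aware that os.getlogin() raises an OSError when the process has no controlling terminal, which can happen in some IDEs, services, or containers; the Python documentation itself suggests getpass.getuser() for most purposes. A more forgiving sketch, assuming a fallback to getpass.getuser() is acceptable, could look like this; get_username() and its default value are illustrative names.

```python
import getpass
import os


def get_username(default='user'):
    """Return the current user's login name, falling back gracefully."""
    try:
        # reads the user logged in on the process's controlling terminal;
        # raises OSError when there is no such terminal
        return os.getlogin()
    except OSError:
        pass
    try:
        # checks environment variables such as LOGNAME, USER and USERNAME,
        # then the password database where available
        return getpass.getuser()
    except Exception:
        return default


print(get_username())
```

The welcome line would then become speak(f'Welcome {get_username()}').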

Testing the YouTube Video Search Voice Assistant

In this section, we will test the program by searching for programming videos. You can run it again using:
$ python video_search_assistant.py

If the program runs successfully, it will welcome you and ask what videos you want to search for. If it recognizes what you say, it will open the YouTube search results for your query in your web browser:

[Screenshot: YouTube search results opened in the web browser]

And in the terminal you will get this output:

Say something, am listening
Searching programming

If you run the program and say something the assistant cannot make out, it will give you this output:

Say something, am listening
Sorry i did not hear you!!!

And at the same time, the speak() function will echo the same message.

And if your internet connection is unstable or unavailable, the output will be as follows:

Say something, am listening
Make sure you have a stable internet connection!!!

This message will again be echoed by the speak() function.

Congratulations on creating your voice assistant! To make the code a little cleaner, you can remove the print statements; if you look closely, there are four of them.
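Rather than deleting the print statements, another option is to route them through the standard logging module so they can be switched on and off with a single setting. The sketch below assumes a logger named video_search_assistant and a configure_logging() helper, both hypothetical; each print() call would become a logger.info() call.

```python
import logging

# hypothetical logger name for this script
logger = logging.getLogger('video_search_assistant')


def configure_logging(verbose):
    """Show the assistant's status messages only when verbose is True."""
    # adds a console handler to the root logger (no-op if one already exists)
    logging.basicConfig(format='%(message)s')
    logger.setLevel(logging.INFO if verbose else logging.WARNING)


configure_logging(verbose=True)
# for example, print('Say something, am listening') would become:
logger.info('Say something, am listening')
```

Calling configure_logging(verbose=False) silences the status messages while still letting warnings and errors through.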

Conclusion

This article has shown you how to build a YouTube Video Search Voice Assistant. I hope you have learned a lot and that you can use this knowledge to create your own amazing projects. Feel free to take this project further and add as many cool features as you like. Thanks for reading.
