Josmel Noel

Automating IT Interviews with Ollama and Audio Capabilities in Python

In today’s tech-driven world, automation is revolutionizing recruitment. Imagine having a virtual IT interviewer that not only interacts intelligently but also communicates verbally with candidates. This post will guide you through building an IT interviewer using Ollama and Python, integrating audio capabilities for a more immersive experience.

📚 Introduction
Finding the right talent can be challenging and time-consuming. With advancements in AI and audio processing, it's possible to automate the initial interview phase. This project showcases how to create an interactive IT interviewer that asks questions and processes answers through voice, using Ollama and Google Cloud's Speech-to-Text and Text-to-Speech APIs.

🚀 What You Will Learn

  • How to set up Ollama for conversation handling.
  • How to integrate Google Cloud’s Speech-to-Text and Text-to-Speech APIs for audio capabilities.
  • How to structure a Python project to automate interviews.

🛠️ Prerequisites

  • Python 3.7+
  • Google Cloud Account: For Speech-to-Text and Text-to-Speech APIs.
  • Ollama: For conversational AI (a hosted account or a local installation).

📂 Project Setup
1. Clone the Repository
Start by cloning the project repository:

git clone https://github.com/josmel/ollama-it-interviewer.git
cd ollama-it-interviewer

2. Create and Activate a Virtual Environment
Set up a virtual environment to manage dependencies:

python -m venv venv
source venv/bin/activate

3. Install Dependencies
Install the required Python packages:

pip install -r requirements.txt
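The post never shows the contents of requirements.txt; judging from the imports used in the code later on, it should include at least these packages (version pins are up to you):

```
google-cloud-speech
google-cloud-texttospeech
pydub
python-dotenv
requests
```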

4. Configure Google Cloud
a. Enable the APIs
Enable the Speech-to-Text and Text-to-Speech APIs in your Google Cloud Console.

b. Create Service Account and Download JSON Key

  1. Go to IAM & Admin > Service accounts.
  2. Create a new service account, grant it the necessary roles, and download the JSON credentials file.

c. Set the Environment Variable
Set the environment variable to point to your credentials file:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"

Replace /path/to/your/service-account-file.json with the actual path to your credentials file.
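To double-check from Python that the variable is actually visible to your process, you can use a tiny helper like the one below (`credentials_path` is illustrative, not part of the repo):

```python
import os

def credentials_path(env=None):
    """Return the configured Google credentials path, or None if unset/empty."""
    env = os.environ if env is None else env
    return env.get("GOOGLE_APPLICATION_CREDENTIALS") or None
```

If this returns None, the Google Cloud clients below will fail when they are constructed.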

5. Prepare Audio Files
Add sample audio files in the audio_samples/ directory. You need a candidate-response.mp3 file to simulate a candidate's response. You can record your voice or use text-to-speech tools to generate this file.

6. Update Configuration
Edit src/config.py to configure your Ollama credentials:

OLLAMA_API_URL = 'https://api.ollama.com/v1/conversations'  # Or point this at your local Ollama endpoint
OLLAMA_MODEL = 'your-ollama-model'  # Replace with your Ollama model

7. Run the Project
Run the interviewer script:

# Option 1: Run as a module from the project root
python3 -m src.interviewer

or

# Option 2: Ensure PYTHONPATH is set and run directly
export PYTHONPATH=$(pwd)
python3 src/interviewer.py

📝 Detailed Explanation
interviewer.py
The main script orchestrates the interview process:

from pydub import AudioSegment
from pydub.playback import play
from src.ollama_api import ask_question
from src.speech_to_text import recognize_speech
from src.text_to_speech import synthesize_speech
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Ensure pydub can find ffmpeg/ffplay on macOS/Linux (adjust this path for your system)
os.environ["PATH"] += os.pathsep + '/usr/local/bin/'

def main():
    question = "Tell me about your experience with Python."
    synthesize_speech(question, "audio_samples/question.mp3")

    question_audio = AudioSegment.from_mp3("audio_samples/question.mp3")
    play(question_audio)

    candidate_response = recognize_speech("audio_samples/candidate-response.mp3")

    ollama_response = ask_question(candidate_response)
    print(f"Ollama Response: {ollama_response}")

    synthesize_speech(ollama_response, "audio_samples/response.mp3")

    response_audio = AudioSegment.from_mp3("audio_samples/response.mp3")
    play(response_audio)

if __name__ == "__main__":
    main()
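The script above handles a single question/answer exchange. A natural extension is looping over a list of questions; here is a minimal dependency-injected sketch (`run_interview` and its parameter names are illustrative, not part of the repo), which also makes the flow easy to unit-test with stubs:

```python
def run_interview(questions, ask_question, synthesize, recognize):
    """Run each question through speak -> listen -> evaluate, collecting a transcript."""
    transcript = []
    for question in questions:
        synthesize(question)              # speak the question aloud
        answer = recognize()              # capture the candidate's reply
        feedback = ask_question(answer)   # let the model evaluate or follow up
        transcript.append((question, answer, feedback))
    return transcript
```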

ollama_api.py
Handles interaction with the Ollama API:

import requests
from src.config import OLLAMA_API_URL, OLLAMA_MODEL

def ask_question(question):
    response = requests.post(
        OLLAMA_API_URL,
        json={"model": OLLAMA_MODEL, "input": question},
        timeout=60,  # avoid hanging indefinitely on an unreachable endpoint
    )
    response.raise_for_status()  # surface HTTP errors early
    return response.json()["output"]
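If you run Ollama locally instead of against a hosted URL, the local server exposes a different endpoint: by default http://localhost:11434/api/generate, which takes a prompt and returns a response field. A standard-library-only sketch, assuming a local Ollama with a model such as llama3 already pulled:

```python
import json
import urllib.request

def build_generate_payload(question, model="llama3"):
    """Build the JSON body for Ollama's local /api/generate endpoint."""
    return {"model": model, "prompt": question, "stream": False}

def ask_question_local(question, model="llama3", host="http://localhost:11434"):
    """Send a single prompt to a locally running Ollama server."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_generate_payload(question, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

Setting "stream": False asks the server for a single JSON object instead of a stream of partial responses, which keeps the parsing trivial.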

speech_to_text.py
Converts audio to text using Google Cloud:

from google.cloud import speech
import io

def recognize_speech(audio_file):
    client = speech.SpeechClient()

    with io.open(audio_file, "rb") as audio:
        content = audio.read()

    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.MP3,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)
    # Join the transcript of every recognized segment (not just the first)
    return " ".join(
        result.alternatives[0].transcript for result in response.results
    )

text_to_speech.py
Converts text to audio using Google Cloud:

from google.cloud import texttospeech
import os

def synthesize_speech(text, output_file):
    # Verify that the environment variable is set
    assert 'GOOGLE_APPLICATION_CREDENTIALS' in os.environ, "GOOGLE_APPLICATION_CREDENTIALS not set"

    client = texttospeech.TextToSpeechClient()

    synthesis_input = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    with open(output_file, "wb") as out:
        out.write(response.audio_content)
        print(f"Audio content written to file {output_file}")

🎉 Conclusion
By integrating Ollama and Google Cloud’s audio capabilities, you can create a virtual IT interviewer that enhances the recruitment process by automating initial candidate interactions. This project demonstrates the power of combining conversational AI with audio processing in Python.

Give it a try and share your thoughts in the comments! If you encounter any issues or have suggestions, feel free to ask.

📂 Project Structure

ollama-it-interviewer/
│
├── audio_samples/
│   ├── candidate-response.mp3
│
├── src/
│   ├── interviewer.py
│   ├── ollama_api.py
│   ├── speech_to_text.py
│   ├── text_to_speech.py
│   └── config.py
│
├── requirements.txt
├── README.md
└── .gitignore

🛠️ Resources

  • Ollama
  • Google Cloud Speech-to-Text
  • Google Cloud Text-to-Speech
  • Python pydub

💬 Questions or Comments?
Feel free to leave any questions or comments below. I’m here to help!

Repository: https://github.com/josmel/ollama-it-interviewer

Top comments (2)

Red Ochsenbein (he/him) • Edited

Wow. If a company does this I'm out immediately. Why should I waste my time with AI crap if the company does not value my time? It's a little project to try those technologies, but please don't actually use it.

Josmel Noel

Thank you for sharing your perspective! I completely understand your concern. This project is designed more as an exploration of the technical capabilities of Ollama AI and real-time audio integration, rather than as a replacement for the human interview process.