Building a conversational AI voice agent has become remarkably accessible thanks to OpenAI’s APIs and the speech capabilities built into modern browsers. In this article, we’ll create a fully functional conversational AI voice agent using Next.js 15. By the end, you’ll have a basic voice-enabled agent that listens to users, generates responses with OpenAI’s Chat Completions API, and speaks them back.
Let’s dive in step by step.
Prerequisites
- Basic Knowledge of JavaScript/React: You should be comfortable with basic coding concepts.
- Node.js Installed: Next.js 15 requires Node.js v18.18 or higher.
- OpenAI API Key: Create an account and obtain an API key from OpenAI.
- Microphone and Speaker: Required for testing voice input and output.
Step 1: Setting Up a New Next.js 15 Project
Start by creating a new Next.js project. When create-next-app asks whether to use the App Router, answer No: the code below uses the Pages Router (the pages/ directory).
npx create-next-app@latest conversational-ai-agent
cd conversational-ai-agent
Install necessary dependencies:
npm install openai react-speech-recognition react-speech-kit
- openai: For integrating OpenAI’s APIs.
- react-speech-recognition: For handling voice input via the browser’s Web Speech API.
- react-speech-kit: For text-to-speech functionality.
Step 2: Configure the OpenAI API in Next.js
Create a file called .env.local in the root directory and add your OpenAI API key:
OPENAI_API_KEY=your-openai-api-key
Now, create a utility function for interacting with OpenAI’s API.
utils/openai.js
import OpenAI from "openai";

// The current openai SDK (v4+) replaces the old Configuration/OpenAIApi classes.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export const getChatResponse = async (prompt) => {
  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{ role: "user", content: prompt }],
  });
  return response.choices[0].message.content;
};
This function sends a user’s query to OpenAI and retrieves the AI’s response. Note that process.env.OPENAI_API_KEY is only available in server-side code, so for anything beyond local experimentation you should invoke OpenAI from a server-side API route rather than directly from browser code.
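To keep the key on the server, the call can be proxied through a Pages Router API route. The sketch below is one possible approach, not part of this tutorial’s code as written: the file name pages/api/chat.js, the { prompt } request shape, and the buildMessages helper are all assumptions of this sketch, and it calls OpenAI’s REST endpoint with Node 18’s built-in fetch instead of the SDK.

```javascript
// pages/api/chat.js -- hypothetical server-side proxy route.
// The API key stays on the server; the browser only talks to /api/chat.

// Small helper to shape the outgoing messages array (assumed name).
export function buildMessages(prompt) {
  return [{ role: "user", content: prompt }];
}

export default async function handler(req, res) {
  if (req.method !== "POST") {
    return res.status(405).json({ error: "Method not allowed" });
  }
  const { prompt } = req.body;
  // Call the Chat Completions REST endpoint with Node's built-in fetch.
  const apiRes = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: "gpt-4", messages: buildMessages(prompt) }),
  });
  const data = await apiRes.json();
  res.status(200).json({ reply: data.choices[0].message.content });
}
```

With a route like this in place, the client would POST to /api/chat with fetch instead of importing getChatResponse directly.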
Step 3: Add Speech Recognition and Text-to-Speech
We’ll now set up the microphone to capture voice input and a text-to-speech system to read AI responses aloud.
pages/index.js
import { useState } from "react";
import SpeechRecognition, { useSpeechRecognition } from "react-speech-recognition";
import { useSpeechSynthesis } from "react-speech-kit";
import { getChatResponse } from "../utils/openai";

export default function Home() {
  const [conversation, setConversation] = useState([]);
  const [isProcessing, setIsProcessing] = useState(false);
  const { speak } = useSpeechSynthesis();
  const { transcript, resetTranscript, browserSupportsSpeechRecognition } = useSpeechRecognition();

  if (!browserSupportsSpeechRecognition) {
    return <p>Your browser does not support Speech Recognition.</p>;
  }

  const handleStart = () => {
    resetTranscript();
    SpeechRecognition.startListening({ continuous: true });
  };

  const handleStop = async () => {
    SpeechRecognition.stopListening();
    setIsProcessing(true);
    const userMessage = transcript;
    const updatedConversation = [...conversation, { role: "user", content: userMessage }];
    setConversation(updatedConversation);
    try {
      // Get AI response
      const aiResponse = await getChatResponse(userMessage);
      setConversation([...updatedConversation, { role: "assistant", content: aiResponse }]);
      // Speak AI response
      speak({ text: aiResponse });
    } finally {
      // Re-enable the buttons even if the API call fails
      setIsProcessing(false);
    }
  };

  return (
    <div style={{ padding: "2rem", fontFamily: "Arial, sans-serif" }}>
      <h1>Conversational AI Voice Agent</h1>
      <div>
        {conversation.map((msg, idx) => (
          <p key={idx}>
            <strong>{msg.role === "assistant" ? "AI" : "You"}:</strong> {msg.content}
          </p>
        ))}
        <p><strong>You (live):</strong> {transcript}</p>
      </div>
      <button onClick={handleStart} disabled={isProcessing}>
        Start Listening
      </button>
      <button onClick={handleStop} disabled={isProcessing || !transcript}>
        Stop and Process
      </button>
    </div>
  );
}
Key Features:
- SpeechRecognition: Captures the user’s voice and continuously listens.
- SpeechSynthesis: Converts AI text responses into speech.
- Conversation State: Maintains a history of messages between the user and AI.
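One easy extension of the conversation state: the messages array grows with every turn, and a long history will eventually exceed the model’s context window. Below is a minimal, hypothetical helper for capping what gets sent to the API; the name trimHistory and the maxTurns parameter are assumptions of this sketch, not part of the tutorial’s code.

```javascript
// Hypothetical helper: cap the conversation history before sending it
// to the API. Preserves any system messages and keeps only the most
// recent `maxTurns` user/assistant messages.
function trimHistory(messages, maxTurns = 10) {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-maxTurns)];
}
```

If you later extend getChatResponse to accept a full message array instead of a single prompt string, you could run the history through a helper like this before each request.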
Step 4: Add CSS for Better UX
Open the styles/globals.css file (create-next-app generates it and imports it in pages/_app.js) and add the following:
body {
  margin: 0;
  padding: 0;
  font-family: Arial, sans-serif;
  background-color: #f4f4f9;
  color: #333;
}

h1 {
  text-align: center;
  color: #4a90e2;
}

button {
  padding: 10px 20px;
  margin: 5px;
  background-color: #4a90e2;
  color: white;
  border: none;
  border-radius: 5px;
  cursor: pointer;
}

button:disabled {
  background-color: #ccc;
}

div {
  max-width: 600px;
  margin: 0 auto;
}
Step 5: Run Your Application
Start your development server:
npm run dev
Open your browser and navigate to http://localhost:3000.
- Click Start Listening to begin capturing your voice.
- Speak a question or command.
- Click Stop and Process to send your input to OpenAI and hear the AI’s response.
Step 6: Deploy the App (Optional)
Deploy your app to a platform like Vercel for wider accessibility:
npx vercel
Follow the prompts to deploy your app and share the generated URL with others. Remember to add OPENAI_API_KEY as an environment variable in your Vercel project settings, since .env.local is not uploaded during deployment.
Final Thoughts
Congratulations! 🎉 You’ve successfully created a conversational AI voice agent using Next.js 15 and OpenAI’s API. This simple implementation can be expanded with features like custom commands, improved UI, and multi-language support. The possibilities are endless!