fotiecodes

Posted on Jan 19 • Originally published at blog.fotiecodes.com

HearItServer: Your Offline TTS Server for Local Speech Synthesis

#tts #kokoro #onnxruntime #ai

AI-driven text-to-speech (TTS) is currently dominated by cloud-based APIs. HearItServer is a local alternative that brings fast speech synthesis to your own machine. Built on top of Kokoro-ONNX, one of the fastest and most efficient open-source TTS models, HearItServer gives developers a ready-to-use, high-performance text-to-speech server that integrates into their applications and synthesizes speech offline, with no internet connection required.

I built HearItServer as a core component of a larger project I'm working on at the moment, a tool designed to help users read books, documents, and other text-based content faster and more efficiently. My goal is to develop an app that enables users to consume more books while making reading more engaging, all offline. HearItServer powers the offline TTS functionality of this project, but I realized it could also be useful to developers looking for a lightweight, private, and fast text-to-speech solution. So, I decided to make it free and open for others to build on.

If you need real-time speech synthesis without network latency, data privacy concerns, or API rate limits, HearItServer is built for exactly that.

Why Use HearItServer?

Unlike traditional TTS services that require online APIs, HearItServer is designed to run entirely on your local machine. This means:

✅ Lightning-Fast Inference – Thanks to Kokoro-ONNX, the inference is optimized for speed.

✅ Privacy-Preserving – No data is sent to external servers, making it ideal for secure environments.

✅ Fully Offline – No need for API keys or internet connectivity.

✅ Easy Integration – Exposes a simple REST API that any application you build can call.

How It Works

HearItServer is essentially a lightweight Flask-based REST API that hosts Kokoro-ONNX, allowing any application to send text and receive high-quality, natural-sounding speech in response. This makes it incredibly easy to integrate into desktop applications, automation workflows, and AI assistants.

Setting Up HearItServer

1️⃣ Install HearIt

Download and install the HearItServer application on your machine. Once installed, launch it, and a menu bar icon will appear on macOS.

2️⃣ Start the TTS Server

Click on the menu icon and select "Start TTS Server". The server will now be running locally at:

http://localhost:7008

Using the API (100% local)

HearItServer provides a single API endpoint that generates speech from text.

Endpoint:

POST http://localhost:7008/v1/audio/speech

Request Body (JSON):

{
  "text": "Hello, this is a test message!",
  "voice": "af_sarah",
  "speed": 1.0,
  "lang": "en-us"
}
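
The request body above can be modelled as a TypeScript type so that mistakes are caught before anything is sent. This is only a sketch: the field names come from the example request, while the helper name `buildSpeechRequest`, the default values, and the checks on `text` and `speed` are assumptions made for illustration. Whether the server accepts a body with fields left out is not covered here, so the helper always sends all four.

```typescript
// Shape of the JSON body, using the field names from the example request.
interface SpeechRequest {
  text: string;
  voice: string;
  speed: number;
  lang: string;
}

// Hypothetical helper (not part of HearItServer): fills in the values used in
// the example request and rejects obviously invalid input before sending.
function buildSpeechRequest(
  text: string,
  overrides: Partial<Omit<SpeechRequest, "text">> = {}
): SpeechRequest {
  if (text.trim().length === 0) {
    throw new Error("text must not be empty");
  }
  const request: SpeechRequest = {
    text,
    voice: "af_sarah",
    speed: 1.0,
    lang: "en-us",
    ...overrides,
  };
  // Assumption: a speed of zero or below makes no sense for playback.
  if (!isFinite(request.speed) || request.speed <= 0) {
    throw new Error("speed must be a positive number");
  }
  return request;
}

// Serialized body, ready to be used in a POST request.
const requestJson = JSON.stringify(
  buildSpeechRequest("Hello, this is a test message!", { voice: "bm_george" })
);
```

The resulting `requestJson` string can be passed as the body of a POST to the endpoint above with the `Content-Type: application/json` header.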

Available Voices:

af_sarah
af_bella
af_nicole
af_sky
am_adam
am_michael
bf_emma
bf_isabella
bm_george
bm_lewis
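
If the voice name comes from user input or a config file, it can help to check it against this list before calling the server. The sketch below copies the identifiers exactly as listed above; the names `VOICES`, `isVoice`, and `resolveVoice`, and the choice of `af_sarah` as a fallback, are illustrative assumptions rather than anything HearItServer provides.

```typescript
// Voice identifiers exactly as listed above.
const VOICES = [
  "af_sarah", "af_bella", "af_nicole", "af_sky",
  "am_adam", "am_michael",
  "bf_emma", "bf_isabella",
  "bm_george", "bm_lewis",
] as const;

// Union type of all listed voice identifiers.
type Voice = (typeof VOICES)[number];

// Type guard: narrows an arbitrary string to one of the listed voices.
function isVoice(value: string): value is Voice {
  return (VOICES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: returns the given voice if it is listed,
// otherwise the fallback.
function resolveVoice(value: string, fallback: Voice = "af_sarah"): Voice {
  return isVoice(value) ? value : fallback;
}
```

With this in place, a typo in a voice name is caught locally and replaced by the fallback instead of being sent to the server.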

Response:

Success: A .wav file is returned as a binary response.
Error: A JSON object containing an error message.
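
Since the same endpoint returns either audio bytes or a JSON error, a client has to tell the two apart. One way, sketched below, is to look at the first bytes of the body: a WAV file starts with the ASCII tag `RIFF` and carries `WAVE` at byte offset 8, which is part of the standard WAV layout and not specific to HearItServer. The names `isWav`, `classifyResponse`, and `requestSpeech` are made up for this example, and because the fields of the error object are not described here, the error body is passed through as raw text. The sketch uses the built-in `fetch` (Node 18+ or a browser) instead of Axios.

```typescript
// The two documented outcomes: audio bytes or an error message.
type SpeechResult =
  | { ok: true; wav: Uint8Array }
  | { ok: false; error: string };

// True when the bytes start with the standard RIFF/WAVE header.
function isWav(bytes: Uint8Array): boolean {
  const matches = (offset: number, tag: string): boolean => {
    for (let i = 0; i < tag.length; i++) {
      if (bytes[offset + i] !== tag.charCodeAt(i)) return false;
    }
    return true;
  };
  return bytes.length >= 12 && matches(0, "RIFF") && matches(8, "WAVE");
}

// Hypothetical helper: maps a raw response body to one of the two outcomes.
function classifyResponse(bytes: Uint8Array): SpeechResult {
  if (isWav(bytes)) {
    return { ok: true, wav: bytes };
  }
  // Assumption: anything that is not a WAV file is a UTF-8 encoded error body.
  return { ok: false, error: new TextDecoder().decode(bytes) };
}

// Sends the request and classifies whatever comes back.
function requestSpeech(text: string, voice: string): Promise<SpeechResult> {
  return fetch("http://localhost:7008/v1/audio/speech", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, voice, speed: 1.0, lang: "en-us" }),
  })
    .then((response) => response.arrayBuffer())
    .then((buffer) => classifyResponse(new Uint8Array(buffer)));
}
```

A caller can then branch on `result.ok` and either write `result.wav` to disk or log `result.error`.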

Example: Using HearItServer in TypeScript

To integrate HearIt into your application, you can send requests using TypeScript and Axios:

import axios from 'axios';
import * as fs from 'fs';

const url = "http://localhost:7008/v1/audio/speech";
const headers = { "Content-Type": "application/json" };
const data = {
    text: "Hello, world!",
    voice: "af_sarah",
    speed: 1.0,
    lang: "en-us"
};

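// Ask for raw bytes so the .wav payload is not decoded as text.
axios.post(url, data, { headers, responseType: 'arraybuffer' })
    .then(response => {
        fs.writeFileSync("output.wav", Buffer.from(response.data));
        console.log("Audio saved as output.wav");
    })
    .catch(error => {
        // With responseType 'arraybuffer' the error body also arrives as bytes,
        // so decode it to text before logging.
        const details = error.response
            ? Buffer.from(error.response.data).toString('utf8')
            : error.message;
        console.error("Error:", details);
    });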

This script sends a request to the local TTS server, receives the audio response, and saves it as a .wav file.

Stopping the TTS Server

Click on the menu bar icon.
Select "Stop TTS Server" to terminate the service.

Build Anything with Local TTS

The beauty of HearItServer is its flexibility: it provides a universal interface for local TTS inference, so anyone can build applications on top of it. Some potential use cases include:

🤖 AI Assistants – Power your local AI chatbot with real-time speech synthesis.
📝 Voice Narration – Generate high-quality audio for videos or presentations.
🎮 Game Development – Implement dynamic in-game voice synthesis without cloud dependency.
🦾 Automation – Integrate TTS into scripts, notifications, or smart assistants.

With HearItServer, developers get full control over their text-to-speech processing, powered by Kokoro-82M, one of the fastest open-source TTS models.

Conclusion

If you're looking for a fast, efficient, and private way to generate speech locally, HearItServer is your best bet. It harnesses the power of Kokoro to deliver ultra-fast TTS inference, making it ideal for real-world applications.

Ready to get started? Go ahead and download HearItServer and use it in your apps.

📖 Learn more about Kokoro-ONNX: GitHub Repository

PS: This project is still in development and there might be bugs. Expect frequent updates and improvements as I continue refining it. Feedback is always welcome!
