CAMB.AI introduces MARS5, a fully open-source (commercially usable) TTS model with breakthrough prosody and realism, available on our GitHub: https://www.github.com/camb-ai/mars5-tts
Watch our full release video here:
https://www.youtube.com/watch?v=bmJSLPYrKtE
Why is it different?
MARS5 can replicate a performance from just 2-3 seconds of reference audio in 140+ languages, even in extremely tough prosodic scenarios like sports commentary, movies, and anime; hard prosody that most closed-source and open-source TTS models struggle with today.
We're excited for you to try, build on, and use MARS5 for research and creative applications. Let us know your feedback on our Discord!
Highlights:
Training data: trained on 150K+ hours of data.
Params: 1.2B total (~750M AR / ~450M NAR)
Multilingual: open-sourcing English to begin with, but MARS5 is available in 140+ languages on camb.ai
Diversity in prosody: handles very hard prosodic elements like commentary, shouting, anime, etc.
The model follows a two-stage setup, operating on 6kbps encodec tokens. Concretely, it consists of a ~750M parameter autoregressive part (which we call the AR model) and a ~450M parameter non-autoregressive multinomial diffusion part (which we call the NAR model). The AR model iteratively predicts the most coarse (lowest level) codebook value for the encodec features, while the NAR model takes the AR output and infers the remaining codebook values in a discrete denoising diffusion task. Specifically, the NAR model is trained as a DDPM using a multinomial distribution on encodec features, effectively ‘inpainting’ the remaining codebook entries after the AR model has predicted the coarse codebook values.
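The two-stage pipeline above can be sketched in code. This is a minimal illustrative mock-up, not the real implementation: the function names, shapes, and the use of random stand-ins for both networks are assumptions for illustration (see the GitHub repo for the actual model); only the overall flow, AR predicts the coarsest codebook level, then NAR diffusion fills in the rest, reflects the description.

```python
import numpy as np

# Hypothetical sketch of MARS5's two-stage inference as described above.
# Model interfaces and constants are assumptions; random sampling stands in
# for the real ~750M AR and ~450M NAR networks.

N_CODEBOOKS = 8       # encodec at 6 kbps uses a hierarchy of residual codebooks
CODEBOOK_SIZE = 1024  # entries per codebook (typical for encodec)

def ar_predict_coarse(n_frames, rng):
    """Stage 1 (assumed interface): the AR model autoregressively predicts
    the coarsest (level-0) codebook index for each frame. A random draw
    replaces the real network's sampled softmax here."""
    coarse = np.empty(n_frames, dtype=np.int64)
    for t in range(n_frames):
        # real model: sample from a distribution conditioned on text,
        # reference audio, and the indices predicted so far
        coarse[t] = rng.integers(0, CODEBOOK_SIZE)
    return coarse

def nar_denoise_remaining(coarse, n_steps, rng):
    """Stage 2 (assumed interface): the NAR model runs multinomial
    (discrete) diffusion, iteratively refining the remaining codebook
    levels while level 0 stays fixed ('inpainting')."""
    n_frames = coarse.shape[0]
    # start from uniform noise over codebook indices for levels 1..7
    tokens = rng.integers(0, CODEBOOK_SIZE, size=(N_CODEBOOKS, n_frames))
    tokens[0] = coarse  # the coarse level from the AR model is never resampled
    for _ in range(n_steps):
        # real model: predict multinomial distributions for the noisy levels
        # and re-sample; a fresh random draw stands in for that step here
        tokens[1:] = rng.integers(0, CODEBOOK_SIZE,
                                  size=(N_CODEBOOKS - 1, n_frames))
    return tokens

rng = np.random.default_rng(0)
coarse = ar_predict_coarse(n_frames=75, rng=rng)
codes = nar_denoise_remaining(coarse, n_steps=4, rng=rng)
print(codes.shape)  # (8, 75): a full encodec token grid, ready to decode to audio
```

The design choice the sketch highlights: only the slow, sequential AR pass scales with output length token-by-token, while the NAR diffusion pass refines all frames of the remaining codebooks in parallel.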
The model was trained on a combination of publicly available datasets and data provided internally by our customers, which include large sports leagues and international creatives.
Links:
Discord: discord.gg/4GVdQ28cZC
Github: github.com/camb-ai/mars5-tts
Website: camb.ai
Youtube: youtube.com/@camb-ai