Show HN: MiraTTS, a 48kHz Open-Source TTS at 100x Real-Time Speed

I’ve been working on MiraTTS, a fine-tune of Spark-TTS designed for realistic, stable text-to-speech. The goal was a model that is very fast without giving up audio quality.

Most open TTS models are either computationally heavy or generate 16-24kHz audio. Mira achieves high fidelity and speed by combining two things:

FlashSR: Generates crisper, clearer 48kHz audio output.

LMDeploy: Heavily optimized inference, allowing for 100x real-time speed and low latency (roughly 150 ms).
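To make the two numbers above concrete, here is a minimal sketch of the arithmetic behind "100x real-time" and "48kHz". It does not call MiraTTS, FlashSR, or LMDeploy; the function names and the timing values are hypothetical and chosen only for illustration.

```python
# Illustrative arithmetic only; these helpers are not part of MiraTTS.

def realtime_speedup(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of compute time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def sample_count(duration_seconds: float, sample_rate_hz: int) -> int:
    """Number of audio samples needed for a clip at a given sample rate."""
    return round(duration_seconds * sample_rate_hz)


# Hypothetical measurement: 50 s of speech generated in 0.5 s of wall time.
speedup = realtime_speedup(audio_seconds=50.0, wall_seconds=0.5)
print(speedup)  # 100.0

# A 10 s clip at 48 kHz carries three times the samples of one at 16 kHz.
print(sample_count(10.0, 48_000))  # 480000
print(sample_count(10.0, 16_000))  # 160000
```

Note that speedup measures throughput, while the roughly 150 ms figure is latency, so the two are quoted separately.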

I built this so people running models locally have a high-quality text-to-speech model that works for any use case. It’s still in its early stages, and I'm experimenting with multilingual and multi-speaker versions. Streaming is coming soon as well.

Repo: https://github.com/ysharma3501/MiraTTS

Model: https://huggingface.co/YatharthS/MiraTTS

I also wrote a breakdown of how these LLM-based TTS models work: https://huggingface.co/blog/YatharthS/llm-tts-models

