MARS5 TTS
Open-source, insanely prosodic text-to-speech model
Listed in categories: GitHub, Software Engineering, Artificial Intelligence
Description
MARS5 is a novel English text-to-speech (TTS) model from CAMBAI. It follows a two-stage AR-NAR (autoregressive, then non-autoregressive) pipeline with a novel NAR component, which lets it generate speech for prosodically difficult and diverse scenarios such as sports commentary and anime. Punctuation and capitalization in the input text steer the prosody of the output. Speaker identity can be specified with a reference audio file, which improves the quality of the output.
How to use MARS5 TTS?
To use MARS5, load the AR and NAR models from torch hub, pick a reference audio and optionally its transcript, choose between shallow or deep clone inference, and perform synthesis to generate speech output. Tune the inference settings for optimal results.
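The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the project's official usage: the hub repository name `Camb-ai/mars5-tts`, the entry point `mars5_english`, the `tts` call, and the config fields (`deep_clone`, `temperature`, `top_k`) are assumptions based on the project's README and may have changed, and `choose_deep_clone` and `synthesize` are hypothetical helpers introduced here for illustration. It also assumes that deep clone needs the reference transcript, so it falls back to shallow clone when none is given.

```python
from typing import Optional


def choose_deep_clone(ref_transcript: Optional[str]) -> bool:
    """Hypothetical helper: use deep clone only when a non-empty transcript exists."""
    return bool(ref_transcript and ref_transcript.strip())


def synthesize(text: str, ref_wav_path: str, ref_transcript: Optional[str] = None):
    """Hypothetical wrapper around the assumed MARS5 torch hub API."""
    # Heavy imports stay local so the helper above works without them installed.
    import librosa
    import torch

    # Assumed hub entry point; returns the model and its inference config class.
    mars5, config_class = torch.hub.load(
        "Camb-ai/mars5-tts", "mars5_english", trust_repo=True
    )

    # Load the reference audio at the model's sample rate, as a mono tensor.
    wav, _ = librosa.load(ref_wav_path, sr=mars5.sr, mono=True)
    wav = torch.from_numpy(wav)

    # Pick shallow or deep clone, then set a few assumed inference settings.
    deep_clone = choose_deep_clone(ref_transcript)
    cfg = config_class(deep_clone=deep_clone, temperature=0.7, top_k=100)

    # Assumed signature: tts(text, reference_audio, reference_transcript, cfg=...).
    _ar_codes, audio = mars5.tts(text, wav, ref_transcript or "", cfg=cfg)
    return audio, mars5.sr
```

Calling `synthesize("Hello there!", "reference.wav")` would run shallow clone, while also passing the reference transcript would switch to deep clone. The first call downloads the model weights, so it needs network access and a few gigabytes of disk space.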
Core features of MARS5 TTS:
1️⃣ Two-stage AR-NAR pipeline
2️⃣ Prosody guidance with punctuation and capitalization
3️⃣ Speaker identity specification
4️⃣ Deep clone for improved quality
5️⃣ Inference settings tuning
Why use MARS5 TTS?
# | Use case | Status
---|---|---
1 | Sports commentary | ✅
2 | Anime voice dubbing | ✅
3 | Voice cloning | ✅
Who developed MARS5 TTS?
CAMBAI is a research team of engineers with Carnegie Mellon and Siri backgrounds who have published at Interspeech, dedicated to making everyone's voice count. They welcome contributions and are open to collaborations.