MARS5 TTS

Open-source, insanely prosodic text-to-speech model

Listed in categories:

GitHub, Software Engineering, Artificial Intelligence

Description

MARS5 is a novel English text-to-speech (TTS) model from CAMBAI. It follows a two-stage AR-NAR (autoregressive, then non-autoregressive) pipeline with a distinctively novel NAR component, which lets it generate speech for prosodically hard and diverse scenarios such as sports commentary and anime. Punctuation and capitalization in the input text steer the prosody of the output. Speaker identity is specified with a reference audio file, and the deep clone mode improves the quality of the cloned voice.

How to use MARS5 TTS?

To use MARS5: load the AR and NAR models from torch hub; pick a reference audio file and, optionally, its transcript; choose between shallow and deep clone inference; then run synthesis to generate the speech output. Tune the inference settings for best results.
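The choice between shallow and deep clone can be sketched as a small planning step that runs before any model call. This is a minimal illustration, not MARS5's own API: `SynthesisPlan` and `plan_synthesis` are hypothetical names, and the sketch assumes that deep clone is only possible when a transcript of the reference audio is available. Loading the models from torch hub and running synthesis are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SynthesisPlan:
    """Hypothetical container for the choices made before synthesis."""
    text: str
    reference_audio: str
    reference_transcript: Optional[str]
    deep_clone: bool


def plan_synthesis(text: str, reference_audio: str,
                   reference_transcript: Optional[str] = None) -> SynthesisPlan:
    """Pick shallow or deep clone depending on whether a transcript is given.

    Assumption: deep clone needs the transcript of the reference audio,
    so a missing or blank transcript falls back to shallow clone.
    """
    if not text.strip():
        raise ValueError("text to synthesize must not be empty")
    # Treat a blank transcript the same as no transcript at all.
    transcript = (reference_transcript or "").strip() or None
    return SynthesisPlan(
        text=text,
        reference_audio=reference_audio,
        reference_transcript=transcript,
        deep_clone=transcript is not None,
    )
```

For example, `plan_synthesis("GOAL! What a strike!", "ref.wav")` yields a shallow clone plan, while passing a third argument with the reference transcript switches the plan to deep clone. The capitalization and punctuation in the text are left untouched, since they are what guides the prosody.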

Core features of MARS5 TTS:

1. Two-stage AR-NAR pipeline
2. Prosody guidance with punctuation and capitalization
3. Speaker identity specification
4. Deep clone for improved quality
5. Inference settings tuning

What can MARS5 TTS be used for?

1. Sports commentary
2. Anime voice dubbing
3. Voice cloning

Who developed MARS5 TTS?

CAMBAI is a research team of former Siri engineers from Carnegie Mellon with publications at Interspeech, dedicated to making everyone's voice count. The team actively welcomes contributions and is open to collaborations.

FAQ of MARS5 TTS