CPU Deployment NeuTTS-Air Voice Cloning Model

1. Tutorial Introduction

NeuTTS-Air is an end-to-end text-to-speech (TTS) model released by Neuphonic in October 2025. Built on a 0.5B Qwen LLM backbone and the NeuCodec audio codec, it targets on-device deployment and instant voice cloning from few-shot examples. According to its system evaluation, NeuTTS-Air reaches state-of-the-art results among open-source models, particularly on ultra-realistic synthesis and real-time inference benchmarks. It also generalizes to new scenarios such as embedded agents and style transfer, can clone a voice from as little as 3 seconds of audio, and generates natural conversational speech. Post-training adds GGML/ONNX support and a watermarking mechanism. The model leads open-source systems in on-device TTS and power-optimization evaluations, and in some scenarios it is comparable to closed-source models.

This tutorial runs on CPU resources. The model supports English only, and synthesizing one utterance takes more than half a minute. For faster processing, clone the single-card RTX 5090 tutorial "NeuTTS-Air: A lightweight and efficient voice cloning model".

3. Operation steps

1. After starting the container, click the API address to open the web interface.

2. Once the page loads, you can use the model.

If "Bad Gateway" is displayed, the code is still starting up in the background. Wait about 2-3 minutes and refresh the page.

In the Safari browser, the audio may not play directly; download it before playing.
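The "Bad Gateway" wait described above can also be automated instead of refreshing by hand. The following is a minimal sketch, assuming the web interface answers with HTTP status 502 while the backend is still starting. The function names, the 15-second polling interval, and the 5-minute limit are illustrative choices, not part of the tutorial.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for a GET to `url`, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is kept on the error.
        return err.code
    except OSError:
        # Connection refused, DNS failure, or timeout: treat as "not ready".
        return None


def wait_until_ready(url, get_status=fetch_status, interval=15.0,
                     max_wait=300.0, sleep=time.sleep):
    """Poll `url` until it answers with something other than 502 Bad Gateway.

    Returns True once the page responds, or False if `max_wait` seconds of
    sleeping have passed without a usable response.
    `get_status` and `sleep` are injectable so the loop can be tested offline.
    """
    waited = 0.0
    while True:
        status = get_status(url)
        if status is not None and status != 502:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

To use it, pass the API address shown for your container to `wait_until_ready` and open the page in the browser once it returns `True`.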

How to use

The input audio must be at least 3 seconds long; 3 to 15 seconds is recommended. The output audio is at most approximately 30 seconds long.
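Before uploading a reference clip, its length can be checked against the limits above. The sketch below assumes the clip is an uncompressed WAV file, so the standard-library `wave` module can read it; other formats would need a different reader. The function names and the category labels are hypothetical.

```python
import wave

MIN_SECONDS = 3.0        # minimum input length stated in the tutorial
RECOMMENDED_MAX = 15.0   # upper end of the recommended range


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_reference_clip(seconds):
    """Compare a clip length with the tutorial's input-length guidance."""
    if seconds < MIN_SECONDS:
        return "too short"
    if seconds <= RECOMMENDED_MAX:
        return "recommended"
    return "longer than recommended"
```

For example, `classify_reference_clip(wav_duration_seconds("reference.wav"))` reports whether a local file named `reference.wav` (a placeholder name) falls in the recommended range.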

This notebook is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at support@hyper.ai for prompt review and removal.
