Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee

Abstract
This paper introduces PeriodWave-Turbo, a high-fidelity and highly efficient waveform generation model trained via adversarial flow matching optimization. Recently, conditional flow matching (CFM) generative models have been successfully adopted for waveform generation tasks, leveraging a single vector field estimation objective for training. Although these models can generate high-fidelity waveform signals, they require significantly more ODE steps than GAN-based models, which need only a single generation step. Additionally, the generated samples often lack high-frequency information due to noisy vector field estimation, which fails to ensure high-frequency reproduction. To address this limitation, we enhance pre-trained CFM-based generative models by incorporating a fixed-step generator modification. We utilize reconstruction losses and adversarial feedback to accelerate high-fidelity waveform generation. Through adversarial flow matching optimization, only 1,000 fine-tuning steps are required to achieve state-of-the-art performance across various objective metrics. Moreover, we significantly speed up inference, reducing the number of sampling steps from 16 to 2 or 4. Additionally, by scaling up the PeriodWave backbone from 29M to 70M parameters for improved generalization, PeriodWave-Turbo achieves unprecedented performance, with a perceptual evaluation of speech quality (PESQ) score of 4.454 on the LibriTTS dataset. Audio samples, source code, and checkpoints will be available at https://github.com/sh-lee-prml/PeriodWave.
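The acceleration in the abstract comes from replacing a many-step ODE sampler with a fixed, small number of Euler steps and then fine-tuning that few-step generator with reconstruction and adversarial losses. The sketch below (not the authors' code) shows only the fixed-step Euler sampler being distilled; the vector field here is a toy straight-path (optimal-transport) CFM field toward a hypothetical target `x1`, chosen so the result is easy to check. Real learned fields are neural networks, and their 2-step and 16-step outputs diverge, which is what the adversarial fine-tuning corrects.

```python
import numpy as np

def euler_generate(v_field, x0, n_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with a fixed number
    of Euler steps. PeriodWave-Turbo distills a 16-step sampler like
    this down to 2 or 4 steps."""
    x, dt = x0.copy(), 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * v_field(x, t)
    return x

# Toy conditional-OT vector field pointing along the straight path to x1.
# For this analytic field, Euler integration lands exactly on x1 at any
# step count, so few-step and many-step outputs coincide here.
x1 = np.array([1.0, -2.0, 0.5])
v = lambda x, t: (x1 - x) / max(1.0 - t, 1e-6)
x0 = np.zeros(3)

few_step = euler_generate(v, x0, 2)    # fast, 2-step generation
many_step = euler_generate(v, x0, 16)  # slow, 16-step generation
```

In the paper's setting the fine-tuning objective would add reconstruction losses (e.g. mel-spectrogram distance) and a discriminator's adversarial feedback on `few_step` outputs, which are the parts omitted from this sketch.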
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| speech-synthesis-on-libritts | PeriodWave-Turbo-L | M-STFT: 0.7358 PESQ: 4.454 Periodicity: 0.0528 V/UV F1: 0.9756 |