PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee

Abstract
Recently, universal waveform generation tasks have been investigated conditioned on various out-of-distribution scenarios. Although GAN-based methods have shown their strength in fast waveform generation, they are vulnerable to train-inference mismatch scenarios such as two-stage text-to-speech. Meanwhile, diffusion-based models have shown powerful generative performance in other domains; however, they stay out of the limelight in waveform generation tasks due to slow inference speed. Above all, there is no generator architecture that can explicitly disentangle the natural periodic features of high-resolution waveform signals. In this paper, we propose PeriodWave, a novel universal waveform generation model. First, we introduce a period-aware flow matching estimator that can capture the periodic features of the waveform signal when estimating the vector fields. Additionally, we utilize a multi-period estimator that avoids overlaps to capture different periodic features of waveform signals. Although increasing the number of periods can improve performance significantly, it also increases the computational cost. To mitigate this, we also propose a single period-conditional universal estimator that can run feed-forward in parallel via period-wise batch inference. Additionally, we utilize the discrete wavelet transform to losslessly disentangle the frequency information of waveform signals for high-frequency modeling, and introduce FreeU to reduce the high-frequency noise in waveform generation. The experimental results demonstrate that our model outperforms previous models in both Mel-spectrogram reconstruction and text-to-speech tasks. All source code will be available at https://github.com/sh-lee-prml/PeriodWave.
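To make the idea of "periodic features" concrete, here is a minimal sketch of folding a 1D waveform into a 2D array by period, so that samples one period apart sit in the same column. This is an illustration only, not the authors' implementation: the function name `reshape_by_period`, the reflect padding, and the period set `(1, 2, 3, 5, 7)` are assumptions chosen for the example (prime periods are one common way to keep the views from overlapping).

```python
import numpy as np


def reshape_by_period(wave: np.ndarray, period: int) -> np.ndarray:
    """Fold a 1D waveform into shape (ceil(T / period), period).

    Hypothetical helper for illustration; padding mode is an assumption.
    """
    pad = (-len(wave)) % period  # samples needed to reach a multiple of period
    padded = np.pad(wave, (0, pad), mode="reflect")
    return padded.reshape(-1, period)


# Assumed period set for the example; not taken from the abstract.
PERIODS = (1, 2, 3, 5, 7)

# A toy sine wave that repeats every 5 samples.
wave = np.sin(2 * np.pi * np.arange(20) / 5.0)

# One 2D view per period. A period-conditional model could process these
# views as separate items of a batch, which is the spirit of
# "period-wise batch inference".
views = {p: reshape_by_period(wave, p) for p in PERIODS}

for p, v in views.items():
    print(p, v.shape)
```

In the view with period 5, every row is identical because the toy signal repeats every 5 samples, which is the kind of structure a period-aware estimator can exploit. Flattening any view and dropping the padding recovers the original waveform.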
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| speech-synthesis-on-libritts | PeriodWave + FreeU | M-STFT: 1.0269; PESQ: 4.248; Periodicity: 0.0765; V/UV F1: 0.9651 |