Neural Vocoder is All You Need for Speech Super-resolution

Liu Haohe ; Choi Woosung ; Liu Xubo ; Kong Qiuqiang ; Tian Qiao ; Wang DeLiang

Abstract

Speech super-resolution (SR) is a task to increase speech sampling rate by generating high-frequency components. Existing speech SR methods are trained in constrained experimental settings, such as a fixed upsampling ratio. These strong constraints can potentially lead to poor generalization ability in mismatched real-world cases. In this paper, we propose a neural vocoder based speech super-resolution method (NVSR) that can handle a variety of input resolution and upsampling ratios. NVSR consists of a mel-bandwidth extension module, a neural vocoder module, and a post-processing module. Our proposed system achieves state-of-the-art results on the VCTK multi-speaker benchmark. On 44.1 kHz target resolution, NVSR outperforms WSRGlow and Nu-wave by 8% and 37% respectively on log spectral distance and achieves a significantly better perceptual quality. We also demonstrate that prior knowledge in the pre-trained vocoder is crucial for speech SR by performing mel-bandwidth extension with a simple replication-padding method. Samples can be found in https://haoheliu.github.io/nvsr.
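The abstract mentions a simple replication-padding form of mel-bandwidth extension but does not spell out the rule. The following is a minimal sketch under one plausible reading: every mel bin above the input's bandwidth is filled with the value of the highest valid bin in the same frame. The function name `replicate_pad_mel` and the `cutoff_bin` parameter are hypothetical and are not taken from the paper or the `ssr_eval` repository.

```python
import numpy as np


def replicate_pad_mel(mel, cutoff_bin):
    """Fill the empty high-frequency mel bins by replication (assumed rule).

    mel        : array of shape (frames, n_mels), e.g. a log-mel spectrogram
                 whose bins at index >= cutoff_bin carry no usable energy.
    cutoff_bin : hypothetical parameter; index of the first invalid mel bin.

    Returns a copy in which, for every frame, bins >= cutoff_bin are set to
    the value of bin cutoff_bin - 1 (the highest valid bin).
    """
    mel = np.asarray(mel, dtype=np.float64)
    if mel.ndim != 2:
        raise ValueError("mel must have shape (frames, n_mels)")
    if not 0 < cutoff_bin <= mel.shape[1]:
        raise ValueError("cutoff_bin must be in 1..n_mels")

    padded = mel.copy()
    # The (frames, 1) slice broadcasts across all missing high bins.
    padded[:, cutoff_bin:] = padded[:, cutoff_bin - 1:cutoff_bin]
    return padded


# Toy input: 2 frames, 4 mel bins, the upper 2 bins are missing.
low_res_mel = np.array([[1.0, 2.0, 0.0, 0.0],
                        [3.0, 4.0, 0.0, 0.0]])
padded_mel = replicate_pad_mel(low_res_mel, cutoff_bin=2)
```

In a setup like the one the abstract describes, a padded mel spectrogram of this kind would be passed to the pre-trained vocoder in place of a learned bandwidth-extension output; how the paper configures that step is not stated here.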

Code Repositories

haoheliu/ssr_eval (Official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
audio-super-resolution-on-vctk-multi-speaker-1 | NVSR | Log-Spectral Distance: 0.78
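
The benchmark entry reports log-spectral distance (LSD), where lower is better. As a rough illustration, the sketch below uses the common definition: the root-mean-square difference of log10 power spectra over frequency, averaged over frames. The function name and the `eps` stabilizer are assumptions for illustration, and the STFT settings used to produce the 0.78 figure are not given on this page, so values from this sketch are not directly comparable to it.

```python
import numpy as np


def log_spectral_distance(ref_mag, est_mag, eps=1e-8):
    """Log-spectral distance between two magnitude spectrograms.

    ref_mag, est_mag : arrays of shape (frames, freq_bins) holding STFT
                       magnitudes of the reference and the estimate.
    eps              : small constant (an assumption of this sketch) that
                       keeps the ratio finite for silent bins.
    """
    ref_mag = np.asarray(ref_mag, dtype=np.float64)
    est_mag = np.asarray(est_mag, dtype=np.float64)
    if ref_mag.shape != est_mag.shape:
        raise ValueError("spectrograms must have the same shape")

    # Difference of log power spectra, per time-frequency bin.
    log_ratio = np.log10((ref_mag ** 2 + eps) / (est_mag ** 2 + eps))
    # RMS over frequency for each frame, then mean over frames.
    per_frame = np.sqrt(np.mean(log_ratio ** 2, axis=1))
    return float(np.mean(per_frame))


# Toy check: an estimate that is 10x too loud in magnitude differs by
# 2 in log10 power at every bin.
reference = np.ones((5, 8))
lsd_same = log_spectral_distance(reference, reference)
lsd_loud = log_spectral_distance(reference, 10.0 * reference)
```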