Volodymyr Kuleshov, S. Zayd Enam, Stefano Ermon

Abstract
We introduce a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks. Our model is trained on pairs of low- and high-quality audio examples; at test time, it predicts missing samples within a low-resolution signal in an interpolation process similar to image super-resolution. Our method is simple and does not involve specialized audio processing techniques; in our experiments, it outperforms baselines on standard speech and music benchmarks at upscaling ratios of 2x, 4x, and 6x. The method has practical applications in telephony, compression, and text-to-speech generation; it demonstrates the effectiveness of feed-forward convolutional architectures on an audio generation task.
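To make the task setup concrete, the sketch below builds a low-resolution/high-resolution pair from a synthetic tone and fills in the missing samples by plain linear interpolation. This is only a simple stand-in for illustration, not the neural network the abstract describes; the function names `make_low_res` and `upsample_linear`, the 16 kHz rate, and the 440 Hz test tone are all assumptions introduced here.

```python
import numpy as np

def make_low_res(x, ratio):
    """Keep every `ratio`-th sample (no anti-aliasing filter, for brevity)."""
    return x[::ratio]

def upsample_linear(x_lr, ratio, n_out):
    """Fill in the missing samples by linear interpolation (a stand-in baseline)."""
    t_lr = np.arange(len(x_lr)) * ratio   # positions of the known samples
    t_hr = np.arange(n_out)               # positions of all output samples
    return np.interp(t_hr, t_lr, x_lr)

# Hypothetical example signal: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x_hr = np.sin(2 * np.pi * 440 * t)

ratio = 4                                  # one of the upscaling ratios in the abstract
x_lr = make_low_res(x_hr, ratio)           # low-resolution input
x_up = upsample_linear(x_lr, ratio, len(x_hr))  # interpolated reconstruction
```

A learned model would replace `upsample_linear`, taking `x_lr` (or an interpolated version of it) as input and being trained so that its output matches `x_hr`.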
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| audio-super-resolution-on-piano-1 | U-Net | Log-Spectral Distance: 3.4 |
| audio-super-resolution-on-vctk-multi-speaker-1 | U-Net | Log-Spectral Distance: 3.1 |
| audio-super-resolution-on-voice-bank-corpus-1 | U-Net | Log-Spectral Distance: 3.2 |
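
The table reports Log-Spectral Distance (LSD), where lower values mean the reconstructed spectrum is closer to the reference. The page does not give the formula, so the following is a minimal sketch of one common definition: the root-mean-square difference between log power spectra over frequency bins, averaged over frames. The STFT settings (`n_fft`, `hop`), the Hann window, the base-10 logarithm, and the `eps` floor are assumptions and may differ from the settings behind the numbers above.

```python
import numpy as np

def log_power_spectrogram(x, n_fft=2048, hop=512, eps=1e-10):
    """Log power spectrogram from a Hann-windowed STFT, shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop   # requires len(x) >= n_fft
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    spectrum = np.fft.rfft(frames, axis=1)
    return np.log10(np.abs(spectrum) ** 2 + eps)  # eps avoids log(0)

def log_spectral_distance(reference, estimate, **stft_kwargs):
    """RMS log-spectral difference over frequency, averaged over frames."""
    X = log_power_spectrogram(reference, **stft_kwargs)
    X_hat = log_power_spectrogram(estimate, **stft_kwargs)
    per_frame = np.sqrt(np.mean((X - X_hat) ** 2, axis=1))
    return float(np.mean(per_frame))
```

Both signals must have the same length and sampling rate; identical inputs give a distance of zero.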