5 months ago

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Abstract

Large language models (LLMs) have demonstrated the capability to handle a variety of generative tasks. This paper presents the UniAudio system, which, unlike prior task-specific approaches, leverages LLM techniques to generate multiple types of audio (including speech, sounds, music, and singing) from given input conditions. UniAudio 1) first tokenizes all types of target audio along with the other condition modalities, 2) concatenates each source-target pair into a single sequence, and 3) performs next-token prediction using an LLM. In addition, a multi-scale Transformer model is proposed to handle the overly long sequences caused by the residual-vector-quantization-based neural codec used in tokenization. Training of UniAudio is scaled up to 165K hours of audio and 1B parameters, based on all generative tasks, aiming to obtain sufficient prior knowledge not only of the intrinsic properties of audio but also of the inter-relationship between audio and other modalities. Therefore, the trained UniAudio model has the potential to become a foundation model for universal audio generation: it shows strong capability in all trained tasks and can seamlessly support new audio generation tasks after simple fine-tuning. Experiments demonstrate that UniAudio achieves state-of-the-art or at least competitive results on most of the 11 tasks. Demo and code are released at https://github.com/yangdongchao/UniAudio
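The three steps in the abstract (tokenize, concatenate, predict the next token) can be illustrated with a minimal sketch. It is not taken from the UniAudio code base: the vocabulary sizes, the number of codebooks, the `SEP` and `EOS` tokens, and the function names `flatten_rvq`, `build_sequence`, and `shift_for_next_token` are all assumptions made for illustration. The sketch also shows why residual vector quantization lengthens sequences: each audio frame contributes one token per codebook.

```python
from typing import List, Sequence, Tuple

# All constants below are illustrative assumptions, not values from the paper.
TEXT_VOCAB = 256        # assumed size of the condition (e.g. text) vocabulary
CODEBOOK_SIZE = 1024    # assumed number of entries per RVQ codebook
NUM_CODEBOOKS = 3       # assumed number of residual quantizer layers
AUDIO_OFFSET = TEXT_VOCAB
SEP = AUDIO_OFFSET + NUM_CODEBOOKS * CODEBOOK_SIZE  # hypothetical source/target separator
EOS = SEP + 1                                       # hypothetical end-of-sequence token


def flatten_rvq(frames: Sequence[Sequence[int]]) -> List[int]:
    """Flatten per-frame codebook indices into one token stream.

    Each codebook layer gets its own id range so that indices from
    different layers never collide in the shared vocabulary.
    """
    tokens: List[int] = []
    for frame in frames:
        if len(frame) != NUM_CODEBOOKS:
            raise ValueError("each frame needs one index per codebook")
        for layer, index in enumerate(frame):
            tokens.append(AUDIO_OFFSET + layer * CODEBOOK_SIZE + index)
    return tokens


def build_sequence(condition: Sequence[int],
                   target_frames: Sequence[Sequence[int]]) -> List[int]:
    """Concatenate condition tokens and target audio tokens into one sequence."""
    return list(condition) + [SEP] + flatten_rvq(target_frames) + [EOS]


def shift_for_next_token(sequence: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels) where labels[i] is the token that follows inputs[i]."""
    return list(sequence[:-1]), list(sequence[1:])


if __name__ == "__main__":
    seq = build_sequence([5, 7], [[1, 2, 3], [4, 5, 6]])
    inputs, labels = shift_for_next_token(seq)
    print(len(seq), inputs[:4], labels[:4])
```

Under these assumed settings, two condition tokens and two audio frames already yield a ten-token sequence, and the audio part grows as frames times codebooks. That growth is the sequence-length problem the abstract says the multi-scale Transformer is meant to address; how UniAudio actually arranges and models these tokens is not specified in the abstract.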

Code Repositories

yangdongchao/uniaudio
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
text-to-music-generation-on-musiccaps | UniAudio | FAD: 3.65; KL_passt: 1.87
