3 months ago

THE DCASE 2021 CHALLENGE TASK 6 SYSTEM: AUTOMATED AUDIO CAPTIONING WITH WEAKLY SUPERVISED PRE-TRAINING AND WORD SELECTION METHODS

Zhen Yang, Xiang Li, Dong Liu, Qichen Han∗, Weiqiang Yuan∗

Abstract

This technical report describes our system submitted to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge, Task 6: automated audio captioning. We use an encoder-decoder modeling framework for audio understanding and caption generation. Our solution focuses on two problems in automated audio captioning: data insufficiency and word selection indeterminacy. Because few audio clips with gold captions are available, we collect a large-scale weakly labeled dataset from the Web using heuristic methods. We then pre-train the encoder-decoder models on this dataset and fine-tune them on the Clotho dataset. To address the word selection indeterminacy problem, we use keywords extracted from the captions of similar audio clips, together with audio event tags produced by pre-trained models, to guide word generation in the decoding stage. We tested our submissions on the development-testing dataset. Our best submission achieved a SPIDEr score of 31.8, compared with 5.4 for the baseline system.
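The abstract does not say how keywords steer decoding, so the following is only a minimal sketch of one plausible reading: extract frequent words from the captions of similar audio clips, then add a score bonus to those words when choosing the next token. The function names, the stopword list, the bonus value, and the toy scores are all assumptions made for illustration and are not taken from the report.

```python
from collections import Counter

# Hypothetical stopword list; the report does not say how keywords are filtered.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "and", "to", "with"}


def extract_keywords(similar_captions, top_k=3):
    """Return the top_k most frequent non-stopword tokens from the captions."""
    counts = Counter(
        word
        for caption in similar_captions
        for word in caption.lower().split()
        if word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(top_k)]


def pick_word(scores, keywords, bonus=1.0):
    """Pick the highest-scoring word after adding a bonus to keyword candidates."""
    biased = {
        word: score + (bonus if word in keywords else 0.0)
        for word, score in scores.items()
    }
    return max(biased, key=biased.get)


# Captions of clips assumed to be acoustically similar to the input (made up).
similar_captions = [
    "a dog is barking in the distance",
    "a dog is barking while birds sing",
    "birds sing and a dog growls",
]
keywords = extract_keywords(similar_captions, top_k=3)

# Toy decoder scores for a single decoding step (made-up numbers).
step_scores = {"animal": 2.0, "dog": 1.6, "car": 0.5}
chosen = pick_word(step_scores, keywords)
```

In this toy setup the unbiased decoder would prefer the generic word "animal", while the keyword bonus shifts the choice to the more specific "dog". A real system would apply such a bias to model logits inside beam search and could also include audio event tags in the keyword set.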

Benchmarks

Benchmark | Methodology | Metrics
audio-captioning-on-clotho | Ensemble | CIDEr: 0.400; SPICE: 0.137; SPIDEr: 0.318
