Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution

Nam Hyeonuk ; Park Yong-Hwa

Abstract

Frequency dynamic convolution (FDY conv) has been a milestone in the sound event detection (SED) field, but it involves a substantial increase in model size due to multiple basis kernels. In this work, we propose partial frequency dynamic convolution (PFD conv), which concatenates the outputs of a conventional 2D convolution and an FDY conv as static and dynamic branches, respectively. PFD-CRNN, with the dynamic branch producing one eighth of the output, has 51.9% fewer parameters than FDY-CRNN while retaining its performance. Additionally, we propose multi-dilated frequency dynamic convolution (MDFD conv), which integrates multiple dilated frequency dynamic convolution (DFD conv) branches with different dilation size sets and a static branch within a single convolution layer. The best resulting MDFD-CRNN, with five non-dilated FDY conv branches, three differently dilated DFD conv branches and a static branch, achieved a 3.17% improvement in polyphonic sound detection score (PSDS) over FDY conv without a class-wise median filter. Applying sound event bounding boxes as post-processing to the best MDFD-CRNN achieved a true PSDS1 of 0.485, which is the state-of-the-art score on the DESED dataset without external datasets or pretrained models. From extensive ablation studies, we found that not only multiple dynamic branches but also a specific proportion of static branch helps SED. In addition, non-dilated dynamic branches are needed alongside dilated dynamic branches to obtain optimal SED performance. The results and discussion of the ablation studies further enhance the understanding and usability of FDY conv variants.
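
The parameter saving from splitting a layer into static and dynamic branches can be illustrated with a rough per-layer weight count. The sketch below is not taken from the paper or its repository: the layer sizes, the 3x3 kernel, and the choice of four basis kernels are assumptions made for illustration, and the attention module that mixes the basis kernels is ignored. The per-layer figure it prints is therefore not expected to match the 51.9% model-wide reduction reported above, which covers parameters this count leaves out.

```python
def conv_weights(c_in, c_out, kernel=3):
    """Weight count of a plain 2D convolution (bias ignored)."""
    return c_in * c_out * kernel * kernel


def fdy_conv_weights(c_in, c_out, n_basis=4, kernel=3):
    """Basis-kernel weights of a dynamic convolution with n_basis kernels.

    Assumption: the attention module that weights the basis kernels is
    left out, so this is a lower bound on the real layer size.
    """
    return n_basis * conv_weights(c_in, c_out, kernel)


def pfd_conv_weights(c_in, c_out, dynamic_ratio=1 / 8, n_basis=4, kernel=3):
    """Static branch plus dynamic branch, concatenated along output channels."""
    c_dyn = int(c_out * dynamic_ratio)   # channels produced by the dynamic branch
    c_static = c_out - c_dyn             # channels produced by the static branch
    static = conv_weights(c_in, c_static, kernel)
    dynamic = fdy_conv_weights(c_in, c_dyn, n_basis, kernel)
    return static + dynamic


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out = 64, 128
full = fdy_conv_weights(c_in, c_out)
partial = pfd_conv_weights(c_in, c_out)
reduction = 1 - partial / full

print(f"fully dynamic layer: {full} weights")
print(f"partial (1/8 dynamic) layer: {partial} weights")
print(f"per-layer reduction: {reduction:.1%}")
```

Under these assumptions the static branch costs 7/8 of one plain kernel and the dynamic branch costs 4/8 of one, so the layer holds 11/32 of the weights of a fully dynamic layer with four basis kernels.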

Code Repositories

frednam93/MDFD-SED (official, PyTorch)

Benchmarks

Benchmark                        Methodology       Metrics
sound-event-detection-on-desed   ABC + MDFD-CRNN   PSDS1: 0.577
sound-event-detection-on-desed   MDFD-CRNN         PSDS1: 0.485
