STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen

Abstract

This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset for sound event localization and detection, comprising spatial recordings of real scenes collected in various interiors at two different sites. The dataset is captured with a high-resolution spherical microphone array and delivered in two 4-channel formats: first-order Ambisonics and tetrahedral microphone array. Sound events in the dataset belonging to 13 target sound classes are annotated both temporally and spatially through a combination of human annotation and optical tracking. The dataset serves as the development and evaluation dataset for Task 3 of the DCASE2022 Challenge on Sound Event Localization and Detection and introduces significant new challenges for the task compared to the previous iterations, which were based on synthetic spatialized sound scene recordings. Dataset specifications are detailed, including the recording and annotation process, the target classes and their presence, and the development and evaluation splits. Additionally, the report presents the baseline system that accompanies the dataset in the challenge, with emphasis on the differences from the baseline of the previous iterations; namely, the introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional improved input features for the microphone array format. Results of the baseline indicate that, with a suitable training strategy, reasonable detection and localization performance can be achieved on real sound scene recordings. The dataset is available at https://zenodo.org/record/6387880.
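Each recording in the dataset is paired with a metadata file listing the annotated events. Below is a minimal loading sketch, assuming the DCASE-style CSV layout of frame index (100 ms resolution), class index, source index, azimuth and elevation in degrees; the helper name is hypothetical and the exact column order should be verified against the Zenodo release.

```python
import csv
import soundfile as sf  # third-party: pysoundfile

def load_starss22_clip(audio_path, metadata_path):
    """Load one STARSS22 recording (FOA or MIC format, 4 channels) with its annotations.

    Assumes the DCASE-style metadata columns:
    frame index (100 ms hops), class index, source index, azimuth (deg), elevation (deg).
    """
    audio, sr = sf.read(audio_path)  # audio shape: (num_samples, 4)
    events = []
    with open(metadata_path, newline="") as f:
        for row in csv.reader(f):
            frame, cls, src = int(row[0]), int(row[1]), int(row[2])
            azimuth, elevation = float(row[3]), float(row[4])
            events.append({
                "time_s": frame * 0.1,  # 100 ms annotation resolution
                "class": cls,
                "source": src,
                "azimuth": azimuth,
                "elevation": elevation,
            })
    return audio, sr, events
```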

Code Repositories

prerak23/dir_srcmic_doa (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark: sound-event-localization-and-detection-on-1

Methodology: Baseline (FOA)
  Class-dependent localization error: 29.3
  Class-dependent localization recall: 46
  Location-dependent error rate (20°): 71
  Location-dependent F1-score (macro): 21
  Location-dependent F1-score (micro): 0.36

Methodology: Baseline (MIC)
  Class-dependent localization error: 32.2
  Class-dependent localization recall: 47
  Location-dependent F1-score (macro): 18
  Location-dependent F1-score (micro): 0.36
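The location-dependent metrics above count a prediction as correct only if its class matches the reference and its direction of arrival lies within 20° of the reference direction; the class-dependent localization error and recall are computed over class-matched detections. A minimal sketch of the angular-distance check underlying the 20° threshold follows (the full metric implementation in the challenge toolkit is more involved):

```python
import numpy as np

def angular_distance_deg(azi_ref, ele_ref, azi_est, ele_est):
    """Great-circle angle in degrees between two directions given as azimuth/elevation in degrees."""
    azi_ref, ele_ref, azi_est, ele_est = np.radians([azi_ref, ele_ref, azi_est, ele_est])
    # Spherical law of cosines, clipped for numerical safety.
    cos_angle = (np.sin(ele_ref) * np.sin(ele_est)
                 + np.cos(ele_ref) * np.cos(ele_est) * np.cos(azi_ref - azi_est))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# A class-correct prediction counts as a true positive for the
# location-dependent F1-score only if this angle is <= 20 degrees.
print(angular_distance_deg(30.0, 10.0, 45.0, 10.0))  # ~14.8 deg, within the threshold
```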
