Improved Soccer Action Spotting using both Audio and Video Streams

Vanderplaetse Bastien ; Dupont Stéphane

Abstract

In this paper, we present a study on multi-modal (audio and video) action spotting and classification in soccer videos. Action spotting and classification are the tasks of finding the temporal anchors of events in a video and determining which events they are. This is an important application of general activity understanding. Here, we propose an experimental study on combining audio and video information at different stages of deep neural network architectures. We used the SoccerNet benchmark dataset, which contains annotated events for 500 soccer game videos from the Big Five European leagues. Through this work, we evaluated several ways to integrate the audio stream into video-only architectures. We observed an average absolute improvement in the mean Average Precision (mAP) metric of $7.43\%$ for the action classification task and of $4.19\%$ for the action spotting task.
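To make "combining audio and video information at different stages" concrete, the sketch below contrasts two generic fusion points: concatenating features before a shared classifier (early fusion) and averaging the per-class scores of two separate classifiers (late fusion). It is only an illustration under stated assumptions, not the architectures evaluated in the paper; the function names, the toy feature sizes, and the `audio_weight` parameter are all hypothetical.

```python
from typing import List


def early_fusion(video_feat: List[float], audio_feat: List[float]) -> List[float]:
    """Early fusion (illustrative): concatenate the two feature vectors
    so that a single downstream classifier sees both modalities."""
    return list(video_feat) + list(audio_feat)


def late_fusion(
    video_scores: List[float],
    audio_scores: List[float],
    audio_weight: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> List[float]:
    """Late fusion (illustrative): weighted average of the per-class scores
    produced by two independent, single-modality classifiers."""
    if len(video_scores) != len(audio_scores):
        raise ValueError("both classifiers must score the same set of classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return [
        (1.0 - audio_weight) * v + audio_weight * a
        for v, a in zip(video_scores, audio_scores)
    ]


if __name__ == "__main__":
    # Toy inputs: a 2-dimensional video feature and a 1-dimensional audio feature.
    print(early_fusion([1.0, 2.0], [3.0]))
    # Toy per-class scores from a video-only and an audio-only classifier.
    print(late_fusion([0.2, 0.8], [0.6, 0.4]))
```

Fusing early lets the classifier learn cross-modal interactions, while fusing late keeps the two streams independent until the final decision; intermediate fusion points inside the network sit between these two extremes.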

Benchmarks

Benchmark	Methodology	Metrics
action-spotting-on-soccernet	AudioVid (Vanderplaetse et al.)	Average-mAP: 56.0
