MARS: Motion-Augmented RGB Stream for Action Recognition

Cordelia Schmid, Karteek Alahari, Philippe Weinzaepfel, Nieves Crasto

Abstract

Most state-of-the-art methods for action recognition consist of a two-stream architecture with 3D convolutions: an appearance stream for RGB frames and a motion stream for optical flow frames. Although combining flow with RGB improves the performance, the cost of computing accurate optical flow is high, and increases action recognition latency. This limits the usage of two-stream approaches in real-world applications requiring low latency. In this paper, we introduce two learning approaches to train a standard 3D CNN, operating on RGB frames, that mimics the motion stream, and as a result avoids flow computation at test time. First, by minimizing a feature-based loss compared to the Flow stream, we show that the network reproduces the motion stream with high fidelity. Second, to leverage both appearance and motion information effectively, we train with a linear combination of the feature-based loss and the standard cross-entropy loss for action recognition. We denote the stream trained using this combined loss as Motion-Augmented RGB Stream (MARS). As a single stream, MARS performs better than RGB or Flow alone, for instance with 72.7% accuracy on Kinetics compared to 72.0% and 65.6% with RGB and Flow streams respectively.
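
The combined objective described in the abstract can be sketched for a single training sample as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not specify the distance used for the feature-based loss, so mean squared error is assumed here, and the names `feature_loss`, `combined_loss`, and the weight `alpha` are hypothetical. The Flow-stream features are treated as fixed targets.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def feature_loss(rgb_feats, flow_feats):
    """Feature-based loss between the RGB-input network and the Flow stream.

    Assumption: mean squared error; the abstract does not name the distance.
    """
    if len(rgb_feats) != len(flow_feats):
        raise ValueError("feature vectors must have the same length")
    return sum((r - f) ** 2 for r, f in zip(rgb_feats, flow_feats)) / len(rgb_feats)


def combined_loss(logits, label, rgb_feats, flow_feats, alpha=1.0):
    """Linear combination of cross-entropy and the feature-based loss.

    `alpha` is a hypothetical weight on the feature term.
    """
    return cross_entropy(logits, label) + alpha * feature_loss(rgb_feats, flow_feats)


if __name__ == "__main__":
    # Toy values: 3 action classes, 4-dimensional features.
    loss = combined_loss(
        logits=[2.0, 0.5, -1.0],
        label=0,
        rgb_feats=[0.1, 0.4, 0.3, 0.9],
        flow_feats=[0.0, 0.5, 0.3, 1.0],
        alpha=0.5,
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch, using only `feature_loss` corresponds to the first approach (mimicking the motion stream), while `combined_loss` corresponds to the second approach that the abstract calls MARS. At test time only the RGB-input network would be evaluated, so no optical flow is needed.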

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| action-classification-on-kinetics-400 | MARS+RGB+Flow (64 frames) | Acc@1: 74.9 |
| action-classification-on-kinetics-400 | MARS+RGB+Flow (16 frames) | Acc@1: 68.9 |
| action-classification-on-minikinetics | MARS+RGB+Flow (16 frames) | Top-1 Accuracy: 73.5 |
| action-recognition-in-videos-on-hmdb-51 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | Average accuracy of 3 splits: 80.9 |
| action-recognition-in-videos-on-something-1 | MARS+RGB+Flow (16 frames, Kinetics pretrained) | Top-1 Accuracy: 40.4 |
| action-recognition-in-videos-on-something-1 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | Top-1 Accuracy: 53.0 |
| action-recognition-in-videos-on-ucf101 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | 3-fold Accuracy: 97.8 |
| action-recognition-in-videos-on-ucf101 | MARS+RGB+Flow (16 frames) | 3-fold Accuracy: 95.8 |
