4 months ago

Cooperative Cross-Stream Network for Discriminative Action Representation

Jingran Zhang; Fumin Shen; Xing Xu; Heng Tao Shen

Abstract

Spatial and temporal two-stream models have achieved great success in video action recognition. Most existing works focus on designing effective feature fusion methods and train the two streams separately. However, this makes it hard to ensure discriminability and to exploit complementary information between the streams. In this work, we propose a novel cooperative cross-stream network that investigates the conjoint information in multiple modalities. Features are extracted jointly from the spatial and temporal stream networks in an end-to-end manner. A connection block, which explores correlations between the features of different streams, extracts the complementary information of the different modalities. Furthermore, unlike a conventional ConvNet that learns deep separable features with only a cross-entropy loss, our model enhances the discriminative power of the deeply learned features and reduces the undesired modality discrepancy by jointly optimizing a modality ranking constraint and a cross-entropy loss for both homogeneous and heterogeneous modalities. The modality ranking constraint consists of an intra-modality discriminative embedding and an inter-modality triplet constraint, and it reduces both intra-modality and cross-modality feature variations. Experiments on three benchmark datasets demonstrate that, by making appearance and motion feature extraction cooperate, our method achieves state-of-the-art or competitive performance compared with existing results.
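
The abstract does not give the exact form of the objective, so the following is only a minimal sketch of how a cross-entropy loss for two streams might be combined with an inter-modality triplet term. The generic hinge-style triplet loss, the squared Euclidean distance, the `weight` and `margin` parameters, and all function names are assumptions made for illustration; they are not taken from the paper, and the intra-modality discriminative embedding is not modelled here.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_hinge(anchor, positive, negative, margin=1.0):
    """Generic triplet hinge loss (an assumed form, not the paper's).

    Pulls the anchor toward the positive and pushes it away from the
    negative until the gap between the two distances reaches `margin`.
    """
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def joint_loss(logits_rgb, logits_flow, label,
               anchor_rgb, pos_flow, neg_flow,
               weight=0.5, margin=1.0):
    """Hypothetical joint objective: per-stream cross-entropy plus a
    cross-modality triplet term whose anchor comes from the appearance
    stream and whose positive/negative come from the motion stream."""
    ce = cross_entropy(logits_rgb, label) + cross_entropy(logits_flow, label)
    rank = triplet_hinge(anchor_rgb, pos_flow, neg_flow, margin)
    return ce + weight * rank


if __name__ == "__main__":
    # Toy features: the same-class flow feature is close to the RGB anchor,
    # the other-class flow feature is far away, so the triplet term vanishes.
    loss = joint_loss(
        logits_rgb=[2.0, 0.1], logits_flow=[1.5, 0.3], label=0,
        anchor_rgb=[0.0, 0.0], pos_flow=[0.1, 0.0], neg_flow=[3.0, 4.0],
    )
    print(f"joint loss: {loss:.4f}")
```

In this sketch the triplet term is zero once features of the same class from different modalities are closer than features of different classes by at least the margin, which is one way to read the stated goal of reducing cross-modality feature variation.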

Benchmarks

Benchmark | Methodology | Metrics
action-recognition-in-videos-on-hmdb-51 | CCS + TSN (ImageNet+Kinetics pretrained) | Average accuracy of 3 splits: 81.9
action-recognition-in-videos-on-something | CCS + two-stream + TRN | Top-1 Accuracy: 61.2; Top-5 Accuracy: 89.3
action-recognition-in-videos-on-ucf101 | CCS + TSN (ImageNet+Kinetics pretrained) | 3-fold Accuracy: 97.4
