Shafkat Farabi; Hasibul Himel; Fakhruddin Gazzali; Md. Bakhtiar Hasan; Md. Hasanul Kabir; Moshiur Farazi

Abstract
Action quality assessment (AQA) aims at automatically judging a human action based on a video of the said action and assigning a performance score to it. The majority of works in the existing literature on AQA divide RGB videos into short clips, transform these clips into higher-level representations using Convolutional 3D (C3D) networks, and aggregate them through averaging. These higher-level representations are used to perform AQA. We find that the current clip-level feature aggregation technique of averaging is insufficient to capture the relative importance of clip-level features. In this work, we propose a learning-based weighted-averaging technique. Using this technique, better performance can be obtained without sacrificing too many computational resources. We call this technique Weight-Decider (WD). We also experiment with ResNets for learning better representations for action quality assessment. We assess the effects of the depth and input clip size of the convolutional neural network on the quality of action score predictions. We achieve a new state-of-the-art Spearman's rank correlation of 0.9315 (an increase of 0.45%) on the MTL-AQA dataset using a 34-layer (2+1)D ResNet capable of processing 32-frame clips, with WD aggregation.
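To make the aggregation idea concrete, below is a minimal PyTorch sketch of a learning-based weighted averaging module in the spirit of the Weight-Decider described in the abstract. The class names, MLP sizes, and softmax-over-clips normalization are illustrative assumptions rather than the authors' published architecture; only the overall idea of predicting aggregation weights from clip-level features instead of plain averaging follows the abstract.

```python
# Hedged sketch of weighted clip-feature aggregation for AQA.
# Layer sizes and the softmax normalization are assumptions for illustration.
import torch
import torch.nn as nn


class WeightDecider(nn.Module):
    """Predicts per-clip, per-dimension weights from clip features and
    aggregates the clips by a normalized weighted average."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # Small MLP mapping a clip feature to unnormalized weights.
        self.weight_net = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 4),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim // 4, feat_dim),
        )

    def forward(self, clip_feats: torch.Tensor) -> torch.Tensor:
        # clip_feats: (batch, num_clips, feat_dim)
        weights = self.weight_net(clip_feats)      # (B, N, D)
        weights = torch.softmax(weights, dim=1)    # normalize across clips
        return (weights * clip_feats).sum(dim=1)   # (B, D) aggregated feature


class AQAHead(nn.Module):
    """Regresses a scalar quality score from the aggregated video feature."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.aggregate = WeightDecider(feat_dim)
        self.regressor = nn.Linear(feat_dim, 1)

    def forward(self, clip_feats: torch.Tensor) -> torch.Tensor:
        video_feat = self.aggregate(clip_feats)
        return self.regressor(video_feat).squeeze(-1)  # (B,) predicted scores


if __name__ == "__main__":
    # Example: 2 videos, each split into 8 clips whose 512-dim features would
    # come from a backbone such as a (2+1)D ResNet.
    feats = torch.randn(2, 8, 512)
    model = AQAHead(feat_dim=512)
    print(model(feats).shape)  # torch.Size([2])
```

In contrast, the averaging baseline discussed in the abstract corresponds to replacing the weighted sum with `clip_feats.mean(dim=1)`, which treats every clip as equally important.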
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| action-quality-assessment-on-mtl-aqa | ResNet34-(2+1)D-WD | Spearman Correlation: 93.15 |