Deep Hierarchical Representation of Point Cloud Videos via Spatio-Temporal Decomposition
Hehe Fan; Xin Yu; Yi Yang; Mohan Kankanhalli
Abstract
In point cloud videos, point coordinates are irregular and unordered but point timestamps exhibit regularities and order. Grid-based networks for conventional video processing cannot be directly used to model raw point cloud videos. Therefore, in this work, we propose a point-based network that directly handles raw point cloud videos. First, to preserve the spatio-temporal local structure of point cloud videos, we design a point tube covering a local range along spatial and temporal dimensions. By progressively subsampling frames and points and enlarging the spatial radius as the point features are fed into higher-level layers, the point tube can capture video structure in a spatio-temporally hierarchical manner. Second, to reduce the impact of the spatial irregularity on temporal modeling, we decompose space and time when extracting point tube representations. Specifically, a spatial operation is employed to capture the local structure of each spatial region in a tube and a temporal operation is used to model the dynamics of the spatial regions along the tube.
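The decomposition described above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the spatial operation here is a plain max-pool over neighbors within a radius, the temporal operation is a fixed weighted sum over the frames of the tube, points are subsampled by simple striding, and frame boundaries are handled by clamping. All function names, parameters, and the toy data are hypothetical choices made for illustration only.

```python
import math

def ball_query(anchor, points, radius):
    """Indices of points within `radius` of `anchor` (Euclidean distance)."""
    return [i for i, p in enumerate(points) if math.dist(anchor, p) <= radius]

def spatial_op(anchor, points, feats, radius):
    """Order-invariant summary of one spatial region: channel-wise max-pool.
    (Assumption: a stand-in for whatever learned spatial operation is used.)"""
    idx = ball_query(anchor, points, radius)
    dim = len(feats[0])
    if not idx:
        return [0.0] * dim  # empty region: fall back to a zero feature
    return [max(feats[i][c] for i in idx) for c in range(dim)]

def temporal_op(region_feats, weights):
    """Model dynamics along the tube: weighted sum over the ordered frames.
    (Assumption: fixed weights instead of learned ones.)"""
    dim = len(region_feats[0])
    return [sum(w * f[c] for w, f in zip(weights, region_feats)) for c in range(dim)]

def point_tube(video, feats, t, anchor, radius, temporal_radius, weights):
    """Feature of the tube centered at `anchor` in frame `t`."""
    region_feats = []
    for k in range(t - temporal_radius, t + temporal_radius + 1):
        k = min(max(k, 0), len(video) - 1)  # clamp at video boundaries (assumption)
        region_feats.append(spatial_op(anchor, video[k], feats[k], radius))
    return temporal_op(region_feats, weights)

def tube_layer(video, feats, radius, temporal_radius,
               temporal_stride, point_stride, weights):
    """One hierarchical layer: subsample frames and points, then encode tubes."""
    out_video, out_feats = [], []
    for t in range(0, len(video), temporal_stride):
        anchors = video[t][::point_stride]  # strided subsampling (assumption)
        out_video.append(anchors)
        out_feats.append([
            point_tube(video, feats, t, a, radius, temporal_radius, weights)
            for a in anchors
        ])
    return out_video, out_feats

# Toy video: 4 frames, 4 points per frame, drifting slowly along x.
video = [[(x + 0.1 * t, float(y), 0.0) for x in range(2) for y in range(2)]
         for t in range(4)]
feats = [[[1.0] for _ in frame] for frame in video]  # one constant channel
weights = [0.25, 0.5, 0.25]  # temporal weights for a tube spanning 3 frames

# Two stacked layers; the spatial radius is enlarged at the higher layer.
v1, f1 = tube_layer(video, feats, 0.5, 1, 2, 2, weights)
v2, f2 = tube_layer(v1, f1, 1.0, 1, 2, 2, weights)
print(len(v1), len(v1[0]), len(v2), len(v2[0]))
```

Keeping `spatial_op` and `temporal_op` as separate functions mirrors the stated motivation: the unordered, irregular spatial neighborhood is reduced to a fixed-size feature first, so the temporal step only has to deal with an ordered sequence of frames.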
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-action-recognition-on-ntu-rgb-d-1 | PSTNet++ | Cross Subject Accuracy: 91.4; Cross View Accuracy: 96.7 |