Sibo Song
Deep Adaptive Temporal Pooling for Activity Recognition
Song, Sibo; Cheung, Ngai-Man; Chandrasekhar, Vijay; Mandal, Bappaditya
Abstract
Deep neural networks have recently achieved competitive accuracy for human activity recognition. However, there is room for improvement, especially in modeling long-term temporal importance and in determining the activity relevance of different temporal segments in a video. To address this problem, we propose a learnable and differentiable module: Deep Adaptive Temporal Pooling (DATP). DATP applies a self-attention mechanism to adaptively pool the classification scores of different video segments. Specifically, using frame-level features, DATP regresses the importance of different temporal segments and generates weights for them. Remarkably, DATP is trained using only the video-level activity class label; no additional supervision is needed. We conduct extensive experiments to investigate various input features and different weight models. Experimental results show that DATP can learn to assign large weights to key video segments. More importantly, DATP can improve training of the frame-level feature extractor, because relevant temporal segments are assigned large weights during back-propagation. Overall, we achieve state-of-the-art performance on the UCF101, HMDB51 and Kinetics datasets.
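The pooling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the weight model, so this example assumes a single linear regressor on segment features followed by a softmax over segments. The function name `adaptive_temporal_pool`, the parameters `w` and `b`, and the toy dimensions are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def adaptive_temporal_pool(segment_features, segment_scores, w, b=0.0):
    """Pool per-segment class scores with feature-dependent weights.

    segment_features: array of shape (T, D), one feature vector per segment.
    segment_scores:   array of shape (T, C), one class-score vector per segment.
    w, b:             parameters of an assumed linear importance regressor.
    Returns the pooled video-level scores (C,) and the segment weights (T,).
    """
    importance = segment_features @ w + b      # (T,) one scalar per segment
    weights = softmax(importance)              # (T,) non-negative, sums to 1
    video_scores = weights @ segment_scores    # (C,) weighted sum over segments
    return video_scores, weights


# Toy data: 5 segments, 8-dimensional features, 3 activity classes.
rng = np.random.default_rng(0)
T, D, C = 5, 8, 3
features = rng.normal(size=(T, D))
scores = rng.normal(size=(T, C))
w = rng.normal(size=D)

video_scores, weights = adaptive_temporal_pool(features, scores, w)
```

Because every operation here is differentiable, a loss computed on `video_scores` against the video-level label alone would propagate gradients to both the importance regressor and the per-segment scores, scaled by each segment's weight. With `w` set to zero the weights become uniform and the sketch reduces to plain average pooling.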
Citation
Song, S., Cheung, N.-M., Chandrasekhar, V., & Mandal, B. (2018, October). Deep Adaptive Temporal Pooling for Activity Recognition. Presented at MM '18: ACM Multimedia Conference, Seoul, Republic of Korea.
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | MM '18: ACM Multimedia Conference |
Start Date | Oct 22, 2018 |
End Date | Oct 26, 2018 |
Acceptance Date | Jul 1, 2018 |
Online Publication Date | Oct 15, 2018 |
Publication Date | Oct 21, 2018 |
Publicly Available Date | May 26, 2023 |
Publisher | Association for Computing Machinery (ACM) |
ISBN | 978-1-4503-5665-7 |
DOI | https://doi.org/10.1145/3240508.3240713 |
Keywords | Human activity recognition, adaptive temporal pooling |
Public URL | https://keele-repository.worktribe.com/output/412083 |
Publisher URL | http://doi.org/10.1145/3240508.3240713 |
Files
Paper25Jul2018.pdf
(3 Mb)
PDF