Sibo Song
Deep Adaptive Temporal Pooling for Activity Recognition
Song, Sibo; Cheung, Ngai-Man; Chandrasekhar, Vijay; Mandal, Bappaditya
Abstract
Deep neural networks have recently achieved competitive accuracy for human activity recognition. However, there is room for improvement, especially in modeling long-term temporal importance and in determining the activity relevance of different temporal segments in a video. To address this problem, we propose a learnable and differentiable module: Deep Adaptive Temporal Pooling (DATP). DATP applies a self-attention mechanism to adaptively pool the classification scores of different video segments. Specifically, using frame-level features, DATP regresses the importance of different temporal segments and generates weights for them. Remarkably, DATP is trained using only the video-level activity class label; no additional supervision is needed. We conduct extensive experiments to investigate various input features and different weight models. Experimental results show that DATP can learn to assign large weights to key video segments. More importantly, DATP can improve training of the frame-level feature extractor, because relevant temporal segments are assigned large weights during back-propagation. Overall, we achieve state-of-the-art performance on the UCF101, HMDB51 and Kinetics datasets.
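The pooling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear importance model (`weight_vector`), the softmax normalization, and the function names are assumptions chosen for illustration. It shows only the general idea of regressing one weight per segment from its features and using those weights to pool segment-level class scores into a video-level score.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_temporal_pool(segment_scores, segment_features, weight_vector):
    """Pool per-segment class scores with weights regressed from features.

    segment_scores:   list of T lists, each holding C class scores.
    segment_features: list of T lists, each holding D feature values.
    weight_vector:    list of D floats; a hypothetical linear importance
                      model (an assumption, not taken from the paper).
    """
    # Regress one importance logit per segment from its features.
    logits = [
        sum(w * f for w, f in zip(weight_vector, feats))
        for feats in segment_features
    ]
    # Normalize so the segment weights are positive and sum to 1.
    weights = softmax(logits)
    num_classes = len(segment_scores[0])
    # Video-level score per class: weighted sum over segments.
    pooled = [
        sum(weights[t] * segment_scores[t][c] for t in range(len(weights)))
        for c in range(num_classes)
    ]
    return pooled, weights


# Three segments, two classes; the first segment has the largest feature.
scores = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
features = [[5.0], [0.0], [0.0]]
pooled, weights = adaptive_temporal_pool(scores, features, [1.0])
print(weights, pooled)
```

In this sketch, segments with identical features receive equal weights, so the pooling reduces to plain averaging; a segment with a larger importance logit dominates the video-level score. Because every step is differentiable, a framework with automatic differentiation could in principle train such weights from the video-level label alone, which is the property the abstract highlights.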
Citation
Song, S., Cheung, N., Chandrasekhar, V., & Mandal, B. (2018). Deep Adaptive Temporal Pooling for Activity Recognition. https://doi.org/10.1145/3240508.3240713
Field | Value |
---|---|
Conference Name | MM '18: ACM Multimedia Conference |
Conference Location | Seoul, Republic of Korea |
Start Date | Oct 22, 2018 |
End Date | Oct 26, 2018 |
Acceptance Date | Jul 1, 2018 |
Online Publication Date | Oct 15, 2018 |
Publication Date | Oct 21, 2018 |
Publicly Available Date | May 26, 2023 |
Publisher | Association for Computing Machinery (ACM) |
ISBN | 978-1-4503-5665-7 |
DOI | https://doi.org/10.1145/3240508.3240713 |
Keywords | Human activity recognition, adaptive temporal pooling |
Publisher URL | http://doi.org/10.1145/3240508.3240713 |
Files
Paper25Jul2018.pdf (3 MB, PDF)