Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition
Song, Sibo; Chandrasekhar, Vijay; Mandal, Bappaditya; Li, Liyuan; Lim, Joo-Hwee; Babu, Giduthuri Sateesh; San, Phyo Phyo; Cheung, Ngai-Man
Authors
Sibo Song
Vijay Chandrasekhar
Dr Bappaditya Mandal b.mandal@keele.ac.uk
Liyuan Li
Joo-Hwee Lim
Giduthuri Sateesh Babu
Phyo Phyo San
Ngai-Man Cheung
Abstract
In this paper, we propose a multimodal multi-stream deep learning framework for egocentric activity recognition that uses both video and sensor data. First, we experiment with and extend a multi-stream Convolutional Neural Network to learn spatial and temporal features from egocentric videos. Second, we propose a multi-stream Long Short-Term Memory architecture to learn features from multiple sensor streams (accelerometer, gyroscope, etc.). Third, we propose a two-level fusion technique and experiment with different pooling techniques to compute the prediction results. Experimental results on a multimodal egocentric dataset show that the proposed method achieves very encouraging performance, even though existing egocentric datasets remain quite limited in scale.
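The two-level fusion with pooling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say what the two levels are, so this sketch assumes that level one pools the class scores of the streams within each modality and level two pools the resulting per-modality scores. The function names (`pool`, `two_level_fusion`, `predict`), the choice of average and max pooling, and all score values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Scores = List[float]


def pool(score_lists: Sequence[Scores], method: str = "average") -> Scores:
    """Combine several per-class score vectors into one, class by class."""
    if not score_lists:
        raise ValueError("need at least one score vector")
    n_classes = len(score_lists[0])
    if any(len(s) != n_classes for s in score_lists):
        raise ValueError("all score vectors must have the same length")
    columns = list(zip(*score_lists))  # one tuple of stream scores per class
    if method == "average":
        return [sum(col) / len(col) for col in columns]
    if method == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling method: {method}")


def two_level_fusion(modalities: Dict[str, Sequence[Scores]],
                     stream_pool: str = "average",
                     modality_pool: str = "average") -> Scores:
    """Fuse class scores in two steps (an assumed reading of 'two-level')."""
    # Level 1 (assumption): pool the streams inside each modality.
    per_modality = [pool(streams, stream_pool) for streams in modalities.values()]
    # Level 2 (assumption): pool the per-modality scores into one vector.
    return pool(per_modality, modality_pool)


def predict(scores: Scores) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up scores over three activity classes, two streams per modality.
example = {
    "video": [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    "sensor": [[0.1, 0.7, 0.2], [0.3, 0.5, 0.2]],
}
fused = two_level_fusion(example)
label = predict(fused)
```

Swapping `stream_pool` or `modality_pool` to `"max"` is one way to compare pooling choices on the same scores. In a real pipeline the score vectors would come from the trained CNN and LSTM streams rather than from literals.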
Citation
Song, S., Chandrasekhar, V., Mandal, B., Li, L., Lim, J., Babu, G. S., …Cheung, N. (2016). Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). https://doi.org/10.1109/cvprw.2016.54
Conference Name | 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |
---|---|
Conference Location | Las Vegas, NV, USA |
Start Date | Jun 26, 2016 |
End Date | Jul 1, 2016 |
Online Publication Date | Dec 19, 2016 |
Publication Date | 2016-06 |
Deposit Date | Jun 14, 2023 |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
DOI | https://doi.org/10.1109/cvprw.2016.54 |