Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition

Song, Sibo; Chandrasekhar, Vijay; Mandal, Bappaditya; Li, Liyuan; Lim, Joo-Hwee; Babu, Giduthuri Sateesh; San, Phyo Phyo; Cheung, Ngai-Man

doi:10.1109/cvprw.2016.54

Authors

Sibo Song

Vijay Chandrasekhar

Dr Bappaditya Mandal b.mandal@keele.ac.uk

Liyuan Li

Joo-Hwee Lim

Giduthuri Sateesh Babu

Phyo Phyo San

Ngai-Man Cheung

Abstract

In this paper, we propose a multimodal multi-stream deep learning framework to tackle the egocentric activity recognition problem, using both video and sensor data. First, we experiment with and extend a multi-stream Convolutional Neural Network to learn the spatial and temporal features from egocentric videos. Second, we propose a multi-stream Long Short-Term Memory architecture to learn the features from multiple sensor streams (accelerometer, gyroscope, etc.). Third, we propose a two-level fusion technique and experiment with different pooling techniques to compute the prediction results. Experimental results on a multimodal egocentric dataset show that our proposed method can achieve very encouraging performance, despite the constraint that the scale of existing egocentric datasets is still quite limited.
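The two-level fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the fusion is implemented, so everything here is an assumption made for illustration: each stream is assumed to output a vector of class scores, the first level pools streams within a modality, the second level pools the two modalities, and the function names `pool_scores` and `two_level_fusion` are hypothetical. Average and max pooling are shown only as two common choices, not necessarily the ones evaluated in the paper.

```python
from statistics import fmean


def pool_scores(score_lists, method="average"):
    """Pool several class-score vectors into one vector, class by class."""
    if not score_lists:
        raise ValueError("need at least one score vector")
    n_classes = len(score_lists[0])
    if any(len(scores) != n_classes for scores in score_lists):
        raise ValueError("all score vectors must have the same length")
    # zip(*...) groups the scores of each class across all streams.
    per_class = list(zip(*score_lists))
    if method == "average":
        return [fmean(values) for values in per_class]
    if method == "max":
        return [max(values) for values in per_class]
    raise ValueError(f"unknown pooling method: {method}")


def two_level_fusion(video_streams, sensor_streams, method="average"):
    """Assumed scheme: pool within each modality, then across modalities."""
    # Level 1: fuse the streams of each modality separately.
    video_scores = pool_scores(video_streams, method)
    sensor_scores = pool_scores(sensor_streams, method)
    # Level 2: fuse the two modality-level score vectors.
    fused = pool_scores([video_scores, sensor_scores], method)
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


if __name__ == "__main__":
    # Made-up scores for three classes, purely for illustration.
    video = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]   # e.g. spatial, temporal
    sensor = [[0.2, 0.6, 0.2], [0.1, 0.8, 0.1]]  # e.g. accelerometer, gyroscope
    print(two_level_fusion(video, sensor, "average"))
    print(two_level_fusion(video, sensor, "max"))
```

Pooling within a modality first keeps a modality with many streams from dominating the final prediction, which is one plausible reason to fuse in two levels rather than pooling all streams at once.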

Citation

Song, S., Chandrasekhar, V., Mandal, B., Li, L., Lim, J., Babu, G. S., …Cheung, N. (2016). Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition. https://doi.org/10.1109/cvprw.2016.54

Conference Name	2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Conference Location	Las Vegas, NV, USA
Start Date	Jun 26, 2016
End Date	Jul 1, 2016
Online Publication Date	Dec 19, 2016
Publication Date	2016-06
Deposit Date	Jun 14, 2023
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
DOI	https://doi.org/10.1109/cvprw.2016.54
