
Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition

Song, Sibo; Chandrasekhar, Vijay; Mandal, Bappaditya; Li, Liyuan; Lim, Joo-Hwee; Babu, Giduthuri Sateesh; San, Phyo Phyo; Cheung, Ngai-Man

Authors

Sibo Song

Vijay Chandrasekhar

Bappaditya Mandal

Liyuan Li

Joo-Hwee Lim

Giduthuri Sateesh Babu

Phyo Phyo San

Ngai-Man Cheung



Abstract

In this paper, we propose a multimodal multi-stream deep learning framework to tackle the egocentric activity recognition problem, using both video and sensor data. First, we extend and experiment with a multi-stream Convolutional Neural Network (CNN) to learn spatial and temporal features from egocentric videos. Second, we propose a multi-stream Long Short-Term Memory (LSTM) architecture to learn features from multiple sensor streams (accelerometer, gyroscope, etc.). Third, we propose a two-level fusion technique and experiment with different pooling methods to compute the final prediction. Experimental results on a multimodal egocentric dataset show that the proposed method achieves very encouraging performance, despite the limited scale of existing egocentric datasets.
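As a rough illustration of the sensor branch and the fusion step described in the abstract, the following is a minimal PyTorch sketch. It is not the authors' implementation: the class names, stream dimensions, hidden size, and the mean-pooling score fusion are illustrative assumptions.

import torch
import torch.nn as nn

class MultiStreamLSTM(nn.Module):
    """Hypothetical sketch: one LSTM per sensor stream (e.g. accelerometer,
    gyroscope); the final hidden states are concatenated and classified."""
    def __init__(self, stream_dims, hidden_size=64, num_classes=20):
        super().__init__()
        self.lstms = nn.ModuleList(
            nn.LSTM(d, hidden_size, batch_first=True) for d in stream_dims
        )
        self.classifier = nn.Linear(hidden_size * len(stream_dims), num_classes)

    def forward(self, streams):
        # streams: list of tensors, each of shape (batch, time, dim_i)
        feats = []
        for lstm, x in zip(self.lstms, streams):
            _, (h, _) = lstm(x)   # h: (num_layers, batch, hidden)
            feats.append(h[-1])   # last layer's final hidden state
        return self.classifier(torch.cat(feats, dim=1))

def fuse(video_scores, sensor_scores):
    # Second-level fusion, assumed here to be mean pooling of the
    # per-modality class posteriors.
    return (video_scores.softmax(dim=1) + sensor_scores.softmax(dim=1)) / 2

# Example usage with made-up shapes: 8 clips, 100 time steps, 3-axis sensors.
acc = torch.randn(8, 100, 3)
gyr = torch.randn(8, 100, 3)
model = MultiStreamLSTM(stream_dims=[3, 3], hidden_size=64, num_classes=20)
sensor_scores = model([acc, gyr])   # (8, 20)

In this sketch each sensor stream keeps its own recurrent weights, so modalities with different dynamics are modeled independently before fusion; the video branch (the multi-stream CNN) would produce its own class scores to be combined by fuse.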

Citation

Song, S., Chandrasekhar, V., Mandal, B., Li, L., Lim, J., Babu, G. S., San, P. P., & Cheung, N. (2016). Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE. https://doi.org/10.1109/cvprw.2016.54

Conference Name: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Conference Location: Las Vegas, NV, USA
Start Date: Jun 26, 2016
End Date: Jul 1, 2016
Online Publication Date: Dec 19, 2016
Publication Date: Jun 2016
Deposit Date: Jun 14, 2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
DOI: https://doi.org/10.1109/cvprw.2016.54