Research Repository

Exploring online public survey lifestyle datasets with statistical analysis, machine learning and semantic ontology

Chatterjee, Ayan; Riegler, Michael A.; Johnson, Miriam Sinkerud; Das, Jishnu; Pahari, Nibedita; Ramachandra, Raghavendra; Ghosh, Bikramaditya; Saha, Arpan; Bajpai, Ram

Authors

Ayan Chatterjee

Michael A. Riegler

Miriam Sinkerud Johnson

Jishnu Das

Nibedita Pahari

Raghavendra Ramachandra

Bikramaditya Ghosh

Arpan Saha

Ram Bajpai


Abstract

Lifestyle diseases contribute significantly to the global health burden, and lifestyle factors play a crucial role in the development of depression. The COVID-19 pandemic intensified many determinants of depression. This study aimed to identify lifestyle and demographic factors associated with depression symptoms among Indians during the pandemic, focusing on a sample from Kolkata, India. An online public survey was conducted over three months via social media and email, gathering data from 1,834 participants, of whom 1,767 were retained after data cleaning. The survey consisted of 44 questions and was distributed anonymously to ensure privacy. Data were analyzed using statistical methods and machine learning, with principal component analysis (PCA) and analysis of variance (ANOVA) employed for feature selection. K-means clustering divided the pre-processed dataset into five clusters, and a support vector machine (SVM) with a linear kernel achieved 96% accuracy in a multi-class classification problem. The Local Interpretable Model-agnostic Explanations (LIME) algorithm provided local explanations for the SVM model's predictions. Additionally, an ontology written in the Web Ontology Language (OWL) enabled semantic representation of, and reasoning over, the survey data. The study presents a pipeline for collecting, analyzing, and representing data from online public surveys during the pandemic. The identified factors were correlated with depressive symptoms, illustrating the significant influence of lifestyle and demographic variables on mental health. The online survey method proved cost-effective and convenient for data collection and visualization while maintaining anonymity and reducing bias. Challenges included reaching the target population, overcoming language barriers, ensuring digital literacy, and mitigating dishonest responses and sampling errors. In conclusion, lifestyle and demographic factors significantly affect depression during the COVID-19 pandemic. The study's methodology offers insights into addressing mental health challenges through scalable online surveys, aiding the understanding and mitigation of depression risk factors.
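The abstract names two building blocks of the pipeline concretely: ANOVA-based feature screening and k-means clustering into five clusters. The sketch below is a minimal, dependency-free illustration of both in plain Python; it is not the authors' code, and the function names (`anova_f`, `kmeans`), the random initialisation, and the synthetic demo data are assumptions made for illustration. In practice a library such as scikit-learn (`sklearn.feature_selection.f_classif`, `sklearn.cluster.KMeans`) would more likely be used, and the PCA, linear SVM, LIME, and OWL steps are not reproduced here.

```python
import math
import random
from statistics import fmean

# Hypothetical helper names for illustration; not taken from the study.


def anova_f(groups):
    """One-way ANOVA F statistic for one numeric feature split into groups.

    A larger F means the group means differ more relative to the spread
    inside each group, i.e. the feature separates the groups better.
    """
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand_mean = fmean(values)
    means = [fmean(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: each value around its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        # Degenerate case: no spread inside any group.
        return math.inf if ss_between > 0 else math.nan
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([fmean(col) for col in zip(*members)] if members else centroids[c])
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return labels, centroids


if __name__ == "__main__":
    # Synthetic example only; these numbers are not from the study.
    print(anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
    rng = random.Random(42)
    centres = [(0, 0), (8, 0), (0, 8), (8, 8), (4, 16)]
    demo = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centres for _ in range(20)]
    labels, _ = kmeans(demo, k=5)  # five clusters, as in the abstract
    print(sorted(labels.count(c) for c in range(5)))
```

Ranking each survey feature by `anova_f` against a grouping variable and keeping the highest-scoring features is one simple form of ANOVA-based selection; the abstract does not state which grouping variable the study used. Random initialisation can leave k-means in a local optimum, which is why libraries typically rerun it from several starting points.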

Citation

Chatterjee, A., Riegler, M. A., Johnson, M. S., Das, J., Pahari, N., Ramachandra, R., …Bajpai, R. (2024). Exploring online public survey lifestyle datasets with statistical analysis, machine learning and semantic ontology. Scientific Reports, 14(1), 1-24. https://doi.org/10.1038/s41598-024-74539-6

Journal Article Type: Article
Acceptance Date: Sep 26, 2024
Online Publication Date: Oct 15, 2024
Publication Date: Oct 15, 2024
Deposit Date: Oct 25, 2024
Publicly Available Date: Oct 25, 2024
Journal: Scientific Reports
Electronic ISSN: 2045-2322
Publisher: Nature Publishing Group
Peer Reviewed: Yes
Volume: 14
Issue: 1
Article Number: 24190
Pages: 1-24
DOI: https://doi.org/10.1038/s41598-024-74539-6
Keywords: survey, datasets, COVID-19, depression, machine learning, semantics, LIME
Public URL: https://keele-repository.worktribe.com/output/952920

Files

Exploring online public survey lifestyle datasets with statistical analysis, machine learning and semantic ontology (2.8 MB)
Archive

Licence
https://creativecommons.org/licenses/by-nc-nd/4.0/

Publisher Licence URL
https://creativecommons.org/licenses/by-nc-nd/4.0/

Copyright Statement
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
