Didi Awovi Ahavi-Tete
Emotion Classification on Software Engineering Q&A Websites
Awovi Ahavi-Tete, Didi; Sangeeta, Sangeeta
Abstract
Background: With the rapid proliferation of question-and-answer websites for software
developers, such as Stack Overflow, there is an increasing need to discern developers’ emotions
from their posts in order to assess how these emotions influence their productivity, for example
their efficiency in bug fixing.
Aim: We aimed to develop a reliable emotion classification tool that accurately categorizes
emotions on Software Engineering (SE) websites. We used data augmentation techniques to address
the data scarcity problem, because previous research has shown that tools trained on other
domains can perform poorly when applied directly to the SE domain.
Method: We utilized four machine learning techniques, namely BERT, CodeBERT, RFC
(Random Forest Classifier), and LSTM. Taking an innovative approach to dataset augmentation,
we employed word substitution, back translation, and easy data augmentation (EDA) methods.
Using these, we developed sixteen unique emotion classification models: EmoClassBERT-Original,
EmoClassRFC-Original, EmoClassLSTM-Original, EmoClassCodeBERT-Original,
EmoClassLSTM-Substitution, EmoClassBERT-Substitution, EmoClassRFC-Substitution,
EmoClassCodeBERT-Substitution, EmoClassBERT-Translation, EmoClassLSTM-Translation,
EmoClassRFC-Translation, EmoClassCodeBERT-Translation, EmoClassBERT-EDA,
EmoClassLSTM-EDA, EmoClassCodeBERT-EDA, and EmoClassRFC-EDA. We compared
the performance of these models against state-of-the-art techniques
(Multi-label SO BERT and EmoTxt) on a gold-standard dataset.
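The sixteen model names above follow from crossing the four techniques with the four dataset variants (original plus three augmentations). The Python sketch below illustrates that naming grid. It also shows two token-level operations, random swap and random deletion, that are commonly grouped under easy data augmentation. It is a minimal illustration only: the function names, the whitespace tokenization, and the deletion probability are assumptions made for this example, not the authors' implementation.

```python
import random
from itertools import product

# The four techniques and four dataset variants named in the abstract.
ARCHITECTURES = ["BERT", "CodeBERT", "RFC", "LSTM"]
DATASET_VARIANTS = ["Original", "Substitution", "Translation", "EDA"]


def model_names():
    """Cross every architecture with every dataset variant (4 x 4 = 16 names)."""
    return [
        f"EmoClass{arch}-{variant}"
        for arch, variant in product(ARCHITECTURES, DATASET_VARIANTS)
    ]


def random_swap(tokens, rng):
    """Swap two randomly chosen token positions (hypothetical helper)."""
    tokens = list(tokens)
    if len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, keeping at least one token.

    The value p=0.1 is an assumed default for illustration only.
    """
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same output
    post = "this null pointer bug is driving me crazy".split()
    print(len(model_names()), "model names")
    print(random_swap(post, rng))
    print(random_deletion(post, rng, p=0.2))
```

Each augmented post would keep the emotion label of the post it was derived from, which is how such methods enlarge a scarce labeled dataset without additional annotation.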
Results: An initial investigation showed that models trained on the augmented datasets
outperformed those trained on the original dataset. The
EmoClassLSTM-Substitution, EmoClassBERT-Substitution, EmoClassCodeBERT-Substitution,
and EmoClassRFC-Substitution models improved the average F1-score by 13%, 5%, 5%, and 10%
compared to EmoClassLSTM-Original, EmoClassBERT-Original, EmoClassCodeBERT-Original, and
EmoClassRFC-Original, respectively. EmoClassCodeBERT-Substitution
performed best and outperformed Multi-label SO BERT and EmoTxt by 2.37% and
21.17%, respectively, in average F1-score. A detailed investigation of the models over 100 runs
of the dataset shows that the BERT-based and CodeBERT-based models gave the best performance.
Across these multiple runs, however, it reveals no significant differences in performance
between models trained on the augmented datasets and those trained on the original dataset.
Conclusion: This research not only underlines the strengths and weaknesses of each architecture
but also highlights the pivotal role of data augmentation in refining model performance, especially
in the software engineering domain.
Citation
Awovi Ahavi-Tete, D., & Sangeeta, S. (2025). Emotion Classification on Software Engineering Q&A Websites. e-Informatica Software Engineering Journal (EISEJ), https://doi.org/10.37190/e-Inf250104
Journal Article Type | Article |
---|---|
Acceptance Date | Oct 6, 2024 |
Online Publication Date | Jan 4, 2025 |
Publication Date | Jan 4, 2025 |
Deposit Date | Oct 10, 2024 |
Journal | e-Informatica Software Engineering Journal (EISEJ) |
Print ISSN | 1897-7979 |
Publisher | Software Engineering Section of the Committee on Informatics of the Polish Academy of Sciences |
Peer Reviewed | Peer Reviewed |
DOI | https://doi.org/10.37190/e-Inf250104 |
Keywords | empirical and experimental studies in software engineering, data mining in software engineering, prediction models in software engineering, AI and knowledge based software engineering |
Public URL | https://keele-repository.worktribe.com/output/949906 |
Publisher URL | https://www.e-informatyka.pl/ |