Developing & Comparing Various Topic Modeling Algorithms on a Stack Overflow Dataset
  • Author(s): Raphael Ibraimoh; Kwame Ofosu Debrah; Emmanuel Nwambuowo
  • Paper ID: 1706049
  • Page: 243-253
  • Published Date: 17-07-2024
  • Published In: Iconic Research And Engineering Journals
  • Publisher: IRE Journals
  • e-ISSN: 2456-8880
  • Volume/Issue: Volume 8 Issue 1 July-2024
Abstract

This study extracts and compares coherent, informative topics from Stack Overflow, an online question-and-answer community for programmers. The workflow has four main steps: scraping questions, summaries, and tags; exploratory data analysis; rigorous pre-processing; and topic modeling to uncover latent topics. Three widely used topic models are compared: Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and BERTopic. For each algorithm, the hyperparameters of a base model were tuned to obtain the best-performing model. The resulting models were then compared for performance and accuracy using coherence score, topic distinctiveness, and several visualization techniques for examining semantic separation. Each technique was also tested to see how well it handled different data dimensions. In this comparison, BERTopic was the best topic model, producing more granular and semantically meaningful categorizations through better semantic comprehension, topic distinguishability, and coherence of the extracted topics. The study shows how advanced topic modeling can extract nuanced insights from text data, and it provides a complete process from data acquisition to topic categorization. The results demonstrate BERTopic's ability to capture complex textual relationships and generate coherent terms for varied themes. The work can therefore help improve information retrieval and user experience on online community platforms such as Stack Overflow through advanced natural language processing models.
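
The two quantitative criteria named in the abstract, coherence score and topic distinctiveness, can be illustrated with a minimal sketch. The abstract does not say which coherence variant or distinctiveness measure the authors used, so the UMass-style coherence and the unique-word ratio below are assumptions chosen for illustration, and the toy documents and topics are hypothetical rather than taken from the study's dataset.

```python
import math

def umass_coherence(topic_words, docs):
    """UMass-style coherence for one topic (assumed variant, not the paper's).

    Averages log((D(w_i, w_j) + 1) / D(w_j)) over ordered word pairs,
    where D counts the documents containing all of the given words.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i in range(1, len(topic_words)):
        for j in range(i):
            w_i, w_j = topic_words[i], topic_words[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                continue  # skip words that never occur in the corpus
            scores.append(math.log((doc_freq(w_i, w_j) + 1) / d_j))
    return sum(scores) / len(scores) if scores else 0.0

def topic_diversity(topics):
    """Share of unique words among all topics' top words (assumed measure)."""
    all_words = [w for topic in topics for w in topic]
    return len(set(all_words)) / len(all_words) if all_words else 0.0

# Hypothetical tokenized questions standing in for pre-processed posts.
docs = [
    ["python", "pandas", "dataframe"],
    ["python", "pandas", "merge"],
    ["java", "spring", "bean"],
    ["java", "spring", "maven"],
]

# Two hypothetical model outputs: one well separated, one mixed.
separated_topics = [["python", "pandas"], ["java", "spring"]]
mixed_topics = [["python", "java"], ["pandas", "spring"]]

for name, topics in [("separated", separated_topics), ("mixed", mixed_topics)]:
    mean_coherence = sum(umass_coherence(t, docs) for t in topics) / len(topics)
    print(name, round(mean_coherence, 3), topic_diversity(topics))
```

Under these assumptions, a model whose top words co-occur in the same documents scores higher on coherence, and a model whose topics share few top words scores higher on diversity. The same two numbers could be computed for the top words of any fitted LSA, LDA, or BERTopic model.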

Keywords

BERTopic, Latent Semantic Analysis, Latent Dirichlet Allocation

Citations

IRE Journals:
Raphael Ibraimoh, Kwame Ofosu Debrah, Emmanuel Nwambuowo "Developing & Comparing Various Topic Modeling Algorithms on a Stack Overflow Dataset" Iconic Research And Engineering Journals Volume 8 Issue 1 2024 Page 243-253

IEEE:
Raphael Ibraimoh, Kwame Ofosu Debrah, Emmanuel Nwambuowo "Developing & Comparing Various Topic Modeling Algorithms on a Stack Overflow Dataset" Iconic Research And Engineering Journals, 8(1)