COMPARISON OF TOPIC MODELING METHODS FOR ANALYZING TWEETS ON COVID-19 VACCINE

dc.access.optionRestricted Campus Access Only
dc.contributor.advisorTabrizi, M. H. N
dc.contributor.authorKhanjari Nezhad Jooneghani, Zeinab
dc.contributor.departmentComputer Science
dc.date.accessioned2021-09-11T16:52:57Z
dc.date.available2023-01-01T09:01:54Z
dc.date.created2021-07
dc.date.issued2021-07-20
dc.date.submittedJuly 2021
dc.date.updated2021-08-30T15:41:37Z
dc.degree.departmentComputer Science
dc.degree.disciplineMS-Computer Science
dc.degree.grantorEast Carolina University
dc.degree.levelMasters
dc.degree.nameM.S.
dc.description.abstractTwitter is a microblogging site and a popular social media platform for sharing thoughts on current world events. The dynamic nature of Twitter discussions makes it a valuable data source for mining people's opinions and emotions about world events, and for analyzing how opinion and sentiment shift toward specific targets. The COVID-19 outbreak is a recent event that has affected people's lives worldwide over the last two years. Many people share their feelings and experiences of the pandemic through social media, and COVID-19-related tweets have recently been the subject of some research. This thesis also analyzes such tweets, specifically those related to the COVID-19 vaccine. The main objective of this thesis is to mine human concerns about the COVID-19 vaccine using Twitter data. This thesis applies four topic modeling methods to discover the subjects discussed in relation to the COVID-19 vaccine and to analyze the dynamics of those topics over a specific period. The models are Latent Dirichlet Allocation (LDA), LDA with Gibbs sampling (LDA-Mallet), Nonnegative Matrix Factorization (NMF), and Top2Vec. Furthermore, this thesis compares these four topic modeling methods based on human judgment, coherence value, and topic uniqueness. The results show that both LDA variants outperformed NMF in terms of Jaccard score. In addition, LDA-Mallet outperformed LDA and NMF in terms of coherence score. For some of the experiments it is difficult to determine whether NMF or LDA provided the better score, but overall NMF performed better than LDA in terms of coherence score. Top2Vec returned 255 topics for this case study, which is not desirable for the purpose of this study. The three other methods outperformed Top2Vec in terms of Jaccard score and coherence value.
dc.embargo.lift2023-01-01
dc.format.mimetypeapplication/pdf
dc.identifier.urihttp://hdl.handle.net/10342/9414
dc.language.isoen
dc.publisherEast Carolina University
dc.subjectTopic modeling
dc.subjectSocial media analysis
dc.subject.lcshCOVID-19 Pandemic, 2020- , in mass media.
dc.subject.lcshCOVID-19 Pandemic, 2020- --Social aspects
dc.subject.lcshCOVID-19 vaccines
dc.subject.lcshHealth risk assessment
dc.subject.lcshTwitter
dc.subject.lcshSocial media and society
dc.titleCOMPARISON OF TOPIC MODELING METHODS FOR ANALYZING TWEETS ON COVID-19 VACCINE
dc.typeMaster's Thesis
dc.type.materialtext
ecu.embargo.choiceextended 1 year at request of author
