A study by Wright State University students suggests that social media can be used to detect COVID-related events before the news media reports them and could help officials understand public reaction and respond quickly to emerging pandemics.
Titled “Leveraging Natural Language Processing to Mine Issues on Twitter During the COVID-19 Pandemic,” the study has been accepted for presentation at a prestigious virtual big data conference hosted by the Institute of Electrical and Electronics Engineers in December.
“They generated a machine to filter out relevant tweets about COVID with over 90% accuracy — one of the first studies to do this,” said William Romine, director of Wright State’s Data Science for Education Laboratory.
The study was conducted by computer science and engineering students Ankita Agarwal, Preetham Salehundam and Swati Padhee. Romine and Tanvi Banerjee, associate professor of computer science, were co-authors of the research paper.
Romine said social media allows access to timely information about disease symptoms and prevention and can play an essential role in understanding public attitudes and behaviors during crises. Efficient identification of thoughts, attitudes, feelings and concerns about the COVID-19 pandemic can help policymakers, health care professionals and the public identify concerns and address them, he said.
Agarwal said the most challenging part of the study was identifying gaps in the literature related to COVID-19 and coming up with novel research questions that would yield interesting results.
“We tried to connect the dots with the topics we observed from the relevant tweets with major media coverage of the pandemic,” said Padhee, a Ph.D. student who served as a mentor to the other student researchers. “Furthermore, we also made sure that while we continue to explore the data to find meaningful insights, we also prepare it to share with the research community as we are still in the middle of the pandemic.”
The students generated a dataset of over 688,000 relevant tweets between Jan. 1 and April 30 that can be used by researchers to understand public discussions of COVID-19.
“The most important finding is how we can use Twitter data to evaluate the opinions of people around the globe,” said Salehundam. “The fascinating part is how we could identify the topics people are discussing with the events happening as the pandemic started to unfold.”
Interest in case statistics and the origins of the coronavirus prevailed through January. From February through March, the period in which the coronavirus was officially declared a pandemic, interest in the public health response took hold. Concerns about the government response began to prevail at the end of February.
“We think these topics, and their order of emergence, may be relevant not only to COVID-19 but also to future pandemics,” said Romine. “Their work tied these topics to the actual events that were occurring, showing that social media if taken collectively gives detailed and accurate insights into what is happening with the pandemic and can potentially be used to detect COVID-related events faster than the news cycle.”
Banerjee said the research lays the foundation for a deeper study into the thoughts and sentiment of the public as the pandemic continues to spread.
“What we hope to pursue further is to detect fake news within the tweets we mined and see what kind of response that generated in the public, as a precursor to help label misinformation in social media,” Banerjee said.