Over the last few years, there has been a significant increase in the use of Twitter to share updates, seek help and report emergencies during a disaster. Social media platforms can be instrumental for keeping track of events like damage to personal property or injuries during natural disasters. However, algorithms that monitor social media posts for signs of a natural disaster must be fast, so that relief operations can be mobilized immediately.
A team of researchers led by Dr. Ruihong Huang, assistant professor in the Department of Computer Science and Engineering at Texas A&M University, has developed a novel weakly supervised approach that can train machine learning algorithms quickly to recognize tweets related to disasters.
“Because of the sudden nature of disasters, there’s not much time available to build an event recognition system,” said Huang. “Our goal is to be able to detect life-threatening events using individual social media messages and recognize similar events in the affected areas.”
The researchers described their findings in the proceedings from the Association for the Advancement of Artificial Intelligence’s 34th Conference on Artificial Intelligence.
Texts on social media platforms, like Twitter, can be categorized using standard algorithms called classifiers. A classifier is the core of a machine learning system that makes predictions based on carefully labeled sets of data. In the past, machine learning algorithms have been used to detect events from tweets or from bursts of words within tweets. To build a reliable classifier, however, human annotators must manually label large numbers of data instances one by one, which usually takes several days, sometimes even weeks or months.
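To make the labeling bottleneck concrete, here is a minimal sketch of the standard supervised setup described above, using scikit-learn. The tweets and labels are invented toy data, not from the paper; the point is that every training tweet needs its own human-assigned label.

```python
# Standard supervised text classification: each example must be labeled
# individually before the classifier can be trained.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data -- a real system would need thousands of hand-labeled tweets.
tweets = [
    "flood water rising fast on my street, need rescue",
    "roof torn off by the wind, we are trapped upstairs",
    "my phone battery is dead again",
    "watching the walking dead tonight",
]
labels = [1, 1, 0, 0]  # 1 = disaster-relevant, 0 = not relevant

# TF-IDF features feeding a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(tweets, labels)

print(clf.predict(["house flooded and we are trapped"]))
```

Scaling this to a new disaster means relabeling thousands of tweets from scratch, which is exactly the delay the researchers set out to avoid.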
The researchers also found that it is essentially impossible to find a keyword that does not have more than one meaning on social media depending on the context of the tweet. For example, if the word “dead” is used as a keyword, it will pull in tweets talking about a variety of topics such as a phone battery being dead or the television series “The Walking Dead.”
To build labeled datasets more quickly, the researchers first used an automatic clustering algorithm to group keyword-matched tweets into small clusters. Next, a domain expert looked at the context of the tweets in each group to determine whether the group was relevant to the disaster. The tweets labeled this way were then used to train the classifier to recognize relevant tweets.
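The cluster-then-label idea above can be sketched as follows. This is a hedged illustration, not the authors' implementation: KMeans stands in for whatever clustering algorithm they used, the tweets are invented, and the "expert" step is simulated by marking one cluster as relevant.

```python
# Weak supervision sketch: label whole clusters instead of single tweets.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy keyword-matched tweets (invented data).
tweets = [
    "flood water rising fast, need rescue now",
    "flood water everywhere, we are trapped on the street",
    "my phone is dead, charger anyone?",
    "binge watching the walking dead all night",
]

# 1) Cluster the tweets into small groups.
vec = TfidfVectorizer()
X = vec.fit_transform(tweets)
km = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)

# 2) A domain expert inspects each cluster once and labels the whole
#    group -- far cheaper than labeling every tweet individually.
#    Here we pretend the expert marked the cluster containing the
#    first tweet as disaster-relevant.
relevant_cluster = cluster_ids[0]
labels = [1 if c == relevant_cluster else 0 for c in cluster_ids]

# 3) Train a classifier on the cluster-derived labels.
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vec.transform(["street flooded, send help"])))
```

The saving comes from step 2: the expert makes one judgment per cluster rather than one per tweet, which is how hours of annotation shrink to minutes.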
Using data gathered from the most impacted time periods of hurricanes Harvey and Florence, the researchers found that their data labeling method and overall weakly supervised system took one to two person-hours, compared with the roughly 50 person-hours the fully supervised approach required to annotate thousands of tweets one by one.
Despite the classifier’s overall good performance, the researchers observed that the system still missed several relevant tweets that used vocabulary different from the predetermined keywords.
“Users can be very creative when discussing a particular type of event using the predefined keywords, so the classifier would have to be able to handle those types of tweets,” said Huang. “There’s room to further improve the system’s coverage.”
In the future, the researchers will look to explore how to extract information about the user’s location so first responders will know exactly where to dispatch their resources.
This work is supported by funds from the National Science Foundation.