richlo01/covidLies

COVIDLies Research (UCI ML Hackathon)

Note: built with a partner [Rionel Dmello]

In recent years, tweets have increasingly been used as a source of news. Some of that information has proven incorrect and misleading, in some cases contributing to deaths, and social media lets such misconceptions spread rapidly. Working with a partner, our goal was to determine whether a given tweet propagates a given misconception. Using the COVIDLies dataset of 7+ million tweets, we planned to build a neural network that learned from pairs of a tweet and a specific misconception. The dataset is excluded from this repository for sensitivity reasons.
We used transfer learning to approach this problem. To create word embeddings, we used FastText, trained on a subset of our data plus the included "lee_corpus" so that it would also learn formal English. Our idea was then to use RAKE (Rapid Automatic Keyword Extraction) to extract the important parts of both the misconception and a query tweet, attaching adjectives and adverbs to the extracted keywords so that negations would be captured if either the misconception or the tweet contained them. Finally, we would compute the cosine similarity between the two to get the result. The process is as follows:
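The keyword-extraction and cosine-similarity steps described above can be sketched roughly as below. This is a minimal illustration and not the project's actual code: the stopword list, the hand-written 3-dimensional `TOY_EMBEDDINGS` (standing in for vectors from the trained FastText model), and the helper names `rake_keywords`, `mean_vector`, and `match_score` are all assumptions made for the example. The adjective/adverb attachment step is omitted.

```python
import math
import re

# Tiny illustrative stopword list (assumption; RAKE normally uses a much larger one).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "that", "it", "my"}

# Hand-written toy vectors (assumption) standing in for trained FastText embeddings.
TOY_EMBEDDINGS = {
    "eating": [0.8, 0.2, 0.1], "garlic": [0.9, 0.1, 0.0],
    "cures": [0.2, 0.9, 0.1], "covid": [0.1, 0.3, 0.9],
    "weather": [-0.7, 0.1, -0.2], "nice": [-0.5, -0.3, 0.1],
    "today": [-0.6, 0.0, -0.1],
}

def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords (RAKE-style)."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases

def rake_keywords(text):
    """Rank phrases by the sum of degree/frequency of their words, highest first."""
    phrases = candidate_phrases(text)
    freq, degree = {}, {}
    for phrase in phrases:
        for word in phrase:
            freq[word] = freq.get(word, 0) + 1
            degree[word] = degree.get(word, 0) + len(phrase)
    scored = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scored, key=scored.get, reverse=True)

def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding; None if none do."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_score(misconception, tweet, embeddings):
    """Cosine similarity between the averaged keyword vectors of the two texts."""
    vec_m = mean_vector(" ".join(rake_keywords(misconception)).split(), embeddings)
    vec_t = mean_vector(" ".join(rake_keywords(tweet)).split(), embeddings)
    if vec_m is None or vec_t is None:
        return 0.0
    return cosine_similarity(vec_m, vec_t)

misconception = "Eating garlic cures COVID"
related = "My aunt says that garlic cures covid, is it true?"
unrelated = "The weather is nice today"
print(rake_keywords(related))
print(match_score(misconception, related, TOY_EMBEDDINGS))
print(match_score(misconception, unrelated, TOY_EMBEDDINGS))
```

With these toy vectors the related tweet scores higher than the unrelated one. With real embeddings, a threshold on the score would still have to be chosen empirically.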


We found that this approach was not as successful as we had hoped. Tweets were not representative of standard English, and it was difficult to distinguish negated sentences from non-negated ones. Here are the solutions:
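As an aside that is not taken from the project's code, the negation difficulty can be illustrated with a small sketch. It uses plain bag-of-words counts instead of FastText vectors to stay self-contained. The `NEGATIONS` set, the `fuse_negations` option, and both helper names are hypothetical. A claim and its negation share most of their words, so their cosine similarity stays high. Fusing a negation word with the word that follows it (one possible reading of "attaching" modifiers) lowers the score only modestly.

```python
import math
from collections import Counter

# Hypothetical negation markers, chosen for illustration only.
NEGATIONS = {"not", "no", "never"}

def tokenize(text, fuse_negations=False):
    """Lowercase and split on whitespace; optionally merge 'not X' into 'not_X'."""
    words = text.lower().split()
    if not fuse_negations:
        return words
    tokens, i = [], 0
    while i < len(words):
        if words[i] in NEGATIONS and i + 1 < len(words):
            tokens.append(words[i] + "_" + words[i + 1])
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens

def bow_cosine(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

claim = "masks prevent covid"
negated = "masks do not prevent covid"

plain = bow_cosine(tokenize(claim), tokenize(negated))
fused = bow_cosine(tokenize(claim, True), tokenize(negated, True))
print(f"plain tokens: {plain:.3f}, fused negation: {fused:.3f}")
```

Even with the negation fused, the two opposite statements remain fairly similar, which is consistent with the difficulty reported above.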


Here is a picture of the word embeddings:
