Social media platforms like Twitter, Facebook and Instagram play a huge role in everyday life. They are a big source of information, but is it possible for a computer to understand a person's true intentions? Consider the following two Tweets:
- "Fire near Brooke Street! Stay safe"
- "Your photo is absolutely fire!!!"
Both contain the word 'fire'. We humans have no trouble understanding the different meanings of the same word. Will a computer be able to do so as well?
The main purpose of our project is to construct a machine learning model capable of discerning the genuine meaning and intentions behind a given Tweet.
We are using data from a Kaggle competition. The dataset was created by figure-eight and originally shared on their ‘Data For Everyone’ website. It includes the following columns:
- id - a unique identifier for each tweet
- keyword - a particular keyword from the tweet
- location - the location the tweet was sent from
- text - the text of the tweet
- target - denotes whether a tweet is about a real disaster (1) or not (0)
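As a rough illustration of this column layout, the sketch below parses two made-up rows (built from the example Tweets above, not taken from the competition data) and counts the target classes. It uses only the standard library so that it runs without extra dependencies; the `load_rows` helper and the sample labels are assumptions made for this example, not part of the project code.

```python
import csv
import io
from collections import Counter

# Two made-up rows in the column layout described above (not real competition data).
SAMPLE_CSV = """id,keyword,location,text,target
1,fire,,Fire near Brooke Street! Stay safe,1
2,fire,,Your photo is absolutely fire!!!,0
"""


def load_rows(handle):
    """Read tweets from a CSV file handle into a list of dicts, casting target to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["target"] = int(row["target"])  # 1 = real disaster, 0 = not a disaster
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Count how many tweets fall into each class.
class_counts = Counter(row["target"] for row in rows)
print(class_counts)
```

The same function should work on a real file handle, for example one opened on a training CSV inside `data/`, provided the file has the columns listed above.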
DisasterTweets_KaggleCompetition/
├── data/ # Raw data & submission file
├── img/ # Where images are stored
├── notebooks/ # Jupyter Notebooks
├── pipelines/ # pipelines for processing data
├── reqirements.txt # needed tool versions
├── Project_report.pdf # documentation for this project
└── README.md # This file
In this part we focused on analysing the data. How can we gather more information from plain text? Using available NLP methods and data visualisations, we tried to discover interesting patterns.
Technical information: The code for this part can be found in Tweets_EDA, but all transformations, such as adding new columns, are performed inside the pipeline transformers in the /src files.
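To give a feel for what "adding new columns" from plain text can look like, here is a minimal sketch of a few simple text-derived features. The feature names (`char_count`, `word_count`, `exclamation_count`, `hashtag_count`) and the `text_features` function are hypothetical and chosen only for illustration; the features actually built by the project's transformers may differ.

```python
def text_features(text):
    """Return a few simple numeric features derived from the raw tweet text.

    The feature set is hypothetical and only illustrates the idea.
    """
    words = text.split()
    return {
        "char_count": len(text),                # total number of characters
        "word_count": len(words),               # whitespace-separated tokens
        "exclamation_count": text.count("!"),   # number of '!' characters
        "hashtag_count": sum(1 for w in words if w.startswith("#")),
    }


example = text_features("Your photo is absolutely fire!!!")
print(example)
```

In a pipeline, a function like this would typically be applied to the `text` column, with each returned key becoming a new column.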
After adding lots of new features, we need to take a step back and analyse which of them are really relevant for our models and which are just noise. In this part we used common feature importance methods, as well as correlation matrix analysis.
Technical information: The code for this step can be found in Tweets_Feature_Importance. As before, all dataframe transformations are performed in transformer classes in /src.
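The idea behind the correlation analysis can be sketched as follows: compute the Pearson correlation between each feature and the target, then rank features by the absolute value. This is a dependency-free sketch under stated assumptions, not the notebook's implementation; the `pearson` helper, the feature names and the toy values are all made up for illustration. In a pandas workflow, `DataFrame.corr()` computes the same coefficient by default.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


# Toy values, invented for this example only.
features = {
    "exclamation_count": [1, 3, 0, 2],
    "word_count": [6, 5, 6, 5],
}
target = [1, 0, 1, 0]

# Rank features by the strength of their linear relationship with the target.
ranking = sorted(
    features,
    key=lambda name: abs(pearson(features[name], target)),
    reverse=True,
)
print(ranking)
```

Correlation only captures linear relationships, which is one reason to combine it with model-based feature importance methods.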
...