Fake News Detection
Fake news is a term that has been used to describe very different issues, from satirical articles to completely fabricated news and plain government propaganda in some outlets. Fake news, information bubbles, news manipulation and the lack of trust in the media are growing problems with huge ramifications for our society. To start addressing the problem, we first need to understand what fake news is. Only then can we look into the techniques from machine learning (ML), natural language processing (NLP) and artificial intelligence (AI) that could help us fight it.
The term "fake news" has been used in many ways over the last half year, and multiple definitions have been given. For instance, the New York Times defines it as "a made-up story with an intention to deceive". This definition focuses on two dimensions: the intent to deceive (very difficult to prove) and the fact that the story is made up.
First Draft News, an organisation dedicated to improving skills and standards in the reporting and sharing of online information, has published a great article that explains the fake news environment and proposes 7 types of fake content:
- False Connection: Headlines, visuals or captions don’t support the content
- False Context: Genuine content is shared with false contextual information
- Manipulated Content: Genuine information or imagery is manipulated
- Satire or Parody: No intention to cause harm but potential to fool
- Misleading Content: Misleading use of information to frame an issue/individual
- Imposter Content: Impersonation of genuine sources
- Fabricated Content: New content that is 100% false
In this notebook, we'll build models to classify articles from a fake news dataset that is available on Kaggle. The notebook covers the following steps:
- Understand the problem statement
- Import libraries and datasets
- Perform exploratory data analysis
- Perform data cleaning
- Visualize the cleaned data
- Prepare the data by tokenizing
- Understand the theory and intuition behind NLP
- Build and train the model
- Assess trained model performance
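As a rough illustration of the cleaning and tokenizing steps listed above, here is a minimal sketch that uses only the Python standard library. The `STOP_WORDS` set and the `clean_text` and `tokenize` helpers are hypothetical names chosen for this example, and the exact cleaning rules used in the notebook may differ.

```python
import re
import string
from typing import List

# Tiny illustrative stop-word list (an assumption for this sketch;
# real projects usually rely on a fuller list from an NLP library).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "to", "at", "and"}

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text: str) -> str:
    """Lowercase the text, drop URLs and punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs before punctuation
    text = text.translate(_PUNCT_TO_SPACE)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> List[str]:
    """Split cleaned text on whitespace and drop stop words and 1-letter tokens."""
    return [
        token
        for token in clean_text(text).split()
        if token not in STOP_WORDS and len(token) > 1
    ]


headline = "BREAKING: The moon is made of cheese!!! Read more at http://example.com"
print(tokenize(headline))
# ['breaking', 'moon', 'made', 'cheese', 'read', 'more']
```

The resulting token lists (or the cleaned strings) are what a vectorizer would then turn into numeric features for the model.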
- Python 3
- Jupyter Notebook
- To check whether a news item is fake or real, paste its text into the last cell of this notebook.
- MultinomialNB - 89.33%
- Linear Model - 93.63%
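The sketch below shows how the two kinds of model named above might be trained and then used to check a pasted piece of text. It assumes scikit-learn, TF-IDF features, and `LogisticRegression` as the linear model; the README does not say which features or which linear model the notebook uses, so those are assumptions. The training texts are toy examples written for this illustration, not the Kaggle data, so the sketch does not reproduce the percentages listed above. `check_news` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, hand-written examples purely for illustration (NOT the Kaggle dataset).
texts = [
    "Scientists confirm the moon is made of cheese, shocking insiders say",
    "Miracle pill cures every disease overnight, doctors hate it",
    "Secret memo proves aliens run the city council",
    "City council approves budget for road repairs next year",
    "Central bank holds interest rates steady after meeting",
    "Local school wins regional science competition",
]
labels = ["fake", "fake", "fake", "real", "real", "real"]

# One pipeline per model: TF-IDF features followed by a classifier.
# LogisticRegression stands in for the unspecified "linear model" (assumption).
models = {
    "MultinomialNB": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "LogisticRegression": make_pipeline(
        TfidfVectorizer(), LogisticRegression(max_iter=1000)
    ),
}
for model in models.values():
    model.fit(texts, labels)


def check_news(text):
    """Return each model's predicted label ('fake' or 'real') for one text."""
    return {name: str(model.predict([text])[0]) for name, model in models.items()}


print(check_news("Council approves new budget for school repairs"))
```

With a real dataset, the same pipelines would be fitted on a training split and scored on a held-out split, for example with `sklearn.metrics.accuracy_score`, before being used on new text.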