Scrapy project that extracts the names of leaders and their misdeeds by scraping news websites.

Jimut123/LeaderBehaviour

Leader Behaviour Prediction

This project extracts and gathers information about the behaviour or misdeeds (matched against predefined adjectives) of a leader or representative by continuously scraping news websites.

We have converted the output to a JSON file.

Installing required libraries

sudo pip install -r requirements.txt

Scraping the Times of India website

I scraped the Times of India website specifically for this purpose.

The dataset obtained by scraping the Times of India website

This dataset holds the details of the scraped articles. The article text has to be scraped and the names extracted from it. The adjectives are then matched against the names that were found.
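
The matching step described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function name `match_adjectives`, the sample sentence, and the name and adjective lists are all hypothetical, and it assumes that an adjective "belongs" to a name when both appear in the same sentence.

```python
import re
from collections import defaultdict


def match_adjectives(text, names, adjectives):
    """Map each name to the adjectives found in the same sentence (assumed rule)."""
    found = defaultdict(set)
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    for sentence in sentences:
        lowered = sentence.lower()
        words = set(re.findall(r"[a-z]+", lowered))
        # Adjectives are compared as whole words, case-insensitively.
        present = [adj for adj in adjectives if adj.lower() in words]
        if not present:
            continue
        for name in names:
            # Names are compared as case-insensitive substrings.
            if name.lower() in lowered:
                found[name].update(present)
    return dict(found)


# Hypothetical sample data, for illustration only.
sample = "Leader A was called corrupt by critics. Leader B opened a school."
result = match_adjectives(sample, ["Leader A", "Leader B"], ["corrupt", "violent"])
print(result)
```

Sentence-level co-occurrence is a crude rule: it will attach an adjective to every name in the sentence, so a real pipeline would likely need something stricter.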

The dataset is located at:

LeaderBehaviour/leaderBehaviour/leaderBehaviour/spiders/newsTOI.sqlite
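
The table layout of this SQLite file is not documented here, so a reasonable first step is to list the tables it contains. The sketch below uses only the standard `sqlite3` module; the helper name `list_tables` is hypothetical and no table or column names are assumed.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Quote each table name so unusual names do not break the query.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling `list_tables("newsTOI.sqlite")` from the `spiders` directory should then show what the dataset holds. Note that `sqlite3.connect` creates an empty file if the path does not exist, so an empty result may simply mean a wrong path.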

Dataset

Scraped names of the members of parliament in the US:

LeaderBehaviour/getUSNames/getUSNames/spiders/getUSNames.json

Scraped names of the members of parliament in India:

LeaderBehaviour/getIndianPolNames/getIndianPolNames/spiders/getIndianPolNames.json

Additional objectives:

* Custom headers and a user-agent are used in Scrapy.
* Still to do: use a proxy or integrate with Tor to make the scraper harder to trace.
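
As a rough illustration of the two objectives above, the sketch below builds the keyword arguments that could be passed to `scrapy.Request(**kwargs)`. It does not import Scrapy, so it runs on its own. The helper name, URL, user-agent string, and proxy address are all hypothetical. It relies on Scrapy's `HttpProxyMiddleware` reading the proxy from `request.meta["proxy"]`. Tor itself exposes a SOCKS proxy, so this assumes an HTTP proxy such as Privoxy sits in front of it on a local port.

```python
def build_request_kwargs(url, user_agent, proxy=None):
    """Build keyword arguments suitable for scrapy.Request(**kwargs)."""
    kwargs = {"url": url, "headers": {"User-Agent": user_agent}}
    if proxy:
        # Scrapy's HttpProxyMiddleware picks the proxy up from request.meta["proxy"].
        kwargs["meta"] = {"proxy": proxy}
    return kwargs


# Hypothetical values, for illustration only.
kwargs = build_request_kwargs(
    "https://example.com/news",
    "Mozilla/5.0 (compatible; ExampleBot)",
    proxy="http://127.0.0.1:8118",  # assumed local HTTP proxy in front of Tor
)
print(kwargs)
```

Inside a spider this would be used as `yield scrapy.Request(callback=self.parse, **kwargs)`. Setting `USER_AGENT` in the project settings is an alternative when the same user-agent applies to every request.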

Probable names extracted from the scraped text:

LeaderBehaviour/leaderBehaviour/leaderBehaviour/spiders/extractNamesTOI.py
LeaderBehaviour/leaderBehaviour/leaderBehaviour/spiders/probable_names_extracted.json
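
How `extractNamesTOI.py` finds names is not described here. One simple heuristic, shown purely as an assumed sketch, is to treat runs of two or more capitalised words as probable names. The pattern, function name, and sample text below are hypothetical.

```python
import re

# Two or more consecutive capitalised words, e.g. "Jane Doe".
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def probable_names(text):
    """Return unique capitalised word sequences in order of first appearance."""
    seen = []
    for match in NAME_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


# Hypothetical sample text, for illustration only.
sample = "The minister Jane Doe met John Smith today. Jane Doe declined to comment."
print(probable_names(sample))  # ['Jane Doe', 'John Smith']
```

This heuristic also picks up place names, organisations, and capitalised sentence openers such as "The Prime Minister". Filtering the candidates against the scraped lists of Indian and US politicians would be a natural next step.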

Note

Go to the directory real_shit, copy the scrapTOI.sqlite file, then run python get_neg.py.
