Tweets-Scrapper

This script has helped me scrape more than 30K tweets from more than 40 authors. You only have to give it a list of Twitter handles and an output CSV file path, and it will download all the tweets, process them and save them to a CSV file without any hassle. You can check out the dataset here on GitHub and here on Kaggle. I have also done a comprehensive data analysis, which you can find here. The Jupyter notebook I used to scrape the 30K+ tweets is available here.

How the script works

The script downloads tweets from every author whose Twitter handle is listed in the authors.txt file, one handle per line. It downloads direct tweets, retweets and retweets with a comment. For a retweet, I store the information of the original author (name, handle, tweet content and creation time). Furthermore, every retweet with a comment contains <Q> and </Q> tags: the author's comment comes first, followed by the <Q> tag, then the content of the retweeted tweet, and finally the </Q> tag.
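To make the <Q> tag layout concrete, here is a minimal sketch of how a stored retweet with a comment could be split back into its two parts. It assumes the text looks like `comment <Q>retweeted content</Q>`, as described above; the function name `split_quote_tweet` and the sample strings are hypothetical and not part of the script.

```python
import re

# Assumed layout (from the description above): "<comment> <Q><retweeted content></Q>".
# DOTALL lets the pattern span tweets that contain newlines.
QUOTE_PATTERN = re.compile(r"^(?P<comment>.*?)<Q>(?P<quoted>.*)</Q>\s*$", re.DOTALL)


def split_quote_tweet(text):
    """Return (comment, quoted) for a retweet with a comment, else (text, None)."""
    match = QUOTE_PATTERN.match(text)
    if match is None:
        # No <Q>...</Q> pair found: treat it as a direct tweet or plain retweet.
        return text.strip(), None
    return match.group("comment").strip(), match.group("quoted").strip()


if __name__ == "__main__":
    # Made-up sample strings, for illustration only.
    print(split_quote_tweet("Worth a read <Q>New paper on scraping</Q>"))
    print(split_quote_tweet("Just a plain tweet"))
```

Returning `None` for the quoted part keeps direct tweets and plain retweets distinguishable from retweets with a comment when filtering the dataset later.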

How to run it

  1. First, clone the repository.
git clone https://github.com/Hsankesara/Tweets-Scrapper.git
  2. Then install the Python dependencies.
cd Tweets-Scrapper
pip3 install -r requirements.txt
  3. Now create a cred.json file as a copy of cred.json.sample.
cp cred.json.sample cred.json
  4. Get Twitter credentials. You can follow this to get your access tokens. Then update the cred.json file with the tokens you've received from Twitter.

  5. Write the Twitter handles of the accounts you want to scrape in authors.txt, one handle per line.

  6. Run the script.

python3 scrap.py authors.txt tweets.csv
  7. Wait for it! You'll soon have all the tweets in CSV format.
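
Before starting a long run, it may help to sanity-check authors.txt. The sketch below is an optional helper, not part of the script: it reads one handle per line, skips blank lines and drops case-insensitive duplicates. The names `parse_handles` and `read_handles` are hypothetical, and stripping a leading `@` is an assumption about the format scrap.py expects.

```python
from pathlib import Path


def parse_handles(text):
    """Parse newline-separated handles, skipping blank lines and duplicates."""
    handles = []
    seen = set()
    for line in text.splitlines():
        # Assumption: handles are stored without a leading "@".
        handle = line.strip().lstrip("@")
        key = handle.lower()  # compare handles case-insensitively
        if handle and key not in seen:
            seen.add(key)
            handles.append(handle)
    return handles


def read_handles(path):
    """Read and parse a handles file such as authors.txt."""
    return parse_handles(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Made-up handles, for illustration only.
    sample = "@example_author\n\nAnother_Author\nexample_author\n"
    print(parse_handles(sample))
```

Calling `read_handles("authors.txt")` and looking at the returned list is a quick way to spot stray whitespace or repeated handles before any tweets are downloaded.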