This repository has been archived by the owner on Sep 10, 2023. It is now read-only.

Merge pull request #53 from TimLChan/tweetdata
Add in basic twitter archive to Corpus functionality, thanks to @TimLChan.
tommeagher authored Aug 19, 2018
2 parents 5771baa + fea6f3c commit 3dfd3c1
Showing 5 changed files with 39 additions and 1 deletion.
3 changes: 3 additions & 0 deletions .gitignore
@@ -2,3 +2,6 @@
.git
/.idea
__pycache__
*.csv
.env/
local_settings.py
3 changes: 2 additions & 1 deletion CONTRIBUTORS.md
@@ -10,4 +10,5 @@
* [varjmes](https://github.com/varjmes)
* [meggle](https://github.com/meggle)
* [superstrong](https://github.com/superstrong)
* [andrlik](https://github.com/andrlik)
* [andrlik](https://github.com/andrlik)
* [TimLChan](https://github.com/TimLChan)
12 changes: 12 additions & 0 deletions README.md
@@ -60,6 +60,18 @@ To scrape content from the web, set `SCRAPE_URL` to `True`. This bot makes use o

__Note:__ Web scraping is experimental and may give you unexpected results. Make sure to test the bot in debugging mode before publishing.

#### Twitter archive
To build a corpus from a Twitter account you have access to, download its Twitter archive by following the steps in [Twitter's Help Center](https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive).

1. Request your Twitter archive.
2. Extract the CSV file and make sure its name matches `TWITTER_ARCHIVE_NAME` in `local_settings.py`.
3. Retweets are ignored by default. To include them in your corpus, change `IGNORE_RETWEETS` to `False` in `local_settings.py`.
4. Set `TEST_SOURCE` in `local_settings.py` to the name of the file that will hold the parsed Twitter archive.
5. Run `twittereater.py`. It creates the corpus file at the location given by `TEST_SOURCE`.

If you want to use the Twitter corpus to generate tweets, set `STATIC_TEST = True`.
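
As a rough illustration, the settings involved in the steps above might look like this once filled in. The `TEST_SOURCE` file name is a made-up example, and placing `STATIC_TEST` in `local_settings.py` alongside the other values is an assumption rather than something this change confirms.

```python
# local_settings.py (excerpt), illustrative values only
TWITTER_ARCHIVE_NAME = "tweets.csv"  # CSV extracted from the Twitter archive
IGNORE_RETWEETS = True               # set to False to keep retweets in the corpus
TEST_SOURCE = "twitter_corpus.txt"   # hypothetical name; twittereater.py writes the corpus here
STATIC_TEST = True                   # assumed location; generate tweets from TEST_SOURCE
```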


## Debugging

If you want to test the script or to debug the tweet generation, you can skip the random number generation and not publish the resulting tweets to Twitter.
4 changes: 4 additions & 0 deletions local_settings_example.py
@@ -34,3 +34,7 @@

DEBUG = True # Set this to False to start Tweeting live
TWEET_ACCOUNT = "" # The name of the account you're tweeting to.

#Configuration for Twitter parser. TEST_SOURCE will be re-used as the corpus location.
TWITTER_ARCHIVE_NAME = "tweets.csv" #Name of your twitter archive
IGNORE_RETWEETS = True #If you want to remove retweets
18 changes: 18 additions & 0 deletions twittereater.py
@@ -0,0 +1,18 @@
# -*- coding: utf-8 -*-
import csv
from local_settings import TWITTER_ARCHIVE_NAME, TEST_SOURCE, IGNORE_RETWEETS

f = open(TWITTER_ARCHIVE_NAME, 'r')
tweets = []
reader = csv.reader(f,quotechar='"')
next(reader) #get rid of the twitter header


tweetarchive = open(TEST_SOURCE, 'w')
for row in reader:
    if IGNORE_RETWEETS:
        if not row[8]: #9th column is the timestamp of the retweet
            tweetarchive.write("'%s'," % (row[5]))
    else:
        tweetarchive.write("'%s'," % (row[5]))
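
The retweet filter above can also be sketched as a small standalone function. This is not the code from this commit: `extract_tweets` is a hypothetical helper, and it assumes the archive's header row names the tweet text column `text` and the retweet timestamp column `retweeted_status_timestamp`, which lets it look columns up by name instead of by position.

```python
import csv
import io


def extract_tweets(archive_file, ignore_retweets=True):
    """Return the tweet texts from an open tweets.csv file object."""
    reader = csv.DictReader(archive_file)  # consumes the header row itself
    texts = []
    for row in reader:
        # Assumed column: non-empty only when the row is a retweet.
        is_retweet = bool(row.get("retweeted_status_timestamp"))
        if ignore_retweets and is_retweet:
            continue
        texts.append(row["text"])
    return texts


# Tiny in-memory archive with made-up rows, for illustration only.
sample = io.StringIO(
    "tweet_id,text,retweeted_status_timestamp\n"
    "1,hello world,\n"
    "2,RT @someone: hi,2018-08-01 12:00:00 +0000\n"
)
print(extract_tweets(sample))  # ['hello world']
```

Taking a file object rather than a file name keeps the sketch easy to try without a real archive. With a real file, it could be called inside `with open(TWITTER_ARCHIVE_NAME, newline="") as f:` so the file is closed afterwards.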
