Scraping comprehensive tweets #113
Searching on Twitter Web gives the same number.
get_next_page() does not work. Could you suggest the correct approach to scrape all pages?
The objective is to scrape all posts about Starbucks in 2019. Can I use get_next_page() for this?
better approach would be to use
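Since the thread reports that Twitter Web itself returns the same number of results, a single search query appears to be capped on Twitter's side rather than by the `pages` argument. A commonly used workaround is to split the target period into short windows and run one search per window. The sketch below only builds the query strings; the helper name `daily_queries` is hypothetical, and it assumes the usual search-operator semantics where `since:` is inclusive and `until:` is exclusive.

```python
from datetime import date, timedelta


def daily_queries(keyword: str, start: date, end: date, extra: str = "") -> list[str]:
    """Build one search query per calendar day in [start, end).

    Hypothetical helper. Assumes since: is inclusive and until: is
    exclusive, so each query covers exactly one day with no overlap.
    """
    queries = []
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        query = f"({keyword}) lang:en since:{day.isoformat()} until:{nxt.isoformat()}"
        if extra:
            # Append any additional operators, e.g. "-filter:links -filter:replies"
            query += " " + extra
        queries.append(query)
        day = nxt
    return queries
```

Each generated string could then be passed to `app.search(query, pages=30, wait_time=2)` as in the original post, appending the per-day results together. Whether one day per window stays under the cap for a keyword as popular as Starbucks is an assumption; narrower windows may be needed.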
Hi,
I am trying to collect all tweets mentioning "starbucks" in 2019. Although I have increased the page count from pages=10 to pages=30, the output is always limited to approximately 86 rows, regardless of the page setting. If the process worked correctly, I would expect around 50,000 tweets.
How can I scrape all tweets for this time range without missing any?
```python
from tweety import Twitter
import pandas as pd

# Client backed by a session named "session"
app = Twitter("session")
app.start()

# English tweets mentioning Starbucks from 2019-01-01 up to 2019-01-11,
# excluding tweets with links and replies
all_tweets = app.search(
    "(Starbucks) lang:en until:2019-01-11 since:2019-01-01 -filter:links -filter:replies",
    pages=30,
    wait_time=2,
)

# Collect one dict per tweet, then build the DataFrame once
# (repeated pd.concat inside a loop copies the whole frame every time)
rows = []
for tweet in all_tweets:
    rows.append({
        "Date": tweet.date,
        "Text": tweet.text,
        "Author": tweet.author.username,
        "Likes": tweet.likes,
        "Retweets": tweet.retweet_counts,
    })
df_tweets = pd.DataFrame(rows, columns=["Date", "Text", "Author", "Likes", "Retweets"])

print(f"Total rows in the DataFrame: {df_tweets.shape[0]}")
df_tweets.to_csv("tweets_data.csv", index=False)
print(df_tweets)
```
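When results from many searches (for example one per date window) are combined, the same tweet can appear more than once, e.g. if windows overlap or a query is retried. A minimal sketch of merging results while keeping one row per tweet is shown below. `collect_unique` and the `fetch` callable are hypothetical: `fetch` stands in for running `app.search(...)` on one query and converting each tweet to a dict, and the sketch assumes each tweet's id is stored under the `"id"` key.

```python
from typing import Callable, Iterable


def collect_unique(queries: Iterable[str],
                   fetch: Callable[[str], Iterable[dict]]) -> list[dict]:
    """Run fetch(query) for each query, keeping the first row seen per tweet id.

    Hypothetical helper: fetch is any function that returns dict rows
    with an "id" key, e.g. a wrapper around a search call.
    """
    seen: set = set()
    rows: list[dict] = []
    for query in queries:
        for row in fetch(query):
            if row["id"] in seen:
                # Duplicate from an earlier query; skip it
                continue
            seen.add(row["id"])
            rows.append(row)
    return rows
```

The returned list can be passed directly to `pd.DataFrame(rows)` to build the table in one step.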