Pipelines & NLP for identification of trending topics in crypto, by ecosystem, over the last 24-48 hours.
- Clone the repo and open trending-topics.Rproj.
- You'll need a Twitter API key (twitter-secret.txt) and a ChatGPT API key (chatgpt-secret.txt). These are gitignored and placed in the trending_topics/ directory, where the scheduled R Markdown file (update_topics_pipeline.Rmd) lives.
- You'll also need snowflake-details.json for the submitSnowflake function, in the form:
{
  "driver": "YOUR-LOCAL-SNOWFLAKE-DRIVER-HERE",
  "server_url": "YOUR-URL-HERE.snowflakecomputing.com",
  "username": "PUT-USERNAME-HERE",
  "password": "PUT-PASSWORD-HERE",
  "role": "INTERNAL_DEV",
  "warehouse": "DATA_SCIENCE",
  "database": ""
}
- The reprex-for-pulltweets-ai-summarize.R script contains the internal functions and example accounts for the broader pipeline. It sources trending_topics/source_funcs_and_secrets.R, which loads the libraries, functions, and secrets required to run the pipeline.
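Before running anything against Snowflake, it can help to confirm that the details file is complete. The following is a minimal sketch in base R, not part of the repo: `check_snowflake_details()` and `required_fields` are hypothetical names, and the field list is simply copied from the template above. It assumes the JSON has already been parsed into a named list, for example with `jsonlite::fromJSON("snowflake-details.json")`.

```r
# Hypothetical helper (not in the repo): checks that a parsed
# snowflake-details list has every field from the template and that
# no "...-HERE" placeholder text was left in.
required_fields <- c("driver", "server_url", "username", "password",
                     "role", "warehouse", "database")

check_snowflake_details <- function(details) {
  missing <- setdiff(required_fields, names(details))
  if (length(missing) > 0) {
    stop("snowflake-details.json is missing: ",
         paste(missing, collapse = ", "))
  }
  # Collect each field as a string and look for leftover placeholders.
  values <- vapply(details[required_fields], as.character, character(1))
  unfilled <- names(values)[grepl("-HERE", values, fixed = TRUE)]
  if (length(unfilled) > 0) {
    stop("placeholder values left in: ",
         paste(unfilled, collapse = ", "))
  }
  invisible(details)
}

# Example with made-up values standing in for the parsed JSON.
example_details <- list(
  driver = "my-local-driver",
  server_url = "example.snowflakecomputing.com",
  username = "some-user",
  password = "some-password",
  role = "INTERNAL_DEV",
  warehouse = "DATA_SCIENCE",
  database = ""
)
check_snowflake_details(example_details)
```

An empty `database` value passes, since the template itself leaves it blank.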
A full pipeline diagram is included as an image and a PDF.
- Pulls target_twitter_accounts and calls pull_account_tweets.
- Dumps the tweets in dataframe form into raw_tweet_dump, then calls the add_new_tweets() proc to clean them and append them to processed_tweets.
- Ingests unused tweets and calls chatgpt_id_topic at the day-ecosystem level to get the subjects and summaries.
- Updates the used tweets to used_in_summary = TRUE and inserts the results into ai_summary.
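The data flow in the steps above can be illustrated with in-memory data frames. This is a rough sketch in base R under several assumptions: the column names `tweet_id`, `ecosystem`, `day`, and `text` are invented for the example, the functions ending in `_sketch` are hypothetical and are not the repo's `add_new_tweets()` or `chatgpt_id_topic`, and the model call is replaced by a stub that only counts tweets.

```r
# Toy stand-ins for the real tables. Every column name except
# used_in_summary is an assumption made for this sketch.
processed_tweets <- data.frame(
  tweet_id = c("1", "2"),
  ecosystem = c("ethereum", "solana"),
  day = as.Date(c("2023-05-01", "2023-05-01")),
  text = c("gas fees are dropping", "new validator client released"),
  used_in_summary = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)
raw_tweet_dump <- data.frame(
  tweet_id = c("2", "3", "4"),
  ecosystem = c("solana", "solana", "ethereum"),
  day = as.Date(c("2023-05-01", "2023-05-01", "2023-05-02")),
  text = c("new validator client released",
           "  validator upgrade is live ",
           "staking withdrawals open"),
  stringsAsFactors = FALSE
)

# Clean-and-append step: trim text, drop tweets already stored,
# and append the remainder flagged as unused.
append_new_tweets_sketch <- function(raw, processed) {
  raw$text <- trimws(raw$text)
  fresh <- raw[!(raw$tweet_id %in% processed$tweet_id), , drop = FALSE]
  fresh <- fresh[!duplicated(fresh$tweet_id), , drop = FALSE]
  fresh$used_in_summary <- rep(FALSE, nrow(fresh))
  rbind(processed, fresh)
}

# Placeholder for the model call: reports how many tweets it received.
summarize_stub <- function(texts) {
  paste(length(texts), "tweet(s) summarized")
}

# Summarize unused tweets per day-ecosystem group, then flag them as used.
summarize_unused_sketch <- function(processed, summarize = summarize_stub) {
  unused <- processed[!processed$used_in_summary, , drop = FALSE]
  if (nrow(unused) == 0) {
    return(list(processed_tweets = processed, ai_summary = NULL))
  }
  groups <- split(unused, paste(unused$day, unused$ecosystem, sep = "|"))
  ai_summary <- do.call(rbind, lapply(groups, function(g) {
    data.frame(day = g$day[1], ecosystem = g$ecosystem[1],
               summary = summarize(g$text), stringsAsFactors = FALSE)
  }))
  rownames(ai_summary) <- NULL
  processed$used_in_summary[processed$tweet_id %in% unused$tweet_id] <- TRUE
  list(processed_tweets = processed, ai_summary = ai_summary)
}

processed_tweets <- append_new_tweets_sketch(raw_tweet_dump, processed_tweets)
result <- summarize_unused_sketch(processed_tweets)
```

Flagging tweets only after their group has been summarized means a failed run leaves them unused, so a later run would pick them up again.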
The website (in development) offers a UI over the summaries and uses term frequency to link each summary back to relevant tweets at the day-ecosystem level.
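One simple way to link a summary back to tweets by term frequency is to count how often the summary's terms occur in each tweet and rank by that count. The sketch below, in base R, shows that idea only. It is an assumption about how such linking could work, not the website's actual method, and `tokenize()` and `rank_tweets_by_term_overlap()` are hypothetical names.

```r
# Lowercase, split on non-alphanumeric characters, drop very short words.
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words[nchar(words) > 2]
}

# Score each tweet by the total frequency of summary terms it contains,
# then return the top-scoring tweets with a non-zero score.
rank_tweets_by_term_overlap <- function(summary_text, tweets, top_n = 2) {
  summary_terms <- unique(tokenize(summary_text))
  scores <- vapply(tweets, function(tweet) {
    tf <- table(tokenize(tweet))
    as.numeric(sum(tf[names(tf) %in% summary_terms]))
  }, numeric(1), USE.NAMES = FALSE)
  ranked <- order(scores, decreasing = TRUE)
  ranked <- head(ranked[scores[ranked] > 0], top_n)
  data.frame(tweet = tweets[ranked], score = scores[ranked],
             stringsAsFactors = FALSE)
}

# Made-up summary and tweets for one day-ecosystem group.
summary_text <- "Validator client upgrade released on Solana"
tweets <- c("new validator client released",
            "gas fees are dropping",
            "validator upgrade is live")
linked <- rank_tweets_by_term_overlap(summary_text, tweets)
```

Raw counts favor longer tweets, so a real implementation might normalize by tweet length or weight rare terms more heavily.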