
Proposal: split tweet table into mutable and immutable components #11

Open
SamHames opened this issue Mar 7, 2022 · 0 comments

Collaborator

SamHames commented Mar 7, 2022

I think the tweet table can be split into two groups of columns:

  • Immutable components that are inherent to the tweet, such as the text of the tweet (and content deriving from the text like mentions and urls)
  • Mutable components, such as the engagement metrics and any restrictions on who can reply

This lets us track the mutable components with (tweet_id, collected_at) as the primary key, and store the immutable components once, keeping whichever version is seen first (subject to the limitations you've noticed with tweets that appear only in the includes).
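
To make the proposal concrete, here is a rough sketch of what the split could look like using Python's built-in `sqlite3`. The table names (`tweet_immutable`, `tweet_mutable`), the column names, the flat tweet dict, and the `record_tweet` helper are all hypothetical and only for illustration; the real schema would follow whatever the existing tweet table already has.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE tweet_immutable (
    tweet_id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    first_seen_at TEXT NOT NULL
);
CREATE TABLE tweet_mutable (
    tweet_id INTEGER NOT NULL,
    collected_at TEXT NOT NULL,
    retweet_count INTEGER,
    like_count INTEGER,
    reply_settings TEXT,
    PRIMARY KEY (tweet_id, collected_at)
);
"""


def record_tweet(db, tweet, collected_at):
    """Store one observation of a tweet (assumed here to be a flat dict)."""
    # Immutable part: keep whichever version was seen first.
    db.execute(
        "INSERT OR IGNORE INTO tweet_immutable VALUES (?, ?, ?)",
        (tweet["id"], tweet["text"], collected_at),
    )
    # Mutable part: one row per (tweet_id, collected_at) observation.
    db.execute(
        "INSERT OR REPLACE INTO tweet_mutable VALUES (?, ?, ?, ?, ?)",
        (
            tweet["id"],
            collected_at,
            tweet["retweet_count"],
            tweet["like_count"],
            tweet["reply_settings"],
        ),
    )


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

first = {"id": 1, "text": "hello", "retweet_count": 0,
         "like_count": 1, "reply_settings": "everyone"}
later = {"id": 1, "text": "hello (seen again)", "retweet_count": 5,
         "like_count": 9, "reply_settings": "everyone"}

record_tweet(db, first, "2022-03-07T00:00:00Z")
record_tweet(db, later, "2022-03-08T00:00:00Z")
```

After the two calls there is a single immutable row (the first-seen text) and two mutable rows, one per collection time. `INSERT OR IGNORE` is just one way to express "first version wins"; other conflict strategies would work as well.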

For streaming or longitudinal data collections, this makes it easy to track engagement with a tweet over time: each retweet of a popular tweet carries updated engagement metrics for the original tweet.
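
As an illustration of the longitudinal use case, the sketch below assumes a hypothetical `tweet_mutable` table keyed on (tweet_id, collected_at) with a single `retweet_count` column, and a made-up `engagement_history` helper. The sample rows are invented purely to show the shape of the query.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical mutable table: one row per observation of a tweet.
db.execute(
    """
    CREATE TABLE tweet_mutable (
        tweet_id INTEGER NOT NULL,
        collected_at TEXT NOT NULL,
        retweet_count INTEGER,
        PRIMARY KEY (tweet_id, collected_at)
    )
    """
)

# Made-up observations, e.g. picked up each time a retweet of tweet 1 arrives.
db.executemany(
    "INSERT INTO tweet_mutable VALUES (?, ?, ?)",
    [
        (1, "2022-03-07T00:00:00Z", 10),
        (1, "2022-03-07T06:00:00Z", 250),
        (2, "2022-03-07T03:00:00Z", 3),
        (1, "2022-03-07T03:00:00Z", 90),
    ],
)


def engagement_history(db, tweet_id):
    """Return (collected_at, retweet_count) pairs for one tweet, oldest first."""
    rows = db.execute(
        "SELECT collected_at, retweet_count FROM tweet_mutable "
        "WHERE tweet_id = ? ORDER BY collected_at",
        (tweet_id,),
    )
    return rows.fetchall()


history = engagement_history(db, 1)
```

Because the timestamps are stored as ISO 8601 strings in a single timezone, ordering by `collected_at` gives chronological order without any extra parsing.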
