Use a star schema model for the datasette logs #9

Open
bendnorman opened this issue May 27, 2022 · 0 comments


Right now the datasette logs produced by the ELT pipeline live in a flat, denormalized table. This results in duplicated information and could lead to a very wide table if we continue to add columns.

We could use a star schema to model the datasette logs. Each transaction is a log entry with an HTTP request, a timestamp and a size. We could create a dimension table for the IP address information. This is where most of the duplicated information is, because a single IP address makes multiple requests.
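
A rough sketch of what that split could look like, using an in-memory SQLite database. All table names, column names, and sample rows here (`ip_dim`, `request_fact`, `remote_addr`, `country`, and so on) are hypothetical and not taken from the actual pipeline; the IP addresses come from the reserved documentation ranges.

```python
import sqlite3

# Hypothetical flat log rows: (remote_addr, country, request, timestamp, size_bytes).
# The IP-level attribute (country) repeats on every row for the same address.
FLAT_LOGS = [
    ("203.0.113.5", "US", "GET /db/table_a", "2022-05-27T10:00:00", 1200),
    ("203.0.113.5", "US", "GET /db/table_b", "2022-05-27T10:01:00", 800),
    ("198.51.100.7", "DE", "GET /db/table_a", "2022-05-27T10:02:00", 450),
]


def build_star_schema(rows):
    """Load flat log rows into one IP dimension table and one request fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ip_dim (
            ip_id INTEGER PRIMARY KEY,
            remote_addr TEXT UNIQUE NOT NULL,
            country TEXT
        );
        CREATE TABLE request_fact (
            request_id INTEGER PRIMARY KEY,
            ip_id INTEGER NOT NULL REFERENCES ip_dim (ip_id),
            request TEXT NOT NULL,
            timestamp TEXT NOT NULL,
            size_bytes INTEGER
        );
        """
    )
    for remote_addr, country, request, timestamp, size_bytes in rows:
        # Store each IP address once; repeat requests reuse the existing row.
        conn.execute(
            "INSERT OR IGNORE INTO ip_dim (remote_addr, country) VALUES (?, ?)",
            (remote_addr, country),
        )
        (ip_id,) = conn.execute(
            "SELECT ip_id FROM ip_dim WHERE remote_addr = ?", (remote_addr,)
        ).fetchone()
        # The fact row keeps only per-request values plus the dimension key.
        conn.execute(
            "INSERT INTO request_fact (ip_id, request, timestamp, size_bytes) "
            "VALUES (?, ?, ?, ?)",
            (ip_id, request, timestamp, size_bytes),
        )
    conn.commit()
    return conn


conn = build_star_schema(FLAT_LOGS)
ip_count = conn.execute("SELECT COUNT(*) FROM ip_dim").fetchone()[0]
fact_count = conn.execute("SELECT COUNT(*) FROM request_fact").fetchone()[0]

# Joining the fact table back to the dimension recovers the original flat view.
flat_view = conn.execute(
    """
    SELECT d.remote_addr, d.country, f.request, f.timestamp, f.size_bytes
    FROM request_fact AS f
    JOIN ip_dim AS d USING (ip_id)
    ORDER BY f.request_id
    """
).fetchall()
```

With this layout, new IP-level columns would only widen `ip_dim`, and the flat table can still be reproduced with a join when it is needed for analysis.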

@jdangerx jdangerx moved this to 🆕 New in Catalyst Megaproject Feb 7, 2023
@jdangerx jdangerx moved this from 🆕 New to 📋 Backlog in Catalyst Megaproject Feb 7, 2023
Projects
Status: Icebox