Use a star schema model for the datasette logs #9

bendnorman · 2022-05-27T21:02:13Z

Right now the datasette logs produced by the ELT pipeline live in a flat, denormalized table. This results in duplicated information and could produce a very wide table if we continue to add columns.

We could use a star schema to model the datasette logs. Each row in the fact table would be a single log entry with an HTTP request, timestamp, and size. We could create a dimension table for the IP address information. This is where most of the duplicate information is, because a single IP address makes multiple requests.
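A rough sketch of what this split might look like, using Python's built-in `sqlite3`. All table names, column names, and sample rows here (`ip_dim`, `request_fact`, `org`, `country`, the example paths and addresses) are made up for illustration and are not the pipeline's actual schema:

```python
import sqlite3

# Hypothetical flat log rows:
# (timestamp, request path, size in bytes, ip address, ip org, ip country)
FLAT_LOGS = [
    ("2022-05-27T21:00:00Z", "/db/table_a", 1200, "203.0.113.5", "Example University", "US"),
    ("2022-05-27T21:00:05Z", "/db/table_b", 800, "203.0.113.5", "Example University", "US"),
    ("2022-05-27T21:01:00Z", "/db/table_a", 1200, "198.51.100.7", "Example ISP", "CA"),
]


def build_star_schema(conn, flat_rows):
    """Split flat log rows into an IP dimension table and a request fact table."""
    conn.executescript(
        """
        CREATE TABLE ip_dim (
            ip_id INTEGER PRIMARY KEY,
            ip_address TEXT UNIQUE NOT NULL,
            org TEXT,
            country TEXT
        );
        CREATE TABLE request_fact (
            request_id INTEGER PRIMARY KEY,
            requested_at TEXT NOT NULL,
            request_path TEXT NOT NULL,
            size_bytes INTEGER,
            ip_id INTEGER NOT NULL REFERENCES ip_dim (ip_id)
        );
        """
    )
    for ts, path, size, ip, org, country in flat_rows:
        # Store each IP's attributes once; repeat requests reuse the same row.
        conn.execute(
            "INSERT OR IGNORE INTO ip_dim (ip_address, org, country) VALUES (?, ?, ?)",
            (ip, org, country),
        )
        (ip_id,) = conn.execute(
            "SELECT ip_id FROM ip_dim WHERE ip_address = ?", (ip,)
        ).fetchone()
        # The fact row keeps only per-request values plus a key into ip_dim.
        conn.execute(
            "INSERT INTO request_fact (requested_at, request_path, size_bytes, ip_id) "
            "VALUES (?, ?, ?, ?)",
            (ts, path, size, ip_id),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_star_schema(conn, FLAT_LOGS)

# Requests per IP, recovered by joining the fact table back to the dimension.
requests_per_ip = conn.execute(
    """
    SELECT d.ip_address, COUNT(*)
    FROM request_fact AS f
    JOIN ip_dim AS d USING (ip_id)
    GROUP BY d.ip_address
    ORDER BY d.ip_address
    """
).fetchall()
```

With this layout, adding a new IP attribute means adding a column to `ip_dim` only, so the fact table stays narrow.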

jdangerx added this to Catalyst Megaproject Feb 7, 2023

jdangerx moved this to 🆕 New in Catalyst Megaproject Feb 7, 2023

jdangerx added the inframundo label Feb 7, 2023

jdangerx moved this from 🆕 New to 📋 Backlog in Catalyst Megaproject Feb 7, 2023
