I'm looking at IL Campaign Finance's ETL as a nice example of how to structure the Makefile so that it's a little more intuitive and only rebuilds what's necessary (instead of downloading or importing everything from scratch).
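To make "only rebuilds what's necessary" concrete, here is a rough Python sketch of the timestamp rule that make applies: a target gets rebuilt only if it is missing or older than one of its dependencies. The function name and the file names in the usage note are hypothetical, not taken from either repo, and real make also handles missing prerequisites and chained rules, which this ignores.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency.

    Simplified version of make's staleness check; assumes every
    dependency already exists on disk.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Stale if any dependency was modified after the target was written.
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

With something like `needs_rebuild("clean.csv", ["raw.csv"])`, the download and import steps would be skipped whenever the cleaned file is already newer than the raw one, which is the behavior we'd get for free from well-structured Makefile targets.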
DataMade also has guidelines for structuring ETL pipelines, with stated principles that we should follow (I think we're already most of the way there, but it's good to be explicit):
Never destroy data - treat source data as immutable, and show your work when you modify it
Be able to deterministically produce the final data with one command
Write as little custom code as possible
Use standard tools whenever possible
Keep source data under version control
I think that if we're more explicit about how we extract and transform the data, it might make it easier to spot bugs, or introduce tests/invariants along the way.
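As a sketch of what an invariant check between steps might look like, something like the following could run on each transformed file. The column names and sample rows are made up for illustration and are not our actual schema; the two checks (required fields present, key column unique) are just assumed examples of invariants we might care about.

```python
import csv
import io


def check_invariants(rows, required_columns, key_column):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    seen = set()
    for line_number, row in enumerate(rows, start=1):
        # Every required column must be present and non-blank.
        for column in required_columns:
            if not (row.get(column) or "").strip():
                problems.append(f"row {line_number}: missing {column}")
        # The key column must be unique across the file.
        key = row.get(key_column)
        if key in seen:
            problems.append(f"row {line_number}: duplicate {key_column} {key!r}")
        seen.add(key)
    return problems


# Hypothetical sample standing in for a transformed file.
sample = io.StringIO("id,name\n1,Alice\n2,\n1,Bob\n")
print(check_invariants(csv.DictReader(sample), ["id", "name"], "id"))
```

If each ETL step writes its own intermediate file, a check like this can run right after the step and fail the build early, rather than surfacing as a bug in the UI.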
Another principle I would want for our site is that the data should be easy to get at. At minimum, we should provide a link to the data source within the UI. Ideally, we would also provide a download link to the data after transformation and cleaning.