Consolidate ferc1 outputs using Dagster asset factories #3147
Comments
@catalyst-cooperative/com-dev Is this still open? I'd like to work on this as a first-time contributor.
Hey there! Yes, this is still open. I was thinking of this as a good one for you after getting your office hours signup. There are lots of other examples of asset factories floating around that you could use as a guide. If you have a chance to get the PUDL / Dagster local development environment running, this should be pretty easy to test out locally.
* #3147 draft1
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  For more information, see https://pre-commit.ci
* cleaned up with linters and ran from container for precommit hooks
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  For more information, see https://pre-commit.ci
* updated draft of asset factory solution
* Update asset factory to include all standard output tables
* Update docstrings
* Update release notes

---------

Co-authored-by: hfeuer <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: hfireborn <[email protected]>
Co-authored-by: Zane Selvans <[email protected]>
The top portion of the `pudl.output.ferc1` module contains a number of individual asset definitions for denormalized / output tables with very similar structures, which could be consolidated into a small number of asset factories using the pattern adopted in e.g. `pudl.extract.ferc714` (after PR #3123). See Dagster's blog post Factory Patterns in Python for some more background on the factory design pattern, and its application to Dagster assets.

Note that the calls to `pudl.helpers.organize_cols()` found in the current FERC 1 output asset definitions are no longer required, as the ordering of columns in the database is determined by the resource definitions / database schema now. These calls are leftover from when we were producing dataframes for users on request rather than writing these tables to the database.

Note that some of these assets currently create new columns containing derived values, and those would need to be preserved, either with their own asset definitions, or some way of keeping track of which calculations should be done for what tables inside the asset factory.
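
For illustration, here is a rough sketch of one way the "keep track of which calculations go with which table" option might look. It is plain Python so that it runs without Dagster or pandas installed: tables are lists of dicts rather than dataframes, and the Dagster asset decoration of the returned function is left out entirely. Every name in it (`denorm_table_factory`, `add_capacity_factor`, `TABLE_SPECS`, and the column names) is hypothetical and not taken from the PUDL codebase.

```python
from typing import Callable, Optional

Row = dict[str, object]
Table = list[Row]
Transform = Callable[[Table], Table]


def denorm_table_factory(
    table_name: str,
    derived_cols: Optional[Transform] = None,
) -> Callable[[Table, Table], Table]:
    """Build a denormalizing function for one table (hypothetical helper)."""

    def _denorm(core_table: Table, utilities: Table) -> Table:
        # Merge utility attributes into each record, keyed on utility_id_ferc1.
        util_by_id = {u["utility_id_ferc1"]: u for u in utilities}
        out = [
            {**util_by_id.get(r["utility_id_ferc1"], {}), **r} for r in core_table
        ]
        # Apply the table-specific derived columns, if any were registered.
        return derived_cols(out) if derived_cols is not None else out

    _denorm.__name__ = f"denorm_{table_name}"
    return _denorm


def add_capacity_factor(rows: Table) -> Table:
    """Example derived column; the formula and column names are illustrative."""
    return [
        {**r, "capacity_factor": r["net_generation_mwh"] / (8760 * r["capacity_mw"])}
        for r in rows
    ]


# One entry per output table: the table name mapped to its optional
# derived-column step. Tables without derived values map to None.
TABLE_SPECS: dict[str, Optional[Transform]] = {
    "plants_steam_ferc1": add_capacity_factor,
    "purchased_power_ferc1": None,
}

denorm_funcs = {
    name: denorm_table_factory(name, fn) for name, fn in TABLE_SPECS.items()
}
```

In a real implementation the inner function would presumably be wrapped as a Dagster asset and take dataframes as inputs; the point of the sketch is only that the per-table derived calculations can live in a single mapping that the factory consumes, instead of in separate hand-written asset definitions.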