Update PUDL to use Pydantic 2 #2842
Comments
I'm currently getting an error when converting our settings classes into Dagster configs, I think because the taxonomy URL is part of our settings. But that element is getting a type of
I was trying to look at converting our settings classes over to being Dagster Configs directly, but it seems like there are only two high-level Dagster Configs that we're constructing: one for all of the FERC to SQLite conversions, and one for all the rest of the datasets. Is that what we want? Would that mean we need to consolidate all of our existing ETL settings into a single class?
It seems like we just have to make all of the
The current setup I created definitely feels awkward and isn't usable with Pydantic 2.0 and the new Pydantic-based Dagster Config. I made a

I think it makes sense to convert our Pydantic settings classes directly into Dagster configs, now that Dagster configs can be Pydantic classes. There's no need to pass legacy configuration to a Pydantic resource just to do validation that Dagster's legacy config system didn't support. I think we can do this using nested Dagster configurations. Something like this:

from dagster import asset, Config, RunConfig

class EpaCemsSettings(Config):
    years: list[int]
    states: list[str]

class DatasetsSettings(Config):
    eia: EiaSettings
    epacems: EpaCemsSettings
    ...

@asset
def extract_eia860(config: DatasetsSettings):
    years = config.eia.eia860.years
    ...

I wonder if it still makes sense to have all of our dataset configs live in one
Ah, but if we don't keep a configurable Resource around, the syntax for configuring the assets gets weird:

assets:
  extract_eia860:
    config:
      years: [2020,2021]
  glue:
    config:
      years: [2020,2021]
  ...
  # every asset that depends on dataset configuration
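The duplication in the per-asset configuration above can be sketched with plain dictionaries. This is only an illustration of the shape of the problem: `build_asset_config` is a hypothetical helper, the asset names are taken from the snippet, and the top-level `assets` key mirrors that snippet rather than a confirmed Dagster run-config schema.

```python
from copy import deepcopy


def build_asset_config(asset_names, dataset_config):
    """Repeat the same dataset config under every asset that needs it."""
    return {
        "assets": {
            # Each asset gets its own copy, so edits to one do not leak to others.
            name: {"config": deepcopy(dataset_config)}
            for name in asset_names
        }
    }


dataset_config = {"years": [2020, 2021]}
run_config = build_asset_config(["extract_eia860", "glue"], dataset_config)
```

Every asset that depends on the dataset settings carries its own copy, which is the repetition a single shared resource would avoid.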
Ok, I finally got through all of the new I think we should keep all of our dataset configurations as a single resource. It seems like we can create nested I need to play around with this a bit more but I must eat dinner now :)
Okay so... a

Do we actually want these to be configurable at run time in Launchpad? Or should they always be read from the settings YAML files? I've found that when I touch configurations in Launchpad, the changes stick around and get messed up, with extra copies of individual years and other weirdness. Don't we pretty much always want to run the DAG based on either the fast or the full ETL settings? Having a GUI that each of us could have twiddled, so that we end up running totally different configurations that aren't linked back to what's in the repo, seems like a recipe for confusion.
We can't switch over to using Pydantic 2 until Dagster supports it but I wanted a place to collect tasks for once it's supported.
Tasks
Issues to Create