Apply new naming conventions to devtool notebooks #3228

Merged (5 commits) on Jan 17, 2024
23 changes: 15 additions & 8 deletions devtools/debug-eia-etl.ipynb
@@ -123,7 +123,7 @@
"outputs": [],
"source": [
"%%time\n",
"asset_key = \"raw_generator_retired_eia860\"\n",
"asset_key = \"raw_eia860__generator_retired\"\n",
"df = defs.load_asset_value(AssetKey(asset_key))\n",
"\n",
"df.head()"
@@ -145,7 +145,7 @@
"outputs": [],
"source": [
"%%time\n",
"get_asset_group_keys(\"clean_eia860\", default_assets)"
"get_asset_group_keys(\"_core_eia860\", default_assets)"
]
},
{
@@ -157,7 +157,7 @@
"outputs": [],
"source": [
"%%time\n",
"asset_key = \"clean_generators_eia860\"\n",
"asset_key = \"_core_eia860__generators\"\n",
"df = defs.load_asset_value(AssetKey(asset_key))\n",
"\n",
"df.head()"
@@ -197,7 +197,7 @@
"outputs": [],
"source": [
"%%time\n",
"asset_key = \"raw_generator_eia923\"\n",
"asset_key = \"raw_eia923__generator\"\n",
"df = defs.load_asset_value(AssetKey(asset_key))\n",
"\n",
"df.head()"
@@ -218,7 +218,7 @@
},
"outputs": [],
"source": [
"get_asset_group_keys(\"clean_eia923\", default_assets)"
"get_asset_group_keys(\"_core_eia923\", default_assets)"
]
},
{
@@ -230,7 +230,7 @@
"outputs": [],
"source": [
"%%time\n",
"asset_key = \"clean_generation_eia923\"\n",
"asset_key = \"_core_eia923__generation\"\n",
"df = defs.load_asset_value(AssetKey(asset_key))\n",
"\n",
"df.head()"
@@ -251,7 +251,7 @@
},
"outputs": [],
"source": [
"get_asset_group_keys(\"norm_eia\", default_assets)"
"get_asset_group_keys(\"core_eia\", default_assets)"
]
},
{
@@ -268,6 +268,13 @@
"\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -286,7 +293,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
"version": "3.11.7"
}
},
"nbformat": 4,
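Every rename in this notebook moves the data source into a prefix and separates it from the table name with a double underscore (for example, `raw_generator_retired_eia860` becomes `raw_eia860__generator_retired`), and intermediate assets gain a leading underscore (`_core_eia860__generators`). The sketch below is a hypothetical `parse_asset_key` helper whose regular expression is inferred only from the keys visible in this diff; it is not part of PUDL, and the real convention may allow forms this pattern rejects.

```python
import re
from typing import NamedTuple


class AssetName(NamedTuple):
    layer: str          # e.g. "raw", "core"
    source: str         # e.g. "eia860", "ferc1"
    table: str          # e.g. "generator_retired"
    intermediate: bool  # True when the key starts with "_"


# Hypothetical pattern inferred from the keys in this PR, not PUDL's own rule:
# optional leading "_", a layer, "_", a source, "__", then the table name.
ASSET_KEY_RE = re.compile(
    r"^(?P<underscore>_?)(?P<layer>[a-z]+)_(?P<source>[a-z0-9]+)__(?P<table>[a-z0-9_]+)$"
)


def parse_asset_key(key: str) -> AssetName:
    """Split a new-style asset key into its parts, or raise ValueError."""
    match = ASSET_KEY_RE.match(key)
    if match is None:
        raise ValueError(f"{key!r} does not match layer_source__table")
    return AssetName(
        layer=match["layer"],
        source=match["source"],
        table=match["table"],
        intermediate=bool(match["underscore"]),
    )


print(parse_asset_key("_core_eia860__generators"))
# AssetName(layer='core', source='eia860', table='generators', intermediate=True)
```

Old-style keys such as `raw_generator_retired_eia860` raise `ValueError` under this pattern, so a check like this could help spot names that a rename pass missed.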
31 changes: 5 additions & 26 deletions devtools/debug-ferc1-etl.ipynb
@@ -68,7 +68,7 @@
"\n",
"from pudl.resources import dataset_settings\n",
"\n",
"years = [2020, 2021] # add desired years here\n",
"years = [2020, 2022] # add desired years here\n",
"configured_dataset_settings = {\"ferc1\": {\"years\": years}}\n",
"\n",
"dataset_init_context = build_init_resource_context(config=configured_dataset_settings)\n",
@@ -156,8 +156,8 @@
"\n",
"def get_table_classes(module):\n",
" classes = [member[1] for member in inspect.getmembers(module, inspect.isclass)]\n",
" table_classes = [x for x in classes if x.__name__.endswith(\"Ferc1TableTransformer\")]\n",

Member Author

@cmgosnell I ran into some FERC-related errors in the last two cells of this notebook that I wasn't sure how to solve.

Member

Okay, the last cell is, I believe, a question for @katherinelamb / @zschira (or at least a check of my assumption): it looks like the new FERC plant classifier got pulled out of the transform step, so the special exception of passing the fuel table into the steam table's transform is no longer needed! If that's true, I think we can delete this cell and take out the if table_name == "core_ferc1__yearly_steam_plants_sched402": check in the previous cell.

Member

And the second-to-last cell failed while validating the calculations in the table. The expected error rates were all set using the full and fast ETL settings, and this notebook has a cell up top that sets years = [2020, 2021]. If you change it to years = [2020, 2022] and rerun the notebook, this will work fine.

Member
cmgosnell, Jan 10, 2024

This is definitely a fragile part of testing the transform step because, BELIEVE IT OR NOT, the newer years are much less clean in their calculations than the older years.

Member

@cmgosnell, yes, that's true! core_ferc1__yearly_steam_plants_sched402 is now handled like any other transform and no longer has plant IDs.

Member Author

Thanks y'all! Just made the changes.
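The calculation failure discussed in this thread comes from a tolerance check whose expected error rates were tuned on the full and fast ETL year selections, so a different subset of years can exceed them. A minimal, hypothetical illustration of such a rate-based check, assuming it compares each reported value against the value recomputed from its components; `calculation_error_rate`, `check_error_rate`, and `max_error_rate` are invented for this sketch and are not PUDL's validation code:

```python
import math


def calculation_error_rate(reported, calculated, rel_tol=1e-6):
    """Fraction of rows whose reported value differs from the recomputed one."""
    if len(reported) != len(calculated):
        raise ValueError("reported and calculated must be the same length")
    if not reported:
        return 0.0
    mismatches = sum(
        not math.isclose(r, c, rel_tol=rel_tol) for r, c in zip(reported, calculated)
    )
    return mismatches / len(reported)


def check_error_rate(rate, max_error_rate):
    """Raise if the observed error rate exceeds the tolerated one."""
    if rate > max_error_rate:
        raise AssertionError(
            f"error rate {rate:.2%} exceeds tolerance {max_error_rate:.2%}"
        )


# Hypothetical numbers: one of four rows disagrees, so the rate is 25%.
rate = calculation_error_rate([10.0, 20.0, 30.0, 40.0], [10.0, 20.0, 31.0, 40.0])
check_error_rate(rate, max_error_rate=0.30)  # passes: 25% <= 30%
```

The same 25% rate would fail against a tolerance of 10%, which is the kind of mismatch that appears when the tolerance was tuned on different years than the ones being run.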

" return [x for x in table_classes if x.__name__ != \"AbstractFerc1TableTransformer\"]\n",
" table_classes = [x for x in classes if x.__name__.endswith(\"TableTransformer\")]\n",
" return [x for x in table_classes if x.__name__ not in (\"AbstractTableTransformer\", \"Ferc1AbstractTableTransformer\")]\n",
"\n",
"\n",
"classes = get_table_classes(pudl.transform.ferc1)\n",
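The old filter matched only class names ending in `Ferc1TableTransformer`. Relaxing the suffix to `TableTransformer` also matches the two abstract base classes, which is why the new return line excludes them by name. The sketch below copies the new filter and runs it against a throwaway module built with `types.ModuleType`; the class names in it are hypothetical stand-ins, not the real contents of `pudl.transform.ferc1`.

```python
import inspect
import types


def get_table_classes(module):
    """Return the concrete *TableTransformer classes found in ``module``."""
    classes = [member[1] for member in inspect.getmembers(module, inspect.isclass)]
    table_classes = [x for x in classes if x.__name__.endswith("TableTransformer")]
    # Both abstract bases also end in "TableTransformer", so drop them by name.
    return [
        x
        for x in table_classes
        if x.__name__ not in ("AbstractTableTransformer", "Ferc1AbstractTableTransformer")
    ]


# Hypothetical stand-in module, built only to exercise the filter.
fake_ferc1 = types.ModuleType("fake_ferc1")


class AbstractTableTransformer: ...
class Ferc1AbstractTableTransformer(AbstractTableTransformer): ...
class SteamPlantsTableTransformer(Ferc1AbstractTableTransformer): ...
class FuelTableTransformer(Ferc1AbstractTableTransformer): ...
class SomeHelper: ...


for cls in (AbstractTableTransformer, Ferc1AbstractTableTransformer,
            SteamPlantsTableTransformer, FuelTableTransformer, SomeHelper):
    setattr(fake_ferc1, cls.__name__, cls)

names = sorted(cls.__name__ for cls in get_table_classes(fake_ferc1))
print(names)  # ['FuelTableTransformer', 'SteamPlantsTableTransformer']
```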
@@ -320,40 +320,19 @@
"execution_count": null,
"id": "08da1d39-6272-435f-8241-8b3879078393",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"transformed_tables = {}\n",
"for table_name, transformer in transformers.items():\n",
" if table_name == \"core_ferc1__yearly_steam_plants_sched402\":\n",
" # core_ferc1__yearly_steam_plants_sched402 is a special case. It depends on the transformed core_ferc1__yearly_steam_plants_fuel_sched402 table.\n",
" continue\n",
" transformed_tables[transformer.table_id.value] = transformer.transform(\n",
" raw_dbf=ferc1_dbf_raw_dfs[transformer.table_id.value],\n",
" raw_xbrl_instant=ferc1_xbrl_raw_dfs[transformer.table_id.value][\"instant\"],\n",
" raw_xbrl_duration=ferc1_xbrl_raw_dfs[transformer.table_id.value][\"duration\"],\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6cc37ac5-594b-425d-aad3-7b6e9b6453da",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Handle special case for \"core_ferc1__yearly_steam_plants_sched402\"\n",
"transformer = transformers[\"core_ferc1__yearly_steam_plants_sched402\"]\n",
"transformed_tables[transformer.table_id.value] = transformer.transform(\n",
" raw_dbf=ferc1_dbf_raw_dfs[transformer.table_id.value],\n",
" raw_xbrl_instant=ferc1_xbrl_raw_dfs[transformer.table_id.value][\"instant\"],\n",
" raw_xbrl_duration=ferc1_xbrl_raw_dfs[transformer.table_id.value][\"duration\"],\n",
" transformed_fuel=transformed_tables[\"core_ferc1__yearly_steam_plants_fuel_sched402\"],\n",
")"
]
}
],
"metadata": {
@@ -372,7 +351,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
"version": "3.11.7"
}
},
"nbformat": 4,
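With the steam-plants special case removed, every FERC Form 1 transformer in this notebook is called the same way. A self-contained sketch of that uniform loop, where `TableId`, `StubTransformer`, and plain lists are hypothetical stand-ins for the real table-id enum, transformer classes, and raw DataFrames:

```python
from dataclasses import dataclass
from enum import Enum


class TableId(Enum):  # hypothetical stand-in for the FERC Form 1 table-id enum
    STEAM = "core_ferc1__yearly_steam_plants_sched402"
    FUEL = "core_ferc1__yearly_steam_plants_fuel_sched402"


@dataclass
class StubTransformer:
    table_id: TableId

    def transform(self, raw_dbf, raw_xbrl_instant, raw_xbrl_duration):
        # A real transformer would clean and combine the three raw inputs;
        # this stub just counts the raw records it was handed.
        return len(raw_dbf) + len(raw_xbrl_instant) + len(raw_xbrl_duration)


def transform_all(transformers, dbf_raw, xbrl_raw):
    """Run every transformer with the same three raw inputs; no table is special-cased."""
    transformed = {}
    for transformer in transformers.values():
        table = transformer.table_id.value
        transformed[table] = transformer.transform(
            raw_dbf=dbf_raw[table],
            raw_xbrl_instant=xbrl_raw[table]["instant"],
            raw_xbrl_duration=xbrl_raw[table]["duration"],
        )
    return transformed


transformers = {t.value: StubTransformer(t) for t in TableId}
dbf_raw = {t.value: [1, 2] for t in TableId}
xbrl_raw = {t.value: {"instant": [3], "duration": [4, 5, 6]} for t in TableId}
print(transform_all(transformers, dbf_raw, xbrl_raw))
```

Because the steam table no longer takes a `transformed_fuel` argument, the loop needs no ordering constraint between the fuel and steam tables.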
8 changes: 4 additions & 4 deletions devtools/debug-harvesting.ipynb
@@ -56,13 +56,13 @@
"source": [
"%%time\n",
"\n",
"clean_assets = get_asset_group_keys(\"clean_eia923\", default_assets)\n",
"clean_assets += get_asset_group_keys(\"clean_eia860\", default_assets)\n",
"_core_assets = get_asset_group_keys(\"_core_eia923\", default_assets)\n",
"_core_assets += get_asset_group_keys(\"_core_eia860\", default_assets)\n",
"\n",
"clean_dfs = {}\n",
"with defs.get_asset_value_loader() as loader:\n",
" clean_dfs = {\n",
" asset: loader.load_asset_value(AssetKey(asset)) for asset in clean_assets\n",
" asset: loader.load_asset_value(AssetKey(asset)) for asset in _core_assets\n",
" }\n",
"\n",
"# this Enum defines the valid values of entity\n",
@@ -163,7 +163,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
"version": "3.11.7"
},
"vscode": {
"interpreter": {
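Most of the renames in these notebooks are string-literal substitutions inside cell sources, so they could also be applied mechanically. A rough sketch using only the standard library; `rename_quoted_names` and `rename_in_file` are hypothetical helpers, the mapping is copied from the renames in this diff, and the `indent=1` output is meant to approximate how Jupyter formats `.ipynb` files on disk:

```python
import json
from pathlib import Path

# Old -> new names, copied from the renames shown in this PR's notebook diffs.
RENAMES = {
    "raw_generator_retired_eia860": "raw_eia860__generator_retired",
    "clean_generators_eia860": "_core_eia860__generators",
    "raw_generator_eia923": "raw_eia923__generator",
    "clean_generation_eia923": "_core_eia923__generation",
    "clean_eia860": "_core_eia860",
    "clean_eia923": "_core_eia923",
    "norm_eia": "core_eia",
}


def rename_quoted_names(notebook: dict, renames: dict) -> int:
    """Rewrite quoted names in code-cell sources in place; return lines changed."""
    changed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # nbformat allows source as a single string or as a list of lines.
        lines = source.splitlines(keepends=True) if isinstance(source, str) else source
        new_source = []
        for line in lines:
            updated = line
            for old, new in renames.items():
                # Only replace the quoted literal, so "clean_eia860" cannot
                # clobber part of a longer name or a bare variable.
                updated = updated.replace(f'"{old}"', f'"{new}"')
            changed += updated != line
            new_source.append(updated)
        cell["source"] = new_source
    return changed


def rename_in_file(path: Path, renames: dict = RENAMES) -> int:
    """Load an .ipynb file, apply the renames, and write it back if anything changed."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    changed = rename_quoted_names(notebook, renames)
    if changed:
        text = json.dumps(notebook, indent=1, ensure_ascii=False) + "\n"
        path.write_text(text, encoding="utf-8")
    return changed


nb = {"cells": [{"cell_type": "code", "source": [
    "%%time\n",
    '_core_assets = get_asset_group_keys("clean_eia923", default_assets)\n',
]}]}
n_changed = rename_quoted_names(nb, RENAMES)
print(n_changed, nb["cells"][0]["source"][1])
```

Because the sketch matches quoted literals only, the variable rename in `debug-harvesting.ipynb` (`clean_assets` to `_core_assets`) would still need a manual edit, as would any key assembled by string concatenation.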
77 changes: 5 additions & 72 deletions devtools/inspect-assets.ipynb
@@ -6,7 +6,7 @@
"metadata": {},
"source": [
"# Inspecting dagster assets\n",
"This notebooks allows you to inspect dagster asset values.\n",
"This notebooks allows you to inspect dagster asset values. **This is just a template notebook. Do your asset explorations in a copy of this notebook.** \n",
"\n",
"Some assets are written to the database in which case you can just pull the tables into pandas or explore them in the database. However, many assets use the default IO Manager which writes asset values to the `$DAGSTER_HOME/storage/` directory as pickle files. Dagster provides a method for inspecting asset values no matter what IO Manager the asset uses."
]
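The markdown cell above notes that assets using the default IO manager are pickled into `$DAGSTER_HOME/storage/`. As a hedged sketch of what that means on disk, the hypothetical `load_pickled_asset` below assumes each value is stored in a file named after its asset key directly under that directory; that layout is an assumption here, and `defs.load_asset_value()` remains the approach this notebook recommends because it works regardless of the IO manager.

```python
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional


def load_pickled_asset(asset_key: str, dagster_home: Optional[str] = None) -> Any:
    """Unpickle an asset value from $DAGSTER_HOME/storage/<asset_key> (assumed layout)."""
    home = Path(dagster_home or os.environ["DAGSTER_HOME"])
    path = home / "storage" / asset_key
    with path.open("rb") as f:
        # Only unpickle files you trust: pickle can execute arbitrary code.
        return pickle.load(f)


# Demo against a throwaway directory standing in for $DAGSTER_HOME.
with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp) / "storage"
    storage.mkdir()
    with (storage / "_core_eia861__balancing_authority").open("wb") as f:
        pickle.dump({"rows": 3}, f)
    demo_value = load_pickled_asset("_core_eia861__balancing_authority", dagster_home=tmp)

print(demo_value)  # {'rows': 3}
```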
@@ -50,61 +50,8 @@
"\n",
"from pudl.etl import defs\n",
"\n",
"asset_key = \"exploded_balance_sheet_assets_ferc1\"\n",

Member Author

I removed some custom asset exploration work that got committed here. This is just supposed to be a template notebook.

"df = defs.load_asset_value(AssetKey(asset_key))\n",
"\n",
"#df[df.row_type_xbrl == \"correction\"].xbrl_factoid.value_counts()\n",
"#df[(df.xbrl_factoid.isin([\"operation_expense\", \"maintenance_expense\"]))&(df.rel_diff.notnull())&(df.rel_diff!=0)].sort_values(['utility_id_ferc1', 'report_year', 'xbrl_factoid', 'rel_diff']).head(50)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2d99594",
"metadata": {},
"outputs": [],
"source": [
"df[(df.xbrl_factoid==\"accumulated_depreciation\")&(df.plant_status==\"in_service\")&(df.plant_function==\"total\")]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "467111b1",
"metadata": {},
"outputs": [],
"source": [
"df[df.xbrl_factoid.isin(factoids)&(df.utility_id_ferc1==9)&(df.report_year==1998)]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c6f7427a",
"metadata": {},
"outputs": [],
"source": [
"factoids = ['distribution_maintenance_expense_electric',\n",
" 'hydraulic_power_generation_maintenance_expense',\n",
" 'maintenance_of_general_plant',\n",
" 'nuclear_power_generation_maintenance_expense',\n",
" 'other_power_generation_maintenance_expense',\n",
" 'regional_market_maintenance_expense',\n",
" 'steam_power_generation_maintenance_expense',\n",
" 'transmission_maintenance_expense_electric']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "951b718d",
"metadata": {},
"outputs": [],
"source": [
"asset_key = \"calculation_components_xbrl_ferc1\"\n",
"calcs = defs.load_asset_value(AssetKey(asset_key))\n",
"\n",
"calcs[(calcs.xbrl_factoid_parent == \"accumulated_depreciation\")].head(50)"
"asset_key = \"_core_eia861__balancing_authority\"\n",
"df = defs.load_asset_value(AssetKey(asset_key))"
]
},
{
@@ -128,25 +75,11 @@
"\n",
"from pudl.etl import defs\n",
"\n",
"asset_key = \"emissions_unit_ids_epacems\"\n",
"asset_key = \"core_eia923__monthly_generation\"\n",
"df = defs.load_asset_value(AssetKey(asset_key))\n",
"\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f0d118b",
"metadata": {},
"outputs": [],
"source": [
"from pudl.output.epacems import epacems\n",
"\n",
"test_epacems = epacems(states = [\"ID\"], years = [2022])\n",
"\n",
"test_epacems[test_epacems.operating_datetime_utc>=\"2022-01-04\"].head(40)"
]
}
],
"metadata": {
@@ -165,7 +98,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
"version": "3.11.7"
}
},
"nbformat": 4,