Try to run transformation of new year of FERC 1 data and document results #3698
Labels
- data-update: When fresh data is integrated into PUDL from quarterly or annual updates
- ferc1: Anything having to do with FERC Form 1
- rmi
Tasks
Note: I used the new metadata only. Presumably we will still want to do this: #3650
Failures
🔴 `core_ferc1__yearly_hydroelectric_plants_sched406` -> Easy 🟢
Stage: `enforce_schema` (last step!)
Error:
Why: There is a decimal in the `project_num` column and we have this column dtyped as an `int`.
Solution: Easy
CG tried option 2 locally and... there is more than one of these. I used removeprefix and it worked locally:
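For context, a minimal pandas sketch of the kind of cleanup involved. This is a hypothetical illustration, not the actual PUDL fix: the example values and the exact string method (`removesuffix` here rather than `removeprefix`) are assumptions.

```python
import pandas as pd

# Hypothetical illustration: a stray ".0" in project_num defeats a
# direct cast to an integer dtype during enforce_schema.
df = pd.DataFrame({"project_num": ["2597", "2597.0", None]})

# Strip the spurious decimal suffix, then use pandas' nullable Int64
# so missing values survive the conversion.
cleaned = pd.to_numeric(df["project_num"].str.removesuffix(".0")).astype("Int64")
```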
🔴 `core_ferc1__yearly_operating_revenues_sched300` -> Easy 🟢
Stage: `apply_xbrl_calculation_fixes` (early)
Why: It looks like `commercial_and_industrial` is trying to be both added and deleted, which means the calculation components for this table changed with the new metadata.
Solution: Easy/Mild
Go track down what changed in the metadata and update `xbrl_calculation_component_fixes.csv` accordingly. I double-checked, and it does look like the `commercial_and_industrial` calculation component that used to be in the metadata's calculation (which we had been removing) is now just removed! Which is great: it means they fixed something 😎. So the solution here is just to delete that line in `xbrl_calculation_component_fixes.csv`. CG did this locally and the rest of the transform transformed 🟢
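The CSV edit itself is trivial. As a hedged sketch (the real `xbrl_calculation_component_fixes.csv` schema differs; the column names here are invented):

```python
import pandas as pd
from io import StringIO

# Invented stand-in for xbrl_calculation_component_fixes.csv: the first
# row is the now-obsolete fix that removed commercial_and_industrial.
csv = StringIO(
    "table_name,calculation_component\n"
    "core_ferc1__yearly_operating_revenues_sched300,commercial_and_industrial\n"
    "core_ferc1__yearly_operating_revenues_sched300,some_other_component\n"
)
fixes = pd.read_csv(csv)

# Drop the obsolete row; every other fix is kept untouched.
fixes = fixes[fixes["calculation_component"] != "commercial_and_industrial"]
```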
🔴 `core_ferc1__yearly_plant_in_service_sched204`
Stage: `pudl.io_managers.FercXBRLSQLiteIOManager.filter_for_freshest_data` (vv early)
Error:
Why: Our apply-diffs methodology takes the last non-null value for a fact, which is probably mostly fine, except when a respondent comes back in to null out a previously reported value. This check ensures that we don't get too many more non-null records than our best-snapshot methodology suggests. In this run we got 0.31% more non-null records using the apply-diffs methodology as compared to best-snapshot, and we expect that difference to be 0.3% or less.
Solutions: Medium
0. (temp) Up the threshold w/o looking, just to see if the rest of the transform passes -> 🟢 it does! wahoo
1. `filter_for_freshest_data` method?
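The check being tripped can be sketched roughly like this. The function name, signature, and frame shapes are invented; the real check lives somewhere in `filter_for_freshest_data` and may be structured quite differently.

```python
import pandas as pd

# Rough sketch of the sanity check described above; not PUDL's actual
# implementation.
def check_non_null_diff(apply_diffs: pd.DataFrame,
                        best_snapshot: pd.DataFrame,
                        threshold: float = 0.003) -> float:
    diffs_count = int(apply_diffs.notnull().sum().sum())
    snapshot_count = int(best_snapshot.notnull().sum().sum())
    ratio = (diffs_count - snapshot_count) / snapshot_count
    # This run produced ~0.0031, just over the 0.003 limit, so the
    # temporary fix is simply raising `threshold`.
    if ratio > threshold:
        raise AssertionError(
            f"apply-diffs has {ratio:.2%} more non-null records than "
            f"best-snapshot (limit {threshold:.2%})"
        )
    return ratio
```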
🔴 `core_ferc1__yearly_steam_plants_fuel_sched402` -> Easy 🟢
Stage: `assign_record_id` (mid)
Error:
Why: The categorization of `fuel_type_code_pudl` during `categorize_strings` nulled one string, which caused this error in defining the `record_id`, because the fuel type is part of the primary key of the original table.
Solution: Easy
Add the string into `FUEL_CATEGORIES` (CG tried this locally and it fixed the problem).
What hasn't been tested because of these failures?
`core_ferc1__yearly_plant_in_service_sched204` is one of the inputs to the exploded/detail tables, so all of the stuff feeding into the rate base table isn't yet being tested. All of the intra-table calculations are being tested! Just none of the post-transform metadata/calculation tables or the detailed -> rate base assets.