Integrate 2023 FERC1 Update #3701
Conversation
This was done with a manual hand-off of the XBRL SQLite DB.
src/pudl/io_managers.py
Outdated
- threshold_pct = 0.3
+ threshold_pct = 0.32
This is a hack! We need to investigate this to see if it's really a problem.
After some investigation, I don't think this is a problem. For the table that caused this value error in the first place (electric_plant_in_service_204_duration), there are 33 excess non-null records out of a possible 10472, which means that with just 3 fewer non-null records this test would pass fine, no problem.

After reviewing some of the 2023 excess non-nulls produced by the apply_diffs methodology, it's pretty ambiguous whether a given record was supposed to update a previously non-null value to null, or whether the filer simply didn't report anything while updating other values. Honestly, the fluctuation in the non-null values from submission to submission was slightly alarming (not all of the values changed, by any means, but some were just wildly different from submission to submission...).
I'm bumping this up again for the fast tests. This is the same number of excess non-nulls, but with one fewer year involved.
ValueError: Found more than expected excess non-null values using the currently implemented
apply_diffs methodology (#7285) as compared to the best_snapshot methodology (#7254). We expected
the apply_diffs methodology to result in no more than 100.32% non-null records but found 100.43%.
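The check behind this ValueError can be sketched roughly as follows. This is a minimal, hypothetical sketch (the function name and signature are illustrative, not the actual code in src/pudl/io_managers.py): it compares non-null counts from the apply_diffs output against the best_snapshot output and allows a small excess controlled by threshold_pct.

```python
def check_excess_non_nulls(apply_diffs, best_snapshot, threshold_pct=0.32):
    """Sketch of the excess non-null check described above (names hypothetical).

    ``apply_diffs`` and ``best_snapshot`` are flat sequences of cell values from
    the two deduplication methodologies. ``threshold_pct`` is the allowed excess
    in percentage points: 0.32 permits up to 100.32% non-null records relative
    to the best_snapshot baseline.
    """
    n_apply = sum(v is not None for v in apply_diffs)
    n_best = sum(v is not None for v in best_snapshot)
    pct = 100 * n_apply / n_best  # apply_diffs non-nulls as a % of baseline
    if pct > 100 + threshold_pct:
        raise ValueError(
            f"Found more than expected excess non-null values: expected no "
            f"more than {100 + threshold_pct:.2f}% non-null records but "
            f"found {pct:.2f}%."
        )
    return pct
```

If the 33 excess records are measured against the 10472 possible records quoted above, the excess is about 0.315 percentage points, which clears a 0.32 threshold but not the original 0.3, consistent with the bump in this diff.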
@@ -80,7 +80,6 @@ core_ferc1__yearly_operating_revenues_sched300,other_operating_revenues,core_fer
 core_ferc1__yearly_operating_revenues_sched300,other_operating_revenues,core_ferc1__yearly_operating_revenues_sched300,sales_of_water_and_water_power,1.0,,,
 core_ferc1__yearly_operating_revenues_sched300,sales_to_ultimate_consumers,core_ferc1__yearly_operating_revenues_sched300,large_or_industrial,1.0,,,
 core_ferc1__yearly_operating_revenues_sched300,sales_to_ultimate_consumers,core_ferc1__yearly_operating_revenues_sched300,small_or_commercial,1.0,,,
-core_ferc1__yearly_operating_revenues_sched300,sales_to_ultimate_consumers,core_ferc1__yearly_sales_by_rate_schedules_sched304,commercial_and_industrial,,,,
We were removing this calculation component from this parent factoid because it didn't belong here. The 2023 metadata was updated and they removed this calculation component themselves. Wahoo, they fixed a thing.

Right now we are using the 2023 XBRL metadata exclusively. This may need to be updated in a world in which we use different metadata for different years of XBRL, re: #3713. Presumably we'd need to add report_year to this file to support year-specific calculation component fixes. We could handle that the way we currently handle the dimension columns: keep the dimensions null in the manually compiled CSVs when the calc fix is not dimension-specific, only add dimensions when the fix is dimension-specific, and then fill in any null dimensions found in the data. This lets us specify calc fixes minimally while keeping the flexibility to add dimension-specific fixes. BUT this is out of scope for this PR and is all for #3713.
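The "fill in null dimensions from the data" approach described above might be sketched like so. This is purely illustrative: the function name, the utility_type dimension column, and the record shapes are hypothetical, not the actual PUDL implementation.

```python
def expand_calc_fixes(fixes, observed_dims, dim_col="utility_type"):
    """Broadcast non-dimension-specific calc fixes across observed dimensions.

    A fix whose dimension column is None applies to every dimension value
    observed in the data; a fix with a concrete dimension applies only to that
    value. Mirrors the minimally-specified-fixes idea described above (all
    names hypothetical).
    """
    expanded = []
    for fix in fixes:
        if fix.get(dim_col) is None:
            # Non-dimension-specific fix: fan out to all observed values.
            expanded.extend({**fix, dim_col: dim} for dim in observed_dims)
        else:
            # Dimension-specific fix: keep as-is.
            expanded.append(fix)
    return expanded
```

A report_year column could presumably get the same treatment for #3713: leave it null for fixes that apply to all years, and broadcast across the years actually present in the data.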
See this thread for what's going on here.
It sounds like the missing new-table metadata issue also appears in the branch this is merging into, so this looks good to go. I asked a couple of non-blocking questions.
@@ -35,7 +35,7 @@ description: >
 version: 0.1.0
 datasets:
   ferc1:
-    years: [2020, 2021, 2022]
+    years: [2020, 2021, 2023]
Do we expect having a gap in the test years here to be totally fine?
* Update xbrl extraction to use new version
* Add 2023 to xbrl extraction years and update how settings are loaded
* Add 2023 to default Ferc714XbrlToSqlite settings class
* Add 2023 to FERC 6 & 60 data source working partitions.
* Integrate 2023 FERC1 Update (#3701)
  * very draft first pass of transforming the 2023 ferc1 data; this was done with a manual hand-off of the xbrl sqlite db
  * update the filter_for_freshest_data threshold
  * Update XBRL settings & metadata to extract 2023 XBRL data
  * Disable experiment tracking by default
  * map new ferc1 plants
  * update min/max rows with new year of data
  * add release notes
  * bump the apply_diffs threshold for the fast tests
  * light docs updates for project_num fix

  Co-authored-by: Zane Selvans <[email protected]>
  Co-authored-by: zschira <[email protected]>
* Update water limited capacity ratio validation
* Require ferc_xbrl_extractor>=1.5.1 since 1.5.0 has a bug.
* Update release notes for new FERC stuff.

Co-authored-by: Christina Gosnell <[email protected]>
Co-authored-by: Zane Selvans <[email protected]>
Overview
Working on #3700.
Most of the transform changes are documented over in this scoping issue #3698
2024/07/12 Update: Right now there is one integration test failure coming from ferc_xbrl_updates, which @zschira is tackling. Locally I've run the full and fast ETL. With a full DB I've run the full validation tests. (I haven't been able to run the build-deploy-pudl because the integration test fails; I'm xfail-ing the one test locally to test everything in one go.)

Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list