Run 2021 FERC 1 data through new, more complete extractor #2810
Comments
I'm getting these integration test failures locally:
And these validation test failures:
The CI seems to be having some trouble - getting cancellation events out of the blue, usually a few minutes into taxonomy parsing. @zschira and I noticed that the taxonomy parsing is somehow taking much longer than expected, so we might be running into some resource limits. In other contexts, the
Using branch |
The integration errors mostly pertain to unmapped ferc1 plant ids from A) remove all the B) map all the 87 totals so we have them all; C) remove just these 87 totals because there is something ~ different ~ about them seeing as they were recovered from this new extractor. |
@aesharpe tbh I don't know that I have enough context to give an informed opinion on the existing |
Can you explain what an "unmapped ferc1 plant ID from total record" is? When you say "plant ID" do you mean the plant name, which is "total"? Or do you mean the integer plant ID that's algorithmically assigned based on record similarity across years? Are the "total" records getting fed into the plant ID assignment process? Where are the pre-existing "total" plants coming from? Are those junk records from the small plants table? Or were there also some total records from the large steam plants? Have we historically mapped "total" plant IDs? Or are they associated with unmapped plants? I'm a little wary of dropping the old totals since the way plants are reported in the DBF data is so messy -- maybe in some cases that's the only data that's available? But at the same time it's probably very low value data, and may also be resulting in double counting of stuff like capacity and generation, if the total is present and also non-totals are present. If we're going to drop them, it seems like these totals shouldn't be getting dropped from |
When you run the script
I think that's what I'm talking about here...that these total records are appearing in the plant id assignment process.
Many come from the small plants table but some come from the steam table. All the new total values come from the steam table.
Generally I agree, but it feels weird to drop some but not all unless we have a very specific reason. Maybe that reason is just the messiness of the DBF files?
Yeah, I was just using this table to look at all the potential "total" records in one place. |
I agree with @zschira's comment on Slack. There are totals in the DBF data, and some totals already exist and are getting mapped in the XBRL data (because they were reported in a way that made them show up with the old extractor). So the new totals, which are showing up because we've fixed the extractor to catch the ones that weren't coming through before, should be retained and mapped for uniformity across these 3 categories.
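For anyone wanting to check the double-counting concern raised above, here is a minimal sketch of how "total" records could be separated from per-plant records and grouped by respondent and year. It is not PUDL code: the field names (`utility_id`, `report_year`, `plant_name`, `capacity_mw`), the sample rows, and the rule "a name starting with total is a total record" are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records. Field names and values are illustrative only and
# do not reflect the actual PUDL or FERC 1 schema.
records = [
    {"utility_id": 1, "report_year": 2021, "plant_name": "Big Bend", "capacity_mw": 100.0},
    {"utility_id": 1, "report_year": 2021, "plant_name": "Little Bend", "capacity_mw": 50.0},
    {"utility_id": 1, "report_year": 2021, "plant_name": "Total", "capacity_mw": 150.0},
    {"utility_id": 2, "report_year": 2021, "plant_name": "total", "capacity_mw": 75.0},
]


def is_total(record):
    """Assumed rule: a record is a total if its normalized name starts with 'total'."""
    return record["plant_name"].strip().lower().startswith("total")


def split_totals(records):
    """Group records by (utility_id, report_year), separating totals from plants."""
    groups = defaultdict(lambda: {"totals": [], "plants": []})
    for rec in records:
        key = (rec["utility_id"], rec["report_year"])
        bucket = "totals" if is_total(rec) else "plants"
        groups[key][bucket].append(rec)
    return dict(groups)


def double_counting_risk(groups):
    """Keys where a total sits alongside individual plants (possible double counting)."""
    return sorted(key for key, g in groups.items() if g["totals"] and g["plants"])


def only_totals(groups):
    """Keys where the total is the only data available for that respondent and year."""
    return sorted(key for key, g in groups.items() if g["totals"] and not g["plants"])


groups = split_totals(records)
risky = double_counting_risk(groups)
sole = only_totals(groups)
```

With the sample rows, `risky` lists the respondent-years where summing capacity would count the total on top of the individual plants, and `sole` lists the ones where dropping totals would remove the only data reported. The same split could be applied to the DBF totals, the previously extracted XBRL totals, and the newly recovered totals to compare the three categories.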
We got the XBRL extractor to drop a lot less data! Let's make sure it works with just one year of data.
catalyst-cooperative/ferc-xbrl-extractor#105
We should point PUDL at the api_compat branch of ferc-xbrl-extractor, which has both the data completeness fixes and some PUDL API compatibility fixes.
We should also use a branch based off of explode_ferc1 to do this testing, because explode_ferc1 has a lot of changes to how we handle FERC1 transformations.
Scope