You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The raw EIA 860m data contains lists of generators and their attributes. One of those attributes is the balancing authority (BA) that the generator connects to. The current processing seems to discard this data. Perhaps it was assumed that utility/BA relationships from EIA 861 (core_eia861__assn_balancing_authority) are adequate to reconstruct this information. Unfortunately this is not the case, for at least the following reasons:
861 and 860m have shockingly different utility coverage, to the point I suspect there is a deeper error (see table below)
monthly 860m data covers 1-2 years more than the annual 861 data, making joins impossible for new entries
the 861 data (core_eia861__assn_balancing_authority) cannot be used to recover exact relationships because around 1.5% of utility/state:BA relationships are m:m. This creates fan-out when joining onto the generators for each possible BA and I can no longer tell which is the correct one.
Bug Severity
How badly is this bug affecting you? Critical: The 1.5% m:m fan-out is an annoyance, the temporal gap is a concern, but the utility mismatches are simply a deal breaker. There is barely any overlap between 861 and 860m.
in_861
in_860m
n_utility_ids
0
1
7714
1
0
2813
1
1
823
To Reproduce
The following query reproduces the utility_id_eia table above.
WITH
e861 as (
SELECT-- report_date,-- balancing_authority_id_eia,
distinct utility_id_eia
-- stateFROM core_eia861__assn_balancing_authority
),
e860m as (
SELECT
distinct utility_id_eia
FROM core_eia860m__changelog_generators
),
joined as (
-- sqlite doesn't support full outer join, so I have to do it manually-- as left + (right - inner). SQLite also doesn't support right join-- so I implement it as left join with the tables swappedselecte861.utility_id_eiaas id861,
e860m.utility_id_eiaas id860m
from e861
left join e860m
one861.utility_id_eia=e860m.utility_id_eiaUNION ALLselecte861.utility_id_eiaas id861,
e860m.utility_id_eiaas id860m
from e860m
left join e861
one860m.utility_id_eia=e861.utility_id_eiawheree861.utility_id_eia is null-- exclude the intersection rows that were already included in the first join
)
select
id861 is not nullas in_861,
id860m is not nullas in_860m,
count(*) as n_utility_ids
from joined
group by1,2order by1,2
;
The following query produces a histogram of utility-states vs number of BAs for the given report_date.
WITH
utility_states AS (
SELECT
utility_id_eia,
state,
count(*) as n_BAs
FROM core_eia861__assn_balancing_authority
WHERE report_date ='2019-01-01'GROUP BY1, 2
)
SELECT
n_BAs,
count(*) as n_utility_states
FROM utility_states
GROUP BY1ORDER BY1asc
n_BAs
n_utility_states
1
4107
2
45
3
6
4
6
5
2
6
1
7
1
8
1
Expected behavior
I expect 860m projects to have a clear association with a single balancing authority, as they do in the raw data.
Software Environment?
pudl.sqlite v2024.2.6
The text was updated successfully, but these errors were encountered:
I was under the impression that the balancing authority column in core_eia860m__changelog_generators would reflect whatever was reported in 860m and if it changed for a generator it would show up in the changelog like any other column, at least I that was my understanding from #3437. Did I misunderstand the resolution? Is that not relevant to this issue?
You understood perfectly! I was using an out-of-date version from before the issue you referenced had been integrated. The newer 860m data includes the BA data.
I also misinterpreted the relationship to 861: I thought because the BA column was gone, it must have been normalized out to another table, and 861 was the only place I saw BAs. But that was never the case.
Data loss in 860m processing
The raw EIA 860m data contains lists of generators and their attributes. One of those attributes is the balancing authority (BA) that the generator connects to. The current processing seems to discard this data. Perhaps it was assumed that utility/BA relationships from EIA 861 (
core_eia861__assn_balancing_authority
) are adequate to reconstruct this information. Unfortunately this is not the case, for at least the following reasons:core_eia861__assn_balancing_authority
) cannot be used to recover exact relationships because around 1.5% of utility/state:BA relationships are m:m. This creates fan-out when joining onto the generators for each possible BA and I can no longer tell which is the correct one.Bug Severity
How badly is this bug affecting you?
Critical: The 1.5% m:m fan-out is an annoyance, the temporal gap is a concern, but the utility mismatches are simply a deal breaker. There is barely any overlap between 861 and 860m.
To Reproduce
The following query reproduces the utility_id_eia table above.
The following query produces a histogram of utility-states vs number of BAs for the given report_date.
Expected behavior
I expect 860m projects to have a clear association with a single balancing authority, as they do in the raw data.
Software Environment?
pudl.sqlite v2024.2.6
The text was updated successfully, but these errors were encountered: