Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Project <-> BA relationships are lost during normalization of 860m #3688

Closed
TrentonBush opened this issue Jun 22, 2024 · 3 comments
Closed
Labels
bug Things that are just plain broken.

Comments

@TrentonBush
Copy link
Contributor

TrentonBush commented Jun 22, 2024

Data loss in 860m processing

The raw EIA 860m data contains lists of generators and their attributes. One of those attributes is the balancing authority (BA) that the generator connects to. The current processing seems to discard this data. Perhaps it was assumed that utility/BA relationships from EIA 861 (core_eia861__assn_balancing_authority) are adequate to reconstruct this information. Unfortunately this is not the case, for at least the following reasons:

  • 861 and 860m have shockingly different utility coverage, to the point I suspect there is a deeper error (see table below)
  • monthly 860m data covers 1-2 years more than the annual 861 data, making joins impossible for new entries
  • the 861 data (core_eia861__assn_balancing_authority) cannot be used to recover exact relationships because around 1.5% of utility/state:BA relationships are m:m. This creates fan-out when joining onto the generators for each possible BA and I can no longer tell which is the correct one.

Bug Severity

How badly is this bug affecting you?
Critical: The 1.5% m:m fan-out is an annoyance, the temporal gap is a concern, but the utility mismatches are simply a deal breaker. There is barely any overlap between 861 and 860m.

in_861 in_860m n_utility_ids
0 1 7714
1 0 2813
1 1 823

To Reproduce

The following query reproduces the utility_id_eia table above.

WITH
e861 as (
    SELECT
        -- report_date,
        -- balancing_authority_id_eia,
        distinct utility_id_eia
        -- state
    FROM core_eia861__assn_balancing_authority
),
e860m as (
    SELECT
        distinct utility_id_eia
    FROM core_eia860m__changelog_generators
),
joined as (
    -- sqlite doesn't support full outer join, so I have to do it manually
    -- as left + (right - inner). SQLite also doesn't support right join
    -- so I implement it as left join with the tables swapped
    select
        e861.utility_id_eia as id861,
        e860m.utility_id_eia as id860m
    from e861
    left join e860m
    on e861.utility_id_eia = e860m.utility_id_eia

    UNION ALL

    select
        e861.utility_id_eia as id861,
        e860m.utility_id_eia as id860m
    from e860m
    left join e861
    on e860m.utility_id_eia = e861.utility_id_eia
    where e861.utility_id_eia is null -- exclude the intersection rows that were already included in the first join
)
select
    id861 is not null as in_861,
    id860m is not null as in_860m,
    count(*) as n_utility_ids
from joined
group by 1,2
order by 1,2
;

The following query produces a histogram of utility-states vs number of BAs for the given report_date.

WITH
utility_states AS (
    SELECT
        utility_id_eia,
        state,
        count(*) as n_BAs
    FROM core_eia861__assn_balancing_authority
    WHERE report_date = '2019-01-01'
    GROUP BY 1, 2
)
SELECT
    n_BAs,
    count(*) as n_utility_states
FROM utility_states
GROUP BY 1
ORDER BY 1 asc
n_BAs n_utility_states
1 4107
2 45
3 6
4 6
5 2
6 1
7 1
8 1

Expected behavior

I expect 860m projects to have a clear association with a single balancing authority, as they do in the raw data.

Software Environment?

pudl.sqlite v2024.2.6

@TrentonBush TrentonBush added the bug Things that are just plain broken. label Jun 22, 2024
@arengel
Copy link
Collaborator

arengel commented Jul 15, 2024

I was under the impression that the balancing authority column in core_eia860m__changelog_generators would reflect whatever was reported in 860m and if it changed for a generator it would show up in the changelog like any other column, at least I that was my understanding from #3437. Did I misunderstand the resolution? Is that not relevant to this issue?

@TrentonBush
Copy link
Contributor Author

You understood perfectly! I was using an out-of-date version from before the issue you referenced had been integrated. The newer 860m data includes the BA data.

I also misinterpreted the relationship to 861: I thought because the BA column was gone, it must have been normalized out to another table, and 861 was the only place I saw BAs. But that was never the case.

@arengel
Copy link
Collaborator

arengel commented Jul 16, 2024

Thanks for that clarification!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Things that are just plain broken.
Projects
Archived in project
Development

No branches or pull requests

2 participants