Missing and incorrect latitude / longitude data in plants_entity_eia #3141

Open
grgmiller opened this issue Dec 8, 2023 · 5 comments
Labels
bug Things that are just plain broken.

Comments

@grgmiller
Collaborator

grgmiller commented Dec 8, 2023

Describe the bug

Several related issues:

  • Somewhere in the pudl pipeline, the latitude (but not longitude) data appears to be dropped from plants_entity_eia and other tables that contain lat/long data. As far as I can tell from manually inspecting the raw 2022 EIA-860 plants file, a small number of plants are missing both latitude and longitude, but none are missing only latitude.
  • For several plants, the sign of the longitude appears to be flipped from negative to positive, which places these plants in China.
  • One plant (plant 61445) is assigned a nonsense longitude of -188.
  • A handful of plants are assigned seemingly made-up coordinates in the middle of the Atlantic Ocean, generally around (42, -42).

Bug Severity

Medium: With some effort, I can work around the bug.

To Reproduce

I downloaded data from https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/v2023.12.01/pudl.sqlite.gz, and am loading the plants_entity_eia table using pd.read_sql("SELECT plant_id_eia, latitude, longitude FROM plants_entity_eia", PUDL_ENGINE)

213 plants are missing latitudes
image

16 plants have coordinates farther east than the east coast of the US:
image

One plant has a nonexistent coordinate:
image

Spot checking these plants revealed that they all appear to have valid/non-missing coordinates in the 2022 EIA-860 plants file.
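
The three checks above can be sketched roughly as follows. This is a minimal illustration, not the code used to produce the counts in this issue: it uses the standard-library sqlite3 module instead of pandas, the function name find_suspect_coordinates and the EASTERN_LIMIT threshold are assumptions, and the demo rows are made up so that the snippet runs without downloading pudl.sqlite.

```python
import sqlite3

# Assumption: roughly the easternmost longitude of the contiguous US.
# The threshold actually used for the "16 plants" count is not stated in the issue.
EASTERN_LIMIT = -66.9


def find_suspect_coordinates(conn):
    """Group plant IDs by the kind of coordinate problem they show."""
    rows = conn.execute(
        "SELECT plant_id_eia, latitude, longitude "
        "FROM plants_entity_eia ORDER BY plant_id_eia"
    ).fetchall()
    suspects = {"latitude_only_missing": [], "east_of_us": [], "out_of_range": []}
    for plant_id, lat, lon in rows:
        # Latitude dropped while longitude survived.
        if lat is None and lon is not None:
            suspects["latitude_only_missing"].append(plant_id)
        if lon is not None:
            if not -180 <= lon <= 180:
                # Not a valid longitude at all (e.g. -188).
                suspects["out_of_range"].append(plant_id)
            elif lon > EASTERN_LIMIT:
                # Valid, but east of the assumed US limit (sign flip or Atlantic).
                suspects["east_of_us"].append(plant_id)
    return suspects


# Demo on an in-memory table with made-up rows (hypothetical plant IDs 1-5).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plants_entity_eia (plant_id_eia INTEGER, latitude REAL, longitude REAL)"
)
conn.executemany(
    "INSERT INTO plants_entity_eia VALUES (?, ?, ?)",
    [
        (1, 40.0, -100.0),   # plausible
        (2, None, -95.0),    # latitude missing, longitude present
        (3, 35.0, 110.0),    # positive longitude
        (4, 30.0, -188.0),   # longitude outside [-180, 180]
        (5, None, None),     # both missing, not flagged here
    ],
)
suspects = find_suspect_coordinates(conn)
print(suspects)
```

To run this against the real data, replace the in-memory connection with sqlite3.connect pointed at the unzipped pudl.sqlite file. Note that a simple longitude threshold would also flag any legitimately positive longitudes, so flagged rows still need a manual look.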

Expected behavior

I expect the lat/long data in plants_entity_eia to match the lat/long data in the most recent raw EIA-860 table for which lat/long data is available.
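
As a sketch of what that expectation means in practice: for each plant, take the coordinates from the latest report year in which both latitude and longitude are present. The function name latest_reported_coordinates, the record layout, and the demo values are all hypothetical, and this is not a description of how pudl currently selects values.

```python
def latest_reported_coordinates(records):
    """Pick, per plant, the (lat, lon) from the most recent year with both values.

    records: iterable of (plant_id, report_year, latitude, longitude) tuples.
    """
    latest = {}
    for plant_id, year, lat, lon in records:
        # Skip years where either coordinate is missing.
        if lat is None or lon is None:
            continue
        if plant_id not in latest or year > latest[plant_id][0]:
            latest[plant_id] = (year, lat, lon)
    return {pid: (lat, lon) for pid, (_, lat, lon) in latest.items()}


# Made-up yearly records for two hypothetical plants.
records = [
    (1, 2017, 43.0, -111.0),  # early, inconsistent longitude
    (1, 2020, 43.0, -83.0),   # later report with a different longitude
    (1, 2021, None, None),    # most recent year has no coordinates
    (2, 2019, None, -90.0),   # never has a complete pair
]
expected = latest_reported_coordinates(records)
print(expected)
```

In this example plant 1 falls back to its 2020 values because 2021 has no coordinates, and plant 2 is left out because it never reports a complete pair.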

Software Environment?

  • Operating System: Windows
  • Python version and distribution: Python 3.11.4
  • How did you install PUDL? N/A

grgmiller added the bug label on Dec 8, 2023
@grgmiller
Collaborator Author

Just wanted to bump this issue: we're having some issues with bad timezone data due to bad lat/long values (#1192). We're going to try and patch this on our end, but it would be helpful if this could be fixed in pudl as well!

@grgmiller
Collaborator Author

Linking this to #971 and #402

@grgmiller
Collaborator Author

We are also noticing this issue for Pegasus Wind (plant ID 61916), which in EIA-860 is listed with a coordinate of:

latitude             43.452003
longitude            -83.50721

Which correctly places it in Michigan. However, for some reason in PUDL (v2024.5.0), the coordinates are changed to:

latitude         43.452003
longitude       -111.55111

Which puts it in Idaho.

Not sure why this is happening. Maybe it has to do with inconsistent coordinates being reported? @zaneselvans

@zaneselvans
Member

That is a big difference! Not sure what's happening there either.

@ktehranchi mentioned he might be interested in taking on this issue more generally and implementing a more principled method of choosing a best lat/lon point that actually treats the lat/lon as a geopoint. See also #1280 and #656.
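
To illustrate why treating lat/lon as a single point matters, here is a minimal sketch comparing two ways of reconciling inconsistent reports: picking the most common value of each column independently, and picking the most common (lat, lon) pair. Nothing in this thread establishes which method pudl uses; the function names, the rounding to three decimals, and the data are all assumptions, and exact-pair voting is only a simple stand-in for a real geospatial method such as distance-based clustering.

```python
from collections import Counter


def most_common(values):
    """Return the most frequent non-null value, or None if there is none."""
    values = [v for v in values if v is not None]
    return Counter(values).most_common(1)[0][0] if values else None


def per_column_choice(points):
    """Reconcile latitude and longitude independently of each other."""
    return (
        most_common(lat for lat, _ in points),
        most_common(lon for _, lon in points),
    )


def joint_choice(points, decimals=3):
    """Reconcile latitude and longitude together, as one rounded pair."""
    pairs = [
        (round(lat, decimals), round(lon, decimals))
        for lat, lon in points
        if lat is not None and lon is not None
    ]
    return Counter(pairs).most_common(1)[0][0] if pairs else None


# Made-up reports for one hypothetical plant across several years.
points = [
    (10.0, -100.0), (10.0, -100.0),
    (10.0, -80.0), (10.0, -80.0),
    (20.0, -90.0), (20.0, -90.0), (20.0, -90.0),
]
independent = per_column_choice(points)
joint = joint_choice(points)
print(independent, joint)
```

With this data the independent choice combines the most common latitude with the most common longitude into a location that was never reported in any year, while the joint choice always returns a pair that was actually reported.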

@grgmiller
Collaborator Author

After diving into the raw EIA-860 tables a bit more, it looks like in some of the earlier years (e.g., 2017) a longitude of -111 was incorrectly reported for plant 61916, so maybe pudl is taking the first value as a default?

However, it looks like even in the yearly plant output table in pudl, -111 is reported for all years, even though this was fixed in some of the later years.
