Missing and incorrect latitude / longitude data in plants_entity_eia
#3141
Comments
Just wanted to bump this issue: we're having some issues with bad timezone data due to bad lat/long values (#1192). We're going to try to patch this on our end, but it would be helpful if this could be fixed in pudl as well!
We are also noticing this issue for Pegasus Wind (plant ID 61916), which in EIA-860 is listed with a coordinate of:
Which correctly places it in Michigan. However, for some reason in PUDL (v2024.5.0), the coordinates are changed to:
Which puts it in Idaho. Not sure why this is happening. Maybe it has to do with inconsistent coordinates being reported? @zaneselvans
That is a big difference! Not sure what's happening there either. @ktehranchi mentioned he might be interested in taking on this issue more generally and implementing a more principled method of choosing a best lat/lon point that actually treats the lat/lon as a geopoint. See also #1280 and #656.
After diving into the raw EIA-860 tables a bit more, it looks like in some of the earlier years (e.g., 2017) plant 61916 was incorrectly reported with a longitude of -111, so maybe pudl is taking the first value as a default? However, even in the yearly plant output table in pudl, -111 is reported for all years, even though the value was corrected in some of the later years of raw data.
Describe the bug
Several related issues:
plants_entity_eia and other tables that contain lat/long data. As far as I can tell from manually inspecting the raw EIA-860 plants file from 2022, there are a small number of plants that are missing both lat/long data, but none that are only missing latitude data.
Bug Severity
Medium: With some effort, I can work around the bug.
To Reproduce
I downloaded data from https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/v2023.12.01/pudl.sqlite.gz, and am loading the plants_entity_eia table using pd.read_sql("SELECT plant_id_eia, latitude, longitude FROM plants_entity_eia", PUDL_ENGINE)
213 plants are missing latitudes
16 plants have coordinates further east than the east coast of the US:
One plant has a nonexistent coordinate:
Spot checking these plants revealed that they all appear to have valid/non-missing coordinates in the 2022 EIA-860 plants file.
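For anyone who wants to repeat the checks above on their own copy of the table, here is a minimal sketch of how the three counts could be computed with pandas. The helper name `summarize_coordinate_problems`, the `east_limit` cutoff of -66.0, and the sample rows are all assumptions made up for illustration; they are not part of PUDL, and the numbers are not taken from the real table.

```python
import pandas as pd


def summarize_coordinate_problems(plants: pd.DataFrame, east_limit: float = -66.0) -> dict:
    """Count missing, implausibly eastern, and out-of-range coordinates.

    `east_limit` is an assumed rough cutoff for "east of the US east coast";
    adjust it if territories or other regions should be treated as valid.
    """
    lat = plants["latitude"]
    lon = plants["longitude"]
    missing_lat = lat.isna()
    missing_lon = lon.isna()
    # A coordinate cannot exist if it falls outside the valid lat/lon ranges.
    invalid = (lat.abs() > 90) | (lon.abs() > 180)
    # Comparisons against NaN evaluate to False, so missing rows are not counted here.
    too_far_east = (lon > east_limit) & ~invalid
    return {
        "missing_latitude": int(missing_lat.sum()),
        "missing_latitude_only": int((missing_lat & ~missing_lon).sum()),
        "east_of_limit": int(too_far_east.sum()),
        "invalid": int(invalid.sum()),
    }


# Made-up rows for illustration only; in practice `plants` would be the
# DataFrame returned by the pd.read_sql call in this report.
plants = pd.DataFrame(
    {
        "plant_id_eia": [1, 2, 3, 4, 5],
        "latitude": [43.8, None, 35.0, 40.0, 95.0],
        "longitude": [-83.4, -90.0, -60.0, None, -100.0],
    }
)
summary = summarize_coordinate_problems(plants)
print(summary)
```

The `missing_latitude_only` count is the one that matters for this report: the raw 2022 file appears to have no plants missing only latitude, so a nonzero value here would point at the processing step rather than the source data.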
Expected behavior
I expect the lat/long data in plants_entity_eia to match the lat/long data in the most recent raw EIA-860 table for which lat/long data is available.
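To make the expected behavior concrete, here is a rough sketch of picking, for each plant, the coordinates from the most recent year in which both latitude and longitude were reported. This is not how PUDL currently chooses values, and the function name `latest_reported_coordinates`, the `report_year` column, and the sample rows are hypothetical. The sample mimics the pattern described in the comments (an early bad longitude that is corrected later) using made-up numbers.

```python
import pandas as pd


def latest_reported_coordinates(annual: pd.DataFrame) -> pd.DataFrame:
    """Return one row per plant with its most recent complete lat/lon pair."""
    # Keep only years where both values are present, so the pair stays together.
    complete = annual.dropna(subset=["latitude", "longitude"])
    latest = (
        complete.sort_values(["plant_id_eia", "report_year"])
        .drop_duplicates("plant_id_eia", keep="last")
        .reset_index(drop=True)
    )
    return latest[["plant_id_eia", "report_year", "latitude", "longitude"]]


# Made-up annual records: plant 1 has a bad early longitude, a corrected
# value in a later year, and no coordinates in its latest year.
annual = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 2],
        "report_year": [2017, 2020, 2022, 2021],
        "latitude": [43.0, 43.0, None, 35.0],
        "longitude": [-111.0, -83.0, None, -90.0],
    }
)
result = latest_reported_coordinates(annual)
print(result)
```

Selecting the latitude and longitude as a pair from a single year avoids mixing a latitude from one report with a longitude from another, which a column-by-column choice could do.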