Missing retirement dates from 860m data #2834
Comments
Whatever the issue ends up being here, we should make sure we add a new data validation test that checks for entirely NULL values in these columns when selecting just monthly update records.
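A minimal sketch of what such a validation check might look like, assuming the generator records are available as a pandas DataFrame. The flag column name `data_maturity` and the helper `find_all_null_columns` are hypothetical and used only for illustration; the two retirement date columns and the `monthly_update` label are the ones named in this issue.

```python
import pandas as pd

# Columns named in this issue.
RETIREMENT_COLS = [
    "planned_generator_retirement_date",
    "generator_retirement_date",
]


def find_all_null_columns(gens, cols=RETIREMENT_COLS, maturity_col="data_maturity"):
    """Return the columns that are entirely NULL among monthly update records.

    `maturity_col` is an assumed name for the column that flags monthly updates.
    """
    monthly = gens.loc[gens[maturity_col] == "monthly_update", cols]
    if monthly.empty:
        # No monthly update records at all, so there is nothing to validate.
        return []
    return [col for col in cols if monthly[col].isna().all()]


# Toy data: one column is entirely NaT for the monthly update rows.
gens = pd.DataFrame(
    {
        "data_maturity": ["final", "monthly_update", "monthly_update"],
        "planned_generator_retirement_date": pd.to_datetime(
            ["2030-01-01", None, None]
        ),
        "generator_retirement_date": pd.to_datetime([None, "2023-06-01", None]),
    }
)
all_null = find_all_null_columns(gens)
```

A test could then assert that the returned list is empty and fail with the offending column names otherwise.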
The monthly EIA data that gets subsumed into tables like the

I'm looking at this specific part of process_raw for the EIA860m extractor:

    def process_raw(self, df, page, **partition):
        """Adds source column and report_year column if missing."""
        df = df.rename(columns=self._metadata.get_column_map(page, **partition))
        if "report_year" not in df.columns:
            df["report_year"] = datetime.strptime(
                list(partition.values())[0], "%Y-%m"
            ).year

In the db table all the monthly data has the same date:
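That collapse to a single value follows from the snippet quoted above: only the year of the partition string is retained. A small standalone sketch of that step is shown here. The helper name `report_year_from_partition` and the partition key `year_month` are hypothetical; the parsing call mirrors the quoted code.

```python
from datetime import datetime


def report_year_from_partition(partition):
    """Mirror the quoted logic: parse the first partition value as YYYY-MM, keep the year."""
    first_value = list(partition.values())[0]
    return datetime.strptime(first_value, "%Y-%m").year


# Two different monthly snapshots from the same year yield the same report_year.
years = [
    report_year_from_partition({"year_month": ym}) for ym in ("2023-01", "2023-06")
]
```

Because the month is discarded at extraction, any monthly snapshot from a given year is indistinguishable from another by `report_year` alone.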
No, there shouldn't be any monthly resolution data in the

The original report dates are monthly, since it's a monthly update, but the data that they're reporting is being modeled as having annual frequency within our database, so I think that just retaining the year is appropriate in the extraction here. We're just using the most recent monthly update to indicate an annual value for the new year that hasn't yet been completely reported. And we only ever use a single monthly snapshot from the EIA-860m in the ETL -- we aren't ever extracting more than one

So I think all of the most recently updated EIA-860M data should have

Is there any other way that the monthly frequency information might be sneaking in? I could see that messing up a merge later on down the line that causes all of these values to get lost.
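To illustrate the kind of merge failure hypothesized here (not necessarily the cause of this bug), the sketch below shows how a left merge keyed on a date column silently drops values when one side carries annual dates and the other carries monthly dates. The column names `generator_id` and `report_date` and the toy values are assumptions made for the example.

```python
import pandas as pd

# Annual-frequency records: every report_date is January 1 of the year.
annual = pd.DataFrame(
    {
        "generator_id": ["1", "2"],
        "report_date": pd.to_datetime(["2023-01-01", "2023-01-01"]),
    }
)

# Monthly update records that kept their monthly report_date.
monthly = pd.DataFrame(
    {
        "generator_id": ["1", "2"],
        "report_date": pd.to_datetime(["2023-06-01", "2023-06-01"]),
        "generator_retirement_date": pd.to_datetime(["2023-05-31", "2023-04-30"]),
    }
)

# The keys never match, so every retirement date comes back as NaT.
merged = annual.merge(monthly, on=["generator_id", "report_date"], how="left")
lost = bool(merged["generator_retirement_date"].isna().all())

# Snapping the monthly dates to the start of the year restores the matches.
monthly_aligned = monthly.assign(
    report_date=pd.to_datetime(monthly["report_date"].dt.strftime("%Y-01-01"))
)
fixed = annual.merge(monthly_aligned, on=["generator_id", "report_date"], how="left")
```

Passing `indicator=True` to `merge` is one way to spot this kind of mismatch, since unmatched rows are labeled `left_only`.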
Looks like the issue had to do with the 860m data column maps. The monthly data retirement dates were getting mapped to

I updated the spreadsheet to map to
Unfortunately, we now have a problem with building the Docker container that runs the nightly builds. Looking into it now. We can run the ETL locally and hand off a fresh PUDL DB if need be.
Describe the bug
The planned_generator_retirement_date and generator_retirement_date columns are empty for monthly_update generator data.
Bug Severity
How badly is this bug affecting you?
To Reproduce
Read the data from the database downloaded Sept 5, 2023 at noon.
Then, looking at the specific columns, all the values are pd.NaT.
Expected behavior
These columns would be populated with retirement and planned retirement data from EIA 860 monthly.
Software Environment?