fix(parsers, MX): drop seldom occurring datapoints with an hour set to 25 #7696

Merged 2 commits on Jan 6, 2025

1 change: 1 addition & 0 deletions config/zones/MX.yaml
@@ -55,6 +55,7 @@ capacity:
value: 7320.0
contributors:
- scriptator
- consideRatio
country: MX
emissionFactors:
direct:
9 changes: 9 additions & 0 deletions parsers/CENACE.py
@@ -142,9 +142,18 @@ def fetch_csv_for_date(dt, session: Session | None = None):

    # cleanup and parse the data
    df.columns = df.columns.str.strip()

    # transform 01-24 entries where 24 means 00 the next day
    df["Hora"] = df["Hora"].apply(lambda x: "00" if int(x) == 24 else f"{int(x):02d}")
    df["Dia"] = pd.to_datetime(df["Dia"], format="%d/%m/%Y")
    df.loc[df["Hora"] == "00", "Dia"] = df["Dia"] + pd.Timedelta(days=1)

    # The hour column has been seen at least once (3rd Nov 2024) to include 1-25
    # hours rather than the expected 1-24, due to this, we are for now dropping
    # such entries if they show up
    df = df.drop(df[df["Hora"] == "25"].index)
Comment on lines +151 to +154
Member

How does this even happen...

Is it possible this is related to daylight savings?

Contributor Author

I considered that as well, but this was 3rd Nov, and I don't think the clocks would change in November. Also, it seems MX hasn't had DST since 2022 or something like that.

Member

Then IDK what it can be but since it's not every day we should probably just get rid of that datapoint.

I don't even know where it would end up as our database is strictly 24 hours and not 25 😅
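
One way to test the daylight-saving question raised in this thread is to count the elapsed hours in a local calendar day for a given IANA zone. The sketch below uses only the standard library. The helper name `hours_in_local_day` and the two zone names are illustrative assumptions; they are not taken from the parser, and nothing here establishes which zone (if any) explains the extra row in the CENACE data.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def hours_in_local_day(year: int, month: int, day: int, tz_name: str) -> float:
    """Return the hours elapsed between local midnight and the next local midnight."""
    tz = ZoneInfo(tz_name)
    start = datetime(year, month, day, tzinfo=tz)
    # Adding a timedelta to an aware datetime moves the wall clock, so this is
    # the next local midnight even if the clocks change in between.
    end = start + timedelta(days=1)
    # Convert to UTC before subtracting: subtracting two datetimes that share
    # the same tzinfo object compares wall-clock times and ignores the offsets.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return elapsed.total_seconds() / 3600


if __name__ == "__main__":
    # Zone names are assumptions chosen for illustration only.
    for zone in ("America/Mexico_City", "America/Tijuana"):
        print(zone, hours_in_local_day(2024, 11, 3, zone))
```

A result of 25 for some zone on 3rd Nov 2024 would mean that zone had a clock change that day, which would be worth comparing against the dates on which the extra row appears before deciding whether dropping it is the right long-term fix. The check relies on the system's time zone database being installed and reasonably current.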


    # create datetime objects
    df["Dia"] = df["Dia"].dt.strftime("%d/%m/%Y")
    df["instante"] = pd.to_datetime(df["Dia"] + " " + df["Hora"], format="%d/%m/%Y %H")
    df["instante"] = df["instante"].dt.tz_localize(TIMEZONE)
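
The rules this hunk applies per row (hours 1-23 stay on the same day, 24 becomes 00 of the next day, 25 is dropped) can be sketched without pandas as follows. `parse_row` is a hypothetical helper written for illustration, not a function in `parsers/CENACE.py`, and it skips the final time zone localization step.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_row(dia: str, hora: str) -> Optional[datetime]:
    """Convert a ("dd/mm/YYYY", hour) pair into a naive datetime.

    Returns None for hour 25, mirroring the decision to drop such rows.
    """
    hour = int(hora)
    if hour == 25:
        # Seen at least once (3rd Nov 2024); dropped for now.
        return None
    day = datetime.strptime(dia, "%d/%m/%Y")
    if hour == 24:
        # 24 means 00 the next day.
        return day + timedelta(days=1)
    return day + timedelta(hours=hour)


if __name__ == "__main__":
    for hora in ("1", "24", "25"):
        print(hora, parse_row("03/11/2024", hora))
```

Returning `None` keeps the drop explicit at the call site; a caller would filter those out before localizing, which corresponds to the `df.drop(...)` line in the diff.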