Herbie can't find old GFS data older than 2021-1-1? #355

Closed
vwgeiser opened this issue Jul 30, 2024 · 12 comments

Comments

@vwgeiser

vwgeiser commented Jul 30, 2024

Not sure if this is a known issue but it seems Herbie can't find GFS data older than 2021-01-01?

from herbie import Herbie

H = Herbie("2021-1-1", model="gfs", fxx=0)
H.inventory()

✅ Found ┊ model=gfs ┊ product=pgrb2.0p25 ┊ 2021-Jan-01 00:00 UTC F00 ┊ GRIB2 @ aws-old ┊ IDX @ aws-old

H = Herbie("2020-12-31", model="gfs", fxx=0)
H.inventory()

💔 Did not find ┊ model=gfs ┊ product=pgrb2.0p25 ┊ 2020-Dec-31 00:00 UTC F00

I've tried a few more cases and dates but they all follow the same pattern.

Is there a known reason for this? Or are there other resources I should use if I need GFS forecast data older than 2021?
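Rather than trying dates by hand, the boundary can be located with a bisection search. The sketch below is only an illustration: `first_available` and `fake_probe` are hypothetical names, and the probe is a stand-in that mimics the behaviour reported above instead of calling Herbie. It assumes availability switches from "not found" to "found" exactly once inside the search window.

```python
from datetime import date, timedelta
from typing import Callable


def first_available(lo: date, hi: date, is_available: Callable[[date], bool]) -> date:
    """Return the earliest date in (lo, hi] for which is_available is True.

    Assumes is_available(lo) is False, is_available(hi) is True, and that
    availability changes from False to True exactly once in between.
    """
    if is_available(lo) or not is_available(hi):
        raise ValueError("need is_available(lo) == False and is_available(hi) == True")
    while (hi - lo).days > 1:
        mid = lo + timedelta(days=(hi - lo).days // 2)
        if is_available(mid):
            hi = mid  # boundary is at mid or earlier
        else:
            lo = mid  # boundary is after mid
    return hi


# Stand-in probe (an assumption for illustration). A real probe would ask
# Herbie whether a GRIB2 file was found for the given date.
def fake_probe(d: date) -> bool:
    return d >= date(2021, 1, 1)


boundary = first_available(date(2019, 1, 1), date(2024, 7, 30), fake_probe)
print(boundary)  # 2021-01-01
```

Each probe is one lookup, so a window of several years needs only about a dozen checks.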

@williamhobbs
Contributor

I seem to recall this being related to a major update to GFS in early 2021 that changed the file names and structures, or something like that, and it isn’t accounted for in Herbie. But I can’t find any notes to confirm that, so don’t take this as definitive.

You could use GEFS - not the same as GFS, but similar.

@blaylockbk
Owner

Hmm, thanks for bringing this to my attention. Looks like the model template needs to be updated to look for the older-style filenames. I admit I haven't spent much time sorting out all the bugs for the GFS template.

@vwgeiser
Author

@blaylockbk Is there a certain section of the code that this corresponds to? I'm new to Herbie, but if I have an example I can try to work out a model template for older GFS data.

@vwgeiser
Author

vwgeiser commented Jul 31, 2024

@williamhobbs @blaylockbk The same behavior happens with the GEFS on the boundary of 2020-09-22 and 2020-09-23.

Ideally I would find coverage for 2019, but a change accounting for 2019 might also cover earlier years.

@williamhobbs
Contributor

@vwgeiser Apologies, I didn't realize that GEFS had a similar issue. Maybe you could try the GEFS Reforecast with model="gefs_reforecast". It should cover 2000 through 2019, but I think it is a different version from the current operational GEFS.

Regarding the code for the GFS template, I think this is what you are looking for: https://github.com/blaylockbk/Herbie/blob/main/herbie/models/gfs.py.

@vwgeiser
Author

vwgeiser commented Jul 31, 2024

Ah yes, that is what I was looking for!

Some older GFS data could potentially be accessed from the THREDDS server or through NCEI itself? However, unlike the analysis files, the forecast files are far from comprehensive:
https://www.ncei.noaa.gov/thredds/catalog/model-gfs-004-files/catalog.html
https://www.ncei.noaa.gov/data/global-forecast-system/access/historical/forecast/

Also see #325 potentially?

@vwgeiser
Author

vwgeiser commented Jul 31, 2024

Since I'm not sure yet how I would add a new data source for GFS, I'll work on getting the GEFS back through 2017 up and working. It looks like the older GEFS data follow a slightly different file structure, but the data is still on AWS, so that is a plus.

The best way to do this is probably just an if statement checking whether self.date, formatted as %Y%m%d, is less than 20200923?

Hmm, on 2020-09-23 AWS started the current file structure. Before that, AWS GEFS only has members 0-20, without the spread or average members :/ On 2018-07-27 they also began splitting the files into pgrb2a/ and pgrb2b/ directories. The files before that date are not split up and will need a separate Herbie representation too.
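The three periods described above can be captured in one small lookup. This is a minimal sketch, assuming only the two threshold dates given in this comment; the function name `gefs_layout` and the layout labels are made up for illustration and are not part of Herbie.

```python
from datetime import datetime

# Threshold dates taken from the comment above.
SPLIT_INTO_SUBDIRS = datetime(2018, 7, 27)  # pgrb2a/ and pgrb2b/ directories begin
CURRENT_LAYOUT = datetime(2020, 9, 23)      # current file structure begins


def gefs_layout(run: datetime) -> str:
    """Classify a GEFS run time into one of three AWS layout periods."""
    if run < SPLIT_INTO_SUBDIRS:
        return "flat"            # files sit directly in the run directory
    if run < CURRENT_LAYOUT:
        return "pgrb2a_pgrb2b"   # files split into pgrb2a/ and pgrb2b/
    return "current"             # current structure


print(gefs_layout(datetime(2019, 6, 1)))  # pgrb2a_pgrb2b
```

Comparing datetime objects directly avoids any dependence on how the date is formatted as a string or integer.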

@vwgeiser
Author

vwgeiser commented Aug 2, 2024

My first attempt at an implementation is below, but it does not work. Is there anything else I need to update besides filepaths? It should at least be the same AWS source.

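        # Use an 8-digit YYYYMMDD key so it is comparable with the 8-digit
        # thresholds below ("%Y%m%d%H" would give a 10-digit number).
        date_int = int(self.date.strftime("%Y%m%d"))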

        if date_int < 20200923:
            if self.product == "wave":
                raise ValueError(
                    "Dates before 2020-09-23 do not have wave products :("
                )
            if self.product == "atmos.25":
                raise ValueError(
                    "Dates before 2020-09-23 do not have atmos.25 products :("
                )
            if self.product.startswith("chem"):
                raise ValueError(
                    "Dates before 2020-09-23 do not have chem products :("
                )
            if self.product.startswith("atmos"):
                if self.member == "spread":
                    raise ValueError(
                        "Dates before 2020-09-23 do not have spread member :("
                    )
                elif self.member == "mean":
                    raise ValueError(
                        "Dates before 2020-09-23 do not have mean member :("
                    )
        else:
            if self.product == "wave":
                if self.member == "spr":
                    self.member = "spread"
                elif self.member == "avg":
                    self.member = "mean"
            elif self.product.startswith("atmos"):
                if self.member == "spread":
                    self.member = "spr"
                elif self.member == "mean":
                    self.member = "avg"

        if self.member == 0:
            self.member = "c00"
        elif isinstance(self.member, int):
            self.member = f"p{self.member:02d}"

        filedir = f"gefs.{self.date:%Y%m%d/%H}"
        if date_int < 20180727:
            filepaths = {
                "atmos.5": f"{filedir}/ge{self.member}.t{self.date:%H}z.pgrb2af{self.fxx:03d}",
                "atmos.5b": f"{filedir}/ge{self.member}.t{self.date:%H}z.pgrb2bf{self.fxx:03d}",
            }
        elif date_int < 20200923:
            filepaths = {
                "atmos.5": f"{filedir}/pgrb2a/ge{self.member}.t{self.date:%H}z.pgrb2af{self.fxx:03d}",
                "atmos.5b": f"{filedir}/pgrb2b/ge{self.member}.t{self.date:%H}z.pgrb2bf{self.fxx:03d}",
            }
        else:
            filepaths = {
                "atmos.5": f"{filedir}/atmos/pgrb2ap5/ge{self.member}.t{self.date:%H}z.pgrb2a.0p50.f{self.fxx:03d}",
                "atmos.5b": f"{filedir}/atmos/pgrb2bp5/ge{self.member}.t{self.date:%H}z.pgrb2b.0p50.f{self.fxx:03d}",
                "atmos.25": f"{filedir}/atmos/pgrb2sp25/ge{self.member}.t{self.date:%H}z.pgrb2s.0p25.f{self.fxx:03d}",
                "wave": f"{filedir}/wave/gridded/gefs.wave.t{self.date:%H}z.{self.member}.global.0p25.f{self.fxx:03d}.grib2",
                "chem.5": f"{filedir}/chem/pgrb2ap25/gefs.chem.t{self.date:%H}z.a2d_0p25.f{self.fxx:03d}.grib2",
                "chem.25": f"{filedir}/chem/pgrb2ap25/gefs.chem.t{self.date:%H}z.a2d_0p25.f{self.fxx:03d}.grib2",
            }
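A note on integer date keys such as `date_int`: the key and the threshold need the same number of digits for the comparison to mean anything. A key built with `%Y%m%d%H` has ten digits, so it is always larger than an eight-digit threshold like 20200923, and the branches for older dates would never be taken. The short sketch below illustrates this with an arbitrary example date; the variable names are only for illustration.

```python
from datetime import datetime

run = datetime(2019, 6, 1, 0)  # arbitrary run time before 2020-09-23

key_with_hour = int(run.strftime("%Y%m%d%H"))  # ten digits
key_date_only = int(run.strftime("%Y%m%d"))    # eight digits

print(key_with_hour)                # 2019060100
print(key_with_hour < 20200923)     # False, although the date is earlier
print(key_date_only < 20200923)     # True
print(run < datetime(2020, 9, 23))  # True, no formatting involved
```

Comparing against a datetime threshold sidesteps the issue entirely.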

@vwgeiser
Author

vwgeiser commented Aug 7, 2024

Following #358's format for a date check, the new code would be:

"""
A Herbie template for the GEFS (2017-Present) and GEFS Reforecast (2000-2019)
GRIB2 products.
"""

from datetime import datetime


class gefs:
    def template(self):
        self.DESCRIPTION = "Global Ensemble Forecast System (GEFS)"
        self.DETAILS = {
            "Amazon Open Data": "https://registry.opendata.aws/noaa-gefs/",
            "NOMADS": "https://www.nco.ncep.noaa.gov/pmb/products/gens/",
        }

        self.PRODUCTS = {
            "atmos.5": "Half degree atmos PRIMARY fields (pgrb2ap5); ~83 most common variables.",
            "atmos.5b": "Half degree atmos SECONDARY fields (pgrb2bp5); ~500 least common variables",
            "atmos.25": "Quarter degree atmos PRIMARY fields (pgrb2sp25); ~35 most common variables",
            "wave": "Global wave products.",
            "chem.5": "Chemistry fields on 0.5 degree grid",
            "chem.25": "Chemistry fields on 0.25 degree grid",
        }

        if self.product is None:
            # Just select the first PRODUCT as default
            self.product = list(self.PRODUCTS)[0]

        # date_int = int(self.date.strftime("%Y%m%d%H"))

        if self.date < datetime(2020, 9, 23):
            if self.product == "wave":
                raise ValueError(
                    f"Dates before 2020-09-23 do not have wave products :("
                )
            if self.product == "atmos.25":
                raise ValueError(
                    f"Dates before 2020-09-23 do not have atmos.25 products :("
                )
            if self.product.startswith("chem"):
                raise ValueError(
                    f"Dates before 2020-09-23 do not have chem products :("
                )
            if self.product.startswith("atmos"):
                if self.member == "spread":
                    raise ValueError(
                        f"Dates before 2020-09-23 do not have spread member :("
                    )
                elif self.member == "mean":
                    raise ValueError(
                        f"Dates before 2020-09-23 do not have mean member :("
                    )
        else:
            if self.product == "wave":
                if self.member == "spr":
                    self.member = "spread"
                elif self.member == "avg":
                    self.member = "mean"
            elif self.product.startswith("atmos"):
                if self.member == "spread":
                    self.member = "spr"
                elif self.member == "mean":
                    self.member = "avg"

        if self.member == 0:
            self.member = "c00"
        elif isinstance(self.member, int):
            self.member = f"p{self.member:02d}"

        filedir = f"gefs.{self.date:%Y%m%d/%H}"
        if self.date < datetime(2018, 7, 27):
            filepaths = {
                "atmos.5": f"{filedir}/ge{self.member}.t{self.date:%H}z.pgrb2af{self.fxx:03d}",
                "atmos.5b": f"{filedir}/ge{self.member}.t{self.date:%H}z.pgrb2bf{self.fxx:03d}",
            }
        elif self.date < datetime(2020, 9, 23):
            filepaths = {
                "atmos.5": f"{filedir}/pgrb2a/ge{self.member}.t{self.date:%H}z.pgrb2af{self.fxx:03d}",
                "atmos.5b": f"{filedir}/pgrb2b/ge{self.member}.t{self.date:%H}z.pgrb2bf{self.fxx:03d}",
            }
        else:
            filepaths = {
                "atmos.5": f"{filedir}/atmos/pgrb2ap5/ge{self.member}.t{self.date:%H}z.pgrb2a.0p50.f{self.fxx:03d}",
                "atmos.5b": f"{filedir}/atmos/pgrb2bp5/ge{self.member}.t{self.date:%H}z.pgrb2b.0p50.f{self.fxx:03d}",
                "atmos.25": f"{filedir}/atmos/pgrb2sp25/ge{self.member}.t{self.date:%H}z.pgrb2s.0p25.f{self.fxx:03d}",
                "wave": f"{filedir}/wave/gridded/gefs.wave.t{self.date:%H}z.{self.member}.global.0p25.f{self.fxx:03d}.grib2",
                "chem.5": f"{filedir}/chem/pgrb2ap25/gefs.chem.t{self.date:%H}z.a2d_0p25.f{self.fxx:03d}.grib2",
                "chem.25": f"{filedir}/chem/pgrb2ap25/gefs.chem.t{self.date:%H}z.a2d_0p25.f{self.fxx:03d}.grib2",
            }

        valid_members = {
            "atmos.5": [f"p{i:02d}" for i in range(1, 31)] + ["c00", "spr", "avg"],
            "atmos.5b": [f"p{i:02d}" for i in range(1, 31)] + ["c00"],
            "atmos.25": [f"p{i:02d}" for i in range(1, 31)] + ["c00", "spr", "avg"],
            "wave": [f"p{i:02d}" for i in range(1, 31)] + ["spread", "mean", "prob"],
            "chem.5": None,
            "chem.25": None,
        }

        filepath = filepaths.get(self.product)
        if filepath is None:
            raise ValueError(
                f"product={self.product} not recognized. Must be one of {list(self.PRODUCTS)}"
            )

        _member = valid_members.get(self.product)
        if _member is not None and self.member not in _member:
            raise ValueError(
                f"For GEFS product {self.product}, member must be one of {_member}"
            )

        self.SOURCES = {
            "aws": f"https://noaa-gefs-pds.s3.amazonaws.com/{filepath}",
            # "aws-old": f"https://noaa-gefs-pds.s3.amazonaws.com/gefs.20170101/00/gec00.t00z.pgrb2af018
            "nomads": f"https://nomads.ncep.noaa.gov/pub/data/nccf/com/gens/prod/{filepath}",
            "google": f"https://storage.googleapis.com/gfs-ensemble-forecast-system/{filepath}",
            "azure": f"https://noaagefs.blob.core.windows.net/gefs/{filepath}",
        }

        self.IDX_SUFFIX = [".idx", ".grb2.idx", ".grib2.idx"]
        self.LOCALFILE = f"{self.get_remoteFileName}"
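To check the path logic outside of Herbie, the date-dependent part for the "atmos.5" product can be pulled into a standalone function and compared against a known object key. The sketch below copies the three path patterns from the template above; the function name `gefs_atmos5_url` is hypothetical, and whether these patterns match every object actually stored in the bucket is not confirmed here.

```python
from datetime import datetime

AWS_BUCKET = "https://noaa-gefs-pds.s3.amazonaws.com"


def gefs_atmos5_url(run: datetime, member: str, fxx: int) -> str:
    """Build the AWS URL for the 'atmos.5' product as written in the template."""
    filedir = f"gefs.{run:%Y%m%d/%H}"
    stem = f"ge{member}.t{run:%H}z"
    if run < datetime(2018, 7, 27):
        # Oldest period: files directly in the run directory.
        path = f"{filedir}/{stem}.pgrb2af{fxx:03d}"
    elif run < datetime(2020, 9, 23):
        # Middle period: files under pgrb2a/.
        path = f"{filedir}/pgrb2a/{stem}.pgrb2af{fxx:03d}"
    else:
        # Current structure.
        path = f"{filedir}/atmos/pgrb2ap5/{stem}.pgrb2a.0p50.f{fxx:03d}"
    return f"{AWS_BUCKET}/{path}"


print(gefs_atmos5_url(datetime(2017, 1, 1, 0), "c00", 18))
# https://noaa-gefs-pds.s3.amazonaws.com/gefs.20170101/00/gec00.t00z.pgrb2af018
```

The printed URL has the same form as the commented-out "aws-old" example in the template, so a mismatch for other dates would point at the file names rather than the date logic.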

@blaylockbk
Owner

Hi @vwgeiser.

You noticed that I started looking into updating the GFS template.

> Not sure if this is a known issue but it seems Herbie can't find GFS data older than 2021-01-01?

This is correct:

  • AWS archive begins 2021-1-1
  • Google archive begins 2021-1-1
  • Azure archive only has the last 30 days
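These coverage limits can be written down as a quick pre-check before requesting a file. This is only a sketch based on the three bullets above; the function name `gfs_sources_covering` and the source labels are illustrative and do not come from Herbie.

```python
from datetime import date, timedelta

ARCHIVE_START = date(2021, 1, 1)  # AWS and Google archives begin here
AZURE_WINDOW_DAYS = 30            # Azure keeps only the last 30 days


def gfs_sources_covering(run_date: date, today: date) -> list[str]:
    """List the sources whose GFS archive should include run_date."""
    sources = []
    if run_date >= ARCHIVE_START:
        sources += ["aws", "google"]
    if today - timedelta(days=AZURE_WINDOW_DAYS) <= run_date <= today:
        sources.append("azure")
    return sources


print(gfs_sources_covering(date(2020, 12, 31), today=date(2024, 8, 7)))  # []
```

An empty list for 2020-12-31 matches the "Did not find" result reported at the top of this issue.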

@vwgeiser
Author

vwgeiser commented Aug 7, 2024

The above is an attempt to update the GEFS template, because the data exists on AWS but follows a slightly different file structure. With the above edits it is still unable to find data prior to 2020-09-23, which marks the start of the current GEFS file structure.

It potentially belongs in a separate issue?

Although I've been following along with updating the GFS template as well.

@blaylockbk
Owner

Closed by #358


3 participants