Add Gridded VIIRS from the NOAA AWS bucket and more #164

Open
wants to merge 54 commits into
base: develop
54 commits
b30c059
add new nesdis_viirs_aod_gridded data from AWS - remove old ftp
bbakernoaa Mar 12, 2024
b0ada71
update
bbakernoaa Mar 12, 2024
7510ea0
fix formatting
bbakernoaa Mar 12, 2024
9f8769c
move imports to module
bbakernoaa Mar 12, 2024
4f21f1c
remove blank lines at beginning
bbakernoaa Mar 12, 2024
22e3290
remove erronious code
bbakernoaa Mar 12, 2024
80d9131
fixes
bbakernoaa Mar 12, 2024
0862863
update for NRT products from NESDIS STAR https server
bbakernoaa Mar 13, 2024
7a57d21
changes for precommit
bbakernoaa Mar 13, 2024
29c6a1c
update __init__.py
bbakernoaa Mar 13, 2024
c53bc29
format attempt fix
bbakernoaa Mar 13, 2024
c7216c7
format fix
bbakernoaa Mar 13, 2024
1674be1
Add h5netcdf to dev env
zmoon Mar 14, 2024
fee81d6
Add initial test for gridded VIIRS AOD
zmoon Mar 14, 2024
de7f76c
Satellite and resolution cases
zmoon Mar 14, 2024
57e7449
Fix satellite name check
zmoon Mar 14, 2024
20da100
Update monetio/sat/nesdis_viirs_aod_aws_gridded.py
bbakernoaa Mar 18, 2024
e8d6515
Skip VIIRS test on 3.6
zmoon Mar 18, 2024
d2c6fa4
adding error checking and other recommended fixes
bbakernoaa Mar 19, 2024
c6a3098
Merge branch 'feature/viirs_aws_gridded' of https://github.com/bbaker…
bbakernoaa Mar 19, 2024
95ec965
remove untracked files - accident
bbakernoaa Mar 19, 2024
0e4f23e
fix avhrr and ndvi
bbakernoaa Mar 19, 2024
28572b3
Test input validation
zmoon Mar 19, 2024
20f307f
Raise
zmoon Mar 19, 2024
f54b0a6
Clean up docstring a bit
zmoon Mar 19, 2024
8e402cc
consistency
zmoon Mar 19, 2024
1d1f55a
More notes
zmoon Mar 19, 2024
ce1b644
Add date info to open_dataset docstrings
zmoon Mar 19, 2024
9212c80
sp
zmoon Mar 19, 2024
efa245b
Raise in mf version too
zmoon Mar 19, 2024
915e9fe
'both' option doesn't work currently
zmoon Mar 19, 2024
535dc12
Initial mf test
zmoon Mar 19, 2024
0d4a410
Update mf docstring
zmoon Mar 19, 2024
4a6a1be
Skip missing daily file with warning by default
zmoon Mar 19, 2024
efc3b33
Skip for monthly as well
zmoon Mar 19, 2024
cec5823
Always error for not found file in open_dataset
zmoon Mar 19, 2024
62ea648
Fix test for pre 3.10
zmoon Mar 19, 2024
45d8522
Merge branch 'noaa-oar-arl:stable' into feature/viirs_aws_gridded
bbakernoaa Aug 14, 2024
5d201fa
Merge branch 'develop' into feature/viirs_aws_gridded
bbakernoaa Aug 14, 2024
192c0e3
NDVI
zmoon Aug 14, 2024
402067b
AVHRR AOT
zmoon Aug 14, 2024
45679c4
Drop decode error culprit
zmoon Aug 14, 2024
f88a7fe
Hopefully avoid TIMEOFDAY warnings
zmoon Aug 15, 2024
957aa4a
AVHRR AOT test
zmoon Aug 15, 2024
ddd6fca
NESDIS VIIRS AOD NRT
zmoon Aug 15, 2024
1d3fd7c
Add NESDIS VIIRS AOD NRT tests
zmoon Aug 15, 2024
2dbf8d6
Remove warns
zmoon Aug 15, 2024
b754cad
Sort / most sat aren't in top-level
zmoon Aug 15, 2024
65fea0c
Merge remote-tracking branch 'noaa/develop' into feature/viirs_aws_gr…
zmoon Sep 19, 2024
be835c4
Merge branch 'noaa-oar-arl:stable' into feature/viirs_aws_gridded
bbakernoaa Nov 18, 2024
79dd79c
updates
bbakernoaa Nov 19, 2024
ceae7cb
update lai bucket
bbakernoaa Nov 19, 2024
129b644
imports
bbakernoaa Nov 19, 2024
561b616
pass kwarg to xaray
bbakernoaa Nov 19, 2024
12 changes: 8 additions & 4 deletions monetio/sat/__init__.py
@@ -8,9 +8,11 @@
     _tropomi_l2_no2_mm,
     goes,
     modis_ornl,
-    nesdis_edr_viirs,
-    nesdis_eps_viirs,
+    nesdis_avhrr_aot_aws_gridded,
+    nesdis_eps_viirs_aod_nrt,
     nesdis_frp,
+    nesdis_viirs_aod_aws_gridded,
+    nesdis_viirs_ndvi_aws_gridded,
 )

 __all__ = [
@@ -23,9 +25,11 @@
     "_tropomi_l2_no2_mm",
     "goes",
     "modis_ornl",
-    "nesdis_edr_viirs",
-    "nesdis_eps_viirs",
+    "nesdis_avhrr_aot_aws_gridded",
+    "nesdis_eps_viirs_aod_nrt",
     "nesdis_frp",
+    "nesdis_viirs_aod_aws_gridded",
+    "nesdis_viirs_ndvi_aws_gridded",
 ]

 __name__ = "sat"
232 changes: 232 additions & 0 deletions monetio/sat/nesdis_avhrr_aot_aws_gridded.py
@@ -0,0 +1,232 @@
"""
NOAA Climate Data Record (CDR) Aerosol Optical Depth (AOD) Dataset Access Module

This module provides access to NOAA's satellite-derived Aerosol Optical Depth data:

Aerosol Optical Depth (AOD):
    - Source: NOAA CDR AVHRR AOT (Aerosol Optical Thickness)
    - Period: 1981-present
    - Sensor: Advanced Very High Resolution Radiometer (AVHRR)
    - Resolution: 0.1° x 0.1° (approximately 11km at equator)
    - Coverage: Global over ocean
    - Temporal Resolution:
        * Daily averages
        * Monthly averages
    - Key Variables:
        * aot_550: Aerosol Optical Thickness at 550nm
        * number_of_retrievals: Number of valid retrievals
        * quality_flags: Quality assurance flags
    - AWS Path: noaa-cdr-aerosol-optical-thickness-pds/

Dataset Description:
    The AVHRR AOT CDR provides a consistent, long-term record of aerosol optical
    thickness over global oceans. This parameter is crucial for:
    - Climate change studies
    - Atmospheric correction
    - Air quality monitoring
    - Radiative forcing calculations

Data Access:
    Files are stored in NetCDF format on AWS S3, organized by:
    - Daily data: /data/daily/YYYY/
    - Monthly data: /data/monthly/YYYY/

Usage:
    >>> # Single date access (daily)
    >>> dataset = open_dataset("2023-01-01")

    >>> # Monthly data
    >>> dataset = open_dataset("2023-01-01", averaging_time=AveragingTime.MONTHLY)

    >>> # Multiple dates
    >>> dates = pd.date_range("2023-01-01", "2023-01-10")
    >>> dataset_multi = open_mfdataset(dates)

References:
    - Dataset Documentation: https://www.ncdc.noaa.gov/cdr/atmospheric/aerosol-optical-thickness
    - Algorithm Theoretical Basis Document (ATBD):
      https://www.ncdc.noaa.gov/cdr/atmospheric/aerosol-optical-thickness/documentation

Notes:
    - Data is only available over ocean surfaces
    - Quality flags should be consulted for optimal data usage
    - Monthly averages are computed from daily data
"""
from s3fs import S3FileSystem
from enum import Enum
from datetime import datetime
from pathlib import Path
from typing import List, Tuple, Union
import warnings
import pandas as pd
import s3fs
import xarray as xr

AOD_BASE_PATH = "noaa-cdr-aerosol-optical-thickness-pds/data/daily"
AOD_FILE_PATTERN = "AOT_AVHRR_*_daily-avg_"

class AveragingTime(Enum):
    DAILY = "daily"
    MONTHLY = "monthly"

def create_daily_aod_list(
    date_generated: List[datetime],
    fs: S3FileSystem,
    warning: bool = False
) -> Tuple[List[str], int]:
    """
    Creates a list of daily AOD (Aerosol Optical Depth) files and calculates the total size of the files.

    Parameters:
        date_generated (list): A list of dates for which to check the existence of AOD files.
        fs (FileSystem): The file system object used to check file existence and size.
        warning (bool, optional): If True, warns instead of raising error when file not found. Defaults to False.

    Returns:
        tuple[list[str | None], int]: A tuple containing:
            - List of file paths (str) or None for missing files if warning=True
            - Total size of the files in bytes
    """
    # Loop through observation dates & check for files
    nodd_file_list = []
    nodd_total_size = 0
    for date in date_generated:
        file_date = date.strftime("%Y%m%d")
        year = file_date[:4]
        prod_path = Path(AOD_BASE_PATH) / year
        file_names = fs.glob(str(prod_path / f"{AOD_FILE_PATTERN}{file_date}_*.nc"))
        # If file exists, add path to list and add file size to total
        if file_names:
            nodd_file_list.extend(file_names)
            nodd_total_size += sum(fs.size(f) for f in file_names)
        else:
            msg = f"File does not exist on AWS: {prod_path}/{AOD_FILE_PATTERN}{file_date}_*.nc"
            if warning:
                warnings.warn(msg)
                nodd_file_list.append(None)
            else:
                raise ValueError(msg)

    return nodd_file_list, nodd_total_size

def create_monthly_aod_list(date_generated, fs, warning=False):
    """
    Creates a list of monthly AOD (Aerosol Optical Depth) files and calculates the total size of the files.

    Parameters:
        date_generated (list): A list of dates for which to check the existence of AOD files.
        fs (FileSystem): The file system object used to check file existence and size.
        warning (bool, optional): If True, warns instead of raising error when file not found. Defaults to False.

    Returns:
        tuple: A tuple containing the list of file paths (None for missing files
            if warning=True) and the total size of the files in bytes.
    """
    # Loop through observation dates & check for files
    nodd_file_list = []
    nodd_total_size = 0
    for date in date_generated:
        file_date = date.strftime("%Y%m%d")
        year = file_date[:4]
        prod_path = "noaa-cdr-aerosol-optical-thickness-pds/data/monthly/" + year + "/"
        # NOTE: this reuses the daily file pattern and a YYYYMMDD date stamp;
        # confirm that the monthly files in the bucket follow this naming.
        patt = "AOT_AVHRR_*_daily-avg_"
        file_names = fs.glob(prod_path + patt + file_date + "_*.nc")
        # If file exists, add path to list and add file size to total
        if file_names:
            nodd_file_list.extend(file_names)
            nodd_total_size += sum(fs.size(f) for f in file_names)
        else:
            msg = "File does not exist on AWS: " + prod_path + patt + file_date + "_*.nc"
            if warning:
                warnings.warn(msg)
                nodd_file_list.append(None)
            else:
                raise ValueError(msg)

    return nodd_file_list, nodd_total_size

def open_dataset(
    date: Union[str, datetime],
    averaging_time: AveragingTime = AveragingTime.DAILY
) -> xr.Dataset:
    """
    Opens the dataset for the given date and averaging time.

    Parameters:
        date (str or datetime.datetime): The date for which to open the dataset.
        averaging_time (AveragingTime, optional): The averaging time.
            Valid values are AveragingTime.DAILY or AveragingTime.MONTHLY.
            Defaults to AveragingTime.DAILY.

    Returns:
        xarray.Dataset: The opened dataset.

    Raises:
        ValueError: If 'averaging_time' is invalid or no file is found for the date.
    """
    if isinstance(date, str):
        date_generated = [pd.Timestamp(date)]
    else:
        date_generated = [date]

    # Access AWS using anonymous credentials
    fs = s3fs.S3FileSystem(anon=True)

    if averaging_time == AveragingTime.MONTHLY:
        file_list, _ = create_monthly_aod_list(date_generated, fs)
    elif averaging_time == AveragingTime.DAILY:
        file_list, _ = create_daily_aod_list(date_generated, fs)
    else:
        raise ValueError(
            f"Invalid input for 'averaging_time' {averaging_time!r}: "
            "Valid values are 'daily' or 'monthly'"
        )

    if len(file_list) == 0 or all(f is None for f in file_list):
        raise ValueError(f"Files not available for product and date: {date_generated[0]}")

    # Keep the file handle open: xarray reads lazily, so closing it here
    # (e.g. with a `with` block) would leave the returned dataset unreadable.
    aws_file = fs.open(file_list[0])
    dset = xr.open_dataset(aws_file)

    return dset

def open_mfdataset(dates, averaging_time: AveragingTime = AveragingTime.DAILY, error_missing=False):
    """
    Opens and combines multiple NetCDF files into a single xarray dataset.

    Parameters:
        dates (pandas.DatetimeIndex): The dates for which to retrieve the data.
        averaging_time (AveragingTime, optional): The averaging time.
            Valid values are AveragingTime.DAILY or AveragingTime.MONTHLY.
            Defaults to AveragingTime.DAILY.
        error_missing (bool, optional): If True, raise an error when a file is
            not found; otherwise warn and skip it. Defaults to False.

    Returns:
        xarray.Dataset: The combined dataset containing the data for the specified dates.

    Raises:
        ValueError: If the input parameters are invalid or no files are found.

    """
    from collections.abc import Iterable

    if isinstance(dates, Iterable) and not isinstance(dates, str):
        dates = pd.DatetimeIndex(dates)
    else:
        dates = pd.DatetimeIndex([dates])

    # Access AWS using anonymous credentials
    fs = s3fs.S3FileSystem(anon=True)

    if averaging_time == AveragingTime.MONTHLY:
        file_list, _ = create_monthly_aod_list(dates, fs, warning=not error_missing)
    elif averaging_time == AveragingTime.DAILY:
        file_list, _ = create_daily_aod_list(dates, fs, warning=not error_missing)
    else:
        raise ValueError(
            f"Invalid input for 'averaging_time' {averaging_time!r}: "
            "Valid values are 'daily' or 'monthly'"
        )

    if len(file_list) == 0 or all(f is None for f in file_list):
        raise ValueError(f"Files not available for product and dates: {dates}")

    aws_files = [fs.open(f) for f in file_list if f is not None]

    # Return the dataset still open: leaving a `with` block would close it
    # before the caller can read any lazily loaded data.
    dset = xr.open_mfdataset(aws_files, concat_dim="time", combine="nested")
    return dset
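
The file-listing logic in create_daily_aod_list can be exercised without network access. The following is a minimal sketch, not the module's code: `FakeFS` is a hypothetical stand-in for `s3fs.S3FileSystem` that only mimics `glob` and `size`, `list_daily_files` is an illustrative re-statement of the daily lookup, and the file name and size are made up. It shows how a missing day turns into a warning plus a `None` placeholder when `warning=True`, and into a `ValueError` otherwise.

```python
import warnings
from datetime import datetime
from fnmatch import fnmatchcase

BASE = "noaa-cdr-aerosol-optical-thickness-pds/data/daily"
PATTERN = "AOT_AVHRR_*_daily-avg_"


class FakeFS:
    """Hypothetical in-memory stand-in for s3fs.S3FileSystem (glob and size only)."""

    def __init__(self, files):
        self.files = dict(files)  # maps path -> size in bytes

    def glob(self, pattern):
        # Shell-style wildcard match over the known paths
        return sorted(p for p in self.files if fnmatchcase(p, pattern))

    def size(self, path):
        return self.files[path]


def list_daily_files(dates, fs, warning=False):
    """Return (paths, total_bytes); a missing day becomes None when warning=True."""
    paths, total = [], 0
    for date in dates:
        stamp = date.strftime("%Y%m%d")
        pattern = f"{BASE}/{stamp[:4]}/{PATTERN}{stamp}_*.nc"
        found = fs.glob(pattern)
        if found:
            paths.extend(found)
            total += sum(fs.size(p) for p in found)
        elif warning:
            warnings.warn(f"File does not exist on AWS: {pattern}")
            paths.append(None)
        else:
            raise ValueError(f"File does not exist on AWS: {pattern}")
    return paths, total


# Made-up file name and size, used only to exercise the logic
present = f"{BASE}/2023/AOT_AVHRR_v04r00_daily-avg_20230101_c20230105.nc"
fs = FakeFS({present: 1000})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    paths, total = list_daily_files(
        [datetime(2023, 1, 1), datetime(2023, 1, 2)], fs, warning=True
    )

print(paths)  # the matched path for Jan 1, then None for the missing Jan 2
print(total)  # combined size of the files that were found
```

Callers that accept the `None` placeholders need to filter them out before opening files, which is what open_mfdataset does with `if f is not None`.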
110 changes: 0 additions & 110 deletions monetio/sat/nesdis_edr_viirs.py

This file was deleted.
