This repository has been archived by the owner on Sep 30, 2024. It is now read-only.
Problem
It would be useful to be able to run the ETL pipeline locally, both to generate local data files for development and possibly to back-fill missing data in AWS.
Solution
Add a command-line interface (CLI) that can download diag files from S3 (or access them locally) and run them through the ETL pipeline. I was just doing this interactively in IPython, and it's not too hard to handle:
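import os
import boto3
from ugdata import aws, diag
# The pipeline currently reads the Zarr location from the environment
os.environ["UG_DIAG_ZARR"] = "./diag.zarr"
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
bucket = "parallelcluster-rtma-cluster"
prefix = "UDD_3DRTMA_HRRR_DIAG/"
# Collect S3 event-style records for every diag file under the prefix
records = []
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    # "Contents" is absent from a page that lists no objects
    for obj in page.get("Contents", []):
        # This was just to avoid parsing known bad data
        if diag.parse_diag_filename(obj["Key"]).initialization_time < "2023-02-02T14:00":
            continue
        records.append({"s3": {"bucket": {"name": page["Name"]}, "object": {"key": obj["Key"]}}})
# Feed the records to the Lambda entry point as if S3 had sent the event
aws.lambda_handler({"Records": records}, {})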
The CLI should support:
Setting the Zarr location as a parameter instead of an environment variable
Filtering by initialization time
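The two requirements above could be sketched with `argparse` roughly as follows. This is only an illustration, not an agreed design: the program name `ugdata-etl`, the `--zarr`, `--start`, and `--end` flags, and the helpers `build_parser`, `in_window`, and `make_event` are all hypothetical names. It assumes initialization times can be compared as `datetime` values and that the pipeline keeps accepting S3 event-style records like the ones built in the interactive snippet.

```python
import argparse
from datetime import datetime


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI parser; all flag names are assumptions."""
    parser = argparse.ArgumentParser(
        prog="ugdata-etl",
        description="Run diag files through the ETL pipeline locally.",
    )
    parser.add_argument(
        "source",
        help="S3 URI (s3://bucket/prefix) or local directory of diag files",
    )
    # Zarr location as a parameter instead of the UG_DIAG_ZARR variable
    parser.add_argument("--zarr", default="./diag.zarr", help="Zarr output path")
    # Initialization-time window, parsed from ISO 8601 strings
    parser.add_argument("--start", type=datetime.fromisoformat, default=None)
    parser.add_argument("--end", type=datetime.fromisoformat, default=None)
    return parser


def in_window(init_time, start=None, end=None):
    """Return True if init_time falls in the half-open window [start, end)."""
    if start is not None and init_time < start:
        return False
    if end is not None and init_time >= end:
        return False
    return True


def make_event(bucket, keys):
    """Wrap object keys in the S3 event shape used in the snippet above."""
    return {
        "Records": [
            {"s3": {"bucket": {"name": bucket}, "object": {"key": key}}}
            for key in keys
        ]
    }
```

An invocation might then look like `ugdata-etl s3://parallelcluster-rtma-cluster/UDD_3DRTMA_HRRR_DIAG/ --zarr ./diag.zarr --start 2023-02-02T14:00`, with the entry point setting the Zarr location from `args.zarr` before handing the filtered records to the pipeline.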
No Gos
Describe any features or behaviors that have been considered and rejected as out of scope for this project.
Rabbit Holes
Describe any solutions to problems that pose a risk to completing this project on time.