Cloud Awareness for PySPEDAS #1061

Merged
merged 14 commits into from
Jan 14, 2025


edmondb
Collaborator

@edmondb edmondb commented Nov 20, 2024

This is the completed PR that brings Cloud Awareness to PySPEDAS using the fsspec filesystem protocol (includes AWS and GCS support). The proposed changes do not break current API usage.

Functionality/Summary:

  • Cloud awareness is opt-in: to invoke it, change one of the SPEDAS_DATA_DIR, local_data_dir, or remote_data_dir environment variables.
  • Retrieve files from http/https servers and place onto URI-path data storage (e.g., pull from host and put onto AWS)
  • Read/Stream from URI storage (preferred method of use) when given a URI for remote/local path.
  • Update URI-path storage from a separate URI-path storage if file mod time is newer.
  • Allow a user to force download from a remote URI (NOT RECOMMENDED) by using the force_download option.
  • Attempt to read/stream from local storage (POSIX or URI-based) if remote fails.
  • Unit tests mock cloud storage using moto; the moto dependency is not included in the requirements.txt file.
  • Documentation updates for PySPEDAS
  • Note: MAVEN STS files cannot currently be read with Cloud Awareness because PyTplot is not cloud aware yet. For now, the loader reports the error and skips the file it attempted to read. This can be found in maven_load.py:543.
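
As a rough illustration of the "update if file mod time is newer" rule in the list above, here is a minimal sketch. It is not the code in this PR: `is_uri`, `should_update`, and `sync_if_newer` are hypothetical helper names, and the sketch assumes both filesystems return comparable (timezone-aware) datetimes from fsspec's `modified()`.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlparse


def is_uri(path: str) -> bool:
    """Return True for scheme-prefixed paths such as s3://bucket/key."""
    scheme = urlparse(path).scheme
    # A single-letter scheme is a Windows drive letter (C:\data), not a URI.
    return len(scheme) > 1


def should_update(src_mtime: datetime, dst_mtime: Optional[datetime]) -> bool:
    """Copy when the destination is missing or older than the source."""
    return dst_mtime is None or src_mtime > dst_mtime


def sync_if_newer(src_url: str, dst_url: str) -> bool:
    """Copy src_url to dst_url only if the source is newer (hypothetical helper)."""
    import fsspec  # imported lazily so the pure helpers above need no extras

    src_fs, src_path = fsspec.core.url_to_fs(src_url)
    dst_fs, dst_path = fsspec.core.url_to_fs(dst_url)
    dst_mtime = dst_fs.modified(dst_path) if dst_fs.exists(dst_path) else None
    if not should_update(src_fs.modified(src_path), dst_mtime):
        return False
    # Go through file objects so the two ends can be different backends.
    with src_fs.open(src_path, "rb") as src, dst_fs.open(dst_path, "wb") as dst:
        dst.write(src.read())
    return True
```

The comparison is kept separate from the I/O so the rule itself can be checked without any cloud credentials.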

Dependencies:

  • fsspec
  • s3fs (for AWS)
  • aioboto3 (necessary due to cdflib's boto3 cloud implementation)
  • cdflib >= 1.0.0 (contains cloud storage reading functionality)
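
For anyone trying this out, a quick way to see which of the packages listed above are absent from the current environment might look like the sketch below. `missing_packages` is a hypothetical helper, not part of PySPEDAS, and it only checks that a package can be found; it does not verify the cdflib >= 1.0.0 version constraint.

```python
from importlib.util import find_spec

# Package names taken from the dependency list above.
CLOUD_PACKAGES = ["fsspec", "s3fs", "aioboto3", "cdflib"]


def missing_packages(names):
    """Return the top-level packages from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(CLOUD_PACKAGES)
    print("missing:", ", ".join(missing) if missing else "none")
```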

This code was tested using the AWS CLI on an EC2 resource provided by the HelioCloud project. Details for setup and temporary credential management are not included.

Finally, a separate contribution of a Jupyter notebook will be submitted to the pyspedas_examples repository for use with public AWS storage of mission data (e.g., from CDAWEB).

closes #416

@edmondb edmondb linked an issue Nov 20, 2024 that may be closed by this pull request
Collaborator Author

edmondb commented Dec 2, 2024

@jameswilburlewis The checks were not successful due to the MAVEN server being hit with too many requests. Let us know if there's anything you'd like us to do on our end regarding this PR.

@edmondb edmondb requested a review from nickssl December 5, 2024 20:38
Collaborator Author

edmondb commented Dec 5, 2024

This PR was clean but now needs conflict resolution due to a recently merged PR from @nickssl.

b568d2d

Collaborator Author

edmondb commented Dec 23, 2024

Resolved the conflict with the merged commit and reinstated Cloud Awareness for download.py. This should be ready to be incorporated now @jameswilburlewis

Contributor

@jameswilburlewis jameswilburlewis left a comment

This all looks good to me -- it might be tricky to get the full test suite to run because of MAVEN's data server rate limits, but at least the quick_tests are passing, which is reassuring. I think it's ready to merge.

@jameswilburlewis jameswilburlewis merged commit 4f2b884 into master Jan 14, 2025
5 of 6 checks passed
Collaborator Author

edmondb commented Jan 14, 2025

Ah! Yeah, I had issues with this as well, but the tests for this specific implementation were written and tested using the HelioCloud data on an AWS VM. If there are questions later regarding any tests failing (beyond MAVEN's server-related limitations), feel free to ping.

@jameswilburlewis
Contributor

I did find one MAVEN bug introduced by the merge (now hopefully fixed in the master branch): when scraping the JPL NAIF index page to find MAVEN orbit files, the matching filenames weren't actually being added to the list to be downloaded, so anything that had to convert orbit numbers to times wouldn't work. Retests on GitHub are in progress now... fingers crossed!
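
To make the failure mode concrete, here is a minimal sketch of the scrape-and-collect pattern. It is not the actual PySPEDAS code: `find_orbit_files`, the default filename pattern, and the sample HTML are all hypothetical and only illustrate the step that was being skipped.

```python
import re


def find_orbit_files(index_html: str, pattern: str = r"maven_orb_rec.*\.orb") -> list:
    """Collect link targets in an index page that match `pattern` (hypothetical helper)."""
    matcher = re.compile(pattern)
    to_download = []
    for href in re.findall(r'href="([^"]+)"', index_html):
        if matcher.fullmatch(href):
            # The bug described above amounts to this append never happening,
            # which leaves the download list empty.
            to_download.append(href)
    return to_download


# Made-up index page contents, for illustration only.
sample = (
    '<a href="maven_orb_rec_210101_210401_v1.orb">orbit file</a>'
    '<a href="readme.txt">readme</a>'
)
print(find_orbit_files(sample))
```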

@jameswilburlewis
Contributor

There was a similar MAVEN bug in get_l2_files_from_date (utilities.py): 924b4f3

Should be fixed now, retest in progress...

@jameswilburlewis
Contributor

All tests passing now! Next, I'll add the moto dependency and enable the new test suites in our GitHub scripts.

Development

Successfully merging this pull request may close these issues.

Add support for loading data from Amazon S3
4 participants