Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tune s3fs cache settings for optimal performance #81

Merged
merged 11 commits into from
Jul 3, 2024
Merged

Conversation

chuckwondo
Copy link
Collaborator

Based upon benchmarking tests, I've set the default s3fs cache settings to one of the better performing combinations. There were a number of combinations that performed very similarly, so the choice was a bit of a toss up, as many of them provide roughly a 10% speed improvement (on average) over the speed of downloading files. The speedup is not particularly notable, but that's not surprising, given that the granule files are not cloud-optimized.

However, this sets us up for possible further improvements, even with the non-cloud-optimized files, if we discover a more efficient means of writing our subsetting logic that requires fewer s3 read requests.

Fixes #77
Fixes #69

algorithm_config.yaml Show resolved Hide resolved
algorithm_config.yaml Outdated Show resolved Hide resolved
docs/MAAP_USAGE.md Show resolved Hide resolved
Copy link
Contributor

@wildintellect wildintellect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good for me, but you've got an error in the test due to a missing import for typing related to fsspec. Probably want to resolve that before merging.

Run make typecheck
+ [[ true == \t\r\u\e ]]
+ mypy src tests typings
src/gedi_subset/subset.py:23: error: Skipping analyzing "fsspec": module is installed, but missing library stubs or py.typed marker  [import-untyped]
src/gedi_subset/subset.py:23: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 16 source files)
make: *** [Makefile:56: typecheck] Error 1
Error: Process completed with exit code 2.

@chuckwondo
Copy link
Collaborator Author

@wildintellect, I fixed the build failure. Did you want to take another pass over the changes before I merge?

@chuckwondo chuckwondo merged commit 4586dd6 into main Jul 3, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize s3fs read cache settings Change limit default value to be "unlimited"
2 participants