Resolves #621
This PR changes the `construct_gcs` function to async, which should hopefully improve performance.

Currently, every checkpoint instantiates an ObjectStore, leading to a large number of calls to the metadata server. This can cause many concurrent DNS lookups, which may add network latency and have other undesirable effects. While the GCE metadata service endpoint has no official rate limit, we should avoid making unnecessary calls to it.
Took reference from Polars on this: pola-rs/polars#14384 (comment), https://github.com/pola-rs/polars/blob/main/crates/polars-io/src/cloud/object_store_setup.rs#L4
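
For illustration, here is a minimal sketch of the general idea of reusing a store instead of rebuilding it per checkpoint. It is not the code in this PR: `FakeStore`, `StoreCache`, and `get_or_build` are hypothetical names, the store is a stand-in for a real `ObjectStore`, and the real `construct_gcs` is async and takes more configuration.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a real object store client. Building the real
/// client is the step that would contact the metadata server.
#[derive(Debug)]
pub struct FakeStore {
    pub bucket: String,
}

/// Hypothetical cache of stores, keyed by bucket name.
#[derive(Default)]
pub struct StoreCache {
    stores: Mutex<HashMap<String, Arc<FakeStore>>>,
    /// Number of times a store was actually constructed.
    builds: AtomicUsize,
}

impl StoreCache {
    /// Simplified, synchronous placeholder for the expensive construction step.
    fn build_store(&self, bucket: &str) -> Arc<FakeStore> {
        self.builds.fetch_add(1, Ordering::SeqCst);
        Arc::new(FakeStore {
            bucket: bucket.to_string(),
        })
    }

    /// Returns the cached store for `bucket`, building it only on first use.
    pub fn get_or_build(&self, bucket: &str) -> Arc<FakeStore> {
        let mut stores = self.stores.lock().unwrap();
        if let Some(store) = stores.get(bucket) {
            return Arc::clone(store);
        }
        let store = self.build_store(bucket);
        stores.insert(bucket.to_string(), Arc::clone(&store));
        store
    }

    /// How many stores have been constructed so far.
    pub fn build_count(&self) -> usize {
        self.builds.load(Ordering::SeqCst)
    }
}
```

With a cache like this, repeated checkpoints against the same bucket share one store, so the construction cost (and any metadata lookups behind it) is paid once per bucket rather than once per checkpoint.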
Note: the AWS metadata endpoint has a rate limit of 1024 packets per second, so it might be worth implementing this for the `construct_s3` function as well at some point.