-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-47904: [C++] [Python] Allow S3 Filesystem Re-initialization #47903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
|
This would allow Arrow to coexist with other libraries that also use the AWS SDK. Internally since 2 years ago AWS SDK already has a refcount mechanism for `InitAPI` and supports re-init after deinit. - Introduced a mutex for thread-safe re-initialization of the S3 client after finalization, replacing the previous std::call_once mechanism. - Added an Initialize method to reset the finalized state of the S3 client. - Updated EnsureInitialized to allow re-initialization while ensuring thread safety. This change improves the flexibility and safety of the S3 client lifecycle management.
397c69b to
837205e
Compare
|
Thanks for doing this @corpoverlords ! It seems that the Python tests need updating now that re-initialization is allowed, can you take a look? |
| std::lock_guard<std::mutex> lock(init_mutex_); | ||
|
|
||
| if (!is_initialized_.load()) { | ||
| // Not already initialized, allow re-initialization after finalization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we perhaps only allow it if the AWS SDK version is recent enough?
|
Hi @pitrou Thanks for the review! After internal use for a week now I think there is probably a need for revising the entire AWS SDK initialization mechanism. Since AWS SDK now supports a reference count we can actually let the |
|
What I think can be done:
|
Rationale for this change
This would allow Arrow to coexist with other libraries that also use the AWS SDK. Internally since 2 years ago AWS SDK already has a refcount mechanism for
InitAPIand supports re-init after deinit.This change improves the flexibility and safety of the S3 client lifecycle management.
What changes are included in this PR?
S3FS init changes
Are these changes tested?
Tested with our local infra. I can add a unit test in a dedicated cpp file (due to init/deinit usage) if wanted.
Are there any user-facing changes?
No