-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RetryStrategy for S3 file system #9736
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,6 +30,8 @@ | |
|
||
#include <aws/core/Aws.h> | ||
#include <aws/core/auth/AWSCredentialsProviderChain.h> | ||
#include <aws/core/client/AdaptiveRetryStrategy.h> | ||
#include <aws/core/client/DefaultRetryStrategy.h> | ||
#include <aws/core/http/HttpResponse.h> | ||
#include <aws/core/utils/logging/ConsoleLogSystem.h> | ||
#include <aws/core/utils/stream/PreallocatedStreamBuf.h> | ||
|
@@ -70,7 +72,6 @@ Aws::IOStreamFactory AwsWriteableStreamFactory(void* data, int64_t nbytes) { | |
return [=]() { return Aws::New<StringViewStream>("", data, nbytes); }; | ||
} | ||
|
||
// TODO: Implement retry on failure. | ||
class S3ReadFile final : public ReadFile { | ||
public: | ||
S3ReadFile(const std::string& path, Aws::S3::S3Client* client) | ||
|
@@ -562,6 +563,11 @@ class S3FileSystem::Impl { | |
clientConfig.maxConnections = hiveConfig_->s3MaxConnections().value(); | ||
} | ||
|
||
auto retryStrategy = getRetryStrategy(); | ||
if (retryStrategy.has_value()) { | ||
clientConfig.retryStrategy = retryStrategy.value(); | ||
} | ||
|
||
auto credentialsProvider = getCredentialsProvider(); | ||
|
||
client_ = std::make_shared<Aws::S3::S3Client>( | ||
|
@@ -640,6 +646,58 @@ class S3FileSystem::Impl { | |
return getDefaultCredentialsProvider(); | ||
} | ||
|
||
// Return a client RetryStrategy based on the config. | ||
std::optional<std::shared_ptr<Aws::Client::RetryStrategy>> getRetryStrategy() | ||
const { | ||
auto retryMode = hiveConfig_->s3RetryMode(); | ||
auto maxAttempts = hiveConfig_->s3MaxAttempts(); | ||
if (retryMode.has_value()) { | ||
if (retryMode.value() == "standard") { | ||
if (maxAttempts.has_value()) { | ||
VELOX_USER_CHECK_GE( | ||
maxAttempts.value(), | ||
0, | ||
"Invalid configuration: specified 'hive.s3.max-attempts' value {} is < 0.", | ||
maxAttempts.value()); | ||
return std::make_shared<Aws::Client::StandardRetryStrategy>( | ||
maxAttempts.value()); | ||
} else { | ||
// Otherwise, use default value 3. | ||
return std::make_shared<Aws::Client::StandardRetryStrategy>(); | ||
} | ||
} else if (retryMode.value() == "adaptive") { | ||
if (maxAttempts.has_value()) { | ||
VELOX_USER_CHECK_GE( | ||
maxAttempts.value(), | ||
0, | ||
"Invalid configuration: specified 'hive.s3.max-attempts' value {} is < 0.", | ||
maxAttempts.value()); | ||
return std::make_shared<Aws::Client::AdaptiveRetryStrategy>( | ||
maxAttempts.value()); | ||
} else { | ||
// Otherwise, use default value 3. | ||
return std::make_shared<Aws::Client::AdaptiveRetryStrategy>(); | ||
} | ||
} else if (retryMode.value() == "legacy") { | ||
if (maxAttempts.has_value()) { | ||
VELOX_USER_CHECK_GE( | ||
maxAttempts.value(), | ||
0, | ||
"Invalid configuration: specified 'hive.s3.max-attempts' value {} is < 0.", | ||
maxAttempts.value()); | ||
return std::make_shared<Aws::Client::DefaultRetryStrategy>( | ||
maxAttempts.value()); | ||
} else { | ||
// Otherwise, use default value maxRetries = 10, scaleFactor = 25 | ||
return std::make_shared<Aws::Client::DefaultRetryStrategy>(); | ||
} | ||
} else { | ||
VELOX_USER_FAIL("Invalid retry mode for S3: {}", retryMode.value()); | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add the check of invalid value of retry mode? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated. |
||
} | ||
return std::nullopt; | ||
} | ||
|
||
// Make it clear that the S3FileSystem instance owns the S3Client. | ||
// Once the S3FileSystem is destroyed, the S3Client fails to work | ||
// due to the Aws::ShutdownAPI invocation in the destructor. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -559,7 +559,18 @@ Each query can override the config by setting corresponding query session proper | |
- integer | ||
- | ||
- Maximum concurrent TCP connections for a single http client. | ||
|
||
* - hive.s3.max-attempts | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By default, aws sdk uses 3, right? Let's mention this here |
||
- integer | ||
- | ||
- Maximum attempts for connections to a single http client, work together with retry-mode. By default, it's 3 for standard/adaptive mode | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As this document describe, default 4 for legacy mode
https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-retries.html There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. their cpp_sdk and document have some difference. Let's follow sdk There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe different AWS version has different default value |
||
and 10 for legacy mode. | ||
* - hive.s3.retry-mode | ||
- string | ||
- | ||
- **Allowed values:** "standard", "adaptive", "legacy". By default it's empty, S3 client will be created with RetryStrategy. | ||
Legacy mode only enables throttled retry for transient errors. | ||
Standard mode is built on top of legacy mode and has throttled retry enabled for throttling errors apart from transient errors. | ||
Adaptive retry mode dynamically limits the rate of AWS requests to maximize success rate. | ||
``Google Cloud Storage Configuration`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
.. list-table:: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If user doesn't config retryMode but config maxAttempts, does maxAttempts take effect? It should takes effect since we have default retry mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. In velox,
maxAttempts
need work together withretryMode
. IfretryMode
isn't configured, S3 client will be created w/oRetryStrategy
. But for Gluten, we've set default value forretryMode
. I updated the doc in this PR.