
Enhance MockS3Client to support real client delegation and configurable failures #2158

Merged
merged 8 commits into master from enhance_s3_mock
Feb 7, 2025

Conversation

G-D-Petrov
Collaborator

@G-D-Petrov G-D-Petrov commented Jan 30, 2025

Reference Issues/PRs

ref monday ticket: 7971351691

What does this implement or fix?

This change introduces several improvements to the MockS3Client:

  • Add support for wrapping and delegating to a real S3 client
  • Implement a new check_failure method to enable configurable failure simulation for specific buckets through env variables
  • Update S3 storage initialization to use the new MockS3Client with real client delegation
  • Modify methods to first check for simulated failures before performing operations

The changes allow for more flexible testing scenarios and improved mock storage behavior.
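The wrap-and-delegate pattern above can be sketched in Python. This is a hypothetical illustration only; the real MockS3Client is C++, so every name here (FailureSimulatingClient, the S3MOCK_* environment variables) is a stand-in, not the PR's actual API.

```python
import os


class SimulatedS3Error(Exception):
    """Raised in place of a real Aws::S3::S3Error."""


class FailureSimulatingClient:
    """Wraps an optional real client; checks configured failures first."""

    def __init__(self, real_client=None):
        self.real_client = real_client
        self.store = {}  # in-memory fallback when no real client is wrapped

    def _check_failure(self, bucket):
        # Failure simulation is driven by configuration, mirroring the PR's
        # S3Mock.EnableFailures / S3Mock.FailureBucket settings.
        if (os.environ.get("S3MOCK_ENABLE_FAILURES") == "1"
                and os.environ.get("S3MOCK_FAILURE_BUCKET") == bucket):
            raise SimulatedS3Error(f"Simulated failure for bucket {bucket}")

    def put_object(self, bucket, key, value):
        self._check_failure(bucket)  # simulated failures win over real I/O
        if self.real_client is not None:
            return self.real_client.put_object(bucket, key, value)
        self.store[(bucket, key)] = value
```

When no real client is wrapped, the sketch falls back to an in-memory dict, which is the pure-mock mode; wrapping a real client gives the delegation mode described above.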

Any other comments?

The intended use is something like:

# Enable failure
with config_context("S3Mock.EnableFailures", 1):
    with config_context_string("S3Mock.FailureBucket", target_bucket_names[0]):
        time.sleep(5)
    with config_context_string("S3Mock.FailureBucket", target_bucket_names[1]):
        time.sleep(5)
    # In these 5 seconds, all of the targets should have failed
    time.sleep(5)
    with config_context_string("S3Mock.FailureBucket", target_bucket_names[0]):
        time.sleep(5)

# continue as usual

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

super().__init__(native_config)
self.http_protocol = "https" if use_ssl else "http"
self.ssl_test_support = ssl_test_support
self.bucket_versioning = bucket_versioning
self.default_prefix = default_prefix
self.use_raw_prefix = use_raw_prefix
self.use_mock_storage_for_testing = use_mock_storage_for_testing
Collaborator Author

This is the only real change in this file.
The rest is just black reformatting.

Collaborator

@IvoDD IvoDD left a comment

The only important thing is to allow triggering the mock storage forwarding via a flag (or even a configs map). This is needed to make the CI work.

It would also be nice to add some Python tests for the new functionality, to actually test that the whole forwarding mechanism works and can trigger failures as expected (both with the new bucket-level errors and with the old symbol-level errors).


if (conf.use_mock_storage_for_testing()) {
    ARCTICDB_RUNTIME_DEBUG(log::storage(), "Using Mock S3 storage");
    s3_client_ = std::make_unique<MockS3Client>(std::move(s3_client_));
}
Collaborator

We still need to leave the option open to create a regular MockS3Client() i.e. without forwarding to a real client.

This is currently breaking a bunch of C++ tests (which don't have the option to create a moto server).

What do you think about adding another flag use_mock_storage_with_forwarding?
Sorry, after our offline discussion I thought I could do it in a later PR, but I think we'll need to do it here to make the C++ tests pass.

Collaborator

A similar change will be needed for the nfs_storage.

Collaborator

I think the mock client and this wrapped real client are actually separate things. The real client case always just checks for a failure trigger then delegates. I think this would be cleaner if we have a new class for the "real client with failure simulation" rather than putting it in to this mock. I think we can tell this is the case because with the real client, this ceases to be a mock storage at all so all the use_mock_storage_for_testing checks are misleading.

Collaborator Author

Yeah I guess you are right, it is better if this is a wrapper over the client rather than an extension of the mock.

@@ -58,31 +58,92 @@ const auto not_found_error = create_error(Aws::S3::S3Errors::RESOURCE_NOT_FOUND)
const auto precondition_failed_error = create_error(Aws::S3::S3Errors::UNKNOWN, "PreconditionFailed", "Precondition failed", false, Aws::Http::HttpResponseCode::PRECONDITION_FAILED);
const auto not_implemented_error = create_error(Aws::S3::S3Errors::UNKNOWN, "NotImplemented", "A header you provided implies functionality that is not implemented", false);

std::optional<Aws::S3::S3Error> MockS3Client::check_failure(const std::string& bucket_name) const {
Collaborator

Since we discussed offline that we'd like to preserve the different modes of failure simulation how about we name this a bit more specifically. Something like check_bucket_level_failure or check_failure_from_configs_map?

And respectively rename the has_failure_trigger to e.g. check_symbol_level_failure or check_failure_from_symbol?

Collaborator

I preferred the has_failure_trigger naming; I don't really understand what check_failure means.

}

if (real_client_)
    return real_client_->list_objects(name_prefix, bucket_name, continuation_token);
Collaborator

Most functions now have the following order of checks:

  1. Check for a bucket level config based failure
  2. Check for a symbol level symbol_name based failure
  3. Perform operation (either on real or on fake storage)

It is somewhat surprising that list_objects is the only function which does not do any symbol-level checks in case we're using real storage.

I'll be rewriting the whole logic inside list_objects in a follow up PR so I will address this there and make it work similar to the other s3 operations. (so nothing to do here, just noting for second reviewer)
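The three-step order of checks described above can be sketched as follows. This is an illustrative Python stand-in, not the PR's C++ code; the "#Failure_<Op>" marker convention and the function signatures are hypothetical.

```python
def has_failure_trigger(symbol_name, operation):
    # Symbol-level failures are encoded in the symbol name itself.
    return f"#Failure_{operation}" in symbol_name


def delete_object(symbol_name, bucket, config, perform_delete):
    # 1. Bucket-level, config-driven failure check
    if config.get("FailureBucket") == bucket:
        return "simulated bucket failure"
    # 2. Symbol-level, name-driven failure check
    if has_failure_trigger(symbol_name, "Delete"):
        return "simulated symbol failure"
    # 3. Perform the operation (on real or on fake storage)
    return perform_delete(symbol_name)
```

The point of the review comment is that list_objects currently skips step 2 when a real client is wrapped, unlike the other operations.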

@@ -23,6 +25,7 @@
#include <arcticdb/util/exponential_backoff.hpp>
#include <arcticdb/util/configs_map.hpp>
#include <arcticdb/util/composite.hpp>
#include <chrono>
Collaborator

Nit: Do we need the chrono and memory includes?

if (maybe_error.has_value()) {
    return {*maybe_error};
}

for (auto& s3_object_name : s3_object_names) {
    auto maybe_error = has_failure_trigger(s3_object_name, StorageOperation::DELETE);
Collaborator

Windows compilation failure here

@@ -121,14 +129,15 @@ def create_test_cfg(self, lib_name: str) -> EnvironmentConfigsMap:
is_https=self.factory.endpoint.startswith("https://"),
region=self.factory.region,
use_mock_storage_for_testing=self.factory.use_mock_storage_for_testing,
use_internal_client_wrapper_for_testing=self.factory.use_internal_client_wrapper_for_testing,
Collaborator Author

This is the other real change.

},
[](py::tuple t) {
util::check(t.size() == 2, "Invalid S3Settings pickle objects");
s3::S3Settings settings(t[static_cast<uint32_t>(S3SettingsPickleOrder::AWS_AUTH)].cast<s3::AWSAuthMethod>(), t[static_cast<uint32_t>(S3SettingsPickleOrder::AWS_PROFILE)].cast<std::string>());
util::check(t.size() == 3, "Invalid S3Settings pickle objects");
Collaborator

Sorry, I don't have much context on these S3Settings so this might be a dumb question. Should the pickling and unpickling be backwards compatible? Where are these pickled objects stored and will we ever try to unpickle the old version of the S3Settings?

Collaborator

I think this is safe, as the pickled S3Settings is not persisted. The support for pickling should be for forking (more specifically, Spark) only.

@@ -134,6 +135,11 @@ void S3Storage::create_s3_client(const S3Settings &conf, const Aws::Auth::AWSCre
ARCTICDB_RUNTIME_DEBUG(log::storage(), "Using provided auth credentials");
s3_client_ = std::make_unique<S3ClientImpl>(creds, get_s3_config(conf), Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Never, conf.use_virtual_addressing());
}

if (conf.use_internal_client_wrapper_for_testing()){
Collaborator

I don't see a similar change for the nfs_backed_storage. Should we add one?

Collaborator

Also I like how this allows to use a S3ClientWrapper around a MockS3Client to be able to simulate both types of failures. Nice :)

Collaborator Author

We don't have S3Settings for nfs_backed_storage and at the moment it is not needed.
I think that we should add something like it for nfs and all other storages, but I don't think that is in the scope of this change.

Collaborator

ah makes sense. Didn't realize S3Settings does not apply to nfs.

// A wrapper around the actual S3 client which can simulate failures based on the configuration.
// The S3ClientWrapper delegates to the real client by default, but can intercept operations
// to simulate failures or track operations for testing purposes.
class S3ClientWrapper : public S3ClientInterface {
Collaborator

S3ClientWrapper seems kind of a generic name which doesn't signify it is used for simulating failures. What do you think about something like S3ClientWithFailureSimulation?

Collaborator Author

I prefer S3ClientWrapper as it better expresses the intention of the class.
The fact that we only use it for failure simulation at the moment seems more like an implementation detail of the current state.

Collaborator

Makes sense, sorry for bikeshedding but what about S3ClientTestWrapper to signify it is for tests and not something which should be used outside of tests in any way?

lib.write("s", data=create_df())

with config_context("S3ClientWrapper.EnableFailures", 1):
    with pytest.raises(NoDataFoundException, match="Unexpected network error: S3Error#99"):
Collaborator

Why is this raising a NoDataFoundException?

Collaborator Author

The read is catching the exception internally and rethrowing it as NoDataFound.

with pytest.raises(NoDataFoundException, match="Unexpected network error: S3Error#99"):
    lib.read("s")
with config_context_string("S3ClientWrapper.FailureBucket", test_bucket.bucket):
    with pytest.raises(StorageException, match="Unexpected network error: S3Error#99"):
Collaborator

Nit: Also maybe it would be nice to add an assertion that the write works if FailureBucket is set to something else, e.g. non-existent-bucket?
Mostly because nothing is testing the string-munging code that parses the FailureBucket config.

@G-D-Petrov G-D-Petrov merged commit f6b8933 into master Feb 7, 2025
152 checks passed
@G-D-Petrov G-D-Petrov deleted the enhance_s3_mock branch February 7, 2025 15:30