
Enhance MockS3Client to support real client delegation and configurable failures #2158

Merged
merged 8 commits into master from enhance_s3_mock
Feb 7, 2025

Conversation

G-D-Petrov
Collaborator

@G-D-Petrov G-D-Petrov commented Jan 30, 2025

Reference Issues/PRs

ref monday ticket: 7971351691

What does this implement or fix?

This change introduces several improvements to the MockS3Client:

  • Add support for wrapping and delegating to a real S3 client
  • Implement a new check_failure method to enable configurable failure simulation for specific buckets through env variables
  • Update S3 storage initialization to use the new MockS3Client with real client delegation
  • Modify methods to first check for simulated failures before performing operations

The changes allow for more flexible testing scenarios and improved mock storage behavior.
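The wrap-and-delegate pattern above can be sketched in Python. This is a hypothetical illustration only; the real MockS3Client is C++, so every name here (FailureSimulatingClient, the S3MOCK_* environment variables) is a stand-in, not the PR's actual API.

```python
import os


class SimulatedS3Error(Exception):
    """Raised in place of a real Aws::S3::S3Error."""


class FailureSimulatingClient:
    """Wraps an optional real client; checks configured failures first."""

    def __init__(self, real_client=None):
        self.real_client = real_client
        self.store = {}  # in-memory fallback when no real client is wrapped

    def _check_failure(self, bucket):
        # Failure simulation is driven by configuration, mirroring the PR's
        # S3Mock.EnableFailures / S3Mock.FailureBucket settings.
        if (os.environ.get("S3MOCK_ENABLE_FAILURES") == "1"
                and os.environ.get("S3MOCK_FAILURE_BUCKET") == bucket):
            raise SimulatedS3Error(f"Simulated failure for bucket {bucket}")

    def put_object(self, bucket, key, value):
        self._check_failure(bucket)  # simulated failures win over real I/O
        if self.real_client is not None:
            return self.real_client.put_object(bucket, key, value)
        self.store[(bucket, key)] = value
```

When no real client is wrapped, the sketch falls back to an in-memory dict, which is the pure-mock mode; wrapping a real client gives the delegation mode described above.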

Any other comments?

The intended use is something like:

# Enable failure
with config_context("S3Mock.EnableFailures", 1):
    with config_context_string("S3Mock.FailureBucket", target_bucket_names[0]):
        time.sleep(5)
    with config_context_string("S3Mock.FailureBucket", target_bucket_names[1]):
        time.sleep(5)
    # In these 5 seconds, all of the targets should have failed
    time.sleep(5)
    with config_context_string("S3Mock.FailureBucket", target_bucket_names[0]):
        time.sleep(5)

# continue as usual

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

super().__init__(native_config)
self.http_protocol = "https" if use_ssl else "http"
self.ssl_test_support = ssl_test_support
self.bucket_versioning = bucket_versioning
self.default_prefix = default_prefix
self.use_raw_prefix = use_raw_prefix
self.use_mock_storage_for_testing = use_mock_storage_for_testing
Collaborator Author

This is the only real change in this file.
The rest is just black reformatting.

Collaborator

@IvoDD IvoDD left a comment

The only important thing is to allow triggering the mock storage forwarding via a flag (or even a configs map). This is needed to make the CI work.

It would also be nice to add some Python tests for the new functionality, to actually test that the whole forwarding mechanism works and can trigger failures as expected (both with the new bucket-level errors and with the old symbol-level errors).


if (conf.use_mock_storage_for_testing()) {
    ARCTICDB_RUNTIME_DEBUG(log::storage(), "Using Mock S3 storage");
    s3_client_ = std::make_unique<MockS3Client>(std::move(s3_client_));
}
Collaborator

We still need to leave the option open to create a regular MockS3Client() i.e. without forwarding to a real client.

This is currently breaking a bunch of C++ tests (which don't have the option to create a moto server).

What do you think about adding another flag use_mock_storage_with_forwarding?
Sorry, after our offline discussion I thought I could do it in a later PR, but I think we'll need to do it here to make the C++ tests pass.

Collaborator

A similar change will be needed for the nfs_storage.

Collaborator

I think the mock client and this wrapped real client are actually separate things. The real client case always just checks for a failure trigger then delegates. I think this would be cleaner if we have a new class for the "real client with failure simulation" rather than putting it in to this mock. I think we can tell this is the case because with the real client, this ceases to be a mock storage at all so all the use_mock_storage_for_testing checks are misleading.

Collaborator Author

Yeah I guess you are right, it is better if this is a wrapper over the client rather than an extension of the mock.

@@ -58,31 +58,92 @@ const auto not_found_error = create_error(Aws::S3::S3Errors::RESOURCE_NOT_FOUND)
const auto precondition_failed_error = create_error(Aws::S3::S3Errors::UNKNOWN, "PreconditionFailed", "Precondition failed", false, Aws::Http::HttpResponseCode::PRECONDITION_FAILED);
const auto not_implemented_error = create_error(Aws::S3::S3Errors::UNKNOWN, "NotImplemented", "A header you provided implies functionality that is not implemented", false);

std::optional<Aws::S3::S3Error> MockS3Client::check_failure(const std::string& bucket_name) const {
Collaborator

Since we discussed offline that we'd like to preserve the different modes of failure simulation how about we name this a bit more specifically. Something like check_bucket_level_failure or check_failure_from_configs_map?

And respectively rename the has_failure_trigger to e.g. check_symbol_level_failure or check_failure_from_symbol?

Collaborator

I preferred the has_failure_trigger naming; I don't really understand what check_failure means.

}

if (real_client_)
    return real_client_->list_objects(name_prefix, bucket_name, continuation_token);
Collaborator

Most functions now have the following order of checks:

  1. Check for a bucket level config based failure
  2. Check for a symbol level symbol_name based failure
  3. Perform operation (either on real or on fake storage)

It is somewhat surprising that list_objects is the only function which does not do any symbol-level checks in case we're using real storage.

I'll be rewriting the whole logic inside list_objects in a follow up PR so I will address this there and make it work similar to the other s3 operations. (so nothing to do here, just noting for second reviewer)
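The three-step order of checks described above can be sketched as follows. This is an illustrative Python stand-in, not the PR's C++ code; the "#Failure_<Op>" marker convention and the function signatures are hypothetical.

```python
def has_failure_trigger(symbol_name, operation):
    # Symbol-level failures are encoded in the symbol name itself.
    return f"#Failure_{operation}" in symbol_name


def delete_object(symbol_name, bucket, config, perform_delete):
    # 1. Bucket-level, config-driven failure check
    if config.get("FailureBucket") == bucket:
        return "simulated bucket failure"
    # 2. Symbol-level, name-driven failure check
    if has_failure_trigger(symbol_name, "Delete"):
        return "simulated symbol failure"
    # 3. Perform the operation (on real or on fake storage)
    return perform_delete(symbol_name)
```

The point of the review comment is that list_objects currently skips step 2 when a real client is wrapped, unlike the other operations.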

@@ -23,6 +25,7 @@
#include <arcticdb/util/exponential_backoff.hpp>
#include <arcticdb/util/configs_map.hpp>
#include <arcticdb/util/composite.hpp>
#include <chrono>
Collaborator

Nit: Do we need the chrono and memory includes?

if (maybe_error.has_value()) {
    return {*maybe_error};
}

for (auto& s3_object_name : s3_object_names) {
    auto maybe_error = has_failure_trigger(s3_object_name, StorageOperation::DELETE);
Collaborator

Windows compilation failure here

@@ -121,14 +129,15 @@ def create_test_cfg(self, lib_name: str) -> EnvironmentConfigsMap:
is_https=self.factory.endpoint.startswith("https://"),
region=self.factory.region,
use_mock_storage_for_testing=self.factory.use_mock_storage_for_testing,
use_internal_client_wrapper_for_testing=self.factory.use_internal_client_wrapper_for_testing,
Collaborator Author

This is the other real change.

},
[](py::tuple t) {
util::check(t.size() == 2, "Invalid S3Settings pickle objects");
s3::S3Settings settings(t[static_cast<uint32_t>(S3SettingsPickleOrder::AWS_AUTH)].cast<s3::AWSAuthMethod>(), t[static_cast<uint32_t>(S3SettingsPickleOrder::AWS_PROFILE)].cast<std::string>());
util::check(t.size() == 3, "Invalid S3Settings pickle objects");
Collaborator

Sorry, I don't have much context on these S3Settings so this might be a dumb question. Should the pickling and unpickling be backwards compatible? Where are these pickled objects stored and will we ever try to unpickle the old version of the S3Settings?

Collaborator

I think this is safe, as the pickled S3Settings is not persisted. The support for pickling should be for forking (more specifically, Spark) only.

@@ -134,6 +135,11 @@ void S3Storage::create_s3_client(const S3Settings &conf, const Aws::Auth::AWSCre
ARCTICDB_RUNTIME_DEBUG(log::storage(), "Using provided auth credentials");
s3_client_ = std::make_unique<S3ClientImpl>(creds, get_s3_config(conf), Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Never, conf.use_virtual_addressing());
}

if (conf.use_internal_client_wrapper_for_testing()){
Collaborator

I don't see a similar change for the nfs_backed_storage. Should we add one?

Collaborator

Also I like how this allows to use a S3ClientWrapper around a MockS3Client to be able to simulate both types of failures. Nice :)

Collaborator Author

We don't have S3Settings for nfs_backed_storage and at the moment it is not needed.
I think that we should add something like it for nfs and all other storages, but I don't think that is in the scope of this change.

Collaborator

ah makes sense. Didn't realize S3Settings does not apply to nfs.

// A wrapper around the actual S3 client which can simulate failures based on the configuration.
// The S3ClientWrapper delegates to the real client by default, but can intercept operations
// to simulate failures or track operations for testing purposes.
class S3ClientWrapper : public S3ClientInterface {
Collaborator

S3ClientWrapper seems kind of a generic name which doesn't signify it is used for simulating failures. What do you think about something like S3ClientWithFailureSimulation?

Collaborator Author

I prefer S3ClientWrapper as it better expresses the intention of the class.
The fact that we only use it for failure simulation at the moment seems more like an implementation detail of the current state.

Collaborator

Makes sense, sorry for bikeshedding but what about S3ClientTestWrapper to signify it is for tests and not something which should be used outside of tests in any way?

lib.write("s", data=create_df())

with config_context("S3ClientWrapper.EnableFailures", 1):
    with pytest.raises(NoDataFoundException, match="Unexpected network error: S3Error#99"):
Collaborator

Why is this raising a NoDataFoundException?

Collaborator Author

The read is catching the exception internally and rethrowing it as NoDataFound.

with pytest.raises(NoDataFoundException, match="Unexpected network error: S3Error#99"):
    lib.read("s")
with config_context_string("S3ClientWrapper.FailureBucket", test_bucket.bucket):
    with pytest.raises(StorageException, match="Unexpected network error: S3Error#99"):
Collaborator

Nit: Also maybe it would be nice to add an assertion that the write works if FailureBucket is set to something else, e.g. non-existent-bucket?
Mostly because nothing is testing the string-munging code that parses the FailureBucket config.

@G-D-Petrov G-D-Petrov merged commit f6b8933 into master Feb 7, 2025
152 checks passed
@G-D-Petrov G-D-Petrov deleted the enhance_s3_mock branch February 7, 2025 15:30