
Use Scylla API for restore #4192

Draft
wants to merge 25 commits into
base: ml/scylla-api

Conversation

Michal-Leszczynski
Collaborator

WIP

Michal-Leszczynski and others added 7 commits January 2, 2025 13:03
* fix(backup_test): add missing 'Integration' suffix to tests

Some tests were missing the Integration suffix in their names.
This resulted in not including them in the 'make pkg-integration-test'
command used when running tests on gh actions.

* refactor(testutils): export CheckAnyConstraint

It is also useful for backup svc tests.

* fix(backup_test): skip TestBackupSkipSchemaIntegration for older Scylla versions
This adds a /cloud/metadata API call to the agent, which should return cloud
instance metadata, such as instance_type and cloud_provider.

Refs: #4130
This log does not contain any useful information, but it clogs
the log files since checking for closest DC is done during
every fresh scyllaclient creation, which is done by the
config cache service every minute.
…#4185)

It turns out that Scylla 2024.2 does not expose this API.
For now, it's not known which enterprise release will contain it,
so we need to fall back to the CQL workaround.

Fixes #4183
For Scylla to access object storage, it needs to be configured
in the 'object_storage.yaml' config file.
A separate column for Scylla task ID is needed because:
- it has a different type from agent job ID
- it makes it clear which API was used
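
To illustrate the two reasons above, here is a minimal sketch of a progress row that keeps both IDs side by side. The struct, its field names, and the concrete types (`int64` for the agent job ID, `string` for the Scylla task ID) are assumptions made for the example, not the schema used in this PR.

```go
package main

// runProgress is a hypothetical progress row.
// Field names and types are assumptions made for this sketch.
type runProgress struct {
	Host         string
	AgentJobID   int64  // set when the transfer went through the agent's Rclone API
	ScyllaTaskID string // set when the transfer went through the Scylla API
}

// usedScyllaAPI reports whether this row was produced by a Scylla API call,
// judged only by which ID column is populated.
func (p runProgress) usedScyllaAPI() bool {
	return p.ScyllaTaskID != ""
}
```

With separate columns, neither ID needs to be converted or overloaded, and the populated column alone tells which API was used.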
Those methods consist of both:
- direct Scylla backup API call
- helper Scylla Task Manager API calls
@Michal-Leszczynski Michal-Leszczynski force-pushed the ml/restore-scylla-api branch 2 times, most recently from 0792ccb to 3529e52 Compare January 8, 2025 14:56
When working with Rclone, SM specifies just the provider name,
and Rclone (with agent config) resolves it internally to the correct endpoint.
This meant that the user didn't need to specify the exact endpoint when running SM backup/restore tasks.

When working with Scylla, SM needs to specify the resolved host name on its own.
This should be the same name as specified in 'object_storage.yaml'
(See https://github.com/scylladb/scylladb/blob/92db2eca0b8ab0a4fa2571666a7fe2d2b07c697b/docs/dev/object_storage.md?plain=1#L29-L39).

In order to maximize compatibility and UX, we still want it to be possible
to specify just the provider name when running backup/restore.
In such a case, SM sends the provider name as the "endpoint" query param,
which is resolved by the agent to the proper host name when forwarding the request to Scylla.
Any other "endpoint" value is forwarded unchanged.

Note that resolving "endpoint" query param in the proxy is just for the UX,
so it might not work correctly in all the cases.
In order to ensure correctness, "endpoint" should be specified directly by SM user
so that no resolving is needed.
Scylla backup API can be used when:
- node exposes Scylla backup API
- S3 is the provider in use
- backup won't create versioned files
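
The three conditions above can be read as a single predicate. The sketch below is illustrative only: `backupTarget`, its fields, and `useScyllaBackupAPI` are hypothetical names, and how each input is actually determined is outside its scope.

```go
package main

// backupTarget gathers the inputs of the decision.
// Hypothetical type: names are assumptions made for this sketch.
type backupTarget struct {
	NodeExposesBackupAPI  bool   // node exposes Scylla backup API
	Provider              string // storage provider name, e.g. "s3"
	CreatesVersionedFiles bool   // backup would create versioned files
}

// useScyllaBackupAPI returns true only when all three conditions hold.
// Otherwise the caller is expected to use the Rclone API.
func useScyllaBackupAPI(t backupTarget) bool {
	return t.NodeExposesBackupAPI &&
		t.Provider == "s3" &&
		!t.CreatesVersionedFiles
}
```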
This commit adds code for using Scylla backup API.
Luckily for us, handling pause/resume and progress
is analogous to the Rclone API handling.

Fixes #4143
Fixes #4138
Fixes #4141
Some tests used interceptor for given paths
in order to wait/block/check some API calls.
Those interceptors were updated to also look
for Scylla backup API paths.
Using Scylla backup API does not result in changes
to Rclone transfers, rate limiting or cpu pinning,
so it shouldn't be checked as a part of the restore test.
This is a simple test for checking whether the correct API
is used during the backup.
When a new restore task is executed, it should have its
own task ID and run ID, but the cluster ID should remain
the same. This commit fixes an autofill typo from the past.
It was discovered because it affected the config cache service.
Otherwise, we panic inside updateSingle method.
This commit also contains a small test for
testing this behavior.
This is required for testing Scylla restore API
as it does not work with integer based SSTables.
A separate column for Scylla task ID is needed because:
- it has a different type from agent job ID
- it makes it clear which API was used
This commit extends SSTable structure with
its TOC component, which is needed when using
Scylla restore API.
Moreover, it introduces batch types, which are
also needed for deciding whether a given batch
can be restored with the Scylla restore API or
the Rclone API.
It also makes sure that all SSTables within
the same batch belong to the same batch type.
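
A minimal sketch of the grouping idea follows. It assumes that the batch type is decided by the SSTable ID format alone (integer based IDs go through Rclone, all others are eligible for the Scylla restore API). That criterion, along with every type and function name here, is an assumption made for illustration, and the real batch types may take more into account.

```go
package main

import "strconv"

// batchType says which API a batch can be restored with (hypothetical).
type batchType int

const (
	batchRclone batchType = iota // must be restored with the Rclone API
	batchScylla                  // eligible for the Scylla restore API
)

// sstable is a simplified SSTable description for this sketch.
type sstable struct {
	ID  string // integer based ("42") or UUID-like
	TOC string // name of the TOC component
}

// typeOf classifies a single SSTable by its ID format (assumed criterion).
func typeOf(s sstable) batchType {
	if _, err := strconv.ParseUint(s.ID, 10, 64); err == nil {
		return batchRclone
	}
	return batchScylla
}

// groupByBatchType splits SSTables so that every resulting group holds
// SSTables of exactly one batch type, preserving input order within a group.
func groupByBatchType(ssts []sstable) map[batchType][]sstable {
	out := make(map[batchType][]sstable)
	for _, s := range ssts {
		t := typeOf(s)
		out[t] = append(out[t], s)
	}
	return out
}
```

Batches built from a single group can then be routed to one API without inspecting each SSTable again.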
This commit adds code for using Scylla restore API.
Luckily for us, handling pause/resume
is analogous to the Rclone API handling.

Fixes #4144
Fixes #4137
2 participants