[Feature Request] Auto-recovery of cluster after hardware failure w/ remote store #11921

Bukhtawar · 2024-01-18T11:28:25Z

Is your feature request related to a problem? Please describe

Today, automatic restore cannot always recover the cluster because of cases like isolated primaries (#3706). This is especially true for no-replica configurations, where we need to build a robust mechanism to ensure we don't have divergent writes.

Describe the solution you'd like

One such mechanism to support zero replicas is to use an empty replica that hosts no data, only the metadata of the shard, so that it doesn't add storage cost. This replica would perform continuous no-op replication on every indexing request and, on failure of the primary, could be promoted to primary after the data has been synced from S3. This simplifies problems with isolated writers and makes the replication protocol easier to reason about.
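
To make the idea concrete, here is a minimal in-memory sketch of the proposed flow. It is not OpenSearch code: `RemoteStore`, `MetadataReplica`, `Primary`, and `StalePrimaryError` are hypothetical names, the remote store is modelled as a plain dict, and the ordering (no-op to the replica first, then upload) is an assumption made for illustration. The replica tracks only the primary term and a checkpoint, and a promotion bumps the term so that a stale, isolated primary can no longer get its writes acknowledged.

```python
from dataclasses import dataclass, field


class StalePrimaryError(Exception):
    """Raised when a no-op arrives from a primary with an outdated term."""


@dataclass
class RemoteStore:
    """Stand-in for the remote store (e.g. S3): seq_no -> document."""
    segments: dict = field(default_factory=dict)


@dataclass
class MetadataReplica:
    """Hypothetical empty replica: shard metadata only, no documents."""
    primary_term: int = 1
    checkpoint: int = -1  # highest seq_no seen through no-op replication

    def replicate_noop(self, term: int, seq_no: int) -> None:
        # Reject writers from an older term; this fences isolated primaries.
        if term < self.primary_term:
            raise StalePrimaryError(f"term {term} < {self.primary_term}")
        self.primary_term = term
        self.checkpoint = max(self.checkpoint, seq_no)

    def promote(self, remote: RemoteStore) -> "Primary":
        # Bump the term first so the old primary can no longer get acks,
        # then hydrate the new primary's data from the remote store.
        self.primary_term += 1
        new_primary = Primary(term=self.primary_term, remote=remote, replica=None)
        new_primary.docs = dict(remote.segments)
        new_primary.next_seq_no = self.checkpoint + 1
        return new_primary


@dataclass
class Primary:
    term: int
    remote: RemoteStore
    replica: "MetadataReplica | None"
    docs: dict = field(default_factory=dict)
    next_seq_no: int = 0

    def index(self, doc: str) -> int:
        seq_no = self.next_seq_no
        # No-op replication: send only (term, seq_no), never the document.
        if self.replica is not None:
            self.replica.replicate_noop(self.term, seq_no)
        # Acknowledged by the replica: persist locally and to the remote store.
        self.docs[seq_no] = doc
        self.remote.segments[seq_no] = doc
        self.next_seq_no += 1
        return seq_no
```

In this sketch, once `promote` has run, a later `index` call on the old primary raises `StalePrimaryError` before anything reaches the remote store, which is the property that prevents divergent writes. A real implementation would also have to allocate a fresh empty replica for the new primary and handle concurrent requests, both of which are left out here.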

Related component

Storage:Durability

Describe alternatives you've considered

No response

Additional context

No response

Bukhtawar added the enhancement (Enhancement or improvement to existing feature or request) and untriaged labels Jan 18, 2024

github-actions bot added the Storage:Durability (Issues and PRs related to the durability framework) label Jan 18, 2024

Bukhtawar mentioned this issue Jan 18, 2024

Easy and fluent disaster recovery (3.0?) #11894

Open

Bukhtawar removed the untriaged label Jan 18, 2024

Bukhtawar added this to Storage Project Board Feb 15, 2024

github-project-automation bot moved this to 🆕 New in Storage Project Board Feb 15, 2024

Bukhtawar moved this from 🆕 New to Later (6 months plus) in Storage Project Board May 16, 2024
