Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#3886 3/4] Start saga for region replacement #5839

Merged
merged 5 commits into from
Jun 6, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
133 changes: 99 additions & 34 deletions nexus/db-queries/src/db/datastore/volume.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1672,6 +1672,50 @@ impl DataStore {
}
}

/// Check if a region is present in a Volume Construction Request
fn region_in_vcr(
vcr: &VolumeConstructionRequest,
region: &SocketAddrV6,
) -> anyhow::Result<bool> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this function can just return a bool instead of wrapping it in a Result. There are no errors possible here. That would simplify the callers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's unfortunately a let parsed_target: SocketAddrV6 = target.parse()?; that prevents this :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah damn, I totally missed that. All good.

let mut parts: VecDeque<&VolumeConstructionRequest> = VecDeque::new();
parts.push_back(vcr);

let mut region_found = false;

while let Some(vcr_part) = parts.pop_front() {
match vcr_part {
VolumeConstructionRequest::Volume { sub_volumes, .. } => {
for sub_volume in sub_volumes {
parts.push_back(sub_volume);
}

// Skip looking at read-only parent, this function only looks
// for R/W regions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now...

}

VolumeConstructionRequest::Url { .. } => {
// nothing required
}

VolumeConstructionRequest::Region { opts, .. } => {
for target in &opts.target {
let parsed_target: SocketAddrV6 = target.parse()?;
if parsed_target == *region {
region_found = true;
break;
}
}
}

VolumeConstructionRequest::File { .. } => {
// nothing required
}
}
}

Ok(region_found)
}

pub struct VolumeReplacementParams {
pub volume_id: Uuid,
pub region_id: Uuid,
Expand Down Expand Up @@ -1796,6 +1840,61 @@ impl DataStore {
.transaction(&conn, |conn| {
let err = err.clone();
async move {
// Grab the old volume first
let maybe_old_volume = {
volume_dsl::volume
.filter(volume_dsl::id.eq(existing.volume_id))
.select(Volume::as_select())
.first_async::<Volume>(&conn)
.await
.optional()
.map_err(|e| {
err.bail_retryable_or_else(e, |e| {
VolumeReplaceRegionError::Public(
public_error_from_diesel(
e,
ErrorHandler::Server,
)
)
})
})?
};

let old_volume = if let Some(old_volume) = maybe_old_volume {
old_volume
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you don't do the

let old_volume = match volume_dsl::volume.filter....
    Some(ov) => ov,
    _ => return Err..

pattern here like you do below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, though it does reduce the indentation to bail early here.

} else {
// Existing volume was deleted, so return an error. We
// can't perform the region replacement now!
return Err(err.bail(VolumeReplaceRegionError::TargetVolumeDeleted));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the only possible outcome other than Some(old_volume) from the volume_dsl::volume... a
deleted volume?
I guess the err.bail_retryable_or_else() will return error directly if the query fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, bail here will stop the transaction retry from looping. The volume being deleted is fatal for the region replacement request's progress so we bail here.

};

let old_vcr: VolumeConstructionRequest =
match serde_json::from_str(&old_volume.data()) {
Ok(vcr) => vcr,
Err(e) => {
return Err(err.bail(VolumeReplaceRegionError::SerdeError(e)));
},
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this section just making sure that the replacement did not already happen in another
saga and we don't try to redo work that was already done by someone else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section makes the whole function idempotent, yeah, but not for another saga - only one saga can win the lock that happens at the beginning of the start saga, so it's only guarding against when a node is rerun.

// Does it look like this replacement already happened?
let old_region_in_vcr = match region_in_vcr(&old_vcr, &existing.region_addr) {
Ok(v) => v,
Err(e) => {
return Err(err.bail(VolumeReplaceRegionError::RegionReplacementError(e)));
},
};
let new_region_in_vcr = match region_in_vcr(&old_vcr, &replacement.region_addr) {
Ok(v) => v,
Err(e) => {
return Err(err.bail(VolumeReplaceRegionError::RegionReplacementError(e)));
},
};

if !old_region_in_vcr && new_region_in_vcr {
// It does seem like the replacement happened
return Ok(());
}

use db::schema::region::dsl as region_dsl;
use db::schema::volume::dsl as volume_dsl;

Expand Down Expand Up @@ -1838,40 +1937,6 @@ impl DataStore {
// Update the existing volume's construction request to
// replace the existing region's SocketAddrV6 with the
// replacement region's
let maybe_old_volume = {
volume_dsl::volume
.filter(volume_dsl::id.eq(existing.volume_id))
.select(Volume::as_select())
.first_async::<Volume>(&conn)
.await
.optional()
.map_err(|e| {
err.bail_retryable_or_else(e, |e| {
VolumeReplaceRegionError::Public(
public_error_from_diesel(
e,
ErrorHandler::Server,
)
)
})
})?
};

let old_volume = if let Some(old_volume) = maybe_old_volume {
old_volume
} else {
// existing volume was deleted, so return an error, we
// can't perform the region replacement now!
return Err(err.bail(VolumeReplaceRegionError::TargetVolumeDeleted));
};

let old_vcr: VolumeConstructionRequest =
match serde_json::from_str(&old_volume.data()) {
Ok(vcr) => vcr,
Err(e) => {
return Err(err.bail(VolumeReplaceRegionError::SerdeError(e)));
},
};

// Copy the old volume's VCR, changing out the old region
// for the new.
Expand Down
Loading
Loading