Fetch multiple pieces during object reconstruction #3158

Merged: 9 commits, Nov 27, 2024
Conversation


@teor2345 teor2345 commented Oct 22, 2024

This PR changes ObjectFetcher to use the multi-piece fetching methods of PieceProvider to get small numbers of pieces.

As part of this change, ObjectFetcher can no longer be used as a trait object (`dyn`), because of the generic bounds on `get_pieces()`.

Code contributor checklist:

@teor2345 teor2345 added the enhancement New feature or request label Oct 22, 2024
@teor2345 teor2345 self-assigned this Oct 22, 2024
crates/subspace-gateway/src/piece_getter.rs (outdated, resolved)
shared/subspace-data-retrieval/src/piece_fetcher.rs (outdated, resolved)
shared/subspace-data-retrieval/src/segment_fetcher.rs (outdated, resolved)
// - the number of remaining piece indexes gets smaller, eventually finishing the fetcher, or
// - the number of pending pieces gets smaller, eventually triggering another batch.
// We also exit early if we have enough pieces to reconstruct a segment.
'fetcher: while !piece_indexes.is_empty()
Member:

I would split this into two stages:

  • first try to download all 128 source pieces, if successful reconstruction will be very cheap and fast
  • as a fallback when actual reconstruction is needed, schedule more pieces to download, including parity

Right now it is implemented in a way that is a bit wasteful in terms of bandwidth (triggers more downloads than needed) and in terms of CPU usage (has overwhelmingly high chance of not getting 128 source pieces for cheap segment reconstruction).

Contributor Author:

This is already what the code does?

The first batch contains 128 source piece indexes. If all pieces in that batch succeed, then there aren’t any parity piece requests.

But as soon as any pieces fail, a batch of parity piece indexes is created, which contains exactly the number of pieces needed to compensate for those failures. Then all batches are polled concurrently.

The code assumes that any pieces that are still pending will succeed, so there’s no wasted downloads.
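The batch-and-replace flow described above can be sketched as a synchronous model. This is a hedged illustration only, not the real async code: `try_fetch`, `fetch_with_parity_fallback`, `SOURCE_PIECES`, and the simulated failure pattern are all invented for the example.

```rust
/// Number of source pieces in a segment (matches the 128 mentioned above).
const SOURCE_PIECES: usize = 128;

/// Simulated fetch: pretend even indexes succeed and odd ones fail on the
/// first attempt only. A stand-in for the real network request.
fn try_fetch(index: u64, attempt: u32) -> Option<u64> {
    if index % 2 == 0 || attempt > 0 {
        Some(index)
    } else {
        None
    }
}

/// First request all source pieces; for every failure, schedule exactly one
/// replacement (parity) index, until enough pieces are recovered.
fn fetch_with_parity_fallback() -> Vec<u64> {
    let mut recovered = Vec::new();
    // First batch: the source piece indexes.
    let mut batch: Vec<u64> = (0..SOURCE_PIECES as u64).collect();
    // Parity indexes follow the source indexes in this toy numbering.
    let mut next_parity = SOURCE_PIECES as u64;
    let mut attempt = 0u32;

    while recovered.len() < SOURCE_PIECES {
        let mut failures = 0;
        for idx in batch.drain(..) {
            match try_fetch(idx, attempt) {
                Some(piece) => recovered.push(piece),
                None => failures += 1,
            }
        }
        // One parity index per failure: no wasted downloads.
        for _ in 0..failures {
            batch.push(next_parity);
            next_parity += 1;
        }
        attempt += 1;
    }
    recovered
}

fn main() {
    let pieces = fetch_with_parity_fallback();
    println!("recovered {} pieces", pieces.len());
}
```

In this model the second batch contains exactly as many parity indexes as there were failures, mirroring the "compensate for those failures" behaviour described above.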

Member:

I see, the way it is written wasn't as clear, but now I understand what it does. If you exhaust the stream of pieces you've generated every time, why do you keep already finished streams in piece_streams (the question above)? I don't see how multiple streams can be polled concurrently here.

Contributor Author:

> I see, the way it is written wasn't as clear, but now I understand what it does.

Good feedback, I might split it into multiple methods so it's clearer.

flatten_unordered() polls all the streams in the vector concurrently, and can return the pieces from any stream.

ready_chunks() waits until at least one piece result is ready, then returns all the ready pieces as a vector. But if any pieces are still pending, they are left in the stream.

Then if any of the piece results in that vector are None, we add a batch of parity pieces to replace them.

We can definitely drop streams once they're done, I'll make some notes about how to do that.
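The ready_chunks() behaviour described above can be modelled synchronously. This is a hedged sketch: `SimStream` and its tick counts are invented for illustration, and the real code uses the futures crate's async stream combinators rather than this hand-rolled structure.

```rust
use std::collections::VecDeque;

/// Toy stream: each entry is (ticks_until_ready, piece_result), where a
/// `None` result models a failed piece download.
struct SimStream {
    items: VecDeque<(u32, Option<u64>)>,
}

impl SimStream {
    /// Advance one tick and drain everything that became ready, mirroring
    /// how ready_chunks() returns all ready results as a Vec while leaving
    /// still-pending items in the stream.
    fn ready_chunk(&mut self) -> Vec<Option<u64>> {
        for item in self.items.iter_mut() {
            item.0 = item.0.saturating_sub(1);
        }
        let mut ready = Vec::new();
        self.items.retain(|&(ticks, piece)| {
            if ticks == 0 {
                ready.push(piece);
                false // drop ready items from the stream
            } else {
                true // pending items stay in the stream
            }
        });
        ready
    }
}

fn main() {
    let mut stream = SimStream {
        items: VecDeque::from(vec![(1, Some(7)), (2, None), (1, Some(9))]),
    };
    let first = stream.ready_chunk(); // two pieces ready, one still pending
    let second = stream.ready_chunk(); // the failed piece arrives as None
    println!("first chunk: {:?}, second chunk: {:?}", first, second);
}
```

In the real fetcher, a `None` appearing in a chunk like `second` is what triggers scheduling a replacement batch of parity pieces.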

Member:

I actually already wrote an efficient piece retrieval for reconstruction purposes in https://github.com/autonomys/subspace/blob/362c1f5dce076b6c13452977a533f9091d515bb1/crates/subspace-farmer-components/src/segment_reconstruction.rs

It might be a little simpler, and it always tries to download pieces in batches, avoiding the batches of 1 that you will most likely get the majority of the time after the initial request, due to the way ready_chunks works. It is overall less eager and tries to avoid a lot of heavy lifting.

It can be extracted into a utility somewhere that returns a segment's worth of pieces, and then reused in the farmer code, here, and in DSN sync on the node, where exactly the same piece retrieval logic will be necessary; only what we do with those pieces differs slightly.

Contributor Author:

I think I'll do this in another PR

Base automatically changed from gateway-args to main October 22, 2024 12:57
@teor2345 teor2345 left a comment

I’ll turn some of my comments on this PR into code comments, so the structure of the code is clearer.

crates/subspace-gateway-rpc/src/lib.rs (outdated, resolved)
shared/subspace-data-retrieval/src/segment_fetcher.rs (outdated, resolved)

@teor2345 teor2345 enabled auto-merge October 22, 2024 20:25
@shamil-gadelshin shamil-gadelshin left a comment (Member)

Makes sense overall, just a question related to a special case.

shared/subspace-data-retrieval/src/segment_fetcher.rs (outdated, resolved)
@teor2345 teor2345 left a comment

Thank you both for the reviews! I'll work on a refactor to make this code clearer.

shared/subspace-data-retrieval/src/segment_fetcher.rs (outdated, resolved)
shared/subspace-data-retrieval/src/segment_fetcher.rs (outdated, resolved)
shared/subspace-data-retrieval/src/segment_fetcher.rs (outdated, resolved)

shared/subspace-data-retrieval/src/segment_fetcher.rs (outdated, resolved)
@teor2345 teor2345 marked this pull request as draft November 21, 2024 07:23
auto-merge was automatically disabled November 21, 2024 07:23

Pull request was converted to draft

@teor2345 teor2345 left a comment

I've removed the segment changes, and I'll do that refactor in another PR. All other issues should be fixed now.


@teor2345 teor2345 marked this pull request as ready for review November 25, 2024 05:13
@teor2345 teor2345 enabled auto-merge November 25, 2024 05:13
@teor2345 teor2345 dismissed nazar-pc’s stale review November 25, 2024 20:29

Changes made (or deferred to another PR)

Comment on lines 70 to 71
.get_piece_from_archival_storage(piece_index, MAX_RANDOM_WALK_ROUNDS)
.await)
Contributor Author:

@nazar-pc is this (and the similar code below) still needed after your PR #3259?

Member:

Yes, random walking is a cold-storage fallback for cases when a piece is not cached. In that case, we wander around the network hoping to stumble upon someone storing the piece in their plot rather than in a cache.

@nazar-pc nazar-pc left a comment (Member)

Some nits, nothing critical

@@ -102,7 +102,7 @@ pub async fn run(run_options: RunOptions) -> anyhow::Result<()> {
Semaphore::new(out_connections as usize * PIECE_PROVIDER_MULTIPLIER),
);
let piece_getter = DsnPieceGetter::new(piece_provider);
let object_fetcher = ObjectFetcher::new(piece_getter, erasure_coding, Some(max_size));
let object_fetcher = ObjectFetcher::new(piece_getter.into(), erasure_coding, Some(max_size));
Member:

This puzzled me for a moment; this is the first time I've seen From used instead of Arc::new() 🤔
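For reference, the `.into()` call works because the standard library provides a blanket `impl<T> From<T> for Arc<T>`, so the two forms below are equivalent. A minimal sketch:

```rust
use std::sync::Arc;

fn main() {
    // Explicit construction.
    let a: Arc<u32> = Arc::new(5);
    // Same result via the blanket From<T> impl, as used in the diff above.
    let b: Arc<u32> = 5u32.into();
    assert_eq!(*a, *b);
    println!("both are {}", a);
}
```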


Comment on lines +51 to +52
let mut pieces = Vec::new();
pieces.resize(piece_indexes.len(), Piece::default());
Member:

This will work, but it is a bit unfortunate that we'll be allocating so much upfront (each piece) and then simply throwing those allocations away.

I'd probably push all results into HashMap<PieceIndex, Piece> and then extract them at the end by piece index (pieces have copy-on-write behavior, so it'll be efficient, we don't need to remove individual pieces, can just drop the whole hashmap at the end at once).
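The reviewer's suggestion can be sketched as follows. This is a hedged illustration under assumptions: `Piece` here is a small stand-in type (the real `Piece` is larger and copy-on-write, which is what makes the final clones cheap), and `assemble` is an invented helper name.

```rust
use std::collections::HashMap;

type PieceIndex = u64;

/// Stand-in for the real Piece type.
#[derive(Clone, Debug, PartialEq)]
struct Piece(Vec<u8>);

/// Collect downloads into a HashMap as they arrive, then extract them in
/// request order at the end, instead of pre-filling a Vec with default
/// pieces and throwing those allocations away.
fn assemble(
    piece_indexes: &[PieceIndex],
    downloaded: HashMap<PieceIndex, Piece>,
) -> Option<Vec<Piece>> {
    piece_indexes
        .iter()
        .map(|index| downloaded.get(index).cloned())
        // Option<Vec<_>>: yields None if any requested piece is missing.
        .collect()
    // `downloaded` is dropped whole here, as the comment above suggests.
}

fn main() {
    let indexes = vec![3u64, 1, 2];
    let mut downloaded = HashMap::new();
    downloaded.insert(1, Piece(vec![1]));
    downloaded.insert(2, Piece(vec![2]));
    downloaded.insert(3, Piece(vec![3]));
    let ordered = assemble(&indexes, downloaded).expect("all pieces present");
    println!("assembled {} pieces in request order", ordered.len());
}
```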

@teor2345 teor2345 added this pull request to the merge queue Nov 27, 2024
Merged via the queue into main with commit 239d791 Nov 27, 2024
8 checks passed
@teor2345 teor2345 deleted the obj-piece-list branch November 27, 2024 07:02