
feat: wiring for bandwidth scheduler #12234

Merged
merged 46 commits into master (Oct 24, 2024)
Conversation

@jancionear (Contributor) commented Oct 16, 2024

Add the wiring needed for the bandwidth scheduler algorithm.

Changes:

  • Add a new ProtocolFeature - BandwidthScheduler; its protocol version is set to nightly.
  • Add a struct that holds the bandwidth requests generated by the shards (see the sketch after this list).
  • Propagate the bandwidth requests through the blockchain: put the generated bandwidth requests in the chunk headers and pass the previous bandwidth requests to the runtime.
  • Add a struct that represents the bandwidth scheduler state; it's stored in the trie and modified on every scheduler invocation.
  • Add a mock implementation of the bandwidth scheduler: it takes the previous bandwidth requests and the state and mocks the scheduler algorithm. It activates the request propagation logic and breaks some tests.
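For orientation, here is a rough sketch of the new request types. The field layout is illustrative only; the real definitions live in core/primitives/src/bandwidth_scheduler.rs and will differ:

```rust
// Illustrative only: simplified stand-ins for the structs added in
// core/primitives/src/bandwidth_scheduler.rs. At this point BandwidthRequest
// doesn't carry the requested values yet; those are added later.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct BandwidthRequest {
    /// Shard that the requesting shard wants to send receipts to.
    pub to_shard: u16,
}

/// All bandwidth requests generated by one shard during chunk application.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct BandwidthRequests {
    pub requests: Vec<BandwidthRequest>,
}
```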

Propagation of bandwidth requests

The flow of bandwidth requests looks as follows:

  • A chunk is applied and generates bandwidth requests. They are put in ApplyResult and ApplyChunkResult
  • The requests are taken from the apply result and put in ChunkExtra. ChunkExtra is persisted in the database
  • During chunk production, Client fetches ChunkExtra of the previous chunk and puts the bandwidth requests in chunk header
  • The produced chunks are included in the block
  • The new chunks are applied, their ApplyState contains bandwidth requests taken from all the chunk headers in the block that contains the applied chunks.
  • During the application, the bandwidth scheduler looks at the requests created at the previous height and grants bandwidth
  • Receipts are sent out
  • Then the chunk generates new bandwidth requests
  • etc

The flow is very similar to the one for congestion info.
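To make the one-height delay concrete, here is a toy model in plain Rust, with simple values standing in for the real ApplyResult/ChunkExtra/header types:

```rust
// Toy model of the request flow described above. The key point is the
// one-height delay: the scheduler at height h consumes the requests that
// were generated at height h - 1.
fn main() {
    // Requests "persisted in ChunkExtra" after applying the previous height.
    let mut pending_requests: Vec<u64> = Vec::new();
    for height in 1u64..=4 {
        // Chunk production at `height` copies last height's requests into the
        // chunk header; chunk application hands them to the scheduler.
        println!("height {height}: scheduler sees {pending_requests:?}");
        // Applying the chunk at `height` generates new requests, which are
        // stored and picked up at the next height.
        pending_requests = vec![height * 10];
    }
}
```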

Scheduler state

The bandwidth scheduler needs to keep some persistent state. In the future it'll track something like how much bandwidth each shard was granted recently, which will be used to maintain fairness. For now it's just mock data.
Scheduler state should always be the same on all shards: all shards start with the same scheduler state, apply the scheduler at the same heights with the same inputs, and always end up with the same scheduler state.
This means that the bandwidth scheduler also needs to be run for missing chunks. Luckily that can be easily achieved thanks to the existing apply_old_chunk infrastructure (all missing chunks are applied, which counts as "implicit state transitions").
The state_root will now change at every height, even when there are no receipts to be processed. This breaks some tests which assumed that the state root wouldn't change.
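A minimal sketch of that determinism requirement, with a made-up state type and a hash-based mock update (the real mock lives in runtime/runtime/src/bandwidth_scheduler/mod.rs and may differ):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Made-up stand-in for the persistent scheduler state stored in the trie.
#[derive(Clone, PartialEq, Debug, Hash)]
struct MockSchedulerState {
    mock_data: u64,
}

// Deterministic mock update: the new state depends only on the previous
// state and the bandwidth requests from the previous height, so every shard
// that runs it with the same inputs ends up with the same state.
fn run_mock_scheduler(state: &MockSchedulerState, requests: &[u64]) -> MockSchedulerState {
    let mut hasher = DefaultHasher::new();
    state.hash(&mut hasher);
    requests.hash(&mut hasher);
    MockSchedulerState { mock_data: hasher.finish() }
}

fn main() {
    let start = MockSchedulerState { mock_data: 0 };
    let requests = vec![1u64, 2, 3];
    // "Run on every shard" independently; all shards must end up agreeing.
    let per_shard: Vec<_> = (0..4).map(|_| run_mock_scheduler(&start, &requests)).collect();
    assert!(per_shard.windows(2).all(|w| w[0] == w[1]));
    println!("all shards agree: {:?}", per_shard[0]);
}
```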

The pull request is meant to be reviewed commit-by-commit; I tried to make the commit history nice.

Add the structs which will be used to represent bandwidth requests
generated by a shard. For now the BandwidthRequest doesn't have the
requested values; they will be added later.

Chunk headers will contain bandwidth requests generated during the
previous chunk application. We will collect bandwidth requests from all
the shards and use them in the bandwidth scheduler during chunk
application.

Bandwidth requests will be generated during chunk application and then
they'll be available in the ApplyResult. The result of chunk application
should keep the generated bandwidth requests.

ChunkExtra stores the results of chunk application in a persistent way.
Let's put the generated bandwidth requests there and then fetch them
when producing the next chunk.

Collect the bandwidth requests generated by all shards at the previous
height and expose them to the runtime. The runtime needs the requests to
run the bandwidth scheduler.

Add a struct that keeps the persistent state used by the bandwidth
scheduler.

Add a mock implementation of the bandwidth scheduler algorithm. The
bandwidth scheduler takes the current state and previous bandwidth
requests and generates bandwidth grants. The mock implementation takes
the inputs and generates deterministic state changes based on them, but
it doesn't generate the bandwidth grants yet. The mock implementation is
enough to activate the logic that propagates bandwidth requests
throughout the blockchain and break some tests.

This test assumed that the state root doesn't change when there are no
receipts, but this is no longer true. The bandwidth scheduler modifies
the state at every height, so now the state root changes every time.

state_viewer::apply_chunk has the ability to apply a chunk
when the block that contains the chunk isn't available.

Initially I passed empty bandwidth requests in the ApplyState,
as usually they're taken from the chunk headers in the block
that contains the applied chunk, and this block isn't available here.

But that breaks test_apply_chunk - a test which applies a chunk
in a normal way and compares the result with a chunk that was
applied without providing the block. It expects the state roots
to be the same, but that's not the case because the bandwidth
requests are different and bandwidth scheduler produces different state.

To deal with this we can try to fetch the original bandwidth requests
from the chunk extra of the previous chunks. It's an opportunistic
reconstruction - if the chunk extra is available it adds the requests
to the apply_state, if not it leaves them empty.
This is enough to fix the state root mismatch.
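Roughly, the opportunistic reconstruction could look like the following sketch; the helper and types are hypothetical, not the actual code in tools/state-viewer/src/apply_chunk.rs:

```rust
// Hypothetical sketch of the opportunistic reconstruction: if the previous
// chunk's ChunkExtra can be loaded, copy its bandwidth requests into the
// apply state; otherwise leave them empty and accept that the resulting
// state root may differ.
#[derive(Clone, Debug, Default)]
struct BandwidthRequestsLike(Vec<u64>);

struct ChunkExtraLike {
    bandwidth_requests: Option<BandwidthRequestsLike>,
}

fn reconstruct_requests(
    prev_chunk_extras: &[Option<ChunkExtraLike>], // indexed by shard, None if unavailable
) -> Vec<BandwidthRequestsLike> {
    prev_chunk_extras
        .iter()
        .map(|maybe_extra| {
            maybe_extra
                .as_ref()
                .and_then(|extra| extra.bandwidth_requests.clone())
                .unwrap_or_default()
        })
        .collect()
}
```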
This test creates a situation where the last few chunks
in an epoch are missing. It performs state sync, then
takes the state root of the first missing chunk on one node
and expects the state roots of all the missing chunks on the
other node to be the same as that first state root.
This breaks because bandwidth scheduler changes the state
at every height - even for missing chunks - so the state
root for later missing chunks is not the same as the state
root of the first missing chunk.
Fix the problem by comparing state roots of missing chunks
at the same heights.
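As an illustration only (a made-up helper, not the real test code), the fixed comparison keys the state roots by height on each node:

```rust
use std::collections::BTreeMap;

// Made-up helper illustrating the fixed comparison: instead of expecting all
// missing-chunk state roots to equal the first one, compare the state root
// recorded for each height on node A with the one at the same height on node B.
fn assert_same_roots_per_height(
    node_a: &BTreeMap<u64, [u8; 32]>,
    node_b: &BTreeMap<u64, [u8; 32]>,
) {
    for (height, root_a) in node_a {
        let root_b = node_b.get(height).expect("height missing on node B");
        assert_eq!(root_a, root_b, "state root mismatch at height {height}");
    }
}
```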
This test performs state sync and then does a function call on the synced node
to test that the sync worked.
The function call at the end of the test started failing with `MissingTrieValue`.
I suspect that the function call is done with the wrong state root - it worked previously,
when all the state roots were the same, as the chunks don't have any transactions,
but broke when bandwidth scheduler started changing the state at every height.
The `MissingTrieValue` error stops occurring when the state root is taken from
the previous block.
My understanding of state sync isn't very good, but I think this theory makes sense.

Add an extra check to ensure that the scheduler state stays the same on
all shards.
@jancionear jancionear requested a review from wacban October 16, 2024 13:49
@jancionear jancionear requested a review from a team as a code owner October 16, 2024 13:49
codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 84.94624% with 70 lines in your changes missing coverage. Please review.

Project coverage is 71.56%. Comparing base (cd319ac) to head (ed32376).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
core/primitives/src/views.rs 73.50% 31 Missing ⚠️
chain/chain/src/validate.rs 54.83% 13 Missing and 1 partial ⚠️
runtime/runtime/src/bandwidth_scheduler/mod.rs 90.00% 5 Missing and 2 partials ⚠️
core/primitives/src/bandwidth_scheduler.rs 84.00% 4 Missing ⚠️
chain/rosetta-rpc/src/adapters/transactions.rs 0.00% 3 Missing ⚠️
tools/state-viewer/src/util.rs 40.00% 3 Missing ⚠️
core/primitives/src/types.rs 93.33% 2 Missing ⚠️
tools/state-viewer/src/apply_chunk.rs 86.66% 1 Missing and 1 partial ⚠️
chain/chain-primitives/src/error.rs 0.00% 1 Missing ⚠️
..._validation/chunk_validator/orphan_witness_pool.rs 50.00% 1 Missing ⚠️
... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12234      +/-   ##
==========================================
+ Coverage   71.55%   71.56%   +0.01%     
==========================================
  Files         836      838       +2     
  Lines      168170   168683     +513     
  Branches   168170   168683     +513     
==========================================
+ Hits       120335   120724     +389     
- Misses      42585    42705     +120     
- Partials     5250     5254       +4     
Flag Coverage Δ
backward-compatibility 0.16% <0.00%> (-0.01%) ⬇️
db-migration 0.16% <0.00%> (-0.01%) ⬇️
genesis-check 1.23% <0.00%> (-0.01%) ⬇️
integration-tests 38.93% <76.55%> (+0.09%) ⬆️
linux 71.10% <48.17%> (-0.10%) ⬇️
linux-nightly 71.13% <78.70%> (+<0.01%) ⬆️
macos 54.18% <45.29%> (-0.13%) ⬇️
pytests 1.55% <0.00%> (-0.01%) ⬇️
sanity-checks 1.35% <0.00%> (-0.01%) ⬇️
unittests 65.34% <82.97%> (+0.02%) ⬆️
upgradability 0.21% <0.00%> (-0.01%) ⬇️


… is enabled

All chunks produced in the protocol version where bandwidth scheduler is enabled
should use ShardChunkHeaderInner::V4; I missed this in the previous commit.

The test iterates over all items in the trie and creates a StateRecord for each of them.
The problem is that some types of trie entries don't have a corresponding StateRecord variant.
For example outgoing buffers, yield resume data, and bandwidth scheduler state can't be made
into a StateRecord.
The test started failing because it tries to unwrap the result of `StateRecord::from_raw_key_value`
for a trie entry that represents BandwidthSchedulerState. The function returns None and the
unwrap panics.
Fix the problem by removing the unwrap and instead looking for a `Some` value. The test only
looks for one type of StateRecord, so it doesn't matter if it skips over the scheduler state.
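A rough illustration of that fix, with hypothetical types standing in for the trie iterator and StateRecord (parse_record plays the role of StateRecord::from_raw_key_value):

```rust
// Hypothetical sketch of the fix: instead of unwrapping the parse result for
// every trie entry, skip entries (like BandwidthSchedulerState) that have no
// StateRecord variant.
#[derive(Debug)]
enum StateRecordLike {
    Account(String),
    // ...other variants omitted
}

fn parse_record(key: &[u8], _value: &[u8]) -> Option<StateRecordLike> {
    // Pretend only keys starting with 0x00 correspond to a StateRecord.
    if key.first() == Some(&0) {
        Some(StateRecordLike::Account(format!("{key:?}")))
    } else {
        None // e.g. bandwidth scheduler state, outgoing buffers, ...
    }
}

fn main() {
    let entries: Vec<(Vec<u8>, Vec<u8>)> =
        vec![(vec![0, 1], vec![]), (vec![15], vec![]), (vec![0, 2], vec![])];
    // filter_map drops the None entries instead of panicking on unwrap.
    let records: Vec<_> =
        entries.iter().filter_map(|(k, v)| parse_record(k, v)).collect();
    println!("{records:?}");
}
```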
@wacban (Contributor) left a comment

LGTM

Comment on lines 2137 to 2142
bandwidth_requests,
contract_accesses,
bandwidth_scheduler_state_hash: bandwidth_scheduler_output
.as_ref()
.map(|o| o.scheduler_state_hash)
.unwrap_or_default(),
Contributor

nit: fix the ordering

Contributor Author

I think it's fixed now, the master merge fiddled with it.

@@ -56,6 +56,7 @@ pub mod col {
/// backpressure on the receiving shard.
/// (`primitives::receipt::Receipt`).
pub const BUFFERED_RECEIPT: u8 = 14;
pub const BANDWIDTH_SCHEDULER_STATE: u8 = 15;
Contributor

Ah sorry, I mixed that up with nibbles. You're right, we should be just fine. That's awesome :)

@jancionear jancionear added this pull request to the merge queue Oct 23, 2024
ShardChunkHeader::V3(header) => header.inner.bandwidth_requests(),
}
}

/// Returns whether the header is valid for given `ProtocolVersion`.
pub fn valid_for(&self, version: ProtocolVersion) -> bool {
Contributor Author

Reminder to myself - make sure that we check this in witness validation. Didn't matter before, but might matter now.

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 23, 2024
@jancionear jancionear enabled auto-merge October 24, 2024 15:12
@jancionear jancionear added this pull request to the merge queue Oct 24, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 24, 2024
@jancionear jancionear added this pull request to the merge queue Oct 24, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 24, 2024
@jancionear jancionear enabled auto-merge October 24, 2024 17:51
@jancionear jancionear added this pull request to the merge queue Oct 24, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 24, 2024
@Longarithm Longarithm added this pull request to the merge queue Oct 24, 2024
Merged via the queue into master with commit ab75966 Oct 24, 2024
29 checks passed
@Longarithm Longarithm deleted the bandsim-wires branch October 24, 2024 19:39
@jancionear (Contributor Author)

🎉

github-merge-queue bot pushed a commit that referenced this pull request Oct 28, 2024
…control (#12307)

During review of the bandwidth scheduler code, @wacban mentioned that he'd
prefer the header upgrade to be done the same way as it was done for
congestion control (ref: #12234 (comment)),
but I wasn't convinced that it's really cleaner.

In this PR I modified the header upgrade to work the same way as it does
in congestion control. We can compare the two approaches and choose the
better one.

The current approach looks like this:
* Before protocol upgrade to `BandwidthScheduler` version all chunks use
`InnerV3`, which doesn't have bandwidth requests in it.
* After the protocol version upgrade all newly produced chunks should
have `InnerV4`. Application of the last chunk of the previous protocol
version will produce a `ChunkExtra` which doesn't have bandwidth
requests (they are set to `None`), so the bandwidth requests in
`InnerV4` of the first chunk are set to the default value. Bandwidth
requests in `InnerV4` are not an `Option`, so we can't set them to
`None`.
* After the first chunk all produced `ChunkExtras` will have
`bandwidth_requests` set to `Some`, and they'll be put inside `InnerV4`
* `validate_chunk_with_chunk_extra_and_receipts_root` needs to be aware
of what happens at the first block and allow situations where the
bandwidth requests in `ChunkExtra` are `None`, but they're
`Some(Default::default())` in the chunk header (see the sketch after this list).
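A hedged sketch of that corner case, with a made-up BandwidthRequestsLike type standing in for the real struct (this is not the actual nearcore validation code):

```rust
// Made-up stand-in for the real bandwidth requests struct.
#[derive(Clone, Debug, Default, PartialEq)]
struct BandwidthRequestsLike {
    requests: Vec<u64>,
}

// Sketch of the comparison validate_chunk_with_chunk_extra_and_receipts_root
// has to allow in the current approach: right after the upgrade the
// ChunkExtra still has no bandwidth requests (None), while the freshly
// produced InnerV4 header reports Some(Default::default()).
fn bandwidth_requests_match(
    extra: Option<&BandwidthRequestsLike>,
    header: Option<&BandwidthRequestsLike>,
) -> bool {
    match (extra, header) {
        // Normal case: both sides present and equal.
        (Some(e), Some(h)) => e == h,
        // First chunk after the upgrade: extra is None, header holds the default.
        (None, Some(h)) => *h == BandwidthRequestsLike::default(),
        // Pre-upgrade headers (InnerV3) carry no requests at all.
        (None, None) => true,
        (Some(_), None) => false,
    }
}

fn main() {
    assert!(bandwidth_requests_match(None, Some(&BandwidthRequestsLike::default())));
    assert!(!bandwidth_requests_match(None, Some(&BandwidthRequestsLike { requests: vec![1] })));
}
```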

The congestion-control-like approach looks like this:
* Before protocol upgrade to `BandwidthScheduler` version all chunks use
`InnerV3`, which doesn't have bandwidth requests in it.
* The first chunk after the upgrade will still use `InnerV3` because the
`bandwidth_requests` in `ChunkExtra` are None.
* For future chunks the `bandwidth_requests` in `ChunkExtra` will be
`Some` and all chunks will use `InnerV4` with the requests.
* `validate_chunk_with_chunk_extra_and_receipts_root` can do a direct
comparison between the bandwidth requests in chunk extra and chunk
header.


In the current approach I like the exactness - all chunk headers
produced with the new protocol version have a new version of `Inner`. We
don't allow multiple header versions in one protocol version. The only
problem is that we need to have the special corner-case check in
`validate_chunk_with_chunk_extra_and_receipts_root`. I think we also
need to make sure that we validate the header versions for endorsed
chunks, using `is_valid_for` or something like that.

The congestion-control-like approach doesn't have the weird corner case,
which is nice. It also doesn't require such strict validation of the header
version - headers with the wrong version will get rejected by chunk extra
validation because of the None/Some difference. But it's much less exact. We
allow multiple inner versions for one protocol version, and I find that
much harder to reason about. I'm not sure what happens with genesis
chunks; it looks like we set the congestion infos to None, but that
means that genesis chunks would always have `InnerV2`, which would get
upgraded to `Inner<latest>` on the first chunk, which is weird. I changed
them to `Some(CongestionInfo::default())`; I think that makes things a bit
better, as now the chain starts with the current version of `Inner`.

Yet another approach would be to make bandwidth requests an `Option` in
`InnerV4`. They would be `None` on the first chunk and `Some` on the
next chunks. We could directly compare that with the requests in
`ChunkExtra`. But it's a bit sad that we'd have an `Option` for
something that's supposed to always be there :/