docs: update docs for jail throttling v2 #1443
Great refactoring! It helps a lot in understanding the intuition behind the Slash Throttling feature. I have reviewed half of it and still have to go through the Throttle retries.
I approve of the ADR changes.
* If two slash packets are at the head of the queue, the consumer will send the first slash packet, and then wait for a success ack from the provider before sending the second slash packet. This seems like it'd simplify implementation.
* VSC matured packets at the head of the queue (ie. NOT trailing a slash packet) can be sent immediately, and do not block any other packets in the queue, since the provider always handles them immediately.
* Slash packets will always be sent to the provider once they're at the head of the queue.
However, once sent, the consumer will not send any trailing `VSCMaturedPackets` from the queue until the provider responds with an ack that the `SlashPacket` has been handled (ie. validator was jailed).
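The sending rules quoted above can be sketched in Go as follows. This is only an illustrative model, not the code from the PR: `Packet`, `Consumer`, `SendPackets`, and `OnSlashHandled` are hypothetical names, and the queue is a plain in-memory slice rather than on-chain state.

```go
package main

// PacketType distinguishes the two packet kinds discussed in the ADR.
type PacketType int

const (
	VSCMatured PacketType = iota
	Slash
)

// Packet is a hypothetical, simplified stand-in for queued packet data.
type Packet struct {
	Type PacketType
	ID   uint64
}

// Consumer is a hypothetical model of the consumer-side queue state.
type Consumer struct {
	Queue             []Packet
	WaitingOnSlashAck bool
}

// SendPackets returns the packets that may be sent right now.
// VSCMatured packets at the head are sent (and dequeued) immediately.
// A slash packet is sent once it reaches the head, but it stays queued
// and blocks everything behind it until the provider acks it as handled.
func (c *Consumer) SendPackets() []Packet {
	var sent []Packet
	for len(c.Queue) > 0 && !c.WaitingOnSlashAck {
		head := c.Queue[0]
		sent = append(sent, head)
		if head.Type == Slash {
			c.WaitingOnSlashAck = true
		} else {
			c.Queue = c.Queue[1:]
		}
	}
	return sent
}

// OnSlashHandled models the provider ack saying the slash packet was
// handled: the packet is dropped from the head and sending may resume.
func (c *Consumer) OnSlashHandled() {
	if c.WaitingOnSlashAck && len(c.Queue) > 0 {
		c.Queue = c.Queue[1:]
		c.WaitingOnSlashAck = false
	}
}
```

With a queue of `[VSCMatured, Slash, VSCMatured, Slash]`, the first call sends the first two packets, further calls send nothing until `OnSlashHandled` runs, and the next call then sends the remaining two.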
Suggestion: a visual aid would help explain what a trailing `VSCMaturedPacket` is and how the sub-protocol operates.
I replaced "trailing `VSCMaturedPacket`" with "subsequent `VSCMaturedPacket`". Does that help?
To prevent the provider from having to keep track of what slash packets have been rejected, the consumer will have to retry the sending of slash packets over some period of time. This can be achieved with an on-chain consumer param. The suggested param value would probably be 1/2 of the provider's `SlashMeterReplenishmentPeriod`, although it doesn't matter too much as long as the param value is sane.
To prevent the provider from having to keep track of what `SlashPackets` have been rejected, the consumer will have to retry the sending of `SlashPackets` over some period of time.
This can be achieved with an on-chain consumer param, i.e., `RetryDelayPeriod`.
The suggested param value would probably be 1/2 of the provider's `SlashMeterReplenishmentPeriod`, although it doesn't matter too much as long as the param value is sane.
The consumer does not know about the `SlashMeterReplenishmentPeriod`.
The word "sane" is open to interpretation; consider different phrasing.
I would argue that the consumer knows about this param, as it can be queried on the provider. The `SlashMeterReplenishmentPeriod` is not something that will be changed regularly. I agree with replacing the word "sane" though.
Actually, the recommendation to set `RetryDelayPeriod` to 1/2 of `SlashMeterReplenishmentPeriod` is odd, especially as both defaults are 1 hour in the code. So I changed it to this:
To reduce the amount of redundant re-sends, we recommend setting `RetryDelayPeriod ~ SlashMeterReplenishmentPeriod`, i.e., waiting for the provider slash meter to be replenished before resending the rejected `SlashPacket`.
We can improve the append time for this queue by converting it from a protobuf-esq list, to a queue implemented with sdk-esq code. The idea is to persist an uint64 index that will be incremented each time you queue up a packet. You can think of this as storing the tail of the queue. Then, packet data will be keyed by that index, making the data naturally ordered byte-wise for sdk's iterator. The index will also be stored in the packet data value bytes, so that the index can later be used to delete certain packets from the queue.
We can improve the append time for this queue by converting it from a protobuf-esq list, to a queue implemented with sdk-esq code.
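The byte-wise ordering argument depends on how the uint64 index is encoded into the key. A minimal Go sketch, assuming a big-endian encoding and a made-up one-byte prefix (`queuePrefix`, `PacketKey`, `IndexFromKey`, and `KeyLess` are hypothetical names, not identifiers from the PR):

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// queuePrefix is a hypothetical store prefix for the pending-packets queue.
const queuePrefix byte = 0x01

// PacketKey builds the store key for the packet queued at index idx.
// Big-endian encoding makes lexicographic byte order match numeric order,
// so a key-ordered iterator visits packets in the order they were queued.
func PacketKey(idx uint64) []byte {
	key := make([]byte, 9)
	key[0] = queuePrefix
	binary.BigEndian.PutUint64(key[1:], idx)
	return key
}

// IndexFromKey recovers the queue index from a key built by PacketKey.
func IndexFromKey(key []byte) uint64 {
	return binary.BigEndian.Uint64(key[1:])
}

// KeyLess reports whether the key for index a sorts before the key for b.
func KeyLess(a, b uint64) bool {
	return bytes.Compare(PacketKey(a), PacketKey(b)) < 0
}
```

A little-endian encoding would break this property (for example, 256 would sort before 255), which is why the choice of encoding matters for the iterator-based queue.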
Was this benchmarked or in any way tested?
Not that I know of, but it's quite intuitive.
This feature will implement consumer changes in [#1024](https://github.com/cosmos/interchain-security/pull/1024). Note these changes should be deployed to prod for all consumers before the provider changes are deployed to prod. That is the consumer changes in #1024 are compatible with the current ("v1") provider implementation of throttling that's running on the Cosmos Hub as of July 2023.
This feature will implement consumer changes in [#1024](https://github.com/cosmos/interchain-security/pull/1024).
❗***These changes should be deployed to production for all consumers before the provider changes are deployed to production.***
This is difficult to enforce.
Does this mean that we cannot actually have this on the provider any time soon?
We decided to release the throttling v2 changes in two releases: v3.2.0 will contain only the consumer-side changes and will be backwards compatible with previous versions; v4.0.0 will contain the provider-side changes and will not be compatible with consumers running versions < v3.2.0. For more details, see 2d6b877
Once all consumers have deployed the changes in #1024, the provider changes from (TBD) can be deployed to prod, fully enabling v2 throttling.
Once all consumers have deployed the changes in #1024, the provider changes from [#1321](https://github.com/cosmos/interchain-security/pull/1321) can be deployed to production, fully enabling v2 throttling.
Not sure this is feasible. The provider schedule should not depend on the consumers or vice versa.
Ideally it should not. But it's hard to modify existing logic and add new logic that affects both provider and consumer without some coordination.
* refactor adr 002 for better understanding
* refactor ADR 008
* update throttling params
* add docstring to TestBasicSlashPacketThrottling
* update features and releases
* add comments to TestMultiConsumerSlashPacketThrottling
* Update docs/docs/adrs/adr-008-throttle-retries.md
  Co-authored-by: MSalopek <[email protected]>
* add review suggestions
* replace trailing with subsequent
* add upcoming versions
* add notes on backward compatibility
---------
Co-authored-by: MSalopek <[email protected]>
Description
Closes: NA
Author Checklist
All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.
I have...
`docs:` prefix in the PR title
Reviewers Checklist
All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.
I have...
`docs:` prefix in the PR title
`make build-docs`)