Slash packet throttling (#462)
* wip

* Update module.go

* wip

* tests pass

* Update relay.go

* Update relay.go

* Update relay.go

* Update relay.go

* wip

* still wip

* panic with too many packets

* Update relay.go

* wip

* checkpoint

* Update throttle.go

* make relay.go closer to main

* merge fixes

* Update expected_keepers.go

* Update throttle_test.go

* Update throttle.go

* second queue is now implemented

* smalls

* Update throttle.go

* removed pointer silliness

* Update throttle_test.go

* test improvements

* small

* where it's called

* Update relay.go

* comments n stuff

* Update throttle.go

* Update throttle.go

* comments

* wip

* callbacks

* cleans

* mas tests

* Update README.md

* less diff

* mas

* keys

* wip

* changes

* Update params.go

* size constraints

* Update keys_test.go

* Update throttle_test.go

* wip

* on recv new behavior and test

* on recv slash packet

* so close

* clean

* Update slashing.go

* cleans

* wip

* params

* wip

* tests

* changes

* mas

* Update README.md

* Update README.md

* Update README.md

* sorry for the friday night emails

* Update instance_test.go

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* works

* Update generic_setup.go

* Update setup.go

* smol

* small

* path to ccv chan setup

* todos

* Update setup.go

* Create debug_test.go

* democ

* Update debug_test.go

* setup all ccv channels

* bump to main

* another bump, missed one

* fix after merge main

* fixes

* Update slashing.go

* expired client tests

* checks

* fixed the stuff

* smol

* changes

* updates

* cleans

* clean

* todo

* fixes

* cleans

* Update slashing.go

* Update slashing.go

* mod tidy

* fix build errs

* test fixes

* Update slashing.go

* Update throttle.go

* base rounding

* mas unit tests

* updates

* comments

* comments

* cleans

* clean

* small

* sin gas

* more understandable logic

* crypto rand

* utils

* one e2e

* helpers

* Update slashing.go

* helpers

* cleans

* shiz works

* tcs

* smalls

* un mas

* allowance changing test

* smol

* smol

* e2e tests are done

* lets try this

* Key Assignment -> goc-december (#527)

* add MsgAssignConsumerKey

* add MsgAssignConsumerKey

* fix package name

* add keys

* add keeper methods for key assignment

* handle MsgAssignConsumerKey

* map addresses in slash requests

* prune old consumer addresses

* move AssignConsumerKey logic to keeper

* update consumer initial valset

* add ApplyKeyAssignmentToValUpdates

* fix client creation

* do not check init valset on consumer

* clean state on val removal

* fix TestAssignConsensusKeyForConsumerChain

* delete on val removal

* remove reverse mapping on val removal

* remove pending key assignment in EndBlock

* add query endpoints
add summary of indexes
change ConsumerValidatorByVscID to ConsumerAddrsToPrune

* Refactor AssignConsumerKey for clarity (IMO)

* finish key assignment genesis code- untested

* FIxed mocks compile issue - not sure if it works right though.

* add test for init and export genesis

* set after get in AssignConsumerKey

* enable AssignConsumerKey to be called twice

* remove key assignment on chain removal

* apply some review comments

* fix bug: two validator with same consumer key

* rename key: ConsumerValidatorsByVscIDBytePrefix -> ConsumerAddrsToPruneBytePrefix

* PendingKeyAssignment -> KeyAssignmentReplacements

* msg.ProviderAddr is a validator addr

* fix: key assignment genesis tests (#517)

* Fix consumer init genesis test

* fix provider genesis tests

* fix key assignement handler

* fix linter

* fix merge conflict

* fix ProviderValidatorAddress

* remove unused expectation

Co-authored-by: Marius Poke <[email protected]>

* add key assignment CRUD operations unit tests (#516)

* test val consumer key related CRUD

* test val consumer addr related CRUD

* test pending key assignments related CRUD

* refactor after review session

* refactor after review session

* add prune key CRUD tests

* renamings in testfiles

* improve KeyAssignmentReplacement set and get

* remove ApplyKeyAssignmentToInitialValset (redundant)

* add invariant to docstring of AppendConsumerAddrsToPrune

* fix address conversion

* adding e2e tests

* fix linter

* add queries; setup integration tests (#519)

* add queries; setup integration testse

* test key assignment before chain start

* fix state queries; refactor

* rm extra comment

* rm unused action field

* bump voting times in all tests

* add provider address query to tests

* Adds some very basic random testing and unit tests (#522)

* Adds imports

* Does multi iterations: fails!

* Handle errs

* checkpoint debug

* Pre introduce dynamic mock

* Issue seems to be resolved

* Removes prints in key asisgn

* Removes debug, pre reintroduce all test features

* Fix some magic numbers, bring back prune check

* Pre rework initial assignments

* Refactor and tidyup

* Better docs, clarity, org

Co-authored-by: Daniel <[email protected]>

* Enable key assignment testing for all e2e tests (#524)

* split CCVTestSuite.setupCallback in two

* pre-assign keys for all vals of first consumer

* fix linter

* remove TestConsumerGenesis

Co-authored-by: mpoke <[email protected]>
Co-authored-by: Simon Noetzlin <[email protected]>
Co-authored-by: MSalopek <[email protected]>
Co-authored-by: Daniel T <[email protected]>
Co-authored-by: Daniel <[email protected]>

* Fix errors in merge commit, comment out failing TestRelayAndApplySlashPacket test

* add simon's test fixes

* Update state.go

* last replenish time -> last full time

* readme

* increment i

* shared method

* iteration change

* Meter allowance lockstep (#553)

changes

* log

* requested key format changes (#560)

changes

* Throttle garbage collection (#557)

* changes

* Update proposal.go

* add log

* Update throttle_test.go

* fixes

* update invariant

* Throttle bug fixes + req refactors (#565)

* fixes

* Update throttle.go

* fix tests

* set replenish frac = 1.0 for all test runs

* rm unmarshal func

* req refactors

* small lint fix

* comment adjustment

* found <-> jailed order swap

* 100% change

* weird

* merge fixes to build

* tests now pass

* name change

* better ordering tests

* remove integration test diff

* avoid double call to address mapping

* Update throttle.go

* 0 included in iteration start

* naming refactors

* Update keys_test.go

* more refactors

* clarify allowance terminology

* update doc with explanation on min value

* md clarification

* Update throttle.md

* swap replenish order

* add max limit note

* #533 Adds normal operation diff testing

* reb

* reb

* Del unused

Co-authored-by: Daniel <[email protected]>

* progress save

* cleans

* name change

* Bugfix (#605)

* Circuit breaker refactor (#606)

* quick fix

* small key correction

* smol

* don't store time length

* use big endian, shawn you dingus

Co-authored-by: Jehan <[email protected]>
Co-authored-by: mpoke <[email protected]>
Co-authored-by: Simon Noetzlin <[email protected]>
Co-authored-by: MSalopek <[email protected]>
Co-authored-by: Daniel T <[email protected]>
Co-authored-by: Daniel <[email protected]>
Co-authored-by: Jehan Tremback <[email protected]>
8 people authored Dec 16, 2022
1 parent fa75e8d commit ae66785
Showing 32 changed files with 3,269 additions and 181 deletions.
106 changes: 106 additions & 0 deletions docs/throttle.md
@@ -0,0 +1,106 @@
# Slash/VSCMatured Packet Throttling

## Background

The CCV spec is based around the assumption that the provider binary and all consumer binaries are non-malicious and follow the defined protocols. In practice, this assumption may not hold. A malicious binary could sneak code into a consumer chain that sends many downtime or double-signing packets to the provider at once.

Without packet throttling, an attacker could then create validators just below the provider's active set, and slash every honest validator at once. These honest validators are then jailed, and control of the chain passes over to the attacker's validators. This enables the attacker to commit arbitrary state on the provider, and to potentially steal all tokens bridged to the provider over IBC.

A solution to this issue is to handle slash packets on the provider such that validators would have more time to notice such an attack scenario is happening. With more time, validators can more effectively respond to the situation compared to everyone getting slashed instantaneously. The implementation in this repo of such a solution is to throttle slash and VSCMatured packets as described below.

## System Properties - CCV

The system properties maintained for CCV are defined here: [CCV spec - Consumer Initiated Slashing](https://github.com/cosmos/ibc/blob/main/spec/app/ics-028-cross-chain-validation/system_model_and_properties.md#consumer-initiated-slashing).

TODO: Update `Provider Slashing Warranty` to accommodate that slash requests are not always applied on the same block that the provider receives a slash packet.

## Practical/Implementation Properties

One property of this implementation (that probably doesn't need to be in the spec) is that if any of the chain-specific packet data queues become larger than `MaxPendingSlashingPackets (param)`, then the provider binary will panic, and the provider chain will halt. Therefore this param should be set carefully. See [PanicIfTooMuchThrottledPacketData](../x/ccv/provider/keeper/throttle.go#L264) for more details. This behavior is included so that if the provider binaries are queuing up more packet data than machines can handle, the provider chain halts deterministically between validators.
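
The halting behavior can be illustrated with a minimal sketch. This is not the code in `throttle.go`: the `chainQueueSizes` type and both function names are hypothetical, and the limit is passed in directly rather than read from on-chain params.

```go
package main

import "fmt"

// chainQueueSizes is a hypothetical stand-in for the per-chain packet data
// queues: consumer chain ID -> number of queued packet data instances.
type chainQueueSizes map[string]uint64

// panicIfTooMuchQueued panics when any per-chain queue is larger than
// maxPending. Every validator evaluates the same state, so every validator
// panics on the same block and the halt is deterministic.
func panicIfTooMuchQueued(sizes chainQueueSizes, maxPending uint64) {
	for chainID, size := range sizes {
		if size > maxPending {
			panic(fmt.Sprintf("queue for chain %s holds %d entries, limit is %d", chainID, size, maxPending))
		}
	}
}

// exceedsLimit reports whether the check would panic, by recovering from it.
func exceedsLimit(sizes chainQueueSizes, maxPending uint64) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	panicIfTooMuchQueued(sizes, maxPending)
	return false
}

func main() {
	sizes := chainQueueSizes{"consumer-1": 3, "consumer-2": 11}
	fmt.Println(exceedsLimit(sizes, 10)) // consumer-2 is over the limit
	fmt.Println(exceedsLimit(sizes, 20)) // every queue is within the limit
}
```

Because the check depends only on replicated state, a limit that is set too low halts the chain for everyone at once, which is why the param needs care.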

## Data structure - Global entry queue

There exists a single queue which stores "pending slash packet entries". These entries allow the provider to appropriately handle slash packets sent from any consumer in FIFO ordering. This queue is responsible for coordinating the order that slash packets (from multiple chains) are handled over time.

## Data structure - Per-chain data queue

For each established consumer, there exists a queue which stores "pending packet data". That is, pending slash packet data is queued together with pending VSCMatured packet data in FIFO ordering. Order is enforced by IBC sequence number. These "per-chain" queues are responsible for coordinating the order in which slash packets are handled in relation to VSCMatured packets from the same chain.

## Reasoning - Multiple queues

For reasoning on why this feature was implemented with multiple queues, see [spec](https://github.com/cosmos/ibc/blob/main/spec/app/ics-028-cross-chain-validation/system_model_and_properties.md#consumer-initiated-slashing). Specifically the section on _VSC Maturity and Slashing Order_. There are other ways to ensure such a property (like a queue of linked lists, etc.), but the implemented algorithm seemed to be the most understandable and easiest to implement with a KV store.

## IBC hook - OnRecvSlashPacket

Upon the provider receiving a slash packet from any of the established consumers, two things occur:

1. A pending slash packet entry is queued.
2. The data of such a packet is added to the per-chain queue.
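
The two steps above can be sketched with in-memory slices in place of the KV store. All type, field, and method names here are hypothetical and chosen for illustration only.

```go
package main

import "fmt"

// slashEntry is a hypothetical global-queue entry: it records which consumer
// sent a slash packet and that packet's IBC sequence number.
type slashEntry struct {
	chainID string
	ibcSeq  uint64
}

// packetData is a hypothetical per-chain queue element.
type packetData struct {
	ibcSeq  uint64
	isSlash bool
}

// provider holds the two kinds of queue described in this document.
type provider struct {
	globalEntries []slashEntry            // one FIFO queue across all consumers
	perChainData  map[string][]packetData // one FIFO queue per consumer
}

func newProvider() *provider {
	return &provider{perChainData: make(map[string][]packetData)}
}

// onRecvSlashPacket queues a global entry, then queues the packet data on
// the sending consumer's own queue.
func (p *provider) onRecvSlashPacket(chainID string, ibcSeq uint64) {
	p.globalEntries = append(p.globalEntries, slashEntry{chainID: chainID, ibcSeq: ibcSeq})
	p.perChainData[chainID] = append(p.perChainData[chainID], packetData{ibcSeq: ibcSeq, isSlash: true})
}

func main() {
	p := newProvider()
	p.onRecvSlashPacket("consumer-1", 7)
	p.onRecvSlashPacket("consumer-2", 3)
	p.onRecvSlashPacket("consumer-1", 8)
	// Three global entries; two data items for consumer-1, one for consumer-2.
	fmt.Println(len(p.globalEntries), len(p.perChainData["consumer-1"]), len(p.perChainData["consumer-2"]))
}
```

Note that nothing is jailed at receive time in this sketch; the packet is only recorded, and handling is deferred to the endblocker.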

## IBC hook - OnRecvVSCMaturedPacket

Upon the provider receiving a VSCMatured packet from any of the established consumers, the following occurs:

1. If the per-chain queue for the consumer that sent this packet is empty, handle the VSC matured packet immediately.
2. Else, add the VSCMatured packet data to the per-chain queue, behind one or more existing packet data instances (which could include slash packet data and/or other VSCMatured packet data).
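
A minimal sketch of this branch, again with hypothetical names and in-memory slices instead of the KV store; "handling" a VSCMatured packet is reduced to recording its sequence number.

```go
package main

import "fmt"

// packetData is a hypothetical per-chain queue element.
type packetData struct {
	ibcSeq  uint64
	isSlash bool
}

// provider keeps the per-chain queues plus a log of handled VSCMatured packets.
type provider struct {
	perChainData map[string][]packetData
	handledVSC   []uint64 // IBC sequence numbers of VSCMatured packets handled so far
}

// onRecvVSCMaturedPacket handles the packet immediately when the sender's
// queue is empty; otherwise it queues the data behind what is already pending.
func (p *provider) onRecvVSCMaturedPacket(chainID string, ibcSeq uint64) {
	if len(p.perChainData[chainID]) == 0 {
		p.handledVSC = append(p.handledVSC, ibcSeq)
		return
	}
	p.perChainData[chainID] = append(p.perChainData[chainID], packetData{ibcSeq: ibcSeq})
}

func main() {
	p := &provider{perChainData: map[string][]packetData{
		"consumer-1": {{ibcSeq: 4, isSlash: true}}, // a slash packet is already pending
	}}
	p.onRecvVSCMaturedPacket("consumer-1", 5) // queued behind the pending slash packet
	p.onRecvVSCMaturedPacket("consumer-2", 9) // handled immediately
	fmt.Println(len(p.perChainData["consumer-1"]), p.handledVSC)
}
```

One consequence of this rule is that a non-empty per-chain queue always starts with slash packet data, since a VSCMatured packet is never queued onto an empty queue.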

## Persisted State - Slash Meter

There exists one slash meter on the provider which stores an amount of voting power (integer), corresponding to an allowance of validators that can be jailed/tombstoned over time. This meter is initialized to a certain value on genesis, decremented whenever a slash packet is handled, and periodically replenished as decided by on-chain params.

## Endblocker handling - HandlePendingSlashPackets

Every endblocker, the following pseudocode is executed:

```typescript
meter := getSlashMeter()

// Keep iterating as long as the meter has a positive value and slash packet entries exist
while meter.IsPositive() && entriesExist() {
    // Get next entry in queue
    entry := getNextSlashPacketEntry()
    // Decrement slash meter by the voting power that will be removed from the valset from handling this slash packet
    valPower := entry.getValPower()
    meter = meter - valPower
    // Using the per-chain queue, handle the single slash packet using its queued data,
    // then handle all trailing VSCMatured packets for this consumer
    handleSlashPacketAndTrailingVSCMaturedPackets(entry)
    // Delete entry in global queue, delete handled data
    entry.Delete()
    deletePendingSlashPacketData()
    deleteTrailingVSCMaturedPacketData()
}
```

## Endblocker handling - Slash Meter Replenishment

Once the slash meter is no longer full, it is replenished after `SlashMeterReplenishPeriod (param)` by adding its allowance for the replenishment block to the meter, where `allowance` = `SlashMeterReplenishFraction (param)` * `currentTotalVotingPower`. The slash meter never exceeds its current allowance (a function of the total voting power for the block). Note a few things:

1. The slash meter can go negative in value, and will do so when handling a single slash packet that jails a validator with significant voting power. In such a scenario, the slash meter may take multiple replenishment periods to once again reach a positive value, meaning no other slash packets may be handled for multiple replenishment periods.
2. Total voting power of a chain changes over time, especially as validators are jailed/tombstoned. As validators are jailed, total voting power decreases, and so does the slashing allowance for specific blocks.
3. The voting power allowance added to the slash meter during replenishment is always greater than or equal to 1. If `SlashMeterReplenishFraction (param)` is set too low, integer rounding puts this minimum into effect. That is, if `SlashMeterReplenishFraction` * `currentTotalVotingPower` < 1, the effective allowance is 1. This minimum ensures that some packets are handled over time, even if that takes a very long time. It's a crude solution to an edge case caused by too small a replenishment fraction.

The behavior described above is achieved by executing `CheckForSlashMeterReplenishment()` every endblock.
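
The arithmetic of one replenishment can be sketched as follows. This is not `CheckForSlashMeterReplenishment()` itself: the function names are hypothetical, the fraction is passed as an integer numerator and denominator, and truncating integer division is an assumption about rounding made only for this example.

```go
package main

import "fmt"

// replenishAllowance returns fraction * totalPower with a floor of 1.
// The fraction is given as fracNum/fracDen; truncating division is an
// assumption of this sketch.
func replenishAllowance(totalPower, fracNum, fracDen int64) int64 {
	allowance := totalPower * fracNum / fracDen
	if allowance < 1 {
		return 1
	}
	return allowance
}

// replenish adds the allowance to the meter and caps the meter at the allowance.
func replenish(meter, allowance int64) int64 {
	meter += allowance
	if meter > allowance {
		return allowance
	}
	return meter
}

func main() {
	// Total voting power 1000 and a replenish fraction of 5% give an allowance of 50.
	allowance := replenishAllowance(1000, 5, 100)

	// A meter driven to -70 by a large validator needs two periods to turn
	// positive and is capped at the allowance on the third.
	meter := int64(-70)
	for period := 1; period <= 3; period++ {
		meter = replenish(meter, allowance)
		fmt.Println(period, meter)
	}

	// With total voting power 10 the product truncates to 0, so the floor of 1 applies.
	fmt.Println(replenishAllowance(10, 5, 100))
}
```

The loop shows points 1 and 3 from the list above: a negative meter can block slash packet handling for several periods, and the floor of 1 keeps the meter moving even with a very small fraction.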

## Throttling Invariant

Using on-chain params and the sub protocol defined, slash packet throttling is implemented such that the following invariant is maintained (in addition to those already defined in the CCV spec).

For the following invariant to hold, these points must be true:

- We assume the total voting power of the chain (as a function of delegations) does not significantly increase over the course of the attack.
- The final slashed validator does not have more than `SlashMeterReplenishFraction` of total voting power on the provider.
- `SlashMeterReplenishFraction` is large enough that `SlashMeterReplenishFraction` * `currentTotalVotingPower` > 1. I.e., the replenish fraction is set high enough that we can ignore rounding errors.
- `SlashMeterReplenishPeriod` is sufficiently longer than the time it takes to produce a block.

Invariant:

> If we define a consumer initiated slash attack to start when the first slash packet from such an attack is received by the provider, and we define the initial validator set as the set that existed when the attack started, the time it takes to jail/tombstone `X`% of the initial validator set will be greater than or equal to `(X * SlashMeterReplenishPeriod / SlashMeterReplenishFraction) - 2 * SlashMeterReplenishPeriod`.

Intuition: If jailings begin when the slash meter is full, then `SlashMeterReplenishFraction` of the provider validator set can be jailed immediately. The remaining jailings are only applied when the slash meter is positive in value, so the time it takes to jail the remaining `X - SlashMeterReplenishFraction` of the provider validator set is `(X - SlashMeterReplenishFraction) * SlashMeterReplenishPeriod / SlashMeterReplenishFraction`. However, the final validator could be jailed during the final replenishment period, with the meter being very small in value (causing it to go negative after jailing). So we subtract another `SlashMeterReplenishPeriod` term in the invariant to account for this.

Note this invariant could be adjusted with different slash meter protocols, but the current scheme is the simplest to implement and understand.

This invariant is useful because it allows us to reason about the time it takes to jail a certain percentage of the initial provider validator set from consumer initiated slash requests. For example, if `SlashMeterReplenishFraction` is set to 0.06, then it takes no less than 4 replenishment periods to jail 33% of the initial provider validator set on the Cosmos Hub. Note that as of writing this on 11/29/22, the Cosmos Hub does not have a validator with more than 6% of total voting power.
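
As a quick check of the arithmetic, the bound can be evaluated in units of replenishment periods. The function name is hypothetical, and `x` and the replenish fraction are both expressed as fractions of the initial validator set (0.33 for 33%).

```go
package main

import "fmt"

// minPeriodsToJail evaluates the invariant's lower bound in units of
// SlashMeterReplenishPeriod: x / replenishFraction - 2.
func minPeriodsToJail(x, replenishFraction float64) float64 {
	return x/replenishFraction - 2
}

func main() {
	fmt.Printf("%.2f replenishment periods\n", minPeriodsToJail(0.33, 0.06))
}
```

For the example values the formula gives 3.5 periods, which is consistent with the figure of 4 quoted above if the bound is rounded up to whole replenishment periods; that reading is an interpretation, not something the formula states.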

Note also that 4 replenishment periods is a worst-case scenario that depends on well-crafted attack timings.
21 changes: 17 additions & 4 deletions tests/difference/core/driver/core_test.go
@@ -240,7 +240,9 @@ func (s *CoreSuite) matchState() {
 		// TODO: delegations
 		s.Require().Equalf(int64(s.traces.DelegatorTokens()), s.delegatorBalance(), diagnostic+"P del balance mismatch")
 		for j := 0; j < initState.NumValidators; j++ {
-			s.Require().Equalf(s.traces.Jailed(j) != nil, s.isJailed(int64(j)), diagnostic+"P jail status mismatch for val %d", j)
+			a := s.traces.Jailed(j) != nil
+			b := s.isJailed(int64(j))
+			s.Require().Equalf(a, b, diagnostic+"P jail status mismatch for val %d", j)
 		}
 	}
 	if chain == C {
@@ -259,6 +261,7 @@ func (s *CoreSuite) matchState() {
 }

 func (s *CoreSuite) executeTrace() {
+
 	for i := range s.traces.Actions() {
 		s.traces.CurrentActionIx = i

@@ -310,6 +313,8 @@ func (s *CoreSuite) TestAssumptions() {
 		s.T().Fatal(FAIL_MSG)
 	}

+	// TODO: write assumption that checks that throttle params are appropriate
+
 	// Delegator balance is correct
 	s.Require().Equal(int64(initState.InitialDelegatorTokens), s.delegatorBalance())

@@ -428,9 +433,10 @@ func (s *CoreSuite) TestTraces() {
 	s.traces = Traces{
 		Data: LoadTraces("traces.json"),
 	}
-	// s.traces.Data = []TraceData{s.traces.Data[69]}
+	shortest := -1
+	shortestLen := 10000000000
 	for i := range s.traces.Data {
-		s.Run(fmt.Sprintf("Trace num: %d", i), func() {
+		if !s.Run(fmt.Sprintf("Trace num: %d", i), func() {
 			// Setup a new pair of chains for each trace
 			s.SetupTest()

@@ -448,8 +454,15 @@ func (s *CoreSuite) TestTraces() {
 			// Record information about the trace, for debugging
 			// diagnostics.
 			s.executeTrace()
-		})
+		}) {
+			if s.traces.CurrentActionIx < shortestLen {
+				shortest = s.traces.CurrentTraceIx
+				shortestLen = s.traces.CurrentActionIx
+			}
+		}
 	}
+	fmt.Println("Shortest [traceIx, actionIx]:", shortest, shortestLen)
+
 }

 func TestCoreSuite(t *testing.T) {
7 changes: 7 additions & 0 deletions tests/difference/core/driver/setup.go
@@ -678,6 +678,13 @@ func (b *Builder) build() {

 	b.setSlashParams()

+	// TODO: tidy up before merging into main
+	prams := b.providerKeeper().GetParams(b.ctx(P))
+	prams.SlashMeterReplenishFraction = "1.0"
+	prams.SlashMeterReplenishPeriod = time.Second * 1
+	b.providerKeeper().SetParams(b.ctx(P), prams)
+	b.providerKeeper().InitializeSlashMeter(b.ctx(P))
+
 	// Set light client params to match model
 	tmConfig := ibctesting.NewTendermintConfig()
 	tmConfig.UnbondingPeriod = b.initState.UnbondingP
2 changes: 1 addition & 1 deletion tests/difference/core/driver/traces.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions tests/difference/core/model/src/common.ts
@@ -181,6 +181,7 @@ type ModelInitState = {
     downtimeSlashAcks: number[];
     tombstoned: boolean[];
     matureUnbondingOps: number[];
+    queue: (Slash | VscMatured)[];
   };
 };

1 change: 1 addition & 0 deletions tests/difference/core/model/src/constants.ts
@@ -56,6 +56,7 @@ const MODEL_INIT_STATE: ModelInitState = {
     downtimeSlashAcks: [],
     tombstoned: [false, false, false, false],
     matureUnbondingOps: [],
+    queue: [],
   },
   staking: {
     delegation: [4000, 3000, 2000, 1000],
29 changes: 25 additions & 4 deletions tests/difference/core/model/src/model.ts
@@ -374,6 +374,8 @@ class CCVProvider {
   tombstoned: boolean[];
   // unbonding operations to be completed in EndBlock
   matureUnbondingOps: number[];
+  // queue
+  queue: (Slash | VscMatured)[];

   constructor(model: Model, { ccvP }: ModelInitState) {
     this.m = model;
@@ -382,6 +384,7 @@ class CCVProvider {

   endBlockCIS = () => {
     this.vscIDtoH[this.vscID] = this.m.h[P] + 1;
+    this.processPackets();
   };

   endBlockVSU = () => {
@@ -420,14 +423,32 @@ class CCVProvider {
   };

   onReceive = (data: PacketData) => {
-    // It's sufficient to use isDowntime field as differentiator
-    if ('isDowntime' in data) {
-      this.onReceiveSlash(data);
+    /*
+    TODO: tidy up before merging to main
+    This is some quick prototyping to get the tests passing
+    We have 1 consumer chain so the slash queue is the global queue
+    if the queue is empty we can just process the packet.
+    */
+    if (this.queue.length == 0 && !('isDowntime' in data)) {
+      // Skip the queue
+      this.onReceiveVSCMatured(data as VscMatured);
     } else {
-      this.onReceiveVSCMatured(data);
+      this.queue.push(data);
     }
   };

+  processPackets = () => {
+    this.queue.forEach((data) => {
+      // It's sufficient to use isDowntime field as differentiator
+      if ('isDowntime' in data) {
+        this.onReceiveSlash(data);
+      } else {
+        this.onReceiveVSCMatured(data);
+      }
+    });
+    this.queue = [];
+  };
+
   onReceiveVSCMatured = (data: VscMatured) => {
     if (this.vscIDtoOpIDs.has(data.vscID)) {
       this.vscIDtoOpIDs.get(data.vscID)!.forEach((opID: number) => {