What is the Problem Being Solved?
We have several bugs (#8400, #8401, #8404) which are causing mainnet to hold large quantities of objects. As of 01-Dec-2023:
#8400 consumes 291k Payments in v29, 218k in v46, 98k in v68, and 74k in v69
#8401 has 175k cycles, which consume space in both zoe and various contract vats (125k in v29, 34k in v68, fewer in 14 other vats)
#8404 consumes 4k in v9-zoe
The #8400 leftover-Payments will be cleaned up incrementally: each time a new price feed is made, it will delete ten old ones. I estimated that this will finish cleaning up all 218k in v46 in about 15 days, and during this time it will trigger a BOYD (that will take an extra 1.2s) every 30 minutes, which would be quite sustainable. The v29 payments will take maybe 20 days to finish remediation.
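The estimate above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not the original calculation: it assumes new price feeds arrive at a steady rate, and it derives that rate from the 15-day figure for v46 rather than from any measurement. The function names are made up for illustration.

```python
# Figures quoted in the issue text; everything derived from them is an estimate.
PAYMENTS_V46 = 218_000
PAYMENTS_V29 = 291_000
DELETED_PER_NEW_FEED = 10      # each new price feed deletes ten old Payments
ESTIMATED_DAYS_V46 = 15
BOYD_EXTRA_SECONDS = 1.2       # extra time per BOYD
BOYD_INTERVAL_MINUTES = 30     # one BOYD every 30 minutes


def implied_feeds_per_day(payments, deleted_per_feed, days):
    """Rate of new price feeds needed to clear `payments` in `days` (assumes a steady rate)."""
    return payments / deleted_per_feed / days


def days_to_clean(payments, deleted_per_feed, feeds_per_day):
    """Days needed to clear `payments` at the given steady rate of new price feeds."""
    return payments / deleted_per_feed / feeds_per_day


def boyd_overhead_fraction(extra_seconds, interval_minutes):
    """Fraction of wall-clock time spent on the extra BOYD work."""
    return extra_seconds / (interval_minutes * 60)


rate = implied_feeds_per_day(PAYMENTS_V46, DELETED_PER_NEW_FEED, ESTIMATED_DAYS_V46)
v29_days = days_to_clean(PAYMENTS_V29, DELETED_PER_NEW_FEED, rate)
overhead = boyd_overhead_fraction(BOYD_EXTRA_SECONDS, BOYD_INTERVAL_MINUTES)

print(f"implied new price feeds per day: {rate:.0f}")
print(f"v29 cleanup at the same rate: {v29_days:.1f} days")
print(f"extra BOYD overhead: {overhead:.3%} of wall-clock time")
```

Under these assumptions the 15-day figure implies roughly one new price feed per minute, the same rate clears the 291k Payments in v29 in about 20 days, and the extra BOYD time stays well under 0.1% of wall-clock time.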
However, the #8401 cycles are not easy for userspace to clean up incrementally. The remediation process is likely to have userspace delete the entire weakmap, causing all 175k objects to be dumped into liveslots for GC all at once, in "one fell swoop". If we do not implement #8417, then this will dump all 175k into the kernel at the same time. According to #8402, we might be able to survive this (as in I'm not yet seeing any superlinear execution time), but we need to be more confident than that.
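One way to make "not yet seeing any superlinear execution time" more concrete is to fit a scaling exponent to timing samples taken at different object counts. The sketch below is an illustration only: the helper names are hypothetical, and the sample data is synthetic placeholder data, not measurements from #8402 or from mainnet. An exponent near 1 suggests linear scaling; an exponent well above 1 would be a warning sign.

```python
import math


def scaling_exponent(samples):
    """Least-squares slope of log(seconds) against log(object count).

    `samples` is a list of (object_count, seconds) pairs.
    """
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(t) for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def extrapolate_seconds(samples, target_count):
    """Project the largest sample out to `target_count` using the fitted exponent."""
    k = scaling_exponent(samples)
    n0, t0 = samples[-1]
    return t0 * (target_count / n0) ** k


# SYNTHETIC placeholder data, generated from formulas purely to exercise the code.
synthetic_linear = [(n, 0.002 * n) for n in (1_000, 5_000, 20_000)]
synthetic_quadratic = [(n, 1e-7 * n * n) for n in (1_000, 5_000, 20_000)]

k_linear = scaling_exponent(synthetic_linear)
k_quadratic = scaling_exponent(synthetic_quadratic)
projected = extrapolate_seconds(synthetic_linear, 175_000)

print(f"exponent (synthetic linear data): {k_linear:.2f}")
print(f"exponent (synthetic quadratic data): {k_quadratic:.2f}")
print(f"projected seconds at 175k objects (synthetic linear data): {projected:.0f}")
```

Any such projection only extrapolates from smaller runs, which is why the full-size mainfork test is still needed.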
So the goal of this ticket is to use the "mainfork" tool to run an actual chain upgrade that will trigger this large GC operation all at once, and measure how long it takes. It might take half an hour or more.
If the measured time is short enough to be acceptable, then we can proceed with remediation of #8401 without doing additional work (like #8417). If it is too long, or if it has other problems (high memory usage, etc.), then we need to find another way: either building #8417 first, or going back to the drawing board and coming up with an entirely different workaround.
Description of the Design
Note that this test does not require the new vat to be functional: e.g. it could clobber in-flight offers. This would not be acceptable for the real upgrade, but this test only cares about triggering a sufficiently problematic amount of GC work.
use mainfork to create a clone of current mainnet state
in the clone, submit a CORE_EVAL proposal which upgrades vat-zoe to the form that does one-fell-swoop remediation
measure how long the resulting block takes
Another variant is to perform the upgrade as a chain-halting upgrade, which will more closely match how we expect to deploy this.
Security Considerations
none
Scaling Considerations
this measures scaling concerns, to decide whether we can afford to use one-fell-swoop remediation or not
Test Plan
none, this is a one-shot manual test
Upgrade Considerations