Refactor batch building for better performance and concurrency #6
Comments
I've written a few tests based on a real-world scenario that occurred the last time batching broke. They are available here: https://github.com/PowerLoom/submission-sequencer-batcher/compare/feat/batching-tests?expand=1 The issue was most probably caused by added delay in IPFS store response times during a certain period. I am currently implementing changes to the batching logic that will help avoid this issue even when we have to wait longer for an IPFS response.
After testing the changes pushed so far in the staging environment, the following issues have been observed. The stress test fired, per epoch, 720 submissions from a full node plus 400 submissions from 200 lite nodes, for a total of 1120 snapshot submissions.

Nonces out of order and missing tx receipts
This has been observed to be mitigated by lowering the
Larger batch sizes produce a batch submission transaction with larger arrays of project IDs and finalized CIDs. This can cause the tx to be dropped because of block space limits. The following logs show such a situation in progress. Full logs attached.

Batch submissions being retried for the same batch ID and epoch ID
With appropriately sized batches and signers, even though batch submissions seem to be going through, the same batch submission gets retried multiple times, leading to failed txs. Failed txs:

End batch submissions sent for an epoch without any batches actually being submitted
IPFS nodes going out of order means batches were not uploaded, yet an end of batch submission was indicated. Following are logs for batch upload failures.
which is followed by a single transaction that indicates end of batch submission.
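
The two failure modes above (a batch being resubmitted for the same batch ID and epoch ID, and an end-of-batch transaction being sent when no batch was uploaded) could both be guarded by a small in-memory ledger. The following is only a sketch under assumptions: `SubmissionLedger`, `SubmitOnce`, `EndEpoch`, and the `send` callbacks are hypothetical names, not part of the existing batcher code, and the actual contract calls are left to the caller.

```go
package main

import (
	"errors"
	"sync"
)

// ErrNoBatches is returned when an epoch has no successfully recorded batches.
var ErrNoBatches = errors.New("no batches recorded for epoch")

// batchKey identifies a batch submission (hypothetical type).
type batchKey struct {
	EpochID uint64
	BatchID uint64
}

// SubmissionLedger (hypothetical) remembers which batches were submitted.
type SubmissionLedger struct {
	mu        sync.Mutex
	submitted map[batchKey]bool
	perEpoch  map[uint64]int
}

func NewSubmissionLedger() *SubmissionLedger {
	return &SubmissionLedger{
		submitted: map[batchKey]bool{},
		perEpoch:  map[uint64]int{},
	}
}

// SubmitOnce runs send only if (epochID, batchID) has not already succeeded.
// It reports whether send ran and succeeded. The lock is held while send
// runs, which is simple but serializes submissions; that is a trade-off of
// this sketch, not a recommendation.
func (l *SubmissionLedger) SubmitOnce(epochID, batchID uint64, send func() error) (bool, error) {
	key := batchKey{EpochID: epochID, BatchID: batchID}
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.submitted[key] {
		return false, nil // already submitted: skip instead of retrying
	}
	if err := send(); err != nil {
		return false, err // not recorded, so a later retry is still allowed
	}
	l.submitted[key] = true
	l.perEpoch[epochID]++
	return true, nil
}

// EndEpoch runs sendEnd only when at least one batch succeeded for the epoch.
func (l *SubmissionLedger) EndEpoch(epochID uint64, sendEnd func() error) error {
	l.mu.Lock()
	n := l.perEpoch[epochID]
	l.mu.Unlock()
	if n == 0 {
		return ErrNoBatches
	}
	return sendEnd()
}
```

With such a guard, a second `SubmitOnce` call for the same epoch and batch ID becomes a no-op, and `EndEpoch` returns `ErrNoBatches` instead of sending the end-of-batch transaction when every upload for that epoch failed.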

Delay of ~2 epochs in batches being built
The following logs indicate that 2 minutes after epoch 1025 was released, batch building began for epoch 1024. That is inconsistent with the expectation that the process for 1024 should have begun by the time the epoch release for 1024 arrived. This is effectively a delay of >=2 epochs in the batches being built and ultimately submitted.
The monolith is being refactored along the lines of https://github.com/PowerLoom/libp2p-submission-sequencer-listener and https://github.com/PowerLoom/sequencer-dequeuer to decouple and parallelize the workloads of
The specific work of finalizing CIDs within batches and building their Merkle trees is being refactored into a new component, which will be deployed in staging and put to the test over the next couple of days. Also, as noted in this issue comment, many of the issues detailed in my last report on this thread will be eliminated once we move away from this component.
Describe the bug
For every epoch, eligible submissions submitted within the deadline are
The above processes run into bottlenecks when the batch size threshold is low, when there are network issues with IPFS or anchor contract calls, and when other CPU-bound activity does not fully utilize the benefits of goroutines.
To Reproduce
Affected versions:
Steps to reproduce the behavior:
As described above
Expected behavior
Batch building along with updating of submission counts can be greatly simplified by using worker groups, along with setting a minimum permissible threshold for the number of submissions that should be included in a batch.
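
As a rough illustration of the expected behavior above, the sketch below clamps the configured batch size to a minimum permissible threshold, splits the submissions into batches, and processes them with a fixed pool of worker goroutines. All names here (`Submission`, `Batch`, `effectiveBatchSize`, `splitIntoBatches`, `processBatches`) are hypothetical and not taken from the batcher codebase; the `process` callback stands in for whatever per-batch work (IPFS upload, contract call) is actually needed.

```go
package main

import "sync"

// Submission is a hypothetical stand-in for a snapshot submission.
type Submission struct {
	ProjectID string
	CID       string
}

// Batch is a hypothetical group of submissions finalized together.
type Batch struct {
	ID          int
	Submissions []Submission
}

// effectiveBatchSize clamps the configured size to the minimum threshold.
func effectiveBatchSize(configured, minPermissible int) int {
	if configured < minPermissible {
		return minPermissible
	}
	return configured
}

// splitIntoBatches chunks subs into consecutive batches of at most size items.
func splitIntoBatches(subs []Submission, size int) []Batch {
	if size <= 0 {
		return nil
	}
	var batches []Batch
	for start := 0; start < len(subs); start += size {
		end := start + size
		if end > len(subs) {
			end = len(subs)
		}
		batches = append(batches, Batch{ID: len(batches) + 1, Submissions: subs[start:end]})
	}
	return batches
}

// processBatches runs process over every batch using a fixed number of
// worker goroutines. It returns one error slot per batch (nil on success),
// indexed in the same order as batches.
func processBatches(batches []Batch, workers int, process func(Batch) error) []error {
	if workers < 1 {
		workers = 1
	}
	errs := make([]error, len(batches))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is received by exactly one worker, so the
				// writes to errs never overlap.
				errs[i] = process(batches[i])
			}
		}()
	}
	for i := range batches {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return errs
}
```

For the stress test figure of 1120 submissions per epoch, a configured size of 10 with a minimum threshold of 50 would yield 23 batches (22 full batches and one of 20), rather than 112 small ones.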
Proposed Solution
WIP.
Caveats
WIP. To be expanded.
Additional context
NA