Bug: count in defined activity produces count-1 events (one less than expected) #222
Comments
@bjohnson5 do you have time to take a look at this? |
Sure, I think I have also noticed this before. I will take a look. |
I think this is happening because the
In theory, this could also happen with the simulation time feature. The producer could send an event, then the time limit could hit and cause a shutdown before the consumer processes the event.
NOTE: If you have more than one payment activity defined, only the last running activity loses one payment. This further confirms that it is a race condition between the shutdown trigger and the events consumer.
@carlaKC Any thoughts on an approach to avoid this situation? Should the consumers make sure to empty their queue of events before shutting down? |
This sounds like a good approach, I believe it's best practice for channels to drain the receiver anyway so worth looking into! |
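A minimal sketch of the race described above and of the "drain the receiver" idea, using only `std::sync::mpsc`. It is not the project's code: `poll_until_shutdown` and `replay_interleaving` are hypothetical names, and the interleaving is replayed on a single thread so the result is deterministic. The amounts (four payments of 25) mirror the numbers in the report.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver};

/// Hypothetical consumer that checks the shutdown flag before every receive
/// and returns as soon as it is set, leaving queued events unprocessed.
fn poll_until_shutdown(rx: &Receiver<u64>, shutdown: &AtomicBool) -> Vec<u64> {
    let mut processed = Vec::new();
    while !shutdown.load(Ordering::SeqCst) {
        match rx.try_recv() {
            Ok(amount) => processed.push(amount),
            Err(_) => break, // nothing queued right now
        }
    }
    processed
}

/// Replays the problematic interleaving and returns the total amount seen by
/// (a consumer that quits on shutdown, a consumer that drains its channel).
fn replay_interleaving() -> (u64, u64) {
    let shutdown = AtomicBool::new(false);

    // Consumer that quits as soon as shutdown is triggered.
    let (tx, rx) = channel();
    for _ in 0..3 {
        tx.send(25u64).unwrap();
    }
    let mut quit_early = poll_until_shutdown(&rx, &shutdown);
    tx.send(25).unwrap(); // producer sends the 4th event...
    shutdown.store(true, Ordering::SeqCst); // ...and the count limit triggers shutdown
    quit_early.extend(poll_until_shutdown(&rx, &shutdown)); // sees the flag first

    // Consumer that drains: it only stops once the sender is dropped
    // and the queue is empty.
    let (tx, rx) = channel();
    for _ in 0..4 {
        tx.send(25u64).unwrap();
    }
    drop(tx); // producer is done, which closes the channel
    let drained: Vec<u64> = rx.iter().collect();

    (quit_early.iter().sum(), drained.iter().sum())
}
```

Calling `replay_interleaving()` gives 75 for the consumer that quits on shutdown and 100 for the draining one, matching the one-payment shortfall in the report.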
So it looks like it is slightly more complicated than just draining all the receivers because of how the different channels are chained together.
If we make
I think we might need a way to track "in progress" payments and make sure not to shut down until all "in progress" payments have been completed and recorded in the simulation results. |
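One way to picture the "in progress" tracking idea is a counter guarded by a mutex and condition variable. This is only a sketch under assumptions: `InFlight`, `run_payments`, and the 10 ms sleep standing in for a slow payment are all hypothetical, and shared state of this kind is exactly what a later comment in this thread is wary of.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical tracker for payments that were dispatched but not yet recorded.
#[derive(Default)]
struct InFlight {
    count: Mutex<usize>,
    idle: Condvar,
}

impl InFlight {
    /// Register a payment as in progress.
    fn start(&self) {
        *self.count.lock().unwrap() += 1;
    }

    /// Mark a payment as recorded; wake waiters when none are left.
    fn finish(&self) {
        let mut count = self.count.lock().unwrap();
        *count -= 1;
        if *count == 0 {
            self.idle.notify_all();
        }
    }

    /// Block until no payments are in progress.
    fn wait_idle(&self) {
        let mut count = self.count.lock().unwrap();
        while *count > 0 {
            count = self.idle.wait(count).unwrap();
        }
    }
}

/// Dispatches `n` slow payments and returns how many results were recorded
/// by the time shutdown is allowed to proceed.
fn run_payments(n: usize) -> usize {
    let tracker = Arc::new(InFlight::default());
    let results = Arc::new(Mutex::new(Vec::new()));
    for i in 0..n {
        tracker.start(); // registered before the worker is spawned
        let (tracker, results) = (Arc::clone(&tracker), Arc::clone(&results));
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(10)); // stand-in for a slow payment
            results.lock().unwrap().push(i);
            tracker.finish();
        });
    }
    tracker.wait_idle(); // shutdown would only proceed past this point
    let recorded = results.lock().unwrap().len();
    recorded
}
```

Because `start` runs before each worker is spawned, `wait_idle` cannot return while a dispatched payment is still unrecorded.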
Hm interesting. I think that the fundamental issue here is the combination of using
I think that something like this could work:
This can certainly get hairy (which is why we cheated and added
I'm hesitant to introduce global state in a channel-oriented environment - usually a recipe for deadlock soup.
Thanks for the report @alexzaitsev! Good to be reminded of these larger architectural improvements that we can make. |
Ok yes, this makes sense. It will require some careful changes though. I will start working through it and report back! |
@carlaKC I have been thinking about how we could handle the situation that you brought up about payments that take a long time to complete... The
What if we just introduce an
If the payment count is met or the time limit is hit, a
This essentially keeps the functionality the exact same as it is right now for
@carlaKC @alexzaitsev Let me know what you think of this approach. I put up a draft PR (#230) but I still need to test and clean it up. |
Good point.
I'm hesitant to add a second trigger, because our shutdown conditions are quite complicated as-is. If we're going to do a big refactor here, I think it's worth trying to do things "canonically" for rust channels.
My statement here is wrong, I think. We need to
From a quick scan of the codebase, this means that we need to listen for shutdown on
I believe works in the case of a payment taking a long time, because |
I think the fact that we are listening for shutdown in
Is that what you had in mind? |
Isn't the problem that the
Rather than quit immediately, |
They are both part of the problem. Even if
Yes, but what if the payment that was sent actually was successful (the nodes' balances change)? Then we would have a results report that shows a failed payment, but the underlying nodes would have seen a successful payment.
There are several different cases that all need to be handled:
Case 1: the producers will stop after the correct number of payments, causing all of the channel receivers to drain and quit. We do still need a way to stop the results logger thread, though.
Case 2 needs a way to shut down the producers, but not the consumers.
Case 3 is the simplest: just exit everything without a guarantee that the last payment will be correct. |
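The Case 1 behaviour, where producers finishing causes every receiver down the chain to drain and quit, can be sketched with plain `std::sync::mpsc` channels. This is an illustration only: `run_pipeline` and the three-stage layout are assumptions, not the project's actual task structure. In this sketch the logger also stops on its own because its only sender is dropped; that may not hold in a codebase where other senders stay alive.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Runs a producer -> consumer -> results-logger chain and returns the
/// number of results the logger saw before its channel closed.
fn run_pipeline(count: u64) -> usize {
    let (event_tx, event_rx) = channel::<u64>();
    let (result_tx, result_rx) = channel::<String>();

    let producer = thread::spawn(move || {
        for i in 0..count {
            event_tx.send(i).unwrap();
        }
        // event_tx is dropped here, which closes the events channel.
    });

    let consumer = thread::spawn(move || {
        // Iteration ends only once the channel is closed AND empty,
        // so every queued event is processed before this thread exits.
        for event in event_rx {
            result_tx.send(format!("payment {event} recorded")).unwrap();
        }
        // result_tx is dropped here, which closes the results channel.
    });

    let logger = thread::spawn(move || result_rx.iter().count());

    producer.join().unwrap();
    consumer.join().unwrap();
    logger.join().unwrap()
}
```

With this shape no explicit shutdown signal is needed for the count-limited case: `run_pipeline(4)` logs all four results.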
Yes ofc, thanks for the walkthrough of options!
Hm yup, that's tricky. What if we add a sane (+ configurable) time limit to
We could probably account for this in our result logging, saying: hey, we tracked 4 payments and then the 5th one took too long so we don't know; update this wait parameter if you need to know this.
Putting that all together (mostly for my own good to understand if it works)
I think this covers all of our cases? |
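The configurable wait suggested above could look roughly like this with `Receiver::recv_timeout` from the standard library. The names `Outcome`, `wait_for_outcome`, and `demo_outcomes`, the `bool` payload, and the 50 ms limit are all hypothetical; the thread does not say where the timer would actually live.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::Duration;

/// What the results log would record for one payment.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The payment resolved in time; the flag says whether it succeeded.
    Recorded(bool),
    /// The payment did not resolve within the configured wait.
    Unknown,
}

/// Waits up to `wait` for a payment outcome instead of blocking forever.
fn wait_for_outcome(rx: &Receiver<bool>, wait: Duration) -> Outcome {
    match rx.recv_timeout(wait) {
        Ok(success) => Outcome::Recorded(success),
        // Either the limit elapsed or the sender went away: we cannot know
        // how the payment ended, so report that explicitly.
        Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => Outcome::Unknown,
    }
}

/// One payment that resolves immediately and one that never does.
fn demo_outcomes() -> (Outcome, Outcome) {
    let (tx, rx) = channel();
    tx.send(true).unwrap();
    let fast = wait_for_outcome(&rx, Duration::from_millis(50));
    let slow = wait_for_outcome(&rx, Duration::from_millis(50)); // nothing arrives
    drop(tx); // sender kept alive until here so the second wait times out
    (fast, slow)
}
```

Recording `Unknown` rather than a failure keeps the report honest in the situation raised earlier, where a slow payment may still succeed on the underlying nodes.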
Yeah I think that would work! Only question is where is the best place to put that timer. The |
No, definitely not, but we can make this a `select!` to avoid that?
One (related) thing that I haven't thought about is how those |
Ok new PR! I closed the old PR #230 and opened PR #235. This one does the following:
The new shutdown flow looks like this:
|
Describe the bug
`count` in defined activity produces one event fewer than expected. For example, this activity transfers 75 satoshi instead of the expected 100:
To Reproduce
Steps to reproduce the behavior:
`grace` (receiver): `"balance": "2625"`.
`sim-cli` (with more logs in `produce_events`, no functionality changed): `"balance": "2700"`.
As we see, the balance is 2700 but the expected balance is 2725.
There are 3 events (4 expected).
Config
I inspected the `produce_events` function but didn't see anything unusual.