simln-lib/refactor: fully deterministic produce events #277
base: main
Conversation
Opening in draft, still need to fix some issues.
btw don't worry about fixups until this is out of draft - when review hasn't properly started it's okay to just squash em!
Direction looking good here! Main comment is that I think we need to have a step where we replenish our heap by calling generate_payments again?
- If payment_count().is_none() we return a single payment from generate_payments
- For RandomPaymentActivity, this means we'll do one payment per node and then shut down?

Related to this is that we possibly don't want to queue up tons of events for when payment_count is defined (say, we want a million payments, we'll queue up a million items which is a bit of a memory waste). This probably isn't much of a big deal, because I'd imagine this use case is primarily for smaller numbers, but something to keep in mind as we address the above requirement.
Also would be good to rebase this early on to get to a more current base 🙏
The idea would be to generate all the payments at once, so the master task would dispatch the events.
Yes, in this case, only one payment is generated
Yes, right now it is working in this mode 🤔
Yes, you are right, maybe it would be better to create some batches of payments. I am going to try to come up with an alternative to reduce the memory waste. 🤔
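A minimal sketch of what batching could look like, assuming a hypothetical generate_batch helper and a fixed BATCH_SIZE; the names and numbers are illustrative, not from the codebase:

```rust
// Illustrative only: generate payments in fixed-size batches instead of all
// upfront, so a large payment_count does not materialize a huge queue.
const BATCH_SIZE: u64 = 1_000;

/// A single scheduled payment (hypothetical, simplified).
#[derive(Debug)]
struct Payment {
    index: u64,
    wait_ms: u64,
}

/// Produce the next batch of payments, bounded by the remaining count.
fn generate_batch(next_index: u64, remaining: u64) -> Vec<Payment> {
    let n = remaining.min(BATCH_SIZE);
    (0..n)
        .map(|i| Payment { index: next_index + i, wait_ms: 100 })
        .collect()
}

fn main() {
    let total: u64 = 5_000; // e.g. payment_count() == Some(5_000)
    let mut produced = 0;
    while produced < total {
        let batch = generate_batch(produced, total - produced);
        produced += batch.len() as u64;
        // ...dispatch the batch to the event queue here...
    }
    println!("produced {produced} payments in batches of {BATCH_SIZE}");
}
```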
Force-pushed from b06a289 to 1b3a21f
Hi @carlaKC, I've developed a new approach for the event generation system. The core idea is to centralize the random number generation to ensure deterministic outcomes for our simulations. Here's a breakdown of the design:
This design ensures that the wait times and final destinations are entirely deterministic across simulation runs. However, there is a new challenge with the non-deterministic order of thread execution.

The Determinism Challenge
While the values generated (wait times, destinations) are fixed if the random number generator is seeded, the order in which the executor threads request these values is not guaranteed. For example, if we have ex1 and ex2 executors, either one may request the next value from the central manager first depending on scheduling. This means that even though the sequence of random numbers from the central manager is the same, which executor consumes which number from that sequence is left to the operating system's scheduler, leading to variations in the overall simulation flow.

Proposed Solution for Execution Order
To achieve full simulation determinism, including the order of execution, I'm considering adding a tiny, randomized initial sleep time before each executor thread begins its main loop. While seemingly counter-intuitive, this jitter can effectively "break ties" in thread scheduling in a controlled, reproducible way when combined with a seeded random number generator. This would allow us to deterministically influence which thread acquires the next available random number from the central manager. WDYT?
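A rough sketch of the centralized-RNG idea, assuming a seeded ChaCha RNG behind a mutex (using the rand and rand_chacha crates is an assumption here, not necessarily what simln-lib does). The sequence of values is fixed by the seed, but which task consumes which value still depends on scheduling:

```rust
use std::sync::{Arc, Mutex};

use rand::{Rng, SeedableRng};
use rand_chacha::ChaCha8Rng;

/// Central manager handing out all random values for the simulation.
/// With a fixed seed the *sequence* of values is identical across runs.
struct RandomManager {
    rng: Mutex<ChaCha8Rng>,
}

impl RandomManager {
    fn new(seed: u64) -> Self {
        Self { rng: Mutex::new(ChaCha8Rng::seed_from_u64(seed)) }
    }

    /// Next wait time in milliseconds (illustrative range).
    fn next_wait_ms(&self) -> u64 {
        self.rng.lock().unwrap().gen_range(10..1_000)
    }
}

fn main() {
    let manager = Arc::new(RandomManager::new(42));

    // Two "executors" pulling values from the shared manager. The values
    // drawn are deterministic, but which thread gets which value depends
    // on the OS scheduler, which is the ordering problem described above.
    let handles: Vec<_> = (1..=2)
        .map(|id| {
            let m = Arc::clone(&manager);
            std::thread::spawn(move || println!("ex{id} got wait {}", m.next_wait_ms()))
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```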
Deleted previous comment - it had some misunderstandings. Why can't we keep the current approach of generating a queue of events and then replenish the queue when we run out of events? By generating all of our payment data in one place, we don't need to worry about thread execution order. I think that this can be as simple as pushing a new event to the queue every time we pop one? We might need to track some state for payment count (because we'll need to remember how many we've had), but for random activity it should be reasonable.
Rough sketch of what I was picturing (a code sketch of this follows below):
Queue up initial set of events:
Read from heap:
Instinct about this is:
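A hedged sketch of that flow, assuming a min-heap keyed on dispatch time and a hypothetical per-node generator (the real values would come from generate_payments and the centralized RNG). Pop the earliest event, dispatch it, then immediately generate and push that node's next payment so the heap never drains, subject to any payment count limit:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A scheduled payment event (hypothetical, simplified).
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Event {
    dispatch_at_ms: u64,
    node_id: usize,
}

/// Hypothetical stand-in for a node's activity generator.
fn next_event_for(node_id: usize, now_ms: u64) -> Event {
    // In the real design this would be a random wait time; here it is a
    // fixed increment purely for illustration.
    Event { dispatch_at_ms: now_ms + 100, node_id }
}

fn main() {
    let node_count: usize = 3;
    let payments_per_node: u32 = 2; // stand-in for payment_count()

    // Queue up the initial set of events, one per node.
    let mut heap: BinaryHeap<Reverse<Event>> = (0..node_count)
        .map(|node_id| Reverse(next_event_for(node_id, 0)))
        .collect();
    let mut sent = vec![0u32; node_count];

    // Read from the heap: dispatch the earliest event, then replenish by
    // pushing that node's next event until its payment count is reached.
    while let Some(Reverse(event)) = heap.pop() {
        println!("dispatch node {} at {}ms", event.node_id, event.dispatch_at_ms);
        sent[event.node_id] += 1;
        if sent[event.node_id] < payments_per_node {
            heap.push(Reverse(next_event_for(event.node_id, event.dispatch_at_ms)));
        }
    }
}
```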
Description
The goal of this PR is to achieve fully deterministic runs to get reproducible simulations
Changes
- Swap nodes: HashMap<PublicKey, Arc<Mutex<dyn LightningNode>>> for a BTreeMap. A HashMap does not maintain an order, which has an impact when the simulation is running, making the results unpredictable. Using a BTreeMap, the order of the nodes is always the same (a small illustration follows below).
- dispatch_producers acts as a master task, generating all the payments of the nodes, getting the random destinations, and only then spawning a thread for producing the events (produce_events).

Addresses #243
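As an illustration of the ordering point (a minimal sketch with placeholder keys, not the actual simln-lib types): iterating a BTreeMap always visits keys in sorted order, whereas HashMap iteration order can vary between runs:

```rust
use std::collections::{BTreeMap, HashMap};

fn main() {
    // Stand-in for node public keys; the real map is keyed by PublicKey.
    let keys = ["03aa", "02bb", "03cc"];

    let hash_map: HashMap<&str, &str> = keys.iter().map(|k| (*k, "node")).collect();
    let btree_map: BTreeMap<&str, &str> = keys.iter().map(|k| (*k, "node")).collect();

    // HashMap: iteration order is unspecified and can differ across runs,
    // which makes anything derived from it (e.g. spawn order) unpredictable.
    for (k, _) in &hash_map {
        println!("hash  -> {k}");
    }

    // BTreeMap: iteration always follows sorted key order, so every run
    // walks the nodes in the same sequence.
    for (k, _) in &btree_map {
        println!("btree -> {k}");
    }
}
```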