Drop node simulation in replay #48

Open
wants to merge 4 commits into
base: main

Conversation

goran-ethernal
Collaborator

@goran-ethernal goran-ethernal commented Apr 5, 2022

The replay messages feature now saves the actions that occurred during a fuzz run to a separate .flow file so that they can be simulated on replay as well. For now, only Drop Node actions are saved to the .flow file and simulated in replay.

When running the fuzz-run command, messages are saved to the messages.flow file, while the other metadata (node names, actions, last sequences) is saved to the metaData.flow file inside the SavedState folder.

The metaData.flow file contains:

  • node names - the names of the nodes that were created on the cluster.
  • last sequences - the sequence each node was in when the cluster was stopped. Replay needs this to know exactly when to stop a node from executing (this way we know that the node has reached its end and needs to be stopped).
  • actions - the drop node actions that occurred in the fuzz run, as well as their revert actions.

The format of the messages.flow file remains the same. The metaData.flow file looks as follows:

["NODE_0","NODE_1","NODE_2","NODE_3","NODE_4"]
{"actionType":"LastSequence","data":"NODE_0","sequence":10,"round":0}
{"actionType":"LastSequence","data":"NODE_1","sequence":11,"round":0}
{"actionType":"LastSequence","data":"NODE_2","sequence":11,"round":0}
{"actionType":"LastSequence","data":"NODE_3","sequence":11,"round":0}
{"actionType":"LastSequence","data":"NODE_4","sequence":11,"round":0}
{"actionType":"DropNode","data":"NODE_3","sequence":5,"round":0}
{"actionType":"RevertDropNode","data":"NODE_3","sequence":6,"round":0}

The MetaData struct holds all the data that is stored in the metaData.flow file. When the file is loaded, replay uses the actionType property of each entry to decide in which map to store the given action.

The replay-messages command now takes the path of the folder where the messages.flow and metaData.flow files are stored, e.g.:
go run ./e2e/fuzz/cmd/main.go replay-messages -filesDirectory=../SavedData

NOTE: both files are needed for replay to execute.

Simulating node actions and stopping nodes once they are done with execution is now handled in replay_node_execution.go.
Once replay has started, a node that reads its next message from the queue can recognize whether it needs to shut down properly or to simulate a drop.
A node needs to be stopped when it has nothing to read from the queue and it has reached the sequence and round defined in its LastSequence entry in the metaData.flow file.
A node needs to be dropped when it has nothing to read from the queue for the given sequence and round, but that sequence and round differ from the ones defined in its LastSequence entry in the metaData.flow file.
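The two rules above amount to a small decision function. The sketch below restates them in Go; the `Position` and `NodeDecision` types and the `Decide` function are hypothetical names for this example and do not come from replay_node_execution.go.

```go
package main

// Position identifies a point in a node's execution. Hypothetical type.
type Position struct {
	Sequence uint64
	Round    uint64
}

// NodeDecision is what a node should do after checking its queue.
type NodeDecision int

const (
	Continue NodeDecision = iota // a message is available, keep executing
	Stop                         // queue is empty at the node's LastSequence
	Drop                         // queue is empty before the LastSequence
)

// Decide applies the rules from the description: with nothing left to read,
// a node stops if it is at its recorded LastSequence position and is
// otherwise treated as dropped.
func Decide(hasMessage bool, current, last Position) NodeDecision {
	if hasMessage {
		return Continue
	}
	if current == last {
		return Stop
	}
	return Drop
}
```

With the sample metaData.flow above, NODE_3 has a LastSequence of sequence 11, round 0, so an empty queue at sequence 5, round 0 yields `Drop`, while an empty queue at sequence 11, round 0 yields `Stop`.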

Before starting the cluster, replayNodeExecutionHandler starts a goroutine that listens for three kinds of events: a node needs to be dropped, a node is done with execution, or a dropped node needs to be restarted because the cluster has reached the sequence in which the drop was reverted, as defined by the RevertDropNode action in the metaData.flow file.

Once all nodes are done with execution, replayNodeExecutionHandler stops the cluster and the command exits.
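The listening loop described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Event` and `ClusterOps` types, the single events channel, and keeping one revert sequence per node are simplifications, and the actual replayNodeExecutionHandler may be structured differently.

```go
package main

// EventKind distinguishes the notifications the handler listens for.
type EventKind int

const (
	DropEvent            EventKind = iota // a node must be dropped
	DoneEvent                             // a node finished its execution
	SequenceReachedEvent                  // the cluster reached a sequence
)

// Event is a hypothetical notification sent to the handler.
type Event struct {
	Kind     EventKind
	Node     string
	Sequence uint64
}

// ClusterOps is a hypothetical set of callbacks into the cluster.
type ClusterOps struct {
	StopNode    func(name string)
	StartNode   func(name string)
	StopCluster func()
}

// RunHandler starts a goroutine that drops nodes, restarts dropped nodes once
// their revert sequence is reached, and stops the cluster when every node is
// done. The returned channel is closed when the goroutine exits.
func RunHandler(ops ClusterOps, nodes []string, revertAt map[string]uint64, events <-chan Event) <-chan struct{} {
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		done := map[string]bool{}
		dropped := map[string]bool{}
		for ev := range events {
			switch ev.Kind {
			case DropEvent:
				ops.StopNode(ev.Node)
				dropped[ev.Node] = true
			case SequenceReachedEvent:
				for node := range dropped {
					if seq, ok := revertAt[node]; ok && ev.Sequence >= seq {
						ops.StartNode(node)
						delete(dropped, node)
					}
				}
			case DoneEvent:
				ops.StopNode(ev.Node)
				done[ev.Node] = true
				if len(done) == len(nodes) {
					ops.StopCluster()
					return
				}
			}
		}
	}()
	return finished
}
```

A caller would start the cluster after `RunHandler` returns and then block on the returned channel, which mirrors the command exiting once all nodes are done.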

@goran-ethernal goran-ethernal marked this pull request as ready for review April 5, 2022 09:04
Contributor

@ferranbt ferranbt left a comment

A lot of changes are required to support node drop. Do you guys have any insights on whether we can optimize any part?

e2e/fuzz/replay/replay_message_command.go
e2e/actions.go (outdated)
@goran-ethernal
Collaborator Author

To answer your question about optimizing: about 80% of these changes just reorganize code into different files and structures. It was all clustered in the ReplayMessageNotifier, so we tried to separate the logic for reading, writing, and node execution into different files.

consensus.go (outdated)
e2e/actions.go (outdated)
e2e/fuzz/README.md (outdated)
e2e/fuzz/README.md (outdated)
e2e/fuzz/README.md (outdated)
e2e/fuzz/replay/replay_message.go (outdated)
e2e/fuzz/replay/replay_message_reader.go
e2e/actions.go (outdated)
@sonarcloud

sonarcloud bot commented Oct 26, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
13 Code Smells (rating A)

No coverage information
0.0% duplication


3 participants