Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowing to restart docker nodes during simulation #38

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lhoff94
Copy link

@lhoff94 lhoff94 commented Aug 30, 2022

This PR adds a command to the DockerNode allows restarting of a container during simulation e.g. by a workflow.

Using this function in a workflow enables the following use cases:

  • Testing the behavior of other components if a container restarts
  • Testing the restarting behavior of the component that restarts if other nodes are up and running
  • Changing the image of the container during simulation e.g. to a newer version or a build with a different configuration
  • Changing the parameters of the container e.g. connected volumes, exposed ports, environment variables during simulation

I have to admit lines 249-253 are a bit hacky. They are a workaround for a strange behavior where I couldn't get to the root cause.
If a DockerNode is stopped by calling .stop_docker_container() the veth pair that belongs to it is deleted as well (as far as I can tell by the operating system since there is no call to interface.remove_veth_pair()). The strange thing is that it only happens if the DockerNode is the last one defined in the Scenario. If it is not the last one, the veth_pair survives but is still connected to the namespace of the old container and therefore needs to be deleted and recreated as well.
I tried to find the cause for this behavior but couldn't identify if it's something the kernel or the docker daemon does. However, the solution that is implemented in this PR worked for me for a couple of months during which is used this feature extensively.

@lpirl
Copy link

lpirl commented Sep 1, 2022

LGTM apart from some nit picks.

Sure we cannot come up with a cleaner solution for 249-253? @arneboockmeyer, can the deletion of the veth pair be related to some kind of automatic clean up?

@lhoff94
Copy link
Author

lhoff94 commented Sep 1, 2022

I hacked up an example scenario based on the existing "basic_example.py" to replicate the behaviour that I described here:
https://gist.github.com/lhoff94/ceb72b2b16e2f82359d487b8b204a5cb
The "restart_docker_container" method is embedded into the scenario so that it works with the latest Marvis release (without the commit from this PR).
I also added the mavis log with the exception as a second file to the gist. As you can see, the first container restarts as expected but the second doesn't because the veth can't be deleted because it's not there anymore.

@lpirl
Copy link

lpirl commented Jan 4, 2023

@arneboockmeyer will have a look :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants