Add containers for Toil appliance (resolves #159) #160
TODOs:
Error message thrown when launching the container with mesos-slave as the entry point:
14:11:09 toil-appliance $ docker run -it d67def2f0ef5 --master=172.17.0.2:5050 --work_dir=/tmp/
@cket looks like this is related to MESOS-3498. I will look into it more.
@cket it appears a fix may be to …
@fnothaft that fix seems to work; I included it in the PR I just filed.
Yahtzee! I'll merge that in a sec. |
The autoscaling code will be responsible for provisioning worker nodes, but we need a separate leader-provisioning script to start the Toil leader on EC2. Since Toil will run on the leader node, this provisioning can't be done as part of the workflow. Instead, @hannes-ucsc and I propose that this script be included as part of Toil with the AWS extra; it will spin up an instance, propagate the AWS credentials, and start the toil-leader container. If the user doesn't wish to pip install Toil, they can alternatively use the script from within the Toil appliance to launch the leader.
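For illustration, a minimal sketch of what such a leader-launching script might do, assuming boto3 and a CoreOS-style AMI with Docker preinstalled; the AMI ID, instance type, key name, and container image below are all placeholders, not the actual Toil implementation:

```python
# Hypothetical sketch of launching the Toil leader on EC2; all names here are
# placeholders, not the real Toil bootstrapper.
import boto3

# Placeholder user data: start the leader container on first boot.
USER_DATA = """#!/bin/bash
docker run -d --net=host example/toil-leader:latest
"""

def launch_leader(ami_id, instance_type="m3.large", key_name="my-key"):
    ec2 = boto3.resource("ec2")
    instances = ec2.create_instances(
        ImageId=ami_id,              # e.g. a CoreOS AMI with Docker preinstalled
        InstanceType=instance_type,
        KeyName=key_name,
        MinCount=1,
        MaxCount=1,
        UserData=USER_DATA,          # executed on boot, starts the leader container
    )
    return instances[0]
```

Credential propagation (or an IAM role, as discussed below) would still be needed so the leader can talk to AWS.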
+1 to that approach
@hannes-ucsc and I also discussed using IAM roles for the worker nodes and how to ensure that block device mapping is done properly for instances with ephemeral volumes. Unfortunately, the AMI we are using doesn't mount any volumes for us and will attach at most one volume if present; instances with more than one block device are left to us to attach and RAID. Block mapping will be handled by the leader prior to launching the instances, and I wrote a script, to be included in the user data, that discovers the devices, RAIDs them, and mounts them appropriately.
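A rough sketch of what such a user-data script could look like, written in Python for illustration; the device glob, RAID level, and filesystem are assumptions, not the script that was actually written:

```python
# Illustrative only: discover ephemeral devices, stripe them with RAID-0 if
# there is more than one, and mount the result. Paths and options are assumptions.
import glob
import subprocess

def setup_ephemeral_storage(mount_point="/mnt/ephemeral"):
    # On many instance types, ephemeral volumes show up as /dev/xvdb, /dev/xvdc, ...
    devices = sorted(glob.glob("/dev/xvd[b-z]"))
    if not devices:
        return  # nothing to do on instances without ephemeral volumes
    if len(devices) == 1:
        volume = devices[0]
    else:
        # Combine multiple ephemeral devices into a single RAID-0 array.
        volume = "/dev/md0"
        subprocess.check_call(
            ["mdadm", "--create", volume, "--run", "--level=0",
             "--raid-devices=%d" % len(devices)] + devices)
    subprocess.check_call(["mkfs.ext4", volume])
    subprocess.check_call(["mkdir", "-p", mount_point])
    subprocess.check_call(["mount", volume, mount_point])
```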
Another thing that we tentatively settled on is using CoreOS. It provides VM images with Docker for AWS, Azure, and Google Compute. We call the program that is used to launch the leader the "bootstrapper", and I propose that we have distinct bootstrappers for each cloud. @cket, with this design we can use IAM roles for the leader, too; no need to copy credentials. CGCloud has code for most of this already, so the strategy should be to move sharable functionality from cgcloud-core/src/…/box.py to cgcloud-lib/src/…/ec2.py and reuse that in Toil. This approach was useful for me with the initial provisioner implementation. cgcloud-lib has no exotic dependencies, so there is no detriment to Toil depending on it.
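For illustration, here is roughly how attaching an IAM role at launch avoids credential copying, again sketched with boto3; the profile name and AMI ID are hypothetical:

```python
# Sketch: attach an IAM instance profile at launch so the leader obtains
# temporary credentials from the instance metadata service; no keys to copy.
# "toil-leader" is a hypothetical profile name, not a real Toil artifact.
import boto3

ec2 = boto3.resource("ec2")
ec2.create_instances(
    ImageId="ami-xxxxxxxx",   # placeholder CoreOS AMI
    InstanceType="m3.large",
    MinCount=1,
    MaxCount=1,
    IamInstanceProfile={"Name": "toil-leader"},
)
```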
+1 from me. There's a bunch of CRs missing at EOF. This is on the critical path towards the next Toil release. @cket, I propose that you hijack this PR (close this one, continue the branch with your own changes, and open a new PR with the result) instead of asking @fnothaft to merge your PR against his. Make sure you don't squash commits from different authors so that authorship is retained in the history. Would that be OK with you, @fnothaft? Any other process that gets this merged tomorrow (Wednesday) would be fine with me.
OK by me.
Squashing is OK too.
continued in #176
First pass at putting together Toil appliance containers for DataBiosphere/toil#1088 slash #159. CC @cket. Still needs some TLC (`make test` does nothing).