
Add containers for Toil appliance (resolves #159) #160

Closed

Conversation

@fnothaft (Contributor) commented Aug 2, 2016

First pass at putting together Toil appliance containers for DataBiosphere/toil#1088 slash #159. CC @cket. Still needs some TLC:

  • Need to come up with a set of tests that we can run (make test does nothing)
  • I need to test building this
  • We probably want this to point at unstable Toil releases for now; currently it's pinned to 3.3.0 (see the sketch below).
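
For context, that pin is just the install step inside the appliance image. A minimal sketch of what it might look like (the package extras and exact commands are assumptions; the actual Dockerfile isn't shown in this thread):

# Hypothetical install step from the appliance image build (extras assumed).
pip install "toil[aws,mesos]==3.3.0"
# Pointing at unstable releases would instead install from the Toil git
# repository (e.g. a development branch) rather than a pinned PyPI version.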

@fnothaft (Contributor, Author) commented Aug 3, 2016

TODOs:

  • @cket to PR with small fixes against this branch
  • @fnothaft to circle with @briandoconnor about single vs. multiple docker images
  • @cket to test on cluster
  • @cket to post error message (read only file system)
  • @fnothaft to look at read only file system error
  • @cket to write unit test --> launch leader/worker containers, run small toil script
  • @fnothaft determine if/when we should implement user data for Mesos master discovery
  • @cket to write provisioner script to launch toil leader instance
  • @cket to ensure Mesos indicates whether the node is preemptible or not (see the sketch below)
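
On Mesos master discovery and on surfacing preemptibility, one possible shape for the worker entry point, purely as a sketch (the metadata endpoint, attribute name, and paths are assumptions, not decisions from this thread):

#!/bin/bash
# Illustrative worker entry-point sketch, not part of this PR.
# Read the Mesos master address from EC2 user data and advertise whether this
# node is preemptible to frameworks as a Mesos slave attribute.
MASTER=$(curl -s http://169.254.169.254/latest/user-data)   # e.g. "10.0.0.5:5050"
PREEMPTIBLE=${PREEMPTIBLE:-false}

exec mesos-slave \
    --master="$MASTER" \
    --work_dir=/var/lib/mesos \
    --attributes="preemptible:$PREEMPTIBLE"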

@cket (Contributor) commented Aug 3, 2016

Error message thrown when launching container with mesos-slave as the entry point:

14:11:09 toil-appliance $ docker run -it d67def2f0ef5 --master=172.17.0.2:5050 --work_dir=/tmp/
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0803 21:11:15.851196 1 main.cpp:243] Build: 2016-07-27 20:23:20 by ubuntu
I0803 21:11:15.852200 1 main.cpp:244] Version: 1.0.0
I0803 21:11:15.852464 1 main.cpp:247] Git tag: 1.0.0
I0803 21:11:15.852957 1 main.cpp:251] Git SHA: c9b70582e9fccab8f6863b0bd3a812b5969a8c24
I0803 21:11:15.858079 1 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
Failed to create a containerizer: Could not create MesosContainerizer: Failed to create launcher: Failed to create Linux launcher: Failed to create root cgroup /sys/fs/cgroup/freezer/mesos: Failed to create directory '/sys/fs/cgroup/freezer/mesos': Read-only file system

@fnothaft (Contributor, Author) commented Aug 3, 2016

@cket looks like this is related to MESOS-3498. I will look into it more.

@fnothaft (Contributor, Author) commented Aug 3, 2016

@cket it appears a fix may be to export MESOS_LAUNCHER=posix, as per MESOS-3793. We should add this to the Dockerfile.
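
For anyone hitting the same failure, the workaround amounts to setting that variable in the container's environment, e.g. by mirroring the command from the log above (Mesos picks up MESOS_-prefixed environment variables as flags, so this is equivalent to adding ENV MESOS_LAUNCHER=posix to the Dockerfile):

# Force the POSIX launcher so mesos-slave doesn't try to create cgroups on a
# read-only filesystem; image ID and arguments are the ones from the log above.
docker run -it -e MESOS_LAUNCHER=posix d67def2f0ef5 --master=172.17.0.2:5050 --work_dir=/tmp/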

@cket (Contributor) commented Aug 3, 2016

@fnothaft that fix seems to work; I included it in the PR I just filed.

@fnothaft (Contributor, Author) commented Aug 3, 2016

> @fnothaft that fix seems to work; I included it in the PR I just filed.

Yahtzee! I'll merge that in a sec.

@cket (Contributor) commented Aug 24, 2016

The autoscaling code will be responsible for provisioning worker nodes, but we need a separate leader-provisioning script to start the Toil leader on EC2. Since Toil will run on the leader node, this provisioning can't be done as part of the workflow. Instead, @hannes-ucsc and I propose that this script be included as part of Toil with the AWS extra; it will spin up an instance, propagate the AWS credentials, and start the toil-leader container.

If the user doesn't wish to pip install Toil, they can alternatively run the script from within the Toil appliance to launch the leader.
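
Roughly, the AWS flavor of that script would boil down to something like the following (a sketch only; every identifier is a placeholder, and the real script would likely use boto rather than the AWS CLI):

# Illustrative only: launch the leader instance and have it start the
# toil-leader container on first boot via user data.
aws ec2 run-instances \
    --image-id ami-00000000 \
    --instance-type m3.large \
    --key-name my-key \
    --iam-instance-profile Name=toil-leader-profile \
    --user-data file://start-leader.sh

# where start-leader.sh would contain roughly:
#   #!/bin/bash
#   docker run --net=host <toil-leader-image>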

@fnothaft (Contributor, Author) commented:

+1 to that approach

@cket (Contributor) commented Aug 25, 2016

@hannes-ucsc and I also discussed using IAM roles for the worker nodes and how to ensure that the block-device mapping is done properly for instances with ephemeral volumes. Unfortunately, the AMI we are using doesn't mount any volumes for us and will attach at most one volume if present; for instances with more than one block device, attaching and RAIDing the devices is left up to us.

Block-device mapping will be handled by the leader prior to launching the instances, and I wrote a script, to be included in the user data, that will discover the devices, RAID them, and mount them appropriately (see the sketch below).
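
The script itself isn't shown in this thread, but the general shape of such a user-data step is roughly the following (device naming, RAID level, filesystem, and mount point are all assumptions):

#!/bin/bash
# Illustrative sketch: discover the ephemeral block devices, stripe them into
# a single RAID 0 array if there is more than one, then format and mount.
set -e
devices=$(ls /dev/xvd[b-z] 2>/dev/null || true)
count=$(echo "$devices" | wc -w)
if [ "$count" -eq 0 ]; then
    exit 0                       # no ephemeral devices on this instance type
elif [ "$count" -eq 1 ]; then
    target="$devices"            # single device: use it directly
else
    target=/dev/md0              # multiple devices: assemble into one array
    mdadm --create "$target" --run --level=0 --raid-devices="$count" $devices
fi
mkfs.ext4 "$target"
mkdir -p /mnt/ephemeral
mount "$target" /mnt/ephemeral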

@hannes-ucsc (Contributor) commented:

Another thing that we tentatively settled on is using CoreOS. It provides VM images with Docker for AWS, Azure and Google Compute.

We call the program that is used to launch the leader the "bootstrapper". I propose that we have distinct bootstrappers for each cloud, e.g. toil-azure, toil-google and toil-aws, each one a separate entry point. As CJ mentioned, the bootstrappers should be part of Toil. Since launching a leader VM is very similar to launching a worker VM, the bootstrapping code should live in the provisioner. We'll add a createLeader method to the provisioner API.

@cket, with this design we can use IAM roles for the leader, too. No need to copy credentials. CGCloud has code for most of this already, so the strategy should be to move sharable functionality from cgcloud-core/src/…/box.py to cgcloud-lib/src/…/ec2.py and reuse that in Toil. This approach was useful for me with the initial provisioner implementation. cgcloud-lib has no exotic dependencies, so there is no detriment to Toil depending on it.

@hannes-ucsc (Contributor) commented:

+1 from me. There's a bunch of CRs missing at EOF. This is on the critical path towards the next Toil release. @cket, I propose that you hijack this PR (close this one, continue the branch with your own changes, and open a new PR with the result) instead of asking @fnothaft to merge your PR against his. Make sure you don't squash commits from different authors so that authorship is retained in the history. Would that be OK with you, @fnothaft? Any other process that gets this merged tomorrow (Wednesday) would be fine with me.

@fnothaft (Contributor, Author) commented Sep 7, 2016

OK by me.

> Make sure you don't squash commits from different authors so that authorship is retained in the history.

Squashing is OK too.

@cket (Contributor) commented Sep 7, 2016

Continued in #176.
