Working with data on Amazon Web Services

Bill Mills edited this page Dec 22, 2015 · 2 revisions

AutoQC's Docker image (see the readme) makes it easy to set up and run AutoQC on AWS's EC2 cloud. One slightly fraught process, however, is managing the large amounts of data that we'll want in full analyses for the final data product. The cheapest and fastest way to handle this is S3, Amazon's storage service. Amazon has plenty of tutorials on getting your files onto S3; in this tutorial, we'll assume you've got your data in a bucket on S3, and show you how to copy it over to EC2 and mount it inside your Docker container.

1. Copying from S3 to EC2

At your EC2 prompt, run:

aws configure

You'll be asked for the Access Key ID and Secret Access Key you got when setting up your security identity for your AWS account. You'll also be asked for a default region name and a default output format; it's not necessary to enter anything for those, so just press Enter. Next, if your S3 bucket is called my-bucket and you want to copy its contents to a new directory called my-local-data, do:

mkdir my-local-data
aws s3 sync s3://my-bucket my-local-data

And that's it! The contents of your S3 bucket will be copied over to the directory you specified. Note that S3->EC2 transfers within the same AWS region are free, and very fast. Also note that sync only copies in one direction, from its first argument to its second. To get new files from your local EC2 directory (like, for example, the output of your analysis) into your S3 bucket, run sync again with the arguments swapped (aws s3 sync my-local-data s3://my-bucket); you can then download them from S3 at a later time.
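Since the direction of a sync is decided entirely by argument order, it can help to wrap the upload in a small function that prints the command before running it. The following is a minimal sketch, not part of AutoQC: push_results and the PREVIEW variable are hypothetical names, and my-bucket and my-local-data are the placeholders used above.

```shell
#!/bin/sh
# Sketch: push new files from the EC2 directory back to S3.
# BUCKET and LOCAL_DIR are the placeholder names from this tutorial.
BUCKET="my-bucket"
LOCAL_DIR="my-local-data"

# push_results is a hypothetical helper, not an AWS or AutoQC command.
# With PREVIEW=1 (the default here) it only prints the command;
# with PREVIEW=0 it actually runs the upload.
push_results() {
    if [ "${PREVIEW:-1}" = "1" ]; then
        echo "aws s3 sync $LOCAL_DIR s3://$BUCKET"
    else
        # Source first, destination second: local directory -> bucket.
        aws s3 sync "$LOCAL_DIR" "s3://$BUCKET"
    fi
}

push_results
```

Run it once as-is to check the command looks right, then call PREVIEW=0 push_results to perform the upload. The AWS CLI also accepts --dryrun on aws s3 sync, which lists the files that would be transferred without copying anything.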

2. Exposing data to the Docker container

Now that we've got our data from S3, we want to make it available inside the Docker container. To do this, we mount our data directory on the EC2 instance onto a directory inside the container. Launch your Docker container with the following command, replacing the paths as appropriate:

sudo docker run -v /path/to/your/data/on/EC2:/path/to/your/data/in/docker/container -i -t iquod/autoqc /bin/bash

Now the contents of /path/to/your/data/on/EC2 will be visible in /path/to/your/data/in/docker/container inside your container. Also note that anything saved to /path/to/your/data/in/docker/container in your container will still be available in /path/to/your/data/on/EC2 after you exit the container. This is probably the easiest way to get results out of your container and back into the 'real world'.
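As a concrete illustration of the mount described above, here is a minimal sketch that fills in the placeholder paths. It assumes the data was synced to my-local-data in the current directory, as in step 1; the /data mount point, the run_autoqc function, and the PREVIEW variable are hypothetical choices for this example. The host side of -v should be an absolute path, since Docker treats a bare name there as a named volume rather than a directory.

```shell
#!/bin/sh
# Sketch: mount the synced data directory into the AutoQC container.
# Host path must be absolute, so build it from the current directory.
HOST_DIR="$(pwd)/my-local-data"
# /data is a hypothetical mount point; any absolute container path works.
CONTAINER_DIR="/data"

# run_autoqc is a hypothetical helper, not part of AutoQC or Docker.
# With PREVIEW=1 (the default here) it only prints the command;
# with PREVIEW=0 it launches the container.
run_autoqc() {
    if [ "${PREVIEW:-1}" = "1" ]; then
        echo "sudo docker run -v $HOST_DIR:$CONTAINER_DIR -i -t iquod/autoqc /bin/bash"
    else
        sudo docker run -v "$HOST_DIR:$CONTAINER_DIR" -i -t iquod/autoqc /bin/bash
    fi
}

run_autoqc
```

Once inside the container, listing /data should show the files copied from S3, and anything written there will appear in my-local-data on the EC2 instance.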