Working with data on Amazon Web Services
AutoQC's Docker image (see the readme) makes it easy to set up and run AutoQC on AWS's EC2 cloud. One slightly fraught process, however, is managing the large amounts of data needed for full analyses for the final data product. The cheapest and fastest way to handle this is to use S3, Amazon's storage service. Amazon has plenty of tutorials on getting your files onto S3; in this tutorial, we'll assume you've got your data in a bucket on S3, and show you how to copy it over to EC2 and get it mounted inside your Docker container.
At your EC2 prompt, do
aws configure
You'll be asked for the Access Key ID and Secret Access Key you got when setting up your security identity for your AWS account. There will be some other questions as well; it's not necessary to enter anything for those. Next, if your S3 bucket is called my-bucket, and you want to copy its contents to a new directory called my-local-data, do:
mkdir my-local-data
aws s3 sync s3://my-bucket my-local-data
And that's it! The contents of your S3 bucket will be copied over to the directory you specified. Note that transfers between S3 and EC2 in the same region are free, and very fast. Also note that sync only copies in the direction given, from the first argument to the second. To send new files in your local EC2 directory (for example, the output of your analysis) back to your S3 bucket, run the sync again with the arguments reversed (aws s3 sync my-local-data s3://my-bucket); you can then download them from S3 at a later time.
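Because the direction of a sync depends only on the order of its arguments, it can help to wrap each direction in a small shell function. The following is a minimal sketch, not part of AutoQC: the function names pull_from_s3 and push_to_s3 and the AWS_CMD variable are hypothetical, and my-bucket and my-local-data are the same placeholders used above.

```shell
# Command used to invoke the AWS CLI. Setting AWS_CMD=echo prints the
# command line instead of running it (AWS_CMD is a made-up name).
AWS_CMD="${AWS_CMD:-aws}"

# pull_from_s3 BUCKET DIR [extra sync options]
# Copy the contents of the bucket down into DIR, creating DIR if needed.
pull_from_s3() {
    bucket=$1; dir=$2; shift 2
    mkdir -p "$dir" && $AWS_CMD s3 sync "s3://$bucket" "$dir" "$@"
}

# push_to_s3 DIR BUCKET [extra sync options]
# Copy new or changed files in DIR up to the bucket.
push_to_s3() {
    dir=$1; bucket=$2; shift 2
    $AWS_CMD s3 sync "$dir" "s3://$bucket" "$@"
}
```

For example, pull_from_s3 my-bucket my-local-data fetches the data, and push_to_s3 my-local-data my-bucket --dryrun lists what would be uploaded without transferring anything; drop --dryrun to do the upload.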
Now that we've got our data from S3, we want to make it available inside the Docker container. To do this, we mount our data directory on EC2 onto a directory inside the container. Launch your Docker container with the following command, replacing the paths as appropriate:
sudo docker run -v /path/to/your/data/on/EC2:/path/to/your/data/in/docker/container -i -t iquod/autoqc /bin/bash
Now the contents of /path/to/your/data/on/EC2 will be visible in /path/to/your/data/in/docker/container inside your container; also note that things saved to /path/to/your/data/in/docker/container in your container will be available in /path/to/your/data/on/EC2 after you close your container. This is probably the easiest way to get results out of your container and back into the 'real world'.
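The launch command above can also be wrapped in a function so the mount paths are passed as arguments. This is only a sketch under stated assumptions: run_autoqc and DOCKER_CMD are hypothetical names, and the function resolves the host directory to an absolute path first, since Docker treats a host path without a leading slash as a volume name rather than a directory.

```shell
# Command used to invoke Docker. Setting DOCKER_CMD=echo prints the
# command line instead of running it (DOCKER_CMD is a made-up name).
DOCKER_CMD="${DOCKER_CMD:-sudo docker}"

# run_autoqc HOST_DIR CONTAINER_DIR
# Start an interactive AutoQC container with HOST_DIR mounted at CONTAINER_DIR.
run_autoqc() {
    # Resolve HOST_DIR to an absolute path; fail if it does not exist.
    host_dir=$(cd "$1" && pwd) || return 1
    $DOCKER_CMD run -v "$host_dir:$2" -i -t iquod/autoqc /bin/bash
}
```

For example, run_autoqc my-local-data /data would expose the synced directory as /data inside the container (/data is an arbitrary choice here).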