EFS: able to mount but problems to resolve #73

Open
ghost opened this issue Sep 15, 2016 · 10 comments

ghost commented Sep 15, 2016

This is on the latest Amazon Linux with docker 1.11.2
docker-volume-netshare :: Version: '0.20' - Built: '2016-08-28T20:15:48Z'

  • Running a container with an EFS volume mounts it on the host but not in the container. To make it work I have to edit /etc/init.d/docker so that the daemon does not start in a separate mount namespace (i.e. removing "$unshare" -m). Is there a way to make it work without editing /etc/init.d/docker and without restarting the service?
  • Looking at the logs everything works fine for the lifetime of the container but when the container is stopped and the volume unmounted I get these entries:
INFO[20699] Unmounting volume 10.x.x.x: from /var/lib/docker-volumes/netshare/efs/fs-xxxxxxxxx
DEBU[0051] Attempting to resolve: 10.x.x.x
ERRO[20713] Error during resolve: Response was empty
INFO[0051] Removing un-managed volume

Why is it attempting to resolve the IP when unmounting?

  • Mounting the efs volume to a plain Ubuntu container works fine
    example: docker run -it --volume-driver=efs -v fs-xxxxxxxxx:/mount ubuntu /bin/bash

However mounting the same volume on the same host with a different container returns an error:

docker run -it --volume-driver=efs -v fs-xxxxxxxxx:/test 8cde90f4491e /bin/bash
docker: Error response from daemon: VolumeDriver.Mount: exit status 32.

What I am seeing in the daemon logs shows the name is not built correctly so it doesn't resolve:

DEBU[0172] Entering Get: {553a1a707b43085f72f5c0fceb4718452657fea3a94e2640b031dab8cc0aef59 map[]}
DEBU[0172] Entering Create: name: 553a1a707b43085f72f5c0fceb4718452657fea3a94e2640b031dab8cc0aef59, options map[]
DEBU[0172] Create volume -> name: 553a1a707b43085f72f5c0fceb4718452657fea3a94e2640b031dab8cc0aef59, map[]
DEBU[0172] Host path for 553a1a707b43085f72f5c0fceb4718452657fea3a94e2640b031dab8cc0aef59 is at /var/lib/docker-volumes/netshare/efs/553a1a707b43085f72f5c0fceb4718452657fea3a94e2640b031dab8cc0aef59
DEBU[0172] Entering Get: {5b396482e5fa1d37cd5dc91ae7d9fd1f8de5ef5c8dd0ba37af307dbed636df0b map[]}
DEBU[0172] Entering Create: name: 5b396482e5fa1d37cd5dc91ae7d9fd1f8de5ef5c8dd0ba37af307dbed636df0b, options map[]
DEBU[0172] Create volume -> name: 5b396482e5fa1d37cd5dc91ae7d9fd1f8de5ef5c8dd0ba37af307dbed636df0b, map[]
DEBU[0172] Host path for 5b396482e5fa1d37cd5dc91ae7d9fd1f8de5ef5c8dd0ba37af307dbed636df0b is at /var/lib/docker-volumes/netshare/efs/5b396482e5fa1d37cd5dc91ae7d9fd1f8de5ef5c8dd0ba37af307dbed636df0b
DEBU[0172] Attempting to resolve: us-east-1a.553a1a707b43085f72f5c0fceb4718452657fea3a94e2640b031dab8cc0aef59.efs.us-east-1.amazonaws.com
ERRO[0172] Error during resolve: Couldn't resolve name 'us-east-1a.553a1a707b43085f72f5c0fceb4718452657fea3a94e2640b031dab8cc0aef59.efs.us-east-1.amazonaws.com.' : dns: bad rdata
INFO[0172] Mounting EFS volume us-east-1a.553a1a707b43085f72f5c0fceb4718452657fea3a94e2640b031dab8cc0aef59.efs.us-east-1.amazonaws.com: on /var/lib/docker-volumes/netshare/efs/553a1a707b43085f72f5c0fceb4718452657fea3a94e2640b031dab8cc0aef59

What can cause the volumeID to be changed to 3a48cd86e6cfca49b1015dc61789eb2c3e41624e81a9d75d2f8d494dc0d3bb7f? And why would it happen only with some containers?
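
For reference, the two kinds of names visible in the logs above can be told apart mechanically. The following is a minimal sketch, not netshare code: the function names classify_volume_name and efs_az_hostname are hypothetical, it assumes an EFS ID is "fs-" followed by hex digits, and it treats any 64-character hex string as a name Docker generated itself (for example for an anonymous volume declared by the image), which is a guess that nobody in this thread has confirmed. The hostname layout is copied from the "Attempting to resolve" log line.

```shell
#!/bin/sh
# Hypothetical helpers, not part of docker-volume-netshare.

# Print "efs-id", "generated" or "other" for a volume name.
classify_volume_name() {
    case "$1" in
        fs-*)
            # Assumption: an EFS ID is "fs-" followed only by hex digits.
            if printf '%s' "${1#fs-}" | grep -Eq '^[0-9a-f]+$'; then
                echo "efs-id"
            else
                echo "other"
            fi
            ;;
        *)
            # 64 lowercase hex characters: the shape of the names in the failing log.
            if printf '%s' "$1" | grep -Eq '^[0-9a-f]{64}$'; then
                echo "generated"
            else
                echo "other"
            fi
            ;;
    esac
}

# Build the per-AZ hostname in the layout seen in the log:
# <az>.<volume name>.efs.<region>.amazonaws.com
efs_az_hostname() {
    # $1 = volume name, $2 = availability zone, $3 = region
    printf '%s.%s.efs.%s.amazonaws.com\n' "$2" "$1" "$3"
}
```

If the failing name classifies as "generated", it may be worth checking whether the image declares volumes of its own, for example with docker inspect --format '{{json .Config.Volumes}}' <image>.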

I tried starting the daemon with --noresolve and running the container with the endpoint IP instead, but I get the same error 32 and the daemon doesn't try to mount by IP:

DEBU[0004] Host path for 10.x.x.x is at /var/lib/docker-volumes/netshare/efs/10.220.5.248
DEBU[0004] Entering Get: {6081b0b3c0502036e33e85964d06af22eb0e7d4c6186ab64fa92aa26654d7e02 map[]}
DEBU[0004] Entering Create: name: 6081b0b3c0502036e33e85964d06af22eb0e7d4c6186ab64fa92aa26654d7e02, options map[]
DEBU[0004] Create volume -> name: 6081b0b3c0502036e33e85964d06af22eb0e7d4c6186ab64fa92aa26654d7e02, map[]
DEBU[0004] Host path for 6081b0b3c0502036e33e85964d06af22eb0e7d4c6186ab64fa92aa26654d7e02 is at /var/lib/docker-volumes/netshare/efs/6081b0b3c0502036e33e85964d06af22eb0e7d4c6186ab64fa92aa26654d7e02
DEBU[0004] Entering Get: {59e53fc47a2c633d4de273c97b1e383dd4587cabb4be1659662b89d7af1e7211 map[]}
DEBU[0004] Entering Create: name: 59e53fc47a2c633d4de273c97b1e383dd4587cabb4be1659662b89d7af1e7211, options map[]
DEBU[0004] Create volume -> name: 59e53fc47a2c633d4de273c97b1e383dd4587cabb4be1659662b89d7af1e7211, map[]
DEBU[0004] Host path for 59e53fc47a2c633d4de273c97b1e383dd4587cabb4be1659662b89d7af1e7211 is at /var/lib/docker-volumes/netshare/efs/59e53fc47a2c633d4de273c97b1e383dd4587cabb4be1659662b89d7af1e7211
INFO[0004] Mounting EFS volume 6081b0b3c0502036e33e85964d06af22eb0e7d4c6186ab64fa92aa26654d7e02: on /var/lib/docker-volumes/netshare/efs/6081b0b3c0502036e33e85964d06af22eb0e7d4c6186ab64fa92aa26654d7e02
DEBU[0004] exec: mount -t nfs4 -o nfsvers=4.1 6081b0b3c0502036e33e85964d06af22eb0e7d4c6186ab64fa92aa26654d7e02: /var/lib/docker-volumes/netshare/efs/6081b0b3c0502036e33e85964d06af22eb0e7d4c6186ab64fa92aa26654d7e02

2016/09/15 15:50:51 mount.nfs4: Failed to resolve server 6081b0b3c0502036e33e85964d06af22eb0e7d4c6186ab64fa92aa26654d7e02: Name or service not known

DEBU[0004] Entering Remove: name: 6081b0b3c0502036e33e85964d06af22eb0e7d4c6186ab64fa92aa26654d7e02, options map[]
DEBU[0004] Entering Remove: name: 59e53fc47a2c633d4de273c97b1e383dd4587cabb4be1659662b89d7af1e7211, options map[]

Any clue appreciated

Eric

ghost commented Sep 16, 2016

One more clue about issue 2. The 3a48cd86e6cfca49b1015dc61789eb2c3e41624e81a9d75d2f8d494dc0d3bb7f that replaces the VolumeID is the container bridge endpoint. I still don't know why it replaces the volumeID with it for some containers and not others

jhovell commented Sep 28, 2016

+1 experiencing the same issue. @eric5102 do you mean containers or images? The image I am trying to run is https://hub.docker.com/r/library/jenkins/tags/2.7.4/ and having no luck with it after multiple attempts.

Also I cannot find the UUID in any docker network so I'm not sure what you mean by container bridge endpoint... the container never starts so I don't think there is a way to inspect it. docker network <bridge/host/none> | grep UUID is not yielding any results.

jhovell commented Sep 29, 2016

Latest Amazon ECS AMI (amzn-ami-2016.03.i-amazon-ecs-optimized)
Docker version 1.11.2, build b9f10c9/1.11.2

Also @eric5102, I tried commenting out the line you mention in /etc/init.d/docker, but after doing so, running sudo service docker restart times out on both shutdown and startup. Re-adding the line allows docker to start. I am running into the same log messages you posted on container creation: I cannot even get containers to start. I'm not that knowledgeable about Docker volumes or plugins, so I'm mostly flying blind here.

jhovell commented Sep 29, 2016

Got a bit further. I noticed that if I do not specify the "--az" option, I can successfully start a container. I put more details here, as this now seems similar to the issue I'm experiencing:

#51

ghost commented Sep 29, 2016

@jhovell

> Also I cannot find the UUID in any docker network...

Do a docker inspect <containerid>

> I tried commenting out the line you mention in /etc/init.d/docker...

I didn't say to comment out the line, but to remove just the "$unshare" -m from the beginning of that line.

We should be able to keep $unshare but have the container join the volume namespace. Others have encountered that problem with Docker. Oracle even suggests removing $unshare when installing Docker on Oracle Linux.
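
A quick way to see whether the daemon really runs in its own mount namespace is to compare the namespace links under /proc. This is a minimal sketch under assumptions: the helper names mnt_ns_of and same_mnt_ns are hypothetical, and the pid file path used in the usage line (/var/run/docker.pid) is the usual default and may differ on your host. Reading the link for another user's process generally needs root.

```shell
#!/bin/sh
# Hypothetical helpers: compare the mount namespaces of two processes.

# Print the mount namespace identifier of a pid, e.g. "mnt:[4026531840]".
mnt_ns_of() {
    readlink "/proc/$1/ns/mnt"
}

# Return 0 if both pids share a mount namespace, 1 if they differ,
# 2 if either link could not be read (no such pid, or no permission).
same_mnt_ns() {
    a=$(mnt_ns_of "$1") || return 2
    b=$(mnt_ns_of "$2") || return 2
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, as root: same_mnt_ns 1 "$(cat /var/run/docker.pid)" && echo shared || echo "not shared". If the namespaces differ, a mount made on the host after the daemon started would not necessarily be visible to the daemon, which would match the behaviour described in this issue.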

The netshare plugin project is not very active, so for the time being I decided to fall back on safer workarounds to use EFS with ECS.

jhovell commented Sep 29, 2016

Thanks for the help, @eric5102. I had been following the steps in this blog post:

https://aws.amazon.com/blogs/compute/using-amazon-efs-to-persist-data-from-amazon-ecs-containers/

but ran into issues as I am using the Jenkins official Docker image, which does not run as root. I was hoping using a volume driver might give a better experience, but maybe it will not solve my problem anyway.

Relatedly, there's a 3-year-old open Docker issue with hundreds of comments, most of which appear outdated, or at least the steps didn't work for me.

moby/moby#2259

It's been difficult to track down documentation to figure out what I can do to mount an EFS volume into a Docker container that is NOT running as root (UID = 1000 in my case) from an Amazon ECS host. Any tips or strategies greatly appreciated!

ghost commented Sep 29, 2016

Mounting the EFS volume works if the docker service is restarted. I've been doing this for months in Beanstalk deployments. This is also what they recommend for ECS, but it works only if all your container instances (i.e. the hosts) mount the volumes at launch. You can't move a container to a host that does not already have the volume. This is too limiting, which is why I was looking at netshare.

If you don't mind hardwiring your EFS volumes to all your ECS cluster nodes, then just mount the volumes on the host with your LaunchConfiguration. I also suggest installing FS-Cache (cachefilesd) and mounting with the fsc option, as it speeds things up significantly if your container does a lot of read I/O.
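
The launch-time steps described above could look roughly like the sketch below. It only prints the commands so they can be reviewed before going into instance user data. The function names efs_mount_cmd and print_launch_steps are hypothetical, the endpoint and mount point in the usage line are placeholders, nfsvers=4.1 is taken from the plugin's debug log earlier in this thread, and the cachefilesd service name is an assumption about how the package is installed on your AMI.

```shell
#!/bin/sh
# Hypothetical launch-time helpers; they print commands instead of running them.

# Print the NFS mount command with FS-Cache enabled via the "fsc" option.
efs_mount_cmd() {
    # $1 = EFS endpoint (DNS name or IP), $2 = mount point on the host
    printf 'mount -t nfs4 -o nfsvers=4.1,fsc %s:/ %s\n' "$1" "$2"
}

# Print the full sequence: start the cache daemon, mount, then restart docker.
print_launch_steps() {
    # $1 = EFS endpoint, $2 = mount point
    echo "service cachefilesd start"   # assumption: cachefilesd is installed under this service name
    echo "mkdir -p $2"
    efs_mount_cmd "$1" "$2"
    echo "service docker restart"      # restart so the daemon sees the new mount
}
```

For example, print_launch_steps 10.0.0.10 /mnt/efs prints four lines ending with the docker restart. The restart comes last on purpose, since the mount has to exist before the daemon starts.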

jhovell commented Sep 30, 2016

Thanks so much @eric5102! Needing to restart docker after mounting the volume was what I was missing. The results of not doing so were very misleading: the container starts and mounts a volume, but with the wrong permissions, and the data persisted across restarts. This was very mysterious, as I couldn't determine where on the file system the data was being stored. After unmounting the EFS volume my data was revealed, hidden behind where the EFS volume had been mounted. Mysterious, but probably because I just don't understand how docker interacts with mounts.
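
One way to avoid silently writing into the directory underneath the mount is to check that the path really is a mount point before starting any container. A minimal sketch, assuming a Linux host with /proc available; the function name is_mounted is hypothetical, and the check is an exact match on the mount point column, so paths containing spaces (which the kernel escapes in that file) are not handled.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the given path is itself a mount point,
# judged by an exact match on the second field of /proc/self/mounts.
is_mounted() {
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' /proc/self/mounts
}
```

For example: is_mounted /mnt/efs || { echo "EFS is not mounted, not starting containers" >&2; exit 1; }, where /mnt/efs is a placeholder for your own mount point.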

gfysaris commented Dec 8, 2016

And still ECS requires docker to be restarted in order to mount an EFS volume..
Can't really understand why. It makes things so complicated

ghost commented Dec 8, 2016

To keep it simple, you need to accept that the host mounts the necessary volumes for the cluster. Just have the LaunchConfiguration for your AutoScalingGroup mount the volumes when the EC2 instance starts and immediately restart docker. It does mean all volumes must be mounted on all cluster nodes, but that kind of limitation can be lived with until more flexibility is possible. Things will change within a few months.

PS: the problem is actually how Docker handles resources, not ECS, but this will change too.
