Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bash tools missing from 3.3.0 release #119

Open
thomasritz opened this issue Apr 9, 2021 · 28 comments
Open

Bash tools missing from 3.3.0 release #119

thomasritz opened this issue Apr 9, 2021 · 28 comments
Assignees
Labels
enhancement New feature or request

Comments

@thomasritz
Copy link

If one ECS Fargate container of a task has a health check configured, then all containers must have a health check.
With 3.2.0, I could use true as dummy health check.
With 3.3.0, this health check fails due to true no longer being available in the image.

It would be nice if there was an explicit health check command or so for the daemon.

@rbpltr
Copy link

rbpltr commented Apr 13, 2021

Same here. Just had to spend over an hour picking apart a CloudFormation stack after an ECS service deployment failed and then failed to rollback because we were using the latest version of this container.

@awssandra
Copy link

awssandra commented Apr 13, 2021

Hi all,

This is likely due to a bug that was released in 3.3.0 - please see: #120

We have release v3.3.1 as of last night. Please let us know if this has alleviated your issues.
https://github.com/aws/aws-xray-daemon/releases

Thanks,
Sandra

@thomasritz
Copy link
Author

Hi,

I think #120 is unrelated.
My problem was simply that due to changes in how the images was built some dependencies were removed from 3.2.0 to 3.3.0.

The image of 3.2.0 was built with a lot of tools included such as /usr/bin/true (which I used as dummy health check command).
image

Newer releases are built from scratch and therefore almost empty.
image

I'm a big fan of small images, don't get me wrong :-)
Thanks for having a look into this.

I solved this issue for now by locking my taskdef.json to v3.2.0: "image": "amazon/aws-xray-daemon:3.2.0"

@awssandra
Copy link

awssandra commented Apr 14, 2021

Gotcha, thanks for the explanation.

Apologies for the trouble!
Yes we cut down the image size from 150 MB -> 3.5 MB, by moving from an amazonlinux base image to scratch, which consequently, removed a lot of tools available.
We're are reviewing currently, will get back to you as soon as we can.

Related issue: #126

@CarlosDomingues
Copy link

CarlosDomingues commented Apr 16, 2021

I was using the latest tag and defining the entrypoint explicitly to use local mode (/usr/bin/xray, you can find this configuration in the official AWS docs) and those changes also broke X-Ray daemon in prod for me.

Suggestion: maintain the old images and create a slim tag for folks who want smaller images.

@willarmiros willarmiros changed the title Health Check Bash tools missing from 3.3.0 release Apr 16, 2021
@awssandra
Copy link

awssandra commented Apr 16, 2021

Hey CarlosDomingues,

Apologies for the issues.

We will be republishing the Dockerhub images using the base amazonlinux, and maintain this as the standard solution for Dockerhub users, as it was established as the standard from day one.

We will be updating guidance on this in our public documentation, as well as supplying build instructions for those who wish to incorporate the extra tools into their ECR images for healthchecks or other operational items.

@awssandra
Copy link

#128

@awssandra
Copy link

New release 3.3.2 has been completed. Will be updating the docs accordingly.

@ollypom
Copy link

ollypom commented Apr 26, 2021

Hey @awssandra , shipping the same image tag but as 2 different artefacts to 2 different registries seems a bit of anti-pattern to me. I would expect aws-xray-daemon:3.3.2 to be the same image regardless of the registry I pull it from.

I understand the reason for having 2 Dockerfiles, the amount of embedded tools the end user requires varies, but I would expect the tag to be consistent regardless of its location. I think the problem you are going to have is that folks hitting the Docker Hub rate limitting issue are just going to swap out the registries in their Task definitions and expect it to work. But as we've seen in this thread, their existing health checks will break when moving from Hub to ECR and they won't know why.

If you look at some of the Docker official images the naming convention is x.y.z-baseimage when multiple base images are used:

  • Python has 3.9.4-buster and 3.9.4-alpine
  • Nginx has 1.20 and 1.20-alpine
  • Haproxy has 2.30.10 and 2.30.10-alpine

So my suggestion would be to keep DockerHub and ECR Public in sync, with aws-xray-daemon:3.3.2 being the same image on both registries. But at the same time create a 3.3.2-amazonlinux or 3.3.2-slim (or whatever naming convention suits) stored in both registries with more tools and the ability to use a healthcheck.

For example:

aws-xray-daemon:3.3.2 from Dockerfile
aws-xray-daemon:3.3.2-amazonlinux from Dockerfile.amazonlinux

@fosrias
Copy link

fosrias commented Apr 26, 2021

aws-xray-daemon:3.3.2 to be the same image regardless of the registry I pull it from

Already ran into this. Was confusing as I expected to be fixed, but was getting the same problems when using ECR, IIUC.

@srprash
Copy link
Collaborator

srprash commented Apr 30, 2021

Hi @ollypom @fosrias @thomasritz @CarlosDomingues
We are looking into using a base image which is slim enough but also has the tools that were being used from amazonlinux.
Can you provide a list of essential tools that you'd want the x-ray daemon images to have?

Thanks!

@fosrias
Copy link

fosrias commented Apr 30, 2021

My needs are simple. Just need bash. sh is sufficient.

@thomasritz
Copy link
Author

thomasritz commented Apr 30, 2021

Due to the lack of a health check for ECS, I just used true as a dummy health check like so in taskdef.json:

{
  "name": "xray-daemon",
  "healthCheck": {
    "command": ["true"]
  },
  "image": "amazon/aws-xray-daemon:3.3.2",
  ...
}

I'd prefer to have a real health check command if that's possible.

Thanks for such a great tool. And thanks for your support.

@cynicaljoy
Copy link

cynicaljoy commented May 12, 2021

I fully appreciate the teams effort to cut down the image size, but I too would appreciate input as to a real health check.

@srprash
Copy link
Collaborator

srprash commented May 24, 2021

Hi @cynicaljoy @thomasritz
A real health check is on our backlog but unfortunately we may not be able to prioritize it for some time. However, we are working towards a base image with bash tool so that a workaround for health checks can be possible. Thanks for your patience.

@willarmiros
Copy link
Contributor

To follow up on @srprash's comment, we are specifically considering migrating both the public ECR and DockerHub image to use the debian:bullseye-slim. It has all of the basic bash tools you'd expect but at half the size of amazonlinux. If anyone has any comments or concerns related to this, it would be great to hear now. Thanks!

@cageyv
Copy link

cageyv commented Jul 16, 2021

@thomasritz
here in #9 is my small experiment with sometihg closer to real health check

["CMD", "/xray", "--version", "||", "exit 1"]

@mkielar
Copy link

mkielar commented Sep 21, 2021

Just to bump this up. We have a process of hardening original images before we run them in our infrastructure. The public.ecr.aws/xray/aws-xray-daemon:3.3.3 contains three vulnerabilities which could be fixed by applying this to our dockerfile:

RUN apk update                             \
    curl    $(: 'Fixes ALAS2-2021-1693')   \
    rpm     $(: 'Fixes ALAS2-2021-1689')   \
    openssl $(: 'Fixes ALAS2-2021-1687')

but it fails with:

executor failed running [/bin/sh -c apk update curl $(: 'Fixes ALAS2-2021-1693') rpm $(: 'Fixes ALAS2-2021-1689') openssl $(: 'Fixes ALAS2-2021-1687')]: unable to find groups for spec root: invalid argument

Also, the fact that xray/aws-xray-daemon:3.3.3 is alpine-based in public.ecr.aws, but amazonlinux-based in DockerHub... that's a huge antipattern.

@project0
Copy link

project0 commented Dec 9, 2021

This is really inconsistent and made me crazy today.
Can you at least provide both image version in both repos with a appropriate tag so people can choose which build they want to use?
e.g. 3.3.3-<scratch|alpine|amzn2> ?

@stale
Copy link

stale bot commented Jan 8, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in next 7 days. Thank you for your contributions.

@stale stale bot added the stale label Jan 8, 2022
@polothy
Copy link

polothy commented Jan 10, 2022

activity

@stale stale bot removed the stale label Jan 10, 2022
@NathanielRN NathanielRN added the enhancement New feature or request label Jan 12, 2022
@rbpltr
Copy link

rbpltr commented Apr 8, 2022

I don't understand why there hasn't been any movement on this issue. It's been a year.

It's crazy for the same tag on different repos to point to fundamentally different images. Nothing has to be re-invented here, the precedent is already there - you add a suffix to the tag according to what is it based on (exactly as @project0 suggested) and push all of the tags to all of the repos, e.g.

  • 3.3.3 (based on Amazon Linux 2)
  • 3.3.3-slim (from scratch)
  • 3.3.3-debian, etc.

@willarmiros
Copy link
Contributor

Sorry for the delays on action for this issue - we have been off track on improvements for the daemon image and this has unfortunately slipped through the cracks.

The only concern we have heard regarding the proposal @rbpltr mentioned is that pointing 3.3.3 to a daemon with an amazonlinux base image would greatly inflate the size of the image for ECR customers (where it currently points to a scratch image).

This change shouldn't be breaking functionality wise, just additive, but there could be size limits on sidecars that could lead to poor customer experience. Given this is mitigable by migrating to the slim image, we don't think it should block this change, but want to confirm we don't cause additional negative experiences.

@JohnYoungers
Copy link

for the scratch image (public.ecr.aws/xray/aws-xray-daemon:3.3.3): is there something that can be stubbed in that will work for the health check?

@rbpltr
Copy link

rbpltr commented Jul 20, 2022

for the scratch image (public.ecr.aws/xray/aws-xray-daemon:3.3.3): is there something that can be stubbed in that will work for the health check?

This works but I'm not sure how robust it is.

@JohnYoungers
Copy link

for the scratch image (public.ecr.aws/xray/aws-xray-daemon:3.3.3): is there something that can be stubbed in that will work for the health check?

This works but I'm not sure how robust it is.

Thanks, that worked: when I originally tried it I had the entire line after CMD as a single segment opposed to individual items in the array

@polothy
Copy link

polothy commented Jul 21, 2022

(sorry if this was already suggested)

Doing something like ["CMD", "/xray", "--health-check"] would be pretty slick. Would continue to keep the image small and provide a consistent way for the CLI to health check itself.

@ama-joel-kravets
Copy link

I tried a different approach which gives me some idea that the service is currently running.

> xray
2023-11-27T19:17:32Z [Info] Initializing AWS X-Ray daemon 3.3.9
2023-11-27T19:17:32Z [Error] listen udp 127.0.0.1:2000: bind: address already in use

You can then grep this to verify that the service is already listening on that port:

xray | grep 'bind: address already in use' || exit 0

Usage in AWS CDK:

const xray = taskDefinition.addContainer("XRay", {
      image: cdk.aws_ecs.ContainerImage.fromRegistry("amazon/aws-xray-daemon"),
      cpu: 32,
      memoryReservationMiB: 256,
      essential: true,
      portMappings: [
        {
          containerPort: 2000,
          protocol: cdk.aws_ecs.Protocol.UDP,
        },
      ],
      // TODO: Replace with a better health check in the future.
      // https://github.com/aws/aws-xray-daemon/issues/119
      healthCheck: {
        command: [
          "CMD-SHELL",
          "xray | grep 'bind: address already in use' || exit 0",
        ],
      },
    });

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests