ECS task continually restarting, freezing CloudFormation deployments #201

Open
nate-anderson opened this issue Jun 21, 2023 · 0 comments

Hi all,

I am using the CDK to deploy an EC2-backed ECS cluster with two of my application services and an X-Ray sidecar service. When I deploy, the X-Ray sidecar tasks restart continually (roughly every 30 minutes) and never reach a healthy state. This leaves the CloudFormation stack stuck in UPDATE_IN_PROGRESS, or in UPDATE_ROLLBACK_IN_PROGRESS if I try to cancel the update. I can find no documentation stating whether the container performs its own health checks and restarts itself, and unfortunately the ECS task has no logs to explain the exits.

Here is my CDK code provisioning the cluster, the service discovery DNS namespace, the server task definition, and the X-Ray sidecar task definition and service.

const vpc = new Vpc(this, `test-vpc`);

const cluster = new ECS.Cluster(this, `test-cluster`, {
    clusterName: clusterId,
    vpc,
    capacity: {
        instanceType: new EC2.InstanceType(props.clusterInstanceType),
    },
    containerInsights: true,
});

const namespace = 'testing';
const dnsNamespace = new ServiceDiscovery.PrivateDnsNamespace(this, `dns-namespace`, {
    vpc,
    name: namespace,
});

const serverTaskDefinition = new ECS.TaskDefinition(this, serverTaskId, {
    compatibility: ECS.Compatibility.EC2,
});

serverTaskDefinition.addContainer('ServerContainer', {
    image: ContainerImage.fromEcrRepository(serverRepo, latestTag),
    containerName: 'server',
    memoryReservationMiB: 1024,
    portMappings: [
        {
            containerPort: Port.HTTP,
        }
    ],
    healthCheck: {
        command: [ `curl localhost/healthcheck` ],
        interval: cdk.Duration.seconds(300),
    },
});

const xraySidecarTaskDefinition = new ECS.TaskDefinition(this, `xray-sidecar`, {
    compatibility: ECS.Compatibility.EC2,
});

xraySidecarTaskDefinition.addContainer('xray-sidecar-task', {
    containerName: XRayConfig.containerName,
    image: ContainerImage.fromRegistry("amazon/aws-xray-daemon"),
    cpu: 32,
    memoryReservationMiB: 256,
    environment: {
        AWS_XRAY_DAEMON_ADDRESS: `${XRayConfig.containerName}.${namespace}:${XRayConfig.port}`,
    },
    portMappings: [
        {
            hostPort: XRayConfig.port,
            containerPort: XRayConfig.port,
            protocol: ECS.Protocol.UDP,
        }
    ]
});

const xraySidecarService = new ECS.Ec2Service(this, `xray-sidecar-service`, {
    taskDefinition: xraySidecarTaskDefinition,
    cluster,
    desiredCount: 1,
    cloudMapOptions: {
        cloudMapNamespace: dnsNamespace,
        name: XRayConfig.containerName,
        containerPort: XRayConfig.port,
        dnsRecordType: ServiceDiscovery.DnsRecordType.SRV,
    },
});
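
A side note on the server container's health check in the code above: ECS treats exit code 0 from the health check command as healthy and any non-zero exit code as unhealthy, and plain `curl localhost/healthcheck` exits 0 even when the server answers with an HTTP error status. The sketch below shows one way the command could be built so that HTTP errors fail the check. It assumes `curl` is installed in the image and that `Port.HTTP` is 80; `buildHealthCheckCommand` is a hypothetical helper, not a CDK API.

```typescript
// Hypothetical helper (not part of the CDK): builds an ECS container
// health check command in the explicit "CMD-SHELL" form.
function buildHealthCheckCommand(path: string, port: number = 80): string[] {
  // Make sure the path starts with a slash.
  const normalizedPath = path.charAt(0) === "/" ? path : "/" + path;
  const url = "http://localhost:" + port + normalizedPath;
  // -f makes curl exit non-zero on HTTP status >= 400;
  // "|| exit 1" maps any curl failure to exit code 1.
  return ["CMD-SHELL", "curl -f " + url + " || exit 1"];
}

// Assumption: the server listens on port 80 and serves /healthcheck.
const healthCheckCommand = buildHealthCheckCommand("healthcheck");
console.log(healthCheckCommand);
```

The returned array could be passed as `healthCheck.command` in `addContainer`.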

I would really appreciate it if anyone could point out an issue with my CDK approach, or something I've missed in the docs explaining how to prevent the X-Ray sidecar process from exiting. If there is some health check inside the container (e.g. UDP packets must be received within 30 seconds of the process starting), it seems to me that a failure of that check should log an error to push users in the right direction.
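
To test whether the daemon receives UDP traffic at all, a hand-built segment can be sent to it. The X-Ray daemon accepts a segment document over UDP when it is preceded by the header line `{"format": "json", "version": 1}`. The sketch below only builds that payload. The function names and the segment name `probe` are hypothetical, and the target port is assumed to be the daemon's default of 2000 (the value of `XRayConfig.port` is not shown above).

```typescript
// Header line the X-Ray daemon expects before each segment document.
const DAEMON_HEADER = '{"format": "json", "version": 1}';

interface Segment {
  name: string;
  id: string; // 16 hex digits
  trace_id: string; // 1-<8 hex digits of epoch seconds>-<24 hex digits>
  start_time: number; // epoch seconds, fractional
  end_time: number;
}

// Returns `length` random lowercase hex digits.
function randomHex(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// Hypothetical helper: builds a minimal completed segment for a probe.
function buildTestSegment(name: string, nowMs: number = Date.now()): Segment {
  const epochHex = ("00000000" + Math.floor(nowMs / 1000).toString(16)).slice(-8);
  return {
    name: name,
    id: randomHex(16),
    trace_id: "1-" + epochHex + "-" + randomHex(24),
    start_time: nowMs / 1000,
    end_time: nowMs / 1000 + 0.001,
  };
}

// Joins the header and the JSON segment with a newline, as the daemon expects.
function buildDaemonDatagram(segment: Segment): string {
  return DAEMON_HEADER + "\n" + JSON.stringify(segment);
}

const datagram = buildDaemonDatagram(buildTestSegment("probe"));
console.log(datagram);
```

The resulting string could be sent from another container with any UDP client (for example a Node `dgram` socket) to the daemon's address and port. This only checks that packets reach the daemon; it would not by itself explain why the task exits.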

Thanks for any advice you can offer!
