ecs: unable to change networkMode back to the default #33410

Open
1 task
rantoniuk opened this issue Feb 12, 2025 · 2 comments
Labels
@aws-cdk/aws-ecs, bug, effort/medium, p3

Comments

@rantoniuk

rantoniuk commented Feb 12, 2025

Describe the bug

An Ec2Service defined like this, together with a public ALB:

    const taskDefinition = new ecs.Ec2TaskDefinition(this, 'TaskDef', {
      family: 'task-def',
      taskRole: taskRole,
      networkMode: ecs.NetworkMode.AWS_VPC,
    });

    const realtimeWsService = new ecs.Ec2Service(this, 'Service', {
      serviceName: 'Service',
      cluster: props.cluster,
      taskDefinition,
      capacityProviderStrategies: [gpucapacityProviderStrategy],
      serviceConnectConfiguration: {
        logDriver: new ecs.AwsLogDriver({
          logGroup,
          streamPrefix: 'serviceconnect',
        }),
        namespace: 'local',
        // realtimews.local:5001
        services: [{ portMappingName: 'rtws', dnsName: 'rtws' }],
      },
    });

    this.lb = new elbv2.ApplicationLoadBalancer(this, 'LB', {
      loadBalancerName: 'Ecs-RealtimeWs-ALB',
      vpc: props.vpc,
      internetFacing: true,
      preserveHostHeader: true,
    });

    const listener = this.lb.addListener('PublicListener', {
      port: 80,
      open: true,
      protocol: elbv2.ApplicationProtocol.HTTP,
    });

    // Attach ALB to ECS Service
    listener.addTargets('ECS', {
      targetGroupName: 'EcsWebsocketAlbTG',
      port: 80,
      protocol: elbv2.ApplicationProtocol.HTTP,
      targets: [
        realtimeWsService.loadBalancerTarget({
          containerName: containerDef.containerName,
          containerPort: containerDef.containerPort,
        }),
      ],
    });

This deploys and works fine. Now I want to change the approach: remove the public ALB altogether and go back to the default networkMode.

   
    const taskDefinition = new ecs.Ec2TaskDefinition(this, 'TaskDef', {
      family: 'task-def',
      taskRole: taskRole,
      // networkMode: ecs.NetworkMode.AWS_VPC,
    });

    const realtimeWsService = new ecs.Ec2Service(this, 'Service', {
      serviceName: 'Service',
      cluster: props.cluster,
      taskDefinition,
      capacityProviderStrategies: [gpucapacityProviderStrategy],
      serviceConnectConfiguration: {
        logDriver: new ecs.AwsLogDriver({
          logGroup,
          streamPrefix: 'serviceconnect',
        }),
        namespace: 'local',
        // realtimews.local:5001
        services: [{ portMappingName: 'rtws', dnsName: 'rtws' }],
      },
    });

This bails out with:

Service-Stack | 2/6 | 10:42:38 PM | UPDATE_FAILED | AWS::ECS::Service | Resource handler returned message: "Invalid request provided: UpdateService error: The target group arn:aws:elasticloadbalancing:us-west-2:679849022850:targetgroup/EcsWebsocketAlbTG/efbc21827edf62e8 does not exist. (Service: AmazonECS; Status Code: 400; Error Code: InvalidParameterException;

If you check the synthesized template, you'll see it still references the TargetGroup, even though it's nowhere in the code.
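One way to confirm this is to scan the synthesized template for load balancer entries on the service. The sketch below is a hypothetical helper (`findLingeringTargets` is not a CDK API). It assumes the template has already been loaded as JSON, and it relies only on the CloudFormation property names `LoadBalancers` and `TargetGroupArn` of `AWS::ECS::Service`.

```typescript
// Minimal shape of a synthesized CloudFormation template (only what is inspected here).
interface CfnResource {
  Type: string;
  Properties?: Record<string, unknown>;
}

interface CfnTemplate {
  Resources?: Record<string, CfnResource>;
}

interface LingeringTarget {
  serviceLogicalId: string;
  // Usually an intrinsic such as { Ref: '<logical id>' }, or a literal ARN string.
  targetGroup: unknown;
}

// Hypothetical helper: returns every LoadBalancers entry that is still
// attached to an AWS::ECS::Service resource in the given template.
function findLingeringTargets(template: CfnTemplate): LingeringTarget[] {
  const found: LingeringTarget[] = [];
  const resources = template.Resources ?? {};
  for (const logicalId of Object.keys(resources)) {
    const resource = resources[logicalId];
    if (!resource || resource.Type !== 'AWS::ECS::Service') continue;
    const loadBalancers = resource.Properties?.['LoadBalancers'];
    if (!Array.isArray(loadBalancers)) continue;
    for (const entry of loadBalancers) {
      const targetGroup = (entry as { TargetGroupArn?: unknown }).TargetGroupArn;
      found.push({ serviceLogicalId: logicalId, targetGroup });
    }
  }
  return found;
}
```

After `cdk synth`, the template JSON for the stack should be available under `cdk.out` and can be parsed with `JSON.parse` before being passed in. An empty result would mean the service no longer declares any target group.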

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

above

Current Behavior

above

Reproduction Steps

above

Possible Solution

No response

Additional Information/Context

  "devDependencies": {
    "@biomejs/biome": "1.9.4",
    "@types/babel__traverse": "^7.18.2",
    "@types/js-yaml": "^4.0.5",
    "@types/node": "^22.10.2",
    "@typescript-eslint/eslint-plugin": "^8.18.0",
    "@typescript-eslint/parser": "^8.18.0",
    "aws-cdk": "^2.80.0",
    "lefthook": "^1.10.1",
    "typescript": "^5.6.3"
  },
  "dependencies": {
    "aws-cdk": "^2.80.0",
    "aws-cdk-lib": "^2.80.0",
    "cdk-nag": "^2.28.179",
    "cloudwatch-retention-setter": "^0.0.15",
    "constructs": "^10.0.0",
    "js-yaml": "^4.1.0",
    "source-map-support": "^0.5.21"
  }

CDK CLI Version

2.177.0 (build b396961)

Framework Version

No response

Node.js Version

v22.12.0

OS

MacOS

Language

TypeScript

Language Version

├── @biomejs/[email protected]
├── @types/[email protected]
├── @types/[email protected]
├── @types/[email protected]
├── @typescript-eslint/[email protected]
├── @typescript-eslint/[email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
└── [email protected]

Other information

No response

@rantoniuk rantoniuk added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Feb 12, 2025
@github-actions github-actions bot added the @aws-cdk/aws-ecs Related to Amazon Elastic Container label Feb 12, 2025
@pahud
Contributor

pahud commented Feb 12, 2025

This appears to be an issue with how CDK handles the removal of load balancer target groups from ECS services. When you try to remove the ALB configuration, the service is still trying to reference the old target group during the CloudFormation update.

To resolve this, I recommend:

  1. First deployment - Remove the load balancer target:

    const realtimeWsService = new ecs.Ec2Service(this, 'Service', {
      serviceName: 'Service',
      cluster: props.cluster,
      taskDefinition,
      capacityProviderStrategies: [gpucapacityProviderStrategy],
      // Keep networkMode: AWS_VPC on the task definition for now
      serviceConnectConfiguration: {
        // ... rest of the config
      }
    });

    // Remove all load balancer related code

  2. After this deployment succeeds, in a second deployment, remove the network mode:

    const taskDefinition = new ecs.Ec2TaskDefinition(this, 'TaskDef', {
      family: 'task-def',
      taskRole: taskRole,
      // Now you can remove networkMode
    });

This two-step approach should allow CloudFormation to properly handle the removal of the load balancer configuration before changing the network mode.

While one-step deployment is preferable, based on my analysis of the CDK codebase and the issue, I don't recommend trying to do this in one step. Here's why:

  1. The ECS service maintains a direct reference to the target group in its CloudFormation template. When you remove the ALB configuration, the service still needs to properly deregister from the target group first.
  2. The networkMode change and load balancer target group removal are two separate concerns in the CloudFormation stack update process. Trying to change both simultaneously can lead to race conditions where CloudFormation might try to:
  • Update the network mode while the service is still registered with the target group
  • Remove the target group while the service is still trying to use it
  • Update service configuration while network bindings are changing
  3. Looking at the CDK's BaseService implementation, load balancer targets are managed through the service's configuration and are tightly coupled with the network mode when using AWS_VPC.

Therefore, while it might be tempting to do this in one step, it's safer and more reliable to:

  1. First remove the load balancer configuration
  2. Then change the network mode

This follows AWS's best practices for service updates and ensures a clean, predictable deployment process. Attempting to do both changes at once could lead to deployment failures or service disruption.
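To check that each deployment really touches only one of these concerns, the synthesized templates from before and after a code change can be compared. The following is a minimal sketch, not part of CDK: `changedProperties` and its type names are hypothetical, and it only compares top-level properties of `AWS::ECS::Service` and `AWS::ECS::TaskDefinition` resources that share a logical ID.

```typescript
type PropsLike = Record<string, unknown>;

interface ResourceLike {
  Type: string;
  Properties?: PropsLike;
}

interface TemplateLike {
  Resources?: Record<string, ResourceLike>;
}

// Union of two key lists without duplicates.
function unionKeys(a: string[], b: string[]): string[] {
  return a.concat(b.filter((k) => a.indexOf(k) === -1));
}

// Hypothetical helper: lists "LogicalId.Property" for every top-level property
// that differs between two templates, restricted to the given resource types.
// Values are compared via JSON.stringify, so key order inside a value matters.
function changedProperties(
  before: TemplateLike,
  after: TemplateLike,
  types: string[] = ['AWS::ECS::Service', 'AWS::ECS::TaskDefinition'],
): string[] {
  const changes: string[] = [];
  const beforeRes = before.Resources ?? {};
  const afterRes = after.Resources ?? {};
  for (const id of unionKeys(Object.keys(beforeRes), Object.keys(afterRes))) {
    const a = beforeRes[id];
    const b = afterRes[id];
    const type = a?.Type ?? b?.Type;
    if (type === undefined || types.indexOf(type) === -1) continue;
    const aProps = a?.Properties ?? {};
    const bProps = b?.Properties ?? {};
    for (const key of unionKeys(Object.keys(aProps), Object.keys(bProps))) {
      if (JSON.stringify(aProps[key]) !== JSON.stringify(bProps[key])) {
        changes.push(`${id}.${key}`);
      }
    }
  }
  return changes.sort();
}
```

Under this assumption, the first deployment would be expected to report only the service's `LoadBalancers` property (plus any dependencies), and the second only network-related properties such as `NetworkMode` on the task definition and `NetworkConfiguration` on the service.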

Let me know if it works for you.

@pahud pahud added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p3 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Feb 12, 2025
@rantoniuk
Author

rantoniuk commented Feb 12, 2025

I tested that approach as well, but it also failed:

  • removing the LB worked fine
  • but then just commenting out the networkMode line still failed

> While one-step deployment is preferable, based on my analysis of the CDK codebase and the issue, I don't recommend trying to do this in one step

I consider CDK a declarative language that describes an expected state, so I still see this as a bug.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Feb 12, 2025