AWSX wants to replace all VPC subnet cidr blocks #1204

Closed
spennell opened this issue Jan 12, 2024 · 4 comments · Fixed by #1210

@spennell

What happened?

When running a preview for one of our stacks, Pulumi indicates it wants to replace all of our VPC subnets (12) due to changes to their CIDR blocks. There are existing resources in these subnets.

Previewing update (prd):
     Type                                          Name                       Plan        Info
     pulumi:pulumi:Stack                           aws-eks-prd                            
     └─ awsx:ec2:Vpc                               app-prd-vpc                            
        └─ aws:ec2:Vpc                             app-prd-vpc                            [diff: +tagsAll-instanceTenancy~__defaults]
 +-        ├─ aws:ec2:Subnet                       app-prd-vpc-private-2      replace     [diff: ~cidrBlock]
           │  └─ aws:ec2:RouteTable                app-prd-vpc-private-2                  [diff: +tagsAll]
 +-        │     ├─ aws:ec2:RouteTableAssociation  app-prd-vpc-private-2      replace     [diff: ~subnetId]
 ~         │     └─ aws:ec2:Route                  app-prd-vpc-private-2      update      [diff: ~natGatewayId]
 +-        ├─ aws:ec2:Subnet                       app-prd-vpc-public-1       replace     [diff: ~cidrBlock]
 +-        │  ├─ aws:ec2:NatGateway                app-prd-vpc-1              replace     [diff: ~subnetId]
           │  └─ aws:ec2:RouteTable                app-prd-vpc-public-1                   [diff: +tagsAll]
 +-        │     └─ aws:ec2:RouteTableAssociation  app-prd-vpc-public-1       replace     [diff: ~subnetId]
 +-        ├─ aws:ec2:Subnet                       app-prd-vpc-private-3      replace     [diff: ~cidrBlock]
           │  └─ aws:ec2:RouteTable                app-prd-vpc-private-3                  [diff: +tagsAll]
 +-        │     ├─ aws:ec2:RouteTableAssociation  app-prd-vpc-private-3      replace     [diff: ~subnetId]
 ~         │     └─ aws:ec2:Route                  app-prd-vpc-private-3      update      [diff: ~natGatewayId]
 +-        ├─ aws:ec2:Subnet                       app-prd-vpc-elasticache-2  replace     [diff: ~cidrBlock]
           │  └─ aws:ec2:RouteTable                app-prd-vpc-elasticache-2              [diff: +tagsAll]
 +-        │     └─ aws:ec2:RouteTableAssociation  app-prd-vpc-elasticache-2  replace     [diff: ~subnetId]
 +-        ├─ aws:ec2:Subnet                       app-prd-vpc-rds-3          replace     [diff: ~cidrBlock]
           │  └─ aws:ec2:RouteTable                app-prd-vpc-rds-3                      [diff: +tagsAll]
 +-        │     └─ aws:ec2:RouteTableAssociation  app-prd-vpc-rds-3          replace     [diff: ~subnetId]
 +-        ├─ aws:ec2:Subnet                       app-prd-vpc-elasticache-3  replace     [diff: ~cidrBlock]
           │  └─ aws:ec2:RouteTable                app-prd-vpc-elasticache-3              [diff: +tagsAll]
 +-        │     └─ aws:ec2:RouteTableAssociation  app-prd-vpc-elasticache-3  replace     [diff: ~subnetId]
 +-        ├─ aws:ec2:Subnet                       app-prd-vpc-public-3       replace     [diff: ~cidrBlock]
           │  ├─ aws:ec2:RouteTable                app-prd-vpc-public-3                   [diff: +tagsAll]
 +-        │  │  └─ aws:ec2:RouteTableAssociation  app-prd-vpc-public-3       replace     [diff: ~subnetId]
 +-        │  └─ aws:ec2:NatGateway                app-prd-vpc-3              replace     [diff: ~subnetId]
 +-        ├─ aws:ec2:Subnet                       app-prd-vpc-rds-2          replace     [diff: ~cidrBlock]
           │  └─ aws:ec2:RouteTable                app-prd-vpc-rds-2                      [diff: +tagsAll]
 +-        │     └─ aws:ec2:RouteTableAssociation  app-prd-vpc-rds-2          replace     [diff: ~subnetId]
 +-        ├─ aws:ec2:Subnet                       app-prd-vpc-private-1      replace     [diff: ~cidrBlock]
           │  └─ aws:ec2:RouteTable                app-prd-vpc-private-1                  [diff: +tagsAll]
 +-        │     ├─ aws:ec2:RouteTableAssociation  app-prd-vpc-private-1      replace     [diff: ~subnetId]
 ~         │     └─ aws:ec2:Route                  app-prd-vpc-private-1      update      [diff: ~natGatewayId]
 +-        ├─ aws:ec2:Subnet                       app-prd-vpc-public-2       replace     [diff: ~cidrBlock]
           │  ├─ aws:ec2:RouteTable                app-prd-vpc-public-2                   [diff: +tagsAll]
 +-        │  │  └─ aws:ec2:RouteTableAssociation  app-prd-vpc-public-2       replace     [diff: ~subnetId]
 +-        │  └─ aws:ec2:NatGateway                app-prd-vpc-2              replace     [diff: ~subnetId]
 +-        ├─ aws:ec2:Subnet                       app-prd-vpc-elasticache-1  replace     [diff: ~cidrBlock]
           │  └─ aws:ec2:RouteTable                app-prd-vpc-elasticache-1              [diff: +tagsAll]
 +-        │     └─ aws:ec2:RouteTableAssociation  app-prd-vpc-elasticache-1  replace     [diff: ~subnetId]
 +-        └─ aws:ec2:Subnet                       app-prd-vpc-rds-1          replace     [diff: ~cidrBlock]
              └─ aws:ec2:RouteTable                app-prd-vpc-rds-1                      [diff: +tagsAll]
 +-              └─ aws:ec2:RouteTableAssociation  app-prd-vpc-rds-1          replace     [diff: ~subnetId]

Looking at the diff of the preview, it suggests new CIDR blocks for all of the subnets.

Example

Code that generates the VPC

vpc = awsx.ec2.Vpc(
        f"{cluster_config.name}-vpc",
        enable_dns_hostnames=True,
        availability_zone_names=availability_zone_names,
        cidr_block=cluster_config.vpc_cidr_block,
        subnet_strategy=awsx.ec2.SubnetAllocationStrategy("Legacy"),
        subnet_specs=[
            awsx.ec2.SubnetSpecArgs(
                type=awsx.ec2.SubnetType.PRIVATE
            ),
            awsx.ec2.SubnetSpecArgs(
                type=awsx.ec2.SubnetType.PUBLIC
            ),
            awsx.ec2.SubnetSpecArgs(
                type=awsx.ec2.SubnetType.ISOLATED,
                name="rds",
                tags={"Service": "RDS", "Cluster": cluster_config.name},
            ),
            awsx.ec2.SubnetSpecArgs(
                type=awsx.ec2.SubnetType.ISOLATED,
                name="elasticache",
                tags={"Service": "ElastiCache", "Cluster": cluster_config.name},
            ),
        ],
        tags={
            "Name": f"VPC-{cluster_config.name}",
            "Environment": cluster_config.environment,
            "Platform": "pulumi",
        },
    )

Output of pulumi about

CLI          
Version      3.101.1
Go Version   go1.21.5
Go Compiler  gc

Plugins
NAME        VERSION
aws         6.13.3
aws-native  0.90.0
awsx        2.3.0
command     0.9.2
datadog     4.23.0
docker      4.5.0
eks         2.0.0
kubernetes  4.6.0
postgresql  3.10.0
python      unknown
random      4.15.0
tailscale   0.13.3

Host     
OS       ubuntu
Version  22.04
Arch     x86_64

Additional context

We recently updated from awsx v1.0.2 to v2.3.0. We didn't see this issue in our test and dev environments.

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@spennell added the kind/bug and needs-triage labels on Jan 12, 2024
@mjeffryes added the impact/regression label and removed the needs-triage label on Jan 17, 2024
@mjeffryes
Member

Thanks for the report @spennell. Looks like this may be a regression caused by the changes in #1135. The intent is that subnet_strategy=awsx.ec2.SubnetAllocationStrategy("Legacy") should prevent these replacements. We're investigating to find the root cause.

@spennell
Author

@mjeffryes I was able to develop a workaround. After exporting the awsx.ec2.subnet_layout I was seeing /20 for the public and private subnets, along with /24 for the isolated subnet specs.

Looking at the diff of the preview, it appears that the previous version of awsx created /19 blocks for our public and private subnets, with the isolated subnets staying at /24.

By manually setting the cidr_mask for our public subnets to /20 and keeping our private subnets at /19, the preview shows no changes. I also discovered in our test environments that if I statically assign specific network addresses to the cidr_blocks input, it won't actually trigger Pulumi to change the subnet address space, even though that address space differs from the current state. I don't think this is the intended behavior.
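
For reference, a minimal sketch of one way to surface the allocated CIDR blocks as stack outputs for inspection. It assumes the awsx Vpc component exposes its underlying aws:ec2:Subnet resources through a subnets output (as awsx 2.x does); treat it as a sketch and verify against your installed version.

import pulumi

# "vpc" is the awsx.ec2.Vpc component created above.
# Assumption: the component exposes a "subnets" output wrapping the underlying
# aws.ec2.Subnet resources; each subnet's cidr_block is itself an Output, so
# Output.all collects them into a single exportable list.
subnet_cidrs = vpc.subnets.apply(
    lambda subnets: pulumi.Output.all(*[s.cidr_block for s in subnets])
)
pulumi.export("subnet_cidr_blocks", subnet_cidrs)

After an update, pulumi stack output subnet_cidr_blocks shows the masks actually in use.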

@lukehoban added the p1 label on Jan 23, 2024
@mjeffryes added this to the 0.99 milestone on Jan 26, 2024
@danielrbradley
Member

@spennell can I confirm the settings you're using for availability_zone_names and cluster_config.vpc_cidr_block?

  1. Are you specifying 3 availability zone names?
  2. What CIDR block are you configuring?

For 3 AZs and a CIDR of 10.0.0.0/16 using version 1.102.0 of awsx, I'm seeing for the first AZ: private 10.0.0.0/19, public 10.0.32.0/20, rds 10.0.48.0/24, elasticache 10.0.49.0/24. Note: the public subnet I'm getting is not a /19 but a /20. Putting this into the unit test for legacy mode in the latest version results in the same layout.
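
As a quick sanity check on that layout, here is a standard-library sketch (the blocks are copied from the first-AZ values above) confirming they fit inside the /16 without overlapping:

import ipaddress

vpc_block = ipaddress.ip_network("10.0.0.0/16")
first_az = {
    "private": ipaddress.ip_network("10.0.0.0/19"),
    "public": ipaddress.ip_network("10.0.32.0/20"),
    "rds": ipaddress.ip_network("10.0.48.0/24"),
    "elasticache": ipaddress.ip_network("10.0.49.0/24"),
}

# Every subnet must sit inside the VPC block.
for name, net in first_az.items():
    assert net.subnet_of(vpc_block), f"{name} is outside the VPC block"

# No two subnets may overlap.
names = list(first_az)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        assert not first_az[a].overlaps(first_az[b]), f"{a} overlaps {b}"

print("first-AZ layout fits in 10.0.0.0/16 with no overlaps")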

I've also tried /17, which results in overlapping subnets (invalid), and /15, which is outside the valid range.

On the workaround: adding the explicit ranges is a good first step for predictability. I would also recommend moving to the "Auto" layout, as it's a lot more predictable in its behaviour and won't modify the order of the specs.
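
As a rough illustration of that recommendation (a sketch only, not a drop-in fix: the resulting CIDRs should be compared against the deployed subnets with a preview before updating), the reporter's spec could keep explicit masks and switch the strategy:

vpc = awsx.ec2.Vpc(
    f"{cluster_config.name}-vpc",
    enable_dns_hostnames=True,
    availability_zone_names=availability_zone_names,
    cidr_block=cluster_config.vpc_cidr_block,
    # "Auto" allocates blocks in the order the specs are declared,
    # so explicit cidr_mask values keep the layout predictable.
    subnet_strategy=awsx.ec2.SubnetAllocationStrategy("Auto"),
    subnet_specs=[
        awsx.ec2.SubnetSpecArgs(type=awsx.ec2.SubnetType.PRIVATE, cidr_mask=19),
        awsx.ec2.SubnetSpecArgs(type=awsx.ec2.SubnetType.PUBLIC, cidr_mask=20),
        awsx.ec2.SubnetSpecArgs(type=awsx.ec2.SubnetType.ISOLATED, cidr_mask=24, name="rds"),
        awsx.ec2.SubnetSpecArgs(type=awsx.ec2.SubnetType.ISOLATED, cidr_mask=24, name="elasticache"),
    ],
    tags={"Name": f"VPC-{cluster_config.name}"},
)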

@spennell
Author

@danielrbradley

For the problematic environment, yes, we are specifying 3 AZs. The test environments default to 2, which is why they may not be experiencing the issue.

availability_zone_names = us-east-1a, us-east-1b, us-east-1c
vpc_cidr_block = 10.1.0.0/16

Here is the updated block that uses the workaround.

vpc = awsx.ec2.Vpc(
        f"{cluster_config.name}-vpc",
        enable_dns_hostnames=True,
        availability_zone_names=availability_zone_names,
        cidr_block=cluster_config.vpc_cidr_block,
        subnet_strategy=awsx.ec2.SubnetAllocationStrategy("Legacy"),
        subnet_specs=[
            awsx.ec2.SubnetSpecArgs(
                type=awsx.ec2.SubnetType.PRIVATE,
                cidr_mask=19,
            ),
            awsx.ec2.SubnetSpecArgs(
                type=awsx.ec2.SubnetType.PUBLIC,
                cidr_mask=20,
            ),
            awsx.ec2.SubnetSpecArgs(
                type=awsx.ec2.SubnetType.ISOLATED,
                cidr_mask=24,
                name="rds",
                tags={"Service": "RDS", "Cluster": cluster_config.name},
            ),
            awsx.ec2.SubnetSpecArgs(
                type=awsx.ec2.SubnetType.ISOLATED,
                cidr_mask=24,
                name="elasticache",
                tags={"Service": "ElastiCache", "Cluster": cluster_config.name},
            ),
        ],
        tags={
            "Name": f"VPC-{cluster_config.name}",
            "Environment": cluster_config.environment,
            "Platform": "pulumi",
        },
    )

danielrbradley added a commit that referenced this issue Jan 30, 2024
Fixes #1204 

- Assuming all subnets are the same size is overly cautious and breaks some existing setups.
- Maintain the new special case for single-subnet layouts to use the whole of small VPCs.
- This will now fail and require a manual layout for smaller VPCs with either:
   - More than 1 private subnet
   - More than 2 public subnets
   - More than 1 public subnet and more than 4 isolated subnets
@pulumi-bot added the resolution/fixed label on Jan 30, 2024