
Commit bc12d8c

feat: Implement CDK-based AWS resource cleanup Lambda
Comprehensive AWS resource cleanup Lambda with CDK deployment to replace CloudFormation-based implementation that exceeded 51KB template size limit.

Architecture changes:
- Migrate from inline CloudFormation to AWS CDK + Python
- Rename ec2_cleanup to aws_resource_cleanup (accurate scope)
- Modular code structure: 29 files, 2,338 additions
- Python 3.12 runtime with modern type annotations

Features:
- EC2 cleanup: TTL expiration, stop policies, long-stopped detection, untagged instances
- EKS cleanup: CloudFormation stack deletion with skip patterns
- OpenShift cleanup: Full cluster destruction with reconciliation loop
- Billing tag validation: Category strings and Unix timestamps
- DRY_RUN mode: Default safe preview mode

Infrastructure:
- CDK stack with 7 configurable parameters
- Lambda: 1024MB memory, 600s timeout, hourly EventBridge schedule
- Comprehensive IAM permissions for EC2, EKS, ELB, Route53, S3, VPC
- SNS notifications for cleanup actions

Developer experience:
- Justfile automation: 26+ commands for deployment, monitoring, maintenance
- uv package manager: No manual venv activation required
- Linters: ruff, black, mypy with full type coverage
- Root IaC/justfile for multi-project routing

Configuration via environment variables:
- DRY_RUN (default: true)
- UNTAGGED_THRESHOLD_MINUTES (default: 30)
- EKS_SKIP_PATTERN (default: pe-.*)
- OPENSHIFT_CLEANUP_ENABLED (default: true)
- OPENSHIFT_BASE_DOMAIN (default: cd.percona.com)
- OPENSHIFT_MAX_RETRIES (default: 3)

Replaces:
- IaC/LambdaEC2Cleanup.yml (10-min simple cleanup)
- cloud/aws-functions/orphaned_*.py scripts
- Manual OpenShift destruction workflows
1 parent 0720cfe commit bc12d8c

33 files changed: +2338 -0 lines changed

IaC/cdk/README.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
# CDK Projects

AWS CDK (Cloud Development Kit) infrastructure as code implementations.

## Projects

### aws-resources-cleanup
Comprehensive AWS resource cleanup Lambda with CDK deployment.

**Purpose**: Automated cleanup of EC2 instances, EKS clusters, and OpenShift infrastructure based on TTL policies and billing tags.

**Features**:
- TTL-based expiration (8h, 24h policies)
- Billing tag validation (category + Unix timestamps)
- EKS CloudFormation deletion
- OpenShift comprehensive cleanup (VPC, ELB, Route53, S3, NAT, security groups)
- DRY_RUN mode (default)
- SNS notifications
- Hourly EventBridge schedule

**Quick Start**:
```bash
cd aws-resources-cleanup
just install  # Install dependencies
just deploy   # Deploy in DRY_RUN mode
just logs     # Tail CloudWatch logs
```

📖 **Full documentation**: [aws-resources-cleanup/README.md](aws-resources-cleanup/README.md)

## Requirements

- AWS CLI configured with appropriate profile
- `uv` package manager: `brew install uv`
- `just` task runner: `brew install just`

## Common Commands

All projects use Justfile for consistent automation:

| Command | Description |
|---------|-------------|
| `just install` | Install all dependencies |
| `just synth` | Generate CloudFormation template |
| `just diff` | Preview infrastructure changes |
| `just deploy` | Deploy stack |
| `just destroy` | Remove stack |
| `just logs` | Tail CloudWatch logs (if applicable) |

## Adding New CDK Projects

When creating a new CDK project in this directory:

1. Create project directory: `mkdir project-name`
2. Initialize CDK: `cdk init app --language python`
3. Add Justfile for automation
4. Add project-specific README.md
5. Update this README with project description

## Resources

- [AWS CDK Documentation](https://docs.aws.amazon.com/cdk/)
- [CDK Python API Reference](https://docs.aws.amazon.com/cdk/api/v2/python/)
- [Justfile Documentation](https://github.com/casey/just)
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# CDK
cdk.out/
.cdk.staging/
cdk.context.json

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Virtual environments
venv/
ENV/
env/
.venv

# Testing
.pytest_cache/
.coverage
htmlcov/
.tox/

# IDEs
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Lambda artifacts
*.zip
/tmp/

# Logs
*.log
Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
# AWS Resources Cleanup

Automated cleanup Lambda for EC2 instances, EKS clusters, and OpenShift infrastructure.

## Overview

Comprehensive resource cleanup Lambda that runs hourly to enforce TTL policies, billing tag compliance, and infrastructure lifecycle management across all AWS regions.

**Default Mode**: DRY_RUN (logs actions without execution)

## Features

### EC2 Cleanup
- **TTL Expiration**: Enforces `creation-time` + `delete-cluster-after-hours` tags
- **Stop Policy**: Honors `stop-after-days` for staging instances
- **Long Stopped**: Terminates instances stopped >30 days
- **Untagged Instances**: Removes instances without valid billing tags after grace period
- **Billing Tag Validation**: Validates category strings or Unix timestamps

**Protected Instances**:
- Persistent billing tags: `jenkins-*`, `pmm-dev`
- Valid `iit-billing-tag` timestamps
- Instances with `Name` tag matching protected patterns
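
The billing tag rules above can be illustrated with a minimal sketch. It is not the Lambda's actual code: the helper name `classify_billing_tag`, the glob-style matching of `jenkins-*`, and the reading of a numeric value as a Unix timestamp that protects the instance until it passes are all assumptions made for illustration.

```python
import fnmatch
import time
from typing import Optional

# Patterns copied from the "Protected Instances" list; glob matching is an assumption.
PERSISTENT_PATTERNS = ("jenkins-*", "pmm-dev")


def classify_billing_tag(value: Optional[str], now: Optional[float] = None) -> str:
    """Classify an `iit-billing-tag` value (hypothetical helper)."""
    if value is None or not value.strip():
        return "missing"
    value = value.strip()
    if any(fnmatch.fnmatchcase(value, pattern) for pattern in PERSISTENT_PATTERNS):
        return "persistent"
    if value.isdecimal():
        # Assumption: a numeric value is a Unix timestamp that protects the
        # instance until that moment has passed.
        current = time.time() if now is None else now
        return "timestamp-valid" if int(value) > current else "timestamp-expired"
    # Assumption: any other non-empty string is treated as a category name.
    return "category"
```

Under these assumptions, `persistent` and `timestamp-valid` would mark an instance as protected, while `missing` and `timestamp-expired` would make it a candidate for the untagged policy.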

### EKS Cleanup
- CloudFormation stack deletion with configurable skip patterns
- Default protection: `pe-.*` (platform engineering clusters)
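
A rough sketch of how a skip pattern such as `pe-.*` might be applied to cluster names follows. The function name `is_protected_cluster` is hypothetical, and anchoring the regex at the start of the name with `re.match` is an assumption; the real Lambda may use a different matching mode.

```python
import os
import re
from typing import Optional


def is_protected_cluster(name: str, pattern: Optional[str] = None) -> bool:
    """Return True when the cluster name matches the skip pattern (hypothetical helper)."""
    if pattern is None:
        # Fall back to the documented default when the variable is unset.
        pattern = os.environ.get("EKS_SKIP_PATTERN", "pe-.*")
    # Assumption: the pattern must match from the beginning of the name.
    return re.match(pattern, name) is not None
```

With the default pattern, `pe-prod` would be skipped, while `dev-cluster` and `test-pe-1` would remain eligible for cleanup.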

### OpenShift Cleanup
Full cluster destruction including:
- Compute: EC2 instances
- Network: VPC, subnets, route tables, internet gateways, NAT gateways, ENIs, Elastic IPs
- Load Balancers: Classic ELB, ALB, NLB
- Security: Security groups, VPC endpoints
- DNS: Route53 hosted zones and records
- Storage: S3 buckets with `kubernetes.io/cluster/<cluster-name>` tags

**Reconciliation Loop**: Retries resource deletion to handle dependency ordering
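
The idea behind a reconciliation loop can be sketched as follows. This is an illustrative outline, not the project's implementation: `reconcile` and the convention that each deleter returns `True` once its resource is gone are assumptions. Resources blocked by a dependency in one pass get another chance in the next, up to a retry limit such as `OPENSHIFT_MAX_RETRIES`.

```python
from typing import Callable, Iterable


def reconcile(deleters: Iterable[Callable[[], bool]], max_retries: int = 3) -> bool:
    """Run deleters repeatedly until all succeed or retries run out.

    Each deleter is a zero-argument callable (hypothetical convention) that
    returns True when its resource is gone and False when it is still blocked.
    """
    pending = list(deleters)
    for _ in range(max_retries):
        # Keep only the deleters that failed in this pass.
        pending = [delete for delete in pending if not delete()]
        if not pending:
            break
    return not pending
```

For example, a VPC deleter that fails while subnets still exist simply stays in `pending` and is retried once the subnet deleter has succeeded in an earlier pass.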

## Environment Variables

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `DRY_RUN` | bool | `true` | Preview mode - logs actions without execution |
| `SNS_TOPIC_ARN` | string | `""` | SNS topic for cleanup notifications |
| `UNTAGGED_THRESHOLD_MINUTES` | int | `30` | Grace period for untagged instances before deletion |
| `EKS_SKIP_PATTERN` | string | `pe-.*` | Regex pattern for protected EKS cluster names |
| `OPENSHIFT_CLEANUP_ENABLED` | bool | `true` | Enable OpenShift cluster cleanup |
| `OPENSHIFT_BASE_DOMAIN` | string | `cd.percona.com` | Route53 base domain for OpenShift clusters |
| `OPENSHIFT_MAX_RETRIES` | int | `3` | Max reconciliation attempts for resource deletion |

**Runtime**: Python 3.12, 1024 MB memory, 10 minute timeout, hourly schedule
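
As a sketch of how the variables in the table above could be read with their documented defaults, consider the following. The `CleanupConfig` dataclass and `load_config` function are hypothetical names, and the boolean parsing rules are assumptions chosen to fail safe.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CleanupConfig:
    """Typed view of the documented environment variables (hypothetical)."""
    dry_run: bool
    sns_topic_arn: str
    untagged_threshold_minutes: int
    eks_skip_pattern: str
    openshift_cleanup_enabled: bool
    openshift_base_domain: str
    openshift_max_retries: int


def load_config(env: Optional[Mapping[str, str]] = None) -> CleanupConfig:
    env = os.environ if env is None else env
    return CleanupConfig(
        # Assumption: only an explicit "false" leaves preview mode.
        dry_run=env.get("DRY_RUN", "true").strip().lower() != "false",
        sns_topic_arn=env.get("SNS_TOPIC_ARN", ""),
        untagged_threshold_minutes=int(env.get("UNTAGGED_THRESHOLD_MINUTES", "30")),
        eks_skip_pattern=env.get("EKS_SKIP_PATTERN", "pe-.*"),
        # Assumption: only an explicit "true" enables destructive OpenShift cleanup.
        openshift_cleanup_enabled=env.get("OPENSHIFT_CLEANUP_ENABLED", "true").strip().lower() == "true",
        openshift_base_domain=env.get("OPENSHIFT_BASE_DOMAIN", "cd.percona.com"),
        openshift_max_retries=int(env.get("OPENSHIFT_MAX_RETRIES", "3")),
    )
```

The asymmetric boolean parsing is a deliberate choice in this sketch: a typo in `DRY_RUN` keeps the function in preview mode, and a typo in `OPENSHIFT_CLEANUP_ENABLED` disables the destructive path.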

## Quick Start

### Prerequisites
```bash
brew install uv       # Python package manager
brew install just     # Task runner
```

### Deploy
```bash
cd IaC/cdk/aws-resources-cleanup

# Install dependencies and bootstrap CDK (first time)
just install
just bootstrap

# Deploy in DRY_RUN mode (safe)
just deploy
```

### Monitor
```bash
just logs           # Tail CloudWatch logs
just invoke-aws     # Manual invocation
just info           # Lambda configuration
```

## Common Commands

| Command | Description |
|---------|-------------|
| `just deploy` | Deploy in DRY_RUN mode |
| `just deploy-live` | Deploy in LIVE mode (⚠️ destructive!) |
| `just logs` | Tail CloudWatch logs (follow) |
| `just logs-recent` | Show logs from last hour |
| `just update-code` | Fast Lambda code update (no CDK) |
| `just update-env DRY_RUN=false` | Switch to LIVE mode |
| `just diff` | Preview infrastructure changes |
| `just destroy` | Remove entire stack |

Run `just` to see all commands.

## Cleanup Policies

Policies are evaluated in priority order:

1. **TTL Expiration**
   - Tags: `creation-time` (Unix timestamp) + `delete-cluster-after-hours` (integer)
   - Action: Terminate when TTL expires

2. **Stop Policy**
   - Tag: `stop-after-days` (integer)
   - Action: Stop instance after specified days

3. **Long Stopped**
   - Criteria: Instance in stopped state >30 days
   - Action: Terminate

4. **Untagged**
   - Criteria: Missing or invalid `iit-billing-tag`
   - Action: Terminate after grace period (default: 30 minutes)

**Protection**: Instances with persistent billing tags or valid timestamps are never auto-deleted.
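
The priority order above could be expressed roughly as follows. This is a sketch under stated assumptions, not the Lambda's code: the `Instance` dataclass and `decide` function are hypothetical, protection is checked before any policy, and `stop-after-days` is measured from launch time.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

HOUR = 3600
DAY = 86400


@dataclass
class Instance:
    """Minimal stand-in for an EC2 instance (hypothetical shape)."""
    state: str                          # "running" or "stopped"
    launched_at: float                  # Unix seconds
    tags: dict = field(default_factory=dict)
    stopped_at: Optional[float] = None
    has_valid_billing_tag: bool = True
    protected: bool = False             # persistent tag or unexpired timestamp


def _int_tag(tags: dict, key: str) -> Optional[int]:
    raw = tags.get(key, "")
    return int(raw) if raw.isdecimal() else None


def decide(inst: Instance, now: float,
           untagged_threshold_minutes: int = 30) -> Optional[Tuple[str, str]]:
    """Return (action, reason) for the first matching policy, or None."""
    if inst.protected:  # assumption: protection overrides every policy
        return None
    created = _int_tag(inst.tags, "creation-time")
    ttl_hours = _int_tag(inst.tags, "delete-cluster-after-hours")
    if created is not None and ttl_hours is not None and now >= created + ttl_hours * HOUR:
        return ("terminate", "ttl-expired")
    stop_days = _int_tag(inst.tags, "stop-after-days")
    if (stop_days is not None and inst.state == "running"
            and now >= inst.launched_at + stop_days * DAY):
        return ("stop", "stop-after-days")
    if (inst.state == "stopped" and inst.stopped_at is not None
            and now - inst.stopped_at > 30 * DAY):
        return ("terminate", "long-stopped")
    if not inst.has_valid_billing_tag and now - inst.launched_at > untagged_threshold_minutes * 60:
        return ("terminate", "untagged")
    return None
```

Returning on the first match keeps the evaluation order explicit, so an instance that satisfies both the TTL and the untagged criteria is reported under the higher-priority TTL policy.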
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
#!/usr/bin/env python3
"""CDK app for AWS Resources Cleanup Lambda."""

import os
import aws_cdk as cdk
from stacks.resource_cleanup_stack import ResourceCleanupStack

app = cdk.App()

ResourceCleanupStack(
    app,
    "AWSResourcesCleanupStack",
    description="Comprehensive AWS resource cleanup: EC2, EKS, OpenShift infrastructure",
    env=cdk.Environment(
        account=os.getenv('CDK_DEFAULT_ACCOUNT'),
        region=os.getenv('CDK_DEFAULT_REGION', 'us-east-2')
    ),
    tags={
        "Project": "PlatformEngineering",
        "ManagedBy": "CDK",
        "iit-billing-tag": "resource-cleanup"
    }
)

app.synth()
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
{
  "app": "python3 app.py",
  "watch": {
    "include": [
      "**"
    ],
    "exclude": [
      "README.md",
      "cdk*.json",
      "requirements*.txt",
      "source.bat",
      "**/__pycache__",
      "**/*.pyc",
      ".pytest_cache"
    ]
  },
  "context": {
    "@aws-cdk/aws-lambda:recognizeLayerVersion": true,
    "@aws-cdk/core:checkSecretUsage": true,
    "@aws-cdk/core:target-partitions": [
      "aws",
      "aws-cn"
    ],
    "@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
    "@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
    "@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
    "@aws-cdk/aws-iam:minimizePolicies": true,
    "@aws-cdk/core:validateSnapshotRemovalPolicy": true,
    "@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
    "@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
    "@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
    "@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
    "@aws-cdk/core:enablePartitionLiterals": true,
    "@aws-cdk/aws-events:eventsTargetQueueSameAccount": true,
    "@aws-cdk/aws-iam:standardizedServicePrincipals": true,
    "@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true,
    "@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true,
    "@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true,
    "@aws-cdk/aws-route53-patternslibrary:useCertificate": true,
    "@aws-cdk/customresources:installLatestAwsSdkDefault": false,
    "@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true,
    "@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true,
    "@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true,
    "@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true,
    "@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true,
    "@aws-cdk/aws-redshift:columnId": true,
    "@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true,
    "@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true,
    "@aws-cdk/aws-apigateway:requestValidatorUniqueId": true,
    "@aws-cdk/aws-kms:aliasNameRef": true,
    "@aws-cdk/aws-autoscaling:generateLaunchTemplateInsteadOfLaunchConfig": true,
    "@aws-cdk/core:includePrefixInUniqueNameGeneration": true,
    "@aws-cdk/aws-efs:denyAnonymousAccess": true,
    "@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby": true,
    "@aws-cdk/aws-lambda-nodejs:useLatestRuntimeVersion": true,
    "@aws-cdk/aws-efs:mountTargetOrderInsensitiveLogicalId": true,
    "@aws-cdk/aws-rds:auroraClusterChangeScopeOfInstanceParameterGroupWithEachParameters": true,
    "@aws-cdk/aws-appsync:useArnForSourceApiAssociationIdentifier": true,
    "@aws-cdk/aws-rds:preventRenderingDeprecatedCredentials": true,
    "@aws-cdk/aws-codepipeline-actions:useNewDefaultBranchForCodeCommitSource": true,
    "@aws-cdk/aws-cloudwatch-actions:changeLambdaPermissionLogicalIdForLambdaAction": true,
    "@aws-cdk/aws-codepipeline:crossAccountKeysDefaultValueToFalse": true,
    "@aws-cdk/aws-codepipeline:defaultPipelineTypeToV2": true,
    "@aws-cdk/aws-kms:reduceCrossAccountRegionPolicyScope": true,
    "@aws-cdk/aws-eks:nodegroupNameAttribute": true,
    "@aws-cdk/aws-ec2:ebsDefaultGp3Volume": true,
    "@aws-cdk/aws-ecs:removeDefaultDeploymentAlarm": true,
    "@aws-cdk/custom-resources:logApiResponseDataPropertyTrueDefault": false,
    "@aws-cdk/aws-s3:keepNotificationInImportedBucket": false
  }
}
