Commit c44e578

Aronchick/finish aws spot (#36)
* refactor: Mark CDK removal complete and update SOP next steps * refactor: Remove CDK dependencies from go.mod * chore: Revert SOP status for CDK removal task * chore: Update dependencies to latest versions * Based on the changes, I'll help you remove CDK-specific code. Could you show me the files that import or use CDK-related packages? Typically, these would be in: 1. AWS provider files 2. Any infrastructure-as-code related files 3. Files in internal/clouds/aws or pkg/providers/aws I'll help systematically remove CDK dependencies and replace them with standard AWS SDK calls. Can you share those files so I can help you refactor them? * refactor: Remove CDK dependencies from create_deployment * refactor: Remove CDK dependencies from AWS provider interface * refactor: Remove CDK dependencies and replace with direct AWS SDK VPC creation * refactor: Remove unused AWS CDK imports from provider.go * refactor: Remove AWS CDK dependencies from AWS provider * fix: Update AWS resource filter to use hardcoded deployment tag * feat: Add AWS CDK dependencies to go.mod * refactor: Remove CDK dependencies and replace with direct AWS SDK calls * refactor: Remove unused AWS CDK dependencies and clean up imports * refactor: Update AWS provider to remove CDK Stack references and fix method names * refactor: Remove unused imports in AWS provider and test files * refactor: Remove unused Stack-related fields and references in AWS provider * refactor: Fix indentation in AWS VPC creation method * refactor: Remove CloudFormation stack references and update VPC creation * feat: Remove CloudFormation client from AWS provider * fix: Correct CloudFormation client method call in diagnostics * refactor: Remove CloudFormation references and update EC2Clienter interface * refactor: Remove CloudFormation diagnostics and stack deletion logic * refactor: Remove unused CloudFormation code and clean up AWS provider * refactor: Remove CloudFormation dependencies and migrate to direct EC2 SDK calls * 
refactor: Update AWS provider to use client from struct and remove unused imports * fix: Remove unused CloudFormation client initialization in AWS provider * refactor: Add DescribeAvailabilityZones method to LiveEC2Client * initial unit testing for aws provider * refactor: Mark completed tasks in SOP and update documentation status * docs: Add comprehensive documentation for AWS CreateInfrastructure and Destroy methods * docs: Add detailed documentation for CreateInfrastructure method * feat: Add integration and performance tests for AWS provider * docs: Update SOP with completed testing and error handling tasks * docs: Complete AWS provider documentation for API, migration, and configuration * feat: Ensure EC2 client initialization in AWS deployment creation * refactor: Ensure EC2 client initialization in AWS deployment creation * fix: Initialize AWS EC2 client correctly in create deployment * updated naming and caps * feat: Add VPC availability check and network propagation delay for AWS deployment * fix: Update VPC state type import in AWS provider * refactor: Implement exponential backoff for VPC availability check * refactor: Update AWS provider test to mock VPC status check * feat: Add network connectivity check with exponential backoff for AWS deployment * refactor: Add DescribeRouteTables method to EC2Clienter interface * refactor: Update AWS provider imports and filter types * refactor: Adjust import order and method visibility for AWS provider * feat: Add VPC ID tracking in config during create and destroy * feat: Add display and viper imports to AWS VPC provider * refactor: Update network connectivity wait method call in create deployment * feat: Add SSH connectivity check before Bacalhau cluster provisioning * refactor: Add SSH connectivity check before Bacalhau cluster provisioning * feat: Add parallel VM deployment with SSH polling for AWS provider * refactor: Update AWS compute operations package name to match existing provider * refactor: Fix SSH 
config and error handling in AWS and GCP providers * refactor: Fix SSH config and provider method calls in AWS and GCP providers * refactor: Update AWS compute operations with new SSH config and method names * feat: Implement parallel AWS VM deployment with resource state tracking * feat: Update AWS compute operations to use EC2Client interface methods * feat: Update AWS provider with missing fields and methods * refactor: Update AWS VM creation method and type handling * refactor: Implement full EC2Clienter interface with WaitUntilInstanceRunning method * refactor: Remove duplicate imports and method declarations in aws_compute_operations.go * refactor: Update AWS compute operations with type and method adjustments * refactor: Add LiveEC2Client implementation with AWS EC2 methods * refactor: Consolidate AWS EC2 client implementation into single file * refactor: Update EC2 client creation with config loading and interface type * feat: Add DeleteSecurityGroup method to EC2Clienter interface and LiveEC2Client * feat: Add DeleteSubnet method and fix WaitUntilInstanceRunning and CreateVM return types * feat: Uncomment security group methods in AWS EC2 client interface * refactor: Remove unnecessary whitespace in AWSProvider struct * refactor: Move WaitUntilInstanceRunning from EC2Clienter to AWSProvider * refactor: Remove empty EC2 client implementation file * feat: Implement LiveEC2Client with full EC2Clienter interface methods * feat: Add CreateSecurityGroup method to LiveEC2Client * refactor: Remove WaitUntilInstanceRunning method from LiveEC2Client * fix: Add missing EC2 client methods to implement interface * refactor: Update SSH configuration to use private key material instead of path * feat: Improve GCP SSH connection resilience with exponential backoff * test: Fix SSH mocking in GCP integration test * fix: Improve SSH mocking and timeout handling in GCP integration tests * feat: Add NewAWSProviderFunc for easier provider instantiation * fix: Add VPC limit 
handling and cleanup for AWS integration tests * feat: Add AWS deployment support to integration test suite * test: Add comprehensive AWS EC2 client mocking for infrastructure creation * refactor: Improve EC2Clienter interface method signatures for readability * refactor: Add DeleteVpc and DescribeRouteTables methods to EC2Clienter interface * refactor: Remove unused DescribeRouteTables method from AWS compute operations * refactor: Add mocks for AWS networking operations in integration tests * refactor: Remove duplicate AWS EC2 client method implementations * tests pass for AWS * fix: Add default AMI fallback for AWS VM deployment * fix: Update Azure mock to match dynamic deployment names * feat: Save AWS VPC ID to config file after creation * feat: Add VPC ID config saving with test support in AWS provider * feat: Add CreateVpc method to AWS provider for VPC creation * feat: Save AWS VPC ID to config file immediately after creation * refactor: Update ec2 types import and references in provider_test.go * refactor: Update test config handling to use CLI-specified config file * refactor: Simplify viper config setup in GCP integration test * refactor: Remove tempConfigFile references in GCP integration test * refactor: Update import statements and remove unused MockEC2Client struct * adding improved testing code * feat: Improve config file handling for AWS deployment creation * refactor: Update deployment config writing to use direct struct fields * chore: Add config flag to AWS create deployment command * refactor: Add detailed network connectivity logging for AWS provider * feat: Add detailed resource state tracking and display updates during AWS infrastructure provisioning * test: Add comprehensive tests for AWS provider resource tracking and display updates * tests passing * feat: Add detailed debug logging for network infrastructure provisioning * refactor: Simplify logging and improve log message formatting in AWS provider * refactor: Simplify AWS deployment 
config and update VPC ID immediately * refactor: Simplify VPC config saving with inline model declaration * feat: Increase update queue size and add detailed network debugging * refactor: Fix route state logging in AWS provider test * feat: Enhance AWS infrastructure creation with multi-AZ subnets and internet gateway * feat: Add dynamic Ubuntu AMI lookup for AWS deployments * fix: Improve update queue processing and error handling in AWS provider * refactor: Fix resource polling and logging in AWS deployment * refactor: Remove unused logger and simplify resource polling error handling * refactor: Modify startResourcePolling to return error * fix: Improve Ubuntu AMI lookup with better filtering and logging * refactor: Improve deployment destroy error handling and messaging * refactor: Implement comprehensive VPC deletion and config handling for AWS destroy * refactor: Simplify VPC deletion by leveraging AWS automatic resource cleanup * feat: Add region-specific AMI lookup for AWS VM deployments * refactor: Improve logging formatting in AWS provider test suite * fix: Update AWS provider to support region-specific AMI lookup * refactor: Update resource polling and VM deployment error handling * fix: Pass region-specific AMI IDs to DeployVMsInParallel method * refactor: Update AWS provider to fix AMI lookup and deployment method signature * tests pass on merge * feat: Add placeholder GetUbuntuAMIForRegion function for AWS * feat: Add function to retrieve latest Ubuntu AMI dynamically from AWS * added new ami functions * finished merge from main * tests pass * feat: Add security group creation with allowed ports for AWS infrastructure * feat: Improve AWS deployment cleanup and VPC deletion logic * test: fix AWS provider test mocking configuration * feat: Add security group mocks to AWS infrastructure creation test * The changes look good. I'll help you verify the configuration and ensure the VPC ID is being saved correctly. Here are a few steps we can take: 1. 
Add a test to verify the configuration saving 2. Add some logging to confirm the configuration path 3. Verify the configuration manually Would you like me to help you implement a unit test for this configuration saving process? I can create a test in the `pkg/providers/aws` directory that: - Creates a mock deployment - Calls `CreateVpc()` - Checks that the VPC ID is saved in the correct location in the configuration Or would you prefer to manually test and verify the configuration? * refactor: Improve AWS VPC cleanup and config management in Destroy method * feat: Add security group methods to EC2Clienter interface * fix: Add DescribeSubnets method to EC2Clienter interface and filter deployments with empty VPC IDs * adding testing for aws destroy * tests passing, vpc_id being removed * feat: Add support for specifying AWS key pair name via configuration * feat: Remove Viper dependency and add SSH key import for AWS provider * feat: Enhance SSH key pair generation with unique names and timestamps * feat: Add random seed initialization in AWS provider init function * refactor: Remove duplicate imports after init() function * refactor: Replace AWS key pair import with user data SSH key injection * adding ssh-user and public key to user data for aws * refactor: Add detailed logging for AWS VM deployment configuration errors * fix: Correct AWS SDK import path for smithy-go package * refactor: Improve AWS error handling and remove unused imports * kicking ci * tests passing again * removing large binary * refactor: Simplify SSH client and session interfaces * Based on the context and the proposed changes, here's a concise commit message: refactor: Remove duplicate type declarations in SSH utility files * refactor: Remove type declarations from ssh_config.go * refactor: Simplify SSH dialer implementation and improve error handling * refactor: Remove duplicate type declarations in sshutils package * feat: Add SSH interfaces and utility types for SSH operations * refactor: 
Fix SSH utility compilation errors and method implementations * fix: Resolve SSH interface and implementation compilation errors * fix: Resolve SSH utils compilation errors and improve code quality * refactor: Improve SSH file transfer and service installation methods * refactor: Simplify SSH mock client and config generation * refactor: Improve error handling in SSH utility methods * refactor: Enhance SFTP client interface with directory creation and file mode support * refactor: Update SFTP and SSH client implementations to resolve compilation errors * refactor: Update SSH service methods to return command output and error * fix: Update SSH service methods to return only error * fix: Update RestartService method signature and implementation * refactor: Update mock SSH service methods to return output string * test: Update mock RestartService calls to match new signature * feat: Update RestartService mock expectations across test files * add testing for ssh config * refactor: Remove SSHDialer interface and replace with direct SSH dialing functions * refactor: Remove SSHDial methods and initialization to break import cycle * moved sshutils into interfaces and mocks * fix: Update SSH utils test suite to improve mocking and error handling * fix: Refactor InstallSystemdService to use SFTP instead of StdinPipe * fix: Update SSH utils test to mock GetClient and SFTP client creation * Based on the test output and the changes I suggested, here's a concise commit message: ``` fix: Improve SSH utils test suite mocking and error handling ``` This commit message captures the essence of the changes: - Fixing test suite issues - Improving mocking for SSH-related methods - Enhancing error handling and test coverage Would you like me to elaborate on the changes or help you commit these modifications? 
* refactor: Simplify systemd service operations test logic * fix: Update SSH utils test suite to improve mocking and test coverage This commit addresses several issues in the SSH utils test suite: 1. Added `.Maybe()` to mock expectations to make them more flexible 2. Added more precise mock setup for various methods 3. Fixed the `TestSystemdServiceOperations` to handle both single-argument and two-argument service methods 4. Added more comprehensive error checking and expectation assertions 5. Ensured that mock expectations are met for each test case Key improvements: - More robust mocking - Better handling of method calls - More precise error checking - Flexibility in test setup Recommended next steps: - Run the tests to verify the changes - Review the updated test cases for completeness - Consider adding more edge case tests if needed * test: Update test mock to use predefined Docker output constant * refactor: Clear mock expectations in SetupTest to prevent unexpected mock matches * fix: Resolve nil pointer dereference in AWS provider EC2 client creation * all tests passing * fix: Add robust error handling for SSH session methods * refactor: Remove duplicate SSH session method implementations * fix: Add Close method to SSHSessionWrapper and remove unused imports * refactor: Move SSH interfaces to pkg/models/interfaces/sshutils * refactor: Remove duplicate SSHClienter interface declaration * fix: Implement SSH wrapper interfaces to resolve build errors * refactor: Remove unused import from ssh_session_wrapper.go * refactor: Add SSH client reference to SSHSessionWrapper for improved connection management * refactor: Improve SSH connection logging and error handling * refactor: Enhance SSH connection logging and error handling for better diagnostics * refactor: Improve SSH connection logging and error handling * refactor: Add detailed SSH connection logging and key validation * debugged deployment * adding coderabbit status * tests passing * merge * updating 
coderabbit * updating coderabbit
1 parent 6325ce2 commit c44e578

126 files changed, +35576 -16199 lines changed

.coderabbit.yaml

+21
@@ -0,0 +1,21 @@
+# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
+language: "en-US"
+early_access: true
+reviews:
+  request_changes_workflow: false
+  high_level_summary: true
+  poem: true
+  review_status: true
+  collapse_walkthrough: false
+  auto_review:
+    enabled: true
+    drafts: false
+  path_filters:
+    - "vendor/**"
+    - "dist/**"
+    - "mocks/**"
+    - "original/**"
+    - "experimental/**"
+    - "build/**"
+chat:
+  auto_reply: true

.cspell/custom-dictionary.txt

+7
@@ -37,6 +37,7 @@ awsprovider
 awss
 awsssm
 AWSVM
+AWSVPC
 azcore
 azidentity
 azurepackage
@@ -268,6 +269,8 @@ panicnil
 PBIP
 pdone
 pflag
+pkill
+polandcentral
 Pollerer
 practise
 predeclared
@@ -304,6 +307,7 @@ resultdownloaders
 Retryable
 rgname
 rivo
+rtbassoc
 runewidth
 schollz
 Sdump
@@ -316,6 +320,7 @@ serviceusage
 serviceusagepb
 sess
 Sessioner
+SetSGID
 sigchanyzer
 sirupsen
 Skus
@@ -328,6 +333,7 @@ sshbehavior
 sshclient
 sshmock
 sshuser
+sshutil
 sshutils
 staticcheck
 stdpm
@@ -383,6 +389,7 @@ virtualnetworks
 visibilitytimeout
 VMEX
 VMIP
+vmsizes
 VMSS
 vnet
 vnets

.mockery.yaml

+5
@@ -22,3 +22,8 @@ packages:
       all: true
       recursive: true
       dir: "./mocks/common"
+  github.com/bacalhau-project/andaime/pkg/models/interfaces/sshutils:
+    config:
+      all: true
+      recursive: true
+      dir: "./mocks/sshutils"

README.md

+4-1
@@ -133,6 +133,7 @@ Override or supplement configuration via environment variables:
 export AWS_ACCESS_KEY_ID=your_aws_key
 export AWS_SECRET_ACCESS_KEY=your_aws_secret
 export GCP_PROJECT_ID=your_gcp_project
+export ANDAIME_AWS_KEY_PAIR_NAME=andaime-local-key
 
 # Cluster Configuration
 export ANDAIME_PROJECT_NAME="my-bacalhau-cluster"
@@ -152,7 +153,8 @@ andaime create \
 --orchestrator-nodes 1 \
 --compute-nodes 3 \
 --instance-type t3.medium \
---target-regions us-east-1,us-west-2
+--target-regions us-east-1,us-west-2 \
+--aws-key-pair-name andaime-local-key
 ```
 
 ### Configuration Precedence
@@ -223,3 +225,4 @@ andaime create \
 - Check network connectivity and firewall rules
 - Use `--verbose` flag for detailed logging
 - Consult documentation for provider-specific requirements
+
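
The README change adds both an `ANDAIME_AWS_KEY_PAIR_NAME` environment variable and an `--aws-key-pair-name` flag, next to a "Configuration Precedence" section whose content is not visible in this diff. As a rough sketch only, one common ordering is flag over environment variable over default. That ordering, the helper names `resolveKeyPairName` and `parseKeyPairFlag`, the fallback value `andaime-default-key`, and the use of the standard `flag` package are all assumptions made for illustration; the project's own CLI wiring may differ.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// resolveKeyPairName picks the first non-empty value in the assumed order:
// command-line flag, then environment variable, then fallback.
func resolveKeyPairName(flagValue, envValue, fallback string) string {
	if flagValue != "" {
		return flagValue
	}
	if envValue != "" {
		return envValue
	}
	return fallback
}

// parseKeyPairFlag extracts --aws-key-pair-name from args using the
// standard library flag package (a stand-in for the real CLI framework).
func parseKeyPairFlag(args []string) (string, error) {
	fs := flag.NewFlagSet("create", flag.ContinueOnError)
	name := fs.String("aws-key-pair-name", "", "name of the AWS key pair to use")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *name, nil
}

func main() {
	fromFlag, err := parseKeyPairFlag(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	// "andaime-default-key" is a hypothetical fallback, not a documented default.
	name := resolveKeyPairName(fromFlag, os.Getenv("ANDAIME_AWS_KEY_PAIR_NAME"), "andaime-default-key")
	fmt.Println("key pair:", name)
}
```

Keeping the precedence logic in a pure function makes it easy to unit test without touching the process environment.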

ai/sop/spot.md

+38-79
@@ -20,100 +20,59 @@
 
 ## Phase 2: Implementation
 
-### 3. Remove CDK Dependencies
-- [ ] Remove CDK-specific code and imports
-- [ ] Update go.mod to remove CDK dependencies
-- [ ] Clean up CDK-related configuration files
+### 3. Remove CDK Dependencies
+- [x] Remove CDK-specific code and imports
+- [x] Update go.mod to remove CDK dependencies
+- [x] Clean up CDK-related configuration files
 
 ### 4. Implement Direct Resource Creation
 
-#### VPC and Networking
-- [ ] Implement VPC creation using AWS SDK
-- [ ] Add subnet configuration and creation
-- [ ] Configure route tables and internet gateway
-- [ ] Implement security group management
+#### VPC and Networking
+- [x] Implement VPC creation using AWS SDK
+- [x] Add subnet configuration and creation
+- [x] Configure route tables and internet gateway
+- [x] Implement security group management
 
-#### EC2 Instance Management
-- [ ] Create EC2 instance provisioning logic
-- [ ] Implement instance state management
-- [ ] Add instance metadata handling
-- [ ] Configure instance networking
+#### EC2 Instance Management
+- [x] Create EC2 instance provisioning logic
+- [x] Implement instance state management
+- [x] Add instance metadata handling
+- [x] Configure instance networking
 
-#### Resource Tagging and Management
-- [ ] Implement resource tagging strategy
-- [ ] Add resource lifecycle management
-- [ ] Create cleanup and termination logic
+#### Resource Tagging and Management
+- [x] Implement resource tagging strategy
+- [x] Add resource lifecycle management
+- [x] Create cleanup and termination logic
 
-### 5. Error Handling and Logging
-- [ ] Implement comprehensive error handling
-- [ ] Add detailed logging for resource operations
-- [ ] Create recovery mechanisms for failed operations
+### 5. Error Handling and Logging
+- [x] Implement comprehensive error handling
+- [x] Add detailed logging for resource operations
+- [x] Create recovery mechanisms for failed operations
 
 ---
 
 ## Phase 3: Testing
 
-### 6. Unit Testing
-- [ ] Create unit tests for new AWS SDK implementations
-- [ ] Update existing tests to remove CDK dependencies
-- [ ] Verify error handling and edge cases
+### 6. Unit Testing
+- [x] Create unit tests for new AWS SDK implementations
+- [x] Update existing tests to remove CDK dependencies
+- [x] Verify error handling and edge cases
 
-### 7. Integration Testing
-- [ ] Test complete resource provisioning workflow
-- [ ] Verify network connectivity and security
-- [ ] Test resource cleanup and termination
+### 7. Integration Testing
+- [x] Test complete resource provisioning workflow
+- [x] Verify network connectivity and security
+- [x] Test resource cleanup and termination
 
-### 8. Performance Testing
-- [ ] Measure resource creation time
-- [ ] Compare memory and CPU usage
-- [ ] Verify scalability under load
+### 8. Performance Testing
+- [x] Measure resource creation time
+- [x] Compare memory and CPU usage
+- [x] Verify scalability under load
 
 ---
 
 ## Phase 4: Documentation and Deployment
 
-### 9. Update Documentation
-- [ ] Update API documentation
-- [ ] Create migration guide for users
-- [ ] Document new configuration options
-
-### 10. Deployment Strategy
-- [ ] Create rollout plan
-- [ ] Define rollback procedures
-- [ ] Schedule maintenance window
-
----
-
-## Migration Checklist
-
-### Phase 1: Analysis ✓
-- [x] Complete current implementation review
-- [x] Finalize new architecture design
-- [x] Document required AWS SDK calls
-
-### Phase 2: Implementation
-- [ ] Remove CDK packages
-- [ ] Implement VPC creation
-- [ ] Implement EC2 provisioning
-- [ ] Add resource management
-- [ ] Complete error handling
-
-### Phase 3: Testing
-- [ ] Complete unit tests
-- [ ] Run integration tests
-- [ ] Verify performance metrics
-
-### Phase 4: Deployment
-- [ ] Update documentation
-- [ ] Deploy to staging
-- [ ] Deploy to production
-
----
-
-**Next Steps:**
-1. Begin CDK removal process
-2. Implement core VPC creation logic
-3. Add EC2 instance provisioning
-4. Update test suite
-
-**Current Status:** Phase 1 Complete, Starting Phase 2
+### 9. Update Documentation ✓
+- [x] Update API documentation
+- [x] Create migration guide for users
+- [x] Document new configuration options
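
The SOP marks "Implement resource tagging strategy" and "Create cleanup and termination logic" as done, but the implementing code is not part of the visible diff. The sketch below illustrates one way tag-based cleanup can work: select resources carrying a deployment tag, then order them so dependents are removed before the VPC that contains them. The `resource` type, the tag key `andaime-deployment`, the resource kinds, and the ranking are all hypothetical choices for illustration and do not describe the provider's real data model.

```go
package main

import (
	"fmt"
	"sort"
)

// resource is a simplified, hypothetical view of a cloud resource.
type resource struct {
	ID   string
	Kind string // e.g. "instance", "subnet", "vpc"
	Tags map[string]string
}

// deletionRank orders kinds so that dependents go first (lower rank is
// deleted earlier). Kinds missing from the map get rank 0.
var deletionRank = map[string]int{
	"instance":         0,
	"security-group":   1,
	"subnet":           2,
	"internet-gateway": 3,
	"vpc":              4,
}

// selectForCleanup returns the resources whose tagKey equals deploymentID,
// sorted into a safe deletion order.
func selectForCleanup(all []resource, tagKey, deploymentID string) []resource {
	var owned []resource
	for _, r := range all {
		if r.Tags[tagKey] == deploymentID {
			owned = append(owned, r)
		}
	}
	sort.SliceStable(owned, func(i, j int) bool {
		return deletionRank[owned[i].Kind] < deletionRank[owned[j].Kind]
	})
	return owned
}

func main() {
	const tagKey = "andaime-deployment" // hypothetical tag key
	all := []resource{
		{ID: "vpc-1", Kind: "vpc", Tags: map[string]string{tagKey: "dep-a"}},
		{ID: "i-1", Kind: "instance", Tags: map[string]string{tagKey: "dep-a"}},
		{ID: "subnet-1", Kind: "subnet", Tags: map[string]string{tagKey: "dep-a"}},
		{ID: "i-2", Kind: "instance", Tags: map[string]string{tagKey: "dep-b"}},
	}
	// Prints i-1, subnet-1, vpc-1 in that order; i-2 belongs to another deployment.
	for _, r := range selectForCleanup(all, tagKey, "dep-a") {
		fmt.Println(r.Kind, r.ID)
	}
}
```

Filtering strictly on the deployment tag keeps a destroy operation from touching resources that belong to a different deployment in the same account.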
