Release 1.1.1 - 2nd attempt (#54)
* Split normalize_config into two functions (#27)

* Split normalize_config into two functions

* Add test cases for normalize_config and parse_additional_config

* Bug Fix: on test_parse_additional_config

* Update test_parse_additional_config

* Added new badges to the Readme (#30)

* Move EC2 pricing calls to single function. (#29)

* Added more clear SSH error message for improper credentials

* Updated changelog

* Fixed changelog

* Updated SSH credential error message

* Add tests for ssh.py module.

* Add coverage as a test dependency.

* Update changelog and fix style.

* Add unittests for rsync module. (#33)

* Add tests for yaml_loader.py to increase coverage (#34)

* Add tests for yaml_loader.py to increase coverage

* remove redundant imports

Co-authored-by: ali <[email protected]>

* moved function  outside for better testing (#35)

Co-authored-by: ali <[email protected]>

* Added venv to .gitignore

* bump version (#37)

* Added venv to .gitignore

* bump version

bump version so we can merge with main

* Update CHANGELOG.md

update link for unreleased

* Update CHANGELOG.md

update link for unreleased

Co-authored-by: Gabriele A. Ron <[email protected]>

* increase test coverage for forge/destroy.py (#39)

Co-authored-by: ali <[email protected]>

* Add a configurable spot strategy

* Updated tests

* Fixed tests for multi-az

* Updated documentation for multi-az

* Add multi-az functionality

* Add spot retries and failover

* Update documentation

* Bumped version

* Bumped version to 1.1.0

* Updated maintainers

* Fixed region bug in create.py

* 1.1.1

* Fixed automatic multi-worker allocation bug

* Updated dependencies

* Added destroy_on_create

* Fixed potential bug

* Moved to get_nlist()

* Version 1.2.0

* Add create_timeout configuration option

* Remove default create_timeout setting

* Fix create_timeout check

* Reduce version to 1.1.0

* Remove Hacktoberfest 2022 branding

* GPU Fix (#47)

* Fix gpu flag not being parsed properly.

* Update changelog.

* Add error reporting for RAM/CPU misconfigurations

* Add retries and return code to rsync

* Bump version to 1.1.1

* Add minute timer after create to engine

* Add msg in engine to inform user of Rsync delay (#51)

* Add log message to inform user of rsync delay.

* Add missing changelog links.

* Resolving merge conflicts (#53)

* merge dev to main (#38)

Co-authored-by: Gabe Ron <[email protected]>
Co-authored-by: Heshanthaka <[email protected]>
Co-authored-by: Joao Moreira <[email protected]>
Co-authored-by: Gabriele A. Ron <[email protected]>
Co-authored-by: Mohammed Ali Zubair <[email protected]>
Co-authored-by: ali <[email protected]>

* Add log message to inform user of rsync delay.

* Add missing changelog links.

* Bump minimum python to 3.9.

---------

Co-authored-by: npatel-cars <[email protected]>
Co-authored-by: Gabe Ron <[email protected]>
Co-authored-by: Heshanthaka <[email protected]>
Co-authored-by: Gabriele A. Ron <[email protected]>
Co-authored-by: Mohammed Ali Zubair <[email protected]>
Co-authored-by: ali <[email protected]>

* Update release date.

---------

Co-authored-by: Gabe Ron <[email protected]>
Co-authored-by: Heshanthaka <[email protected]>
Co-authored-by: npatel-cars <[email protected]>
Co-authored-by: Gabriele A. Ron <[email protected]>
Co-authored-by: Mohammed Ali Zubair <[email protected]>
Co-authored-by: ali <[email protected]>
7 people authored Dec 13, 2024
1 parent 05bbfdf commit 2a33e52
Showing 31 changed files with 550 additions and 199 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 1.0.2
current_version = 1.1.1
commit = True
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)
2 changes: 1 addition & 1 deletion .github/workflows/python-publish.yml
@@ -22,7 +22,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.7'
python-version: '3.9'

- name: Install dependencies
run: |
2 changes: 1 addition & 1 deletion .github/workflows/run_tests.yaml
@@ -23,7 +23,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.7'
python-version: '3.9'

- name: Install dependencies
run: |
35 changes: 33 additions & 2 deletions CHANGELOG.md
@@ -4,7 +4,35 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
## [1.1.1] - 2024-12-12

### Changed
- **Python Version** - Bump minimum python version to 3.9.
- **Rsync** - Properly triggers retry sequence
- **Rsync** - Gives a return code now

### Fixed
- **Create** - Fix GPU AMI not being selected.
- **Parser** - Fix GPU flag not being passed properly to the config dict.
- **Create** - Better error reporting regarding RAM and CPU misconfigurations.


## [1.1.0] - 2024-02-26

### Added
- **Create** - Added `destroy_on_create`
- **Create** - Added `create_timeout` option
- **Common** - Moved all `n_list` functions to `get_nlist()`
- **Dependencies** - Updated dependencies and tested on latest versions
- **Create** - Set default boto3 session at beginning of create to resolve region bug
- **Create**
- Multi-AZ functionality
- Spot retries
- On-demand Failover

### Changed
- **Create** - Configurable spot strategy
- **Documentation** - Updated with new changes


## [1.0.2] - 2022-10-27
@@ -33,12 +61,15 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- **GitHub** - Update action to build and publish package only when version is bumped.
- **Forge** - Added automatic tag `forge-name` to allow `Name` tag to be changed.


## [1.0.0] - 2022-09-27

### Added
- **Initial commit** - Forge source code, unittests, docs, pyproject.toml, README.md, and LICENSE files.

[unreleased]: https://github.com/carsdotcom/cars-forge/compare/v1.0.2...HEAD
[unreleased]: https://github.com/carsdotcom/cars-forge/compare/v1.1.1...HEAD
[1.1.1]: https://github.com/carsdotcom/cars-forge/compare/v1.1.0...v1.1.1
[1.1.0]: https://github.com/carsdotcom/cars-forge/compare/v1.0.2...v1.1.0
[1.0.2]: https://github.com/carsdotcom/cars-forge/compare/v1.0.1...v1.0.2
[1.0.1]: https://github.com/carsdotcom/cars-forge/compare/v1.0.0...v1.0.1
[1.0.0]: https://github.com/carsdotcom/cars-forge/releases/tag/v1.0.0
2 changes: 1 addition & 1 deletion README.md
@@ -2,10 +2,10 @@

[![GitHub license](https://img.shields.io/github/license/carsdotcom/cars-forge?color=navy&label=License&logo=License&style=flat-square)](https://github.com/carsdotcom/cars-forge/blob/main/LICENSE)
[![PyPI](https://img.shields.io/pypi/v/cars-forge?color=navy&style=flat-square)](https://pypi.org/project/cars-forge/)
![hacktoberfest](https://img.shields.io/github/issues/carsdotcom/cars-forge?color=orange&label=Hacktoberfest%202022&style=flat-square&?labelColor=black)
![PyPI - Downloads](https://img.shields.io/pypi/dm/cars-forge?color=navy&style=flat-square)
![GitHub Workflow Status (branch)](https://img.shields.io/github/workflow/status/carsdotcom/cars-forge/Publish%20Package/main?color=navy&style=flat-square)
![GitHub contributors](https://img.shields.io/github/contributors/carsdotcom/cars-forge?color=navy&style=flat-square)

---

## About
16 changes: 13 additions & 3 deletions docs/environmental_yaml.md
@@ -59,10 +59,18 @@ https://github.com/carsdotcom/cars-forge/blob/main/examples/env_yaml_example/exa
constraints: [2.3, 3.0, 3.1]
error: "Invalid Spark version. Only 2.3, 3.0, and 3.1 are supported."
```
- **aws_az** - The [AWS availability zone](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html) where Forge will create the EC2 instance. Currently, Forge can run only in one AZ
- **aws_profile** - [AWS CLI profile](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html) to use
- **aws_az** - The [AWS availability zone](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html) where Forge will create the EC2 instance. If set, multi-az placement will be disabled.
- **aws_region** - The AWS region for Forge to run in
- **aws_profile** - [AWS CLI profile](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html) to use
- **aws_security_group** - [AWS Security Group](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-security-groups.html) for the instance
- **aws_subnet** - [AWS subnet](https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html) where the EC2s will run
- **aws_subnet** - [AWS subnet](https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html) where the EC2s will run
- **aws_multi_az** - [AWS subnet](https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html) where the EC2s will run organized by AZ
- E.g.
```yaml
aws_multi_az:
us-east-1a: subnet-aaaaaaaaaaaaaaaaa
us-east-1b: subnet-bbbbbbbbbbbbbbbbb
us-east-1c: subnet-ccccccccccccccccc
```
- **default_ratio** - Override the default ratio of RAM to CPU if the user does not provide one. Must be a list of the minimum and maximum.
- default is [8, 8]
- **ec2_amis** - A dictionary of dictionaries to store [AMI](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) info.
@@ -95,6 +103,8 @@ https://github.com/carsdotcom/cars-forge/blob/main/examples/env_yaml_example/exa
```
- **forge_env** - Name of the Forge environment. The user will refer to this in their yaml.
- **forge_pem_secret** - The secret name where the `ec2_key` is stored
- **on_demand_failover** - If using engine mode and all spot attempts (market: spot + spot retries) have failed, run a final attempt using on-demand.
- **spot_retries** - If using engine mode, sets the number of times to retry a spot instance. Only retries if either market is spot.
- **tags** - [Tags](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html) to apply to instances created by Forge. Follows the AWS tag format.
- Forge also exposes all string, numeric, and some extra variables from the combined user and environmental configs that will be replaced at runtime by the matching values (e.g. `{name}` for job name, `{date}` for job date, etc.) See the [variables](variables.md) page for more details.
- E.g.
3 changes: 3 additions & 0 deletions docs/yaml.md
@@ -43,6 +43,7 @@ Each forge command certain parameters. A yaml file with all the parameters can b
```
- If running via the command line, a range of values is passed as: ``--market on-demand spot``.
- **name** - Name of the instance/cluster
- **on_demand_failover** - If using engine mode and all spot attempts (market: spot + spot retries) have failed, run a final attempt using on-demand.
- **ram** - Minimum amount of RAM required. Can be a range e.g. [16, 32].
- If using a cluster, you must specify both the master and worker. Master first, worker second.
```yaml
@@ -76,5 +77,7 @@ Each forge command certain parameters. A yaml file with all the parameters can b
- Use the `--all` flag to run the script on all the instances in a cluster.
- E.g. `run_cmd: scripts/run.sh {env} {date} {ip}`
- **service** - `cluster` or `single`
- **spot_strategy** - Select the [spot allocation strategy](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2/client/create_fleet.html).
- **spot_retries** - If using engine mode, sets the number of times to retry a spot instance. Only retries if either market is spot.
- **user_data** - Custom script passed to instance. Will be run only once when the instance starts up.
- **valid_time** - How many hours the fleet will stay up. After this time, all EC2s will be destroyed. The default is 8.
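
The `spot_retries` and `on_demand_failover` options documented in this hunk describe an ordering of attempts in engine mode: the initial attempt, then spot retries, then one final on-demand attempt. A minimal sketch of that ordering follows. It assumes a single market string rather than Forge's master/worker market list, and `plan_attempts` and `run_with_failover` are hypothetical names for illustration, not functions from the Forge code base.

```python
from typing import Callable, List


def plan_attempts(market: str, spot_retries: int = 0,
                  on_demand_failover: bool = False) -> List[str]:
    """Return the ordered markets to try for one engine run (illustrative only)."""
    attempts = [market]
    if market == "spot":
        # Retries only apply when the requested market is spot.
        attempts += ["spot"] * spot_retries
        if on_demand_failover:
            # One final attempt on-demand after every spot attempt has failed.
            attempts.append("on-demand")
    return attempts


def run_with_failover(create: Callable[[str], bool], attempts: List[str]) -> str:
    """Call create(market) for each planned attempt until one succeeds."""
    for market in attempts:
        if create(market):
            return market
    raise RuntimeError("all attempts failed")


if __name__ == "__main__":
    plan = plan_attempts("spot", spot_retries=2, on_demand_failover=True)
    print(plan)  # ['spot', 'spot', 'spot', 'on-demand']
    # Simulate spot capacity never being available.
    print(run_with_failover(lambda m: m == "on-demand", plan))  # on-demand
```

Under these assumptions, an on-demand request is never retried, which matches the documented note that retries only happen when the market is spot.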
29 changes: 20 additions & 9 deletions pyproject.toml
@@ -5,11 +5,20 @@
name = "cars-forge"
description = "Create an on-demand/spot fleet of single or cluster EC2 instances."
readme = "README.md"
requires-python = ">=3.7"
requires-python = ">=3.9"
license = "Apache-2.0"
authors = [
{name = "Nikhil Patel", email = "[email protected]"}
{name = "Nikhil Patel", email = "[email protected]"},
{name = "Gabriele Ron", email = "[email protected]"},
{name = "Joao Moreira", email = "[email protected]"}
]

maintainers = [
{name = "Nikhil Patel", email = "[email protected]"},
{name = "Gabriele Ron", email = "[email protected]"},
{name = "Joao Moreira", email = "[email protected]"}
]

keywords = [
"AWS",
"EC2",
@@ -19,6 +28,7 @@ keywords = [
"Cluster",
"Jupyter"
]

classifiers = [
"Development Status :: 5 - Production/Stable",
"Environment :: Console",
@@ -28,24 +38,25 @@ classifiers = [
"Operating System :: Unix",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
]

dynamic = ["version"]

dependencies = [
"boto3~=1.19.0",
"pyyaml~=5.3.0",
"schema~=0.7.0",
"boto3",
"pyyaml",
"schema",
]

[project.optional-dependencies]
test = [
"pytest~=7.1.0",
"pytest-cov~=4.0"
"pytest",
"pytest-cov"
]

dev = [
"bump2version~=1.0",
]
5 changes: 3 additions & 2 deletions src/forge/__init__.py
@@ -1,4 +1,4 @@
__version__ = "1.0.2"
__version__ = "1.1.1"

# Default values for forge's essential arguments
DEFAULT_ARG_VALS = {
@@ -11,7 +11,8 @@
'destroy_after_failure': True,
'default_ratio': [8, 8],
'valid_time': 8,
'ec2_max': 768
'ec2_max': 768,
'spot_strategy': 'price-capacity-optimized'
}

# Required arguments for each Forge job
Expand Down
56 changes: 53 additions & 3 deletions src/forge/common.py
@@ -14,6 +14,7 @@
from botocore.exceptions import ClientError, NoCredentialsError

from . import DEFAULT_ARG_VALS, ADDITIONAL_KEYS
from .exceptions import ExitHandlerException

logger = logging.getLogger(__name__)

@@ -117,7 +118,8 @@ def ec2_ip(n, config):
'instance_type': i.get('InstanceType'),
'state': i.get('State').get('Name'),
'launch_time': i.get('LaunchTime'),
'fleet_id': check_fleet_id(n, config)
'fleet_id': check_fleet_id(n, config),
'az': i.get('Placement')['AvailabilityZone']
}
details.append(x)
logger.debug('ec2_ip details is %s', details)
@@ -142,6 +144,35 @@ def get_ip(details, states):
return [(i['ip'], i['id']) for i in list(filter(lambda x: x['state'] in states, details))]


def get_nlist(config):
"""get list of instance names based on service
Parameters
----------
config : dict
Forge configuration data
Returns
-------
list
List of instance names
"""
date = config.get('date', '')
market = config.get('market', DEFAULT_ARG_VALS['market'])
name = config['name']
service = config['service']

n_list = []
if service == "cluster":
n_list.append(f'{name}-{market[0]}-{service}-master-{date}')
if config.get('rr_all'):
n_list.append(f'{name}-{market[-1]}-{service}-worker-{date}')
elif service == "single":
n_list.append(f'{name}-{market[0]}-{service}-{date}')

return n_list


@contextlib.contextmanager
def key_file(secret_id, region, profile):
"""Safely retrieve a secret file from AWS for temporary use.
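
The `get_nlist()` helper added in the hunk above builds instance names from `name`, `market`, `service`, and `date`. The sketch below restates that naming logic so the resulting strings can be inspected. The page dropped the original indentation, so nesting the `rr_all` check inside the cluster branch is an assumption, as are the `default_market` value (a stand-in for `DEFAULT_ARG_VALS['market']`, which is not shown in this diff) and the name `get_nlist_sketch`.

```python
def get_nlist_sketch(config, default_market=("spot", "spot")):
    """Restatement of the naming logic in get_nlist(); not the Forge function itself."""
    date = config.get('date', '')
    market = config.get('market', default_market)
    name = config['name']
    service = config['service']

    n_list = []
    if service == "cluster":
        # The master uses the first market entry.
        n_list.append(f'{name}-{market[0]}-{service}-master-{date}')
        if config.get('rr_all'):
            # The worker uses the last market entry and is only listed when rr_all is set.
            n_list.append(f'{name}-{market[-1]}-{service}-worker-{date}')
    elif service == "single":
        n_list.append(f'{name}-{market[0]}-{service}-{date}')
    return n_list


if __name__ == "__main__":
    cluster = {'name': 'etl', 'service': 'cluster', 'rr_all': True,
               'market': ['on-demand', 'spot'], 'date': '2024-12-12'}
    print(get_nlist_sketch(cluster))
    # ['etl-on-demand-cluster-master-2024-12-12', 'etl-spot-cluster-worker-2024-12-12']
```

Under this reading, a missing `date` leaves a trailing hyphen in the name, and any `service` other than `cluster` or `single` yields an empty list.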
@@ -320,6 +351,14 @@ def normalize_config(config):
if config.get('aws_az'):
config['region'] = config['aws_az'][:-1]

if config.get('aws_subnet') and not config.get('aws_multi_az'):
config['aws_multi_az'] = {config.get('aws_az'): config.get('aws_subnet')}
elif config.get('aws_subnet') and config.get('aws_multi_az'):
logger.warning('Both aws_multi_az and aws_subnet exist, defaulting to aws_multi_az')

if config.get('aws_region'):
config['region'] = config['aws_region']

if not config.get('ram') and not config.get('cpu') and config.get('ratio'):
DEFAULT_ARG_VALS['default_ratio'] = config.pop('ratio')

@@ -492,8 +531,8 @@ def get_ec2_pricing(ec2_type, market, config):
float
Hourly price of given EC2 type in given market.
"""
region = config.get('region')
az = config.get('aws_az')
region = config['region']
az = config['aws_az']

if market == 'spot':
client = boto3.client('ec2')
Expand Down Expand Up @@ -529,3 +568,14 @@ def get_ec2_pricing(ec2_type, market, config):
price = float(price)

return price


def exit_callback(config, exit: bool = False):
if config['job'] == 'engine' and (config.get('spot_retries') or (config.get('on_demand_failover') or config.get('market_failover'))):
logger.error('Error occurred, bubbling up error to handler.')
raise ExitHandlerException

if exit:
sys.exit(1)

pass
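
`exit_callback` has three possible outcomes: raise `ExitHandlerException` so an engine-level handler can retry or fail over, call `sys.exit(1)`, or return normally. A small sketch of how a caller might distinguish them is shown below. `ExitHandlerException` is redefined locally as a stand-in for `forge.exceptions.ExitHandlerException`, the logging call is omitted, and `outcome` is a hypothetical helper that is not part of this commit.

```python
import sys


class ExitHandlerException(Exception):
    """Local stand-in for forge.exceptions.ExitHandlerException."""


def exit_callback(config, exit: bool = False):
    # Same condition as in the diff: engine jobs with any retry/failover option set.
    if config['job'] == 'engine' and (
            config.get('spot_retries')
            or config.get('on_demand_failover')
            or config.get('market_failover')):
        raise ExitHandlerException

    if exit:
        sys.exit(1)


def outcome(config, exit=False):
    """Classify what exit_callback does for a given config (illustrative only)."""
    try:
        exit_callback(config, exit=exit)
    except ExitHandlerException:
        return 'retry'      # bubbled up to the engine's handler
    except SystemExit:
        return 'exit'       # process would terminate with status 1
    return 'continue'       # caller keeps going


if __name__ == "__main__":
    print(outcome({'job': 'engine', 'spot_retries': 2}, exit=True))  # retry
    print(outcome({'job': 'create'}, exit=True))                     # exit
    print(outcome({'job': 'create'}))                                # continue
```

Note that the retry path takes priority over `exit=True`: an engine job with retries configured never reaches `sys.exit(1)` here.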
23 changes: 18 additions & 5 deletions src/forge/configure.py
@@ -5,7 +5,7 @@
import sys

import yaml
from schema import Schema, And, Optional, SchemaError
from schema import Schema, And, Optional, Or, SchemaError, Use

from .common import set_config_dir

@@ -50,19 +50,32 @@ def check_env_yaml(env_yaml):
"""
schema = Schema({
'forge_env': And(str, len, error='Invalid Environment Name'),
'aws_az': And(str, len, error='Invalid AWS availability zone'),
Optional('aws_region'): And(str, len, error='Invalid AWS region'),
Optional('aws_az'): And(str, len, error='Invalid AWS availability zone'),
Optional('aws_subnet'): And(str, len, error='Invalid AWS Subnet'),
'ec2_amis': And(dict, len, error='Invalid AMI Dictionary'),
'aws_subnet': And(str, len, error='Invalid AWS Subnet'),
Optional('aws_multi_az'): And(dict, len, error='Invalid AWS Subnet'),
'ec2_key': And(str, len, error='Invalid AWS key'),
'aws_security_group': And(str, len, error='Invalid AWS Security Group'),
Optional('aws_security_group'): And(str, len, error='Invalid AWS Security Group'),
'forge_pem_secret': And(str, len, error='Invalid Name of Secret'),
Optional('aws_profile'): And(str, len, error='Invalid AWS profile'),
Optional('ratio'): And(list, len, error='Invalid default ratio'),
Optional('user_data'): And(dict, len, error='Invalid Create Scripts'),
Optional('tags'): And(list, len, error="Invalid AWS tags"),
Optional('excluded_ec2s'): And(list),
Optional('additional_config'): And(list),
Optional('ec2_max'): And(int)
Optional('ec2_max'): And(int),
Optional('spot_strategy'): And(str, len,
Or(
'lowest-price',
'diversified',
'capacity-optimized',
'capacity-optimized-prioritized',
'price-capacity-optimized'),
error='Invalid spot allocation strategy'),
Optional('on_demand_failover'): And(bool),
Optional('spot_retries'): And(Use(int), lambda x: x > 0),
Optional('create_timeout'): And(Use(int), lambda x: x > 0),
})
try:
validated = schema.validate(env_yaml)
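
The new optional keys in this schema combine a type coercion with a constraint: `And(Use(int), lambda x: x > 0)` converts the value with `int()` and then requires it to be positive, and `spot_strategy` must be one of five fixed strings. The sketch below reproduces those rules with the standard library only, so it runs without the `schema` package. It approximates the intended semantics and is not how Forge validates. `check_optional_keys` is a hypothetical name, and it raises `ValueError` where the real code would see a `SchemaError`.

```python
SPOT_STRATEGIES = {
    'lowest-price',
    'diversified',
    'capacity-optimized',
    'capacity-optimized-prioritized',
    'price-capacity-optimized',
}


def check_optional_keys(env_yaml: dict) -> dict:
    """Validate the optional keys added in this commit (stdlib approximation)."""
    out = dict(env_yaml)

    strategy = out.get('spot_strategy')
    if 'spot_strategy' in out and not (isinstance(strategy, str) and strategy in SPOT_STRATEGIES):
        raise ValueError('Invalid spot allocation strategy')

    if 'on_demand_failover' in out and not isinstance(out['on_demand_failover'], bool):
        raise ValueError('on_demand_failover must be a boolean')

    for key in ('spot_retries', 'create_timeout'):
        if key in out:
            value = int(out[key])  # mirrors Use(int): "3" becomes 3
            if value <= 0:
                raise ValueError(f'{key} must be a positive integer')
            out[key] = value

    return out


if __name__ == "__main__":
    print(check_optional_keys({'spot_retries': '3', 'spot_strategy': 'diversified'}))
    # {'spot_retries': 3, 'spot_strategy': 'diversified'}
```

Because of the coercion step, a quoted number in the YAML is accepted and returned as an integer, while `0` and negative values are rejected.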