Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the running rosa spots ability #347

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions dags/openshift_nightlies/config/install/rosa/ovn-small-spot.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
{
"aws_profile": "",
"aws_access_key_id": "",
"aws_secret_access_key": "",
"aws_authentication_method": "sts",
"rosa_environment": "staging",
"rosa_cli_version": "container",
"ocm_environment": "stage",
"managed_channel_group": "nightly",
"managed_ocp_version": "latest",
"openshift_worker_count": 24,
"openshift_network_type": "OVNKubernetes",
"openshift_worker_instance_type": "m5.2xlarge",
"machineset_metadata_label_prefix": "machine.openshift.io",
"openshift_workload_node_instance_type": "m5.2xlarge",
"enable_spot_workers": true
}
5 changes: 5 additions & 0 deletions dags/openshift_nightlies/manifest.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,11 @@ platforms:
config:
install: rosa/ovn-small.json
benchmarks: small-control-plane-mgs.json
- name: sts-ovn-small-spot-cp
schedule: "0 12 * * 3"
config:
install: rosa/ovb-small-spot.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be:
install: rosa/ovb-small-spot.json --> install: rosa/ovn-small-spot.json

benchmarks: small-control-plane-mgs.json
- name: sts-ovn-medium-cp
schedule: "5 12 * * 1"
config:
Expand Down
24 changes: 21 additions & 3 deletions dags/openshift_nightlies/scripts/install/rosa.sh
Original file line number Diff line number Diff line change
Expand Up @@ -43,8 +43,17 @@ _wait_for_nodes_ready(){
NODES_COUNT=$2
ALL_READY_ITERATIONS=4 #reduced extra buffers for hosted cp clusters
else
# Node count is number of workers + 3 infra
NODES_COUNT=$(($2+3))
if [ "$3" == "rosa-spots=true" ]; then
NODES_COUNT=$2
else
if [ "$SPOT_POOL_READY" == "true" ]; then
# Node count is number of workers pool + 3 infra + 3 default Nodes
NODES_COUNT=$(($2+3+3))
else
# Node count is number of workers + 3 infra
NODES_COUNT=$(($2+3))
fi
fi
fi
# 30 seconds per node, waiting for all nodes ready to finalize
while [ ${ITERATIONS} -le $((${NODES_COUNT}*5)) ] ; do
Expand Down Expand Up @@ -445,6 +454,7 @@ setup(){
export NETWORK_TYPE=$(cat ${json_file} | jq -r .openshift_network_type)
export ES_SERVER=$(cat ${json_file} | jq -r .es_server)
export HCP=$(cat ${json_file} | jq -r .rosa_hcp)
export ENABLE_SPOT_WORKERS=$(cat ${json_file} | jq -r .enable_spot_workers)
export UUID=$(uuidgen)
if [ $HCP == "true" ]; then
export STAGE_CONFIG=""
Expand Down Expand Up @@ -557,7 +567,15 @@ install(){
else
INSTALLATION_PARAMS="${INSTALLATION_PARAMS} --multi-az" # Multi AZ is default on hosted-cp cluster
fi
rosa create cluster --tags=User:${GITHUB_USERNAME} --cluster-name ${CLUSTER_NAME} --version "${ROSA_VERSION}" --channel-group=${MANAGED_CHANNEL_GROUP} --compute-machine-type ${COMPUTE_WORKERS_TYPE} --replicas ${COMPUTE_WORKERS_NUMBER} --network-type ${NETWORK_TYPE} ${INSTALLATION_PARAMS} ${ROSA_HCP_PARAMS}
if [ "$ENABLE_SPOT_WORKERS" == "true" ]; then
rosa create cluster --tags=User:${GITHUB_USERNAME} --cluster-name ${CLUSTER_NAME} --version "${ROSA_VERSION}" --channel-group=${MANAGED_CHANNEL_GROUP} --compute-machine-type ${COMPUTE_WORKERS_TYPE} --network-type ${NETWORK_TYPE} ${INSTALLATION_PARAMS} ${ROSA_HCP_PARAMS}
_wait_for_cluster_ready ${CLUSTER_NAME}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also check for node_ready, won't it fail here if it is not at desired size ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, yes right. Will add the checks there.

rosa create machinepool -c ${CLUSTER_NAME} --name="${CLUSTER_NAME}-spot-pool" --replicas=${COMPUTE_WORKERS_NUMBER} --instance-type="${COMPUTE_WORKERS_TYPE}" --labels="rosa-spots=true" --use-spot-instances
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just Curious, why don't we create replicas $((COMPUTE_WORKERS_NUMBER-3)) to keep it user requested size, but by doing that would it affect performance result as it may spread the load across both spot as well as non-spot instances ?
I guess we are using same instance type for both machinepools, it should be fine right ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We thought of having total spot instances instead of on-demand.

_wait_for_nodes_ready $CLUSTER_NAME $COMPUTE_WORKERS_NUMBER "rosa-spots=true"
export SPOT_POOL_READY=true
else
rosa create cluster --tags=User:${GITHUB_USERNAME} --cluster-name ${CLUSTER_NAME} --version "${ROSA_VERSION}" --channel-group=${MANAGED_CHANNEL_GROUP} --compute-machine-type ${COMPUTE_WORKERS_TYPE} --replicas ${COMPUTE_WORKERS_NUMBER} --network-type ${NETWORK_TYPE} ${INSTALLATION_PARAMS} ${ROSA_HCP_PARAMS}
fi
fi
postinstall
return 0
Expand Down
4 changes: 3 additions & 1 deletion dags/openshift_nightlies/tasks/install/rosa/defaults.json
Original file line number Diff line number Diff line change
Expand Up @@ -41,5 +41,7 @@
"staging_mgmt_provisioner_shards": "",
"aws_region": "us-west-2",
"oidc_config": "",
"extra_machinepool": []
"extra_machinepool": [],
"openshift_workload_node_instance_type": null,
"enable_spot_workers": false
Copy link
Member

@krishvoor krishvoor Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add:
"openshift_workload_node_instance_type": null

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default , there is no variable with that name

Copy link
Member

@krishvoor krishvoor Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can declare a new variable with empty/null/default values in defaults.json file

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}