Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the running rosa spots ability #347

Closed
wants to merge 2 commits into from

Conversation

athiruma
Copy link

@athiruma athiruma commented Aug 1, 2023

Type of change

  • Refactor
  • New feature
  • Bug fix
  • Optimization
  • Documentation Update

Description

Span the Rosa worker nodes with the help of machinepools by creating the default cluster with 3(M+W+I).

Related Tickets & Documents

Reference document: Link

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please describe the System Under Test.
  • Please provide detailed steps to perform tests related to this code change.
    • Add the variable enable_spot_workers=true in the install config file of rosa.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.
    • Ran the airflow cluster by creating a playground on the sailplane cluster.

@krishvoor
Copy link
Member

@athiruma Can you share the Airflow DAG for review?

@athiruma
Copy link
Author

athiruma commented Aug 1, 2023

@athiruma Can you share the Airflow DAG for review?

yes, it is attached to the word playground in the above description.

Copy link
Member

@krishvoor krishvoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestions, overall lgtm

@@ -41,5 +41,6 @@
"staging_mgmt_provisioner_shards": "",
"aws_region": "us-west-2",
"oidc_config": "",
"extra_machinepool": []
"extra_machinepool": [],
"enable_spot_workers": false
Copy link
Member

@krishvoor krishvoor Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add:
"openshift_workload_node_instance_type": null

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default , there is no variable with that name

Copy link
Member

@krishvoor krishvoor Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can declare a new variable with empty/null/default values in defaults.json file

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment on lines 15 to 16
"openshift_workload_node_instance_type": null,
"enable_spot_workers": false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of modifying the existing json, please create a new json.

Copy link
Member

@krishvoor krishvoor Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a default instance_type to openshift_workload_node_instance_type

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the following:

    "openshift_workload_node_instance_type": m5.2xlarge,
    "enable_spot_workers": false

dags/openshift_nightlies/scripts/install/rosa.sh Outdated Show resolved Hide resolved
@krishvoor
Copy link
Member

@athiruma Can you share the Airflow DAG for review?

yes, it is attached to the word playground in the above description.

Ahh, I see it now thank you!

@athiruma athiruma force-pushed the rosa-spots branch 3 times, most recently from dc7f0d8 to b1e02fe Compare August 1, 2023 09:50
@@ -12,5 +12,6 @@
"openshift_network_type": "OVNKubernetes",
"openshift_worker_instance_type": "m5.2xlarge",
"machineset_metadata_label_prefix": "machine.openshift.io",
"openshift_workload_node_instance_type": null
"openshift_workload_node_instance_type": "m5.2xlarge",
"enable_spot_workers": false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create a new file: dags/openshift_nightlies/config/install/rosa/ovn-small-spot.json
And move the key/values to it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is the case, I need to add this file to the manifest.yml to read properties of this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is the case, I need to add this file to the manifest.yml to read properties of this file.

Please update accordingly

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@athiruma athiruma force-pushed the rosa-spots branch 3 times, most recently from c784a5b to a0e3a1b Compare August 1, 2023 10:39
Signed-off-by: Thirumalesh Aaraveti <[email protected]>
Copy link
Collaborator

@mukrishn mukrishn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a code rebase because of this cleanup

rosa create cluster --tags=User:${GITHUB_USERNAME} --cluster-name ${CLUSTER_NAME} --version "${ROSA_VERSION}" --channel-group=${MANAGED_CHANNEL_GROUP} --compute-machine-type ${COMPUTE_WORKERS_TYPE} --replicas ${COMPUTE_WORKERS_NUMBER} --network-type ${NETWORK_TYPE} ${INSTALLATION_PARAMS} ${ROSA_HCP_PARAMS}
if [ "$ENABLE_SPOT_WORKERS" == "true" ]; then
rosa create cluster --tags=User:${GITHUB_USERNAME} --cluster-name ${CLUSTER_NAME} --version "${ROSA_VERSION}" --channel-group=${MANAGED_CHANNEL_GROUP} --compute-machine-type ${COMPUTE_WORKERS_TYPE} --network-type ${NETWORK_TYPE} ${INSTALLATION_PARAMS} ${ROSA_HCP_PARAMS}
_wait_for_cluster_ready ${CLUSTER_NAME}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also check for node_ready, won't it fail here if it is not at desired size ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, yes right. Will add the checks there.

if [ "$ENABLE_SPOT_WORKERS" == "true" ]; then
rosa create cluster --tags=User:${GITHUB_USERNAME} --cluster-name ${CLUSTER_NAME} --version "${ROSA_VERSION}" --channel-group=${MANAGED_CHANNEL_GROUP} --compute-machine-type ${COMPUTE_WORKERS_TYPE} --network-type ${NETWORK_TYPE} ${INSTALLATION_PARAMS} ${ROSA_HCP_PARAMS}
_wait_for_cluster_ready ${CLUSTER_NAME}
rosa create machinepool -c ${CLUSTER_NAME} --name="${CLUSTER_NAME}-spot-pool" --replicas=${COMPUTE_WORKERS_NUMBER} --instance-type="${COMPUTE_WORKERS_TYPE}" --labels="rosa-spots=true" --use-spot-instances
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just Curious, why don't we create replicas $((COMPUTE_WORKERS_NUMBER-3)) to keep it user requested size, but by doing that would it affect performance result as it may spread the load across both spot as well as non-spot instances ?
I guess we are using same instance type for both machinepools, it should be fine right ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We thought of having total spot instances instead of on-demand.

- name: sts-ovn-small-spot-cp
schedule: "0 12 * * 3"
config:
install: rosa/ovb-small-spot.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be:
install: rosa/ovb-small-spot.json --> install: rosa/ovn-small-spot.json

@codecov-commenter
Copy link

Codecov Report

Merging #347 (69506c3) into master (967454d) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #347   +/-   ##
=======================================
  Coverage   79.06%   79.06%           
=======================================
  Files          22       22           
  Lines        1003     1003           
=======================================
  Hits          793      793           
  Misses        210      210           
Flag Coverage Δ
gha 79.06% <ø> (ø)
python-3.9 79.06% <ø> (ø)
unit 79.06% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

@athiruma athiruma closed this Aug 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants