fix: ensure a spark application can only be submitted once #460

Merged: 12 commits, Sep 16, 2024

razvan (Member) commented Sep 12, 2024

Description

Fixes #457

Kubernetes garbage-collects a Spark application's Job after ttlSecondsAfterFinished (currently 10 minutes), but the SparkApplication objects live forever (or until the user deletes them).
If a reconciliation is triggered on an app that has no child Job, the operator will submit the Job again.

The fix is to use the application's status field as a guard in the reconciliation loop: an application with a non-empty status is skipped.
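
The guard can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: `SparkApplication`, `SparkApplicationStatus`, `Outcome`, `reconcile` and the `submitted_jobs` vector are hypothetical stand-ins for the operator's real types and for the Kubernetes API, and the `phase` value is a placeholder.

```rust
/// Hypothetical, simplified stand-in for the application's status.
#[derive(Debug, Clone, PartialEq)]
pub struct SparkApplicationStatus {
    pub phase: String,
}

/// Hypothetical, simplified stand-in for the SparkApplication object.
#[derive(Debug, Clone)]
pub struct SparkApplication {
    pub name: String,
    /// `None` until the operator has submitted the application once.
    pub status: Option<SparkApplicationStatus>,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Submitted,
    Skipped,
}

/// Reconcile one application.
/// `submitted_jobs` stands in for the cluster: each entry is one Job created.
pub fn reconcile(app: &mut SparkApplication, submitted_jobs: &mut Vec<String>) -> Outcome {
    // Guard: a non-empty status means the application was already submitted,
    // even if its child Job has since been removed by the TTL controller.
    if app.status.is_some() {
        return Outcome::Skipped;
    }

    // First reconciliation: create the Job and record that fact in the status.
    submitted_jobs.push(format!("{}-job", app.name));
    app.status = Some(SparkApplicationStatus {
        phase: "Unknown".to_string(), // placeholder value, an assumption
    });
    Outcome::Submitted
}
```

In this sketch, the first call to `reconcile` on an application with `status: None` returns `Outcome::Submitted`. Clearing `submitted_jobs` (the Job being garbage-collected) and calling `reconcile` again returns `Outcome::Skipped` and creates no new Job, because the decision depends on the status rather than on whether a child Job still exists.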

I tested it by:

  1. running the smoke integration test with --skip-delete
  2. After the test finished successfully I deleted the Spark pi Job object
  3. I deleted the operator pod (simulated a restart)
  4. The Job was not submitted again and in the operator logs I see:
     2024-09-12T13:14:51.650689Z  INFO app_controller:reconciling object{object.ref=SparkApplication.v1alpha1.spark.stackable.tech/spark-pi-s3-1.kuttl-test-meet-martin object.reason=object updated}: stackable_spark_k8s_operator::spark_k8s_controller: Skip reconciling SparkApplication [spark-pi-s3-1] with non empty status

🟢 CI: https://testing.stackable.tech/view/02%20Operator%20Tests%20(custom)/job/spark-k8s-operator-it-custom/4/

@razvan razvan requested a review from a team September 12, 2024 12:44
@razvan razvan marked this pull request as ready for review September 12, 2024 15:33
@razvan razvan self-assigned this Sep 13, 2024
@sbernauer sbernauer self-requested a review September 16, 2024 08:45

@sbernauer sbernauer left a comment

Functionality LGTM, only some style questions

3 review threads on rust/operator-binary/src/spark_k8s_controller.rs (outdated, resolved)
sbernauer previously approved these changes Sep 16, 2024

@sbernauer sbernauer left a comment

Thanks! Can you please run a Jenkins custom test before merge?

1 review thread on rust/operator-binary/src/spark_k8s_controller.rs (outdated, resolved)
razvan (Member, Author) commented Sep 16, 2024


@sbernauer sbernauer left a comment

Thanks!

@razvan razvan added this pull request to the merge queue Sep 16, 2024
Merged via the queue into main with commit 9cd61dd Sep 16, 2024
31 checks passed
@razvan razvan deleted the fix/457 branch September 16, 2024 12:51
Development

Successfully merging this pull request may close these issues.

operator resubmits all spark applications after restart no matter what their status is
3 participants