
[KOGITO-9972] Add SonataFlowClusterPlatform CRD & Controller #345

Merged
merged 3 commits into from
Feb 1, 2024

Conversation

tchughesiv
Contributor

@tchughesiv tchughesiv commented Jan 11, 2024

Description of the change:
Create a new cluster-scoped SonataFlowClusterPlatform CRD which is optionally used to specify a cluster-wide set of services for workflow consumption. This is done by referencing an existing, namespaced SonataFlowPlatform resource... thus leveraging its existing service deployment capabilities, and making its services available to workflows in the entire cluster.

With this change workflows cluster-wide will be able to leverage supporting services in a "central" namespace.

Example of an active SonataFlowClusterPlatform CR -

apiVersion: sonataflow.org/v1alpha08
kind: SonataFlowClusterPlatform
metadata:
  name: cluster-platform
spec:
  platformRef:
    name: sonataflow-platform
    namespace: central-platform-ns
status:
  conditions:
    - lastUpdateTime: '2024-01-16T00:42:17Z'
      message: >-
        Referenced SonataFlowPlatform 'central-platform-ns/sonataflow-platform'
        is ready
      status: 'True'
      type: Succeed
  observedGeneration: 1
  version: '0.8'
$ oc get SonataFlowClusterPlatform
NAME               PLATFORM_NAME         PLATFORM_NS           READY   REASON
cluster-platform   sonataflow-platform   central-platform-ns   True    

Example of an active SonataFlowPlatform CR which is leveraging the cluster-scoped services defined in the above SonataFlowClusterPlatform. Workflows in this test namespace will leverage services in the central-platform-ns namespace -

apiVersion: sonataflow.org/v1alpha08
kind: SonataFlowPlatform
metadata:
  name: example
  namespace: test
spec: {}
status:
  cluster: openshift
  clusterPlatformRef:
    name: cluster-platform
    platformRef:
      name: sonataflow-platform
      namespace: central-platform-ns
    services:
      dataIndexRef:
        url: 'http://sonataflow-platform-data-index-service.central-platform-ns'
      jobServiceRef:
        url: 'http://sonataflow-platform-jobs-service.central-platform-ns'
  conditions:
    - lastUpdateTime: '2024-01-16T00:51:30Z'
      status: 'True'
      type: Succeed
  info:
    goOS: darwin
    goVersion: go1.19.10
  observedGeneration: 2
  version: '0.8'

Example of an active SonataFlowPlatform CR which is overriding the cluster-scoped services defined in the above SonataFlowClusterPlatform. Workflows in this test2 namespace will not leverage services in the central-platform-ns namespace. These workflows will, instead, use any locally deployed services that might be configured -

apiVersion: sonataflow.org/v1alpha08
kind: SonataFlowPlatform
metadata:
  name: example2
  namespace: test2
spec:
  ...
  services: {}
status:
  cluster: openshift
  clusterPlatformRef:
    name: cluster-platform
    platformRef:
      name: sonataflow-platform
      namespace: central-platform-ns
  conditions:
    - lastUpdateTime: '2024-01-16T01:04:22Z'
      status: 'True'
      type: Succeed
  info:
    goOS: darwin
    goVersion: go1.19.10
  observedGeneration: 2
  version: '0.8'

Motivation for the change:
The current Sonataflow Operator implementation will only configure workflows to use supporting services, like Data Index & Job Service, within the same deployed namespace. While there is a manual step one could take to leverage "central" infra/services by modifying a workflow’s configuration post-deployment, this is not scalable or manageable. The operator should support handling service(s) deployment at the cluster scope as well.

Checklist

  • Add or Modify a unit test for your change
  • Have you verified that all the tests are passing?
How to backport a pull request to a different branch?

To automatically create a backporting pull request, add one or more labels with the format backport-<branch-name>, where <branch-name> is the name of the branch the pull request must be backported to (e.g., backport-7.67.x to backport the original PR to the 7.67.x branch).

NOTE: backporting is an action aiming to move a change (usually a commit) from a branch (usually the main one) to another one, generally a still-maintained release branch. Keeping it simple: it is about moving a specific change, or a set of changes, from one branch to another.

Once the original pull request is successfully merged, the automated action will create one backporting pull request for each label (in the format above) that has been added.

If something goes wrong, the author will be notified; at that point, manual backporting is needed.

NOTE: this automated backporting is triggered whenever a pull request on the main branch is labeled or closed, but both conditions must be satisfied for the new PR to be created.

@tchughesiv tchughesiv force-pushed the KOGITO-9972-2 branch 7 times, most recently from dadfac1 to 1ea6a88 Compare January 16, 2024 22:30
@tchughesiv tchughesiv changed the title WIP [KOGITO-9972] Add SonataFlowClusterPlatform CRD & Controller [KOGITO-9972] Add SonataFlowClusterPlatform CRD & Controller Jan 16, 2024
@tchughesiv tchughesiv marked this pull request as ready for review January 16, 2024 22:32
Member

@ricardozanini ricardozanini left a comment

Looks great! I've added a few non-critical comments. Many thanks!

controllers/clusterplatform/initialize.go
controllers/platform/services/services.go
controllers/profiles/common/app_properties_test.go.removed
@jordigilh
Contributor

Let me see if I understand this:

  • A platform CR applies to a namespace, whereas a cluster platform is seen by all platforms as a reference object that others can use to retrieve the shared services configuration.
  • A cluster platform CR can be active or inactive, or considered a secondary duplicate.
  • Multiple cluster platform CRs can be stored in the cluster, but only one is active and used as a reference by other platforms.
  • Changes to a referenced platform (the platform referenced in the active cluster platform CR) are propagated to the platform CRs using its services.
  • At the same time, changes to the active cluster platform CR will force a reconciliation of all platform CRs that reference it, although currently it won't force a reconciliation of the deployed services, but rather create new ones (mentioned as a comment in the PR).
  • When the active cluster platform is deleted, the platforms that consumed it remove the references in their status, but the generated services are not removed. On the other hand, if the service spec is changed, that only changes the serviceURL field in the status, not the configmap properties of the services or the deployments.

Questions:

  • What should happen when a referenced platform CR (referenced by an active cluster platform CR) is deleted from the cluster? Should it trigger a reconciliation of all platforms that used its service spec via the cluster platform CR and delete their service specs?

I find it strange that the services spec in a platform CR can be either a reference for platforms that are configured to inherit from the cluster platform, or just a consumer, depending on the information in a CR.
In my opinion, it would make more sense to have the cluster platform define the services and let the platforms inherit those. It's a one-way relationship from top to bottom, and it ensures that changes to the cluster platform CR alone trigger changes to the platforms, rather than both cluster platform CRs and platform CRs.

api/v1alpha08/sonataflowplatform_types.go
controllers/clusterplatform/initialize.go
controllers/platform/k8s.go
controllers/platform/k8s.go
controllers/platform/services/services.go
@ricardozanini
Member

it would make more sense to have the cluster platform define the services and let the platforms inherit those

@tchughesiv I remember that we discussed this possibility and it was deferred to preserve the namespace information. I'll review the ADR again, 'cause I don't remember if we talked about it in the chat or doc. I want to refresh/review why we tabled this to avoid back-to-back discussions on a topic that's supposed to be tabled.

@tchughesiv
Contributor Author

it would make more sense to have the cluster platform define the services and let the platforms inherit those

@tchughesiv I remember that we discussed this possibility and it was deferred to preserve the namespace information. I'll review the ADR again, 'cause I don't remember if we talked about it in the chat or doc. I want to refresh/review why we tabled this to avoid back-to-back discussions on a topic that's supposed to be tabled.

we discussed this in chat, i'll add to ADR though under Alternatives Considered / Rejected and will also add in reply to @jordigilh's review here

@tchughesiv
Contributor Author

tchughesiv commented Jan 18, 2024

  • At the same time, changes to the active cluster platform CR will force a reconciliation of all platform CRs that reference it, although currently it won't force a reconciliation of the deployed services, but rather create new ones (mentioned as a comment in the PR).
  • When the active cluster platform is deleted, the platforms that consumed it remove the references in their status, but the generated services are not removed. On the other hand, if the service spec is changed, that only changes the serviceURL field in the status, not the configmap properties of the services or the deployments.

@jordigilh Nothing about how service deployments are handled is being changed w/ this PR. This PR is all about allowing workflows to communicate with a defined set of "central" services. The SonataFlowClusterPlatform does not deploy, manage or modify anything pertaining to services or workflows... it's merely a reference object. This is by design and keeps things simple. The SonataFlowPlatform objects continue to be the single source of truth for deployed services.

I'll address your other questions and comments shortly

@tchughesiv
Contributor Author

tchughesiv commented Jan 18, 2024

In my opinion, it would make more sense to have the cluster platform define the services and let the platforms inherit those. It's a one way relationship top to bottom and ensures that changes to the cluster platform CR alone trigger changes to the platforms, rather than both cluster platform CRs and platform CRs.

@jordigilh I've also added these notes to the ADR. Here are some reasons why that approach sounds nice but is problematic.

  • Cluster-scoped CRDs have no out-of-the-box way to define a namespace for a controller to execute against. It’s not their function and they should not be used that way. They are intended to be used at the cluster level only.
    • This is problematic: in order to deploy these services, we would have to rely on a namespace field somewhere within the spec. When a user inevitably changes this spec.namespace field, it will lead to orphaned supporting services (e.g., deployments, configmaps, services), because the operator will no longer know about the prior services and will move on to creating services in the newly defined location. This consumes additional resources and leads to confusion.
    • This is also problematic because cluster-scoped CRDs do not require a namespace to exist. A user could point to a non-existent namespace, and reconciliation would not work until the user manually creates the namespace. This is not an intuitive design, and we don't want the operator to be in the business of creating namespaces in a cluster.
  • Another major issue would be SonataFlowClusterPlatform services conflicting with SonataFlowPlatform services, or duplicate services existing in the same namespace. What if a user has already installed a SonataFlowPlatform in that namespace? There's nothing stopping a user from also defining services in that same namespace. It would take a large effort in the code to handle this kind of setup, if it were even possible.
  • SonataFlowPlatform is a namespaced CRD and is already the single source of truth for supporting services at the namespace level, and it does its job very well. Why duplicate code or introduce potentially conflicting controllers?

By simply adding a reference to an active SonataFlowPlatform, which is what this PR is doing, we avoid all of the above issues and we keep things simple.
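
To make the rejected alternative concrete, here is a rough sketch of what a cluster platform that defines services directly might have looked like. The namespace and services fields shown here are hypothetical, invented for illustration only; they are not part of the actual API -

apiVersion: sonataflow.org/v1alpha08
kind: SonataFlowClusterPlatform
metadata:
  name: cluster-platform
spec:
  # hypothetical field: a cluster-scoped CR has no namespace of its own,
  # so the deployment target would have to live somewhere in the spec
  namespace: central-platform-ns
  # hypothetical field: services defined here would have to be deployed
  # by the cluster platform controller itself
  services:
    dataIndex: {}
    jobService: {}

Changing spec.namespace after the fact would strand the previously deployed services, which is exactly the orphaning problem described above.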

@tchughesiv
Copy link
Contributor Author

tchughesiv commented Jan 18, 2024

Questions:

  • What should happen when a referenced platform CR (referenced by an active cluster platform CR) is deleted from the cluster? Should it trigger a reconciliation of all platforms that used its service spec via the cluster platform CR and delete their service specs?

Yes, it will reconcile all platforms. But no, it will not delete their service specs. It WILL delete the clusterPlatformRef from their status, which may or may not result in changes to a workflow's app properties configmap... depending on whether the local platform was using the cluster services or not.
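
For illustration, here is a hypothetical status for the 'test' platform from the earlier example after the active SonataFlowClusterPlatform is deleted (assuming no locally configured services): the clusterPlatformRef block is simply dropped from status, and nothing else is touched -

apiVersion: sonataflow.org/v1alpha08
kind: SonataFlowPlatform
metadata:
  name: example
  namespace: test
spec: {}
status:
  cluster: openshift
  conditions:
    - status: 'True'
      type: Succeed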

I find it strange that the services spec in a platform CR can be either a reference for platforms that are configured to inherit from the cluster platform, or just a consumer, depending on the information in a CR.

In the above example, the "sonataflow-platform" SonataFlowPlatform in the "central-platform-ns" namespace will just be using its own services, as it's been referenced as the "central" platform within the SonataFlowClusterPlatform.spec.

Please define what you mean by "just consumers" here. It's important we allow the user to decide whether they want to use the cluster-wide services, or override them by defining their own within their namespace.

@tchughesiv
Contributor Author

@jordigilh thanks for the extensive review! i've done my best to answer your Qs or explain design choices. i've implemented a few of your suggestions. there are a couple i need to think about a bit more and some others requiring @ricardozanini's input.

@tchughesiv tchughesiv force-pushed the KOGITO-9972-2 branch 4 times, most recently from 517e7d6 to baf7bc5 Compare January 22, 2024 15:25
@jordigilh
Contributor

@tchughesiv LGTM. I can't mark the comments as resolved, so I just replied with an ack, but I have no more concerns.

@ricardozanini
Member

Waiting for the nightly images due to the ASF Jenkins outage. @domhanak @wmedvede can you please review?

@tchughesiv tchughesiv force-pushed the KOGITO-9972-2 branch 2 times, most recently from 33bd7b3 to 81e1e83 Compare January 23, 2024 16:58
@ricardozanini
Member

@domhanak @wmedvede can you please review?

Contributor

@wmedvede wmedvede left a comment

I have left some comments.

controllers/sonataflowplatform_controller.go
controllers/sonataflowplatform_controller.go
controllers/platform/services/properties.go
@wmedvede
Contributor

Thanks @tchughesiv for the response.

Contributor

@wmedvede wmedvede left a comment

I'll leave the runtime testing to QE so as not to block this PR from my side.

@ricardozanini
Member

We are waiting for QE review to merge this one. @domhanak

@tchughesiv tchughesiv force-pushed the KOGITO-9972-2 branch 2 times, most recently from cefb2fd to 03729dd Compare January 30, 2024 18:46
@ricardozanini ricardozanini merged commit 20694d6 into apache:main Feb 1, 2024
4 checks passed
rgdoliveira pushed a commit to rgdoliveira/kogito-serverless-operator that referenced this pull request Feb 5, 2024

* SonataFlowClusterPlatform CRD & Controller

Signed-off-by: Tommy Hughes <[email protected]>

* service constants cleanup

Signed-off-by: Tommy Hughes <[email protected]>

* clusterplatform api descrip change

Signed-off-by: Tommy Hughes <[email protected]>

---------

Signed-off-by: Tommy Hughes <[email protected]>