Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[issue-464]Create a Prometheus ServiceMonitor object that can capture/collect metrics from deployed SonataFlow instances #540

Merged
merged 1 commit into from
Oct 23, 2024

Conversation

jianrongzhang89
Copy link
Contributor

@jianrongzhang89 jianrongzhang89 commented Sep 23, 2024

Fix #464.

The operator an track in the cluster if Prometheus Operator is installed. If this condition is true and the SonataFlowPlatform is marked with .spec.monitoring.enabled: true, and the workflow is deployed as an k8s deployment, then each workflow can have a ServiceMonitor object bound to it.

Note: for workflows deployed as Knative services, this PR does not create the ServiceMonitor objects.

Checklist

[X ] Add or Modify a unit test for your change
[X ] Have you verified that tall the tests are passing?

Copy link
Member

@ricardozanini ricardozanini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jianrongzhang89! I left a few comments, but I believe we are on the right path.

Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
controllers/profiles/common/monitoring.go Outdated Show resolved Hide resolved
controllers/profiles/common/object_creators.go Outdated Show resolved Hide resolved
controllers/profiles/common/object_creators.go Outdated Show resolved Hide resolved
controllers/profiles/preview/deployment_handler.go Outdated Show resolved Hide resolved
controllers/profiles/preview/profile_preview.go Outdated Show resolved Hide resolved
test/testdata/monitoring.yaml Outdated Show resolved Hide resolved
utils/monitoring/monitoring.go Outdated Show resolved Hide resolved
utils/monitoring/monitoring.go Outdated Show resolved Hide resolved
@jianrongzhang89 jianrongzhang89 force-pushed the servicemonitor branch 3 times, most recently from b8091fc to 18b1063 Compare October 1, 2024 14:10
@ricardozanini
Copy link
Member

Please add conditionals when registering the listeners in the controller:

2024-10-01T14:32:35.471896952Z stderr F E1001 14:32:35.471807       1 kind.go:71] "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" "kind"={"Group":"monitoring.coreos.com","Kind":"ServiceMonitor"} "logger"="sonataflow-manager.controller-runtime.source.EventHandler"
2024-10-01T14:32:45.472140785Z stderr F E1001 14:32:45.472033       1 kind.go:71] "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" "kind"={"Group":"monitoring.coreos.com","Kind":"ServiceMonitor"} "logger"="sonataflow-manager.controller-runtime.source.EventHandler"
2024-10-01T14:32:55.471966197Z stderr F E1001 14:32:55.471868       1 kind.go:71] "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" "kind"={"Group":"monitoring.coreos.com","Kind":"ServiceMonitor"} "logger"="sonataflow-manager.controller-runtime.source.EventHandler"
2024-10-01T14:33:05.472046816Z stderr F E1001 14:33:05.471956       1 kind.go:71] "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" "kind"={"Group":"monitoring.coreos.com","Kind":"ServiceMonitor"} "logger"="sonataflow-manager.controller-runtime.source.EventHandler"
2024-10-01T14:33:15.472269129Z stderr F E1001 14:33:15.472185       1 kind.go:71] "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" "kind"={"Group":"monitoring.coreos.com","Kind":"ServiceMonitor"} "logger"="sonataflow-manager.controller-runtime.source.EventHandler"
2024-10-01T14:33:25.472286385Z stderr F E1001 14:33:25.472204       1 kind.go:71] "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" "kind"={"Group":"monitoring.coreos.com","Kind":"ServiceMonitor"} "logger"="sonataflow-manager.controller-runtime.source.EventHandler"
2024-10-01T14:33:35.4720388Z stderr F E1001 14:33:35.471924       1 kind.go:71] "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" "kind"={"Group":"monitoring.coreos.com","Kind":"ServiceMonitor"} "logger"="sonataflow-manager.controller-runtime.source.EventHandler"
2024-10-01T14:33:45.472208024Z stderr F E1001 14:33:45.472121       1 kind.go:71] "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" "kind"={"Group":"monitoring.coreos.com","Kind":"ServiceMonitor"} "logger"="sonataflow-manager.controller-runtime.source.EventHandler"
2024-10-01T14:33:55.472168441Z stderr F E1001 14:33:55.472041       1 kind.go:71] "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" "kind"={"Group":"monitoring.coreos.com","Kind":"ServiceMonitor"} "logger"="sonataflow-manager.controller-runtime.source.EventHandler"
2024-10-01T14:34:05.471752553Z stderr F E1001 14:34:05.471677       1 kind.go:71] "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" "kind"={"Group":"monitoring.coreos.com","Kind":"ServiceMonitor"} "logger"="sonataflow-manager.controller-runtime.source.EventHandler"

@ricardozanini
Copy link
Member

The e2e tests have been fixed, please rebase your branch 🙏

@jianrongzhang89 jianrongzhang89 force-pushed the servicemonitor branch 2 times, most recently from d1438cc to a0edf9d Compare October 2, 2024 14:31
@jianrongzhang89
Copy link
Contributor Author

Please add conditionals when registering the listeners in the controller:

Done!

@jianrongzhang89
Copy link
Contributor Author

The e2e tests have been fixed, please rebase your branch 🙏

Code has been rebased.

@ricardozanini
Copy link
Member

Can you check the files generation? 😬

@jianrongzhang89 jianrongzhang89 force-pushed the servicemonitor branch 4 times, most recently from 6231535 to 038527b Compare October 4, 2024 01:35
@ricardozanini
Copy link
Member

@jianrongzhang89 can you please rebase? Your Knative fix has been merged with the e2e refactoring.

Copy link
Member

@ricardozanini ricardozanini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will stop reviewing since we will rescope this feature.

api/v1alpha08/sonataflowplatform_types.go Outdated Show resolved Hide resolved
internal/controller/profiles/common/mutate_visitors.go Outdated Show resolved Hide resolved
internal/controller/profiles/common/object_creators.go Outdated Show resolved Hide resolved
@jianrongzhang89 jianrongzhang89 force-pushed the servicemonitor branch 2 times, most recently from 59f71df to 86a5975 Compare October 11, 2024 00:27
Copy link
Member

@ricardozanini ricardozanini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think we can have a rather simple test for monitoring in the e2e. Do we really need all the supporting services and persistence?

api/v1alpha08/sonataflowplatform_types.go Outdated Show resolved Hide resolved
internal/controller/profiles/dev/states_dev.go Outdated Show resolved Hide resolved
internal/controller/profiles/monitoring/monitoring.go Outdated Show resolved Hide resolved
internal/controller/profiles/preview/profile_preview.go Outdated Show resolved Hide resolved
test/e2e/workflow_test.go Outdated Show resolved Hide resolved
@jianrongzhang89 jianrongzhang89 force-pushed the servicemonitor branch 5 times, most recently from ab5b575 to 506d9f1 Compare October 15, 2024 14:53
@jianrongzhang89 jianrongzhang89 force-pushed the servicemonitor branch 2 times, most recently from a4d77e9 to 309827a Compare October 15, 2024 19:40
Copy link
Contributor

@wmedvede wmedvede left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments from my side.
I'll give it a try.

@wmedvede
Copy link
Contributor

@ricardozanini @jianrongzhang89

My last comment on this PR guys.
Feel free to open a follow up issue if you believe makes sense.

case1:
I have brand new OpenShift Local installation.
I never installed the Prometheus operator
I create a SFP with spec.monitoring.enabled = true
I create a WF in namespace test
After the WF deployment I can see a ServiceMonitor monitor created for it, weird.
Only explanation is that OpenShift Local already has the operator installed the Prometheus resources ouf-of-the-box

In customer OpenShfift cluster installations different thant OpenShift Local installations IDK what will happen. Probably the operator is always there.

case2:
I install the Prometheus operator in namespace test1
I create a SFP, with spec.monitoring.enabled = true
I create, a WF, etc,
I can see the ServiceMonitior, etc.

I create a namespace test2, I don't install the Prometheus operator in that namespace.
I create a SFP, with spec.monitoring.enabled = true
I create, a WF, etc,
I can see that a ServiceMonitior, etc.

Sort of corolary o case1

Copy link
Contributor

@wmedvede wmedvede left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a last comment

@ricardozanini
Copy link
Member

ricardozanini commented Oct 18, 2024

@wmedvede I think we can safely create the monitor always if the monitoring: true is enabled and Prometheus is available.

My only take is that we should report in events that monitoring: true and Prometheus is not available. This must be done in this PR. @jianrongzhang89 can you confirm?

@jianrongzhang89
Copy link
Contributor Author

@wmedvede I think we can safely create the monitor always if the monitoring: true is enabled and Prometheus is available.

My only take is that we should report in events that monitoring: true and Prometheus is not available. This must be done in this PR. @jianrongzhang89 can you confirm?

@ricardozanini yes, in this case, a message will be logged in the platform log:
I1018 15:05:40.135463 1 sonataflowplatform_controller.go:146] Monitoring is enabled in platform sonataflow-platform, but Prometheus is not installed

@ricardozanini
Copy link
Member

ricardozanini commented Oct 18, 2024

@jianrongzhang89 Can you change to an event instead too?

@jianrongzhang89
Copy link
Contributor Author

@jianrongzhang89 Can you change to an event instead too?

Done

@jianrongzhang89 jianrongzhang89 force-pushed the servicemonitor branch 3 times, most recently from fe75da8 to 215631d Compare October 18, 2024 21:39
…e/collect metrics from deployed SonataFlow instances
@ricardozanini ricardozanini merged commit 4f2ba09 into apache:main Oct 23, 2024
4 checks passed
rgdoliveira pushed a commit to rgdoliveira/kogito-serverless-operator that referenced this pull request Oct 24, 2024
…e/collect metrics from deployed SonataFlow instances (apache#540)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a Prometheus ServiceMonitor object that can capture/collect metrics from deployed SonataFlow instances
4 participants