ACM-14962: init MCOA metrics #1659

Status: Open (20 commits to be merged into base branch main)

@thibaultmg thibaultmg commented Nov 7, 2024

This PR initialises metrics collection for MCOA. More precisely, it:

  • Updates MCO CRD to support the metrics capability for both platform and user workload.
  • Adds customised variables to the addon configuration.
  • Adds permissions to MCOA for handling metrics resources.
  • Generates configs only for the enabled signals, so that the configuration does not have to be updated just to get a valid addon status.

Relates to the MCOA PR: stolostron/multicluster-observability-addon#77

Notes:

openshift-ci bot commented Nov 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thibaultmg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Nov 7, 2024
Signed-off-by: Thibault Mange <[email protected]>
@thibaultmg
Contributor Author

/retest

Signed-off-by: Thibault Mange <[email protected]>

sonarcloud bot commented Nov 29, 2024

openshift-ci bot commented Nov 29, 2024

@thibaultmg: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/test-e2e | 57e2104 | link | true | /test test-e2e |

@JoaoBraveCoding JoaoBraveCoding left a comment

Some thoughts

Comment on lines +195 to +214
	{
		ConfigGroupResource: addonapiv1alpha1.ConfigGroupResource{
			Group:    prometheusalpha1.SchemeGroupVersion.Group,
			Resource: prometheusalpha1.ScrapeConfigName,
		},
		ConfigReferent: addonapiv1alpha1.ConfigReferent{
			Name:      "platform-metrics-default",
			Namespace: mcoconfig.GetDefaultNamespace(),
		},
	},
	{
		ConfigGroupResource: addonapiv1alpha1.ConfigGroupResource{
			Group:    prometheusv1.SchemeGroupVersion.Group,
			Resource: prometheusv1.PrometheusRuleName,
		},
		ConfigReferent: addonapiv1alpha1.ConfigReferent{
			Name:      "platform-rules-default",
			Namespace: mcoconfig.GetDefaultNamespace(),
		},
	},

Do you really want to reference these from the get-go?

@@ -212,6 +341,14 @@ func (r *MCORenderer) renderAddonDeploymentConfig(
		}
	}

	if (cs.Platform != nil && cs.Platform.Metrics.Collection.Enabled) || (cs.UserWorkloads != nil && cs.UserWorkloads.Metrics.Collection.Enabled) {
		obsAPIURL, err := mcoconfig.GetObsAPIExternalURL(context.TODO(), r.kubeClient, namespace)

Instead of fetching this value here, could we add a field to renderOptions.MCOAOptions and use it directly? This would:

  • Move us toward eventually dropping context.TODO()
  • Simplify the code path for this function, which also makes it simpler to test
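
To make the suggestion concrete, here is a rough sketch of what I have in mind. It is not the actual renderer code: the MCOAOptions struct below is a simplified stand-in, and the field names MetricsHubEndpoint, PlatformMetricsEnabled and UserWorkloadMetricsEnabled, as well as the customizedVariables helper, are hypothetical. The idea is only that the caller resolves the URL once and the render step becomes a pure function of its options.

```go
package main

import (
	"errors"
	"fmt"
)

// MCOAOptions is a simplified, hypothetical stand-in for renderOptions.MCOAOptions.
type MCOAOptions struct {
	// MetricsHubEndpoint is an assumed field name. The caller would resolve the
	// external Observability API URL once and pass it in here.
	MetricsHubEndpoint string
	// The two flags below are assumed names mirroring the enabled checks in the diff.
	PlatformMetricsEnabled     bool
	UserWorkloadMetricsEnabled bool
}

// customizedVariables is a hypothetical helper. It needs no context and no
// kube client, so it can be unit tested with plain values.
func customizedVariables(opts MCOAOptions) (map[string]string, error) {
	vars := map[string]string{}
	if !opts.PlatformMetricsEnabled && !opts.UserWorkloadMetricsEnabled {
		// No metrics collection requested: nothing to add.
		return vars, nil
	}
	if opts.MetricsHubEndpoint == "" {
		return nil, errors.New("metrics collection is enabled but no hub endpoint was provided")
	}
	vars["signalsHubEndpoint"] = opts.MetricsHubEndpoint
	return vars, nil
}

func main() {
	vars, err := customizedVariables(MCOAOptions{
		MetricsHubEndpoint:     "https://hub.example.com/api/metrics/v1", // placeholder URL
		PlatformMetricsEnabled: true,
	})
	fmt.Println(vars, err)
}
```

With this shape the lookup that needs a context lives with the caller, which presumably already has a real one available.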

As for the value itself, I'm still not sure about using the key signalsHubEndpoint to represent it, mainly because it is still a very metrics-oriented value. I'm wondering whether we could split it so that, in the short term, it would be useful to more signals.

Taking the following two endpoints on the hub:

  • https://obs-api-open-cluster-management-observability.apps.myhub.foo.com/api/metrics/v1
  • https://mcoa-managed-instance-openshift-logging.apps.myhub.foo.com/api/logs/v1/cluster-1/otlp/v1/logs

Then:

  • hubHostname (would be there if any Managed Storage is enabled): apps.myhub.foo or myhub.foo
  • hubMetricsRoute: obs-api-open-cluster-management-observability

WDYT? I'm suggesting something like this because logs & traces would then be able to reuse hubHostname, regardless of whether we go with a unified Obs API or separate ones. (This could also be addressed in a follow-up.)
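
As a minimal sketch of the split, assuming the route name is always the first DNS label of the host and everything after it is the shared hub hostname: the splitHubEndpoint function is hypothetical, and the URLs are only illustrative hosts in the style of the examples above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitHubEndpoint is a hypothetical helper. It returns the first DNS label of
// the endpoint host (the route part) and the remaining hostname (the hub part).
func splitHubEndpoint(endpoint string) (string, string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", "", err
	}
	route, hubHostname, found := strings.Cut(u.Hostname(), ".")
	if !found || route == "" || hubHostname == "" {
		return "", "", fmt.Errorf("endpoint %q has no route label and hub hostname", endpoint)
	}
	return route, hubHostname, nil
}

func main() {
	endpoints := []string{
		"https://obs-api-open-cluster-management-observability.apps.myhub.foo.com/api/metrics/v1",
		"https://mcoa-managed-instance-openshift-logging.apps.myhub.foo.com/api/logs/v1/cluster-1/otlp/v1/logs",
	}
	for _, e := range endpoints {
		route, hub, err := splitHubEndpoint(e)
		fmt.Println(route, hub, err)
	}
}
```

Both endpoints yield the same hub part, which is the piece logs and traces could share, while the route part stays specific to each signal.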

Comment on lines +71 to +79
// PlatformMetricsCollectionSpec defines the spec for the addon to collect and forward metrics
// from fleet managed clusters.
type PlatformMetricsCollectionSpec struct {
	// Enabled defines a flag to enable/disable the platform metrics collection.
	//
	// +optional
	// +kubebuilder:validation:Optional
	Enabled bool `json:"enabled,omitempty"`
}

Since metrics will take a managed approach to collection in this initial version, I wonder whether these values are the best ones to use: the equivalent for logs & traces currently corresponds to an unmanaged approach, where the user fully controls the root resources. I would be in favor of deliberating a bit on how we want to expose these different scenarios to users. This relates to stolostron/multicluster-observability-addon#88 (comment); maybe a discussion for the nexus syncs.

2 participants