NETOBSERV-1911 CLI metrics #106

Open
wants to merge 6 commits into
base: main

jpinsonneau
Contributor

@jpinsonneau jpinsonneau commented Oct 18, 2024

Description

Add metrics capture to CLI. Use:

$ ./build/oc-netobserv metrics --enable_pktdrop="true" --enable_dns="true" --enable_rtt="true"

to run a capture and generate a dashboard.

[Screenshot: netobserv-cli dashboard in the OpenShift console monitoring dashboards, captured 2024-10-18]

Dependencies

Based on #103

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

codecov bot commented Oct 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 22.66%. Comparing base (245f6a0) to head (523df1f).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #106   +/-   ##
=======================================
  Coverage   22.66%   22.66%           
=======================================
  Files          10       10           
  Lines        1337     1337           
=======================================
  Hits          303      303           
  Misses       1015     1015           
  Partials       19       19           
Flag Coverage Δ
unittests 22.66% <ø> (ø)

Member

@jotak jotak left a comment

A couple of comments, no big blockers. I'm mostly worried about what would happen if things go wrong when stopping the capture, so I'm trying to imagine the worst-case scenario.

Dockerfile Outdated
@@ -24,6 +24,12 @@ COPY .mk/ .mk/
# Build collector
RUN GOARCH=$TARGETARCH make compile

# Install oc to allow collector to run commands
Member

I guess these changes now need to be mirrored into Dockerfile.downstream

Member

hmm .. am I missing something? I don't see it in Dockerfile.downstream :)

Contributor Author

Addressed in #139

cmd/root.go Outdated
resetTerminal()
out, err := exec.Command("/oc-netobserv", "stop").Output()
if err != nil {
log.Fatal(err)
Member

I'm wondering if there's more we could do here if the command failed. I'm afraid not stopping the capture may have quite bad consequences. Could we find another approach to kill everything? Or do some retry? (perhaps by setting captureEnded back to false?)

Contributor Author

I can indeed add a loop and retry every X seconds.

The only alternative I have in mind would be to create a Job outside of this collector Pod to invoke the oc delete daemonset command. However, the Job would need to get the current namespace and the capture-ended trigger from somewhere, so I'm not sure that would be any better.

@memodi

memodi commented Dec 18, 2024

/ok-to-test

New image:
quay.io/netobserv/network-observability-cli:0c2012f

It will expire after two weeks.

To use this build, update your commands using:

USER=netobserv VERSION=0c2012f make commands

or download the updated commands.


@memodi memodi left a comment

@jpinsonneau I tested this, I think labelling the CLI namespace to have openshift.io/cluster-monitoring: "true" is missing, I could only get metrics to be scraped after I added that label. Not sure if that's going to be only downstream thing.

openshift-ci bot commented Dec 20, 2024

New changes are detected. LGTM label has been removed.

openshift-ci bot commented Dec 20, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jotak. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jpinsonneau
Contributor Author

> @jpinsonneau I tested this, I think labelling the CLI namespace to have openshift.io/cluster-monitoring: "true" is missing, I could only get metrics to be scraped after I added that label. Not sure if that's going to be only downstream thing.

That's right @memodi. I have rebased and now force the label at namespace creation. Thanks!

@jpinsonneau
Contributor Author

image

@memodi
memodi commented Dec 20, 2024

/label qe-approved

4 participants