Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNF-15902: notify alarms subscriptions #431

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

pixelsoccupied
Copy link
Collaborator

@pixelsoccupied pixelsoccupied commented Jan 2, 2025

  • After capturing all the alerts we notify any subscriber using alarm sequence, event cursor and filter to only send the latest applicable.
  • Triggers are combined to managed the events during insert or update.
  • Some cleanup to use existing http client.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 2, 2025
Copy link

openshift-ci bot commented Jan 2, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link

openshift-ci bot commented Jan 2, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from pixelsoccupied. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pixelsoccupied
Copy link
Collaborator Author

/test all

@pixelsoccupied pixelsoccupied changed the title [wip] notify alarms subscriber [wip] notify alarms subscriptions Jan 2, 2025
@pixelsoccupied
Copy link
Collaborator Author

/test all

@pixelsoccupied
Copy link
Collaborator Author

/test all

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 3, 2025
@pixelsoccupied pixelsoccupied force-pushed the subscribe-notif branch 2 times, most recently from f060893 to 95c0d22 Compare January 3, 2025 15:13
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 3, 2025
@pixelsoccupied pixelsoccupied force-pushed the subscribe-notif branch 2 times, most recently from 6c810d2 to 6bb49c6 Compare January 3, 2025 19:01
@pixelsoccupied pixelsoccupied changed the title [wip] notify alarms subscriptions [wip]CNF-15902: notify alarms subscriptions Jan 3, 2025
@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Jan 3, 2025

@pixelsoccupied: This pull request references CNF-15902 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

TBD

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Jan 3, 2025

@pixelsoccupied: This pull request references CNF-15902 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

  • After capturing all the alerts we notify any subscriber using alarm sequence, event cursor and filter to only send the latest applicable.
  • Triggers are combined to managed the events during insert or update.
  • Some cleanup to use existing http client.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@pixelsoccupied
Copy link
Collaborator Author

/test all

@pixelsoccupied pixelsoccupied changed the title [wip]CNF-15902: notify alarms subscriptions CNF-15902: notify alarms subscriptions Jan 6, 2025
@pixelsoccupied pixelsoccupied marked this pull request as ready for review January 6, 2025 13:57
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 6, 2025
@@ -502,12 +514,120 @@ func (a *AlarmsServer) AmNotification(ctx context.Context, request api.AmNotific
return nil, fmt.Errorf("%s: %w", msg, err)
}

//TODO: Get subscriber
// Return 200 before subscription processing to free up AM conn
a.Wg.Add(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it would be cleaner if this part was decoupled and moved to a separate component of the alarm server to behave more like the way it was setup for the resource and cluster servers. There is some duplication here with what the Notifier was intended to do.

This could be aligned easily if the code above materialized each of the alarm changes into a pending notifications table and sent an event over to the Notifier. If the proper hooks were implemented then the subscription filtering and cleanup of completed notifications would be handled in the same way that this is done for the resource/cluster data change notifications...including the cursor handling.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

looking into it!

}
}

func CallUrl(ctx context.Context, logger *slog.Logger, client *http.Client, url string, event Notification) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in another comment, rather than exposing this and using it directly from places other than directly within the Notifier I'd prefer if we align the alarm functionality to how the resource/cluster components work so that the Notifier handles the subscriber jobs and sending of the notifications. One reason for this is that I plan on refactoring this to allow for sending to different types of subscribers based on the callback URL (i.e., special handling for SMO notifications vs internal notifications between our servers that wouldn't require (or support) OAuth tokens).

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 6, 2025
END IF;
END IF;

RETURN NEW;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed this briefly in the past, but I feel like it is worth mentioning again ... I'm a bit concerned that we're using the AlarmEventRecord table to do double-duty in also storing event/notification type information. If two or more changes occur to any given alarm in a short period of time then we may not be able to send off both notifications and the client would only get the last event. Clients may want to see all transitions as they may want to take action on each transition.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just noting it here what we discussed)

we do track all the changes back to back...partially. e.g if we have a subscriber that has a filter "CHANGE" (so notify me for everything that's NOT change), and then we have a notification that was firing, and the resolved and then firing again and now the subscriber is added, the subscriber will receive a payload with two events - one for the resolved and one for firing. The first one resolved (updated from firing to resolved) and then finally the second (firing with the new start time but same fingerprint, resulting in a new row/event)

But its true 3 things happened but the client is getting 2. But given that its currently resolved..it somewhat implied that originally it was firing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants