Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEV-43744-alert-eval-api #1

Merged
merged 12 commits into from
Apr 3, 2024
Merged

Conversation

yasmin-tr
Copy link

@yasmin-tr yasmin-tr commented Feb 11, 2024

What is this feature?

Logzio Alert Evaluation API

  • contains basic functionality for alert evaluation api
  • including headers and necessary files and models to support evaluation on logzio datasources

Explaining the main logic and changes:
We added a new internal api for evaluating alerts externally from the alert-manager.
External in this meaning is that the evaluation is called outside the alert-manager and not as part of the engine schedule as it is today.
To be able to reuse the evaluation code as much as possible, we are using the ScheduleService code to run alert evaluation.
In order to use ScheduleService, but without having it running the evaluation, I needed to add a configuration not to execute the evaluations from the scheduler. That way we have all the management of the rule routines and updates, but only the execution is disabled, and then can run from our API.
For running the external evaluation I also added another function for the Scheduler interface that the API is using.

External notification support -
In order to send the notification in an external API, added a LogzioRouter that implements the Send interface which sends the PostableAlerts to the url coming from configuration

Logzio datasources support required the following changes:

  • adding headers that contains the user-context passed by the API
  • propagating the headers into the evaluation request in relevant clients - elasticsearch and prometheus
  • adding url-override from the evaluation request passed by the API
  • overriding the url of the datasource for the relevant one so it will call the logzio APIs for query

Why do we need this feature?

metrics alerts evaluation process

Who is this feature for?

Logzio Metrics Alerts

Which issue(s) does this PR fix?:

DEV-43744
DEV-43883
DEV-43889

Added code from the following tickets:

  • DEV-31493 Override datasource URL and pass custom headers to alert rule evaluator (partly)
  • DEV-17927 - add LogzIoHeaders, Add error msg, change time to StartTime
  • DEV-30169: api alert evaluation
  • DEV-16492 Support external alert evaluation
  • (no ticket) Upgrade to 8.4.0
  • DEV-18005 - add requestId in error message
  • DEV-20400 - Grafana alerts evaluation - set 'accountsToSearch' query param

*DEV-31493 Override datasource URL and pass custom headers to alert rule evaluator
*DEV-17927 - add LogzIoHeaders, Add error msg, change time to StartTime
*DEV-30169, DEV-30170: api alert evaluation and alert processing
(processing in comments)
*DEV-16492 Support external alert evaluation
*(no ticket) Upgrade to 8.4.0
*DEV-18900 - fix For check to be inclusive
Copy link
Collaborator

@Jonathan-Eng Jonathan-Eng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed. the pr is not ready yet for review

@@ -9,6 +9,7 @@ import (

"github.com/grafana/grafana/pkg/api/dtos"
"github.com/grafana/grafana/pkg/api/response"
"github.com/grafana/grafana/pkg/models" // LOGZ.IO GRAFANA CHANGE :: DEV-17927 - add LogzIoHeaders
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the comment, I think better to use the reference to the new tickets. Also note in my PR I used a slightly different comment, so better be consistent (you can change mine if you wish).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -179,7 +180,8 @@ func (hs *HTTPServer) AlertTest(c *contextmodel.ReqContext) response.Response {
return response.Error(400, "The dashboard needs to be saved at least once before you can test an alert rule", nil)
}

res, err := hs.AlertEngine.AlertTest(c.SignedInUser.GetOrgID(), dto.Dashboard, dto.PanelId, c.SignedInUser)
// LOGZ.IO GRAFANA CHANGE :: DEV-17927 - add LogzIoHeaders
res, err := hs.AlertEngine.AlertTest(c.SignedInUser.GetOrgID(), dto.Dashboard, dto.PanelId, c.SignedInUser, &models.LogzIoHeaders{RequestHeaders: c.Req.Header})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are LogzIoHeaders required for test alert flow?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is technically part of the logzio datasources support, but because a lot of the implementation for the logz evaluation includes this i put it in this PR.

i don't know if at that point it's worth the time of taking out of this PR for the next , because the next PR anyway will be based on that and making these changes can add up a day of work i don't see any value for it 😅
in hindsight i would make the split if i knew about it from the beginning 😓

DEV-18005 - add requestId in error message
DEV-43744 , DEV-43883 , DEV-43895

Removed all code changes related to old alerting which isn't used, therefor not needed.
…-pasting thing relevant code into our API.

!!still work in progress!!

need to:
*remove commented code
*add logz headers somehow to request
*fix scheduler so it will not evaluate so we can use it for the api
*remove processing part
* not returning results in API and keeping the processing part in eval
* added a logzio router to replace the regular router and send notifications to logz internal api (in config)
* removed unused code of processing
* changed the evaluation api to run as bulk
@@ -2,6 +2,7 @@ package api

import (
"context"
"github.com/grafana/grafana/pkg/services/ngalert/schedule"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mark with // LOGZ.IO GRAFANA CHANGE

@@ -75,6 +76,7 @@ type API struct {
Tracer tracing.Tracer
AppUrl *url.URL
UpgradeService migration.UpgradeService
Schedule schedule.ScheduleService
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mark with // LOGZ.IO GRAFANA CHANGE

}

// NewLogzioAlertingApi creates a new LogzioAlertingApi instance
func NewLogzioAlertingApi(service *LogzioAlertingService) *LogzioAlertingApi {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we say its redundant? (leave only the service)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the structure of all APIs in this project. there is a class of "XXXApi" that has the route to the endpoint, and the function to call the service class that contains the main logic.
i think what we said is that maybe the LogzioAlertingService can be redundant and we can put the logic directly in this class. but actually i think this structure has some sense in it.. also i'm pretty sure we'll want it also for later when more logic/apis will be added (or at least it gives the infra for it).

@@ -4,6 +4,9 @@ import (
"context"
"encoding/json"
"fmt"

"github.com/grafana/grafana/pkg/services/ngalert/models"
// LOGZ.IO GRAFANA CHANGE :: End
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the start :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably some merge messed this up 😅 will fix it 👍

@@ -467,6 +551,8 @@ func (sch *schedule) ruleRoutine(grafanaCtx context.Context, key ngmodels.AlertR
attribute.Int64("results", int64(len(results))),
))
}

logger.Debug("RuleRoutine evaluation: processing evaluation results")
Copy link
Collaborator

@ohadza ohadza Mar 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should cleanup eventually all the logger.Debug(...) and comments ....

traceCtx, end := s.trace(ctx, q)
defer end()

logger := s.log.FromContext(traceCtx)
logger.Debug("Sending query", "start", q.Start, "end", q.End, "step", q.Step, "query", q.Expr)
logger.Debug("fetching query", "query", q, "headers", headers)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix this to be the original - for some reason didn't have the log so i added it , not sure how this happened

@yasmin-tr yasmin-tr changed the base branch from logzio to v10.4.x-logzio April 1, 2024 08:29
@yasmin-tr yasmin-tr changed the base branch from v10.4.x-logzio to logzio April 1, 2024 08:30
…lert-eval-api

# Conflicts:
#	pkg/services/ngalert/api/api.go
#	pkg/services/ngalert/schedule/schedule.go
#	pkg/setting/setting_unified_alerting.go
@yasmin-tr yasmin-tr force-pushed the DEV-43744-alert-eval-api branch from 69867a8 to 46080fc Compare April 1, 2024 14:57
@yasmin-tr yasmin-tr changed the base branch from logzio to v10.4.x-logzio April 1, 2024 15:06
* Renamed logzio files to same standard
* Changed request object for logzio router notification API and changed default url behaviour
@yasmin-tr yasmin-tr merged commit 7fd4afe into v10.4.x-logzio Apr 3, 2024
10 of 12 checks passed
@yasmin-tr yasmin-tr deleted the DEV-43744-alert-eval-api branch September 22, 2024 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants