Refactor and fix metrics export tests. #1957
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: evankanderson. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Codecov Report
@@ Coverage Diff @@
## master #1957 +/- ##
==========================================
+ Coverage 68.96% 69.08% +0.12%
==========================================
Files 209 209
Lines 8790 8786 -4
==========================================
+ Hits 6062 6070 +8
+ Misses 2453 2447 -6
+ Partials 275 269 -6
Continue to review full report at Codecov.
Is the failing downstream test related?
metrics/e2e_test.go
Outdated
// We unregister the views because this is one of two ways to flush
// the internal aggregation buffers; the other is to have the
// internal reporting period duration tick, which is at least
// [new duration] in the future.
Are the comments here still correct?
Whoops, no. We have a function for that now.
	Value: m.Timeseries[0].Points[0].GetInt64Value(),
}
records = append(records, metric)
keys[metric.Key()] = struct{}{}
Could you add a comment explaining why using a set here fixes the problem?
Done. I was convinced for a long time that the RPCs weren't actually going to the right place, but I finally figured out that we simply weren't reading enough off the channel to find them.
LG in general
Left some stylistic comments.
metrics/e2e_test.go
Outdated
	return fmt.Sprintf("%s:%d", m.Key(), m.Value)
}

func initSdFake(sdFake *stackDriverFake) error {
I guess?
- func initSdFake(sdFake *stackDriverFake) error {
+ func initSDFake(sdFake *stackDriverFake) error {
Done. I have no idea; the product name is "Stackdriver", so I just spelled it out in the 3 places it is used.
metrics/e2e_test.go
Outdated
resources := []*resource.Resource{
	{
		Type: "revision",
		Labels: map[string]string{
			"project":  "p1",
			"revision": "r1",
		},
	},
	{
		Type: "revision",
		Labels: map[string]string{
			"project":  "p1",
			"revision": "r2",
		},
	},
}
- resources := []*resource.Resource{
- 	{
- 		Type: "revision",
- 		Labels: map[string]string{
- 			"project": "p1",
- 			"revision": "r1",
- 		},
- 	},
- 	{
- 		Type: "revision",
- 		Labels: map[string]string{
- 			"project": "p1",
- 			"revision": "r2",
- 		},
- 	},
- }
+ resources := []*resource.Resource{{
+ 	Type: "revision",
+ 	Labels: map[string]string{
+ 		"project": "p1",
+ 		"revision": "r1",
+ 	},
+ },{
+ 	Type: "revision",
+ 	Labels: map[string]string{
+ 		"project": "p1",
+ 		"revision": "r2",
+ 	},
+ }}
Done.
metrics/e2e_test.go
Outdated
if err != nil {
	t.Fatalf("failed to read prometheus response: %+v", err)
}
want := `# HELP testComponent_global_export_counts Count of exports via standard OpenCensus view.
- want := `# HELP testComponent_global_export_counts Count of exports via standard OpenCensus view.
+ const want = `# HELP testComponent_global_export_counts Count of exports via standard OpenCensus view.
Done.
metrics/e2e_test.go
Outdated
	expected: []metricExtract{
		{
			"knative.dev/serving/autoscaler/actual_pods",
			label1,
			1,
		},
		{
			"knative.dev/serving/autoscaler/desired_pods",
			label2,
			2,
		},
		{
			"custom.googleapis.com/knative.dev/autoscaler/not_ready_pods",
			batchLabels,
			3,
		},
	},
}, {
- 	expected: []metricExtract{
- 		{
- 			"knative.dev/serving/autoscaler/actual_pods",
- 			label1,
- 			1,
- 		},
- 		{
- 			"knative.dev/serving/autoscaler/desired_pods",
- 			label2,
- 			2,
- 		},
- 		{
- 			"custom.googleapis.com/knative.dev/autoscaler/not_ready_pods",
- 			batchLabels,
- 			3,
- 		},
- 	},
- }, {
+ 	expected: []metricExtract{{
+ 		"knative.dev/serving/autoscaler/actual_pods",
+ 		label1,
+ 		1,
+ 	},{
+ 		"knative.dev/serving/autoscaler/desired_pods",
+ 		label2,
+ 		2,
+ 	},{
+ 		"custom.googleapis.com/knative.dev/autoscaler/not_ready_pods",
+ 		batchLabels,
+ 		3,
+ 	}},
+ }, {
Done
metrics/e2e_test.go
Outdated
}, {
	name:               "Don't allow custom metrics",
	allowCustomMetrics: "false",
	expected: []metricExtract{
ditto
Done
LG but the failed unit test indicates there may be another problem.
I've seen prometheus fail to be able to listen on the port; I can choose a random port to see if that helps. Probably the other fix would be to add a SO_REUSEADDR to the prometheus server setup; let me see if that helps.
I'm not sure that
Regardless, the Prometheus code has not been changed, so I don't think that should be a barrier on this PR. (Though I'd love to figure out why the server sometimes won't respond for > 10s).
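For the "choose a random port" idea mentioned here, one common approach in Go tests is to bind to port 0 and let the kernel pick an unused port. The sketch below uses only the standard library; `listenOnFreePort` and the placeholder `/metrics` handler are hypothetical and are not part of the Prometheus exporter setup in this repository. Whether this would cure the flake discussed in the thread is not established.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
)

// listenOnFreePort (hypothetical helper) binds to port 0 on localhost so the
// kernel assigns an unused TCP port, then reports which port was chosen.
func listenOnFreePort() (net.Listener, int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	return l, l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	l, port, err := listenOnFreePort()
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	defer l.Close()

	// Placeholder handler standing in for a real metrics endpoint.
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "# placeholder metrics")
	})
	go http.Serve(l, mux) // returns with an error once l is closed

	resp, err := http.Get(fmt.Sprintf("http://127.0.0.1:%d/metrics", port))
	if err != nil {
		log.Fatalf("failed to fetch metrics: %v", err)
	}
	defer resp.Body.Close()
	fmt.Println("port:", port, "status:", resp.StatusCode)
}
```

Because the listener is already bound when the port number is read, there is no window in which another process can claim the port between choosing it and serving on it.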
metrics/exporter.go
Outdated
@@ -161,13 +162,13 @@ func UpdateExporter(ctx context.Context, ops ExporterOptions, logger *zap.Sugare
  flushGivenExporter(curMetricsExporter)
  e, f, err := newMetricsExporter(newConfig, logger)
  if err != nil {
- 	logger.Errorw("Failed to update a new metrics exporter based on metric config", newConfig, zap.Error(err))
+ 	logger.Errorw("Failed to update a new metrics exporter based on metric config", "config", newConfig, "error", err)
zap.String("config", newConfig), zap.Error(err)? Same below?
Ugh... I want to replace this with a non-sugared logger. It looks like With takes both Field objects and key-value pairs. Unfortunately, this ended up with the key-value format where the value was a zap.Field.
WDYT?
Well, you do use sugared.
You don't need With; just use logger.Error("...", zap.Error(err)) on the desugared one.
I switched this to put all the Field objects first so it's not possible to log things as "string" => Field, which was what the old code was doing.
Using logger.Error() means taking the error out of a separate JSON field, which makes it harder (for example) to filter all the logs for just the ones with errors in them.
Yeah, I know. That's why when I joined I was surprised we're using this crap (zap that is) :)
But that's the zap philosophy — use jq to parse :)
/lgtm
Changes
resource_view_test files.
TestFlushExporter. Fixed that, too.
/kind bug
/kind cleanup
Fixes #1672
/assign @vagababov
It took embarrassingly long to find this... I had to walk away twice for at least a week after bashing my head on the export code to figure out what was really going on. (The bug was inside the test all along!)