Fix K6 Prometheus Dashboard Metrics Quantile Mismatch by Changing Aggregation to lastNotNull #170

Conversation

dylanjpaulson
Contributor

@dylanjpaulson dylanjpaulson commented Oct 17, 2024

Summary

This pull request updates the k6-prometheus.json file for the K6 Prometheus dashboard. Specifically, I’ve changed the aggregation method in the Requests by URL panel to use lastNotNull for quantile calculations. Previously, the aggregation used mean, which caused the dashboard to not match the K6 console output.

I discovered this issue while comparing the K6 console output and the Grafana dashboard and also discussed it in this Grafana forum post.

Details

The problem was that the dashboard used the mean aggregation for quantiles, leading to discrepancies, particularly in cases where URL request durations varied more significantly.
By switching to the lastNotNull aggregation for quantile calculations, the dashboard now accurately reflects the values seen in the K6 console output.
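
To make the change concrete, here is a rough sketch of what it looks like in the dashboard JSON. This is illustrative rather than the literal hunk from k6-prometheus.json (depending on the panel, the reducer sits in a Reduce transformation as shown below, or under the panel's reduceOptions.calcs); in either case the calculation changes from mean to lastNotNull:

"transformations": [
  {
    "id": "reduce",
    "options": {
      "reducers": ["lastNotNull"]
    }
  }
]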

Screenshots and Comparison:

Here is a comparison of the K6 console output and the dashboard with different aggregation methods applied:

  1. Console Output:
    k6-prometheus-console-output

  2. Dashboard with mean aggregation (before fix):
    k6-prometheus-mean-quantile-calculation

  3. Dashboard with lastNotNull aggregation (after fix)
    k6-prometheus-lastnotnull-quantile-calculation

Note that the K6 console output shows the p90 and p95 quantiles, while the dashboard shows p95 and p99. In this comparison, the p95 values can be directly compared. I have confirmed this issue across various test cases and quantile calculations.
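
As a side note, both percentile sets are configurable if you want them to line up exactly for comparison. A minimal sketch, assuming k6's summaryTrendStats option and the remote write output's K6_PROMETHEUS_RW_TREND_STATS environment variable behave as documented:

export let options = {
  // Print p90, p95 and p99 in the end-of-test console summary.
  summaryTrendStats: ["avg", "min", "med", "max", "p(90)", "p(95)", "p(99)"],
  // ...thresholds and scenarios as in the reproduction script below
};

On the exporter side, the same percentiles can be sent to Prometheus by setting K6_PROMETHEUS_RW_TREND_STATS="p(90),p(95),p(99)" when running the test (see the run command under the reproduction steps).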

Steps to Reproduce the Issue:

  1. Set up a K6 Test with Thresholds:
    Define threshold tests for specific API endpoints to monitor their request durations. Below is an example using https://jsonplaceholder.typicode.com for the test:
import http from 'k6/http';
import { check } from 'k6';

export let options = {
  thresholds: {
    "http_req_duration{name:https://jsonplaceholder.typicode.com/posts/1}": [
      "p(95)<500",
    ],
    "http_req_duration{name:https://jsonplaceholder.typicode.com/comments/1}": [
      "p(95)<500",
    ],
    "http_req_duration{name:https://jsonplaceholder.typicode.com/todos/1}": [
      "p(95)<500",
    ],
  },
  scenarios: {
    contacts: {
      executor: "constant-arrival-rate",
      duration: "5m",
      rate: 120,       // 120 iterations per timeUnit
      timeUnit: "60s", // i.e. 2 iterations per second
      preAllocatedVUs: 2,
      maxVUs: 50,
    },
  },
};

export default function () {
  let api1Res = http.get("https://jsonplaceholder.typicode.com/posts/1");
  check(api1Res, { "API 1 status is 200": (r) => r.status === 200 });

  let api2Res = http.get("https://jsonplaceholder.typicode.com/comments/1");
  check(api2Res, { "API 2 status is 200": (r) => r.status === 200 });

  let api3Res = http.get("https://jsonplaceholder.typicode.com/todos/1");
  check(api3Res, { "API 3 status is 200": (r) => r.status === 200 });
}
  2. Run the Test with Prometheus Remote Write Enabled:
    Execute the K6 test with Prometheus remote write configured, ensuring the results are available for monitoring in your Grafana dashboard. Use the default K6 Prometheus dashboard here. An example run command is shown just after this list.

  3. Compare Console Output and Dashboard:
    After running the test:

    • Check the console output for p95 timings for each API endpoint.
    • Compare these values with the p95 values displayed in the Grafana dashboard.
    • Observe the differences when using the default mean aggregation method.
  4. Apply the Fix:
    Change the aggregation method in the Requests by URL panel to lastNotNull for more accurate quantile calculations. Compare the updated dashboard values with the console output again to confirm the fix.
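
For step 2, this is a minimal sketch of a run command with the Prometheus remote write output enabled. It assumes the script is saved as script.js and that a local Prometheus is started with the remote write receiver enabled (e.g. with --web.enable-remote-write-receiver):

# Run the script with the experimental Prometheus remote write output
K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9090/api/v1/write \
K6_PROMETHEUS_RW_TREND_STATS="p(90),p(95),p(99)" \
k6 run -o experimental-prometheus-rw script.js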

@dylanjpaulson dylanjpaulson requested a review from a team as a code owner October 17, 2024 21:47
@dylanjpaulson dylanjpaulson requested review from mstoykov and oleiade and removed request for a team October 17, 2024 21:47
@CLAassistant

CLAassistant commented Oct 17, 2024

CLA assistant check
All committers have signed the CLA.

@dylanjpaulson dylanjpaulson force-pushed the fix-k6-prometheus-dashboard-metrics-quantile-mismatch-lastnotnull branch from 7a1d11e to b4aa289 on October 17, 2024 22:05
@olegbespalov olegbespalov requested a review from ppcano October 18, 2024 06:19
Contributor

@mstoykov mstoykov left a comment

LGTM!

But to be honest, I am not very proficient with this specific aggregation method.

@ppcano maybe you have more experience?

@ppcano
Contributor

ppcano commented Oct 28, 2024

@mstoykov I don't remember off the top of my head. Is this change necessary for the Histogram dashboard?

@dylanjpaulson
Contributor Author

@ppcano, I’m fairly confident that this change is also necessary for the Histogram dashboard. However, when I tried using native histograms, I couldn’t get the console output to exactly match the values displayed on the dashboard, so I couldn’t confirm with certainty.

Is this a known issue with the native histogram dashboard? When using trend metrics, the dashboard aligns perfectly with the console values for all quantiles. But with the native histogram dashboard, there’s an inconsistency, with values varying by different amounts.

@ppcano
Contributor

ppcano commented Oct 30, 2024

@dylanjpaulson thanks for informing us of your findings, and for your detailed report in the community forum.

Is this a known issue with the native histogram dashboard?

I don't know.

But with the native histogram dashboard, there’s an inconsistency, with values varying by different amounts.

If it's a minor difference, it could be due to:
a) k6 aggregating data before sending it
b) the resolution that Prometheus uses to store histogram data

We could look at this in another PR.

I noticed the difference between the terminal output and dashboard output in this example was small. This is a common issue when k6 exports data because, as detailed previously:
a) k6 aggregates data before sending it
b) backends can also aggregate data

In the non-native histogram dashboard, the query uses avg:

avg by(name, method, status) (k6_http_req_duration_p95{testid=~"$testid"})

Then, the returned query data is reduced using mean to display a single value. This PR changes mean to lastNotNull.

@oleiade, LGTM, I wonder how k6 does the same calculation.

@dylanjpaulson
Contributor Author

@ppcano,

I noticed the difference between the terminal output and dashboard output in this example was small.

This only appears to be an issue with the native histograms. As shown below, the trend query metrics with the lastNotNull calculation match the log exactly. The screenshots also highlight that the difference between the console and the dashboard when using the mean calculation is much larger and more inconsistent when running more complex test cases with greater variation in API request duration.

The screenshots below show that the console output exactly matches the dashboard when using lastNotNull, and that the difference between the console and the dashboard grows when an API's request duration is more variable, for example with edit_goal_challenge.

(screenshots: k6 console output compared with the dashboard using lastNotNull and using mean)

@oleiade
Member

oleiade commented Nov 5, 2024

Hey folks 👋🏻

Thanks for the fruitful discussions. We've discussed this topic internally, and we're happy to merge the PR as-is. Thanks a lot for your contribution @dylanjpaulson, much appreciated 🎉 🙇🏻

@oleiade oleiade merged commit 1511bc9 into grafana:main Nov 5, 2024
10 checks passed