[System] Issue with short time frame (5min) #1437

Closed
ruflin opened this issue Aug 3, 2021 · 34 comments · Fixed by #6743
Labels
Team:Integrations

Comments

@ruflin
Contributor

ruflin commented Aug 3, 2021

Sometimes when a short timeframe is selected for the system dashboard (for example 5min), some of the values are not shown. Everything works as expected again as soon as a longer timeframe like 30min is selected. Initially I thought some data did not make it through in time, but other graphs work and the data is coming from the same machine. Any idea what could cause this?

[Screenshot: 2021-08-03 at 11:41:11]

@ruflin added the Team:Integrations label Aug 3, 2021
@elasticmachine

Pinging @elastic/integrations (Team:Integrations)

@fearful-symmetry
Contributor

Alright, I can reproduce this. Honestly, I'm at a loss here. The data coming in seems fine, and it's across too many different visualizations to just be an issue with a single metric. It's also really unpredictable: sometimes it happens with a 15-minute interval, sometimes it doesn't; sometimes it happens with a 20-minute interval, sometimes it doesn't. Is there some rounding issue going on in the visualizations themselves?

@kaiyan-sheng
Contributor

@fearful-symmetry Could it be because the system module dashboard is still using TSVB? I tested this with the EC2 and RDS dashboards: the same thing happens with the EC2 dashboard, which uses TSVB for its visualizations, but the RDS dashboard, which uses Lens, is fine.

@fearful-symmetry
Contributor

Seconded. I tinkered around with Lens for a bit and everything seems fine there. I wonder if that's where the issue is.

@ruflin
Contributor Author

ruflin commented Aug 4, 2021

@jasonrhodes Do you by any chance have an idea of what is happening here? Which team should be pinged on this in case it is a TSVB bug?

@hendry-lim

hendry-lim commented Jan 4, 2022

May be related to #121684 and #121734

@botelastic

botelastic bot commented Jan 4, 2023

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jan 4, 2023
@jasonrhodes
Member

@neptunian this issue is very old, but I'm seeing it in my notifications again because of the stale bot. When you have time, can you check whether this System Module dashboard still displays the problem described here? It would be interesting to understand whether using TSVB can cause data issues in dashboards, as we are always weighing Lens vs. TSVB etc.

This isn't urgent, but it'd be nice to know if this is a simple/obvious TSVB problem or not. If not, I don't think we need to spend a ton of time digging beyond that unless the answer seems obvious to you.

@botelastic botelastic bot removed the Stalled label Jan 5, 2023
@neptunian
Contributor

neptunian commented Jan 5, 2023

I took a look at my local host and I do get a lot of missing data when I have a short time range as well.
[Screenshot: 2023-01-05 at 4:17:20 PM]

When I use the "copy to dashboard" link in any of the visualizations that are missing data, and then "convert to lens", the Lens visualizations do show metrics. Here is the CPU gauge visualization as a Lens Metric or Gauge:
[Screenshot: 2023-01-05 at 4:22:13 PM]
[Screenshot: 2023-01-05 at 4:22:20 PM]

Opening the CPU Memory gauge visualization in TSVB, I can see in the request that TSVB uses a date histogram to calculate the metrics and returns time series data, whereas Lens does not. This looks to be because the "Data timerange mode" is set to "last value". In that case it uses a date histogram with the "auto" interval, which for a small time range sets fixed_interval to 1s. This mostly returns null values in the time series data. If you were to change the interval to something larger like "1m", you would likely get some data.
[Screenshot: 2023-01-05 at 4:27:25 PM]

Or if "entire time range" is selected from the dropdown instead, it uses an "auto_date_histogram" of 1 bucket and the response is similar to Lens which just uses a metrics agg:

[Screenshot: 2023-01-05 at 4:27:45 PM]
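
To make the difference between the two request shapes concrete, here is a rough sketch (not the literal TSVB request bodies; the index pattern and field name are assumptions based on the system module):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
FIELD = "system.cpu.total.norm.pct"  # system integration CPU field (assumed)

# "Last value" mode: a date_histogram whose "auto" interval can shrink to
# 1s on short time ranges, so most buckets contain no documents and the
# per-bucket averages come back null.
last_value_mode = es.search(
    index="metricbeat-*",  # assumed index pattern
    size=0,
    query={"range": {"@timestamp": {"gte": "now-5m"}}},
    aggs={
        "timeseries": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "1s"},
            "aggs": {"cpu": {"avg": {"field": FIELD}}},
        }
    },
)

# "Entire timerange" mode: effectively a single metric aggregation over
# the whole selected range, similar to what Lens requests.
entire_range_mode = es.search(
    index="metricbeat-*",
    size=0,
    query={"range": {"@timestamp": {"gte": "now-5m"}}},
    aggs={"cpu": {"avg": {"field": FIELD}}},
)
```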

The explanation text reads:
"This setting controls the timespan used for matching documents. "Entire timerange" will match all the documents selected in the timepicker. "Last value" will match only the documents for the specified interval from the end of the timerange."

I think for something like a single-metric gauge it would make sense for the user to see the metric averaged over the entire selected time range rather than the last interval (which would match Lens behaviour). So if we're okay with that, it should be a small change. I didn't look into the tables yet, but I imagine it's something similar, as they also show data when converted to Lens.

@jasonrhodes
Member

Thanks for looking into this, @neptunian -- I guess we should probably work with the integration teams and submit changes for these shipped dashboards? I wonder if we could distill what you found into some "Common Advice" for those creating dashboards like this, so that each team could check their own dashboards and adjust accordingly?

@neptunian
Contributor

neptunian commented Jan 30, 2023

I'd like to understand why it's currently done this way before making changes. In the current case of using "last value", TSVB auto-calculates the interval for a given range and returns the averages for each of the time series buckets while only displaying the last bucket's value. If we change this to "entire_time_range", so that it stops getting time series data and gets the average over the entire time range, could that be slower over a larger time range? I wouldn't think so. Is there a reason why we only want the last bucket? Is that more "real time"? I would expect to get the whole time range and not the last bucket when I am allowed to choose a time range. TSVB doesn't currently default to this option when I create a gauge, so I assume this was intentional. Also, if we did change it to "entire_time_range", it wouldn't solve the problem completely: if the user selects a small enough time range, say the last 10 seconds, while metrics are only sent every 30s, the bucket over that "entire time range" will still be empty.
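
As a back-of-the-envelope illustration of that failure mode (the numbers below are assumptions for illustration, not TSVB's actual interval logic):

```python
# 5-minute window, auto interval collapsed to 1s buckets, metrics shipped
# every 10s: roughly 9 out of 10 buckets contain no document at all.
range_seconds = 5 * 60        # selected time range
bucket_seconds = 1            # auto-calculated histogram interval
collection_seconds = 10       # metricbeat reporting period (default)

buckets = range_seconds // bucket_seconds       # 300 buckets
docs = range_seconds // collection_seconds      # ~30 documents
print(f"~{1 - docs / buckets:.0%} of buckets are empty")  # ~90%
```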

The screenshots I posted above are from the Metricbeat dashboard after enabling the Metricbeat system module, whereas @ruflin's screenshot says "metrics", so I'm assuming that one is the integration. However, when I install the system integration it looks different for me, so maybe it has been updated; it's also somewhat easier to understand why you might not get a value, because it's null in that bucket:

[Screenshot: 2023-01-30 at 1:49:23 PM]

But for some reason, when the time range is small and the interval drops to 1s, it stops showing the interval in the tooltip:

[Screenshot: 2023-01-30 at 1:46:53 PM]

I'm not sure how these info tooltips work, but if we could extend them to show the value and the interval, the user might understand that the time range is too small to reliably show data at intervals smaller than their collection period.

Happy to work with @elastic/integrations as I'm assuming they own these?

@ruflin
Contributor Author

ruflin commented Jan 30, 2023

There have been quite a few iterations on this integration dashboard since I opened the issue. I remember we always had quite a bit of discussion around using last value vs. not (@simianhacker might be able to chime in for general advice).

I wonder if we could distill what you found into some "Common Advice" for those creating dashboards like this, so that each team could check their own dashboards and adjust accordingly?

That would be an ideal outcome for me of this discussion. @drewdaemon might also be able to chime in here in the context of Lens on what the recommendation is / should be.

@drewdaemon
Contributor

drewdaemon commented Jan 30, 2023

@ruflin thanks for the ping

At this point, everything in this issue should be accomplished with Lens. I even have an open PR to replace the gauges with Lens metrics.

But I think the questions about behavior are more fundamental than any specific visualization tool, and a full discussion of these behaviors occurred when the Kibana visualizations team originally reworked these very dashboards.

A big problem with TSVB's "last-value mode" is that it can be unpredictable and hard to understand, as noted by @neptunian.

The tables

As far as whether to use the average over the full range vs. the real-time value, we proposed the following approach for the tables, which would combine the best of both.

[Screenshot: table_with_both_metrics]

However, at the time, using the Lens table instead of TSVB introduced an extra click to get to the host dashboard. Because of this, we reverted to the old TSVB tables. This blocker is no longer relevant so I would suggest we go with the original suggestion pictured above (part of #4868).

The gauges (soon to be Lens metrics)

The single-number visualizations are harder because we can't just add another column as we can in the tables. In my PR, I have left the single metric visualizations taking the average over a limited time range simply to maintain parity with the existing visualizations. This suffers from the same issues as TSVB "last value mode."

I would much rather we decide which of the two approaches is most valuable and be consistent. Either

  • use the last value reported (downside: may be out of date)
  • use an average over the full time range (downside: less "real-time")

@drewdaemon
Contributor

drewdaemon commented Jan 30, 2023

I wonder if we could distill what you found into some "Common Advice" for those creating dashboards like this, so that each team could check their own dashboards and adjust accordingly?

My general advice is implicit in my comment above, but I would say

Make your visualizations consistent and predictable favoring one of these two approaches:

  1. Show data for the full time range - if the user wants to see the latest state, they have to set the dashboard time range to 30s
  2. Show the very last state - take the very last value (edit: in the timerange). This is done using Lens's last value function.

In very few cases does it make sense to constrain the timerange independently of the time picker (i.e. TSVB "Last value mode" and Lens "Reduced time range").
[Screenshot: 2023-01-30 at 4:26:08 PM]

But, of course every case is different.
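
To illustrate what the two recommended approaches boil down to at the query level, here is a minimal sketch (index pattern and field are assumptions; Lens's last value function is approximated with a top_metrics aggregation, which may differ from what Lens literally sends):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
FIELD = "system.cpu.total.norm.pct"  # assumed field

# Approach 1: aggregate over the full time range from the time picker.
full_range = es.search(
    index="metricbeat-*",  # assumed index pattern
    size=0,
    query={"range": {"@timestamp": {"gte": "now-15m"}}},
    aggs={"cpu_avg": {"avg": {"field": FIELD}}},
)

# Approach 2: the very last value reported inside that same time range.
last_state = es.search(
    index="metricbeat-*",
    size=0,
    query={"range": {"@timestamp": {"gte": "now-15m"}}},
    aggs={
        "cpu_last": {
            "top_metrics": {
                "metrics": {"field": FIELD},
                "sort": {"@timestamp": "desc"},
            }
        }
    },
)
```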

cc @stratoula in case she has any more thoughts.

@neptunian
Contributor

neptunian commented Jan 31, 2023

2. Show the very last state - take the very last value. This is done using Lens's last value function.

Is there a point in allowing the user to select a time range if everything is showing the last value? I can see selecting a point in time (like the Inventory waffle map) but not a time range. I assume this was only for the chart visualizations where we want them to be able to define a time range?

Favoring either of the two approaches doesn't seem consistent. It also might confuse the user that once they move into the curated UIs we do things another way.

@drewdaemon
Contributor

drewdaemon commented Jan 31, 2023

Hi @neptunian , thanks for the question and comments.

Is there a point in allowing the user to select a time range if everything is showing the last value?

Yes, the user sees the last value in their selected time range. Sorry—I see now that my original comment was ambiguous on this point.

Favoring either of the two approaches doesn't seem consistent.

TBH, I'm surprised to hear this! Can you explain your thinking?

Users are often confused by the magic behind TSVB's last-value mode (isn't that the main reason for this issue?)

To us, it seems more consistent to indicate either a metric that is subject to the entire timerange, or the last reported value. In the table screenshot I posted above each number is specifically labeled and will be consistent with any independent analysis/checking a user performs on their end (e.g. looking at the latest document in Discover or running their own Elasticsearch aggregation queries).

It also might confuse the user that once they move into the curated UIs we do things another way.

Hmmm, I guess I don't understand this one. What are these "curated UIs?"

@neptunian
Contributor

neptunian commented Feb 1, 2023

Favoring either of the two approaches doesn't seem consistent.

TBH, I'm surprised to hear this! Can you explain your thinking?

Users are often confused by the magic behind TSVB's last-value mode (isn't that the main reason for this issue?)

To us, it seems more consistent to indicate either a metric that is subject to the entire timerange, or the last reported value. In the table screenshot I posted above each number is specifically labeled and will be consistent with any independent analysis/checking a user performs on their end (e.g. looking at the latest document in Discover or running their own Elasticsearch aggregation queries).

Sorry, I think I misunderstood. I agree we should "decide which of the two approaches is most valuable and be consistent". When it comes to the "common advice", I am hoping we stick with one of the two approaches and ideally not allow going either way, unless there is some exceptional case, as you mentioned.

It also might confuse the user that once they move into the curated UIs we do things another way.

Hmmm, I guess I don't understand this one. What are these "curated UIs?"

This might be related to my misunderstanding above. We are building UIs (observability infra) that are influenced by these dashboards. For something like these gauges, or "summary"-type metrics, we do not use the "last metric", mainly because I didn't know that was happening behind the scenes. This had me wondering whether we should be doing so too, if that's what a user expects.

@ruflin
Contributor Author

ruflin commented Feb 2, 2023

++ on having ONE recommended way and that it is consistent across all parts of our UIs, be it embedded Lens or not.

@drewdaemon
Contributor

As I say, some visualization types do support using both approaches simultaneously, such as the table, which can have an aggregation column and a last value column, both clearly labeled. However, others, especially single-value visualizations like gauges and metrics, clearly don't, and choosing one approach for some and another for others would definitely be confusing.

But, that's just me chiming in from the general visualization best-practices angle. I certainly defer to y'all on any questions regarding what is most valuable for observability users, along with how you decide to come to consistency across your dashboards and UI views.

@mgevans-5

mgevans-5 commented Feb 21, 2023

//this article is also posted in discuss.elastic.co
https://discuss.elastic.co/t/dashboards-best-practices-and-thoughts-for-elastics-observability-team/326078
@anyone - I'm unable to attach images in GitHub due to corp restrictions - my apologies for the links back to the above article

Dashboards!

Why do dashboards exist in Observability?
Dashboards provide quick views of important information to help determine if further action is needed.
https://en.wikipedia.org/wiki/Dashboard_(business)
https://www.opservices.com/dashboards-empresas-de-tecnologia/

What is the difference between business service level dashboards and host level dashboards?
• Business Services dashboards are a view of business KPIs (are transactions performing in a normal range? Is business volume typical?) and an aggregate view of the underlying technology (my uptime and overall system’s health).
• Technology dashboards provide detailed views of the metric-generating components that underlie a business service and allow for a detailed look at attributes that impact current or future performance and health.

The remainder of this discussion is centered on technology dashboards and host level views. Business Service dashboards are a different animal.

In viewing host level metrics there are two general groupings: multiple systems – typically of a similar function (web, services, DB) – vs. a view of a single system.
Let's take a closer look at how best to utilize metrics at the host level. We typically visualize metrics for one of two reasons:

  1. I want to see what is going on now. I need to see real-time metrics during an important release activity or during an incident recovery.
  2. I want to understand the trending behavior of a system. For example I may be looking for performance changes over time or working on setting compute resources to actual needs.

These two views of metrics are different, both are important, and each has its challenges when it comes to displaying data. When viewing the ‘Latest’ data, we need to know about its precision (is it averaged or point-in-time? over what period?), its actual collection interval (1 sec, 1 min), and any lag between ‘now’ and when the data was collected (is it truly the ‘latest’ real-time or just ‘near-time’?). When viewing trending data we need to show normalized information as well as important data that identifies behavior that may be buried by simple averages.

Note: an important consideration for monitoring is choosing a meaningful interval for raw data collection. In some cases this may be reviewed per host, per metric, and even per app component. For Disk, 10 minutes may be perfectly acceptable for established, mature environments but not necessarily safe for systems with immature processes and rapid changes – or for systems where disk usage is vital to the stability of the system, e.g. database environments. Similarly, for Memory and CPU a default interval of 10 seconds may be fine-grained enough, but for some systems 60 or 120 seconds may be more than adequate.

Visualization Concepts for Elastic Observability
Much of the following is discussed in the GitHub development conversation above; I’m distilling it here with some context where necessary.
When dealing with single number visualizations
• Values should be labeled as either Average or Last Value, because an average is not the same as a last value. When showing a single value without a label it would be assumed to be current or ‘last value’ data – but averaged data should always be labeled.
• Some values will ALWAYS have a last value. There is no such thing as a non-existent Memory count. Note: this is indeed a technical issue that Elastic acknowledges and is working on.
• Some values should ONLY have a last value. There isn’t a need to average disk space over the last 15 minutes. Disk usage is a discrete number that typically has concrete boundaries.
• ‘Last value’ datapoints should indicate time offsets from ‘now’ as: current timestamp – last value timestamp = time offset in seconds. Since each datapoint could have a different data capture interval, this time offset value may need to be indicated per visualization.
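
A tiny sketch of that suggested offset computation (the timestamp below is hypothetical):

```python
from datetime import datetime, timezone

# current timestamp - last value timestamp = time offset in seconds
last_value_ts = datetime(2023, 2, 21, 14, 30, 50, tzinfo=timezone.utc)  # hypothetical
offset = (datetime.now(timezone.utc) - last_value_ts).total_seconds()
print(f"last value is {offset:.0f}s old")  # could be shown per visualization
```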

When dealing with time-graphed data
• Averages and trendlines are useful data points and can help a user cut through the noise. However, averages do not always tell the full story. The following is written around the CPU example but is often applicable to other metrics.
By its nature, CPU usage is already averaged out when the values are collected by Beats/agents: when CPU metric data is collected, the data is the average of the metric over the collection interval. Important concept: we should not RE-average the metric unless absolutely necessary. Whenever possible the most detailed information should be displayed. When data buckets must be collapsed due to extended time period views, we must show top and/or bottom values in addition to time-bucket averages. Why? Because by re-averaging data we are dropping possibly very important information about metric spikes within a time period.
‘Help! My important data has been buried in an average’ – this has been shown to be a problem in many time-span graphs across Elastic, where an averaged value mistakenly hides very important peak and valley values. You can see this problem when you zoom into an averaged time period only to find very peaky performance that would be invisible if you only looked at a larger interval of data. E.g. Stack Monitoring has this problem when showing JVM Heap: looking at a small time period I see heap usage moving from ~4.5 GB to ~18.6 GB, whereas if I zoom out I see usage riding evenly at just around 19 GB.
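
One way to keep spikes visible when buckets are collapsed is to request max (and min) alongside the average in each bucket. A sketch, under the same assumed index pattern and field as above:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
FIELD = "system.cpu.total.norm.pct"  # assumed field

# For each 30-minute bucket, fetch max and min alongside the average so
# re-averaging does not bury short-lived CPU spikes.
resp = es.search(
    index="metricbeat-*",  # assumed index pattern
    size=0,
    query={"range": {"@timestamp": {"gte": "now-2d"}}},
    aggs={
        "over_time": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "30m"},
            "aggs": {
                "cpu_avg": {"avg": {"field": FIELD}},
                "cpu_max": {"max": {"field": FIELD}},
                "cpu_min": {"min": {"field": FIELD}},
            },
        }
    },
)
```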

Let’s take a real-world example
The following graph was made from using the same Windows perfmon counters that Beats use (https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-windows-perfmon.html). The graph includes counters captured at 1 second intervals and 10 second intervals. CPU usage was controlled through SysInternals CPUSTRES tool.
In the graph you can see that the CPU usage average line at 10-second intervals can show wildly different behavior when the CPU usage gets ‘spiky’ in the later stages of the test. Although we do see the average going up and down, given perfectly timed CPU stress behavior we can force-generate a smooth average line to match the behavior in the earlier stage of the test. What do we learn here? Averages can bury potentially important metric behavior.

https://global.discourse-cdn.com/elastic/original/3X/0/c/0c12ec657a4e1ae225d04b04c7345e4065149848.jpeg

Now let’s take this to the next level. Here is the same data (same system, same timeframe) as presented in Elastic Kibana’s [Metricbeat System] ECS ‘Host Overview’ dashboard. (Note this holds true in a v8.5 test env. as well.)
• In the first graph we see the raw data at 10-second intervals – the default interval in Metricbeat.
• In the second graph we see the same data, viewed zoomed out over two days, with the data grouped into 30-minute buckets. The averaged information (~48% max) hides the fact that the CPU saw much heavier use at times during each bucket period.
• The third graph is even worse: with the data grouped into 12-hour buckets, the CPU appears to never rise above 8% usage.

https://global.discourse-cdn.com/elastic/original/3X/a/1/a1d51f4119a83f20652768d3a4deff4c29705dcb.jpeg

https://global.discourse-cdn.com/elastic/original/3X/4/f/4f351cf33923d618861c1dc753e856df1b5a0aa9.png

https://global.discourse-cdn.com/elastic/original/3X/d/a/da9a02d301c4777a1216afe349d9b18806a5c64f.png

How do we deal with this?
We show both Average and Max – for example, for CPU we show max CPU if it exceeds 80%:

https://global.discourse-cdn.com/elastic/original/3X/6/2/62a4961b1529388d32df176db12efb1d88ed8322.jpeg

@ruflin
Contributor Author

ruflin commented Feb 23, 2023

@mgevans-5 Thank you for taking the time to write all this down and share it! Really appreciated. It is a good point that in many cases we should show Avg + Max (+ maybe Min) to get the full picture.

@ruflin
Contributor Author

ruflin commented Apr 12, 2023

I went through this issue again and quite a few related issues. Thanks everyone for contributing. Here are my follow-up thoughts:

  • Metric types: The introduction of TSDS in Elasticsearch partially helps solve some of these problems. Assuming counters and gauges are set properly in the mappings, for counters we can automatically focus on the last / max value. For gauges, Lens should potentially offer to automatically visualise max / min as well, because those are important too (see the mapping sketch after this list). @drewdaemon Not sure if something like this was discussed before.
  • Lens: It seems Lens has started to address quite a few of these issues. Migrating over from TSVB to Lens will help.
  • Indicator for last value vs average: It would be nice if, in the UI or on hover, it were easy for a consumer to see whether the data they are looking at is a calculated average or the last value.
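
For reference, marking fields as counter or gauge happens via the time_series_metric mapping parameter in a TSDS index template. A minimal sketch (the template name, index pattern, fields, and dimension below are assumptions for illustration):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical TSDS template: the counter/gauge hints are what a
# visualization layer could use to pick defaults (rate / last value for
# counters; avg plus min/max for gauges).
es.indices.put_index_template(
    name="metrics-system-example",              # hypothetical name
    index_patterns=["metrics-system.example-*"],
    data_stream={},                              # TSDS requires a data stream
    template={
        "settings": {
            "index.mode": "time_series",
            "index.routing_path": ["host.name"],
        },
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "host.name": {"type": "keyword", "time_series_dimension": True},
                "system.cpu.total.norm.pct": {
                    "type": "double",
                    "time_series_metric": "gauge",
                },
                "system.network.in.bytes": {
                    "type": "long",
                    "time_series_metric": "counter",
                },
            }
        },
    },
)
```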

The part that seems to create the most confusion is the empty graphs showing a "-" or a 0. From a technical point of view it is simple to explain, as there is no data yet. As discussed above, Lens solves part of this because it doesn't use the fixed time range. But can we do better? Can we show better messaging telling the user to expand the time range?

@drewdaemon
Contributor

drewdaemon commented Apr 13, 2023

@ruflin that's a good summary. A few responses:

Assuming counters and gauges are set properly in the mappings, for counters we can automatically focus on the last / max value. For gauges, Lens should potentially offer to automatically visualise max / min as well, because those are important too. @drewdaemon Not sure if something like this was discussed before.

I haven't heard any discussion about this level of guidance. cc @dej611, @stratoula

Migrating over from TSVB to Lens will help.

In some ways, this is probably true, but the "Edit visualization in Lens" button will retain the TSVB "last value" mode as Lens's "reduced time range" setting, perpetuating the issue.

Because of this, the person converting legacy visualizations currently has to understand this problem and manually intervene. The situation is made worse by the fact that TSVB used to turn on "last value" mode by default, so it is widespread in the integration dashboards, probably in many cases without the original authors understanding what was happening.

I'm still trying to figure out what to do about this TBH.

It would be nice if in the UI or on hover, it is easy to see for a consumer if the data looked at is an average calculated or the last value.

Lens already gives you a default label of either "Last value of <field>" or "Average of <field>". Seems like the responsibility here should ultimately rest with the dashboard authors since they're the ones with the power to override the labelling. But, always open to enhancement requests to Lens (not sure if this is what was meant).

@dej611

dej611 commented Apr 13, 2023

Assuming counters and gauges are set properly in the mappings, for counters we can automatically focus on the last / max value. For gauges, Lens should potentially offer to automatically visualise max / min as well, because those are important too. @drewdaemon Not sure if something like this was discussed before.

I haven't heard any discussion about this level of guidance. cc @dej611, @stratoula

As far as I know there's not been any discussion on this yet.
There have been some simpler "smart" configs based on field metadata, like elastic/kibana#130727.

@ruflin
Contributor Author

ruflin commented Apr 17, 2023

Lens already gives you a default label of either "Last value of <field>" or "Average of <field>". Seems like the responsibility here should ultimately rest with the dashboard authors since they're the ones with the power to override the labelling. But, always open to enhancement requests to Lens (not sure if this is what was meant).

Let's brainstorm a bit more on this one, as it might also help solve the TSVB migration. What does a user do with this label? How does a user understand what it means (without being an expert)? Can we help the user somehow? If the user selected the wrong option, can they switch over? Do we offer recommendations during setup in case the user is unsure what to select? The way I think we should approach it: let's make the right choice by default for "new" visualisations. For everything existing, let's guide the user as much as possible and not assume magical knowledge about how things work. Ideally for 80% of the docs / explanations, users do not have to jump to a website and come back ;-)

@drewdaemon
Contributor

drewdaemon commented Apr 18, 2023

In some ways, this is probably true, but the "Edit visualization in Lens" button will retain the TSVB "last value" mode as Lens's "reduced time range" setting, perpetuating the issue.

By way of clarification, Lens offers this "reduced time range" option outside of the TSVB migration context since there are valid use cases for it. However, it is a little buried in the UI and not on by default. 👍

Because of this, the person converting legacy visualizations currently has to understand this problem and manually intervene. The situation is made worse by the fact that TSVB used to turn on "last value" mode by default, so it is widespread in the integration dashboards, probably in many cases without the original authors understanding what was happening.

I think the concern I stated above ^^ doesn't have to be tied to the TSVB to Lens migration efforts currently underway, so sorry for any confusion that caused. We can always track the integration visualizations that are using this feature and perform an "audit" later. 👍

Let's make the right choice by default for "new" visualisations.

Strong yes here. This is always our goal and I will open an issue with the ideas you've stated about counter rate and gauge fields.

For everything existing, let's guide the user as much as possible and not assume magical knowledge about how things work. Ideally for 80% of the docs / explanations, users do not have to jump to a website and come back ;-)

In discussing this with the visualizations team, the general sentiment was that we should avoid inserting guidance around best practices into the user flow to keep the automatic convert to Lens button as frictionless as possible. But, I think that's okay as far as this issue goes since the scope is limited to improving single-number visualizations in our curated integration dashboards. That is an effort which probably has to be undertaken manually anyways.

@ruflin
Contributor Author

ruflin commented Apr 19, 2023

This issue was initially opened about the 5min time frame issue for the metrics dashboards but evolved into a pretty long and fruitful discussion. I expect this issue to also serve partially as documentation if these issues pop up again. Nevertheless, I would like to see us have documentation around the above topics in our docs pages that we can send users to, but I'm not sure about the right place. Should it live with Lens? Should there be docs pages focused on the metrics use cases? @drewdaemon @ninoslavmiskovic @mlunadia Interested to hear your take.

My plan is to close this issue soonish. The topic is now in good hands with @drewdaemon and team on the Lens side. The topic that likely deserves a follow-up issue in Kibana is the max / min discussion for gauges. What's not clear to me is who takes the lead on follow-up docs.

@ruflin
Contributor Author

ruflin commented Apr 26, 2023

Closing this issue but please keep chiming in.

@ruflin ruflin closed this as completed Apr 26, 2023
@mgevans-5

@ruflin you identified this as being in good hands with @drewdaemon - for reference's sake, can you link the related efforts or identify any internal project ID/name we can use when working with our support/sales groups?
Thanks.

@ruflin
Contributor Author

ruflin commented Apr 27, 2023

@mgevans-5 Can you ping me through support (@ruflin) in reference to this comment here so we can handle this "internally"?

@drewdaemon Can you add all the links here that are publicly available for future reference?

@drewdaemon
Contributor

drewdaemon commented Apr 27, 2023

I don't know of any resources to share outside the comments on this thread.

Mainly I'm talking about

  • this one, which advises choosing either an aggregation over the entire time range, or Lens's Last Value function (which gives the last reported value, not to be confused with the TSVB "Last value" Data timerange mode setting, which simply constrains the timerange of an aggregation)
  • this one, which advises using Lens and providing clear labeling around what each value in a visualization means.

As far as ownership goes, the Lens enhancement ideas obviously fall under us (visualizations team), but I would expect the integration authors/maintainers to correct instances of this issue (intermittent blank values due to constrained time ranges) in their own dashboards.

It may be that we (visualizations team) can help by pointing integration owners to visualizations that are likely to exhibit this behavior, but an initiative like this probably makes the most sense after the Lens migration is further toward completion.

Actually, @ruflin I'm a little surprised to have this specific issue closed because the system dashboard's single-value visualizations haven't yet been corrected... I hoped to do so with #4975 but I never received feedback from the owners on which of the two approaches (last value or average over time range) would be most appropriate and it fell off the back.

@ruflin ruflin reopened this Apr 28, 2023
@ruflin
Contributor Author

ruflin commented Apr 28, 2023

Actually, @ruflin I'm a little surprised to have this specific issue closed because the system dashboard's single-value visualizations haven't yet been corrected... I hoped to do so with #4975 but I never received feedback from the owners on which of the two approaches (last value or average over time range) would be most appropriate and it fell off the back.

Reopening this issue. @drewdaemon I missed on my end that we have not fixed this yet :-( @cmacknz @lalit-satapathy One of your teams should take this on.

It may be that we (visualizations team) can help by pointing integration owners to visualizations that are likely to exhibit this behavior, but an initiative like this probably makes the most sense after the Lens migration is further toward completion.

Integration owners are one of the target groups, but I think it's broader: basically everyone can build integrations now, so only targeting the current owners is not enough. @drewdaemon You linked some of the most important comments in this thread above. Where in the docs should these go? My assumption is they are Lens-related?

@cmacknz
Member

cmacknz commented Apr 28, 2023

Let's get the PR here re-opened so we can get the necessary feedback. #4975

I will find someone to make sure it gets reviewed; there have been multiple customer issues with this as it is today. @kpollich or someone from his team is probably better suited to review the changes to the visualizations in the package.

My team owns the system integration but most of our knowledge is on the data collection side, not the visualization side.

@drewdaemon
Contributor

drewdaemon commented May 3, 2023

Sounds good, @cmacknz. I will coordinate with @kpollich on the right approach here and open another PR in the next couple of weeks. 👍
