
Review of the adaptive algorithm for data visualization (enhancement) #198

Closed
lmar76 opened this issue Apr 14, 2017 · 7 comments

lmar76 commented Apr 14, 2017

One piece of feedback received at the training session in Banff concerned the measurement sub-setting in the scatter plots. It seems that for products with a reduced amount of data (e.g. EEF, but also FAC products, which contain many "NaN" values) this sub-setting is still applied in the visualization process.

For this reason, we would like to understand whether it is possible to improve the adaptive algorithm for data visualization in order to:

  • ensure that if, for the given AOI and/or TOI, the number of measurements is lower than a given limit, no sub-setting is performed (i.e. all the values are visible);
  • ensure that the measurements holding the minimum and maximum values are always included in the selected data (a rough sketch of the intended behaviour follows this list).
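
To illustrate the intended behaviour, here is a minimal sketch. It is hypothetical and not part of the VirES code base; the function name, the limit of 5000 points and the fixed step of 20 samples are placeholder assumptions:

```python
import numpy as np

def select_for_plot(values, limit=5000, step=20):
    """Return indices of the measurements to visualise (illustrative only)."""
    values = np.asarray(values, dtype=float)
    n = values.size
    if n <= limit:                       # small AOI/TOI: no sub-setting at all
        return np.arange(n)
    idx = set(range(0, n, step))         # regular subsampling
    idx.add(int(np.nanargmin(values)))   # always keep the minimum measurement ...
    idx.add(int(np.nanargmax(values)))   # ... and the maximum measurement
    return np.array(sorted(idx))
```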

The description reported in the FAQ (see question 6) seems to be incorrect:

if no Area-of-Interest (AoI) is selected the data sampling is kept at 5s for time selection up to 1 day and the size of the transferred data rises proportionally to the size of the interval. For time selections over 1 day the size of the data is fixed and the data sampling period rises proportionally to the time selection.

In fact, after some tests it turned out that, with the exception of EEF (see also vires_subsetting.xlsx):

  • for time selections up to 1 day: the sampling period spans between 8 seconds and 20 seconds;
  • for time selections greater than 1 day and up to 30 days: the sampling period spans between 20 seconds and 397 seconds; thus, dividing the number of seconds of the selection by the sampling period, the "estimated" number of points is not fixed (e.g. for 1 day: 86400 s / 20 s = 4320 points, while for 30 days: 30 × 86400 s / 397 s ≈ 6529 points).
lmar76 changed the title from "Review of the adaptive algorithm for data visualization" to "Review of the adaptive algorithm for data visualization (enhancement)" on Apr 14, 2017

lmar76 commented Nov 2, 2017

After internal discussion, we would like to improve the algorithm so that all the measurements are shown (i.e. no subsetting) for temporal/area selections smaller than a threshold (to be defined). For time/area selections greater than this threshold, the subsetting algorithm should always include the maximum and minimum measurements.

santilland (Member) commented

OK, I rechecked the subsampling step; the following is done:

step = relative_area * (min_step + relative_time * (base_step - min_step))

Here relative_area is 1.0 when no area selection is made and between 0 and 1.0 depending on the size of the area selection.
min_step is currently set to 7 s and base_step to 20 s.
relative_time is the selected time span divided by 1 day, so it is 1.0 when a full day is selected, which gives a 20-second step for a full-day selection.

This means that right now, to get the full-resolution data, you need to select a small area.
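
As a rough illustration of the formula above (a sketch with illustrative inputs, not the actual server code):

```python
# Subsampling-step formula as described above; relative_area is in [0, 1] and
# relative_time is the selected time span divided by 1 day.
MIN_STEP = 7.0    # seconds (current setting)
BASE_STEP = 20.0  # seconds

def subsampling_step(relative_area, relative_time,
                     min_step=MIN_STEP, base_step=BASE_STEP):
    return relative_area * (min_step + relative_time * (base_step - min_step))

print(subsampling_step(1.0, 1.0))    # full day, no area selection -> 20.0 s
print(subsampling_step(0.1, 1 / 24)) # small area, 1 hour -> ~0.75 s (full resolution)
```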

After doing some tests, I think that if we set min_step to 0 and keep 20 s as the base we should get the desired effect: keeping the number of points/measurements around ~4k and making sure the original data resolution is used for time selections of up to 30 minutes (a quick numeric check is sketched after the list below).

A 30-minute selection would result in 1 × (0 + 0.0208 × 20) ≈ 0.42 s step, resulting in full-resolution data for 1 Hz as well as 2 Hz products (~1800 points at 1 Hz).

  • ~1 hour: the subsampling step would be ~0.8 s (~3600 points in the client at 1 Hz)
  • ~2.5 hours: the subsampling step would be ~2 s (~3600 points in the client at 1 Hz)
  • 12 h: 10 s step (4320 points)
  • 24 h: 20 s step (the same ~4k points)
  • 15 days: 300 s step (the same ~4k points)
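
A quick numeric check of the step sizes listed above, assuming relative_area = 1.0 (the printed point counts are rough estimates for 1 Hz data):

```python
# Check the steps with min_step = 0 and base_step = 20 s, no area selection.
DAY = 86400.0  # seconds

for label, seconds in [("30 min", 1800), ("1 h", 3600), ("2.5 h", 9000),
                       ("12 h", 43200), ("24 h", 86400), ("15 d", 15 * 86400)]:
    step = (seconds / DAY) * 20.0
    points = seconds / max(step, 1.0)  # 1 Hz data cannot be sampled finer than 1 s
    print(f"{label}: step = {step:.2f} s, ~{points:.0f} points at 1 Hz")
```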

This change could be done quickly and also be deployed to production.

For the second point of keeping the max/min values: after some thought, it is not as straightforward as I expected and we would need to discuss and explore some options; I created another ticket for that purpose (#230).


lmar76 commented Nov 30, 2017

According to the formula described above, the solution based on min_step = 0 is good for time selections up to 1 day (i.e. the step size is lower than in the min_step = 7 case). However, for selections above 1 day, the step size is greater than the one obtained with min_step = 7. For example, with reference to the same case described above (relative_area = 1), for 15 days we have:

  • min_step = 7 → step = 7 + 15 × (20 − 7) = 202 s
  • min_step = 0 → step = 15 × 20 = 300 s

So, our proposal is to set min_step = 0 for time selections up to 1 day and min_step = 7 for time selections above 1 day. Do you think it is feasible?


pacesm commented Nov 30, 2017

It is a trivial change. For relative_time <= 1.0 we can use

step = relative_area * relative_time * 20

and for relative_time > 1.0 we can use

step = relative_area * (7 + relative_time * 13)

to lower the slope of the step increase.
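
Sketched as code (illustrative only, not the actual server implementation), with a check that the two branches agree at exactly 1 day:

```python
# Proposed piecewise subsampling step; relative_time = selected time span / 1 day,
# relative_area in [0, 1].
def subsampling_step(relative_area, relative_time):
    if relative_time <= 1.0:
        # up to 1 day: no minimum step, full resolution for short selections
        return relative_area * relative_time * 20.0
    # above 1 day: 7 s offset and a gentler slope of 13 s per day
    return relative_area * (7.0 + relative_time * 13.0)

print(subsampling_step(1.0, 1.0))   # 20.0 s -> both branches agree at 1 day
print(subsampling_step(1.0, 15.0))  # 202.0 s for 15 days (instead of 300 s)
```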

santilland (Member) commented

The proposed solution has been deployed to staging and is behaving as expected, e.g. for magnetic (1 Hz) products:

  • 1 hour → ~3600 points → ~1 s step (original resolution)
  • [for 2 Hz plasma products: 30 min → ~3600 points → ~0.5 s step (original resolution)]
  • 12 hours → ~4200 points → 10 s step
  • 1 day → ~4300 points → ~20 s step
  • 15 days → ~5495 points → ~220 s step


lmar76 commented Dec 20, 2017

We have tested the new solution on staging and it is behaving as expected. Please transfer it to the operational server.

santilland (Member) commented

Moved to operations and closing the ticket; feel free to comment on anything related.
