This repository has been archived by the owner on Mar 18, 2021. It is now read-only.

Plot drawing method #33

Open
johanfforsberg opened this issue Sep 8, 2018 · 4 comments

@johanfforsberg
Contributor

I'd like to discuss the pros and cons of the current method of drawing, now that the application has been in use for a while. Feel free to add your observations and opinions to this issue. If the drawbacks are too large, we should start thinking about a new solution.

The plots are currently completely drawn on the server, as an image in whatever size the client requests, and sent to the client as a single base64 encoded PNG. To do this we use datashader (http://datashader.org/).
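To make the "single base64 encoded PNG" step concrete, here is a minimal sketch using only the Python standard library. It is not the application's actual code and does not use datashader; the function names (`png_chunk`, `encode_gray_png`, `render_response`) and the JSON-like response shape are assumptions made for illustration. It only shows how an already-rendered raster could be packed into a PNG and shipped as a base64 data URI.

```python
import base64
import struct
import zlib


def png_chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_gray_png(rows):
    """Encode rows of 0-255 gray values as an 8-bit grayscale PNG."""
    height, width = len(rows), len(rows[0])
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(raw))
        + png_chunk(b"IEND", b"")
    )


def render_response(rows):
    """Hypothetical server reply: the whole plot as one base64 data URI."""
    png = encode_gray_png(rows)
    encoded = base64.b64encode(png).decode("ascii")
    return {"image": "data:image/png;base64," + encoded}


# A tiny 4x3 raster standing in for a server-rendered plot.
demo_rows = [
    [0, 255, 0, 0],
    [0, 0, 255, 0],
    [0, 0, 0, 255],
]
demo_response = render_response(demo_rows)
```

A browser can assign such a data URI directly to the `src` of an `<img>` element, which is why the client needs no point data at all in this design.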

Advantages

  • Datashader is able to plot huge datasets without downsampling. By this I mean that all points are always drawn no matter the total number of points. For example, a single pressure spike lasting less than a second will be visible even if you plot an entire year of otherwise even pressure. Downsampling data is a very tricky subject, so I think this is a huge practical advantage and it's the main reason I went with this method.
  • A very important factor is how many points we can expect to plot. As a start, let's consider an attribute that is stored once per second. This means ~2.6 million points in a month (60 × 60 × 24 × 30 = 2,592,000). I think this is not a very unusual case, e.g. when looking at a long-term trend. So we should at least aim to handle tens of millions of points routinely. Datashader is advertised as handling datasets of hundreds of millions of points or more, and so far I think it has handled things very well. It should even be possible to distribute datashader across a computation cluster (using "dask") if performance is not good enough.
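The two points above can be illustrated with a small standard-library sketch. It is not datashader's algorithm, only a simplified one-dimensional stand-in for the idea that every sample contributes to some pixel column, compared against naive stride decimation. The function names and the synthetic "pressure" series are assumptions made for illustration.

```python
def points_per_month(rate_hz=1.0, days=30):
    """Number of samples stored in `days` days at `rate_hz` samples per second."""
    return int(rate_hz * 60 * 60 * 24 * days)


def rasterize_columns(values, width):
    """For each pixel column, keep the (min, max) of all samples that land in it.

    Every sample is visited, so a one-sample spike still widens its column.
    """
    n = len(values)
    columns = [None] * width
    for i, v in enumerate(values):
        x = i * width // n  # pixel column for sample i
        current = columns[x]
        if current is None:
            columns[x] = (v, v)
        else:
            columns[x] = (min(current[0], v), max(current[1], v))
    return columns


def decimate(values, width):
    """Naive downsampling: keep every k-th sample and drop the rest."""
    step = max(1, len(values) // width)
    return values[::step][:width]


# Synthetic series: flat pressure with a single one-sample spike.
pressure = [1.0] * 100_000
pressure[12_345] = 50.0

columns = rasterize_columns(pressure, width=1000)
tallest_column = max(hi for _, hi in columns)  # spike survives rasterization
tallest_decimated = max(decimate(pressure, width=1000))  # spike is skipped
```

In this sketch the spike at index 12,345 falls between the samples that stride decimation keeps, so it vanishes from the decimated series, while the column-wise pass still records it.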

Disadvantages

  • Drawing everything on the server means that changing simple properties of the plot (e.g. line color) requires redrawing the entire image. This is probably solvable without changing the whole architecture (see Color selection improvements #17) but it will complicate things a bit.
  • We can't use any of the existing third party JS plotting libraries since they depend on getting point data (we do use D3, but only for axis drawing and such).
  • The plots don't look great: there is no antialiasing on lines, and there are no line styles. The latter can be a real problem for people with color blindness, since color is then the only way to tell curves apart.
  • There's no straightforward way to cache data in the client, since images need to be redrawn when the Y axis changes.

Other notes

  • The images may be fairly large, but not huge; according to some quick measurements, realistic data at HD resolution will typically transfer ~50-100 kB per attribute plotted, with compression. I'm not sure it would be that much more efficient to send raw data points, though. Also, the size of the images is essentially independent of the size of the raw data, so plotting a million points can take the same bandwidth as a thousand. In other words, the bandwidth usage is bounded.
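As a rough back-of-the-envelope comparison for the note above, the sketch below estimates what sending raw points might cost. The 16 bytes per point (a float64 timestamp plus a float64 value, uncompressed) and 1 kB = 1000 bytes are assumptions, not measurements from the application, and compression of the raw data is ignored.

```python
# Assumption: one point = float64 timestamp + float64 value, uncompressed.
BYTES_PER_POINT = 16


def raw_size_bytes(n_points):
    """Uncompressed payload size for sending n_points raw points."""
    return n_points * BYTES_PER_POINT


def break_even_points(image_bytes):
    """Point count at which raw data would match a given image size."""
    return image_bytes // BYTES_PER_POINT


thousand_points = raw_size_bytes(1_000)    # small query
month_at_1hz = raw_size_bytes(2_592_000)   # one month at one sample per second
low_break_even = break_even_points(50_000)    # versus a ~50 kB image
high_break_even = break_even_points(100_000)  # versus a ~100 kB image
```

Under these assumptions raw points are cheaper only for queries of a few thousand points; beyond that the fixed-size image wins, and the gap grows with the length of the time range.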

Alternative solutions

I haven't checked the options during the last year so I may be missing some important ones, but to me there are two main options:

  • Bokeh (https://bokeh.pydata.org/en/latest/) is a library that basically solves the same problem: having large datasets on the server and plotting them in the browser. I started the first prototype of the HDB++ viewer using Bokeh, but I ran into some problems with updating data that made me switch away from it; I don't remember exactly what they were. However, Bokeh has developed a lot since then and is definitely worth looking into again. If it worked, it could simplify both server and client code.
  • Implementing our own way of downsampling/compressing data, sending it to the client and drawing it with a third party javascript plotting library. Probably tricky, but not impossible.
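For the second option, one common starting point is min/max bucketing: split the time range into buckets and keep only the lowest and highest point of each. The sketch below is one possible approach under stated assumptions, not a proposal from this thread; the function name `minmax_downsample` and the `(t, v)` tuple format are hypothetical. It preserves extremes such as short spikes, but it does not address the other difficulties of downsampling mentioned above.

```python
def minmax_downsample(points, n_buckets):
    """Reduce time-sorted (t, v) points to at most 2 * n_buckets points.

    Each bucket covers an equal slice of the time range and contributes
    its minimum and maximum point, emitted in time order.
    """
    if not points or n_buckets <= 0:
        return []
    t0, t1 = points[0][0], points[-1][0]
    span = (t1 - t0) or 1  # avoid division by zero for a single timestamp
    buckets = {}
    for t, v in points:
        b = min(int((t - t0) / span * n_buckets), n_buckets - 1)
        lo, hi = buckets.get(b, ((t, v), (t, v)))
        if v < lo[1]:
            lo = (t, v)
        if v > hi[1]:
            hi = (t, v)
        buckets[b] = (lo, hi)
    reduced = []
    for b in sorted(buckets):
        lo, hi = buckets[b]
        # A set drops the duplicate when min and max are the same point.
        reduced.extend(sorted({lo, hi}))
    return reduced


# Flat series with one spike; the spike must survive the reduction.
series = [(t, 1.0) for t in range(10_000)]
series[4321] = (4321, 99.0)
reduced = minmax_downsample(series, n_buckets=100)
```

The reduced list could then be serialized and handed to any third-party JavaScript plotting library, at the cost of recomputing it whenever the visible time range changes.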
@meguiraun
Contributor

We were having some trouble with some of the requests from the users and with implementing them, and we also see those drawbacks. Some of us (I will not name who ;)) were lobbying for trying client-side rendering (at least for testing purposes), but as you point out, downsampling can be tricky...

For me, the main question is what the long-term objective of this app is. I see it as a basic data display application (basic in scope, not in simplicity ;)) whose main purpose is to help with commissioning, maintenance and internal reports. If users want high-quality plots and/or fancy plotting features, they should download the raw data (via this app or through the API) and plot it in other software (e.g. MATLAB, Origin...).

So, as long as we manage to add the main features that were requested (mostly making it easier to tell curves apart), we can keep the server-side rendering, assuming that we can fulfil those requirements and that we can communicate the limitations to our users so that they understand them.

I can see that a trial of client-side rendering should not take too long, but then we open the door to an endless list of feature requests and perhaps end up building something bigger than it should be.

@hardion @13bscsaamjad @AntoineDupre

@johanfforsberg
Contributor Author

Yes, maybe the thing we need is actually a specification of the scope of the project, i.e. what problem are we trying to solve.

To me, the main idea was to make a tool for technical users to quickly look at any historical data. High-quality presentation and advanced analysis features were not a high priority, so it made sense to compromise on visual quality in exchange for "heavy duty" plotting capability.

In the end, obviously the users are the only ones who can decide what works for them. I think we should make an effort to solve the main issues they have and then re-evaluate this?

I think the trickiest part of the application right now is anyway the data fetching from the DB, so replacing the plotting part is not a huge deal. I have a feeling it could lead to a rabbit hole though... :)

@meguiraun
Contributor

I am away this week (work travel) and on vacation the next. On my return I will raise the discussion internally, and depending on the outcome we may reconsider the steps to take.
Would that be OK?

@johanfforsberg
Contributor Author

Great!
I've already started on implementing #36 and I'm pretty sure I'll have a PR ready for you to try out when you get back :)
