
Visualization of large data sets - seeking advice


Ronny Abraham

Hello all,

I am looking for a way to render interactive charts in a browser with a large number of data points, at a high update rate. This is for a data visualization project I am working on at kns.com (I am a developer on the Data Visualization team there).

For example, I would like to be able to render up to 2 million data points with an update rate of 20K new points per second (new points are added and the oldest ones are thrown away). I would also like to have data grouping, where at a high zoom level neighboring points are grouped into one point by averaging or another method (level of detail).
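To illustrate what I mean by grouping, here is a rough sketch in plain JS (averaging fixed-size buckets; a real implementation would derive the bucket size from the current zoom level):

// Level-of-detail sketch: average every `bucketSize` neighbouring points
// into one point when zoomed out.
function downsample(points, bucketSize) {
  const out = [];
  for (let i = 0; i < points.length; i += bucketSize) {
    const end = Math.min(i + bucketSize, points.length);
    let sumX = 0, sumY = 0;
    for (let j = i; j < end; j++) {
      sumX += points[j].x;
      sumY += points[j].y;
    }
    out.push({ x: sumX / (end - i), y: sumY / (end - i) });
  }
  return out;
}

// e.g. with 2 million points and a bucket size of 1000,
// only ~2000 averaged points need to be drawn.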

Until now we have been using Highcharts (a charting library), which has a very good API for creating charts and supports data grouping. However, even in its latest version with WebGL enabled, the performance is not adequate for large data sets with continuous updates.

See example here: https://jsfiddle.net/kq0cs070/ (click 'start update', and use the mouse to drag-zoom). It takes around 400 ms to draw the scene; that is about 2.5 frames per second.

[As a side note, Highcharts renders to a WebGL canvas and then makes an image out of it, followed by building a tree that maps pixels on the image back to objects (data points).]

In any case, I am looking for something with better performance that can run in a browser.

So I started looking into game development with WebGL, and I wanted to ask for advice.

I would prefer to have a chart API where my code calls "create chart" rather than "draw a line", but I have not found such a library with the performance I am looking for. So I was wondering whether I need to develop something at a lower WebGL level with libraries like Pixi or Phaser, and what a good path to solve this problem would be.

Your input is highly appreciated.

Thanks for reading :-)

Ronny


I did read something fairly recently evaluating different libraries for large data tables (sorry, I can't find the link, but googling turns up plenty of examples and run-throughs). I can just about remember the take-aways, though (it dealt only with data tables):

* It evaluated libraries tied to different 'frameworks'. Generally speaking, anything in the Vue/React/Angular2 world was the slowest but by far the easiest to set up, style and use. All struggled with large datasets, although some used various techniques to reduce the amount of rendering: React-Virtualized, for example, is basically a list-management library that keeps the DOM as minimal as possible. Whilst your data set might contain several thousand (or million) lines of data, the screen can only ever render 20-40 rows at a time, so these tools reuse 'exiting' rows as 'entering' rows (very similar to object pooling), so the DOM only ever holds the minimum. As long as you get the scroll right, rendering is super snappy (loading all the data is obviously a little slower). This is the exact same method iOS uses to render list views (Android might do the same, I don't know). There's a sketch of this idea just after this list.

* Vanilla JS libs weren't much faster than those tied into the rendering lifecycle of Vue/React/NG2/etc. The only real gain is shedding the perf overhead those frameworks introduce; or, perhaps more accurately, the vanilla libs' rendering lifecycles are more optimised for this exact use case.

* By far the fastest was a wholly canvas-based renderer (I think that one cost some cash, and I can't remember what it was called). It was also easily the hardest to use, definitely the hardest to style and lay out, and, out of the box, it looked absolutely terrible. But if huge datasets are your thing, it's the only really viable solution (although the minimal-DOM approach explained above wasn't terrible, even on mobiles).
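As promised above, a minimal sketch of the windowed-rendering idea (plain JS; the element ids and row height are made up, and real libraries also handle variable heights, buffering, etc.):

// Only the rows visible in the viewport exist in the DOM; scrolled-out
// rows are repositioned and refilled rather than destroyed and recreated.
// Assumes #list is scrollable and #content is position:relative inside it.
const ROW_HEIGHT = 20;                               // fixed row height, px
const container = document.getElementById('list');   // scrollable element
const content = document.getElementById('content');  // tall inner element
const data = new Array(2000000).fill(0).map((_, i) => 'row ' + i);

content.style.height = (data.length * ROW_HEIGHT) + 'px'; // fake full height

const pool = [];   // reusable row elements (object pooling for DOM nodes)

function render() {
  const first = Math.floor(container.scrollTop / ROW_HEIGHT);
  const visible = Math.ceil(container.clientHeight / ROW_HEIGHT) + 1;
  for (let i = 0; i < visible; i++) {
    let row = pool[i];
    if (!row) {                                // grow the pool lazily
      row = document.createElement('div');
      row.style.position = 'absolute';
      row.style.height = ROW_HEIGHT + 'px';
      content.appendChild(row);
      pool[i] = row;
    }
    row.style.top = ((first + i) * ROW_HEIGHT) + 'px'; // reposition, don't recreate
    row.textContent = data[first + i] || '';
  }
}

container.addEventListener('scroll', render);
render();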

For charts and stuff I don't think I've ever looked far beyond d3. It's more work to set up and use than Highcharts, as it's a lower-level library, but it can handle a wide range of visualisations and its primary goal is performance, so large datasets aren't always a big issue. It's been a while since I used it, but I think it's a pluggable library, so you can bolt on DOM- or canvas-based renderers as appropriate for the type of visualisation you want.

I don't see Highcharts as a competing library to d3, as they have slightly different scopes, and I don't know of any real competition for d3 either.

d3 is probably lower-level than you want to go (it's a 'draw a line' API rather than a 'draw a chart' API), but there are millions of abstractions (modules) on top that give you a 'draw a chart' API.
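To show what I mean by the 'draw a line' level, a minimal d3 sketch (assumes d3 v4+; the data and the #chart svg element are made up):

// You wire up scales and a path generator yourself; d3 gives you the
// primitives, not a ready-made chart.
const data = d3.range(1000).map(i => ({ x: i, y: Math.sin(i / 50) }));

const width = 600, height = 300;
const x = d3.scaleLinear().domain([0, 999]).range([0, width]);
const y = d3.scaleLinear().domain([-1, 1]).range([height, 0]);

const line = d3.line()
  .x(d => x(d.x))
  .y(d => y(d.y));

d3.select('#chart')        // an existing <svg> element
  .append('path')
  .datum(data)
  .attr('fill', 'none')
  .attr('stroke', 'steelblue')
  .attr('d', line);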


  • 2 weeks later...
On 12/19/2017 at 7:41 AM, mattstyles said:

I don't see Highcharts as a competing library to d3, as they have slightly different scopes, and I don't know of any real competition for d3 either.

d3 is probably lower-level than you want to go (it's a 'draw a line' API rather than a 'draw a chart' API), but there are millions of abstractions (modules) on top that give you a 'draw a chart' API.

Thank you for your reply. From what I read, D3 is based on SVG, not WebGL, and in my limited experience so far SVG does not work well with large data sets: the browser slows down once we introduce more than a few hundred thousand data points.


  • 1 month later...

I'm not really sure what the requirements for your visualisation are, or whether your fiddle is in any way related to what you have to visualise, but may I ask a scientific question: what information is the user meant to get from the graphs?

I'm speaking as a bioinformatics/data-visualisation guy, and I feel like it is absolutely ridiculous (no offense, just a figure of speech) to think any living entity will ever be able to process those 2e6 data points and their high-frequency changes in real time. Before you get lost in this task, you should probably research how your users interact with the graphs. Maybe they will just zoom in to a certain range to keep an eye on it, so you can start with a closer view and update far fewer points. Maybe there are other usage patterns you can use to your advantage.

In case you really need to visualise all that data, I think you won't get around a draw-a-line approach; but drawing lines between known coordinates ain't that hard.
The harder part is drawing pleasant-looking axes, but really there is no need to. On the web you can draw the coordinate system and the data independently: have a chart library of your choice draw the axes, labels, and whatever else doesn't need constant redrawing, as beautifully as you like, and then render a transparent canvas on top of it that only draws the data points as simple lines. Maybe even PIXI.Graphics.lineTo(x, y) will be enough? Then you only need to make sure that viewport changes propagate to both elements.
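A rough sketch of that overlay (assumes Pixi v4/v5, where Graphics has lineStyle/moveTo/lineTo; the element id and the scale functions are made up):

// Transparent Pixi canvas layered over whatever draws the axes.
const app = new PIXI.Application({
  width: 800,
  height: 400,
  transparent: true,          // let the axes underneath show through
});
document.getElementById('plot-area').appendChild(app.view);

const line = new PIXI.Graphics();
app.stage.addChild(line);

// xToPx/yToPx map data coordinates to canvas pixels and must stay in
// sync with the axis layer whenever the viewport changes.
function redraw(points, xToPx, yToPx) {
  if (!points.length) return;
  line.clear();
  line.lineStyle(1, 0x3366ff);
  line.moveTo(xToPx(points[0].x), yToPx(points[0].y));
  for (let i = 1; i < points.length; i++) {
    line.lineTo(xToPx(points[i].x), yToPx(points[i].y));
  }
}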
To speed up the whole thing you might want to do object pooling: instead of creating new point coordinates on every update, reuse existing point objects (cost of allocation and all the stuff you already know). Perhaps you could even abuse Pixi's already-optimised ParticleContainer to do that?
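For example (plain JS; the capacity is just the 2 million figure from the original post):

// Object-pooling sketch: a fixed-size ring buffer, allocated once, so a
// stream of 20k new points per second causes no per-point allocation or GC.
const CAPACITY = 2000000;
const xs = new Float32Array(CAPACITY);   // preallocated storage
const ys = new Float32Array(CAPACITY);
let head = 0;    // index of the next write
let count = 0;   // how many slots are filled

function push(x, y) {
  xs[head] = x;                   // overwrite the oldest slot in place;
  ys[head] = y;                   // nothing is allocated, nothing is freed
  head = (head + 1) % CAPACITY;
  if (count < CAPACITY) count++;
}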
Using WebGL directly might be good too; for example, scaling the input coordinates to canvas coordinates can be done by a vertex shader instead of JS for even more raw speed.
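Something like this, as a sketch (plain WebGL; the attribute/uniform names are made up):

// The vertex shader maps raw data coordinates to clip space, so JS only
// uploads the data once and updates two uniforms on zoom/pan.
const vertexShaderSource = `
  attribute vec2 a_point;   // raw data coordinates
  uniform vec2 u_min;       // visible data range, updated on zoom/pan
  uniform vec2 u_max;
  void main() {
    vec2 t = (a_point - u_min) / (u_max - u_min);  // normalise to [0,1]
    gl_Position = vec4(t * 2.0 - 1.0, 0.0, 1.0);   // then to clip space
  }
`;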

 
