Jump to content

computeNormals optimization


jerome
 Share

Recommended Posts

Hi,

 

After having monitored in many browsers the CPU profiler, I noticed, when morphing meshes, that the computeNormals() method called in each frame was the bottleneck. There was a previous topic where this performance issue was talked about with JCPalmer.

No real improvement were found.

 

So I decided to hack this method to check if something could be done.

Remember this method was designed long time ago before dynamic morphing was added to BJS, so it wasn't designed to be used in a render loop 60 times per second.

 

<tl;dr;>

current computeNormals() :

http://www.babylonjs-playground.com/#ZOSGB

 

local computeNormals() :

http://www.babylonjs-playground.com/#ZOSGB#1

 

If this mesh is too heavy (48 000 vertices !) for your computer, please adjust your mesh size line 147 :

 var nbPaths = 80;

</tl;td;>

 

What did I find ?

I found many inexpensive changes could be done.

For instance, the current computeNormals() method does :

- 3 passes : the first for positions.length times, the second for faces number times and the last vertices number * faces of vertices number times

- allocate 3 intermediate arrays : one will be filled with number of vertices new Vector3 objects, another one will be filled number of vertices arrays (an array of arrays) containing faces indexes.

 

This makes the code very readable (if I can understand it at first sight not being an expert, I consider it very readable) but may be improved in term of passes and created objects.

This is quite important because I noticed that Chromium, which has for now better general perfs in js execution and rendering than FF, starts to consume more and more in GC after a while on a morphing running for minutes... up to 70% CPU on some examples on long duration. The framerate finally decreases...

Since FF keeps a constant lower framerate and a constant GC usage.

 

Well, for now, I reduced the method down to 2 passes only (the first for nb of faces times and the second for nb of vertices times) and the memory allocation down to 6 Vector3 objects and one array of Vector3 sized nb of vertices length.

The 6 intermediate Vector3 could be not used if I would re-implement vector3 methods locally (add, subtract, cross product, normalize) but I don't think it's worth it in term of GC gain.

I would like to eliminate the second pass and the intermediate array whose use is only to normalize each normal, but I have no idea for now. The CPU profiler doesn't show this pass has a noticeable impact, so it may not be worth it either.

 

That said... how much is this improvement ?

Well, it depends on the mesh size and on your computer/browser capabilities.

 

So please in this very heavy example (done in purpose to stress the method), change the mesh size (nb of vertices) line 147 and you'll probably find a value where the improvement is really really noticeable.

On my own computer, with Chromium, I can have 60 fps (and no GC at all) for minutes with the new function since the legacy one will decrease the framerate downto 28 fps after minutes and the GC will increase up to 70% CPU.

 

You can also compare both method durations by opening your browser console :

new method : http://www.babylonjs-playground.com/#ZOSGB#2

legacy : http://www.babylonjs-playground.com/#ZOSGB#3

On my computer, with local examples (outside PG and PG editor running scripts), I get between x3 and x5 speed increase !

 

So please make your own tests and let me know what you think about this improvement before I dare a PR

Link to comment
Share on other sites

Hi Jerome, in chrome v. 42 (windows 7) the new version gives 6 FPS, versus 12 FPS for the legacy one.

 

It's weird since it looks like you've optimized the method quite well...

 

I tried copy-pasting the original function into your custom one and then the FPS drops to around 4: http://www.babylonjs-playground.com/#ZOSGB#4

 

So I'd say your method is definitely an improvement, but there is something weird in how the PG behaves!

Link to comment
Share on other sites

OK,

Please use Chrome (I don't have IE) so we can compare for now (I hope IE will react the same then).

Please open the Developer Tools.

In the lower panel, click on the Rendering tab, then opt in for "Show FPS meter".

In the upper tab, opt in for "Collect Javascript CPU Profile". Don't start it for now please.

 

Now, you can just check the FPS with this example (I tuned for the results are very sensible on my machine = 28800 vertices) :

legacy = http://jerome.bousquie.fr/BJS/computeNormals/legacy.html

optimized = http://jerome.bousquie.fr/BJS/computeNormals/optimized.html

 

My results here are :

constant 28 FPS with legacy ...then decreasing

constant 60 FPS with optimized

 

If now you go back into the Developer Tools and you start the CPU profiler for, say, 3 minutes.

You'll see the GC will remain at very low level, quite null, with optimized but will increase drastically with legacy version.

 

post-5453-0-79117900-1430314987_thumb.pn

post-5453-0-79117900-1430314987_thumb.pn

Link to comment
Share on other sites

Please test the direct links and let me know

I will add a way to tune the mesh size so you will find a good value to see the difference

 

For those wanting to to read the code, here's the last commented version :

Two passes only (nb of faces times then nb of normal vectors times) and no more intermediate array.

  var localComputeNormals = function(positions, indices, normals) {    var index = 0;    // temp Vector3    var p1 = BABYLON.Vector3.Zero();    var p2 = BABYLON.Vector3.Zero();    var p3 = BABYLON.Vector3.Zero();    var p1p2 = BABYLON.Vector3.Zero();    var p3p2 = BABYLON.Vector3.Zero();    var faceNormal = BABYLON.Vector3.Zero();    var vertexNormali1 = BABYLON.Vector3.Zero();;    var vertexNormali2 = BABYLON.Vector3.Zero();;    var vertexNormali3 = BABYLON.Vector3.Zero();;    // indice triplets = 1 face    var nbFaces = indices.length / 3;    for (index = 0; index < nbFaces; index++) {      var i1 = indices[index * 3];      var i2 = indices[index * 3 + 1];      var i3 = indices[index * 3 + 2];      // setting the temp V3      BABYLON.Vector3.FromFloatsToRef(positions[i1 * 3], positions[i1 * 3 + 1], positions[i1 * 3 + 2] , p1);      BABYLON.Vector3.FromFloatsToRef(positions[i2 * 3], positions[i2 * 3 + 1], positions[i2 * 3 + 2] , p2);      BABYLON.Vector3.FromFloatsToRef(positions[i3 * 3], positions[i3 * 3 + 1], positions[i3 * 3 + 2] , p3);      p1.subtractToRef(p2, p1p2);      p3.subtractToRef(p2, p3p2);      BABYLON.Vector3.CrossToRef(p1p2, p3p2, faceNormal);      faceNormal.normalize();      // All intermediate results are stored in the normals array :      // get the normals at i1, i2 and i3 indexes      normals[i1 * 3]     = normals[i1 * 3]     || 0;      normals[i1 * 3 + 1] = normals[i1 * 3 + 1] || 0;      normals[i1 * 3 + 2] = normals[i1 * 3 + 2] || 0;      normals[i2 * 3]     = normals[i2 * 3]     || 0;      normals[i2 * 3 + 1] = normals[i2 * 3 + 1] || 0;      normals[i2 * 3 + 2] = normals[i2 * 3 + 2] || 0;      normals[i3 * 3]     = normals[i3 * 3]     || 0;      normals[i3 * 3 + 1] = normals[i3 * 3 + 1] || 0;      normals[i3 * 3 + 2] = normals[i3 * 3 + 2] || 0;      // make intermediate vectors3 from normals values      BABYLON.Vector3.FromFloatsToRef(normals[i1 * 3], normals[i1 * 3 + 1], normals[i1 * 3 + 2] , vertexNormali1);      BABYLON.Vector3.FromFloatsToRef(normals[i2 * 3], normals[i2 * 3 + 1], normals[i2 * 3 + 2] , vertexNormali2);      BABYLON.Vector3.FromFloatsToRef(normals[i3 * 3], normals[i3 * 3 + 1], normals[i3 * 3 + 2] , vertexNormali3);      // add the current normal to the face to these intermediate vectors3      vertexNormali1 = vertexNormali1.addInPlace(faceNormal);      vertexNormali2 = vertexNormali2.addInPlace(faceNormal);      vertexNormali3 = vertexNormali3.addInPlace(faceNormal);      // store back intermediate vectors3 into the normals array      normals[i1 * 3]     = vertexNormali1.x;      normals[i1 * 3 + 1] = vertexNormali1.y;      normals[i1 * 3 + 2] = vertexNormali1.z;      normals[i2 * 3]     = vertexNormali2.x;      normals[i2 * 3 + 1] = vertexNormali2.y;      normals[i2 * 3 + 2] = vertexNormali2.z;      normals[i3 * 3]     = vertexNormali3.x;      normals[i3 * 3 + 1] = vertexNormali3.y;      normals[i3 * 3 + 2] = vertexNormali3.z;    }        // last normalization    for (index = 0; index < normals.length / 3; index++) {      BABYLON.Vector3.FromFloatsToRef(normals[index * 3], normals[index * 3 + 1], normals[index * 3 + 2] , vertexNormali1);      vertexNormali1.normalize();      normals[index * 3]     = vertexNormali1.x;      normals[index * 3 + 1] = vertexNormali1.y;      normals[index * 3 + 2] = vertexNormali1.z;        }  };
Link to comment
Share on other sites

You've put double semicolons here:

     var vertexNormali1 = BABYLON.Vector3.Zero();;    var vertexNormali2 = BABYLON.Vector3.Zero();;    var vertexNormali3 = BABYLON.Vector3.Zero();;
I tend to avoid multiplying the key word “var” when i can. that optimizes the code slightly.

 

    var index = 0,        p1 = BABYLON.Vector3.Zero(),        p2 = BABYLON.Vector3.Zero(),        p3 = BABYLON.Vector3.Zero(),        p1p2 = BABYLON.Vector3.Zero(),        p3p2 = BABYLON.Vector3.Zero(),        faceNormal = BABYLON.Vector3.Zero(),        vertexNormali1 = BABYLON.Vector3.Zero(),        vertexNormali2 = BABYLON.Vector3.Zero(),        vertexNormali3 = BABYLON.Vector3.Zero(),        nbFaces = indices.length / 3;
Link to comment
Share on other sites

Use now this (Q&D) link to tune it to your computer : http://jerome.bousquie.fr/BJS/computeNormals/menu.html

 

Please guys let me know how it behaves in your browsers (Chrome, IE, FF, etc).

I feel a bit afraid to modify computeNormals() in the main repo  :wacko: and I need to know before if it works on your platforms as well as on mine.

Link to comment
Share on other sites

Excellent optimization Jerome, nice work!

 

You could maybe set up a few test cases to prove that your new function replaces perfectly the current one? Something link:

1/ compute normals on a ribbon (your current test case)

2/ compute normal on another base shape, for example : a height map

3/ compute normal on an imported mesh

Link to comment
Share on other sites

here in chrome, I get a enormous gap between the two functions for a mesh sized 60 (28 800 vertices) and more : 60 fps  / 20 fps

3 times faster !

 

size 70 : 60 fps / 12 fps  =>    5 times faster !

size 90 : 50 fps / 9 fps  =>     5,5 times faster

size 120 : 30 fps / 4 fps => 7,5 times faster !!! with the last acceptable framerate : 30 fps

size 150 (135 200 vertices !) : 25 fps / 3 fps. well 8,33 times faster.. but 25 fps only

Link to comment
Share on other sites

If someone could test please on a imported mesh (never done this)

 

I will do it on a height map, ok.

I tested on a ribbon because it is intended to be morphed then with CreateRibbon() and because it is highly parametric.

 

On a height map, I will probably recompute the same normals each frame... well, it is not visible but a big computation.

Link to comment
Share on other sites

Actually the mesh type doesn't really matter.

The computeNormals() method only knows about a positions and a indices array as input.

It only depends upon the size of these two arrays.

 

This example generates meshes sized dozen of thousands of vertices which is quite big compared to usual meshes having hundreds or thousands of vertices only.

Link to comment
Share on other sites

Differences are more difficult to notice in FF because when debugLayer is visible, it doesn't go higher than 30 fps here.

Tests should be done with no debugLayer and FF profiler to register the average fps.

 

I will test later ... (I'm in Toulouse tomorrow for a conf about systems and networks  supervision, will try to make a demo of my 3D weathermap :D )

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...