
Observed inactive meshes take just as much CPU as when active


JCPalmer


I know optimizing inactive meshes is not a priority, but I thought I would report the observation that making a lot of them inactive / invisible does not reduce CPU usage, or increase FPS when CPU bound:

public setLayerMask(maskId :number){
    for (var i = this._subs.length - 1; i >= 0; i--){
        if (this._subs[i] !== null)
            this._subs[i].setLayerMask(maskId);
    }
    this.layerMask = maskId;
    // need to make sure not pickable, when mask is for suspended level
    this.isVisible = maskId !== DialogSys.SUSPENDED_DIALOG_LAYER;
}

Given that I have never seen line-level CPU or run-count profiling for JavaScript, this may help pin down more closely where much of the CPU overhead of having a mesh really is.

 

How to reproduce observation. 

  1. Run the Dialog Tester Scene.  https://googledrive.com/host/0B6-s6ZjHyEwUfjlzYXJKMC1zLXdIaV81REJhbjdfRmczQTJFOEpjWWg2SUIwZVRRS0VsR28
  2. Click the Use System Camera checkbox, which will enable "Dock" button.
  3. Turn on Debug layer with the checkbox. 
  4. Observe values of statistics
  5. Temporarily hide statistics, so the "Dock" button can be clicked
  6. Re-enable statistics, & compare.

Andy,

Looking at statistics with the "Modal Stack" menu selected, 753 meshes (700 active), I show "Mesh selection" Duration at around 95 ms (after a while).   When I Click "Dock", it only drops to about 65 ms with only about 29 active meshes.  If it was perfectly linear, it would drop to 95 * (29/700), or 3.9 ms.  That is not even close to 65.
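To spell out the arithmetic above, here is a small sketch using only the numbers quoted in this post. The variable names are made up for illustration.

```typescript
// Numbers taken from the post above.
const activeBefore = 700;
const activeAfter = 29;
const durationBeforeMs = 95;
const observedAfterMs = 65;

// Expected duration if mesh selection scaled linearly with active meshes.
const expectedAfterMs = durationBeforeMs * (activeAfter / activeBefore);

// Share of the original cost that is still being paid after docking.
const remainingFraction = observedAfterMs / durationBeforeMs;

console.log(expectedAfterMs.toFixed(1));                  // "3.9"
console.log((remainingFraction * 100).toFixed(0) + "%");  // "68%"
```

So roughly two thirds of the mesh selection time remains even though about 96% of the meshes are no longer active.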

 

Looking at the section DK linked to: yes, it would appear it HAS to be computeWorldMatrix, BUT there is checking at the front to not always do it.  I added a counter, which incremented only when the computation was actually done, then changed my "Input" button to write that number to the console.  If nothing changed, I can go minutes between "Input" clicks and the value written to the console is the same.  Something else is responsible.

public static nCompWMs = 0;
public computeWorldMatrix(force?: boolean): Matrix {
    if (!force && (this._currentRenderId === this.getScene().getRenderId() || this.isSynchronized(true))) {
        return this._worldMatrix;
    }
    AbstractMesh.nCompWMs++;
    ....
}

Was hoping that there was a defective check for not always doing it.  If it were, fixing would improve everything, not just for inactive meshes (not that important).

 

Really wish JavaScript had a line-level profiler.  It is critical for an interpreted language.  I had one way back in the mid 80's with the Sharp APL interpreter. It saved my life over and over, even though I had to code my own reports.  NetBeans's Java profiler is to die for.


fenomas,

Thanks I had never found the profiler before.  BTW, yes thought you were Andy.

 

Now that I have this, I know why computeWorldMatrix consumes so much CPU when it is not actually doing anything.  The check to see whether it needs to do anything is itself very expensive. Normally, you want a check that exists to save CPU to be as fast as possible.  See my profile of the inside of computeWorldMatrix.

 

post-8492-0-56517100-1429902045_thumb.pn

This scene can not only generate a huge # of clones, but they are highly nested.  All this recursive parent checking for sync is a large waste of time.  If the recursion was in _evaluateActiveMeshes & in the opposite direction (parent to child), the parent calling their children would already know if it was in sync and could pass it.  Not all scenes do as much parenting as this, but overhead checking is not good.

 

I will think about this.  This computeWorldMatrix step might be made as a separate pass through scene.meshes in _evaluateActiveMeshes().  A separate pass would mean activeMeshes would still come in the same order as before (I know you care about the order for materials).
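A minimal sketch of the parent-to-child idea, assuming a hypothetical `SketchNode` class (this is not Babylon.js's Node, and a single number stands in for the world matrix). Each parent passes down whether it changed, so a child never has to climb back up its ancestor chain to find out.

```typescript
// Hypothetical stand-in for a scene node; a number replaces the matrix.
class SketchNode {
    children: SketchNode[] = [];
    local: number;
    localDirty = true;   // set when this node's own transform changed
    world = 0;           // stand-in for the world matrix

    constructor(local: number) {
        this.local = local;
    }

    add(child: SketchNode): SketchNode {
        this.children.push(child);
        return child;
    }
}

let recomputeCount = 0;

// Walk parent to child, passing the "did my parent change" answer down.
function updateTree(node: SketchNode, parentWorld: number, parentChanged: boolean): void {
    const changed = parentChanged || node.localDirty;
    if (changed) {
        node.world = parentWorld + node.local;
        node.localDirty = false;
        recomputeCount++;
    }
    for (const child of node.children) {
        updateTree(child, node.world, changed);
    }
}

const root = new SketchNode(1);
const mid = root.add(new SketchNode(10));
const leaf = mid.add(new SketchNode(100));

updateTree(root, 0, false);               // first pass: everything is dirty
const firstPass = recomputeCount;

updateTree(root, 0, false);               // nothing changed since last pass
const secondPass = recomputeCount - firstPass;

mid.local = 20;
mid.localDirty = true;
updateTree(root, 0, false);               // only mid and its subtree recompute
const thirdPass = recomputeCount - firstPass - secondPass;
```

In this toy version an unchanged frame costs one boolean test per node, regardless of how deeply the node is nested.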

 


Hey, okay, hope your thinking is helpful. ;)  I also suspect optimizations could be made here - for example in a typical case for me I often see the scene spend 80% of its time in mesh selection (20% rendering) even when nothing is moving anywhere in the scene, and that's with no more than one level of nesting. One might think that more matrix updates could be skipped, but then I've logged 6-8 bugs lately and I think they've all been due to BJS being too aggressive in skipping matrix updates, so it's presumably not a simple matter.

 

(And I am Andy, yes, I just didn't know I'd said so ;) )


Your Github account is your name.  Some of the issues you posted, that I get emails of, could only have come from you.  Was on Github yesterday.  You even have the same avatar.

 

I spent Friday afternoon, testing my theory. 

  • I modified Node.ts to hide direct access to parent with gets / sets.  This allowed a children : Array<Node> property & a way to maintain it.
  • Made changes to abstractMesh.computeWorldMatrix(), adding a skipParentSyncChecking, and in isSynchronized too.
  • Added recursive scene function computeWorldMatrixTree(), and called for all meshes that either did not have a parent or the parent was not a mesh, eg. a camera.  

It worked practically first run, but somehow the result was unchanged.  I try not to keep my mad scientist changes to the repository around too long, so I trashed them with a reset.  I save the entire filesystem daily.  Might still have it, could paste changes, if so.
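Since the actual changes were discarded, the following is only a rough sketch of what the first bullet describes: a `parent` accessor pair that keeps a `children` array up to date. `TreeNode` and `findRoots` are hypothetical names, not the code that was in Node.ts.

```typescript
// Hypothetical node class; not the real Babylon.js Node.ts.
class TreeNode {
    name: string;
    private _parent: TreeNode | null = null;
    private _children: TreeNode[] = [];

    constructor(name: string) {
        this.name = name;
    }

    get parent(): TreeNode | null {
        return this._parent;
    }

    set parent(newParent: TreeNode | null) {
        if (this._parent === newParent) return;
        // Detach from the old parent's child list, if any.
        if (this._parent !== null) {
            const idx = this._parent._children.indexOf(this);
            if (idx !== -1) this._parent._children.splice(idx, 1);
        }
        this._parent = newParent;
        if (newParent !== null) newParent._children.push(this);
    }

    get children(): ReadonlyArray<TreeNode> {
        return this._children;
    }
}

// Nodes without a parent are where a top-down pass would start.
function findRoots(nodes: TreeNode[]): TreeNode[] {
    return nodes.filter(n => n.parent === null);
}

const panel = new TreeNode("panel");
const button = new TreeNode("button");
const label = new TreeNode("label");
button.parent = panel;
label.parent = button;
label.parent = panel;   // re-parenting moves label out of button's list

const roots = findRoots([panel, button, label]);
```

Because every write to `parent` goes through the setter, the child lists cannot drift out of step with the parent links.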


We have to be very cautious here as Fenomas mentioned. Computing the world matrix is expensive and so Babylon.js uses various ways to skip this step.

 

One of them is obviously the evaluation of active meshes. This is a complex problem because a world matrix can be updated:

- Because you changed position, rotation, scaling

- Parent or parent of parent or parent of parent of parent (and so on) changed its world matrix

- You are using Billboarding


I think for the vast majority of a scene, meshes DO NOT change every frame, e.g. background meshes.  Think the best strategy would be to take away direct access of any property that could cause a recompute, just like I did for Node.parent with getters / setters.  The setters could set a simple _isDirty : boolean.  ComputeWorldMatrix, could just check this.  The node.children member could handle parent changes.

 

Do not know what a renderID is, so do not know if ComputeWorldMatrix could set it back to false, or the scene would have to.

 

Think the code would be a lot cleaner too.  Think allowing direct access, leads to either recomputing everything every frame, or increasingly intricate checking & difficult code to follow.  The overhead of a getter / setter is probably low, and you only pay for when you use it.
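A minimal sketch of the dirty-flag idea above, assuming a hypothetical `DirtyMesh` class with a single scalar `x` in place of position, rotation and scaling. Having computeWorldMatrix clear the flag itself is one of the two options raised above, chosen here only to keep the sketch short.

```typescript
// Hypothetical mesh: one scalar stands in for position / rotation / scaling.
class DirtyMesh {
    computeCount = 0;
    private _x = 0;
    private _isDirty = true;   // true until the first compute
    private _world = 0;        // stand-in for the world matrix

    get x(): number {
        return this._x;
    }

    set x(value: number) {
        if (value !== this._x) {
            this._x = value;
            this._isDirty = true;   // the setter is the only place that marks dirty
        }
    }

    computeWorldMatrix(): number {
        if (!this._isDirty) return this._world;   // cheap single-boolean check
        this._world = this._x * 2;                // placeholder for real matrix math
        this._isDirty = false;                    // assumption: compute clears the flag
        this.computeCount++;
        return this._world;
    }
}

const background = new DirtyMesh();
background.computeWorldMatrix();   // first call computes
background.computeWorldMatrix();   // unchanged: served from cache
background.x = 3;                  // setter marks it dirty
const world = background.computeWorldMatrix();
```

Note that this only works if every change goes through a setter. A property that holds a mutable object whose fields can be changed in place would need the same treatment on that object.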


We already check if something has changed without having to use getters and setters: this is the goal of isSynchronized, which checks against cached values :)
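As a rough illustration of checking against cached values, here is a sketch with a hypothetical `CachedMesh` class. It is not the engine's isSynchronized implementation, which compares more state than a position.

```typescript
// Hypothetical mesh that detects changes by comparing with a cached copy.
class CachedMesh {
    position = { x: 0, y: 0, z: 0 };
    computeCount = 0;
    // NaN never equals anything, so the first check reports "not synchronized".
    private _cache = { x: NaN, y: NaN, z: NaN };

    isSynchronized(): boolean {
        return this._cache.x === this.position.x
            && this._cache.y === this.position.y
            && this._cache.z === this.position.z;
    }

    computeWorldMatrix(): void {
        if (this.isSynchronized()) return;
        // Placeholder for the real matrix math: just refresh the cache.
        this._cache.x = this.position.x;
        this._cache.y = this.position.y;
        this._cache.z = this.position.z;
        this.computeCount++;
    }
}

const box = new CachedMesh();
box.computeWorldMatrix();      // first call computes
box.computeWorldMatrix();      // cache matches: skipped
box.position.x = 5;            // plain field write, no setter involved
const syncedAfterWrite = box.isSynchronized();
box.computeWorldMatrix();      // change detected by comparison
```

The comparison catches in-place writes such as `box.position.x = 5` with no setter, at the price of running the comparison on every call.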

 

Adding getters/setters will have a performance impact across the WHOLE engine, as Vector3 is used everywhere. And I'm pretty sure that even if we removed the isSynchronized stuff, this won't lead to a big performance gain (but perhaps I'm wrong :))

 

 

One idea: adding an IsWorldMatrixFrozen property to mesh. This will be used to block the update of the world matrix.

 

Thoughts?


Ok, with the last push I introduced 3 optimizations:

- mesh.freezeWorldMatrix() and mesh.unfreezeWorldMatrix(). A frozen world matrix will never be evaluated and is always served from cache

- mesh.alwaysSelectAsActiveMesh = true: frustum clipping is disabled for the mesh, which speeds up active mesh evaluation (but the mesh is then no longer frustum-clipped)

- mesh.isEnabled == false will now block computeWorldMatrix evaluation
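To make the behaviour of the first and third points concrete, here is a toy model, not engine code. `ToyMesh` borrows the method names from the list above, but its internals are assumptions, including the choice to compute once at freeze time so the cached value reflects the current transform.

```typescript
// Toy model only; a number stands in for the world matrix.
class ToyMesh {
    x = 0;
    enabled = true;
    computeCount = 0;
    private _frozen = false;
    private _world = 0;

    freezeWorldMatrix(): void {
        this._frozen = false;
        this.computeWorldMatrix();   // assumption: capture the current transform first
        this._frozen = true;
    }

    unfreezeWorldMatrix(): void {
        this._frozen = false;
    }

    computeWorldMatrix(): number {
        // Frozen or disabled meshes are served from cache.
        if (this._frozen || !this.enabled) return this._world;
        this._world = this.x;        // placeholder for real matrix math
        this.computeCount++;
        return this._world;
    }
}

const terrain = new ToyMesh();
terrain.x = 5;
terrain.freezeWorldMatrix();
terrain.x = 9;                                  // ignored while frozen
const whileFrozen = terrain.computeWorldMatrix();
terrain.unfreezeWorldMatrix();
const afterUnfreeze = terrain.computeWorldMatrix();

terrain.enabled = false;
terrain.x = 42;                                 // ignored while disabled
const whileDisabled = terrain.computeWorldMatrix();
```

The drawback is visible in the sketch: any change made while the mesh is frozen or disabled is silently ignored until the flag is cleared.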

 

 

Feel free to give feedback!


Application-level optimizations do offer ways for the developer to always do or never do things that only they would know about.  It is also a bit of an advanced feature, so you probably want to do this as part of a publishing phase.

 

I have seen cases where using a pair of methods to set something on or off was later regretted, e.g. Java Swing show() & hide().  In that example, they changed to setVisible(boolean).  I wonder if a single function like mesh.setFixedWorldMatrix(boolean) might allow for more flexible calling.
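For what it is worth, a single boolean entry point can be layered over the pair at application level. This sketch assumes only that the object exposes the two methods named earlier in the thread; `setFixedWorldMatrix`, `Freezable` and the recording stub are hypothetical.

```typescript
// Anything exposing the freeze / unfreeze pair.
interface Freezable {
    freezeWorldMatrix(): void;
    unfreezeWorldMatrix(): void;
}

// Hypothetical helper: one call site, driven by a boolean.
function setFixedWorldMatrix(mesh: Freezable, fixed: boolean): void {
    if (fixed) {
        mesh.freezeWorldMatrix();
    } else {
        mesh.unfreezeWorldMatrix();
    }
}

// Stub that records calls, standing in for a real mesh.
const calls: string[] = [];
const stubMesh: Freezable = {
    freezeWorldMatrix: () => { calls.push("freeze"); },
    unfreezeWorldMatrix: () => { calls.push("unfreeze"); },
};

const levelIsSuspended = true;
setFixedWorldMatrix(stubMesh, levelIsSuspended);
setFixedWorldMatrix(stubMesh, !levelIsSuspended);
```

The boolean form lets the argument come straight from application state instead of an if/else at every call site.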

 

For the dialog extension, using isEnabled to completely block computeWorldMatrix evaluation of entire Panel hierarchies that I know will never show on any camera sounds good.


I will have a look at how these affect my scenes later in the week. I would imagine that skipping matrix updates for disabled meshes will go a long way towards solving JC's case, but scenes that regularly have lots of disabled meshes probably aren't so common.

 

Just to check, freezeWorldMatrix just affects a mesh's matrix w.r.t. the world, and the camera transform is separate on top of that, right?


Okay, I wound up trying these today. First, freezeWorldMatrix is a solid improvement. Here's a scene with a couple thousand meshes (only a few hundred draw calls), which already uses octrees to moderately speed up mesh selection:

 

BxNskdp.png

 

Here's the same scene after freezing the terrain:

 

voB4mF2.png

 

So yeah, solid improvement! Very cool.  B)

 

With that said, some thoughts:

 

1. Would mesh.static:Boolean be a better name? It would be hard for casual users to guess the implications of "freezeWorldMatrix", but "static" would be pretty straightforward. There might even be other optimizations one could do with a mesh that the user has declared to be "static".

 

2. Could BJS initialize the mesh's world matrix when the freeze API is called? It would be most straightforward if the user can create a mesh, set its position/rotation, and then freeze it, but that doesn't work (I assume because the matrix doesn't get made until the next render).

 

3. It doesn't work for billboards. :(  I know you already alluded to that but do you think there's any (possibly separate) way that billboarded static meshes could be optimized? I think it's a fairly common use case to have lots of terrain billboards that never move (grass, flowers, etc).

Sorry for interrupting this very interesting technical debate, but just for my own knowledge: the worldMatrix is the transformation matrix from the mesh's local space to world space, and so there is one worldMatrix per mesh.

Am I right ?

 

The debate here is about improving performance by not recomputing this worldMatrix each frame for meshes tagged as immutable/static once created (if this feature is possible).

Did I get it ?


@fenomas:

1. I want to keep it as a function because it implies some drawbacks that the user has to understand. So I prefer having an explicit function there

2. Already the case: https://github.com/BabylonJS/Babylon.js/blob/master/Babylon/Mesh/babylon.abstractMesh.ts#L189

3. Billboards need a new world matrix every frame, because they always face the camera
