How to optimise performance when using a lot of instances


Hey guys,

long time no see, eh? How are you all doing? :)

I started digging up some old game ideas and playing around. I ended up with one where I create a field of hexagons as the game world. Unfortunately I am having performance problems when creating bigger maps and zooming out to show the whole map. Here is an example of what I am trying to do:


Any idea how I could improve performance for this kind of game world?


Hey Iiceman,

glad to see you back :-)

Very nice scene !

Well, reduced to only 7 draw calls, that's great! Almost 700K vertices and more than 10K meshes on the whole map ... I don't know how you could get better perfs with such a big number of objects to render.

Maybe by using the LOD (level of detail) feature to reduce the number of vertices when a certain distance from the camera is reached ?
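A minimal sketch of the LOD suggestion, assuming Babylon.js's `Mesh.addLODLevel(distance, mesh)`, where passing `null` as the mesh means nothing is rendered beyond that distance. The helper name `addTreeLod`, the low-poly mesh, and the distances are hypothetical and only there for illustration.

```javascript
// Hypothetical helper: attach two LOD levels to a detailed mesh.
// Assumes `mesh` behaves like a BABYLON.Mesh, which exposes
// addLODLevel(distance, lodMesh).
function addTreeLod(mesh, lowPolyMesh, simplifyDistance, hideDistance) {
  // Beyond simplifyDistance, render the cheaper mesh instead.
  mesh.addLODLevel(simplifyDistance, lowPolyMesh);
  // Beyond hideDistance, render nothing at all (null means "no mesh").
  mesh.addLODLevel(hideDistance, null);
  return mesh;
}
```

Something like `addTreeLod(treeGroup, lowPolyTreeGroup, 50, 200)` would then be called once per source mesh after it has been created.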

Or to find a way not to render everything ... faking what is far


Hey jerome,

thanks for the feedback! I actually already tried LOD (line 147) and I was surprised to see that it seems to make no difference at all. Here is a version with LOD commented in: http://www.babylonjs-playground.com/#RFIRC#1 Even if the meshes are hidden when reaching the LOD distance, they still have a performance impact. Might this be a bug, or am I just using it the wrong way? :huh:


Yeah, I visited those two sites too while trying to optimize the scene, but since I use really super simple meshes the simplification doesn't seem to make a difference (at least nothing noticeable frame-rate-wise).

Anyhow, I assumed that when I completely hide the meshes, it should make a difference. The main problem is not that I need maps as big as in the example playground. I want something about half the size, but there is the forest terrain that is supposed to have little trees on it. I made a super simple model in Blender (a group of 3 trees) and when I placed those on my forest terrain the frame rate dropped noticeably when zooming out. Sooo... hiding the trees when zooming out would be okay.... but as you can see in the second playground it doesn't even make a difference if I hide them :(


The thing that I think could give the biggest optimization is to not draw all that geometry in the first place... draw all the hexagon tiles onto a 2D canvas first and then use that as a texture on a single low-res ground mesh in Babylon. Then use some filters on the texture to generate a normal map that you can also apply to give the illusion that they stick out.
It won't look quite as good, but it will be super fast and done in a single draw call, with only a few verts.

Then for the ones that stick right out, you can use an octree to draw only the ones needed, and maybe instancing if they're all the same mesh. 
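The canvas idea above could be sketched roughly as follows. This is only an illustration: the function names, the pointy-top orientation, and the "odd rows shifted right" layout are assumptions, not taken from the playground. In Babylon.js the 2D context would presumably come from a `BABYLON.DynamicTexture` (`getContext()`, followed by `update()` after drawing), which is then assigned to the ground material.

```javascript
// Corner points of a pointy-top hexagon centred at (cx, cy).
function hexCorners(cx, cy, size) {
  const corners = [];
  for (let i = 0; i < 6; i++) {
    const angle = (Math.PI / 180) * (60 * i - 30);
    corners.push({
      x: cx + size * Math.cos(angle),
      y: cy + size * Math.sin(angle),
    });
  }
  return corners;
}

// Centre of the tile at (col, row); odd rows are shifted half a tile right.
function hexCenter(col, row, size) {
  const width = Math.sqrt(3) * size;
  return {
    x: width * (col + 0.5 * (row & 1)) + width / 2,
    y: 1.5 * size * row + size,
  };
}

// Paint every tile onto a 2D canvas context.
// colorOf(col, row) is a hypothetical callback returning a CSS colour.
function drawHexMap(ctx, cols, rows, size, colorOf) {
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      const c = hexCenter(col, row, size);
      const pts = hexCorners(c.x, c.y, size);
      ctx.beginPath();
      ctx.moveTo(pts[0].x, pts[0].y);
      for (let i = 1; i < 6; i++) ctx.lineTo(pts[i].x, pts[i].y);
      ctx.closePath();
      ctx.fillStyle = colorOf(col, row);
      ctx.fill();
    }
  }
}
```

Because the whole map ends up in one texture, changing a tile type later only means repainting that one hexagon and refreshing the texture.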


Another lead would be to build the whole terrain (all the hexagons) with a single SPS, that is to say one big mesh with many, many vertices. As it will be only one big mesh, the LOD will have a real impact during the simplification process, instead of trying to simplify little instanced meshes (each individual hexagon has so few initial vertices).

this is just an idea among others, maybe mixable also with these others ...
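A rough sketch of the single-SPS idea, assuming the Babylon.js `SolidParticleSystem` API (`addShape` with a `positionFunction` option, then `buildMesh`). The helper name `buildHexTerrain` and the `positions` array are hypothetical, and `BABYLON` is passed in as a parameter only so the sketch has no hidden global dependency.

```javascript
// Hypothetical helper: build all tiles as one Solid Particle System mesh.
// `hexModel` is the single hexagon mesh used as the particle shape and
// `positions` is an array of {x, y, z} tile centres.
function buildHexTerrain(BABYLON, scene, hexModel, positions) {
  const sps = new BABYLON.SolidParticleSystem("hexTerrain", scene);

  // positionFunction is called once per particle; `i` is the particle index.
  sps.addShape(hexModel, positions.length, {
    positionFunction: (particle, i) => {
      particle.position.x = positions[i].x;
      particle.position.y = positions[i].y;
      particle.position.z = positions[i].z;
    },
  });

  const terrain = sps.buildMesh(); // one mesh containing every hexagon
  hexModel.dispose();              // the model mesh is no longer needed
  return { sps, terrain };
}
```

With very large maps the build step does all the work up front, which matches the long initial computation mentioned later in the thread.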


Thanks for the input guys! I am gonna try your suggestions, maybe I can really combine your ideas somehow.

I started with the SPS version... looks promising so far: http://www.babylonjs-playground.com/#WCDZS#15 (I chose really big values so initial computation takes preeeettttyyy long, so be patient)

@jerome: if I zoom in at a certain point the SPS seems to disappear only coming back if I zoom out again far enough. Any idea what I am doing wrong?


Quoting my recent post, the three rules of optimization:

  1. Profile
  2. Profile
  3. PROFILE!!!!

In this case, for me in Chrome, when zoomed out over 70% of the time is spent in Scene._evaluateActiveMeshes. So an octree might be the first thing to try. Also a lot of time calculating world matrices, so make sure to freeze meshes if they're not moving.


I agree with fenomas. Instances may reduce the number of draw calls, but not the CPU time spent checking, each and every frame, whether each one has moved, scaled, or rotated. It looks like they never move, so call freezeWorldMatrix on them. This CPU work is single threaded, so senseless overhead from the background should always be removed. This should be done before even bothering to profile.

freezeWorldMatrix can be suspended, so if you have some sort of animation you can turn it off, animate, and re-freeze on an instance by instance basis.
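A small sketch of the freeze / re-freeze pattern described above, assuming the Babylon.js mesh methods `freezeWorldMatrix()` and `unfreezeWorldMatrix()`. The helper names are hypothetical.

```javascript
// Freeze the world matrix of every mesh that never moves, so it is not
// re-evaluated each frame.
function freezeStatic(meshes) {
  for (const mesh of meshes) mesh.freezeWorldMatrix();
}

// Temporarily unfreeze one mesh (or instance), apply a change, and freeze
// it again. `applyChange` is a hypothetical callback that moves, scales,
// or rotates the mesh.
function changeFrozen(mesh, applyChange) {
  mesh.unfreezeWorldMatrix();
  applyChange(mesh);
  mesh.freezeWorldMatrix();
}
```

For an animation that runs over several frames, the re-freeze would have to wait until the animation has finished rather than happen right after the first change.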


Thanks for all the feedback!

I think the SPS might not be the best choice for my case after all. The idea for the game is that you can dynamically change the tile types later in the game (some kind of terraforming). So having to rebuild the SPS might be too much trouble.

@fenomas There is already an octree in that playground (line 101) ...but maybe I am not using it right? Is there anything else I have to do besides just adding it? But it seemed like it already improved performance a lot when I added it.

@JCPalmer Using freezeWorldMatrix seems to help quite a bit, too, improving 32 FPS to about 45 FPS when zoomed out all the way in my local version. In the playground somehow it doesn't seem to make a difference. But as I said, my local version is a bit different since I have fewer tiles but more "decoration" placed over the tiles.

The evaluateActiveMeshes call still seems to be the bad guy, but it only calls the render functions, so I don't know what I could do to reduce the time that is spent in it... any suggestions? :D


I am not good at interpreting those profile tables, so here is the upper part of my local one, maybe you guys can spot something here that I missed:

Self time   Self %    Total time   Total %   Location               Function
38.0 ms      0.16 %   22615.6 ms   95.20 %   babylon.max.js:5579    Engine._renderLoop
7.0 ms       0.03 %   22547.5 ms   94.92 %   babylon.max.js:13642   Scene.render
418.4 ms     1.76 %   22521.5 ms   94.81 %   babylon.max.js:13486   Scene._renderForCamera
4560.8 ms   19.20 %   17174.9 ms   72.30 %   babylon.max.js:13378   Scene._evaluateActiveMeshes
4560.8 ms   19.20 %   17174.9 ms   72.30 %   babylon.max.js:13486   > Scene._renderForCamera
4560.8 ms   19.20 %   17174.9 ms   72.30 %   babylon.max.js:13642   >> Scene.render
4560.8 ms   19.20 %   17174.9 ms   72.30 %   babylon.max.js:5579    >>> Engine._renderLoop
35.0 ms      0.15 %    4894.1 ms   20.60 %   babylon.max.js:12051   RenderingManager.render
543.6 ms     2.29 %    4859.1 ms   20.46 %   babylon.max.js:12110   RenderingGroup.render
325.3 ms     1.37 %    4291.5 ms   18.07 %   babylon.max.js:15268   Mesh.render
229.2 ms     0.97 %    3876.0 ms   16.32 %   babylon.max.js:13451   Scene._activeMesh
2361.5 ms    9.94 %    3638.8 ms   15.32 %   babylon.max.js:13358   Scene._evaluateSubMesh
2361.5 ms    9.94 %    3638.8 ms   15.32 %   babylon.max.js:13451   > Scene._activeMesh
2361.5 ms    9.94 %    3638.8 ms   15.32 %   babylon.max.js:13378   >> Scene._evaluateActiveMeshes
835.9 ms     3.52 %    2649.8 ms   11.15 %   babylon.max.js:8098    AbstractMesh.computeWorldMatrix
20.0 ms      0.08 %    2382.5 ms   10.03 %   babylon.max.js:15240   Mesh._processRendering
2018.1 ms    8.50 %    2361.5 ms    9.94 %   babylon.max.js:15201   Mesh._renderWithInsta



Oh, I didn't see you had an octree made in there!

Okay, here's my guess at what's going on. Most of the time is still spent in evaluateActiveMeshes, which is where the engine chooses which meshes are in the frustum. When I looked at the octree code, if I remember correctly, the way it works is, if an octree block is outside the frustum then Babylon culls all that block's meshes from the scene, but if the block is partially in the frustum, the engine then loops through every entry in the block checking if the mesh can individually be culled.

Thus, when you're zoomed in and some of the octree blocks are getting culled, the octree saves you a lot of performance, but when you're zoomed out and everything's in view, the engine winds up trying to cull every single mesh just as if there was no octree. So I think that's why the performance only drops when you zoom out.

@Deltakosh wouldn't it make sense if mesh selection could check whether an octree block is entirely inside the frustum, and if so, include the whole block for rendering without checking all the entries?
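To make the proposal concrete, here is a simplified model of block-level selection. It is not Babylon's actual octree code: `classifyBlock`, `isMeshVisible`, and the block shape (`{ meshes: [...] }`) are all hypothetical stand-ins. The `perMeshTests` counter shows how much per-mesh work the "fully inside" shortcut would skip.

```javascript
// Simplified model of octree-based selection, NOT the engine's real code.
// classifyBlock(block) returns "outside", "partial" or "inside";
// isMeshVisible(mesh) stands in for the per-mesh frustum test.
function selectMeshes(blocks, classifyBlock, isMeshVisible) {
  const active = [];
  let perMeshTests = 0;

  for (const block of blocks) {
    const state = classifyBlock(block);

    if (state === "outside") continue; // whole block culled, no mesh tests

    if (state === "inside") {
      // Proposed shortcut: accept every mesh without testing it.
      for (const mesh of block.meshes) active.push(mesh);
      continue;
    }

    // Partially visible block: fall back to testing each mesh.
    for (const mesh of block.meshes) {
      perMeshTests++;
      if (isMeshVisible(mesh)) active.push(mesh);
    }
  }
  return { active, perMeshTests };
}
```

When the camera is zoomed out and most blocks are fully inside the frustum, almost no per-mesh tests would remain in this model.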


It makes sense, but I'm not sure this is the issue here.


If you always have all the tiles visible, you can shortcut selection with mesh.alwaysSelectAsActiveMesh = true.
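As a hedged sketch of that shortcut: `alwaysSelectAsActiveMesh` is the mesh property named above, while the helper names and the idea of switching it based on a camera radius threshold are assumptions for illustration (for example with an `ArcRotateCamera`, whose `radius` grows when zooming out).

```javascript
// Set the selection shortcut on a list of meshes.
function setAlwaysActive(meshes, enabled) {
  for (const mesh of meshes) mesh.alwaysSelectAsActiveMesh = enabled;
}

// Hypothetical policy: skip per-mesh frustum checks only while the camera
// is zoomed out far enough that the whole map is on screen anyway.
function updateSelectionMode(meshes, cameraRadius, fullViewRadius) {
  const wholeMapVisible = cameraRadius >= fullViewRadius;
  setAlwaysActive(meshes, wholeMapVisible);
  return wholeMapVisible;
}
```

Looping over every mesh has its own cost, so in practice this would be called only when the camera crosses the threshold, not on every frame.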

To improve performance, I would suggest:

- Reduce the mesh count: try to merge meshes that share the same material, this will greatly improve performance. Right now you have a bunch of 72-vertex meshes, which is not enough work per mesh for the GPU (even with instances).

- 10007 meshes is too many. You can have objects to represent tiles at a high level in your code, but you should not have that many at the rendering level. This is why evaluate is slow.

- An octree is not a good idea if everything is in the frustum. Furthermore, an octree is not well suited to a flat world (an octree is a 3D structure).
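The first suggestion in the list above could look roughly like this, assuming `BABYLON.Mesh.MergeMeshes(meshes, disposeSource, allow32BitsIndices)`. The helper names are hypothetical, `BABYLON` is passed in as a parameter for the sake of a self-contained sketch, and the input is assumed to be real meshes (or clones) rather than instances.

```javascript
// Group meshes by the material object they share.
function groupByMaterial(meshes) {
  const groups = new Map();
  for (const mesh of meshes) {
    if (!groups.has(mesh.material)) groups.set(mesh.material, []);
    groups.get(mesh.material).push(mesh);
  }
  return groups;
}

// Merge each group into a single mesh, giving one mesh per material.
function mergeByMaterial(BABYLON, meshes) {
  const merged = [];
  for (const [material, group] of groupByMaterial(meshes)) {
    // disposeSource = true removes the originals,
    // allow32BitsIndices = true permits more than 65535 vertices.
    const big = BABYLON.Mesh.MergeMeshes(group, true, true);
    if (big) {
      big.material = material; // set explicitly in case it is not carried over
      merged.push(big);
    }
  }
  return merged;
}
```

The trade-off is that an individual tile can no longer be changed on its own, so merging per map chunk instead of per whole map may fit a terraforming game better.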


Well, for what it's worth, if I run the demo with alwaysSelectAsActiveMesh = true for every mesh, the performance greatly improves.

It still spends most of its time in evaluateActiveMeshes though - not sure what that's doing if all the meshes are flagged to be active and also have their world matrix frozen.


Alrighty then, I'll try to reduce the number of instances and maybe I can also merge some instances together. I'll keep developing the game a bit more and I'll keep an eye on performance. I guess I could try alwaysSelectAsActiveMesh when zooming out all the way. For now performance seems okay if I limit the camera movement a bit. I'll keep you guys updated! :D


Yeah, LODs would help a lot for the tiles:

Level 0: 12 vertices
Level 1: 6 vertices
Level 2: 4 vertices (a plane, maybe with a texture with the shape? this may be detrimental in the end tho)

Another thing is, well keep in mind I haven't touched babylon for like a year, so I may be wrong here, but could you disable shadows on certain particles/meshes? Cause that could help as well for far away meshes.

Another possibility is kinda complex:
Have a huge plane span the entire "world". Whenever a tile is far away, disable the mesh and paint a tile shape onto the huge plane. Whenever it gets closer to the player/camera, erase the tile and re-enable the mesh. The awesome part is since the painted tile is far away, it could be low-res.

If you are interested, I could dig into this a bit at some point :)
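The paint-and-swap idea could be sketched as below. Everything here is hypothetical apart from `mesh.setEnabled(...)` and `mesh.position`, which Babylon.js meshes provide: the `tiles` records, the `painted` flag, and the `painter` object with `paint(tile)` / `erase(tile)` (which would draw onto the texture of the huge plane) are made up for illustration.

```javascript
// Swap far-away tiles between "real mesh" and "painted on the big plane".
// Each tile is { mesh, painted }, where `painted` remembers the last state
// so that a tile is only touched when it crosses the distance threshold.
function updateTileDetail(tiles, cameraPos, switchDistance, painter) {
  let changed = 0;
  for (const tile of tiles) {
    const p = tile.mesh.position;
    const distance = Math.hypot(
      p.x - cameraPos.x,
      p.y - cameraPos.y,
      p.z - cameraPos.z
    );
    const far = distance > switchDistance;
    if (far === tile.painted) continue; // already in the right state

    tile.mesh.setEnabled(!far); // disabled meshes are skipped by the renderer
    if (far) painter.paint(tile);
    else painter.erase(tile);
    tile.painted = far;
    changed++;
  }
  return changed;
}
```

Running this only when the camera has moved a noticeable amount would avoid walking the whole tile list every frame.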


Hmm, sounds interesting, I might give SPS another try... FPS seems okay to me... at least a lot better than with instances. Not sure how to properly combine LOD and SPS.

@joshcamas I don't use any shadows. And using LOD to hide instances didn't seem to give me any performance boost.

@jerome I'll have to replace some tiles later in the game. That means I'll have to rebuild the SPS? I know I can hide some particles, but removing/replacing is not possible, right?
