
Slow instance performance -- Equivalent of threejs' InstancedBufferGeometry in BabylonJS?


BeanstalkBlue

In both Three.js and Babylon.js, I have implemented almost exactly the same scene: 10000 instances of a 30-vertex mesh.

In my Three.js implementation it takes ~3ms to do the draw call that renders these.

In my Babylon.js implementation it takes ~40ms (frame time) to do the single draw call that renders these instances. I am following the Instances demo on the Babylon.js website.

In Three.js, there is an "InstancedBufferGeometry" class that speeds this up considerably. Does Babylon.js have a similar concept? How do I build a vertex buffer that draws these instances faster?

The mesh being drawn is the same for all instances, and the same shader is used for all of them; only a small amount of data (position, orientation) changes per instance.
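For context, the idea behind instanced buffer geometry is that per-instance attributes live in buffers the GPU reads once per instance rather than once per vertex, so the whole set is drawn in one call. Below is a minimal TypeScript sketch of packing such per-instance data into one Float32Array; the 7-float layout (position + quaternion) and the names are assumptions for illustration, not the actual format used by three.js or Babylon.js. In WebGL 1.0 such a buffer would be bound as an attribute with a divisor of 1 through the ANGLE_instanced_arrays extension.

```typescript
// Assumed layout: 7 floats per instance (x, y, z, qx, qy, qz, qw).
const FLOATS_PER_INSTANCE = 7;

interface InstanceData {
  position: [number, number, number];
  rotation: [number, number, number, number]; // unit quaternion (x, y, z, w)
}

// Pack every instance's data into one contiguous Float32Array so it can be
// uploaded as a single buffer and read once per instance by the vertex shader.
function packInstances(instances: InstanceData[]): Float32Array {
  const data = new Float32Array(instances.length * FLOATS_PER_INSTANCE);
  instances.forEach((inst, i) => {
    const offset = i * FLOATS_PER_INSTANCE;
    data.set(inst.position, offset); // floats 0..2
    data.set(inst.rotation, offset + 3); // floats 3..6
  });
  return data;
}

// Example: 10000 instances laid out on a 100 x 100 grid, no rotation.
const instances: InstanceData[] = [];
for (let i = 0; i < 10000; i++) {
  instances.push({ position: [i % 100, 0, Math.floor(i / 100)], rotation: [0, 0, 0, 1] });
}
const instanceBuffer = packInstances(instances);
console.log(instanceBuffer.length); // 70000
```

Changing one instance then means rewriting 7 floats in this array; the draw itself stays a single call.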


Thanks! The SPS (SolidParticleSystem) is pretty restrictive in how it works, though. I am not building a particle system. I guess I could delve into the Babylon.js TypeScript source and duplicate the SPS with customizations for my own system, but is there another way?

 

Playing with the SPS in the Babylon.js playground, it seems to run quite a lot slower CPU-side (maybe around 5x slower) than what I am doing with three.js, so I wonder if a different method is being used under the hood to manage the instances?


@BeanstalkBlue does 3JS cull each instance on the CPU side, or is culling done triangle by triangle on the GPU side?

If I remember correctly, the BJS implementation of InstancedMesh does the culling on the CPU side (which I think is understandable once you see the level BJS is meant to operate at) and then rebuilds the InstancedBufferArray before each render.
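To show where that CPU time goes, here is a generic TypeScript sketch of per-instance CPU culling followed by repacking the visible instances. It is not Babylon.js's actual code: the Plane shape, the bounding-sphere test and the function names are assumptions. The point is that the work grows with the instance count and runs every frame, before the buffer is even uploaded.

```typescript
// A plane in the form n·p + d = 0; points with n·p + d >= 0 are "inside".
interface Plane {
  nx: number;
  ny: number;
  nz: number;
  d: number;
}

// A bounding sphere is rejected only if it lies entirely behind some plane.
function sphereInFrustum(x: number, y: number, z: number, radius: number, planes: Plane[]): boolean {
  for (const p of planes) {
    if (p.nx * x + p.ny * y + p.nz * z + p.d < -radius) {
      return false;
    }
  }
  return true;
}

// Every frame: test each instance (xyz triplets in `positions`) and copy the
// survivors to the front of `out`. Returns how many instances to draw.
function cullAndPack(positions: Float32Array, radius: number, planes: Plane[], out: Float32Array): number {
  let visible = 0;
  for (let i = 0; i < positions.length; i += 3) {
    const x = positions[i];
    const y = positions[i + 1];
    const z = positions[i + 2];
    if (sphereInFrustum(x, y, z, radius, planes)) {
      out.set([x, y, z], visible * 3);
      visible++;
    }
  }
  return visible;
}
```

Skipping culling entirely removes the loop, at the cost of sending every instance to the GPU.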

I don't know how 3JS works, but my first guess would be: no culling, and everything is static (except of course the transformation matrix), and that's why it's fast.

So I would say it depends on what you're trying to achieve. Culling costs a lot of CPU time in JavaScript. The Mesh class in BJS is a full-featured one and you can do many things with it, but even if you don't use those features it carries an overhead, and anyway I think most of the time would be spent in culling.

If you don't need CPU culling of your instances, then I guess BJS's instanced meshes are not the best option. I wonder whether the SPS culls on a per-instance basis (I don't know the SPS very well, but Jerome will answer that), but I would say it doesn't, or shouldn't.

You said the SPS is too restrictive for you; can you tell us which restrictions you have in mind?


If Mesh is doing CPU culling, that would explain why it is taking so much CPU time. I want to do my own culling, as I have an efficient way to do it that applies specifically to my game.

Consider the case of a Minecraft world made up of simple cubes (not exactly my case, but close enough), and say we want to draw it with cube instances. We don't need the per-instance orientations the SPS gives us, since all cubes are oriented the same way.

Since the entire world is made of these instances, optimising this process is critical to game performance, so I would like to drop things like the SPS's per-instance orientations.

Basically, do I need to dig into the Babylon.js code and rebuild the SPS class for my own purposes? Or does Babylon.js expose enough of the primitives the SPS uses that I can build this myself without even looking at the Babylon.js TypeScript source?


Ok, so it's not about "restrictions" when you talk about the SPS; it's the opposite: the SPS offers more features than you need, and you think you could get better performance by sticking to the strict minimum.

I may be wrong, but I don't think you would save a lot of time by getting rid of the orientation info. You would save some, of course, because for what you need a Matrix is unnecessary and a simple Vector3 would do the job.
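A quick sketch of the data-size side of that trade-off, assuming 4-byte floats, a 16-float world matrix versus a 3-float offset, and the 10000 instances from the original post (the helper name is hypothetical):

```typescript
const BYTES_PER_FLOAT = 4; // Float32Array element size

// Size in bytes of a per-instance buffer for a given layout.
function instanceBufferBytes(instanceCount: number, floatsPerInstance: number): number {
  return instanceCount * floatsPerInstance * BYTES_PER_FLOAT;
}

const withMatrix = instanceBufferBytes(10000, 16); // 4x4 world matrix per instance
const positionOnly = instanceBufferBytes(10000, 3); // Vector3 offset per instance
console.log(withMatrix, positionOnly); // 640000 120000
```

So the instance data is a bit more than 5x smaller; whether that shows up in frame time depends on whether the upload is the bottleneck or, as suggested above, the per-instance CPU work is.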

Maybe @Deltakosh can tell us if it's possible to disable culling at the InstancedMesh level; I took a look at the instancedMesh.ts file and it doesn't seem to be possible. Adding this feature could be a fairly simple change, and it would definitely save CPU time.

But I think you will agree that BJS is a high-level 3D engine: we try to cover most cases, but we won't cover them all. If 3JS renders a bunch of instanced meshes without doing any culling, my first thought is that it's not a good idea; my second thought is that, given how lopsided the power ratio between a single JavaScript thread and a GPU is, maybe it's something to consider. But you can't compare the performance of a single draw call with static buffers against a 3D engine that updates the instanced array buffer while doing culling.

But given what you want to do, it looks to me like there's no doubt: if rendering all these instances is a cornerstone of your game/app and performance is critical, you have to write your own class and stay as close to the metal as possible.

I don't think it would be really hard to do. I rely heavily on instanced arrays for the Canvas2D feature, and I developed a set of classes to easily create/update the Float32Array buffer that feeds the instanced array. It also supports a dynamic size, which is a must-have when you cull, because you don't know in advance how many instances you will send to the GPU. The class is DynamicFloatArray, in the babylon.dynamicFloatArray.ts file. Beyond that, you have the whole "low level" of BJS available to deal with meshes, effects/shaders and their rendering.

That's what I did when I developed Canvas2D. It requires you to take a closer look at all these things, but it's accessible, and writing your own 3D visual class will definitely pay off in the end.

Description of the DynamicFloatArray class

Quote

 The purpose of this class is to store float32-based elements of a given size (defined by the stride argument) in a dynamic fashion, that is, you can add/free elements. You can then access a defragmented/packed version of the underlying Float32Array by calling the pack() method.
   The intent is to maintain, over time, data that will be bound to a WebGLBuffer, with the ability to add/remove elements.
   It was first built to efficiently maintain the WebGLBuffer that contains instancing-based data.
   Allocating an element will return an instance of DynamicFloatArrayElement, which contains the offset into the Float32Array where the element starts; you are then responsible for copying your data using this offset.
   Beware: calling pack() may change the offset of some entries, because this method defragments the Float32Array, replacing empty elements by moving allocated ones into their location.
   This method will return an ArrayBufferView on the existing Float32Array that describes the used elements. Use this view to update the WebGLBuffer, NOT the "buffer" field of the class. The pack() method won't shrink/reallocate the buffer, to keep it GC-friendly; all the empty space will be put at the end of the buffer, and the method just ensures there are no "free holes".
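To make the add/free/pack behavior described above concrete, here is a hypothetical, much-simplified TypeScript sketch. It is NOT Babylon.js's DynamicFloatArray: the class names, the growth policy and the method signatures are assumptions chosen to mirror the description (allocate, free, then pack() to defragment and get a view over the used part).

```typescript
// Hypothetical illustration only; not the Babylon.js class or its API.
class PackedElement {
  constructor(public offset: number) {}
}

class PackedFloatArraySketch {
  buffer: Float32Array;
  private slots: (PackedElement | null)[] = []; // slot index -> element (null = hole)
  private freeSlots: number[] = [];

  constructor(private readonly stride: number, initialCount: number) {
    this.buffer = new Float32Array(stride * Math.max(1, initialCount));
  }

  allocElement(): PackedElement {
    let slot = this.freeSlots.pop(); // reuse a hole if there is one
    if (slot === undefined) {
      slot = this.slots.length;
      const needed = (slot + 1) * this.stride;
      if (needed > this.buffer.length) {
        // Grow geometrically and copy; existing offsets stay valid.
        const grown = new Float32Array(Math.max(needed, this.buffer.length * 2));
        grown.set(this.buffer);
        this.buffer = grown;
      }
      this.slots.push(null);
    }
    const el = new PackedElement(slot * this.stride);
    this.slots[slot] = el;
    return el;
  }

  freeElement(el: PackedElement): void {
    const slot = el.offset / this.stride;
    if (this.slots[slot] !== el) return; // already freed
    this.slots[slot] = null;
    this.freeSlots.push(slot);
  }

  // Moves live elements down into holes (updating their offsets) and returns a
  // view over the used part only. The buffer itself is never shrunk.
  pack(): Float32Array {
    let write = 0;
    for (let read = 0; read < this.slots.length; read++) {
      const el = this.slots[read];
      if (el === null) continue;
      if (read !== write) {
        this.buffer.copyWithin(write * this.stride, read * this.stride, (read + 1) * this.stride);
        el.offset = write * this.stride;
        this.slots[write] = el;
      }
      write++;
    }
    this.slots.length = write;
    this.freeSlots = [];
    return this.buffer.subarray(0, write * this.stride);
  }
}
```

As in the description, offsets can change after pack(), so callers must re-read element.offset, and the returned view (not "buffer") is what would be uploaded. In this sketch growing the buffer also replaces it, so previously returned views become stale.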

 


The SPS is one big mesh, so it behaves like a mesh: one world matrix, one global culling test.

You don't need to update all the SPS parts (particles) each frame. You can update only the required ones, and only when needed, which is quite performant in this case:

setParticles(i, i) => updates only the i-th particle
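The range update above can be modeled without Babylon.js. This hedged TypeScript sketch only illustrates the idea of recomputing a slice: the function name, the one-vertex-per-particle layout (a real SPS particle has many vertices) and the returned dirty range are assumptions, not how SolidParticleSystem is implemented internally.

```typescript
// Hypothetical model: one xyz vertex per particle, stored in `positions`.
interface Particle {
  x: number;
  y: number;
  z: number;
}

// Recompute only particles start..end (inclusive, like setParticles(i, i))
// and write them back into the shared vertex data.
function updateParticleRange(
  particles: Particle[],
  positions: Float32Array,
  start: number,
  end: number,
  update: (p: Particle, index: number) => void,
): [number, number] {
  const last = Math.min(end, particles.length - 1);
  for (let i = start; i <= last; i++) {
    const p = particles[i];
    update(p, i);
    positions.set([p.x, p.y, p.z], i * 3);
  }
  // Float range [begin, endExclusive) that changed in this sketch.
  return [start * 3, (last + 1) * 3];
}
```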

You can also set/orient all your particles only once, at world creation for instance, and make the SPS immutable if nothing changes afterwards... or set everything once with an updatable SPS and then update only the parts that really need it.

Several people on this forum, like Iiceman or Temechon as far as I remember, have made games with an SPS-based world.

Could you reproduce a short prototype of what you're doing in the playground and tell us what you would expect from it, so we can try to help you?


I think the "disable culling" suggestion doesn't stop every mesh from being inserted into the world mesh tree, which still gets cull-tested, though I'm not sure. But the debug overlay says "mesh selection" still takes ~20ms per frame for 2000 meshes. My 3JS InstancedBufferGeometry approach is at least 5-10x faster.

I am willing to draft a prototype in the playground, but maybe someone can offer some direction first. Is the SPS really the way to go for this? Consider this screenshot:

[Screenshot: hex-field-instance-testing-babylon.png]

 

You could think of this as simply a heightfield represented by meshes.
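If each hex cell is an instance that differs only by position, the per-instance data reduces to one xyz offset per cell. A hedged TypeScript sketch for generating those offsets, assuming pointy-top hexagons in an "odd-r" offset layout (odd rows shifted by half a cell) and a caller-supplied height function; the function name is hypothetical and the layout in the screenshot may differ:

```typescript
// Per-instance xyz offsets for a cols x rows hex heightfield.
// Pointy-top hexes, odd rows shifted right by half a cell ("odd-r" layout).
function hexFieldOffsets(
  cols: number,
  rows: number,
  size: number, // hex radius (center to corner)
  height: (col: number, row: number) => number,
): Float32Array {
  const out = new Float32Array(cols * rows * 3);
  const width = Math.sqrt(3) * size; // horizontal spacing between hex centers
  let o = 0;
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      out[o++] = width * (col + 0.5 * (row & 1)); // x
      out[o++] = height(col, row); // y comes from the heightfield
      out[o++] = 1.5 * size * row; // z: rows are 3/2 * size apart
    }
  }
  return out;
}

// Example: a 100 x 100 field with a simple rolling height.
const offsets = hexFieldOffsets(100, 100, 1, (c, r) => Math.sin(c * 0.1) + Math.cos(r * 0.1));
console.log(offsets.length); // 30000
```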


If many particles need updating, could having two particle systems and alternating between them (render one this frame while updating the other, then swap next frame) speed things up?

I started writing a playground prototype of this below. How do I stop one SPS from drawing for the current frame (so that, in theory, its vertex buffer can be updated this frame without blocking on that GPU call)?

http://www.babylonjs-playground.com/#VHJYX#4


I don't think that's actually a feature of the SPS?

Anyway, I think for this idea I need to create my own version of the SolidParticleSystem with a double-buffered vertex buffer object, because right now, with a single buffer, the SolidParticleSystem becomes coupled to the GPU whenever vertex data changes, which is bad for performance.
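The ping-pong logic behind a double-buffered vertex buffer can be sketched in TypeScript as below. This only models the CPU side under stated assumptions: in a real WebGL version the two arrays would be backed by two WebGLBuffers, and you would draw from one while uploading into the other. The class and member names are hypothetical.

```typescript
// Two CPU-side buffers: the renderer reads `front` while the update step
// writes `back`; swap() exchanges their roles at a frame boundary.
class DoubleBuffer {
  private readonly buffers: [Float32Array, Float32Array];
  private frontIndex = 0;

  constructor(length: number) {
    this.buffers = [new Float32Array(length), new Float32Array(length)];
  }

  get front(): Float32Array {
    return this.buffers[this.frontIndex];
  }

  get back(): Float32Array {
    return this.buffers[1 - this.frontIndex];
  }

  swap(): void {
    this.frontIndex = 1 - this.frontIndex;
  }
}
```

Per frame the pattern would be: write new data into back, draw from front, then swap(). The extra buffer doubles memory, which is part of why the gain needs to be measured.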

Or please correct me if this is somehow already implemented in Babylon.js.


Are you sure performance is impacted by changing the content of the instanced vertex buffer before submitting it for rendering? Is it something you have actually proven?

I want to be sure, because a long time ago driver vendors/OpenGL/DX people told us we had to do that, then a few years later they said it was no longer true because they took care of it internally (the same way). So I wonder how it works with WebGL, but my understanding was that WebGL uses OpenGL internally, so...

Maybe the implementation on mobile is less advanced than on desktop; that would be something to check.

But double-buffering techniques should only be used when you are really sure you gain something in the end... a noticeable gain...
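In that spirit, a minimal way to check whether a change actually wins is to time both variants over many frames. This TypeScript sketch (hypothetical helper name) measures average CPU-side time only, using performance.now(), which is available in browsers and recent Node.js versions; GPU work that runs asynchronously will not show up unless the call blocks, so frame-time tools are still needed for the full picture.

```typescript
// Average CPU time in ms of `work` over `frames` calls, after a few warm-up calls.
function measureAverageMs(work: () => void, frames: number, warmup = 5): number {
  for (let i = 0; i < warmup; i++) work();
  const start = performance.now();
  for (let i = 0; i < frames; i++) work();
  return (performance.now() - start) / frames;
}

// Example: cost of rewriting 10000 xyz positions per "frame".
const vertexData = new Float32Array(10000 * 3);
const avgMs = measureAverageMs(() => {
  for (let i = 0; i < vertexData.length; i++) vertexData[i] += 1;
}, 60);
console.log(`average per frame: ${avgMs.toFixed(3)} ms`);
```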


That's a good point. Thanks :)

Changing the vertex buffer data is causing a very large drop in framerate for me.

As for this optimization, I don't know the answer for sure. I do notice that 3JS has a double-buffered particle system demo.

However, maybe I'll leave this alone for now and optimize it later, since ultimately the fastest way to do this requires WebGL 2.0 anyway, and I don't want to spend too much time on a WebGL 1.0 solution that saves only a small (or maybe zero) amount of render time.

By the way, I don't want to prematurely optimize, but the reason I am so focused on this right now is that the entire game world is made of these objects, so finding a performant way to draw them is going to be an important task at some point.

