Performance/draw calls

fenomas · February 15, 2015

Hi. Are there any guides that comment on how to make Babylon.js perform well?

I'm wondering about things like:

What causes a model to require more/fewer draw calls?
Does a multi-material mesh perform better than the same model split into several normal meshes?
For a static model, does the structure of the data affect performance? For example, if I put all the solid stuff in one child mesh and all the transparent stuff in another, would that be better than mixing them?
Is it possible to combine the textures for a particular model into one texture, and sample from it like a texture atlas? If so, would that require fewer draw calls?

Stuff like that. I couldn't find any documents that comment on this.

If nothing else, just an explanation of what causes BJS to separate draw calls would be really helpful.

Thanks!

davrous · February 15, 2015

Hi,

You can use our debug layer to better understand what cause slowdowns by enabling/disabling features: http://babylondoc.azurewebsites.net/page.php?p=22611

You can even go further by coupling it with user marks: http://blogs.msdn.com/b/eternalcoding/archive/2015/02/02/using-user-mark-to-analyze-performance-of-your-javascript-code.aspx

To answer your question, you should indeed group static geometries and use multimaterials. Michel, our 3D artist, has done a very interesting talk on how he optimized the Hill Valley scene: https://m.youtube.com/watch?v=ObZX541I-Tk

Bye,

David

fenomas · February 15, 2015

Thanks for the link! I found it a bit hard to follow, since it was mainly about optimizing things in 3D modeling tools. I'm working with babylon APIs.

There was a part in the middle where he went through an Autodesk process to collapse several materials into one. Is that what happens if I use submeshes in BJS? That is, if I have a mesh with N child meshes, each with its own material, and that takes N draw calls, should merging them into one multimaterial will reduce it to one draw?

GameMonetize · February 15, 2015

Unfortunately no, because each material requires a reset of the WebGL state. If many objects uses many materials there is no easy way to optimize because each object will need a reset hence a draw call

fenomas · February 15, 2015

I see, thanks Delta. In that case, what is the advantage for merging meshes and using multimaterials?

Wouldn't it be preferable to split up the mesh, and join together the parts that share a material?

That is, if each material causes a new draw call, is it not best to consolidate all static parts of a scene that use the same material?

GameMonetize · February 15, 2015

To reduce draw calls, ideally you should just have one big mesh with several submeshes (one per material),

The minimum draw calls you can have is equal to the number of active materials

fenomas · February 16, 2015

I see, thank you. So it sounds like using multi-materials doesn't necessarily affect performance.

In that case, in the Hill Valley talk where Michel talked about merging complex models into one mesh, can you tell me what was happening on the API side? Was it basically merging textures into an atlas and updating the model's uvs?

fenomas · February 16, 2015

And another follow-up: is there any case where two different meshes (with the same material) are drawn in one call?

Or is the minimum number of draws equal to (number of meshes) * (number of materials used per mesh)?

fenomas · February 17, 2015

Hi.. Perhaps I my question could be clearer with a concrete example.

Suppose I'm rendering Minecraft terrain in BJS. The terrain is split into chunks, and each chunk contains various different kinds of blocks, with different textures. Supposing that 100 chunks are drawn, and each has 10 different kinds of blocks, is there any way to avoid needing 1000 draw calls to rendering this terrain?

jahow · February 17, 2015

Well I guess you could easily go down to 10 draw calls by instancing these 10 base blocks.

I don't know much about multi materials and submeshes, but I know for sure that instancing is insanely useful when you need to render a lot of simple stuff on screen.

Edit: I just realized that you may have meant that each chunk has unique blocks to it... then yes simple instancing would in theory give 1000 draw calls. Hmm, not sure then, sorry :/

jahow · February 17, 2015

Also Minecraft uses one big texture atlas for all its blocks. So in theory your scene may contain only one material (if you can fit everything in it, which may not be possible in a practical case).

And if i'm understanding what Deltakosh said, then if you set all the blocks of the scene as submeshes of one big mesh, you may after all end up with only one big draw call... On the other hand, doing this would not draw benefit from the mesh instancing system. That may be costly if you're drawing thousands of cubes on screen.

Otherwise, if you indeed have 1000 unique meshes on screen, and you want to use instances, I guess you'll have to make those 1000 draw calls. Although it remains an extreme case in my opinion. Even Minecraft which is a pretty complex game has something like 150 unique block types, and can probably get away with no more than 50 unique types rendered at a time.

fenomas · February 17, 2015

Well I guess you could easily go down to 10 draw calls by instancing these 10 base blocks.

If you draw a mesh for every block, then I guess instancing would work, but the number of polys would get pretty crazy. For this reason Minecraft-style engines usually reduce the polygon count by joining neighboring blocks of the same type. If you google image search on "greedy meshing" you'll see what I mean.

In "real" voxel engines I assume they solve this problem by joining the (tiny) terrain textures into an atlas, and drawing all terrain from that single texture. I don't know if that's possible in BJS? Alternately, if there's some fancy way to batch-draw several meshes that all use the same texture? Or some other approach..

fenomas · February 17, 2015

Also Minecraft uses one big texture atlas for all its blocks. So in theory your scene may contain only one material (if you can fit everything in it, which may not be possible in a practical case).

And if i'm understanding what Deltakosh said, then if you set all the blocks of the scene as submeshes of one big mesh, you may after all end up with only one big draw call... On the other hand, doing this would not draw benefit from the mesh instancing system. That may be costly if you're drawing thousands of cubes on screen.

Sorry, I hadn't seen your edit or other reply yet.

To clarify, I'm just talking about something equivalent to Minecraft, so only a few dozen kinds of blocks. I know MC uses an atlas, what I don't know is whether that could be done programmatically in BJS.

As for making things submeshes of one big mesh, the hierarchy of meshes doesn't affect the number of draw calls, does it? At least I couldn't find a way to, other than actually merging geometries.

jahow · February 17, 2015

Well you can most definitely creates meshes for each block types you want, use one material for all of them and set their UV coordinates according to their texture's position on the atlas. If your question is: can I define meshes by code, setting position and UV data for each vertex, then the anwser is yes

But as you point it out, rendering vast amount of landmass based on voxels will probably need heavy meshing algorithms like the one you mentioned. So instancing won't be that useful here.

As for submeshes, I'm not familiar enough with this system, so I won't risk giving a false answer, sorry!

fenomas · February 18, 2015

If your question is: can I define meshes by code, setting position and UV data for each vertex, then the anwser is yes

No, that's the part I've already done.

My question is, how can I render a hundred chunks, each with a dozen block types, without needing a thousand draw calls?

jahow · February 18, 2015

Ok then!

How about: each time a new chunk is loaded, its blocks are merged into a few large meshes with an algorithm like greedy meshing. Hidden blocks are culled. All blocks of the same type are part of the same mesh.

Then, all those big meshes are assigned the same material. They will also be assigned a "texture index", which corresponds to the texture that will be used in the atlas.

The material used will have to use a custom shader that will repeat the selected part of the atlas along the whole mesh. This way, you won't have to stich together all the small textures into a big one used for rendering.

I found this article on the same site, which is pretty interesting and has a very good example of what I mean.

Also you'd have to handle the modification of chunks. Rebuilding the merged meshes each time might be too costly, I'm not sure... Have you tried looking at how minecraft does it by activating wireframe in it?

fenomas · February 18, 2015

Hi, thanks for the reply.

Apart from using a texture atlas, I'm already doing everything you've described - greedy meshing, managing chunks, etc.

Basically, since I'm dynamically generating all the meshes and textures and uvs etc, I thought there might be a best-practices way to structure the data so that it scaled better than the naive way. But it sounds like maybe not?

jahow · February 18, 2015

Not that I know of... It seems you already did quite some work on optimization, so I'm not sure I have any relevant advice to give you!

Do you have a specific performance problem or an identified bottleneck, like too many draw calls ? or too many vertices ?

Also, some more suggestions that come to my mind:

- it seems you're not using instanced meshes in your project. Although merging meshes might be useful for when many similar blocks are next to each other, instanced meshes would definitely be better for isolated blocks. Also it will allow you to more easily modify your chunk topology. Instanced meshes are VERY fast. Rendering hundreds of thousands of triangles is completely doable with it.

- in a minecraft-like game, most of the rendering time will probably be spent on the background, especially if you want nice & far landscapes. I think rendering the background might need a different technique, for example using vertex colors instead of textures for blocks, maybe even sprites? Also, simplifying chunks that are far away should be doable with the right algorithm.

- BJS offers an octree system to speed mesh culling operations; have you looked into that? link

I'm actually very interested in this whole topic (as you may have seen), since heavy optimization is an absolute necessity when creating a large-scale game.

JCPalmer · February 18, 2015

I read most of this thread, then did a search for the words GPU, CPU, & latency. None were found. I think much of the key to merging meshes for the reduction GPU calls is: the latency of the CPU to GPU call. The larger the amount of work done by an individual call, the fewer calls that need to be made. Each call has a latency which stalls all the shaders, so it is advisable to make them count.

If by dynamic generation, vertex data is changed often is meant, e.g. morphing, then more than a gl.DrawArrays is required. CPU to GPU data transfer is expensive. BABYLON.Mesh.MergeMeshes() is not possible for these.

I am not familiar with how hardware acceleration of Instances works, but probably not want to merge them. I wonder if clones are mergeable?

VertexData._ExtractFrom() does check isVerticesDataPresent(), but not clear.

If I may rephrase your best-practices request, I would like it spelled out all the instances where merging meshes WILL NOT work. Is it has updateable vertice data, instances, clones, has matricesWeights & matricesIndices, or changes (scale, position, or rotation)?

fenomas · February 18, 2015

Do you have a specific performance problem or an identified bottleneck, like too many draw calls ? or too many vertices ?

Also, some more suggestions that come to my mind:
- it seems you're not using instanced meshes in your project...

I'm actually very interested in this whole topic (as you may have seen), since heavy optimization is an absolute necessity when creating a large-scale game.

I have not stress tested yet, but I noticed that naively producing a Minecraft-sized world would conservatively need tens of thousands of draw calls just for terrain, so I wanted to look for better ways. Regarding instances, I don't know if it might work in some cases, but in my case I have implemented AO. So I couldn't just have an instance for every "grass" blockface, I'd need a set. I imagine lighting would be a problem as well (I mean MC-style where lighting is part of the vertex colors or material. I assume instances don't have those, right?)

As for scaling, I guess maybe texture atlas is the only real solution for something like this. Not that I'm eager to write shaders..

jahow · February 18, 2015

Regarding instances, I don't know if it might work in some cases, but in my case I have implemented AO. So I couldn't just have an instance for every "grass" blockface, I'd need a set. I imagine lighting would be a problem as well (I mean MC-style where lighting is part of the vertex colors or material. I assume instances don't have those, right?)

Ha! Hadn't thought of that. Seriously though, all these issues have been tackled in Minecraft so I think a very good thing to do (if not already done) would be to try and figure out how that works.

Instances all share the same material, and I haven't find a way to define custom properties on them that may be used in the material's shader program. So changing vertex color will be an issue with them. I guess it kind of rules this solution out.

Ok so let's keep greedy meshing as the first step of the optimization. This is basically a simple way of reducing the number of vertices. Then, you will have to take that draw call count down, as well as reducing the amount of CPU to GPU calls, like JCPalmer said. To do that, I guess you will have to merge geometry at some point: this way, each GPU buffer will contain as many blocks as possible, and all these will be rendered in one draw call.

As a first approach, I'd say: merge each chunk as one big mesh. Using submeshes will improve the culling of its blocks, but that can come later. Then, I can think of 2 methods:

1. allow some chunks to become 'active', meaning that they're not just one big mesh but several smaller, separate meshes. Chunks can for example be marked as active when a player or NPC capable of modifying blocks is into them.

2. keep the 'one chunk = one mesh' system all the time. When a block is modified in a chunk, go and update only the specific part of the GPU buffers to reflect that change in the chunk geometry. Even better: batch modifications so you don't do 10 CPU to GPU calls each frame, but wait for a frame or two to gather as much geometry change as possible, and then inject them in the GPU buffers in one quick move.

As to 'how to do it in BJS', here is a link to a great article by Temechon in case you need it.

What do you think? Is that giving you any lead?

fenomas · February 19, 2015

Jahow, thank you for the reply but what you're describing amounts to using larger chunks, or variable sized chunks. And certainly that might help, but it's orthogonal to BJS.

Basically my goal here was to find out whether I'm overlooking any BJS features that allow minecraft-style rendering to scale better. Such as, a way to speed up rendering when lots of meshes share a material, or a way to preprocess multiple textures into one material (which is what minecraft does, and voxel.js, and I assume other engines).

If there aren't, then naturally when I hit a bottleneck I will have to simplify the data or use fewer materials. (Or implement my own texture atlases...)

fenomas · February 19, 2015

Actually, look back over this I'm still confused on one point.

Can anyone confirm why it's beneficial to use submeshes and multi-materials?

In my testing the number of draw calls is the same either way, but is there some other benefit?

jahow · February 19, 2015

Hey,

Using submeshes will improve partial mesh culling and collision detection: by using an octree, BJS will be able to speed up these computations.

Other tools BJS provides that may help you improve performance, that I know of:

- scene octree

- instancing

- scene optimizer

- level-of-detail & in-browser mesh simplification

- mesh merging

Hope that finally answers your question

fenomas · February 19, 2015

Hm. Well, I don't think any of those apply, so I guess that would depend on whether your list is exhaustive. I think probably my initial question here was too vague. Now that we have narrowed down the problem, maybe I'd better post a follow-up.

Thanks for helping refine the problem!

Performance/draw calls

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members