fenomas

  1. So, the absolute numbers will change from profile to profile, depending on your machine's CPU and so forth. What I usually do is compare the total time spent executing scripts to the time spent at some point in the "call tree" graph. For example, if the root of the tree ("Animation frame fired") has a total time of 50%, and further down in the tree "computeWorldMatrix" has a total time of 25%, one can say that computeWorldMatrix accounts for about half the script execution time. On a slower machine it might be 80% and 40%, so the absolute numbers can be misleading, but the ratios tell you what's going on.

When you start to talk about stuff like this, you really have to know what's going on inside V8 to make predictions about what will improve performance. For code like this, just because there are a lot of "var tm5 = this.m[5];" statements doesn't necessarily mean that the JS engine is allocating new floats onto the stack - the optimizing compiler does a lot of magic and it's hard to predict how it all works.

The best way I've found to test performance improvements for low-level stuff like this is to make two versions of the function I want to compare, and then route the code so that it alternates between each version. Then you can just profile and see which function took more execution time. For example, here's what this would look like for testing multiplyToArray: http://www.babylonjs-playground.com/#E2HVNG#1 Down at the bottom you can see that I define two alternate versions, one just like the original and one that doesn't declare temp vars. If you profile that you should find that the original version is somewhat faster than the alternate. (Not to say that the function can't be improved - I think I can speed it up moderately, but the other stuff you're looking at sounds more likely to be valuable.)

This part of the code I don't understand at all, but if calls can be skipped that'd be cool. Thanks for looking at it!
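To illustrate the routing idea above with a minimal sketch (the sum functions here are just placeholders to show the pattern, not the real matrix code):

```js
// Two throwaway versions of a function to compare - placeholders, not the
// real multiplyToArray code.
function sumWithTemps(m) {
  var m0 = m[0], m1 = m[1], m2 = m[2], m3 = m[3];
  return m0 + m1 + m2 + m3;
}
function sumWithoutTemps(m) {
  return m[0] + m[1] + m[2] + m[3];
}

// Route calls so they alternate between the two versions, then profile and
// compare how much self time each one accumulates.
var toggle = false;
function sumRouted(m) {
  toggle = !toggle;
  return toggle ? sumWithTemps(m) : sumWithoutTemps(m);
}

// e.g. hammer it from the render loop:
// scene.registerBeforeRender(function () {
//   for (var i = 0; i < 10000; i++) sumRouted([1, 2, 3, 4]);
// });
```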
  2. This was my expectation as well, but for scenes with lots of simple meshes it seems to be the bottleneck, by a long shot. Here's a simple pg that demonstrates roughly what I'm talking about - for me, profiling that shows that about 50% of the total scripting time is spent inside computeWorldMatrix. (Profiling in the playground is iffy, but if you load that link and profile it without changing anything, nothing should get deopted so it should be fine.) I am using Octrees - performance is better with them than without them, but computeWorldMatrix is still the biggest bottleneck either way.
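For reference, the scene is roughly along these lines (a from-memory sketch, not the actual PG - it assumes the playground's usual `scene` variable):

```js
// Lots of simple, non-static meshes moving every frame - roughly the kind
// of scene I mean (sketch, not the original playground).
var boxes = [];
for (var i = 0; i < 800; i++) {
  var box = BABYLON.MeshBuilder.CreateBox("box" + i, { size: 0.5 }, scene);
  box.position.x = Math.random() * 50 - 25;
  box.position.z = Math.random() * 50 - 25;
  boxes.push(box);
}

scene.registerBeforeRender(function () {
  // Moving each mesh dirties its world matrix, so evaluateActiveMeshes ends
  // up calling computeWorldMatrix on all of them every frame.
  for (var i = 0; i < boxes.length; i++) {
    boxes[i].position.y = Math.sin(Date.now() * 0.001 + i);
  }
});
```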
  3. Hi, I have a scene with around ~800 non-static meshes moving around, and I find the main performance bottleneck is the time taken by Babylon calling evaluateActiveMeshes, which in turn calls computeWorldMatrix and _updateBoundingInfo on most of the meshes. However, the nature of my scene is that most of the meshes never rotate, and I separately track their locations and bounding info. So in principle, it seems like I could tell Babylon they're static (by calling freezeWorldMatrix?), and then manually update their boundingInfo objects, and set their worldMatrices to simple translation matrices. Would this be a safe approach, or has anyone tried it? Or is there some built-in way of achieving a similar result? Or does freezing the world matrix have other implications that would cause this to break? Thanks!
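To be concrete, this is roughly what I have in mind (untested - `mesh` is one of the non-rotating meshes, and I don't know whether mutating the frozen matrix and bounding info like this is actually supported):

```js
// Untested sketch: freeze the world matrix once, then update the frozen
// matrix and the bounding info by hand whenever the mesh translates.
mesh.freezeWorldMatrix();

function moveFrozenMesh(mesh, x, y, z) {
  mesh.position.copyFromFloats(x, y, z); // keep position in sync for my own bookkeeping
  // Overwrite the frozen world matrix with a plain translation matrix.
  BABYLON.Matrix.TranslationToRef(x, y, z, mesh.getWorldMatrix());
  // Recompute the bounding box/sphere in world space from that matrix.
  mesh.getBoundingInfo().update(mesh.getWorldMatrix());
}
```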
  4. @Temechon Did you ever get a chance to look at this? The inspector layer is still broken for anyone building JS projects with bundlers like webpack.

To recap the problem: the inspector bundle calls a function called "__extends" which isn't defined anywhere in the bundle. The Babylon library bundles (babylon.max.js, etc.) define "__extends" as a local variable, so if you load them directly into the HTML page they pollute global scope with the __extends function, and the inspector doesn't break. However, if you require Babylon.js in from a bundler, the Babylon code runs inside a closure, global scope doesn't get polluted, and the inspector code crashes trying to call undefined().

To recap the fix: I believe it has to do with how the inspector bundle is compiled. There should be a TypeScript flag that controls whether the TypeScript helper boilerplate (including __extends) gets included in the bundle.
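In the meantime, an untested workaround on the app side might be to define the standard TypeScript helper globally before loading the inspector, something like:

```js
// Untested workaround: put the standard TypeScript __extends helper on the
// global object before loading the inspector bundle, so its bare call to
// __extends doesn't hit undefined.
window.__extends = window.__extends || function (d, b) {
  for (var p in b) {
    if (b.hasOwnProperty(p)) d[p] = b[p];
  }
  function __() { this.constructor = d; }
  d.prototype = b === null ? Object.create(b) : ((__.prototype = b.prototype), new __());
};
```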
  5. When you apply a texture to a mesh, every fragment of the mesh will get painted with a pixel of the texture. Wrapping vs. clamping just changes which pixel. So if you want part of the mesh not to have any texture, you'll need to split the mesh into separate parts, or change the texture to have transparent pixels, yeah.
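If you go the transparent-pixels route, the relevant bits are roughly this (a sketch - the texture path is a placeholder, and it assumes a StandardMaterial):

```js
// Sketch: a texture with an alpha channel, with hasAlpha set so the
// transparent pixels get discarded instead of drawn.
var mat = new BABYLON.StandardMaterial("mat", scene);
mat.diffuseTexture = new BABYLON.Texture("textures/textureWithAlpha.png", scene); // placeholder path
mat.diffuseTexture.hasAlpha = true;
mesh.material = mat;
```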
  6. You mean like this? https://www.babylonjs-playground.com/#NXVFFK#1 If so, the clamp mode is the problem. When you set an offset, the edge of the texture moves to the middle of the mesh. Setting clamp mode tells the engine to keep drawing the edge of the texture beyond that point.
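For reference, the settings involved are roughly these (the texture path is a placeholder):

```js
// With an offset, CLAMP stretches the texture's edge pixels past the seam,
// while WRAP (the default) tiles the whole texture instead.
var tex = new BABYLON.Texture("textures/someTexture.jpg", scene); // placeholder path
tex.uOffset = 0.5; // shifts the texture so its edge lands mid-mesh

tex.wrapU = BABYLON.Texture.CLAMP_ADDRESSMODE; // edge pixels keep getting drawn beyond that point
// tex.wrapU = BABYLON.Texture.WRAP_ADDRESSMODE; // ...or tile the texture instead
```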
  7. Warning: link in first post is a CPU bomb. Don't open it. Here is the same playground with the number of particles scaled down. As for your question, well, you're creating an SPS with 6M vertices, every single one of which gets updated every frame, so that's liable to be a little slow. You might try using a simpler mesh for your particles, instead of spheres with 120 vertices each, but you may also need to preprocess your point cloud to reduce the density. Also, for me in Chrome at least the core SPS function is getting deoptimized. That kills performance but it's probably a side effect of the playground. Keep in mind that the PG is not for high-performance; most scenes will run faster in a regular page.
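For the simpler-mesh part, something along these lines (a sketch - the polyhedron and the particle count are just placeholders):

```js
// Build the SPS from a low-vertex shape instead of a dense sphere - a
// tetrahedron has 4 vertices vs ~120, so setParticles pushes far less data.
var shape = BABYLON.MeshBuilder.CreatePolyhedron("tetra", { type: 0, size: 0.5 }, scene);

var sps = new BABYLON.SolidParticleSystem("sps", scene);
sps.addShape(shape, 10000); // particle count - tune to your (reduced) point cloud
var spsMesh = sps.buildMesh();
shape.dispose(); // the model mesh isn't needed once the SPS is built
```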
  8. Yes, I follow you and agree with what you're saying. If you don't process every particle, inactive ones will get into the view, and one has to suspect that odd things will happen (whether with picking, or artifacts due to the material used, etc.). Basically, if the number of particles you need isn't constant, it feels like you should create a system large enough for your peak needs, and then just turn particles on and off as needed. But looking at the inner loop, dead particles are only moderately less costly than live ones, so it probably makes more sense to make several smaller SPSs and dispose the ones that aren't needed. Looking at it very naively, I believe the GL call that renders VBOs lets you specify start and end points, so if BJS exposed that somehow then SPS could just skip dead particles entirely, both processing and rendering. But that's well into @Deltakosh territory, and even if you could skip vertices for rendering, making it work well with other systems (e.g. picking) would probably be a lot of work...
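Very roughly, the chunked idea would look like this (a sketch - the chunk size and count are arbitrary, and `particleShape` stands for whatever low-poly model mesh is being used):

```js
// Rough sketch: build the particle pool as several smaller SPSs, and
// dispose whole chunks when they're no longer needed.
var CHUNK_SIZE = 1000;
var chunks = [];
for (var i = 0; i < 10; i++) {
  var sps = new BABYLON.SolidParticleSystem("chunk" + i, scene);
  sps.addShape(particleShape, CHUNK_SIZE);
  sps.buildMesh();
  chunks.push(sps);
}

// Later, when some chunk's particles are all dead:
// chunks[someIndex].dispose();
```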
  9. Thanks Jerome, I've confirmed the fix in my project with new nightlies!
  10. Yeah, I saw that "alive" wasn't being used, so I ignored it. It would make sense to implement something with that, yes! But if dead particles are skipped entirely by setParticles, then recycling a particle by marking it invisible and dead wouldn't work, right? You'd need to mark it invisible but not dead, then call setParticles on it, and then mark it dead, or else it would remain visible, right? (And incidentally, doing this on individual particles looks like a bad idea due to overhead. If you call "setParticles(n, n)" on 500 different values of n, then if billboard is set you just calculated view matrices 500 times?) The background here is, when you render a VBO/IBO in GL, you specify start and end points right? I had thought that maybe Babylon exposed this somehow, so that if I kept my particles list partitioned between live and dead elements, I could tell BJS the span of live indices and the dead ones would get skipped entirely (so there was no need to scale them down or move them somewhere invisible). But it looks like BJS doesn't expose anything like this, right?
  11. Hi, I have a question about an SPS where not all of the particles are in use. That is, suppose the SPS has 10,000 particles in total, but at the moment only 5,000 of them are visible, and the other 5,000 are inactive, ready to be emitted. At first glance, since setParticles takes "start, end" parameters, it looks like one can save performance by keeping the 5,000 dead particles at the end of the particles array. That way you can call "setParticles(0, 5000)" and the update loop won't have to visit the dead particles. However, if I understand SPS correctly, it doesn't have any mechanism to skip rendering dead particles - it renders them, but if they're invisible it scales them down and moves them inside the camera frustum, right? And since that happens inside setParticles, this means that setParticles needs to be called even on invisible particles, right? So is there any way to tell the SPS to skip unneeded particles? Or do you just need to mark them invisible but still include them in the setParticles loop so that they get moved to places where they don't render?
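To make the question concrete, this is roughly the pattern I'm imagining (the counts are placeholders, and `sps` is the system described above):

```js
// Keep live particles at the front of the array and only run the update
// loop over them - the question is whether the dead tail still needs to be
// visited at all.
var liveCount = 5000; // however many particles are currently active

sps.updateParticle = function (p) {
  // ...move/update live particles here...
  return p;
};

scene.registerBeforeRender(function () {
  // the end index is inclusive, so this updates particles 0 .. liveCount-1
  sps.setParticles(0, liveCount - 1);
});
```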
  12. Repro: http://www.babylonjs-playground.com/#8J3QW9 I haven't checked, but I'll bet anyone a doughnut that the fix is: s/camera.position/camera._globalPosition/g
  13. Follow-up: looking at the sorting feature Three does, I don't think BJS does this, but I've no idea if it would help. You could try it easily enough - AFAIK if you manually sort the "scene.meshes" list by distance every N frames it ought to have the same effect.
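Something like this is what I mean (untested - `SORT_INTERVAL` is arbitrary, and I don't actually know whether sorting scene.meshes in place is safe):

```js
// Untested sketch: re-sort scene.meshes front-to-back every N frames, to
// mimic the sorting Three does.
var SORT_INTERVAL = 30; // frames between sorts - arbitrary
var frameCount = 0;

scene.registerBeforeRender(function () {
  if (++frameCount % SORT_INTERVAL !== 0) return;
  var camPos = scene.activeCamera.position;
  scene.meshes.sort(function (a, b) {
    return BABYLON.Vector3.DistanceSquared(a.position, camPos)
         - BABYLON.Vector3.DistanceSquared(b.position, camPos);
  });
});
```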
  14. This is a kind of dangerous way to do things. All those techniques can slow down a scene as well as speed it up; if you're not profiling often it's hard to know whether they're helping.

I don't think there's any general thing that can be done automatically for occlusion culling - if there were, every engine would do it already. There are apparently GPU culling techniques (I'm not familiar with them), but they need to be designed around the content. So I think your best bet for culling is what Jerome said - it will help if it can be done manually, based on unchanging things in your scene. For example, if there's a building in the middle of the scene, and you can know that when the player is on one side of it then things on the other side are occluded, etc. Or more generally, you could take some kind of approach of choosing the most complex geometries in the scene and pre-calculating some data structure of where they're visible from. But any kind of general occlusion testing that's done per-frame is going to cost more than it saves.
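To make the manual idea concrete, a sketch (the occluder position and the list of hidden meshes are placeholders for whatever you actually know about your own scene):

```js
// Sketch: if the player is known to be on the near side of a big occluder,
// disable the meshes you know are hidden behind it.
var occluderX = 0; // placeholder: the building sits at x = 0
var meshesBehindBuilding = []; // placeholder: meshes on the far (+x) side

scene.registerBeforeRender(function () {
  var playerOnNearSide = scene.activeCamera.position.x < occluderX;
  for (var i = 0; i < meshesBehindBuilding.length; i++) {
    // a disabled mesh is skipped entirely (no world matrix, no draw call)
    meshesBehindBuilding[i].setEnabled(!playerOnNearSide);
  }
});
```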
  15. Listen to Adam! You can't speed up a scene unless you know where the bottleneck is. If your scene just has too many vertices, then culling objects might help, but if the problem is on the CPU side (and it usually is), adding more CPU work before rendering won't help. Things also depend on mobile vs. desktop, etc. If you have a lot of meshes, frustum culling may be a huge issue - it certainly was for me. Using octrees and merging meshes (with SPS and with submeshes/multimaterials) helped this enormously. But it could be problems with materials, or something else - you have to know what's slow.
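For reference, the calls involved are roughly these (a sketch - the octree parameters and `staticMeshes` are placeholders, and in my case the merging was actually done with SPS and submeshes/multimaterials rather than MergeMeshes):

```js
// 1. A selection octree, so frustum culling doesn't test every mesh:
scene.createOrUpdateSelectionOctree(64, 2); // maxCapacity, maxDepth - tune per scene

// 2. Merging static meshes that share a material, to cut down mesh count:
var merged = BABYLON.Mesh.MergeMeshes(staticMeshes, true); // true disposes the source meshes
```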