# Expensive computing solution.

This is a question about theory, so I don't have a PG (playground) for it.

Assume I have a 3D scene with many meshes. Is there a best approach for computing the distance between every pair of meshes, updated continuously on each render tick? Distance calculation is just an example; what I am really asking about is the best approach for expensive computation in general. The target is modern mobile browsers, so performance is important.

So far, the first solution I can think of is a web worker. Running the calculation asynchronously on a separate thread avoids the lag caused by an expensive calculation on the main thread. Everything is still done by the CPU.

The second solution is http://gpu.rocks. I don't understand it clearly yet, but is there a way to do the calculation on the GPU? Has anybody tried this?
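A minimal sketch of the web worker idea, for illustration only. The names `computeDistances` and `startDistanceWorker`, the flat `Float32Array` position layout and the message shape are all assumptions, not taken from any existing project. The pure function can be tried anywhere; the worker wiring only makes sense in a browser.

```javascript
// Euclidean distances for all unique pairs of points.
// `positions` is a flat Float32Array: [x0, y0, z0, x1, y1, z1, ...].
function computeDistances(positions) {
  const n = positions.length / 3;
  const out = new Float32Array((n * (n - 1)) / 2);
  let k = 0;
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      const dx = positions[3 * i] - positions[3 * j];
      const dy = positions[3 * i + 1] - positions[3 * j + 1];
      const dz = positions[3 * i + 2] - positions[3 * j + 2];
      out[k++] = Math.sqrt(dx * dx + dy * dy + dz * dz);
    }
  }
  return out;
}

// Browser-only wiring (hypothetical helper): builds a worker whose source is
// the function above plus a small message handler, so no extra file is needed.
function startDistanceWorker(onResult) {
  const source =
    computeDistances.toString() +
    "\nonmessage = function (e) {" +
    "\n  const result = computeDistances(e.data);" +
    "\n  postMessage(result, [result.buffer]); // transfer, do not copy" +
    "\n};";
  const url = URL.createObjectURL(new Blob([source], { type: "text/javascript" }));
  const worker = new Worker(url);
  worker.onmessage = (e) => onResult(e.data);
  return worker;
}
```

In a render loop one would post a fresh copy of the positions only after the previous result has arrived, so the rendering never waits on the worker and simply uses the most recent distances it has.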

##### Share on other sites

A web worker has high latency, but maybe you do not really need the calculation on every tick, so you can decouple the calculation from the rendering.

Do you need the distance from every mesh to every other mesh (N x N), or are you just searching for the closest n other meshes?
In the latter case you can use a broadphase optimization. For example, sort the bodies along each of the three axes and calculate distances only for the closest ones on each axis. Or use some kind of hashing algorithm (buckets): round each mesh's coordinates to the nearest GRID_SIZE so that nearby meshes land in the same bucket, then compute distances only within the same and the neighbouring grid cells.

It all depends on what you want to do with the distances.
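The bucket idea could be sketched roughly like this. It is an illustration under assumptions: the value of `GRID_SIZE`, the plain `{x, y, z}` points and the function names are made up here, and the sketch uses `Math.floor` for the cell index rather than rounding to the nearest multiple.

```javascript
// Hypothetical cell size; tune it to roughly the search radius you care about.
const GRID_SIZE = 10;

// Integer cell coordinate along one axis.
function cellCoord(value) {
  return Math.floor(value / GRID_SIZE);
}

// Build the buckets: cell key -> array of indices into `points`.
function buildGrid(points) {
  const grid = new Map();
  points.forEach((p, index) => {
    const key = cellCoord(p.x) + "," + cellCoord(p.y) + "," + cellCoord(p.z);
    if (!grid.has(key)) grid.set(key, []);
    grid.get(key).push(index);
  });
  return grid;
}

// Candidates for a point: everything in its own cell and the 26 neighbours.
// Exact distances then only need to be computed for these candidates.
function nearbyIndices(grid, points, index) {
  const p = points[index];
  const cx = cellCoord(p.x);
  const cy = cellCoord(p.y);
  const cz = cellCoord(p.z);
  const result = [];
  for (let dx = -1; dx <= 1; dx++) {
    for (let dy = -1; dy <= 1; dy++) {
      for (let dz = -1; dz <= 1; dz++) {
        const bucket = grid.get((cx + dx) + "," + (cy + dy) + "," + (cz + dz));
        if (!bucket) continue;
        for (const other of bucket) {
          if (other !== index) result.push(other);
        }
      }
    }
  }
  return result;
}
```

The grid has to be rebuilt (or updated) when meshes move, but that is a single pass over the meshes, whereas the all-pairs comparison grows with the square of the mesh count.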

Thanks for suggesting gpu.rocks!

##### Share on other sites

@BitOfGold: Yes, actually I have had some success with web workers. GPUJS is my new attempt to get my 3D game's performance as close to native as possible, and distance calculation is just a "test case". If you or someone else has experience with it and knows its pros, cons and use cases, could you shed some light?

##### Share on other sites

I don't think doing GPGPU in a WebGL game is a good idea right now. It's true that calculating matrix transformations is super fast on the GPU and that the GPU is ideal for general-purpose calculation, especially with compute shaders. The problem is that WebGL currently offers no optimized path for synchronizing data from device memory back to host memory. It might be great for generating a single immutable block of data at initialization, but not for running GPU calculations every frame (updating meshes, calculating physics, etc.). The main bottleneck you'll hit is glReadPixels. You can even try it with the gpu.rocks demo: run that test while rendering a game. Maybe it will become possible when the standard adds compute shaders and PBOs to WebGL.

##### Share on other sites

@Felipe: That is sad news. You are right: however fast the GPU calculation itself is, a slow transfer of the results back from GLSL can ruin my game. I already ran some tests with gpuRock and turboJS; even for a simple sum, the response time is always above 16 ms, which means it cannot fit in a single frame. Maybe I should stick with web workers and wait for some day...

##### Share on other sites

Are you sure you're already doing the fastest thing you can do at the algorithmic level?

Often there are solutions to problems like this that involve, for example, pre-computing something once such that the thing you need to know can then be tracked incrementally each frame. Or alternatively, there could be a way to approximate what you need much more cheaply than getting the exact answer, etc.

If you can describe what you're doing a little more, people may be able to help.

##### Share on other sites

@fenomas: I'm just trying to figure out the best solution for complex computation without burning the CPU or freezing the scene.

Let's say I want to try building an occlusion culling solution with BabylonJS: cast a ray from the camera to every static mesh in the scene (buildings) and choose which ones are "in view" to help optimize the render process. But since I have a lot of meshes (a large city), the ray collision computation for each mesh can be expensive. Yes, I know we already have many optimization solutions (octrees, merging meshes, etc.). I just wonder if GPU parallel computing can do the trick; if I have complex gameplay, I don't want the CPU to handle too much, especially on mobile targets.

But my first attempt looks bad. I tried a simple calculation to check the response time of GPUrock:

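```javascript
var gpu = new GPU();

// Trivial kernel: add two numbers and return the sum.
var test = gpu.createKernel(function (A, B) {
    var sum = A + B;
    return sum;
}).dimensions([1]);

// Time a single call. This is the first call, so the measurement may
// also include one-time kernel setup, not just the addition itself.
var startTime = new Date().getTime();
test(4, 5);
var endTime = new Date().getTime();

var timeCost = endTime - startTime; // elapsed milliseconds
```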

The calculation result is 9, but timeCost is too high: about 50-150 ms, which cannot fit in a single frame (16 ms). I think that is enough to call it a bad solution.

I also tried TurboJS. It is faster, but still does not fit in 16 ms for a simple calculation.

##### Share on other sites

Here's a PG example calculating the distance between 251 meshes using simple math;
on my PC it takes around 10-20 ms (+/-) before optimization.

That's roughly 62,750 3D position calculations each frame.

##### Share on other sites

7-9ms on my PC.
With Euclidean distance (real distance between points):
http://www.babylonjs-playground.com/#1JB1NX#1
Much slower of course. (25ms)

##### Share on other sites

3 hours ago, tranlong021988 said:

> @fenomas: I'm just trying to figure out the best solution for complex computation without burning the CPU or freezing the scene.
>
> Let's say I want to try building an occlusion culling solution with BabylonJS: cast a ray from the camera to every static mesh in the scene (buildings) and choose which ones are "in view" to help optimize the render process. But since I have a lot of meshes (a large city), the ray collision computation for each mesh can be expensive. Yes, I know we already have many optimization solutions (octrees, merging meshes, etc.). I just wonder if GPU parallel computing can do the trick; if I have complex gameplay, I don't want the CPU to handle too much, especially on mobile targets.

I strongly suspect you'll have better results setting up a normal scene, measuring for any performance problems, and then fixing them with the usual methods (octrees, e.g.).

The trouble with offloading calculations to the GPU is, you need to run several steps in sequence: (A) send data to the GPU, (B) run calculations there, (C) wait for the results to come back to the CPU, (D) use those results to update your scene, (E) render the scene. Even though (B) may be very fast, (C) is very slow - probably much slower than just performing (B) on the CPU.

In other words - if there was an easy way to do frustum culling (for example) on the GPU, Babylon would probably already be doing it. If you have optimizations you can do that are specific to your particular scene, I think you'll find that doing those optimizations on the CPU before rendering will be the best bet.

##### Share on other sites

Two simple little optimizations:

- instead of creating a new Vector3 for the subtraction between two meshes, store the result in a predefined temporary one using subtractToRef(), so the memory allocator and the GC have no work to do

- instead of comparing the distance between one mesh and all the other meshes each frame, compare it only with the remaining meshes: start the second loop from i+1 instead of 0 (the first i distances were already computed)

(Actually, I think it's even faster than the displayed results, because console.log() is a really slow method.)
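Both points can be sketched in plain JavaScript as below. This is only an illustration: the free function `subtractToRef` here is a stand-in that works on plain `{x, y, z}` objects and mimics the idea of Babylon's method (write the result into an existing object), and `pairwiseDistances` and its `visit` callback are hypothetical names.

```javascript
// Scratch vector, allocated once and reused for every pair.
const tmp = { x: 0, y: 0, z: 0 };

// Write a - b into `result` instead of allocating a new object.
function subtractToRef(a, b, result) {
  result.x = a.x - b.x;
  result.y = a.y - b.y;
  result.z = a.z - b.z;
  return result;
}

// Visit every unique pair once: the inner loop starts at i + 1,
// because the pairs (j, i) with j < i were already handled.
// Returns the number of pairs visited.
function pairwiseDistances(positions, visit) {
  let pairs = 0;
  for (let i = 0; i < positions.length; i++) {
    for (let j = i + 1; j < positions.length; j++) {
      subtractToRef(positions[i], positions[j], tmp);
      const distance = Math.sqrt(tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z);
      visit(i, j, distance);
      pairs++;
    }
  }
  return pairs;
}
```

For 251 meshes this visits 251 × 250 / 2 = 31,375 pairs instead of 62,750, and the loop body allocates nothing.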

##### Share on other sites

Thank you both for your PGs. They take about 15-30 ms on my PC, maybe slower than your results because of the difference in CPU power. But it would still be a good solution if I move that calculation into a web worker, since the asynchronous response helps avoid freezing/glitching. Actually, I already applied this solution in a past game: http://appsbymekong.com/flashdemo/vrgame/optimized4/ (mobile only, since it uses the device orientation API).

As I said, distance calculation is just an example, and so far our solutions are almost entirely CPU-based. My main point is curiosity about what the GPU can do: whether we can use the GPU in some cases and go easy on the CPU. :))

(Oh my stupid English skill).

##### Share on other sites

42 minutes ago, BitOfGold said:

> using BABYLON's vector length:
> http://www.babylonjs-playground.com/#1JB1NX#2
> 2ms!!! How on earth it is faster?

Just an educated guess, but probably because your "getDistance" function was wrapped up inside a bunch of playground-specific contexts and evals and whatnot. Here's a tweaked version of your first link that should run faster. Don't ask me specifically why that tweak works - I suspect (but haven't tested) that this tweak wouldn't have been necessary outside the playground.

Playground is great for many things but it's not good for performance tests.

##### Share on other sites

36 minutes ago, fenomas said:

> I strongly suspect you'll have better results setting up a normal scene, measuring for any performance problems, and then fixing them with the usual methods (octrees, e.g.).
>
> The trouble with offloading calculations to the GPU is, you need to run several steps in sequence: (A) send data to the GPU, (B) run calculations there, (C) wait for the results to come back to the CPU, (D) use those results to update your scene, (E) render the scene. Even though (B) may be very fast, (C) is very slow - probably much slower than just performing (B) on the CPU.
>
> In other words - if there was an easy way to do frustum culling (for example) on the GPU, Babylon would probably already be doing it. If you have optimizations you can do that are specific to your particular scene, I think you'll find that doing those optimizations on the CPU before rendering will be the best bet.

Yeah, I was dreaming again. And "we cannot get everything we want" (I can't remember who said this) :))

Btw, about occlusion culling, is there any good news with WebGL2, and any plan to support it in BJS?

##### Share on other sites

54 minutes ago, tranlong021988 said:

> Yeah, I was dreaming again. And "we cannot get everything we want" (I can't remember who said this) :))
>
> Btw, about occlusion culling, is there any good news with WebGL2, and any plan to support it in BJS?

"There's no such thing as a free lunch", as they say

But with that said, remember - Babylon already does most of the general optimizations that it can do, but that doesn't mean you can't still do huge optimizations that are specific to your content. For example, if your content is a city, and the player is right next to a building, it might be very easy to (manually) cull a certain set of meshes that you know are on the opposite side of the building. Babylon can't do that automatically because it can't (easily) know that the building is fully opaque, but if you know that, you can use that information.

Also, in my game that I've been making, early on culling was the biggest performance cost, but I've found that octrees have completely solved the problem. It took some effort to make it work, but I managed it because I had a very specific idea of what was slow. Make sure you don't try to optimize too early, before you know where your bottleneck is.

##### Share on other sites

3 minutes ago, fenomas said:

> "There's no such thing as a free lunch", as they say
>
> But with that said, remember - Babylon already does most of the general optimizations that it can do, but that doesn't mean you can't still do huge optimizations that are specific to your content. For example, if your content is a city, and the player is right next to a building, it might be very easy to (manually) cull a certain set of meshes that you know are on the opposite side of the building. Babylon can't do that automatically because it can't (easily) know that the building is fully opaque, but if you know that, you can use that information.
>
> Also, in my game that I've been making, early on culling was the biggest performance cost, but I've found that octrees have completely solved the problem. It took some effort to make it work, but I managed it because I had a very specific idea of what was slow. Make sure you don't try to optimize too early, before you know where your bottleneck is.

Thanks for the advice. Have a good day. :))

##### Share on other sites

On 25/02/2017 at 10:04 AM, aWeirdo said:

> Here's a PG example calculating the distance between 251 meshes using simple math;
> on my PC it takes around 10-20 ms (+/-) before optimization.
>
> That's roughly 62,750 3D position calculations each frame.

@aWeirdo I tried the exact same code with the built-in function of BJS: http://www.babylonjs-playground.com/#1JB1NX#8 (line 70, Vector3.DistanceSquared)

It takes around 40-60 ms, whereas in your playground it takes around 10-20 ms... It's the exact same code?? How do you explain it?

##### Share on other sites

Vector3.DistanceSquared lacks the square root calculation.  It's useful for when testing/comparing the distance squared.  You'd still need to do the square root on the result if you needed the actual distance.
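A small sketch of why the squared distance is often enough. The helper names below are made up for illustration and work on plain `{x, y, z}` objects rather than Babylon vectors; the point is only that comparisons give the same answer with or without the square root.

```javascript
// Squared Euclidean distance: no Math.sqrt needed.
function distanceSquared(a, b) {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  const dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

// Range test: compare against the squared radius instead of taking a root.
function isWithinRange(a, b, radius) {
  return distanceSquared(a, b) <= radius * radius;
}

// Closest point: ordering by squared distance equals ordering by distance.
// Returns -1 for an empty list.
function closestIndex(origin, points) {
  let best = -1;
  let bestSquared = Infinity;
  for (let i = 0; i < points.length; i++) {
    const d2 = distanceSquared(origin, points[i]);
    if (d2 < bestSquared) {
      bestSquared = d2;
      best = i;
    }
  }
  return best;
}
```

The root is only needed at the end, for example `Math.sqrt(distanceSquared(a, b))` when the actual distance has to be displayed or used in a formula.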

##### Share on other sites

But why does my PG run slower than aWeirdo's? It's the same code, so it should take the same time.

##### Share on other sites

26 minutes ago, Temechon said:

> But why does my PG run slower than aWeirdo's? It's the same code, so it should take the same time.

On 2/25/2017 at 7:05 PM, fenomas said:

> but probably because your "getDistance" function was wrapped up inside a bunch of playground-specific contexts and evals and whatnot. ....
>
> Playground is great for many things but it's not good for performance tests.

##### Share on other sites

That makes sense. I'll try to set up a test locally...

##### Share on other sites

2 hours ago, Temechon said:

> But why does my PG run slower than aWeirdo's? It's the same code, so it should take the same time.

Now I see what your question was.  Interesting.

##### Share on other sites

I'm getting 1 or 2 ms on Firefox for both distance squared functions in that PG.

In Chrome I'm getting 7 or 8 ms for getDistance and 20 - 23 ms for BABYLON.Vector3.DistanceSquared.

##### Share on other sites

Modern JS is fast because JS engines are good at optimizing it. The fast results you guys are seeing are cases where the optimizations all worked, and the slower ones are cases where something didn't get optimized as much as it could be.

And again, the problems are probably all playground specific - PG is not the right place to measure performance.
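For measuring outside the playground, a timing harness along these lines might be a starting point. It is a sketch under assumptions: `benchmark`, its parameters and the example workload are hypothetical, and the warm-up count is arbitrary. The warm-up runs give the JS engine a chance to optimize the function before any samples are taken.

```javascript
// Use performance.now() when available, otherwise fall back to Date.now().
const now = typeof performance !== "undefined"
  ? () => performance.now()
  : () => Date.now();

// Run `fn` a few times unmeasured, then collect per-call timings in ms.
function benchmark(fn, warmup = 50, runs = 200) {
  for (let i = 0; i < warmup; i++) fn();
  const samples = [];
  let lastResult;
  for (let i = 0; i < runs; i++) {
    const t0 = now();
    lastResult = fn(); // keep the result so the call is not trivially removable
    samples.push(now() - t0);
  }
  samples.sort((a, b) => a - b);
  return {
    min: samples[0],
    median: samples[Math.floor(runs / 2)],
    max: samples[runs - 1],
    lastResult,
  };
}

// Example workload: sum of all pairwise distances between 251 points on a line.
function sumOfDistances(count) {
  let sum = 0;
  for (let i = 0; i < count; i++) {
    for (let j = i + 1; j < count; j++) {
      sum += Math.abs(i - j);
    }
  }
  return sum;
}

console.log(benchmark(() => sumOfDistances(251)));
```

Looking at the median rather than a single run makes it easier to tell a consistently slow function from one that just had an unlucky first call.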
