
Hi guys!

I'm working on a realistic ocean simulation for a browser game at the moment.

The best-known way to simulate ocean waves is Jerry Tessendorf's method with a statistical model.

I won't paste any formulas here, for simplicity, so here is the core problem: the calculations are expensive, and I don't want to compute the water heightmap on the CPU in the browser, because the algorithm parallelizes very well and the GPU can compute the grid much faster.

Is there any way to use GPU computing from babylon.js?

I'm thinking about using a shader with a renderTarget texture to generate the heightmap, then using the results in the physics simulation in JavaScript and passing them to the shader material for rendering the water surface.

Is it worth it or not? Can anyone suggest any other methods?

Thanks!


GPGPU for WebGL, welcome to the weeds! Before OpenCL came out, I had tried to use OpenGL 2.0 for GPGPU. I got into an nVidia developer program to test OpenCL at the first opportunity. It was much easier.

 

I see major issues using OpenGL ES 2.0 for GPGPU. OpenGL 2.0 was bad enough. Basically, you build a vertex shader with a single, ortho quad and pass up any substantial data to be read as textures. The main part of the program is a fragment shader, or a series of them, which reads the textures. I cannot quite remember how you got the data back to the CPU.

 

OpenGL ES 2.0 does not support quads, so you'll have to build two triangles. That makes basing your calculation on your location in the quad more involved. You'll probably have to get the vertex shader involved to know which triangle you are in.
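As a hypothetical sketch of the two-triangle approach: the "quad" is just six clip-space vertices covering the whole screen, with matching UVs so the fragment shader can locate itself in the grid from the interpolated coordinate (all names here are illustrative, not from any library).

```javascript
// Vertex data for a full-screen "quad" built from two triangles,
// since OpenGL ES 2.0 has no GL_QUADS primitive.
// Positions are already in clip space, so no projection matrix is needed.
function buildScreenQuad() {
  // x, y pairs for two triangles covering [-1, 1] x [-1, 1]
  const positions = new Float32Array([
    -1, -1,   1, -1,   1,  1,   // lower-right triangle
    -1, -1,   1,  1,  -1,  1,   // upper-left triangle
  ]);
  // Matching texture coordinates in [0, 1]; the fragment shader reads
  // the interpolated uv to know which grid cell it is computing.
  const uvs = new Float32Array([
    0, 0,   1, 0,   1, 1,
    0, 0,   1, 1,   0, 1,
  ]);
  return { positions, uvs, vertexCount: 6 };
}
```

Because both triangles share the same interpolated UV space, the fragment shader never needs to know which triangle it is in; only the vertex layout changes compared to a real quad.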

 

Then interoperating with BabylonJS seems difficult. Web workers seem like a better way to go, even if they are async. There are virtually no single-core CPUs on the market today, and BabylonJS only uses one core, so flooring a separate core seems much more attractive.


Hello elessar.perm!

I love what you're going to do :)

 

Something you can do (because we don't have compute shaders :'( ) is to create a "screen quad" for your heightmap calculation. You can see the screen quad's vertex layout in this file, for example: https://github.com/clbr/MLAA-test-app/blob/master/screenquad.h

 

Once you have your ScreenQuad mesh, apply a ShaderMaterial that will generate your heightmap using one or multiple passes into your RTT(s).

 

Basically, the vertex program of the screen quad should look like this (to be fullscreen):

attribute vec3 position;
attribute vec2 uv;

varying vec2 vUV;

void main(void) {
    gl_Position = vec4(position, 1.0);
    vUV = uv;
}

And in your pixel program you'll calculate the height map.

 

It's only an idea; I'm not sure it will work!

 

May the force be with you !



Exactly what I wanted to do.

But the main question is: will it be significantly faster than the web worker version or not?


Of course it will, because the vertex & pixel operations you want to do are in almost all cases faster on the GPU ^^

You can read the little article I wrote about CPU & GPU computations at: https://medium.com/community-play-3d/computing-your-own-depth-shadow-pass-into-cp3d-439293b36457

There is a performance comparison between both methods at the end.


If you don't have to read the data back on the CPU (i.e. the wave simulation is only used as input to other shaders), it should be doable (maybe not in every browser / on every device).

You have to render a quad and store your simulation data in textures accessible for read/write by the GPU (if I recall, for wave simulations you need access to frames n-1 and n-2 to compute frame n), then use the data produced in the texture inside your rendering vertex shader (you animate the vertex positions of a grid).
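The two-previous-frames update mentioned above can be sketched on the CPU as a discretized wave equation, where frame n is computed from frames n-1 and n-2. On the GPU the three frames would live in ping-ponged textures; here they are plain Float32Arrays over a 1D grid for clarity, and `c2` is an assumed tuning constant (wave speed squared times dt²/dx²), not taken from any particular engine.

```javascript
// One step of the explicit wave equation:
//   h[n] = 2*h[n-1] - h[n-2] + c2 * laplacian(h[n-1])
// prev2 = heights two frames ago, prev1 = previous frame.
function waveStep(prev2, prev1, c2) {
  const n = prev1.length;
  const next = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    // Clamp at the borders (simple reflective-ish boundary).
    const left = prev1[i > 0 ? i - 1 : i];
    const right = prev1[i < n - 1 ? i + 1 : i];
    const laplacian = left + right - 2 * prev1[i];
    next[i] = 2 * prev1[i] - prev2[i] + c2 * laplacian;
  }
  return next;
}
```

In the GPU version, `prev2`, `prev1`, and `next` would be three render-target textures that rotate roles each frame, so nothing is ever copied.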

In the DX9.1 era, with some Nvidia custom extensions, I used to do something quite similar :).


CPU time need not be less than GPU time + transfer back.

 

If you can live with your heightmap being one frame behind, CPU time must only be less than BJS CPU time + GPU time. This is due to the fact that you will be running async on an otherwise unused CPU core.

 

Executing on the GPU will need to be sync, and will take away time from everything else. A better question is: what will give better throughput? Also, in a photo finish, I would always do it the less exotic way (web worker). If you need some OpenGL extensions, you could have device issues.


I think it would be wise to actually write an inline version first, which can be adapted for a web worker if required. "Right first, fast later." You could be fighting too many simultaneous battles attempting to go straight to a web worker.

 

Also, I would avoid objects like BABYLON.Vector3. Put the output of your heightmap inside a Float32Array if possible, for 3 reasons:

  1. Typed arrays are known for slightly slower initialization but slightly faster access. I use them extensively, and found that at least they are not slower (it is difficult to actually measure). You only need to create the array once.
  2. Typed arrays live outside the VM heap, unlike an array of BABYLON.Vector3. If you have to use Vector3, make sure you do NOT create them over and over. Use the InPlace() methods, or assign .x, .y, .z directly. Throw-away instances will put a lot of pressure on the heap, causing more garbage collection.
  3. If this were ever to come from the GPU, a Float32Array is how it would arrive.

If this were changed to a web worker, you would just create two arrays: the current one for Babylon to use, and the future one being updated by the web worker.
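A minimal sketch of that two-array scheme (names and the placeholder wave function are illustrative, not from babylon.js): the render loop always reads `front`, the worker writes into `back`, and when a frame of data is ready the references swap instead of copying.

```javascript
const GRID = 4; // assumed tiny grid, just for illustration

let front = new Float32Array(GRID * GRID); // heights Babylon reads this frame
let back  = new Float32Array(GRID * GRID); // heights being computed for the next frame

// Stand-in for the worker's job: fill a heightmap for time t.
function fillHeights(target, t) {
  for (let i = 0; i < target.length; i++) {
    target[i] = Math.sin(t + i); // placeholder wave
  }
}

// Called when the worker posts a finished frame back.
function onWorkerFrameReady(t) {
  fillHeights(back, t);
  // Swap references instead of copying: O(1), and no garbage is created,
  // which matters given the GC pressure discussed above.
  const tmp = front;
  front = back;
  back = tmp;
}
```

With a real worker you would `postMessage` the Float32Array's underlying ArrayBuffer as a transferable, which moves ownership between threads without a copy.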


For future search results:

 

I did a double-check on returning data from OpenGL ES 2.0, since I knew this was an obvious area to cut back on for mobile. The function gl.readPixels is in ES, but it is hobbled to only return a Uint8Array. OpenGL 2.0 can return 20 different formats.
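One hedged workaround for the Uint8Array-only readPixels: pack each float into the 4 RGBA bytes of a pixel in the fragment shader and decode on the CPU. Below is a sketch of just the CPU side of that round trip, using a simple base-256 fixed-point encoding for values in [0, 1); the GPU side would do the mirror-image packing in GLSL.

```javascript
// Encode a float in [0, 1) into 4 bytes, one base-256 "digit" per channel.
function encodeFloatRGBA(v) {
  let x = v;
  const bytes = new Uint8Array(4);
  for (let i = 0; i < 4; i++) {
    x *= 256;
    const b = Math.floor(x);
    bytes[i] = Math.min(b, 255); // clamp guards the v ~ 1.0 edge case
    x -= b;
  }
  return bytes;
}

// Decode the 4 bytes (e.g. from gl.readPixels output) back into a float.
function decodeFloatRGBA(bytes) {
  return bytes[0] / 256
       + bytes[1] / (256 * 256)
       + bytes[2] / (256 * 256 * 256)
       + bytes[3] / (256 * 256 * 256 * 256);
}
```

The truncation error is bounded by 256^-4, which is far below what a wave heightmap needs; heights outside [0, 1) would have to be rescaled into that range first.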

 

Getting readPixels and your own unrelated shaders to inter-operate would probably require pretty close familiarity with the BabylonJS source code.

I just understood an interesting thing: I don't need the full grid for the physics simulation, because it will be used for only a relatively small number of objects, so for them I can use the CPU.

I need the full grid only for rendering, and I can compute that in the fragment shader.
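That hybrid split can be sketched like this: the shader evaluates the full grid on the GPU, while the CPU evaluates the same height function only at the handful of positions where floating objects sit. The sum-of-sines below is just a stand-in for the real Tessendorf spectrum; all names are illustrative.

```javascript
// Analytic ocean height at world position (x, z) and time t.
// The identical formula would live in the rendering fragment shader.
function oceanHeight(x, z, t) {
  return 0.5  * Math.sin(0.7 * x + 1.3 * t)
       + 0.25 * Math.sin(1.1 * z - 0.9 * t)
       + 0.1  * Math.sin(0.4 * (x + z) + 2.0 * t);
}

// Physics only needs the height under each floating object,
// so this loop stays cheap no matter how large the rendered grid is.
function sampleForObjects(objects, t) {
  return objects.map(o => oceanHeight(o.x, o.z, t));
}
```

The key property is that both sides evaluate the same closed-form function of (x, z, t), so CPU physics and GPU rendering stay in sync without any readback.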

