The Leftover

Web Assembly

I should not be doing this . . . but I have been experimenting with WebAssembly.  With so many hexagons and so many crime incidents, Illuminated City has some computationally intensive tasks.

My question is, have y'all tried to use it for some functions (e.g. ComputeNormals)?  Did it perform well?  Did it seem worthwhile?

Yes, you should be doing that!! :)  And it's a good question.  Especially now with essentially full compatibility.  Check this issue for extra details:
https://github.com/BabylonJS/Babylon.js/issues/3248

I think it's only a matter of time really, but I think it's hard to split the work and communication between wasm and WebGL and still keep rendering fast.  I don't think performance is the question - the math is going to be faster - it's getting it all working together.

I haven't done any wasm experiments except playing with OpenJPEG.  I need to find more time somewhere...
 

The answer might come (before the end of the year, I hope) from AssemblyScript : https://github.com/AssemblyScript/assemblyscript , which compiles a subset of TypeScript directly to WebAssembly (aka WASM) or to JavaScript.

Check out this online tool : https://webassembly.studio/

Knowing that BJS is already coded in TypeScript, porting some parts of the code to the subset required by AssemblyScript would probably be worth a try, instead of rewriting thousands of lines in C/C++ just to compile them to WASM.

A subset of TypeScript is still legal TS... it just means that AssemblyScript can't understand all of TypeScript, only a subpart of it, because the compiling process requires some explicit definitions.

Example : whereas TS can understand the statement "var a = 10" or "var a:number = 10", AssemblyScript needs to know before compiling whether the variable a is an integer or a float and what memory size to use : i16, i32, f32, f64. It's the same thing we would have to do if that code were ported to C, actually.

Well, if you already contribute to BJS in TS, or if you simply code in TS on your side, you don't really need to learn more to start to code with AssemblyScript in order to get your first working WASM. Never tried so far though 😄 

Gentlemen, thank you for the links.  Let me offer some opinions based on three days of work.

I started writing in straight WAT.   Because I have a genetic defect that causes me to do things the hard way.  However, it has caused me to learn a lot of things.

WebAssembly is at the "MVP" stage as they call it.  One can only create a module with functions below that - two levels.  One can create a list of which functions may be exported.

The MVP status shows:  I couldn't figure out how to make a module-global variable that was mutable; so I did a work-around.  One can share a typed array between JS and WA.  In WA, it is called "memory" but there may only be one of them.  I redesigned things a bit so all processing was applied against one array.  This could put a crimp in my style.

Is it possible the "C" converter bypasses these functionality bottlenecks?  It seems a little unlikely; I think wat is the textual representation of wasm and they go hand-in-hand.  They do appear to be beavering away at this much as we are here.

The integration makes it *NOT* an all or nothing kind of thing.  When the module is built, it can receive JS functions, notably console.log.  So I can log things to the console.  I could make other JS calls if I wanted.  Exported functions are just a function.  You can call it from JS.  (If you print them it says "native code", which gave me a kick.)

In light of this, I am pushing forward with creating limited functions for the three or four places where Illuminated City sits for more than a second.  It requires some re-organization but I have the substantial advantage of being the only author.  I can also write these functions in JS.  That part is really neat; the array is one array and looks the same whether the manipulation was done by JS or WA.  This will be helpful for testing.
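To make the sharing concrete, here is a minimal sketch (not the Illuminated City code). The module bytes are hand-assembled for illustration and correspond to the WAT shown in the comment; the names `addAt`, `imports.log` and `imports.mem` are assumptions made up for this example. It uses the synchronous constructors to stay short; in a page, `WebAssembly.instantiate` is the more usual route.

```javascript
// WAT equivalent of the bytes below (hypothetical example module):
// (module
//   (import "imports" "log" (func $log (param i32)))
//   (memory (import "imports" "mem") 1)
//   (func (export "addAt") (param $addr i32) (param $delta i32)
//     (i32.store (local.get $addr)
//       (i32.add (i32.load (local.get $addr)) (local.get $delta)))
//     (call $log (i32.load (local.get $addr)))))
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,   // magic + version
  0x01, 0x0a, 0x02, 0x60, 0x01, 0x7f, 0x00,         // type 0: (i32) -> ()
  0x60, 0x02, 0x7f, 0x7f, 0x00,                     // type 1: (i32, i32) -> ()
  0x02, 0x1e, 0x02,                                 // import section, 2 entries
  0x07, 0x69, 0x6d, 0x70, 0x6f, 0x72, 0x74, 0x73,   // "imports"
  0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00,               // "log": function of type 0
  0x07, 0x69, 0x6d, 0x70, 0x6f, 0x72, 0x74, 0x73,   // "imports"
  0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00, 0x01,         // "mem": memory, min 1 page
  0x03, 0x02, 0x01, 0x01,                           // one local function of type 1
  0x07, 0x09, 0x01, 0x05, 0x61, 0x64, 0x64, 0x41, 0x74, 0x00, 0x01, // export "addAt"
  0x0a, 0x18, 0x01, 0x16, 0x00,                     // code section: one body, no locals
  0x20, 0x00, 0x20, 0x00, 0x28, 0x02, 0x00,         // addr, then i32.load at addr
  0x20, 0x01, 0x6a, 0x36, 0x02, 0x00,               // add delta, i32.store at addr
  0x20, 0x00, 0x28, 0x02, 0x00, 0x10, 0x00, 0x0b    // call log(i32.load addr), end
]);

const sharedMemory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
const sharedView = new Uint32Array(sharedMemory.buffer);     // JS view, no copy
const logged = [];                                           // stands in for console.log

const wasmModule = new WebAssembly.Module(wasmBytes);
const instance = new WebAssembly.Instance(wasmModule, {
  imports: { log: (value) => logged.push(value), mem: sharedMemory }
});

sharedView[4] = 40;             // JS writes element 4, i.e. byte offset 16
instance.exports.addAt(16, 2);  // WebAssembly updates the same bytes in place
console.log(sharedView[4], logged);
```

The exported function is called like any other JS function, and the change it makes is visible through the typed array without any transfer step.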

Really nice feedback and interesting try.

I did some investigation into asm/wasm some time ago and I know this way of sharing data between JS and WASM (typedArrays / buffers).

The approach you took, which I didn't mention, of building dedicated code in wasm is obviously one of the best : why think only about porting the 3D engine to wasm and not the user code (the logic), which may often be the slow part of the final software ?

Please have a read about the smart way they chose to share the memory between JS and the compiled wasm module in AS : https://github.com/AssemblyScript/assemblyscript/wiki/Memory-Layout-&-Management

This could be inspiring for organizing the data transfers to/from JS and the module.

Jerome, thanks.  I usually do something like this.  I create a module config object, and attach the memory there.  This hands in whatever settings are needed and the memory.  Now both JavaScript and WebAssembly can access the same typed array using 'hexlatticeMemoryView' and 'i32.load'/'i32.store'.

I treat the WebAssembly module as a persistent "closure".  Some of them have four or five function entry points.

 JAVASCRIPT  JAVASCRIPT  JAVASCRIPT  JAVASCRIPT  JAVASCRIPT 
hexlatticeImportObject = {
  settings: {
    weekbreaksLength  : weekbreaks.length*4,
    weekbreaksStart   : 16,
    monthBreaksLength : 0,
    monthBreaksStart  : 16+weekbreaks.length
  },
  imports: {
    log               : console.log
  }
};
hexlatticeMemory     = new WebAssembly.Memory({initial:1});      // one 64 KiB page
hexlatticeMemoryView = new Uint32Array(hexlatticeMemory.buffer); // JS-side view of the same bytes
hexlatticeImportObject.imports.mem = hexlatticeMemory;           // imported by the module as "imports" "mem"
 JAVASCRIPT  JAVASCRIPT  JAVASCRIPT  JAVASCRIPT  JAVASCRIPT 
 WEBASSEMBLY  WEBASSEMBLY  WEBASSEMBLY  WEBASSEMBLY  WEBASSEMBLY 
(module
  (import "imports" "log"      (func $log     (param i32)))
  (memory                      (import "imports"    "mem") 1)
  (global $weekbreaksstart     (import "settings"   "weekbreaksStart")     i32)
  (global $weekbreakslength    (import "settings"   "weekbreaksLength")    i32)

 WEBASSEMBLY  WEBASSEMBLY  WEBASSEMBLY  WEBASSEMBLY  WEBASSEMBLY 
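One detail worth keeping straight with this kind of setup: `i32.load` and `i32.store` take byte addresses, while a `Uint32Array` is indexed in 4-byte elements. The sketch below uses made-up sample values and hypothetical variable names (it is not the hexlattice data) to show the conversion between the two.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const u32View = new Uint32Array(memory.buffer);

const weekbreaks = [7, 14, 21];   // made-up sample values
const startByte = 16;             // byte offset, as i32.load would address it
const startIndex = startByte / Uint32Array.BYTES_PER_ELEMENT; // element index 4

// JS side: write through the typed array using element indices.
u32View.set(weekbreaks, startIndex);

// Read back by byte address; WebAssembly linear memory is little-endian.
const bytes = new DataView(memory.buffer);
const second = bytes.getUint32(startByte + 1 * 4, true);

// First free byte after the block, if another region were to follow it.
const nextFreeByte = startByte + weekbreaks.length * 4;
console.log(startIndex, second, nextFreeByte);
```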

 

The heartbreak of today was that the module I snipped this from runs faster in JavaScript.  In other areas, I have reaped a 3x speedup with WebAssembly.

I believe the issue is that there is substantial overhead in entering and exiting WebAssembly.  You want the work it does while there to be large enough to make up the overhead and still give you a win.

Gonna go soak my head . . . 

I agree : calling small wasm functions could be slower than using the same algo inside the JS process, because of the memory access and the entering/exiting.

I think that the gain and the power of wasm could be real when making a few calls to wasm functions that each process large amounts of data at once (a kind of batch computation) : big loops with huge iteration counts, big float arrays (meaning over 10K elements to process), etc

Example : CPU particles, culling (if it were batched : one process for all meshes at once), SPS, WM computations of instances (idem, if it were batched)

Most of the BJS functions are already very fast and quite short, so they probably won't gain anything from being translated to wasm as is.
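A rough sketch of the difference in call shape, under stated assumptions: `scaleOne` and `scaleBatch` are plain-JS stand-ins for two hypothetical module exports, and the counter only records how many times the JS/WebAssembly boundary would be crossed. It measures nothing about real speed.

```javascript
// Stand-ins for hypothetical exports; each call represents one boundary crossing.
let crossings = 0;

function scaleOne(value, factor) {
  crossings++;
  return value * factor;
}

function scaleBatch(view, start, count, factor) {
  crossings++;
  for (let i = start; i < start + count; i++) {
    view[i] *= factor;
  }
}

const data = new Float32Array(10000).fill(1);

// Chatty version: one crossing per element.
for (let i = 0; i < data.length; i++) {
  data[i] = scaleOne(data[i], 2);
}
const chattyCrossings = crossings;

// Batched version: one crossing for the whole array.
crossings = 0;
scaleBatch(data, 0, data.length, 2);
const batchedCrossings = crossings;

console.log(chattyCrossings, batchedCrossings);
```

Whatever the fixed cost per crossing turns out to be, the batched shape pays it once instead of once per element.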

Just starting some little tests in my scarce free time to evaluate the gain provided by wasm, using AssemblyScript and wasm-ffi so that I can focus only on the perfs and not lose time implementing array passing or fighting with the C compilation toolchain : TS only, plus a library that manages the data exchange between JS and WASM out of the box.

I'll let you know if something good comes out of this.

Quick feedback about some first tests : a big loop computing all the vertices (rotations, translations, scalings) of a modified SPS in a WASM module.

For now, the results are quite ... disappointing :

1) even using a cool language like AssemblyScript, which emits WASM bytecode at almost no porting cost from TS, managing the exchanges between the JS code and the WASM module is painful. Not to mention the lack of a garbage collector on the WASM side, which forces you to pay particular attention to every object creation.

2) WASM, although it is a bytecode usable in the browser (so we could expect the kind of features that other bytecode-based languages like Java provide), doesn't provide any math library ! This means we have to implement by ourselves, say, all the trigonometry (sine, cosine, etc). How can we compute any 3D rotation without sine or cosine ?

3) the first global run, on the same basis as the SPS experiments, is ... slower than the full-stack legacy SPS !

I'll profile and tweak the WASM code soon in the hope of getting something faster. But for now, it's not worth it at all... unless I'm missing something.
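On point 2, one possible workaround: core WebAssembly has instructions such as `f64.sqrt` but no sine or cosine, so they can be imported from JavaScript instead of being reimplemented. The sketch below is hand-assembled for illustration; the import name `math.sin` and the export name `scaledSin` are assumptions, and this is not what AssemblyScript itself generates. Each call to the import crosses back into JS, so it may not be a good fit for a very hot loop.

```javascript
// WAT equivalent of the bytes below (hypothetical example module):
// (module
//   (import "math" "sin" (func $sin (param f64) (result f64)))
//   (func (export "scaledSin") (param $radius f64) (param $angle f64) (result f64)
//     (f64.mul (local.get $radius) (call $sin (local.get $angle)))))
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,   // magic + version
  0x01, 0x0c, 0x02,                                 // type section, 2 entries
  0x60, 0x01, 0x7c, 0x01, 0x7c,                     // type 0: (f64) -> f64
  0x60, 0x02, 0x7c, 0x7c, 0x01, 0x7c,               // type 1: (f64, f64) -> f64
  0x02, 0x0c, 0x01,                                 // import section, 1 entry
  0x04, 0x6d, 0x61, 0x74, 0x68,                     // "math"
  0x03, 0x73, 0x69, 0x6e, 0x00, 0x00,               // "sin": function of type 0
  0x03, 0x02, 0x01, 0x01,                           // one local function of type 1
  0x07, 0x0d, 0x01, 0x09,                           // export section, name length 9
  0x73, 0x63, 0x61, 0x6c, 0x65, 0x64, 0x53, 0x69, 0x6e, // "scaledSin"
  0x00, 0x01,                                       // function index 1
  0x0a, 0x0b, 0x01, 0x09, 0x00,                     // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x10, 0x00, 0xa2, 0x0b    // radius, sin(angle), f64.mul, end
]);

const wasmModule = new WebAssembly.Module(wasmBytes);
const instance = new WebAssembly.Instance(wasmModule, {
  math: { sin: Math.sin }   // the JS built-in is handed in as an import
});

const y = instance.exports.scaledSin(2, Math.PI / 2);
console.log(y);
```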

That said, don't trust me either.

Why ? Well, in '98, I tested Java for the first time and it was sooooo slow. I formed a definitive opinion : this stuff isn't worth it and probably has no future.

So please, don't trust me.

By nature (statically typed, low level, bytecode, AOT compilation), WASM is way faster than JS in theory.

I've probably made plenty of design/usage errors in my test, maybe the AssemblyScript emitter isn't mature yet, and perhaps the browser compilers/runtimes will get better and better (or all 3 reasons together). The theory seems to tell us this is the way to go to improve performance by far. Now, it's just a question of time for the theory to get into practice. :D

Jerome, delighted to read your report.  This FYI:

I had done some profiling of Illuminated City and determined that speeding up 'VertexData.ComputeNormals' would be a win for my application.  It is a good candidate in other ways, being about 200 lines of vector arithmetic (with two calls to Math.sqrt).  I had considered rewriting it in native WASM but quickly realized that I did not understand the application well enough.

Also, there were a bunch of boundary issues:  it deals with multiple arrays that - I think - were not typed.  Converting typed <--> untyped is gonna add a lot of overhead.  Also, the only way I know how to process multiple arrays in a single WebAssembly module is to put them in one array and use indexed addressing.  I do this for my own application but it is a hefty bunch of work.
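For the "several arrays in one memory" part, a minimal sketch of how the bookkeeping could look. The helper `layoutRegions` and the region names and sizes are hypothetical, chosen only to illustrate the indexed addressing; it assumes every region holds 32-bit floats.

```javascript
// Hypothetical helper: assign each named Float32 region a byte offset
// inside one shared buffer, back to back.
function layoutRegions(lengths) {
  const regions = {};
  let byteOffset = 0;
  for (const [name, length] of Object.entries(lengths)) {
    regions[name] = { byteOffset, length };
    byteOffset += length * Float32Array.BYTES_PER_ELEMENT;
  }
  return { regions, totalBytes: byteOffset };
}

const PAGE_BYTES = 65536; // size of one WebAssembly memory page
const { regions, totalBytes } = layoutRegions({ positions: 9, normals: 9, uvs: 6 });

const memory = new WebAssembly.Memory({
  initial: Math.max(1, Math.ceil(totalBytes / PAGE_BYTES))
});

// One typed-array view per region; all of them alias the same memory.
const views = {};
for (const [name, region] of Object.entries(regions)) {
  views[name] = new Float32Array(memory.buffer, region.byteOffset, region.length);
}

views.normals[0] = 1; // lands at byte offset 36 of the shared memory
console.log(regions, totalBytes);
```

The byte offsets in `regions` are the values a module would need (for example as imported globals) to find each array with `f32.load`.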

In the end, I did not.

'VertexData.ComputeNormals' is an example of a function that I would optimize the hell out of in JavaScript first.  Good chance that is a win.

Here is my only actual concrete suggestion, respectfully placed from someone who doesn't really understand your code:  Migrate to typed arrays for the obvious suspects, like uvs and normals and so on.

brianzinn, not sure I understand why that would be.  (I am not doubting it, though.  And please don't explain it to me.)

I ran into a problem where I had to choose between WebAssembly and Web Workers unless I was willing to do large array copying.  They both have data sharing for large vectors but they wouldn't work together.  I chose to use WebAssembly and skip Web Workers for now.

Life on the bleeding edge . . . 

About ComputeNormals(), I agree it's a good candidate.

First note : usually, unless you're dealing with an updatable mesh that is then morphed, you don't need to recompute the normals each frame.  But it's still a good candidate for a WASM computation, especially when the mesh needs to be morphed.

The current implementation of ComputeNormals() has been optimized many times over its life (check old forum posts about this topic)... and I'm the culprit 😄 . Actually there are parts of the ComputeNormals() code that are conditional and used only for the FacetData feature, because they use the very same algo on the same data. So it's faster to compute the normals and the facetData at the same time than to run this big loop twice. But you can just ignore the facetData part if you want to focus on the normal computation only.

 

About my first test with WASM, I did as you did : I used only TypedArray. Actually one big TypedArray in, to pass the data to the WASM module, and one big TypedArray out, to get the results back from the module.

But, in order to focus only on my own logic, I also used the library wasm-ffi : https://github.com/DeMille/wasm-ffi

This library handles for you all the TypedArray/ArrayBuffer exchanges between the AssemblyScript code and the WASM module. I should probably get rid of it and manage the data passing/sharing by hand to be sure it happens the way I really want.

Lastly, WASM and workers are compatible AFAIK because modules are objects that can be passed to a worker, then instantiated in the worker itself. So, although it's probably hell in terms of data exchanges and memory sharing (main thread syncing to/from the workers, each exchanging with its own WASM module), it's also maybe (theoretically) what could bring a huge performance gain.

3 hours ago, The Leftover said:

brianzinn, not sure I understand why that would be.

I should probably not have said anything as I am not an expert, so my comment can safely be ignored.  My thoughts were that if BabylonJS were to use TypedArray on UVs/normals, I imagine there would be lots of overhead going from ArrayBuffer back to WebGL, but as I have found, it would serve best to not assume and to see some real numbers.  I think even if it were close it would be worth the effort, given the tooling and further browser improvements.  I am curious now how much time overhead is involved in the exchanges Jerome just mentioned!

edit: I want to add also that non-standard browsers like Samsung Internet, Oculus, etc. likely have poor support.  Not sure if that means that a fallback mechanism is needed.  Yikes!

mmmh... first investigation : the execution time is maybe not lost in the WASM module but rather in the copy of the returned array buffer into the typed array required by updateVerticesData()

In brief, in parsing the massive returned data on the JS side. Still investigating, stay tuned.

Not sure either ...

 

[EDITED] : wrong log, the time is really lost in the wasm module call

Just to make it clear, WebGL WANTS Typed Arrays,  See here.  Typed Arrays are backed by ArrayBuffers.  If you can pass the addresses of ArrayBuffers to this thing,  you can just modify the memory going to the GPU directly.  Nothing to copy either way.

const normals32F = mesh.getVerticesData(BABYLON.VertexBuffer.NormalKind);
const buf = normals32F.buffer;

magic(&buf); // clearly BS syntax

mesh.updateVerticesData(BABYLON.VertexBuffer.NormalKind, normals32F);

If this is already being done, my bad.
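A runnable sketch of the same idea in plain JavaScript, with no Babylon.js calls: `normalizeInPlace` is a hypothetical stand-in for the "magic" step, and the sample values are made up. It only shows that work done through the `ArrayBuffer` is visible in the original `Float32Array` without a copy. One caveat, as far as I know: a WebAssembly module can only touch its own `WebAssembly.Memory`, so this stays copy-free with wasm only if the typed array is itself a view on that memory.

```javascript
// Hypothetical stand-in for the "magic" step: normalizes xyz triples in place,
// working directly on the ArrayBuffer it is handed.
function normalizeInPlace(buffer) {
  const v = new Float32Array(buffer); // a view on the same bytes, not a copy
  for (let i = 0; i + 2 < v.length; i += 3) {
    const len = Math.hypot(v[i], v[i + 1], v[i + 2]);
    if (len > 0) {
      v[i] /= len;
      v[i + 1] /= len;
      v[i + 2] /= len;
    }
  }
}

const normals32F = new Float32Array([3, 0, 4, 0, 0, 2]); // made-up sample data
normalizeInPlace(normals32F.buffer);
console.log(normals32F); // the original typed array now holds unit vectors
```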

For now, it seems that passing big bunches of data to WASM is still the bottleneck https://blog.sqreen.io/webassembly-performance/

That's what I've just experienced. This is sad because the benefit of WASM would precisely be to quickly compute millions of floats for our needs in 3D : vector coordinates, normals, quaternions, etc

I think I remember, back when I still received github issues / PR's via email, that someone was trying to make a web assembly version of math.ts.  They gave up, as I remember, due to lack of improvement.  One might fish through the repo communications for what they found out.

1 hour ago, jerome said:

For now, it seems that passing big bunches of data to WASM is still the bottleneck

Jerome, if I am belaboring the obvious, sorry.

I only have one copy of the data structures.  It is shared between WebAssembly and JavaScript.  If you look at the code (from my August 3 post above) you will see that I create the array as WebAssembly memory.  Then I create a view into that memory.  Both JavaScript and WebAssembly may address the array at that point.  As far as JavaScript is concerned, it is just a typed array.  This array is sized/created early in the session and tends to persist for quite a while.  In fact, sometimes I clear the array and start over, as opposed to allocating a new one.
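A small sketch of the view-versus-copy distinction described here, with hypothetical variable names. It also shows one thing to watch for: growing a `WebAssembly.Memory` detaches the old `ArrayBuffer`, so views have to be re-created afterwards.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });

const view = new Float32Array(memory.buffer, 0, 4); // shares the memory, no copy
const copy = view.slice();                          // independent snapshot

// A write made anywhere else on the same memory (JS here, but a module's
// f32.store would behave the same way) is visible through the view only.
new Float32Array(memory.buffer)[1] = 5;
const seenThroughView = view[1];
const seenThroughCopy = copy[1];

// "Clear and start over" without allocating a new array.
view.fill(0);

// Growing the memory detaches the old buffer; make a fresh view afterwards.
const oldBuffer = memory.buffer;
memory.grow(1);
const freshView = new Float32Array(memory.buffer, 0, 4);
console.log(seenThroughView, seenThroughCopy, oldBuffer.byteLength, freshView.length);
```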

To drive the point home:  Sometimes I write the same function in JavaScript and WebAssembly.  More specifically, I have already written it in JavaScript and it is working.  When I develop the WebAssembly, I set it up so that I can call either "flavor" of the same function.  Given that I don't have a battery of test suites, this helps me spot unintended changes of behavior.
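One way such a side-by-side check might be wired up, as a sketch only: `compareFlavors` is a hypothetical helper, and both "flavors" below are plain JavaScript, with the second standing in for an exported WebAssembly function that works on the same kind of typed array.

```javascript
// Hypothetical helper: run two implementations on identical copies of the
// input and return the index of the first difference, or -1 if they agree.
function compareFlavors(flavorA, flavorB, input) {
  const a = Uint32Array.from(input);
  const b = Uint32Array.from(input);
  flavorA(a);
  flavorB(b);
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return i;
  }
  return -1;
}

// Reference JavaScript flavor: running total in place.
function prefixSumJs(arr) {
  for (let i = 1; i < arr.length; i++) arr[i] += arr[i - 1];
}

// Stand-in for the WebAssembly flavor (plain JS here for illustration).
function prefixSumOther(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) {
    total += arr[i];
    arr[i] = total;
  }
}

// A deliberately broken flavor, to show what a mismatch looks like.
function prefixSumBroken(arr) {
  for (let i = 2; i < arr.length; i++) arr[i] += arr[i - 1];
}

const sample = [1, 2, 3, 4];
const agree = compareFlavors(prefixSumJs, prefixSumOther, sample);
const mismatchAt = compareFlavors(prefixSumJs, prefixSumBroken, sample);
console.log(agree, mismatchAt);
```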

Once I get things settled, smaller interactions with the array are usually done through JavaScript.  Larger ones are done with WebAssembly.

After all of this, I am happy with the effort.  I did hit many dead ends but I also got several large tasks to run 3X faster.

I hope this is useful.

I just started a topic in the AssemblyScript Github repo about how to manage shared arrays between JS and AS. My test arrays were not shared and I guess the data copy (millions of floats) is the bottleneck.

One of the guys made a port of earcut to AssemblyScript, and this might be a good starting point for inspiration.

https://github.com/AssemblyScript/assemblyscript/issues/263
