asm.js test


jerome

Hi guys,

 

For my own knowledge (remember Vousk-Prod's survey  ;) ), I just made a minimal asm.js test on my old laptop here: http://www.babylonjs-playground.com/#1NKL3P#1

 

You need to run it with Firefox or Edge 12 or 13: http://caniuse.com/#feat=asmjs

Adjust the value passed at line 45 to suit your own computer.

 

You can enable/disable asm.js by commenting/uncommenting line 2.

 

As you can imagine, I didn't write this in C and then compile it to JS. It's (painfully) hand-written directly in asm.js.

 

It's just a big loop that slows down the render loop. I was wondering how much asm.js would gain.

Well... I passed the value 50 million here, and there's no difference with or without asm.js compilation => 16 FPS either way.
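To make the kind of test described above concrete, here is a minimal sketch of a hand-written asm.js integer loop. It is not the playground code: the names `BusyModule` and `spin` are made up for illustration, and `globalThis` is passed as the stdlib object so that the snippet also runs outside a browser.

```javascript
// Hypothetical hand-written asm.js module: a busy loop on signed integers.
function BusyModule(stdlib, foreign, heap) {
  "use asm"; // comment this line out to run the same code as ordinary JS

  function spin(limit) {
    limit = limit | 0;          // parameter annotated as a signed int
    var i = 0;
    var count = 0;
    for (i = 0; (i | 0) < (limit | 0); i = (i + 1) | 0) {
      count = (count + 1) | 0;  // integer addition, kept in 32 bits
    }
    return count | 0;           // return type annotated as a signed int
  }

  return { spin: spin };
}

// The heap is unused here; a small power-of-two buffer is passed anyway.
var busy = BusyModule(globalThis, {}, new ArrayBuffer(0x10000));
var result = busy.spin(50000000); // 50 million iterations, as in the post
console.log("iterations done: " + result);
```

The `| 0` coercions are what declare the integer types. A loop this simple is also easy for the JIT to type by itself, which may be why the frame rate did not change.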

The initial goal was to get a real gain with asm.js.

 

So ..?

 

As asm.js compiles ahead of time, I guess the gain would be bigger if there were many type checks to do, instead of a simple loop and a comparison between two signed integers only.

For now I can only deduce that the JavaScript engine is really good, since JIT compilation is as fast as AOT compilation for simple type checks and integer operations.

 

I would like to implement in asm.js something like ComputeNormals(), which needs to iterate over very big arrays and do many float operations, to check whether there is a real difference.

I still have to learn how to pass the indices (int), positions (float) and normals (float) arrays by reference from JS to asm.js (and get them back). I don't want to copy the memory, as these arrays can be huge: millions of indices in the SPS.


Well... this example is probably not that relevant.

I might recode something more meaningful, with accesses to big arrays and many float computations.

But it's just a pain to code directly in asm.js, and I'm not sure it's worth doing it in C (with the whole flow: C compilation + Emscripten compilation) for such a small amount of code.


As far as I understand from my reading on the Web, Emscripten can only access the memory allocated within the Emscripten code.

This means we can't pass an array by reference from JavaScript without reallocating the memory in the C code... which wouldn't be the best thing to do for positions and indices arrays, which can be really big (it allocates twice the memory needed).

I need to look directly at asm.js code to know whether the same applies. Unfortunately, coding directly in asm.js is poorly documented. But I'm afraid it's the same mechanism: memory reallocation :(


  • 3 weeks later...

I think I've understood at last how a JS float32 array can be modified from an asm.js routine.

I still need to run some tests (it's very painful to code by hand in asm.js with no real debugger), but I guess it's promising for speeding up some float computations on big arrays without any memory reallocation (the SPS case or ComputeNormals, for instance).


Please look at this test (you need to open your console): http://www.babylonjs-playground.com/#210S10#5

 

Line 35: we define the float32 array size, here 10 million. You can lower this value for your computer if needed.

Line 37: we create a memory buffer whose size is rounded up to the next power of two... it's mandatory, everything we do now is really low level.

Line 38: we define a float32 view on this buffer.

Line 40: with this view, which we handle like a usual JS array, we set all elements to zero.

 

Then at line 47 we call asm.loop(), which is a pre-compiled (ahead of time) routine. It's defined at line 6 and just sets every element of the array to 1.0 (actually it addresses the buffer at byte level).

After this call, if you display the content of the JavaScript array in the console, you'll see every element has the value 1. So asm.loop() really modified the JS array.
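The steps above can be sketched roughly as follows. This is a reconstruction under assumptions, not a copy of the playground: `FillModule` and `nextPowerOfTwo` are hypothetical names, the element count is reduced to 1 million, and `globalThis` stands in for the stdlib object (the playground presumably used `window`).

```javascript
// Hypothetical asm.js module: writes 1.0 into the first n floats of the shared heap.
function FillModule(stdlib, foreign, heap) {
  "use asm"; // comment this line out to compare with plain JIT execution
  var f32 = new stdlib.Float32Array(heap); // float32 view on the shared buffer

  function loop(n) {
    n = n | 0;
    var p = 0;   // byte offset into the heap
    var end = 0;
    end = n << 2; // 4 bytes per float32 element
    for (p = 0; (p | 0) < (end | 0); p = (p + 4) | 0) {
      f32[p >> 2] = 1.0; // byte offset shifted back to an element index
    }
  }

  return { loop: loop };
}

// Hypothetical helper: rounds a byte size up to the next power of two.
function nextPowerOfTwo(n) {
  var p = 1;
  while (p < n) p *= 2;
  return p;
}

var count = 1000000;                                       // number of float32 elements
var buffer = new ArrayBuffer(nextPowerOfTwo(count * 4));   // 4,194,304 bytes here
var array32 = new Float32Array(buffer);                    // JS-side view, used like an array
for (var i = 0; i < count; i++) array32[i] = 0;

var asm = FillModule(globalThis, {}, buffer);
var t0 = Date.now();
asm.loop(count);
console.log("asm.loop(): " + (Date.now() - t0) + " ms, first element = " + array32[0]);
```

No data is copied: `array32` and the module's internal `f32` are two views on the same `ArrayBuffer`, so the writes made inside `loop()` are immediately visible from JavaScript.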

 

What matters is the speed. This test should be meaningful with Firefox or Edge 12/13.

The time in milliseconds of the asm.loop() call is displayed in the console.

If you comment out line 2 ("use asm"), you should see the difference with/without pre-compilation.

On my machine, with FF 42 for 10 million elements:

asm = 5 milliseconds

no asm = 12 milliseconds... remember that a frame lasts about 16 ms at 60 FPS.

 

There's a real gain here.

And no math computation was done (only buffer accesses and assignments).


Obviously, we also need to compare with a usual JS iteration (a for loop setting the array elements):

http://www.babylonjs-playground.com/#210S10#1

 

57 ms on the same machine ...

 

So, for this very test:

JS syntax: 57 ms

Asm syntax, no pre-compilation, but strong typing and low-level buffer access: 12 ms

Asm with AOT compilation: 5 ms

 

From 57 to 5 ms...

I think we should really try packaging some prebuilt basic asm helpers that could then be used from any other BJS core function.

 

[EDIT] Ooops

There was a big error in my code (the Floatarray variable instead of the array32 variable, line 49): http://www.babylonjs-playground.com/#210S10#2

So it's 12 ms for the JS syntax too!!!
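For reference, a plain JavaScript baseline for this comparison could look like the sketch below. The names `fillPlain` and `medianTime` are hypothetical, and the element count is reduced to 1 million. The array is passed in as a parameter, which makes it harder to time a loop that writes to the wrong variable.

```javascript
// Plain JavaScript counterpart: an ordinary for loop writing into a typed array.
function fillPlain(arr, n) {
  for (var i = 0; i < n; i++) {
    arr[i] = 1.0;
  }
}

// Hypothetical helper: runs fn several times and returns the median duration in ms.
function medianTime(fn, runs) {
  var times = [];
  for (var r = 0; r < runs; r++) {
    var t0 = Date.now();
    fn();
    times.push(Date.now() - t0);
  }
  times.sort(function (a, b) { return a - b; });
  return times[Math.floor(times.length / 2)];
}

var count = 1000000;
var array32 = new Float32Array(count);
var ms = medianTime(function () { fillPlain(array32, count); }, 5);
console.log("plain JS fill: " + ms + " ms (median of 5 runs)");
```

Taking the median of a few runs is a design choice made here to smooth out JIT warm-up and garbage collection pauses. The figures quoted in this thread come from single runs.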


Yep... still a speed gain of more than 100%: the asm.js version is more than twice as fast as the JS one.

 

I'll try, if I can, a version of ComputeNormals(), which is only about array iterations and simple float operations.

 

Not sure I can pass 3 different arrays (positions, indices, normals) to an asm routine...

Almost sure I can't  :(


Well

I did some more tests. An asm module is not designed to accept more than a singleton buffer (as they call it in their spec).

So there's no simple way to pass several JS arrays as input parameters.

 

It's definitely not designed to communicate easily with JS code (sharing some logic or data), but rather to have everything coded in C and then compiled into asm.

From what I understand so far, what is easily doable is:

 

  • The asm routine can share a single memory buffer of fixed size with the JS code (for now, this size can't be changed dynamically afterwards).
  • It is simpler if the data in this buffer always have the same structure. It's not mandatory, but things quickly become unmanageable when dealing with UInt, float32 or pointers to JS object properties within the same buffer!
  • It can read and write in this buffer, and so access and modify a JS array.
  • It can accept typed numerical input parameters, or a JS function that it can then call.
  • It can return only one typed numerical value.
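The points listed above can be illustrated with one small module. This is only a sketch: `SumModule`, `sum` and the `report` callback are hypothetical names, and `globalThis` is again assumed as the stdlib object.

```javascript
// Hypothetical asm.js module showing: one shared heap, a typed numeric
// parameter, an imported JS function, and a single numeric return value.
function SumModule(stdlib, foreign, heap) {
  "use asm";
  var f32 = new stdlib.Float32Array(heap); // the single shared buffer
  var report = foreign.report;             // JS function the module may call

  function sum(n) {
    n = n | 0;                 // typed (int) input parameter
    var p = 0;
    var end = 0;
    var total = 0.0;           // double accumulator
    end = n << 2;
    for (p = 0; (p | 0) < (end | 0); p = (p + 4) | 0) {
      total = total + +f32[p >> 2];
    }
    report(+total);            // call back into JavaScript with a double
    return +total;             // exactly one typed numeric value comes back
  }

  return { sum: sum };
}

var reported = [];
var buffer = new ArrayBuffer(0x10000);     // 64 KiB, a power of two
var array32 = new Float32Array(buffer);
array32.set([1.5, 2.5, 4.0]);

var asm = SumModule(globalThis, { report: function (v) { reported.push(v); } }, buffer);
var total = asm.sum(3);
console.log("sum = " + total + ", reported = " + reported.join(","));
```

Anything richer than a number (an array, an object) has to travel through the shared buffer or through the imported JS function.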

 

 

So it's not usable for ComputeNormals(), which requires 2 arrays as input parameters (positions and indices) and 1 array to be written, which could have been the buffer.
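To make the constraint concrete, here is a sketch of what sharing three arrays through the one buffer would involve: each array becomes a region of the same `ArrayBuffer`, and the routine receives byte offsets instead of arrays. Everything here is an assumption for illustration (the names `NormalsModule` and `faceNormals`, the offsets, the layout). It computes one unnormalized face normal per triangle, which is a simplification and not the BJS ComputeNormals() algorithm.

```javascript
// Hypothetical asm.js module: positions, indices and normals all live in one heap.
function NormalsModule(stdlib, foreign, heap) {
  "use asm";
  var f32 = new stdlib.Float32Array(heap);
  var i32 = new stdlib.Int32Array(heap);

  // posOff, idxOff and nrmOff are BYTE offsets of the three regions.
  function faceNormals(posOff, idxOff, nrmOff, triCount) {
    posOff = posOff | 0;
    idxOff = idxOff | 0;
    nrmOff = nrmOff | 0;
    triCount = triCount | 0;
    var t = 0, ip = 0, np = 0, a = 0, b = 0, c = 0;
    var ux = 0.0, uy = 0.0, uz = 0.0, vx = 0.0, vy = 0.0, vz = 0.0;
    ip = idxOff;
    np = nrmOff;
    for (t = 0; (t | 0) < (triCount | 0); t = (t + 1) | 0) {
      // Read the three vertex indices, then turn each into a byte address
      // (one vertex = 3 floats = 12 bytes).
      a = i32[ip >> 2] | 0;
      b = i32[(ip + 4) >> 2] | 0;
      c = i32[(ip + 8) >> 2] | 0;
      a = (posOff + ((a * 12) | 0)) | 0;
      b = (posOff + ((b * 12) | 0)) | 0;
      c = (posOff + ((c * 12) | 0)) | 0;
      // Edge vectors u = B - A and v = C - A.
      ux = +f32[b >> 2] - +f32[a >> 2];
      uy = +f32[(b + 4) >> 2] - +f32[(a + 4) >> 2];
      uz = +f32[(b + 8) >> 2] - +f32[(a + 8) >> 2];
      vx = +f32[c >> 2] - +f32[a >> 2];
      vy = +f32[(c + 4) >> 2] - +f32[(a + 4) >> 2];
      vz = +f32[(c + 8) >> 2] - +f32[(a + 8) >> 2];
      // Cross product u x v, written into the normals region.
      f32[np >> 2] = uy * vz - uz * vy;
      f32[(np + 4) >> 2] = uz * vx - ux * vz;
      f32[(np + 8) >> 2] = ux * vy - uy * vx;
      ip = (ip + 12) | 0;
      np = (np + 12) | 0;
    }
  }

  return { faceNormals: faceNormals };
}

var buffer = new ArrayBuffer(0x10000);             // 64 KiB, a power of two
var POS_OFF = 0, IDX_OFF = 1024, NRM_OFF = 2048;   // hypothetical byte offsets
var positions = new Float32Array(buffer, POS_OFF, 9);
var indices = new Int32Array(buffer, IDX_OFF, 3);
var normals = new Float32Array(buffer, NRM_OFF, 3);
positions.set([0, 0, 0, 1, 0, 0, 0, 1, 0]);        // one triangle in the XY plane
indices.set([0, 1, 2]);

var asmNormals = NormalsModule(globalThis, {}, buffer);
asmNormals.faceNormals(POS_OFF, IDX_OFF, NRM_OFF, 1);
console.log(Array.from(normals));
```

This only avoids a copy if the arrays are created as views on the shared buffer from the start. Existing positions and indices arrays would have to be copied in, which is exactly the cost to avoid here, and the mixed int/float offset bookkeeping is the un-manageable part mentioned above.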

 

With these programming constraints, I can't see any real use case for the BJS core for now. Maybe only matrix computations? But they are already candidates for SIMD processing...


Well, they address different things:

  • simd.js will enable parallelization of float computations, which is excellent in our case because of the intensive usage of matrices
  • asm.js will enable AOT compilation, and so near-native execution speed (compared to C), thanks to the fixed types and low-level programming (per-byte access, memory pointers, etc.)

 

I think good asm.js programs are coded in C and then compiled... so they are not readable in their asm.js form, and not maintainable or open to contributions as pure asm. The source is the C code. Everything must be done in C, and we must forget JavaScript except for DOM handling.

 

I was hoping I could hand-write some generic asm routines, reusable from the JavaScript code in the BJS core.

But the constraints on how memory is shared (only one singleton buffer) make it hard to find real use cases that are worth it, considering the gain (no need for it to compute a cross product) and the resulting unreadability for humans.

If they had implemented a way to pass several arrays as input parameters, it would have been really worth it, because of the speed of iterations and computations. No way, too bad.

 

My attempts and long profiling tests while coding the SPS, especially on a slow computer (tens of thousands of iterations + computations to do CPU-side each frame), gave me, from real experience, many hints and better practices on how and where we can get serious performance gains with good ol' JavaScript  :P

It's a real joy to see the FPS go up every day with the same program!
