dbawel

Babylon.js vs. Three.js... Choosing a WebGL Framework for Sony


Hello,

I considered placing this in the demos and projects thread, but decided to post it here, as it's more a topic of which framework to use and why. I was hired by an elite software development group at Sony Electronics to help them navigate WebGL: to build a pipeline to deliver content for the South By Southwest convention, and to create a foundation for quickly developing games and online media for future projects. In short, I was tasked with escaping the limitations of 2D media and helping Sony move forward into 3D content that takes advantage of the WebGL rendering standard.

This was no easy task, as I was hired Dec. 11th and given a hard deadline of March 5 to deliver 2 multiplayer games which were to be the focus of Sony's booth at SXSW in Austin, Texas. But first I had to run a quick evaluation and convince a very proficient team of engineers which framework was the best fit for Sony to invest considerable resources into for SXSW, and which was the right choice to take them into future projects. This was a huge consideration, as the chosen WebGL framework was to play a much greater role at Sony Electronics: the group I was assigned to works well ahead of the rest of the industry, developing what will most likely become native intelligent applications on Sony devices (especially smartphones) in the near future. These are applications which benefit consumers by making their day-to-day interactions simple and informative. Thus the chosen WebGL framework needed to serve as an element for displaying information as well as entertainment, within a greater core technology that is developing daily in a unique tool set used by the software engineers, allowing Sony to remain the leader not only in hardware technology, but in the applications consumers want to use on Sony devices.

But as I was working for Sony, I also had a greater task: there were existing expectations that a game on Sony devices needed to be on par with what consumers already experienced on their PlayStation consoles. As unrealistic as this might initially appear, that had to be the target, as we couldn't take a step back from the quality and playability the consumer was already accustomed to.

So back to the first task: selecting the WebGL framework for Sony Electronics to use moving forward. Rather than telling a story, I'll simply outline why there was little discussion as to which framework to choose. Initially Sony requested someone with Three.js experience, as is more often than not the case. So when they approached me for the position, I told them I would only consider it if they were open to other frameworks as well. They were very willing to consider any framework, as their goal was not political in any way; they only cared about which framework would provide the best set of tools and features to meet their needs. One might certainly assume that since Sony PlayStation is in direct competition with Microsoft Xbox, and Microsoft now provides in-house resources to develop babylon.js, Sony Electronics might see a PR conflict in selecting babylon.js as their WebGL development framework. However, I'm proud to say that there was never a question from anyone at Sony. I was very impressed that their only goal was to select the very best tools for the development work, to look beyond the perceived politics, to develop the very best applications for the consumer, and to fulfill their obligations to their shareholders by building tools that consumers want on their smartphones and other electronic devices.

So once again... Three.js vs. Babylon.js. This was a very short evaluation. What it came down to was that three.js had far more libraries and extensions. However, this was not a strength, since three.js has no cohesive development cycle, and although many libraries, tools, and extensions exist, more often than not they are not maintained. It was easy to demonstrate that practically any tool or extension we would require for the SXSW production would require me or the team to update it for compatibility with the other tools we might use on the project. This is a failing of the framework: each developer who writes an extension for three.js writes for the specific compatibility of their own project needs, not for the overall framework, as that is not within the scope of any developer or group of developers. Thus I find that most three.js projects require weeks if not months of maintenance prior to building content, just to ensure compatibility between all of the tools and extensions needed. With babylon.js, the wheel is not generally re-invented as it is with three.js: most extensions are quickly absorbed into a cohesive framework, provided they have universal appeal. This integration ensures compatibility, as there are fewer and fewer standalone extensions and instead an integrated set of tools which are thoroughly tested and used in production, revealing any incompatibilities quickly.

The bottom line is that there are no alpha, beta, and development cycles in three.js, and thus no stable releases; the opposite is true of babylon.js. There is cohesive development of the tools, and Sony was smart enough to see beyond the politics and realize that having Microsoft support the development of babylon.js is a huge bonus for an open source framework. If anyone had to choose a company to support the development of a WebGL framework (or any framework), who better than Microsoft? With practically every other useful WebGL framework in existence spawned by MIT, most are barely usable at best. And why would anyone pay to use a limited WebGL framework such as PlayCanvas when Babylon.js is far more functional, stable, and free? This baffles me, and most anyone who has completed even one project using babylon.js. The only argument against babylon.js is that development of the framework is now supported in-house by Microsoft. But for myself and others, this is a positive, not a negative. I've been assured by the creators and lead developers of babylon.js that they have secured an agreement with Microsoft ensuring the framework remains open source and free. This ensures that anyone is able to contribute to and review all code in the framework, and that it remains in the public domain. Sony gets this, and we quickly moved forward adopting babylon.js as the WebGL framework within at least one division of Sony Electronics.

At the end of this post I'll provide a YouTube link to a news report covering not only the games we built for SXSW, but also the exciting new technology built on Sony phones which uses the phone's camera to capture a high-resolution (yet optimized) 3D scan of a person's head. This is only a prototype today, but will be a native app on Sony phones in the future. So our task was not only to develop multiplayer games with 15+ simultaneous players in real time, but to have a continuous game which adds a new player whenever someone comes through the booth and has their head scanned with a Sony phone. This was an additional challenge, and I must say that I was very fortunate to work with a group of extremely talented software engineers. The team at Sony is the best of the best.

All in all, it was an easy choice to select babylon.js as the WebGL framework at Sony Electronics in San Diego. Below is a news report from SXSW which shows the new scanning technology in use, as well as a brief example of one of the games on the large booth screen. Using Electron (a stand-alone runtime built on Chromium), I was able to render 15 high-resolution scanned heads, a vehicle for each head, animation on each vehicle, particles on each vehicle, and many more animations, collisions, and effects without any limitations on the game, all running at approx. 40 fps. The highlight of the show was when the officers from Sony Japan, who are the real people we work for, came through the booth and gave their thumbs up; they were very happy with what we achieved in such a short time. And these were the people who wanted to see graphics and playability comparable to what the PlayStation delivered. They approved.

Link:

Thanks to babylon.js.

DB


Way to go Dave! Hope all is well; it looks like you are back on your feet after the fire. You should give me a call sometime so we can catch up.
Proud of you!


This happens a lot in Melbourne, Australia too. WebGL jobs always mention three.js, simply because that's the only WebGL framework they've heard of. If I'm on a project and no work, or minimal work, has been done with the three.js framework, I can always convince them to switch to babylon.js. For the record, I actually started with three.js and it got frustrating. Parts of it felt incomplete and broken, so I switched to babylon.js and never looked back.


So glad to hear this great feedback. I've been in similar situations where I had to decide between aframe and babylonjs.

It's true there is still a lot of silly anti-Microsoft sentiment in the IT world, but yes, we have to be smart enough to see through (the dirty) politics. 😃


That's awesome! And it's notable how much was going on behind the scenes before making it to the convention, which is easily overlooked.

This is also good news:

I've been assured by the creators and lead developers of babylon.js that they have secured an agreement with Microsoft ensuring the framework remains open source and free.

It would be cool if I could scan my face on my Xperia... 🤩 Thank you for your service.


So it looks like you got the Kinect working :D. For the face scanner to make it into production, it seems like you would have to remove the robot from the process.

Also, in early March on the NBC nightly news, I remember they re-ran a DARPA-sourced story. It warned to be on the lookout for people being made to say things in videos, a.k.a. faked videos. It caught my eye, as I use the DARPA-funded ARPAbet database for my animated speech. I wonder if the timing of the re-run was prompted by your demo? :ph34r:

I also would have thought that the example of President Obama talking (which wasn't even visually convincing) was a little far-fetched, since it relied upon a voice impersonator. Except for something I remembered from last fall: I had just finished the first version of my speech system and was walking my dog, when a young woman sitting on her deck said something to me in a very low, Lauren Bacall voice. It popped into my head that IBM had bought a voice-font patent back in around 1990. I don't have time to do anything with that right now, but it was definitely noted.


@JCPalmer and everyone,

Hi,

I'm glad you liked the write-up. As for making realistic people, we solved this on Lord of the Rings with Gollum. It only takes 11-14 phonemes, along with 5 facial states, to make a convincing person. Perhaps this should be a project, as it's easy to achieve. I first discovered this by examination while building CatDog for TV in 1990, as we had to go live on game shows and talk shows. But now there is no mystery to it, so let me know if you want the list of phonemes and facial expressions. I've posted it here before, but I'm happy to help if need be.

DB


@Gijs

FYI - expect to be able to scan your face on Sony devices soon. The only reason we used the robot to scan was that we wanted an entire head scan. Otherwise, you'll be scanning your face, and either a generic head will be selected, or in some applications and games you'll select your head attributes such as hair color and length.

DB


Sounds like a Sony version of Memoji.

On the phonemes / visemes front, I think you only posted the phonemes which are also visemes in a PM, not a topic. There you had 16. Some, like FV or PB, are really the same viseme for multiple phonemes. I did not go back to the original message, but I have them as (red = not in the CMU / DARPA DB):

AO | AE | AX | FV | GK | IY | L | M | N | OW | PB | S | TD | UH | UW | ZH

One problem I had using a database to generate animation is that AX is not in it. Databases are kind of picky. In database-defined phonemes, what should AX be? Dictionary below (blue = already assigned by you):

AA | AO | AW | AE | AH | AY | B | CH | D | DH | EH | ER | EY | F | G | HH | IH | IY | JH | K | L | M | N | NG | OW | OY | P | R | S | SH | T | TH | UH | UW | V | W | Y | Z | ZH

Maybe AX is for AA, AW, AH, & AY? One reason for the higher count in the PM is that you listed M & PB separately. Many places, including me, use the same viseme for all 3. Sample words for each:

AA | hOt, wAnt, bOUGHt, Odd

AW | cOW, OUt, mOUsE, hOUsE

AH | Up, Alone, hUt

AY | fInd, rIde, lIGHt, flY, pIE

There was no mention before about the 5 states. I am guessing one of them is 'rest', used between words. Please elaborate.

My first speech implementation looks too much like chomping / over-enunciation, especially with the speed set too fast. I use the vowel stress indicator in the DB to skip animating vowels with a weak stress indication. Low-stress vowel reduction helps, but a few visemes (or the number of phonemes they are used for) are going to need to be removed for the next version. There are 16 visemes, but they cover many more phonemes:

AA | AO | AW-OW | AE-EH | AH | AY-IH | B-M-P | CH-JH-SH-ZH | DH-TH | ER-R-W | EY | F-V | IY | L | OY-UH-UW | S

This only really leaves D, G, K, N, NG, T, Y, & Z with no viseme.
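To make the grouping above concrete, here is a small sketch (my own illustration, not code from this thread) that builds a phoneme-to-viseme lookup from the 16 groups listed. Phonemes not in any group map to None:

```python
# Illustrative sketch: the 16 viseme groups from the post above.
VISEME_GROUPS = [
    ("AA",), ("AO",), ("AW", "OW"), ("AE", "EH"), ("AH",), ("AY", "IH"),
    ("B", "M", "P"), ("CH", "JH", "SH", "ZH"), ("DH", "TH"),
    ("ER", "R", "W"), ("EY",), ("F", "V"), ("IY",), ("L",),
    ("OY", "UH", "UW"), ("S",),
]

# Invert the groups into a flat phoneme -> viseme-name lookup,
# naming each viseme after its member phonemes (e.g. "B-M-P").
PHONEME_TO_VISEME = {
    phoneme: "-".join(group)
    for group in VISEME_GROUPS
    for phoneme in group
}

def visemes_for(phonemes):
    """Return the viseme name (or None) for each phoneme in sequence."""
    return [PHONEME_TO_VISEME.get(p) for p in phonemes]

# e.g. visemes_for(["HH", "EH", "L", "OW"]) -> [None, "AE-EH", "L", "AW-OW"]
```

With a table like this, the phonemes with no dedicated viseme simply come back as None and can be skipped or snapped to a nearby shape.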


The almost throwaway comment

Quote

And why would anyone pay to use a limited WebGL framework such as PlayCanvas when Babylon.js is far more functional, stable, and free?

seriously gave me pause for thought. I was planning on using PlayCanvas (at an upcoming game jam) but now...

PlayCanvas has a free option, but is PlayCanvas really that limited compared to Babylon.js?


@CarlBateman One issue with PlayCanvas: their engine is open source, but their asset import tools and their editor are still proprietary. You have to upload your assets (in FBX, OBJ, etc. format) to their servers to convert them to their format. I don't think they have published their asset format.


@CarlBateman-

I wouldn't completely discount PlayCanvas; I simply don't believe there is a future in investing in PlayCanvas as your primary WebGL framework. However, for newbies, or for simple projects which need to be completed quickly, PlayCanvas might be what you're looking for. If its interface and API offer everything you need for your project, then it might be your best solution in the short term.

But as a long-term investment in a framework, I personally don't believe that PlayCanvas or any other for-profit framework is a good choice: simple logistics clearly show that there is no way for a for-profit WebGL framework to compete against babylon.js or three.js. Babylon.js in particular is growing so quickly in features and flexibility that PlayCanvas can only survive on the strength of its reasonably intuitive API.

So if you are new to WebGL and either don't mind paying for the use of PlayCanvas' API, or can afford it in your budget, then perhaps it's a good first choice to simply get a project delivered quickly. However, I doubt that many developers would recommend investing in future development on a for-profit framework such as PlayCanvas. So if you are proficient in JavaScript and general Web development, then I would personally choose an open source framework. I personally like babylon.js for many reasons; beyond its stability and functionality, you'll find the community to be the very best of any WebGL framework.

DB


@JCPalmer-

I guess I opened up another can of worms... 

As for phonemes/visemes, it is difficult to come up with a definitive list which represents realistic, articulate speech, as you well know. As for AX, I usually have it in my list of phonemes, although lately I've been omitting it, as it rarely gets called by most audio analysis algorithms and usually confuses the analysis with partial calls identifying AX. So I generally omit it from the list, not that I believe this is ideal. This comes from years of real-time speech animation using various methods.

On 6/6/2018 at 8:02 AM, JCPalmer said:

This only really leaves D, G, K, N, NG, T, Y, & Z with no viseme.

As for the above, if I understand correctly, most of these are covered by "D, T, and S" (with the odd W, which can be assigned to a combination from a limited list), as there are few noticeable differences when running real-time speech analysis. Another issue I've yet to mention is that I don't use a one-to-one representation of any phonemes/visemes; I assign blends of these to fully cover what is needed for the performance.

Bottom line: what I've discovered is that less is more. Generally, the shorter the list of phonemes/visemes, the better the results. Adding more causes "chattering" of the mouth and tongue, and becomes a distraction rather than a smooth fit within the scope of the animation. I certainly don't have a definitive list to provide, just a limited list based upon the many methods I've used and the results they commonly produce.
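One way to picture the "less is more" point: beyond shortening the viseme list, smoothing the per-frame viseme weights also keeps the mouth from chattering. This is a hypothetical sketch of my own (simple exponential smoothing, not anyone's actual production method):

```python
# Hypothetical sketch: smooth per-viseme weights across animation frames
# so that rapid-fire viseme hits blend instead of "chattering".
def smooth_weights(frames, alpha=0.35):
    """Exponentially smooth a sequence of {viseme: weight} dicts.

    frames: list of dicts, one per animation frame.
    alpha:  smoothing factor in (0, 1]; lower = smoother but more lag.
    """
    smoothed = []
    state = {}
    for frame in frames:
        # Carry forward decaying weights for visemes no longer firing.
        visemes = set(state) | set(frame)
        state = {
            v: alpha * frame.get(v, 0.0) + (1 - alpha) * state.get(v, 0.0)
            for v in visemes
        }
        smoothed.append(dict(state))
    return smoothed

# A viseme that fires for one frame decays gradually instead of snapping off:
# smooth_weights([{"AA": 1.0}, {}, {}], alpha=0.5)
# -> [{"AA": 0.5}, {"AA": 0.25}, {"AA": 0.125}]
```

The same idea applies whether the weights drive morph targets or bone poses; the smoothing just trades a little articulation for stability.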

As for the facial "states", these are limited only to the number of control inputs from the puppeteering controls I've built for film and TV performances, and some video game cinematics as well. These always include "Happy", "Sad", "Open mouth" (perhaps considered loud), and at a minimum "left brow up and down" and "right brow up and down", which can be assigned together or separately provided enough channels are available. Separately is better. I always set these to attenuate so I can produce an eye squint and open the eyes wide as needed. I also assign an eye blink to a button control, as this is certainly necessary.

As for a "default" mouth and facial state, this is always set as the very first keyframe so that when no controls or inputs are used, the face returns to this state. There is no need for me to attach this to any controls. But I always automate the mouth (for the most part) except for specific states such as yelling or whispering. And then always puppeteer everything on the face except for most mouth shapes, as I've never found any algorithm which provides any useful results for facial states. And since I record in real-time, it;s incredibly fast to record a finished take. I might tweak the speech a bit after the fact, but otherwise, I'm able to tweak any aspect of the animation by enabling only the channel(s) I might need to make subtle changes. But with minimal rehearsal as well as running the audio through a parametric EQ and a compressor, I've been able to record finished facial animation using similar lists to what I listed prior as well as a little practice prior to going live.

It's all subjective, although there are definitely methods which give me an advantage in a specific performance. Knowing the character is everything, but once you have a few productions behind you, and you've built and set up the controls personally, it's almost as simple to walk in without rehearsal.

Cheers,

DB

