Faceware Technologies, a leading developer of markerless 3D facial motion capture solutions, has released Faceware Studio, a new real-time platform for creating high-quality facial animation live. The new platform is available immediately to try or purchase.
Faceware Studio is new from the ground up and completely replaces the former Faceware Live Server product. The new Studio real-time streaming workflow allows quick calibration, tracking, and animation of anyone’s face in real time by harnessing a new machine learning approach and the latest neural network techniques. The data can be streamed to Faceware-supported plugins for Unreal Engine (UE4), Unity, MotionBuilder, and soon Autodesk Maya. Artists are also given the tools to tune and tailor the animation to an actor’s unique performance and to build additional logic with Motion Effects.
The product currently lives alongside the non-live Faceware pipeline built on Faceware’s Analyzer and ReTargeter. It is intended primarily for live applications, but in the future this technology will be expanded and blended with the traditional non-real-time Faceware pipeline tools. The Studio product assumes one camera and can work from a webcam or a head-mounted camera rig (HMC). Live facial solving is an important problem, distinct from standard offline production, and one that is well suited to machine learning (ML).
At the core of the new Faceware innovations is ML, which is used to dramatically improve lip-sync by solving the problem of the actor’s jaw position far more effectively. Using deep learning convolutional neural networks (CNNs), Faceware’s improved jaw-positioning tech gives users faster and more accurate lip-sync on digital characters in real time.
The problem of producing accurate lip-sync is directly jaw-related, and compounded by the occlusion of the teeth in most performances. Even when the teeth are showing, it is normally the fixed upper teeth, not the lower jaw teeth, that are visible. To complicate matters, the lower lip and mouth skin slide and compress with their own muscle movement, providing no reliable tracking information. The situation only gets worse if the subject has a beard. ML is used to provide a plausible estimate of where the jaw is, even when no obvious tracking points can be seen and no lower teeth are visible. The ML solution is based on supervised deep learning, which required extensive manual annotation of footage to train the CNN.
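To make the supervised setup concrete, each training example pairs a video frame with a manually annotated jaw-open label. The sketch below is purely illustrative; the field names and schema are assumptions, not Faceware's actual annotation format.

```python
from dataclasses import dataclass

@dataclass
class JawAnnotation:
    """One hypothetical supervised training example: a frame of footage
    paired with a manually annotated jaw-open label in [0, 1]."""
    frame_path: str   # path to the captured video frame
    jaw_open: float   # 0.0 = jaw fully closed, 1.0 = fully open

    def __post_init__(self):
        # Reject out-of-range labels at annotation time.
        if not 0.0 <= self.jaw_open <= 1.0:
            raise ValueError("jaw_open label must be in [0, 1]")
```

A CNN trained on many such examples learns to regress the label directly from pixels, which is what lets it cope with beards and occluded teeth.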
The machine learning frameworks Faceware used were TensorFlow and PyTorch. Given the curated training data the Faceware team provided, and using these latest ML libraries, the system produces a jaw estimation value between 0 and 1. This then informs the facial solver so it can more reliably simulate and recreate the correct lip shapes and sync. The human jaw has six degrees of freedom and can move within what is called Posselt’s envelope, a shield-shaped range of movement. But Faceware is not aiming to output detailed geometry of the jaw; the aim is to dramatically improve the lip-sync. For example, just knowing whether the teeth are apart, and by how much, while the lips themselves are closed dramatically adds realism to the facial solution.
Jay Grenier, Product Director at Faceware, comments that “the intent was always to solve the jaw problem because all lip-sync is dependent on the jaw being in an accurate position, or as close as you can possibly get. The single biggest lip-sync problem we had was being unable to predict exactly where the jaw is during capture”.
The specific output that the team was trying to solve for was a normalized value of how open the jaw is in a clip at any given time. “Assuming zero is the closed position, when your teeth are clenched and your jaw is completely closed, no motion, and when you’ve opened your jaw as wide as you possibly can, that’s a hundred percent, or 1, on that scale. Essentially it is a singular value we’re looking to solve for, as the rest of the lip-sync is sort of dependent on that,” he adds.
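The normalization Grenier describes can be sketched in a few lines. This is a minimal illustration of mapping a raw jaw-aperture measurement onto that 0–1 scale; the function name, the millimetre units, and the 45 mm maximum opening are assumptions for the example, not Faceware's internals.

```python
def normalize_jaw_opening(aperture_mm, min_mm=0.0, max_mm=45.0):
    """Map a raw jaw aperture to the normalized scale described above:
    0.0 = teeth clenched / jaw fully closed, 1.0 = jaw opened as wide
    as the actor can. Values outside the calibrated range are clamped."""
    value = (aperture_mm - min_mm) / (max_mm - min_mm)
    return max(0.0, min(1.0, value))
```

In practice the CNN regresses this single value from pixels rather than measuring an aperture, but the downstream solver consumes the same normalized scalar either way.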
ML is perfect for this type of work, Grenier points out, since “the training is heavy and it does take a while, but once everything’s trained, it runs extremely fast. We’ve had no problem running it at full frame rate, which for us is 60 frames per second. The machine learning tech has had no problem achieving that”.
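The real-time constraint Grenier mentions amounts to a simple per-frame budget: at 60 fps, inference must average under roughly 16.7 ms per frame. A hypothetical harness for checking that, with `infer` standing in for any trained estimator, might look like this:

```python
import time

FRAME_RATE = 60
FRAME_BUDGET_S = 1.0 / FRAME_RATE  # ~16.7 ms available per frame

def runs_in_realtime(infer, frames=120):
    """Return True if the average per-frame cost of `infer` (a stand-in
    for the trained jaw estimator) fits inside the 60 fps frame budget."""
    start = time.perf_counter()
    for _ in range(frames):
        infer()
    avg = (time.perf_counter() - start) / frames
    return avg < FRAME_BUDGET_S
```

This is the usual trade-off of supervised deep learning: training is slow and offline, but a single forward pass is cheap enough to run every frame.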
“This is more than just a rebranding of our LIVE product – Faceware Studio is a complete re-engineering of our real-time platform,” said Peter Busch, vice president of business development at Faceware Technologies. “Based on what we’ve learned from the market over the past few years about how people create and work with realtime facial animation, we re-thought the entire product from the ground up to make it easier to use, more intuitive, and to produce higher quality animation. The initial response during the beta was extremely positive, and we’re excited to be releasing it for all of our users.”
The new, improved lip estimation is then moderated by Motion Effects and animation tuning, giving Studio users additional direct control over their final animation. The new Studio software allows artists to visualize and adjust actor-specific profiles with animation tuning and to build powerful logic into the real-time data upstream of it being fed into, say, UE4 for live viewing.
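The kind of upstream per-channel logic an artist might build is, at its simplest, a scale-offset-clamp applied to a normalized animation value before it is streamed to the engine. The sketch below is a hypothetical illustration of that idea, not Faceware's Motion Effects API; all names and defaults are assumptions.

```python
def tune_channel(value, gain=1.2, offset=0.0, floor=0.0, ceiling=1.0):
    """Illustrative per-channel tuning step: scale and offset a
    normalized animation value, then clamp it back into range before
    streaming downstream to an engine such as UE4."""
    return max(floor, min(ceiling, value * gain + offset))
```

For example, a gain above 1.0 exaggerates an actor's subtle jaw motion on a stylized character, while the clamp keeps the result a valid normalized value.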
For the system to work, the training data needs to be carefully curated. The training data defines the solution space, so the system will always predict the jaw more accurately if the demographics of the training data encompass the user’s or actor’s own age, race, gender, and facial appearance. Faceware is very keen to provide a solution for all users, so it needed to annotate and include a wide variety of faces: a mix of people from all walks of life and a range of countries. The ML solution is a good fit for live applications, as supervised ML approaches are slow to train but fast at run time. The system only uses training data explicitly gathered by the research team in Faceware’s Manchester office; it in no way uses user performances or sends any data from a capture session as part of this ML process.
For Grenier and the team, the ML has really delivered a substantial result. “It blew my expectations out of the water, to be honest with you. We’re always a little bit cautious, a bit conservative. We don’t want to over-promise… but honestly, we’ve had really positive feedback and it’s been super encouraging”. This experience has been so productive that Faceware is planning significantly more ML development moving forward. “We’re in the process of converting almost everything to ML,” says Grenier. “It enables us to do things that, frankly, we just couldn’t do before”. For example, the team is exploring using ML to solve for the cheek position, as there is often little to track to obtain pure cheek movement data. Similarly, “brows can be hard if someone’s eyebrows are completely occluded by the rim of their glasses – that’s another example of something that we traditionally have a lot of trouble with. But these new ML techniques may provide a light at the end of the tunnel: we might even be able to solve occlusion”.
Away from the jaw, the software also has enhanced facial tracking technology, improving the quality and robustness of the general tracking and animation.
Realtime Animation Viewport and Media Timeline
Users can see their facial animation from any angle with Studio’s fully-featured 3D animation viewport and use the Timeline and media controls to pause, play, and scrub through their media to find ideal frames for Calibration and to focus on specific sections of their video.
Dockable, Customizable Interface
A modern and customizable user interface with docking panels and saveable workspaces that allow for a completely personalized setup.
Motion Effects
An internal toolset to customize and optimize animation results with simplicity and expandability, eliminating the need for complex in-engine scripting.
Improved CPU/GPU Performance
Faceware Studio is optimized to create better and faster results while using fewer resources than its predecessor. Along with the optional ‘Optimize for Realtime’ feature, users can enjoy higher frame-rate tracking across a much wider range of hardware.