This week, Sucker Punch Productions' Second Son is up for another gong at the VES Awards in LA. We published an article on fxguide some time ago about this third game in the inFAMOUS series and in particular its particle effects. But in addition to the great effects simulations and animations, the game has a very strong face animation pipeline. The new hardware of the PS4 allowed a jump in quality and facial animation which makes the game incredibly visually impressive. This, coupled with a strong narrative, has made Second Son one of the most successful games on the PS4, and it regularly rates as one of the most popular in polls of PS4 users.
Gaming presents real issues that can be very different from films, and Sucker Punch's Spencer Alexander gave a great presentation on the face pipeline at GDC 2014 last year which highlights the depth, complexity and increasing sophistication of modern games when dealing with faces. A pipeline like this also allows the game to have a really impactful backstory and character arcs, which are often given short shrift in first person shooter (FPS) games.
Just the FACS
Previous games had real-time cut scenes, but the face animation was hand keyframed using only joint-based face rigs. For this new game the team decided to use a vastly more detailed facial and motion capture system with a data-driven, example-based approach built on the FACS system.
FACS stands for Facial Action Coding System and it was developed primarily by psychologist Paul Ekman, but not for computer graphics. Ekman was aiming to understand human facial expressions and meaning. The FACS system was part of that research. His original research question was one that we almost take for granted today, namely "are there universal facial expressions of emotion?" As part of this he developed a way of coding different visible facial actions, many of which relate to specific facial muscle movements. In producing a system for breaking down the expressions of an actor or person into a codified set related to a base capture, Ekman produced an incredible wealth of research to build on.
Dr Mark Sagar pioneered the use of the FACS in motion capture for digital characters, from the system originally devised by Paul Ekman. The FACS system has been credited with helping to capture and translate the subtlety of emotion in the faces of digital characters in Weta Digital's major effects films such as King Kong and Avatar, and gained Sagar his second Sci-tech Oscar (in a row) in 2011. Today Dr Sagar is at the University of Auckland and his work is featured inside our fxphd.com with 'BabyX'. The FACS system is extremely widely used as the backbone of facial animation capture and reproduction the world over. FACS is now so widely regarded that Sucker Punch decided they would adopt its approach for the complex facial work required inside Second Son.
With 44 unique action units (AUs) coupled with camera reference for head and eye movements, the team at Sucker Punch could film actors performing and then break that motion down into the component facial FACS based poses. It should be noted that FACS was never aiming to be an animation system, so not all the standard FACS poses are normally relevant, nor do the FACS poses always have any 1:1 relationship to an individual muscle, but in combination with a good rig and the reference from the actual actor with a face camera it is possible to produce award winning facial animation with FACS.
Given the approach of having a large data set of known examples as input to make a generalized representation of the faces of the characters talking and emoting, the actors needed to be sampled with a consistent facial marker system. Sucker Punch decided to make plastic vacuform face masks with holes allowing for repeatable dots to be placed on the actors' faces before each session, which helped provide consistent marker correspondence.
Head rigs present their own issues, for example a single camera can have trouble seeing the depth of say a lip, as demonstrated further below in the test shot. This can be solved by using multiple cameras on the one head rig.
Once the performance is captured, the mocap data is solved to the rigs and their blend shapes. Finally, motion retargeting applies this to whichever character face the gamer will see. Clearly it would be easier if each character was just the digital version of the actor, exactly matched, but in a game environment that is not the case.
In a sense there are two data streams:
- The data from the FACS poses, which informs the solver as to which 'muscles' to move, along with a retargeting that gets from this 'FACS' face of an actual actor to the character's digital face.
- The performance capture, which is solved to find how to combine a set of FACS poses into one expression that matches the actor's markers on each frame, and which is also retargeted to the game character. Plus specific eye tracking or eyelines.
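The second stream can be pictured as a small fitting problem: find blend weights so that a weighted sum of FACS pose marker offsets reproduces one captured frame. What follows is a minimal sketch only, assuming a purely linear blend model, weights clamped to [0, 1], and plain projected gradient descent. The function name, the toy data and the choice of solver are illustrative assumptions and not Sucker Punch's actual solver.

```python
def solve_pose_weights(pose_deltas, frame_delta, steps=2000, lr=0.1):
    """Fit weights w in [0, 1] so that sum_i w[i] * pose_deltas[i] ~= frame_delta.

    pose_deltas: one flat list of marker offsets (from neutral) per FACS pose.
    frame_delta: flat list of marker offsets (from neutral) for one frame.
    """
    n = len(pose_deltas)
    w = [0.0] * n
    for _ in range(steps):
        # Blend the poses with the current weights and compare to the frame.
        blend = [sum(w[i] * pose_deltas[i][k] for i in range(n))
                 for k in range(len(frame_delta))]
        residual = [b - f for b, f in zip(blend, frame_delta)]
        for i in range(n):
            # Gradient of 0.5 * |residual|^2 with respect to w[i].
            grad = sum(r * d for r, d in zip(residual, pose_deltas[i]))
            # Gradient step, then project back into the valid weight range.
            w[i] = min(1.0, max(0.0, w[i] - lr * grad))
    return w


# Toy data (hypothetical): two markers in 2D, flattened to four numbers.
jaw_open = [0.0, -1.0, 0.0, -1.0]
smile = [1.0, 0.0, -1.0, 0.0]
frame = [0.25, -0.5, -0.25, -0.5]   # built as 0.5 * jaw_open + 0.25 * smile
weights = solve_pose_weights([jaw_open, smile], frame)
print(weights)  # converges toward [0.5, 0.25]
```

The step size `lr` has to be small relative to the magnitude of the pose offsets for the iteration to converge; a production solver would more likely use a dedicated least squares routine.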
Animators would still tweak certain things, such as sticky lips (the way lips do not just open and close but have surface contact and surface tension, which makes a set of lips look more real).
Of course, animation is movement and while key poses are important what is critical is also the timing. Humans do not move in a linear fashion from one expression to the next. Pose dynamics can give emotional context.
Games at the new level of the PS4 require a level of detail not focused on before. For example, complex wrinkles on the face - especially around the eyes and mouth. In the film world this might be just part of the model, and certainly research such as that done at USC ICT has worked to very high resolution mapping actual skin pores between key poses, but for a game with its real time requirements and data/memory limits, this is done by overlaying a live wrinkle engine that adds believable detail.
In the end Sucker Punch did 70 facial pose scans per actor with the actor being sampled using two linked structured light field scanners.
fxguide spoke to Spencer Alexander about the amazing face pipeline he implemented with the team at Sucker Punch. The facial performance capture started with internal research. As Alexander had worked as a Character TD at various visual effects houses prior to his move to the game industry, he was already very familiar with a non-games pipeline. "I had been following related technical papers coming out of Siggraph Conferences for quite a while," he says, "and I have been lucky in that I was able to see firsthand how a few of the more successful techniques I had seen in papers were implemented in practicality as well as see how much work was going on behind the scenes to complete."
Building a new pipeline
The key at Sucker Punch was to solve the pipeline without relying on a huge number of animators, since the volume of gameplay footage and cut scenes would make that approach unworkable.
Several new facial pipeline ideas were examined and one idea in particular seemed like it might be a good fit, but still required a good amount of research and proofs of concept. "I wanted to combine some successful techniques that were already production-proven, just in a unique way that would hopefully help compensate for fewer shapes while adding some fidelity in motion and perhaps reduce the amount of animation on the back end," notes Alexander. "Those areas of highest interest to me were the example based methods that utilized a series of facial scans (FACS scans)."
Areas of research that the team looked at included using Principal Component Analysis (PCA) as drivers with Singular Value Decomposition, and basis function regression to solve shapes based on 2d marker or optical flow tracking. This ended up being very close to what was used, but Alexander opted not to use the principal component eigenvectors as the drivers. PCA seemed a very good candidate and it has been used very successfully, but it was thought that it can end up being very unintuitive to work with. "I felt if you could specifically call out the drivers most appropriate to regions of the face it would be very useful for tying it nicely together with animation controls that were more intuitive," he says. "It would also be easier to troubleshoot with recognizable values."
In addition, Alexander found that most methods based on 2d tracking were getting odd influences of shapes around the mouth where the lips had depth variations but very little to no difference in a frontal view, causing overlapping influences of unrelated poses. "I opted for a 3d marker tracking approach," explains Alexander, "which also gives you a ground truth to compare your results and evaluate any discrepancies of error at the same surface points on your solved results."
Alexander and the team had a lot of success using Pose Space Deformers (PSD) as interactive nodes in a rigging solution for realistic body deformations and saw a reasonable path for making them work for facial animation and performance capture.
The facial pipeline was closely discussed with the Engine and Rendering Team Lead Adrian Bentley who implemented the necessary engine code and Maya plugins, as well as some additional research, to support this pipeline approach.
The team also spent quite a bit of research time testing the various head mounted facial rig options as well as the corresponding tracking requirements to find a suitable fit for their production needs. This resulted in a dual camera solution, which also solved the lip depth issue.
The team really needed the characters to exhibit a very broad range of emotions and expressions throughout the game, subtle and also very expressive. "We wanted the versatility of just letting the actors perform," says Alexander, "allowing for their improvisations and to bring their own range of emotions and nuance into the game without the results just looking 'posey'. What we were really hoping for was that much of the story and connection to the characters could come through with the subtle micro-expressions in the face where you could see the thought and emotions behind their eyes and capture as much as you can directly from their performance, without having to spell it all out line for line in the dialog."
This approach opens up the storytelling options. One surprisingly successful example, which Alexander thought might have gone too far, was when Animation Director Billy Harper decided to really push the face pipeline and had actor Travis Willingham (who played Delsin’s brother Reggie) chew gum during a scene (!). In the end that performance came through successfully, captured directly from the actor and transferred to the digital Reggie.
Many FACS-based poses were scanned together with various additional combination expressions, chosen to cover the full range of what the actors’ faces can do within the team’s notional 'budget' of scans per actor.
"We did include a number of shapes that were signature looks that we let the actor choose for expressions that were common and unique to the character, to help ensure that certain recognizable signature looks got through," adds Alexander.
To reach the level of expressiveness of the actors, each of the scanned shapes that was acquired was automatically broken down into small regions around each of the 168+ joints on the face, with a smooth falloff between them for nice blending across regions. Each of the regions could be triggered independently based on the relevant marker configuration from motion capture. To maintain the fidelity of the expressions while in motion, the joints on the face moved directly with the 3d tracked markers, and the shape deformations were applied pre-skinCluster so that they moved naturally with the surface during shape interpolation, providing additional coherency and fluidity between shape and movement.
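The per-joint regions with smooth falloff can be illustrated with a small weighting function. This is a sketch under stated assumptions: the article does not say which falloff curve or radius was used, so the smoothstep curve, the `radius` parameter and the function name here are all hypothetical.

```python
import math


def region_weights(vertex, joints, radius):
    """Return one normalized weight per joint for a vertex position.

    Each joint owns a spherical region of the given radius. The weight is
    1 at the joint and eases to 0 at the edge of its region (smoothstep).
    """
    raw = []
    for joint in joints:
        d = math.dist(vertex, joint)
        t = max(0.0, 1.0 - d / radius)        # 1 at the joint, 0 at the radius
        raw.append(t * t * (3.0 - 2.0 * t))   # smoothstep easing
    total = sum(raw)
    if total == 0.0:
        return raw                            # vertex lies outside every region
    # Normalize so overlapping regions blend instead of adding up.
    return [r / total for r in raw]


joints = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(region_weights((1.0, 0.0, 0.0), joints, radius=2.0))  # [0.5, 0.5]
print(region_weights((0.0, 0.0, 0.0), joints, radius=2.0))  # [1.0, 0.0]
```

Normalizing the weights is one way to keep a vertex that sits between two regions from receiving more than a full shape's worth of displacement.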
"Attempts with strictly blendShape interpolation with too few of shapes can take 'wonky' zig zagging, linear paths between shapes, that our method helps compensate for," says Alexander. "There was no specific limit to the amount of expressiveness that each of our characters was capable as ultimately the shapes can be morphed together in any combination to best match what the actor was doing on set. Even with the best pipeline in the world - there is still normally room for improvement - these got hand animation 'polish' in that respect most pipelined can always benefit from more scanned shapes."
FACS was always an important part of the new pipeline plan. "It was important that we had precise topology correspondence between shapes for nice blending but it was of particular importance at the marker positions that were used as shape drivers," describes Alexander. "The team opted to have engineering grade custom vacuum-formed masks made of the actors' faces with holes drilled for reliable face marker dots. These were used in both the FACS session and the later mocap shoots to help ensure correspondence between the two data sets."
"We also had to have a system of scanning that we could trust would accurately stitch together shapes acquired from multiple camera pairs with very high precision to maintain that correspondence," adds Alexander.
[Image: Mesh with locations and markers.]
Scanning faces can be particularly difficult around the eyes, teeth and any hair, including eyelashes and other facial hair, with the data leaving these areas as effectively holes. The scattered light in the eyes and teeth, and the fineness of the lashes, do not allow scanning to work well. "The lips are difficult," notes Alexander, "since much of the inner lips are not always seen in each scan, but it is important to maintain the proper thickness of the lips in the inner mouth for good blending … an area very often ignored." The team required modeling cleanup on these important shapes around the inner lips and eyes. In the end a talented blendShape artist would just use the imagery from the scans as reference and work from a good understanding of basic facial anatomy to get the data right.
Matching body and face
In addition to the face capture there was a traditional motion capture suit used for the body that was worn at the same time as the wireless head mounted camera rigs. It is important to sync body and face, as even shoulder movement can affect the way a face is 'read' by a user. To sync the two rigs they added hardware for timecode sync lock. The actors would hit the standard "T" pose before each take to calibrate and align. Says Alexander: "We did use some additional tech for deformations after the motion capture was applied to the rig, but the capture portion of the bodies was business as usual."
In terms of retargeting the team used an in-house set of tools that ran a constraint relaxation algorithm that used edge information per vertex with the relationship between the actor’s default pose against the actor’s individual FACS poses as a basis for deriving the game character FACS poses relative to the game character’s own default pose. "This still required good topological correspondence where the points should lie on similar surface anatomy even though the overall shapes of the head and face can vary between the two default poses," explains Alexander.
The team retargeted their tracked motion captured marker data using the same constraint relaxation algorithm only on a low res marker mesh representation of the default poses with vertices snapped to the marker positions. The marker mesh was also directly driven by the motion capture data and used against the defaults for retargeting the motion. "We did have an alternate method as a backup," says Alexander, "where you could use the weight values solved on the actor directly applied to corresponding shape weights on the game character rig, which would be more adept for characters with very different anatomy."
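The backup method Alexander mentions, reusing the weights solved on the actor on the character's corresponding shapes, reduces to ordinary blend shape evaluation on a different set of shapes. A minimal sketch, assuming flat coordinate lists and a purely additive blend; the function name and toy values are hypothetical, and the primary constraint relaxation method is not shown here.

```python
def apply_shape_weights(default, shapes, weights):
    """Evaluate default + sum_i w_i * (shape_i - default) on flat coordinate lists."""
    result = list(default)
    for shape, w in zip(shapes, weights):
        for k, (s, d) in enumerate(zip(shape, default)):
            result[k] += w * (s - d)
    return result


# Weights solved on the actor (hypothetical values) ...
actor_weights = [0.5, 0.25]
# ... applied to the game character's own default pose and matching shapes.
character_default = [0.0, 0.0]
character_shapes = [[2.0, 0.0], [0.0, 4.0]]
print(apply_shape_weights(character_default, character_shapes, actor_weights))
# [1.0, 1.0]
```

Because only the weights cross over, the character's shapes are free to look nothing like the actor's, which fits the quote's point about characters with very different anatomy.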
As for retargeting the bodies from the actor's motion to the game characters, the team utilized the pipeline already in place at the motion capture studio, which delivered re-targeted body skeletons. They also developed an in-house method of inheriting and re-targeting motion between characters on the fly in engine.
Alexander commented that it took a lot of motion studies and educated trial and error to come up with a minimum combination of driver values using the known marker training data at each pose. "The Pose Space Deformer (PSD) uses a gaussian radial basis function to solve the weights after setting up a matrix of our training data of known shapes weights for easily converting motion data to weight values within our interactive deformer via a linear least squares solver."
In the system, each additional driver meant adding another dimension of input to the matrix and would thus require a multitude of additional pose shapes to fill the space. Appropriate drivers were carefully chosen for each region. Because of this and the variation of travel between related movements a method was used to regularize the inputs with a scalar value while avoiding scaling inputs too much which can "muddy up or blur shapes together", even when exactly matching to known poses.
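To make the pose space idea concrete, here is a minimal sketch of Gaussian radial basis function interpolation from driver values to shape weights. It is written under several assumptions: the kernel width `sigma`, the single input `scale` (standing in for the scalar regularization of the inputs), the toy drivers and all function names are hypothetical. Where Alexander describes a linear least squares solver, this sketch solves the square training system exactly with Gauss-Jordan elimination.

```python
import math


def gaussian(r, sigma):
    return math.exp(-(r * r) / (2.0 * sigma * sigma))


def solve_linear(a, b):
    """Solve a @ x = b (square a, multi-column b) by Gauss-Jordan elimination."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def train_psd(driver_poses, shape_weights, sigma=1.0, scale=1.0):
    """Fit RBF coefficients so each training driver pose returns its known weights."""
    pts = [[scale * v for v in p] for p in driver_poses]
    kernel = [[gaussian(math.dist(p, q), sigma) for q in pts] for p in pts]
    coeffs = solve_linear(kernel, shape_weights)
    return pts, coeffs, sigma, scale


def evaluate_psd(model, driver):
    """Convert one frame of driver values into shape weights."""
    pts, coeffs, sigma, scale = model
    x = [scale * v for v in driver]
    phi = [gaussian(math.dist(x, p), sigma) for p in pts]
    return [sum(ph * c[j] for ph, c in zip(phi, coeffs))
            for j in range(len(coeffs[0]))]


# Toy training set: two drivers (say jaw travel and lip corner travel).
drivers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # neutral, jaw open, smile
known = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]     # shape weights at each pose
model = train_psd(drivers, known)
print(evaluate_psd(model, [1.0, 0.0]))   # close to [1.0, 0.0] at a training pose
print(evaluate_psd(model, [0.5, 0.5]))   # a smooth in-between blend
```

The sketch also shows why extra drivers are costly: each one adds a dimension to the points in `drivers`, and the space between training poses then needs more examples to stay well behaved.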
The main characters used the exact same drivers as all the other characters, and did not require any special customisations. "The nice thing is that once a good combination of drivers were found, they can now be re-used for training any face," says Alexander.
The facial rigging, under the hood, worked the same for animation as it did for motion capture. By simply moving facial controls, it was possible to unwittingly move a duplicate set of the markers used as shape drivers, and that would pull away from the motion capture results. "One benefit of this was that when the animators keyed from pose to pose, any relevant shape changes would kick in along the way. For example, the lips would wait to compress until they came together when moving from a mouth open to tight lipped pose."
[Image: Facial animation interface.]
There were multiple levels of control for animation:
• There was a simple UI with pictures of the actor taken from one of the cameras during each of the FACS scans. Selecting a picture would pop up a larger view with a set of “broad region” sliders unique to the pose, each of which would weight on a section of the face to match the pose. If all the sliders on that picture were active, the result should have looked exactly like the pose over the whole face. In other words, the marker drivers controlling the face would move in a normalized fashion so they would add up to exactly match the entire pose when all the sliders for that pose were active. "Because you are never directly manipulating shapes," explains Alexander, "but rather the driver markers, you can avoid some of the 'over-adding issues' which take the face off model. The animators additionally had animation attributes that they could use to move off the motion capture data to whatever degree desired."
• There were also broad region controls that could be revealed on the face, allowing an animator to manipulate and pose the face such that one could see appropriate blending of shapes in the relevant areas that best correspond to that motion. Additional fine level controls could be revealed at each face joint/marker position for tweaking. There were also attributes that would warp the results from the solved shape such that the marker positions would exactly reach the 3d tracked markers. "With a good 3d motion capture track of the face markers, this would allow for some very natural organic motion directly tied to the fluidity of the actor’s face, by compensating for any slight discrepancies between the results and performance," Alexander explains.
[Image: Mouth animation.]
Adding wrinkles on top
One important aspect of realism was adding wrinkles to the digital characters. This could not be a fundamental part of the model for data and rendering reasons, so the wrinkles and creases on the face were normal maps that were automated using tension and compression of the face geometry at a vertex level. The system would do this by interpreting the vertex RGB colors as alphas.
"So Red was compression, green was stretch and blue was the default alpha," sets out Alexander. "This would dynamically transition from the default normal map to the stretch or compression maps independently across the face. A low res cage corresponding to a subset of the vertices on the face would be used to determine per triangle area for driving the rgb vertex colors during compression and stretch that would smoothly blend the alpha colors between the edges. This gets transferred to the high res on the fly. This allows for faster calculation, smoother falloff, and more art direction while avoiding unnecessary noise and helped avoid it feeling like wrinkle patches were simply turning on and off. The wrinkle maps were derived from a combination of extreme poses taking the difference of the high res scans and final resolution topology. We found that in UV space the vast majority of the wrinkles were repeated in the same areas so it was efficient to use a combined map to represent the wrinkles of the entire face, that only get revealed where compression occurs."
[Image: Wrinkles set-up.]
[Image: In-engine wrinkles on display.]
Real time issues
As Second Son is a game this face pipeline needed to work in real time. This is important not only for the run time technical considerations but for downloadable content and also just the variable 'Karma' state of the character. "We knew," says Alexander, "we wanted the same animation system used for high resolution facial animation in our real-time cutscenes as would be used for any other animations in game, hand-keyed or otherwise for dialog or expressions and it was determined that we couldn’t afford to have the high number of blendShapes at the desired resolution to store all that in memory. A PSD deformer applied directly in engine would be too slow for our purposes."
In the past, Alexander had used geometry caching on film projects and he had always been impressed with the amount of geometry that can be pushed around in real-time and scrubbed say in Maya and Houdini. It turned out GPU decompression could be utilized very efficiently for streaming especially if using small values relative to the skinCluster results. "We also took advantage of the fact that our skinned joints moved with the surface of the face that could be used alone as an optional Level of Detail (LOD), such as during gameplay when the faces are not at Full Frame."
The team applied vertex offsets pre-skinCluster, which allowed any variable rate streaming to still follow relative to the skinning deformations. They also re-calculated normals on the fly to remain correct after the deformations. "We also had to correct for seam tears in engine where streamed vertices met the rest of the head," says Alexander, "since the skinned results in Maya varied due to differing implementation for round off error than what the GPU used."
[Image: A run-down of the facial animation set-up.]
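Applying a streamed offset before skinning can be sketched with plain linear blend skinning. This is a minimal illustration under stated assumptions: the 3x4 matrix layout, the 8-bit quantization (a stand-in for however the offsets were actually compressed for streaming), the `max_abs` range and all function names are hypothetical.

```python
def transform_point(matrix, point):
    """Apply a 3x4 row-major affine matrix (rotation | translation) to a point."""
    x, y, z = point
    return [row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix]


def skin_vertex(rest, offset, joint_matrices, weights):
    """Linear blend skinning with the streamed offset added before skinning."""
    pre = [r + o for r, o in zip(rest, offset)]   # offset lives in rest space
    out = [0.0, 0.0, 0.0]
    for matrix, w in zip(joint_matrices, weights):
        moved = transform_point(matrix, pre)
        out = [acc + w * m for acc, m in zip(out, moved)]
    return out


def quantize_offset(offset, max_abs=0.5, bits=8):
    """Pack a small offset into signed integers for compact storage."""
    levels = (1 << (bits - 1)) - 1
    return [round(max(-1.0, min(1.0, v / max_abs)) * levels) for v in offset]


def dequantize_offset(packed, max_abs=0.5, bits=8):
    levels = (1 << (bits - 1)) - 1
    return [q / levels * max_abs for q in packed]


identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
shift_x = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
offset = dequantize_offset(quantize_offset([0.0, 0.1, 0.0]))
print(skin_vertex([1.0, 0.0, 0.0], offset, [identity, shift_x], [0.5, 0.5]))
```

Since the offset is added in rest space, the joints carry it along with the surface, and the small range of the offsets is what makes a compact encoding plausible. Dropping the offset entirely still leaves a skinned face, which matches the optional level of detail described above.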
There were some calculations done early on based on some early specs that determined that the team could safely have three of the highest LOD actors' faces up close and personal with full vertex level animation at one time. So they decided they were safe to move ahead with the plan and implement the system. "This was exciting," adds Alexander, "since the streaming immediately opened the door of possibilities, as it allowed for us to use whatever deformers or complicated methods we wanted to animate at a vertex level." This translated into every cutscene including a talking character if desired.
The rendering was not driven by the rig, other than the points mentioned above. The wrinkles were automated based on the compression levels, and were applied through a Maya plugin and shader that matched the in-engine wrinkles so the animators could visualize them. The faces could be animated for triggering dialog or expressions for any point in the game using the same animation rigs that were used for cutscenes, such as when the characters used their powers. However, these animations were usually just using the joints’ transforms and skinning alone and didn’t utilize the extra vertex level animations; you could specify to the exporter within the animation scene whether or not to use the vertex data.
The project had a very tight schedule, as games often do, so there was not expected to be a lot of time for polishing any final work. But due to a well run project, they actually managed to get a simulated skin layer of vertex level animation applied to a number of characters in quite a few of their shots, most noticeable on the elderly Betty character.
The game was released to huge critical and commercial success.
See more on inFAMOUS: Second Son in our original fxguide story from last year.