SciTech Awards: Medusa Capturing Disney’s Characters


The ‘Sci-Tech’ awards 2019

Three of the five films nominated for Best Visual Effects at the 2019 Oscars used the Disney Research Studios Medusa rig, which was itself honoured this year by the Academy for dense facial capture. The Medusa research and development team won a Technical Achievement Award (or Sci-Tech Award), presented at the Academy of Motion Picture Arts and Sciences’ Scientific and Technical Achievement Awards on February 9 in Beverly Hills.

Honoured by the Academy were Thabo Beeler, Derek Bradley, Bernd Bickel and Markus Gross for the conception, design and engineering of the Medusa Performance Capture System. Medusa captures exceptionally dense animated meshes without markers or makeup, producing high-fidelity facial performance data and improving productivity for character facial performances.

Each of the four recipients flew to LA during the vetting process to present to the Academy, which does extensive research into the history of any project up for an award. Unlike the main Oscars, the Sci-Tech Awards are not decided by a wide vote, but by detailed research conducted over several months.

Bernd Bickel, Thabo Beeler, Derek Bradley, and Markus Gross at  the Academy of Motion Picture Arts and Sciences’ Scientific and Technical Achievement Awards on February 9, 2019 in Beverly Hills, California.

The Medusa Performance Capture solution, developed by Disney Research Studios in Zurich, is a mobile system consisting of a series of normal video cameras and standard illumination. As Beeler explains, “based on the video feeds, the Medusa system is able to reconstruct the high resolution geometry of the face, and in particular, how that geometry deforms over time. It provides four-dimensional data that is in perfect correspondence.”
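Multi-view reconstruction of this kind starts from triangulation: the same facial point, observed by two or more calibrated cameras, fixes a 3D position. A minimal linear (DLT) triangulation sketch, assuming known 3x4 projection matrices; this is generic computer vision background, not Disney’s actual Medusa code:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point seen in two views.
    P1, P2: 3x4 camera projection matrices; x1, x2: 2D pixel coordinates.
    Each observation contributes two linear constraints on the homogeneous
    point X; the SVD null-space vector is the least-squares solution."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # homogeneous -> Euclidean
```

A production system like Medusa does this densely, for every surface point and every frame, with the correspondence between frames maintained so the same vertex tracks the same piece of skin over time.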

Medusa rig – this rig is both portable and lightweight. (fxguide’s Mike Seymour seen being scanned).
Thabo Beeler at the Academy of Motion Picture Arts and Sciences’ Scientific and Technical Achievement Awards.

This is different from a Light Stage and its approach of Polarized Spherical Gradient Illumination. A Light Stage typically needs several illumination conditions to capture all the data required to reconstruct a face. The Light Stage is also focused on capturing the actor’s appearance; so far this has not been the focus of the Medusa rig. “We can capture everything that’s needed from a single instance in time, which then means that we can capture an actor, and their performance, at full frame rates,” outlines Beeler. “We don’t have to have the actor sit still and hold an expression during particular illumination conditions. The person can just act, and we can reconstruct their performance faithfully from that.” To do this, Medusa does not use a polarised lighting system, nor does it separately capture specular and diffuse skin properties. Medusa captures a performance and then analyses it, so the rig runs at whatever speed the cameras can run at. Typically an actor such as Josh Brolin, who played Thanos in Avengers: Infinity War, is captured at either 24 or 60 frames per second.

Marvel Studios’ AVENGERS: INFINITY WAR..Thanos (Josh Brolin).©Marvel Studios 2018

While the Medusa rig is thought of as the camera and lighting setup, these are really not the key technology; it is the software that processes the data streams. The rig itself is far less of an engineering and electronics feat than the Light Stage. They are very different and often very complementary technologies. It is not uncommon for a company such as ILM or Digital Domain to do both a Light Stage capture and a Medusa session when working on a high-end project.

Derek Bradley (speaking).

Derek Bradley is quick to point out that while the four were awarded on the night, the Medusa was very much a team project. “We really need to thank our entire team and everyone who contributed to Medusa, in particular Bob Sumner and Paul Beardsley for research guidance, Max Grosse for software and Jan Wezel and Ronnie Gaensli for hardware engineering,” comments Bradley.

How it works

The Medusa rig is an extension of a photogrammetry approach that does not require dots or markers, and it captures actors delivering lines, not just fixed (FACS) expressions.

Medusa is useful in two primary scenarios: 1 – building an expression shape library, and 2 – reconstructing performance dialog.

Building an expression shape library (Still expressions)

Reconstructing performance dialog (Paper 2 below)

The Medusa Performance Capture System is based on several recent technological advances created at Disney Research Zurich.  The system is central to three major research papers the Disney Research Team have published:

High-Quality Single-Shot Capture of Facial Geometry

High-Quality Passive Facial Performance Capture Using Anchor Frames

Rigid Stabilization of Facial Expressions

These three papers touch on the three critical aspects of innovation in Medusa.

The first paper highlights the capture of a face’s geometry; it represented a modification of standard stereo refinement methods to capture pore-scale geometry of the face. The second paper is where Medusa moves beyond a pure photogrammetry-style approach. Many facial systems use image-based techniques, and so does Medusa, but vastly enhanced. It finds correspondences between images in an optical-flow-style approach, and then goes a step further with an anchor frames concept. Rather than only tracking sequentially forward in time, it tries to understand how one’s face returns to certain expressions over and over again during any performance. “If you can identify these, this allows you to go straight to these expressions instead of doing these incremental hops,” says Beeler. “It allows you to do tracking more in a tree structure than in a sequential structure. This alleviates the problem of drift, and also helps with occlusion.”
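The anchor-frame idea can be illustrated with a toy sketch: instead of tracking every frame from its predecessor, so that small errors accumulate into drift, each frame is tracked from whichever anchor expression it most resembles, giving a shallow tree of short tracking hops. The per-frame descriptors, distance measure and function names below are illustrative assumptions, not the published algorithm:

```python
import numpy as np

def anchor_tracking_plan(frames, anchors):
    """Toy anchor-frame scheduling. frames: (N, D) array of per-frame
    appearance descriptors; anchors: indices of frames chosen as anchors
    (e.g. recurring neutral-like expressions). Each frame is assigned the
    most similar anchor as its tracking parent, so the mesh is propagated
    over one short hop instead of a long sequential chain."""
    plan = []
    for i, f in enumerate(frames):
        # distance from this frame's descriptor to each anchor's descriptor
        d = [np.linalg.norm(f - frames[a]) for a in anchors]
        parent = anchors[int(np.argmin(d))]
        plan.append((parent, i))  # track mesh from anchor 'parent' to frame i
    return plan
```

In this tree structure, drift can only accumulate over the depth of one hop, and a frame occluded in a sequential chain no longer breaks the tracking of everything after it.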

Disney Research Studio Zurich

The subject of the final paper is perhaps not given enough importance. This paper highlights automatic stabilization: the separation of the rigid motion of the head from the non-rigid deformation of the face caused by expression. This was in fact a request from the first time the team delivered Medusa data, as initially Medusa did not have stabilization. “In the beginning, Medusa would just return face meshes and deformations over time,” comments Bradley. “We found out from the artists using the data on Maleficent that they actually wanted to remove the rigid motion in order to add their own rigid motion on top, for whatever the character was meant to be doing. This turned out to be a very laborious manual process for them. So we actually did the research to figure out a novel way to do this automatically. And it’s become sort of the third pillar of the Medusa technology,” he adds.
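At its simplest, removing rigid head motion is a Procrustes alignment problem: find the rotation and translation that best align each captured frame to a reference mesh, and factor them out, leaving only the expression deformation. The Kabsch-style sketch below is illustrative only; the actual paper solves a harder problem, since expression itself moves skin relative to the skull:

```python
import numpy as np

def remove_rigid_motion(frame, reference):
    """Kabsch/Procrustes sketch: align a captured mesh 'frame' (N, 3)
    to a 'reference' mesh (N, 3) with the best-fit rigid transform,
    so that what remains is the non-rigid expression deformation."""
    mu_f, mu_r = frame.mean(axis=0), reference.mean(axis=0)
    # cross-covariance of the centred point sets
    H = (frame - mu_f).T @ (reference - mu_r)
    U, _, Vt = np.linalg.svd(H)
    # sign correction guards against a reflection instead of a rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (frame - mu_f) @ R.T + mu_r
```

With the rigid component stripped out this way, artists can retarget the pure expression onto a character and add whatever head motion the shot calls for.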

fxguide’s Mike Seymour being Medusa Scanned. Note the lights are strip LED lights and the rig has eight cameras.

ETH and Disney

The Medusa technology is the result of many years’ worth of research and scientific advances in the capture and modelling of human faces. This research is a partnership of industry and academia.

Markus Gross during the Scientific and Technical Achievement Awards on February 9, 2019.

Thabo Beeler and Derek Bradley are both Senior Research Scientists at Disney Research’s studio in Switzerland. They report to Markus Gross, Disney Vice President of Global Research and Development and head of Disney Research Studios. After countless research papers and contributions to the industry, Dr Gross is one of the most respected names in film technology research, having previously won a Sci-Tech Award in 2013. He is also a Professor in the Computer Science department of ETH Zurich. He has published more than 400 scientific papers on algorithms and methods in the field of computer graphics and computer vision, and holds more than 30 patents. Gross represents the special and very close working relationship between Disney Research Studios and ETH, a university focused on science, technology and mathematics.

There’s a long history of research and development at ETH Zurich into high-quality facial capture. Gross’s leadership in bridging Disney to ETH directly led to Medusa. “At some point he (Gross) hired Bernd Bickel as a PhD student and together they identified the need to develop something like Medusa. They then hired Thabo (Beeler) as a master’s student to work on the first generation of the facial scanner.” This first effort would eventually become part of Medusa, but initially it was a static scanner that just provided high resolution static geometry. “During this time, I was doing my PhD at the University of British Columbia in Canada,” adds Bradley. “I was working on methods for tracking deformable surfaces, like clothing. So I started also looking at faces and that’s the reason why Markus offered me a position to come and join Disney Research as a Postdoc.”

Bradley joined Beeler, Bickel and Gross, and together they developed a method for tracking the face with high accuracy. One of their first innovations was to overcome the problem of drift, which often happens when a face is tracked over time. It was this core team that put together the basis of the Medusa system and published their first paper in 2011 on facial performance capture (see above). In later years, it would primarily be Beeler and Bradley who productized Medusa and helped put it into production, with help from ILM and Digital Domain, who used it for the first time on Maleficent in 2012 (the film premiered in 2014).

Bernd Bickel is now an Assistant Professor, heading the Computer Graphics and Digital Fabrication group at IST Austria. IST is one of a set of universities, beyond the special relationship with ETH, that Disney Research Studios continues to work with actively. Not only did Bickel’s research contribute to Medusa, but he also published work on methods that, given some strain measurements, allow the dialling in of facial wrinkles. This led to one of the key academic papers that inspired Digital Domain’s Masquerade software (see below).

Film Projects

As of writing, 19 major feature films have used Medusa, starting with 2014’s Maleficent and most recently the 2019 Oscar nominees Solo: A Star Wars Story, Ready Player One and Avengers: Infinity War, with several more films in production and due for release soon.

Marvel Studios’ AVENGERS: INFINITY WAR.  Thanos (Josh Brolin).  ©Marvel Studios 2018

Medusa’s time-based, or 4D, capture system informed the Digital Domain (DD) team’s creation of Thanos. Medusa provided data showing how the face moves between key poses, as well as high resolution 3D meshes of actor Josh Brolin’s face at any point in time.

For Avengers: Infinity War, DD took frames from a helmet-mounted camera (HMC) system on Brolin’s head and used AI to output a higher resolution, more accurate digital version of the actor’s face. Their Masquerade software learns from high-res tracking data collected with Medusa, and turns the 150 facial data points taken from a motion capture HMC session into roughly 40,000 points of high-res 3D actor face motion data.
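Conceptually, that sparse-to-dense step is a regression learned from paired training frames in which both the sparse marker positions and the dense Medusa mesh are known. A deliberately simple least-squares version of the idea is sketched below; Masquerade’s actual machine learning is far more sophisticated, and all names and shapes here are assumptions:

```python
import numpy as np

def fit_sparse_to_dense(sparse_train, dense_train):
    """Learn a linear map from flattened sparse marker coordinates
    (F frames x S values) to flattened dense mesh vertex coordinates
    (F frames x D values) by least squares, with a bias column."""
    A = np.hstack([sparse_train, np.ones((len(sparse_train), 1))])
    W, *_ = np.linalg.lstsq(A, dense_train, rcond=None)
    return W

def predict_dense(W, sparse_frame):
    """Upsample one new sparse frame to a dense mesh estimate."""
    return np.append(sparse_frame, 1.0) @ W
```

Even this crude linear model shows why the pairing matters: the dense Medusa data defines the output space, while the on-set HMC markers supply the per-take input driving it.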

Thanos (Josh Brolin). ©Marvel Studios 2018

This was significant, as one downside of the Medusa rig is that the actor is seated in a special lighting rig, keeping their head relatively still while being filmed with an array of high quality computer vision cameras. DD combined the HMC camera data of the actor, from a more natural acting environment, with the high resolution data from Medusa. Compared to any on-set device an actor might wear, the Medusa data is much richer and more detailed, but it does require the actor to be confined to a special seated unit with controlled lighting and limited movement.

For more on Marvel Studios’ Thanos see our coverage here.

There are several significant films that the team remembers fondly. One was Star Wars: The Force Awakens (Episode VII), in which Medusa was used for both Maz and Supreme Leader Snoke.

Maleficent and Digital Domain

One of the first projects to use Medusa was Digital Domain’s work on the Disney film Maleficent (2014). DD (see our fxguide coverage here) used the Medusa system to help bring the film’s three flower pixie characters to the screen.

The key to great performance capture is not just getting great data about key FACS poses, but also getting the mesh to move coherently between poses. The team needed to see how the transitions happen. In the face, not all the muscles move at once, so the timing, how the shapes blend, and how the pores of the skin stretch while transitioning between key poses was a real focus for this project.

For this project, the actors were scanned in a Light Stage, but this data was dovetailed with data from Medusa. The production built a motion capture stage in London and worked with Disney Research Zurich to gain additional key facial data using the Medusa facial rig. On set, the actors were filmed delivering their lines, even if they were in flying harness rigs. They would then come down and deliver the same set of lines again on the ground, allowing a second capture that would map the face more accurately between the key expressions. The Medusa rig was an “excellent reference as to how particular FACS shapes transition from one face shape into another,” says Gary Roberts, Digital Domain virtual production supervisor on Maleficent. “With so many muscles in the face relaxing and compressing with overlapping time frames, the transition from shape A to shape B does not happen in linear fashion, so it is enormously helpful to see a 3D record of that transition in the form of a geometric mesh.”

Character poses and final fairy in Maleficent 2014.

The face shapes from the Medusa system allowed the team to build highly complex and very detailed statistical models of the actors’ faces and necks, which were used to help process the video data from the head-mounted camera systems. Medusa allowed DD to get a coherent moving mesh, which proved invaluable. Kelly Port, the visual effects supervisor for Digital Domain, explains: “It is one thing to go from FACS pose one to FACS pose two, but it is an entirely new reference to be able to see how that transition happens, because not all muscles fire at once. They are offset from one another. This is information that prior to this had not really been known. That reference was extremely helpful in seeing how poses blended one to another.”

ILM and Medusa

ILM now has two working Medusa rigs, being part of the same extended Disney family. They have one rig in San Francisco at the Presidio, and a second in London. “Each site has their own system and they are not exactly the same in terms of hardware. I think this actually shows the versatility of the system,” says Beeler. The system does require the cameras to be in a specific layout, but it does not need a specific type of camera or even a specific type of illumination. Disney Research Studios has a third configuration that they are constantly adjusting for new research.

ILM has been key to the expansion of Medusa’s development and to its adoption into major feature film pipelines. Very early on, ILM saw the significance of Medusa and worked with the Zurich team to integrate the technology into the ILM character pipeline.

Warcraft at ILM

Warcraft was the first film ILM did without the direct support of the Zurich team, and it thus marked the maturity of the Medusa rig. “We developed Medusa and we did all the research to bring it to production, but we are not a service house here in Zurich. We do not offer it, but ILM does,” explains Beeler. “There is always a ramp up time, training etc…so seeing it used at ILM on Warcraft, without our help, was a confirmation that the system is user friendly and can now be truly handed off to production.”

“We would like to thank our friends at ILM for adopting Medusa and offering it to production – we are very aware that we would not have received this award without them,” comments Beeler. “A big thanks to everyone in the industry who supported us and Medusa, in particular the ones who welcomed and introduced us to the world of VFX during the early days,” adds Bradley.

On the Red Carpet with ET! (Photos (c) Cyrill Beeler)

Light Stage X

Also honoured were the Light Stage team: Paul Debevec, Tim Hawkins, Wan-Chun Ma and Xueming Yu.

See our earlier story on this other Facial Sci-Tech Award Winner: click here

Paul Debevec and host David Oyelowo during the Academy Sci-Tech Awards, February 9, 2019 in Beverly Hills.