The cinematics for EA’s Medal of Honor Warfighter game tell a dramatic story within the game experience, with emotional scenes of life off the battlefield. Looking near-photo-real and featuring stunning animated characters, the style deliberately carries a ‘digital’ feeling to keep the cinematics and gameplay experience in the same visual realm. We talk to Digital Domain’s Gary Roberts and Richard Morton about the approach taken and their virtual production pipeline.
fxg: This cinematic has a really interesting approach in that it maintains the game feeling – what were some of the challenges in actually not going for complete photorealism?
Richard: It was a fine line – we were approached by EA because the cinematics were trying to communicate a lot of things without actually saying anything. There wasn’t much dialogue. A lot of it was facial expression and reactions. You just can’t do that with a game-res model or with game animation. You may well ask, well why not shoot it? There were actually tests done by EA where they had tried it, and it was too real and would pull the gamer out of the game. The players still want to feel a connection to the game, but in this case it was really important to be able to communicate what was going on with non-verbal communication.
In some ways it was very freeing knowing we didn’t have to hit 100 per cent realism and we went pretty far along, and then we dialed it back because we knew we weren’t trying to be as perfect as Benjamin Button. There are moments when you get pulled into the story, and you may think for a moment, ‘This is real,’ but we always walked that fine line of always being in the game cinematic.
fxg: Can you talk about your performance capture pipeline?
Gary: The pipeline really begins before performance capture and there are some key stages. Because of the likeness of the actors required, we do a technical day with them. We bring them in and scan them and get technical reference.
Richard: You’ve got the acquisition of the scans – extremely hi-res, with a specific methodology for what we’re trying to capture. Then other artists process it in a different way so that texture painters and shader artists and lookdev artists can take it to the next step. Then it goes to a rigging team. There’s a lot of back and forth, and making sure everyone is in sync.
Gary: So we’re scanning and taking measurements of the actors, then we put together character rigs and geometry. In parallel we build a character mocap puppet, which is a low-resolution version of the actor. We do the same for the environments and props. These get handed to virtual production, who prep them for the shoot itself. We’re using the character mocap puppets so that they work in realtime – a realtime visualization of the characters in the environment that’s required, along with the props.
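The mocap puppet idea Gary describes – a low-resolution stand-in driven live by the actor's capture data – can be sketched in a few lines. This is purely illustrative (the joint names, `bone_map`, and the height-ratio scaling of the root are assumptions, not Digital Domain's actual pipeline):

```python
def retarget_pose(actor_pose, bone_map, height_ratio):
    """Map a captured actor pose onto a low-res character puppet.

    actor_pose: joint name -> (rotation_quat, translation).
    Rotations copy straight across; the root translation is scaled
    by the actor-to-character height ratio so strides match.
    """
    puppet_pose = {}
    for actor_joint, puppet_joint in bone_map.items():
        rotation, translation = actor_pose[actor_joint]
        if puppet_joint == "hips":  # root joint carries world translation
            translation = tuple(t * height_ratio for t in translation)
        puppet_pose[puppet_joint] = (rotation, translation)
    return puppet_pose
```

Because the mapping is just a per-joint copy with a scale on the root, it is cheap enough to run on every frame of a live capture stream, which is what makes the realtime visualization possible.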
fxg: Where did the environments come from?
Richard: In some cases we had very specific reference we were matching to; in other cases we pulled the essence from several different places – like the train station, which is brand new but authentic to the location. Our environments team built them and gave them to virtual production early for their work. By the time we get the mocap back, the environments are usually near to a renderable state.
fxg: What can you tell me about DD’s virtual production space?
Gary: We had a large virtual production and mocap studio – it was originally designed for Robert Zemeckis and his films. Digital Domain acquired the building and all the tech in it in October 2010. The building and a lot of the team came with it, because it’s like the keys to the Space Shuttle – there’s no manual.
We have three stages. Our main stage is peppered with 122 motion capture cameras around a capture volume that’s 65 feet by 35 feet, and 18 feet high. That system is designed to capture tracking features that we place on actors’ bodies, and on any kind of props or set pieces. Inside the space, it’s very well lit. We have strong, soft diffuse lighting around the entire stage, designed for good video reference from witness cameras – which are used from editorial through to animation reference. The lighting also benefits our face-mounted cameras, which capture the facial movement and eye movement of the actors.
The main stage is a sound stage so we get final audio at the same time. The entire system is realtime, so as the actors are performing the system is reconstructing their performance and then re-targeting it to the characters and props. We also have a realtime virtual camera – the director can pick up a device which has an HD flatscreen with tracking features and a little joystick to control the field of view and everything. It’s basically a viewfinder into the digital world, and behaves as closely as it can to a real camera.
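The joystick-controlled field of view on that virtual camera maps directly onto a standard perspective projection. As a generic sketch (not DD's code), turning the device's FOV setting into the matrix a renderer would use looks like this:

```python
import math

def perspective(fov_deg, aspect, near, far):
    """Standard OpenGL-style perspective projection matrix from a
    vertical field of view in degrees - the quantity the virtual
    camera's joystick would be adjusting."""
    f = 1.0 / math.tan(math.radians(fov_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]
```

Nudging the joystick only changes `fov_deg`, so the tracked position and orientation of the device keep doing exactly what a physical camera body does, while the "lens" is swapped in software.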
We have another stage about a third of the size, with exactly the same capabilities. And then there’s a dedicated face survey stage, which we use to survey our actors prior to the shoot to get a good understanding from a statistical and algorithmic point of view how their face moves and deforms and how muscles flow across their face.
Wrapped around that in terms of infrastructure are four edit bays plugged into the same architecture. It’s very quick for us to shoot a camera on the virtual set and then render out a sequence of images for those cameras and submit them to editorial. It can take minutes from shooting on camera to cutting it together – the iteration for getting a director’s cut approved is very quick and organic. It’s a pretty cool process.
fxg: What’s then the process for capturing facial data?
Gary: The workflow is: we have the actors on set in the mocap suits with face-mounted cameras, then a virtual set and a virtual actor. We shoot it and do multiple takes. Once we’ve finished, we suck in through editorial all the video and realtime reference from set, and then the director and editor sit down and cut together what we call a performance assembly – basically the best performances of the actors for a given scene. The final scene might be a minute long but the performance assembly may be longer. That’s then turned over to virtual production and we create a 3D version of the performance with all the assets used.
We also derive what we call ‘Kabuki’ – video from the face-mounted cameras which we project onto the character’s face and play back in realtime. So we have a 3D realtime version of the character’s performance, with audio, and with the video projected from the actors, so the director can see the entire intended performance in realtime.
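Projecting face-cam video onto the character comes down to asking, for each vertex of the face mesh, which pixel of the camera's video frame sees it. A minimal pinhole-camera sketch of that lookup (illustrative only – the focal length and principal point values are hypothetical, and a real head-mounted rig would also correct lens distortion):

```python
def project_to_video(point_3d, focal, cx, cy):
    """Project a head-space vertex into face-cam image coordinates,
    giving the pixel to sample from the 'Kabuki' video frame.

    Assumes a simple pinhole model: camera looks down +Z, with focal
    length in pixels and principal point (cx, cy)."""
    x, y, z = point_3d
    return (focal * x / z + cy * 0 + cx, focal * y / z + cy)
```

Sampling the video at those coordinates every frame is what lets the raw actor footage "wear" the character mesh in realtime, long before any final facial animation exists.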
Then we go back down to the set and shoot the virtual camera. This is where shots are made and we’re composing the shots with the director, editor and VFX supe. Then we render out sequences of the selected favorite shots, almost like a film negative. Those go back to editorial and get put back into the cut. That’s all iterated on until everyone’s happy and we have a full sequence. That’s still in low-res form, and when it’s turned over is when we begin the harder task of final facial and character motion. All along the way we’re getting to see the entire performance of the characters – it’s also a fairly cheap process because we’re not doing final facial animation until we know what’s in frame and on camera.
fxg: How do you reconstruct the facial data?
Gary: With the four face cameras, we’re able to reconstruct the features and the surface of the face, and we have a pipeline that allows us to re-target that dataset of the actor onto the character in animation rig space. So when the animator gets the data, it makes sense to them.
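A common way to land captured facial data in "animation rig space" – offered here as a generic sketch, not DD's proprietary solver – is to solve for the blendshape weights that best reproduce the captured surface in a least-squares sense:

```python
import numpy as np

def solve_blendshape_weights(neutral, shapes, captured):
    """Find rig weights w minimizing || neutral + B @ w - captured ||^2,
    where B's columns are each shape's offset from the neutral face.
    Clamping keeps the result inside the rig's valid slider range,
    so the animator gets values they can edit directly."""
    B = np.stack([s - neutral for s in shapes], axis=1)
    delta = captured - neutral
    w, *_ = np.linalg.lstsq(B, delta, rcond=None)
    return np.clip(w, 0.0, 1.0)
```

The payoff of solving into rig weights rather than delivering raw vertex positions is exactly what Gary describes: the data arrives on the same controls an animator would keyframe by hand, so it "makes sense to them" and can be adjusted like any other animation.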
fxg: How much tweaking of the animation are you doing?
Richard: It’s also always an artistic process – whether you want to enhance a smirk or an eye blink. In some cases a director may want to alter a performance, and we can do that. This show also had a lot of cloth, so we had to accommodate that as well. And then for hair and skin we followed that same pipeline. Even the stubble is geometry, and it was essential to have as much detail as possible at pore level.
fxg: Tell me about some of the sets – for example, the kitchen set seems incredibly detailed.
Richard: I wish you could see the whole kitchen set! There were dirty dishes in the sink, drawings and magnets on the refrigerator which you never saw, thumb prints on the glass of water on the table. For our rendering, we rendered in V-Ray through Maya and composited in Nuke.
fxg: Any final views on the merger between games and film world?
Richard: More and more, games are reaching a higher level of reality, and what’s expected in their cinematics and advertising keeps getting closer and closer to film – I’m glad to be in a place where we can bridge that gap.
Gary: I would agree. There’s been a push for a while for live action cinematics to match the game, and that’s had a few critics who say it pulls you away from the game. Now game engines have the rendering and shading and lighting ability to reach an extremely high level. And here we’re bringing it back from the photoreal environment to stay in the game. It’s exciting times – it’s the next wave, creatively and technically, I think.