Chris Ume is a European VFX artist living in Bangkok who has shot to international attention with his DeepTomCruise posts, a series of Tom Cruise deepfake videos. Chris has demonstrated a level of identity swapping that has surprised and delighted the community in equal measure. Since he started posting the videos of Miles Fisher’s face swapped with Tom Cruise’s, his email inbox has been swamped with requests for advice, help, and work. What has caught the imagination of so many fellow artists is how the TikTok videos have Fisher breaking the ‘rules’ of neural rendering or deepfakes. In the videos, DeepTomCruise pulls jumpers over his head, puts glasses and hats on and off without any apparent concern for occlusion, and regularly has his hair or his hand partially covering his face.
Ume uses the free AI/machine learning (ML) software DeepFaceLab 2.0 (DFL) as his backbone, but the process is far from fully automated. For each short video, Ume spends 15 to 20 hours working to perfect the shot and sell the illusion. While anyone can download the software, the final clip is anything but a one-button-press solution. As with all VFX, the artist’s role is central, and what looks easy and effortless on screen is actually complex and often challenging.
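At a high level, a DeepFaceLab-style workflow runs in three stages: extract faces from both the source identity and the destination performance, train a model on the two face sets, then merge the predicted faces back onto the destination frames. The sketch below illustrates that shape only; the function bodies are placeholders, not DFL's actual API.

```python
# Hedged sketch of the three-stage DeepFaceLab-style workflow.
# All function bodies are placeholders standing in for real detection,
# training, and compositing code.

def extract_faces(video_frames):
    """Detect and crop the face from each frame (placeholder detector)."""
    return [{"frame": i, "face": frame} for i, frame in enumerate(video_frames)]

def train(src_faces, dst_faces, iterations):
    """Train a model mapping destination faces to the source identity (stub)."""
    return {"iterations": iterations, "src": len(src_faces), "dst": len(dst_faces)}

def merge(model, dst_frames):
    """Paste the predicted source-identity face back onto each destination frame."""
    return [f"frame_{i}_swapped" for i in range(len(dst_frames))]

# Faces come from both the source identity (Tom Cruise footage) and the
# destination performance (Miles Fisher's clip).
src_faces = extract_faces(["cruise_f0", "cruise_f1"])
dst_faces = extract_faces(["fisher_f0", "fisher_f1", "fisher_f2"])
model = train(src_faces, dst_faces, iterations=100_000)
result = merge(model, ["fisher_f0", "fisher_f1", "fisher_f2"])
print(len(result))  # one swapped frame per destination frame
```

The long training stage sits in the middle; the 15 to 20 hours of artist time Ume describes is spent largely around the final merge and compositing steps.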
Each video starts with a conversation with Tom Cruise impersonator Miles Fisher. It is actually Fisher who films himself and sends the videos to Ume. There is never a tight script: Ume has explained the known limits and invited Fisher to push the boundaries. Ume does not direct the actor, and to date, only one video has had to be reshot. In the original version of the lollipop clip, Fisher too often came very close to the camera, turned, and dropped in and out of frame.
Ume uses DFL 2.0, which no longer supports AMD GPUs/OpenCL; the only way to run it is with an NVIDIA GPU (a minimum CUDA compute capability of 3.0 is required) or on the CPU. Ume uses an NVIDIA A6000 card. The actual build of DFL 2.0 that Ume uses is faceshiftlabs, a GitHub fork of the main DFL code.
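The GPU requirement amounts to a simple version gate on the card's CUDA compute capability, which NVIDIA expresses as a (major, minor) pair. The check below is a sketch; real code would read the capability via `nvidia-smi` or a CUDA runtime binding rather than passing it in by hand.

```python
# Sketch of the DFL 2.0 hardware gate: NVIDIA GPU with CUDA compute
# capability >= 3.0, otherwise fall back to CPU. The query itself is
# assumed to come from elsewhere (e.g. nvidia-smi).

MIN_COMPUTE = (3, 0)

def meets_dfl_requirement(compute_capability):
    """compute_capability is a (major, minor) tuple, e.g. (8, 6) for an A6000."""
    return compute_capability >= MIN_COMPUTE

print(meets_dfl_requirement((8, 6)))  # RTX A6000 (Ampere, 8.6) -> True
print(meets_dfl_requirement((2, 1)))  # Fermi-era card -> False
```

Python's tuple comparison handles the major/minor ordering correctly, so (3, 5) passes while (2, 9) does not.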
Fisher films the base clips on his iPhone and sends the files to Ume. The resolution is not high, roughly 720p, but at the end of each process Ume performs an up-res. He prefers to do this on the combined, comped clip, as he feels it is often a mismatch in sharpness and perceived resolution that makes a deepfake look unrealistic.
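One common, simple proxy for the perceived sharpness Ume is matching is the variance of an image's Laplacian: hard edges produce a high-variance response, soft footage a low one. The pure-Python sketch below demonstrates the measure on tiny grayscale grids; it is an illustration of the concept, not Ume's actual tooling.

```python
# Variance-of-Laplacian sharpness proxy, sketched in pure Python.
# A rendered face whose score differs wildly from the surrounding
# plate will read as a mismatch.

def laplacian_variance(img):
    """img: 2D list of grayscale values. Returns the variance of the
    4-neighbour Laplacian over interior pixels."""
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

sharp = [[0, 0, 255, 255]] * 4   # hard vertical edge
soft = [[0, 85, 170, 255]] * 4   # smooth ramp across the same range
print(laplacian_variance(sharp) > laplacian_variance(soft))  # True
```

Doing the up-res after the comp, as Ume does, means face and plate are sharpened together and end up with comparable scores.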
A key part of Ume’s process is Machine Video Editor (MVE). MVE is a free, community-supported tool for deepfake project management that helps with everything from data gathering to compositing, and it fully supports DeepFaceLab and its data format. Ume uses it extensively for the supporting mattes that are required for the later compositing work.
When doing any such ML the training stage is time-consuming and Ume normally allows “2 to 3 days at least, maybe more, depending on how quickly the shot clears up” to tackle a new subject such as DeepTomCruise. While it is his work on DeepTomCruise that most people know, Ume has done many similar projects with different subjects and targets.
The focus of MVE is neural rendering project management. It allows Ume to keep all his DFL training material in a single project folder and supports data scraping and extraction, with advanced sorting methods, set analysis, augmentation, and manual face and mask editor tools.
The program’s automatic face tagging avoids the need for manual identification of eyebrows, eyes, noses, mouths, or chins. The program is not open source, but it is free.
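The "face tagging" described here corresponds to facial landmark detection. A widely used convention, the 68-point layout popularized by dlib and common in deepfake pipelines, assigns fixed index ranges to each feature; the sketch below encodes that standard mapping (the lookup helper itself is illustrative).

```python
# Standard 68-point facial landmark regions (dlib convention).
# Automatic tagging means a detector emits these 68 points per face,
# so no one has to outline eyebrows, eyes, noses, mouths, or chins by hand.

LANDMARK_REGIONS = {
    "jaw":           range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow":  range(22, 27),
    "nose":          range(27, 36),
    "right_eye":     range(36, 42),
    "left_eye":      range(42, 48),
    "mouth":         range(48, 68),
}

def region_of(index):
    """Return which facial feature a landmark index belongs to."""
    for name, idx_range in LANDMARK_REGIONS.items():
        if index in idx_range:
            return name
    raise ValueError(f"landmark index out of range: {index}")

print(region_of(30))  # around the nose tip -> "nose"
print(region_of(8))   # chin point -> "jaw"
```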
DFL 2.0 has improved and optimized the process, which means Ume can train higher-resolution models or train existing ones faster. But the new version supports only two models: SAEHD and Quick96. The H128/H64/DF/LIAEF/SAE models are no longer available, and any pre-trained models (SAE/SAEHD) from 1.0 are not compatible. Ume only uses SAEHD; he sees Quick96 as just a fast, rough test model, and while he has explored it, DeepTomCruise uses SAEHD.
All the compositing is currently done in After Effects. Ume is interested in exploring Nuke, especially with its new ML nodes such as CopyCat, but for now he knows AE so well that it is hard to shift applications. Some of the software in his pipeline only runs on a PC, so this is the platform on which Ume does all his work.
As part of the compositing, Ume has experimented with changing hair color and patching skin textures, and he has noticed interesting artifacts carrying over from the training data into DFL’s output. For example, when Fisher leans very close to the camera, the lens distortion is sometimes not reflected in the output: the new DeepTomCruise has a jaw of the wrong apparent width, one that does not recede fully with its distance from the lens. A face close to the camera at eye level will have a relatively thinner chin due to the wide-angle effect, but this is rare to see in actual Tom Cruise footage, as the actor is seldom shot this way. In these cases, Ume relies on the jaw much more from Fisher than from DeepTomCruise.
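The jaw artifact follows from simple pinhole perspective: apparent size scales as focal length times real size divided by distance, so a chin sitting a few centimetres behind the forehead plane shrinks noticeably when the lens is very close, and hardly at all at portrait distances. The numbers below are illustrative assumptions, not measurements.

```python
# Pinhole-perspective sketch of why a close lens thins the chin.
# Face width, depth offset, and focal length are illustrative guesses.

def apparent_width(real_width_cm, distance_cm, focal_length_mm=26):
    """Projected width (arbitrary sensor units) under a pinhole model."""
    return focal_length_mm * real_width_cm / distance_cm

def chin_to_forehead_ratio(camera_distance_cm, depth_offset_cm=8):
    """Relative chin size when the chin sits depth_offset_cm farther from
    the lens than the forehead plane."""
    forehead = apparent_width(14, camera_distance_cm)
    chin = apparent_width(14, camera_distance_cm + depth_offset_cm)
    return chin / forehead

print(round(chin_to_forehead_ratio(30), 3))   # very close: 0.789, chin ~21% thinner
print(round(chin_to_forehead_ratio(200), 3))  # portrait distance: 0.962, nearly equal
```

Since Tom Cruise is rarely filmed at phone-selfie distances, the training set barely contains this foreshortened geometry, which is why the model fails to reproduce it.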
Ume is very collaborative, working with VFX houses and with all the major artists working in the deepfake space. A group including users such as Ctrl Shift Face, Futuring Machine, DeepHomage, Dr Fakenstein, The Fakening, Shamook, Next Face, and Derpfakes, who collectively represent some of the best-known creators in the community, all share ideas and work to demonstrate the sort of amazing work that can be done with neural rendering technology.
Miles Fisher has sent respectful emails to Cruise’s management explaining that his and Ume’s work exists purely to explore and educate about deepfakes and neural rendering technology, and he has vowed never to use DeepTomCruise to promote a product or cause. Ume’s primary aim is to educate people about what is possible and to build his own career in visual effects. “My goal was to work with Matt & Trey (South Park), which I am now doing. My next goal is to work with ‘The Lord of the Rings‘ team. I grew up watching those movies over and over again,” Ume explains, admiringly referring to Weta Digital.