Face and Voice Re-enactment
At our tech-compound we have been working on more (fun) demonstrations of Facial Re-enactment.
As we flagged in an earlier fxguide story, we were involved with The Champion, the world’s first feature film to be converted in its entirety from a foreign language to English. This was led by Adapt Entertainment with Pinscreen, and the film was successfully sold to Netflix. As great as that work was, the team has since been rapidly expanding the technology. In our latest test, we worked not only with Pinscreen again but also with our friends at Respeecher in Ukraine.
In this latest demo, we explore both vision and audio re-enactment with machine learning (ML).
With the advancement of machine learning, AI is increasingly being used for facial and voice re-enactment. Our team has been focused on doing this at a production level of efficiency with minimal training data. The core technology involves creating ML-inferred versions of human faces and voices that are so realistic they can be used in place of real people in certain situations, such as replacing the need for traditional dubbing.
Dubbing movies, the process of replacing the original audio with a new language, can present a number of problems. Firstly, the translation may not always be accurate, resulting in a loss of meaning or cultural references. This can be particularly problematic for movies with complex dialogue or wordplay, and it can be disrespectful to the actors, the director, and the screenwriters: often the new dialogue is chosen not for accuracy or dramatic reasons but to match the visemes of the original language. Secondly, the dubbed voices may not match the original actors’ performances, resulting in a jarring and unnatural viewing experience. This can be especially noticeable in emotional scenes, where the original actor’s voice plays a crucial role in conveying the intended emotion. A combination of visual and audio re-enactment can provide a pathway to a much more natural viewing experience.
The ethical and practical implications are also being closely explored by the Motus Lab team at the University of Sydney. This extends beyond the ethics of misuse and misrepresentation. If a subtitle is done poorly, it can seem ‘odd’. If an audio-only dub is bad, it can be funny. But if a facial re-enactment is done badly, you can think the actor is actually acting badly. This is an incredibly important distinction. The research teams are rapidly advancing the art of re-enactment to provide a faster and more faithful translation of the original actor’s acting choices in the most respectful way possible.
Behind the scenes – Raw video footage:
From a technical standpoint, we shot the demo with multiple cameras so that we could learn from Grant’s face and use it to drive (infer) Mike’s face. There was no special training data or other capture sessions. This was not necessary for ‘Mike controlling Mike’, as he was standing still, but Grant was walking onto set and turning to talk. Similarly, the Respeecher team learned from both Mike’s and Grant’s voices and converted Grant’s words to sound like Mike’s. Note that the tempo and the actual lip movements remain Grant’s even when he sounds like Mike. This is important since, when converting between languages, an actor may choose to emphasize a different word or alter the order of delivery to better match the sub-text and acting choices needed for the new version of the scene. Above is the raw iPhone footage shot on set, just to show that no tracking markers, special lights, or gear were needed to inform the demo (the iPhone footage/audio was not used in the process).
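At a very high level, voice conversion of the kind described above separates *what* is said (the content and timing of each frame of speech) from *who* says it (a speaker-identity representation). The toy NumPy sketch below is purely illustrative — the feature sizes, the `convert_voice` function, and the learned mixing matrix are all made-up stand-ins, not Respeecher’s actual pipeline — but it shows why the tempo stays the source actor’s: conversion operates frame by frame on the source features, so the frame count (the timing) is untouched while the identity component is swapped.

```python
import numpy as np

FRAME_DIM = 16   # hypothetical per-frame content feature size
SPK_DIM = 8      # hypothetical speaker-embedding size

def convert_voice(source_frames: np.ndarray,
                  target_speaker: np.ndarray,
                  mixing: np.ndarray) -> np.ndarray:
    """Toy frame-wise conversion: each source frame's content is
    re-rendered with the target speaker's identity embedding.
    The number of frames (i.e. the source actor's tempo) never changes."""
    n_frames = source_frames.shape[0]
    # Attach the target identity to every source frame.
    identity = np.tile(target_speaker, (n_frames, 1))           # (n, SPK_DIM)
    combined = np.concatenate([source_frames, identity], axis=1)
    # In a real system this would be a trained neural decoder;
    # here a random linear map stands in for it.
    return combined @ mixing                                     # (n, FRAME_DIM)

# Illustrative use: 120 frames of 'source' content, one 'target' identity vector.
rng = np.random.default_rng(0)
source_frames = rng.normal(size=(120, FRAME_DIM))
target_embedding = rng.normal(size=SPK_DIM)
mixing = rng.normal(size=(FRAME_DIM + SPK_DIM, FRAME_DIM))

converted = convert_voice(source_frames, target_embedding, mixing)
print(converted.shape)  # (120, 16): same frame count, so the source timing survives
```

The design point the sketch makes is the one in the paragraph above: because identity is injected per frame rather than by re-synthesizing the performance, the delivery, emphasis, and lip timing stay with the original actor.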
The Emmy award-winning Respeecher was founded by friends and colleagues Alex Serdiuk, Dmytro Bielievtsov, and Grant Reaber. They use proprietary deep learning (artificial intelligence) techniques to produce high-quality synthetic speech. The company is based in Kyiv, Ukraine. This demo was only possible because Grant Reaber has a US passport and was able to join us in Sydney for the project. Apart from the enormous issues facing the team due to the war, Ukrainian men of a certain age are not allowed to travel.
Respeecher is not the only company in our industry affected by the war in Europe, but we are proud to work with them and promote the great research they are continuing to do in the most extreme conditions. Far from shutting down, the team is actively working with partners and productions around the world. Respeecher’s technology has a range of potential applications, from creating voiceovers to providing speech therapy for people with speech disabilities. The company gained early attention for its innovative approach to voice cloning.
Pinscreen is a technology startup founded by Hao Li, a computer graphics and machine learning expert. We have covered their work for several years at fxguide. The company specializes in developing cutting-edge ML for creating photorealistic digital humans. The company also has a strong track record in real-time ML technology.
Pinscreen is also expanding internationally, with Dr. Hao Li now also an Associate Professor at the new Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), a graduate-level, research-based university located in Abu Dhabi, United Arab Emirates. The LA-based Pinscreen research team worked with our Australian team, some of whom license technology from Pinscreen for use in the feature film conversion.
This work highlights the growing importance of AI and machine learning in voice and speech technology, and in visual effects generally. The AI field is changing rapidly, but it is great to see two ML companies that started years ago still expanding and servicing the M&E industries today.