Hao Li and the team from Pinscreen recently demonstrated real-time interactive face replacement, effectively allowing people to walk up to a screen and see themselves in a live, real-time ‘deepfake’. The installation was shown at the World Economic Forum (WEF) in Davos, a town in the Swiss Alps. Founded as a Non-Governmental Organization (NGO) in 1971, the WEF states that its mission is “committed to improving the state of the world by engaging business, political, academic, and other leaders of society to shape global, regional, and industry agendas.”
The whole area of digital face replacement is of enormous interest to VFX companies and the wider tech community. Pinscreen’s main purpose in developing this photorealistic real-time face-synthesis technology is to build the next generation of virtual assistants and fashion avatars, which can be produced more efficiently and at scale, while looking extremely photoreal.
Hao Li is the Founder and CEO of Pinscreen and a Professor of Computer Science at the University of Southern California (USC), as well as Director of the Vision and Graphics Lab at the USC Institute for Creative Technologies. MIT Technology Review named him the “World Best Deepfaker”.
Pinscreen’s research involves the development of novel deep learning, data-driven, and geometry processing algorithms. Hao Li himself has become well known for his seminal work in avatar creation, facial animation, hair digitization, and dynamic shape processing, as well as recent efforts in preventing the spread of malicious deepfakes. Pinscreen is a leading research group in detecting deepfakes and informing users about misrepresentation. Facebook and Google are also active in this research space. Google CEO Sundar Pichai posted recently that “detecting deepfakes is one of the most important challenges ahead of us”, and Google has been releasing both audio and video data sets to help with synthetic video detection. Pinscreen’s research has reached detection accuracy as high as 94%, but only for known public figures (with large samples of real footage to train on).
FXG: Hao, congratulations on the World Economic Forum. How did you get involved?
HL: I have been involved with the World Economic Forum since 2018. They launched an internal knowledge platform called WEF Transformation Maps, and I was contacted as an expert to help them write articles on the societal impact of, and issues around, AR and VR. They then invited me to become a member of the Global Future Councils, which define the agenda for the yearly meetings in Davos. They learned about my work on digital humans and deepfakes, and invited my startup, Pinscreen, to exhibit at the Annual Meeting of the New Champions (the summer Davos event) in Dalian, China, to explore the reception of the demo. The installation turned out to be one of the highlights in Dalian, and they were interested in a follow-up installation with better capabilities at Davos 2020, as well as a Betazone presentation on deepfakes. We found that the best way to showcase the potential dangers of media manipulation was to let world leaders play with it directly and experience the rapid advancement of such technology.
FXG: What did you show in Europe?
HL: We demonstrated a real-time (and zero-shot) deepfake technology, where participants could look into a virtual mirror and swap their faces with a number of celebrities and public figures (such as Michelle Obama, Will Smith, Leonardo DiCaprio, etc.).
Deepfake software that circulates on the Internet requires a significant amount of data collection and days of person-specific training to achieve high-quality results. Our solution works instantly and can immediately insert another person’s face onto the user without any training, and the face swapping happens in real time.
FXG: Is this an extension of your paGAN research or a new technology?
HL: It is a new technology. It is more similar to deepfakes in that it uses a large amount of training data for the source faces (the celebrities), but it uses paGAN to avoid training a model for each new subject.
FXG: In terms of performance, Pinscreen showed face tracking several years ago. Is this a combination of Pinscreen technologies with faster hardware, or new ML approaches?
HL: We do have a new ML approach, but there is also a lot of low-level optimization on cutting edge GPUs to achieve real-time performance and the desired resolution.
FXG: Was the camera looking at the guests a standard RGB camera or an RGBD (depth camera)?
HL: We use a standard Logitech Webcam, which is only RGB.
FXG: GANs seem to be a very rich vein of research for this type of image-based Machine Learning. How much or how fast are GANs progressing?
HL: We only use GANs in some parts of the pipeline. Every stage has to be extremely fast so that the full pipeline runs at 25 fps.
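Those figures imply a hard deadline: at 25 fps, every stage of the pipeline combined must fit into a 40 ms per-frame budget. As a rough, hypothetical illustration (the stage names and timings below are assumptions for the sketch, not Pinscreen’s actual pipeline):

```python
# Hypothetical real-time frame budget at 25 fps.
# Stage names and per-stage timings are made up for illustration.
FPS = 25
frame_budget_ms = 1000 / FPS  # 40 ms for ALL stages combined

stages_ms = {
    "face_detection": 6.0,      # locate the face in the webcam frame
    "landmark_tracking": 5.0,   # track facial features across frames
    "gan_synthesis": 22.0,      # generate the swapped face
    "compositing": 5.0,         # blend the result back into the frame
}

total = sum(stages_ms.values())
print(f"budget {frame_budget_ms:.1f} ms, used {total:.1f} ms, "
      f"headroom {frame_budget_ms - total:.1f} ms")
assert total <= frame_budget_ms, "pipeline misses the real-time deadline"
```

The point of the sketch is simply that a single slow stage (here, synthesis at a hypothetical 22 ms) consumes most of the budget, which is why the low-level GPU optimization Li mentions matters as much as the model itself.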
FXG: Clearly the ability to run in real time presents many challenges. Is a principal issue the lack of any training data on the target (as they have just walked up)?
HL: The challenging part of achieving real-time performance has to do with the following:
- the ability to handle a lot of data, and to train better models with deeper networks
- the ability to generate high-resolution frames
- the ability to parallelize/schedule tasks
For each source face (celebrity), we used tens of thousands of frames.
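The parallelize/schedule point above can be sketched as a simple pipelining pattern, where synthesis of one frame overlaps with capture of the next instead of the stages running strictly in series. The stage functions here are trivial placeholders, not Pinscreen’s implementation:

```python
# Hedged sketch of pipelining stages across frames.
# capture/synthesize/display are stand-ins for webcam read,
# GAN face swap, and compositing/output respectively.
from concurrent.futures import ThreadPoolExecutor

def capture(i):    return f"frame{i}"
def synthesize(f): return f + ":swapped"
def display(f):    return f + ":shown"

def run_pipeline(n_frames):
    results = []
    with ThreadPoolExecutor(max_workers=3) as pool:
        pending = None
        for i in range(n_frames):
            frame = capture(i)
            # Submit synthesis so it overlaps with the next capture.
            fut = pool.submit(synthesize, frame)
            if pending is not None:
                results.append(display(pending.result()))
            pending = fut
        if pending is not None:
            results.append(display(pending.result()))
    return results

print(run_pipeline(3))
# → ['frame0:swapped:shown', 'frame1:swapped:shown', 'frame2:swapped:shown']
```

The trade-off in this pattern is one frame of added latency in exchange for higher throughput, which is why scheduling is listed alongside raw model speed.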
FXG: If you can do this in real time, does that imply you could achieve even higher quality if the process is allowed to not be real-time, but operate in a more traditional production time frame?
HL: Absolutely, our latest demo is a high-fidelity deepfake video using a production-level pipeline which isn’t real-time but can generate higher resolution and more natural results.
We have made a new deepfake video for a team outside the USA, which we hope we can share soon. In it, we make someone who is famous outside the US say and do various things that they never did. The results of these high-resolution (but slower) processes are remarkable; they work so well that I cannot see any real flaws with the naked eye.
FXG: Can your system handle much head rotation?
HL: Yes. As can be seen in the video results from our Globo collaboration, it can also handle significant head rotations.
FXG: For the WEF demo, how important is the lighting on the person who walks up?
HL: Our system naturally handles very challenging lighting conditions. You can take a light bulb with varying colors and see the face swap reflect the changes in the lighting conditions.
FXG: What happens if someone walks up with glasses?
HL: While glasses with thin frames are okay, thick-framed glasses can be an issue, as our training data does not include subjects with glasses.
FXG: Can you discuss where your hair simulation research is at present? We know from our previous articles that Pinscreen has done extensive research in this area.
HL: If you are talking about a full head swap, I believe that hairstyle synthesis is certainly a possibility and an interesting future direction. My USC lab has done research in the past on rendering photoreal hair strands using a GAN approach, and we have some new research on authoring facial hair on the face. Hair can span a huge range of styles and shapes, and it has very complex interactions with the face. More sophisticated conditions need to be investigated.
FXG: How was the installation received?
HL: Our installation was one of the highlights of the event, and we had thousands of participants playing with it, including celebrities and world leaders.
Many people were very interested in learning about the potential dangers of deepfake-type video manipulations, and some were interested in commercial applications,
or simply had fun becoming someone else.
FXG: Do you think attendees were surprised by the rapid advance of technology in this area?
HL: Yes, even though many already knew about deepfakes, they certainly did not expect the technology to be real-time already. We also showed some non-real-time examples we had been experimenting with, and attendees were shocked by the fidelity and wanted to learn more about how to prevent misuse, and how lawmakers can get involved in developing new regulations given how fast the technology advances. I think our installation accomplished its purpose in raising awareness of deepfake-type technologies.