At fxguide we tend to focus mainly on visual effects, but the tools of animation, simulation, and real-time engines are allowing companies and artists to expand into areas not traditionally the domain of an old-school ‘post-house’. There have been some impressive examples of this, such as the Mill’s real-time digital puppets. Framestore has been doing brilliant work with Magic Leap in AR, and DNEG, ILM and Weta all have teams exploring work beyond traditional VFX.
With the growth of mobile, people have been shown to want experiences and real-time interaction. Adobe recently published that 40% of consumers want to receive real-time offers and deals from chatbots. Many medium-sized post houses have seen business that was once channelled into high-end TVC production now being channelled into experience-driven e-commerce.
A key aspect of these new forms of entertainment and interaction is natural language interaction. Already many of us use Siri and Alexa on a daily basis, and the quality and error rates have been improving greatly over the last few years. Alexa, in particular, is remarkably good at understanding a wide variety of commands and instructions. Good conversational AI uses context and nuance; the responses seem instantaneous, but to achieve this the models need to be very large and still run in real time.
NVIDIA has taken another step forward in this area with some record-breaking real-time conversational AI. It has done this as part of Project Megatron-LM, an ongoing research effort in Natural Language Processing (NLP). One of the latest advancements in NLP, and a hot area of research, is ‘transformer models’. These language models are currently the state of the art for many tasks including article completion, question answering, and dialogue systems. The two best-known transformers are Bidirectional Encoder Representations from Transformers (BERT) and GPT-2. NVIDIA’s Project Megatron-LM is an 8.3 billion parameter transformer language model trained with 8-way model parallelism and 64-way data parallelism on 1,472 Tesla V100-SXM3-32GB GPUs across 92 DGX-2H servers (a DGX SuperPOD), making it the largest transformer model ever trained.
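To make the ‘8-way model parallelism’ idea concrete, here is a minimal, pure-Python sketch of a column-parallel layer, the core trick behind splitting one model across devices: a layer’s weight matrix is partitioned column-wise, each device computes its slice of the output, and the slices are gathered back together. The function names and the list-based ‘devices’ here are illustrative assumptions; the real Megatron-LM implementation shards tensors across GPUs.

```python
def matvec(x, w_cols):
    """Multiply row vector x by a matrix given as a list of columns."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in w_cols]

def split_columns(w_cols, n_shards):
    """Partition the matrix's columns into n_shards contiguous groups,
    one group per 'device'."""
    k = len(w_cols) // n_shards
    return [w_cols[i * k:(i + 1) * k] for i in range(n_shards)]

def column_parallel_forward(x, w_cols, n_shards):
    """Each 'device' multiplies x by its own column shard; concatenating
    the partial outputs (an all-gather, in GPU terms) rebuilds the full
    layer output, identical to the unsharded matvec."""
    partials = [matvec(x, shard) for shard in split_columns(w_cols, n_shards)]
    return [v for partial in partials for v in partial]
```

The point of splitting by columns is that each device holds only 1/N of the layer’s parameters, which is how a model of 8.3 billion parameters can be trained when no single GPU could hold it.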
Google with Transformer and then BERT, Microsoft with MT-DNN, Alibaba with their Enriched BERT base, and Facebook with their RoBERTa technology have all advanced conversational AI and sped up processing over the last couple of years.
NVIDIA’s AI platform is now able to train one of the most advanced AI language models, BERT, in under an hour (53 minutes) and complete AI inference in just over 2 milliseconds. This is well under the 10-millisecond processing threshold for many real-time applications, and far less than the 40-plus milliseconds often seen in CPU server implementations.
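To see why those milliseconds matter, consider a simple latency-budget check for a conversational pipeline. The 10 ms budget is the article’s figure; the per-stage numbers below are illustrative assumptions, not measured data.

```python
# Back-of-the-envelope latency-budget check for a real-time
# conversational pipeline. Stage names and latencies are illustrative.

REAL_TIME_BUDGET_MS = 10.0

def fits_real_time(stage_latencies_ms, budget_ms=REAL_TIME_BUDGET_MS):
    """Return (ok, total_ms) for a dict of per-stage latencies in ms."""
    total = sum(stage_latencies_ms.values())
    return total <= budget_ms, total

# ~2 ms GPU inference leaves headroom for the rest of the pipeline...
gpu_pipeline = {"bert_inference": 2.2, "feature_prep": 1.0, "post_process": 1.5}
# ...whereas 40 ms of CPU inference alone blows the budget.
cpu_pipeline = {"bert_inference": 40.0, "feature_prep": 1.0, "post_process": 1.5}
```

The model’s inference time is only one stage; keeping it near 2 ms is what leaves room for audio capture, feature preparation, and response generation inside a budget the user perceives as instantaneous.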
Some forms of conversational AI services have existed for several years, but until now it has been extremely difficult for chatbots, intelligent personal assistants and search engines to operate with human-level comprehension, due to the inability to deploy extremely large AI models in real time. The issue is one of both training and latency. NVIDIA has addressed this problem by adding key optimizations to its AI platform, achieving speed records in AI training and inference and building the largest language model of its kind to date.
Early adopters of NVIDIA’s performance advances include Microsoft and a set of young, innovative startups, which are harnessing NVIDIA’s platform to develop highly intuitive, immediately responsive language-based services for their customers. AI services powered by natural language understanding are expected to grow exponentially in the coming years: digital voice assistants alone are anticipated to climb from 2.5 billion to 8 billion within the next four years. We will see more conversational controls in smart TVs, smart speakers, and wearables. Additionally, Gartner predicts that by 2021, 15% of all customer service interactions will be handled completely by AI, an increase of 400% from 2017.
The good news is that NVIDIA’s work is now on GitHub and accessible; it will be interesting to see how it is harnessed to produce new and ‘sticky’ user experiences in the months and years ahead.