Conversational Agents with Cate Blanchett: the expanding world of 'film tech' to Agents

Dr Mark Sagar leads the world in complex, emotionally engaging Agents that embody cognition. He is a good friend of fxguide and we have covered his work in the past, including BabyX.

Complex facial animation, until recently, meant feature film scenes rendered over hours if not days. Game engines such as Unreal and Unity have produced remarkable results with real-time rendered graphics. But in most of these examples the characters were replications of a human, mimicking either a motion capture artist or the performance of an actor. Soul Machines has just unveiled a commercial cognitive conversational agent that you can talk to and that responds with answers that are not scripted. Soul Machines is the new company launched by Weta Digital alumnus Mark Sagar.

Nadia is a conversational bot, with a face. She looks at you, answers your questions and holds a normal conversation, and she is not a CG replication of a source actress.

Nadia’s father Mark Sagar and her voice Cate Blanchett

In the world of CG characters, nearly all fall into one of three categories: Actor, Avatar or Agent. An Actor is a digital copy (often motion capture). An Avatar is really a digital puppet: someone speaks and drives a digital ‘stand-in’ in real time. The last category is an Agent, something almost completely standalone which is driven by its own ‘brain’ or artificial intelligence. It can be thought of as a super chat-bot with a digital human face.

Nadia is

able to say ‘anything’, her responses are not scripted or pre-recorded
she has a highly detailed real time face able to express complex emotions
she reads the emotions of the user and reacts accordingly
she is designed to aid people with disabilities who may find interacting with a keyboard less satisfying (NDIS)
her answers are provided and powered by advanced AI, the more she is used – the smarter her answers become
a diverse group of people with disabilities and researchers aided in her design, for both effectiveness and empathy
…she is just the beginning of what is coming.

https://www.youtube.com/watch?v=3jMQuTXTj6c

Cate Blanchett provides the ‘voice’ of Nadia. Nadia is an Agent for the National Disability Insurance Scheme, funded by the Australian government. Nadia’s answers and her ontology are provided in this case by IBM’s Watson technology. The technology is designed to be knowledge-base agnostic: an Agent such as Nadia, from Soul Machines, could connect to a variety of back-end conversational knowledge bots.
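To make the idea of a back-end agnostic Agent concrete, here is a minimal sketch in Python. It is not Soul Machines' or IBM's code: the `KnowledgeBackend` interface, the two toy backends and the `Agent` class are all hypothetical names invented for illustration. The only point is that the face-and-voice layer talks to one small interface, so the knowledge source behind it can be swapped.

```python
from typing import Protocol


class KnowledgeBackend(Protocol):
    """Hypothetical interface: anything that turns a question into answer text."""

    def answer(self, question: str) -> str: ...


class FaqBackend:
    """Toy stand-in for a real conversational service (assumed, not a real API)."""

    def __init__(self, faq: dict[str, str]) -> None:
        # Store questions lower-cased so lookups ignore capitalisation.
        self._faq = {q.lower(): a for q, a in faq.items()}

    def answer(self, question: str) -> str:
        return self._faq.get(question.strip().lower(), "I'm not sure yet.")


class EchoBackend:
    """A second toy backend, to show that the Agent does not care which one it gets."""

    def answer(self, question: str) -> str:
        return f"You asked: {question}"


class Agent:
    """The user-facing layer: it only depends on the KnowledgeBackend interface."""

    def __init__(self, name: str, backend: KnowledgeBackend) -> None:
        self.name = name
        self.backend = backend

    def reply(self, question: str) -> str:
        return f"{self.name}: {self.backend.answer(question)}"


if __name__ == "__main__":
    faq = FaqBackend({"What is the NDIS?": "A national disability insurance scheme."})
    print(Agent("Nadia", faq).reply("what is the NDIS?"))
    print(Agent("Nadia", EchoBackend()).reply("Hello"))
```

Swapping `FaqBackend` for `EchoBackend` changes what the Agent knows without touching the Agent itself, which is the sense in which such a design would be knowledge-base agnostic.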

Nadia is the front end of the National Disability Insurance Scheme (NDIS). This is a large national healthcare program initiated by the Australian Government for Australians with a disability. The Bill was introduced into Parliament in November 2012 and it represents billions of dollars in funding for Australians with disabilities, and will help hundreds of thousands of Australians when fully operational.

Cate Blanchett the voice of Nadia

Nadia sounds like Cate Blanchett, but Nadia does not repeat catch phrases or recorded dialogue from the Oscar-winning actress. Over 20 hours of Blanchett’s speech was carefully recorded and then processed. This means Nadia is not limited to lines that Blanchett recorded. If Nadia has a reason to, she can say anything that is needed and yet sound remarkably like a natural, unedited sound bite that Blanchett could have spoken. This represents a major leap of faith for the international actress. Blanchett, who was keen to help this project for people with disabilities, only agreed to proceed when she was comfortable with how the technology would be used.

Nadia can vary in her emotional response, but the team did not require Cate Blanchett to record her sample dialogue with different emotions. Nadia uses signal processing to vary the base read that Blanchett gave, to match Nadia’s range of possible emotional deliveries. “We got Cate to perform everything neutrally, and she was very good at keeping a consistent tone. With this consistent tone, we can modulate, in real-time, to create inflections in her voice,” Sagar explains. “For example, we can make her raise her voice at the end of a sentence, if it is a question.” The team got Cate to record a set of dialogue that covered a vast range of phonemes. The system then builds a large sampling tree, which the computer explores to make up any particular sentence. “We had to do a fair bit of work,” recalls Sagar, “as the existing phoneme dictionaries are not designed for ‘Australian’ (!) so we had to modify ‘English’ phonemes to make it all work.”
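As a rough illustration of the question inflection Sagar describes, the sketch below takes a flat, neutral pitch target and ramps it upward over the last few frames when the text ends in a question mark. This is an assumption-laden toy, not the team's signal processing: the function name, the frame-based contour, the 25 percent rise and the three-frame tail are all invented for the example.

```python
def question_contour(base_hz: float, n_frames: int, text: str,
                     rise: float = 0.25, tail_frames: int = 3) -> list[float]:
    """Return one pitch target per frame: flat, unless `text` ends with '?'.

    For a question, the final `tail_frames` frames ramp linearly from
    `base_hz` up to `base_hz * (1 + rise)`. All parameters are illustrative.
    """
    contour = [base_hz] * n_frames
    if n_frames == 0 or not text.rstrip().endswith("?"):
        return contour
    start = max(n_frames - tail_frames, 0)
    span = max(n_frames - 1 - start, 1)  # avoid dividing by zero on tiny inputs
    for i in range(start, n_frames):
        contour[i] = base_hz * (1.0 + rise * (i - start) / span)
    return contour


if __name__ == "__main__":
    print(question_contour(200.0, 10, "You are eligible."))
    print(question_contour(200.0, 10, "Are you eligible?"))
```

Because the recorded read is neutral and consistent, a modulation like this can be layered on top at playback time rather than being baked into the recordings.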

Affective computing

The ability to have an Agent say anything that ‘she’ wants is a key aspect of the project, but it is only one part of what is remarkable about Nadia. Nadia is designed as an Affective Computing Agent. This means she responds emotionally to the emotional state of the user. If she feels you are worried, she may react differently than if you indicate you are distracted or excited. Affective Computing started at MIT in 1997. When Professor Rosalind W. Picard first started exploring computers paying attention to emotion and displaying emotion, many thought that she was throwing away a promising career to explore an academic dead end.

Rachel, a fintech agent, shown accessing Watson AI in NY last month and offering credit card advice (see below)

Picard defined a role for emotion in computing. Emotions were often viewed negatively, as in the common phrase “stop being so emotional”. Being more logical and less emotional was considered ‘better’ and more helpful in solving a task or an issue. In recent times, this has been very much challenged as the importance of emotion to task success and human interaction has become more widely agreed upon. While the value of emotion to success has only recently been reappraised, the role of emotion in human behavior has been discussed for millennia and has a long and checkered history. Aristotle espoused a view of emotion, often remarkably similar to modern psychological theories, arguing that emotions (such as anger), in moderation, play a useful role, especially in interactions with others. “Those who express anger at appropriate times are praiseworthy, while those lacking in anger at appropriate times are treated as a fool.” But this view was replaced with one that suggested emotions should be kept in check and ignored in making ‘sensible’ decisions. Logic, not emotion, was the basis of most AI research. However, in the last decade affective computing, built on work from MIT, has shown the great role emotions can serve in computer modeling. Of course for the film community, emotion is everything. Games and films are built on emotional reactions and responses. So perhaps it is no accident that much of the tech driving these new Agents is coming from researchers at places such as Weta Digital and ILM.

Baby X compared to Nadia.

Mark Sagar’s earlier BabyX, an Infant Agent, was more surprising or spontaneous in her responses than the new Adult Agent. Nadia has been built to be more predictable and more restrained in her responses. This is a deliberate decision. Nadia would be ineffective if she laughed at a subject someone using the NDIS found serious or distressing. BabyX was designed to be more emotional and unpredictable. This difference can be thought of as the difference between a directed actor who is ‘scripted’, and an actor who is performing improv.

Nadia’s responses are ‘scripted’ to the extent that what she says is carefully curated by the NDIS; however, they can change the content of what is spoken at any time.

Soul Machines aims to address both models moving forward, both the scripted agent who remains on topic and only moderates their responses with limited emotional context, and the fully emotionally interactive, unscripted model that started with BabyX.

Emotions in BabyX are, in fact, coordinated brain-body states that modulate activity in other circuits (such as increasing the gain on perceptual circuits). Emotional states modulate the sensitivity of behavioral circuits. For example, stress lowers the threshold for triggering a brainstem central pattern generator that, in turn, generates the motor pattern of facial muscles in crying. Neurotransmitters and neuromodulators play many key roles in BabyX’s learning and affective systems. An example of a ‘physiological’ digital variable that affects both the internal and external state of BabyX is dopamine, which provides a good example of how modeling at a low level interlinks various phenomena. Nadia does not have the same level of developed neurochemical simulation. Instead, she communicates internally with IBM’s Watson to provide adult, reasoned conversational responses on the workings of the NDIS and its services. Nadia is less emotionally engaged, but able to reason and speak in ways that the last version of BabyX could not.
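The idea that an emotional state changes the sensitivity of a behavioral circuit can be sketched in a few lines. The toy model below is purely illustrative and is not BabyX's neural model: the class name, the linear relationship and the numeric values are assumptions. It shows only the mechanism described above, in which stress lowers the threshold at which a pattern generator fires.

```python
from dataclasses import dataclass


@dataclass
class CryingGenerator:
    """Toy pattern generator that fires when drive reaches a stress-dependent threshold."""

    base_threshold: float = 1.0      # threshold when there is no stress (assumed value)
    stress_sensitivity: float = 0.5  # assumed: fraction of the threshold removed at full stress

    def threshold(self, stress: float) -> float:
        # Clamp stress to [0, 1], then lower the threshold linearly with stress.
        stress = min(max(stress, 0.0), 1.0)
        return self.base_threshold * (1.0 - self.stress_sensitivity * stress)

    def is_triggered(self, drive: float, stress: float) -> bool:
        # The same input drive may or may not trigger crying, depending on stress.
        return drive >= self.threshold(stress)


if __name__ == "__main__":
    gen = CryingGenerator()
    for stress in (0.0, 0.5, 1.0):
        print(stress, gen.threshold(stress), gen.is_triggered(0.7, stress))
```

The same stimulus (a drive of 0.7) leaves a calm model untouched but tips a highly stressed one into crying, which is the kind of state-dependent behavior a fixed script cannot produce.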

“Baby X is about creating an autonomous system, the focus is on creating models of human nature and human motivation, it is really looking at everything to do with theories of motivation and aspects from a biological point of view. Nadia is very different, it is the same technology, but her role is to confer knowledge and inform from a government database.” explains Sagar.

While the behavioural sub-systems are common to both, Nadia is replying with factually correct information that the government has provided, in a way that BabyX was never designed to do. BabyX is and was a research project that explored issues such as neural network representations of the subcortical structures of the brain, such as the basal ganglia. BabyX is not connected to an external AI engine, such as Watson. She is designed to represent the latest in neurochemical and behavioural learning theories in the form of an infant. Many teams have produced digital humans, but most give just the illusion of life, by copying the outward facial reactions. BabyX’s facial expressions are derived from a deep model of emotion and perception behind the face and not from a script or state machine approach.

https://www.fxguide.com/featured/avatars-and-agents-babyx/

Interestingly, the language sub-system from Nadia is informing the newest version of BabyX, who is advancing so fast she is moving from the current public version 3.0 to a version 5.0. These advances include language lessons learnt from Nadia and significant improvements in the quality of her rendering. Version 4 was to extend the baby’s representation to include her upper torso, but the new version 5.0 goes well beyond the current version in many significant areas. “It has a full body, its brain can control its arms or legs and it has new types of neural networks for it, it is an entire rebuild,” says Sagar.

The BabyX project represents a serious research effort to build a simulation exploring infant neurological development. It is a cognitive simulation, not a ‘trick’ or illusion. It is one stage in a longer effort to gain a deeper understanding of how we process and exist in the world.

To get a glimpse of some of these new advances, fxguide has been given exclusive access to Roman. Roman is very much the big brother of BabyX. Roman represents the new generation of improved real-time Sub-surface Scattering (SSS) skin rendering and new hair and detailed facial animation. Roman is modelled on a real person in New Zealand, “and when he stands side by side with the monitor it is incredible,” says Sagar.

At the moment, the team in New Zealand can make a new, custom high quality Avatar in six to eight weeks, using their custom in-house tools. The team can even animate other people’s facial rigs, after they have set up a special mapping to a third party rig.

EXCLUSIVE VIDEO: A Look at the Next Generation of Agents from Soul Machines

Our affinity with faces

The aim is for this technology to have wide scale effective adoption. Thus far, much of the work in realistic faces has been focused on film, where the only criterion for success has been a passive ‘creative’ success in narrative storytelling. Real-time rendering and interactive displays of human faces are largely unexplored for use in the areas of general affective computing, education or general user interfaces. This is because any partial implementation will fall victim to the Uncanny Valley Theory and provoke a negative reaction. In effect one “does it well or not at all”. In the physical world people enjoy seeing other people. People gravitate to faces and particularly to photographs of faces. It is generally agreed that people like to see other people face-to-face. The human face is one of the most expressive and important visual aspects of meaningful communication.

Given people’s natural attraction to faces, it is reasonable to explore how faces might be used in computer-based communication. It is important to focus on meaningful communication (complex interaction), as there are certainly many occasions where one-to-one facial engagement is not the preferred method for communication, and a text based interface is more than satisfactory for most people. But what an average computer user finds satisfactory may not suit a member of the disabled community.

Computers offer amazing tools to allow people with challenges to function more effectively, but narrowing this interaction to a keyboard is unhelpful. Many in the disabled community have found voice activated command bots such as Apple’s Siri a great help, but Siri offers any user a black, blank screen. Agents offer more than just novelty: they offer a richer interaction.

Mark Sagar’s Soul Machines aims to provide many different agents, and has developed the Auckland Face Simulator (AFS) as a tool to automate much of the process. The AFS was used for Roman and is being constantly improved with new algorithms and additions to the pipeline. For example, Roman has much more complex hair than either Nadia or BabyX.

It has been reported in the media that Nadia is ‘powered’ by IBM Watson, but this does not adequately address the relationship. IBM’s Watson is an incredible AI engine which can be used to provide intelligent answers to complex natural language questions. What Soul Machines provides is not just a face for this technology but an emotional framework that encompasses both reading the emotions of the user and expressing complex affective emotional loops back to the user. The Agent’s knowledge on a subject is provided by the impressive IBM Watson, but the system is engaging and appealing because the agent is not a dead talking head but an expressive interactive presence.

Furthermore, the user-facing avatar can run on any computer or device. Nadia is effectively streamed to users with her complex processing happening in the cloud. Soul Machines have even more advanced plans for future versions of avatars running natively on your computer.

Why have an agent?

One of the reasons Nadia was created is to get rid of hundreds of complex forms that can be challenging for anyone to fill out, but particularly members of the disabled community. The challenge, however, is broad, as the disabled community is diverse. For some in the disabled community, such as stroke victims, the facial tracking of the user needs to accommodate partial paralysis. For blind users there is no need to display Nadia’s face. Nadia remembers each user, so your history is never lost and any new sessions have the context of previous questions and answers. “The history of interactions you’ve had will be remembered so you don’t have to cover old ground,” Sagar explains. “At the moment when you phone some company help or customer service line, you have to recount your entire history, just to ask your new question. Nadia doesn’t work like that, she remembers.”
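A per-user memory of this kind can be pictured as a simple store keyed by user. The sketch below is a hypothetical illustration in Python, not a description of how Nadia is built: `SessionMemory`, its method names and the in-memory dictionary are all assumptions, and a real service would need persistent, secured storage.

```python
from collections import defaultdict


class SessionMemory:
    """Hypothetical per-user history, so a new session starts with earlier context."""

    def __init__(self) -> None:
        # Maps a user id to the ordered list of (question, answer) pairs so far.
        self._history: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def record(self, user_id: str, question: str, answer: str) -> None:
        self._history[user_id].append((question, answer))

    def context(self, user_id: str, last_n: int = 3) -> list[tuple[str, str]]:
        # Return a copy of the most recent exchanges; unknown users get an empty list.
        return list(self._history.get(user_id, [])[-last_n:])


if __name__ == "__main__":
    memory = SessionMemory()
    memory.record("user-42", "Am I eligible?", "Yes, based on what you told me.")
    memory.record("user-42", "What do I do next?", "Book a planning meeting.")
    # A later session can pick up where the last one ended.
    print(memory.context("user-42"))
    print(memory.context("someone-new"))
```

Keeping the history separate per user is what lets a returning user skip the recap, while a first-time user simply starts with an empty context.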

Eye detail on Roman, rendered in real time.

It is unlikely that Nadia will be perfect for everyone, but the more she is used generally, the smarter she becomes. At the moment that intelligence is general. In other words, Nadia improves overall, but her behaviour towards any one individual is not progressively modified. Sagar can see no reason why the system could not be extended to allow individual customization that would mirror familiarisation, based on individually learnt response improvement. For now, as she interacts more, her responses improve overall, not individually. “She learns the topics that people have asked about and from the responses. Like a family doctor, when you see Nadia again she remembers what you discussed last time.”

Soul Machines does not render using an off-the-shelf tool or game engine. Their work spans from scanning to final implementation, but given the founder’s strong high-end feature film background it is little wonder that the team is constantly pushing the highest level of rendering realism. While Pixar or Weta may have hours per frame to render a face, Soul Machines have milliseconds.

Agents as UI tool

The way that both Nadia and Soul Machines have been established suggests that many more agents will soon be appearing. The software engineering was not focused on a one-off special project. The underlying structure is modular and designed to accommodate a variety of Agents speaking in different languages and for a vast range of applications.

Below: Nadia at the LendIt conference in New York.

Shantenu Agarwal from IBM Watson introduces “Rachel” from Soul Machines, on stage at the LendIt Conference in NY. Rachel’s knowledge about finance or credit card information comes from Watson Artificial Intelligence. Rachel exhibits Soul Machines’ ‘Emotional Intelligence’. She can see you and hear you, as well as being emotionally responsive and receptive. Emotional Cognition is creating a link between humans and computers, offering a glimpse of possible future forms of user experience.

Posted by Mike Seymour on April 6, 2017