AI Generative Fashion Videos

The team at London College of Fashion believes that generative AI could transform the fashion industry. This week they hosted a panel presentation at GTC showing the work the team designed with Johannes Saam, a Generative Artist, Futurist and Creative Technologist at Framestore.

Johannes Saam is a computer graphics veteran. After building his career on a long list of film credits (including Mad Max 4, Thor, Captain America, and Prometheus), he ventured into the world of real-time, virtual realities and the metaverse. He holds an Academy Award and an Emmy and continuously creates NFT collections and other Web3-related creative projects.

The team at the London College of Fashion has long been on the cutting edge of exploring new technologies in fashion and working with leading VFX houses around the world. For the London Fashion Week showcase in 2018, they collaborated on a two-year exploration between FIA and ILMxLAB, Lucasfilm’s immersive entertainment division, into the potential application of immersive technologies within the fashion industry. This forward-looking program at the London College of Fashion continues, and the new project has them partnering with Framestore.

Why do an ML demo: Meta Catwalk?

We spoke to Matthew Drinkwater, Head of Innovation, and Moin Roberts-Islam, Technology Development Manager at the London College of Fashion, as well as Johannes Saam (Framestore). “We wanted to do something for GTC,” explained Roberts-Islam. “We have been looking at the uses of generative AI for a while. For example, we had previously done a project using archival catwalk footage where we used AI to extract the motion data from the movement of the models which we then used with digital models and clothes in synthetic environments. This time we wanted to do something much more modern and cutting edge with what is possible today.”

After much brainstorming and reaching out to Framestore’s Johannes Saam, the idea that emerged was to showcase a potential future where you could type in a designer’s name and the AI would generate a new original catwalk fashion show, in a possible style of that designer.

The logic behind Framestore’s involvement is more than a demonstration for demonstration’s sake. It points to the evolving and expanding nature of our industry. Framestore has a very progressive and effective vision for its role that is well beyond the move from ‘post’ to ‘VFX’. It reflects a shift in approach to working with clients and being focused on experiences, not shots. The company has adopted an approach informed by the wider field of CX, or Customer Experience. In the creative design world, it is said that the User Interface (UI) is what something looks like, User Experience (UX) is how it works, and CX is the entire experience with multiple touchpoints for the client. In Framestore’s translation of this, it is easy to see that UI is ‘the shot’, UX is how they achieve that shot, and CX is the bespoke experience they build per project for each director or creative team. In other words, based on the story being told or the communication required, Framestore stands ready to build around the creatives, not just offer the latest tech.

To build bespoke pipelines that bend and shape to the needs of the project, rather than forcing the story into a form that the tech can deal with, is enormously important. While new tech is exciting, it is not an end in itself. More tech does not mean better. By contrast, the ability to take a wide range of new technologies and craft them into bespoke solutions while remaining cost-effective is a key skill for success today.

One can see this approach in how Framestore solved the Meta Catwalk. While the project is significant for its own creative agenda, it also shows just how much an AI creative solution is actually a combination of machine learning tools. The final solution is a modular mix of cutting-edge experimental Machine Learning (ML) tools blended with traditional VFX tools such as the Foundry’s Nuke.

Meta Catwalk

The aim of the project was to produce a meta catwalk, complete with digital fashion models and never-before-imagined clothes in the styles of famous designers. Saam trained ML engines on various famous designers and then had the ML infer new designs and show what those new designs would look like on a digital catwalk. The work is significant not only for the fashion inferences but also because such AI examples are typically stills. Many people have used Stable Diffusion models to infer a still image from text. It is naturally of great interest to fine-tune and direct AI models or engines such as Stable Diffusion to generate video rather than stills.

At its core, one could think of the whole project as being akin to the now well-known ‘style transfer’ ML approach. But such a simple, direct approach would have meant providing only what a style transfer model can do, and that was not a good match for the subtlety of the project brief. Framestore’s approach to building bespoke experiences meant taking the idea of style transfer but then re-imagining it in a way that could deliver something much closer to what Matthew Drinkwater and the team wanted creatively.

Saam took images of key fashions from various famous designers and inserted them into Stable Diffusion using DreamBooth. DreamBooth, published in 2022 by a Google research team, is a technique to fine-tune diffusion models (like Stable Diffusion) by injecting a custom subject into the AI model. He also merged this with several other AI models to make a series of bespoke AI models, one for each designer. He then used more AI and visual effects techniques to build the video.

To get the core walk movement of the human models, Saam took the archival fashion plate or background clip and then cropped it around the real model (person). An ML program then isolated, or segmented, the model from the background, producing an alpha of the human catwalk model.
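As a rough illustration of what such an alpha is used for, the sketch below applies the standard compositing "over" operation to a toy grayscale frame. The function name, the tiny 2x2 images and their values are hypothetical and chosen for illustration; the article does not describe the actual segmentation tool or its data format.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of values in [0, 1]


def composite_over(fg: Image, alpha: Image, bg: Image) -> Image:
    """Standard 'over' operation: out = alpha * fg + (1 - alpha) * bg."""
    return [
        [a * f + (1.0 - a) * b for f, a, b in zip(fg_row, a_row, bg_row)]
        for fg_row, a_row, bg_row in zip(fg, alpha, bg)
    ]


# Hypothetical toy data: a bright "model" plate, its alpha, and a darker new background.
fg = [[0.8, 0.8], [0.8, 0.8]]
alpha = [[1.0, 0.0], [0.5, 0.0]]  # 1 = model, 0 = background, 0.5 = soft edge
bg = [[0.2, 0.2], [0.2, 0.2]]

result = composite_over(fg, alpha, bg)
```

Where the alpha is 1 the model pixel is kept, where it is 0 the new background shows through, and fractional values give a soft edge.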

To adjust the poses and control or ‘direct’ Stable Diffusion, Saam ran his input images through various ControlNet models. The ControlNet OpenPose model allows for artist-controlled posing. ControlNet is a neural network structure that controls diffusion models by adding extra conditions, and it provides even more control over Stable Diffusion. The key thing about ControlNet is that it is a solution to the problem of spatial consistency. Previously there was no efficient way to tell an AI model which parts of an input image to keep. ControlNet changed this by introducing a method that enables Stable Diffusion models to use additional input conditions that tell the AI model exactly what to do.

Saam then used the Stable WarpFusion model to transfer the AI models onto the imagery of the real cropped models. WarpFusion was written by Alex Spirin and builds on the broader Disco Diffusion model approach. In simplest terms, these tools generate optical flow maps from input videos to warp the init (initial) frames for consistent style, and then warp the processed frames for less noise in the final output video. Conceptually, the process takes the first frame and diffuses it as normal, as an image input. The result is then warped with a flow map into the second frame and blended with the original raw video’s second frame. This way the program gets the style from the heavily stylized first frame (warped accordingly) and the content from the second frame (to reduce warping artifacts and prevent overexposure).
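The warp-and-blend step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not WarpFusion's actual code: the function names, the integer per-pixel flow, the edge clamping and the 0.5 blend weight are all hypothetical, and the diffusion pass that would follow is omitted. Real tools estimate sub-pixel flow and interpolate when sampling.

```python
from typing import List, Tuple

Frame = List[List[float]]           # grayscale frame, values in [0, 1]
Flow = List[List[Tuple[int, int]]]  # assumed: per-pixel (dx, dy) motion from frame 1 to frame 2


def warp(frame: Frame, flow: Flow) -> Frame:
    """Backward warp: output pixel (x, y) samples the input at (x - dx, y - dy), clamped to the image."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(x - dx, 0), w - 1)
            sy = min(max(y - dy, 0), h - 1)
            row.append(frame[sy][sx])
        out.append(row)
    return out


def blend(a: Frame, b: Frame, weight: float) -> Frame:
    """Linear blend: weight * a + (1 - weight) * b."""
    return [[weight * pa + (1.0 - weight) * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


# Hypothetical 1x4 example: everything moves one pixel to the right between frames.
stylized_1 = [[1.0, 0.0, 0.0, 0.0]]   # first frame after diffusion (style source)
raw_2 = [[0.0, 0.5, 0.0, 0.0]]        # second frame of the raw video (content source)
flow_1_to_2 = [[(1, 0)] * 4]

warped = warp(stylized_1, flow_1_to_2)  # stylized frame moved into frame 2's layout
init_2 = blend(warped, raw_2, 0.5)      # init image for the next diffusion pass
```

In a full pipeline `init_2` would be diffused to produce the stylized second frame, and the same warp-and-blend step would repeat for each following frame.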

In the final video, only three actual human models are used, with multiple AI inferences transferred on top of them. Interestingly, the fashion model who appears as a woman turning to the left and then to the right on successive walks is based on archival footage of a male model who only ever turned to the left.

The Room

Next, Saam needed an audience and a catwalk room. The spaces are also inferred imagery. The room containing the catwalk is fully synthetic but with a video clip of an audience composited into it. These were created with similar techniques, but this time using Midjourney combined with standard Nuke compositing.

Midjourney background rooms or spaces
Midjourney background that will be used in the final Nuke comp

Saam then used Nuke to add shadows and reflections and then composite the final video. This stage also involved extensive deflickering using optical flow.
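The article does not detail how the deflickering was set up in Nuke. To give a feel for the general idea, the sketch below shows a much simpler, swapped-in technique: it computes a brightness gain per frame that pulls each frame's mean luminance toward the average of its temporal neighbours. It is not optical-flow based; a flow-based deflicker would first align neighbouring frames to the current one before comparing them. The function name, the `radius` parameter and the sample values are hypothetical.

```python
from statistics import mean
from typing import List


def deflicker_gains(frame_means: List[float], radius: int = 1) -> List[float]:
    """Per-frame gain that moves each frame's mean brightness toward its local temporal average."""
    gains = []
    for i, m in enumerate(frame_means):
        lo = max(0, i - radius)
        hi = min(len(frame_means), i + radius + 1)
        target = mean(frame_means[lo:hi])          # average brightness of the neighbourhood
        gains.append(target / m if m else 1.0)     # leave fully black frames untouched
    return gains


# Hypothetical mean luminance per frame; the middle frame flickers brighter.
gains = deflicker_gains([0.5, 0.6, 0.5])
```

Multiplying each frame by its gain darkens the frame that flickers bright and slightly lifts its neighbours, while a sequence with constant brightness is left unchanged.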

The team is quick to point out that this AI video is just an extension and exploration of the artistry of the original designers. They see this as an additional layer to enhance a traditional fashion collection and to ideate around the skill and vision of a traditional fashion designer.