In our latest story in our series of “Art of” stories, we explore the Art of Optical Flow. Optical flow is one of the most interesting growth areas of visual effects technology. We talk to the people who invented it, developed it and used it in such films as What Dreams May Come, Mission: Impossible and the Matrix series. In this week’s accompanying podcast we talk to Dr. Bill Collis who produced the original bullet-time optical flow.
This week’s podcast is with Bill Collis of the Foundry, who make the Furnace optical flow plugins including the Kronos retimer. When we were talking to Kim Libreri of ILM this week for the story, about retimers, he had this to say: “I think Bill’s (Collis) thing now has a better product … Kronos is probably the best thing that’s out there.” We speak to Bill about the theory and implementation of the Foundry’s Furnace plugins.
Optical flow was first used in feature films just over ten years ago, yet some of the brightest minds in the field are now working in the area of plugins and software add-ons. This will allow a whole generation of artists to get their hands on this extremely complex and powerful technology to use in most compositing packages from After Effects to Shake, Flame, Nuke, Fusion and Autodesk’s Toxik.
Optical flow, or the maths behind tracking every pixel in a moving image, is used today in standards conversion and video compression. In the world of visual effects, optical flow started as a tool for re-timing shots without producing strobing, and today it is used for tracking, 3D reconstruction, motion blur, auto roto, and dirt removal.
For each pixel we need to calculate a motion vector, which is a sub-pixel x and y movement. We also need a measure of how good that vector is, and possibly such things as how different the next frame is (that is, the absolute difference between a pixel in frame 1 shifted by its motion vector and the corresponding pixel in frame 2). There can also be a confidence level, or a penalty for how different a pixel’s vector is from its neighboring vectors.
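To make the "how good is this vector" measure concrete, here is a minimal sketch of the absolute difference described above. The function name, the list-of-lists image layout, and the bilinear sampling used for sub-pixel vectors are our own assumptions for illustration, not any particular product's implementation.

```python
def displaced_frame_difference(frame1, frame2, vectors):
    """Per-pixel |frame1(x, y) - frame2(x + dx, y + dy)|.

    frame1, frame2: 2D lists of floats (rows of pixels), same size.
    vectors: 2D list of (dx, dy) float pairs, one per pixel of frame1.
    A small value means the vector explains the pixel well.
    """
    height, width = len(frame1), len(frame1[0])

    def sample(img, x, y):
        # Bilinear sampling, with coordinates clamped to the image edges.
        x = min(max(x, 0.0), width - 1.0)
        y = min(max(y, 0.0), height - 1.0)
        x0, y0 = int(x), int(y)
        x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
        fx, fy = x - x0, y - y0
        top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
        bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
        return top * (1 - fy) + bottom * fy

    return [
        [
            abs(frame1[y][x]
                - sample(frame2, x + vectors[y][x][0], y + vectors[y][x][1]))
            for x in range(width)
        ]
        for y in range(height)
    ]
```

A frame whose content slides one pixel to the right scores near zero with a vector of (1, 0) and noticeably worse with (0, 0), except at the edge where the content leaves the frame.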
All optical flow is done in floating point and often, but not always, stored in the OpenEXR file format.
The Rules for Production
Note from the editor: At fxguide we believe that if you understand the actual algorithm you can be a better effects artist, and indeed we have even had emails bugging us for more theory and fewer product-specific tips. Still, some people are more artist than others. In this section we explain what is actually happening inside an optical flow program, helped generously by arguably the world’s leading researcher in the art of optical flow, Dr. Michael Black. His work has been used and referenced by most of the people we spoke to while researching this story. But if you’re not into theory – you can skip to the “rules of optical flow” below, where we summarize how the various implementations of optical flow lead to certain ‘rules of thumb’ about how to shoot material for use in optical flow.
“When we walk or drive or even move our heads, our view of the world changes. Even when we are at rest, the world around us may not be; objects fall, trees sway, and children run. Motion of this sort, and our understanding of it, seems so straightforward that we often take it for granted. In fact, this ability to understand a changing world is essential to survival; without it, there would be no continuity to our perceptions.” M.Black PhD thesis 1992.
Dr Black’s thesis presents optical flow as a representation of the apparent motion of the world projected on the image plane of a moving camera. Optical flow is presented as a 2D velocity field, describing the apparent motion in a clip. Optical flow results from moving objects in the scene or from camera motion.
In the simplest terms optical flow tracks every pixel in one frame to the next frame. The output is a series of vectors for every pixel in the shot. From these vectors and some analysis, one can do advanced image processing to find shapes and objects in the scene.
Why is it important? In very real terms optical flow is basic machine vision. Once we give the computer the power to actually see things in the frame and draw conclusions about shapes, objects and actors, it opens a door on a huge range of cool visual effects. Once the computer can see the car, it can roto the car and replace the background hidden or occluded by the car itself. Once the computer can see the car it can also see dirt spots on the film that are on part of the car and remove them – not by cloning, but by tracking in the correct texture from earlier frames. Once it can see, the computer can produce depth maps showing what is close and what is far away – and of course re-time shots – producing accurate missing frames between any two or more frames in a clip – allowing super smooth slow downs impossible any other way.
Optical flow, while a tool that lets us learn a lot about a scene, is not a 3D analysis, nor does it require any special equipment. It compares frames and tries to map every pixel from one frame to the next. If a perfectly smooth, shiny but stationary sphere spins on its axis, there is no apparent motion and hence no optical flow, or oflow as it is often called.
We asked Prof. Michael Black, now at Brown University, just how optical flow works in simple terms:
“Optical flow is the apparent motion of image pixels or regions from one frame to the next. Underlying optical flow is typically an assumption of ‘brightness constancy’; that is, the image values (brightness, color, etc) remain constant over time, though their 2D position in the image may change. Algorithms for estimating optical flow exploit this assumption in various ways to compute a velocity field that describes the horizontal and vertical motion of every pixel in the image. Optical flow is hard to compute for two main reasons. First, in image regions that are roughly homogeneous, the optical flow is ambiguous because the brightness constancy assumption is satisfied by many different motions. Second, in real scenes, the assumption is violated at motion boundaries and by changing lighting, non-rigid motions, shadows, transparency, reflections, etc. To address the former, all optical flow methods make some sort of assumption about the spatial variation of the optical flow that is used to resolve the ambiguity; these are just assumptions about the world which will be approximate and consequently may lead to errors in the flow estimates. The latter problem can be addressed by making much richer but more complicated assumptions about the changing image brightness or, more commonly, using robust statistical methods which can deal with ‘violations’ of the brightness constancy assumption.”
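For readers who like to see the symbols, the brightness constancy assumption Dr. Black describes is usually written along the following lines. The notation is an assumption of this sketch and is not taken from the interview: I is image brightness, (u, v) is the per-pixel horizontal and vertical motion, and I_x, I_y, I_t are the spatial and temporal derivatives of I.

```latex
% Brightness constancy: a pixel keeps its value as it moves by (u, v).
I(x + u,\; y + v,\; t + 1) = I(x,\; y,\; t)

% First-order (small-motion) approximation of the same statement.
I_x\,u + I_y\,v + I_t \approx 0
```

The second line is a single equation in two unknowns per pixel, which is one way to see why homogeneous regions are ambiguous and why every method adds some assumption about how the flow varies spatially.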
In general there are three families of optical flow algorithm:
a) Block matching
b) Frequency domain based correlation (phase-based correlation – military)
c) Gradient based (Michael Black in 1992)
Higher level feature based techniques can be incorporated into any of these.
Block matching is exactly what it says. A small region of frame A is compared with similar-sized regions in frame B until a vector that minimizes some error criterion (typically the absolute value of the difference) is chosen. This is ideal for very fast hardware implementations but, in general, gives pretty poor output images and lots of spurious matches. It forms the basis of MPEG compression and is ideal for this as it’s fast and the odd spurious match can easily be corrected by sending a few more bits. It’s bad for image-based solutions as the resulting images look awful.
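As a rough illustration of block matching, the sketch below does an exhaustive search over integer offsets and scores each candidate by the sum of absolute differences. The function name, argument names and search strategy are assumptions made for this example; real encoders use much faster search patterns.

```python
def block_match(frame_a, frame_b, top, left, size, search):
    """Find the integer (dx, dy) that best maps a block of frame_a into frame_b.

    The block is the size x size square of frame_a whose corner is (left, top).
    Candidates within +/- search pixels are scored by sum of absolute differences.
    Returns ((dx, dy), error) for the lowest-error candidate.
    """
    height, width = len(frame_a), len(frame_a[0])
    best_vector, best_error = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside frame_b.
            if not (0 <= top + dy and top + dy + size <= height
                    and 0 <= left + dx and left + dx + size <= width):
                continue
            error = sum(
                abs(frame_a[top + y][left + x]
                    - frame_b[top + dy + y][left + dx + x])
                for y in range(size)
                for x in range(size)
            )
            if error < best_error:
                best_vector, best_error = (dx, dy), error
    return best_vector, best_error
```

Because only the single lowest-error candidate survives, a flat or repetitive region can easily produce one of the spurious matches mentioned above.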
Phase correlation is the most popular frequency domain method. It was invented by Lockheed Martin in the 60’s for military purposes and then jointly developed by the BBC and Snell & Wilcox to form the first motion compensated standards converter. It works by comparing blocks in the frequency domain, which tends to give more accurate results. Again, although very complicated, as it requires lots of FFTs and multipliers, it is quite hardware friendly as it’s a closed form solution that doesn’t require iteration.
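The closed-form nature of phase correlation is easiest to see in one dimension. The sketch below is a simplification under our own assumptions: it works on a single row of samples, treats the shift as circular, and uses a slow textbook DFT in place of an FFT so that it needs nothing beyond the standard library.

```python
import cmath


def dft(signal, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); stands in for an FFT."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = [
        sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def phase_correlate(a, b):
    """Return the circular shift s (in samples) such that b[t] ~ a[t - s]."""
    fa, fb = dft(a), dft(b)
    cross = []
    for x, y in zip(fa, fb):
        product = y * x.conjugate()
        magnitude = abs(product)
        # Keep only the phase; bins with no energy carry no information.
        cross.append(product / magnitude if magnitude > 1e-12 else 0j)
    # The inverse transform of the phase-only spectrum peaks at the shift.
    surface = [v.real for v in dft(cross, inverse=True)]
    return max(range(len(surface)), key=surface.__getitem__)
```

There is no loop that refines a guess: two forward transforms, one normalisation and one inverse transform give the answer directly, which is what makes the method attractive in hardware.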
To the best of our ability to research, all software motion estimation solutions used in the post industry are gradient based. The idea is to take two blocks of data from adjacent frames and to use the gradient information to progress in an iterative manner to a solution where the pixels are aligned. As the same information has to be visible in both blocks it is usually implemented in a hierarchical fashion working through an image pyramid, starting at the small scale images with a vector per large block of pixels, eventually ending up at 1 vector per pixel. Each block at each scale requires a number of iterations to ensure convergence which means it can be highly computationally expensive. However, it gives results which look the most realistic of all methods.
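To give a feel for the iterative, coarse-to-fine idea, here is a deliberately tiny sketch. It is our own simplification and not any vendor's engine: it estimates one shift for a whole one-dimensional row of samples (rather than a vector per pixel), uses a Lucas-Kanade-style least-squares update, and builds its pyramid by averaging neighbouring pairs. All function names are hypothetical.

```python
import math


def sample(signal, x):
    """Linearly interpolate signal at fractional position x, clamped at the ends."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(x), len(signal) - 2)
    f = x - i
    return signal[i] * (1 - f) + signal[i + 1] * f


def estimate_shift(a, b, initial=0.0, iterations=20):
    """Refine d so that b(x + d) ~ a(x), using gradient information only."""
    d = initial
    for _ in range(iterations):
        num = den = 0.0
        for x in range(1, len(a) - 1):
            warped = sample(b, x + d)
            grad = (sample(b, x + d + 1) - sample(b, x + d - 1)) / 2.0
            num += grad * (a[x] - warped)
            den += grad * grad
        if den == 0.0:
            break  # no gradient anywhere: the motion is unobservable
        d += num / den  # least-squares update for the remaining shift
    return d


def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2.0
            for i in range(0, len(signal) - 1, 2)]


def coarse_to_fine_shift(a, b, levels=3):
    """Estimate the shift at the smallest scale first, then refine upwards."""
    pyramid = [(a, b)]
    for _ in range(levels - 1):
        a, b = downsample(a), downsample(b)
        pyramid.append((a, b))
    d = 0.0
    for level_a, level_b in reversed(pyramid):
        # A shift of d at one level is 2d at the next finer level.
        d = estimate_shift(level_a, level_b, initial=2.0 * d)
    return d
```

The nested loops (levels, iterations, samples) hint at where the computational cost comes from once the same refinement has to run for every block of a full-resolution frame.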
When you hit process on an Oflow analysis, the computer is typically going to start tracking everything in the frame against the next frame. It often does this based on motion segmentation, or breaking the shot down into regions, which produces motion fields or velocity maps. Noise and film grain fight against solutions as they break the “homogeneous regions” – as do transparent objects. Oflow also typically divides these regions into layers. So a car driving past a house with a tree out the front may result in the car on one layer, the tree on another and the house on a third layer. The better the software is at picking the edges between these things, the better the Oflow.
The four-stage process for extracting layers from a clip is:
-segment motion regions in each frame
-determine corresponding regions across other frames
-link the corresponding frames with the starting or “reference” frame
-generate intensity and alpha maps for each layer.
Most Oflow is based on an assumption about the objects in a scene – namely that there is a single motion per object, i.e. the car moves left to right, the pan of the camera makes the tree seem to move right to left, etc. This idea that in most areas there is a single motion is key if you’re on set and setting up a shot to use optical flow. The “Single Motion Assumption” makes transparent objects or objects with traveling specular highlights very hard to solve. It makes no ‘sense’ to the computer to have the car moving left to right and then have a spec ping on the car move right to left. Most of the time you need to avoid breaking the single motion assumption to get great results. Luckily, most people fit this basic assumption well – since people are rarely transparent and rarely have extremely hot highlight pings – but of course, some costume and wardrobe choices can produce complex problems. If you have a correct subdivision with lots of accuracy at or around edges, the object is cut out as if with a matte. If not, parts of the background tend to be ‘carried’ along with the foreground, especially in algorithms that use segmentation rather than per-pixel solutions.
Dr Black: “Optical flow estimation is a chicken-and-egg problem: if you know how to segment the scene into differently moving objects, then computing their motion is relatively easy; if you know how to compute motion accurately you can segment the scene into differently moving objects. The problem is these two things have to be done simultaneously. And it’s hard. Automatic segmentation is, in general, a hard and unsolved problem. There are techniques that first segment the scene using color or other static cues and then estimate the motion of the segmented regions. There are also methods that attempt to combine motion and brightness cues to help segment the scene and localize motion boundaries. To date none of these methods is “perfect” and the main problem with optical flow results today is still that motion boundaries are not well localized. The accuracy of optical flow methods today is actually pretty good except at motion boundaries.”
Dr. Black’s work has been focused on the places where optical flow breaks, particularly places where multiple motions occur, such as at motion boundaries, or as a result of transparency or reflection. His 1992 thesis introduced robust statistical methods for dealing with violations of the brightness constancy assumption and related problems. The code has been widely used in academia and in the film industry.
Dr. Black’s later work with Allan Jepson aimed to solve the problem of layered overlapping motion. This mixture model idea had not been used previously and has had a significant impact on the field. The method allows one to estimate multiple motions within an image region and assign pixels probabilistically to the different motions, “essentially doing motion estimation and segmentation simultaneously,” he explains. The method uses something called the EM algorithm which is tailored to solving these “chicken-and-egg” problems.
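The flavour of the EM idea can be shown with a toy example. The sketch below is not the Jepson and Black method itself: as a loudly stated simplification, it assumes each pixel already has a rough scalar velocity measurement and fits a two-component Gaussian mixture to those numbers, alternating between soft ownership (E-step) and re-estimating each motion (M-step). The function name, starting guesses and fixed sigma are all hypothetical.

```python
import math


def em_two_motions(velocities, init=(-1.0, 1.0), sigma=0.5, iterations=25):
    """Fit two motions to per-pixel velocity samples with the EM algorithm.

    Returns (motion1, motion2, owner), where owner[i] is the probability
    that sample i belongs to motion1.
    """
    m1, m2 = init
    w1 = 0.5  # mixing weight of motion 1
    owner = [0.5] * len(velocities)
    for _ in range(iterations):
        # E-step: given the motions, softly assign every sample to one of them.
        owner = []
        for v in velocities:
            p1 = w1 * math.exp(-((v - m1) ** 2) / (2 * sigma ** 2))
            p2 = (1 - w1) * math.exp(-((v - m2) ** 2) / (2 * sigma ** 2))
            owner.append(p1 / (p1 + p2) if p1 + p2 > 0 else 0.5)
        # M-step: given the assignments, re-estimate each motion.
        total1 = sum(owner)
        total2 = len(velocities) - total1
        if total1 > 0:
            m1 = sum(o * v for o, v in zip(owner, velocities)) / total1
        if total2 > 0:
            m2 = sum((1 - o) * v for o, v in zip(owner, velocities)) / total2
        w1 = total1 / len(velocities)
    return m1, m2, owner
```

Neither step needs the other to be solved first, which is how the loop sidesteps the chicken-and-egg problem: a rough guess at the motions gives a rough segmentation, which gives better motions, and so on.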
Most recently his work has undertaken the first careful study of the spatial statistics of optical flow. With his team, he has found that he can improve on early approaches. This represents a potentially significant advance, but there is still more to do with respect to improving the assumption that things are of similar brightness between frames, the so-called brightness constancy assumption.
His newest research is focused on machine learning. “I also think that we have the tools now to learn a model that combines information about image brightness edges and optical flow to improve the accuracy of flow estimation at object boundaries. My work also looks at human motion and in particular the 3D tracking of people in video sequences; that is, markerless motion capture from one or more cameras.”
The Rules for Optical Flow
So what do we learn from all this theory on how optical flow works?
Optical flow code is written with some assumptions or rules – when your shot sticks closer to the rules – you get a better result. This is natural – we all understand green or blue screen keying these days – which in turn means we all have a bunch of “rules” for shooting green screen material … such as don’t wear a green shirt in front of the green screen, don’t use promist filters, and avoid having a green screen too close to the talent. You can choose to violate these rules anytime you like, but you may create more work for yourself, and great results may be much harder to achieve.
In optical flow the rules are:
Rule 1. Transparent things and things that violate the “Single Motion Assumption” do not work as well.
Rule 2. Flashing lights or things that violate the “Brightness Constancy Assumption” will work less well.
Rule 3. Very grainy or noisy footage works less well.
Rule 4. Pre-processing will normally hurt rather than help an optical flow analysis. We asked both Dr. Black and Dr. Bill Collis from the Foundry (the man behind the original Matrix bullet-time re-timing) and both agreed that the algorithms are built to accommodate noise and grain – and pre-processing is like using secondary colour correction on a green screen transfer to pump up the greens – it looks good to the naked eye but in reality does more harm than good. “I do not advocate pre-processing but rather a careful modeling of the noise properties in your sequences. This allows one to formulate a principled probabilistic approach to the flow estimation problem,” advises Dr. Black. Thus degrain, denoise or averaging are out.
Rule 5. Optical flow is looking for patterns between frames, so vast movements may not be easily solved, nor may very random motion or motion where an object changes radically from frame to frame. If you find it hard to follow from one frame to the next – it is likely the computer will too!
Rule 6. Edges help. If it is possible to add some backlight or rim light to make something stand out from a background that will help.
Rule 7. To help beat the “chicken and egg” problem Dr. Black describes above – give the computer a chicken! Many programs allow matte input. If you can provide a roto or a key or some valid matte for an object, this will vastly improve the result. At the moment, there is no database of shapes or higher level shape register in most optical flow programs, so isolating an object is extremely powerful. The oflow retimer Kronos from the Foundry has such a matte input. According to Bill Collis: “it is all worked out on a per-pixel basis, as, to the best of my knowledge, are all other commercial motion estimation engines. The only places where we currently use image understanding is by a user supplying mattes and in trying to detect occlusions. However, the next generation of algorithms that we are currently working on will be heavily reliant on image understanding. The per-pixel optic flow algorithms will still form a large part of the new algorithms.”
Rule 8. Processing time is more or less directly and linearly related to image size. Twice the pixels means twice the computation, although with most algorithms there tends to be little point in trying to estimate a vector exactly per pixel, as this tends to give more random vectors. Most systems tend to work on sub-sampled images, controlled by a parameter such as VectorDetail, which gives smoother, more natural results.
Rule 9. Due to issues of edge separation, most software works best with motion that is relatively regular and with slower shots. Shots where there is cross-motion should definitely be avoided; people walking across the screen in both directions are an example of this.
The first major research arrives with the introduction of the Horn and Schunck method.
Heeger introduces spatio-temporal filters for estimating flow and the first papers start to be published on the topic, at the “Optical Society annual meeting presentation in 1987” recalls James Bergen of Sarnoff.
The first papers apply to using oflow for automated tracking for military applications. A lot of the early tracking technology was actually developed at Sarnoff under military research contracts, as opposed to by the military themselves, but quickly the focus of research shifted to compression technology.
1988 saw the first MPEG meeting. Compression would develop as a major aspect of oflow research.
Anandan introduces the SSD matching method.
Sarnoff group popularizes area-based methods and affine motion approaches.
1992: MPEG-1 standard adopted.
In 1992 Michael Black publishes Robust Incremental Optical Flow while at Yale University. This research is pivotal to later VFX applications. Black’s work was the basis of oflow work in the Matrix UCAP system and many other implementations.
Link is here to Black’s Thesis.
TRACK was first released in 1993. In 1998, Dr. Douglas R. Roble won the technical Achievement Award for his contribution to tracking technology, and for the design and implementation of the TRACK system for camera position calculation and scene reconstruction. The TRACK system is an integrated software tool at Digital Domain that uses computer-vision techniques to extract critical 2D and 3D information about a scene and the camera used to film it. TRACK, now (2006) in version 5, has been completely rewritten and is still used today at Digital Domain. TRACK feeds Digital Domain’s (d2) NUKE compositing system with full 3D camera solutions. Roble is the first to admit that tracking is still “very much an art”. He sees the role of his team, which includes John Flynn and Henrik Falt, as providing the artists at Digital Domain with a range of tools that allow them to solve very complex problems. He explains that artists are often required to start with one method, then use another and finally a third or fourth approach to solve many of the more complex problems they face. TRACK is therefore a very flexible tool, allowing a number of different ways to track and solve a camera move. The software has moved from its roots as a 2D tracker to now encompassing Optical Flow and a highly complex 3D camera tracker. Roble remains today one of the world’s leading experts on optical flow for tracking.
In 1993 Prof. Black joined the Xerox Palo Alto Research Center where he managed the Image Understanding area and later founded the Digital Video Analysis group, where he would stay until joining Brown University in 2000. Prof. Black’s research interests in machine vision include optical flow estimation, human motion analysis and probabilistic models of the visual world.
Wang and Adelson popularize the problem of estimating motion in layers, and Jepson and Black introduce probabilistic mixture models for flow estimation.
Barron, Fleet, and Beauchemin publish a quantitative evaluation of optical flow methods.
Peter Litwinowicz and Michael Hoch developed some advanced tracking code when they were in Apple’s Advanced Technology Group (1987-94). The tools they developed included an adaptation of Kass-Witkin “snakes” that were driven by optical flow, edge-seeking, and arbitrarily-scaled and disposed correlation tracking windows along the length of each “snake.” “After he left Apple, Pete Litwinowicz went on to apply optical flow to rendering images with brushstrokes in motion, with beautiful results,” recalls Lance Williams, fellow Apple employee and himself a leading Oflow researcher (see Disney Story below).
1995 – 1996
The Start of Oflow in Feature films
Kim Libreri, now at ILM, joined Cinesite London in 1995 and was Chief Technology Officer (CTO) when Kodak Rochester contacted the facility about some new image processing technology that their researcher Sergei Fogel had developed. The technology allowed interpolation between frames, and Kodak’s American research arm thought that the London production team might find it useful for film work. The technology was quickly named Cinespeed and was the first major commercial oflow re-timer as part of the Kodak Cineon System. The Cineon was first released in 1993 and was abandoned by 1997, with the technology of Cinespeed briefly resurfacing in RAYZ (see below).
The first film shot that used Cinespeed was actually Mission: Impossible, which Libreri was working on. In the shot, the camera circled Tom Cruise as he kissed Emmanuelle Beart on a rough turntable. It was successfully smoothed and re-timed using a combination of traditional 2D and Cinespeed. “We were basically presented with a working prototype of a system for re-timing” recalls Libreri, “while the shot was actually cut from the film, it worked well.”
Kim Libreri saw the enormous potential of oflow but it was still seen by most as just “effectively a tracking system.” Around this time Libreri and Cinesite made contact with Bill Collis at Snell & Wilcox (S&W). Collis and the team at S&W were using oflow for standards conversion, although it was then called motion compensation. “Did they (S&W) have more sophisticated algorithms than Kodak?” wondered Libreri.
Libreri at this time submitted a document to S&W saying “if you give us access to the vectors then we can do…re-speeding plates, automatic rotoscoping, wire or rig removal. But you have to remember this was ’95 before any of these packages were available commercially and most people thought we were idiots!” he jokes. S&W did decide to do their own software option named flomo – as a Flame plugin but it was scrapped sometime later and never released.
At Siggraph 1995, Nick Brooks of Mass Illusion approached Libreri wondering if he had “any ideas how we could do this painted thing with a sequence of moving images that look like one painting. I said you should talk to the Kodak guys … they have this technology that would allow you track every pixel and maybe you could use it to track paint strokes,” replied Libreri.
The project was pre-production on What Dreams May Come. Brooks did get access to Cinespeed, the Kodak vector code. “That is where Pierre (Jasmin) came in. The first tests were done with Kodak Cinespeed, and that was the first non-interpolation use of oflow, the first interesting use of optical flow,” states Libreri.
Pierre Jasmin first experimented in film animation and adopted computers as his primary medium in 1984. He became a digital paint system software engineer and worked on five projects, including SoftImage Eddie and Flame at Discreet, where he was the first official software engineer. He left Discreet to work at Mass Illusions. Jasmin comments, “I did these two Mass Illusion tests to help respective production companies finance the projects with Nick Brooks, much before we actually went into production.”
“I believe Mass Illusion (where I was) and Cinesite London in 1996 (Dan Piponi, later of ESC) were the first companies to do oflow based effects that were not re-timing per se. Tracking brush strokes and 3d plants (What Dreams May Come, fall 1996) and the bullet time effect (Matrix, winter 1997) in our case, and some simple effects in the space movie whose title I forget for Cinesite,” recalls Jasmin.
Jasmin and Peter Litwinowicz wrote Motion Paint which captured the nuances of the motion in live-action photography, such as plants blowing in the wind or the swirl of a coat as a character turns. Synthetic objects were then also driven by the captured motion and placed into the live-action.
Mr. Litwinowicz has been creating software tools for creating computer-generated images with a hand-crafted look since 1987. He spent nine years at Apple Computer where he developed many innovative animation and rendering techniques. Peter Litwinowicz recalls, “I was at Apple from July 1988 through March of 1997. I worked in the research group there (then known as the Advanced Technology Group). I knew the people who were helping to develop compression schemes for video for Apple. One of the guys was doing work with early motion-based encoding and he had an intern that developed a rudimentary pixel tracker to aid in the encoding. Of course, the algorithm was a very simplistic optical flow algorithm. For some reason, I thought this was a very cool scheme and put it in my brain that this code existed within the research group. This was probably around 1993 or 1994 or so.”
In the early 1990s there were many SIGGRAPH papers on non-photorealistic rendering. “I really liked the direction that those papers were going,” says Litwinowicz. “I wondered if I could process moving images in a sequence in a way that was non-photorealistic and, at that same time, didn’t seem like the filter was applied to each individual frame – what I call the shower door effect. This shower door effect looks like the process is applied independently of the image sequence. No other non-photorealistic rendering algorithm at the time (that I know of) actually tried to take into account the temporal coherence of a moving live-action sequence. So I developed a system that tracked brush strokes through the scene in order to both: a) process the images for a painterly effect AND b) animate the brush strokes to follow the underlying motion in the scene.”
Litwinowicz published the first paper that took into account temporal coherence when applying non-photorealistic rendering to live-action footage. Processing Images and Video for an Impressionistic Effect was published in the SIGGRAPH 1997 proceedings.
“Pierre saw the abstract for this paper and invited me to come work on What Dreams May Come,” recalls Litwinowicz. “He had independently invented the same technique for the movie and thought there was some synergy there (duh!).”
At that time, in the area of retiming using optical flow, there was initially only Cinespeed; Realviz’s ReTimer and others would come along later.
WDMC, which used optical flow to push paint strokes around for a feature film, was released in October 1998. “What most people don’t know is that we also used the optical flow of the live-action to also motion blur the painted strokes,” says Litwinowicz. “So we used the calculated motion vectors of the live-action for multiple purposes.”
Jasmin and Litwinowicz ended up leaving direct production and formed the San Francisco-based REVisionFX, which is responsible for a range of oflow products including one of the best retiming products on the market, Twixtor, which is widely used in Shake systems, especially for advanced feature film retiming. Litwinowicz recalls, “Our Video Gogh program (which used a stripped-down version of what we did on What Dreams May Come) was released in August 1999. Our ReelSmart Motion Blur (which adds motion blur using optical flow) was released in January 2000. And, of course, our popular retiming product, Twixtor, was released in November of 2000.”
Pierre Jasmin and Peter Litwinowicz were rewarded with an Oscar for their “Painted World” work on What Dreams May Come. A REVisionFX press release from 1998 commented that WDMC marked an important “first” in visual effects, “successfully implemented the first production pipeline to use computer vision technology so pervasively. The “Painted World” section is the first long moving-picture sequence that relied on image-based animation and non-photorealistic rendering, two growing areas of development in computer graphics.”
Two other programmers hired at this time at Mass Illusions were George Borshukov and Dan Piponi. The two would form a powerful team and both would be instrumental with advances in oflow during the production of WDMC and the Matrix films that would follow.
Borshukov investigated oflow approaches for video compression, as well as oflow segmentation for video compression, in the fall of 1994 as an undergraduate student at the University of Rochester. “I first started working in optical flow fall of 1994, in Rochester on my undergraduate work. This was all in 1994 and we were doing this on Sun workstations and it was all super exciting to me,” remembers Borshukov. The work progressed from simple optical flow to breaking oflow into segments for use in compression.
From there he then studied at Berkeley, working under Paul Debevec on the famous “The Campanile Movie” (shown at SIGGRAPH 1997) and contributing to image-based rendering. Borshukov joined Mass Illusion in July or August of 1997. “The reason they were so happy to get me is that I had worked on undergraduate work on optical flow and they were doing all this stuff on What Dreams May Come and it was a perfect match – I could help them with the optical flow based stuff and then after 6 months move onto the Matrix…it was the perfect fit.”
What Dreams May Come involved tracking for paint strokes and 3D elements, plus warping 3D models using the oflow. The team at Mass Illusion (which later became Manex) would go on to win two back-to-back Oscars, first with WDMC and then with the Matrix.
What Dreams May Come
In WDMC Robin Williams’ world appears to be made of painted brushstrokes. He is live-action but when he touches the foliage it behaves like it is made of paint. The background always contains a bright backlight no matter what position the camera is in. However, the foliage reacts realistically to the wind and the actor’s movements. One might assume that the shots of this painted world were made entirely in front of a green screen. However, this is not the case. The director (Vincent Ward) wanted to escape the restricted detail and cinematography that result from doing the traditional compositing techniques. He wanted to be able to shoot in a free and natural way while still creating a breathtaking effect in post-production.
The solution was to shoot all the scenes in Glacier National Park. The actor was shot naturally, with orange markers in the shot.
The actor was removed from the scene via oflow technology. Then the orange markers were used to recreate the basic movement of the camera in 3D. Then using oflow once again, oflow vector maps were created to track each pixel throughout the frames. Finally, Lidar, a laser radar technology that can be used to scan topography and create a cloud of points, was used to reconstruct the 3D information about the landscape where the scene was shot.
Then using Photoshop to do previsualization, the team would decide how to paint the different portions of the scene: mountains, sky, water, foliage. They would use color, luminance, and depth to create alpha mattes for each style to be used in a scene.
Then “Motion Paint” was used to apply the different colors and brushstrokes to the scene using the motion/spatial analysis information that had been gathered. Motion Paint took the live footage and the Photoshop mattes, and used optical flow processing to render an entire sequence of frames.
Lastly, the actor was placed back in the scene.
Kim Libreri explains that “initially we used the Kodak stuff – ultimately found that the Michael Black published oflow algorithms were better.” Dan Piponi as senior software engineer and George Borshukov were tasked with writing new oflow tools. “We hired George out of college to work on WDMC…Dan and George developed the oflow warper, which allowed geometry of a scene to be overlaid on the live-action and then one could warp it based on what the oflow showed happened. You’d place the geometry on the scene based on the LIDAR scan…and the oflow would warp the geometry to match as the camera moved,” explains Libreri.
The system that the team invented involved optical flow tracking and particle brush strokes. Normally the program gave birth to burst particles and they would track along with the optical flow but importantly “you could re-seed the track – to solve the problem of oflow errors, once they got wrong – you could birth new paint strokes,” says Libreri.
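As a rough illustration of the idea Libreri describes, the sketch below moves brush-stroke particles along a per-pixel flow field for one frame and births a new stroke whenever the track can no longer be trusted. It is not the Mass Illusion code: the function name, the `confidence` map and the `min_conf` threshold are assumptions made for this example, and re-seeding at a random position is only one plausible policy.

```python
import random

def advect_strokes(strokes, flow, confidence, width, height,
                   min_conf=0.5, rng=None):
    """Move brush-stroke particles one frame along an optical flow field.

    strokes:    list of (x, y) positions inside the frame
    flow:       flow[y][x] -> (dx, dy) motion vector for that pixel
    confidence: confidence[y][x] -> value in [0, 1] (hypothetical quality map)
    Returns (new_strokes, number_of_reseeded_strokes).
    """
    rng = rng or random.Random(0)
    moved, reseeded = [], 0
    for x, y in strokes:
        ix, iy = int(round(x)), int(round(y))   # nearest-pixel lookup
        dx, dy = flow[iy][ix]
        nx, ny = x + dx, y + dy
        inside = 0 <= nx <= width - 1 and 0 <= ny <= height - 1
        if inside and confidence[iy][ix] >= min_conf:
            moved.append((nx, ny))               # the track is still trusted
        else:
            # The flow went wrong here (or left the frame): birth a new stroke.
            moved.append((rng.uniform(0, width - 1),
                          rng.uniform(0, height - 1)))
            reseeded += 1
    return moved, reseeded
```

Calling this once per frame keeps the stroke count constant while letting bad tracks die off, which is the property the quote emphasises.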
The problem of tracking was made more difficult by the shoot itself and the weather. “There was all this wind blowing on set, but that’s why optical flow was the only solution, we realized that we could not use traditional tracking with all the grass and leaves blowing with this crazy wind in Montana,” explains Borshukov. “There were no real restrictions on the shoot – but it was very windy – hostile even…they did put out the odd tracking marker but only for camera match moving,” adds Libreri, “and we did get some LIDAR scans.”
The final solution not only tracked brush strokes but allowed for 3D flowers and shapes to be added to the scene and then warped by the optical flow analysis. Borshukov explains: “WDMC started with Cinespeed and code from Kodak, we figured out how to get the vectors out of Cineon, but also we used some stuff written in house by Dan Piponi, called the Optical flow warper, based on stuff by Michael Black.” One of Piponi’s great innovations was in dealing with issues of occlusion, implementing an oflow system that tracked forward and backwards.
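One common way to use forward and backward tracking for occlusion handling is a round-trip consistency check: follow the forward vector into the next frame, read the backward vector there, and flag the pixel if the two do not cancel out. The sketch below assumes that approach; the article does not say this is how Piponi's warper worked, and the function name and `tol` threshold are invented for the example.

```python
def occlusion_mask(forward, backward, tol=0.5):
    """Flag pixels whose forward and backward flow vectors disagree.

    forward[y][x]  -> (dx, dy) from frame A to frame B
    backward[y][x] -> (dx, dy) from frame B to frame A
    Returns mask[y][x] == True where the pixel is probably occluded.
    """
    h, w = len(forward), len(forward[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            fx, fy = forward[y][x]
            # Where does this pixel land in frame B (nearest pixel)?
            tx, ty = int(round(x + fx)), int(round(y + fy))
            if not (0 <= tx < w and 0 <= ty < h):
                mask[y][x] = True          # it left the frame
                continue
            bx, by = backward[ty][tx]
            # A consistent round trip returns to the start, so f + b is near 0.
            err = ((fx + bx) ** 2 + (fy + by) ** 2) ** 0.5
            mask[y][x] = err > tol
    return mask
```

Pixels flagged by the mask are the ones where a retimer or warper would need to fall back on something other than the raw vectors.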
What Dreams May Come is released and wins the Oscar for best visual effects.
Another key researcher in the field at Manex was J. P. Lewis. “At ILM I developed a Fourier-domain algorithm for normalized cross-correlation. Fourier cross-correlation was a well-known algorithm from the 60s (at least), but it was not known how to do the improved normalized version in the frequency domain. ILM let me publish the algorithm, and it was accepted to a vision conference, Vision Interface 95. Since then this algorithm has spread around, including now being implemented in the Matlab image processing toolbox, etc. Also, I separately re-implemented it, and contributed the algorithm to a couple of packages including Shake and Commotion.”
J. P. Lewis would work with George Borshukov and Dan Piponi on the Matrix and its sequels. Although started after WDMC, this work would not be seen for several more years, and was not published at Siggraph until 2003 (see below).
Kim Libreri was the bullet time supervisor. “One of the problems we had was that Larry and Andy (Wachowski) wanted the moves to start very smoothly and dolly into it…the closest you can get between the stills cameras is 7 inches and the first stills camera is next to the film camera and there you have to be like 12 inches apart so having a smooth move is impossible.”
Many of the hardest problems, such as flickering between cameras at an effective 600 frames a second, along with stability and colour problems, were not related to oflow.
Initially the team looked at using the Cineon system, but Kodak were in the process of putting Cineon out of business, “so it became a bit of an unstable option for us,” recalls Libreri. “So again I contacted my friends at S&W, and I got Bill – he was writing their optical flow interpreter for S&W. We’d give him a lookup table – these are all the cameras that we do have photographs for, these are the frames we need interpolations and he actually did the interpolation in London – we’d actually send him all the frames on the Internet – and he’d send us the interpolated images back – it seemed crazy. At the time we were doing the movie no one even knew what the Matrix was, and no one cared about the film – but us.”
The interpolation wasn’t the only problem on the bullet time shots: the team also had issues with flickering in the camera at 600 fps, “which effectively this was, you had a lot of flickering issues, a lot of stability issues – that were not really related to oflow problems,” says Libreri.
The backgrounds of the Matrix roof top were done with virtual cinematography, something Borshukov suggested when the team got to that point in solving the shot. Borshukov had worked under Paul Debevec at Berkeley on “The Campanile Movie” and knew the technique well. Libreri points out that the press is fond of telling the story that he and John Gaeta saw this Debevec landmark film at SIGGRAPH and therefore ‘found the solution’ to the bullet time backgrounds there, but in reality George had been on their team since WDMC and simply suggested virtual cinematography. They had looked at many options to solve the backgrounds; Libreri points out they could have filmed them with models or motion control, but these would have been hard to film. According to Libreri it was during the initial Matrix bullet-time tests that “John coined the phrase UCAP. He wanted to do this universal capture, he wanted to plonk down up to 10 cameras and could interpolate from any camera to do full 3D reconstruction of the scene, the actor and the environment around them, from any angle.” This would be the basis of the next huge advance the team would make as they moved from the Matrix to doing the two sequels.
Away from the Matrix films, 1999 was also the year the 5D SloMo spark was released.
Actually the first Flame plugins were not optical flow re-timers. Gerk Husisma (now at Assimilate) produced a first implementation of an Oflow solution for a broadcasting facility in the Netherlands that was then going to be turned into a spark for Flame. “This all was somewhere before I even was hired by 5D Solutions, around 1996/7 or so,” he recalls. “The Spark I made has never been released. It has been used by ‘Baby Post’, which was formerly HecticElectric in Amsterdam. They used it a lot, since it was the first and only spark using optical flow. The only other comparable slowmo software was Cinespeed (Eg: From Kodak/Cinesite).”
The work Husisma did was based on the Black/Anandan algorithm but made to be multi-processor. He then added pre-processing passes using a regular block search-based motion estimator. “The main trick when applying this to real footage, however, is in the warper for generating the interpolated image results,” he says. “No matter how good the motion estimation is, you will always miss info with respect to objects moving in front of each other. Effectively, that results in ‘holes’, which one would have to fill while minimizing visual artifacts. At the end of the day, the purpose was to make a smooth slow-motion which was pleasing to look at and not to accurately estimate the motion in an image. The main change I’ve made from the original algorithm was in how the images were warped towards each other to calculate the derivatives (gradients) for the motion estimation.”
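The ‘holes’ he mentions are easy to see in a toy example. The sketch below forward-warps a single row of pixels part of the way along its motion vectors, reports the positions nothing landed on, and fills them from the nearest filled neighbour. It is an illustration only, not the warper from the spark: the function names are invented, letting the faster-moving pixel win a collision is an assumption, and nearest-neighbour filling is the crudest possible fill.

```python
def forward_warp_row(row, flow_x, t):
    """Splat each pixel of a 1-D row to x + t * flow_x[x].

    t is the retime position between the two frames (0.0 to 1.0).
    When two pixels land on the same target, the one with the larger
    motion wins (assumption: the moving object is in front).
    Returns (warped_row, hole_positions); holes are marked with None.
    """
    w = len(row)
    out = [None] * w
    best = [-1.0] * w
    for x in range(w):
        shift = t * flow_x[x]
        tx = int(round(x + shift))
        if 0 <= tx < w and abs(shift) > best[tx]:
            out[tx] = row[x]
            best[tx] = abs(shift)
    holes = [x for x in range(w) if out[x] is None]
    return out, holes

def fill_holes(out):
    """Fill every hole from the nearest filled pixel (left side checked first)."""
    filled = list(out)
    n = len(out)
    for x in range(n):
        if out[x] is not None:
            continue
        for d in range(1, n):
            candidates = [nx for nx in (x - d, x + d)
                          if 0 <= nx < n and out[nx] is not None]
            if candidates:
                filled[x] = out[candidates[0]]
                break
    return filled
```

In the test row used here a bright pixel moves two places to the right over a static background, and the position it vacates is exactly the kind of revealed area that has no source data in the first frame.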
Husisma then joined 5D, who had just made their Slomo (which was just a block search algorithm, not optical flow as commonly thought). At that time they thought they could sell the optical flow as a standalone application. This required more of an environment than just the spark, which effectively was the beginning of ‘Cyborg’. The very first version of Cyborg shown was a plain desktop environment with the optical flow based Slomo tools in it. “These tools have always been part of Cyborg and never saw the light as a spark again. Then they tried to further optimize the algorithms later on, but that effectively broke the algorithm. Eg: Things like the optimizations as specified in the original paper from Black do not really work on real footage,” comments Husisma.
5D showed Cyborg at NAB 1999 for the first time to an extremely positive reception. Although the company would later fail, it was not due to the strengths or weaknesses of 5D’s product line.
In 2000 George Borshukov, Kim Libreri and Dan Piponi received an Academy Scientific and Technical Achievement Award “for the development of a system for image-based rendering allowing choreographed camera movements through computer graphics reconstructed sets”. The system has also been applied on key shots in Deep Blue Sea, Mission: Impossible 2, the IMAX film Michael Jordan to the MAX, and the Matrix.
In the UK in 1999/2000 more oflow applications were released, by companies such as RE:Vision and RealViz.
For Nutty Professor II: The Klumps in 1999/2000 Double Negative in London had to build a blue blob creature that the Buddy Love character eventually “devolves” back into. The slime/liquid effect was achieved through a bunch of Maya particles being driven by a proprietary pseudo-fluid simulation. Double Negative’s Paul Franklin recalls “It worked pretty well, but because we wanted refraction effects we were rendering in Maya’s native raytracer which at the time didn’t support motion blur.” RenderMan was able to give the team great motion blur, but no refractions back in 1999 as it was not a ray tracing shader. “After a bit of head-scratching we came across a plugin that had just been released for After Effects called ReelSmart Motion Blur – this created pixel motion vectors through optical flow analysis of the rendered elements that could be used to produce motion blur of an acceptable standard, the only drawback being that it was all 8-bit linear tiff at the time whilst the rest of the compositing pipeline was 10-bit log running in Cineon and Shake. The end result was good enough that we decided to live with whatever colorspace artifacts might arise.”
During the production of Enemy at the Gates in 2000/2001, RealViz Retimer was released. “Up to that point we had been using Cineon’s Cinespeed tool to re-time material, but Enemy contained challenges that were well beyond what we thought that tool could achieve,” explains Franklin. The movie featured a detailed recreation of the Battle of Stalingrad during World War Two; at all times there needed to be large fires burning out of control across the city creating huge plumes of thick, swirling smoke that rose hundreds – even thousands – of feet into the sky. “There was no way that we were going to be able to shoot elements of sufficient size (it’s just flat out illegal to make that much smoke in most places) so instead we chose to shoot scaled practical smoke elements at the highest frame rates we could manage before exposure became an issue. Even this wasn’t enough to give the requisite scale so we turned to the newly-released Retimer package to slow the elements down even further. In some cases we stretched the time by a factor of ten or more – something that just wouldn’t have been remotely possible with frame averaging techniques or anything else commercially available at the time; only optical flow could give us that level of control.” The retimed elements were used extensively throughout Double Negative’s shots to great effect. “We were amazed with the quality that the technology gave,” says Franklin.
The first customers of ReTimer 1.0 included: Macguff, Tippett Studios, NHK (Japan Broadcasting Corporation), Double Negative, Blue Studio, Pixel Magic, Jim Henson’s Creature Shop, Das Werk, Giant Killer Robots, Rhythm and Hues and ILM amongst others.
In 2000 the RealViz version 1.0 of retimer was licensed and released in Inferno 4 by Discreet Logic as the Motion node in batch.
Today Double Negative continues its use of oflow tools and plugins. “Optical flow technologies have become ubiquitous in the compositor’s arsenal; the Furnace plugin suite for Shake from the Foundry has several tools that we find absolutely invaluable,” says Franklin. These include Kronos (a very high-quality retiming tool) and RigRemoval, an early beta of which was used extensively in Double Negative’s work on Batman Begins. “In fact, it is now hard to imagine life without optical flow,” notes Franklin.
Rayz & Cinespeed
Kodak had finished selling Cineon, but in 2001 it licensed the now world-famous Cinespeed technology to Rayz. Rayz was a Linux compositing system that was briefly on the market until it was sold in turn to Apple and the rights to Cinespeed returned to Kodak. From an NAB article in 2001 released by Silicon Grail: “One tells of running a CineSpeed shot in Cineon, on an Octane, and waiting 96 hours for the result. The same shot, on the same machine, but running CineSpeed from within RAYZ, took only 10 hours. Another customer reports that re-timing a shot on his 8-processor Onyx took 16 minutes per frame; running the RAYZ version of CineSpeed on a 2-proc Linux machine was taking about 30 seconds per frame. That’s a lot of extra time available to spend (and bill) on another job, or to improve the results on the current one – right now, RAYZ is the only compositing package to provide built-in retiming (Cineon’s CineSpeed originally developed by Kodak).”
Silicon Grail also stated in the same release that RAYZ 1.0 was used on Final Fantasy and Lord of the Rings. Rayz founder Ray Feeney now runs RFX in LA.
Another huge optical flow project was started in 2001, although its roots were several years earlier. In 1996-97 Lance Williams worked on Prince of Egypt at DreamWorks, where he supervised the development of motion blur techniques for hand-drawn animation. Peter Cucka wrote a post-process motion blur driven by a flowfield, and adapted an optical flow algorithm to create flowfields from image sequences. This work was described in a SIGGRAPH sketch. The optical flow algorithm didn’t always work on drawn animation, particularly on frames that were far apart, which were the ones that needed motion blur the most. “Peter also wrote a nice pins-and-needles display of flow vectors on a coarse grid, and interactive tools for combing and styling flowfields. Saty Raghavachary wrote a nice system for generating dense flowfields from keyframe curves. Prince of Egypt (1998) is the first hand-drawn animated film to feature motion blur on character motion, so far as I know,” comments Lance Williams.
Later on, at Walt Disney Feature Animation, the team used optical flow for the Disney Human Face Project (2001-2003). “We wanted to track human facial performances in detail, ideally, without using marks. Peter Litwinowicz and Pierre Jasmin (who really pioneered optical flow for motion picture visual effects with their amazing work in What Dreams May Come) recommended Michael Black’s optical flow.”
The team arranged to get a version of this code from Dr. Black, “and it proved invaluable to our effort, speeding up our face tracking by an enormous factor. In fact, we could do respectable re-animation of the face from optical flowfield data alone.” J. P. Lewis also worked on the face-tracking project, and implemented several kinds of optical flow. Most importantly, J. P. Lewis contributed a concept for the use of optical flow algorithms, which he called “drift-free” flow. Together, these technologies were exploited by a small group at Disney working on animated paintings: Jammie Friday, Chyuan Huang, George Katanics, Allison Klein, and Marta Recio. “We used Michael Black’s optical flow, and J. P. Lewis’ drift-free flow technique, to apply the idea of performance capture to the forces of nature. We extracted optical flowfields from fire, smoke, and water, and applied them to still frames, to painted images, and to animated libraries of brushstrokes. Some fun!”
Disney Human Face Project, shown at Siggraph 2002
According to an article on the Disney Informer, a demo presented at Siggraph 2002 showed two men sitting side by side at a gaming table, pleasantly talking to one another. What’s striking is the resemblance between them. Were it not for an obvious difference in their ages, they could be twins. But in fact, the older man is the only real actor. The photo-real 3D-CG face of his youthful clone was brought to life through a process that Lance Williams characterizes as cross-mapping. “We tracked the face of a guy in his sixties onto a guy in his thirties. We abstracted the performance of the actor in a way that allowed us to cross-map it to another face.”
This cross-mapping technique, developed by Williams’ 12-person team over those years, was initially prompted by the demands of a film that ultimately was shelved. Hoyt Yeatman, who supervised the shoot and was also slated to supervise that film’s effects through Disney’s FX arm The Secret Lab, explains the challenge. The impetus was to be able to have a famous actor with a long film history “like a Sean Connery” and have a younger double that would show the actor at about 25 years of age. At first, they thought, let’s use makeup. But making someone look young is not a process of adding stuff – it requires removing stuff. Then they also tried 2D effects, but that looked like a bad burn victim. When Yeatman was brought in, he recalls, “I thought we should look at the idea of doing what I would call digital makeup. We would essentially have the mature actor driving the facial performance of his double. There were a number of techniques out there, all of which were really just motion capture. No one had really conquered the idea of capturing performance.”
During his extensive career, which includes tenures at Apple Computer and DreamWorks, Williams has experimented with facial animation techniques before. But the strategy of applying computer vision techniques to animation has moved his research into a new arena. “I think the time is right for the useful application of these technologies in motion pictures. They may not be ready to perform surgery, but they’re safe enough to work on cartoons!”
Williams ultimately sees the Human Face Project as the continuation of a venerable tradition. “At the dawn of motion pictures, Eadweard Muybridge was interested in photography as a tool for capturing data. It only secondarily was taken up by others as a sensational entertainment medium. But it’s always advanced the purposes of science, and it will continue to do so until the last producer has choked on his last cigar!”
Also in 2002 Apple bought RAYZ, but while industry observers hoped this would mean optical flow technology would move to Shake, it would be three more years until optical flow technology appeared in Shake 4.0.
By 2002 optical flow was in common use in commercials production. One early adopter was Upstairs in Miami, whose owner Wally Rodriguez comments: “The main thing about retimer at that time, and still now was that it either worked or it didn’t. There were some tools to “average” the motion vectors (which I believe were a sample of the actual thousands of real vectors) so that small errors could be fixed, but large ones were not so easy.”
While oflow applications made retiming commonplace, producers increasingly wanted to use it everywhere, and of course it didn’t always work. One extremely effective use, however, was removing the stuttering pans that appear when 24 fps material is used in 30 fps projects.
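The arithmetic behind that use is simple: every 30 fps output frame sits 24/30 = 0.8 of a source frame after the previous one, so most output frames fall between two source frames. Repeating the nearest source frame produces the stutter; an optical flow retimer instead synthesises a new image at the fractional position. The sketch below only computes those positions and does not perform any image interpolation. The function name and defaults are assumptions for illustration.

```python
from fractions import Fraction

def retime_schedule(n_out, src_fps=24, dst_fps=30):
    """For each output frame return (source frame index, position toward the next).

    A position of 0.0 means the output frame coincides with a source frame;
    anything else has to be synthesised between two source frames.
    """
    step = Fraction(src_fps, dst_fps)      # source frames per output frame
    schedule = []
    for k in range(n_out):
        t = k * step                       # exact position in source-frame units
        base = t.numerator // t.denominator
        schedule.append((base, float(t - base)))
    return schedule
```

For the 24 to 30 case only one output frame in five lands exactly on a source frame; the other four are in-betweens that the motion vectors have to generate.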
The Matrix Sequels
The first facial animation using oflow was not done for the Matrix sequels.
Manex acquired Mass Illusion, and after The Matrix was delivered in 1999 the studio worked on Mission: Impossible 2 and another show called Bless the Child. Kim Libreri recalls, “We had to do this morph transition for one of the characters, and we decided to run the oflow on a facial performance on this guy they had shot against green screen, and they had a model of this guy, so we did a quick test, we took the model of his face and ran the oflow on this model and it started talking, it was rough, and had spikes in it – it was rough but it convinced us that it could be done.” The shot ended up not being used in the movie, but the test was the basis of the whole UCAP system for the Matrix sequels. “This was the big bang for us saying this is going to work this is how to do the Matrix sequels. It was enough for us to say, with optical flow, with the right 3D geometry, we can reconstruct a person’s moving face.”
Manex tried to find companies to partner with, since the team knew the work would be extremely complex: the data flow from the cameras was over a gigabyte a second to disc, and each night the team would back up 5 terabytes. But in the end, they needed to bring much of the work in-house due to the complexity of capturing hi-def video streams and backing them up each night.
Borshukov also recalls the critical Bless the Child test, “The depth wasn’t correct – but from the same camera it looked right, and that’s when we realized that if we just added more cameras we could get the depth correct.” The first test of the Universal Capture was in Feb 2000, with just 3 cameras. “We did it with dv-cams and in June 2000 – just a couple of months later, we did it with 5 HD cameras.” This first full HD test, in San Francisco, was “just the proof of concept”; two years later the same rig was built in Sydney for the actual recording of the data that was used in the films themselves.
The Matrix sequels were done at the new ESC Entertainment.
Borshukov served as Technology Supervisor at ESC Entertainment on The Matrix Reloaded and The Matrix Revolutions. His work was focused on leading the development of techniques for photorealistic animation and rendering (including faces), and their integration into production, as showcased in the Burly Brawl and Superpunch sequences. At SIGGRAPH 2003 he presented sketches in The Matrix Revealed Session on topics such as Universal Capture, image-based facial animation, measured BRDF in film production, and realistic human face rendering.
“In the Matrix sequels all this work and research came together, the marriage of the tracking in WDMC, the interpolating we did on the bullet time shots and the virtual cinematography – all came together in the Universal Capture of the Matrix sequels,” says Borshukov.
The Matrix sequels – extending optical flow use to multiple cameras and combining with photogrammetry for full 3-d motion reconstruction (including animated texture extraction):
• UCAP first three DV cam test (Feb 2000)
• Proprietary oflow developed by Dan Piponi based on Michael Black’s work (Spring 2000)
• Five Sony HD camera test with Keanu Reeves and Hugo Weaving (Summer 2000)
• Proof of concept results presented to Warner and Wachowski Brothers for greenlighting VFX work in Matrix sequels (November 2000)
• Matrix 2&3 Universal Capture shoot in Sydney (May 2002; this is what the pictures are from)
• Burly Brawl in Matrix Reloaded – 3-d head replacements and entirely computer-generated sequences using UCAP heads for Agent Smith, Neo, Morpheus (Spring 2003)
• Superpunch in Matrix Revolutions – first full-frame computer-generated close up of a real actor (Hugo Weaving) in a feature film (work done Summer 2003). The actor’s performance reconstructed using optical flow and augmented with additional deformations for dramatic effect.
HiDef Capture Setup
The team used a carefully placed array of five synchronized cameras that captured the actor’s performance in ambient lighting. For the best image quality, they deployed a sophisticated arrangement of Sony/Panavision HDW-F900 cameras and computer workstations that captured the images in uncompressed digital format straight to hard disks at data rates close to 1 GB/sec.
Optical Flow Photogrammetry
Optical flow was used to track each pixel’s motion over time in each camera view. The result of this process was then combined with a cyberscan model of a neutral expression of the actor and with a photogrammetric reconstruction of the camera positions. The algorithm works by projecting a vertex of the model into each of the cameras and then tracking the motion of that vertex in 2-d using the optical flow; at each frame the 3-d position is estimated using triangulation. The result is an accurate reconstruction of the path of each vertex through 3-d space over time.
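The triangulation step can be sketched with two views. Assuming each camera's tracked 2-d position has already been turned into a ray (a camera centre plus a direction, using the photogrammetric camera solve), the 3-d point is taken as the midpoint of the shortest segment between the two rays. This is a textbook two-view method chosen for brevity, not the ESC implementation, which combined five cameras; the function name is invented.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3-D point from two viewing rays c + s * d.

    c1, c2: camera centres; d1, d2: ray directions through the tracked pixel.
    Returns the midpoint of the shortest segment joining the two rays.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth is undetermined")
    s = (b * e - c * d) / denom      # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom      # parameter of the closest point on ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]
```

Repeating this for every vertex at every frame, with the ray directions updated from the optical flow tracks, gives the per-vertex 3-d paths described above. The parallel-ray error mirrors Borshukov's earlier remark that depth only becomes reliable once additional, well-separated cameras are added.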
Keyshaping, Adapt, Removing Global Motion
Optical flow errors can accumulate over time, causing an undesirable drift in the 3-d reconstruction. To minimize the drift the team made use of reverse optical flow. The problem was eliminated by introducing a manual keyshaping step: when the flow error became unacceptably large the geometry was manually corrected, and the correction was then algorithmically propagated to previous frames. The reconstructed motion contains the global “rigid” head movement. In order to attach facial performances to CG bodies or blend between different performances, this movement must be removed. The team estimated the rigid transformation using a least-squares fit of a neutral face and then subtracted this motion to obtain the non-rigid deformation.
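The rigid-motion removal can be illustrated in two dimensions, where the least-squares rotation has a closed form. The sketch below fits the rotation and translation that best map the neutral points onto the observed ones, then undoes that motion so only the non-rigid offsets remain. Working in 2-d is a simplification made for this example (the production data was 3-d, where an SVD-based fit is the usual choice), and the function names are invented.

```python
import math

def fit_rigid_2d(neutral, observed):
    """Least-squares rotation angle and translation mapping neutral -> observed."""
    n = len(neutral)
    pcx = sum(p[0] for p in neutral) / n
    pcy = sum(p[1] for p in neutral) / n
    qcx = sum(q[0] for q in observed) / n
    qcy = sum(q[1] for q in observed) / n
    s_cos = s_sin = 0.0
    for (px, py), (qx, qy) in zip(neutral, observed):
        ax, ay = px - pcx, py - pcy            # centred neutral point
        bx, by = qx - qcx, qy - qcy            # centred observed point
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    tx = qcx - (c * pcx - s * pcy)
    ty = qcy - (s * pcx + c * pcy)
    return theta, (tx, ty)

def remove_rigid_2d(neutral, observed):
    """Return the per-point non-rigid offsets left after undoing the rigid fit."""
    theta, (tx, ty) = fit_rigid_2d(neutral, observed)
    c, s = math.cos(theta), math.sin(theta)
    residual = []
    for (px, py), (qx, qy) in zip(neutral, observed):
        ux, uy = qx - tx, qy - ty              # undo the translation
        sx, sy = c * ux + s * uy, -s * ux + c * uy   # undo the rotation
        residual.append((sx - px, sy - py))
    return residual
```

If the observed points are just a rotated and shifted copy of the neutral pose, every residual comes out as zero; anything left over is expression.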
Texture Map Extraction
For believable facial rendering, the face texture needs to vary over time. The fact that the team did not use any markers on the face to assist feature tracking gave the important advantage that they could combine the images from the multiple camera views over time to produce animated seamless UV color maps capturing important textural variation across the face, such as the forming of fine wrinkles or changes in color due to strain, in high-res detail on each side of the face.
Although the extracted facial animation had most of the motion nuances, it lacked small-scale surface detail like pores and wrinkles. The team obtained that by using a highly detailed 100-micron scan of the actor’s face; the detail was then extracted into a bump (displacement) map. Dynamic wrinkles were identified by image processing on the texture maps; these were then isolated and layered over the static bump map. These were then combined with image-based skin BRDF estimation and subsurface scattering approximation.
In 2003 Pixel Farm released PFTrack, built on some of the principles of the oflow program Icarus. Icarus as a separate product had the ability to write out optical flow data in the FLO format, or as a floating-point TIFF file.
Here’s a quote from the Icarus manual: 2.2.2 The FLO Optical Flow file format
One ability of the calibration component in the Icarus system is to calculate the optical flow throughout a video sequence.
As this may be of use for other applications, it is possible to save this data using the FLO file format: #FLOx
The first line of the FLO file must contain the #FLO identifier followed by either an a for ascii or b for binary (e.g. #FLOa or #FLOb). The next two lines contain the width and height of the image. Following these lines, the optical flow data is stored.
For binary FLO files, this data is stored as two 4 byte binary floating-point numbers followed by a one-byte flag per pixel. The floating-point numbers represent the magnitude of flow in the x and y directions respectively, and the flag is 1 if there is a discontinuity at the pixel, and 0 otherwise. The floating-point numbers are stored in the standard Big-Endian Motorola binary floating-point data format. For ASCII FLO files, the movement of each pixel is stored using two ASCII floating-point numbers. The flag is a single 1 or 0, as described above.
Optical flow data can also be written in floating-point TIFF format. In this case, the red and green channels of the TIFF image are used to encode the x and y pixel motions respectively. Each motion value is offset by 1.0e+06 to ensure that only positive values are stored in the image. The blue channel is used to encode flow discontinuities and is non-zero where discontinuities occur.
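The binary FLO layout in the manual excerpt is simple enough to sketch with Python's `struct` module. The reader and writer below follow the description given there: a `#FLOb` identifier line, width and height lines, then two big-endian 4-byte floats and a one-byte flag per pixel. Two details are assumptions, since the excerpt does not state them: that the header lines are newline-terminated ASCII text, and that pixels are stored in row-major order. The function names are invented, and this has not been checked against files written by Icarus itself.

```python
import io
import struct

# One pixel record: big-endian float dx, float dy, unsigned byte flag (9 bytes).
_RECORD = struct.Struct(">ffB")

def write_flo_binary(stream, width, height, pixels):
    """Write pixels, a row-major list of (dx, dy, discontinuity), as binary FLO."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    stream.write(b"#FLOb\n")
    stream.write(f"{width}\n{height}\n".encode("ascii"))
    for dx, dy, flag in pixels:
        stream.write(_RECORD.pack(dx, dy, 1 if flag else 0))

def read_flo_binary(stream):
    """Read a binary FLO stream and return (width, height, pixels)."""
    if stream.readline().strip() != b"#FLOb":
        raise ValueError("not a binary FLO file")
    width = int(stream.readline().decode("ascii").strip())
    height = int(stream.readline().decode("ascii").strip())
    pixels = []
    for _ in range(width * height):
        chunk = stream.read(_RECORD.size)
        if len(chunk) != _RECORD.size:
            raise ValueError("file ends before all pixels were read")
        dx, dy, flag = _RECORD.unpack(chunk)
        pixels.append((dx, dy, bool(flag)))
    return width, height, pixels

def roundtrip(width, height, pixels):
    """Write to an in-memory buffer and read it straight back."""
    buf = io.BytesIO()
    write_flo_binary(buf, width, height, pixels)
    buf.seek(0)
    return read_flo_binary(buf)
```

The explicit `>` in the format string matters: it selects big-endian byte order and disables padding, so each record is exactly nine bytes regardless of the machine the code runs on.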
PFTrack, launched in 2003, quickly established itself as the match mover of choice for many high-end visual effects productions. It included integrated optical flow, geometry tracking and per-pixel Z depth extraction. Pixel Farm products, and PFTrack in particular, are used at companies including Sony Imageworks, Riot, Stan Winston Studios, Entity FX, LipSync, Senate VFX, MPC, and Animal Logic.
2004: Brox, Bruhn, Papenberg, and Weickert introduce an accurate optical flow method based on warping
2005: Roth and Black use the first learned model of optical flow statistics
Apple beta Shake 4 with optical flow