How useful is the Apple iPhone 12 to VFX?

There is an old saying that the best camera is the one you have on you. For VFX work, especially on location, the standard kit includes an SLR camera, measuring equipment and scanning tools such as LiDAR, so that you have reference images, dimensions and textures for later post-production work. While this is desirable, sometimes you just need to grab what you can with a lot less planning. We road-tested the Apple iPhone 12 Pro Max to see how valuable it could be when it is all you have and you need to gather textures, LiDAR scans and reference.

Dolby Vision Video

One of the most significant aspects of the new top-of-the-line iPhone is that it provides end-to-end Dolby Vision. This translates to stunning footage when filmed and played back on the iPhone. For non-professional users this is great, but in VFX you will most likely need to export this footage and use and manipulate it away from the iPhone itself.

Simply put, there are three reasons the footage you shoot on the iPhone 12 Pro Max looks so good:

The phone has a great lens, including sensor-shift stabilization (yes, the sensor itself moves);
The content is captured in 10-bit Dolby Vision, with the wide color gamut that implies; and
The iPhone's screen displays up to 1200 nits of brightness, so the small (but large by phone standards) screen pops with bright, dramatic imagery.

We took the iPhone (rather bravely, we thought) out sailing to capture some test footage. Note that the video below cannot be shown here the way it looks on the iPhone, as it has to be delivered in 8-bit and your screen is unlikely to be a high-nit, Dolby Vision-capable display. (Even the new MacBook Pro with the M1 chip only has a 500-nit screen.)

This video compares the top-of-the-line iPhone 11 with the iPhone 12, but its post-production path is even more interesting. The video embedded here is not HDR, but it still benefits from the smarts in the camera and the high-resolution capture pipeline.

Dolby Vision is one of the primary formats for high dynamic range (HDR) content. It works by storing dynamic metadata per scene, which controls how the content is shown on a given screen. The metadata can vary with the conditions in the scene, and can change as often as frame by frame if needed. Dolby Vision is generally considered better than HDR10, the other main format, because HDR10 uses static metadata for the whole program. Dolby Vision provides scene-by-scene instructions for Dolby Vision-capable displays such as the iPhone 12, so the iPhone can adjust for each individual scene and portray the HDR content as accurately as possible. The iPhone 12 shoots Dolby Vision in 10-bit, while the format itself supports the 12-bit color depth used in professional grades, but 10-bit is still vastly better than 8-bit SDR, and you can AirPlay Dolby Vision video to a compatible display or Apple TV. 8-bit capture supports around 16.7 million colors and the iPhone's 10-bit supports 700 million colors (10-bit RGB can encode about 1.07 billion values), while full 12-bit Dolby Vision supports a vast 68.7 billion colors.
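The color counts above follow from simple arithmetic: with n bits per channel there are 2^n levels per channel, and three channels give (2^n)^3 combinations. The Python sketch below (our own illustration) computes that for 8, 10 and 12 bits. It also counts combinations when only the narrow "video range" Y′CbCr code values are used (luma 16 to 235 and chroma 16 to 240 at 8-bit, scaled by 4 at 10-bit). For 10-bit that comes to roughly 706 million, close to the 700 million figure quoted for the iPhone, although we are assuming rather than confirming that this is how that figure was derived. Not every Y′CbCr combination maps to a valid RGB color, so treat these as counts of code combinations, not measured colors.

```python
def full_range_rgb_combinations(bits: int) -> int:
    """RGB code combinations when every value 0 .. 2**bits - 1 is usable."""
    levels_per_channel = 2 ** bits
    return levels_per_channel ** 3


def narrow_range_ycbcr_combinations(bits: int) -> int:
    """Y'CbCr code combinations using narrow ("video") range code values.

    At 8-bit, luma spans 16..235 and chroma 16..240; at higher bit depths
    those limits are scaled by 2 ** (bits - 8).
    """
    scale = 2 ** (bits - 8)
    luma_levels = (235 - 16) * scale + 1
    chroma_levels = (240 - 16) * scale + 1
    return luma_levels * chroma_levels ** 2


for bits in (8, 10, 12):
    print(
        f"{bits}-bit: {full_range_rgb_combinations(bits):,} full-range RGB, "
        f"{narrow_range_ycbcr_combinations(bits):,} narrow-range Y'CbCr"
    )
```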

The iPhone 12 Pro Max has a 47% larger sensor, a seven-element rather than six-element lens and improved lens correction, and its sensor-shift optical stabilization handles vibration remarkably well: all the shots above are just handheld. The stabilization system now samples movement 5000 times a second.

Really, nothing looks better than this footage played natively on the iPhone. It is jaw-droppingly good: rich, smooth and bright. Apple makes this process seamless and easy, but the further you move away from iPhone playback, the more reality bites. You can edit the footage immediately on the iPhone, but if you need it for VFX this will not be sufficient. Apple makes it easy to open any clip in iMovie, file transfer is seamless, and via Apple's Photos app all the clips are loaded in the background with their Dolby Vision metadata and HDR settings. Similarly, you can AirDrop the files quickly to a Mac or another Apple device. The images will not look quite as good on your high-end Mac, but only because the iPhone's screen is so bright (1200 nits). If you stay in the Apple apps everything is preserved, but you may need to export the clips to use them in your production. We deliberately exported the files to a non-Apple application, DaVinci Resolve. Here we could take advantage of the higher dynamic range and grade with much more confidence, knowing we could avoid banding and the usual 8-bit compression artifacts.

The iPhone 12 is able to share video in the format most appropriate for a given destination. This means that you can enjoy the video in Dolby Vision on a device like the iPhone 12, and the iPhone can automatically provide an SDR version of that video when sharing to media services that don't yet support Dolby Vision. From launch, users can edit Dolby Vision content directly on the iPhone 12. Likewise, Final Cut Pro X is able to import, edit and export Dolby Vision content from the iPhone. Software like DaVinci Resolve or Adobe Premiere can properly import Dolby Vision videos shot on the iPhone 12, but will only see the underlying HLG HDR video data.
“Software like Resolve is able to support Dolby Vision grading and can reconstruct + supplement the Dolby Vision metadata depending on the desired creative output,” explained Taeho Oh, VP Imaging Business, Dolby.
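As noted above, Resolve and Premiere see the HLG layer of these clips, so it helps to know what that curve does. Hybrid Log-Gamma is defined in ITU-R BT.2100 (originally ARIB STD-B67): its OETF is a square-root curve for the darker part of the scene and a logarithmic curve for the highlights. The sketch below implements the standard BT.2100 HLG OETF and its inverse on normalized scene light. It assumes the clips carry a standard BT.2100 HLG transfer and is not a model of Apple's or Dolby's internal processing; the inverse is the kind of step a compositing pipeline uses to get back to scene-linear values.

```python
import math

# HLG OETF constants from ITU-R BT.2100 (ARIB STD-B67).
A = 0.17883277
B = 1 - 4 * A                  # about 0.28466892
C = 0.5 - A * math.log(4 * A)  # about 0.55991073


def hlg_oetf(e: float) -> float:
    """Normalized linear scene light E in [0, 1] -> HLG signal E' in [0, 1]."""
    if e < 0:
        raise ValueError("scene light must be non-negative")
    if e <= 1 / 12:
        return math.sqrt(3 * e)          # square-root segment for shadows/midtones
    return A * math.log(12 * e - B) + C  # logarithmic segment for highlights


def hlg_inverse_oetf(ep: float) -> float:
    """HLG signal E' in [0, 1] -> normalized linear scene light E."""
    if ep <= 0.5:
        return ep * ep / 3
    return (math.exp((ep - C) / A) + B) / 12


for scene_light in (0.0, 1 / 12, 0.5, 1.0):
    print(f"E = {scene_light:.4f} -> E' = {hlg_oetf(scene_light):.4f}")
```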

Images with strong highlights and dynamic range are automatically scaled to SDR on export

We picked sailing for this test because the high-frequency waves and the broad, gradated sky both make difficult material. In years gone by, the sky would have shown very strong banding, and the water would have looked pixelated as the high-frequency reflections overloaded the encoding bandwidth. It is worth noting that we also did an HDR edit of this material, exported it and uploaded it to YouTube as an HDR video. That version does start to break down with artifacts, but this is due to the number of times the video was encoded and re-encoded. A single shot is recorded with some compression on the iPhone, then recompressed on export and edit, and finally again on upload to YouTube. The problem is the number of compression stages, not the master Dolby Vision encoding: it is simply unwise to keep recompressing the same footage in any pipeline.
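The sky banding point can be made concrete with a quick calculation. The sketch below quantizes a hypothetical smooth sky gradient, running from 55% to 65% signal level across a 1920-pixel-wide frame, at 8-bit and 10-bit, and counts how many distinct code values (visible steps) result. It ignores dithering, sensor noise and codec behavior, all of which change what you actually see, so treat it as an illustration of why the extra bits matter rather than a model of the iPhone encoder.

```python
def quantize(value: float, bits: int) -> int:
    """Round a normalized signal value in [0, 1] to an integer code value."""
    max_code = (1 << bits) - 1
    return int(value * max_code + 0.5)


def gradient_levels(start: float, end: float, width: int, bits: int) -> int:
    """Count distinct code values across a horizontal gradient `width` pixels wide."""
    codes = {
        quantize(start + (end - start) * x / (width - 1), bits)
        for x in range(width)
    }
    return len(codes)


WIDTH = 1920
for bits in (8, 10):
    levels = gradient_levels(0.55, 0.65, WIDTH, bits)
    print(f"{bits}-bit: {levels} bands, each about {WIDTH / levels:.0f} px wide")
```

Fewer, wider bands are exactly what reads as banding in a gently graded sky; at 10-bit the same gradient is split into roughly four times as many, much narrower steps.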

Not only does the video look impressive, but the iPhone can shoot it in 4K at 60 fps in Dolby Vision. From a technical perspective, Dolby Vision as a separate technology can represent up to 10,000 nits of peak brightness and 12-bit color with any gamut, and it features dynamic metadata. “Dolby Vision is now also a very robust ecosystem encompassing hundreds of millions of consumer devices in the market, a wealth of content spanning thousands of movies and TV episodes, and support from Hollywood’s leading professionals and professional tool providers. Now with iPhone 12, billions of videos created by consumers worldwide will benefit from Dolby Vision and further expand the ecosystem,” commented Oh.
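The 10,000-nit ceiling comes from the Perceptual Quantizer (PQ) transfer function that Dolby developed, standardized as SMPTE ST 2084, which encodes absolute luminance. The sketch below implements the published ST 2084 curve and its inverse. The constants are the standard ones; the example brightness levels (100, 500, 1200 and 10,000 nits, echoing the screens mentioned in this article) are our own choice. It shows how PQ spends its code values: SDR-level brightness of 100 nits already uses about half of the signal range, leaving the rest for highlights. Note that, per the export discussion above, the iPhone clips themselves are seen by third-party tools as HLG, so this describes the format's headroom rather than the iPhone files.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10_000.0


def pq_encode(nits: float) -> float:
    """Inverse EOTF: absolute luminance in cd/m^2 -> normalized PQ signal in [0, 1]."""
    y = max(nits, 0.0) / PEAK_NITS
    ym = y ** M1
    return ((C1 + C2 * ym) / (1 + C3 * ym)) ** M2


def pq_decode(signal: float) -> float:
    """EOTF: normalized PQ signal in [0, 1] -> absolute luminance in cd/m^2."""
    e = signal ** (1 / M2)
    y = max(e - C1, 0.0) / (C2 - C3 * e)
    return PEAK_NITS * y ** (1 / M1)


for nits in (100, 500, 1200, 10_000):
    print(f"{nits:>6} nits -> PQ signal {pq_encode(nits):.3f}")
```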

While the iPhone captures at 10-bit, “All internal processing operates in 12-bits or greater precision, and Dolby Vision can also feature effective signal compression technology to pack 12-bit video into 10-bit transport codecs without compromising the picture quality,” Oh explains. “Dolby Vision provides multiple options for compression ranging from providing maximum efficiency to maintaining backwards compatibility. In its maximum efficiency compression mode, it is about 10% lower in bit rate compared to a baseline HDR such as HDR10 while being able to maintain the same perceived picture quality. It is also resolution and framerate agnostic.”

One of the fundamental benefits of Dolby Vision is that it preserves artistic intent across display technologies, so that the content will look as close as possible to what was shot or creatively envisioned. More capable displays, like the iPhone 12’s Super Retina XDR display, are able to provide a more compelling and lifelike feel with HDR video. “In the end, this means that watching movies, TV shows, or videos captured in Dolby Vision on an iPhone 12 will look amazing because the content is able to fully utilize the capabilities of this device and display,” Oh points out.
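To make the static versus dynamic metadata point concrete, here is a deliberately simple toy tone mapper. It is not Dolby's display-management algorithm, which is proprietary and far more sophisticated; it only illustrates why knowing each scene's own peak helps preserve intent. Values below a knee pass straight through, and highlights between the knee and the source peak are eased into the range up to the display peak. The scenario numbers (a program whose brightest shot peaks at 4000 nits, a dim scene peaking at 400 nits, and a 500-nit laptop screen like the one mentioned earlier) are hypothetical.

```python
def tone_map(nits: float, source_peak: float, display_peak: float,
             knee_ratio: float = 0.5) -> float:
    """Toy highlight roll-off from a source peak to a display's peak (in nits)."""
    if source_peak <= display_peak:
        # The content already fits on this display: no compression needed.
        return min(nits, display_peak)
    knee = knee_ratio * display_peak
    if nits <= knee:
        return nits
    # Position of this value within the highlight range, clamped to 1.
    t = min((nits - knee) / (source_peak - knee), 1.0)
    # Ease-out curve mapping 0 -> 0 and 1 -> 1.
    return knee + t * (2.0 - t) * (display_peak - knee)


PROGRAM_PEAK = 4000.0  # static metadata: brightest value in the whole program
SCENE_PEAK = 400.0     # dynamic metadata: brightest value in this scene
DISPLAY_PEAK = 500.0

for value in (100.0, 300.0, 400.0):
    static = tone_map(value, PROGRAM_PEAK, DISPLAY_PEAK)
    dynamic = tone_map(value, SCENE_PEAK, DISPLAY_PEAK)
    print(f"{value:5.0f} nits -> static {static:6.1f}, dynamic {dynamic:6.1f}")
```

With only static metadata, the dim scene's 300 and 400-nit highlights are pulled down to roughly 257 and 270 nits, because the mapper must reserve room for a 4000-nit peak that never occurs in this scene; with per-scene metadata they are shown as graded.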

Stills: HDR and RAW

Still photography on the iPhone uses Apple’s Deep Fusion, which blends nine separate exposures on a pixel-by-pixel basis, and Smart HDR 3 with a gain map, which means that on an HDR display you see 3 extra stops.

An HDR image reduced to 8-bit on export (ungraded)
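Each photographic stop doubles the light, so 3 extra stops means highlights up to 2^3 = 8 times brighter than the SDR rendition. A gain map stores, per pixel, how much of that boost to apply. The sketch below shows a generic gain-map reconstruction under our own assumptions: the gain is stored as a value from 0 to 1 that scales a maximum of 3 stops, it is applied to linear SDR values, and it is capped by how much headroom the display has. Apple's own gain-map encoding is not described here, so the parameter names and encoding are hypothetical.

```python
def apply_gain_map(sdr_linear: float, gain: float,
                   max_stops: float = 3.0,
                   display_headroom_stops: float = 3.0) -> float:
    """Reconstruct a linear HDR value from an SDR pixel and a gain-map sample.

    `gain` is assumed to lie in [0, 1]; 1.0 means "apply the full boost".
    The boost is limited to the headroom (in stops) the display actually has.
    """
    usable_stops = min(max_stops, max(display_headroom_stops, 0.0))
    return sdr_linear * 2.0 ** (gain * usable_stops)


print(apply_gain_map(0.5, 1.0))                              # full 3-stop boost: 4.0
print(apply_gain_map(0.5, 1.0, display_headroom_stops=0.0))  # SDR display, no boost: 0.5
```

The headroom cap is why the same photo looks brighter on an HDR iPhone screen than on an SDR monitor: the SDR base image is always there, and the gain map only adds what the display can show.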

Deep Fusion was launched last year on the iPhone 11 with iOS 13.2. It is a computational photography process that blends multiple exposures at the pixel level to create a photograph with an even higher level of detail than standard HDR. This means even more detailed textures in things like clothing, and pore-level detail in skin. The machine learning process looks at every individual pixel before combining, and it takes less than a second to process, but as Deep Fusion happens in the background, few people are aware of it working; they just see the crisp final results.
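Deep Fusion's internals are not public, but the general idea of blending several exposures per pixel can be illustrated with classic exposure fusion in the spirit of Mertens et al., a different and much simpler, non-ML technique. Each frame's pixel is weighted by how well exposed it is (how close it is to mid-grey), and the result is the weighted average. The sketch below assumes the frames are already aligned, single-channel and normalized to [0, 1]; real exposure fusion also weights contrast and saturation and blends across a multi-resolution pyramid to avoid seams, whereas Apple describes Deep Fusion as a machine learning process.

```python
import math
from typing import List


def well_exposedness(value: float, sigma: float = 0.2) -> float:
    """Gaussian weight that favors values near mid-grey (0.5)."""
    return math.exp(-((value - 0.5) ** 2) / (2 * sigma ** 2))


def fuse_exposures(frames: List[List[float]]) -> List[float]:
    """Per-pixel weighted average of aligned frames with values in [0, 1]."""
    fused = []
    for pixel_values in zip(*frames):
        weights = [well_exposedness(v) for v in pixel_values]
        total = sum(weights)  # always > 0: the Gaussian weights never reach 0 here
        fused.append(sum(w * v for w, v in zip(weights, pixel_values)) / total)
    return fused


under = [0.02, 0.10, 0.30]   # hypothetical short exposure: shadows crushed
normal = [0.10, 0.45, 0.85]
over = [0.40, 0.90, 0.99]    # hypothetical long exposure: highlights clipped
print([round(v, 3) for v in fuse_exposures([under, normal, over])])
```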

The stills from the iPhone are brilliant, but they are encoded to look brilliant, which means there is no control over color space, white point or tint. Apple has announced that it is going to support an Apple ProRAW mode, which will make the iPhone exceptionally valuable to VFX professionals.
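Part of why a RAW-style format matters for VFX is that white balance is, at its simplest, a per-channel gain applied to linear sensor data before any tone curve or gamma is baked in. The sketch below shows that basic diagonal (von Kries-style) correction: it derives gains from a neutral reference, such as a hypothetical grey card shot in the plate, and applies them to linear RGB values. It is a generic illustration, not a description of how Apple ProRAW stores or applies white balance. Applying the same gains to an already tone-mapped JPG or HEIF does not give the same result, because those values are no longer linear.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def white_balance_gains(neutral_patch: RGB) -> RGB:
    """Per-channel gains that make a neutral patch equal in R, G and B.

    Gains are normalized to the green channel, as is common for camera data.
    """
    r, g, b = neutral_patch
    return (g / r, 1.0, g / b)


def apply_white_balance(pixel: RGB, gains: RGB) -> RGB:
    """Multiply each linear channel by its gain."""
    return (pixel[0] * gains[0], pixel[1] * gains[1], pixel[2] * gains[2])


grey_card = (0.20, 0.25, 0.40)  # hypothetical linear reading under a cool light
gains = white_balance_gains(grey_card)
print(gains, apply_white_balance(grey_card, gains))
```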

The iPhone 12 Pro and iPhone 12 Pro Max will be the first iPhones to support the Apple ProRAW format, as soon as next month, although the exact timing of the release is unknown. ProRAW files retain far more of the original image data than the JPG and HEIF formats used today, and are thus much larger and more complex. The HEIF format allows for burst modes (Live Photos), and of course the iPhone can also shoot HDR stills. Apple ProRAW is expected to be a hybrid format rather than a pure RAW format, because the iPhone does a lot of clever things to produce such strong images, things that go well beyond applying a color space and a storage encoding. Apple says that ProRAW will not skip the iPhone's clever multi-frame processing technology, such as Deep Fusion.

Alok Deshpande, Apple’s Senior Manager of Camera Software Engineering, explained, when ProRAW was first announced, that it “provides many of the benefits of our multi-frame image processing and computational photography, like Deep Fusion and Smart HDR, and combines them with the depth and flexibility of a raw format. In order to achieve this, we constructed a new pipeline that takes components of the processing we do in our CPU, GPU, ISP, and neural engine, and combines them into a new deep image file, computed at the time of capture, without any shutter delay. And we do this for all four cameras, dynamically adapting for various scenes while maintaining our intuitive camera experience.”

ProRAW images can be edited in the Photos app on your iPhone, and the format will also be available in third-party apps thanks to a new Apple API that lets third-party camera apps work in ProRAW format.

Computational photography

As Deshpande touched on, the iPhone does a great deal of computational photography compared with a traditional DSLR. By traditional logic, the photos and video from the iPhone should be nowhere near as impressive as they are: the lenses on an iPhone are tiny compared to Canon L-series lenses, each of which would normally cost more than the whole iPhone. Some insight into the power of the Neural Engine, which is at the heart of much of this innovation, can be gained by looking at the new MacBook Pro laptops, which also have a Neural Engine on-chip, as the M1 chip comes from Apple's iPhone and iPad processor team.

Apple is not typically seen as a market leader in AI, but this may be more about optics than reality. While many other AI players publish open-source libraries, Apple tends to use its extensive AI capabilities internally, and AI now permeates nearly every feature on the iPhone.

Importantly, while much is done on the A14 Bionic chip, machine learning (ML) has traditionally been done in the cloud. This may lead to ML being the killer app that first relies on the new 5G feature of the iPhone 12. Very fast connections, and possibly edge-compute solutions, may first materialize as ML-enabling technologies. Right now 5G is a promise, and ML may be the ‘use case’ that truly relies on it.

iPhones have long included image signal processors (ISPs) for improving the quality of photos, but Apple accelerated the process in 2018 by making the iPhone's ISP work closely with its Neural Engine.

Lessons and clues from the new M1 Laptops

The M1 is the first processor built by Apple for its MacBook Pro and MacBook Air, but it is not the first processor Apple has made, as the iPhone and iPad all use Apple-designed silicon. The M1 is not used in the iPhone, which has the A14 Bionic chip, but it gives a very clear indication of how Apple is approaching new chips and of the growing importance of machine learning and the Neural Engine. For example, the iPhone takes multiple pictures in rapid succession when the user taps the shutter button, which allows its ML-trained algorithm to analyze each image and composite what it deems the best parts of each image into one result.
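Apple's burst pipeline is ML-driven and not public, but the basic idea of compositing the best parts of several frames can be shown with a simple, non-ML stand-in: split the frame into tiles, score each tile in each frame by local detail (here the energy of neighboring-pixel differences, a crude sharpness measure), and copy each tile from whichever frame scores highest. The tile size, the sharpness metric and the two test frames are our assumptions. A production pipeline would align the frames first and blend rather than hard-switch between them, to avoid visible seams.

```python
from typing import List

Image = List[List[float]]


def tile_sharpness(img: Image, y0: int, x0: int, size: int) -> float:
    """Sum of squared right and down neighbor differences for pixels in a tile.

    Differences at the tile's right and bottom edges may use the neighboring
    pixel just outside the tile.
    """
    height, width = len(img), len(img[0])
    energy = 0.0
    for y in range(y0, min(y0 + size, height)):
        for x in range(x0, min(x0 + size, width)):
            if x + 1 < width:
                energy += (img[y][x + 1] - img[y][x]) ** 2
            if y + 1 < height:
                energy += (img[y + 1][x] - img[y][x]) ** 2
    return energy


def composite_sharpest(frames: List[Image], size: int = 2) -> Image:
    """Build an image by copying each tile from the frame where it is sharpest."""
    height, width = len(frames[0]), len(frames[0][0])
    out = [[0.0] * width for _ in range(height)]
    for y0 in range(0, height, size):
        for x0 in range(0, width, size):
            best = max(frames, key=lambda f: tile_sharpness(f, y0, x0, size))
            for y in range(y0, min(y0 + size, height)):
                for x in range(x0, min(x0 + size, width)):
                    out[y][x] = best[y][x]
    return out


# Two hypothetical 4x4 frames: each is detailed on one half and blurred on the other.
frame_a = [[0, 1, 0.5, 0.5], [1, 0, 0.5, 0.5], [0, 1, 0.5, 0.5], [1, 0, 0.5, 0.5]]
frame_b = [[0.5, 0.5, 0, 1], [0.5, 0.5, 1, 0], [0.5, 0.5, 0, 1], [0.5, 0.5, 1, 0]]
for row in composite_sharpest([frame_a, frame_b]):
    print(row)
```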

The M1 chip makes the 13″ MacBook Pro insanely fast. Much of this comes from the new Unified Memory Architecture (UMA), which changes the way memory is used. Instead of separate memory for the CPU, GPU and AI (Neural Engine), a single pool of 8GB or 16GB of RAM is shared, so there is no need to copy data between processing units.

Focusing just on the Neural Engine, the story is even more innovative and bodes well for the future. With the new M1 chip, Apple has forked TensorFlow, the Google-created open-source AI/ML platform. Previously, Apple was using TensorFlow 2.3, but the new 2.4 branch runs much faster on any Apple hardware and has been specially optimized for the M1 chip. The M1 Neural Engine is able to execute 11 trillion operations per second, which makes the 16-core Neural Engine vastly faster than previous MacBook Pros at machine learning and at the advanced computational photography that machine learning allows. The TensorFlow framework was built for the x86_64 architecture and Nvidia GPUs, so the Apple fork is particularly important. The fork is available as open source, requires macOS 11.0 or later, and provides acceleration on Macs running the new M1 processor. Existing TensorFlow scripts run as-is with the fork and do not need to be reworked to take advantage of its performance gains. As a side note, this allows for very fast new ML processes in the new beta of Adobe Photoshop, and is very relevant to visual effects TDs using the important StyleTransfer algorithms (see below; note that smaller, faster times are better).

The 5nm M1 MacBook is remarkable, but for VFX it is possible that a larger version will come out in the middle of 2021. This will hopefully offer more memory (and a new, higher-resolution camera, as the current built-in camera is only 720p). The current MacBook Pro maxes out at 16GB of RAM, but given the UMA it is hard to compare that directly with the RAM of the Intel models. We have tested the M1 laptop and it is significantly faster at CNN image processing: we have seen 8 to 9 times speed improvements, and Apple has clocked up to 15x performance boosts and nearly 11 times faster inference performance.

LiDAR

Machine learning is used in augmented reality and in the LiDAR scanning of the iPhone 12. The hard problem here is called SLAM, Simultaneous Localization And Mapping, which allows the iPhone to use its new LiDAR scanner to build up a 3D model of what it is scanning. The most common SLAM systems are either based on optical sensors, such as visual SLAM (VSLAM, based on a camera), or LiDAR-based (Light Detection And Ranging).
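Turning a LiDAR depth image into 3D geometry is the "mapping" half of the problem. Assuming a pinhole camera with known intrinsics (focal lengths fx and fy and principal point cx, cy, all in pixels) and a depth value per pixel measured along the optical axis, each pixel can be back-projected into a 3D point in camera space. Applying the camera pose estimated by the "localization" half then moves those points into a shared world frame, so scans taken from different positions line up. The sketch below shows just those two steps with made-up numbers; it is not how any particular app implements SLAM, and real pipelines also fuse, filter and mesh the points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject(depth: Sequence[Sequence[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point]:
    """Convert a depth image (meters along the optical axis) to camera-space points.

    Pixels with depth <= 0 are treated as "no return" and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_world(points: List[Point], rotation: Sequence[Sequence[float]],
             translation: Sequence[float]) -> List[Point]:
    """Apply a rigid camera pose: p_world = R @ p_camera + t."""
    world = []
    for p in points:
        world.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return world


depth = [[2.0, 2.0], [0.0, 2.5]]  # hypothetical 2x2 depth image; 0.0 = no return
cam_points = backproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(to_world(cam_points, identity, [0.0, 0.0, 1.0]))  # camera origin at z = 1 m
```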

For visual effects, this again greatly expands the usefulness of the iPhone 12. A LiDAR scan on an iPhone is not as accurate as one from a major LiDAR system, but it is insanely convenient and allows set and location elements to be measured and visualized. Many VFX companies have access to a LiDAR scanner such as the FARO Focus3D laser scanner, but having access to the company scanner and literally having one in your back pocket when you need it are two very different things.

There are a number of LiDAR apps now available on the Apple App Store. While we have not managed to review them all, we prefer 3DScannerApp.

Here is an empty set that we scanned in 3DScannerApp at 4K (the video is 1080p).

The LiDAR scanner captures both a spatial reference and a 3D model. In this example, we scanned the doorway that a 3D character will walk through. The model captures textures and dimensions.

Note that the model can be exported in a variety of formats and also used to check relevant dimensions using the virtual tape measure. For a system that you always have with you, the LiDAR is remarkable.
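Once a scan is exported, basic measurements are also easy to script. The sketch below assumes an OBJ export (a common plain-text mesh format; check which formats your app offers) whose vertex coordinates are in meters, which you should verify for your exporter. It reads the `v` lines, reports the model's bounding-box extents and measures the straight-line distance between two points, a scripted equivalent of the virtual tape measure. The doorway vertices are invented for illustration.

```python
import math
from typing import List, Tuple

Vertex = Tuple[float, float, float]


def obj_vertices(obj_text: str) -> List[Vertex]:
    """Read geometric vertices ("v x y z ...") from OBJ text, ignoring vt/vn/f lines."""
    vertices = []
    for line in obj_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "v":
            x, y, z = (float(p) for p in parts[1:4])
            vertices.append((x, y, z))
    return vertices


def extents(vertices: List[Vertex]) -> Vertex:
    """Width, height and depth of the axis-aligned bounding box."""
    return tuple(
        max(v[i] for v in vertices) - min(v[i] for v in vertices) for i in range(3)
    )


DOORWAY_OBJ = """\
# hypothetical doorway frame, units assumed to be meters
v 0.0 0.0 0.0
v 0.9 0.0 0.0
v 0.9 2.1 0.0
v 0.0 2.1 0.0
f 1 2 3 4
"""

verts = obj_vertices(DOORWAY_OBJ)
print("extents (m):", extents(verts))
print("diagonal (m):", round(math.dist(verts[0], verts[2]), 3))
```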

The iPhone 12 Pro Max is not a cheap phone, but given the incredible tools it provides, it is invaluable. With the release of ProRAW, the iPhone 12 will no doubt become a common sight on sets around the world.