With well over 6 million views since its mid-February release, YouTuber EZRyderX47’s Back to the Future deepfake video, with Robert Downey Jr. and Tom Holland seamlessly replacing Christopher Lloyd and Michael J. Fox, has become quite the viral sensation. The video is brilliantly done, from the lip-sync to the anything but uncanny eyes; the choice of film, and clip, was inspired as well, a welcome window into a new riff on a Hollywood classic. Produced using two readily available pieces of free software – HitFilm Express, from FXhome, and DeepFaceLab – the startlingly believable piece instantly conjures up all sorts of notions, both wonderful and sinister, regarding the seemingly unlimited horizons of AI-enhanced digital technology. If today’s visual magicians can create any image with stunning photoreal clarity, what, dare we ask, can propagandists, criminals and other “bad” actors do with the same digital tools? Ah, so nice to find a new target, if for only a few minutes, for our coronavirus-stoked paranoia.
If you watch AWN’s exclusive interview with AI expert Andrew Glassner at FMX 2019, not only will you get a great overview of AI, neural networks, and machine learning fundamentals, but… you’ll come away afraid… very afraid.
For the film’s director, François Brousseau (aka EZRyderX47), the underlying technology points to a limitless creative future. “With these tools, I can create an almost infinite number of parallel universes,” he gushes. “I can revive great actors from the past. I can put actors-of-now into movies of the past. It is almost a limitless magical universe.” For Josh Davies, CEO of HitFilm Express creator FXhome, the technology helps level the creative playing field, enhancing competition by enabling smaller studios to produce more impactful work. “It will enable more of the things that take time and effort, so that smaller teams can achieve the level of quality that larger teams have,” he notes. “Larger teams will then also be able to use these tools to produce even more amazing imagery and benefit from a better workflow. In short, what’s good will be made better.”
So, what’s a deepfake video? How are they made? How can they be detected?
The use of digital technology to replace someone in an image or video has been around for some time, from simple Photoshop morphs to elaborately crafted films like Forrest Gump. More recently, we’ve seen a slew of digital characters, both replaced and de-aged, from Carrie Fisher as Princess Leia to Samuel L. Jackson as Nick Fury. But, with the rapidly expanding integration of AI in VFX methodology and production, coupled with fast, AI-enabled GPUs, today’s replacement technology has taken a significant leap in sophistication. Case in point, Martin Scorsese’s recent gem for Netflix, The Irishman, made use of cutting-edge AI-backed digital tools developed by ILM. Their software, ILM Facefinder, used AI to sift through thousands of images from years’ worth of performances by actors Robert De Niro, Al Pacino, and Joe Pesci, matching the camera angles, framing, lighting and expressions of the scene being rendered. This gave ILM artists a relevant reference to compare against every frame in the live-action shot. These visual references were used to refine digital doubles created for each actor, so they could be transformed into the target age for each specific scene in the film. The results were dramatic, allowing the actors, all in their 70s, to be transformed back into their 20s, something not possible using even the best makeup techniques.
The term “deepfake” fuses “deep learning,” a branch of machine learning, which is itself part of the AI world, with “fake,” which is to say, a counterfeit or forged version of something. With new methods of channeling enormous computer processing power to analyze massive amounts of data, AI is being harnessed more and more to visualize that data; by analyzing many images of a person’s face, for example, software can use AI and machine learning to build an extremely accurate model of what that face looks like, down to the pixel level, and of how it can be manipulated and recreated in new ways. With a deepfake video, the software gets good not only at analyzing and learning about the face you want to recreate, but also at understanding the image or video you want to transpose that face onto. Given the time to properly learn both faces, AI-enabled software can digitally create a face that combines the two sets of learned data, placing the new face onto the old. Tom Holland becomes Michael J. Fox!
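Under the hood, face-swap tools of this kind typically train a shared encoder with a separate decoder per identity: swapping means encoding one actor’s face, then decoding the result with the other actor’s decoder, so pose and expression survive while identity changes. Here is a minimal numpy sketch of that idea — the linear maps, variable names, and dimensions are illustrative stand-ins, not DeepFaceLab’s actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: flattened 8x8 grayscale "face" crops, small latent space.
FACE_DIM, LATENT_DIM = 64, 16

# A shared encoder captures features common to both faces (pose, expression,
# lighting); each identity gets its own decoder. Simple linear maps stand in
# for the deep networks a real tool would train.
W_enc = rng.standard_normal((LATENT_DIM, FACE_DIM)) * 0.1
W_dec_actor_a = rng.standard_normal((FACE_DIM, LATENT_DIM)) * 0.1
W_dec_actor_b = rng.standard_normal((FACE_DIM, LATENT_DIM)) * 0.1

def encode(face):
    """Map a face crop to the shared latent representation."""
    return W_enc @ face

def swap(face_a):
    """Encode actor A's face, then decode with actor B's decoder:
    the pose comes through the shared latent, the identity from the decoder."""
    return W_dec_actor_b @ encode(face_a)

frame_face = rng.standard_normal(FACE_DIM)  # a face crop from one frame
swapped = swap(frame_face)                  # same-size crop, "other" identity
```

In a real pipeline the encoder and decoders are deep convolutional networks trained jointly on both facesets, which is why the learning stage takes days of GPU time.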
Brousseau has been releasing deepfake videos for some time; earlier efforts included replacing Nicolas Cage with Keanu Reeves in Ghost Rider, and Jim Parsons’ Sheldon Cooper suddenly sporting Jack Nicholson’s smiling Joker face in an episode of The Big Bang Theory. The Back to the Future video is his best yet.
For the director, the process begins with the faces. “To start, I had to build the two facesets,” he explains. “I had to find all the angles of the actors’ faces. I found images from interviews and films. I used HitFilm to cut the scenes where the faces were at their best. I then extracted and cleaned the faces using DeepFaceLab tools. I deleted the problematic images — blurred images, obstructed faces, bad angles, etc. This part took me around two or three days. This step is not very difficult, but it takes a long time.”
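The “delete the blurred images” step Brousseau describes is often automated with a sharpness score. A small self-contained sketch of one common heuristic, variance of the Laplacian — the threshold and image sizes here are illustrative assumptions, not values from his workflow:

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the discrete Laplacian: a common sharpness score.
    Blurry images have weak edges, so their score is low."""
    # 3x3 Laplacian computed via shifted differences (no external libraries).
    lap = (-4 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return lap.var()

def keep_sharp(images, threshold):
    """Drop faceset frames whose sharpness falls below the threshold."""
    return [img for img in images if laplacian_variance(img) >= threshold]

# Demo: a noisy, detail-rich crop vs. a smooth gradient ("blurry") one.
rng = np.random.default_rng(1)
sharp = rng.standard_normal((64, 64))
blurry = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
assert laplacian_variance(sharp) > laplacian_variance(blurry)
```

A heuristic like this only catches blur; obstructed faces and bad angles, as Brousseau notes, still need manual review.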
“You need to collate a high-quality image database, using images of the person that you’re trying to deepfake,” Davies adds. “They will also need to have some knowledge – you can’t simply rip every single photo from all kinds of footage. There are a number of things that will compromise a deepfake – for example, if an actor has a beard in one set of images and not in the other, if some images are lower resolution, are blurrier, etc. Currently, there is still a degree of manual process needed to find the best images. Of course, in the future we hope this will be AI enhanced, and they will be able to automatically identify images of the same people, and also the best kinds of images to use for deepfakes.”
Finding a “deepfakable” scene with both actors side by side is a critical, and difficult, step. “I tested several scenes before finding the right one, and it took me about a week of trial and error to get it right,” Brousseau reveals.
The face detection phase of the scene was also challenging. “Over several frames, DeepFaceLab had difficulty detecting the correct angles of faces and obstructed faces,” he goes on to describe. “I had to do the work manually. I also had to add a mask on some frames where the faces were obstructed. This part can be tricky, and it took me one or two days of trial and error.”
With the scene and face data in hand, Brousseau brought on the AI tool. “At that point, I started training artificial intelligence,” he states. “I tried two architecture models: the DF and the LIAE. The DF was problematic, but the LIAE was doing a pretty good job. This part took me around four or five days per face; it was time consuming but pretty easy.”
Once the AI learning was over, he used DeepFaceLab to convert the images to an MP4 video, performing several tests with different parameters. It took him one day to process both faces.
Then he edited the video using HitFilm Express. “I used the video transition effect ‘Fade to color,’ as well as the audio transition ‘Fade,’” he shares. “At 0:28 of the video, there is a guy who passes in front of Doc for 2 frames and DeepFaceLab wasn’t able to correctly render it. I had to use a mask of RDJ’s face provided by the DeepFaceLab software. I took the mask from a frame before the guy passes and I put it on the 2 problematic frames. Then I used the ‘Blend – Darken’ effect on the mask so the guy’s hair would be visible. It took me about a day and the masking part was pretty tricky.”
After watching the Back to the Future deepfake a couple times and marveling at its sophisticated visual trickery, you may say to yourself, “Of course it’s fake. I’ve seen the original. But… what if I hadn’t? How would I ever know which is which?” According to Davies, there are ways to spot a deepfake. “At the moment, the main places you can see telltale signs of manipulation are on the edges of what it’s replacing,” he says. “Generally, you can see this in the central two-thirds of the face, including shadowing around the chin area and where the forehead meets the hairline.” You can also find issues caused by a limited set of facial perspectives in the sampled facial datasets. “Deepfake generally works better on front angles of the face,” Davies continues. “A way around this of course is to ensure that your actor doesn’t move much, or turn his face too far from the camera. But again, AI technology will do a far better job of looking at this – it is likely they will be able to see the discrepancy in a single pixel, which far surpasses what the human eye can detect.”
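Davies’s point about telltale seams can be caricatured in code: blending tends to smooth the boundary where the pasted face meets the original frame, so image detail just inside the face region can differ noticeably from detail just outside it. A toy numpy heuristic along those lines — the box coordinates and the measure itself are illustrative assumptions, nothing like a production detector:

```python
import numpy as np

def gradient_energy(region):
    """Mean gradient magnitude — a rough measure of local image detail."""
    gy, gx = np.gradient(region.astype(float))
    return float(np.mean(np.hypot(gx, gy)))

def boundary_mismatch(frame, face_box, border=4):
    """Compare detail just inside vs. around the (suspected) blended face
    region. Blending smooths the seam, so a large mismatch between the two
    measurements is a weak warning sign of manipulation."""
    top, left, bottom, right = face_box
    inner = frame[top + border:bottom - border, left + border:right - border]
    outer = frame[max(0, top - border):bottom + border,
                  max(0, left - border):right + border]
    return abs(gradient_energy(inner) - gradient_energy(outer))

# Demo: crudely "paste" a smoothed patch into a noisy frame.
rng = np.random.default_rng(2)
clean = rng.standard_normal((128, 128))
faked = clean.copy()
faked[40:88, 40:88] = faked[40:88, 40:88].mean()  # over-smooth "face" region
box = (40, 40, 88, 88)
assert boundary_mismatch(faked, box) > boundary_mismatch(clean, box)
```

As Davies suggests, AI-based detectors operate on far subtler statistics than this single-pixel-level heuristic, which is exactly why they can spot discrepancies the human eye misses.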
When asked about the arms race already begun between those creating, and those trying to uncover, deepfake videos, Davies is optimistic “good” will triumph over “evil.” “It has been often assumed in the past that advancing technology will spell the end of humanity,” he muses. “This has never really been evidenced but it continues to be in the forefront of many people’s minds when presented with something new. Quite simply put, more money and resources will be put into working out what has been created by AI, rather than the creators making it in the first place. This is because those wanting to distinguish between real and fake life, will be supported and backed by governments, by insurance companies and industry, who want to identify anyone using this for nefarious reasons. Even now, we can see that deepfakes are being ‘uncovered,’ and those fighting the manipulation of imagery will always be a step ahead of the latest deepfake tech.”
Dan Sarto is Publisher and Editor-in-Chief of Animation World Network.