Tracing the Technology Origin of a Presidential Candidate Deepfake
The recent tweet of a doctored photo of US presidential candidate Joe Biden, turned into a GIF and nicknamed “Sloppy Joe”, has prompted controversy over whether the image qualifies as a deepfake, which would make it the first used in a US election cycle. President Trump’s Twitter account retweeted the original post shortly thereafter, which fueled debate over whether the GIF was a deepfake, what the intention behind it was, and whether it violated Twitter’s synthetic and manipulated media policy. Several prominent press outlets, including The Atlantic and Motherboard, went back and forth on whether this was the first documented use of a deepfake in US politics.
We investigated the technology and timeline for Mug Life, the platform that the Twitter account used to generate the Biden GIF, and determined that it is in fact using deepfake technology. However, Twitter’s removal policy for manipulated media requires content to be synthetic or manipulated (terms it doesn’t clearly define) and to involve a real-world harm, while other platforms require the involvement of AI or machine learning (along with the risk of real-world harm). Attributing Mug Life’s technology backbone is a helpful first step for social media platforms in determining whether a particular piece of content violates their policies.
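To make the difference between the two kinds of policy concrete, here is a minimal sketch that encodes only the criteria as summarized in the paragraph above. The class, field, and function names are hypothetical, and the logic is a simplification for illustration; it does not reproduce any platform's actual policy text or enforcement process.

```python
from dataclasses import dataclass


@dataclass
class ContentAssessment:
    """Hypothetical reviewer findings for one piece of media (illustrative fields)."""
    synthetic_or_manipulated: bool  # was the media altered or fabricated at all?
    uses_ai_or_ml: bool             # was AI or machine learning involved in creating it?
    real_world_harm_risk: bool      # does it involve (a risk of) real-world harm?


def meets_twitter_style_criteria(a: ContentAssessment) -> bool:
    # As summarized above: synthetic or manipulated AND real-world harm.
    return a.synthetic_or_manipulated and a.real_world_harm_risk


def meets_ai_based_criteria(a: ContentAssessment) -> bool:
    # As summarized above for other platforms: AI/ML involvement AND risk of harm.
    return a.uses_ai_or_ml and a.real_world_harm_risk


# Hypothetical case: media edited by hand (no AI) that carries a risk of harm.
hand_edited = ContentAssessment(
    synthetic_or_manipulated=True, uses_ai_or_ml=False, real_world_harm_risk=True
)
# Hypothetical case: AI-generated media with no finding of harm.
ai_no_harm = ContentAssessment(
    synthetic_or_manipulated=True, uses_ai_or_ml=True, real_world_harm_risk=False
)

print(meets_twitter_style_criteria(hand_edited), meets_ai_based_criteria(hand_edited))
print(meets_twitter_style_criteria(ai_no_harm), meets_ai_based_criteria(ai_no_harm))
```

Under these assumptions, the hand-edited case meets the first set of criteria but not the second. Whether AI was involved can therefore change the outcome on some platforms, which is why attributing the underlying technology matters.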
A Mug Life press release from October 2017 for its iOS app stated that it uses technology “featuring deep neural networks” and “marries decades of video game expertise with the latest advances in computer vision.” This immediately suggests they are using at least one component of deepfake creation: training a neural network, which is a form of machine learning. Additionally, the Mug Life technology focuses on the types of synthetic media generally associated with deepfakes: high-definition videos, stills (i.e., photos and images), animated GIFs, and animated Facebook avatars.
Mug Life claims that its media creation process uses deep neural network computer vision technology to analyze and decompose photos in what it calls the “Deconstruction stage,” which again suggests they are using deepfake technology. According to a November 2017 Nvidia press release referencing the Deconstruction stage, “Mug Life used a TITAN X GPU and cuDNN to train their deep learning models to analyze and decompose photos into 3D building blocks: camera, lighting, geometry, and surface texture.” GPUs and Nvidia’s CUDA software stack, on which cuDNN is built, are both core components of deepfake creation, dating all the way back to the original deepfakes technology used on Reddit in late 2017.
Examining the timeline of the Mug Life app against the origin of deepfake videos reveals an interesting point: Mug Life appears to have developed and released its technology before the creation of the original r/deepfakes Reddit forum, where deepfakes “officially” began in December 2017.
Going back to the earliest posts on the r/deepfakes subreddit in web archives, we can see that in early January 2018 the original deepfakes creator, “u/deepfakes”, discussed and complimented another user’s creation of a GUI tool accessible to people without coding knowledge. In the discussion of this tool, a desktop app for creating deepfakes called FakeApp, both CUDA 8.0 and cuDNN were used or tested. Both are Nvidia-developed software for building high-performance GPU-accelerated applications for deep learning and video and image processing. This is the same technology used by Mug Life in its Deconstruction stage, and it is still very much in use in today’s deepfake apps.
Deepfakes require intense GPU use during model training to produce higher-quality output. Web archives of the Reddit u/deepfakes posts additionally reveal numerous discussions about GPU usage, such as on 16 December 2017, when a Reddit user posted “my GPU usage is sometimes above 90%, sometimes 0%. However, the VRAM usage is over 7GB” and that “someone should sell GPU cloud for this”. The TITAN X GPU, the hardware used in Mug Life’s Deconstruction stage, is a graphics card also developed by Nvidia and designed for high-demand gaming and VR technologies.
The findings discussed above suggest that Mug Life is indeed using the same technology that is the standard backbone for deepfakes, particularly the same technology that the original FakeApp and u/deepfakes videos were built on. While the Biden GIF likely does not violate Twitter’s manipulated media policy, since the policy also requires either deceptive context or intent to cause serious harm, we at least have some insight into where the technology comes from. Additionally, it’s important to note that this appears to have been a manipulated photo and not a video, which is where the largest concern about election disinformation tends to focus.
It is important for social and news media companies, particularly when they are responsible for monitoring content, to understand the technology behind, and even the original intent of, the software tools that generate the manipulated media migrating to their platforms. As deepfake technology evolves and improves, it will be critical to understand the actors and networks developing and selling these services. A detailed understanding of the actors developing the technology will help shed light on the intent and communities of end users who might indeed intend to cause harm through disinformation or other e-crime.