Amid this past week’s controversies in AI over regulation, fears of world-ending doom, and job disruption, the clouds have briefly parted. For a brief and shining moment, we can enjoy an absolutely ridiculous AI-generated video of Will Smith eating spaghetti that is now lighting up our lives with its terrible glory.
On Monday, a Reddit user named “chaindrop” shared the AI-generated video on the r/StableDiffusion subreddit. It quickly spread to other forms of social media and inspired mixed ruminations in the press. For example, Vice said the video will “haunt you for the rest of your life,” while the AV Club called it the “natural end point for AI development.”
We’re somewhere in between. The 20-second silent video consists of 10 independently generated two-second segments stitched together. Each one shows different angles of a simulated Will Smith (at one point, even two Will Smiths) ravenously gobbling up spaghetti. It’s entirely computer-generated, thanks to AI.
And you will see it now:
We know what you’re thinking: “Didn’t I see this kind of advanced deepfake technology in 1987’s The Running Man?” No, that was Jesse “The Body” Ventura defeating a fake Arnold Schwarzenegger in a dystopic game show cage match, set somewhere between 2017 and 2019. Here in 2023, we have fake Will Smith eating spaghetti.
This feat is possible due to a new open source AI tool called ModelScope, released a few weeks ago by DAMO Vision Intelligence Lab, a research division of Alibaba. ModelScope is a “text2video” diffusion model that has been trained to create new videos from prompts by analyzing millions of images and thousands of videos scraped into the LAION-5B, ImageNet, and WebVid datasets. That includes videos from Shutterstock, hence the ghostly “Shutterstock” watermark on its output.
AI community Hugging Face currently hosts an online demo of ModelScope, although it requires an account, and you’ll need to pay for compute time to run it. We tried to use it, but it was overloaded, likely due to Smith’s spaghetti mania.
According to chaindrop, the workflow for creating the video was fairly simple: give ModelScope the prompt “Will Smith eating spaghetti” and generate it at 24 frames per second (FPS). Next, chaindrop used the Flowframes interpolation tool to increase the FPS from 24 to 48, then slowed it down to half speed, resulting in a smoother video.
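For the curious, the frame-rate arithmetic behind that workflow can be sketched in a few lines of Python. This is only an illustration, not chaindrop’s actual process: the function name `smooth_slow_motion`, its parameters, and the 24-frame example clip are hypothetical, and it assumes interpolation multiplies the frame count by exactly the interpolation factor, which real tools may miss by a frame or so.

```python
from fractions import Fraction


def smooth_slow_motion(frame_count, source_fps=24, interpolation_factor=2,
                       speed=Fraction(1, 2)):
    """Estimate the result of frame interpolation followed by a slowdown.

    Hypothetical helper for illustration only. Assumes interpolation
    multiplies the frame count by exactly `interpolation_factor`.
    """
    source_duration = Fraction(frame_count, source_fps)

    # Interpolation adds in-between frames: more frames per second,
    # same running time.
    interpolated_frames = frame_count * interpolation_factor
    interpolated_fps = source_fps * interpolation_factor

    # Slowing playback keeps every frame but shows them at a lower rate,
    # so the clip runs longer.
    playback_fps = interpolated_fps * speed
    final_duration = Fraction(interpolated_frames) / playback_fps

    return {
        "source_duration_s": source_duration,
        "interpolated_fps": interpolated_fps,
        "playback_fps": playback_fps,
        "final_duration_s": final_duration,
    }


# Hypothetical 24-frame clip generated at 24 FPS.
plan = smooth_slow_motion(24)
print(plan)
```

Under those assumptions, a clip interpolated from 24 to 48 FPS and then slowed to half speed plays back at 24 FPS again, but with twice as many frames covering the same motion. It runs twice as long and looks smoother than simply slowing the original.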
ModelScope isn’t the only game in town when it comes to the emerging field of text2video. Recently, Runway debuted “Gen-2,” and we’ve previously covered early text2video research projects from Meta and Google.
Since Will Smith eating spaghetti became a viral hit, the Internet has been graced with follow-ups such as Scarlett Johansson and Joe Biden eating spaghetti. There’s even Smith eating meatballs, a video that is perhaps actually truly horrifying. But it’s still great somehow—perfect future meme fodder.
Of course, once the outputs of these text2video tools get too realistic, we’ll have other issues to deal with—deep social and cultural issues, likely. But for now, let’s enjoy ModelScope’s imperfect, horrible glory. We apologize in advance.