Meta has unveiled Movie Gen, a new media-focused AI model designed to create realistic video and audio clips.
To show off its potential, the company shared several 10-second clips produced with Movie Gen, including one of a Moo Deng-esque baby hippo frolicking in the water. The tool is not yet publicly available, and the announcement comes shortly after Meta's Connect event, where the company introduced new and updated hardware along with the latest version of its large language model, Llama 3.2.
Going beyond basic text-to-video generation, the Movie Gen model can also make targeted edits to existing clips, like placing an object in someone's hands or changing the appearance of a surface. In one demonstration released by Meta, a woman wearing a VR headset was transformed to look as if she were wearing steampunk binoculars.
An AI-generated video made from the prompt "make me a painter."
An AI-generated video made from the prompt "a woman DJ spins records, dressed in a pink jacket and oversized headphones, with a cheetah beside her."
Movie Gen can also generate audio to go along with the videos. In the sample reels, an AI-generated man stands near a waterfall as the sound of splashing water blends with a hopeful symphony; a sports car's engine roars and its tires screech as it speeds around a track; and a snake slides across a jungle floor to the accompaniment of suspenseful horns.
Meta also published a research paper with more details about Movie Gen. Movie Gen Video consists of 30 billion parameters, while Movie Gen Audio consists of 13 billion parameters. (A model's parameter count roughly corresponds to its capability; for comparison, the largest variant of Llama 3.1 has 405 billion parameters.) Movie Gen can produce high-definition videos up to 16 seconds long, and Meta claims it outperforms competing models in overall video quality.
Earlier this year, CEO Mark Zuckerberg demonstrated Meta AI's Imagine Me feature, which lets users upload a photo of themselves and place their likeness in different scenarios; he illustrated it by posting an AI image of himself drowning in gold chains on Threads. A video version of a similar feature would be feasible with the Movie Gen model, something like a souped-up version of ElfYourself.
What has Movie Gen been trained on? Meta's announcement is vague on the specifics, saying only: "We've trained these models on a combination of licensed and publicly available data sets." The sources of training data, and what is fair to scrape from the web, remain contentious issues for generative AI tools, and it is rarely disclosed which text, video, or audio clips were used to build major models.
It will be interesting to see how long it takes Meta to release Movie Gen to the public. The announcement mentions only a "potential future release." For comparison, OpenAI announced its AI video model, Sora, earlier this year, but it has not yet been made public and no release date has been given (though WIRED did obtain a few exclusive clips from Sora for an investigation into bias).
Given Meta's history as a social media company, it's plausible that features powered by Movie Gen will gradually show up inside Facebook, Instagram, and WhatsApp. In September, rival Google said it plans to bring some aspects of its Veo video model to creators through YouTube Shorts sometime next year.
While larger tech companies are still holding off on fully releasing video models to the public, you can experiment with AI video tools from smaller, up-and-coming startups like Runway and Pika. Give Pikaffects a try if you've ever wondered what it would look like to be cartoonishly crushed by a hydraulic press or to dramatically melt into a puddle.