Meta's New 'Movie Gen' AI System Can Deepfake Video From a Single Photo (arstechnica.com)
An anonymous reader quotes a report from Ars Technica: On Friday, Meta announced a preview of Movie Gen, a new suite of AI models designed to create and manipulate video, audio, and images, including creating a realistic video from a single photo of a person. The company claims the models outperform other video-synthesis models when evaluated by humans, pushing us closer to a future where anyone can synthesize a full video of any subject on demand. The company has not yet said when or how it will release these capabilities to the public, but Meta says Movie Gen is a tool that may allow people to "enhance their inherent creativity" rather than replace human artists and animators. The company envisions future applications such as easily creating and editing "day in the life" videos for social media platforms or generating personalized animated birthday greetings.
Movie Gen builds on Meta's previous work in video synthesis, following 2022's Make-A-Scene video generator and the Emu image-synthesis model. Using text prompts for guidance, this latest system can generate custom videos with sounds for the first time, edit and insert changes into existing videos, and transform images of people into realistic personalized videos. [...] Movie Gen's video-generation model can create 1080p high-definition videos up to 16 seconds long at 16 frames per second from text descriptions or an image input. Meta claims the model can handle complex concepts like object motion, subject-object interactions, and camera movements. You can view example videos here. Meta also released a research paper with more technical information about the model.
As for the training data, the company says it trained these models on a combination of "licensed and publicly available datasets." Ars notes that this "very likely includes videos uploaded by Facebook and Instagram users over the years, although this is speculation based on Meta's current policies and previous behavior."
Nothing is real anymore (Score:5, Interesting)
Great. We have finally been able to get to the point where nothing you see online is real. It will all be AI generated. Congratulations to us. We have just destroyed the human race and given in to our robot overlords.
Re:Nothing is real anymore (Score:5, Interesting)
Re: (Score:2)
We're just moving the internet from being an historical non-fiction to being an absurdist semi-fiction, where there's just enough reality to make everything tangentially less real.
Yeah. Including humans.
Re: (Score:2)
Well, consider the benefits, I can easily have a full Facebook profile to the envy of all my followers or friends or subscribers or whatever, without lifting a finger or sharing anything about myself.
Complete privacy.
and the plot's no problem (Score:1)
because people don't expect movies to have an interesting plot anymore.
Re: and the plot's no problem (Score:2)
So you are wishing for economic destruction so you can delight in the pain it causes others. How very Republican of you.
What a gift! (Score:2, Insightful)
What a wonderful gift to vindictive ex-spouses and divorce attorneys everywhere! Who wouldn't believe your ex cheated on you, why, when we have the video right here?
Just because you can do something doesn't mean you should. There have always been ways of faking photographs and the like, but those generally required advanced skills and substantial amounts of time and money. This makes it possible for anyone with a grudge and a lawyer to use the legal system to punish anyone they dislike.
Re:What a gift! (Score:5, Interesting)
I think civil lawsuits will not be the major problem, because
1) Videos produced by Meta models will be watermarked with their own Stable Signature https://ai.meta.com/blog/stabl... [meta.com] or a future equivalent. So if you're hit with a fraudulent filing, your attorney will suggest an expert analysis, which will trivially reveal the fraud. This will happen naturally, as in a year or two everybody will know videos can be faked and that expert analysis is required.
2) Nobody other than the very big actors (Meta, OpenAI, Adobe, cinema studios etc.) can afford to train such a model, so there probably won't be a rogue (un-watermarked) model for small scale fraudsters to download.
A bigger problem is fake news and political/business influence. Have your competitor (political or business) say something stupid, and leak it to a complacent social network (one that won't check AI watermarks). Even if the video is exposed as fake, the damage is done. There's also a chance the video is never exposed as fake, because hostile foreign agencies can afford to train high-quality models that won't have easy tells.
One word (Score:1)
Porn
Is this, um, widely available? (Score:2)
My friends will be shocked to find out how long I've been dating Emma Watson.
Re: Is this, um, widely available? (Score:2)
Who gives names to their hands?
I shudder to ask what your left nut goes by. Yeesh.
Re: (Score:2)
To be fair, my asshole is named "RightwingNutjob"...primarily because of the crap that comes out of it.
Pearl Clutching (Score:2)
All the comments so far seem to assume this is the end of the human race (one literally claimed it). People also did the same "preacher's wife in The Simpsons" bit when photo and audio editing came out.
But neither Photoshop nor Pro Tools have led to the end of trust in photos/audio. You might be a little more skeptical of things that seem fake ... but otherwise life goes on. The same thing is true here: we've had video editing for years now and society hasn't collapsed.
Meanwhile, an entire generation of
Re: (Score:3)
"But neither Photoshop nor Pro Tools have led to the end of trust in photos/audio."
Because they are not content generators, they are content editors, and no one made the arguments for them that you claim they did.
"Meanwhile, an entire generation of incredible film makers from all over the world ... kids who can barely afford a computer,..."
Sure, kids who cannot afford a computer are "incredible film makers" that will make the next "Apocalypse Now" despite being unable to afford access to the technology that will ena
2D and 3D (Score:3)
Current AI doesn't really model in 3D, but rather apes patterns it sees in 2D frames. There are lots of subtle things this approach gets wrong in terms of consistency and perspective.
I suspect future AI will build a 3D model of the scene, not unlike in gaming, then map the movement of objects in the model, then re-render it as 2D frames. It'll still make stupid shit sometimes, but at least the 3D aspects will "add up".
Re: (Score:1)
Addendum: making a 3D model first also allows human editors to more easily tweak it.
Re: (Score:2)
Current AI doesn't really model in 3D, but rather apes patterns it sees in 2D frames. There's lot of subtle things this approach gets wrong in terms of consistency and perspective.
One of the ways to spot a deepfaked image of a person is to look at the eyes. Normally there will be a light spot in the same position on each eye, which is a reflection from whatever is lighting the scene. Image generators don't/can't take this into account so you'll often see the light spots in different places.
Re: (Score:2)
Or the dancing 'ghost' (in a bedsheet) twirling around and around, under "Generate videos from text."
The inconsistency issue is real and probably will never be perfect, but if it's good enough...
Re: (Score:1)
Those sites likely feature the best videos and are thus not representative of average quality. I'm not claiming the problems will be apparent in *all* 2D-based generative approaches, only in too many.
The potential upside... (Score:2)
to this could be amazing. Given that it's Meta, they will probably shit the bed, but the technology itself could be nothing short of amazing.
Imagine being able to feed your book and some pictures to represent various characters and creatures into this and it pops out a series for you. It would be incredible. It won't need to come up with a story or script, as you provide that. It would need to "read the book" to interpret the scenes for the movie and you would want to train it on images that went with a specific
The Age of Aquarius was short lived. (Score:2)
Welcome to the dawning of the Age of Unreality.
And we need this why exactly? (Score:3)
Granted there is value in showing that it can be done so that people can understand the value of having a reputable source, but since Facebook is already a mass of disinformation, I don't know how adding to the completely made up BS is helping in any way.
Honestly, are we going to have to build a complete hardware and software ecosystem to establish a trust network for journalism? At this point, it seems so.
Re: (Score:2)
"trust network for journalism?" i think you misspelled porn
Time to grow a thick skin. (Score:2)
There are a whole bunch of reasons why this matters. The only thing one can reasonably do is identify when it doesn't matter, and have the internal fortitude to simply dismiss it.
Nude fake - R or X - appears in the wild? "Not me." Move on. I know... easier said than done. But once it exists, you can choose whether to expend your mental health on it or not.
When the fake is used as evidence of infidelity in divorce proceedings... then, it matters.
I don't know what to think about the technical world anymore, a
Is this how Zuck surfed wearing a tuxedo? (Score:3)
I think what I want right now is... (Score:2)
Let's see this tech do something basic: convert 4:3 video into 16:9 and upscale it, in a way that is consistent and doesn't invent fill that contradicts what the camera later reveals when it pans.
Then, of course, I want a nude patch so I can select who I want to see in simulated nudity.
After that we can get into editing out laugh tracks and the dialog pauses to allow for them, adding in blood that's been kept out for a lower rating, etc.
Then I want to be able to make random actor substitutions including voice and manneri
While it's easy to take an anti-Meta (Score:1)
While it's easy to take an anti-Meta stance, we all know there will be others, currently researching, developing, and preparing much the same competing generative AIs.
To me, we haven't gotten to the bad stuff yet; it's still to come. Once AI without oversight can start to interact with the physical world, posting videos, stories, and comments to each and every website, sending texts and emails, making phone calls... The average person already has difficulty identifying "fake" now, and the below average, yes that is 50%