Meta Researchers Build an AI That Learns Equally Well From Visual, Written or Spoken Materials (techcrunch.com) 22
An anonymous reader quotes a report from TechCrunch: Meta (AKA Facebook) researchers are working on [...] an AI that can learn capably on its own from spoken, written or visual materials. The traditional way of training an AI model to correctly interpret something is to give it lots and lots (like millions) of labeled examples. A picture of a cat with the cat part labeled, a conversation with the speakers and words transcribed, etc. But that approach has fallen out of favor, as researchers found it infeasible to manually create labeled databases of the size needed to train next-gen AIs. Who wants to label 50 million cat pictures? Okay, a few people probably -- but who wants to label 50 million pictures of common fruits and vegetables?
Currently some of the most promising AI systems are what are called self-supervised: models that can work from large quantities of unlabeled data, like books or video of people interacting, and build their own structured understanding of the rules of that system. For instance, by reading a thousand books a model will learn the relative positions of words and ideas about grammatical structure without anyone telling it what nouns or articles or commas are -- it gets there by drawing inferences from lots of examples. This feels intuitively more like how people learn, which is part of why researchers like it. But the models still tend to be single-modal, and all the work you do to set up a self-supervised learning system for speech recognition won't apply at all to image analysis -- they're simply too different. That's where Facebook/Meta's latest research, the catchily named data2vec, comes in.
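The "drawing inferences from lots of examples" idea can be illustrated with a toy next-word model (a minimal sketch for illustration only, not Meta's method): given nothing but raw, unlabeled text, it learns which words tend to follow which.

```python
from collections import Counter, defaultdict

# Toy self-supervised learner: reads raw, unlabeled text and builds
# next-word statistics -- no human labels anywhere.
def train_bigrams(corpus):
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    # Most frequently observed follower of `word`.
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # prints "cat" -- its most common follower
```

Real self-supervised models learn far richer structure than bigram counts, of course, but the supervision signal is the same: the data itself.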
The idea for data2vec was to build an AI framework that would learn in a more abstract way, meaning that starting from scratch, you could give it books to read or images to scan or speech to sound out, and after a bit of training it would learn any of those things. It's a bit like starting with a single seed, but depending on what plant food you give it, it grows into a daffodil, pansy or tulip. Testing data2vec after letting it train on various data corpora showed that it was competitive with and even outperformed similarly sized dedicated models for each modality. (That is to say, if the models are all limited to being 100 megabytes, data2vec did better -- specialized models would probably still outperform it as they grow.)
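The modality-agnostic angle can be sketched in a few lines (assumption: this is a drastic simplification for illustration, not the actual data2vec architecture): the same masked-prediction training loop runs unchanged whether the input sequence happens to be "audio-like" waveform samples or "text-like" token IDs.

```python
import math
import random

random.seed(0)

def train(sequences, steps=5000, lr=0.01):
    # One objective for any modality: mask a value, then learn to
    # predict it from its two neighbors via weights wl and wr.
    wl, wr = random.random(), random.random()
    for _ in range(steps):
        seq = random.choice(sequences)
        i = random.randrange(1, len(seq) - 1)  # position to mask
        pred = wl * seq[i - 1] + wr * seq[i + 1]
        err = pred - seq[i]
        # Gradient step on squared error.
        wl -= lr * err * seq[i - 1]
        wr -= lr * err * seq[i + 1]
    return wl, wr

# Two toy "modalities": smooth waveforms and periodic token-ID runs.
audio = [[math.sin((t + k) / 5) for t in range(50)] for k in range(4)]
text = [[float((t * 7) % 5) for t in range(50)] for _ in range(4)]

for name, data in [("audio-like", audio), ("text-like", text)]:
    wl, wr = train(data)
    print(name, "learned weights:", round(wl, 2), round(wr, 2))
```

The real system predicts learned latent representations rather than raw values, which is what lets one recipe cover speech, text and images, but the selling point is the same: nothing in the training loop is specific to one modality.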
Re: (Score:2)
Re: (Score:2)
curious.
have built
or are building
which is it
oh you mean like a bundle? (Score:1)
import text_ml
import audio_ml
import visual_ml
lols
Re: (Score:2)
Re: oh you mean like a bundle? (Score:1)
"Meta" stands for (Score:3)
"Metastasize"
Muffins, yum (Score:2)
Do an internet search for "Chihuahua or muffin" to see how well this approach works.
Re: (Score:2)
Do an internet search for "Chihuahua or muffin" to see how well this approach works.
Strange you should mention that, because I've been working with my state legislators to solve that problem. The solution was so simple: legally, all chihuahuas will now be named "Muffin".
Re: (Score:2)
Re: (Score:2)
On the minus side, those classifiers now have a designed-in defect. I hope they at least have that documented well.
Use videos. (Score:2)
But that approach is no longer in vogue as researchers found that it was no longer feasible to manually create databases of the sizes needed to train next-gen AIs.
It seems it would be easier to simply use videos of the subject of interest. For one, you can get a lot more information by seeing the subject in all its various states. Cats alone have many different states which can show up in a single video of a cat.
The approach of learning that they are taking is really just speech-to-text which is then pushed into a natural language processor to extract a label. I'm not saying it's bad but the natural language processing is going to be limited in what it can do without having a grasp of word meanings beyond mere objects.
Re: (Score:2)
I'm not saying it's bad but the natural language processing is going to be limited in what it can do without having a grasp of word meanings beyond mere objects.
You've hit on the most interesting problem, and one that is most often completely ignored. That's deriving semantic content from pure syntax. From the summary:
For instance, by reading a thousand books it will learn the relative positions of words and ideas about grammatical structure
This is the easy problem, and something you can easily get from very simple algorithms. It's when we want to get from that to meaning that we hit the wall of impossibility. That doesn't mean you can't still get useful results, but they'll never reach anything like the science fiction myth some people like to push.
Now, because this is an article
Re: (Score:2)
Well, duh. Of course, these AIs don't have legs, hands and dicks. They have no skin in the game, nothing to lose. We, on the other hand, could lose our life or have no kids if we mess up.
My bet is that large language models + RL would be much closer to humans than what we have now.
Re: (Score:2)
Well, NLP will never have any "grasp of meaning". The whole thing is more like a desperate attempt to hide the fundamental stupidity of Artificial Ignorance a bit better to actually make it useful. This will work to a degree, no doubt. It will do absolutely nothing about the fundamental shortcomings of AI, namely that it has no clue, no insight and no understanding. That said, an automaton that can fake some base understanding of things is useful, as long as you do not rely on it too much, because it will fail in
I.e. "learns crappily from all of them" (Score:2)
Artificial Ignorance cannot "learn well". It can only "learn" badly, because all it can do is imitation and averaging, not insight. Putting in more source formats does not change that. Sure, this is useful as it makes AI somewhat cheaper to train, but the results of that training will still be pretty bad.
Re: (Score:2)
Misleading (Score:2)
Re: (Score:2)
My concern is (Score:2)