Meta Researchers Build an AI That Learns Equally Well From Visual, Written or Spoken Materials (techcrunch.com) 22
An anonymous reader quotes a report from TechCrunch: Meta (AKA Facebook) researchers are working on [...] an AI that can learn capably on its own from spoken, written or visual materials. The traditional way of training an AI model to correctly interpret something is to give it lots and lots (like millions) of labeled examples. A picture of a cat with the cat part labeled, a conversation with the speakers and words transcribed, etc. But that approach has fallen out of favor, as researchers found it infeasible to manually create labeled databases of the size needed to train next-gen AIs. Who wants to label 50 million cat pictures? Okay, a few people probably -- but who wants to label 50 million pictures of common fruits and vegetables?
Currently some of the most promising AI systems are what are called self-supervised: models that can work from large quantities of unlabeled data, like books or video of people interacting, and build their own structured understanding of the rules of that system. For instance, by reading a thousand books a model will learn the relative positions of words and ideas about grammatical structure without anyone telling it what nouns or articles or commas are -- it gets there by drawing inferences from lots of examples. This feels intuitively more like how people learn, which is part of why researchers like it. But the models still tend to be single-modal, and all the work you do to set up a self-supervised learning system for speech recognition won't apply at all to image analysis -- they're simply too different. That's where Facebook/Meta's latest research, the catchily named data2vec, comes in.
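The "drawing inferences from lots of examples" idea can be illustrated with a toy next-word model (a minimal sketch for illustration only, not Meta's method): given nothing but raw, unlabeled text, it learns which words tend to follow which.

```python
from collections import Counter, defaultdict

# Toy self-supervised learner: reads raw, unlabeled text and builds
# next-word statistics -- no human labels anywhere.
def train_bigrams(corpus):
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    # Most frequently observed follower of `word`.
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # prints "cat" -- its most common follower
```

Real self-supervised models learn far richer structure than bigram counts, of course, but the supervision signal is the same: the data itself.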
The idea for data2vec was to build an AI framework that would learn in a more abstract way, meaning that starting from scratch, you could give it books to read or images to scan or speech to sound out, and after a bit of training it would learn any of those things. It's a bit like starting with a single seed, but depending on what plant food you give it, it grows into a daffodil, pansy or tulip. Testing data2vec after letting it train on various data corpora showed that it was competitive with and even outperformed similarly sized dedicated models for each modality. (That is to say, if the models are all limited to being 100 megabytes, data2vec did better -- specialized models would probably still outperform it as they grow.)
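The modality-agnostic angle can be sketched in a few lines (assumption: this is a drastic simplification for illustration, not the actual data2vec architecture): the same masked-prediction training loop runs unchanged whether the input sequence happens to be "audio-like" waveform samples or "text-like" token IDs.

```python
import math
import random

random.seed(0)

def train(sequences, steps=5000, lr=0.01):
    # One objective for any modality: mask a value, then learn to
    # predict it from its two neighbors via weights wl and wr.
    wl, wr = random.random(), random.random()
    for _ in range(steps):
        seq = random.choice(sequences)
        i = random.randrange(1, len(seq) - 1)  # position to mask
        pred = wl * seq[i - 1] + wr * seq[i + 1]
        err = pred - seq[i]
        # Gradient step on squared error.
        wl -= lr * err * seq[i - 1]
        wr -= lr * err * seq[i + 1]
    return wl, wr

# Two toy "modalities": smooth waveforms and periodic token-ID runs.
audio = [[math.sin((t + k) / 5) for t in range(50)] for k in range(4)]
text = [[float((t * 7) % 5) for t in range(50)] for _ in range(4)]

for name, data in [("audio-like", audio), ("text-like", text)]:
    wl, wr = train(data)
    print(name, "learned weights:", round(wl, 2), round(wr, 2))
```

The real system predicts learned latent representations rather than raw values, which is what lets one recipe cover speech, text and images, but the selling point is the same: nothing in the training loop is specific to one modality.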
Re: (Score:2)
Re: (Score:2)
curious.
have built
or are building
which is it
oh you mean like a bundle? (Score:1)
import text_ml
import audio_ml
import visual_ml
lols
Re: (Score:2)
Re: oh you mean like a bundle? (Score:1)
"Meta" stands for (Score:3)
"Metastasize"
Muffins, yum (Score:2)
Do an internet search for "Chihuahua or muffin" to see how well this approach works.
Re: (Score:2)
Do an internet search for "Chihuahua or muffin" to see how well this approach works.
Strange you should mention that, because I've been working with my state legislators to solve that problem. The solution was so simple: legally, all chihuahuas will now be named "Muffin".
Re: (Score:2)
Re: (Score:2)
On the minus side, those classifiers now have a designed-in defect. I hope they at least have that documented well.
Use videos. (Score:2)
But that approach is no longer in vogue as researchers found that it was no longer feasible to manually create databases of the sizes needed to train next-gen AIs.
It seems it would be easier to simply use videos of the subject of interest. For one, you can get a lot more information by seeing the subject in all its various states. Cats alone have many different states which can show up in a single video of a cat.
The approach of learning that they are taking is really just speech-to-text which is then pushed into a natural language processor to extract a label. I'm not saying it's bad but the natural language processing is going to be limited in what it can do without having a grasp of word meanings beyond mere objects.
Re: (Score:2)
I'm not saying it's bad but the natural language processing is going to be limited in what it can do without having a grasp of word meanings beyond mere objects.
You've hit on the most interesting problem, and one that is most often completely ignored. That's deriving semantic content from pure syntax. From the summary:
For instance, by reading a thousand books it will learn the relative positions of words and ideas about grammatical structure
This is the easy problem, and something you can easily get from very simple algorithms. It's when we want to get from that to meaning that we hit the wall of impossibility. That doesn't mean you can't still get useful results, but they'll never reach anything like the science fiction myth some people like to push.
Now, because this is an article
Re: (Score:2)
Well, duh. Of course, these AIs don't have legs, hands and dicks. They have no skin in the game, nothing to lose. We, on the other hand, could lose our life or have no kids if we mess up.
My bet is that large language models + RL would be much closer to humans than what we have now.
Re: (Score:2)
Well, NLP will never have any "grasp of meaning". The whole thing is more like a desperate attempt to hide the fundamental stupidity of Artificial Ignorance a bit better to actually make it useful. This will work to a degree, no doubt. It will do absolutely nothing about the fundamental shortcomings of AI, namely that it has no clue, no insight and no understanding. That said, an automaton that can fake some base understanding of things is useful, as long as you do not rely on it too much, because it will fail in
I.e. "learns crappily from all of them" (Score:2)
Artificial Ignorance cannot "learn well". It can only "learn" badly, because all it can do is imitation and averaging, not insight. Putting in more source formats does not change that. Sure, this is useful as it makes AI somewhat cheaper to train, but the results of that training will still be pretty bad.
Re: (Score:2)
Misleading (Score:2)
Re: (Score:2)
My concern is (Score:2)