Friday, January 5, 2018

Semantic Primes

I've been thinking about how to build up a dictionary the way you would if you were trapped on a desert island with someone who didn't speak your language and you wanted to teach it to them from scratch. I got a very interesting book, Semantics: Primes and Universals by Anna Wierzbicka, that discusses the very first words you would use to start such a project. The "semantic primes" are around 40 words that she takes as undefinable but from which a much larger number of words can be defined. She says that these words are special because every human language has a word (or a sense of a word) that directly translates each of them, and because they are among the first concepts that children learn to express. She also briefly describes a simplified grammar for combining these words.

learnthesewordsfirst.com is an online dictionary/lesson plan that builds up English in this way, in layers:

1. It starts with 61 semantic primes, defined mainly by pictures and examples.
2. Using only these 61 words, it defines 300 "semantic molecules."
3. Using only these semantic molecules, it defines the 2,000 words of the Longman Defining Vocabulary.
4. These 2,000 words are in turn used to define the 230,000 words in the Longman Dictionary of Contemporary English.



Now imagine that you found a way to program the meaning of these 61 words into a robot, along with the ability to read a sentence written using only these words and derive the meaning of a new word from it. You could build up to the meaning of every word in the dictionary this way. Programming those first 61 words and the grammar would be challenging, but I don't think it would be impossible.
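The bookkeeping half of that idea (not the hard part of actually understanding a definition) can be sketched in a few lines: treat the primes as known, and repeatedly "learn" any word whose definition uses only known words. This is a toy Python sketch; `learning_order`, the stand-in primes, and the definitions are all hypothetical illustrations, not Wierzbicka's list or the learnthesewordsfirst.com data.

```python
# Toy sketch: can every word be learned starting from a small set of primes?
# The vocabulary and definitions below are hypothetical illustrations,
# not Wierzbicka's list or the learnthesewordsfirst.com data.

def learning_order(primes, definitions):
    """Return the new words in an order in which each one can be learned.

    primes: words taught directly (by pictures and examples), never defined.
    definitions: maps each new word to the list of words in its definition.
    Raises ValueError if some words can never be reached from the primes,
    e.g. because a definition uses an unknown word or definitions are circular.
    """
    known = set(primes)
    pending = dict(definitions)
    order = []
    progress = True
    while pending and progress:
        progress = False
        # Learn every word whose definition uses only words we already know.
        for word, body in list(pending.items()):
            if all(token in known for token in body):
                known.add(word)
                order.append(word)
                del pending[word]
                progress = True
    if pending:
        raise ValueError(f"cannot be defined from the primes: {sorted(pending)}")
    return order


# Stand-ins for primes (plus a few grammatical words), for illustration only.
PRIMES = {"something", "part", "of", "body", "place", "where",
          "many", "people", "live", "touch", "with"}

DEFINITIONS = {
    "hold": ["touch", "something", "with", "hand"],  # needs "hand" first
    "hand": ["part", "of", "body"],
    "town": ["place", "where", "many", "people", "live"],
}

print(learning_order(PRIMES, DEFINITIONS))  # ['hand', 'town', 'hold']
```

Here "hold" is listed first but can only be learned on the second pass, once "hand" is known. The repeated passes keep the sketch short; a topological sort (for example Python's `graphlib.TopologicalSorter`) would do the same job more efficiently on a dictionary-sized vocabulary.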



I don't think this would be sufficient to understand everything about those concepts. Suppose I gave the definition "an arc-shaped fruit that grows in bunches, around 6-12 inches long, with soft white flesh and a skin that is green when unripe, yellow when ripe, and soft and brown when overripe." This would be enough to pick out a banana from any other food in the supermarket, but it wouldn't tell you much about what a banana really looks like; you wouldn't be able to recognize a banana split from such a definition. A good definition generally tells you just enough to distinguish the item from any potential confusers. Still, it would be an excellent start that you could flesh out with other capabilities.
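One way to make "just enough to distinguish the item from any potential confusers" concrete is to treat a definition as a predicate and check which candidates satisfy it. This is a toy Python sketch: the `Food` fields, the shelf contents, and their values are hypothetical, and skin colour is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Food:
    # Hypothetical coarse features; a real definition uses richer descriptions.
    name: str
    is_fruit: bool
    shape: str
    length_in: float
    flesh: str
    grows_in_bunches: bool


def banana_definition(f: Food) -> bool:
    """The post's definition, minus skin colour: an arc-shaped fruit that grows
    in bunches, around 6-12 inches long, with white flesh."""
    return (f.is_fruit and f.shape == "arc" and 6 <= f.length_in <= 12
            and f.flesh == "white" and f.grows_in_bunches)


def matches(definition, candidates):
    """Names of the candidates that satisfy the definition."""
    return [f.name for f in candidates if definition(f)]


# A hypothetical supermarket shelf of potential confusers.
SHELF = [
    Food("banana", True, "arc", 8, "white", True),
    Food("grape", True, "round", 0.8, "pale green", True),
    Food("apple", True, "round", 3, "white", False),
    Food("carrot", False, "cone", 7, "orange", False),
]

print(matches(banana_definition, SHELF))        # ['banana']
print(matches(lambda f: f.is_fruit, SHELF))     # ['banana', 'grape', 'apple']
```

The second call shows a definition that is too loose to discriminate. The first one succeeds, yet nothing in the predicate says what a sliced banana in a banana split looks like, which is the gap described above.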

4 comments:

  1. Yeah, there's just no substitute for qualia, is there?

    1. This is true. But there may be some way of 'faking it' with collections of multimedia and tabular data for each concept, e.g. examples_of and not_examples_of.

      You could have a discriminator (and possibly generator) of instances based on some sort of machine learning, e.g. neural nets.

      Add these into the dictionary, a Schank-ian language of basic relations and stir in some CYC entries and I think you could really have something.

      Way out of my brain league though :(

    2. Is anyone building robots that collect their own observations about real-world objects by just playing with them the way babies do? Learning just how squishy a banana is by having pressure sensors built into its hands; "tasting" things by putting them into a spectrometer; whacking things together to see which are stronger, etc.? That's robots doing data collection by observation even if they're not "experiencing qualia" while they do it. If they then attached that data to their dictionary, it would produce experiential knowledge similar to what humans gain.

    3. This is some work I'm aware of that our collaborators at CMU are doing along the lines you describe:
      https://arxiv.org/pdf/1604.01360.pdf
      I agree that while qualia play some role in how humans understand things, it seems that we should eventually be able to build robots that demonstrate the same level of understanding on any tasks we could propose without actually having qualia.
