Friday, August 30, 2013

My PhD dissertation: Understanding Images

Next month I will be defending my PhD dissertation. I thought you might like to hear what it is about.
There is a difference between knowing a thing and understanding it. This is illustrated by a story related by the philosopher and educator John Dewey. He was talking to the class of another educator and asked, "What would you find if you dug a deep hole in the earth?" When none of the children were able to answer after repeated questioning, the students' teacher explained to Dewey, "You're asking the wrong question." She asked the class, "What is the state of the center of the earth?" to which the class replied unanimously, "Igneous fusion."
The students knew the answer to the teacher's question, but they didn't understand the answer. The facts in the students' heads weren't connected to related facts in a way that would let them answer a question phrased differently.
So can a computer really understand something? It's easy to make a computer program that will answer "Albany" when you ask it to name the capital of New York. But if that fact isn't connected to anything else, the answer is essentially meaningless.
My field is computer vision: writing software that takes images or video as input, and tells something about the scene as output. Normally, this is just a label for part of the image, like "person" or "sky." In fact, to save effort, I usually just output a particular color and have a key sitting off to the side saying, "red means person, yellow means road, etc..." As far as the program is concerned that color is a meaningless symbol.
After working with some colleagues in Europe, I started to think about what you would need to do in order to go beyond that, to make a program that really understood what it was seeing.
Now, in a way, this is clearly impossible. A computer can't experience visual sensations the way we do. It isn't conscious, it can't know what it is really like to see the color red. At the most fundamental level, the color red is just [255 0 0] for the computer. That's a weird, deep philosophical issue. So I changed the question a little. I asked, "How can I build a machine that passes tests of understanding?"
I used to be a teacher, and one of the things they taught us was that there are different kinds of questions you can ask on a test. One kind of question tests for knowledge, and another kind tests for understanding. If you want to test whether a student knows what is in a picture, you ask them to label the items depicted. If you want to know whether a student understands a picture, you ask them questions that require the use of knowledge they already have from another source. So in order to make a program that can answer this second type of question, I needed to give it a source of knowledge about the world and the ability to connect what it was recognizing in the image to what was already stored in its memory.
First, though, I needed the program to be able to recognize what was in the image. Imagine you are asked to describe a scene. You might say, "There is a girl running on a gravel road eating blue ice cream." This description contains:

  • concrete nouns: girl, road, ice cream
  • concrete verbs: running
  • concrete adjectives: gravel, blue
So I wrote programs to do each of these things: a program to label objects (like person and road), a program to label materials (like gravel) and colors, and a detector for activities. (Actually, the activity detector still needs some work: I can get it to work if I have a Kinect sensor, but not with just video.) Other people had written programs to do these things before, but I invented my own, new ways of doing each of them.
Now, you shouldn't overestimate what these programs can do. I can't just hand it an arbitrary picture and expect it to tell me everything in that picture. Instead, I can pick maybe a dozen nouns, give it examples of those things, and it can recognize those pretty well in a new image. But that's okay, I think. It's the job of someone like Google or Microsoft to go to all the work of making it able to handle lots of words, and other researchers can figure out ways to make them more accurate. I'm just trying to show it can be done at all.
Once I had it where it could make these descriptions of the scene, I connected it up with something called "ResearchCyc." Cyc is one of the longest running projects in computer science. Its goal is to encode everything we know as part of common sense. I wrote a little program to take these descriptions of the scene, and express them in a language Cyc can work with. 
With all these parts combined, you can ask questions like, "What is there in this scene that might catch on fire if exposed to a flame?" The program will then reason something like this:
  • There is an object 89% likely to be a fence in the scene.
  • The fence is 77% likely to be made of wood.
  • There are objects 93% likely to be trees in the scene. These objects are 62% likely to be concrete [because they happen to look smooth and grey in this particular picture] and 35% likely to be wood.
  • All trees are made of wood. [It has never heard of fake concrete trees and thinks that things are what they appear to be. It's kind of like a little kid that way.]
  • Therefore, these trees are made of wood.
  • Wood things are flammable.
  • Flammable things catch on fire if exposed to a flame.
  • Therefore, the fence at location (50, 127) and the trees at location (156, 145) might catch fire if exposed to flame.
In fact, it can report that chain of reasoning and answer in (pretty good) English, because it has canned translations for each of the concepts it knows into English. Unfortunately, it doesn't go the other way yet: you have to write your questions in the language Cyc understands, instead of just writing them in English.
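The chain of reasoning above can be sketched in a few lines of Python. This is only an illustration, not the actual program and not Cyc's own language: the rule tables, the function names `material_of` and `flammable_objects`, and the choice to let a general rule override a visual guess are all assumptions made for the example.

```python
# Hypothetical detections: (label, label confidence, location, material guesses).
detections = [
    ("fence", 0.89, (50, 127), {"wood": 0.77}),
    ("tree", 0.93, (156, 145), {"concrete": 0.62, "wood": 0.35}),
]

# Toy stand-ins for common-sense rules (not real Cyc assertions).
ALWAYS_MADE_OF = {"tree": "wood"}   # "All trees are made of wood."
FLAMMABLE_MATERIALS = {"wood"}      # "Wood things are flammable."


def material_of(label, guesses):
    """Prefer a general rule about the object over the visual material guess."""
    if label in ALWAYS_MADE_OF:
        return ALWAYS_MADE_OF[label]
    return max(guesses, key=guesses.get)


def flammable_objects(detections, min_confidence=0.5):
    """Return (label, location) pairs that might catch fire near a flame."""
    results = []
    for label, confidence, location, guesses in detections:
        if confidence < min_confidence:
            continue  # too unsure about what the object is
        if material_of(label, guesses) in FLAMMABLE_MATERIALS:
            results.append((label, location))
    return results


answer = flammable_objects(detections)
print(answer)
```

Note that the tree is reported as flammable even though it looks more like concrete: the general rule wins over the appearance, just as in the reasoning listed above.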
There's still a lot of work to do. Despite all the work that has been done, Cyc still has huge gaps of things it doesn't know. What we would really like would be some way to automatically build a resource like Cyc from analyzing web pages, or from a robot interacting with the world. That's still pretty hard, but programs like IBM's Watson, the Jeopardy champ, make it seem like it might be reachable in the next decade or two. One thing my program can't do at all is guess why someone is doing something. It doesn't have any kind of model of what is going on in people's heads. It can't perform visual reasoning in any way at the moment. My hope is that other researchers will see what I've done and say, "okay, technically this can answer some understanding questions, but it's really quite lousy. I can do better." If enough people do that, we might get something useful.

Here are some other cool things the program can do:

  • draw like a child (well, in one way) http://machinamenta.blogspot.com/2013/08/drawing-like-child.html
  • recognize that there is probably a window in the scene even though it doesn't have a window detector. How? It knows that buildings and cars usually have windows, and it can detect those.
  • recognize that it is probably looking at an urban scene or a rural scene or an indoor scene based on what kinds of things it can see.
  • guess that something that might be a dog or might be a cow (it can't tell which) is a mammal, a quadruped, an animal, a living thing, able to move, and so forth. This is called finding the least common superclass.
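Here is a minimal sketch of the least common superclass idea from the last item. The tiny hierarchy and the function names are made up for illustration. A real knowledge base like Cyc lets a concept have several parents at once (mammal and quadruped, say), which this single-parent toy does not capture.

```python
# A toy "is-a" hierarchy (child -> parent); the categories are illustrative only.
PARENT = {
    "dog": "mammal",
    "cow": "mammal",
    "sparrow": "bird",
    "mammal": "animal",
    "bird": "animal",
    "animal": "living thing",
    "tree": "plant",
    "plant": "living thing",
}


def ancestors(concept):
    """Return the concept followed by all of its superclasses, nearest first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def least_common_superclass(a, b):
    """Return the most specific class that both concepts belong to, or None."""
    b_ancestors = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in b_ancestors:
            return candidate
    return None


print(least_common_superclass("dog", "cow"))
print(least_common_superclass("dog", "sparrow"))
print(least_common_superclass("cow", "tree"))
```

Everything true of the shared superclass (and of its own superclasses) can then be asserted about the object, even though the detector never decided between dog and cow.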







Wednesday, August 21, 2013

Excerpt 8: Divination and Games


This same pattern played out again and again, in China, Europe, Babylonia and the Americas as well as Africa: games of chance and skill, with their discrete states and physical markers, were invariably associated with divination.
Upon comparing the games of civilized people with those of primitive society many points of resemblance are seen to exist, with the principal difference that games occur as amusements or pastimes among civilized men, while among savage and barbarous people they are largely sacred and divinatory. This naturally suggests a sacred and divinatory origin for modern games, a theory, indeed, which finds confirmation in their traditional associations, such as the use of cards in telling fortunes.[1]
When we think of divination as a kind of game, as a way of generating new sentences from thin air, the problem of predictive accuracy is marginalized. The system was generally set up so that whatever sentence was generated would be a true sentence, because the truths encoded in the system were general truths, applying universally.
…The experiential (both recreational and revelatory) value of divination and board-games is that they create an unlimited variety of vicarious experiences, i.e. stories. Spinning relevant, even illuminating and redeeming stories out of the raw material which the fall of the apparatus in combination with the interpretative catalogue provides, is the essence of the diviner’s skill and training; and in the same way board-games can be seen as machines to generate stories. [2]
Nearly all of the ancient board games were associated with divination at one time or another.
Senet: Senet was a board game played in Egypt from around 3500 BC. Tomb paintings show the importance that Egyptian society placed on the game. A successful player of Senet was assumed to be under the protection of Ra and Thoth, since the chance fall of the throwing sticks was believed to be under their control. (For this reason Senet boards are often found among the items buried to be taken into the afterlife.)

The Royal Game of Ur: Dating to about 2600 BC, this game was played in Mesopotamia. Like Senet, it was a race game something like Backgammon. This game had certain squares thought to bring good fortune.
Go: The most prominent Chinese board game, Go was invented by the third century BC. The Go board was also used for divination, by casting the black and white stones and analyzing the patterns of how they fell. As Ban Gu described in The Essence of Go in the first century AD, “The board must be square and represents the laws of the earth. The lines must be straight like the divine virtues. There are black and white stones, divided like yin and yang. Their arrangement on the board is like a model of the heavens.” As in Mancala, the patterns were associated with a model of the world.

Chess: There are multiple theories on the origin of chess, but one possibility is that it stems ultimately from Chinese divination methods. Joseph Needham, the historian of Chinese science, writes:
The game of chess (as we know it) has been associated throughout its development with astronomical symbolism, and this was more overt in related games now long obsolete. The battle element of chess seems to have developed from a technique of divination in which it was desired to ascertain the balance of ever-contending Yin and Yang forces in the universe…. It appears that the pieces on the board in this divination technique represented the sun, moon, planets, stars, constellations, etc. The suggestion is that this “game” passed to 7th-century India, where it generated the recreational game conceived in terms of battling human armies… “Image-chess” derived in its turn from a number of divination techniques which involved the throwing of small models, symbolic of the celestial bodies, on to prepared boards. Thus there was a dice element as well as a move element, and there were many intermediate forms between pure throwing and placement followed by combat moves. All these go back to China of the Han and pre-Han times, i.e. to the -4th or -3rd century, and similar techniques have persisted down to late times in other cultures.[3]
Dr. John Dee, astrologer to Queen Elizabeth I, invented a four player chess variant called “Enochian Chess,” which was designed explicitly for use in divination. Unlike random divination, it was thought that players could influence the outcome of fate through their actions on the board.

Cards: Playing card games are associated with the development of fortune-telling via Tarot cards (from which the common playing card is a simplified derivative). In the 1500s in Italy, a dealt hand of cards was used as a kind of random poetry generator. The poet would need to fit the images on the cards or their meanings into his poem. This practice was known as “tarocchi appropriati.” The fortune telling aspect of Tarot cards seems to have evolved from this game.

Divination and Mathematics
These games and divination systems are remarkably old. Consider the die used in most games of chance: the reason it has pips instead of numbers on the faces is that the die settled into its present form before the invention of Arabic numerals.
Divination drove the development of mathematics: much of Mayan, Egyptian, and Babylonian mathematics was used for astrological purposes. For example, our measurement of time and angles comes from Babylonian astrologers’ division of the heavens in their base 60 system.[4] The most advanced mechanical computers from Greek and from Arab inventors in the ancient world were complex representations of the heavens, used for navigation and astrology. The Antikythera mechanism (often called the first mechanical computer) is the best known of these, as few others have been preserved. Found in a shipwreck and dating from around 200 BC, it showed the position of all the known planets, the sun, and the moon, requiring over 30 gears to do so. Modern scientists, who find such a device fascinating for the level of mechanical sophistication it displays, seem reluctant to admit that the only practical use such a device could have had was casting horoscopes and determining auspicious days. Watching how the planets move back and forth around the wheel of the zodiac on a re-creation of this device, it is not hard to see how such an irregular motion would give the impression of an intelligent and willful plan being acted out. Early attempts by archaeologists to understand the device focused on the words inscribed on it, and were unsuccessful. It was only when an attempt was made to understand the gearing system that the meaning of the device was recovered.
Later, it was the analysis of games of chance that led to the development of probability theory and statistics, which are key components of most modern AI systems, since absolute reasoning is often too brittle to deal with real-world situations.
The combinatoric principles of the I Ching[5] and the geomantic divination (introduced at the beginning of this chapter) inspired the 17th century philosopher Gottfried Leibniz to develop binary notation. These binary codes are found in other divination systems around the world, such as the African Ifa or Sikidy systems of divination. In recent years, the fields of “ethno-mathematics” and “ethno-computation” have begun studying these cultural artifacts to explore the mathematical ideas of non-Western cultures. [6]
Elements of recursion play a large role in these games and divination systems, where the state resulting from a series of actions is the beginning point for the same series of actions, performed again and again. In Mancala, for example, the game is played by choosing one pit, scooping up all the seeds from the pit, and planting one in each of the following pits. The object of the game is to be the first to get all of one’s seeds into the final pit. One strategy to do this is to find a pattern that persists over time, so that the seeds in multiple pits move together in a train. These patterns were discovered and used by experienced players across Africa. In the field of cellular automata, this is known as a “glider.”[7] As a form that maintains itself as it moves through a space divided into discrete cells, it is an important component in the study of these computational systems, a study which only began in the 1940s as computers were invented.
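The "train" of seeds is easy to reproduce. The sketch below assumes a simplified board (a single ring of pits, one seed planted per pit, no captures or other rules from any particular Mancala variant) and a hypothetical helper named `sow`. Starting from pits holding 3, 2, and 1 seeds and always sowing the rearmost pit, the same pattern reappears one pit further along after every move.

```python
def sow(pits, start):
    """Scoop every seed from pits[start] and plant one in each following pit."""
    pits = list(pits)
    seeds = pits[start]
    pits[start] = 0
    for offset in range(1, seeds + 1):
        pits[(start + offset) % len(pits)] += 1
    return pits


board = [3, 2, 1, 0, 0, 0, 0, 0]
position = 0
history = [board]
for _ in range(3):
    board = sow(board, position)  # always sow the rearmost pit of the train
    position += 1
    history.append(board)

for row in history:
    print(row)
```

Each printed row shows the 3-2-1 group shifted one pit to the right, which is the self-maintaining behavior that the term "glider" describes.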
These connections to mathematics are a natural extension of the representational nature of the tokens and spaces used in board games and divination. As a simpler system than the real world, it provided a fertile ground to begin development of mathematical ideas.




[1] Stewart Culin, Gambling Games of the Chinese in America, 1891
[2] Wim van Binsbergen, ibid.
[3] Thoughts on The Origin of Chess by Joseph Needham, 1962
[4] Because 360 is a nice round number near to the number of days in the year in base 60, the ancient Babylonians divided the sky into 360 degrees. (This is easy to accomplish using a compass and straightedge.) The fact that we use 24 hour days and 60 minute hours also derives from this way of dividing up a circle.
[5] The I Ching or Book of Changes is a Chinese method of divination that involves casting small sticks that can land in one of two possible ways. Based on the binary pattern formed by several of these casts, a fortune can be looked up in a book (thus the name).
[6] Viznut, “The Mystery of the Binary,” [Alt] Magazine, 2003
[7] Ron Eglash, African Fractals, 1999 

Tuesday, August 20, 2013

Excerpt 7

Most machines have predictable output. The mill, the clock, the engine, each has a cycle that is unvarying and expected. Even in prehistoric times, however, people built a different kind of machine, devices that were generative: they produced original results not explicitly intended by their creators. One very early example is the family of divination systems used throughout Africa called geomantic systems. These are still in wide use today, and we know from inclusions in burials that they were already old when the Egyptian dynasties were just beginning. They were largely virtual machines, or software: a set of rules that if followed exactly would provide a result, rather than a physical apparatus that applied those rules.

The “hardware” of these systems is extremely simple: a grid of squares drawn in the dirt with a stick, or an array of pits dug into wood or stone, along with a handful of different colored markers. There are many variations throughout Africa and the Middle East, with layers of complexity built up over time. A typical example of their use would go something like this: the fortune-teller takes a handful of seeds and drops a few into each pit. The seeds are removed from each pit in pairs, leaving either one or two seeds in each pit. This binary code is recognized by name and used to pick out an answer to the query from a memorized structure. The code is sometimes related to the appearance of the symbol string. For example, in one system the pattern 2-1-1-1, bearing a resemblance to a flag on a flag-pole, carries a meaning of exultation (in the table below this pattern is labeled Caput Draconis, perhaps because of its resemblance to the head of the constellation Draco).
These simple patterns composed of four binary symbols are generated in groups, and the elements are recombined to derive new patterns, such as taking the first symbol from the first pattern, the second symbol from the second pattern and so forth to form a derived daughter pattern from the original mother patterns. The daughter patterns then could be recombined using addition (mod 2) to form yet another new pattern, whose meaning modifies that of the original pattern. The details are strictly passed down within a tradition, but variants exist across Africa and the Middle East. The patterns are associated not only with an interpreted meaning, but with the planets, the elements, the gods, the points of the compass, the signs of the zodiac, and so forth.
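As a rough sketch of the procedure just described, the Python below casts four mother patterns, derives daughters from them, and combines two patterns by addition mod 2. The details are assumptions: the function names are invented, the handful size is arbitrary, and the daughters are formed by the transposition used in traditional geomancy (the first daughter takes the first line of each mother, the second daughter the second line, and so on), which is one reading of the description above.

```python
import random


def cast_line(rng):
    """Drop a random handful of seeds, remove pairs, and keep the 1 or 2 left."""
    handful = rng.randint(1, 12)
    return 1 if handful % 2 else 2


def cast_mothers(rng, count=4, lines=4):
    """Generate the mother patterns, each a list of 1s and 2s."""
    return [[cast_line(rng) for _ in range(lines)] for _ in range(count)]


def daughters(mothers):
    """Daughter k takes its i-th line from the k-th line of mother i."""
    return [list(column) for column in zip(*mothers)]


def combine(a, b):
    """Add two patterns line by line: an odd sum leaves 1 seed, an even sum 2."""
    return [1 if (x + y) % 2 else 2 for x, y in zip(a, b)]


mothers = [[2, 1, 1, 1], [1, 2, 2, 2], [2, 2, 1, 1], [1, 1, 2, 2]]
print(daughters(mothers))
print(combine(mothers[0], mothers[1]))
random_mothers = cast_mothers(random.Random(0))
```

Treating two seeds as "even" and one seed as "odd" is what makes the combination step the same thing as adding binary digits mod 2.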
Where did such a system come from? Anthropologists can only speculate, but the same block of pits and seeds is also used for other purposes in these societies. For an illiterate population, it is a way of performing addition, subtraction, multiplication and division in a concrete way that all parties can verify, by literally reenacting the event being calculated with a single seed standing for a single item.[1] It performed functions of rewritable memory that had previously only been possible within the brain. Before the invention of writing or numbers, it was a system of symbols that represented other goods, that remapped time and space into an abstract world, with its own discrete units of space and time.


Pattern   Name             Meaning
::::      Populus          People
····      Via              Way
:::·      Tristitia        Sadness
·:::      Laetitia         Joy
::··      Fortuna Major    Greater Fortune
··::      Fortuna Minor    Lesser Fortune
:·:·      Acquisitio       Profit
·:·:      Amissio          Loss
·:··      Puella           Girl
··:·      Puer             Boy
·::·      Carcer           Prison
:··:      Conjunctio       Connection
::·:      Albus            White
:·::      Rubeus           Red
:···      Caput Draconis   Head of the Dragon (heaven)
···:      Cauda Draconis   Tail of the Dragon (the underworld)
modified from Games of the Gods by Nigel Pennick, p. 55-63

A device which automates the steps of geomantic divination has been preserved in the British Museum. Built in 1241 in Damascus, it is a beautiful rectangular framed structure, made of brass and covered with inscribed dials, built by the metalworker Muhammad ibn Khutlukh al-Mawsili. Based on the setting of four dials to a set of binary patterns, further derived dials are set and a large rainbow-shaped area at the bottom displays the meaning of the pattern and an answer to the question being asked. The face is inscribed with the message:
I am the revealer of secrets; in me are marvels of wisdom and strange and hidden things. But I have spread out the surface of my face out of humility, and have prepared it as a substitute for earth.… From my intricacies there comes about perception superior to books concerned with the study of the art.

From this inscription we can see that the device was personified, yet was presented without pretence as a machine. Whether using a mechanical device or simply following a set of rules, the petitioners believed that a mechanical process could behave in an intelligent way. This was their central discovery: that ideas could be held in objects, and by manipulating those ideas mechanically, one could learn something new.
The connections with modern computers are more than coincidences. The same features that made a pitted board useful for tracking heads of cattle also made it ideal for playing a game and for divination: external symbols that both players could refer to. Another use of the same type of pits and seeds was the board game Mancala. In Mancala, the pits are said to represent fields and the markers represent seeds being sown. In this use as well, we see the board acting as a model of another activity, a simplified model with continuous space and time replaced with discrete divisions. The seeds and pits resemble the paper tape and marks along it that Turing imagined in his seminal paper on computation. Regarding this relationship between games and divination devices, anthropologist Wim van Binsbergen writes:
Both material divination systems and board-games are formal systems, which can be fairly abstractly defined in terms of constituent elements and rules relatively impervious to individual alteration. Both consist in a drastic modeling of reality, to the effect that the world of everyday experience is very highly condensed, in space and in time, in the game and the divination rite. The unit of both types of events is the session, rarely extending beyond a few hours, and tied not only to the restricted space where the apparatus (e.g. a game-board, a divining board or set of tablets) is used but, more importantly, to the narrowly defined spatial configuration of the apparatus itself. Yet both the board-game and the divination rite may refer to real-life situations the size of a battle field, a country, a kingdom or the world, and extending over much greater expanses of time (a day, a week, a year, a reign, a generation, a century, or much more) than the duration of the session. In ways which create ample room for the display of cosmological and mythical elements, divination and board-games constitute a manageable miniature version of the world, where space is transformed space: bounded, restricted, parcelled up, thoroughly regulated; and where time is no longer the computer scientist’s “real time” — as is clearest when divination makes pronouncements about the past and the future. Utterly magical, board-games and divination systems are space-shrinking time-machines. [2]
Considered as a way to predict the future, any existing form of divination will be little better than chance. Its interest for our purposes lies not in its accuracy, but in the way it brought people to confront the issues of artificial generation of meaning millennia before the invention of computers. As a way of holding information and allowing it to be manipulated, these techniques provided a way of working out possibilities in a safe space.






[1] The first abaci were drawn in the sand and used pebbles as counters, and later used pits and grooves carved in wood. The word abacus comes from the Hebrew abaq, meaning “dust.” The pebbles (calculi) used in the Roman abacus are the origin of words such as calculate.
[2] Wim van Binsbergen, Board-games and divination in global cultural history (web page), 1997

Monday, August 19, 2013

Excerpt 6

Brewster was not the only one exploring the relationship between beauty and technology at the time. Mary Boole was fascinated by the mechanization of logic that her husband George and his colleagues were developing[1]. She did not have the opportunity to study mathematics herself (as she points out, women were not allowed to attend college in her day) but was eager to discuss these ideas and try to understand them in a larger context of aesthetics and religion.
Within the last generation we have gained a “Calculating-Engine,” a “Calculus of Logic” (with many and widespread applications), and a “Logical Abacus;” and we are fast discovering means of making the generation of the most complicated and beautiful curves as mechanical a process as Logic has become. Of what are these inventions a sign? The reasoning-machines of Babbage and Jevons, and the sympalmograph, and other inventions for illustrating the mathematical genesis of beauty, seem to me to have brought to a reductio ad absurdum the worship of intellectual power and artistic genius.[2]
As the machinery behind thought and artistic creation became understood, she thought, they would be valued less. The 19th century German philosopher Arthur Schopenhauer expressed a similar idea: “That arithmetic is the basest of all mental activities is proved by the fact that it is the only one that can be accomplished by a machine.”

The Harmonograph

The sympalmograph or harmonograph that Mary Boole spoke of was a device for tracing out the path of combined harmonic vibrations. The pattern of Lissajous curves it traced out was a mathematically constrained image that was widely admired for its beauty. Mary Boole was especially interested in how mathematical curves could lead to the beautiful forms of leaves or flowers, a theme taken up in our century in such books as The Algorithmic Beauty of Plants, or The Fractal Geometry of Nature.

Multiple versions were built, initially simple fragile oscillatory pendulums, later ones with careful arrangements of brass gears. Various inventors designed ones of the latter type which could render the same image twice with slightly offset phases, so that the pair, when viewed through a stereoscope, rendered a 3D curve in space. Photographer Charles Bentham devised a way to create similar figures photographically with long exposures, writing:
It was generally judged that “successful” harmonograms required the lengths of the two pendulums to be in a ratio that gives their swings a simple frequency ratio (such as 1:4 in length, which gives 2:1 in frequency). The chaotic curves which result from untuned lengths were judged to be less interesting and “out of harmony.”
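A harmonograph curve can be approximated numerically. The sketch below assumes the usual idealized model, which is not taken from any historical design: two perpendicular simple pendulums, each a sine wave whose amplitude decays exponentially, with frequency proportional to one over the square root of the length. The function names, damping value, and sample counts are arbitrary choices for illustration.

```python
import math


def pendulum_frequency(length_m, g=9.81):
    """Small-swing frequency (Hz) of an ideal simple pendulum of the given length."""
    return math.sqrt(g / length_m) / (2 * math.pi)


def harmonogram(length_x, length_y, phase=math.pi / 2, damping=0.02,
                duration=60.0, steps=6000):
    """Return (x, y) points traced by two perpendicular, slowly decaying pendulums."""
    fx = pendulum_frequency(length_x)
    fy = pendulum_frequency(length_y)
    points = []
    for i in range(steps + 1):
        t = duration * i / steps
        decay = math.exp(-damping * t)
        x = decay * math.sin(2 * math.pi * fx * t + phase)
        y = decay * math.sin(2 * math.pi * fy * t)
        points.append((x, y))
    return points


# Lengths in a 1:4 ratio give swings whose frequencies are in a 2:1 ratio.
ratio = pendulum_frequency(0.25) / pendulum_frequency(1.0)
print(round(ratio, 6))
curve = harmonogram(0.25, 1.0)
```

Plotting `curve` with any charting tool gives a slowly shrinking Lissajous figure. Changing one length slightly (say 0.25 to 0.27) detunes the pendulums and produces the kind of drifting curve described as "out of harmony."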
The kaleidoscope and sympalmograph are examples of a particular type of device that is intended to generate new works of art. Each one follows the same pattern:
  • Hand-selected forms to be recombined. Brewster at various points recommends buttons, bits of broken glass, a distant bonfire, dancers, coins, engravings, gems, polarized lenses, and flowers.
  • Random or nearly random input. In the kaleidoscope, this comes from shaking the bits of glass.
  • Formal constraints. The kaleidoscope uses mirrors to impose symmetry.
The same pattern appears in software. For example, a Markov poetry generator takes a set of words (preselected for the intended effect, such as all words used in works by a given author) and recombines them randomly, while imposing constraints drawn from patterns of word frequency. Fractal generators use slightly more sophisticated symmetry constraints on the randomness. Similar examples can be found for music, such as David Cope’s EMI program.
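To make the three ingredients concrete, here is a minimal word-level Markov generator. It is a generic sketch, not any particular published program: the source text is the hand-selected material, `random.Random` supplies the random input, and the table of which words follow which is the formal constraint. The sample text and function names are invented for the example.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the source text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, rng):
    """Walk the chain from `start`, choosing each next word in proportion to
    how often it followed the current word in the source."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # the current word never had a successor in the source
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


source = "the rose is red the violet is blue the rose is sweet and so are you"
chain = build_chain(source)
line = generate(chain, "the", 8, random.Random(1))
print(line)
```

Every pair of adjacent words in the output occurred somewhere in the source, so the result sounds locally like the original even though the line as a whole is new.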

Is the Kaleidoscope Creative?

When the kaleidoscope first appeared, it was hugely popular among all segments of society: rich and poor, old and young. Within a few years, however, the appeal had greatly diminished. We now see a kaleidoscope as a toy for young children. It is almost as if it were an infectious disease like chicken pox, a fascination that overtakes each of us the first time we are exposed, but one we gradually become accustomed to. After a short time new kaleidoscope images fail to add anything new to the already formed impression. Instead of seeing individual works, our brains pick up on the underlying pattern that unites all the images formed. While at first it seemed that the kaleidoscope was being creative, it later becomes apparent that a store of creativity injected, as it were, during its creation has merely been allowed to leak out slowly.
Simple recombinative novelty, then, isn’t the only thing a device needs in order to seem truly creative to us.
What I would like to do in the following chapters is explore just what it is that separates human creativity from the limited kind of pseudo-creativity exhibited by the kaleidoscope. These questions are mixed in with a tour of similar devices found in many different fields throughout history.
  • How can a machine evaluate the quality of the work it produces? Can a machine be built that in some sense understands the meaning of its own output?
  • How does the free will of the artist affect artwork and our perception of it? When an artist and a viewer perceive artwork, what kinds of processes occur in our minds, and can they be automated, even in principle?
  • What influences our perception of beauty, art, and creativity? How can we define these in a rigorous way? Can a definition ever be provided for something that by its very nature is about discovering how to go beyond previous limits?
In the final chapter I propose a design for a machine that would incorporate a few of the more modest of these goals. Such a machine would have the ability to interpret its own products in a kind of aesthetic framework and make decisions about how to revise its output to make it more appealing.




[1] See Chapter VII for more about George Boole’s logical calculus.
[2] Mary Everest Boole, Symbolical Methods of Study, 1884, p. 32. Not everyone came to the same conclusion:
“That art which is above all others a cultured art—that which aims at the production of symmetrical form, and the beauty which is geometrical, of man, rather than irregular, of nature, is largely a matter of machinery. For the natural and inevitable tendency of machinery is to produce symmetry…. It seems strange that many are but now awaking to the consciousness that machine-made articles need not be ugly.
But even when men have so awakened, the degrading influence of overmuch faith in machinery makes itself felt. Who can contemplate without a shudder the Corinthian cast-iron pillars of a railway station? The common mind, when it finds that some artistic process can be performed by machinery, at once jumps to the conclusion that art itself is, or can be made, a matter of machinery. This recalls the familiar story of the organ-blower, who remarked after a beautiful voluntary, ‘Ah, what fine music we do make, to be sure.’” (The City of London school magazine 1877)
[3] Charles Bentham, “Immaterial Solids,” The Process Photogram, Vol. 12, p. 49

Monday, August 12, 2013

Drawing Like a Child


Most non-photorealistic rendering techniques leave the proportions and projection of the representation essentially unchanged. Evidence from studies of children's drawings and naive adults (untrained in making art) shows that the human process of making images works very differently from any of these techniques. As Martin Gardner writes, "[A] child wants...and is perhaps driven to invent, graphic equivalents for those categories that occupy her thought processes; and so it becomes natural for her to develop a formula, or prototypical schema, which can represent or stand for the full range of instances of this category." It is only with artistic training and education about the effects of perspective, tricks for seeing the scene as blobs of color rather than individual objects, and measuring and copying proportions that artists are able to reproduce images in true proportion, the way that is most easily realized by automatic computer techniques.


This figure shows 21 images of a particular street scene drawn by children ranging from age 3 to age 13. The oldest children (one of the eight-year-olds, one of the 10-year-olds, and both children 11 and older) attempted to capture the strong perspective on the street by drawing converging rather than parallel edges. The younger children's drawings all show the street as seen from directly above. Yet none of them have ever seen this particular street from that perspective. Instead, it seems like the drawing process the children are engaged in is something like the following: 

  1.  Look at the picture.
  2. Notice and recognize the most salient objects. (This may be very different for various observers. Johnny (age 4) apparently noticed the vehicle, the fence, and the sidewalk, while Maggie (also age 4) noticed the two figures, the tree, two buildings, and the sidewalk and road.)
  3. Use pre-learned techniques for representing these objects, introducing only small variations to match what is present in the scene. (For example, Elena (age 6) draws conventional stick figures, but extends the arm of one around the shoulder of the other to match what she believed she saw in the image.)

The younger children have learned such a technique for drawing a house but not for other kinds of building, so they use the house representation because it is conceptually the closest one they have. All but one of the children age 5 or older also copied the road markings, perhaps because they had the ability to represent them more or less accurately.
Even in the drawings of the older children, traces of this method are still present. In the drawings by the 11- and 13-year-olds, attempts are made to capture the perspective on the buildings. Yet certain angles that ought to be acute in the perspective projection are instead drawn as right angles. It seems likely that the artists knew from past experience with buildings that these angles must be 90 degrees, and that this knowledge informed their drawing, forcing accuracy in other areas to be compromised as they tried to reconcile the two implicit perspectives. Daniel (age 10) spontaneously commented as he was doing the drawing that he "didn't know how" to draw the cars from an angle.
In 1997, J. Willats performed similar experiments in which young children depicted a scene of a die. They often did this by including all sides of the die they could see within a single containing rectangle. He wrote, "it suggests that analogies between picture production by either photography or computer graphics and human picture production... are grossly oversimplified.... pictures can be derived from either object-centered descriptions or viewer-centered descriptions... the presence of characteristic anomalies...may provide the only available evidence about the nature of the production process."
Discussing these experiments, Fredo Durand wrote, "This might seem like a very odd example due to the lack of skill. In fact, this is a caricatural but paradigmatic demonstration of a very fundamental principle of depiction: Depiction is not about projecting a scene onto a picture, it is about mapping properties in the scene to properties in the picture. Projection happens to be a very powerful means to obtain relevant mappings, but it is not the only one, and it is not necessarily the best one." 

Automatic Generation of Child-like Visual Representation
A simplified model of the production processes that children employ in making drawings of a scene was easily implemented within the automated image comprehension framework. Using the scene parsing software, a description of the objects in the scene and some information about their spatial relationships is stored in a microtheory in the Cyc ontology. Each object category has a pre-stored schema for generating an image of that category. (In this case, the schema is simply a stored bitmap, but a more realistic model would store it as a series of hand motions to create particular shapes with strokes of the pen, along with the spatial relationships of those shapes.) Using these schemas and the information about how the objects are distributed around the image relative to each other, a child-like representation is then generated. An example generated image and its associated input is shown in the figure below.
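The process described above can be sketched roughly in Python. This is only an illustration under stated assumptions, not the code used in the dissertation: the `SceneObject` class, the `SCHEMAS` and `NEAREST_SCHEMA` tables, and the `render` function are all hypothetical names, the "bitmaps" are tiny character grids, and the parsed scene is a plain list rather than a Cyc microtheory.

```python
from dataclasses import dataclass

# Hypothetical stored schemas: one tiny character "bitmap" per category.
SCHEMAS = {
    "house": [" ^ ", "/_\\", "|_|"],
    "person": [" o ", "/|\\", "/ \\"],
    "tree": [" * ", "***", " | "],
}

# Hypothetical fallback table: a category with no schema of its own borrows
# the conceptually closest one (a building is drawn as a house).
NEAREST_SCHEMA = {"building": "house"}


@dataclass
class SceneObject:
    category: str  # label assigned by the scene parser
    x: float       # horizontal centre, 0.0 (left) to 1.0 (right)
    y: float       # vertical centre, 0.0 (top) to 1.0 (bottom)


def render(objects, width=40, height=12):
    """Paste each object's stored schema at its rough position in the scene."""
    canvas = [[" "] * width for _ in range(height)]
    for obj in objects:
        category = NEAREST_SCHEMA.get(obj.category, obj.category)
        schema = SCHEMAS.get(category)
        if schema is None:
            continue  # nothing learned for this category, so it is left out
        h, w = len(schema), len(schema[0])
        # Centre the schema on the object, clamped so it stays on the canvas.
        left = min(max(int(obj.x * width) - w // 2, 0), width - w)
        top = min(max(int(obj.y * height) - h // 2, 0), height - h)
        for r, row in enumerate(schema):
            for c, ch in enumerate(row):
                if ch != " ":
                    canvas[top + r][left + c] = ch
    return ["".join(row) for row in canvas]


scene = [
    SceneObject("building", 0.25, 0.5),
    SceneObject("person", 0.75, 0.5),
    SceneObject("lamppost", 0.5, 0.5),  # no schema and no fallback: skipped
]
print("\n".join(render(scene)))
```

Only the category labels and approximate positions survive from the input; proportions and projection come entirely from the stored schemas, which is the feature of children's drawing the model is meant to capture.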

Although this model doesn't capture everything a child is doing as she creates a depiction of a scene, it does capture one important aspect of that process. The point here is not to create a new way of generating cute images, but to suggest a new approach to non-photorealistic rendering (NPR) that makes use of this important feature of human art. Consider the following goals of various artists:
  • A cartoonist may want to reduce the number of individuals in a crowd to simplify the depiction.
  • A painter may concentrate brushstrokes on the face of the focal character, while leaving the background depicted with broad strokes.
  • A painter who wants his charming landscapes to be hung above the couches in people's homes may leave out garbage blowing down the street and modern automobiles present in the scene.
  • A fantasy artist may wish to depict horses with wings.
Any NPR system that considers the input image only in terms of simple primitives such as contours and regions of color could not be directly extended to handle these kinds of problems. A system including image comprehension with a semantic knowledge base, however, could be extended to deal with any of these problems in a fairly straightforward way.
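To suggest why a semantic description makes such extensions straightforward, here is a minimal Python sketch in which three of the artists' goals above become simple operations on a list of labelled objects. Everything in it is an assumption made for illustration: the `SceneObject` class and the `omit`, `thin_crowd`, and `add_attribute` functions are hypothetical, and a real system would query the knowledge base rather than compare label strings.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    category: str                                 # label from the scene parser
    attributes: set = field(default_factory=set)  # extra properties to depict


def omit(objects, unwanted):
    """Landscape painter: leave out categories that spoil the picture."""
    return [o for o in objects if o.category not in unwanted]


def thin_crowd(objects, category, keep):
    """Cartoonist: keep only the first `keep` objects of one category."""
    kept, result = 0, []
    for o in objects:
        if o.category == category:
            if kept >= keep:
                continue
            kept += 1
        result.append(o)
    return result


def add_attribute(objects, category, attribute):
    """Fantasy artist: give every object of a category an extra attribute."""
    return [
        SceneObject(o.category, o.attributes | {attribute})
        if o.category == category else o
        for o in objects
    ]


scene = [SceneObject("person") for _ in range(5)]
scene += [SceneObject("garbage"), SceneObject("car"), SceneObject("horse")]

edited = omit(scene, {"garbage", "car"})
edited = thin_crowd(edited, "person", keep=2)
edited = add_attribute(edited, "horse", "wings")
print([(o.category, sorted(o.attributes)) for o in edited])
```

None of these operations can be expressed over contours and color regions alone, because each one depends on knowing what the objects are.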