Machinamenta: May 2012

Thursday, May 24, 2012

A good proof is like a story. Both are like reality run backwards: disparate things flow to an ordered final state.

Monday, May 21, 2012

Take-the-Best

This continues the ideas from a previous post, Franklin's Perceptron. I am trying to find out whether any ancient methods of artificial intelligence actually worked. Any ancient method of AI must have been very simple, but a surprising finding in machine learning has been that simple models can often perform better than more complex models.
One example of this is the rule called take-the-best. Here's how it works:
Suppose you want to predict whether one team or another will win a soccer game. (Maybe you're going to place a bet.) You have various pieces of information about the teams: the ages of the players, the number of wins each team has had this season, and so forth. You also have this same kind of data from the whole league from a few seasons ago, along with which teams won and which teams lost. We call this the training data.
The take-the-best rule goes like this:
See how well each piece of information does at predicting wins on the training data. For example, the team with more past victories won 75% of the time. The team with the older players won 64% of the time (say this is a youth soccer league) and so forth.
Now, to make your prediction, just choose the piece of information that does the best job at predicting wins, and go with that. Ignore all the rest of the data. (If that piece of information says they are both as likely to win, you can use the next best predictor as a tiebreaker.)
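Here is a rough sketch of the rule in Python, just to make the steps concrete. The function names, the cue names ("wins", "age") and the toy games are all made up for illustration, and the sketch assumes that a higher value of a cue is the one that favors winning. It scores each cue only on the games where the two teams actually differ on it.

```python
def cue_accuracy(cue, games):
    """Share of training games, among those where the cue differs between
    the two teams, in which the team with the higher value won."""
    correct = decided = 0
    for team_a, team_b, a_won in games:
        if team_a[cue] == team_b[cue]:
            continue  # the cue says nothing about this game
        decided += 1
        if (team_a[cue] > team_b[cue]) == a_won:
            correct += 1
    return correct / decided if decided else 0.0


def rank_cues(cues, games):
    """Order the cues from best to worst predictor on the training data."""
    return sorted(cues, key=lambda cue: cue_accuracy(cue, games), reverse=True)


def take_the_best(team_a, team_b, ranked_cues):
    """Predict 'a' or 'b' from the best cue that tells the teams apart.
    Lower-ranked cues are used only as tiebreakers."""
    for cue in ranked_cues:
        if team_a[cue] != team_b[cue]:
            return "a" if team_a[cue] > team_b[cue] else "b"
    return None  # every cue is tied, so the rule makes no prediction


# Invented training data: (team A, team B, did A win?)
games = [
    ({"wins": 8, "age": 12}, {"wins": 5, "age": 13}, True),
    ({"wins": 3, "age": 14}, {"wins": 6, "age": 12}, False),
    ({"wins": 7, "age": 13}, {"wins": 7, "age": 11}, True),
    ({"wins": 4, "age": 12}, {"wins": 9, "age": 12}, False),
]

ranked = rank_cues(["age", "wins"], games)
prediction = take_the_best({"wins": 9, "age": 11}, {"wins": 2, "age": 14}, ranked)
```

Notice that once the cues are ranked, a prediction usually looks at just one number per team. Everything else is ignored unless there is a tie.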
Now it's easy to see that this is a good quick-and-dirty method that should work pretty well. The surprising thing is that in tests with real data, this method usually does better than statistical regression, artificial neural networks, and all kinds of other complicated techniques people have come up with to solve this kind of problem. It's as if the food on the dollar menu at McDonald's were not just the cheapest food, but also the tastiest and the healthiest!
The reason it works so well seems to be that it avoids overfitting. It's easy for more complex methods to find patterns where there are none, and be misled by noise. Here's a paper that explains more.
What I'm looking for now is people who advocated this method, the longer ago the better. Even better would be if they built some kind of device to do this for them. And if not, it would at least make a good basis for a science fiction story set in an alternate past.

Sunday, May 13, 2012

The First Search Engine

(Excerpt from Machinamenta)

The idea of punched cards for storing data was taken up by a Russian inventor, Semyon Korsakov, in 1832. Korsakov worked in the police ministry and kept extensive records on the populace. He devised a way of recording data by means of holes punched in cards. If two cards shared many of the same holes, it meant that the data they contained was similar. This fact could be used for creating a kind of search engine, where a plate with pins would rapidly scan across hundreds of records, only falling in and stopping when the pins could fall through all of the holes, indicating an exact match. Korsakov was excited about his ideas, and thought that they could be used to enhance human intelligence in the same way that the microscope and telescope had been used to enhance human sight. He wrote,

"machines intellectuelles would limitlessly strengthen the power of our thought, as soon as distinguished scientists apply their knowledge to studying the principles of this process and compose the tables necessary for its application in various fields of human knowledge."

He released his designs for the ideoscope and homeoscope to the world (open source fashion) rather than patenting them, to encourage their widespread use and further development. Unfortunately, the Russian Academy of Science didn’t see the potential, and little resulted from the inventions. He is today better remembered by the homeopathic medicine community for his remedies than by the information science community for his ingenious method of searching through the database of those remedies.
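The pin-and-hole search can be imitated in a few lines of Python. This is only a sketch of one reading of the description: it assumes a card counts as a match when it has a hole at every position where the plate has a pin, and it treats each card as a set of numbered hole positions. The card names and hole numbers are invented.

```python
def pins_fall_through(pins, card_holes):
    """True if every pin on the plate lines up with a hole in the card."""
    return pins <= card_holes  # subset test on hole positions


def search(pins, cards):
    """Scan the plate across all cards and list the ones where it drops in."""
    return [name for name, holes in cards.items() if pins_fall_through(pins, holes)]


def shared_holes(card_a, card_b):
    """Number of hole positions two cards have in common (a rough similarity)."""
    return len(card_a & card_b)


# Invented records: each card is the set of positions punched in it.
cards = {
    "card A": {1, 2, 4},
    "card B": {1, 3},
    "card C": {1, 4, 5, 6},
}

matches = search({1, 4}, cards)
```

Here the plate carries pins at positions 1 and 4, so it drops into cards A and C and slides past card B.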

For more about him, check out the Wikipedia article (and also the references at the end of the article):
http://en.wikipedia.org/wiki/Semen_Korsakov

I like this because it is basically the original version of Google. We don't usually think of search engines as AI, but they really are automating, in a spectacular way, something that was only possible before through human intellectual effort. As with topics in philosophy, once a problem is solved we no longer call it artificial intelligence.