Monday, May 21, 2012


This is continuing the ideas in a previous post, Franklin's Perceptron. I am trying to find out whether any ancient methods of artificial intelligence actually worked. Any ancient methods of AI must have been very simple, but a surprising finding in machine learning has been that simple models can often perform better than more complex models.
One example of this is the rule called take-the-best. Here's how it works:
Suppose you want to predict whether one team or another will win a soccer game. (Maybe you're going to place a bet.) You have various pieces of information about the teams: the ages of the players, the number of wins each team has had this season, and so forth.  You also have this same kind of data from the whole league from a few seasons ago, along with which teams won and which teams lost. We call this the training data.
The take-the-best rule goes like this:
See how well each piece of information does at predicting wins on the training data. For example, the team with more past victories won 75% of the time. The team with the older players won 64% of the time (say this is a youth soccer league), and so forth.
Now, to make your prediction, just choose the piece of information that does the best job at predicting wins, and go with that. Ignore all the rest of the data. (If that piece of information doesn't favor either team, you can use the next best predictor as a tiebreaker.)
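The rule as described above can be sketched in a few lines of Python. This is only a rough illustration, not a reference implementation: the function names, the cue names ("wins", "age"), and the tiny training set are all made up for the example, and it assumes that a higher value on a cue always points to the favored team.

```python
from typing import Dict, List, Optional, Tuple

Team = Dict[str, float]
# (team_a, team_b, winner) where winner is 0 for team_a and 1 for team_b
Game = Tuple[Team, Team, int]

def cue_pick(cue: str, a: Team, b: Team) -> Optional[int]:
    """Return the team this cue favors, or None if the cue is tied.
    Assumption: the team with the higher value is the favored one."""
    if a[cue] > b[cue]:
        return 0
    if b[cue] > a[cue]:
        return 1
    return None

def rank_cues(games: List[Game], cues: List[str]) -> List[Tuple[str, float]]:
    """Score each cue by how often it picked the winner in the training
    games where it favored one side, then sort best first."""
    scored = []
    for cue in cues:
        hits = decided = 0
        for a, b, winner in games:
            pick = cue_pick(cue, a, b)
            if pick is None:
                continue  # the cue says nothing about this game
            decided += 1
            hits += int(pick == winner)
        scored.append((cue, hits / decided if decided else 0.0))
    return sorted(scored, key=lambda item: item[1], reverse=True)

def take_the_best(ranked: List[Tuple[str, float]], a: Team, b: Team) -> Optional[int]:
    """Go with the best cue that favors one team; fall through to the
    next best cue only as a tiebreaker."""
    for cue, _accuracy in ranked:
        pick = cue_pick(cue, a, b)
        if pick is not None:
            return pick
    return None  # every cue is tied

# Made-up training data, purely for illustration.
training = [
    ({"wins": 8, "age": 11}, {"wins": 5, "age": 12}, 0),
    ({"wins": 3, "age": 12}, {"wins": 6, "age": 11}, 1),
    ({"wins": 7, "age": 10}, {"wins": 4, "age": 12}, 0),
    ({"wins": 5, "age": 12}, {"wins": 5, "age": 11}, 0),
]
ranked = rank_cues(training, ["wins", "age"])
prediction = take_the_best(ranked, {"wins": 6, "age": 11}, {"wins": 4, "age": 12})
```

With this toy data, "wins" ranks first, so the prediction for the new game comes from comparing past wins alone; "age" is consulted only when the two teams have the same number of wins. A cue that scores below 50% would do better with its direction reversed, which this sketch does not attempt.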
Now it's easy to see that this is a good quick-and-dirty method that should work pretty well. The surprising thing is that in tests with real data, this method often does as well as or better than statistical regression, artificial neural networks, and all kinds of other complicated techniques people have come up with to solve this kind of problem, at least when it comes to predicting cases that weren't in the training data. It's as if the food they served on the dollar menu at McDonald's were not just the cheapest food, but also the tastiest and the healthiest!
The reason it works so well seems to be that it avoids overfitting. It's easy for more complex methods to find patterns where there are none, and be misled by noise. Here's a paper that explains more.
What I'm looking for now is people who advocated this method, the longer ago the better. Even better would be if they built some kind of device to do this for them. And if not, it would at least make a good basis for a science fiction story set in an alternate past.
