Friday, June 30, 2017

Generation of new artistic styles

A deep learning system that can apply an existing style to a new image is one thing, but can an AI actually generate a new artistic style? These researchers (an assorted group including people from Rutgers and Facebook) have experimented with using generative adversarial networks to create artwork that is recognized by another neural network as artwork, but doesn't seem to come from any existing style. Personally I like the clouds on the upper right the best, but there are interesting things going on in all the images-- I like how it uses color gradients in particular. It's an interesting new advance, but to feel like art, I think one thing this lacks is a motivation for choosing a particular subject. I would like to see a system that makes aesthetic choices with the goal of expressing an idea about the world, an idea that comes from having experienced and thought about the world.
The following images generated by the adversarial system were judged most highly by human subjects in their respective categories:

Link to the paper

Wednesday, February 22, 2017

Estimating photos from sketches

When I finished writing Machinamenta six years ago, I suggested some things that could be done to make artificial creativity go beyond simple kaleidoscope patterns. In many ways, deep learning software has surpassed the suggestions I put forward. Here is another example. 
This work comes out of the Berkeley AI research lab at the University of California, Berkeley. Alexei Efros is a familiar name-- he worked on image quilting and automatic photo pop-up and was at CMU (along with Martial Hebert and Abhinav Gupta) during the period I was working with them professionally.
The way this works is that a neural network is trained on pairs of images. The right hand image of the pair is a photograph of a cat; the left hand image is an automatic edge detection on the photograph, using the HED edge detector. This means that no humans were needed to create the training data-- important because of how many training samples are needed. It does mean that the edges it is looking for are not necessarily the ones people perceive as most important, but modern edge detectors like HED do a much better job of that than the Canny edge detector, which was the best available when I first started working on computer vision.
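The pairing scheme can be sketched in a few lines. Here a crude gradient-magnitude filter stands in for HED (the real detector is a trained network; this is only an illustration of how (sketch, photo) pairs get built with no human labeling):

```python
import numpy as np

def edge_map(photo, threshold=0.2):
    """Crude gradient-magnitude edge detector (a stand-in for a
    learned detector like HED): bright pixels wherever the image
    intensity changes sharply."""
    gy, gx = np.gradient(photo.astype(float))
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-9          # normalize to [0, 1]
    return (mag > threshold).astype(np.uint8)

def make_training_pair(photo):
    """One (input, target) pair: the edge sketch is the network's
    input, the original photo is the target it must reconstruct."""
    return edge_map(photo), photo

# A fake 8x8 'photo': dark left half, bright right half.
photo = np.zeros((8, 8))
photo[:, 4:] = 1.0
sketch, target = make_training_pair(photo)
```

Running the same function over a folder of cat photographs would yield the whole training set automatically, which is the point the paragraph above makes.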
The sketch contains far less information than the photograph. The only reason it is possible to do this at all is that the system has a great prior model of what cats look like, and does its best to fit that model to the constraints of the sketch. I wonder if drawing a Siamese profile would be enough of a hint to give it Siamese coloration? What happens if you try to draw a dog, or a pig, or a house instead?

Try it out yourself; it's a lot of fun.

The original paper

Sunday, September 11, 2016

No, really, why does word2vec work?

There are a lot of notes out there about why word2vec works. Here are a few from the first page of Google:
How does word2vec work? on Quora
Why word2vec works on Andy's Blog
How exactly does word2vec work?
Making sense of word2vec by Radim Rehurek

I've read some of these pages, and they've all been helpful in their own way, but I feel like they don't really get at the heart of it. They all say that word2vec is learning the relationships between words, and then show the math that means they are maximizing a function that does that. Which is true in its way, but it doesn't satisfy my need for an explanation I really understand.
What I would like to do is start with some simple principles, and show that they imply the analogy finding capability of word2vec, and show an easy test for when it will break down.
You only need to accept one premise: that the process of assigning word vectors assigns words that are similar to similar vectors.
High dimensional vectors behave differently than you might expect. There are three million words embedded in Mikolov's Google News word2vec space, and three million is a very tiny number compared to the number of possible locations in 300 dimensions. Because of this, there is plenty of room for a vector to be close to several different clusters at the same time.
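The geometry is easy to check directly. In the sketch below (plain numpy, not the actual embedding), two random 300-dimensional unit vectors come out nearly orthogonal, yet their midpoint stays quite close to both of them at once:

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(v):
    return v / np.linalg.norm(v)

dims = 300  # same dimensionality as the Google News embedding
a = unit(rng.standard_normal(dims))
b = unit(rng.standard_normal(dims))

# Two random directions in 300 dimensions are nearly orthogonal:
# their dot product is close to 0.
print(abs(a @ b))

# ...yet their midpoint has cosine similarity around 0.7 with each
# of them, so one point can sit near two clusters simultaneously.
mid = unit(a + b)
print(mid @ a, mid @ b)
```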
This certainly works for some kinds of words. For example, here are the nearest neighbors of the word 'royal':

'royal', 'royals', 'monarch', 'prince', 'princes', 'Prince Charles', 'monarchy', 'palace', 'Windsors', 'commoner', 'Mrs Parker Bowles', 'queen', 'commoners', 'Camilla', 'fiance Kate Middleton', 'Queen Elizabeth II', 'monarchs', 'royal palace', 'princess', 'Queen Consort'

Clearly it has mapped several words that have to do with royalty together. There are similar clusters of terms having to do with male, female, and person, though it isn't quite as easy to find them as just searching for words near to the word 'male' or 'female' or 'person'.

Because this is a vector space, the words near the average of two vectors a and b will be nearly the same as the intersection of the set of words near a and the set of words near b. If you picture a Venn diagram, the average of a and b will fall into the overlap of the circle surrounding a and the circle surrounding b.
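Here is a toy illustration of that claim, with made-up 'word' vectors rather than real word2vec ones: a word planted near both cluster directions ends up as the nearest neighbor of their average, just as the Venn-diagram picture suggests:

```python
import numpy as np

rng = np.random.default_rng(1)

def unit(v):
    return v / np.linalg.norm(v)

dims = 50
royal = unit(rng.standard_normal(dims))
male = unit(rng.standard_normal(dims))

# Toy 'vocabulary': two words near each cluster direction, and one
# word ('king') planted near both at once.
words = {
    "crown":  unit(royal + 0.1 * rng.standard_normal(dims)),
    "palace": unit(royal + 0.1 * rng.standard_normal(dims)),
    "boy":    unit(male + 0.1 * rng.standard_normal(dims)),
    "uncle":  unit(male + 0.1 * rng.standard_normal(dims)),
    "king":   unit(royal + male + 0.1 * rng.standard_normal(dims)),
}

def neighbours(v, k=2):
    """The k vocabulary words with highest cosine similarity to v."""
    return sorted(words, key=lambda w: -(words[w] @ v))[:k]

# The nearest word to the average of the two directions is the one
# sitting in the overlap of both neighborhoods.
avg = unit(royal + male)
print(neighbours(avg, k=1))
```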
This is all we need to show that word2vec works. Let's call royal r, female f, male m, and person p.

The word 'king' is in the intersection of 'royal' and 'male', so it is approximately r + m. (I should mention that I've normalized the vectors, so taking the sum is essentially the same as taking the average.) 'queen' is close to r + f, 'man' is close to m + p, and 'woman' is close to f + p.
Putting it like that, the analogical property just falls out:

king + woman - man = queen


( r + m ) + ( f + p ) - ( m + p ) = ( r + f )
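The cancellation can be confirmed numerically with arbitrary random vectors standing in for r, f, m, and p:

```python
import numpy as np

rng = np.random.default_rng(2)
# royal, female, male, person -- any vectors at all will do,
# because the identity is pure algebra.
r, f, m, p = (rng.standard_normal(300) for _ in range(4))

king, queen = r + m, r + f
man, woman = m + p, f + p

# king + woman - man = (r + m) + (f + p) - (m + p) = r + f = queen
print(np.allclose(king + woman - man, queen))  # True
```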

This also tells us an easy way to tell when this property will totally fail. For example, I think this is a pretty obvious analogy:

blueberry:blue jay::strawberry:cardinal

If we try this in word2vec, we get

'blue jays', 'red bellied woodpecker', 'grackle', 'ovenbird', 'downy woodpecker', 'indigo bunting', 'tufted titmouse', 'Carolina wren', 'chickadee', 'nuthatch', 'rose breasted grackle', 'bluejays', 'raccoon', 'spruce grouse', 'robin'

It seems to have picked up that we are looking for a bird, but it has missed the idea that we are looking for a red bird. So what went wrong?

Let's break it down like we did above. Call red r, blue b, fruit f, and songbird s. The equation is
( b + s ) + ( r + f ) - ( b + f ) = ( r + s )

We can easily find clusters of fruit and songbirds (terms near those words are close enough). But what about red things or blue things? Is there some cluster which contains firetrucks, strawberries, and cardinals? Probably not; at least, not a great cluster. Their color just isn't that relevant to the way newspapers talk about those things. (If we trained word2vec on books for toddlers, that might be different.) Because red things aren't clustered together in word2vec, the analogy fails.

This doesn't mean this word2vec space is useless for this analogy, though. Suppose we used a dictionary to look at the different senses of words. One sense of cardinal would be a 'Catholic dignitary', while another would be 'red bird.' If we averaged those terms to get new vectors for 'cardinal: sense 1' and 'cardinal: sense 2,' and did the same for strawberries and blueberries and blue jays, we could engineer a version of word2vec that would be able to solve the analogy. It's adding information in by hand rather than learning it from scratch, which some would call cheating, but I just call it efficiently mining the corpus consisting of the dictionary.
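A minimal sketch of that dictionary trick, using random stand-in vectors rather than real word2vec ones: a sense vector is just the average of the vectors of the words in its dictionary definition, which pulls the bird sense of 'cardinal' toward 'red' while leaving the clergy sense alone:

```python
import numpy as np

rng = np.random.default_rng(3)
dims = 100

# Toy word vectors (stand-ins for real word2vec vectors).
vec = {w: rng.standard_normal(dims)
       for w in ["catholic", "dignitary", "red", "bird", "cardinal"]}

def sense_vector(definition_words):
    """Average the vectors of a definition's words to get a vector
    for that one sense, as suggested above."""
    return np.mean([vec[w] for w in definition_words], axis=0)

cardinal_clergy = sense_vector(["catholic", "dignitary"])
cardinal_bird = sense_vector(["red", "bird"])

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The bird sense is now much closer to 'red' than the clergy sense.
print(cos(cardinal_bird, vec["red"]), cos(cardinal_clergy, vec["red"]))
```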

Monday, August 1, 2016

No Man's Sky

There's been a lot of hype over a videogame about to be released called No Man's Sky. The game is set in "a procedurally generated deterministic open universe, which includes over 18 quintillion (1.8×10^19) planets."
This is what I refer to in Machinamenta as the kaleidoscope pattern. They have come up with a limited set of ways that planets can vary, and then counted all the variations as unique creations. But there is going to be little joy in discovering new combinations of those patterns, once the patterns themselves are recognized from a few examples. I understand the appeal! But ultimately I think people will be disappointed once they realize that the variation is limited, as it must be for this kind of universe.
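For what it's worth, the 18 quintillion figure is exactly the number of distinct values a 64-bit seed can take, which underlines the kaleidoscope point: every planet is one seed fed through the same fixed generator.

```python
# 'Over 18 quintillion planets' is just the count of 64-bit seeds.
planets = 2 ** 64
print(planets)            # 18446744073709551616
print(f"{planets:.1e}")   # 1.8e+19
```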
I do like that they are using Lindenmayer systems for the plants! It's the difference between producing new sentences with a grammar versus producing them with mad-libs.
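For readers who haven't met them: an L-system rewrites every symbol in parallel each generation, which is what gives the plants their recursive branching. A minimal sketch (the rule below is a classic plant-like example, not necessarily the one the game uses):

```python
def l_system(axiom, rules, steps):
    """Expand an L-system: rewrite every symbol in parallel each step.
    Symbols without a rule are copied unchanged."""
    for _ in range(steps):
        axiom = "".join(rules.get(c, c) for c in axiom)
    return axiom

# F = grow forward, + / - = turn, [ ] = push/pop a branch.
rules = {"F": "F[+F]F[-F]F"}
print(l_system("F", rules, 1))       # F[+F]F[-F]F
print(len(l_system("F", rules, 3)))  # the string grows exponentially
```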

How could a future game do better? Here are a few suggestions:
All of which are too computationally intensive for a game right now.