Thursday, March 19, 2015

A Poetry Machine

robot: anthropomorphic automaton
songbird: arboreal artist
storybook: amazing adventure
textbook: authoritative algebra
birdhouse: architectural aviary
chemistry: academic alchemy
tin: antique aluminum
bronze: archaeological alloy
neon: amber ambiance
divide: antagonistic arithmetic

These are what you might call "poetic paraphrases." They are all two-word definitions, an adjective followed by a noun, where both words begin with the letter A. All of them were generated automatically by a program I wrote. The program uses what's called a distributional semantic vector space to map each English word to a list of numbers, and then picks the poetic paraphrase whose numeric representation is closest to that of the word being defined.
I chose the constraint "starting with the letter A" because it was easy to get a list of adjectives and nouns starting with a particular letter. It could be any constraint you want, though, as long as you can put together a list of candidate words that meet it: a particular meter or rhyme, for example.
The trick is all in representing a word by a list of numbers in such a way that similar words are nearby, and the average of two words carries some of the meaning of each of them. Someone else did that hard part for me. It's a program called word2vec, and you can download it for free if you want to play with it.
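To make the idea concrete, here is a minimal sketch of how such a search could work. It is not the actual program: it assumes a phrase is represented by the plain average of its adjective and noun vectors, and that "closest" means highest cosine similarity. The three-dimensional vectors, the word lists, and the names `TOY_VECTORS`, `cosine`, `average`, and `paraphrases` are all made up for illustration; real word2vec vectors have hundreds of dimensions learned from text.

```python
import math
from itertools import product

# Toy 3-dimensional vectors, invented for illustration only.
# The axes are roughly (machine-like, human-like, nature-like).
TOY_VECTORS = {
    "robot":           [0.9, 0.8, 0.0],
    "songbird":        [0.0, 0.5, 0.9],
    "anthropomorphic": [0.1, 1.0, 0.0],
    "arboreal":        [0.0, 0.0, 1.0],
    "automaton":       [1.0, 0.1, 0.0],
    "artist":          [0.0, 0.9, 0.2],
}

# The constraint lives entirely in these candidate lists
# (here: adjectives and nouns that start with the letter A).
ADJECTIVES = ["anthropomorphic", "arboreal"]
NOUNS = ["automaton", "artist"]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average(u, v):
    """Element-wise average of two vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]


def paraphrases(word, adjectives, nouns, vectors, top_n=1):
    """Return the top_n adjective-noun phrases closest to the given word."""
    target = vectors[word]
    scored = []
    for adj, noun in product(adjectives, nouns):
        phrase_vec = average(vectors[adj], vectors[noun])
        scored.append((cosine(target, phrase_vec), f"{adj} {noun}"))
    scored.sort(reverse=True)  # highest similarity first
    return [phrase for _, phrase in scored[:top_n]]


for word in ("robot", "songbird"):
    print(word + ":", paraphrases(word, ADJECTIVES, NOUNS, TOY_VECTORS)[0])
```

With these toy numbers the script prints "robot: anthropomorphic automaton" and "songbird: arboreal artist". Swapping in a different constraint only means swapping the two candidate lists; the search itself stays the same.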
What I like about the program is that the paraphrases strike me as clever or witty. They remind me of the kennings in Beowulf. It would have taken me a while to think of them, and I'm not sure mine would be as good. The connections the program makes are associative, the way memory works, rather than completely literal. "Antagonistic arithmetic" takes both meanings of the word "divide" and acts as if they were talking about the same thing. "Amber ambiance" is an evocative phrase for the yellow glow we associate with a neon sign-- an overly literal interpretation wouldn't be able to call neon "amber," because a glass tube that glows yellow is actually filled with helium. But the system wasn't hand-programmed. It uses statistical properties of enormous amounts of newspaper text to come up with its associations, so what it knows about neon is all wrapped up in how we talk about neon. Likewise, tin isn't really a kind of aluminum, but we used to use tin foil and tin cans, and now we use aluminum foil and aluminum cans. And tin toys are almost always antique toys.
Like most computer-generated art, there's a little bit of human intervention in what you get exposed to. I generated definitions for three times as many words as you see above, and picked my favorites. I also generated ten definitions for each word, and picked my favorite from each list. So what you are seeing is my top ten choices from about 300 generated phrases. But I feel like that's just me being an editor. The computer is really the artist here.
Word2vec can also be used to come up with analogies, which can sometimes be quite clever. Here's an online version you can play with if you want to try that out.
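Analogies of this kind are usually computed with simple vector arithmetic: to answer "a is to b as c is to ?", take b minus a plus c and look for the nearest remaining word. The sketch below shows that idea on invented three-dimensional vectors (roughly royalty, maleness, femaleness). The vectors and the names `TOY_VECTORS`, `cosine`, and `analogy` are hypothetical; this is not word2vec's own code.

```python
import math

# Toy vectors, invented for illustration only.
# The axes are roughly (royalty, maleness, femaleness).
TOY_VECTORS = {
    "king":   [1.0, 1.0, 0.0],
    "queen":  [1.0, 0.0, 1.0],
    "man":    [0.0, 1.0, 0.0],
    "woman":  [0.0, 0.0, 1.0],
    "prince": [0.7, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' using the offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three input words are excluded so the answer is a new word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("man", "king", "woman", TOY_VECTORS))
```

With these toy numbers, "man is to king as woman is to ?" comes out as "queen".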
If you send me a few words, I'll generate a paraphrase for you (you might have to wait a few days).