I’ve really enjoyed watching the Netflix Prize develop. Amazingly, over 3600 teams have submitted a prediction, which makes Netflix the big winner in this contest. The company will undoubtedly end up with a better product due to the amount of interest and research in collaborative filtering they have generated.
But ultimately, better movie recommendations don’t matter a whole lot to me. I’m more interested in the fact that by providing a unique set of data and a prize, they’ve been able to stimulate so much interest. The other day I was thinking about which companies are in a position to sponsor contests in other fields that might have a bigger impact on my life, and one thought jumped into my head – Google’s 411 phoneme collection service. Marissa Mayer says:
You may have heard about our [directory assistance] 1-800-GOOG-411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model … that we can use for all kinds of different things, including video search.
The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that.
Presumably, Google has already done the heavy lifting to manually transcribe a large number of these samples so that they can train their own algorithms. Why not create a contest that lets teams submit an algorithm that gets trained on a subset of the data and then tested against the rest? Speech recognition is more complicated than movie recommendations, but making it easy to train and test an algorithm against an interesting number of samples would certainly lower the barrier to entry.
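To make the idea concrete, here is a rough sketch of what a scoring harness for such a contest might look like. Everything in it is my own assumption rather than anything Google has described: the `(audio_id, transcript)` sample format, the 80/20 split, the `train_fn` interface, and the choice of word error rate as the metric are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical sample format: (audio clip identifier, reference transcript).
Sample = Tuple[str, str]
Recognizer = Callable[[str], str]


def split(samples: List[Sample], test_fraction: float = 0.2,
          seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle deterministically, then hold out a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level edit distance (substitutions, insertions, deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # word missing from hypothesis
                           cur[j - 1] + 1,           # extra word in hypothesis
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]


def score(train_fn: Callable[[List[Sample]], Recognizer],
          samples: List[Sample]) -> float:
    """Train an entry on one subset and return its word error rate on the rest."""
    train, test = split(samples)
    recognize = train_fn(train)
    errors = sum(word_errors(ref, recognize(audio)) for audio, ref in test)
    words = sum(len(ref.split()) for _, ref in test)
    return errors / words if words else 0.0
```

A team's entry would be whatever they pass as `train_fn`: a function that takes the training samples and returns a recognizer. Lower scores are better, and an entry never sees the held-out transcripts, which is the same basic arrangement the Netflix Prize uses with its withheld ratings.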
Google would benefit from this in hiring, if nothing else. It would give them a chance to realistically evaluate the work of all kinds of grad students and researchers, and demonstrate to the candidates the advantages of working for the company with the biggest databases.