Author Archive for Tom

Where Are the AB Testing Frameworks?

I read news.yc and reddit/programming pretty regularly to keep up with what is going on in the biz. Based on that reading, I can probably name a dozen different systems for building high-scale applications (distributed storage, message queues, caching layers, search engines, etc.), but I can’t name a single AB testing framework other than Google Website Optimizer. That seems like a serious inversion of priorities for most startups. Everyone with a sign-up page should use AB testing. Not everyone needs a message queue.

Is this because:

  • Nobody needs anything other than Google Website Optimizer?
  • Startups don’t actually do AB testing, possibly because they don’t get enough traffic to get meaningful results, or maybe because they don’t have time?
  • AB testing (including the statistical analysis to determine if results are valid) is so simple that everyone just bangs out their own?
  • As a largely theoretical issue for most startups, scalability is more fun to talk about on the Internet?
  • Everyone that is using AB testing is so happy that they are trying to suppress information about it so their competitors don’t start doing it too?
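
For a sense of how much a home-grown version involves, here is a minimal sketch of the two core pieces: sticky assignment of users to variants, and a significance check on conversion rates. It assumes a simple two-variant test with a conversion count per variant. The function names are hypothetical, and the two-proportion z-test is one common choice (a normal approximation that needs reasonably large samples), not the only valid analysis.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str, variants: tuple[str, ...] = ("A", "B")) -> str:
    """Deterministically bucket a user so repeat visits see the same variant."""
    # Salting with the experiment name keeps buckets independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all (0% or 100% conversion everywhere)
    z = (p_b - p_a) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))
```

With 100 of 1,000 visitors converting on A and 130 of 1,000 on B, z comes out around 2.1 and the p-value around 0.035. The same 10% vs. 13% rates with only 100 visitors per variant give a p-value around 0.5, which is the low-traffic problem in a nutshell. Note that this test assumes you fix the sample size in advance; repeatedly peeking and stopping as soon as p dips under 0.05 inflates the false-positive rate.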

If everyone is secretly using some great framework please shoot me an email and let me know.

If you haven’t thought much about it before, here is a short paper on AB testing from some folks that made Amazon a ton of money.

Two and a Half Months of Twitter

After a few months of playing around with Twitter, I find the service really growing on me. The ability to have casual IM-ish conversations without any immediacy is nice. Also, having a place to record short thoughts and interesting links that other people might like scratches some sort of itch for me. Those thoughts and links rarely justify a whole blog post, but they are interesting enough to post on Twitter.

But I don’t think I’ve reached the critical mass of followers necessary to really unlock the Q&A potential of the site. Having a few hundred technical folks all following each other would be a tremendously useful resource for everyone involved. For example, I’m considering upgrading my desktop to 8 or 16GB of RAM, which means a new motherboard, processor, and RAM. My normal approach would be to spend a few hours on Newegg and the hardware review sites figuring out where the price/performance curve is and making sure I’m not getting ripped off. If someone else has already done this research, it would be nice to use their information as a starting point, and Twitter provides the kind of free-form conversation that makes that sharing possible.

To really make this work, you need to run one of the desktop apps so you don’t have to constantly reload the website (I use Twhirl).

Next Gen Productivity Monitoring Software

Now that I have a new baby, it is even more important to me that the time I spend in front of the computer is spent efficiently and productively. I’ve played around with productivity-monitoring software like RescueTime and TimeSnapper, and they provide a convenient way to record how I wasted my day. It’s a nice first step, but I’d like to see this class of application expand into three new areas: positive feedback, targeted recommendations, and an attention API.

Netflix Prize Concept + Google 411 Data

I’ve really enjoyed watching the Netflix Prize develop. Amazingly, over 3600 teams have submitted a prediction, which makes Netflix the big winner in this contest. The company will undoubtedly end up with a better product due to the amount of interest and research in collaborative filtering they have generated.

But ultimately, better movie recommendations don’t matter a whole lot to me. I’m more interested in the fact that by providing a unique set of data and a prize, they’ve been able to stimulate so much interest. The other day I was thinking about which companies are in a position to sponsor contests in other fields that might have a bigger impact on my life, and one thought jumped into my head: Google’s 411 phoneme collection service. Marissa Mayer says:

You may have heard about our [directory assistance] 1-800-GOOG-411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model … that we can use for all kinds of different things, including video search.

The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that.

Presumably, Google has already done the heavy lifting to manually transcribe a large number of these samples so that they can train their own algorithms. Why not create a contest that lets teams submit an algorithm that gets trained on a subset of the data and then tested against the rest? Speech recognition is more complicated than movie recommendations, but making it easy to train and test an algorithm against an interesting number of samples would certainly lower the barrier to entry.
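
A contest like this needs two pieces of plumbing: a fixed split that every team shares, and a scoring rule. The Netflix Prize scored submissions by RMSE on held-out ratings; for speech recognition, word error rate (WER) is a common choice. The sketch below assumes each sample has a string id and a plain-text reference transcript. The function names and the salt are hypothetical illustrations, not anything Google has published.

```python
import hashlib


def split_for_contest(sample_ids: list[str], test_fraction: float = 0.2,
                      salt: str = "contest-v1") -> tuple[list[str], list[str]]:
    """Deterministically split samples so every team sees the same train/test sets."""
    train, test = [], []
    for sid in sample_ids:
        h = int.from_bytes(hashlib.sha256(f"{salt}:{sid}".encode("utf-8")).digest()[:8], "big")
        (test if h % 10_000 < test_fraction * 10_000 else train).append(sid)
    return train, test


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum substitutions + insertions + deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # delete a reference word
                         cur[j - 1] + 1,       # insert a hypothesis word
                         prev[j - 1] + cost)   # substitute or match
        prev = cur
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Total word errors divided by total reference words (lower is better)."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference")
    errors = sum(word_edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / max(words, 1)
```

Teams would train on the `train` ids and submit transcripts for the `test` ids, which the sponsor scores with `corpus_wer` against the hidden references. Keeping the test transcripts private (as Netflix did with its held-out ratings) is what stops teams from tuning directly against the answers.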

Google would benefit from this in hiring, if nothing else. It would give them a chance to realistically evaluate the work of all kinds of grad students and researchers, and demonstrate to the candidates the advantages of working for the company with the biggest databases.

Handling Human Error In the Datacenter

When I was working on Live Mesh at Microsoft, I had the good fortune to meet James Hamilton. James is full of good ideas, many of which are captured in his paper “On Designing and Deploying Internet-Scale Services.” There is a lot of wisdom in those pages (Greg Linden had some thoughts on it), but I’d like to focus in on this snippet in particular:

Design the system to never need human interaction, but understand that rare events will occur where combined failures or unanticipated failures require human interaction.


tklein on twitter

I’m on twitter now. Follow me at http://twitter.com/tklein

Dev Diligence: Don’t Invest in the Wrong Code

When I’m starting a project or thinking about adding functionality to an existing code base, I always consider whether I can reuse existing code. Sometimes the answer is obvious (I’m not going to write my own RDBMS), but frequently it is a more difficult decision than it should be. In making a decision, I look first at the questions that I can actually get answers to:

  • Am I getting more than I need? It pains me to add a multi-megabyte DLL to a client download for a small amount of functionality.
  • Will I spend more time learning the interface than I would writing the functionality I need myself?
  • Is this an active project, and is there any documentation?
  • If scheduling isn’t an issue, how much fun would it be to write my own version?

Next comes a set of questions that are often harder to answer.

Crashing When Something Feels Wrong

I’m sort of lazy, so I really like the idea of code that continually checks itself by using assertions. I even like running production services with assertions turned on. To be clear, I’m talking about assertions that check for actual bugs in your code, not assertions that socket() didn’t fail. Crashing production servers is a contentious issue, but sometimes (hopefully rarely) it is the best thing to do. For something like FolderShare, crashing a server at the first hint of an error is vastly safer than possibly deleting someone’s files due to a bug. Of course, this introduces the risk that multiple servers could fail in a short amount of time, but you need to design for that case anyway.
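
In Python, the `assert` statement is stripped out when the interpreter runs with `-O`, so keeping bug checks alive in production means using an ordinary function. Here is a minimal sketch, with a hypothetical `verify` helper and a hypothetical sync-deletion step loosely modeled on the FolderShare example (not FolderShare’s actual code):

```python
class InvariantViolation(Exception):
    """Signals a bug in our own code, not an expected runtime failure."""


def verify(condition: bool, message: str) -> None:
    # Unlike an `assert` statement, this call is not removed by `python -O`,
    # so the check stays active on production servers.
    if not condition:
        raise InvariantViolation(message)


def apply_deletions(server_files: set[str], deleted_by_user: set[str],
                    planned_deletions: set[str]) -> set[str]:
    """Return the files that remain after applying the planned deletions."""
    # Bug check: the sync planner must never schedule a deletion the user did
    # not make. If it does, stop here instead of destroying data.
    # (Expected failures, like a socket error, get normal error handling instead.)
    verify(planned_deletions <= deleted_by_user,
           f"unrequested deletions: {sorted(planned_deletions - deleted_by_user)}")
    return server_files - planned_deletions
```

In a real server you would let `InvariantViolation` propagate out of the request handler so the process exits and a supervisor restarts it, rather than catching it and carrying on.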


Housekeeping

I don’t plan on having many non-technical posts here, but I’m breaking my rule today for a good reason. I’ve got a kid now! My first child, Margot Lee Kleinpeter, was born about 10 days ago. Between a long, drawn-out labor, a few nights on a hospital couch, and fatherhood in general, I’ve fallen a bit behind on publishing. Much to my surprise, Margot prefers clean diapers and songs to essays on startups and programming. But I’ve got a new post for today, and I’ll hopefully be back on a more normal schedule soon.

Things That Are Important: Where Clauses

When you are running a distributed service in a datacenter, you encounter a lot of interesting problems. At Audiogalaxy, I ran into all the standard application-level bugs, crashes, and race conditions. Once we had a certain number of machines, we even had to deal with flaky memory, disks, and networking cards. But all of that was pretty typical compared to the weirdest bug I ever had to deal with: the one caused by Quake III Arena.
