Where Are the AB Testing Frameworks?

I read news.yc and reddit/programming pretty regularly to keep up with what is going on in the biz. Based on that reading, I can probably name a dozen different systems for building high-scale applications (distributed storage, message queues, caching layers, search engines, etc.), but I can’t name a single AB testing framework other than Google Website Optimizer. That seems like a serious inversion of priorities for most startups. Everyone with a sign-up page should use AB testing. Not everyone needs a message queue.

Is this because:

  • Nobody needs anything other than Google Website Optimizer?
  • Startups don’t actually do AB testing, possibly because they don’t get enough traffic to get meaningful results, or maybe because they don’t have time?
  • AB testing (including the statistical analysis to determine if results are valid) is so simple that everyone just bangs out their own?
  • As a largely theoretical issue for most startups, scalability is more fun to talk about on the Internet?
  • Everyone that is using AB testing is so happy that they are trying to suppress information about it so their competitors don’t start doing it too?
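
On the "so simple that everyone just bangs out their own" theory: the statistical part can indeed be small. Below is a minimal sketch, assuming a plain two-proportion z-test on conversion counts, which is one common choice and not necessarily what Google Website Optimizer or any other tool does. The function name and the example numbers are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts, n_a / n_b are visitor counts.
    Uses the pooled-variance normal approximation, which is only
    reasonable when both groups have a decent number of conversions.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation at all (e.g. zero conversions in both groups).
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 1000 visitors per variant, 100 vs. 130 sign-ups.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The hard parts a shared framework would have to cover are elsewhere: assigning visitors to variants consistently, logging which variant each visitor saw, and not peeking at the p-value until the planned sample size is reached.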

If everyone is secretly using some great framework please shoot me an email and let me know.

If you haven’t thought much about it before, here is a short paper on AB testing from some folks that made Amazon a ton of money.

15 Responses to “Where Are the AB Testing Frameworks?”


  • It’s not a framework, but Zend PHP ships with a web-based frontend to AB that makes testing your site under different conditions and scenarios a breeze.

    I’m no longer a Zend user or a PHP fan; but they did a good job with their stress-testing frontend.

  • Also, the plugin I linked to above only took a few hours to make. Maybe others have had the same experience.

  • I thought the same thing so I wrote a javascript framework for A/B testing and optimization. http://genetify.com

  • Philipp Pfeiffenberger

    Most of the AB testing I’ve done has been too result-specific to extract any re-usable components. The scaffolding required to put it in place often outweighs the cost of writing your own from scratch.

  • Greg, thanks. Genetify looks fantastic.

  • @Greg: thanks for the link, that looks really cool.

    @Philipp: ah, but when has that ever stopped people from writing a framework :) I think there is a whole management system (to see test results, create new tests, temporarily disable current ones) that could be common.

  • I wish more startup businesses paid attention to split testing. It’s the best way to find out what works best for your visitors and, in turn, your business. Thanks for the link to the PDF research article on split testing. I created a mind map of how to document a split test so that you can keep track of your experiments and results.

    Just having a process to follow with Website Optimizer helps small companies manage their tests for better conversions. Here’s the mind map: http://phantomcto.com/blog/business-tech/how-to-document-split-test/

  • You’ve definitely hit on something here. People seem to vastly underestimate the impact seemingly simple changes like color can make in usage (and revenue).

    There is room for something more here, though. You can derive some great value out of briefly presenting a few random users/sessions with a moderately different UI to see the impact on usage. I suspect the frameworks for that sort of experiment would wind up being pretty specific to the larger toolchain people are tied to, though. (Not the least of the reasons being the need to tie into logging and export the experiments that were live for a given page that was served.)

    Not just for UIs: there is also good reason at times to do A/B performance testing of code that is non-deterministic enough that it’s easier to model with a fraction of live traffic than with some heavyweight testing framework.

  • I think there is a good reason why systematic experimentation has not become more common in web application development, despite individual successes (amazon.com, etc) and the general ease of collecting data.

    It is ultimately very important for ease-of-use that the alternatives being tested in a system are stored in the system itself. Otherwise you run into problems of consistency and control. It is the principle of DRY, code reuse, single authoritative source.

    Some companies are A/B testing by swapping outgoing text at the HTTP server level. This can’t be the way of the future!

  • Hey Tom, check out http://mixpanel.com; it was more or less built for people like you. I’d be happy to set you up with an account, just shoot me an email.

  • As you have said, GWO is the best tool for AB testing, but I have one question: how do I run a test across two domains? For example, I am testing pages on abc.com, but my conversion page is on XYZ.com, which also belongs to me. GWO clearly states that the conversion page should be on the same domain.

  • It’s Vanity, and it’s here: http://vanity.labnotes.org/

    Sorry for the belated comment, but most likely the answer is still pertinent.

  • Ronen – noticed this text on your main page: “90% probability this result is statistically significant”. That is a nonsensical statement. I think you mean to say the p value was 0.1, which is not considered significant in most cases. But there is no probability of significance, ever.

    (Note: just commenting on the text — haven’t evaluated the product and don’t mean to suggest product is bad because the text is bad. :)
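
The point in the last comment can be made concrete with a small sketch. It assumes the conventional 0.05 threshold; the function name and wording are hypothetical and not taken from any of the products mentioned above. Significance is a yes/no decision against a threshold chosen in advance, so a result is reported as "significant at the 0.05 level" or not, never as "90% probably significant".

```python
def interpret_p_value(p_value, alpha=0.05):
    """Describe a test result without inventing a 'probability of significance'.

    alpha is the significance threshold, fixed before the test is run
    (0.05 here is only the conventional default, an assumption).
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    verdict = "significant" if p_value < alpha else "not significant"
    return f"p = {p_value:g}; {verdict} at the {alpha:g} level"


# The case from the comment: a p-value of 0.1 does not clear the 0.05 bar.
print(interpret_p_value(0.1))
```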
