Handling Human Error In the Datacenter

When I was working on Live Mesh at Microsoft, I had the good fortune to meet James Hamilton. James is full of good ideas, many of which are captured in his paper “On Designing and Deploying Internet-Scale Services.” There is a lot of wisdom in those pages (Greg Linden had some thoughts on it), but I’d like to focus on one snippet in particular:

Design the system to never need human interaction, but understand that rare events will occur where combined failures or unanticipated failures require human interaction.
