I’m sort of lazy, so I really like the idea of code that continually checks itself by using assertions. I even like running production services with assertions turned on. To be clear, I’m talking about assertions that check for actual bugs in your code – not assertions that socket() didn’t fail. Still, crashing production servers is a contentious issue, but sometimes (hopefully rarely) it is the best thing to do. For something like FolderShare, crashing a server as soon as there is any hint of an error is vastly safer than possibly deleting someone’s files due to a bug. Of course, this introduces the risk that you could have multiple servers fail in a short amount of time, but you need to design for that case anyway.
I originally fell in love with assertions after reading Steve Maguire’s Writing Solid Code many years ago. After I saw how helpful they could be, I started to structure my code to make it more “assertable.” For example, I like state machines that use a table of valid transitions combined with assertions. This prevents anything I don’t explicitly anticipate from happening and is really helpful in a networked, asynchronous world.
But after developing a few long running services, I’ve started to perceive the need for a new type of assertion that is a bit higher level than a single conditional. I want something that will let me crash a service (and save a memory dump) when something “feels” wrong. For the moment, I call these “probabilistic assertions,” and I would have slept better while running FolderShare if I’d implemented them then.
Like all synchronization software, FolderShare had a few nightmare scenarios that I worried about all the time. If, due to a bug, the service told every client that its files were all deleted, I’d probably have to read a lot of nasty blog posts, and I would feel like crap for a few years. I would much rather crash all my servers and debug the problem with the system offline than take a chance on pissing off so many people.
Anyway, a normal assertion that looks at a single conditional wouldn’t help in this case. Asserting that no files have been deleted when a client connects doesn’t make any sense. But for a large enough sample size, I could assert that 80% of clients that connect shouldn’t have any files deleted. And I could probably assert that 95% of clients that connect shouldn’t have all their files deleted.
Functions like these cover most of what I’m thinking about:
Depending on your application, these functions could be implemented either as assertions or as triggers to alert an administrator. For alerts, you could really take this to another level by checking that incoming events are following a certain distribution for example. Granted that it’s a bit over the top, but I can’t help but imagine checking if a stream of incoming events obeys a Poisson distribution or if the sizes of certain data are following a Gaussian distribution.
Has anyone seen anything like this in the wild? I’d love to hear about it.