When I’m starting a project or thinking about adding functionality to an existing code base, I always consider using existing code. Sometimes this is obvious – I’m not going to write my own RDBMS – but frequently, it is a more difficult decision than it should be. In making a decision, I look first at the questions that I can actually get answers to:
- Am I getting more than I need? It pains me to add a multi-megabyte DLL to a client download for a small amount of functionality.
- Will I spend more time learning the interface than I would writing the functionality I need myself?
- Is this an active project, and is there any documentation?
- If scheduling isn’t an issue, how much fun would it be to write my own version?
Next comes a set of questions that are oftentimes harder to answer:
- Who else is using it?
- Will I be using it the same way as other people who are successfully using it?
- What am I going to find out when I put more stress on it than anyone else?
One library that passed my gauntlet of questions is libredblack. It ended up on a bunch of production servers at FolderShare, and it worked out great. But there was a catch: I wanted to use it to store large numbers of items, but for every item I put in the tree, the library would allocate an object that held 4 pointers and an enum. That took 40 bytes on my dev box. Throw in malloc’s overhead, and I was up to 48 bytes. The objects I was storing pointers to would also have some heap overhead, which could be as much as 24 bytes. So to store 10M items in memory, I’d need roughly an extra half gigabyte just for the tree nodes (10M × 48 bytes ≈ 480 MB), and as much as 720 MB once the per-object heap overhead is counted.
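This kind of arithmetic is easy to check before committing to a library. Here is a minimal Python sketch, assuming a node laid out as four pointers plus an enum-sized field. The `Node` class and the `overhead_bytes` helper are hypothetical names for illustration, not libredblack's actual definitions, and the default overheads are simply the figures quoted above.

```python
import ctypes


class Node(ctypes.Structure):
    """Hypothetical tree node: four pointers plus an enum-sized colour field."""
    _fields_ = [
        ("left", ctypes.c_void_p),
        ("right", ctypes.c_void_p),
        ("parent", ctypes.c_void_p),
        ("key", ctypes.c_void_p),
        ("colour", ctypes.c_int),
    ]


def overhead_bytes(items, node_size, malloc_overhead=8, object_heap_overhead=24):
    """Estimate total bookkeeping bytes for `items` entries.

    The defaults are the figures quoted in the text (assumptions, not
    measurements of any particular allocator).
    """
    return items * (node_size + malloc_overhead + object_heap_overhead)


if __name__ == "__main__":
    node_size = ctypes.sizeof(Node)  # includes alignment padding
    print("node size on this machine:", node_size, "bytes")
    print("nodes only:", overhead_bytes(10_000_000, node_size, object_heap_overhead=0))
    print("worst case:", overhead_bytes(10_000_000, node_size))
```

The node size depends on the platform's pointer width and alignment rules, so it is worth running a check like this on the machine you actually plan to deploy to.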
A second example from personal experience is librsync. Again, the library works exactly as advertised. But if you want to transfer deltas for large (gigabyte+) files on machines that have hard memory limits (like embedded devices), you need to know that the memory usage is proportional to the file size. For my situation, I ended up having to adjust the window size as file sizes grew just to keep the memory usage reasonable for large files.
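One way to reason about that adjustment: if a delta algorithm keeps a fixed-size entry per block of the file, memory grows with file size divided by block size, so growing the block size along with the file caps the footprint. The sketch below is a rough illustration of that idea only. `block_size_for`, `signature_bytes`, the 2048-byte starting block, the one-million-block cap, and the 40 bytes per entry are all assumptions made up for the example, not librsync's API or its real numbers.

```python
def block_count(file_size, block):
    """Number of blocks needed to cover file_size bytes (ceiling division)."""
    return -(-file_size // block)


def block_size_for(file_size, max_blocks=1_000_000, min_block=2048):
    """Pick the smallest block size (doubling from min_block) that keeps
    the block count at or under max_blocks. All defaults are assumptions."""
    block = min_block
    while block_count(file_size, block) > max_blocks:
        block *= 2
    return block


def signature_bytes(file_size, block, bytes_per_block=40):
    """Rough in-memory size of a per-block table (hypothetical entry size)."""
    return block_count(file_size, block) * bytes_per_block


if __name__ == "__main__":
    for size in (2**30, 2**34):  # 1 GiB and 16 GiB
        block = block_size_for(size)
        print(size, block, signature_bytes(size, block))
```

The trade-off is that larger blocks mean coarser matching, so deltas for big files tend to get larger in exchange for the bounded memory.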
I don’t want anyone to think I’m complaining about this stuff – I’m a fan of both libraries. But both of these examples illustrate a class of problem that is particularly frustrating: the one you might not find until you are heavily invested in a solution. These gotchas won’t affect most people, and thus aren’t likely to show up when you are researching possible solutions. They aren’t bugs, either, but they might be something you have to deal with. So the sooner you can find out about them, the better.
Fortunately, the internet has plenty of software built for solving problems like this. Dev Diligence is a new wiki I’ve started to collect details like these. My goal is to have a reference page for any library or service developers might consider using in their solution. For sufficiently large libraries, pages for classes or functions might be necessary, but let’s not get ahead of ourselves. Ultimately, I’d like to have 5 headings for everything in the wiki:
- Overview: Brief description of the software and a link to the homepage
- Short case studies or war stories: These would include a brief description of how you are using the software, the version you used, and ideally some metrics. If you used it for a while and then switched to something else, an explanation of that decision is very valuable information. For libredblack, the relevant metrics would be things like average number of elements in your trees or insertions/deletions per second.
- “Gotchas” (like the ones I’ve mentioned above): Subtle problems (hello, heap fragmentation) and things that aren’t necessarily bugs, but issues that may affect your design or help you choose one solution over another.
- Alternatives: The name pretty much says it all. With links, please.
- Other Resources: Links to blog posts, email threads, or reference pages would be great.
A few of the pages I’d most like to see filled in:
- libev, libevent, boost.asio, and Twisted
- sqlite and berkeleydb
- memcached, spread, the reliable queue solutions (Starling, TheSchwartz, etc.), and anything that uses “pubsub” in its description
- libcurl and wininet (stuff like Nick Bradbury’s description of a CPU spike in WinInet that can be triggered by chunked-encoding is gold)
All of these and more are linked to from the WishList page.
Can you guys help me out? I’ve got enough people subscribed to this feed that I’m certain at least one of you has used everything on my list. If you take 10 minutes to write down your experiences, you can make the software world a better place. To justify doing it on your company’s time, keep this in mind: if you document the fact that you are successfully using a solution, you increase the chance that other people will use it as well. The more users a solution has, the better it will become.