A Lesson in Debugging: Big Projects Have Critical Bugs Too

I recently ran into an interesting problem that turned out to be a great learning experience. It involved hair-pulling levels of frustration, vicious finger-pointing, and an unexpected ending. Not a TV soap opera episode, just a day in the life of a developer.

It all started with a REST API I had built for a customer proof of concept, which began refusing requests after an arbitrary period of time. Nothing in the REST API’s codebase was unusual: it was two simple CRUDL endpoints on equally simple objects.
I’d built similar services many times before, and probably will many times again, yet this process kept hanging and refusing requests. The time it took to fail was arbitrary: sometimes immediate, sometimes hours, sometimes days.

Eventually, through some SSH-fu, we discovered that the number of open file descriptors was growing inside the app’s container. I wrote a simple endpoint to report that count, and it looked like this:

app.get('/fd', function(req, res){
  var fs = require('fs');

  // Each entry in /proc/self/fd is one descriptor this process holds open (Linux only)
  fs.readdir('/proc/self/fd', function(err, list) {
    if (err) {
      return res.status(500).json(err);
    }
    return res.end(list.length.toString());
  });
});

Indeed, after restarting the process, it was clear that the file descriptor count would grow every few minutes, without stopping. An empty Node.js application didn’t exhibit the same problems, only this app.
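Polling that endpoint by hand gets tedious over a week, so the same check can be run on a timer inside the process. What follows is only a minimal sketch, not what we actually ran: `countOpenFds`, `isSteadilyGrowing`, and `watchFds` are hypothetical helper names, and the "never drops across N samples" rule is an assumed heuristic for spotting a leak. Like the endpoint above, it relies on `/proc/self/fd` and so only works on Linux.

```javascript
// Sketch: sample this process's open file descriptor count at a fixed interval.
const fs = require('fs');

// Returns the number of entries in /proc/self/fd, or null where /proc is unavailable.
function countOpenFds() {
  try {
    return fs.readdirSync('/proc/self/fd').length;
  } catch (err) {
    return null;
  }
}

// Hypothetical helper: true when the last `window` samples never decrease
// and the final sample is higher than the first one in that window.
function isSteadilyGrowing(samples, window) {
  if (samples.length < window) return false;
  const recent = samples.slice(-window);
  for (let i = 1; i < recent.length; i++) {
    if (recent[i] < recent[i - 1]) return false;
  }
  return recent[recent.length - 1] > recent[0];
}

// Hypothetical helper: start sampling; returns the timer so the caller can clearInterval it.
function watchFds(intervalMs, window) {
  const samples = [];
  const timer = setInterval(function () {
    const count = countOpenFds();
    if (count === null) return; // not on Linux, nothing to record
    samples.push(count);
    if (samples.length > window) samples.shift();
    console.log(new Date().toISOString() + ' open fds: ' + count);
    if (isSteadilyGrowing(samples, window)) {
      console.warn('fd count has not dropped in the last ' + window + ' samples');
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return timer;
}
```

Calling something like `watchFds(60 * 1000, 10)` at startup would log one sample a minute and warn once ten samples in a row show no drop. A healthy app's count should rise and fall with load, so a warning here is a hint to investigate rather than proof of a leak.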

Another week went by of monitoring the file descriptor count across a number of scenarios – my thought process went like this:

  1. Maybe it was the version of Express?
  2. Maybe it was the way we set up the Mongoose -> MongoDB Connection? (Many hours spent here..)
  3. Maybe it was the version of Mongoose? (Hint: Warmer..)
  4. Maybe it was some other smaller dependency in the tree? (A week of this passes.)
  5. ..maybe it’s the MongoDB Driver itself?

Bingo. Turns out Version 2.1.8 of the MongoDB Native driver for Node.js leaks sockets in a hard-to-reproduce race condition.

Lesson learned?

Well, for one, don’t use mongodb-native@2.1.8.
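One way to make that advice stick is a startup check that complains when the affected version is installed. This is a rough sketch under a few assumptions: that the driver is installed under the npm package name `mongodb`, that its `package.json` can be loaded with `require`, and that an exact match on 2.1.8 is the right test (I have not verified which other releases, if any, are affected). The function names are made up for illustration.

```javascript
// Sketch: warn at startup if the installed MongoDB driver is the version named above.
// Assumption: the driver is installed under the npm package name `mongodb`.
const AFFECTED_VERSION = '2.1.8';

// Exact match only; whether neighbouring releases are affected is not checked here.
function isAffectedDriverVersion(version) {
  return version === AFFECTED_VERSION;
}

// Returns the installed driver version, or null if it cannot be read
// (package missing, or its package.json is not loadable via require).
function installedDriverVersion() {
  try {
    return require('mongodb/package.json').version;
  } catch (err) {
    return null;
  }
}

// Logs a warning and returns true when the affected version is found.
function warnIfAffected() {
  const version = installedDriverVersion();
  if (version !== null && isAffectedDriverVersion(version)) {
    console.warn('mongodb driver ' + version + ' is installed; consider upgrading');
    return true;
  }
  return false;
}
```

Note that this only sees the copy of the driver resolved from the application's own directory. If Mongoose pulls in its own nested copy, that one would need checking separately.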

More importantly, I had been pointing at the “smaller” community projects, and spent a long, long time debugging how I handled database connections and how Mongoose handled database connections. Never once did I think to try bumping the version of the MongoDB native driver, something I perceived as “rock solid”. Although these leaks were an edge case, the effect when they occurred was fatal, bringing down the application. It turns out no open source project is too large for a critical bug!

Special thanks to Shannon Poole from the mobile support team for finding this gem of a bug in the MongoDB release notes.

@cianclarke is a Software Engineer with Red Hat Mobile. Primarily focused on the mobile-backend-as-a-service space, Cian is responsible for many of Red Hat’s mBaaS developer features.  
