You too can fail fast (or at least really quickly)

For most people, the title of this post would not seem like good advice. But programmers are not most people. When it comes to programming, failing fast is one of the most important guidelines that you can follow. To quote Wikipedia (as often happens):

Fail-fast is a property of a system or module with respect to its response to failures. A fail-fast system is designed to immediately report at its interface any failure or condition that is likely to lead to failure.

Programmers often have the misconception that their software should always work, and that if they catch enough exceptions and add enough error-checking and error-correcting code, they will attain this goal. Well, I hate to be the harbinger of bad news, but this is the real world. And in the real world, things fail. You may write the most pristine code in the world, and then one day the hard drive on the server fills up. Then what? Boom, is what.

Now this may not be the best example of my point, but what I'm trying to say is that your software is going to fail, and you should prepare yourself for that. Until you do, you won't be able to recover from it quickly. The idea is that you want your software to tell you where a failure occurred. You don't want it catching an error and trying to ignore or fix the problem, only to have that problem cause another issue 5 seconds or 5 days down the road.

Imagine that you are writing some banking software and you have a method that does this…
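The original listing did not survive in this copy of the post. What follows is only a minimal sketch of the kind of method being described, assuming a hypothetical `Account` class with a `Balance` property; the names are illustrative and not the author's original code.

```csharp
using System;

// Hypothetical account type; the class and member names are assumptions.
public class Account
{
    public decimal Balance { get; private set; }

    // Naive deposit: no validation at all, so a negative amount
    // silently lowers the balance and no withdrawal is recorded.
    public void Deposit(decimal amount)
    {
        Balance += amount;
    }
}

public static class Program
{
    public static void Main()
    {
        var account = new Account();
        account.Deposit(100m);
        account.Deposit(-10m);              // accepted without complaint
        Console.WriteLine(account.Balance); // prints 90
    }
}
```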

<sarcasm>And yes, before you ask, this is production code.</sarcasm>

Now say that one day part of your software passes a negative amount into this method. The method is designed to accept deposits, not withdrawals, so no withdrawal is logged for the account, yet the account balance drops. A week down the road, an automated process goes through the accounts and computes each account's change in balance from the previous week. It sees that this account dropped in value and decides to divide that amount by the number of withdrawals. Instant DivideByZeroException. So you look at this and you say, whaaaaaat? How did this account have a drop in value with no withdrawals? Your first instinct is that something in the code that logs withdrawals must have failed. Clearly a withdrawal happened, did it not?
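To make the delayed failure concrete, here is a rough sketch of what such a weekly job might compute. The `WeeklyReport` class, the method name, and its parameters are all hypothetical; the only point is that the exception surfaces in code that has nothing to do with the bad deposit.

```csharp
using System;

public static class WeeklyReport
{
    // Hypothetical weekly job: average size of the withdrawals that
    // explain a drop in balance since last week.
    public static decimal AverageWithdrawal(
        decimal lastWeekBalance, decimal currentBalance, int withdrawalCount)
    {
        decimal drop = lastWeekBalance - currentBalance;

        // Dividing a decimal by zero throws DivideByZeroException.
        return drop / withdrawalCount;
    }
}

public static class Program
{
    public static void Main()
    {
        try
        {
            // Balance fell from 100 to 90, but no withdrawal was logged.
            WeeklyReport.AverageWithdrawal(100m, 90m, 0);
        }
        catch (DivideByZeroException)
        {
            // The stack trace points at the report, not at Deposit.
            Console.WriteLine("Report failed a week after the real bug.");
        }
    }
}
```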

I'm sure that we have all come across a bug like this, and they sure can be a time sink. This example is obviously simplistic, and most likely the bug would be found posthaste, but what if the Deposit method were 50 lines of code and you had different account types with overloads for each? What would have happened if we had followed the idea of failing fast? Our method would probably have looked something like this…
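The original listing is also missing here. Based on the surrounding text (an ArgumentException for a negative amount, "require"/"ensure" comments, and a final check on the balance), a fail-fast version might look roughly like the sketch below. The `Account` class, the exception messages, and the choice of InvalidOperationException for the postcondition are assumptions, not the author's code.

```csharp
using System;

// Hypothetical account type; names and messages are illustrative.
public class Account
{
    public decimal Balance { get; private set; }

    public void Deposit(decimal amount)
    {
        // require: amount > 0
        if (amount <= 0)
        {
            throw new ArgumentException(
                "Deposit amount must be positive, but was " + amount + ".",
                nameof(amount));
        }

        decimal oldBalance = Balance;
        Balance += amount;

        // ensure: Balance > oldBalance
        if (Balance <= oldBalance)
        {
            throw new InvalidOperationException(
                "Balance did not increase after the deposit.");
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var account = new Account();
        account.Deposit(100m);

        try
        {
            account.Deposit(-10m); // rejected at the point of entry
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine("Rejected deposit; bad parameter: " + ex.ParamName);
        }

        Console.WriteLine(account.Balance); // prints 100
    }
}
```

The bad value is now reported at the method boundary, and the balance is left untouched because the check runs before any state changes.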

In this method we have defined a contract. You are probably hearing this term overloaded a lot these days, with things like WCF (Windows Communication Foundation) saying that they define contracts. The idea is similar, but in WCF you are defining an interface that describes how some other process is going to interact with your service. Here we are providing an explicit guarantee that this method is going to do what it says it is going to do, or it is going to fail spectacularly. While this may sound bad, this IS A GOOD THING.

If we pass -10 into the method above, we are immediately going to get an ArgumentException, which will most likely propagate up to our global exception handler, where it will be logged. We will then immediately know where the problem occurred. Hopefully. If -10 was actually passed into the method, then most likely the real problem lies elsewhere: the method that passed this value in may not have validated it, or the interface may not have validated it. But since we put this check in our method, we at least get a stack trace that will *hopefully* lead right to where the value was created.

Now those of you who have looked at or written Eiffel before will immediately notice the "require"/"ensure" comments. These are the keywords that Eiffel uses to define contracts. In Eiffel, contracts on methods are a first-class construct, and I wish that C# had something similar. It would require you to at least think about the contract when you are writing a method.

Another way of enforcing these contracts is through assertions, but I'm not a big fan of this approach. Most people see assertions as something that you remove from software when you put it into production, while I think that these kinds of checks should almost always stay in. As long as being correct is more important than squeezing every last clock cycle out of our monster quad-core CPUs, I would most certainly argue for leaving these checks in and using exceptions to report when the contracts are violated. (Of course, if you wanted to CYA, you could put all of these in conditional compilation blocks so that you could rebuild and remove them later if someone cries about the application running way too correctly.)

Then, when you are writing your unit tests, you can just pass in invalid values and use an ExpectedException attribute (most unit testing frameworks have this now) to guarantee that these methods won't accept invalid values. Once you follow these simple rules, then you too can fail fast. Just don't put it on your resume. I'm not sure most companies' HR departments would understand if they saw "able to fail fast" in your technical skills.
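As a rough illustration of the unit-testing idea, the sketch below checks that invalid deposits are rejected. To stay free of any test framework it uses a small hypothetical `Throws` helper; in a framework that provides an ExpectedException attribute, the same expectation would normally be declared on the test method instead. The `Account` class is the same assumed sketch as before, trimmed to its precondition.

```csharp
using System;

// Hypothetical account type with a fail-fast precondition.
public class Account
{
    public decimal Balance { get; private set; }

    public void Deposit(decimal amount)
    {
        // require: amount > 0
        if (amount <= 0)
        {
            throw new ArgumentException("Deposit amount must be positive.", nameof(amount));
        }

        Balance += amount;
    }
}

public static class DepositTests
{
    // Returns true if the action throws TException (or a subclass) and
    // false if it completes normally; other exceptions propagate.
    public static bool Throws<TException>(Action action) where TException : Exception
    {
        try
        {
            action();
        }
        catch (TException)
        {
            return true;
        }

        return false;
    }

    public static void Main()
    {
        var account = new Account();

        Console.WriteLine(Throws<ArgumentException>(() => account.Deposit(-10m))); // True
        Console.WriteLine(Throws<ArgumentException>(() => account.Deposit(0m)));   // True
        Console.WriteLine(Throws<ArgumentException>(() => account.Deposit(25m)));  // False
    }
}
```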


Comments (4)

Daniel Crabtree says:

November 3, 2007 at 3:05 am

If you are interested in design by contract, Microsoft has a research project for adding these types of features to C#. The project is called Spec#, and it adds the kinds of features you might like.

Justin Etheredge says:

November 4, 2007 at 6:48 pm

@Daniel

Thanks for the heads up, I’ll look into that.

Chris K says:

November 8, 2007 at 4:45 pm

This is a really small issue, but the last exception confuses me. You test for less than or equal, but say unaffected when it could be affected negatively (according to the test). Like I said, small, but it would confuse me later on…

Justin Etheredge says:

November 8, 2007 at 5:04 pm

Well, in this case I could put in two checks, one for if it was equal and one for if it was less, then throw more accurate exceptions for each. It probably also wouldn’t hurt to include variable values in the exception messages.
