Your Software Can’t Heal Itself

@JustinEtheredge / July 10, 2008 / February 24, 2021

There is an acronym that gets thrown around quite a bit in agile development circles (and elsewhere): YAGNI. In case you don’t know, it stands for “You Ain’t Gonna Need It”. The idea is that any time you spend implementing something you don’t end up needing is wasted time, so don’t do it, because “You Ain’t Gonna Need It”! Now, while this phrase doesn’t quite jibe with what I am talking about, I want to introduce a new and related phrase.

This new acronym is IAGW, pronounced aye-ag-wuh. It stands for “It Ain’t Gonna Work” and can be applied to virtually every part of application development, but I only want to talk about it in terms of “self-healing software”. I really love the term “self-healing software” because it implies that the software is going to do something to actually repair itself without your intervention. While a more accurate term would probably be “robust software”, I think that phrase probably wouldn’t create as many PhD candidates or sell as many pieces of software.

So, let me start off by saying that my problem is not with robust software. My problem is that people start ignoring YAGNI when they start thinking about how they are going to make their software robust. In fact, I’m probably going to eat a lot of crap for saying this, but “YAGNI” and building robust software can sometimes be at odds. A lot of what you see out there passing itself off as robust is just developers trying to anticipate where and when the software is going to break. Don’t you love how most of us cannot write a method with more than 20 lines without introducing a bug, but somehow we think that we can predict when a million-line piece of software is going to break? The reality is that you can’t. And if you try to, well, IAGW. If you account for one bug, then some other bug will happen. If you try to recover from one failure, then you are going to get bitten when the failure doesn’t happen exactly as you expected. How bad would it be if the code that you put in to recover from a problem caused another one or hid the real problem?
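
To make that last question concrete, here is a minimal sketch in Python. The function names and the settings format are hypothetical, invented only for illustration. The first function guesses that the only way loading can fail is "no settings yet" and quietly recovers with defaults. The second makes no guesses and lets the failure surface where it happens.

```python
import json

# Hypothetical defaults, used only for this illustration.
DEFAULTS = {"retries": 3}


def load_settings_speculative(text):
    """Speculative recovery: assumes any failure means 'no settings yet'."""
    try:
        return json.loads(text)
    except Exception:
        # The anticipated case was empty input on a first run. A corrupted
        # or truncated file lands here too, and is silently replaced with
        # defaults, so the real problem never shows up anywhere.
        return dict(DEFAULTS)


def load_settings_plain(text):
    """No speculative recovery: malformed input fails at its source."""
    return json.loads(text)
```

With a truncated input such as `'{"retries": 5'`, the speculative version hands back the defaults as if nothing were wrong, while the plain version raises `json.JSONDecodeError` pointing at the bad input. The second behavior is the one you can actually debug.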

So am I saying that you shouldn’t try to write robust software? Of course not! What I am telling you is that unless you know for a fact that something in your application is subject to breaking (such as a database call or a remote web service call), you shouldn’t pile in code to account for it. You should run your application, and test and prod it to find out where your breaks are going to occur, and then you should fix them. If you run your application in production and it breaks, that is when you need to start putting code in to resolve problems. One thing to focus your efforts on is instrumentation, so that you can find the bugs that do pop up. You may think that putting in lots of instrumentation technically violates the YAGNI principle, but I’ll let you dwell on that technicality when you put an application into production and have no way to figure out why it did something odd.
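
As a rough sketch of what instrumenting a known failure point might look like, the Python below wraps a call, logs its outcome and duration, and re-raises any failure rather than trying to recover from it. The helper name `instrumented_call` and the logger name are assumptions made up for this example, not part of any particular framework.

```python
import logging
import time

# Hypothetical logger name, chosen only for this sketch.
log = logging.getLogger("app.remote")


def instrumented_call(name, func, *args, **kwargs):
    """Run func, log its outcome and duration, and let failures propagate."""
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    except Exception:
        elapsed_ms = (time.perf_counter() - start) * 1000
        # log.exception records the full traceback. Re-raising means the
        # instrumentation observes the failure without hiding it.
        log.exception("%s failed after %.1f ms", name, elapsed_ms)
        raise
    elapsed_ms = (time.perf_counter() - start) * 1000
    log.info("%s succeeded in %.1f ms", name, elapsed_ms)
    return result
```

You would wrap only the calls you know can break, for example `instrumented_call("load orders", fetch_orders, customer_id)`, where `fetch_orders` stands in for whatever database or web service call your application makes. The wrapper adds no recovery logic of its own, so there is nothing in it to guess wrong about.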

So get out there, and stop trying to be a psychic, and start being a software developer!

Comments (6)

Kevin Hazzard, MVP says:

July 12, 2008 at 12:21 am

Aye-ag-wug sounds a lot like Viagra, maybe the way Elmer Fudd would say the name of the drug. So you take Viagra when things don’t work and you practice IAGW when you’re in the mood to build things that don’t work. I think you’re onto something there Justin!

I may have missed the point but one of the reasons why I like the instrumentation model in WCF, for example, is that it’s deep and wide, sprinkled all through Microsoft’s System.ServiceModel code. There was certainly a lot of YAGNI push-back on the level of detail those guys went to. But it’s saved my butt more than a few times. I think when it comes to plumbing (which is what most architecture’s about anyway) it’s hard to over-engineer the code you’ll use for doing problem triage. I understand feature cost but these features, the pipes and pumps, need a lot of what many would consider YAGNI.

Justin Etheredge says:

July 13, 2008 at 6:58 pm

@Kevin I think there is a big difference between building framework code and building application code. When Microsoft goes about building something like WCF, which is a black box for the most part, then they don’t have the luxury of developing it in an Agile manner. Their customers are the future users of the platform and they are lucky to get a new CTP out every few months until the platform releases.

So, in their case, they can’t just say "Oh, we need this, let’s add it in." Their distribution model doesn’t support that. So, they have to build in as many hooks and features up front as they think they need and can get built in a reasonable time. But when you are developing in-house software, you are much more free to refactor when necessary. The problem I see is that people still want to grotesquely over-engineer things, even when it is not needed and could be easily added later.

As I stated in my post, I think unused features are waste. They are resource waste and they are just more potential bugs. Every line of code introduced in a project is a potential bug, and so I strive to find that sweet spot between introducing complexity and making my software robust.

And like I also said in my post, instrumentation is something that you should always focus on (maybe you thought I was saying the opposite). If you don’t have lots of instrumentation, then you have no clue what is actually happening in your application!

Xerxes Battiwalla says:

July 13, 2008 at 8:07 pm

Unless I’ve catastrophically misunderstood your point, your suggested approach to writing software is *exactly* what is wrong with software today. People take a blasé approach and throw something together without actually sitting down and seriously considering all known points of failure… Then they wonder why their application got hacked when their code is littered with buffer overflows, SQL injection vulnerabilities, etc.

IMO, an approach centered around a Defensive Programming development style will lend itself to creating more robust software – not the "watch it break and fix it later" style.

Justin Etheredge says:

July 14, 2008 at 2:11 am

@Xerxes I’m not saying to use YAGNI as an excuse to write bad software. Not accounting for SQL injection and buffer overflows is just bad software, not lean software. I would say that code protecting against SQL injection is "needed" and therefore not covered under YAGNI.

My concern centers more around people that say "Well, this part here might do this and then this might happen and then we could end up like this. So I’ll put a bunch of code in here to check for this, then perform this action, which will recover from this." The issue is that what you expect to happen doesn’t, something similar but not the same *will* happen, and when your cleanup code runs it will do something wholly unexpected and hide the true cause of the problem.

I am advocating to protect where possible, instrument as much as you can, and then *test* your software to find weaknesses and potential points of failure. Then start going in and patching things up.

And as a side note, I think that defensive programming is an excellent practice that can avert many errors by causing the failure to happen at its source. This is very important, but again, I think that failing fast is needed. http://www.codethinked.com/post/2007/11/You-too-can-fail-fast-(or-at-least-really-quickly).aspx

Maybe I am contradicting myself a little bit, but reality is always somewhere in the middle, right?

Justin Etheredge says:

July 14, 2008 at 2:13 am

My blog software completely swallowed and chewed up that last link in my comment. I guess it didn’t like the parentheses. Just look for my blog and a post titled "You too can fail fast." Thanks!

Xerxes Battiwalla says:

July 14, 2008 at 2:43 am

I’m relieved that I had a catastrophic misunderstanding :). I see what you’re talking about now – yes, this happens a fair bit, and it’s hard to draw the line between what’s necessary and what’s "stretching it a bit". Typically I’ve seen this kind of code from junior devs who try to cover their asses, but go too far (and then deliver too late).

I like your linked article. Totally agree with the pre/post conditions, having been taught Eiffel myself. Spec# looks pretty interesting, and a great way to spend 2 hours at work researching developer tools 🙂

-xerx
