Code Generation Should be the Nuclear Option

Yep, I am probably going to piss a lot of people off with this post. But I was recently privy to a conversation where some people were going back and forth on this topic, and since I have a giant megaphone (my blog), I felt like I should share my thoughts with everyone else. So let me start off by saying: I do not like code generation. In certain cases I think it can help greatly, but many people are far too eager to jump to this solution. I don’t think that there is anything particularly evil about the process of generating code, but I do feel like using code generation as a day-to-day tool is a very bad practice. Code generation should be the tool of last resort, for when there is no clean way to implement a solution without spreading code out everywhere. There are several reasons for this, which I will graciously inform you of…

1) Maintenance – Generated code can be a maintenance nightmare. If your solution to a problem is to just generate reams of code, then you have created that much more code that you now need to maintain. The alternative is to never touch your generated code at all, so that it can always be regenerated. This causes problems as well. Now you have to create "buddy classes" to hold metadata, create partial classes to hold custom code, and maintain templates so that you can generate all of your classes (including code for "one-offs"). Sadly, many of the problems that people use code generation for can be solved with a bit of reflection, a few abstract or virtual methods, and a sprinkling of interfaces. And don’t get me started on your arguments about the performance of reflection… I always prefer developer productivity to runtime performance unless I can prove that I need the extra performance.

2) Flexibility – Generated code is often spit out by a custom piece of software or a template. This means that you either accept what is generated for you, or you have to customize the template. What happens when you need some piece of logic generated into a particular class? Well, you either edit your template to enable this case to be covered, or you create a new template for your class to enable this "one-off" solution. And all of this is assuming that you even have access to the templates which are driving your code generation and that you are able to swap out templates when you need to. These are two big assumptions, which quite often are not the case.

3) DRY – I put this one last in the list because I too am tired of people harping about SOLID, DRY, and the rest of the principles. They are extremely important, I just feel like I can’t open my feed reader without seeing SOLID about 15 times. So, here it is one more time! DRY is important though, and code generation can be the ultimate DRY violator if you aren’t using it for the right kind of code. Whenever I see someone use code generation for something that could easily be pushed into a base class or abstracted away into another chunk of functionality, I just want to scream! Code generation, when used, should be used for code which does not repeat. I’m sure there might be some exceptions to this, but I can’t think of any good ones right now. 🙂
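
To make points 1 and 3 a bit more concrete, here is a minimal sketch of the alternative: shared behavior pushed into a base class, with a bit of reflection standing in for per-class generated mappers. It is written in Python purely for brevity (the discussion here is mostly about .NET), and every name in it (Record, from_row, required, Customer, Order) is hypothetical, made up for illustration rather than taken from any real framework.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Mapping


class Record:
    """Hypothetical base class: behavior shared by every data object lives here once."""

    @classmethod
    def from_row(cls, row: Mapping[str, Any]):
        # Reflection: inspect the subclass's declared fields at runtime
        # instead of generating a hand-rolled mapper for each class.
        names = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in row.items() if key in names})

    def to_dict(self) -> dict:
        # Works for any dataclass subclass without per-class code.
        return asdict(self)

    def required(self) -> tuple:
        # Hook that subclasses override only when they differ from the default.
        return ()

    def validate(self) -> list:
        # Written once; each subclass only declares which fields are required.
        return [f"{name} is required" for name in self.required() if not getattr(self, name)]


@dataclass
class Customer(Record):
    id: int
    name: str

    def required(self) -> tuple:
        return ("name",)


@dataclass
class Order(Record):
    id: int
    customer_id: int
    total: float = 0.0


# Unknown columns are ignored; missing optional fields fall back to defaults.
customer = Customer.from_row({"id": 1, "name": "", "legacy_column": "ignored"})
order = Order.from_row({"id": 7, "customer_id": 1})

print(customer.validate())  # ['name is required']
print(order.to_dict())      # {'id': 7, 'customer_id': 1, 'total': 0.0}
```

Adding a new data object in this sketch means declaring its fields and nothing else; there is no template to maintain and nothing to regenerate. The price is a little runtime introspection, which is exactly the performance trade-off mentioned in point 1.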

So there you have it: code generation should be used sparingly, if at all. If code generation saves you typing for something that you would have had to manually write, then great! But if code generation is being used so that you don’t have to write some particular piece of code generically, then you will be bitten eventually! And I also realize that I am going to get 50 comments on this post saying "Dude, you are saying it is bad because people don’t use it right. If you have good developers, and they use it right, then code generation is a very useful tool!" Yes, I realize that, but not everyone is a good developer, and people don’t use code generation correctly. Done. Hope you enjoyed.

Comments (19)

  1. I couldn’t agree more! It’s a bit of a hammer & nail thing: once you’ve gotten into code generation, suddenly anything seems suitable for it. Of course I’m exaggerating a bit here, but using CG sparingly amounts to using it in a well-considered way, IMO.

  2. I started writing a comment but instead will make a blog post because it turned out to be a fairly lengthy comment.

    The core point I want to make is that like all other tools this is a grey area and when used correctly it can actually be a very valuable tool.
    The case that you are talking about isn’t such a case where it is used correctly ;).

    I can also say that you too use code generation on a daily basis and multiple times 😛 You’re guilty as sin as am I 😉

  3. We are all using code generation every day…
    The EF designer, the Linq2Sql designer, and the MVC viewpage templates are all embedded into VS.
    Also, when you add a webservice reference, there is code gen that creates the proxy for you.

    I think codegen is fine when you always have the input of your codegen process, or when codegen is part of your automated build process.

    It is not good when codegen is just the first step of development, and you then need to manually modify the code that is generated.

  4. Code generation is evil. Thank God you hand coded the HTML in this blog post 🙂

    Code generation (like all development related topics) has its place and can be abused.

    Good post.

  5. A human software developer is a code generator; they can generate crap code too, violating SOLID, DRY, etc.

    I will now invoke the Firearms Rule:
    Guns do not kill people; people kill people…

    So it follows:
    Code generation is not evil; poor use of code generation (including by the human code generator) is evil…

  6. I think @Simone hit on a key distinction.

    Active code generation, where it’s integrated into an automated process, can be good. Continuous regeneration prevents/discourages people from hand editing the generated code.

    Passive code generation, where code is generated once on demand, is often bad. The generated code is often used as the starting point for classes that are manually edited.

    Although, I think the "Linq2Sql designer" is actually an example of bad code generation.

    If I understand where Justin’s coming from, I agree. On more projects than I care to quantify, I’ve been using a framework where I must generate a lot of the data access code, instead of using an intelligently configured ORM or convention-based JIT mapper.

    Also, I don’t really consider a WYSIWYG HTML editor to be code generation. It’s simply a visual representation of the HTML model. Most WYSIWYG designers are bidirectional. Most code generation tools, as I define them, are unidirectional. They can generate code from a model/resource, but they do not understand changes in the generated code and cannot map those changes back into the model.

    You will encounter friction in any scenarios where developers are editing generated code — IMO.

  7. @Al Yep, I think you see where I am coming from. There are multiple kinds of generation. One-time generation is, IMO, one of the most useful, though. It can often generate boilerplate code that needs to be there, based on some existing schema, while allowing the developer to have full flexibility with modification.

    I’m thinking Rails scaffolding or code generation that spits out classes based on tables that I then get to change. It is the requirement that we need to regenerate the code which often gets us in trouble, because then we have to implement shoddy solutions to get around the fact that we often need to put custom logic inside of generated code. The fact that we need to get custom logic into the generated code should be enough to tell us that the code generation doesn’t fully accomplish the task and should be reevaluated.

    The idea of doing code-gen in an automated process is better, because it encourages generating only code which does not need to be modified after it has been generated.

    So, maybe my opinion can be summed up as, generate code for me once if the code needs to be edited later. If it doesn’t need to be edited, then generate it for me and I’ll forget about it. 🙂

  8. "Although, I think the 'Linq2Sql designer' is actually an example of bad code generation."

    I know I just said that, but upon further thinking, I am not qualified to make that judgment. I have only used the designer for experimental purposes, and it felt clunky to me. But I certainly have not used it enough to say it’s bad.

  9. @Al It is bad in the sense that it generates a bunch of partial classes in a single namespace and file, and because of that I am required to do things like create "buddy" classes for attributes, and create partial classes that implement partial methods in order to get hooks into properties.

    <myopinion>It is hack upon hack upon hack in order to do something that should have been spit out once so that I can then edit it. </myopinion>

  10. Yeah, one-time generation can be very useful. I guess I’m just negative on editing generated code, because more often than not, somebody will want to regenerate that code, even though everybody agreed it would only be generated once.

    It’s similar to writing those little throw-away utilities. You wrote it just to help you accomplish a one-off task, and years later, that utility is an integral part of a company’s development/business process.

  11. Justin, let’s just be brief: you don’t get what code generation is all about (i.e. telling a machine to do the typing for you).

    That’s fine though. The one person you’re hurting by being unaware of what code generation can do for you is … you. 😉

    That said, not all code generation is good. Like I said, it should be used as machine-driven typing. All other uses are pretty much overkill/not useful. Unfortunately, you too fell into the trap of looking at those cases, extrapolating them over the complete spectrum, and declaring code generation something useless.

    I hope you have a great keyboard. You’ll need it 😉

  12. One thing I’ve always wondered about code generation is… can we treat the generator as just another committer? E.g., why can’t I generate code, edit it, generate it again, and look at a diff file to resolve conflicts, without it having automatically overwritten my changes? I’ve never seen this, though I assume it has been done.

    It might be more of a pain than it’s worth, but I would figure that with a smart enough merge tool/generator you could at least be able to decorate properties without a huge amount of pain.

  13. @Frans Come on! You are almost as sensationalist as I am! I’m going to sit here patiently while you actually read the post. Then we can talk. 🙂

    Quote from post:

    "If Code Generation saves you typing for something that you would have had to manually write, then great!"

    I’m fine with code generation if it keeps me from having to type tons of unnecessary code. I’m not fine with code generation when it forces me into suboptimal solutions so that it can regenerate said code. In that case I would treat code generation as a smell.

  14. @justin That would be an interesting solution. I do wonder though how much pain this would cause. I guess it all depends on how much the file is being edited manually and what parts are being modified.

  15. I agree with your basic points.

    One additional point I would have made is that oftentimes I’ve seen people sink more time into writing the code generators themselves than it would have taken to just write the code in the first place. This is actually worse in at least one way (no time was saved as a result) and potentially two ways (a significant amount of the time/brainpower spent was focused on the automation of the task, not the task itself).

    This is obviously more of a problem for one-time code generation cases than it is for items that get generated frequently because there are fewer chances to reap the time savings.

    As you and others have already pointed out, this is yet another tool in the toolbox and can be fantastic when used properly in the right situations. I believe your main issue here is with the rampant misuse of the tool or use in the wrong situations, something I have seen a lot of over the last two years as well.

  16. @Zack Good point, some would argue that automating it will save you from manually generating it in the future, but others will make the YAGNI argument. I tend to lean toward YAGNI, but depending upon the likelihood of having to reproduce the code in the future, I might lean toward automated generation.

  17. I agree, but only up to a point. Code generation in the case you are talking about ABSOLUTELY needs to be considered a nuclear option. However, there are times when code generation can be used to good effect.

    A good example of this is the views for ASP.NET MVC. They use the T4 templates, but only as a starting point for the user. Another example I consider useful is when you need to generate classes to access a custom ConfigurationSection or an XML file. Generating the code to have OO classes to access those makes sense while keeping the code DRY.

    Of course, with great power comes great responsibility. T4 should not be used like a hammer, but only when it is an absolute necessity.

  18. I agree. With Linq2Sql, in the past, when using SQL Server 2000, I quite often got "no return type value" from my stored procedures (when you do something like this: declare @sql varchar(xxx); set @sql = 'select ……' exec @sql).

    After moving to SQL Server 2008 it still happens sometimes.

    If I add a column to the result of my stored procedure, I have to change the generated code. A huge pain :/

  19. You make some good points, but you are generalizing a bit. When you are generating data objects, there really is no substitute. If you have a bunch of objects representing data such as an address or a person, for instance, code generation is an acceptable way to generate them. Whatever common behaviors they share, such as serialization or transmission, can be inherited from a common class.

    The only way around it is to remove type information and use lowest-common-denominator datatypes (i.e. strings instead of numbers). You’re just trading a maintenance problem for a full-fledged nightmare. In such cases, you may as well program in a language without type safety.
