Code Generation: the Nuclear power of programming

I’ve been a software engineer working on Java projects for almost 15 years, and for a good portion of that time we have used code generators to some extent. Today, I want to share my most recent observations from using OpenAPI generators on our current project. If you have worked at the same company as me, you probably know about my recent obsession with code generators, and you have probably heard enough of me talking about JetBrains MPS and creating your own language with it. But there’s a bigger question that baffles us as engineers: “Is code generation good?”

TL;DR

Code generation is not that dangerous. Start small and controlled (put all generated code someplace safe where the main application logic doesn’t depend on it), and make sure you have rules in place to prevent a disaster!

“Is code generation good?” is the kind of question that would get closed immediately on Stack Overflow because it’s so opinion-based.

And that’s why I’m comparing code generation to nuclear power. Can we say nuclear power is absolutely bad? Those who benefit from it would tell us that if you run it in a controlled environment and learn from the experience of others who suffered before you, it’s the best thing ever!

Why generate code?

What do you gain by using code generators? The same thing you gain from using Spring: you get to focus on your main problem instead of getting caught up in “implementation details”. If you really think about it, Spring Data and OpenFeign are also “generating behavior”, they just do it at runtime. I remember that at first there were arguments over whether we should use Spring Data at all when it’s so easy to write a DAO. Nowadays almost no one complains about these nice repositories! They decouple you from how transactions are managed, how DB queries are executed, and how results are extracted, so you can focus on the problem you’re trying to solve in your Domain. The biggest advantages code generation has over tools that “generate behavior at runtime”:

You can see actual code, so it’s not “black magic” anymore
It’s easier to read/debug the code
If you ever decide to undo your use of code generation, you can just pick up the latest generated code and get rid of the generator.

As I write this post, I don’t have the clearest view of “what’s best” when it comes to code generation, but I have taken a few steps in this direction and gathered some experience along the way. So I decided to share my 2 cents on the topic.

Before you continue

Usually when I start explaining a topic that sits in a grey area (like this one), I begin by describing the two extremes (black and white) and where we stand between them. In this case, one extreme is having no generated code at all, and the other is having everything generated. Using JetBrains MPS you could technically write all your code as MPS models, since Java has been ported to an MPS language. You can then add your own language on top of the existing ones, and in the end you will have only MPS models in custom languages generating Java code and maybe other files. The challenge is that your code isn’t text anymore, and you will have to use MPS for everything (even code review, for example). We are more interested in generating part of our final project and implementing the rest by hand. Our ideal would be to decouple from implementation details such as the DB, HTTP, etc.

In this post, I’m trying to address the concerns of our pessimistic colleagues who have either had a bad experience with code generation or heard negative feedback from others. I’ll share some basic rules for working with generated code, and one final architecture suggestion to prevent the meltdown of this nuclear power!

Changing the generated code

The number one challenge for changing generated code is “Regeneration”. You have generated code, you change something there, and then regenerate. What will happen? Most probably your changes will be lost.

Some older code generators, such as Eclipse Xpand, had a very creative solution for this “challenge”. They used special comments in the code to mark part of the generated code as a “Protected Region”. When the code was regenerated, the protected regions stayed intact and only the rest of the file was regenerated. As interesting as that solution sounds, it unfortunately wasn’t practical. For example, what do you do with the imports in the file? Some of the imports come from the generated code and some from the developer’s code, and an IDE or lint tool will usually enforce a certain import order, making it close to impossible to apply protected regions there. Another challenge: what if you want to change, say, the parent class of a generated class? If the parent class was declared inside a protected region, the language designer could not change it in the future, because previously generated code would keep the old parent class. If it wasn’t protected, the developer could never change it. Basically, any change you wanted to make had to be foreseen by the language designer, and each piece of the file could be changed either only by the language designer or only by the developer.
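To make the limitation concrete, here is a minimal sketch of how protected-region regeneration can work. It is not Xpand’s actual implementation or marker syntax: the `// PROTECTED START <id>` and `// PROTECTED END` comments and the `ProtectedRegionMerger` class are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative merger; the marker format below is hypothetical, not Xpand's. */
final class ProtectedRegionMerger {
    // group 1: opening marker line, group 2: region id,
    // group 3: region body, group 4: closing marker
    private static final Pattern REGION = Pattern.compile(
        "(// PROTECTED START (\\S+)\\n)(.*?)(// PROTECTED END)", Pattern.DOTALL);

    private ProtectedRegionMerger() { }

    /** Collects the body of every protected region in a file, keyed by region id. */
    static Map<String, String> extractRegions(String source) {
        Map<String, String> bodies = new LinkedHashMap<>();
        Matcher m = REGION.matcher(source);
        while (m.find()) {
            bodies.put(m.group(2), m.group(3));
        }
        return bodies;
    }

    /** Copies hand-edited region bodies from the previous file into fresh generator output. */
    static String regenerate(String previousFile, String freshOutput) {
        Map<String, String> edited = extractRegions(previousFile);
        Matcher m = REGION.matcher(freshOutput);
        StringBuffer merged = new StringBuffer();
        while (m.find()) {
            // Keep the developer's body if this region existed before,
            // otherwise keep the generator's default body.
            String body = edited.getOrDefault(m.group(2), m.group(3));
            m.appendReplacement(merged,
                Matcher.quoteReplacement(m.group(1) + body + m.group(4)));
        }
        m.appendTail(merged);
        return merged.toString();
    }
}
```

Everything outside the markers comes from the fresh output. A hand-added import at the top of the old file is therefore dropped, and the `extends` clause is whatever the generator emits, which is exactly the either-or ownership problem described above.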

Skip a few years… Lesson learned: Do not touch the generated code! It may sound like “erasing the problem” as we say in Farsi, but it actually seems like the most practical option [so far]. It’s way too complicated to allow developers to change the generated code, and then still have a successful regeneration of code, so don’t allow developers to change the generated code. Thinking of generated code as the fuel for our nuclear plant, if you mix it with the normal code, it will contaminate your project. It may work, but it’s way too unstable!

Example: this is more or less the default behavior of the OpenAPI code generators. If OpenAPI generation is part of your normal build process (as it should be), any change to the generated code is lost the next time you build the project, because the OpenAPI generator regenerates the code and overwrites your custom changes.

Do not track the generated code

This may seem obvious once you read it, but if you’re new to code generation you might think “I need a src/main/generated folder for my generated code”, run “git add .”, and end up with generated code in git. That’s not a good idea. You will have two sources of truth: the model you’re generating from, and the generated files. Especially given our first lesson (nobody is allowed to change the generated code), there’s no point in tracking it. Again, borrowing from the original metaphor, you don’t want enriched uranium in your code repository!

In my current project, we actually put the generated code in the build folder, and I think that’s a good practice for all generated code. Your model is in the src folder, and the generated code is in the build folder.

Distinguish between Generated Code and Normal Code

The challenges mentioned so far may sound easy to avoid, but they happen all too often, especially because code generation is not that common and as developers we tend to think: code is code, what’s the difference?

True, code is code, but generated code is a bit different. We should always remember that our original source of truth sits in the src folder, so if you’re using OpenAPI, for example, the spec file is the code you’re writing. If there’s a generated controller, that’s only because this is how the “language” happens to be set up at the moment. In the future, the generator might get rid of this controller and replace it with something else while delivering the same functionality. So the Controller Java file you’re looking at is very different from the Service Java file you created in the src folder. Compare this with Spring Data repositories: if there were an implementation file, you wouldn’t use it in your code, would you?

Architecture of generated code

Even though you know all these rules, chances are your next teammate doesn’t. So you’d better put your nuclear core in a controlled environment! In one of my previous posts I talked about DDD and especially Hexagonal Architecture, and I think it provides a nice solution here: isolate the generated code in an “Adapter” and make sure it doesn’t affect the Domain. Put another way:

We can’t promise never to change the Domain, so you had better not use any generated code in the Domain.

How do you make sure no one breaks this rule unintentionally? In my previous project, following a DDD modular setup, I created a multi-module project to achieve this. It’s pretty simple, actually. You have one domain module and a bunch of adapter modules. The golden rule is that the domain module cannot depend on any adapter module, but adapter modules can and will depend on the domain. With this setup, you will have one or more modules containing the generated code and whatever extra classes it needs. For example, if you’re using OpenAPI to generate REST Controllers from a spec file, the generated Controllers and the converters they need to convert DTOs to Entities and vice versa will sit in the http-adapter module. The Domain cannot have a dependency on adapter modules, so the Domain cannot use the generated code.
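As a rough sketch of where each piece lives in such a setup, the example below puts everything in one file for brevity. In a real multi-module build, the first part would sit in the domain module and the second part in the http-adapter module. All names (`Customer`, `CustomerService`, `CustomerDto`, `CustomerDtoConverter`, `CustomerHttpAdapter`) are hypothetical, and `CustomerDto` is only a hand-written stand-in for a class the generator would produce.

```java
import java.util.Objects;

// ---- domain module: hand-written, no HTTP and no generated types ----

/** Hypothetical domain entity. */
final class Customer {
    private final String id;
    private final String name;

    Customer(String id, String name) {
        this.id = Objects.requireNonNull(id, "id");
        this.name = Objects.requireNonNull(name, "name");
    }

    String id() { return id; }
    String name() { return name; }
}

/** Hypothetical domain port; its signature mentions domain types only. */
interface CustomerService {
    Customer register(Customer customer);
}

// ---- http-adapter module: generated code plus the glue around it ----

/** Stand-in for a DTO that the generator would write to the build folder. */
final class CustomerDto {
    String id;
    String name;
}

/** Hand-written converter; it lives in the adapter because it touches the DTO. */
final class CustomerDtoConverter {
    private CustomerDtoConverter() { }

    static Customer toDomain(CustomerDto dto) {
        return new Customer(dto.id, dto.name);
    }

    static CustomerDto toDto(Customer customer) {
        CustomerDto dto = new CustomerDto();
        dto.id = customer.id();
        dto.name = customer.name();
        return dto;
    }
}

/** Hand-written endpoint logic: convert at the boundary, then delegate. */
final class CustomerHttpAdapter {
    private final CustomerService service;

    CustomerHttpAdapter(CustomerService service) {
        this.service = service;
    }

    CustomerDto register(CustomerDto request) {
        Customer registered = service.register(CustomerDtoConverter.toDomain(request));
        return CustomerDtoConverter.toDto(registered);
    }
}
```

The point of the split is that `CustomerService` never sees `CustomerDto`. If the generator changes the shape of the DTO, only the converter in the adapter has to follow.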

There’s another way to achieve a similar goal (and much more) in a single-project setup: a test library called ArchUnit. This library lets you define “Architecture Rules” as tests, so if someone breaks a rule, the corresponding test fails. In this case, we want to make sure our domain remains free of generated code. For example, assuming all generated code lives in packages whose qualified name contains generated, and all domain classes live in sub-packages of a package containing domain, this ArchTest will ensure no one adds a dependency (any import) from a domain class to generated code.


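import com.tngtech.archunit.core.domain.JavaClasses
import com.tngtech.archunit.junit.ArchTest
import com.tngtech.archunit.lang.syntax.ArchRuleDefinition

// Goes inside a test class annotated with @AnalyzeClasses.
@ArchTest
fun domainShouldNotDependOnGeneratedCode(importedClasses: JavaClasses) {
    ArchRuleDefinition.noClasses()
        .that().resideInAPackage("..domain..")
        .should()
        .dependOnClassesThat()
        .resideInAPackage("..generated..")
        .check(importedClasses)
}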

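Side note: code generators such as OpenAPI usually annotate their generated classes with javax.annotation.Generated, which would be a much better grouping strategy for an ArchTest than package names. Keep in mind, though, that this annotation has source retention, so it is not present in the compiled classes ArchUnit inspects. The approach only works with a marker annotation that survives into the bytecode.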

Either of these two techniques is a good “control rod” powered by state-of-the-art technology, and thus an excellent choice for containing your generated code. This way, you can be sure there will be no nuclear meltdown!

Conclusion

Code generation can be very fruitful if you use it carefully

Put generated code in build folder
Reduce dependency on generated code as much as possible
Separate generated code from your core logic

P.S. Applying these rules with OpenAPI, for example, can be very challenging, and it’s something we’ve been struggling with a bit over the past year. For example, OpenAPI doesn’t know about domain models, so the generated REST Controllers pass DTOs directly to the Domain service. That’s a deal breaker for us, since a DTO is generated code and should not be used in the Domain service. I’m working on my own MPS language and code generator, which I will talk about in more detail when it’s ready. I have also ported a very simple open source MPS language, equivalent to OpenAPI, here. So if you have a good MPS solution for this problem, feel free to contribute to that project as well. Or if you know of other solutions, share your experience in the comments.