There’s a whole chunk of the software development world that takes Infrastructure as Code (IaC) for granted. They’ve been at it for years, are totally bought in, and don’t really need to consider the benefits any more because it’s just the air they breathe. But it turns out there are still a lot of environments that are being managed manually, carefully, and painfully.
If you went to tell your manager that you need to spend time using CloudFormation or Terraform to implement your new environment because you can version control it or automate it, you probably wouldn’t get very far. Unless you’re fortunate enough to have a manager who is intimately familiar with software development, in which case you probably don’t need to sell it.
Versioning and automation are the features of Infrastructure as Code, not the reasons why you would want it. My goal is to get to the why behind Infrastructure as Code, and what you can expect to get out of it. And at the end, we will talk about a few of the tradeoffs you might experience.
On with the why…
The first reason might surprise you: confidence. Have you ever managed an environment that was created manually? Are you terrified of every change you have to make? Do you push back against even the smallest changes because you fear what might happen? Are you worried about backups being corrupted, and the server crashing, and being unable to recreate the environment? Are those scenarios the basis of every anxiety dream you’ve ever had? I got a little twitchy just typing them out.
Infrastructure as Code gives you the freedom to make changes without the fear that you’ll put things in an unrecoverable state. And it gives you a better understanding of how the environment came to be the way it is, which lets you make the changes you need with more confidence.
Is it safe to remove this environment variable? Why was this flag set? How did that configuration file get in that state? Who added that site to the server? With Infrastructure as Code, you can look back in the source history and find out the answer to all those questions. And if that change breaks the environment? Well, just roll it back. Did that still not fix it? Build a whole new environment with a few keystrokes. That is the kind of magic that IaC enables.
Even in a software environment where you have a lot of tests, one of the biggest causes of failure is having integration/staging environments that don’t properly mimic production. Infrastructure as Code eliminates this problem by allowing your infrastructure to be instantiated in an automated way, making it easy to build multiple identical environments.
Doing Infrastructure as Code in the cloud adds another huge advantage—you can build specific environments for each release, rather than having dedicated testing environments. This can be a boon particularly if you need to modify your environment along with your software releases.
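To make that concrete, here is a minimal Python sketch. It is not real Terraform or CloudFormation; the `EnvironmentSpec` class, the `build_environment` function, and every value in them are hypothetical. The only point is that when every environment comes out of one definition, staging, production, and a throwaway per-release environment can’t quietly differ in shape.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical, heavily simplified description of one environment."""
    name: str
    instance_type: str
    instance_count: int
    open_ports: Tuple[int, ...]


def build_environment(name: str) -> EnvironmentSpec:
    # Single source of truth: every environment is produced from this
    # one definition, so only the name varies between them.
    return EnvironmentSpec(
        name=name,
        instance_type="t3.small",
        instance_count=2,
        open_ports=(80, 443),
    )


staging = build_environment("staging")
production = build_environment("production")
release_env = build_environment("release-candidate")  # throwaway, per-release

# Apart from the name, staging and production are identical.
same_shape = replace(staging, name="production") == production
print(same_shape)
```

In a real setup the definition would live in your IaC tool of choice, and the name (or region, or account) would be the variable you pass in.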
Have you ever had an environment change cause problems, but you’re not sure what changed? Infrastructure as Code virtually eliminates this problem because you can look at your source control and see exactly what changed in the environment. As long as you are diligent and don’t make manual changes to your environments, then Infrastructure as Code can be a game changer.
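Source control does this comparison for you, but the idea is simple enough to sketch in a few lines of Python. Everything here is illustrative: `diff_config`, the `ABSENT` marker, and the two dictionaries are hypothetical stand-ins for two committed versions of an environment definition.

```python
ABSENT = "<absent>"  # marker for a setting that does not exist in one version


def diff_config(old: dict, new: dict) -> dict:
    """Return {setting: (old_value, new_value)} for every setting that differs."""
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before = old.get(key, ABSENT)
        after = new.get(key, ABSENT)
        if before != after:
            changes[key] = (before, after)
    return changes


# Two hypothetical versions of the same environment definition,
# standing in for two commits in source control.
previous_commit = {"instance_type": "t3.small", "idle_timeout": 60, "debug": "off"}
current_commit = {"instance_type": "t3.medium", "idle_timeout": 60, "log_level": "info"}

for setting, (before, after) in diff_config(previous_commit, current_commit).items():
    print(f"{setting}: {before} -> {after}")
```

The unchanged `idle_timeout` never shows up in the result, which is exactly what you want when you are hunting for the one change that broke something.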
Most disaster recovery (DR) plans require the ability to set up an alternate environment in a different datacenter or region. Infrastructure as Code makes this a much more manageable prospect. Not only do you have an easy way to create a new environment from scratch, but if you need to maintain two active environments it’s easy to keep them in sync.
There are many times, especially in regulated industries, where you will need to be able to audit both changes and access to an environment. Infrastructure as Code simplifies this process immensely by allowing someone without access to a production environment to make changes to your IaC scripts and then hand them off to someone with access who can apply those changes in a controlled environment.
Anyone who has had to manage a process like this understands how complicated it can be without automation. Infrastructure as Code gives you all of this right out of the box.
As an environment expands over time, it can be challenging to tell what has been provisioned or deployed. In the cloud this can be a huge cost issue. Infrastructure as Code can be helpful here, because instead of having to explore and enumerate everything in your environment, you can instead audit your IaC scripts.
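As a rough illustration of what auditing the scripts instead of the environment can look like, here is a small Python sketch. The resource list is made up, and `inventory` is a hypothetical helper; in practice the list would come from parsing your own IaC files or your tool’s state.

```python
from collections import Counter

# Hypothetical resources as they might be declared across your IaC files.
declared_resources = [
    {"type": "vm", "region": "us-east-1", "name": "web-1"},
    {"type": "vm", "region": "us-east-1", "name": "web-2"},
    {"type": "database", "region": "us-east-1", "name": "orders-db"},
    {"type": "vm", "region": "eu-west-1", "name": "web-eu-1"},
]


def inventory(resources: list) -> Counter:
    """Count declared resources per (region, type) pair."""
    return Counter((r["region"], r["type"]) for r in resources)


for (region, kind), count in sorted(inventory(declared_resources).items()):
    print(f"{region:10} {kind:10} {count}")
```

A summary like this only covers what is declared, which is one more reason to keep manual changes out of your environments.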
If you’ve ever hunted through an AWS Organization with multiple accounts using multiple services in different regions, then you understand how massive this problem can be. With your environments managed using Infrastructure as Code, the cost savings alone can make it a worthwhile investment.
While many IaC tools don’t provide any abstraction across cloud providers (this probably wouldn’t be ideal anyway), some of them do allow you to work with multiple cloud platforms at the same time. This makes building multi-cloud environments a more manageable prospect.
IaC tools can make moving between clouds easier as well, since translating a Terraform script from one cloud provider to another is considerably simpler than having to recreate your entire environment in a cloud-specific IaC tool or rebuilding it from the UI or CLI. The visibility your IaC scripts will provide you into what is provisioned, and therefore what needs to be converted, is worth its weight in gold if you’re considering this kind of switch.
Security is just another angle on visibility and auditability, but it is important to call it out separately. Being able to tell when something was changed, who made that change, and who has access to what is critical for the security of your environment. Being able to see the history of changes to your security group rules along with commit messages can do wonders for being confident about the security configuration of your environment.
You’ll still need something like AWS Config or Google Cloud Audit Logs to see any live changes made to your environment. However, a good Infrastructure as Code tool will make it easier to set up and configure these tools and, most importantly, audit their configurations. You don’t want to find out that someone turned off the notifications coming out of your auditing tools because they were annoying.
I’m sold, what next?
What comes next will depend on your environment. At Simple Thread, most of the software we build is cloud native and we deploy it in AWS. Because of that we generally use two different tools to implement our Infrastructure as Code deployments: Terraform and Docker.
Why do we need two tools? Well, almost every Infrastructure as Code setup needs two different levels of automation. One is at the environment level, and one is at the machine or container level.
- Environment level – At the environment level you have tools like Terraform, Pulumi, AWS CloudFormation, Microsoft Azure Resource Manager, and Google Cloud Deployment Manager. These tools allow you to set up your VPC, subnets, route tables, security groups, DNS, load balancers, container services, etc. – basically anything that is part of your environment in your cloud provider or private cloud.
- Machine/Container level – At the machine/container level you have tools like Chef, Puppet, Ansible, Docker, and App Container Spec (appc), among others. These tools allow you to configure a machine, or build a container that can run one or multiple processes. With Chef/Puppet/Ansible, you’re setting up access, settings, services, mounts… anything you need to configure the machine. In the case of Docker, you’re usually doing something a little more focused, setting up an environment in which a single process will run.
At Simple Thread, we use a variety of tools, but we often use Terraform to automate our AWS environment and AWS Elastic Container Service, and then deploy Docker containers for individual applications and services. When we find ourselves having to set up and provision individual servers, we often reach for Ansible.
You said there might be challenges
As with most things, there are some tradeoffs you’ll need to accept when you move to an Infrastructure as Code environment. Here are a few that you can expect to face:
The lure of the “One-Off” change
If you move to an Infrastructure as Code environment, you’ll need to make sure that nobody makes manual changes. It can be easy to say, “I’ll just make this one change, and then I’ll update the scripts later,” but later never comes. Once you start making changes to your environments manually, they begin to diverge and all of your effort will be for naught. Remember those anxiety-dream-producing scenarios from earlier? If you’re using IaC, this is the portal back to waking up in a cold sweat. Don’t do this.
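One way to catch one-off changes is to regularly compare what your scripts declare with what actually exists. Real tools have their own mechanisms for this (`terraform plan`, for example, reports differences between your configuration and the real infrastructure), so treat the Python below as a sketch of the idea only. The `find_drift` function and the resource names are hypothetical.

```python
def find_drift(declared: set, live: set) -> dict:
    """Compare declared resource names with the names that actually exist."""
    return {
        # Exists in the environment but in no script: someone's one-off change.
        "unmanaged": sorted(live - declared),
        # Declared in a script but gone from the environment.
        "missing": sorted(declared - live),
    }


# Hypothetical names; a real check would read these from your IaC tool
# and from your cloud provider's API.
declared = {"vpc-main", "sg-web", "lb-public"}
live = {"vpc-main", "sg-web", "lb-public", "sg-temp-debug"}

drift = find_drift(declared, live)
if drift["unmanaged"] or drift["missing"]:
    print("Drift detected:", drift)
```

Run something like this on a schedule and the “I’ll update the scripts later” change gets noticed while someone still remembers why they made it.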
Slow down to speed up
Sometimes it can take a bit longer to make changes with Infrastructure as Code. This can be especially true in smaller environments. It can be a quick thing to just tweak a setting in the load balancer or change a security group ingress rule. On the other hand, updating scripts, making the change, committing the change, running the change against a test environment, and then running the change against a production environment can feel so laborious. Especially when you’re just getting started.
But this is one of those situations where you need to slow down to speed up. Being diligent about making changes through your scripts will undoubtedly save you countless hours during an outage or when troubleshooting. And you’ll be so much more confident in your changes because you can test that change against a test environment instead of crossing your fingers and running the update straight against production. Even in small environments the payoffs can be enormous.
All of these tools have a pretty steep learning curve. You have to be able to create infrastructure in scripts that used to be easily created from the UI or the CLI. Among a seasoned ops team, this can be one of the biggest obstacles to IaC adoption. But if you commit, once you’re through the initial pain you’ll never look back.
Just do it
Infrastructure as Code is one of those rare techniques that is valuable in both the smallest and largest environments. Almost any engineering team can leverage IaC to their benefit. We started the journey many years ago with Chef and Puppet. Then we used CloudFormation before moving over to Terraform. We love Terraform because of its ecosystem and its ability to work with almost any public or private cloud tool. We’ve heard some good things about Pulumi, but we haven’t taken the plunge yet.
If you’re still managing environments manually, I hope you’ll strongly consider how much of an improvement a move to Infrastructure as Code could be for your company. At the risk of sounding hyperbolic, it really will completely change the way you do your operations work. You’ll feel better about your environment, more confident in your changes, and more secure in your disaster recovery plans.