Infrastructure as code: evolution and practice

This blog post was first published in The New Stack.

As infrastructure has evolved and matured over the last decade, the way in which we build and deploy that infrastructure has — for the most part — kept pace. As the velocity of deployments increased, and practices such as continuous deployment and delivery became the norm, it became critical that we manage infrastructure and deploy applications in a similar way.


From this need arose “infrastructure as code” (IaC): managing, through code, the infrastructure (and everything therein, from networks to VMs to load balancers) that provides the foundation for your apps. By codifying your infrastructure and your application deployments in the same way, you establish one framework, one source of truth, for the state of configuration for everything from your infrastructure to your applications. As Jesse Robbins and John Allspaw put it in Web Operations, IaC lets you: “Enable the reconstruction of the business from nothing but a source code repository, an application data backup, and bare metal resources.”

By implementing IaC, you provide one path for ops and devs to learn about your infrastructure, giving them a sense of how things are deployed and configured. You also empower the security team — they can audit that single source of truth to make sure you’re deploying in a way that meets security requirements. If they have concerns, they can raise them within the context of infrastructure as code — because everything is codified and documented through that code, you can have cross-team conversations with the same context and understanding.

In this post, I’ll take a look at the evolution of IaC — including where we were before — and what it looks like in practice.

The evolution of deploying code

Once upon a time, applications were manually deployed to their hosting environments.

After that, we began to automate that process through scripts, fragile though they were. A single script was often written for each group of actions an operator needed to perform (e.g., apt-get update and apt-get install apache2): one for setting up load balancers, another for installing dependency libraries, and so on. Modifying scripts to adapt to new requirements was difficult and time-consuming; worse, only a handful of operations engineers would understand all of the arcane toggles. If a server became unavailable due to a bad deploy, a misconfigured option, inoperable dependencies, or a dozen other unaccounted-for outcomes, it could take hours before the problem was diagnosed and resolved.

The rise of DevOps, however, has meant that many future-oriented organizations are opting for IaC.

These days, we treat our infrastructure the same way we treat our application code. We write code that provisions and manages our infrastructure in a predictable way. That means that an application, regardless of its environment or where it’s hosted, can be spun up with a predefined list of requirements entirely from scratch.

That same code can run in production, in staging, and on your local dev environment, ensuring (fairly) consistent results wherever your application runs.

And, while the bespoke scripts of yesteryear may work in smaller environments (i.e., a few servers), they’re difficult to scale. Tools such as Chef, Puppet, and Ansible (to name a few) bring a framework to your infrastructure as code, allowing you to stand up thousands of machines which eventually provision themselves. Idempotence, a key principle of IaC, means that you can apply the same operation repeatedly and end up with the same result. That is, making multiple, identical requests has the same effect as making a single request. Idempotence makes it safe and inexpensive to run your code on the same machine over and over again, and ensures that results are as you’d expect. Chef and Puppet are particularly focused on this principle, without which you’d be back to running custom scripts.
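To make idempotence concrete, here is a minimal Python sketch of my own. It is not how Chef, Puppet, or Ansible are implemented; the function names ensure_line and append_line are made up for illustration. The first function describes a desired state ("this line is present") and does nothing if that state already holds. The second is the kind of step a bespoke script often contains, and it changes the file every time it runs.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file at `path` (hypothetical helper).

    Returns True if the file had to be changed, False if it was already
    in the desired state.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already converged, so do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


def append_line(path: Path, line: str) -> None:
    """Non-idempotent counterpart: every call adds another copy of `line`."""
    with path.open("a") as handle:
        handle.write(line + "\n")
```

Calling ensure_line twice leaves the file exactly as one call would, so it is cheap to run on every node, every time. Calling append_line twice leaves a duplicate line behind, which is the sort of surprise that made the old scripts fragile.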

The benefits of IaC

A great advantage of IaC is that it’s a single source of truth: the knowledge about how your services run and the dependencies they require can be shared amongst an entire application delivery team, rather than a strict subset of highly technical operations members (whose time can be better spent fine-tuning network and database performance). Spreading knowledge this way means that everyone on the team can self-sufficiently tweak servers, build features, and minimize downtime. (You still have to be careful: in some cases you can automate downtime and amplify mistakes.)

Because your infrastructure is generated from code, you can also version it and discuss changes with your team, just as you would with application code. Automation offers business benefits, too: it reduces operational costs and downtime, and it lessens the drag that manual infrastructure work puts on engineer productivity.

You can (and should!) also test your configuration code, to ensure that there are fewer unexpected breakages. For example, if an upstream package changes an interface and breaks its dependents, you can detect that change the moment it becomes a problem on your testing environment, and hold off any package upgrades until the issue is fixed. Tools like test-kitchen (which I like using with the kitchen-docker driver) and serverspec help with this by giving you an integration testing setup for developing and testing your infrastructure code.
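The kind of check such tests perform can be sketched in a few lines of Python. This is only an illustration of the idea, not the test-kitchen or serverspec API (both are Ruby tools). The NodeSnapshot type, the check_node function, and the choice of nginx and port 80 are all assumptions I made up for the example; in a real setup the facts would be gathered from a freshly provisioned test node.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSnapshot:
    """Hypothetical facts gathered from a test node after provisioning."""
    packages: dict[str, str] = field(default_factory=dict)  # name -> version
    running_services: set[str] = field(default_factory=set)
    listening_ports: set[int] = field(default_factory=set)


def check_node(snapshot: NodeSnapshot) -> list[str]:
    """Return human-readable failures; an empty list means every expectation holds."""
    failures = []
    if "nginx" not in snapshot.packages:
        failures.append("package nginx is not installed")
    if "nginx" not in snapshot.running_services:
        failures.append("service nginx is not running")
    if 80 not in snapshot.listening_ports:
        failures.append("nothing is listening on port 80")
    return failures
```

If an upstream package upgrade leaves the service installed but unable to start, check_node reports that on the test node, and you can hold the upgrade back before it reaches production.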

Speaking of detection, an IaC setup also grants you the ability to monitor how the application is configured. Just as you check the performance of software, you can track various aspects of your infrastructure setup, such as how long it takes to provision an app from start to finish. You can also automate tasks based on events, like receiving alerts if a node misbehaves, if a critical service such as Nginx goes down, or if resource usage such as disk space or memory crosses a threshold.
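Two of the measurements mentioned above, provisioning time and disk usage, are simple enough to sketch with the Python standard library. The function names and the 90% default limit are assumptions for illustration; a real monitoring agent would send the result to an alerting pipeline rather than return a string.

```python
import shutil
import time
from typing import Callable, Optional


def timed(task: Callable[[], None]) -> float:
    """Run `task` (e.g. a provisioning step) and return how many seconds it took."""
    start = time.monotonic()
    task()
    return time.monotonic() - start


def usage_alert(label: str, used: int, total: int, limit: float) -> Optional[str]:
    """Return an alert message if used/total exceeds `limit`, otherwise None."""
    if total <= 0:
        return None  # nothing meaningful to report
    fraction = used / total
    if fraction > limit:
        return f"{label} is {fraction:.0%} full (limit {limit:.0%})"
    return None


def disk_alert(path: str, limit: float = 0.9) -> Optional[str]:
    """Check the filesystem containing `path` against the (assumed) limit."""
    usage = shutil.disk_usage(path)
    return usage_alert(path, usage.used, usage.total, limit)
```

Keeping the threshold logic in usage_alert, separate from the call that reads the real disk, means the alert rule itself can be tested like any other code.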

Trade-offs to consider

IaC approaches aren’t perfect, and as with any practice, the trade-offs need to be considered.

To start, you’ll need to be sure to implement a comprehensible workflow, one that makes sure most teams in your organization are using the same strategy to deploy their apps and manage their infrastructure. This is particularly true if your applications are designed as several microservices. In practice, there are always compromises: one enterprise IT manager I spoke with recently cited using a combination of Puppet, Ansible, some scripts, and Docker to run their infrastructure.

You’ll also likely need guidelines and permissions in place that prohibit editing configuration files directly on the servers, as this may introduce drift between the expected setup stored in your version control system (VCS) and the configuration that’s actually running. Luckily, both Chef and Puppet automatically correct configuration drift by overriding such changes: the next time Chef or Puppet runs, it will wipe out manual changes to the resources it manages and restore the expected, declared state.
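The drift correction described above boils down to comparing declared state with actual state and overwriting whatever differs. Here is a rough Python sketch of that loop for plain files. It is a simplification of my own, not Chef’s or Puppet’s implementation, and the converge function and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def converge(declared: dict[str, str], root: Path) -> list[str]:
    """Rewrite any file under `root` whose contents differ from the declared state.

    `declared` maps relative paths to expected file contents (in practice this
    would come from version control). Returns the relative paths that were
    corrected, in sorted order.
    """
    corrected = []
    for relative_path, expected in sorted(declared.items()):
        target = root / relative_path
        actual = target.read_text() if target.exists() else None
        if actual != expected:
            # Drift (or a missing file): restore the declared contents.
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(expected)
            corrected.append(relative_path)
    return corrected
```

On a first run against an empty directory, every declared file is written. A second run corrects nothing. If someone then hand-edits, say, etc/app.conf on the server, the next run reports that path and puts the declared contents back, which is exactly why direct edits on servers don’t stick.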

Another possible drawback is the learning curve. Developers who are unfamiliar with the intricacies of OS package managers or the frameworks that use them, like Chef and Puppet, might require some training first. However, in the long run, this is good information to have, as the gap between strictly “software engineers” and “operations engineers” is closing. A good understanding of how all the parts of an application work together is essential to developing reliable software.

The ease of simple expectations

In the end, infrastructure-as-code defines a process of configuring your infrastructure and applications in a reliable and efficient way. A simplified configuration process for your applications enables you to rapidly deploy better software. When you test and monitor your infrastructure, you can ensure stable and repeatable deployments across every environment you have.

As IaC practices continue to develop, we can expect to see more improvements in how infrastructure is managed. We can also expect machine learning to report issues or automate security upgrades. Chef Automate, for example, aims to identify inefficient configurations, just as we’re starting to gain the same insights into our systems. In my next post, I’ll cover incorporating monitoring into your infrastructure as code workflow.
