This article originally appeared in “97 Things Every Cloud Engineer Should Know” edited by Emily Freeman and Nathen Harvey, 2021 O’Reilly Media

Infrastructure is important. Infrastructure and application code are equally critical to success as a cloud engineer. Most engineers either choose the correct runtime environment up front or iterate through runtime environments until they find the appropriate one for their application. How you provision, deploy, and recover whatever infrastructure you use is just as critical as choosing the appropriate runtime.

Designing, architecting, developing, and deploying applications are absolutely the sweet spot for most cloud engineers. Error reporting, debugging, logging, log aggregation, and alerting are generally easy to bake in when working on a major cloud platform or with common toolsets. One of the greatest advantages of working in the cloud is the plethora of managed services and tools readily available to meet those challenges! Cloud is awesome!

That knife cuts both ways, however. Managed services that are easily turned on can be just as easily, and accidentally, turned off. A managed database or a function running on serverless can be inadvertently dropped. And if you are provisioning and deploying those resources by hand or via shell scripts, recreating them adds unnecessary toil to your downtime recovery and remediation strategies. And NOBODY wants toil.

Using managed services is a great strategy. Treating managed services as infrastructure, and defining them using declarative and idempotent tools, is an even better strategy. Define and declare your infrastructure as code, check that code into your version control system of choice, and peer review that code before changes make it into your live systems. This will save you downtime, heartburn, and headaches.

There is a wide variety of infrastructure as code (IAC) patterns and tools. The most basic form of IAC is to simply write shell scripts that create your infrastructure. This method is not optimal. Scripted provisioning is imperative, lacks the benefits of parallelized execution and dependency management, and is just a scripted version of manual provisioning and deployment. Maintaining and debugging those scripts introduces unnecessary toil. To avoid that toil, we can use IAC tools and methods that are declarative and idempotent. Each of the major public clouds offers its own IAC tooling: Amazon offers CloudFormation, Google offers Cloud Deployment Manager, and Microsoft offers Azure Resource Manager. These tools are declarative and idempotent to varying degrees, but each works only in its own public cloud. As multi-cloud and hybrid cloud approaches become more common in the industry, this isn’t the direction we want engineers and SREs moving in.
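To make the imperative versus idempotent distinction concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory dictionary, `cloud`, standing in for a cloud account; the function names `create_imperative` and `ensure_declared` are illustrative and do not belong to any real cloud SDK or IAC tool.

```python
# Hypothetical in-memory stand-in for a cloud account; not a real cloud API.
cloud = {}


def create_imperative(name, size):
    """Script-style step: always tries to create, so a second run fails."""
    if name in cloud:
        raise RuntimeError(f"{name} already exists")
    cloud[name] = {"size": size}


def ensure_declared(name, size):
    """Idempotent step: converge on the declared state from any starting point."""
    desired = {"size": size}
    if cloud.get(name) == desired:
        return "unchanged"  # nothing to do, safe to rerun
    action = "updated" if name in cloud else "created"
    cloud[name] = desired
    return action


if __name__ == "__main__":
    print(ensure_declared("db", 10))  # first run creates the resource
    print(ensure_declared("db", 10))  # rerun is a no-op
    print(ensure_declared("db", 20))  # changed declaration triggers an update
```

Rerunning `ensure_declared` with the same arguments is always safe, whereas rerunning `create_imperative` raises an error and leaves cleanup to whoever is on call.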

Good IAC is declarative and idempotent: it diffs your code (desired state) against your current state and identifies drift. Tools like Terraform present this diff and then give you a ready-to-go remediation plan to bring your current state back into harmony with your desired state. Automating away infrastructure drift with IAC tools is essentially a superpower for cloud engineers! Cloud engineers who design, provision, deploy, and remediate efficiently and effectively build reputations for being reliable and capable of delivering. Infrastructure as Code is a superpower for cloud engineers!
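The diff-then-remediate idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how Terraform or any other tool is implemented; the `plan` and `apply` functions and the resource dictionaries are hypothetical names chosen for this example.

```python
def plan(desired, current):
    """Compare desired and current state; return the actions that remove drift."""
    actions = []
    for name in sorted(desired.keys() | current.keys()):
        if name not in current:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != current[name]:
            actions.append(("update", name))
    return actions


def apply(desired, current):
    """Carry out the plan and return the resulting current state."""
    new_state = dict(current)
    for action, name in plan(desired, current):
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = desired[name]
    return new_state


if __name__ == "__main__":
    desired = {"db": {"size": 20}, "fn": {"runtime": "python"}}
    current = {"db": {"size": 10}, "old": {}}  # drifted state
    print(plan(desired, current))
    print(plan(desired, apply(desired, current)))  # empty plan once converged
```

Once the plan has been applied, planning again yields no actions, which is the idempotency property that makes reruns safe.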