Originally published on the WyeWorks blog.
We all know that modern applications can get very complex, very quickly. Because of that, I've been trying to add more automation to my daily life in general, as I know I’m human and will eventually mess up due to being distracted, tired, or something else.
One of the first things I did while on my automation journey was to figure out a way to automate the creation of cloud resources. In order to accomplish that, I became proficient at using Infrastructure as Code (IaC) tools such as Terraform and AWS CloudFormation, among others.
After that, the next logical step was to bring that infrastructure-provisioning knowledge into my CI/CD pipelines, since I have (or try to have) CI/CD pipelines set up in every project I’m involved in. In this post, I will present the benefits of doing just that and demonstrate it with a specific Terraform + CircleCI example.
Benefits of automating infrastructure
If you’re not used to the concept of automatically deploying infrastructure, it might sound crazy, but in my opinion the benefits far outweigh the risks. The main benefits are as follows:
- The deployment will be automatic.
- You’ll tend to be more granular when making changes.
- You’ll exercise more caution when making changes to your infrastructure and this will almost always lead to less downtime.
- You will never forget to update your branch before deploying.
- Your development and feedback loops will be shorter; you merge your changes and everything gets applied for you. No need to wait for an "Ops" team to help you out.
- If you need to perform multiple steps for a deployment, you simply merge multiple PRs, one after another.
- You can automate the testing of your infrastructure by adding some automated smoke testing following deployment (or beforehand if you use blue/green deployment).
- You can invest more time in developing features.
- All deployments can be tracked in your pipeline.
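The smoke-testing point in the list above can be illustrated with a short script that a pipeline job could run right after a deployment. This is only a minimal sketch in Python and is not taken from the example repository: the names `http_status` and `smoke_test`, the `fetch_status` hook, and the idea that "healthy" means an HTTP 200 are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code is still useful.
        return err.code


def smoke_test(urls, fetch_status=http_status):
    """Check each URL and return a list of (url, reason) pairs that failed.

    fetch_status is injectable so the check can be exercised without a network.
    Assumption of this sketch: only a 200 response counts as healthy.
    """
    failures = []
    for url in urls:
        try:
            status = fetch_status(url)
        except OSError as err:
            # URLError, timeouts, and socket errors are all OSError subclasses.
            failures.append((url, f"unreachable: {err}"))
            continue
        if status != 200:
            failures.append((url, f"unexpected status {status}"))
    return failures
```

A post-deployment job could call `smoke_test` with the endpoints of the freshly deployed environment and exit with a non-zero code if the returned list is not empty, which would mark the pipeline run as failed.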
Risks of having automated infrastructure
As with everything in life, there are also cons to this approach. They are, however, easy to overcome with some team training or by simply adding some security mechanisms.
Let's review some of the risks:
- Your team must be aware of the automated process. If you merge a PR, it’s because you wanted to merge it. If you accidentally merge code, it will get applied. To solve this potential problem, you can add a manual confirmation of the deployment for the production environment.
- You might forget how to manually deploy. This is easy to solve: just manually perform the deployment process every now and then.
Now that we have seen the pros and cons of automating your infrastructure, let's go through a concrete example of how to do it.
CI/CD with Terraform and CircleCI
I created an example repository that includes some ideas which you can use to kickstart your project or make your current project better. The repository includes a detailed readme explaining the reasons why I made each decision.
The main purpose is to demonstrate only one of the many ways you could manage your infrastructure using an IaC tool like Terraform and a continuous integration tool such as CircleCI.
I have followed many of the best practices described in the book Terraform: Up & Running and the CircleCI 2.0 Documentation. I will also assume you have some knowledge of these tools and Amazon Web Services in general (though no need to be an expert).
Suggested workflow
Assuming you have two environments, production and staging: when a new feature is requested, you branch from staging, commit the code and open a PR to the staging branch. After that, CircleCI will run two jobs, one for linting and one that will plan the changes to your staging infrastructure so that you can review them (see image below).
Once you merge the PR, if everything goes as planned, CircleCI will run your jobs and automatically deploy your infrastructure!
After you have tested your infrastructure in staging, you just need to open a PR from staging to master to "promote" your infrastructure to production.
In this case, we want someone to manually approve the release to master, so after you merge you need to tell CircleCI that it can proceed and it will deploy the infrastructure after receiving confirmation.
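To make the branch rules above easier to see at a glance, here is a small Python sketch that maps a Git event to the steps the pipeline would run. It is not the CircleCI configuration from the example repository, only a model of the workflow as described. The `Step` type, the `plan_pipeline` function and the event labels "pull_request" and "merge" are hypothetical, and running lint and plan for PRs that target master is an assumption, since the text only describes PRs to staging.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    needs_approval: bool = False  # True means a person must confirm first


def plan_pipeline(event: str, target_branch: str) -> List[Step]:
    """Return the steps to run for an event on a given target branch.

    event is "pull_request" or "merge" (labels invented for this sketch);
    target_branch is the branch the PR targets or was merged into.
    """
    if target_branch not in ("staging", "master"):
        return []  # feature branches are not deployed on their own
    if event == "pull_request":
        # Review phase: lint the code and show the planned changes.
        return [Step("lint"), Step("terraform plan")]
    if event == "merge":
        # Deploy phase: staging applies automatically, master waits
        # for a manual confirmation before applying to production.
        return [
            Step("lint"),
            Step("terraform plan"),
            Step("terraform apply", needs_approval=(target_branch == "master")),
        ]
    raise ValueError(f"unknown event: {event!r}")
```

For example, `plan_pipeline("merge", "master")` ends with an apply step whose `needs_approval` flag is set, which mirrors the manual confirmation described above, while the same call for staging returns an apply step that runs without confirmation.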
Improvements
This is a very basic workflow and there are many things that must (or can) be improved upon. Here are a couple of them:
- The most important improvement that needs to be made, in my opinion, is to add a way to test the infrastructure for each PR before merging it into staging. Perhaps something like the Heroku Review Apps would be best. If you use Terraform modules and you write them in a generic way, you could very easily implement infrastructure testing.
- Add tests! This is of course a must in every project. If you want quality, you need to test. You can use tools like KitchenCI or InSpec to accomplish this.
Conclusions
I always like to present some final, very general takeaways in my articles, so here are a few:
- Try to automate more. This will lead your team towards efficiency and make it less error prone.
- Use IaC tools. Doing so will help with maintainability and understanding of all the pieces.
- Find ways around problems; don't let problems mold you.
Top comments (8)
I've often wondered if tf is actually iac or just a declarative space? It's like puppet is declarative but chef is more like programming. Could the same thing be said for tf? You can't modify the space with tf like you can with something like cloudformation. Thoughts?
My definition of IaC is "whatever allows me to create/modify resources without clicking around and can be included in version control". That definition is of course personal and you might not agree, but it matches TF.
That being said, I agree that TF is more a declarative thing than a "programming" language, but IMO it gets the job done as well as many other tools. In fact I prefer TF over things that are more like a programming language (Chef, for example).
I believe it often happens that people try to use TF as if it were a programming language and try to do "for loops" or "case" statements or other crazy things, but they shouldn't be doing that, because TF is in fact declarative (according to their site).
Regarding CF vs TF, I think you can achieve the same things with both tools, and IMO they are the most similar tools on the market of IaC.
Yeah, whatever makes the business work, right? I think my big hangup with TF is that it's basically throwing away the last 20 years of programming knowledge in favor of something that avoids "crazy" things like for loops and control statements.
When I use CF, I'm importing JSON chunks into the environment ( usually ruby ), then manipulating the data structures to create an outcome that matches my desired state. With this model I can do things like have isolated stacks of infrastructure, which you can't do with TF.
I like the isolated stacks thing because it allows me to yank an entire stack out of existence without having to worry ( mostly :) ) about artifacts that might be left over. This also allows me to create an entirely new silo of things that I know works together. I don't get that same kind of thing with TF.
Honestly, I'm just baffled that people would choose such an ineloquent solution to such a beautiful problem space as IaC.
That is a perfectly valid way of thinking ☺. In some cases I would definitely choose your approach. It is important to know all the possibilities, or at least a good chunk of them.
Cheers 🍻
We had “review apps” in one of the teams I worked on; we called it the “branch staging” env. It might as well be called a “dev” environment (not to be confused with your local env, i.e. your laptop).
I actually liked this idea very much
You missed "dependency hell". Nothing quite like doing a re-deploy to get the code you were updating into production only to find that one of your project's upstream dependencies changed ...in a way that breaks everything.
Yeah, ideally, you catch that by having a pre-deployment testing task, but depending on how much time elapses between your testing-deployment and your "real" deployment, one of your upstream, internet-hosted projects may have changed. Only real way to guard against it is to create local, semi-static mirrors of all your upstreams.
I agree that this can happen, but it can happen in almost every setup. And as you said, it can be solved; it is not an unsolvable problem. Thanks for mentioning that! ☺
Like many things, the most well-learned lessons come as a side-effect of a bloodied nose. =)