In my last blog post I added a stage to my Terraform Azure DevOps pipeline which runs Checkov; this scans the Terraform configuration and stops the deployment if there is an issue. I also added a stage which checks whether any resources are being destroyed.

What’s missing?

I thought both of these should give some basic protection against problems caused by common configuration issues, which they did, but they didn’t take into account the end user, i.e. me, making a change which would dramatically increase the running costs of the deployment.

Enter Infracost

One day, as I was skimming through Reddit, I noticed a mention of Infracost (I can’t remember the post, sorry). The description of the tool, “Cloud cost estimates for Terraform in pull requests”, piqued my interest, so I gave it a go locally.

Installing and registering Infracost locally

As I am a macOS user, installing Infracost locally was a Homebrew command away:

Install Infracost on macOS
brew install infracost

Once installed, you need to register for an API key; this can be done with a single command:

Register for an API key
infracost register

It will ask you for your name and email address; once you enter these you should see something like the following output:

Tip

Please make a note of the API key, you will need it later.

Output
$ infracost register
Please enter your name and email address to get an API key.
See our FAQ (https://www.infracost.io/docs/faq) for more details.
Name: Russ McKendrick
Email: [email protected]

Thank you Russ McKendrick!
Your API key is: IaMnOtREaLlyANapIK3y

Success: Your API key has been saved to /Users/russ.mckendrick/.config/infracost/credentials.yml
You can now run infracost breakdown --path=... and point to your Terraform directory or JSON/plan file.

That is all of the configuration you need to do; once installed, you can try running the tool.

Running Infracost locally

Next up we need some Terraform to run it against. I have some test code which launches a Linux Virtual Machine in Azure, so I decided to use that.

Info

The Terraform code I am using can be found at the russmckendrick/terraform-vm-local-example GitHub repo.

To start with run the following:

Run some Terraform commands
terraform init
terraform plan -out tfplan.binary
terraform show -json tfplan.binary > plan.json

This will download all of the Terraform providers, create a Terraform Plan file and then convert it to JSON. Next up we can run Infracost against the JSON version of the Terraform Plan file using the following command:

Run Infracost for the first time
infracost breakdown --path plan.json

As you can see from the output below (you may need to scroll right), the virtual machine using the SKU Standard_B1ms is going to cost $17.23 per month, with an additional $1.69 per month for the OS disk storage; disk operations are charged separately based on usage:

Output
$ infracost breakdown --path plan.json
Detected Terraform plan JSON file at plan.json

✔ Calculating monthly cost estimate

Project: russmckendrick/terraform-vm-local-example/plan.json

 Name                                                     Monthly Qty  Unit                      Monthly Cost

 azurerm_linux_virtual_machine.main
 ├─ Instance usage (pay as you go, Standard_B1ms)                 730  hours                           $17.23
 └─ os_disk
    ├─ Storage (S4)                                                 1  months                           $1.69
    └─ Disk operations                             Monthly cost depends on usage: $0.0005 per 10k operations

 OVERALL TOTAL                                                                                         $18.92
----------------------------------
To estimate usage-based resources use --usage-file, see https://infracost.io/usage-file

That’s a reasonable cost, so let’s launch the Virtual Machine by running:

Warning

Warning! If you are following along, running the commands below will incur cost.

Run some Terraform commands
terraform apply

Now that we have the Virtual Machine, let’s increase the specification by updating the SKU to Standard_B4ms; this can be done in the terraform.tfvars file in the repo. Once updated, generate a new plan file and run Infracost again:

Run some Terraform commands
terraform plan -out tfplan.binary
terraform show -json tfplan.binary > plan.json
infracost breakdown --path plan.json

You will notice that when you ran the terraform plan command it checked against the Terraform state file; however, as you can see from the output below …

Output
$ infracost breakdown --path plan.json
Detected Terraform plan JSON file at plan.json

✔ Calculating monthly cost estimate

Project: russmckendrick/terraform-vm-local-example/plan.json

 Name                                                     Monthly Qty  Unit                      Monthly Cost

 azurerm_linux_virtual_machine.main
 ├─ Instance usage (pay as you go, Standard_B4ms)                 730  hours                          $137.97
 └─ os_disk
    ├─ Storage (S4)                                                 1  months                           $1.69
    └─ Disk operations                             Monthly cost depends on usage: $0.0005 per 10k operations

 OVERALL TOTAL                                                                                        $139.66
----------------------------------
To estimate usage-based resources use --usage-file, see https://infracost.io/usage-file

… all it shows is the new cost. Wouldn’t it be good if you could see the difference? Well you can, just run the following command:

Check the differences
infracost diff --path plan.json

This time I got the output below:

Output
$ infracost diff --path plan.json
Detected Terraform plan JSON file at plan.json

✔ Calculating monthly cost estimate

Project: russmckendrick/terraform-vm-local-example/plan.json

~ azurerm_linux_virtual_machine.main
  +$121 ($18.92 -> $140)

    - Instance usage (pay as you go, Standard_B1ms)
      -$17.23

    + Instance usage (pay as you go, Standard_B4ms)
      +$138

Monthly cost change for russmckendrick/terraform-vm-local-example/plan.json
Amount:  +$121 ($18.92 -> $140)
Percent: +638%

----------------------------------
Key: ~ changed, + added, - removed

To estimate usage-based resources use --usage-file, see https://infracost.io/usage-file

… as you can see, here we have an increase in cost of 638% - probably best that I don’t update the SKU!

Now let’s look at how this can be applied to the Azure DevOps pipeline, but not before I run the following to remove the Virtual Machine:
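The percentage in the diff output can be reproduced from the two totals. This is a minimal sketch, not how Infracost itself calculates it; the percent_change helper is a hypothetical name, and it uses awk for the floating-point arithmetic so that it only needs a POSIX shell.

```shell
# Hypothetical helper: percentage change between an old and a new monthly cost,
# rounded to a whole number.
percent_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f", (new - old) / old * 100 }'
}

# Totals taken from the two infracost breakdown runs above.
change="$(percent_change 18.92 139.66)"
echo "+${change}%"   # prints +638%
```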

Run Terraform Destroy
terraform destroy

The Pipeline

The stages of the pipeline are not changing too much; they are still the following:

  • Checkov Scan, there are no changes to this stage
  • Terraform Validate, there are no changes to this stage
  • Terraform Plan, this is where all of the changes are and we will be covering this stage in more detail below.
  • Terraform Apply (Auto Approval), there are no changes to this stage
  • Terraform Apply (Manual Approval), there are some minor changes to this stage, mostly around the wording

Additional Pipeline variables

A single variable has been added at the top of the azure-pipeline.yml file; this sets the cost_increase_alert_percentage threshold - in my case I set this to 50%:

Pipeline variables
variables:
  tf_version: "latest" # what version of terraform should be used
  tf_state_rg: "rg-tfstate" # name of the resource group to create/use for the terraform state file
  tz_state_location: "uksouth" # location of the resource group to create/use for the terraform state file
  tf_state_sku: "Standard_RAGRS" # sku to use when creating the storage account to create/use for the terraform state file
  tf_state_sa_name: "tfstatesa20210606" # name of the storage account to create/use for the terraform state file
  tf_state_container_name: "tfstate" # name of the container to create/use for the terraform state file
  tf_environment: "dev" # environment name, used for the statefile name
  cost_increase_alert_percentage: 50 # if the difference in costs is higher than x% then you will need to manually validate the deployment

The second variable which needs to be added contains the API key which you made a note of when the infracost register command was run locally. If you didn’t make a note, the configuration file created by the command also contains the API key; in my case it was stored at /Users/russ.mckendrick/.config/infracost/credentials.yml.
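If you need to pull the key back out of that file, something like the following could work. It is only a sketch: it assumes the credentials file stores the key on a top-level line of the form api_key: value, the read_infracost_api_key helper is a hypothetical name, and the demonstration runs against a throwaway file containing the fake key from earlier rather than your real credentials.

```shell
# Hypothetical helper: print the api_key value from an Infracost credentials file.
# Assumes the file contains a top-level line of the form "api_key: <value>".
read_infracost_api_key() {
  credentials_file="$1"
  sed -n 's/^api_key:[[:space:]]*//p' "$credentials_file" | tr -d "\"'"
}

# Demonstration against a temporary file using the fake key shown earlier.
sample_credentials="$(mktemp)"
printf 'version: "0.1"\napi_key: IaMnOtREaLlyANapIK3y\n' > "$sample_credentials"
api_key="$(read_infracost_api_key "$sample_credentials")"
echo "$api_key"
rm -f "$sample_credentials"
```

To use it for real you would pass the path to your own credentials.yml instead of the temporary file.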

Open the pipeline in Azure DevOps, click Edit, then Variables and finally add a variable called INFRACOST_API_KEY making sure that you tick the Keep this value secret box:

Adding the INFRACOST_API_KEY variable

Now the two variables have been added, let’s look at the changes to the pipeline itself.

Stage - Terraform Plan

Previously, this stage contained the following tasks:

  • “Run > terraform init”
  • “Run > terraform plan”
  • “Run > terraform show”

There are no changes to these three tasks; by the end of them we are left with an idea of what Terraform is going to do and a Terraform Plan file is stored at $(System.DefaultWorkingDirectory)/terraform.tfplan.

Task - Install > Infracost

The first of the two new tasks we are adding simply installs Infracost:

Install > Infracost
- bash: |
    if [ -z "$(INFRACOST_API_KEY)" ]; then
      echo "ℹ️ - No Infracost API Key has been detected - skipping task"
    else
      sudo apt-get update -qq && sudo apt-get -qq install bc curl git jq
      curl -sL https://github.com/infracost/infracost/releases/latest/download/infracost-linux-amd64.tar.gz | tar xz -C /tmp
      sudo mv /tmp/infracost-linux-amd64 /usr/bin/infracost
    fi    
  name: "installinfracost"
  displayName: "Install > Infracost"

As you can see, there is a little logic in there which skips this step if the $(INFRACOST_API_KEY) is not defined and we are just left with a message which looks like the following:

Nothing to do here
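The same guard can be written against an environment variable rather than the pipeline macro. The sketch below assumes the key has been mapped into the task's environment as INFRACOST_API_KEY; the two helper names are hypothetical and exist only to show both outcomes.

```shell
# Hypothetical helper: succeeds only when an API key is present and non-empty.
has_infracost_key() {
  [ -n "${INFRACOST_API_KEY:-}" ]
}

# Hypothetical helper: report which branch the task would take.
run_or_skip() {
  if has_infracost_key; then
    echo "run"
  else
    echo "skip"
  fi
}

# Without a key the task is skipped ...
unset INFRACOST_API_KEY
without_key="$(run_or_skip)"

# ... and with a key (a placeholder value here) it runs.
INFRACOST_API_KEY="example-key"
with_key="$(run_or_skip)"

echo "without key: $without_key, with key: $with_key"
```

Using ${INFRACOST_API_KEY:-} means the check still behaves when the variable is unset and the script runs with set -u.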

Once Infracost has been installed we can then run it.

Task - Run > Infracost

There is quite a bit of logic in this task; here it is in its entirety:

Run > Infracost
- bash: |
    if [ -z "$(INFRACOST_API_KEY)" ]; then
      echo "ℹ️ - No Infracost API Key has been detected - skipping task"
    else
      mkdir $(System.DefaultWorkingDirectory)/output
      terraform show -json $(System.DefaultWorkingDirectory)/terraform.tfplan > $(System.DefaultWorkingDirectory)/output/plan.json
      infracost breakdown --format json --path $(System.DefaultWorkingDirectory)/output/plan.json > $(System.DefaultWorkingDirectory)/output/cost.json

      past_total_monthly_cost=$(jq '[.projects[].pastBreakdown.totalMonthlyCost | select (.!=null) | tonumber] | add' $(System.DefaultWorkingDirectory)/output/cost.json)
      total_monthly_cost=$(jq '[.projects[].breakdown.totalMonthlyCost | select (.!=null) | tonumber] | add' $(System.DefaultWorkingDirectory)/output/cost.json)
      diff_cost=$(jq '[.projects[].diff.totalMonthlyCost | select (.!=null) | tonumber] | add' $(System.DefaultWorkingDirectory)/output/cost.json)
      percentage_threshold=$(cost_increase_alert_percentage)

      if [ $(echo "$past_total_monthly_cost > 0" | bc -l) = 1 ] && [ $(echo "$total_monthly_cost > 0" | bc -l) = 1 ]; then
        percent=$(echo "scale=6; $total_monthly_cost / $past_total_monthly_cost * 100 - 100" | bc)
      fi

      if [ $(echo "$past_total_monthly_cost <= 0" | bc -l) = 1 ] && [ $(echo "$total_monthly_cost <= 0" | bc -l) = 1 ]; then
        percent=0
      fi

      if [ -z "$percent" ]; then
        echo "##vso[task.logissue type=warning]💰 - ℹ️ No previous cost data has been detected"
      elif [ $(echo "$percent > $percentage_threshold" | bc -l) = 1 ]; then
        echo "##vso[task.logissue type=warning]💰 - 📈 A $percent% increase in cost has been detected. Your monthly costs are increasing from \$$past_total_monthly_cost to \$$total_monthly_cost"
        echo "##vso[task.setvariable variable=TERRAFORM_PLAN_HAS_DESTROY_CHANGES]true"
      else
        echo "##vso[task.logissue type=warning]💰 - 📉 An acceptable or no change in cost has been detected. Your new monthly costs are \$$total_monthly_cost from \$$past_total_monthly_cost"
      fi
    fi    
  env:
    INFRACOST_API_KEY: $(INFRACOST_API_KEY)
  name: "runinfracost"
  displayName: "Run > Infracost"

The first few of the steps in the task roughly follow what we ran locally:

  • Check to see if $(INFRACOST_API_KEY) has been set
  • Create a folder called output
  • Run terraform show using the plan file created by the Run > terraform plan task to save a JSON version of the plan
  • Take the JSON file created above and run infracost breakdown, this time outputting the results as a second JSON file

The only difference is that, rather than outputting to the screen, we are saving the results to a JSON file. Once the file has been generated we can interact with it using the jq command; jq is a lightweight and flexible command-line JSON processor.

First we get the value for the previous cost, if there was one, and assign it to the $past_total_monthly_cost variable:

Set $past_total_monthly_cost
past_total_monthly_cost=$(jq '[.projects[].pastBreakdown.totalMonthlyCost | select (.!=null) | tonumber] | add' $(System.DefaultWorkingDirectory)/output/cost.json)

Then we get the value for the new cost, and assign it to the $total_monthly_cost variable:

Set $total_monthly_cost
total_monthly_cost=$(jq '[.projects[].breakdown.totalMonthlyCost | select (.!=null) | tonumber] | add' $(System.DefaultWorkingDirectory)/output/cost.json)

Next up, we get the difference in cost and set that as the $diff_cost variable:

Set $diff_cost
diff_cost=$(jq '[.projects[].diff.totalMonthlyCost | select (.!=null) | tonumber] | add' $(System.DefaultWorkingDirectory)/output/cost.json)

Note that the difference in cost was available to us in the JSON output without the need for us to run the infracost diff command.
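To see what those three filters return without running Infracost, you can point them at a hand-written file. The sketch below is an illustration only: the sample JSON is an assumption that contains just the fields the filters read (real output has many more), and sum_cost is a hypothetical helper wrapping the same jq expression with the section name passed in as an argument.

```shell
# Sample file with only the fields the filters touch; the second project has
# no previous cost and a null total, so it is skipped by select(. != null).
cost_file="$(mktemp)"
cat > "$cost_file" <<'EOF'
{
  "projects": [
    {
      "pastBreakdown": { "totalMonthlyCost": "18.92" },
      "breakdown": { "totalMonthlyCost": "139.66" },
      "diff": { "totalMonthlyCost": "120.74" }
    },
    {
      "breakdown": { "totalMonthlyCost": null }
    }
  ]
}
EOF

# Hypothetical helper: sum one section's totalMonthlyCost across all projects.
sum_cost() {
  jq --arg section "$1" \
    '[.projects[] | .[$section].totalMonthlyCost | select(. != null) | tonumber] | add' "$2"
}

if command -v jq >/dev/null 2>&1; then
  past_total_monthly_cost="$(sum_cost pastBreakdown "$cost_file")"
  total_monthly_cost="$(sum_cost breakdown "$cost_file")"
  diff_cost="$(sum_cost diff "$cost_file")"
  echo "past=$past_total_monthly_cost new=$total_monthly_cost diff=$diff_cost"
fi
rm -f "$cost_file"
```

When no project has a value for a section, the array is empty and add returns null rather than 0, which is worth keeping in mind when the result is passed on to other commands.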

Finally, we take the pipeline variable $(cost_increase_alert_percentage) and set a local one called $percentage_threshold:

Set $percentage_threshold
percentage_threshold=$(cost_increase_alert_percentage)

The next part of the script …

Set $percent
if [ $(echo "$past_total_monthly_cost > 0" | bc -l) = 1 ] && [ $(echo "$total_monthly_cost > 0" | bc -l) = 1 ]; then
  percent=$(echo "scale=6; $total_monthly_cost / $past_total_monthly_cost * 100 - 100" | bc)
fi

… only runs if both $past_total_monthly_cost and $total_monthly_cost are greater than 0. It calculates the percentage increase or decrease from the two values we have just set and stores the result in the $percent variable.

The next statement sets $percent to 0 if both the previous and the new cost are zero (or less):

Set $percent to 0
if [ $(echo "$past_total_monthly_cost <= 0" | bc -l) = 1 ] && [ $(echo "$total_monthly_cost <= 0" | bc -l) = 1 ]; then
  percent=0
fi

Now we should have enough information to make a decision on what the pipeline should do, which should be one of three things:

  • 1. If there is no price data, output a message saying that and move on.
  • 2. Check to see if $percent is higher than $percentage_threshold; if so, output a message and also set $TERRAFORM_PLAN_HAS_DESTROY_CHANGES to true to trigger the manual review stage.
  • 3. If neither of the conditions above is met, assume that the cost change is within $percentage_threshold and print a message.

This looks like the following:

Decide what to do
  if [ -z "$percent" ]; then
    echo "##vso[task.logissue type=warning]💰 - ℹ️ No previous cost data has been detected"
  elif [ $(echo "$percent > $percentage_threshold" | bc -l) = 1 ]; then
    echo "##vso[task.logissue type=warning]💰 - 📈 A $percent% increase in cost has been detected. Your monthly costs are increasing from \$$past_total_monthly_cost to \$$total_monthly_cost"
    echo "##vso[task.setvariable variable=TERRAFORM_PLAN_HAS_DESTROY_CHANGES]true"
  else
    echo "##vso[task.logissue type=warning]💰 - 📉 An acceptable or no change in cost has been detected. Your new monthly costs are \$$total_monthly_cost from \$$past_total_monthly_cost"
  fi
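The three outcomes can be tried out locally with a small function. This is a sketch rather than the pipeline's code: classify_cost_change and its three labels are hypothetical names, it uses awk instead of bc for the comparisons, and it treats any non-numeric input (such as null) as 0.

```shell
# Hypothetical helper: prints no-data, over-threshold or within-threshold
# for a previous cost, a new cost and a percentage threshold.
classify_cost_change() {
  awk -v past="$1" -v total="$2" -v threshold="$3" 'BEGIN {
    # Force numeric values; non-numeric input such as "null" becomes 0.
    past = past + 0; total = total + 0; threshold = threshold + 0
    if (past > 0 && total > 0) {
      percent = total / past * 100 - 100
    } else if (past <= 0 && total <= 0) {
      percent = 0
    } else {
      print "no-data"
      exit
    }
    result = (percent > threshold) ? "over-threshold" : "within-threshold"
    print result
  }'
}

classify_cost_change 18.92 139.66 50   # prints over-threshold
classify_cost_change 139.66 18.92 50   # prints within-threshold
classify_cost_change null 18.92 50     # prints no-data
```

Only the over-threshold outcome would need to set the variable that routes the pipeline to the manual review stage.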

The final part of the task closes the if statement and also sets the content of $(INFRACOST_API_KEY) as an environment variable called INFRACOST_API_KEY, which Infracost checks when it is executed:

Close the task
  fi
env:
  INFRACOST_API_KEY: $(INFRACOST_API_KEY)
name: "runinfracost"
displayName: "Run > Infracost"

Task - Vars > Set Variables for next stage

The final task in this stage is not much different than before; just some of the wording has been tweaked to take into account that we are now looking for cost increases as well as resources being destroyed:

Vars > Set Variables for next stage
- bash: |
    if [ "$TERRAFORM_PLAN_HAS_CHANGES" = true ] && [ "$TERRAFORM_PLAN_HAS_DESTROY_CHANGES" = false ] ; then
      echo "##vso[task.setvariable variable=HAS_CHANGES_ONLY;isOutput=true]true"
      echo "##vso[task.logissue type=warning]👍 - Changes with no destroys detected, it is safe for the pipeline to proceed automatically"
    fi
    if [ "$TERRAFORM_PLAN_HAS_CHANGES" = true ] && [ "$TERRAFORM_PLAN_HAS_DESTROY_CHANGES" = true ] ; then
      echo "##vso[task.setvariable variable=HAS_DESTROY_CHANGES;isOutput=true]true"
      echo "##vso[task.logissue type=warning]⛔️ - Changes with Destroy or Cost increase, pipeline will require a manual approval to proceed"
    fi
    if [ "$TERRAFORM_PLAN_HAS_CHANGES" != true ] ; then
      echo "##vso[task.logissue type=warning]ℹ️ - No changes detected, terraform apply will not run"
    fi    
  name: "setvar"
  displayName: "Vars > Set Variables for next stage"
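The routing in this task boils down to a small truth table over the two flags. The sketch below mirrors the same comparisons so they can be checked locally; apply_path and the labels it prints are hypothetical names, not something the pipeline defines.

```shell
# Hypothetical helper: prints which apply path the two flags lead to.
#   $1 = TERRAFORM_PLAN_HAS_CHANGES
#   $2 = TERRAFORM_PLAN_HAS_DESTROY_CHANGES (also set to true on a cost increase)
apply_path() {
  has_changes="$1"
  needs_review="$2"
  if [ "$has_changes" != true ]; then
    echo "none"        # no changes, terraform apply will not run
  elif [ "$needs_review" = true ]; then
    echo "manual"      # destroys or a cost increase, manual approval required
  elif [ "$needs_review" = false ]; then
    echo "auto"        # changes only, safe to proceed automatically
  else
    echo "undecided"   # flag was never set to true or false
  fi
}

apply_path true false   # prints auto
apply_path true true    # prints manual
apply_path false false  # prints none
```

The undecided branch highlights that the original task sets neither output variable if the destroy flag is anything other than true or false.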

There are also some tweaks to the rest of the pipeline, but nothing outside of changing some of the wording.

Running the Pipeline

Now that we have all of the bits together, let’s run the same Terraform code which launches a Linux virtual machine with the Standard_B1ms SKU.

Initial Run

When the pipeline is first run there are no existing costs so we get the following output:

First run

As you can see, we have a message saying that “No previous cost data has been detected” and Terraform has just run as expected, as it is only adding resources.

Running again

Rerunning with the same SKU gives us the following:

Second run

As we already have an existing resource, Infracost returns information on both the previous and new cost, which in our case was $18.91 - also note that as there are no changes Terraform does not attempt to apply anything.

Updating the SKU and increasing costs

Now let’s bump the SKU to Standard_B4ms:

Update the SKU

As you can see, a cost increase of over 50% has been detected, over 630% in fact, from $18.91 to $139.66 per month, so $HAS_DESTROY_CHANGES has been set and the manual validation stage was executed.

Undo the change to the SKU

The final change is changing the SKU of the virtual machine back to Standard_B1ms:

Undo the change to the SKU

The message this time shows that the costs have been reduced and we are OK with that, so the pipeline triggered the auto-approve stage and we didn’t have to step in and review the changes.

Summary

Now the pipeline described above does differ from the native CI/CD integration provided by Infracost which can be found here. Infracost’s own integration hooks into your repo and is triggered on a pull request - as I already had a pipeline built I decided to adapt their script a little so that it fitted my own needs.

With over 3 million prices listed, covering the bulk of Microsoft Azure, Amazon Web Services, and Google Cloud Platform cloud services, it should pick up the majority of common mistakes when it comes to incorrectly configuring a service using Terraform and hopefully stop you getting any nasty surprises at the end of the month.

They have also just updated the self-hosted version of the Cloud Pricing API, meaning that you can connect to your own instance rather than registering to use their public endpoint, which is extremely useful if you have limited network access; see this blog post for more information.

The full code for the pipeline and Terraform scripts covered in this post can be found in the GitHub repo here.