Cloud.gov Provisioning System

This repository holds the Terraform configuration (plus BOSH vars and ops-files) used to bootstrap our infrastructure.

Be sure to read the internal developer documentation ("cg-provision") for non-public information about using this repository.

Local development

Git hooks

This project uses pre-commit to manage git hooks. To make sure your code is formatted and checked automatically, install pre-commit then run pre-commit install on this repo.

n.b., pre-commit recommends pip install pre-commit to install, which will install it into your system Python. You probably don't want this, and should instead run pipx install pre-commit, which will automagically install it into its own virtual environment.

Layout

Terraform

Our Terraform code is organized around two concepts, each with a corresponding directory.

  • Modules are reusable units of Terraform code. Each module describes a useful concept in our infrastructure. A module can be reused in different contexts by declaring variables that the caller must pass in. Modules are located in terraform/modules.
  • Stacks combine and configure modules for use in an environment. They are also parameterized by variables, with values such as the environment name. Stacks are located in terraform/stacks.

As an example, if we wanted to write Terraform code to deploy a CloudFront distribution in front of each of three load balancers in an environment, we could (see the sketch after this list):

  1. Create a cloudfront module that declares a CloudFront distribution, a Shield Advanced resource to protect it, and an Access Control List (ACL) association between the distribution and an ACL. (The ACL itself is not declared in the module.) It could take an origin, a list of domains, and an ACL ARN as variables.
  2. Create a cloudfront stack that uses the cloudfront module three times, once for each load balancer in the environment. It would pass an external domain and load balancer domain to each module. It could also declare a single ACL for the environment and pass its ARN to each cloudfront module. The stack could take an environment name as a variable.
  3. Add a job to Concourse for each environment. Each job would deploy the cloudfront stack and pass the environment name as a variable.
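
A minimal sketch of what such a hypothetical stack could look like. The module path, variable names, and domains below are illustrative, not the actual code in this repository:

```hcl
# terraform/stacks/cloudfront/stack.tf -- illustrative only

variable "environment" {
  description = "Environment name, e.g. staging or production"
  type        = string
}

# One ACL for the whole environment; the module only receives its ARN.
resource "aws_wafv2_web_acl" "environment" {
  name  = "${var.environment}-cloudfront"
  scope = "CLOUDFRONT"

  default_action {
    allow {}
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "${var.environment}-cloudfront"
    sampled_requests_enabled   = true
  }
}

# One module call per load balancer (only one shown here for brevity).
module "cloudfront_main_lb" {
  source      = "../../modules/cloudfront"
  origin      = "main-lb.${var.environment}.example.gov"
  domains     = ["app.${var.environment}.example.gov"]
  web_acl_arn = aws_wafv2_web_acl.environment.arn
}
```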

In the future, we would like to add a third concept: An entire runtime environment. An environment would combine multiple stacks to represent the entire cloud.gov runtime stack. This collection of resources could be deployed as a single unit to a new AWS region or multiple times in the same region.

Environments

The main stack is a template that is used to provision the production, staging, and development "environments."

The regionalmasterbosh stack contains our masterbosh for a given region, which deploys the tooling BOSH for that region. The tooling BOSH then deploys the BOSH directors in the main stacks across all accounts in that region.

The tooling stack is the same as the regionalmasterbosh stack, but has some extras from before we started going multi-region and multi-account:

  • concourse and staging concourse
  • buckets that we need only one of across all accounts and regions
  • some things that really should be in child environment accounts
  • nessus

In the future, we should work towards disentangling these pieces, so that the old tooling is deployed as a regionalmasterbosh and the other extras become their own stack(s).

The external and dns stacks are both outside of GovCloud (commercial AWS).

Wiring up users

As mentioned above, we have four categories of environment:

  • main - this is the thing we're actually after. It's the pieces that directly support the platform components. There should be several of these across multiple AWS accounts.
  • tooling - this is used to support the things in the main platform - our CI system, management tools such as Nessus, etc.
  • external - this manages some things that don't (or historically didn't) exist in GovCloud (really just CloudFront and the users, etc., to support it). There's one of these per main environment.
  • dns - this manages Route53. There's exactly one of these, although we really should split it out into one per main environment plus one for tooling.

To allow the tooling environment to manage the main environments, there's a tooling-terraform role associated with each main environment, which has an assume-role policy allowing access by Concourse workers in the tooling account.
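
A rough sketch of that trust relationship. The account ID and worker role name are placeholders; the real role is defined elsewhere in this repo and may differ in detail:

```hcl
data "aws_iam_policy_document" "tooling_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type = "AWS"
      # Role used by Concourse workers in the tooling account (placeholder ARN).
      identifiers = ["arn:aws-us-gov:iam::111111111111:role/concourse-worker"]
    }
  }
}

resource "aws_iam_role" "tooling_terraform" {
  name               = "tooling-terraform"
  assume_role_policy = data.aws_iam_policy_document.tooling_assume.json
}
```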

To add a new main environment, see the README here.

BOSH

The bosh directory contains vars and ops-files for use by the BOSH directors.

How Concourse Creates AWS Resources

The Concourse worker VMs must have AWS access to create and apply Terraform plans. How they are given that access depends on the partition being changed.

You can determine how a failing Concourse container is configured by hijacking it. Connect to the container (see fly hijack --help) and run aws configure list to see the current configuration.

GovCloud

The Concourse worker VMs are associated with an IAM role with read-write access to GovCloud resources. The AWS SDK in the Concourse containers is automatically configured to fetch credentials from the instance metadata service. No further configuration is necessary; note that no access keys are passed to GovCloud jobs in pipeline.yml.

AWS IAM roles documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html

Commercial Stacks

Each Concourse job that manages AWS Commercial resources must override the Concourse worker's IAM role. The jobs set the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION environment variables to do this. Environment variables have higher precedence in the AWS SDK, so they are used instead of the IAM role. No further configuration in Terraform is necessary.
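
In both cases the Terraform AWS provider itself carries no credentials; the AWS SDK's default credential chain supplies them, from the job's environment variables for Commercial stacks or from the worker's instance role for GovCloud. A minimal sketch (the variable name is illustrative):

```hcl
variable "aws_default_region" {
  type = string
}

# No access_key/secret_key here: credentials come from the environment.
#  - GovCloud jobs: the worker's IAM instance role via the metadata service.
#  - Commercial jobs: AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY set by the job.
provider "aws" {
  region = var.aws_default_region
}
```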

DNS and CloudFront Stacks

The DNS stack is a special case because it must read state from GovCloud but read and write resources and state to Commercial. AWS IAM users cannot have cross-partition permissions, so the job must use two separate AWS accounts (one for each partition).

To achieve this, the Concourse jobs pass an access key for a Commercial IAM user as a TF_VAR instead of using the standard AWS_* environment variables. (Setting AWS_ variables would make the AWS SDK use them by default, and we want it to continue using the GovCloud IAM role by default.)

The IAM role and TF_VAR_ credentials are used as follows (see the sketch after this list):

  • The terraform init command is run with the Commercial credentials using this script. This configures the s3 backend for the DNS stack in the Commercial account.
  • The terraform provider is configured with the Commercial credentials, ensuring that all resources will be created in the Commercial account.
  • The terraform_remote_state data blocks for each GovCloud s3 state object are configured with the GovCloud region. Because they are accessed during Terraform's initialization, but separately from the initial terraform init, they are not passed the Commercial credentials. With no credentials set explicitly, the AWS SDK uses the GovCloud IAM role.
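
A minimal sketch of that split, assuming hypothetical variable, bucket, and key names (the real stack will differ in detail):

```hcl
# Commercial credentials arrive as TF_VAR_* and are wired into the provider
# explicitly, so all resources are created in the Commercial account.
variable "aws_access_key_id" {}
variable "aws_secret_access_key" {}

provider "aws" {
  region     = "us-east-1"
  access_key = var.aws_access_key_id
  secret_key = var.aws_secret_access_key
}

# GovCloud state is read with no explicit credentials, so the AWS SDK
# falls back to the Concourse worker's GovCloud IAM role.
data "terraform_remote_state" "main_stack" {
  backend = "s3"

  config = {
    bucket = "example-govcloud-terraform-state"  # placeholder
    key    = "main/production/terraform.tfstate" # placeholder
    region = "us-gov-west-1"
  }
}
```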

Deployment Workflow

Since IaaS is a shared resource (we don't have the money or time to provision entire stacks for each developer), we never apply this configuration manually. Instead, all execution is done through the Concourse pipeline, which is configured to first run terraform plan, and then wait for manual triggering before running terraform apply.

If you want to make infrastructure changes:

  1. Create a branch and pull request with your changes, and ask a teammate for review and merge.
  2. Once the teammate 👍 the changes, head over to the Concourse pipeline and review the resultant Terraform plan output.
  3. If the plan looks like what you intended, then manually trigger the appropriate apply jobs.

Other Points of Note

You may see access_key_id_prev and aws_key_id_prev as outputs from our iam modules. These are used for credential rotation.
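
One common way to produce such outputs, sketched here with illustrative resource names (not necessarily how our modules do it), is to keep two access keys per IAM user so consumers can cut over to the new key before the old one is revoked:

```hcl
# Illustrative sketch: two access keys per user, exposed as "current" and
# "previous" outputs so credentials can be rotated without downtime.
resource "aws_iam_user" "deployer" {
  name = "example-deployer" # placeholder
}

resource "aws_iam_access_key" "current" {
  user = aws_iam_user.deployer.name
}

resource "aws_iam_access_key" "previous" {
  user = aws_iam_user.deployer.name
}

output "access_key_id" {
  value = aws_iam_access_key.current.id
}

output "access_key_id_prev" {
  value = aws_iam_access_key.previous.id
}
```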

modules/stack/spoke composes modules/stack/base and some of the VPC modules. It's not entirely clear why, or why the VPC modules weren't simply included in base (removing spoke altogether).
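
For orientation, that composition is just nested module calls; a rough sketch with hypothetical paths and inputs, not the actual module contents:

```hcl
# terraform/modules/stack/spoke/stack.tf -- rough illustration only;
# module paths and variables are hypothetical.
variable "stack_description" {
  type = string
}

variable "vpc_cidr" {
  type = string
}

module "base" {
  source            = "../base"
  stack_description = var.stack_description
}

module "vpc" {
  source   = "../../vpc"
  vpc_cidr = var.vpc_cidr
}
```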