Cloud Journey: Starting with Enterprise Scale — Part 1

Ram Bhagat Suthar
Jun 26, 2021

Cloud adoption is increasing, and the ecosystem has grown into a complex set of technologies, services, and products. The major cloud players are pushing revolutionary services and making it easier to adopt new services to solve complex business problems. For consumers, navigating and understanding this cloud ecosystem is becoming increasingly difficult. Two approaches come to mind:

  • Start small and expand
  • Start with scale

This article focuses on the design decisions to consider when starting with enterprise-scale infrastructure, with an emphasis on architecture. It assumes you have a team that can execute Infrastructure as Code (IaC).

PART 1: Isolate cloud infrastructure for IaC Automation

Isolate cloud infrastructure on Day 1.

Follow zero-trust principles and block inbound internet access to cloud infrastructure.

The decisions you make early on for infrastructure automation, continuous integration, and continuous delivery determine how much freedom you have in system design later, for example whether you can route traffic over the cloud service provider's (CSP) global backbone network and restrict access to resources over the internet.

There are many tools available in the market for IaC. The example below uses Terraform to keep provisioning cloud-agnostic. The same approach can be applied to the three major public cloud providers (AWS, Azure, GCP) as well as other providers.

Typical workflow for IaC provisioning (drawn with draw.io)
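
As a rough illustration of what a pipeline job for such a workflow might run on an agent, here is a minimal Python sketch. It assumes the common Terraform sequence of init, plan, and apply; the helper names terraform_commands and run_pipeline and the plan file name are hypothetical and not taken from the diagram.

```python
import subprocess
from typing import Callable, List


def terraform_commands(plan_file: str = "tfplan") -> List[List[str]]:
    """Return the Terraform CLI calls for a non-interactive provisioning run."""
    return [
        # Download providers and configure the backend.
        ["terraform", "init", "-input=false"],
        # Compute the changes and save them to a plan file for review.
        ["terraform", "plan", "-input=false", f"-out={plan_file}"],
        # Apply exactly the saved plan, nothing else.
        ["terraform", "apply", "-input=false", plan_file],
    ]


def run_pipeline(workdir: str, runner: Callable = subprocess.run) -> None:
    """Run each step in order; check=True stops at the first failing command."""
    for cmd in terraform_commands():
        runner(cmd, cwd=workdir, check=True)
```

Saving the plan and applying that exact file keeps what was reviewed and what was provisioned identical. The runner parameter exists only so the sequence can be exercised without Terraform installed.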

Self-service aspiration: As the IaC landscape matures, the desire for self-service systems is also increasing, which again requires isolating the cloud infrastructure.

Developer self-service setup (drawn with draw.io)

How to Isolate Infrastructure and Run Pipelines with SaaS tools.

  1. Select a version control and pipeline tool: For example, Terraform Cloud, GitHub + Workflows, GitLab, Azure DevOps, GitHub + CircleCI, Travis, etc.
  2. Host the agent in your network: The pipelines need compute / agents to run the workflows for IaC provisioning. The link between the hosted agent and pipeline tools needs to be established. The agent should not have any public IP assigned.
Example: Azure DevOps self-hosted agent with IAM role on Azure Cloud (drawn with draw.io)

Note: Size the hosting environment based on your capacity requirements. You can host an agent on a single virtual machine or on a pool of machines. Some tools also support hosting agents on Kubernetes (k8s).

Self-hosted runners from various pipeline tools:
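
The "no public IP" rule for agents can be checked automatically. The sketch below is one possible approach in Python, assuming you can export an inventory of agents as dictionaries; the field names name, private_ip, and public_ip are hypothetical and do not correspond to any specific provider's API.

```python
import ipaddress
from typing import Dict, List, Optional


def public_ip_violations(agents: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return the names of agents that break the 'no public IP' rule."""
    violations = []
    for agent in agents:
        # Any assigned public IP is a violation, whatever its value.
        if agent.get("public_ip"):
            violations.append(agent["name"])
            continue
        # The address the agent does have must come from a private range.
        if not ipaddress.ip_address(agent["private_ip"]).is_private:
            violations.append(agent["name"])
    return violations


# Hypothetical inventory used only for illustration.
inventory = [
    {"name": "agent-1", "private_ip": "10.0.1.4", "public_ip": None},
    {"name": "agent-2", "private_ip": "10.0.1.5", "public_ip": "203.0.113.10"},
]
print(public_ip_violations(inventory))
```

A check like this could run as a scheduled pipeline so that a misconfigured agent is noticed early.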

Terraform Cloud Agent
Azure DevOps Agent
GitLab runners
GitHub runners

3. Assign an identity to the self-hosted agent: Assign a system identity to the virtual machine that hosts the agent, preferably a managed identity. Limit access to cloud infrastructure to the agent, so that all resource provisioning happens through pipelines running on self-hosted agents.

System identity implementation varies by cloud service provider.

Azure: Azure Managed Identity for VM
AWS: AWS IAM authentication. A security principal is created, permissions are granted, and the identity is assumed by a resource. It relies on temporary credentials.
GCP: Compute Engine Service Agent
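
To make the Azure case more concrete, here is a small Python sketch of how a process on the agent VM could ask the Azure Instance Metadata Service for a token, with no stored secret. It only builds the request; sending it works only from inside an Azure VM that has a managed identity assigned. The function name build_token_request is hypothetical.

```python
import urllib.parse
import urllib.request

# Link-local metadata endpoint, reachable only from inside the VM.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_request(
    resource: str = "https://management.azure.com/",
) -> urllib.request.Request:
    """Build the token request for the VM's managed identity."""
    query = urllib.parse.urlencode(
        {"api-version": "2018-02-01", "resource": resource}
    )
    # The 'Metadata: true' header is required by the metadata service.
    return urllib.request.Request(
        f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"}
    )
```

On the agent itself, passing this request to urllib.request.urlopen should return a JSON document that contains a short-lived access token. In practice the cloud CLI or the Terraform provider normally does this for you.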

4. Assign a role to the agent: Assign an IAM role to the self-hosted agent's managed identity. Follow the principle of least privilege: grant only the role needed to provision resources across the estate or within a specific team's landing zone.
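
One way to keep role assignments honest is to verify that every scope granted to the agent stays inside its landing zone. The following Python sketch assumes Azure-style hierarchical scope strings; the function name within_scope and the example IDs are hypothetical.

```python
def within_scope(assignment_scope: str, allowed_scope: str) -> bool:
    """Return True if assignment_scope is the allowed scope or nested inside it."""
    # Compare path segments, not raw strings, so that 'team-a-prod'
    # is not mistaken for a child of 'team-a'.
    assigned = [part for part in assignment_scope.lower().split("/") if part]
    allowed = [part for part in allowed_scope.lower().split("/") if part]
    return assigned[: len(allowed)] == allowed


# Hypothetical landing zone for one team.
landing_zone = "/subscriptions/0000/resourceGroups/team-a"
print(within_scope(landing_zone + "/providers/Microsoft.Storage", landing_zone))
print(within_scope("/subscriptions/0000", landing_zone))
```

The second call fails the check because a subscription-wide assignment is broader than the team's landing zone.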

5. Set up the agent with required tools: Install the required libraries and tools when provisioning the agent so that pipelines stay lightweight. Examples: kubectl CLI, Azure/AWS/GCP CLI, SSH keys, Vault access and secrets (if required), and authentication to shared services such as storage.
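
A quick preflight check can confirm that an agent image really ships the expected tools before any pipeline depends on them. This Python sketch assumes the tool list below, which is only an example; adjust it to your own stack.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Example tool list (an assumption, not a recommendation).
REQUIRED_TOOLS = ("terraform", "kubectl", "az")


def missing_tools(
    required: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the required executables that are not found on PATH."""
    return [tool for tool in required if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        raise SystemExit(f"Agent image is missing: {', '.join(missing)}")
```

Running this as the first pipeline step makes a broken agent image fail fast with a clear message.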

Problems you may face along the way

1. Chicken-and-egg problem: How do you create the infrastructure for self-hosted agents with automated pipelines when no tooling exists on day 1?
Possible solution:
Create the infrastructure for self-hosted agents by running the initial pipeline on shared agents/runners.
The initial pipeline needs access over the internet, but you can restrict that access with IP whitelisting. Avoid creating agents by running scripts from a local machine.
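
The IP restriction for the bootstrap pipeline boils down to a CIDR membership test. Here is a minimal Python sketch of that logic; the CIDR ranges are documentation-only example addresses, and the real ranges would come from your pipeline vendor's published runner IPs.

```python
import ipaddress
from typing import Iterable

# Hypothetical allowlist using documentation address ranges.
ALLOWED_CIDRS = ["198.51.100.0/24", "203.0.113.7/32"]


def is_allowed(source_ip: str, allowed_cidrs: Iterable[str] = ALLOWED_CIDRS) -> bool:
    """Return True if source_ip falls inside any allowlisted CIDR range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

In a real setup the same rule would live in a firewall or network security group rather than in application code. Once the self-hosted agents are running, the temporary rule can be removed.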

2. Outbound traffic: Agents need outbound internet access. If agents need a static outbound IP, provision a NAT gateway.

This is a personal blog. Opinions represented in this blog are personal.