Autoscaling & Load Balancing

Make your app survive traffic spikes and failures: bake an instance template, run a managed instance group that autoscales and self-heals, and put a global HTTP(S) load balancer in front of it.

The goal: handle more traffic, survive failures

One VM cannot handle a spike or survive its own failure. The fix: run several identical VMs that a load balancer spreads traffic across, and let a managed instance group keep the right number healthy. Why: your app stays up when a VM dies and grows when demand rises — automatically.

This lesson builds, in order:
1. an Instance Template (the blueprint for each VM)
2. a Managed Instance Group (keeps N healthy VMs, self-heals)
3. Autoscaling (changes N based on load)
4. an HTTP Load Balancer (spreads traffic across the group)

echo "Reuse my-vpc, central-subnet, and startup.sh from earlier lessons."
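
If you no longer have startup.sh, a minimal stand-in might look like the sketch below. It is an assumption, not the file from the earlier lesson: it assumes a Debian image with apt and nginx's default web root at /var/www/html, and it writes to the hypothetical file name startup-sketch.sh so it cannot overwrite your own script.

```shell
# Write a hypothetical stand-in for startup.sh (name and contents are assumptions).
cat > startup-sketch.sh <<'EOF'
#!/bin/bash
# Runs as root each time the VM boots: install nginx and serve a page naming this VM.
apt-get update
apt-get install -y nginx
echo "Served by $(hostname)" > /var/www/html/index.html
EOF

# Syntax-check the script locally without running it.
bash -n startup-sketch.sh && echo "startup-sketch.sh parses cleanly"
```

Printing the hostname makes it easy to see later which VM answered a request that came through the load balancer.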

Instance templates — a reusable VM blueprint

An instance template captures everything needed to create a VM — machine type, image, network, startup script — so every VM in the group is identical. Why: a managed instance group needs one recipe to stamp out copies. Templates are immutable; you make a new one to change the config.

Create a template that installs nginx on boot

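# --region names the region of central-subnet; us-central1 is assumed here to match the zone used below
gcloud compute instance-templates create web-template \
  --machine-type e2-micro \
  --image-family debian-12 --image-project debian-cloud \
  --network my-vpc --subnet central-subnet --region us-central1 --tags web \
  --metadata-from-file startup-script=startup.sh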
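
Because templates are immutable, a config change means a new template plus a rolling update of the group. The sketch below shows one way that could look once the group web-mig from the next section exists. The names web-template-v2 and e2-small are hypothetical, and us-central1 is assumed to be the region of central-subnet. The script only prints the commands by default; set GCLOUD=gcloud to run them.

```shell
# Dry run by default: commands are printed, not executed.
# Set GCLOUD=gcloud to run them for real. Assumes bash or sh,
# where the unquoted $GCLOUD is split into separate words.
GCLOUD="${GCLOUD:-echo gcloud}"

roll_out_new_template() {
  # 1. Create a second template (hypothetical name and machine type).
  $GCLOUD compute instance-templates create web-template-v2 \
    --machine-type e2-small \
    --image-family debian-12 --image-project debian-cloud \
    --network my-vpc --subnet central-subnet --region us-central1 --tags web \
    --metadata-from-file startup-script=startup.sh

  # 2. Ask the group to replace its VMs gradually with ones built from v2.
  $GCLOUD compute instance-groups managed rolling-action start-update web-mig \
    --version template=web-template-v2 \
    --zone us-central1-a
}

roll_out_new_template
```

The old template stays around untouched, so rolling back is the same update pointed at web-template.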

Managed instance groups — keep N healthy VMs

A managed instance group (MIG) creates VMs from a template and maintains a target count. With a health check attached, it also recreates any VM that fails the check ("self-healing"; Google's docs call it autohealing). Why: it is the unit of scaling and resilience, the Google Cloud counterpart of an AWS Auto Scaling group.

A health check the group uses to detect unhealthy VMs

gcloud compute health-checks create http web-hc --port 80 --request-path /
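
Health-check probes arrive from Google's documented ranges 130.211.0.0/22 and 35.191.0.0/16, so the VMs need a firewall rule that admits them on port 80; without one, every VM is reported unhealthy. If an earlier lesson already opened port 80 to those ranges for the web tag, skip this. The rule name allow-health-check is hypothetical, and the sketch only prints the command unless GCLOUD=gcloud is set.

```shell
# Dry run by default: the command is printed, not executed.
# Set GCLOUD=gcloud to run it for real (assumes bash or sh word splitting).
GCLOUD="${GCLOUD:-echo gcloud}"

allow_health_checks() {
  # Admit Google's health-check probers to port 80 on VMs tagged "web".
  # allow-health-check is a hypothetical rule name.
  $GCLOUD compute firewall-rules create allow-health-check \
    --network my-vpc --direction INGRESS \
    --allow tcp:80 \
    --source-ranges 130.211.0.0/22,35.191.0.0/16 \
    --target-tags web
}

allow_health_checks
```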

Create the group from the template, with self-healing

gcloud compute instance-groups managed create web-mig \
  --template web-template --size 2 \
  --zone us-central1-a \
  --health-check web-hc --initial-delay 120
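
To watch the group settle, one option is to wait until it reports stable and then list its VMs with their status. A sketch using the names above; it only prints the commands unless GCLOUD=gcloud is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set GCLOUD=gcloud to run them for real (assumes bash or sh word splitting).
GCLOUD="${GCLOUD:-echo gcloud}"

check_group() {
  # Block until the group has finished creating or repairing VMs.
  $GCLOUD compute instance-groups managed wait-until web-mig \
    --stable --zone us-central1-a

  # Show each VM in the group with its current status.
  $GCLOUD compute instance-groups managed list-instances web-mig \
    --zone us-central1-a
}

check_group
```

Deleting one of the listed VMs by hand and running this again is a simple way to see the group bring the count back to 2.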

Autoscaling — grow and shrink automatically

Autoscaling changes the group size based on a signal — most commonly average CPU. You set a min, a max, and a target. Why: Google adds VMs when load rises and removes them when it falls, so you pay for capacity only when traffic justifies it.

Target 60% average CPU with 2 to 6 VMs; a new VM's first 90 seconds are not measured

gcloud compute instance-groups managed set-autoscaling web-mig \
  --zone us-central1-a \
  --min-num-replicas 2 --max-num-replicas 6 \
  --target-cpu-utilization 0.60 --cool-down-period 90
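
To build intuition for how min, max, and target interact, the arithmetic can be sketched in plain shell. This is a simplified proportional model and an assumption, not Google's actual autoscaler algorithm: it sizes the group so that the current total CPU load would average out to the target, then clamps to the min and max. The function name desired_size is hypothetical.

```shell
# desired_size CURRENT_VMS AVG_CPU_PERCENT TARGET_PERCENT MIN MAX
# Simplified model (assumption): ceil(current * cpu / target), clamped to [MIN, MAX].
desired_size() {
  current=$1 cpu=$2 target=$3 min=$4 max=$5

  # Integer ceiling division: (a + b - 1) / b
  want=$(( (current * cpu + target - 1) / target ))

  if [ "$want" -lt "$min" ]; then want=$min; fi
  if [ "$want" -gt "$max" ]; then want=$max; fi
  echo "$want"
}

desired_size 2 90 60 2 6   # prints 3: 2 VMs at 90% need 3 VMs to average 60%
desired_size 6 95 60 2 6   # prints 6: the model wants 10, capped at max
desired_size 4 20 60 2 6   # prints 2: load fell, shrink to the min
```

The second case is the one to watch in practice: once the group is pinned at max, CPU stays above target and only raising the max or the machine type helps.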

HTTP(S) load balancing

A load balancer gives clients one global address and spreads requests across healthy VMs in the group. Why several pieces: GCP's load balancer is assembled from a backend service (the group + health check), a URL map (routing rules), a target HTTP proxy (receives client requests and consults the URL map), and a forwarding rule (the public IP + port). Together they form one front door.

Name the group's port, then build the backend service on it

gcloud compute instance-groups managed set-named-ports web-mig \
  --named-ports http:80 --zone us-central1-a

gcloud compute backend-services create web-backend \
  --protocol HTTP --port-name http --health-checks web-hc --global

gcloud compute backend-services add-backend web-backend \
  --instance-group web-mig --instance-group-zone us-central1-a --global
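
Before wiring up the front end, it can help to confirm that the backend service sees the group's VMs as healthy. A sketch using the names above; it only prints the command unless GCLOUD=gcloud is set.

```shell
# Dry run by default: the command is printed, not executed.
# Set GCLOUD=gcloud to run it for real (assumes bash or sh word splitting).
GCLOUD="${GCLOUD:-echo gcloud}"

show_backend_health() {
  # Report the health state the load balancer has recorded for each VM.
  $GCLOUD compute backend-services get-health web-backend --global
}

show_backend_health
```

If VMs show as unhealthy here, the usual suspects are a missing firewall rule for the health-check ranges or nginx not yet installed by the startup script.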

Routing -> proxy -> public forwarding rule on port 80

gcloud compute url-maps create web-map --default-service web-backend

gcloud compute target-http-proxies create web-proxy --url-map web-map

gcloud compute forwarding-rules create web-rule --global \
  --target-http-proxy web-proxy --ports 80
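
A new global load balancer can take a few minutes to start serving, so a first request may fail even when everything is configured correctly. The sketch below looks up the forwarding rule's IP and retries a request until it succeeds. The helper names retry and check_load_balancer are hypothetical, as are the 30 attempts and 10-second delay; it assumes curl is installed.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Run COMMAND until it succeeds or ATTEMPTS are used up.
retry() {
  attempts=$1 delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# Look up the load balancer's public IP, then poll it over HTTP.
check_load_balancer() {
  ip=$(gcloud compute forwarding-rules describe web-rule --global \
        --format='value(IPAddress)') || return 1
  retry 30 10 curl -fsS -o /dev/null "http://$ip/"
}

# Usage, once the forwarding rule exists:
#   check_load_balancer && echo "load balancer is serving"
```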