Make your app survive traffic spikes and failures: bake an instance template, run a managed instance group that autoscales and self-heals, and put a global HTTP(S) load balancer in front to spread traffic across the group.
One VM cannot handle a spike or survive its own failure. The fix: run several identical VMs that a load balancer spreads traffic across, and let a managed instance group keep the right number healthy. Why: your app stays up when a VM dies and grows when demand rises — automatically.
This lesson builds, in order:
1. an Instance Template (the blueprint for each VM)
2. a Managed Instance Group (keeps N healthy VMs, self-heals)
3. Autoscaling (changes N based on load)
4. an HTTP Load Balancer (spreads traffic across the group)
echo "Reuse my-vpc, central-subnet, and startup.sh from earlier lessons."

An instance template captures everything needed to create a VM (machine type, image, network, startup script) so every VM in the group is identical. Why: a managed instance group needs one recipe to stamp out copies. Templates are immutable; you make a new one to change the config.
Create a template that installs nginx on boot
gcloud compute instance-templates create web-template \
--machine-type e2-micro \
--image-family debian-12 --image-project debian-cloud \
--network my-vpc --subnet central-subnet --tags web \
--metadata-from-file startup-script=startup.sh

A managed instance group (MIG) creates VMs from a template and guarantees a count, replacing any that fail a health check ("self-healing"). Why: it is the unit of scaling and resilience, the Google equivalent of an AWS Auto Scaling group.
A health check the group uses to detect unhealthy VMs
gcloud compute health-checks create http web-hc --port 80 --request-path /

Create the group from the template, with self-healing
gcloud compute instance-groups managed create web-mig \
--template web-template --size 2 \
--zone us-central1-a \
--health-check web-hc --initial-delay 120

Autoscaling changes the group size based on a signal, most commonly average CPU. You set a min, a max, and a target. Why: Google adds VMs when load rises and removes them when it falls, so you pay for capacity only when traffic justifies it.
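To make the min, max, and target concrete, here is a rough mental model of the sizing arithmetic, written as a small shell function. It assumes the group is resized in proportion to how far average CPU sits from the target, then clamped to the min and max. The function name `recommended_size` and its arguments are hypothetical, and the real autoscaler also applies stabilization and cool-down logic that this sketch ignores.

```shell
#!/bin/sh
# Simplified model only: NOT the autoscaler's actual algorithm.
# recommended_size CURRENT_VMS AVG_CPU_PCT TARGET_CPU_PCT MIN MAX
recommended_size() {
  rs_current=$1
  rs_avg=$2
  rs_target=$3
  rs_min=$4
  rs_max=$5
  # ceil(current * avg / target) using integer arithmetic
  rs_want=$(( (rs_current * rs_avg + rs_target - 1) / rs_target ))
  # Clamp to the configured bounds
  if [ "$rs_want" -lt "$rs_min" ]; then rs_want=$rs_min; fi
  if [ "$rs_want" -gt "$rs_max" ]; then rs_want=$rs_max; fi
  echo "$rs_want"
}

recommended_size 2 90 60 2 6   # 2 VMs at 90% CPU, target 60% -> prints 3
recommended_size 6 90 60 2 6   # would want 9, capped at max   -> prints 6
recommended_size 4 30 60 2 6   # load halved                   -> prints 2
```

Under this model, two VMs running at 90% against a 60% target grow to three, and the group never leaves the 2 to 6 range no matter how extreme the load.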
Keep average CPU at 60%, between 2 and 6 VMs
gcloud compute instance-groups managed set-autoscaling web-mig \
--zone us-central1-a \
--min-num-replicas 2 --max-num-replicas 6 \
--target-cpu-utilization 0.60 --cool-down-period 90

A load balancer gives clients one global address and spreads requests across healthy VMs in the group. Why several pieces: GCP's load balancer is assembled from a backend service (the group + health check), a URL map (routing), a proxy, and a forwarding rule (the public IP + port). Together they form one front door.
Name the group's port, then build the backend service on it
gcloud compute instance-groups managed set-named-ports web-mig \
--named-ports http:80 --zone us-central1-a
gcloud compute backend-services create web-backend \
--protocol HTTP --port-name http --health-checks web-hc --global
gcloud compute backend-services add-backend web-backend \
--instance-group web-mig --instance-group-zone us-central1-a --global

Routing -> proxy -> public forwarding rule on port 80
gcloud compute url-maps create web-map --default-service web-backend
gcloud compute target-http-proxies create web-proxy --url-map web-map
gcloud compute forwarding-rules create web-rule --global \
--target-http-proxy web-proxy --ports 80
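
A new global load balancer usually needs a few minutes before it starts answering, so an immediate request may fail even when every piece is configured correctly. One way to check, sketched below, is a small retry helper wrapped around curl. The `retry` function is a hypothetical helper, not part of gcloud, and the usage lines are left as comments because they assume the web-rule forwarding rule created above.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_i=1
  while [ "$retry_i" -le "$retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt is coming
    if [ "$retry_i" -lt "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
    retry_i=$((retry_i + 1))
  done
  return 1
}

# Usage (assumes gcloud is configured and web-rule exists):
#   IP=$(gcloud compute forwarding-rules describe web-rule --global \
#        --format='value(IPAddress)')
#   retry 30 10 curl -fsS -o /dev/null "http://$IP/" && echo "serving at $IP"
```

With 30 attempts spaced 10 seconds apart, the check gives the load balancer about five minutes to come up before reporting failure.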