Observability & Debugging

When something breaks, find it fast — read events and pod status, stream logs, measure CPU and memory with kubectl top, and work through a repeatable triage routine.

Read status and events first

Why: most problems announce themselves in two places. kubectl get pods shows the STATUS (Pending, ImagePullBackOff, CrashLoopBackOff…), which is the symptom. kubectl describe shows the Events at the bottom, which usually state the cause in plain language: a missing image, a failed mount, no node with room. Always start here.

The symptom: what state is each pod in?

kubectl get pods

The cause: read the Events at the bottom of describe

kubectl describe pod <pod-name>

A snapshot of recent cluster events, sorted so the newest is last

kubectl get events --sort-by=.lastTimestamp
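
When the event list is long, narrowing it can help. The sketch below wraps two filters in shell functions. The function names (warning_events, pod_events) are made up for illustration, and it assumes the events in your cluster support the type and involvedObject field selectors, as core events normally do.

```shell
# Hypothetical helpers; the names are made up for illustration.

# Only Warning events in a namespace, newest last.
warning_events() {
  ns="${1:-default}"
  kubectl get events -n "$ns" \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}

# Only the events that refer to one pod.
pod_events() {
  pod="$1"
  ns="${2:-default}"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
}
```

Call them as warning_events <namespace> or pod_events <pod-name> <namespace>. For a feed that keeps updating, kubectl get events --watch streams new events as they arrive.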

Logs, including the crashed container

Why: logs are where an app explains itself. -f follows them live. The crucial flag is --previous: when a pod is in CrashLoopBackOff, the current container has only just restarted and often has little or no output yet. --previous shows the output of the instance that just crashed, which is the one that holds the error.

Follow a running container's logs

kubectl logs -f <pod-name>

The logs of the container that JUST crashed (the key flag)

kubectl logs <pod-name> --previous

A specific container in a multi-container pod

kubectl logs <pod-name> -c <container-name>
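
The flags above combine, and a few more keep the output manageable. This is a minimal sketch, not a fixed recipe: the function names are hypothetical, and the defaults (100 lines, 10 minutes) are arbitrary choices.

```shell
# Hypothetical helpers; the names and defaults are illustrative only.

# Last N lines from the previous (crashed) instance of one container.
crash_logs() {
  pod="$1"
  container="$2"
  lines="${3:-100}"
  kubectl logs "$pod" -c "$container" --previous --tail="$lines"
}

# Recent logs from every container of every pod matching a label selector.
# --prefix tags each line with its pod and container name.
# --tail=-1 lifts the 10-line default that applies when a selector is used,
# so --since alone decides how far back the output goes.
recent_logs() {
  selector="$1"
  since="${2:-10m}"
  kubectl logs -l "$selector" --all-containers --prefix --tail=-1 --since="$since"
}
```

For example, crash_logs <pod-name> <container-name> 50 prints the last 50 lines before the crash, and recent_logs app=web 5m covers the last five minutes for pods labelled app=web (the label is an assumed example).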

Measure resource usage with top

Why: "the cluster is slow" needs numbers. kubectl top shows recent CPU and memory usage per node and per pod, so you can spot the pod eating a node or the node running hot. Note: it needs metrics-server installed in the cluster, the same component the HPA depends on for CPU and memory metrics.

Which nodes are under pressure?

kubectl top nodes

Which pods are using the most memory? (use --sort-by=cpu to rank by CPU instead)

kubectl top pods -A --sort-by=memory
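
If kubectl top returns an error, it is worth checking whether the metrics API is registered at all. The sketch below assumes metrics-server is what serves the v1beta1.metrics.k8s.io APIService, which is the usual setup; the function names are hypothetical.

```shell
# Hypothetical helpers; the names are made up for illustration.

# Succeeds if the resource metrics API is registered with the API server.
metrics_available() {
  kubectl get apiservice v1beta1.metrics.k8s.io >/dev/null 2>&1
}

# Per-container usage for the pods in one namespace, heaviest CPU first.
top_containers() {
  ns="${1:-default}"
  kubectl top pods -n "$ns" --containers --sort-by=cpu
}
```

A typical use is metrics_available && top_containers <namespace>. The --containers flag breaks each pod down by container, which helps when a sidecar rather than the main app is the one consuming resources.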

Debug inside and around a pod

Why: when logs are not enough, get closer. exec opens a shell in the container to inspect it live. For a minimal image with no shell, kubectl debug attaches a temporary container with your tools into the running pod — without rebuilding the image.

Open a shell in the container

kubectl exec -it <pod-name> -- sh

Attach an ephemeral debug container with tools (distroless images)

kubectl debug -it <pod-name> --image=busybox --target=<container-name>
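
Two variations on the commands above, sketched as shell functions with hypothetical names. The first runs a single command instead of opening a shell. The second debugs a copy of the pod so the original is left untouched; the <pod>-debug naming is an arbitrary choice made here.

```shell
# Hypothetical helpers; the names are made up for illustration.

# Run a single command in a container without opening an interactive shell.
run_in_pod() {
  pod="$1"
  container="$2"
  shift 2
  kubectl exec "$pod" -c "$container" -- "$@"
}

# Debug a copy of the pod rather than the original.
# --share-processes lets the debug container see the app's processes.
debug_copy() {
  pod="$1"
  kubectl debug "$pod" -it --image=busybox \
    --copy-to="${pod}-debug" --share-processes
}
```

For example, run_in_pod <pod-name> <container-name> env prints the container's environment. After debug_copy, delete the copy with kubectl delete pod once you are done. Ephemeral containers need a reasonably recent cluster; they have been stable since Kubernetes 1.25.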

A triage routine

Note: when a deploy goes wrong, work the same order every time and most issues surface quickly. 1) kubectl get pods: what is the status? 2) kubectl describe: what do the Events say? 3) kubectl logs --previous: what did it print before dying? 4) kubectl top: is it resource-starved? 5) check the Service/Ingress only once the pods are healthy. Symptom, then cause, then proof.

  1. get pods            ─ the STATUS (the symptom)
  2. describe pod        ─ the Events (usually the cause)
  3. logs --previous     ─ the app's own error before it crashed
  4. top pods            ─ is it CPU-starved or near its memory limit? (OOMKilled itself shows in describe)
  5. get svc / endpoints ─ only after the pods are Ready
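
The routine can be sketched as one shell function so the order never varies. This is a rough sketch under assumptions: the triage name, the 50-line tail, and the section banners are choices made here, and the sed filter assumes the describe output contains an "Events:" heading at the start of a line.

```shell
# Hypothetical helper; the name and layout are illustrative only.
triage() {
  pod="$1"
  ns="${2:-default}"

  echo "== 1. status =="
  kubectl get pod "$pod" -n "$ns" -o wide

  echo "== 2. events =="
  # Print only the Events section at the bottom of describe.
  kubectl describe pod "$pod" -n "$ns" | sed -n '/^Events:/,$p'

  echo "== 3. previous logs =="
  # Errors if the container has never restarted; that is not a failure here.
  kubectl logs "$pod" -n "$ns" --previous --tail=50 || echo "(no previous container)"

  echo "== 4. resources =="
  kubectl top pod "$pod" -n "$ns" || echo "(metrics unavailable)"
  # Reason the last instance terminated, for example OOMKilled; empty if none.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
  echo

  echo "== 5. services and endpoints =="
  kubectl get svc,endpoints -n "$ns"
}
```

Run it as triage <pod-name> <namespace>. Step 4 reads the last termination reason from the pod status because kubectl top only reports current usage and cannot show that a container was killed earlier.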