When something breaks, find it fast — read events and pod status, stream logs, measure CPU and memory with kubectl top, and work through a repeatable triage routine.
Why: most problems announce themselves in two places. get pods shows the STATUS (Pending, ImagePullBackOff, CrashLoopBackOff…) — the symptom. describe shows the Events at the bottom — usually the cause, in plain language: a missing image, a failed mount, no node with room. Always start here.
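On a busy pod the describe output can be long, so it may help to pull out only the warnings. The sketch below wraps kubectl get events with a field selector. The function name pod_warnings is made up for illustration, and any trailing flags (for example -n <namespace>) are passed straight through to kubectl.

```shell
# pod_warnings is a hypothetical helper name, not part of kubectl.
# Usage: pod_warnings <pod-name> [extra kubectl flags, e.g. -n <namespace>]
pod_warnings() {
  [ $# -ge 1 ] || { echo "usage: pod_warnings <pod-name> [kubectl flags]" >&2; return 2; }
  _pw_pod=$1
  shift
  # Keep only Warning events whose involved object is this pod, newest last.
  kubectl get events \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=${_pw_pod},type=Warning" \
    --sort-by=.lastTimestamp "$@"
}
```

Called as pod_warnings <pod-name>, it prints the same Warning entries that describe lists under Events, without the rest of the pod description.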
The symptom: what state is each pod in?
kubectl get pods
The cause: read the Events at the bottom of describe
kubectl describe pod <pod-name>
Recent cluster events, newest last
kubectl get events --sort-by=.lastTimestamp
Why: logs are where an app explains itself. -f follows live. The crucial flag is --previous: when a pod is in CrashLoopBackOff, the current container has usually just restarted and has little or nothing in its logs yet. --previous shows the output of the instance that just crashed, which is the one that holds the error.
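The choice between --previous and the current logs can be sketched as a small wrapper: try the crashed instance first, and fall back to the running one if no previous instance exists. The name crash_logs and the 100-line tail are assumptions chosen for the example.

```shell
# crash_logs is a hypothetical helper name, not part of kubectl.
# Usage: crash_logs <pod-name> [extra kubectl logs flags, e.g. -c <container-name>]
crash_logs() {
  # kubectl exits non-zero when there is no previous terminated container,
  # so the fallback prints the current container's logs instead.
  kubectl logs "$@" --previous --tail=100 2>/dev/null ||
    kubectl logs "$@" --tail=100
}
```

The error message from the first attempt is discarded on purpose, so the output is always log lines from one instance or the other.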
Follow a running container's logs
kubectl logs -f <pod-name>
The logs of the container that JUST crashed (the key flag)
kubectl logs <pod-name> --previous
A specific container in a multi-container pod
kubectl logs <pod-name> -c <container-name>
Why: "the cluster is slow" needs numbers. kubectl top shows live CPU and memory per node and per pod, so you can spot the pod eating a node or the node running hot. Note: it needs metrics-server installed, the same component the HPA depends on.
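Because kubectl top fails without metrics-server, one option is to check for the metrics API before asking for numbers. This is a minimal sketch: the name top_pods is hypothetical, and it assumes metrics-server registers the usual v1beta1.metrics.k8s.io APIService. A registered APIService can still be unhealthy, so treat the check as a hint only.

```shell
# top_pods is a hypothetical helper name, not part of kubectl.
# Usage: top_pods [kubectl top pods flags, e.g. -A --sort-by=memory]
top_pods() {
  # Assumption: metrics-server exposes the APIService v1beta1.metrics.k8s.io.
  if kubectl get apiservice v1beta1.metrics.k8s.io >/dev/null 2>&1; then
    kubectl top pods "$@"
  else
    echo "metrics API not found: install metrics-server before using kubectl top" >&2
    return 1
  fi
}
```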
Which nodes are under pressure?
kubectl top nodes
Which pods are using the most CPU / memory?
kubectl top pods -A --sort-by=memory
Why: when logs are not enough, get closer. exec opens a shell in the container to inspect it live. For a minimal image with no shell, kubectl debug attaches a temporary (ephemeral) container with your tools to the running pod, without rebuilding the image.
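The exec-or-debug decision can be sketched as one helper: probe for a shell non-interactively, then open it if it exists or attach a debug container if it does not. The name pod_shell is hypothetical, busybox is just the example tools image, and the fallback assumes the cluster supports ephemeral containers.

```shell
# pod_shell is a hypothetical helper name, not part of kubectl.
# Usage: pod_shell <pod-name> <container-name>
pod_shell() {
  [ $# -eq 2 ] || { echo "usage: pod_shell <pod-name> <container-name>" >&2; return 2; }
  # Probe: does the container image ship a shell at all?
  if kubectl exec "$1" -c "$2" -- sh -c 'exit 0' >/dev/null 2>&1; then
    kubectl exec -it "$1" -c "$2" -- sh
  else
    # No shell (e.g. distroless): attach an ephemeral container next to it.
    kubectl debug -it "$1" --image=busybox --target="$2"
  fi
}
```

The probe runs first so that a non-zero exit from your own interactive session is not mistaken for a missing shell.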
Open a shell in the container
kubectl exec -it <pod-name> -- sh
Attach an ephemeral debug container with tools (for distroless images); --target is the name of the container to debug
kubectl debug -it <pod-name> --image=busybox --target=<container-name>
Note: when a deploy goes wrong, work through the same order every time and you will find most issues in under a minute. 1) kubectl get pods: what is the status? 2) kubectl describe: what do the Events say? 3) kubectl logs --previous: what did it print before dying? 4) kubectl top: is it resource-starved? 5) check the Service/Ingress only once the pods are healthy. Symptom, then cause, then proof.
1. get pods ─ the STATUS (the symptom)
2. describe pod ─ the Events (usually the cause)
3. logs --previous ─ the app's own error before it crashed
4. top pods ─ is it running out of memory (heading for OOMKilled) or CPU-starved?
5. get svc / endpoints ─ only after the pods are Ready
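The five steps above can be chained into one function so the order never changes. This is a sketch, not a finished tool: the name triage, the section banners and the 50-line tail are assumptions for the example, and extra flags such as -n <namespace> are passed through to every kubectl call.

```shell
# triage is a hypothetical helper name; it only chains the kubectl commands above.
# Usage: triage <pod-name> [extra kubectl flags, e.g. -n <namespace>]
triage() {
  [ $# -ge 1 ] || { echo "usage: triage <pod-name> [kubectl flags]" >&2; return 2; }
  _t_pod=$1
  shift
  echo "== 1. status (the symptom)"
  kubectl get pod "$_t_pod" "$@"
  echo "== 2. events (usually the cause)"
  kubectl describe pod "$_t_pod" "$@"
  echo "== 3. logs of the previous instance"
  kubectl logs "$_t_pod" --previous --tail=50 "$@" || echo "(no previous instance)"
  echo "== 4. resource usage"
  kubectl top pod "$_t_pod" "$@" || echo "(no metrics; is metrics-server installed?)"
  echo "== 5. services and endpoints (read only once the pod is Ready)"
  kubectl get svc,endpoints "$@"
}
```

Read the output top to bottom and stop at the first section that explains the failure; the later sections matter only if the earlier ones look healthy.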