"How is the CPU?" is one of those questions Linux can answer at five different levels of detail, and the right answer depends on what you're actually trying to find out. Total system load is one number; per-core saturation is another; the specific process eating the CPU is a third; CPU pressure (whether things are waiting for CPU even when usage looks normal) is a fourth. Use the wrong tool and you get a wrong answer that looks right.
This guide walks through how to monitor CPU usage on Linux at every level — interactive, scripted, and continuous — using the standard tools (top, htop, mpstat, pidstat, perf, /proc/stat, /proc/pressure/cpu). It covers what each field actually means, the common misreadings (load average ≠ CPU usage, %CPU > 100 is normal, iowait is not idle), and how to wire CPU into continuous monitoring so a slow drift is caught before it becomes an outage.
What "CPU usage" actually means
The kernel categorises every CPU tick into one of several states. top, htop, and friends sum these up and present them as percentages. Knowing the categories saves a lot of confusion:
| Field | What the CPU was doing |
|---|---|
| us (user) | Running unprivileged process code |
| sy (system / kernel) | Running kernel code on behalf of a process (syscalls) |
| ni (nice) | Running renice'd / low-priority user code |
| id (idle) | Doing nothing |
| wa (iowait) | Idle, but at least one process is blocked on disk I/O |
| hi (hardirq) | Servicing hardware interrupts |
| si (softirq) | Servicing software interrupts (network packet processing, etc.) |
| st (steal) | The hypervisor took the CPU away from the guest (VMs only) |
| gu (guest) | Running a guest OS via KVM (host only) |
Two important facts most guides skip:
- `iowait` is not CPU activity — it's idle time while waiting on disk. A host with 80% iowait isn't CPU-busy; it's disk-starved with the CPU twiddling its thumbs. Treating high iowait as "high CPU usage" leads to the wrong fix.
- `steal` only appears on VMs. Sustained `st > 1%` means the hypervisor is overcommitted and your guest is being throttled. The fix is at the hypervisor (or with your cloud provider), not inside the guest.
Total "CPU usage" in the everyday sense is 100 - id - wa — anything that isn't idle and isn't waiting on disk.
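Those categories can be read straight from the aggregate `cpu` line of `/proc/stat`. A minimal sketch (the counters are cumulative since boot, so a long-uptime host will be dominated by idle):

```shell
# Print each CPU state's share of all ticks since boot.
# Fields 2-11 of the "cpu" line follow the table above.
awk '/^cpu / {
    total = 0
    for (i = 2; i <= 11; i++) total += $i
    n = split("user nice system idle iowait irq softirq steal guest guest_nice", name)
    for (i = 1; i <= n; i++)
        printf "%-11s %6.2f%%\n", name[i], 100 * $(i + 1) / total
}' /proc/stat
```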
Interactive: top, htop, atop
top — installed everywhere
top
Default header:
top - 14:22:01 up 12 days, ... load average: 0.85, 0.72, 0.65
Tasks: 213 total, 2 running, 211 sleeping
%Cpu(s): 3.4 us, 1.2 sy, 0.0 ni, 95.0 id, 0.2 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 16031.4 total, ...
Useful keystrokes inside top:
- `1` — toggle between summary `%Cpu(s)` and per-core breakdown. Per-core is what you want when diagnosing single-thread saturation.
- `P` — sort processes by CPU.
- `H` — show threads instead of processes (so you can see which thread of a multi-threaded process is hot).
- `c` — show full command line (helpful when many `python` processes need disambiguating).
- `o` then `COMMAND=foo` — filter to processes matching a command.
- `E` / `e` — change memory units (G/M/K).
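For scripts and incident snapshots, `top` also runs non-interactively in batch mode:

```shell
# One batch-mode iteration: header plus the start of the process
# table, suitable for redirecting to a file during an incident.
top -b -n 1 | head -12
```

Recent procps-ng builds also accept `-o %CPU` to pre-sort by CPU; check `top -h` on your version before relying on it.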
htop — colourful, mouse-friendly
sudo apt install htop # Debian/Ubuntu
sudo dnf install htop # RHEL/CentOS/Rocky/Alma
Three things htop does better than top:
- Per-core bars at the top by default — visually obvious which core is hot.
- Tree view (`F5`) — see parent/child process relationships, useful when a runaway worker has many children.
- Filter and search with `F3` / `F4` instead of remembering `top`'s key bindings.
The bar colours encode the same us / sy / ni / iowait categories — hover or look at the legend at the bottom.
atop — historical CPU usage
sudo apt install atop
sudo systemctl enable --now atop
atop is the one to know about: it logs to /var/log/atop/ every 10 minutes by default, so you can go back in time. After an incident:
atop -r /var/log/atop/atop_$(date +%Y%m%d)
# Press 't' to step forward 10 min, 'T' to step back, 'b' to jump to a time
That historical view is the difference between "the CPU was high yesterday at 03:00" being a guess and being a reading.
Per-core: mpstat
When the summary %Cpu(s) looks fine but the system feels slow, the cause is often a single core pegged while others sit idle — typical for single-threaded workloads.
sudo apt install sysstat # Debian/Ubuntu
sudo dnf install sysstat # RHEL family
mpstat -P ALL 1 5 # all CPUs, every 1 second, 5 samples
Sample output:
14:22:30 CPU %usr %nice %sys %iowait %irq %soft %steal %idle
14:22:31 all 25.0 0.00 0.8 0.0 0.0 0.0 0.0 74.2
14:22:31 0 98.0 0.00 2.0 0.0 0.0 0.0 0.0 0.0 ← pegged
14:22:31 1 0.5 0.00 0.5 0.0 0.0 0.0 0.0 99.0
14:22:31 2 0.5 0.00 0.0 0.0 0.0 0.0 0.0 99.5
14:22:31 3 1.0 0.00 0.5 0.0 0.0 0.0 0.0 98.5
CPU 0 is at 100%; the other three are idle. The "system summary" is only ~25% — averaging hides the real problem. mpstat -P ALL is the right tool for this every time.
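The same per-core check can be scripted without sysstat by diffing the per-core counters in `/proc/stat` over a one-second window. A sketch; the 90% threshold is an assumption to tune:

```shell
# Print each core's busy% over a 1s window; mark cores above LIMIT.
LIMIT=90
snap() { grep '^cpu[0-9]' /proc/stat; }
before=$(snap); sleep 1; after=$(snap)
awk -v limit="$LIMIT" '
    NR == FNR { b_tot[$1] = $2+$3+$4+$5+$6+$7+$8+$9; b_idl[$1] = $5+$6; next }
    {
        dt = ($2+$3+$4+$5+$6+$7+$8+$9) - b_tot[$1]
        di = ($5+$6) - b_idl[$1]
        busy = (dt > 0) ? (1 - di / dt) * 100 : 0
        printf "%s %5.1f%%%s\n", $1, busy, (busy > limit ? "  <- pegged" : "")
    }' <(echo "$before") <(echo "$after")
```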
Per-process and per-thread: pidstat, top -H
To find which process is responsible:
pidstat -u 1 5 # CPU per process, every 1s, 5 samples
Or only show non-idle processes:
pidstat -u -p ALL 1 5 | awk '$8 > 0'
For threads inside a process (multi-threaded apps, JVM, Python with threads):
pidstat -t -p <PID> 1 5
# or
top -H -p <PID>
%CPU > 100 is normal here — a process spanning 4 cores can show up to 400%, 100% for each core it saturates.
To find the single hottest process system-wide right now:
ps -eo pid,user,%cpu,comm --sort=-%cpu | head -10
To see the busiest threads system-wide right now:
ps -eLo pid,tid,%cpu,comm --sort=-%cpu | head -10
Load average vs CPU usage (the classic confusion)
Load average is one of the most misread metrics on Linux. It is not CPU utilisation — it's the average number of processes that are either running or waiting for CPU or waiting for disk I/O (uninterruptible sleep), over the last 1, 5, and 15 minutes.
That last clause is the one that trips people up: a host with all CPUs idle but a slow disk can show a load average of 8 because eight processes are blocked on I/O. Load average looks high; CPU usage looks low; both are correct.
Reading load average:
uptime
# 14:22:01 up 12 days, 3:15, 1 user, load average: 0.85, 0.72, 0.65
The three numbers are 1-, 5-, and 15-minute averages. Compared against CPU count:
- Load < CPU count → headroom; CPU is not the bottleneck.
- Load ≈ CPU count → fully utilised but not queueing.
- Load > CPU count → the queue is growing; either CPU or I/O is overcommitted.
To know which (CPU or I/O), look at iowait and disk metrics alongside load. High load + low iowait + high %CPU = real CPU saturation. High load + high iowait + low %CPU = disk saturation pretending to be CPU.
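That comparison is easy to script. A minimal sketch reading `/proc/loadavg`:

```shell
# Classify the 1-minute load against the core count.
cores=$(nproc)
load1=$(cut -d' ' -f1 /proc/loadavg)
awk -v l="$load1" -v c="$cores" 'BEGIN {
    if      (l <  c)     print "headroom (load " l " < " c " cores)"
    else if (l <= 2 * c) print "fully utilised (load " l " ~ " c " cores)"
    else                 print "queueing (load " l " > 2x " c " cores)"
}'
```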
CPU pressure (PSI) — the modern kernel signal
Linux kernel 4.20+ added Pressure Stall Information (PSI), exposed at /proc/pressure/cpu. It answers a different question than CPU utilisation: "how often were tasks stalled because the CPU wasn't available?"
cat /proc/pressure/cpu
# some avg10=2.34 avg60=1.85 avg300=1.21 total=147823456
avg10 / avg60 / avg300 are the percentage of time at least one task was waiting for CPU, averaged over 10, 60, and 300 seconds. The benefit over raw %CPU: a 100% utilised host that's keeping up shows low pressure; a host that's "only" 80% busy but with bursts that exceed capacity shows high pressure. Pressure is the better leading indicator of "users are about to feel slowness".
Cgroup-aware variant (per container / per service):
cat /sys/fs/cgroup/<cgroup-path>/cpu.pressure
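For scripting, the `avg10` value parses out with one `awk` call. A sketch that degrades gracefully on kernels without PSI:

```shell
# Print the 10-second CPU pressure average, or a note if PSI is absent.
if [ -r /proc/pressure/cpu ]; then
    awk '/^some/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^avg10=/) { sub(/^avg10=/, "", $i); print $i }
    }' /proc/pressure/cpu
else
    echo "PSI unavailable (kernel < 4.20 or CONFIG_PSI=n)"
fi
```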
I/O is not CPU — but %iowait lives in the CPU view
A reminder: the wa field in top and mpstat looks like CPU usage but is not. To see if disk is your real bottleneck:
iostat -xz 1 5
Sample output:
Device r/s w/s rkB/s wkB/s await %util
sda 5.0 12.0 200.0 480.0 8.5 18.0 ← healthy
nvme0 2.0 1500 8.0 60000.0 98.0 99.5 ← saturated
%util near 100% and high await means disk-bound. The fix is in storage (faster device, better scheduler, less write amplification), not in CPU.
For a process-level view:
sudo apt install iotop
sudo iotop -oP # only show processes actually doing I/O
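If sysstat isn't installed, the iowait share of the last second can be read straight from `/proc/stat` (field 6 of the aggregate line). A sketch:

```shell
# Percentage of the last second the CPU spent in iowait.
snap() { awk '/^cpu / { print $2+$3+$4+$5+$6+$7+$8+$9, $6 }' /proc/stat; }
read -r t1 w1 < <(snap); sleep 1; read -r t2 w2 < <(snap)
awk -v dt=$((t2 - t1)) -v dw=$((w2 - w1)) \
    'BEGIN { printf "iowait: %.1f%%\n", (dt > 0) ? 100 * dw / dt : 0 }'
```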
Going deeper: perf
When CPU is saturated and you need to know why — which function inside the hot process is burning cycles — perf is the tool:
sudo apt install linux-tools-common linux-tools-$(uname -r) # Ubuntu
sudo dnf install perf # RHEL family
Live system-wide profile:
sudo perf top -F 99
Snapshot a process for 30 seconds and report the hot functions:
sudo perf record -F 99 -p <PID> -g -- sleep 30
sudo perf report --stdio --no-children --sort=overhead -g none | head -30
Hardware counters (cache misses, branch mispredicts, instructions retired):
sudo perf stat -p <PID> -- sleep 10
perf is the difference between "PHP-FPM is at 100% CPU" and "PHP-FPM is at 100% CPU spending 60% of cycles in OPcache lookup because OPcache is misconfigured". When CPU saturation is the real problem, perf finds the actual cause.
Reading /proc/stat directly (for scripts)
If you need a script-friendly way to compute CPU usage, read the kernel's source of truth:
cat /proc/stat | head -5
# cpu 3892 0 4527 1232123 12 0 153 0 0 0
# cpu0 1024 0 1183 314023 ...
# cpu1 ...
The fields are cumulative ticks since boot in this order: user nice system idle iowait irq softirq steal guest guest_nice. To compute usage, sample twice and diff:
read_cpu() { awk '/^cpu / { print $2+$3+$4+$5+$6+$7+$8+$9, $5+$6 }' /proc/stat; }  # total ticks, idle+iowait ticks
read t1 i1 < <(read_cpu); sleep 1; read t2 i2 < <(read_cpu)
echo "CPU usage: $(awk -v t1="$t1" -v i1="$i1" -v t2="$t2" -v i2="$i2" \
'BEGIN{ printf "%.1f%%\n", (1 - (i2-i1)/(t2-t1))*100 }')"
That snippet needs only bash and awk (the `< <(…)` process substitution is a bashism, so it won't run under plain `sh`) — useful for embedded/minimal systems without sysstat.
Continuous monitoring (production)
Reading CPU once is diagnosis. Catching CPU saturation before requests start timing out is monitoring — and that needs continuous collection plus alerting plus history.
What to alert on
- `%CPU > 80%` sustained for > 5 minutes — typical saturation threshold; tune up to 90% for batch / high-throughput workloads, down to 70% for latency-sensitive services.
- CPU pressure `avg60 > 10%` — a leading indicator of latency before utilisation maxes out.
- `%steal > 1%` sustained (VMs only) — hypervisor overcommit; talk to your cloud provider or move workloads.
- Per-core saturation — alert when any core is > 95% sustained, not just when the system average is high. Single-threaded hot paths cause real outages while the average looks fine.
- Load average > N × cores sustained — usually 2× CPU count is the threshold for "queue is growing fast enough to matter".
What not to alert on
- `iowait` alone. iowait > 0 doesn't mean CPU is busy. Alert on iowait through your disk metrics, not your CPU metrics.
- Single 1-second spikes. CPU spikes on a healthy system constantly. Aggregate over at least 1–5 minutes before paging.
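The "aggregate before paging" rule can be sketched as a shell loop that only fires when every sample in a window breaches the threshold. The limit and window below are illustrative; a production window should be minutes, not seconds:

```shell
# Fire only when CPU busy% exceeds LIMIT in every one of SAMPLES
# consecutive windows, so a single spike never alerts.
LIMIT=85; SAMPLES=3; INTERVAL=1
busy_pct() {
    read -r t1 i1 < <(awk '/^cpu / { print $2+$3+$4+$5+$6+$7+$8+$9, $5+$6 }' /proc/stat)
    sleep "$INTERVAL"
    read -r t2 i2 < <(awk '/^cpu / { print $2+$3+$4+$5+$6+$7+$8+$9, $5+$6 }' /proc/stat)
    awk -v dt=$((t2 - t1)) -v di=$((i2 - i1)) \
        'BEGIN { printf "%d", (dt > 0) ? (1 - di / dt) * 100 : 0 }'
}
breaches=0
for _ in $(seq "$SAMPLES"); do
    if [ "$(busy_pct)" -gt "$LIMIT" ]; then breaches=$((breaches + 1)); fi
done
if [ "$breaches" -eq "$SAMPLES" ]; then
    echo "ALERT: CPU > ${LIMIT}% for ${SAMPLES} consecutive samples"
else
    echo "OK: ${breaches}/${SAMPLES} samples above ${LIMIT}%"
fi
```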
With Xitoring
Install Xitogent on the host. Once running, CPU usage flows to the dashboard alongside per-core, load average, memory, disk, and network — no extra plugins. You can:
- Open the host in the dashboard and view current and historical CPU (per core, per process where collected).
- Set thresholds (e.g. "alert when CPU > 85% for 5 minutes" or "alert when CPU pressure avg60 > 15%") routed to email, SMS, Slack, PagerDuty, or any other channel.
- Correlate CPU spikes with disk, network, and process metrics on the same timeline — usually answers "is this CPU, I/O, or memory pressure?" in seconds.
For the CPU's thermal twin (which often correlates with sustained high usage), see the companion guide: How to monitor CPU temperature on Windows / Linux.
With Prometheus / node_exporter
node_exporter exposes everything you need:
- `node_cpu_seconds_total` — CPU time, summed and by mode. Use `rate()` over it for percentage utilisation.
- `node_load1` / `node_load5` / `node_load15` — load averages.
- `node_pressure_cpu_waiting_seconds_total` — PSI pressure (kernel 4.20+, fairly recent `node_exporter`).
Example PromQL alert:
- alert: HighCpuSustained
  expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
  for: 5m
  labels: { severity: warning }
  annotations:
    summary: "CPU > 85% on {{ $labels.instance }}"
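A pressure-based companion alert, assuming the PSI metric is exported (the 10% threshold mirrors the guidance above):

```yaml
- alert: HighCpuPressure
  # % of time at least one task was waiting for CPU, over the last 5m
  expr: rate(node_pressure_cpu_waiting_seconds_total[5m]) * 100 > 10
  for: 10m
  labels: { severity: warning }
  annotations:
    summary: "CPU pressure > 10% on {{ $labels.instance }}"
```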
Operational tips
- Watch per-core, not just the system summary. A 25% summary average on a 4-core box can be one core at 100%. `mpstat -P ALL 1` is the right view; `htop`'s top bars are the equivalent.
- `%CPU > 100` is correct for processes, capped at `100 × cores`. A process showing 380% on a 4-core host is using 95% of all cores — that's the signal you want, not "the number is wrong".
- `nice` and `cpulimit` work, but rarely fix the root cause. Lowering a runaway process's priority makes the host responsive again; it doesn't make the runaway process finish faster. Use them as triage, not as the fix.
- CPU governor matters on bare metal. `cpupower frequency-info` shows the active governor. `powersave` (default on many laptops and some servers) downclocks aggressively and can show "high CPU usage" because the CPU runs at half speed. `performance` is the right governor for servers under load.
- NUMA effects look like CPU but are memory. Multi-socket servers running a workload that crosses NUMA nodes show high CPU with low instructions-per-cycle. `numactl --hardware` confirms the topology; `numactl --cpubind=0 --membind=0 …` pins a workload to a single node.
- Containers complicate the view. Inside a container, `top` may show host CPU counts and host idle even when the container is throttled. Read `/sys/fs/cgroup/cpu.stat` (`nr_throttled`, `throttled_usec` on cgroup v2) to see whether your container is hitting its CPU limit.
- Capture before you change. If you're going to change six settings to "fix" CPU, save `top -b -n 1`, `mpstat -P ALL 5 1`, and a `perf` snapshot first. The diff after changes tells you what actually helped.
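The container throttling check is scriptable. A sketch for the unified (cgroup v2) hierarchy; paths and field names differ on cgroup v1:

```shell
# Report whether this cgroup has been CPU-throttled (cgroup v2).
f=/sys/fs/cgroup/cpu.stat
if [ -r "$f" ]; then
    grep -E '^(nr_throttled|throttled_usec)' "$f" \
        || echo "no throttling counters (cpu controller not enabled here)"
else
    echo "no cgroup v2 cpu.stat at $f (cgroup v1 host?)"
fi
```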
Troubleshooting
- `top` shows 100% CPU but `pidstat` shows nothing busy. Likely a kernel thread (kworker, ksoftirqd) handling interrupts. Check `mpstat -I ALL 1` for interrupt rates and `cat /proc/interrupts | sort -k2 -n | tail` for the busiest sources — usually network or disk.
- Sustained high `%sy` (system / kernel) without a clear user process. Often a misbehaving driver, heavy syscall use, or contention on a kernel lock. `perf top` will show the kernel function eating the time.
- High `%si` (softirq) on a network-heavy host. Common on machines doing high packet rates without RPS/RFS tuning. Check `/proc/softirqs` for per-CPU softirq counts; if all softirqs land on CPU 0, enable RPS so they spread.
- `steal` is non-zero on a VM you "own". Hypervisor overcommit. On a public cloud, it usually means a noisy neighbour; the fix is to migrate the instance, resize, or use a dedicated tier.
- `%CPU` is low but the box feels slow. Check `/proc/pressure/cpu` for pressure, then `iostat -x 1` for disk, then `vmstat 1` for memory pressure. CPU rarely lies; usage just isn't always the right metric.
- Process at 100% CPU with `S` state in `ps`. The process is sleeping but a thread inside it is busy. Use `top -H -p <PID>` to find the hot thread.
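The per-thread lookup can also be done without `top`, straight from procfs. A sketch that uses this shell's own PID (`$$`) purely as a runnable stand-in for `<PID>`:

```shell
# Rank threads of a process by cumulative CPU ticks (utime + stime,
# fields 14 and 15 of /proc/<pid>/task/<tid>/stat). A comm containing
# spaces would shift the fields; fine for most processes.
pid=$$
for t in /proc/"$pid"/task/*/stat; do
    awk '{ print $1, $14 + $15 }' "$t"
done | sort -k2 -rn | head -5
```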
Summary
To monitor CPU usage on Linux:
- Start with `top` or `htop` for an interactive view. Press `1` in `top` to see per-core, `H` to see threads.
- Use `mpstat -P ALL 1` when the summary looks fine but a single core is pegged. The most-missed view in CPU debugging.
- Use `pidstat -u 1 5` (or `top -H`) to find which process / thread is responsible.
- Read load average alongside `iowait` — high load + high iowait is disk, not CPU. Don't conflate them.
- Check `/proc/pressure/cpu` on modern kernels — pressure is a better latency signal than raw utilisation.
- Reach for `perf` when the host is saturated and you need to know why — function-level profile, not just process-level.
- Wire CPU into continuous monitoring. Per-core and per-process trends, alerting on > 85% sustained for 5 minutes (or pressure > 10% sustained), correlated with disk and network on the same dashboard. Xitogent does this with one install; Prometheus + `node_exporter` does it with a stack.
- Skip the misreadings. `iowait` is not CPU activity. Load average isn't utilisation. `%CPU > 100` for a process is normal. Steal only matters on VMs. Once those four are internalised, most CPU mysteries resolve quickly.
CPU is one of the cheapest signals to collect and one of the most diagnostic when something goes sideways. With per-core, per-process, and pressure all in your dashboard, the next "why is this host slow?" question usually answers itself in seconds — not because CPU caused it, but because one fast glance at the four CPU views can rule it in or out.