On an Ubuntu server, almost every "is the box healthy?" investigation ends at the process table. CPU spike, memory pressure, a port that won't bind, a service that says it's running but isn't actually doing anything — the truth is in /proc, and the tools below are just different lenses on the same data. This guide walks through the commands that are worth keeping in muscle memory, what each one is actually good for, and how to map a misbehaving process back to the port it's bound to, the files it has open, or the systemd unit that started it.
Everything below works on a default Ubuntu Server install (tested against 22.04 LTS and 24.04 LTS). procps and systemd are always present; a few of the nicer tools (htop, lsof, atop) are one apt install away.
1. ps — the workhorse
ps reads /proc once and prints a snapshot. It is the most portable and most scriptable way to look at processes, and it is what you should reach for first when you need to grep, sort, or feed the output into another command.
The two invocations to memorise:
ps aux # BSD-style: USER, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND
ps -ef # System V-style: UID, PID, PPID, C, STIME, TTY, TIME, CMD
Both list every process on the system. ps aux is easier when you care about resource usage (%CPU, %MEM, RSS); ps -ef is easier when you care about parent/child relationships (PPID).
Filter to one process by name
ps aux | grep -v grep | grep nginx
The grep -v grep removes the grep command itself from the output — a small but constant annoyance otherwise.
Sort by CPU or memory
ps aux --sort=-%cpu | head -n 11 # header line + top 10 by CPU
ps aux --sort=-%mem | head -n 11 # header line + top 10 by memory (RSS-based)
The leading - means descending. Without it you get the quiet processes first, which is rarely what you want.
Custom columns
ps -eo pid,ppid,user,stat,pcpu,pmem,etime,cmd --sort=-pcpu | head -n 15
-eo lets you pick exactly which columns to print. Useful ones:
- pid, ppid — process and parent IDs
- user — owning user
- stat — process state (see below)
- pcpu, pmem — CPU and memory percentages
- etime — elapsed time since the process started (great for "how long has this been running?")
- rss — resident set size in KB (actual RAM used)
- cmd or args — the full command line, including arguments
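As a small illustration of custom columns, the sketch below converts the rss column to MiB. The helper name rss_to_mib is made up for this example; it assumes input lines of the form "PID RSS COMMAND", which is what ps -eo pid=,rss=,args= prints (a trailing = on a column name suppresses its header).

```shell
# rss_to_mib: hypothetical helper, not part of procps.
# Reads "PID RSS_in_KiB COMMAND..." lines on stdin, prints PID, RSS in MiB, command.
rss_to_mib() {
  awk '{
    pid = $1
    mib = $2 / 1024
    # Blank the first two fields so $0 holds only the command line.
    $1 = ""; $2 = ""
    sub(/^ +/, "")
    printf "%s\t%.1f MiB\t%s\n", pid, mib, $0
  }'
}

# Usage (five largest processes by resident memory):
#   ps -eo pid=,rss=,args= --sort=-rss | head -n 5 | rss_to_mib
```

Runs of spaces inside the command line are collapsed to single spaces, which is usually acceptable for a quick look.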
Process tree
ps -ef --forest
pstree -p # cleaner, shows PID; install with `apt install psmisc` if missing
The tree view is the fastest way to spot a process that has been re-parented to PID 1 (its real parent died) or a service that's spawning more children than expected.
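When only one process matters, its parent chain can also be walked by hand. This is a minimal sketch that assumes the Linux /proc/<pid>/status layout (Name: and PPid: lines); ancestors is a made-up helper name, not a standard tool.

```shell
# ancestors: hypothetical helper. Prints "PID NAME" for a process and each
# of its parents, one per line, until it reaches PID 1 (or a parent that is
# not visible from this PID namespace, reported by the kernel as PPid 0).
ancestors() {
  pid=$1
  while [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
    # Only the first word of the name is kept; names may contain spaces.
    name=$(awk '/^Name:/ { print $2 }' "/proc/$pid/status")
    echo "$pid $name"
    pid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status")
    pid=${pid:-0}   # the process may have exited between the two reads
  done
}

# Usage:
#   ancestors 12345
```

A chain that jumps straight from the process to PID 1 is the re-parenting case described above.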
Reading the STAT column
ps shows a one-letter state code, optionally followed by modifier characters (so Ssl+ is state S plus three modifiers):
- R — running or runnable
- S — interruptible sleep (the normal idle state for most processes)
- D — uninterruptible sleep, usually waiting on I/O. A long-lived D state is a red flag: the process can't be killed, and the kernel is waiting on disk or network
- Z — zombie (exited but the parent hasn't reaped it)
- T — stopped (via SIGSTOP or a debugger)
- + — foreground process group
- s — session leader
- l — multi-threaded
A handful of D processes during a database checkpoint is normal. A growing pile of them usually means a storage problem.
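To watch whether that pile is growing, the states can be tallied. The sketch below counts processes by the first letter of STAT; count_states is a hypothetical helper name, and it expects one STAT value per line, as printed by ps -eo stat=.

```shell
# count_states: hypothetical helper. Reads STAT values (e.g. "Ss", "R+", "D")
# on stdin and prints "STATE COUNT" per distinct first letter, sorted by state.
count_states() {
  awk '{ n[substr($1, 1, 1)]++ }
       END { for (s in n) print s, n[s] }' | sort
}

# Usage:
#   ps -eo stat= | count_states
#   watch -n 5 "ps -eo stat= | cut -c1 | sort | uniq -c"   # rough live view
```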
2. top — the live view
top is the interactive equivalent of ps. It refreshes every few seconds and is the right tool when you want to watch something happen rather than take a snapshot.
top
The keys that matter once it's running:
- P — sort by CPU (default)
- M — sort by memory
- T — sort by total CPU time
- 1 — toggle per-CPU breakdown at the top
- c — toggle full command line vs. just the binary name
- H — show individual threads instead of processes
- u then a username — filter to one user
- k then a PID — send a signal (default SIGTERM)
- W — write current view options to top's config file so they stick (~/.config/procps/toprc on current releases; an existing ~/.toprc is still honoured)
- q — quit
For non-interactive use (logging, scripting) use batch mode:
top -b -n 1 | head -n 20 # one snapshot, top 20 lines
top -b -n 5 -d 2 > /tmp/top.log # 5 snapshots, 2 seconds apart
3. htop — the friendlier top
htop is an interactive process viewer with colour, mouse support, a tree view, and scrollable columns. On a fresh Ubuntu Server it's not installed by default:
sudo apt install htop
htop
Why reach for it over top:
- F5 toggles the tree view in place, which is much easier than pstree
- F6 lets you pick the column you want to sort by
- F3 searches by name; F4 filters by substring
- F9 opens a signal menu (kill is a couple of keystrokes, not a typed PID)
- The top bars show per-CPU load and memory at a glance
For day-to-day "what's running" work, htop is the most ergonomic option. For scripts and ssh-into-a-box-with-no-extras work, fall back to ps and top.
4. pgrep, pidof, pkill — finding (and killing) by name
ps | grep works but is fragile. The dedicated tools are cleaner.
pgrep nginx # list PIDs whose command name matches "nginx"
pgrep -a nginx # also show the command line
pgrep -u www-data # filter to a user
pgrep -f "node server.js" # match against the full command line, not just the name
pidof sshd # PIDs of an exact binary name (space-separated)
pkill mirrors pgrep but sends a signal instead of printing:
pkill -TERM nginx # SIGTERM (graceful) — same as `kill <pid>`
pkill -HUP nginx # SIGHUP — many daemons reload config on this
pkill -KILL stuck-process # SIGKILL — last resort; the kernel kills it, no cleanup
pkill -u www-data # everything owned by www-data
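A SIGTERM gives the process a chance to flush buffers, close files, and exit cleanly. SIGKILL skips all of that: useful when a process ignores SIGTERM or is wedged in userspace, dangerous when it's holding state you care about. SIGKILL does not rescue a process stuck in D state; the signal stays pending until the process leaves uninterruptible sleep.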
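The usual escalation (ask politely, wait, then force) can be scripted. This is a sketch under stated assumptions, not a standard tool: stop_gracefully and is_alive are made-up names, the default 10-second grace period is arbitrary, and the zombie check relies on the Linux /proc/<pid>/status format.

```shell
# is_alive: hypothetical helper. True if the PID exists and is not a zombie.
# kill -0 only tests whether we could signal the PID; it also succeeds for
# zombies, so the State: line in /proc is checked as well.
is_alive() {
  kill -0 "$1" 2>/dev/null || return 1
  state=$(awk '/^State:/ { print $2 }' "/proc/$1/status" 2>/dev/null)
  [ "$state" != "Z" ]
}

# stop_gracefully PID [SECONDS]: send SIGTERM, wait up to SECONDS (default 10),
# then fall back to SIGKILL. Returns 0 if SIGTERM was enough, 1 if it escalated.
stop_gracefully() {
  pid=$1
  timeout=${2:-10}
  kill -TERM "$pid" 2>/dev/null || return 0   # already gone (or not ours)
  while [ "$timeout" -gt 0 ]; do
    is_alive "$pid" || return 0
    sleep 1
    timeout=$((timeout - 1))
  done
  is_alive "$pid" || return 0
  kill -KILL "$pid" 2>/dev/null
  return 1
}

# Usage:
#   stop_gracefully "$(pgrep -o -f 'node server.js')" 15
```

For anything managed by systemd, systemctl stop already implements this pattern, so the sketch is mainly for ad-hoc processes.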
5. systemctl — when the process is a service
On Ubuntu, anything started at boot or managed long-term is almost certainly a systemd unit. Don't reach for ps first when investigating a service — start with systemd:
systemctl status nginx # state, PID, recent log lines, cgroup tree
systemctl is-active nginx # "active", "inactive", "failed", etc.
systemctl list-units --type=service --state=running
systemctl list-units --type=service --state=failed
systemctl status is particularly useful because it prints the cgroup tree for the service at the bottom — i.e. every child process the unit has spawned, which ps alone won't group for you.
Pair it with journalctl for the logs:
journalctl -u nginx --since "10 min ago"
journalctl -u nginx -f # follow, like `tail -f`
The common trap: a service shows active (running) and you assume it's healthy, but active only means "the main PID is alive." Always cross-check with an actual request or a metric. Process existence ≠ working service.
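One way to encode that cross-check is to combine the unit state with the result of a real request. The sketch below is illustrative only: service_verdict is a hypothetical helper, the URL in the usage line is a placeholder, and treating any 2xx or 3xx response as healthy is an assumption that will not fit every service.

```shell
# service_verdict STATE HTTP_CODE: hypothetical helper.
# STATE is the output of `systemctl is-active <unit>`; HTTP_CODE is the
# status code of a probe request (curl reports 000 when it cannot connect).
service_verdict() {
  state=$1
  http_code=$2
  if [ "$state" != "active" ]; then
    echo "down: unit is $state"
    return 1
  fi
  case "$http_code" in
    2??|3??) echo "healthy"; return 0 ;;
    *)       echo "degraded: unit active but probe returned $http_code"; return 1 ;;
  esac
}

# Usage (assumes curl is installed and the service answers on localhost):
#   service_verdict "$(systemctl is-active nginx)" \
#     "$(curl -s -o /dev/null -m 5 -w '%{http_code}' http://localhost/)"
```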
6. The /proc filesystem
Every running process has a directory under /proc/<pid>/. This is where all the tools above ultimately read from — and sometimes it is faster to look directly.
Useful files inside /proc/<pid>/:
- cmdline — the full command line, null-separated. cat /proc/<pid>/cmdline | tr '\0' ' '
- cwd — symlink to the process's current working directory
- exe — symlink to the actual binary on disk (useful when the binary has been replaced or deleted: ls -l /proc/<pid>/exe shows ... (deleted) in that case)
- environ — the environment variables. tr '\0' '\n' < /proc/<pid>/environ
- status — human-readable summary (name, state, UIDs, memory, threads)
- fd/ — every open file descriptor as a symlink. ls -l /proc/<pid>/fd is invaluable when a process is holding a deleted file open and not releasing disk space
- limits — the actual rlimits in force for that process
- io — bytes read/written by this process so far
- net/tcp and net/udp — sockets in the process's network namespace
When a tool gives you bad output (or you suspect the output is bad), /proc is the ground truth.
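As an example of reading /proc directly, the sketch below lists files that a process still holds open after they were deleted, the usual cause of "df says full, du says empty". deleted_open_files is a made-up helper name; it relies on the kernel appending " (deleted)" to the symlink target in /proc/<pid>/fd.

```shell
# deleted_open_files PID: hypothetical helper.
# Prints the original path of every file descriptor whose target was deleted.
deleted_open_files() {
  ls -l "/proc/$1/fd" 2>/dev/null |
    sed -n 's/.* -> \(.*\) (deleted)$/\1/p'
}

# Usage (root is needed for processes owned by other users):
#   sudo sh -c 'ls -l /proc/12345/fd' | grep '(deleted)'
#   deleted_open_files 12345
```

Restarting the process, or having it reopen its log files, is what actually releases the space.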
7. Which process is using that port?
Probably the single most common "find the process" question. Two tools, both built in or one apt install away:
sudo ss -tulpn # all listening TCP and UDP sockets, with PID/program
sudo ss -tlpn 'sport = :443' # only listeners on port 443
sudo lsof -i :443 # alternative; install with `apt install lsof`
sudo lsof -i -P -n | grep LISTEN # all LISTEN sockets, no DNS, no service-name lookup
ss ships with Ubuntu and is the modern replacement for netstat. The -p flag requires root to see the PID and command for sockets that aren't yours.
Reverse direction — given a PID, which sockets does it own:
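sudo ss -tunap | grep 'pid=12345,' # all TCP/UDP sockets, not only listeners; the comma stops pid=123456 from matching
sudo lsof -a -p 12345 -i -P -n # -a ANDs the filters; without it lsof lists anything matching -p OR -i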
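For scripts, the PIDs can be pulled out of the ss output rather than read by eye. A minimal sketch, assuming the users:(("name",pid=N,fd=M)) format that ss -p prints; pids_from_ss is a hypothetical helper name.

```shell
# pids_from_ss: hypothetical helper. Reads `ss -p` output on stdin and prints
# each distinct PID once, in numeric order.
pids_from_ss() {
  grep -o 'pid=[0-9][0-9]*' | cut -d= -f2 | sort -un
}

# Usage (-H suppresses the header line):
#   sudo ss -tlpnH 'sport = :443' | pids_from_ss
#   sudo ss -tlpnH 'sport = :443' | pids_from_ss | xargs -r ps -o pid,user,etime,args -p
```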
8. Which process is hammering the disk?
top and htop don't show I/O by default. The right tool is iotop:
sudo apt install iotop
sudo iotop -oPa
The flags: -o only show processes actually doing I/O right now, -P show processes (not threads), -a accumulate values since iotop started rather than showing instantaneous rate (much easier to read on a busy box).
For per-process CPU/memory/I/O over a longer window, pidstat (from sysstat) is the scripting-friendly option:
sudo apt install sysstat
pidstat 5 6 # CPU, every 5s, 6 samples
pidstat -d 5 6 # disk I/O
pidstat -r 5 6 # memory faults / RSS
pidstat -u -p 12345 1 # one PID, every 1s, forever
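On a box where neither iotop nor sysstat can be installed, the same per-process counters are available in /proc/<pid>/io. The sketch below extracts the two fields that reflect real storage traffic; io_bytes is a made-up helper name, and the field names read_bytes and write_bytes are those of the Linux /proc/<pid>/io file.

```shell
# io_bytes: hypothetical helper. Reads the contents of /proc/<pid>/io on stdin
# and prints "READ_BYTES WRITE_BYTES" (cumulative since the process started).
io_bytes() {
  awk '/^read_bytes:/  { r = $2 }
       /^write_bytes:/ { w = $2 }
       END { print r, w }'
}

# Usage: sample twice and divide the difference by the interval.
#   set -- $(sudo cat /proc/12345/io | io_bytes); r1=$1; w1=$2
#   sleep 5
#   set -- $(sudo cat /proc/12345/io | io_bytes); r2=$1; w2=$2
#   echo "read: $(( (r2 - r1) / 5 )) B/s  write: $(( (w2 - w1) / 5 )) B/s"
```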
9. Zombies, orphans, and runaway children
A zombie (state Z) is a process that has exited but whose parent hasn't called wait() to read its exit status yet. Zombies use almost no resources (just a PID and an entry in the task table), but they accumulate when the parent is buggy.
Find them:
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'
You can't kill a zombie — it is already dead. You have to fix or restart the parent. If the PPID is 1, the zombie will be reaped by systemd on its own; if it's a long-lived parent, restart that parent.
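Since the fix is always at the parent, it helps to group zombies by PPID. A minimal sketch; zombie_parents is a hypothetical helper name, and it expects "PID PPID STAT" lines such as those from ps -eo pid=,ppid=,stat=.

```shell
# zombie_parents: hypothetical helper. Reads "PID PPID STAT" lines on stdin
# and prints "PPID ZOMBIE_COUNT", with the worst offender first.
zombie_parents() {
  awk '$3 ~ /^Z/ { n[$2]++ }
       END { for (p in n) print p, n[p] }' | sort -k2,2nr -k1,1n
}

# Usage:
#   ps -eo pid=,ppid=,stat= | zombie_parents
#   ps -o pid,user,etime,args -p <PPID>     # then inspect the parent
```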
An orphan is a process whose parent died but which is still running. The kernel re-parents orphans to PID 1 (systemd on Ubuntu), or to a nearer ancestor that has registered itself as a subreaper, such as a systemd --user instance. That makes them easy to spot: PPID == 1 for anything that wasn't started by systemd itself is a candidate. Most are harmless, but a long-running orphan with no logs and no controlling terminal is usually a sign that something restarted badly.
10. Beyond a single box
Everything above gives you a real-time view of one server. The moment you have more than a couple of hosts, or you want to know what was happening on the box at 3:14 AM when the alert fired, you need persistent metrics — process count over time, top consumers per minute, alerts when a critical service disappears.
That is the job of a monitoring agent. Xitoring's server monitoring ships with a lightweight agent that records process and resource metrics, alerts on dead services, and gives you a historical view per host without the operational overhead of running Prometheus, Grafana, and alertmanager yourself. You can keep using ps, top, and htop for live debugging — and let the agent answer the "what was happening last Tuesday at 3 AM?" questions.
Cheat sheet
| Question | Command |
|---|---|
| Snapshot of everything | ps aux |
| Top 10 by CPU | ps aux --sort=-%cpu \| head -n 11 |
| Top 10 by memory | ps aux --sort=-%mem \| head -n 11 |
| Process tree | ps -ef --forest or pstree -p |
| Live view | top or htop |
| Find PID by name | pgrep -a <name> |
| Send a signal by name | pkill -HUP <name> |
| Service status + child PIDs | systemctl status <unit> |
| Service logs | journalctl -u <unit> -f |
| What's on port N | sudo ss -tulpn 'sport = :<N>' |
| Open files for a PID | sudo lsof -p <PID> |
| Disk I/O per process | sudo iotop -oPa |
| Long-term per-process stats | pidstat 5 |
| Full command line for a PID | tr '\0' ' ' < /proc/<PID>/cmdline |
| Zombies | ps -eo pid,ppid,stat,cmd \| awk '$3 ~ /^Z/' |
Most servers only need three of these in muscle memory: ps aux, htop, and systemctl status. The rest are there for the day one of them isn't enough.