Updated May 2026

Varnish Monitoring

Monitor Varnish `MAIN.cache_hit` ratio, backend health, `MAIN.thread_queue_len`, `MAIN.n_lru_nuked` evictions, session drops, and storage headroom in real time — agent-based via `varnishstat`.

Why monitor Varnish?

Varnish accelerates HTTP by orders of magnitude, but when cache hit ratios drop, backends fail, or worker threads run out, a Varnish issue becomes a site-wide outage. Because Varnish sits between users and your origin tier, monitoring it well means catching most cache-layer incidents in their first minute.

Auto-discovery via Xitogent — zero manual configuration

`MAIN.cache_hit` vs `MAIN.cache_miss` ratio tracking with per-VCL breakdown

Per-backend health monitoring via health probe state

Object storage usage (`SMA.s0.g_bytes` / `g_space`) and eviction rate (`MAIN.n_lru_nuked`)

Request throughput (`MAIN.client_req`) and backend traffic split (`MAIN.backend_req`, `s_pipe`, `s_pass`)

Thread pool telemetry: count, queue length, failed creations

Session metrics: `MAIN.sess_conn`, `MAIN.sess_dropped`

Ban list length and active VCL count

Customizable alert thresholds for every metric

1-minute metric collection intervals out of the box

What is Varnish monitoring?

Varnish monitoring, explained

Varnish monitoring catches cache hit-ratio drops, backend health failures, and thread-pool exhaustion before they turn into user-visible latency or outages. Varnish typically sits in front of WordPress, Magento, or your origin tier, so a Varnish issue is usually a site-wide issue. Xitoring auto-discovers your Varnish instance, reads its counters from `varnishstat`, and routes alerts to Slack, PagerDuty, Telegram, or your existing on-call.

Metrics

What we monitor

Cache Hit Ratio (MAIN.cache_hit / cache_miss)

Percentage of cacheable lookups served from cache. This is the headline signal: well-tuned WordPress + Varnish setups sit at 90% or higher.

MAIN.client_req

Total client-facing requests handled by Varnish (rate-derived per second). Compare with `MAIN.backend_req` to see how much traffic the cache absorbs.
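Because `MAIN.client_req` and `MAIN.backend_req` are cumulative counters, a per-second rate has to be derived from two samples. The sketch below shows one way to do that and to estimate how much traffic the cache absorbs. The function names and sample numbers are illustrative assumptions, not part of Xitogent.

```python
from typing import Optional


def per_second_rate(previous: int, current: int, interval_seconds: float) -> Optional[float]:
    """Rate of a cumulative counter between two samples, or None after a reset."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if current < previous:
        # The counter went backwards, most likely because varnishd restarted.
        # Skip this interval instead of reporting a negative rate.
        return None
    return (current - previous) / interval_seconds


def cache_offload(client_rate: float, backend_rate: float) -> Optional[float]:
    """Fraction of client requests that did not reach the origin.

    Can be negative when backend requests outnumber client requests
    (for example with ESI includes or background fetches).
    """
    if client_rate <= 0:
        return None
    return 1.0 - backend_rate / client_rate


# Hypothetical samples taken 60 seconds apart.
client_rate = per_second_rate(1_000, 7_000, 60)    # 100.0 requests/s
backend_rate = per_second_rate(500, 2_000, 60)     # 25.0 requests/s
print(cache_offload(client_rate, backend_rate))    # 0.75
```

Returning `None` on a counter reset keeps a restart from showing up as a huge negative spike in the trend.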

MAIN.backend_req / backend_conn

Requests forwarded to origin and connections opened. Spikes here mean cache invalidation storms, low hit ratio, or backend warming after restart.

MAIN.backend_unhealthy

Backend connections that were not attempted because the backend was marked sick by its health probe. Any non-zero rate during normal traffic is a hard signal that the origin is failing health checks.

MAIN.thread_queue_len

Requests waiting for a free worker thread. Sustained non-zero values mean the thread pool is exhausted — bump `thread_pool_max` or split into more pools.

MAIN.threads / threads_failed

Total worker threads vs failed thread creations. Failed creations usually mean Varnish hit an OS resource limit; reaching `thread_pool_max` is counted separately in `MAIN.threads_limited`. Varnish drops connections when threads can't be created.

MAIN.sess_conn / sess_dropped

Total sessions accepted vs dropped (queue full). Any non-zero `sess_dropped` rate is a thread-pool capacity alert.

MAIN.n_lru_nuked

Objects evicted from cache to make room for new ones. High values mean your `-s` storage is undersized for the working set — leading indicator of falling hit ratio.

MAIN.n_object

Objects currently in cache. Compare with `n_objectcore` and `n_objecthead` to see how much per-object overhead the cache carries.

SMA.s0.g_bytes / g_space

Storage in use vs available for the default storage backend. When `g_bytes / (g_bytes + g_space)` approaches 100%, Varnish starts evicting.
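The utilization formula above can be sketched as a small helper. The function name and the byte values are illustrative assumptions; only the two counter names come from `varnishstat`.

```python
def storage_utilization(g_bytes: int, g_space: int) -> float:
    """Fraction of the storage backend in use: g_bytes / (g_bytes + g_space)."""
    total = g_bytes + g_space
    if total == 0:
        # No storage reported yet (for example right after startup).
        return 0.0
    return g_bytes / total


# Hypothetical sample: 768 MiB allocated, 256 MiB still free.
used = 768 * 1024 * 1024
free = 256 * 1024 * 1024
print(f"{storage_utilization(used, free):.0%}")  # 75%
```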

MAIN.s_pipe / s_pass

Requests piped (TCP tunnel) vs passed (origin-direct, no caching). High `s_pass` rates often surface VCL `return(pass)` rules that should be `return(hash)`.

Ban List Length

Active bans not yet retired (`MAIN.bans`). A growing ban list slows cache lookups. It should shrink back toward near zero as the ban-lurker thread works through cached objects and completed bans are dropped.

Triggers & Alerts

Configurable alert triggers

Set up custom triggers in your dashboard to get notified the moment Varnish metrics cross your defined thresholds.

Cache Hit Ratio (warning)

Fires when hit ratio drops below threshold.

Backend Down (critical)

Alerts when a backend server fails health checks.

Object Evictions (warning)

Triggers on high eviction rate indicating cache pressure.

Thread Pool (critical)

Fires when thread pool is exhausted.

Request Rate (warning)

Alerts on unusual request throughput.

Importance of Varnish Monitoring

Varnish Cache can serve cached content up to 300x faster than origin servers, depending on the workload. Without monitoring, cache misses and backend failures negate these benefits.

Maintain high cache hit ratios for optimal speed
Detect backend failures immediately
Track evictions to right-size cache storage
Monitor thread pools to prevent request drops

Why Choose Xitoring

Enterprise-grade Varnish monitoring with zero-config setup.

One-command install
15+ global monitoring nodes
Unified dashboard
Multi-channel alerting
Historical data retention

Use cases

Common Varnish monitoring scenarios

Where Varnish typically runs today — and what could go wrong if no one's watching.

Speeding up WordPress and content sites

Varnish keeps content sites loading nearly instantly by serving finished pages from cache. When caching stops working, the site gets slow without any obvious error and search rankings begin to slip. We catch the dip as it begins, before traffic and SEO are hurt.

Online stores at checkout

Online stores need to stay fast during the exact moments customers are buying — even when traffic spikes. We watch the signals that show whether the store can absorb a rush, so promotions and sales don't turn into lost revenue.

Caching for APIs and microservices

When Varnish caches results for an internal API, it keeps the underlying apps from being overwhelmed by repeated requests. We watch for the moment it starts struggling under burst load so capacity can be raised before the apps behind it start failing.

Before you start

Prerequisites for Varnish

Make sure you've got these in place — most installs are a 60-second job once they are.

Varnish Cache 6.x or 7.x (Varnish Enterprise also supported)
varnishstat binary available on the system PATH
Read access to Varnish's shared memory files (typically under /var/lib/varnish; root has access by default)

Setup Guide

Get started in minutes

Install Xitogent on your Varnish host

Install the lightweight Xitogent monitoring agent on the host running Varnish Cache. Xitogent runs as root, so it can read Varnish's shared memory directly with no extra group membership.

curl -s https://xitoring.com/install.sh | sudo bash -s -- --key=YOUR_API_KEY

Verify varnishstat is available

Confirm the `varnishstat` binary is on PATH and returns counters. Run `varnishstat -1` on the host — you should see a snapshot of cache, backend, and session metrics.

varnishstat -1
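If you want to inspect that snapshot programmatically, the text output can be parsed with a few lines of Python. This is a minimal sketch that assumes the usual column layout of `varnishstat -1` (counter name, current value, average rate or `.` for gauges, description). The sample text and its numbers are made up for illustration.

```python
from typing import Dict

# Illustrative sample only; real output has many more counters.
SAMPLE = """\
MAIN.cache_hit            9000        15.00 Cache hits
MAIN.cache_miss           1000         1.67 Cache misses
MAIN.n_object              420          .   object structs made
"""


def parse_varnishstat(text: str) -> Dict[str, int]:
    """Map counter names to integer values from `varnishstat -1` text output."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        name, raw_value = fields[0], fields[1]
        try:
            counters[name] = int(raw_value)
        except ValueError:
            continue  # second column is not a number; skip the line
    return counters


counters = parse_varnishstat(SAMPLE)
print(counters["MAIN.cache_hit"], counters["MAIN.n_object"])  # 9000 420
```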

Enable the Varnish integration

Run `sudo xitogent integrate` and select Varnish. Xitogent will test the connection and auto-detect your Varnish instance and configured backends — the rest is set up automatically.

sudo xitogent integrate

Configure alert thresholds (optional)

Set custom thresholds for Cache Hit Ratio, Backend Down events, or Object Evictions to catch cache regressions and capacity issues before users see uncached responses.

Verify it's working

Run this command on the server to confirm Xitogent picked up the integration. Fresh metrics will start streaming to your dashboard within ~30 seconds.

sudo xitogent status

Compare

Considering alternatives?

See how Xitoring stacks up against the alternatives for Varnish monitoring — flat pricing, deeper integrations, and one agent that covers your whole stack.

Xitoring vs Datadog

Pay-per-host pricing gets expensive fast at scale. See where Xitoring delivers the same coverage on a flat plan.

Xitoring vs New Relic

Full-stack observability without the enterprise tiers, ingestion fees, or seat-based licensing.

Xitoring vs Grafana Cloud

One tool with one price instead of stitching Prometheus, Loki, and Grafana into a stack you also have to monitor.

Frequently asked questions

What is Varnish monitoring?

Varnish monitoring is the continuous collection of Varnish Cache counters from `varnishstat` — `MAIN.cache_hit/cache_miss` ratio, backend health and request counts, thread pool usage, session metrics, object eviction (`n_lru_nuked`), and storage headroom (`SMA.s0.g_bytes`/`g_space`) — combined with alerting when those counters breach thresholds. The cache hit ratio is the headline; `n_lru_nuked` and `thread_queue_len` are the leading indicators of degradation.

How do I check Varnish cache hit ratio?

Run `varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss` for a snapshot, or watch live with `varnishstat -f MAIN.cache_hit -f MAIN.cache_miss`. Hit ratio = `cache_hit / (cache_hit + cache_miss)`. For per-URL detail, use `varnishtop -i ReqURL` and `varnishlog -q 'ReqHeader:Cache-Control'`. Xitogent computes and trends the ratio automatically — alert when it drops below your baseline (90% is typical for well-tuned WordPress).
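One caveat when computing the ratio by hand: both counters are cumulative since `varnishd` started, so the lifetime ratio can hide a recent drop. The sketch below computes both the lifetime ratio and a ratio over the interval between two samples. The function names and sample values are illustrative assumptions.

```python
from typing import Mapping, Optional


def hit_ratio(hits: int, misses: int) -> Optional[float]:
    """cache_hit / (cache_hit + cache_miss), or None when there were no lookups."""
    total = hits + misses
    if total == 0:
        return None
    return hits / total


def windowed_hit_ratio(previous: Mapping[str, int], current: Mapping[str, int]) -> Optional[float]:
    """Hit ratio over the interval between two counter samples."""
    delta_hit = current["MAIN.cache_hit"] - previous["MAIN.cache_hit"]
    delta_miss = current["MAIN.cache_miss"] - previous["MAIN.cache_miss"]
    if delta_hit < 0 or delta_miss < 0:
        return None  # counters reset, most likely a varnishd restart
    return hit_ratio(delta_hit, delta_miss)


# Hypothetical samples one minute apart.
before = {"MAIN.cache_hit": 9_000, "MAIN.cache_miss": 1_000}
after = {"MAIN.cache_hit": 9_300, "MAIN.cache_miss": 1_100}

print(hit_ratio(after["MAIN.cache_hit"], after["MAIN.cache_miss"]))  # lifetime ratio, about 0.894
print(windowed_hit_ratio(before, after))                             # last minute: 0.75
```

In this example the lifetime ratio still looks close to 90% while the most recent minute has already fallen to 75%, which is why alerting on the windowed value tends to react faster.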

How do I monitor Varnish backend health?

Backend health comes from the health probes configured for your backends in VCL. `MAIN.backend_unhealthy` counts backend connections that were not attempted because the backend was marked unhealthy; per-backend state is in `varnishadm backend.list`. Xitogent reads both. Alert on any non-zero `backend_unhealthy` rate during normal traffic, and on `backend.list` state changes from `Healthy` to `Sick`.

What does n_lru_nuked mean?

`MAIN.n_lru_nuked` counts objects Varnish evicted from cache to make room for new ones (LRU = Least Recently Used). Sustained growth means your storage backend (`-s malloc,XGB` or MSE) is too small for the working set — Varnish is constantly recycling cache space, which drives the hit ratio down. Either grow storage, tune TTLs, or move infrequently-accessed objects out of cache via VCL.
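"Sustained growth" can be made concrete by checking whether the counter rose in every recent collection interval. This is a minimal sketch under that assumption; the function name, the three-interval default, and the sample series are illustrative, not Xitogent's actual trigger logic.

```python
from typing import Sequence


def sustained_growth(samples: Sequence[int], min_intervals: int = 3) -> bool:
    """True when the counter rose in each of the last `min_intervals` intervals."""
    if len(samples) < min_intervals + 1:
        return False  # not enough history to judge
    recent = samples[-(min_intervals + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical MAIN.n_lru_nuked readings, one per collection interval.
print(sustained_growth([10, 10, 12, 15, 19]))  # True: rose in each of the last 3 intervals
print(sustained_growth([10, 12, 12, 15]))      # False: flat in one interval
```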

How do I monitor Varnish thread pool exhaustion?

Three counters: `MAIN.thread_queue_len` (requests waiting for a worker — should be near zero), `MAIN.threads_failed` (failed thread creation — should be zero), and `MAIN.sess_dropped` (sessions rejected because queue is full — should be zero). Any non-zero rate on any of them means the thread pool is saturated. Bump `thread_pool_max` and `thread_pools` in your Varnish startup args, or scale horizontally.
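The three checks can be combined into one function that compares two samples. Note that `MAIN.thread_queue_len` is a gauge (check its current value), while `MAIN.threads_failed` and `MAIN.sess_dropped` are cumulative counters (check whether they increased). The function name, messages, and sample values below are illustrative assumptions.

```python
from typing import Dict, List


def thread_pool_findings(previous: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Return human-readable reasons why the thread pool looks saturated."""
    findings: List[str] = []

    # Gauge: any queued request right now is a warning sign.
    if current["MAIN.thread_queue_len"] > 0:
        findings.append("requests are queued waiting for a worker thread")

    # Counters: only an increase since the previous sample matters.
    increases = (
        ("MAIN.threads_failed", "thread creation failed"),
        ("MAIN.sess_dropped", "sessions were dropped"),
    )
    for name, message in increases:
        if current[name] > previous[name]:
            findings.append(message)

    return findings


# Hypothetical samples one collection interval apart.
earlier = {"MAIN.thread_queue_len": 0, "MAIN.threads_failed": 2, "MAIN.sess_dropped": 0}
latest = {"MAIN.thread_queue_len": 4, "MAIN.threads_failed": 2, "MAIN.sess_dropped": 7}
print(thread_pool_findings(earlier, latest))
```

Here `threads_failed` is non-zero but unchanged, so it is not reported: the failures happened before the previous sample.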

Can I integrate Varnish with Prometheus and Grafana?

Yes — community Prometheus exporters for Varnish read the same `varnishstat` data. Xitogent reads it directly (no exporter required) but the integration is fully compatible with environments already running a Prometheus exporter to feed Grafana dashboards.

Varnish Cache vs Varnish Enterprise — what's monitored differently?

The core `varnishstat` counter set is identical — `MAIN.*`, `SMA.*`, backend metrics, thread pool, sessions. Varnish Enterprise adds the MSE (Massive Storage Engine) backend with its own counter namespace (`MSE.*`), per-tenant statistics, and additional VMODs. Xitogent reads everything `varnishstat` exposes, so Enterprise's extra counters surface automatically when present.

Will the integration affect Varnish performance?

No measurable impact. Xitogent reads `varnishstat` output (which itself reads counters from Varnish's shared memory without locking) on a 1-minute interval. There is no instrumentation in the request path and no extra disk I/O, because the shared memory segment already exists for Varnish's own tools.

Can I monitor multiple Varnish instances on one server?

Yes. Pass each instance's name with `varnishstat -n <name>` (matching the `-n` argument used to start `varnishd`). Xitogent auto-discovers each instance and tracks them separately in the dashboard with their own metrics, alerts, and history, which is useful for multi-tenant or split-traffic setups.
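To query several instances by hand, you can build one `varnishstat` command per instance name. This is a minimal sketch: the instance names `edge` and `internal` are hypothetical, and the helper only builds the argument list, using the `-n`, `-1`, and `-f` flags.

```python
from typing import List


def varnishstat_command(instance: str, fields: List[str]) -> List[str]:
    """Argument list for a one-shot varnishstat snapshot of a named instance."""
    command = ["varnishstat", "-n", instance, "-1"]
    for field in fields:
        command += ["-f", field]  # one -f per counter to include
    return command


# Hypothetical instance names; use the values passed to `varnishd -n`.
for name in ["edge", "internal"]:
    print(" ".join(varnishstat_command(name, ["MAIN.cache_hit", "MAIN.cache_miss"])))
```

Each list can be passed to `subprocess.run(command, capture_output=True, text=True, check=True)` on a host where `varnishstat` is installed.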

Start monitoring Varnish today

Set up in under 60 seconds. No credit card required. Full metrics from day one.
