DNS Server

Updated May 2026

CoreDNS Monitoring

Monitor CoreDNS query throughput, response-code distribution (NOERROR/NXDOMAIN/SERVFAIL/REFUSED), cache hit ratio, forward latency, and health-plugin failures in real time — via the native `prometheus` plugin on `:9153`.

Start Free Trial View Docs

Why monitor CoreDNS?

CoreDNS is the default cluster DNS in Kubernetes — every microservice depends on it for service discovery. When CoreDNS degrades, SERVFAIL spikes, or the forward plugin slows, it shows up as 'random connection refused' everywhere in the cluster. Monitoring catches DNS issues at the source instead of debugging downstream symptoms.

Auto-discovery via Xitogent — zero manual configuration

Native `:9153/metrics` Prometheus scraping

Per-rcode query distribution (NOERROR / NXDOMAIN / SERVFAIL / REFUSED / FORMERR)

Per-query-type tracking (A / AAAA / PTR / SRV / MX)

Cache hit ratio, cache size, cache entries per server block

Forward plugin upstream latency + connection cache hit rate

Panic detection (`coredns_panics_total`) and health-plugin failures

Plugin-level visibility (`coredns_plugin_enabled`, `coredns_build_info`)

Customizable alert thresholds for every metric

1-minute metric collection intervals out of the box

What is CoreDNS monitoring?

CoreDNS monitoring, explained

CoreDNS monitoring catches SERVFAIL spikes, cache hit-rate drops, forward-plugin latency, and panic-related restarts before they cascade into cluster-wide DNS resolution failures. Because every microservice depends on DNS for service discovery, an unmonitored CoreDNS is an unmonitored failure mode for your entire Kubernetes cluster — DNS issues show up as "random connection refused" everywhere. Xitoring auto-discovers your CoreDNS, scrapes :9153/metrics, and routes alerts to Slack, PagerDuty, Telegram, or your existing on-call.

Metrics

What we monitor

Queries / sec

Live DNS query throughput from `coredns_dns_requests_total`. Spikes flag DNS amplification attacks or traffic surges; sustained drops flag broken clients or misrouted ServiceMonitor scrapes.

Response Code Distribution

`coredns_dns_response_rcode_count_total` labeled by rcode — NOERROR (success), NXDOMAIN (non-existent), SERVFAIL (resolution failure), REFUSED (policy block), FORMERR (malformed).

SERVFAIL Rate

SERVFAIL responses per second. Spikes signal upstream resolver failures or forward plugin issues. PrometheusRule alert: > 5% of total query rate is the standard threshold.

Cache Hit Ratio

Computed as `coredns_cache_hits_total / (coredns_cache_hits_total + coredns_cache_misses_total)`. Target 80%+ on cluster DNS; below 50% means the cache TTLs are too short or the working set exceeds cache size.

Cache Size / Entries

`coredns_cache_size` (bytes) and `coredns_cache_entries` per cache type (success vs denial). Approaching the configured cache size triggers eviction.

Resolution Latency (histogram)

`coredns_dns_request_duration_seconds` histogram — track p50, p95, p99. Cluster DNS that drifts above 100ms p99 starts causing visible app slowness; alert on p99 > 500ms.

Forward Plugin Latency

`coredns_forward_request_duration_seconds` per upstream resolver. Separates CoreDNS-internal latency from upstream resolver latency — critical for diagnosing slow 8.8.8.8 vs slow CoreDNS itself.

Forward Request Rate

`coredns_forward_request_count_total` per upstream. Combined with cache hit ratio shows how much traffic actually leaves CoreDNS for upstream resolution.

Proxy Connection Cache

`coredns_proxy_conn_cache_hits_total` / `_misses_total`. Tracks TCP connection reuse to upstream resolvers — low hit rate means connection churn, raising upstream latency.

Health Plugin Failures

`coredns_health_request_failures_total` — the `health:8080` plugin's own failure count. Non-zero means the liveness probe is failing intermittently.

Panics

`coredns_panics_total` — any non-zero value is a CoreDNS bug or plugin crash that triggered a goroutine panic. Pair with restart count for full post-mortem context.

Go Runtime

`process_resident_memory_bytes` (RSS), `go_goroutines` (goroutine count — detects leaks), `go_gc_duration_seconds` (GC pause time). Memory growth without restarts = leak; goroutine count growth = blocked plugin or upstream.

Triggers & Alerts

Configurable alert triggers

Set up custom triggers in your dashboard to get notified the moment CoreDNS metrics cross your defined thresholds.

CoreDNS monitoring trigger configuration dashboard

SERVFAIL Rate

critical

Fires on high resolution failure rate.

Cache Hit Ratio

warning

Alerts when cache effectiveness drops.

Resolution Latency

warning

Triggers on slow DNS resolution.

Query Rate

warning

Fires on unusual query volume.

Importance of CoreDNS Monitoring

DNS is the foundation of network connectivity. Slow or failing DNS resolution impacts every service in your infrastructure.

Ensure fast DNS resolution
Detect SERVFAIL spikes immediately
Monitor cache for optimal performance
Track upstream resolver health

Why Choose Xitoring

Zero-config CoreDNS monitoring.

One-command install
Global nodes
Unified dashboard
Multi-channel alerts

Use cases

Common CoreDNS monitoring scenarios

Where CoreDNS typically runs today — and what could go wrong if no one's watching.

DNS inside a Kubernetes app

Every part of a Kubernetes app uses CoreDNS to find every other part. When it slows down or starts failing, users see strange, intermittent errors across the entire app. We catch the slowdown the moment it begins, so a small DNS hiccup doesn't surface to customers as a mysterious outage.

Large clusters with local DNS caches

Bigger Kubernetes setups put a small DNS cache on every server to keep things fast. When one of those caches misbehaves, only a slice of traffic breaks — making it hard to spot. We make sure each one is doing its job so a single bad node can't quietly degrade a fraction of your users.

Public-facing DNS for your domain

When CoreDNS is what answers DNS queries for your domain on the open internet, an outage means people can't reach your site at all. We watch the signals that prove the service is healthy and responding, so brand and revenue aren't quietly bleeding while DNS silently fails.

Before you start

Prerequisites for CoreDNS

Make sure you've got these in place — most installs are a 60-second job once they are.

CoreDNS 1.11.x / 1.12.x / 1.13.x running on the server (or in Kubernetes via the kube-system/coredns Deployment)
prometheus plugin enabled in your Corefile (default :9153)
Network reachability from Xitogent to the metrics endpoint (/metrics on :9153)

Setup Guide

Get started in minutes

Install Xitogent on your server

If you haven't already, install the lightweight Xitogent monitoring agent on the host running CoreDNS.

curl -s https://xitoring.com/install.sh | sudo bash -s -- --key=YOUR_API_KEY

Enable the prometheus plugin in CoreDNS

CoreDNS exposes Prometheus-format metrics through its prometheus plugin (default endpoint:9153/metrics). Add `prometheus:9153` to your Corefile and reload CoreDNS, then confirm the metrics endpoint is reachable from the agent host.

sudo xitogent integrate

Enable the CoreDNS integration

Use the Xitoring dashboard or CLI to enable the CoreDNS integration. Xitogent auto-detects the metrics endpoint and starts collecting query, cache, and latency metrics.

Configure alert thresholds (optional)

Set custom thresholds for SERVFAIL Rate, Cache Hit Ratio, or Resolution Latency to get notified the moment DNS reliability or performance degrades.

Verify it's working

Run this command on the server to confirm Xitogent picked up the integration. Fresh metrics will start streaming to your dashboard within ~30 seconds.

sudo xitogent status

Compare

Considering alternatives?

See how Xitoring stacks up against the alternatives for CoreDNS monitoring — flat pricing, deeper integrations, and one agent that covers your whole stack.

Xitoring vs

Datadog

Pay-per-host pricing gets expensive fast at scale. See where Xitoring delivers the same coverage on a flat plan.

Xitoring vs

New Relic

Full-stack observability without the enterprise tiers, ingestion fees, or seat-based licensing.

Xitoring vs

Grafana Cloud

One tool with one price instead of stitching Prometheus, Loki, and Grafana into a stack you also have to monitor.

See all comparisons

Frequently asked questions

What is CoreDNS monitoring?

CoreDNS monitoring is the continuous collection of CoreDNS performance data from the native `prometheus` plugin — query throughput, response-code distribution (NOERROR/NXDOMAIN/SERVFAIL/REFUSED), cache hit ratio, forward-plugin latency, plugin chain enablement, and Go runtime stats — combined with alerting when those metrics breach thresholds. It's the canonical way to catch DNS degradation before it cascades into cluster-wide "random connection refused" errors.

How do I enable CoreDNS Prometheus metrics?

Add `prometheus:9153` to your Corefile (inside the root server block: `. { prometheus:9153;... }`), then reload CoreDNS. Verify with `curl http://localhost:9153/metrics`. In Kubernetes, the kube-system/coredns ConfigMap typically has it enabled by default — confirm with `kubectl -n kube-system get configmap coredns -o yaml`. Xitogent scrapes the same endpoint every 60 seconds.

What does the kubernetes plugin do?

The `kubernetes` plugin watches the Kubernetes API for Service, Endpoint, and Pod changes and synthesizes DNS records for them — `..svc.cluster.local` resolutions, headless service ESS records, pod IPs. It also enables service-discovery features like SRV records for named ports. Monitor it alongside `forward` (which handles external DNS) since they share the request pipeline.

How do I monitor CoreDNS cache hit ratio?

Compute it from Prometheus: `coredns_cache_hits_total / (coredns_cache_hits_total + coredns_cache_misses_total)` — target 80%+ for cluster DNS, 95%+ for NodeLocal DNSCache. Low hit ratios usually mean TTLs are too short (cluster DNS TTL is 30s by default — tunable) or the working set of unique queries exceeds cache size.

What does NXDOMAIN mean in CoreDNS metrics?

`NXDOMAIN` (non-existent domain) in `coredns_dns_response_rcode_count_total` means a queried name doesn't exist. Some NXDOMAIN is normal (typos, scanners); spikes flag misconfigured search domains, applications looking up nonexistent services, or DNS amplification attempts. SERVFAIL is more concerning — it means CoreDNS couldn't get an answer at all (upstream failure, plugin error).

How do I debug CoreDNS in Kubernetes?

Three layers: (1) check pod logs (`kubectl logs -n kube-system -l k8s-app=kube-dns`), (2) test resolution from inside a pod (`kubectl exec... -- nslookup kubernetes.default`), (3) read Prometheus metrics for SERVFAIL rate per-plugin. The `log` plugin can be temporarily added to the Corefile for per-query log output. Use `dnstap` for production-safe high-volume tracing without affecting query latency.

How do I monitor CoreDNS forward plugin latency?

Read the `coredns_forward_request_duration_seconds` histogram, labeled by upstream resolver address. Track p95 and p99 per upstream — slow upstreams show up here, separate from CoreDNS-internal latency. The `forward` plugin also exposes `coredns_forward_responses_total` per rcode for upstream-specific SERVFAIL rates. Alert on p99 > 500ms per upstream.

When should I use NodeLocal DNSCache?

Cluster size > ~100 nodes, or any cluster experiencing UDP conntrack races (intermittent DNS timeouts under load). NodeLocal DNSCache runs a CoreDNS-cache sidecar on every node binding `169.254.20.10:53`, eliminating the conntrack table entry per query. Cluster CoreDNS load typically drops 70–90%, and DNS p99 latency drops to local-disk speed. Monitor per-node hit rate (target 95%+).

What CoreDNS versions are supported?

CoreDNS 1.11.x, 1.12.x, and 1.13.x are fully supported. 1.12 added MCS-API multicluster service discovery, startup timeout config, and IPv6 hostname handling in the `kubernetes` plugin. 1.13.2 (Dec 2025) is the current stable. K8s 1.30+ ships CoreDNS 1.11.x by default; newer distros ship 1.12.x. Xitogent auto-detects the version and adapts.

Start monitoring CoreDNS today

Set up in under 60 seconds. No credit card required. Full metrics from day one.

Start Free Trial

Keep exploring

Related Integrations

Nginx

HAProxy