DNS Server
    Updated May 2026

    CoreDNS Monitoring

    Monitor CoreDNS query throughput, response-code distribution (NOERROR/NXDOMAIN/SERVFAIL/REFUSED), cache hit ratio, forward latency, and health-plugin failures in real time — via the native `prometheus` plugin on `:9153`.

    Why monitor CoreDNS?

    CoreDNS is the default cluster DNS in Kubernetes — every microservice depends on it for service discovery. When CoreDNS degrades, SERVFAIL spikes, or the forward plugin slows, it shows up as 'random connection refused' everywhere in the cluster. Monitoring catches DNS issues at the source instead of debugging downstream symptoms.

    Auto-discovery via Xitogent — zero manual configuration
    Native `:9153/metrics` Prometheus scraping
    Per-rcode query distribution (NOERROR / NXDOMAIN / SERVFAIL / REFUSED / FORMERR)
    Per-query-type tracking (A / AAAA / PTR / SRV / MX)
    Cache hit ratio, cache size, cache entries per server block
    Forward plugin upstream latency + connection cache hit rate
    Panic detection (`coredns_panics_total`) and health-plugin failures
    Plugin-level visibility (`coredns_plugin_enabled`, `coredns_build_info`)
    Customizable alert thresholds for every metric
    1-minute metric collection intervals out of the box
    What is CoreDNS monitoring?

    CoreDNS monitoring, explained

    CoreDNS monitoring catches SERVFAIL spikes, cache hit-rate drops, forward-plugin latency, and panic-related restarts before they cascade into cluster-wide DNS resolution failures. Because every microservice depends on DNS for service discovery, an unmonitored CoreDNS is an unmonitored failure mode for your entire Kubernetes cluster — DNS issues show up as "random connection refused" everywhere. Xitoring auto-discovers your CoreDNS, scrapes :9153/metrics, and routes alerts to Slack, PagerDuty, Telegram, or your existing on-call.

    Metrics

    What we monitor

    Queries / sec

    Live DNS query throughput, computed as the per-second rate of `coredns_dns_requests_total`. Spikes flag DNS amplification attacks or traffic surges; sustained drops flag clients that can no longer reach CoreDNS, or a ServiceMonitor that has stopped scraping some of the CoreDNS pods.
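    As a rough illustration of how a queries-per-second figure is derived from a cumulative counter like the one above, the sketch below computes an average per-second rate from (timestamp, value) samples and treats any drop in the counter as a restart. It is a simplified stand-in for PromQL `rate()`: it does not extrapolate to the window boundaries, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def counter_rate(samples: Sequence[Tuple[float, float]]) -> float:
    """Average per-second rate of a monotonically increasing counter.

    samples: (timestamp_seconds, counter_value) pairs in time order.
    A decrease between two samples is treated as a counter reset (for
    example, a CoreDNS restart), so the post-reset value counts as the
    increase since the reset. No boundary extrapolation is done.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # Counter went down: the process restarted and began counting from 0.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed
```

    Two scrapes 60 seconds apart that read 1000 and 4000 give 50 queries per second.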

    Response Code Distribution

    `coredns_dns_responses_total` labeled by rcode: NOERROR (success), NXDOMAIN (non-existent name), SERVFAIL (resolution failure), REFUSED (policy block), FORMERR (malformed query). CoreDNS releases before 1.7 named this counter `coredns_dns_response_rcode_count_total`.

    SERVFAIL Rate

    SERVFAIL responses per second. Spikes signal upstream resolver failures or forward plugin issues. A common starting threshold for a PrometheusRule alert is SERVFAIL above 5% of the total query rate.
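    A minimal sketch of that ratio check follows. It assumes you already have the per-rcode counter increases for one evaluation window, for example from two scrapes of the per-rcode response counter. The function names and the dictionary input are illustrative and are not part of CoreDNS or Xitoring. The PromQL equivalent divides the SERVFAIL rate by the summed rate across all rcodes.

```python
from typing import Mapping

# The 5% starting point quoted above; tune it per cluster.
SERVFAIL_THRESHOLD = 0.05


def servfail_ratio(rcode_increase: Mapping[str, float]) -> float:
    """Fraction of responses in the window that were SERVFAIL.

    rcode_increase maps an rcode label (NOERROR, NXDOMAIN, SERVFAIL, ...)
    to the counter increase observed during the window.
    """
    total = sum(rcode_increase.values())
    if total == 0:
        return 0.0  # no traffic in the window: nothing to alert on
    return rcode_increase.get("SERVFAIL", 0.0) / total


def servfail_alert(rcode_increase: Mapping[str, float],
                   threshold: float = SERVFAIL_THRESHOLD) -> bool:
    """True when the SERVFAIL share exceeds the threshold."""
    return servfail_ratio(rcode_increase) > threshold
```

    With 600 SERVFAIL out of 10,000 responses the ratio is 6%, which crosses the 5% line.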

    Cache Hit Ratio

    Computed as `coredns_cache_hits_total / (coredns_cache_hits_total + coredns_cache_misses_total)` over the same time window. Target 80%+ on cluster DNS. Below 50% usually means cached TTLs are too short or the working set exceeds the cache capacity.
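    Here is the same arithmetic as a small sketch. It assumes `hits` and `misses` are increases over the same window rather than lifetime totals, because a lifetime ratio can hide a recent drop behind weeks of good history. The band names are hypothetical labels for the 80% and 50% guidance above.

```python
def cache_hit_ratio(hits: float, misses: float) -> float:
    """hits / (hits + misses) for one window; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


def classify_hit_ratio(ratio: float) -> str:
    """Map a hit ratio onto the bands described in the text."""
    if ratio >= 0.80:
        return "ok"  # at or above the 80% cluster-DNS target
    if ratio < 0.50:
        return "investigate"  # short TTLs or an undersized cache are likely
    return "below target"
```
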

    Cache Size / Entries

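    `coredns_cache_entries` counts cached items per cache type (success vs denial). Releases before 1.7 called it `coredns_cache_size`, and it is an item count, not bytes. Capacity is configured in entries per cache type through the `cache` plugin's `success` and `denial` options. Once a cache is full, new entries evict old ones, which `coredns_cache_evictions_total` counts.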

    Resolution Latency (histogram)

    `coredns_dns_request_duration_seconds` histogram — track p50, p95, p99. Cluster DNS that drifts above 100ms p99 starts causing visible app slowness; alert on p99 > 500ms.
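    Percentiles come from the histogram's cumulative buckets. The sketch below follows the general approach of PromQL `histogram_quantile()`: find the bucket where the target rank falls, then interpolate linearly inside it. It assumes positive bucket bounds and a final `+Inf` bucket. The bucket layout in the usage note is made up for illustration and is not CoreDNS's actual bucket set.

```python
import math
from typing import Sequence, Tuple


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate quantile q (0..1) from cumulative Prometheus-style buckets.

    buckets: (le_upper_bound, cumulative_count) pairs; the largest bound
    must be math.inf. Bounds are assumed positive (latency in seconds).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(buckets)
    if not ordered or not math.isinf(ordered[-1][0]):
        raise ValueError("last bucket must be le=+Inf")
    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in ordered:
        if count >= rank:
            if math.isinf(le):
                # Rank falls in the open-ended bucket: report the highest
                # finite bound instead of extrapolating.
                return prev_le
            if count == prev_count:
                return le  # empty bucket; avoid dividing by zero
            # Linear interpolation between this bucket's lower and upper bound.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le
```

    For buckets `[(0.005, 50), (0.01, 80), (0.05, 95), (0.1, 99), (0.5, 100), (math.inf, 100)]`, p99 lands at roughly the 0.1 s bound, well under the 500 ms alert line.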

    Forward Plugin Latency

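    `coredns_proxy_request_duration_seconds{proxy_name="forward"}` per upstream resolver (`to` label) on CoreDNS 1.11+, or `coredns_forward_request_duration_seconds` on older releases. It separates CoreDNS-internal latency from upstream resolver latency. That distinction is critical for telling a slow 8.8.8.8 apart from a slow CoreDNS.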

    Forward Request Rate

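    Per-upstream request count, available as the `_count` series of the forward latency histogram. Releases 1.7 through 1.10 also exposed it as `coredns_forward_requests_total`. Combined with the cache hit ratio, it shows how much traffic actually leaves CoreDNS for upstream resolution.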

    Proxy Connection Cache

    `coredns_proxy_conn_cache_hits_total` / `_misses_total`. Tracks TCP connection reuse to upstream resolvers — low hit rate means connection churn, raising upstream latency.
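    The connection-cache counters are labeled per upstream (and per protocol), so the reuse ratio is most useful computed per upstream. The sketch below assumes the counter increases have already been extracted as (upstream, increase) pairs. The input shape and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def conn_cache_hit_rates(
    hits: Iterable[Tuple[str, float]],
    misses: Iterable[Tuple[str, float]],
) -> Dict[str, float]:
    """Per-upstream connection-reuse ratio from (upstream, increase) pairs.

    An upstream may appear several times (e.g. once per protocol label);
    its increases are summed before the ratio is taken.
    """
    h: Dict[str, float] = defaultdict(float)
    m: Dict[str, float] = defaultdict(float)
    for to, value in hits:
        h[to] += value
    for to, value in misses:
        m[to] += value
    rates: Dict[str, float] = {}
    for to in set(h) | set(m):
        total = h[to] + m[to]
        rates[to] = h[to] / total if total else 0.0
    return rates
```

    An upstream with a ratio far below its peers is the one churning connections.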

    Health Plugin Failures

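    `coredns_health_request_failures_total` counts how often the `health` plugin's periodic self-check against its own endpoint (`health :8080` by default) failed. A non-zero value means the endpoint the liveness probe hits is failing intermittently. Pair it with `coredns_health_request_duration_seconds`, which rises when CoreDNS struggles to keep up with its query load.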

    Panics

    `coredns_panics_total` — any non-zero value is a CoreDNS bug or plugin crash that triggered a goroutine panic. Pair with restart count for full post-mortem context.

    Go Runtime

    `process_resident_memory_bytes` (RSS), `go_goroutines` (goroutine count, useful for spotting leaks), and `go_gc_duration_seconds` (GC pause time). Memory that keeps growing between restarts without a matching rise in query load suggests a leak. A steadily rising goroutine count usually means a blocked plugin or a stalled upstream.
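    One way to turn "the count keeps growing" into a check is to fit a least-squares slope to recent samples. The sketch below does that for any gauge, such as `go_goroutines` or `process_resident_memory_bytes`. The 50-goroutines-per-hour default is an arbitrary placeholder, not a recommended value.

```python
from typing import Sequence, Tuple


def slope_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (timestamp_seconds, value) samples, per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    return cov / var * 3600.0  # convert units/second to units/hour


def looks_like_goroutine_leak(samples: Sequence[Tuple[float, float]],
                              max_growth_per_hour: float = 50.0) -> bool:
    """Flag sustained growth faster than a (hypothetical) hourly budget."""
    return slope_per_hour(samples) > max_growth_per_hour
```

    A series climbing from 100 to 300 goroutines over two hours has a slope of 100 per hour and would be flagged.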

    Triggers & Alerts

    Configurable alert triggers

    Set up custom triggers in your dashboard to get notified the moment CoreDNS metrics cross your defined thresholds.

    SERVFAIL Rate

    critical

    Fires on high resolution failure rate.

    Cache Hit Ratio

    warning

    Alerts when cache effectiveness drops.

    Resolution Latency

    warning

    Triggers on slow DNS resolution.

    Query Rate

    warning

    Fires on unusual query volume.

    Importance of CoreDNS Monitoring

    DNS is the foundation of network connectivity. Slow or failing DNS resolution impacts every service in your infrastructure.

    • Ensure fast DNS resolution
    • Detect SERVFAIL spikes immediately
    • Monitor cache for optimal performance
    • Track upstream resolver health

    Why Choose Xitoring

    Zero-config CoreDNS monitoring.

    • One-command install
    • Global nodes
    • Unified dashboard
    • Multi-channel alerts
    Use cases

    Common CoreDNS monitoring scenarios

    Where CoreDNS typically runs today — and what could go wrong if no one's watching.

    DNS inside a Kubernetes app

    Every part of a Kubernetes app uses CoreDNS to find every other part. When it slows down or starts failing, users see strange, intermittent errors across the entire app. We catch the slowdown the moment it begins, so a small DNS hiccup doesn't surface to customers as a mysterious outage.

    Large clusters with local DNS caches

    Bigger Kubernetes setups put a small DNS cache on every server to keep things fast. When one of those caches misbehaves, only a slice of traffic breaks — making it hard to spot. We make sure each one is doing its job so a single bad node can't quietly degrade a fraction of your users.

    Public-facing DNS for your domain

    When CoreDNS is what answers DNS queries for your domain on the open internet, an outage means people can't reach your site at all. We watch the signals that prove the service is healthy and responding, so brand and revenue aren't quietly bleeding while DNS silently fails.

    Before you start

    Prerequisites for CoreDNS

    Make sure you've got these in place — most installs are a 60-second job once they are.

    • CoreDNS 1.11.x / 1.12.x / 1.13.x running on the server (or in Kubernetes via the kube-system/coredns Deployment)
    • prometheus plugin enabled in your Corefile (default :9153)
    • Network reachability from Xitogent to the metrics endpoint (/metrics on :9153)
    Setup Guide

    Get started in minutes

    1

    Install Xitogent on your server

    If you haven't already, install the lightweight Xitogent monitoring agent on the host running CoreDNS.

    curl -s https://xitoring.com/install.sh | sudo bash -s -- --key=YOUR_API_KEY
    2

    Enable the prometheus plugin in CoreDNS

    CoreDNS exposes Prometheus-format metrics through its `prometheus` plugin (default endpoint `:9153/metrics`). Add the line `prometheus :9153` to your Corefile and reload CoreDNS. Then confirm the metrics endpoint is reachable from the agent host, for example with `curl http://localhost:9153/metrics`.
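    To script that reachability check instead of eyeballing `curl` output, the sketch below fetches the endpoint with Python's standard library and reports which expected metric families are absent. It assumes the default `http://localhost:9153/metrics` address. Its parser is deliberately simplified: it only extracts sample names and would mis-handle label values containing `}`. The helper names are hypothetical.

```python
import re
import urllib.request
from typing import Dict, Iterable, List

# Simplified sample-line matcher: metric name, optional {labels}, then the value.
# Label values that themselves contain "}" are not handled.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_metric_names(lines: Iterable[str]) -> Dict[str, int]:
    """Count exposed samples per metric name in Prometheus text output."""
    counts: Dict[str, int] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = _SAMPLE.match(line)
        if match:
            name = match.group(1)
            counts[name] = counts.get(name, 0) + 1
    return counts


def missing_metrics(text: str, required: List[str]) -> List[str]:
    """Return the required metric names that do not appear in `text`."""
    present = parse_metric_names(text.splitlines())
    return [name for name in required if name not in present]


def fetch(url: str = "http://localhost:9153/metrics", timeout: float = 5.0) -> str:
    """Download the raw exposition text (raises on connection errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

    Run `missing_metrics(fetch(), ["coredns_dns_requests_total", "coredns_cache_hits_total"])` after some DNS traffic has flowed. Labeled metric families usually appear only once they have been observed at least once.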

    3

    Enable the CoreDNS integration

    Use the Xitoring dashboard or CLI to enable the CoreDNS integration. Xitogent auto-detects the metrics endpoint and starts collecting query, cache, and latency metrics.

    sudo xitogent integrate

    4

    Configure alert thresholds (optional)

    Set custom thresholds for SERVFAIL Rate, Cache Hit Ratio, or Resolution Latency to get notified the moment DNS reliability or performance degrades.

    5

    Verify it's working

    Run this command on the server to confirm Xitogent picked up the integration. Fresh metrics will start streaming to your dashboard within ~30 seconds.

    sudo xitogent status

    Frequently asked questions

    What is CoreDNS monitoring?
    CoreDNS monitoring is the continuous collection of CoreDNS performance data from the native `prometheus` plugin — query throughput, response-code distribution (NOERROR/NXDOMAIN/SERVFAIL/REFUSED), cache hit ratio, forward-plugin latency, plugin chain enablement, and Go runtime stats — combined with alerting when those metrics breach thresholds. It's the canonical way to catch DNS degradation before it cascades into cluster-wide "random connection refused" errors.
    How do I enable CoreDNS Prometheus metrics?
    Add the line `prometheus :9153` inside the root server block of your Corefile (the block that starts with `. {` or `.:53 {`), then reload CoreDNS. Verify with `curl http://localhost:9153/metrics`. In Kubernetes, the kube-system/coredns ConfigMap typically has it enabled by default. Confirm with `kubectl -n kube-system get configmap coredns -o yaml`. Xitogent scrapes the same endpoint every 60 seconds.
    What does the kubernetes plugin do?
    The `kubernetes` plugin watches the Kubernetes API for Service, Endpoint, and Pod changes and synthesizes DNS records for them: `<service>.<namespace>.svc.cluster.local` resolutions, per-endpoint A records for headless services, and pod IPs. It also enables service-discovery features like SRV records for named ports. Monitor it alongside `forward` (which handles external DNS), since they share the request pipeline.
    How do I monitor CoreDNS cache hit ratio?
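    Compute it from Prometheus over a rate window, for example `sum(rate(coredns_cache_hits_total[5m])) / (sum(rate(coredns_cache_hits_total[5m])) + sum(rate(coredns_cache_misses_total[5m])))`. Target 80%+ for cluster DNS and 95%+ for NodeLocal DNSCache. Low hit ratios usually mean TTLs are too short or the working set of unique queries exceeds the cache capacity. Records from the `kubernetes` plugin carry a 5-second TTL by default (tunable with its `ttl` option), and the stock Kubernetes Corefile caps cached TTLs at 30s with `cache 30`.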
    What does NXDOMAIN mean in CoreDNS metrics?
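    `NXDOMAIN` (non-existent domain) in `coredns_dns_responses_total` means the queried name doesn't exist. Some NXDOMAIN is normal: typos, scanners, and, in Kubernetes, search-domain expansion. A pod's default `ndots:5` setting makes the resolver try several cluster suffixes before the real external name, and each of those attempts is answered NXDOMAIN. Spikes flag misconfigured search domains, applications looking up nonexistent services, or random-subdomain floods. SERVFAIL is more concerning: it means CoreDNS couldn't get an answer at all (upstream failure, plugin error).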
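    To see why search-domain expansion inflates NXDOMAIN counts, the sketch below lists the names a stub resolver tries, in order, under `resolv.conf(5)` `ndots` rules. Pods using the default `ClusterFirst` DNS policy typically get `options ndots:5` and a search list like `<namespace>.svc.cluster.local svc.cluster.local cluster.local`. The `default` namespace in the usage note is an assumption. For an external name, every suffixed attempt comes back NXDOMAIN before the bare name finally resolves.

```python
from typing import List, Sequence


def candidate_names(name: str, search: Sequence[str], ndots: int) -> List[str]:
    """Order in which a stub resolver tries a name, per resolv.conf(5).

    A name ending in "." is absolute and tried as-is only. Otherwise, a
    name with at least `ndots` dots is tried as-is first, then with each
    search suffix; a name with fewer dots gets the search suffixes first.
    The resolver stops at the first candidate that resolves.
    """
    if name.endswith("."):
        return [name]
    with_suffix = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + with_suffix
    return with_suffix + [name]
```

    With `search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]` and `ndots=5`, looking up `api.example.com` produces three NXDOMAIN answers before the bare name is tried. Clients that query both A and AAAA double that. A trailing dot (`api.example.com.`) or a lower `ndots` set through the pod's `dnsConfig` skips the suffixed attempts.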
    How do I debug CoreDNS in Kubernetes?
    Three layers: (1) check pod logs (`kubectl logs -n kube-system -l k8s-app=kube-dns`), (2) test resolution from inside a pod (`kubectl exec ... -- nslookup kubernetes.default`), (3) read Prometheus metrics for SERVFAIL rate per plugin. The `log` plugin can be temporarily added to the Corefile for per-query log output. For high-volume production tracing, the `dnstap` plugin streams query and response data to an external collector, which is generally lighter on the query path than per-query logging.
    How do I monitor CoreDNS forward plugin latency?
    Read the forward latency histogram, labeled by upstream resolver address (`to`). On CoreDNS 1.11+ it is `coredns_proxy_request_duration_seconds{proxy_name="forward"}`; older releases use `coredns_forward_request_duration_seconds`. Track p95 and p99 per upstream. Slow upstreams show up here, separate from CoreDNS-internal latency. The histogram's `rcode` label (or `coredns_forward_responses_total` on pre-1.11 releases) gives upstream-specific SERVFAIL rates. Alert on p99 > 500ms per upstream.
    When should I use NodeLocal DNSCache?
    Use it when the cluster has more than about 100 nodes, or when any cluster hits UDP conntrack races (intermittent DNS timeouts under load). NodeLocal DNSCache runs a CoreDNS-based caching agent as a DaemonSet on every node, bound to `169.254.20.10:53`. Pod queries then skip the conntrack entry that DNAT to the kube-dns Service would otherwise create. Cluster CoreDNS load typically drops 70 to 90%, and cache hits are answered on the node itself without a network hop to the cluster CoreDNS pods. Monitor per-node hit rate (target 95%+).
    What CoreDNS versions are supported?
    CoreDNS 1.11.x, 1.12.x, and 1.13.x are fully supported. 1.12 added MCS-API multicluster service discovery, startup timeout config, and IPv6 hostname handling in the `kubernetes` plugin. 1.13.2 (Dec 2025) is the current stable. K8s 1.30+ ships CoreDNS 1.11.x by default; newer distros ship 1.12.x. Xitogent auto-detects the version and adapts.

    Start monitoring CoreDNS today

    Set up in under 60 seconds. No credit card required. Full metrics from day one.
