Overview
CoreDNS is the default DNS server for Kubernetes and is widely used in cloud-native environments. Monitoring CoreDNS ensures fast DNS resolution, healthy cache performance, and reliable service discovery across your infrastructure.
Prerequisites
- A server or Kubernetes cluster running CoreDNS
- Xitogent agent installed on the host
- CoreDNS Prometheus metrics endpoint enabled (the `prometheus` plugin listens on `localhost:9153` by default)
- An active Xitoring account
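Before installing the agent, it can help to confirm that the metrics endpoint actually answers. The sketch below is a small helper, not part of Xitogent: the function name `count_coredns_series` is made up for illustration, and the URL in the usage comment assumes the default listen address.

```shell
# Count CoreDNS metric samples in Prometheus text format read from stdin.
# Comment lines (# HELP / # TYPE) and non-CoreDNS metrics are ignored.
count_coredns_series() {
  # grep -c exits non-zero when nothing matches; "|| true" keeps the
  # helper usable in scripts that run with "set -e".
  grep -c '^coredns_' || true
}

# Usage (assumes the endpoint is reachable on the default address):
#   curl -s http://localhost:9153/metrics | count_coredns_series
```

A result of `0` suggests the `prometheus` plugin is not enabled or the endpoint is bound to a different address.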
Step 1 — Install Xitogent
Install the Xitoring agent on the host running CoreDNS:
curl -s https://xitoring.com/install.sh | sudo bash -s -- --key=YOUR_API_KEY
Step 2 — Enable the CoreDNS Integration
Run the integration command:
sudo xitogent integrate
Xitogent will connect to the CoreDNS Prometheus endpoint and begin collecting DNS metrics.
Key Metrics to Monitor
| Metric | Description |
|---|---|
| Queries/sec | Total DNS query rate across all zones |
| Cache Hit Ratio | Percentage of queries served from cache |
| Resolution Latency | Average time to resolve DNS queries |
| SERVFAIL Rate | Percentage of queries resulting in server failures |
| NXDOMAIN Rate | Rate of queries answered with NXDOMAIN (non-existent domain) |
| Upstream Latency | Response time for forwarded queries |
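To make the cache hit ratio concrete, here is a minimal sketch that derives it from the raw Prometheus counters. It assumes the counter names `coredns_cache_hits_total` and `coredns_cache_requests_total` exposed by recent CoreDNS releases (names can differ between versions), and it is not necessarily how Xitogent computes the value. The function name `cache_hit_ratio` is hypothetical.

```shell
# Compute a cache hit ratio (percent) from Prometheus text format on stdin.
# Assumption: hits and requests are exposed as the two counters named below.
cache_hit_ratio() {
  awk '
    /^coredns_cache_hits_total/     { hits += $NF }   # sum over all label sets
    /^coredns_cache_requests_total/ { reqs += $NF }
    END {
      if (reqs > 0) printf "%.2f\n", 100 * hits / reqs
      else print "n/a"                                # no cache traffic seen
    }
  '
}

# Usage (default endpoint address assumed):
#   curl -s http://localhost:9153/metrics | cache_hit_ratio
```

Because the counters are cumulative since the process started, this gives a lifetime ratio. A ratio over a recent window needs two samples and the difference between them.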
Step 3 — Configure Triggers
Set up alerts for DNS health:
- SERVFAIL Rate (Critical) — Fires when the DNS resolution failure rate exceeds the configured threshold, which points to upstream or configuration issues
- Cache Hit Ratio (Warning) — Alerts when cache effectiveness drops below expected levels
- Resolution Latency (Warning) — Triggers on slow DNS resolution that could impact application performance
- Query Rate (Warning) — Fires on unusual query volume that could indicate a DNS amplification attack or misconfigured service
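The rate-based triggers above can be sanity-checked by hand against the raw counters. The following sketch assumes the `coredns_dns_responses_total` counter with an `rcode` label, as exposed by current CoreDNS releases. The helper names `rcode_rate` and `exceeds`, and the 1% threshold in the usage comment, are illustrative assumptions rather than Xitoring defaults.

```shell
# Percentage of responses with the given rcode, from Prometheus text on stdin.
rcode_rate() {
  awk -v rcode="$1" '
    /^coredns_dns_responses_total/ {
      total += $NF
      # Match the label pair rcode="<RCODE>" anywhere in the sample line.
      if (index($0, "rcode=\"" rcode "\"")) matched += $NF
    }
    END {
      if (total > 0) printf "%.2f\n", 100 * matched / total
      else print "n/a"
    }
  '
}

# Succeeds (exit 0) when VALUE is numerically greater than THRESHOLD.
exceeds() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v > t) }'
}

# Usage (endpoint address and 1% threshold are assumptions):
#   rate=$(curl -s http://localhost:9153/metrics | rcode_rate SERVFAIL)
#   exceeds "$rate" 1 && echo "SERVFAIL rate above threshold: $rate%"
```

The same helper works for the NXDOMAIN rate by passing `NXDOMAIN` instead of `SERVFAIL`.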
Monitoring in Kubernetes
When monitoring CoreDNS in Kubernetes:
- Deploy Xitogent as a DaemonSet — Ensure the agent runs on nodes hosting CoreDNS pods
- Expose metrics endpoint — CoreDNS exposes Prometheus metrics on port 9153 by default via the `prometheus` plugin
- Monitor pod restarts — Frequent CoreDNS pod restarts indicate configuration or resource issues
- Track per-zone metrics — Identify which zones generate the most queries or errors
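As a rough way to spot the frequent restarts mentioned above, the sketch below filters a list of pod names and restart counts. The function name `pods_over_restart_limit` is hypothetical, and the `kubectl` invocation in the usage comment assumes CoreDNS runs in `kube-system` with the label `k8s-app=kube-dns`, which is common but not universal.

```shell
# Print the names of pods whose restart count exceeds LIMIT.
# Input (stdin): one "NAME RESTARTS" pair per line.
pods_over_restart_limit() {
  awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

# Usage (namespace, label selector, and limit are assumptions):
#   kubectl -n kube-system get pods -l k8s-app=kube-dns --no-headers \
#     -o 'custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount' \
#     | pods_over_restart_limit 3
```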
Best Practices
- Ensure the Prometheus plugin is enabled — CoreDNS must have the `prometheus` plugin in its Corefile for metrics collection
- Monitor cache sizing — An undersized cache leads to low hit ratios and increased upstream load
- Set up DNS uptime checks — Create Xitoring DNS checks to verify resolution from external locations
- Correlate with application metrics — Slow DNS often cascades into application latency
Troubleshooting
- No metrics collected: Verify the Prometheus plugin is enabled in your Corefile with `prometheus :9153`
- High SERVFAIL rate: Check upstream resolver connectivity and CoreDNS forward plugin configuration
- Cache hit ratio too low: Consider increasing the TTL or capacity configured for the `cache` plugin in the Corefile
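All three checks above come down to reading the relevant Corefile directives. A minimal sketch, assuming the Corefile is available on stdin: it prints only the `prometheus`, `forward`, and `cache` lines so they can be reviewed at a glance. The function name `corefile_directives` is made up, and the ConfigMap name in the usage comment is the one commonly used by kubeadm-style clusters, which may not match yours.

```shell
# Print the prometheus, forward, and cache directives from a Corefile on stdin,
# with leading indentation removed.
corefile_directives() {
  awk '
    $1 == "prometheus" || $1 == "forward" || $1 == "cache" {
      $1 = $1   # rebuild the record so fields are single-space separated
      print
    }
  '
}

# Usage on a host (path is an assumption):
#   corefile_directives < /etc/coredns/Corefile
# Usage in Kubernetes (ConfigMap name and namespace are assumptions):
#   kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' \
#     | corefile_directives
```

If no `prometheus` line appears in the output, metrics collection cannot work until the plugin is added to the server block.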