How to Monitor CoreDNS with Xitoring

Overview

CoreDNS is the default DNS server for Kubernetes and is widely used in cloud-native environments. Monitoring CoreDNS helps you confirm that DNS resolution stays fast, that the cache is effective, and that service discovery remains reliable across your infrastructure.

Prerequisites

A server or Kubernetes cluster running CoreDNS
Xitogent agent installed on the host
CoreDNS prometheus plugin enabled so that the metrics endpoint is exposed (port 9153 by default)
An active Xitoring account
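
Before installing the agent, it can help to confirm that the metrics endpoint actually answers. The sketch below assumes CoreDNS is reachable on localhost:9153 and that your build exports the coredns_build_info metric, as recent releases do. The helper name has_coredns_metrics is made up for this example and is not part of CoreDNS or Xitogent.

```shell
# Succeeds (exit status 0) if the Prometheus text on stdin looks like CoreDNS output.
# Assumption: the build exports coredns_build_info, as recent releases do.
has_coredns_metrics() {
    grep -q '^coredns_build_info'
}

# Usage (assumes the endpoint listens on localhost:9153):
#   curl -s http://localhost:9153/metrics | has_coredns_metrics \
#       && echo "metrics endpoint OK" \
#       || echo "no CoreDNS metrics found"
```

If the check fails, the prometheus plugin is probably missing from the Corefile or is bound to a different address.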

Step 1 — Install Xitogent

Install the Xitoring agent on the host running CoreDNS:

curl -s https://xitoring.com/install.sh | sudo bash -s -- --key=YOUR_API_KEY

Step 2 — Enable the CoreDNS Integration

Run the integration command:

sudo xitogent integrate

Xitogent will connect to the CoreDNS Prometheus endpoint and begin collecting DNS metrics.

Key Metrics to Monitor

Metric	Description
Queries/sec	Total DNS query rate across all zones
Cache Hit Ratio	Percentage of queries served from cache
Resolution Latency	Average time to resolve DNS queries
SERVFAIL Rate	Percentage of queries resulting in server failures
NXDOMAIN Rate	Rate of queries for non-existent domains
Upstream Latency	Response time for forwarded queries
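
For orientation, two of these values can be derived by hand from the raw Prometheus output. This is a minimal sketch and not a description of how Xitogent computes them. It assumes the counter names coredns_cache_hits_total, coredns_cache_misses_total, and coredns_dns_responses_total used by recent CoreDNS releases, and the function names are hypothetical. Because these are cumulative counters, the results cover the whole period since CoreDNS started rather than a recent window.

```shell
# Cache hit ratio in percent: hits / (hits + misses), summed over all label sets.
# Reads Prometheus text from stdin; prints "n/a" when no cache counters are present.
cache_hit_ratio() {
    awk '
        /^coredns_cache_hits_total[{ ]/   { hits += $NF }
        /^coredns_cache_misses_total[{ ]/ { misses += $NF }
        END {
            total = hits + misses
            if (total > 0) printf "%.2f\n", 100 * hits / total
            else print "n/a"
        }'
}

# Share of responses (in percent) whose rcode label is SERVFAIL.
servfail_percent() {
    awk '
        /^coredns_dns_responses_total[{ ]/ {
            total += $NF
            if ($0 ~ /rcode="SERVFAIL"/) servfail += $NF
        }
        END {
            if (total > 0) printf "%.2f\n", 100 * servfail / total
            else print "n/a"
        }'
}

# Usage (assumes the endpoint listens on localhost:9153):
#   curl -s http://localhost:9153/metrics | cache_hit_ratio
#   curl -s http://localhost:9153/metrics | servfail_percent
```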

Step 3 — Configure Triggers

Set up alerts for DNS health:

SERVFAIL Rate (Critical) — Fires when DNS resolution failure rate exceeds threshold, indicating upstream or configuration issues
Cache Hit Ratio (Warning) — Alerts when cache effectiveness drops below expected levels
Resolution Latency (Warning) — Triggers on slow DNS resolution that could impact application performance
Query Rate (Warning) — Fires on unusual query volume that could indicate a DNS amplification attack or misconfigured service
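
The arithmetic behind a rate-based trigger can be sketched as follows. This is an illustration only, not Xitoring's implementation: counter_rate and severity are hypothetical helpers, and the thresholds in the usage comments are placeholders that you would replace with values suited to your own traffic.

```shell
# Per-second rate from two readings of a cumulative counter taken <secs> apart.
# If the counter went backwards (for example after a CoreDNS restart), the
# current reading is treated as the increase since the reset.
counter_rate() {
    awk -v prev="$1" -v curr="$2" -v secs="$3" 'BEGIN {
        if (secs + 0 <= 0) { print "n/a"; exit }
        delta = curr - prev
        if (delta < 0) delta = curr
        printf "%.2f\n", delta / secs
    }'
}

# Classify a value against warning and critical thresholds (higher is worse).
severity() {
    awk -v v="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        if (v + 0 >= crit + 0) print "CRITICAL"
        else if (v + 0 >= warn + 0) print "WARNING"
        else print "OK"
    }'
}

# Usage with placeholder numbers:
#   counter_rate 1000 1600 60    # two readings 60 seconds apart
#   severity 2.5 1 5             # value 2.5, warning at 1, critical at 5
```

For metrics where lower is worse, such as cache hit ratio, the comparisons would need to be inverted.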

Monitoring in Kubernetes

When monitoring CoreDNS in Kubernetes:

Deploy Xitogent as a DaemonSet — Ensure the agent runs on nodes hosting CoreDNS pods
Expose metrics endpoint — CoreDNS exposes Prometheus metrics on port 9153 by default via the prometheus plugin
Monitor pod restarts — Frequent CoreDNS pod restarts indicate configuration or resource issues
Track per-zone metrics — Identify which zones generate the most queries or errors
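
Per-zone tracking can be spot-checked by hand from the raw metrics. The sketch below assumes the coredns_dns_requests_total counter carries a zone label, as it does in recent CoreDNS releases; requests_by_zone is a hypothetical helper name. The kubectl commands in the comments assume a kubeadm-style cluster where CoreDNS runs as the coredns Deployment in kube-system with the label k8s-app=kube-dns, so adjust them if your distribution differs.

```shell
# Sum request counters per zone and list the busiest zones first.
# Reads Prometheus text from stdin; prints "<zone> <total>" per line.
requests_by_zone() {
    awk '
        /^coredns_dns_requests_total[{]/ {
            if (match($0, /zone="[^"]*"/)) {
                # Strip the leading zone=" (6 chars) and the closing quote.
                zone = substr($0, RSTART + 6, RLENGTH - 7)
                count[zone] += $NF
            }
        }
        END { for (z in count) printf "%s %d\n", z, count[z] }
    ' | sort -k2,2nr
}

# Usage (cluster layout is an assumption, see above):
#   kubectl -n kube-system port-forward deploy/coredns 9153:9153 &
#   curl -s http://localhost:9153/metrics | requests_by_zone
#
# Pod restarts appear in the RESTARTS column of:
#   kubectl -n kube-system get pods -l k8s-app=kube-dns
```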

Best Practices

Ensure the Prometheus plugin is enabled — CoreDNS must have the prometheus plugin in its Corefile for metrics collection
Monitor cache sizing — An undersized cache leads to low hit ratios and increased upstream load
Set up DNS uptime checks — Create Xitoring DNS checks to verify resolution from external locations
Correlate with application metrics — Slow DNS often cascades into application latency

Troubleshooting

No metrics collected: Verify that the prometheus plugin is enabled in your Corefile, for example with the directive prometheus :9153
High SERVFAIL rate: Check connectivity to the upstream resolvers and the CoreDNS forward plugin configuration
Cache hit ratio too low: Consider increasing the TTL or the capacity configured for the cache plugin in the Corefile
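
The three checks above all come back to the Corefile, so a quick scan of it is a reasonable first step. The following is a rough sketch: check_corefile is a hypothetical helper that only looks for the directive names and does not validate their arguments. The embedded Corefile is illustrative, and its TTL and capacity values are examples rather than recommendations.

```shell
# Report whether the plugins this guide relies on appear as directives in a Corefile.
# Commented-out lines are ignored because the name must start the line.
check_corefile() {
    for plugin in prometheus forward cache; do
        if grep -Eq "^[[:space:]]*${plugin}([[:space:]]|\$)" "$1"; then
            echo "$plugin: present"
        else
            echo "$plugin: MISSING"
        fi
    done
}

# Demo against an illustrative Corefile written to a temporary file.
sample_corefile=$(mktemp)
cat > "$sample_corefile" <<'EOF'
.:53 {
    errors
    prometheus :9153
    forward . /etc/resolv.conf
    cache 300 {
        success 20000
        denial 5000
    }
}
EOF
check_corefile "$sample_corefile"
rm -f "$sample_corefile"
```

In a kubeadm-style cluster the live Corefile is usually stored in the coredns ConfigMap in kube-system, which kubectl -n kube-system get configmap coredns -o yaml will print; other distributions may keep it elsewhere.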