
    How to Monitor CoreDNS with Xitoring

    Overview

    CoreDNS is the default DNS server for Kubernetes and is widely used in cloud-native environments. Monitoring CoreDNS helps you catch slow DNS resolution, poor cache performance, and service discovery failures before they affect the rest of your infrastructure.

    Prerequisites

    • A server or Kubernetes cluster running CoreDNS
    • Xitogent agent installed on the host
    • CoreDNS Prometheus metrics endpoint enabled (default on port 9153)
    • An active Xitoring account
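Before installing the agent, it can help to confirm that the metrics endpoint listed above actually responds. The sketch below assumes CoreDNS is reachable on localhost and serves metrics at the conventional /metrics path. count_coredns_samples is a hypothetical helper name, not part of Xitogent or CoreDNS.

```shell
# Hypothetical helper: counts CoreDNS samples in Prometheus text read from stdin.
# CoreDNS metric names start with "coredns_"; HELP/TYPE comment lines start with "#".
count_coredns_samples() {
  grep -c '^coredns_' || true   # grep -c prints 0 but exits 1 when nothing matches
}

# Usage (assumes the endpoint listens on localhost:9153):
#   curl -s http://localhost:9153/metrics | count_coredns_samples
```

A count of 0 suggests that the prometheus plugin is not enabled or that the endpoint is not reachable from this host.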

    Step 1 — Install Xitogent

    Install the Xitoring agent on the host running CoreDNS:

    curl -s https://xitoring.com/install.sh | sudo bash -s -- --key=YOUR_API_KEY
    

    Step 2 — Enable the CoreDNS Integration

    Run the integration command:

    sudo xitogent integrate
    

    Xitogent will connect to the CoreDNS Prometheus endpoint and begin collecting DNS metrics.

    Key Metrics to Monitor

    Metric               Description
    Queries/sec          Total DNS query rate across all zones
    Cache Hit Ratio      Percentage of queries served from cache
    Resolution Latency   Average time to resolve DNS queries
    SERVFAIL Rate        Percentage of queries resulting in server failures
    NXDOMAIN Rate        Queries for non-existent domains
    Upstream Latency     Response time for forwarded queries
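To make the Cache Hit Ratio metric concrete, here is a rough sketch of how such a value could be derived from the raw Prometheus counters. It assumes the counter names coredns_cache_hits_total and coredns_cache_misses_total, which vary between CoreDNS versions (newer releases deprecate the misses counter in favor of coredns_cache_requests_total). It does not describe how Xitogent computes the metric internally, and cache_hit_ratio is a hypothetical helper name.

```shell
# Hypothetical helper: prints the cache hit ratio in percent.
# Reads Prometheus text on stdin; assumes samples carry no trailing timestamp,
# so the last field of each line is the counter value.
cache_hit_ratio() {
  awk '
    /^coredns_cache_hits_total/   { hits   += $NF }   # summed over all label sets
    /^coredns_cache_misses_total/ { misses += $NF }
    END {
      total = hits + misses
      if (total == 0) { print "n/a"; exit }
      printf "%.1f\n", 100 * hits / total
    }'
}

# Usage (assumes the endpoint listens on localhost:9153):
#   curl -s http://localhost:9153/metrics | cache_hit_ratio
```

Because these counters accumulate from process start, the result is a lifetime ratio. A ratio over a recent window would need the difference between two scrapes.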

    Step 3 — Configure Triggers

    Set up alerts for DNS health:

    • SERVFAIL Rate (Critical): Fires when the DNS resolution failure rate exceeds the configured threshold, which usually points to upstream or configuration issues
    • Cache Hit Ratio (Warning): Alerts when cache effectiveness drops below the expected level
    • Resolution Latency (Warning): Triggers on slow DNS resolution that could impact application performance
    • Query Rate (Warning): Fires on unusual query volume that could indicate a DNS amplification attack or a misconfigured service
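When choosing a SERVFAIL threshold, it can be useful to check the current baseline by hand. The sketch below derives a SERVFAIL percentage from the coredns_dns_responses_total counter and its rcode label. servfail_rate is a hypothetical helper name, and the calculation is an illustration rather than the formula Xitoring uses.

```shell
# Hypothetical helper: prints the share of responses with rcode SERVFAIL, in percent.
# Reads Prometheus text on stdin; assumes samples carry no trailing timestamp.
servfail_rate() {
  awk '
    /^coredns_dns_responses_total[{]/ {
      total += $NF                               # every response, any rcode
      if ($0 ~ /rcode="SERVFAIL"/) fail += $NF   # only server failures
    }
    END {
      if (total == 0) { print "n/a"; exit }
      printf "%.2f\n", 100 * fail / total
    }'
}

# Usage (assumes the endpoint listens on localhost:9153):
#   curl -s http://localhost:9153/metrics | servfail_rate
```

The value covers the whole lifetime of the CoreDNS process, so a short spike may be diluted. Comparing two scrapes taken a minute apart gives a more current picture.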

    Monitoring in Kubernetes

    When monitoring CoreDNS in Kubernetes:

    1. Deploy Xitogent as a DaemonSet — Ensure the agent runs on nodes hosting CoreDNS pods
    2. Expose metrics endpoint — CoreDNS exposes Prometheus metrics on port 9153 by default via the prometheus plugin
    3. Monitor pod restarts — Frequent CoreDNS pod restarts indicate configuration or resource issues
    4. Track per-zone metrics — Identify which zones generate the most queries or errors
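For the pod restart check, a quick manual look is possible with kubectl. The sketch below assumes the CoreDNS pods run in the kube-system namespace with the label k8s-app=kube-dns, which is common in kubeadm-style clusters but may differ in yours. flag_restarting_pods is a hypothetical helper name.

```shell
# Hypothetical helper: reads "NAME RESTARTS" lines on stdin and prints the pods
# whose restart count exceeds the limit given as $1 (default: 3).
flag_restarting_pods() {
  awk -v limit="${1:-3}" '$2 + 0 > limit + 0 { print $1 }'
}

# Usage (namespace and label are assumptions, adjust to your cluster):
#   kubectl -n kube-system get pods -l k8s-app=kube-dns --no-headers \
#     -o 'custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount' \
#     | flag_restarting_pods 3
```

Any pod name printed here is worth inspecting with kubectl describe and kubectl logs before adjusting trigger thresholds.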

    Best Practices

    1. Ensure the Prometheus plugin is enabled — CoreDNS must have the prometheus plugin in its Corefile for metrics collection
    2. Monitor cache sizing — An undersized cache leads to low hit ratios and increased upstream load
    3. Set up DNS uptime checks — Create Xitoring DNS checks to verify resolution from external locations
    4. Correlate with application metrics — Slow DNS often cascades into application latency

    Troubleshooting

    • No metrics collected: Verify that your Corefile enables the prometheus plugin, for example with the line prometheus :9153
    • High SERVFAIL rate: Check upstream resolver connectivity and CoreDNS forward plugin configuration
    • Cache hit ratio too low: Consider raising the TTL or capacity settings of the cache plugin in the Corefile
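The first check above can be scripted. The sketch below looks for an uncommented prometheus directive in a Corefile. has_prometheus_plugin is a hypothetical helper name, and the file path and ConfigMap name in the usage lines are assumptions that depend on how CoreDNS was deployed.

```shell
# Hypothetical helper: succeeds (exit 0) if the Corefile at $1 contains an
# uncommented "prometheus" directive, and fails otherwise.
has_prometheus_plugin() {
  grep -Eq '^[[:space:]]*prometheus([[:space:]]|$)' "$1"
}

# Usage on a host (path is an assumption):
#   has_prometheus_plugin /etc/coredns/Corefile && echo "prometheus plugin enabled"
#
# Usage in Kubernetes (assumes the ConfigMap is named "coredns" in kube-system):
#   kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' > /tmp/Corefile
#   has_prometheus_plugin /tmp/Corefile || echo "prometheus plugin missing"
```

This only confirms that the directive is present. It does not verify that port 9153 is reachable from the host where Xitogent runs.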