    Updated May 2026

    Elasticsearch Monitoring

    Monitor Elasticsearch cluster status (green/yellow/red), unassigned shards, JVM heap usage, GC pauses, indexing/search rate, thread pool rejections, and ILM tier health in real time — via the native `_cluster/health` and `_nodes/stats` APIs.

    Why monitor Elasticsearch?

    Elasticsearch powers application logs (ELK), full-text and vector search, and modern observability pipelines. When the cluster degrades (yellow or red), shards go unassigned, or JVM heap pressure spikes, search outages and ingest losses follow within minutes. Monitoring catches cluster health drift before pod restarts cascade across the fleet.

    Auto-discovery via Xitogent — no manual configuration required
    Cluster health status and shard allocation tracking
    JVM heap usage and garbage collection monitoring
    Indexing and search rate performance metrics
    Per-node resource utilization breakdown
    Pending tasks and circuit breaker monitoring
    Works on both Linux and Windows servers
    1-minute metric collection intervals
    What is Elasticsearch monitoring?

    Elasticsearch monitoring, explained

    Elasticsearch monitoring catches cluster degradation (yellow / red status), unassigned shards, JVM heap pressure, GC pause spikes, and thread pool rejections before they cause search outages, ingest failures, or data loss. For ELK log pipelines, vector-search workloads, and any production cluster, per-node visibility is what separates a 30-second auto-recovery from a cluster-wide outage. Xitoring auto-discovers your Elasticsearch nodes, queries the native APIs as a user with cluster-monitoring privileges, and routes alerts to Slack, PagerDuty, Telegram, or your existing on-call tooling.

    Metrics

    What we monitor

    Cluster Health

    Overall cluster status (green/yellow/red) and active shard count.

    JVM Heap Usage

    Heap used, committed, and max across all nodes with GC stats.

    Indexing Rate

    Documents indexed per second across the cluster.

    Search Rate

    Search queries per second and average search latency.

    Shard Count

    Active, relocating, initializing, and unassigned shards.

    Pending Tasks

    Cluster-level pending tasks that can indicate bottlenecks.

    CPU Usage

    Per-node CPU utilization and OS-level load averages.

    Disk Usage

    Storage used per node and available disk space.

    Thread Pool

    Active, queued, and rejected tasks in each thread pool.

    Circuit Breakers

    Memory estimates and trip counts for request/fielddata/in-flight breakers.

    Segment Count

    Number of Lucene segments and merge activity.

    Fielddata Cache

    Fielddata cache size and eviction count.

    Triggers & Alerts

    Configurable alert triggers

    Set up custom triggers in your dashboard to get notified the moment Elasticsearch metrics cross your defined thresholds.

    Cluster Health

    critical

    Fires when cluster status degrades to yellow or red, indicating shard allocation issues.

    JVM Heap Usage

    critical

    Triggers when JVM heap usage exceeds the configured threshold, risking out-of-memory errors and node instability.

    Indexing Rate

    warning

    Alerts when indexing throughput drops below baseline, indicating ingestion pipeline issues.

    Search Latency

    warning

    Fires when average search latency exceeds the configured threshold, degrading user-facing search quality.

    Unassigned Shards

    critical

    Triggers when shards remain unassigned, leaving data under-replicated and at risk.

    Thread Pool Rejections

    warning

    Alerts when thread pool queues overflow and start rejecting requests.

    01

    Importance of Elasticsearch Monitoring

    Elasticsearch underpins search functionality, log aggregation, and real-time analytics. Without monitoring, cluster degradation, JVM pressure, and unassigned shards can cascade into search outages and data loss.

    • Detect cluster health degradation before it impacts search availability
    • Monitor JVM heap to prevent out-of-memory crashes
    • Track indexing throughput to ensure data ingestion pipelines stay healthy
    • Identify unassigned shards that leave data under-replicated
    • Optimize search latency for user-facing applications
    02

    Why Choose Xitoring

    Xitoring delivers enterprise-grade Elasticsearch monitoring with zero-config setup. Our lightweight agent auto-discovers your Elasticsearch nodes, starts collecting metrics in under 60 seconds, and integrates with your existing notification channels.

    • One-command install — no complex YAML or config files
    • 15+ global monitoring nodes for low-latency checks
    • Unified dashboard for servers, search clusters, and uptime
    • Flexible alerting via Slack, PagerDuty, Telegram & more
    • Historical data retention for capacity planning & audits
    Use cases

    Common Elasticsearch monitoring scenarios

    Where Elasticsearch typically runs today — and what could go wrong if no one's watching.

    Logs and observability data

    When apps and servers stream their logs into Elasticsearch, any backlog or rejection means parts of the picture are missing — exactly when an incident makes them most needed. We catch the bottleneck while it's small, so the team's diagnostic history stays complete.

    Search for websites, apps, and AI features

    When search powers a website, an app, or an AI feature, a slow query directly hurts user experience and conversions. We watch response times and capacity so a search slowdown can be fixed before it becomes a revenue or product problem.

    Managed search on AWS or another cloud

    Cloud providers run the service for you, but they don't tell you when your own workload is overwhelming it or your queries are starting to slow down. We close that visibility gap so cost and performance both stay in your control.

    Before you start

    Prerequisites for Elasticsearch

    Make sure you've got these in place — most installs are a 60-second job once they are.

    • Elasticsearch 7.x, 8.x, or 9.x — OR OpenSearch 1.x / 2.x — reachable on the REST port (default 9200)
    • A monitoring user with the `monitor` cluster privilege if security is enabled (on OpenSearch, the equivalent `cluster:monitor/*` permissions)
    • TLS/CA configuration available to Xitogent if HTTPS is required
    Setup Guide

    Get started in minutes

    1

    Install Xitogent on your server

    If you haven't already, install the lightweight Xitogent monitoring agent on your server.

    curl -s https://xitoring.com/install.sh | sudo bash -s -- --key=YOUR_API_KEY
    2

    Verify Elasticsearch API is accessible

    Xitogent uses the Elasticsearch REST API to collect metrics. Verify the cluster is reachable:

    curl -s http://localhost:9200/_cluster/health | python3 -m json.tool
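
    On a cluster with security enabled, the plain HTTP request above is rejected without credentials. The following is a minimal Python sketch of the same check over HTTPS with HTTP Basic auth. The function names, the `monitor` / `s3cret` credentials, and the CA file path are illustrative assumptions, not part of Xitogent.

```python
import base64
import json
import ssl
import urllib.request


def build_health_request(base_url, username, password):
    """Build a GET request for _cluster/health with HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req = urllib.request.Request(base_url.rstrip("/") + "/_cluster/health")
    req.add_header("Authorization", "Basic " + token)
    return req


def fetch_health(req, ca_file=None, timeout=5):
    """Send the request and decode the JSON body.

    ca_file is an optional path to the CA bundle that signed the
    cluster's certificate (hypothetical path supplied by the caller).
    """
    context = ssl.create_default_context(cafile=ca_file) if ca_file else None
    with urllib.request.urlopen(req, timeout=timeout, context=context) as resp:
        return json.load(resp)


# Example usage (requires a reachable cluster, so it is left commented out):
# req = build_health_request("https://localhost:9200", "monitor", "s3cret")
# print(fetch_health(req, ca_file="/path/to/ca.crt")["status"])
```
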
    3

    Enable the Elasticsearch integration

    Use the Xitoring dashboard or CLI to enable the Elasticsearch integration.

    sudo xitogent integrate
    4

    Configure alert thresholds (optional)

    Set custom thresholds for cluster health, JVM heap, or indexing rate to get notified when something needs attention.

    5

    Verify it's working

    Run this command on the server to confirm Xitogent picked up the integration. Fresh metrics will start streaming to your dashboard within ~30 seconds.

    sudo xitogent status

    Frequently asked questions

    What is Elasticsearch monitoring?
    Elasticsearch monitoring is the continuous collection of cluster-, node-, and index-level performance data from the native REST APIs (`_cluster/health`, `_cluster/stats`, `_nodes/stats`, `_cat/indices`, `_cat/thread_pool`), combined with alerting when those metrics breach thresholds. The data covers cluster status, shard allocation, JVM heap, GC pauses, indexing and search rate, query latency, thread pool queue depth and rejections, and ILM tier health.
    How do I monitor Elasticsearch cluster health?
    Call `GET /_cluster/health` and read `status` (green/yellow/red) along with `active_shards`, `relocating_shards`, `initializing_shards`, `unassigned_shards`, and `number_of_pending_tasks`. Green means all shards are allocated; yellow means all primary shards are allocated but some replicas are missing (searches work, but redundancy is reduced); red means at least one primary shard is unassigned (some data is unavailable). Alert on yellow lasting more than 5 minutes and on red immediately. Xitogent surfaces these counters at every polling interval.
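
    The alerting rule in this answer (red immediately, yellow only after 5 minutes) can be sketched as a small Python function. The sample response is made up for illustration, and `should_alert`, `seconds_in_status`, and `yellow_grace_seconds` are hypothetical names; tracking how long a status has lasted is left to the caller.

```python
import json

# Trimmed, made-up example of a _cluster/health response body.
SAMPLE_HEALTH = json.loads("""
{
  "status": "yellow",
  "active_shards": 40,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 5,
  "number_of_pending_tasks": 0
}
""")


def should_alert(health, seconds_in_status, yellow_grace_seconds=300):
    """Decide whether a health reading should raise an alert.

    red    -> alert immediately (a primary shard is unassigned)
    yellow -> alert only once it has lasted longer than the grace period
    green  -> never alert
    """
    status = health["status"]
    if status == "red":
        return True
    if status == "yellow":
        return seconds_in_status > yellow_grace_seconds
    return False
```

    The grace period keeps short yellow phases, such as a rolling restart, from paging anyone.
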
    What does yellow cluster status mean and how do I fix unassigned shards?
    Yellow means every primary shard is allocated but at least one replica is not: the cluster is up but degraded, and losing a single node could cause data loss. Common causes: a node left the cluster, the disk-based allocation decider kicked in (a node above `cluster.routing.allocation.disk.watermark.low`, 85% by default, receives no new shard copies, and above the 90% high watermark shards are moved away), or `number_of_replicas` asks for more copies than the data nodes can hold (each copy of a shard needs its own node). Investigate with `GET /_cluster/allocation/explain` and fix the underlying issue (add disk, add nodes, or reduce replicas).
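
    The replica-count cause can be checked with simple arithmetic, because Elasticsearch never places two copies of the same shard on one node. A minimal sketch follows, with a hypothetical function name. It deliberately ignores disk watermarks, allocation awareness, and allocation filtering, which can reduce the usable node count further.

```python
def replicas_that_cannot_allocate(number_of_replicas, data_nodes):
    """Replica copies per shard that have no node to live on.

    Assumes the primary already occupies one data node and that a node
    never holds two copies of the same shard.
    """
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    hostable = data_nodes - 1  # nodes left over after the primary's node
    return max(number_of_replicas - hostable, 0)
```

    For example, an index with `number_of_replicas: 2` on a two-node cluster leaves one replica per shard unassigned, so the cluster stays yellow until a third data node joins or the replica count is lowered.
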
    How do I monitor JVM heap usage and GC pauses?
    Read `GET /_nodes/stats/jvm` for per-node `jvm.mem.heap_used_percent`, plus `jvm.gc.collectors.young` and `jvm.gc.collectors.old` (each with `collection_count` and `collection_time_in_millis`). Warn when heap stays above 75% and treat anything above 85% as critical. GC pause time per minute is the more actionable signal: old-generation pauses growing past a few hundred milliseconds mean the heap is too small or there's a memory leak (often fielddata caches). Lower the fielddata cache size or spread the load across more nodes.
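
    Because the GC counters are cumulative since node start, pause time per minute has to be derived from two consecutive polls. The sketch below assumes each poll yields the `jvm.gc.collectors.old` object with its `collection_time_in_millis` field. The function names and the 75% / 85% defaults mirror the thresholds above but are otherwise illustrative.

```python
def heap_severity(heap_used_percent, warn=75, crit=85):
    """Map jvm.mem.heap_used_percent to a severity label."""
    if heap_used_percent > crit:
        return "critical"
    if heap_used_percent > warn:
        return "warning"
    return "ok"


def gc_ms_per_minute(prev, curr, interval_seconds):
    """GC time spent per minute between two polls of one collector.

    prev and curr are the collector dicts from two consecutive polls.
    Returns None when the counter went backwards, which is treated
    here as a node restart.
    """
    delta_ms = curr["collection_time_in_millis"] - prev["collection_time_in_millis"]
    if delta_ms < 0:
        return None
    return delta_ms * 60.0 / interval_seconds
```
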
    How do I detect slow Elasticsearch queries?
    Enable the search slow log per index: `PUT /{index}/_settings { "index.search.slowlog.threshold.query.warn": "10s", "index.search.slowlog.threshold.query.info": "5s" }`. Slow searches are written to `logs/<cluster_name>_index_search_slowlog.json` along with the actual query body. For aggregated query-time data, read `indices.search.query_time_in_millis` and `indices.search.query_total` per node from `_nodes/stats`. Xitogent computes mean query latency and surfaces the top-N slow indices.
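
    Mean query latency over a polling window is the change in query time divided by the change in query count. A hedged sketch follows, with a hypothetical function name, taking the `indices.search` object from two polls of the same node. It is not a description of how Xitogent computes its figure.

```python
def mean_query_latency_ms(prev, curr):
    """Average query-phase time in ms between two polls of one node.

    prev and curr are indices.search dicts holding the cumulative
    query_total and query_time_in_millis counters. Returns None when no
    queries ran in the window or a counter went backwards (node restart).
    """
    queries = curr["query_total"] - prev["query_total"]
    millis = curr["query_time_in_millis"] - prev["query_time_in_millis"]
    if queries <= 0 or millis < 0:
        return None
    return millis / queries
```

    The counters are tracked at shard level, so a search that fans out to many shards contributes several query-phase operations; the result is best read as a trend rather than as end-to-end request latency.
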
    How do I monitor Elasticsearch indexing rate and refresh latency?
    `GET /_nodes/stats/indices` exposes `indexing.index_total` and `indexing.index_time_in_millis` (from which you can derive documents per second and per-document index time), `refresh.total` and `refresh.total_time_in_millis` (refresh cadence; refreshing too often hurts throughput), `merges.total` (background segment merges), and `flush.total`. For ingest pipelines, plot indexing rate alongside the write thread pool queue to spot the moment the cluster can no longer keep up with incoming documents.
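
    Indexing throughput comes from the same kind of counter delta. The sketch below assumes two polls of a node's `indices.indexing` object and a known polling interval; the function name and the returned keys are illustrative.

```python
def indexing_rates(prev, curr, interval_seconds):
    """Documents per second and ms per document between two polls.

    prev and curr are indices.indexing dicts with the cumulative
    index_total and index_time_in_millis counters. Returns None when a
    counter went backwards, which is treated here as a node restart.
    """
    docs = curr["index_total"] - prev["index_total"]
    millis = curr["index_time_in_millis"] - prev["index_time_in_millis"]
    if docs < 0 or millis < 0:
        return None
    return {
        "docs_per_second": docs / interval_seconds,
        "ms_per_doc": millis / docs if docs else 0.0,
    }
```
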
    What is a search thread pool and how do I monitor rejections?
    Each Elasticsearch node has dedicated thread pools — `search`, `write`, `refresh`, `flush`, `get`, `snapshot`, `management`. Each has a queue; when the queue fills, new tasks are rejected. `GET _cat/thread_pool?v&h=node_name,name,active,queue,rejected,completed&s=rejected:desc` shows queue depth and total rejections per pool. Any non-zero `rejected` rate is a hard signal — either scale horizontally or tune queue sizes (carefully — bigger queues hide the problem rather than fixing it).
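
    Since `rejected` is a running total, what matters for alerting is how much it grew since the previous poll. A minimal sketch follows, assuming the rows come from the `_cat/thread_pool` request above with `format=json` added (the cat APIs return numbers as strings in JSON mode). The function name is hypothetical.

```python
def new_rejections(prev_rows, curr_rows):
    """Rejections added per (node, pool) since the previous poll.

    Each row is a dict with node_name, name, and rejected keys. Pools
    that were not present in the previous poll are skipped, so a newly
    joined node does not trigger an alert for its lifetime total.
    """
    previous = {(r["node_name"], r["name"]): int(r["rejected"]) for r in prev_rows}
    increases = {}
    for row in curr_rows:
        key = (row["node_name"], row["name"])
        if key not in previous:
            continue
        delta = int(row["rejected"]) - previous[key]
        if delta > 0:
            increases[key] = delta
    return increases
```
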
    Elasticsearch vs OpenSearch monitoring — what's different?
    API-wise, mostly identical — `_cluster/health`, `_cat/*`, `_nodes/stats` all work the same way. The differences are: (1) X-Pack Monitoring / Stack Monitoring vs OpenSearch's built-in observability, (2) Elastic 8+ requires basic-auth or API-key by default; OpenSearch on AWS uses IAM-signed requests, (3) some Elastic-only features (BBQ, ELSER, ES|QL JOINs) aren't present in OpenSearch. Xitogent works with both via the shared REST surface.
    What Elasticsearch versions are supported?
    Elasticsearch 7.x, 8.x, and 9.x (built on Lucene 10, with BBQ vector quantization and ES|QL JOINs GA), plus OpenSearch 1.x and 2.x via the shared API. The integration adapts to whichever version is present — newer APIs (vector-search metrics, ILM frozen-tier stats, serverless cluster signals) surface where available.

    Start monitoring Elasticsearch today

    Set up in under 60 seconds. No credit card required. Full metrics from day one.

    Start Free Trial
