
Updated May 2026

Elasticsearch Monitoring

Monitor Elasticsearch cluster status (green/yellow/red), unassigned shards, JVM heap usage, GC pauses, indexing/search rate, thread pool rejections, and ILM tier health in real time — via the native `_cluster/health` and `_nodes/stats` APIs.


Why monitor Elasticsearch?

Elasticsearch powers application logs (ELK), full-text and vector search, and modern observability pipelines. When the cluster degrades to yellow or red, shards go unassigned, or JVM heap pressure spikes, search outages and ingest losses can follow within minutes. Monitoring catches cluster health drift before node restarts cascade across the fleet.

Auto-discovery via Xitogent — no manual configuration required

Cluster health status and shard allocation tracking

JVM heap usage and garbage collection monitoring

Indexing and search rate performance metrics

Per-node resource utilization breakdown

Pending tasks and circuit breaker monitoring

Works on both Linux and Windows servers

1-minute metric collection intervals

What is Elasticsearch monitoring?

Elasticsearch monitoring, explained

Elasticsearch monitoring catches cluster degradation (yellow or red status), unassigned shards, JVM heap pressure, GC pause spikes, and thread pool rejections before they cause search outages, ingest failures, or data loss. For ELK log pipelines, vector-search workloads, and any production cluster, per-node visibility is what separates a 30-second auto-recovery from a cluster-wide outage. Xitoring auto-discovers your Elasticsearch nodes, queries the native APIs as a user with cluster:monitor privileges, and routes alerts to Slack, PagerDuty, Telegram, or your existing on-call tooling.

Metrics

What we monitor

Cluster Health

Overall cluster status (green/yellow/red) and active shard count.

JVM Heap Usage

Heap used, committed, and max across all nodes with GC stats.

Indexing Rate

Documents indexed per second across the cluster.

Search Rate

Search queries per second and average search latency.

Shard Count

Active, relocating, initializing, and unassigned shards.

Pending Tasks

Cluster-level pending tasks that can indicate bottlenecks.

CPU Usage

Per-node CPU utilization and OS-level load averages.

Disk Usage

Storage used per node and available disk space.

Thread Pool

Active, queued, and rejected tasks in each thread pool.

Circuit Breakers

Memory estimates and trip counts for request/fielddata/in-flight breakers.

Segment Count

Number of Lucene segments and merge activity.

Fielddata Cache

Fielddata cache size and eviction count.

Triggers & Alerts

Configurable alert triggers

Set up custom triggers in your dashboard to get notified the moment Elasticsearch metrics cross your defined thresholds.

Elasticsearch monitoring trigger configuration dashboard

Cluster Health

critical

Fires when cluster status degrades to yellow or red, indicating shard allocation issues.

JVM Heap Usage

critical

Triggers when JVM heap usage exceeds threshold, risking out-of-memory errors and node instability.

Indexing Rate

warning

Alerts when indexing throughput drops below baseline, indicating ingestion pipeline issues.

Search Latency

warning

Fires when average search latency exceeds threshold, degrading user-facing search quality.

Unassigned Shards

critical

Triggers when shards remain unassigned, leaving data under-replicated and at risk.

Thread Pool Rejections

warning

Alerts when thread pool queues overflow and start rejecting requests.

Importance of Elasticsearch Monitoring

Elasticsearch underpins search functionality, log aggregation, and real-time analytics. Without monitoring, cluster degradation, JVM pressure, and unassigned shards can cascade into search outages and data loss.

Detect cluster health degradation before it impacts search availability
Monitor JVM heap to prevent out-of-memory crashes
Track indexing throughput to ensure data ingestion pipelines stay healthy
Identify unassigned shards that leave data under-replicated
Optimize search latency for user-facing applications

Elasticsearch cluster monitoring dashboard with health and search metrics

Elasticsearch performance dashboard with indexing and node metrics

Why Choose Xitoring

Xitoring delivers enterprise-grade Elasticsearch monitoring with zero-config setup. Our lightweight agent auto-discovers your Elasticsearch nodes, starts collecting metrics in under 60 seconds, and integrates with your existing notification channels.

One-command install — no complex YAML or config files
15+ global monitoring nodes for low-latency checks
Unified dashboard for servers, search clusters, and uptime
Flexible alerting via Slack, PagerDuty, Telegram & more
Historical data retention for capacity planning & audits

Xitoring Elasticsearch cluster monitoring overview

Alert notification and team channels configuration

Use cases

Common Elasticsearch monitoring scenarios

Where Elasticsearch typically runs today — and what could go wrong if no one's watching.

Logs and observability data

When apps and servers stream their logs into Elasticsearch, any backlog or rejection means parts of the picture are missing — exactly when an incident makes them most needed. We catch the bottleneck while it's small, so the team's diagnostic history stays complete.

Search for websites, apps, and AI features

When search powers a website, an app, or an AI feature, a slow query directly hurts user experience and conversions. We watch response times and capacity so a search slowdown can be fixed before it becomes a revenue or product problem.

Managed search on AWS or another cloud

Cloud providers run the service for you, but they don't tell you when your own workload is overwhelming it or your queries are starting to slow down. We close that visibility gap so cost and performance both stay in your control.

Before you start

Prerequisites for Elasticsearch

Make sure you've got these in place — most installs are a 60-second job once they are.

Elasticsearch 7.x, 8.x, or 9.x — OR OpenSearch 1.x / 2.x — reachable on the REST port (default 9200)
A monitoring user with cluster:monitor privileges if security is enabled (in an Elasticsearch role definition this is the `monitor` cluster privilege)
TLS/CA configuration available to Xitogent if HTTPS is required

Setup Guide

Get started in minutes

Install Xitogent on your server

If you haven't already, install the lightweight Xitogent monitoring agent on your server.

curl -s https://xitoring.com/install.sh | sudo bash -s -- --key=YOUR_API_KEY

Verify Elasticsearch API is accessible

Xitogent uses the Elasticsearch REST API to collect metrics. Verify the cluster is reachable:

curl -s http://localhost:9200/_cluster/health | python3 -m json.tool
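If you prefer to script the same reachability check, the sketch below shows one way to do it with the Python standard library. It is illustrative only: `build_health_request` and `fetch_cluster_health` are hypothetical helper names, the URL and credentials are placeholders, and it assumes the cluster accepts HTTP basic auth when security is enabled.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_health_request(base_url: str,
                         username: Optional[str] = None,
                         password: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for /_cluster/health (hypothetical helper)."""
    url = base_url.rstrip("/") + "/_cluster/health"
    request = urllib.request.Request(url, method="GET")
    if username is not None and password is not None:
        # Assumption: the cluster accepts HTTP basic auth.
        raw = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_cluster_health(base_url: str, timeout: float = 5.0, **auth) -> dict:
    """Send the request and decode the JSON body.

    Raises urllib.error.URLError (or HTTPError) if the cluster is unreachable
    or rejects the credentials.
    """
    request = build_health_request(base_url, **auth)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_cluster_health("http://localhost:9200")` on the server should return the same document the `curl` command prints; a connection or authentication error here usually means the agent will hit the same problem.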

Enable the Elasticsearch integration

Use the Xitoring dashboard or CLI to enable the Elasticsearch integration.

sudo xitogent integrate

Configure alert thresholds (optional)

Set custom thresholds for cluster health, JVM heap, or indexing rate to get notified when something needs attention.

Verify it's working

Run this command on the server to confirm Xitogent picked up the integration. Fresh metrics will start streaming to your dashboard within ~30 seconds.

sudo xitogent status

Compare

Considering alternatives?

See how Xitoring stacks up against the alternatives for Elasticsearch monitoring — flat pricing, deeper integrations, and one agent that covers your whole stack.

Xitoring vs

Datadog

Pay-per-host pricing gets expensive fast at scale. See where Xitoring delivers the same coverage on a flat plan.

Xitoring vs

New Relic

Full-stack observability without the enterprise tiers, ingestion fees, or seat-based licensing.

Xitoring vs

Grafana Cloud

One tool with one price instead of stitching Prometheus, Loki, and Grafana into a stack you also have to monitor.


Frequently asked questions

What is Elasticsearch monitoring?

Elasticsearch monitoring is the continuous collection of cluster-, node-, and index-level performance data from the native REST APIs (`_cluster/health`, `_cluster/stats`, `_nodes/stats`, `_cat/indices`, `_cat/thread_pool`) — cluster status, shard allocation, JVM heap, GC pauses, indexing/search rate, query latency, thread pool queue depth + rejections, ILM tier health — combined with alerting when those metrics breach thresholds.

How do I monitor Elasticsearch cluster health?

Hit `GET /_cluster/health` for status (green/yellow/red), `active_shards`, `relocating_shards`, `initializing_shards`, `unassigned_shards`, and `number_of_pending_tasks`. Green = all shards allocated; yellow = all primary shards allocated but some replicas missing (data is at risk but searches work); red = some primary shards unassigned (data is unavailable). Alert on yellow lasting >5 minutes and on red immediately. Xitogent surfaces the four shard counters on every polling interval.
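The alerting rule above (red immediately, yellow only after five minutes) can be sketched as a small function over the `_cluster/health` response. This is a minimal illustration, not Xitogent's implementation: `classify_cluster_health`, the severity labels, and the idea that the caller tracks how long the cluster has been yellow are all assumptions.

```python
from typing import Optional


def classify_cluster_health(health: dict,
                            yellow_for_seconds: float,
                            yellow_grace_seconds: float = 300.0) -> Optional[str]:
    """Return an alert severity for a parsed _cluster/health response.

    health: the decoded JSON body; only its "status" field is read.
    yellow_for_seconds: how long the caller has observed yellow so far.
    """
    status = health.get("status")
    if status == "red":
        # Some primary shards are unassigned: alert right away.
        return "critical"
    if status == "yellow" and yellow_for_seconds > yellow_grace_seconds:
        # Replicas have been missing for longer than the grace period.
        return "warning"
    return None
```

The grace period keeps routine events such as a rolling restart, where replicas are briefly unassigned, from paging anyone.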

What does yellow cluster status mean and how do I fix unassigned shards?

Yellow means every primary shard is allocated but at least one replica is not: the cluster is up but degraded, and losing a single node could cause data loss. Common causes: a node left the cluster; the disk-based allocation decider is blocking placement (nodes above `cluster.routing.allocation.disk.watermark.low`, 85% by default, are not assigned new replicas, and above `cluster.routing.allocation.disk.watermark.high`, 90% by default, Elasticsearch starts moving shards off the node); or `number_of_replicas` is set higher than the available data nodes can hold, since a replica never shares a node with its primary. Investigate with `GET /_cluster/allocation/explain` and fix the underlying issue (add disk, add nodes, or reduce replicas).
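As a rough sketch of how the `_cluster/allocation/explain` output can be summarised, the function below lists which allocation deciders said NO on each node. It assumes the response carries a `node_allocation_decisions` list whose entries have `node_name` and `deciders` (each with `decider` and `decision`); the helper names and the trimmed sample are hypothetical.

```python
from typing import Dict, List


def blocking_deciders(explain: dict) -> Dict[str, List[str]]:
    """Map each node name to the deciders that refused the shard."""
    blockers: Dict[str, List[str]] = {}
    for node in explain.get("node_allocation_decisions", []):
        refused = [
            decider.get("decider", "unknown")
            for decider in node.get("deciders", [])
            if decider.get("decision") == "NO"
        ]
        if refused:
            blockers[node.get("node_name", "unknown")] = refused
    return blockers


def replicas_fit(number_of_replicas: int, data_nodes: int) -> bool:
    """A primary and its replicas need one distinct data node each."""
    return data_nodes >= number_of_replicas + 1


# Trimmed, illustrative response for one unassigned replica.
sample = {
    "index": "logs-example",
    "shard": 0,
    "primary": False,
    "current_state": "unassigned",
    "node_allocation_decisions": [
        {"node_name": "es-data-1",
         "deciders": [{"decider": "disk_threshold", "decision": "NO"}]},
        {"node_name": "es-data-2",
         "deciders": [{"decider": "same_shard", "decision": "NO"}]},
    ],
}
print(blocking_deciders(sample))
```

In this sample, one node is refused by the disk decider and the other already holds a copy of the shard, which points at disk space or node count rather than a configuration error.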

How do I monitor JVM heap usage and GC pauses?

Read `_nodes/stats?metric=jvm` for per-node `jvm.mem.heap_used_percent`, plus `jvm.gc.collectors.young` and `jvm.gc.collectors.old` (`collection_count` and `collection_time_in_millis`). Warn when heap stays above 75% and treat sustained usage above 85% as critical. GC pause time per minute is the more actionable signal: old-gen pauses growing past a few hundred ms mean the heap is too small or memory is leaking (often into fielddata caches). Lower the fielddata cache size or spread the load across more nodes.
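The GC counters in node stats are cumulative, so pause time per minute has to be derived from two consecutive samples. Here is a minimal sketch of that arithmetic together with the 75% / 85% heap thresholds; the function names and severity labels are assumptions, and the inputs are the `collection_time_in_millis` and `heap_used_percent` values you have already extracted.

```python
from typing import Optional


def heap_severity(heap_used_percent: float,
                  warning_at: float = 75.0,
                  critical_at: float = 85.0) -> Optional[str]:
    """Classify one heap_used_percent reading against the thresholds."""
    if heap_used_percent > critical_at:
        return "critical"
    if heap_used_percent > warning_at:
        return "warning"
    return None


def gc_pause_ms_per_minute(previous_ms: int,
                           current_ms: int,
                           interval_seconds: float) -> float:
    """Convert two cumulative GC time readings into ms of GC per minute."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current_ms - previous_ms
    if delta < 0:
        # The counter restarts from zero when the node restarts, so the
        # current reading is all the GC time accumulated since then.
        delta = current_ms
    return delta * 60.0 / interval_seconds
```

For example, old-gen readings of 1000 ms and 1450 ms taken 60 seconds apart mean the node spent 450 ms of that minute in old-gen collection.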

How do I detect slow Elasticsearch queries?

Enable the slow log per index: `PUT /{index}/_settings { "index.search.slowlog.threshold.query.warn": "10s", "index.search.slowlog.threshold.query.info": "5s" }`. Slow searches land in `logs/<cluster_name>_index_search_slowlog.json` (the file name is prefixed with your cluster name) with the actual query body. For aggregated query-time data, read `_nodes/stats` for `indices.search.query_time_in_millis`/`query_total` per node. Xitogent computes mean query latency and surfaces top-N slow indices.
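Because `query_time_in_millis` and `query_total` are cumulative counters, mean latency for a polling window comes from the difference between two samples. The sketch below shows that calculation; `mean_query_latency_ms` is a hypothetical helper, and it expects the `indices.search` object of one node from each sample.

```python
from typing import Optional


def mean_query_latency_ms(previous: dict, current: dict) -> Optional[float]:
    """Mean query-phase latency (ms) between two indices.search samples.

    Returns None when no queries ran in the window or the counters went
    backwards (for example after a node restart).
    """
    queries = current["query_total"] - previous["query_total"]
    elapsed_ms = current["query_time_in_millis"] - previous["query_time_in_millis"]
    if queries <= 0 or elapsed_ms < 0:
        return None
    return elapsed_ms / queries
```

With 200 queries and 12,000 ms of query time between two samples, the window's mean latency is 60 ms. A mean hides outliers, which is why the slow log remains the tool for finding individual bad queries.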

How do I monitor Elasticsearch indexing rate and refresh latency?

`_nodes/stats?metric=indices` exposes `indexing.index_total`/`index_time_in_millis` (from which you can derive documents per second and per-document index time), `refresh.total`/`total_time_in_millis` (refresh cadence: refreshing too often kills throughput), `merges.total` (background segment merges), and `flush.total`. For ingest pipelines, plot indexing rate alongside the write thread pool queue to spot the moment the cluster can't keep up with incoming documents.
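A minimal sketch of the rate calculation, assuming you sample the `indices.indexing` object of a node at a fixed interval; `indexing_rates` and the keys of the dictionary it returns are illustrative names, not part of any API.

```python
from typing import Dict, Optional


def indexing_rates(previous: dict,
                   current: dict,
                   interval_seconds: float) -> Dict[str, Optional[float]]:
    """Derive throughput figures from two indices.indexing samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    docs = current["index_total"] - previous["index_total"]
    elapsed_ms = current["index_time_in_millis"] - previous["index_time_in_millis"]
    if docs < 0 or elapsed_ms < 0:
        # Counters went backwards (node restart): skip this window.
        return {"docs_per_second": None, "ms_per_doc": None}
    return {
        "docs_per_second": docs / interval_seconds,
        # Undefined when nothing was indexed in the window.
        "ms_per_doc": elapsed_ms / docs if docs else None,
    }
```

A falling `docs_per_second` with a rising `ms_per_doc` suggests the cluster is the bottleneck; a falling rate with a flat `ms_per_doc` more likely means the upstream pipeline is sending less.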

What is a search thread pool and how do I monitor rejections?

Each Elasticsearch node has dedicated thread pools — `search`, `write`, `refresh`, `flush`, `get`, `snapshot`, `management`. Each has a queue; when the queue fills, new tasks are rejected. `GET _cat/thread_pool?v&h=node_name,name,active,queue,rejected,completed&s=rejected:desc` shows queue depth and total rejections per pool. Any non-zero `rejected` rate is a hard signal — either scale horizontally or tune queue sizes (carefully — bigger queues hide the problem rather than fixing it).
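Since `rejected` is a running total, what matters for alerting is whether it grew since the last poll. The sketch below compares two snapshots of the `_cat/thread_pool` output requested with `format=json`, where each row is an object of string values; `rejection_deltas` is a hypothetical helper, and the rows are assumed to include the `node_name`, `name`, and `rejected` columns.

```python
from typing import Dict, List, Tuple


def rejection_deltas(previous_rows: List[dict],
                     current_rows: List[dict]) -> Dict[Tuple[str, str], int]:
    """Return new rejections per (node, pool) between two snapshots."""
    baseline = {
        (row["node_name"], row["name"]): int(row["rejected"])
        for row in previous_rows
    }
    deltas: Dict[Tuple[str, str], int] = {}
    for row in current_rows:
        key = (row["node_name"], row["name"])
        if key not in baseline:
            # First sighting of this node/pool: no window to compare yet.
            continue
        current = int(row["rejected"])
        # A smaller value than before means the node restarted and the
        # counter began again from zero.
        delta = current - baseline[key] if current >= baseline[key] else current
        if delta > 0:
            deltas[key] = delta
    return deltas
```

Any key in the result is a pool that rejected work during the last interval, which is the condition the paragraph above treats as a hard signal.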

Elasticsearch vs OpenSearch monitoring — what's different?

API-wise, mostly identical — `_cluster/health`, `_cat/*`, `_nodes/stats` all work the same way. The differences are: (1) X-Pack Monitoring / Stack Monitoring vs OpenSearch's built-in observability, (2) Elastic 8+ requires basic-auth or API-key by default; OpenSearch on AWS uses IAM-signed requests, (3) some Elastic-only features (BBQ, ELSER, ES|QL JOINs) aren't present in OpenSearch. Xitogent works with both via the shared REST surface.
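One way a client can tell the two apart is the root endpoint (`GET /`), which returns a `version` object. The sketch below assumes OpenSearch reports `"distribution": "opensearch"` inside that object while Elasticsearch omits the field; `detect_distribution` is a hypothetical helper and does not describe how Xitogent makes this decision.

```python
from typing import Tuple


def detect_distribution(root_info: dict) -> Tuple[str, int]:
    """Return (distribution, major_version) from a parsed GET / response."""
    version = root_info.get("version", {})
    # Assumption: only OpenSearch sets version.distribution.
    if version.get("distribution") == "opensearch":
        distribution = "opensearch"
    else:
        distribution = "elasticsearch"
    number = str(version.get("number", "0"))
    major = int(number.split(".")[0])
    return distribution, major
```

Branching on the major version is what lets a collector skip endpoints that an older or different distribution does not offer instead of logging errors on every poll.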

What Elasticsearch versions are supported?

Elasticsearch 7.x, 8.x, and 9.x (built on Lucene 10, with BBQ vector quantization and ES|QL JOINs GA), plus OpenSearch 1.x and 2.x via the shared API. The integration adapts to whichever version is present — newer APIs (vector-search metrics, ILM frozen-tier stats, serverless cluster signals) surface where available.

Start monitoring Elasticsearch today

Set up in under 60 seconds. No credit card required. Full metrics from day one.
