Data Systems
    Updated May 2026

    InfluxDB Monitoring

    Monitor InfluxDB write throughput, query duration, series cardinality, TSM compaction queue, WAL size, and HTTP error rate in real time — via the native `/metrics` Prometheus endpoint, `_internal` (1.x), or `_monitoring` bucket (2.x).

    Why monitor InfluxDB?

    InfluxDB is the time-series database behind Grafana dashboards, IoT pipelines, and any TICK-stack deployment. When series cardinality explodes, compactions back up, or queries slow down, dashboards stop loading and alerts go silent. Monitoring catches the failure mode before the gap in graphs makes it obvious.

    • Auto-discovery via Xitogent: zero manual configuration
    • Native `/metrics` Prometheus endpoint scraping (InfluxDB OSS 2.x default)
    • `_internal` database support (InfluxDB 1.x) and `_monitoring` bucket support (InfluxDB 2.x)
    • Write throughput (points/sec, request rate, errors, drops) per database/bucket
    • Query duration trending plus `queriesActive` for live load
    • TSM compaction queue + cache snapshot + WAL size tracking
    • Series cardinality monitoring (`database.numSeries`, `shard.seriesCreate`)
    • HTTP status code distribution (1xx/2xx/4xx/5xx) per endpoint
    • Customizable alert thresholds for every metric
    • 1-minute metric collection intervals out of the box

    What is InfluxDB monitoring?

    InfluxDB monitoring, explained

    InfluxDB monitoring catches write-throughput stalls, runaway series cardinality (the classic InfluxDB 1.x/2.x failure mode), TSM compaction backlogs, query slowdowns, and WAL growth before they cause ingest loss or query timeouts on your Grafana dashboards. For IoT sensor pipelines, application metrics backends, and any TICK-stack deployment, per-database visibility is what separates a 60-second alert from a multi-hour incident chasing missing data points. Xitoring auto-discovers your InfluxDB, reads the native /metrics Prometheus endpoint, and routes alerts to Slack, PagerDuty, Telegram, or your existing on-call.

    Metrics

    What we monitor

    Write Points / sec

    Rate of data points written, derived from `write.pointReq`. Spikes flag IoT/Telegraf storms; sudden drops flag broken collection pipelines.
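
As a rough illustration of how a points-per-second figure can be derived from a cumulative counter such as `write.pointReq`, the sketch below takes two readings and divides the increase by the elapsed time. The `CounterSample` type and the reset handling are assumptions made for this example, not a description of how Xitogent computes the rate.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of a monotonically increasing counter (hypothetical type)."""
    timestamp_s: float  # Unix time of the reading, in seconds
    value: float        # counter value at that time


def points_per_second(prev: CounterSample, curr: CounterSample) -> float:
    """Return the per-second rate between two counter readings.

    A counter that goes backwards is treated as a restart: the new value
    is assumed to have accumulated since the reset.
    """
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.value - prev.value
    if delta < 0:  # counter reset, e.g. after an InfluxDB restart
        delta = curr.value
    return delta / elapsed
```

With readings taken 60 seconds apart, an increase of 6,000 points works out to 100 points per second.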

    Write Errors & Drops

    `write.writeReqErr` (write requests failing) and `write.writeDropped` (points dropped server-side, often from cardinality or schema errors). Any non-zero rate = ingest is losing data.

    Query Duration / Active Queries

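    `queryExecutor.queryDurationNs` (cumulative query execution time; divide its increase by the number of queries finished in the same interval for a mean) and `queriesActive` (concurrent queries in flight). Both climb together during dashboard refresh storms.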
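
One way to turn `queryExecutor.queryDurationNs` into a per-query mean, assuming it is reported as a cumulative counter next to a finished-query counter. The `queriesFinished` counter is not named in the text above and is an assumption here, as is the choice of milliseconds for the result.

```python
from typing import Optional


def mean_query_duration_ms(
    prev_duration_ns: int,
    curr_duration_ns: int,
    prev_finished: int,
    curr_finished: int,
) -> Optional[float]:
    """Mean duration of queries finished between two readings, in ms.

    Returns None when no query finished in the interval (or a counter
    was reset), because no meaningful mean exists in that case.
    """
    finished = curr_finished - prev_finished
    spent_ns = curr_duration_ns - prev_duration_ns
    if finished <= 0 or spent_ns < 0:
        return None
    return spent_ns / finished / 1e6  # nanoseconds -> milliseconds
```

Trending this value per collection interval separates "more queries" from "slower queries", which a raw cumulative counter cannot do on its own.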

    Series Cardinality

    `database.numSeries` per DB (1.x/2.x). The single most important InfluxDB health metric — runaway cardinality causes OOM, slow queries, and full storage exhaustion. Alert above your baseline + 50%.
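
The "baseline + 50%" rule can be sketched as a small check. Using the median of recent `database.numSeries` readings as the baseline is an assumption made for this example; any stable reference value would work the same way.

```python
from statistics import median
from typing import Sequence


def cardinality_exceeds_baseline(
    history: Sequence[int], current: int, max_growth: float = 0.5
) -> bool:
    """Return True when `current` is more than `max_growth` above baseline.

    `history` holds recent series counts; the median is used as the
    baseline so a single outlier reading does not shift the threshold.
    """
    if not history:
        raise ValueError("history must contain at least one reading")
    threshold = median(history) * (1.0 + max_growth)
    return current > threshold
```

For a baseline of 10,000 series the threshold is 15,000, so a reading of 15,001 alerts and 14,000 does not.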

    TSM Compaction Queue

    `storage.tsm1.compactions.*` Level 1/2/3/Full queue depths. Sustained non-zero queues mean writes are outpacing compactions — query performance degrades as small files accumulate.

    TSM Cache

    `storage.tsm1.cache.cachedBytes` (in-memory write buffer), `snapshotCount` (pending flushes), `WALCompactionTimeMs`. Cache growth without snapshot drain = compactor falling behind.

    WAL Disk Bytes

    `storage.tsm1.wal.currentSegmentDiskBytes` + `oldSegmentsDiskBytes`. WAL growth without TSM consolidation means recovery time will balloon on restart.

    Storage Size on Disk

    `storage.tsm1.filestore.diskBytes` + numFiles per shard. Track against your retention policy — high file counts at the same data size flag fragmentation.

    HTTP 4xx / 5xx Rate

    `httpd.clientError` + `httpd.serverError` (or Prometheus `http_api_request_errors_total`). 4xx spikes flag client schema/auth bugs; 5xx flags server-side failures.
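
For readers who want to see what reading error counters from a Prometheus-format `/metrics` payload involves, here is a minimal sketch. The sample payload is invented, and the `status` and `path` labels on `http_api_request_errors_total` are assumptions; real label names may differ by InfluxDB version. The parser is deliberately simplified and is not a full implementation of the exposition format.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# name, optional {labels}, value, optional integer timestamp
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]

# Hypothetical payload, for illustration only.
SAMPLE_TEXT = """\
# TYPE http_api_request_errors_total counter
http_api_request_errors_total{path="/api/v2/write",status="4XX"} 12
http_api_request_errors_total{path="/api/v2/query",status="4XX"} 3
http_api_request_errors_total{path="/api/v2/query",status="5XX"} 1
go_goroutines 87
"""


def parse_samples(text: str) -> List[Sample]:
    """Parse metric lines, skipping comments and anything unrecognised."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def totals_by_label(samples: List[Sample], metric: str, label: str) -> Dict[str, float]:
    """Sum one metric's samples, grouped by the value of a single label."""
    totals: Dict[str, float] = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            totals[labels.get(label, "unknown")] += value
    return dict(totals)
```

Calling `totals_by_label(parse_samples(SAMPLE_TEXT), "http_api_request_errors_total", "status")` groups the invented counters into 4XX and 5XX totals; the difference between two such snapshots gives the error rate for the interval.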

    Connections / Auth Failures

    `httpd.req` (total HTTP requests), `httpd.authFail` (failed auth attempts), `httpd.pingReq`. Auth failure spikes signal misconfigured Telegraf or credential rotation gone wrong.

    Runtime — Goroutines & GC

    Go runtime stats: `runtime.NumGoroutine` (goroutine leak detection), `runtime.HeapAlloc` (live heap), `runtime.NumGC`/`PauseTotalNs` (GC pressure). Catch leaks and pause-time regressions before OOM.

    Subscription Writes

    `subscriber.pointsWritten` and `subscriber.writeFailures` — when Kapacitor or downstream pipelines consume via subscriptions, this is how you catch their backpressure.

    Triggers & Alerts

    Configurable alert triggers

    Set up custom triggers in your dashboard to get notified the moment InfluxDB metrics cross your defined thresholds.


    Write Throughput

    warning

    Fires on write rate anomalies.

    Query Duration

    warning

    Alerts on slow queries.

    Series Cardinality

    critical

    Triggers when series cardinality exceeds your configured threshold.

    Storage Size

    critical

    Fires when storage exceeds threshold.

    01

    Importance of InfluxDB Monitoring

    InfluxDB handles high-velocity time-series data. High cardinality, write pressure, and compaction delays can degrade performance.

    • Track write throughput for ingestion health
    • Monitor series cardinality to prevent OOM
    • Detect slow queries early
    • Ensure compaction keeps up
    02

    Why Choose Xitoring

    Zero-config InfluxDB monitoring.

    • One-command install
    • Global nodes
    • Unified dashboard
    • Multi-channel alerts
    • Historical retention
    Use cases

    Common InfluxDB monitoring scenarios

    Where InfluxDB typically runs today — and what could go wrong if no one's watching.

    The database behind your team's dashboards

    When dashboards in Grafana or another tool feel slow, the cause is often the database underneath — not the dashboard itself. We surface where the slowness actually lives so the team fixes the right thing instead of chasing the symptom.

    Data flowing in from sensors and devices

    Connected devices, factory equipment, and IoT sensors send measurements every second of every day. A silent backup in the pipeline means lost data — and lost data is gone forever. We watch the flow end to end so a single dropped reading raises the alarm.

    App and infrastructure metrics in one place

    When the same database holds both app metrics and server metrics, a problem with the database hides every signal at once. We watch the database itself so the team's own monitoring never goes dark during an incident.

    Before you start

    Prerequisites for InfluxDB

    Make sure you've got these in place — most installs are a 60-second job once they are.

    • InfluxDB 1.x, 2.x, or 3.0 (FDAP) running on the server
    • InfluxDB HTTP port reachable from Xitogent (default 8086, or 8181 on InfluxDB 3.0 Core)
    • Optional: an API token if InfluxDB 2.x/3.0 authentication is enabled (a read-only token is enough; operator and all-access tokens also work for `/metrics`)
    Setup Guide

    Get started in minutes

    1

    Install Xitogent on your InfluxDB host

    Install the lightweight Xitogent monitoring agent on the host running InfluxDB.

    curl -s https://xitoring.com/install.sh | sudo bash -s -- --key=YOUR_API_KEY
    2

    Confirm InfluxDB is reachable

    Verify that InfluxDB is listening on its HTTP port (default 8086) and is reachable from the host running Xitogent. Xitogent prompts for the host and port when you run the integrate command below; no extra config edits or endpoint exposure are required.

    sudo xitogent integrate
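
If the integrate step cannot find InfluxDB, a quick TCP check from the Xitogent host can rule out network or firewall problems. This is a minimal sketch using only the Python standard library; the host `127.0.0.1` and the 3-second timeout are placeholder assumptions, and the check confirms only that something accepts connections on the port, not that it is InfluxDB.

```python
import socket


def is_port_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure, ...
        return False


if __name__ == "__main__":
    # 8086 is the default HTTP port; use 8181 for InfluxDB 3.0 Core.
    print("InfluxDB port reachable:", is_port_open("127.0.0.1", 8086))
```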
    3

    Enable the InfluxDB integration

    Use the Xitoring dashboard or CLI to enable the InfluxDB integration. Xitogent auto-detects your InfluxDB version and starts collecting write, query, and storage metrics.

    4

    Configure alert thresholds (optional)

    Set custom thresholds for Write Throughput, Query Duration, or Series Cardinality to catch ingest pressure and runaway tag growth before queries slow down.

    5

    Verify it's working

    Run this command on the server to confirm Xitogent picked up the integration. Fresh metrics will start streaming to your dashboard within ~30 seconds.

    sudo xitogent status

    Frequently asked questions

    What is InfluxDB monitoring?
    InfluxDB monitoring is the continuous collection of InfluxDB performance data — write throughput (points/sec, errors, drops), query duration and active query count, TSM compaction state, WAL size, series cardinality, HTTP status code distribution, and Go runtime stats — combined with alerting when those metrics breach thresholds. The data comes from the native `/metrics` Prometheus endpoint, the `_internal` database (1.x), or the `_monitoring` bucket (2.x).
    How do I monitor InfluxDB write throughput?
    Track `write.pointReq` (or Prometheus `storage_writer_points_total`) for the rate of points written, plus `write.writeReqErr` (failed write requests) and `write.writeDropped` (points dropped server-side, usually from schema or cardinality errors). Healthy ingest means a stable `pointReq` rate with near-zero errors. A sudden drop in `pointReq` together with high `writeReqErr` means clients are submitting bad schema; high `writeDropped` at a normal request rate means the storage engine is rejecting points.
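
The three cases in that answer can be written down as a small decision function. This is a sketch of the reasoning only: the 50% drop cutoff, the zero error floor, and the returned labels are hypothetical choices, and all inputs are assumed to be per-second rates already derived from the counters.

```python
def classify_ingest(
    point_rate: float,
    baseline_rate: float,
    write_err_rate: float,
    dropped_rate: float,
    rate_drop_fraction: float = 0.5,  # hypothetical: "sudden drop" cutoff
    error_floor: float = 0.0,         # hypothetical: rates above this count
) -> str:
    """Map write-path rates onto the ingest states described in the text."""
    rate_fell = point_rate < baseline_rate * (1.0 - rate_drop_fraction)
    has_errors = write_err_rate > error_floor
    has_drops = dropped_rate > error_floor

    if rate_fell and has_errors:
        # Fewer points accepted while write requests fail.
        return "clients-sending-bad-writes"
    if has_drops and not rate_fell:
        # Requests arrive as usual but points are dropped server-side.
        return "storage-rejecting-points"
    if rate_fell or has_errors or has_drops:
        return "degraded"
    return "healthy"
```

In practice `error_floor` would be tuned to sit just above the background error rate of a healthy deployment.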
    How do I detect InfluxDB cardinality issues?
    Series cardinality is THE InfluxDB 1.x/2.x failure mode: too many unique tag combinations cause OOM and slow queries. Run `SHOW SERIES CARDINALITY` (InfluxQL), call `influxdb.cardinality(...)` after `import "influxdata/influxdb"` (Flux), or read `database.numSeries` from `_internal`. Alert on any 50%+ jump from baseline; that is almost always a tag-value explosion from an unintentionally high-cardinality tag (request IDs, timestamps, or user IDs stored as tags).
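
To make the tag-explosion point concrete, the sketch below counts series the way the concept is usually described: one series per unique combination of measurement and tag set. The `series_cardinality` helper and the sample tags are hypothetical and ignore field keys, retention policies, and other details of how InfluxDB itself counts.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[str, Dict[str, str]]  # (measurement, tags)


def series_cardinality(points: Iterable[Point]) -> int:
    """Count unique (measurement, tag set) combinations among points."""
    return len({
        (measurement, tuple(sorted(tags.items())))
        for measurement, tags in points
    })


# Bounded tags: 3 hosts x 2 regions stay at 6 series however many
# points are written.
bounded = [
    ("cpu", {"host": h, "region": r})
    for h in ("a", "b", "c")
    for r in ("eu", "us")
]

# Unbounded tag: a per-request ID creates one new series per point.
unbounded = [("cpu", {"host": "a", "request_id": str(i)}) for i in range(100)]
```

Writing the `bounded` points repeatedly never adds series, while each new `request_id` value does, which is why identifiers like these belong in fields rather than tags.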
    What is the _internal database in InfluxDB?
    `_internal` is the special database InfluxDB 1.x writes its own metrics to — same TSM storage as user data, queryable via `USE _internal` + `SHOW MEASUREMENTS`. Contains measurements like `write`, `queryExecutor`, `tsm1_engine`, `tsm1_cache`, `tsm1_wal`, `httpd`, `runtime`, `database`, `shard`. In InfluxDB 2.x, this moved to the `_monitoring` bucket (set up by the Monitoring template). In 3.0, the `/metrics` Prometheus endpoint is the canonical surface.
    How do I monitor InfluxDB compactions?
    TSM compactions first snapshot the in-memory cache to small TSM files, then merge those into larger optimized files at three levels (L1/L2/L3) plus full compactions. Watch `storage.tsm1.compactions.cacheCompactionDuration`, `tsmLevel{1,2,3}CompactionQueue` (queue depth; sustained non-zero values mean a backlog), and `tsmLevel{1,2,3}CompactionDuration`. A growing queue at a normal write rate means the compactor is falling behind and query degradation is imminent. Either scale up or reduce the write rate.
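
A sketch of how "sustained" and "growing" might be told apart from a momentary blip in compaction queue depth. The five-sample window is an arbitrary assumption; with 1-minute collection it corresponds to roughly five minutes of backlog.

```python
from typing import Sequence


def sustained_backlog(queue_depths: Sequence[int], window: int = 5) -> bool:
    """True if the queue was non-zero for each of the last `window` samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(queue_depths) < window:
        return False  # not enough history to call it sustained
    return all(depth > 0 for depth in queue_depths[-window:])


def backlog_is_growing(queue_depths: Sequence[int], window: int = 5) -> bool:
    """True if the backlog is sustained and deeper now than at window start."""
    if not sustained_backlog(queue_depths, window):
        return False
    recent = queue_depths[-window:]
    return recent[-1] > recent[0]
```

A queue that briefly rises and drains between samples trips neither check; only a queue that stays occupied, or keeps deepening, does.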
    What's the difference between InfluxDB 1.x, 2.x, and 3.0 monitoring?
    1.x uses InfluxQL + TICK stack, exposes `/debug/vars` and the `_internal` database, runs the TSM/TSI storage engine. 2.x uses Flux + tasks, exposes `/metrics` (Prometheus) and the `_monitoring` bucket, same TSM/TSI underneath. 3.0 is the new FDAP architecture — DataFusion query engine, Parquet storage on object stores, removed the cardinality limit entirely, supports SQL alongside InfluxQL (Flux is in maintenance mode). Xitogent auto-detects the version and adapts.
    How do I detect InfluxDB query slowness?
    Track `queryExecutor.queryDurationNs` (cumulative query execution time; divide its increase by the number of queries finished in the same interval to get a mean) and `queriesActive` (concurrent in-flight queries). Spikes during dashboard refreshes are expected; sustained growth means queries are getting slower (often cardinality-driven or compaction-backlog-driven). Enable the slow query log (`log-queries-after = '5s'` in the `[coordinator]` section of `influxdb.conf` for 1.x) to capture specific offenders for investigation.
    How do I monitor InfluxDB TSM storage?
    TSM (Time-Structured Merge tree) is the on-disk storage engine for 1.x/2.x. Monitor `storage.tsm1.filestore.diskBytes` (total on-disk size) and `numFiles` (file count — high numbers at the same bytes = fragmentation). Pair with `storage.tsm1.cache.cachedBytes` (in-memory write buffer) and WAL size. Sustained WAL growth without TSM consolidation = compactor problem; ballooning numFiles = retention/compaction not keeping up with writes.
    Will this integration affect InfluxDB performance?
    No measurable impact. Xitogent reads from the native `/metrics` Prometheus endpoint (or `_internal` / `_monitoring` query views) on a 1-minute interval — the same lightweight mechanism InfluxData's own tools use. No instrumentation injected into the write path or query engine.

    Start monitoring InfluxDB today

    Set up in under 60 seconds. No credit card required. Full metrics from day one.
