FAQ & Troubleshooting

This page covers frequently asked questions and common issues you may encounter when using Xitoring. If you don't find your answer here, check our documentation or create a support ticket.

Understanding Xitoring

What is Xitoring and why choose it?

Xitoring is an all-in-one infrastructure monitoring platform that combines:

Server Monitoring - Real-time metrics from your servers (CPU, memory, disk)
Uptime Monitoring - External availability checks from global nodes
SSL Monitoring - Certificate expiration and validity tracking
Status Pages - Public/private pages showing your service status

Tips

Why all-in-one matters: You manage one dashboard, one billing account, and one monitoring strategy instead of juggling 3-4 separate tools.

What's the difference between Server Monitoring and Uptime Monitoring?

Aspect	Server Monitoring	Uptime Monitoring
How it works	Xitogent installed on your server	External checks from global nodes
What it measures	Internal metrics: CPU, memory, disk, I/O, processes	Service availability: response time, status codes, uptime %
Detects	Resource exhaustion before it becomes a problem	Issues users experience when accessing your service
Examples	Database hitting 95% memory	HTTP endpoint returning 500 errors
Update frequency	Every minute (real-time)	Every 1-10 minutes based on interval

Best practice: Use both together. Xitogent tells you "your server is under stress"; uptime checks tell you "your users can't reach you."

What makes Xitoring different from other monitoring tools?

Automated Everything
- Auto-Discovery finds running services automatically
- Auto-Triggers creates monitoring rules for you
- Auto Fault Tolerance reduces false alerts during brief blips
Root Cause Reporting (unique feature)
- Most tools say "your site is down"
- Xitoring says "down because: SSL handshake failed" or "response missing expected data"
- Cuts incident resolution time in half
Global Probing Nodes
- Worldwide monitoring locations ensure reliable detection
- Even if your ISP has issues, other nodes see the problem
- Detects issues before your customers notice
For Everyone
- Non-technical teams can use it (simple UI)
- Technical teams love the API, CLI, and automation
- One platform for solopreneurs to enterprises

How do the automation features save me time?

Without automation (traditional tools):

Find each running service manually (1-2 hours)
Create monitoring checks for each (30-60 min per check)
Configure triggers manually (10-15 min per check)
Set up notifications (5-10 min per role)
Create monitoring for integrations (database, web server, cache)

Total: 4-6 hours per server

With Xitoring automation:

Run one command → Xitogent installs
Auto-Discovery scans → creates uptime checks and shows recommendations (5-10 min)
Enable integrations you need (optional, minutes)
Auto-Triggers recommends thresholds after baseline learning (~24 hours)

Total: Initial monitoring in ~15 minutes; optimized triggers after ~24 hours

Tips

According to the Xitoring website, setting up 10 servers, 40 checks, and a status page takes less than 1 hour.

Getting Started

How do I add my first server?

For Linux:

Go to New Monitoring → Linux Server
Copy the provided command
SSH to your server and run as root
Xitogent installs and registers automatically (< 1 min)
You'll receive a confirmation email

For Windows:

Go to New Monitoring → Windows Server
Copy the command (PowerShell format)
Run as Administrator on your server
Same automated registration as Linux

For multiple servers: Use our Ansible playbook for bulk deployment in minutes.

See Linux Installation or Windows Installation for detailed steps.

What is Xitogent and why do I need it?

Xitogent is a Go-based agent installed directly on your servers. It's very lightweight and collects:

CPU usage, memory usage, disk space, disk I/O
Network traffic and connections
Running processes and services
Integration metrics (database, web server, cache)

Why it matters:

Detects resource exhaustion (before crashes happen)
Identifies which service is consuming CPU/memory
Provides context for incidents
Minimal resource footprint - typically uses < 1% CPU, < 50MB memory
Automatic updates without you lifting a finger
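
To compare a running agent against the footprint figures above, something like the following could be used. It is only a sketch: `within_footprint` is a hypothetical helper (not part of Xitogent), and the commented usage assumes Linux procps `ps` and a process named `xitogent`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks CPU (%) and resident memory (KB) against the
# "< 1% CPU, < 50MB memory" figures quoted above.
within_footprint() {
  local cpu_pct="$1" rss_kb="$2"
  awk -v cpu="$cpu_pct" -v rss="$rss_kb" 'BEGIN {
    # 50 MB expressed in KB, the unit ps reports for rss
    exit !(cpu < 1.0 && rss < 50 * 1024)
  }'
}

# Example with live data (assumes Linux procps and a process named "xitogent"):
#   read -r cpu rss < <(ps -o pcpu=,rss= -C xitogent | head -n 1)
#   within_footprint "$cpu" "$rss" && echo "within typical footprint"
```

The function takes plain numbers, so it can be fed from `ps`, `top`, or any other source you already collect.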

How long until I see data?

Milestone	Timeline	Why
Server registered	Instant	Dashboard shows "Waiting for data"
First data arrives	2-5 minutes	Agent needs time to collect and send metrics
Auto-Discovery completes	5-10 minutes	Xitogent scans all running services
Auto-Triggers recommended	~24 hours	System needs a full-day baseline for recommendations
Full graphs available	10+ minutes	Minimum 5-10 data points needed for graphs

Pro tip: Don't refresh obsessively. Give it 10 minutes and you'll have everything except Auto-Trigger recommendations, which need about 24 hours.

Can I deploy to multiple servers at once?

Yes! Three options:

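Bash loop: For servers that share the same SSH key

# Paste the install command copied from the dashboard between the quotes
INSTALL_CMD='<install command from dashboard>'
for server in 192.168.1.10 192.168.1.11 192.168.1.12; do
  ssh "root@$server" "$INSTALL_CMD"
done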

Ansible playbook (recommended): Download from dashboard

ansible-playbook xitogent-playbook.yml -i inventory.txt

Manual (Cloud-init): For cloud VMs
- Add Xitogent command to user-data script
- Servers self-register on first boot

See Xitogent Installation for details.

What's the fastest way to get monitoring running?

Target: < 1 hour for 10 servers + 40 checks + status page

Register account (2 min)
Install on first server (2 min)
Wait for Auto-Discovery (10 min) - let it scan
Create 4 uptime checks (5 min) - set basic thresholds now; accept Auto-Trigger recommendations after ~24 hours
Set notification role (5 min) - Email + SMS
Deploy to 9 more servers (15 min using Ansible)
Create status page (5 min) - just pick theme and domain
Test notifications (5 min) - verify alerts work

Total: about 50 minutes, comfortably under an hour.

Xitogent Issues & Troubleshooting

Agent installed but "Waiting for data" persists

Normal timeline:

Installed 0-2 minutes ago → Expected, be patient
Installed 2-5 minutes ago → Still loading, should arrive any moment
Installed 5-10 minutes ago → Refresh page, check connection
Installed 10+ minutes ago → Investigate (see below)

If data hasn't arrived after 10 minutes:

Check if service is running:

# Linux
systemctl status xitogent

# Windows (PowerShell as Administrator)
Get-Service -Name Xitogent

Should show "active (running)" or "Running"

Verify network connectivity:

# Can it reach monitoring nodes?
curl -I https://xitoring.com
ping xitoring.com

Check firewall:
- Agent needs outbound HTTPS (port 443) to *.xitoring.com
- Check security groups (AWS), firewall rules (Windows), iptables (Linux)
Run diagnostic:
```
xitogent debug
```
Send output to support if unclear

Tips

See Xitogent Debug for detailed debugging steps.

Agent crashes or keeps restarting

Possible causes:

Corrupted configuration file
Permission issues on log file
Insufficient disk space
Agent version conflict

Solutions:

# Check logs for errors
tail -50 /var/log/xitogent.log

# Verify disk space
df -h /var

# Check if xitogent binary has correct permissions
ls -la /usr/bin/xitogent
# Should show: -rwxr-xr-x

# Restart the service
systemctl restart xitogent

If it still crashes, unregister and reinstall:

xitogent unregister
# Wait for removal to complete...
# Then reinstall
curl https://xitoring.com/install.sh | sh

Can't register server because "API key missing"

You need a Xitogent Register Key to add servers.

Solution:

Go to Account → API Access
Click Generate Key
Copy it and use with your installation command

Note: Register keys are less powerful than full API keys—they can only add servers, not access all account data.

"Service has been suspended" message

This appears when your account has outstanding invoices.

Fix:

Go to Billing & Subscription → Invoices
Pay all outstanding invoices
Monitoring resumes automatically (usually within minutes)
Agents will resume sending data

Warning

Suspended accounts stop receiving metrics but Xitogent keeps running. Unpaid time doesn't count against uptime %.

Xitogent using too much CPU or memory

Very rare—Xitogent typically uses < 1% CPU and < 50MB memory. If high:

Check for integration loop (rare):

xitogent debug
# Look for errors in integration output

Disable problematic integration:
- Go to Server Settings → Integrations
- Disable any recently enabled integration
- Restart: systemctl restart xitogent

Check for stuck Xitogent processes:

ps aux | grep '[x]itogent'
# Stop a stuck process; try a plain kill first and use -9 only if it ignores SIGTERM
kill <pid>

Contact support with debug output

Monitoring & Checks

Understanding Root Cause Reports (unique feature!)

When an incident occurs, Xitoring doesn't just say "down"—it tells you why.

Examples:

HTTP Check:

❌ Bad: "Your website is down"
✅ Xitoring: "HTTP check failed: SSL handshake error on certificate mismatch" → Immediately know: "Oh, our SSL cert renewed yesterday!"

Database Integration:

❌ Bad: "Database check failed"
✅ Xitoring: "MongoDB connection timeout—database not responding on port 27017" → Immediately know: "Check if MongoDB service crashed or is busy"

HTTP with response checks:

❌ Bad: "API down"
✅ Xitoring: "HTTP returning 500 (Server Error) with empty response body" → Immediately know: "Application is crashing, check logs"

This cuts incident resolution time by 50% on average.

HTTP/HTTPS check returns false positives or fails randomly

Possible causes:

Fault Tolerance too low
- Default is 1 minute
- Brief network blips trigger incidents
- Solution: Increase to 2-5 minutes
Response time threshold too tight
- Default 5000ms
- Legitimate variance (CDN, load) exceeds threshold
- Solution: Increase to 8000-10000ms for variable services
Service is actually unstable
- Check server metrics in Xitoring
- Review application logs
- Increase monitoring interval (1 min instead of 30 sec)
Application behind load balancer
- Different backends return different responses
- Use "contains" condition instead of exact match
- Or test each backend separately
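
The difference between an exact match and a "contains" condition can be illustrated locally. This is only a sketch: the two response bodies and the helper names are made up for illustration, and Xitoring's own matching logic is not shown in this FAQ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mimicking two kinds of response-body conditions.
matches_exactly() { [ "$1" = "$2" ]; }
contains()        { case "$1" in *"$2"*) return 0 ;; *) return 1 ;; esac; }

# Two backends behind one load balancer return slightly different bodies.
backend_a='{"status":"ok","node":"a"}'
backend_b='{"status":"ok","node":"b"}'

for body in "$backend_a" "$backend_b"; do
  if matches_exactly "$body" "$backend_a"; then exact="pass"; else exact="fail"; fi
  if contains "$body" '"status":"ok"'; then partial="pass"; else partial="fail"; fi
  echo "exact=$exact contains=$partial"
done
```

The exact-match condition fails for the second backend, while the "contains" condition passes for both, which is why it is the safer choice behind a load balancer.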

Fix process:

Click Edit on the check
Click Run Test to see current status
Adjust Response Time or Fault Tolerance
Run Test again
Save changes

Tips

Use the Run Test button before saving—it shows you exactly what the check sees.

Check shows "Down" but service is actually running

Troubleshooting:

Verify check configuration:
- Click Run Test button
- See actual vs expected response
- Check Final Hostname is correct
Common HTTP issues:
- HTTPS check but certificate invalid → fix cert or use HTTP
- Custom port not reachable → verify port number and firewall
- Service requires authentication → add Authorization header
- Response body check failing → verify exact text/HTML
Integration issues:
- Database credentials wrong → test locally: mysql -h host -u user -p
- Firewall blocking port → check security group/iptables
- Service isn't running → systemctl status servicename
All checks look correct?
- Check Trigger condition
- Condition might say "response should NOT contain" when you want "should contain"

Example fix:

# MySQL check failing? Test locally:
mysql -h 192.168.1.10 -u monitoring -p -e "SELECT 1"
# If fails: wrong host, port, user, or password

What to do when "Check has been paused by system"

Xitoring automatically pauses checks stuck down for 3+ days to save resources.

Reason: A check that stays down keeps generating incidents, which leads to alert spam and wasted credits.

Fix:

Resolve the underlying issue on your service
Go to the check → click Unpause
Run Test to verify it's working
Monitor for 5 minutes to ensure it is stable

Prevent future pauses:

Review your Trigger configuration—is it too sensitive?
Increase Fault Tolerance—maybe brief outages shouldn't trigger?
For maintenance windows, disable the check or use Maintenance Schedule

Understanding check types (HTTP, DNS, Ping, TCP, UDP, etc.)

HTTP(S): Web services, APIs, REST endpoints

Example: Monitor https://api.example.com/status
Detects: Response time, status code, content matching

PING: Server reachability

Example: Monitor 192.168.1.1
Detects: ICMP packet loss, latency

DNS: Domain resolution

Example: Monitor resolving example.com → 1.2.3.4
Detects: DNS failures, wrong IP, NXDOMAIN

TCP: Port connectivity (any service)

Example: Monitor 192.168.1.1:3306 (MySQL)
Detects: Port open/accepting connections

UDP: Lightweight connectivity (DNS, DHCP, etc.)

Example: Monitor DNS server on 8.8.8.8:53
Detects: UDP port responding

FTP: File Transfer Protocol

Example: Monitor ftp.example.com
Detects: FTP server responsiveness

SMTP/IMAP/POP3: Email servers

Example: Monitor mail.example.com on SMTP:25
Detects: Email server connectivity

Heartbeat: Cron jobs, scheduled tasks

Example: A monitored cron job must ping the heartbeat URL every hour
Detects: Cron didn't run (or died mid-execution)
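
A heartbeat is typically wired in by pinging the check's URL only after the job succeeds, so a failed or hung run shows up as a missed ping. Below is a minimal sketch, assuming a placeholder URL in `HEARTBEAT_URL` (copy the real one from your heartbeat check) and hypothetical script paths.

```shell
#!/usr/bin/env bash
# Placeholder: replace with the URL shown for your heartbeat check.
HEARTBEAT_URL="${HEARTBEAT_URL:-https://example.invalid/heartbeat/your-check-id}"

# Send the ping; give up after 10 seconds so the job is never blocked for long.
ping_heartbeat() {
  curl -fsS -m 10 -o /dev/null "$HEARTBEAT_URL"
}

# Run the given command and ping only if it exits successfully.
run_with_heartbeat() {
  "$@" && ping_heartbeat
}

# Example crontab entry (hourly), assuming this file is saved as
# /usr/local/bin/heartbeat.sh and /usr/local/bin/backup.sh is your job:
#   0 * * * * . /usr/local/bin/heartbeat.sh && run_with_heartbeat /usr/local/bin/backup.sh
```

Pinging after the job rather than before it means a job that dies mid-execution is also caught.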

Cronjob: Specialized heartbeat for cron jobs with timeout detection

See Uptime Monitoring for detailed setup for each type.

Triggers, Automation & Notifications

What are Triggers and why do they matter?

A Trigger is a rule that says: "If X condition happens, create an Incident."

Examples:

"If HTTP response time > 5000ms, create incident"
"If HTTP status code is NOT 200, create incident"
"If CPU usage > 85%, create incident"
"If database query response > 10ms, create incident"

Why they matter:

Without triggers = no incidents = no alerts
Bad triggers = too many false incidents = alert fatigue
Good triggers = right alerts, right time = fast incident response

Try at least 3 triggers per check for comprehensive monitoring.
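
Conceptually, a trigger is a condition evaluated against each check result. The sketch below is illustrative only: the function name is hypothetical, the thresholds mirror the HTTP examples above, and none of it is Xitoring's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical evaluation of two HTTP triggers against one check result.
# Prints one line per trigger whose condition is met.
evaluate_http_triggers() {
  local status_code="$1" response_ms="$2"
  [ "$status_code" -ne 200 ] && echo "incident: status code is $status_code, expected 200"
  [ "$response_ms" -gt 5000 ] && echo "incident: response time ${response_ms}ms > 5000ms"
  return 0
}

evaluate_http_triggers 200 1200   # healthy: prints nothing
evaluate_http_triggers 500 300    # status trigger fires
evaluate_http_triggers 200 7400   # response-time trigger fires
```

Keeping each condition separate makes it clear which rule fired, which is the same reason to define several narrow triggers per check rather than one broad one.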

Understanding Fault Tolerance (FT) - the "buffer"

Fault Tolerance is a time buffer (in minutes) before an incident is reported.

Example: FT = 5 minutes

Service goes down at 1:00 PM
Xitoring detects it immediately
But doesn't alert until 1:05 PM (5 minute buffer)
Why? Brief network blips shouldn't trigger alerts
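
The timing in this example can be expressed as a small check: an alert is due only once the outage has lasted at least the Fault Tolerance window. This is a sketch with hypothetical names, working in minutes since midnight.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds once an outage has lasted at least ft_min minutes.
should_alert() {
  local down_since_min="$1" now_min="$2" ft_min="$3"
  [ $(( now_min - down_since_min )) -ge "$ft_min" ]
}

down_at=$(( 13 * 60 ))          # service goes down at 1:00 PM
for now in $(( 13 * 60 + 2 )) $(( 13 * 60 + 5 )); do
  if should_alert "$down_at" "$now" 5; then
    echo "minute $now: alert"
  else
    echo "minute $now: still within buffer"
  fi
done
```

An outage that recovers at 1:02 PM never reaches the 5-minute mark, so no alert is sent for it.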

Why this matters:

FT = 1 min: Alert for every brief hiccup (false positives, alert fatigue)
FT = 5 min: Only alert for real problems (sweet spot for most)
FT = 15 min: Might miss actual problems (too forgiving)

When to adjust:

Many false alerts → increase FT (5 or 10 min)
Service very critical → decrease FT (1-2 min)
Flaky service → increase FT (10-15 min) or disable check

Tips

Start with FT = 5 minutes. Adjust after seeing real incidents.

What are Auto-Triggers?

When Xitogent first scans your server, it can automatically create monitoring triggers.

Example:

Detects MySQL running on port 3306
Automatically creates: "Alert if MySQL query response > 200ms"
System proposes threshold based on baseline metrics

Benefits:

Don't have to manually create every trigger
Baselines are data-driven, not guesses
Reduces setup time from hours to minutes

What to do when triggers are recommended:

Review the suggested triggers
Click to accept the ones that make sense
Edit or delete the ones you don't like
Add more triggers at any time

See Auto-Triggers for details.

What are Notification Roles?

A Notification Role defines who gets alerted, how, and when.

Example 1 - Engineering Team:

Recipients: dev1@company.com, dev2@company.com
Channels: Email + Slack
Schedule: All day (any time)

Example 2 - On-Call Support:

Recipients: oncall.pagerduty.com
Channels: PagerDuty + SMS (if critical)
Schedule: 8am-6pm weekdays only

Example 3 - Executive:

Recipients: cto@company.com
Channels: Email only
Schedule: Critical incidents only

When assigning to triggers:

Production check → use "On-Call" role
Development check → use "Engineering Team" role
Infrastructure → use "Admin" role

Each trigger can have multiple roles assigned.

Why are my notifications NOT arriving?

Checklist:

Is a Notification Role assigned?
- Go to check → Trigger Options
- Must have a Notification Role selected
- Without one = no alerts sent, ever
Is the channel configured?
- Click role name → verify channels (Email, SMS, Slack, etc.)
- For custom Email → must click confirmation link first
- For Slack → must have authorized Xitoring app
Is the channel actually working?
- Go to Notification Role
- Click Send Test button for each channel
- Verify test message arrives
- If test fails, channel isn't configured correctly
Is an incident actually being created?
- Check if trigger condition is actually true
- Go to check, click Run Test
- Does it meet the trigger condition?
Is the incident filtered out?
- Check Incident Policy
- Some configs throttle repeated incidents
- Check incident history to see if it was created but not alerted

Fix Example:

HTTP check (example.com) down for 10 minutes
→ Trigger: "response time > 5000ms" ✓ (met)
→ Fault Tolerance: 1 min ✓ (exceeded)
→ Notification Role assigned? ✓
→ Role has channels enabled? ✗ (PROBLEM!)
→ Fix: Enable Email or SMS in role
→ Test notification arrives

How can I reduce unnecessary notifications?

Use different channels for different severities:
- Email for "degraded performance"
- SMS for "service down"
- Create separate Notification Roles for different alert levels
Increase Fault Tolerance on non-critical checks:
- More FT = fewer false alerts
- Example: non-critical service FT=10 min instead of 1 min
- See Fault Tolerance definition
Reserve expensive channels for critical incidents:
- Use Email or Slack for low-priority alerts
- Reserve SMS/calls for critical incidents only
Monitor your notification usage:
- Go to Account → Account Usage
- Review which checks are generating alerts
- Adjust thresholds for overly-sensitive checks

Example Role Configuration:

Production Critical Service:
  - Email: Yes (always)
  - Slack: Yes (always)
  - PagerDuty: Yes (when down > 5 min)
  
Non-Critical Service:
  - Email: Yes (always)
  - Slack: Yes (team awareness)

Integrations, Performance & Advanced

What are the 30+ integrations and why use them?

Integrations are pre-configured monitoring for specific software. Instead of generic "CPU is high", you get "MySQL slow query log shows 50 queries > 1 second."

By Category:

Databases:

MySQL, PostgreSQL, MongoDB, CouchDB, Redis, KeyDB, InfluxDB, SQLServer

Web/Application Servers:

Nginx, Apache, IIS, PHP-FPM, HAProxy, Varnish, LiteSpeed, OpenLiteSpeed

Message Queues:

RabbitMQ, Kafka

DNS/Network:

CoreDNS, Netstat, WireGuard, OpenVPN

System Tools:

Supervisor, Dovecot, Postfix, Exim, Docker

Why enable integrations:

Default monitoring shows "CPU 45%"
With MySQL integration: "MySQL 234 connections, 10 slow queries, 125MB buffer pool"
Critical for troubleshooting—now you know which app is eating resources

One-command setup:

xitogent integrate mysql --user monitoring --password secure123

See Integrations for setup steps for each.

Integration metrics not showing—how to fix?

First steps:

Is integration enabled in server settings?
- Go to Servers → your server → Integrations
- Verify integration is toggled ON
- Confirm credentials are saved correctly
Has enough time passed?
- First data takes 5-10 minutes to appear
- Graphs need 10+ data points (10-20 min total)
- Refresh page after waiting
Check Xitogent logs for errors:
```
tail -50 /var/log/xitogent.log
```

For integration-specific setup & troubleshooting:

Each integration has unique requirements (credentials, permissions, ports). Refer to the documentation for that integration.

Each guide covers: setup steps, required credentials/permissions, network requirements, and troubleshooting common issues.

General checklist:

Verify the service is running on expected port
Confirm Xitogent has network access (firewall, security groups)
Check monitoring user has proper permissions
Run xitogent debug and send output to support if still stuck

How do Custom Dashboards help team management?

Instead of everyone seeing all servers, Custom Dashboards let each team see only what they need.

Examples:

Database Team:

Widgets: MySQL graphs, PostgreSQL metrics, Replication status
Hides: Web server metrics, API response times

DevOps Team:

Widgets: Server uptime, incident list, deployment status
Hides: Database internals, mail server details

Management Dashboard:

Widgets: Uptime %, incident count, SLA status
Hides: Technical details, raw metrics

To create:

Go to Dashboards → Create New Dashboard
Add widgets: metrics graphs, status cards, incident logs
Share link with team or make default for sub-account

See Custom Dashboards for details.

Using Sub-Accounts for team members

Sub-Accounts give teammates access without sharing the main account password.

Example:

Main account: billing@company.com (you)
Sub-account 1: alice@company (DevOps team, full access)
Sub-account 2: bob@company (Frontend only, can view/edit 3 servers)
Sub-account 3: charlie@company (Read-only, view only)

Access levels:

Full Access: Can change everything, including billing
Restricted: Can only manage assigned servers/checks
View-Only: Can see data but not edit

Setup:

Go to Account → Team Management
Click Add Sub-Account
Enter email and set access type
They receive invitation email

See Team Management for details.

Creating public Status Pages for customers

Status Pages show customers your service status and incident history. They build trust and reduce support burden.

Benefits:

Reduces support emails ("Is your site down?")
Sets expectations (shows when maintenance is happening)
Improves customer perception (transparency)
Includes uptime % and SLA status

What you can customize:

Logo and company branding
Color scheme (light/dark mode)
Custom domain (status.yourcompany.com)
Announcement banner
Which checks display

Setup:

Go to Status Page → Create New
Choose public or private (private = password protected)
Select which checks to display
Customize branding
Share link with customers

See Status Pages for full setup.

Using the mobile app for on-the-go monitoring

The Xitoring mobile app (iOS/Android) gives you full monitoring access from anywhere.

Features:

Real-time dashboard with service status
Incident list and details
Live metrics graphs
Push notifications (instant alerts)
Manual incident actions (resolve, add notes)

Notifications:

Log in with your Xitoring account
Enable push notifications in settings
Alerts arrive instantly even if app is closed

Perfect for on-call engineers—phone buzzes when incidents happen.

Advanced Topics

Using the API for automation

The Xitoring API lets you programmatically:

Create, update, delete servers and checks
Manage Triggers
Create Incidents manually
Fetch metrics and history
Manage Sub-Accounts

Common use cases:

# JSON bodies need an explicit Content-Type; curl -d defaults to form encoding

# Auto-create HTTP check for new deployment
curl -X POST https://api.xitoring.com/v1/checks \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "http",
    "url": "https://newapp.example.com"
  }'

# Fetch all open incidents
curl https://api.xitoring.com/v1/incidents \
  -H "Authorization: Bearer $API_KEY"

# Manually create incident
curl -X POST https://api.xitoring.com/v1/incidents \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"check_id": 123, "reason": "Manual test"}'

See API Documentation for full reference.

Setting up Maintenance Schedules

Maintenance Schedules pause monitoring during planned downtime to prevent false incidents.

Example:

Database migration: 2024-03-15, 2:00 AM - 4:00 AM EST
Create schedule for that window
Xitoring won't create incidents during that time
Customers see "Maintenance" status instead of "Down"

Setup:

Go to Maintenance Schedules
Click Create New
Select servers/checks to pause
Set date, time, duration
Add optional description for status page
Save

See Maintenance Schedules for details.

Understanding SLA and uptime calculations

Uptime % is calculated as: (total time - downtime) / total time × 100%

Examples:

1 hour downtime in 1 month (720 hours) = 99.86% uptime
3 hours downtime in 1 year (8760 hours) = 99.97% uptime (just short of the "four nines" target of 99.99%)
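
The formula can be checked with a few lines of shell. `uptime_pct` is a hypothetical helper that takes the total hours in the period and the hours of downtime.

```shell
#!/usr/bin/env bash
# Uptime % = (total time - downtime) / total time * 100, rounded to 2 decimals.
uptime_pct() {
  local total_hours="$1" downtime_hours="$2"
  awk -v t="$total_hours" -v d="$downtime_hours" \
    'BEGIN { printf "%.2f\n", (t - d) / t * 100 }'
}

uptime_pct 720 1     # 1 hour down in a 720-hour month -> 99.86
uptime_pct 8760 3    # 3 hours down in a year          -> 99.97
```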

During Maintenance Schedules:

Downtime doesn't count against uptime %
You can have "99.99% uptime" even with maintenance

Uptime % targets:

99% = ~7 hours downtime/month (acceptable for internal tools)
99.5% = ~3.5 hours downtime/month (good for business services)
99.9% = ~43min downtime/month (target for most SaaS)
99.99% = ~4min downtime/month (required for mission-critical)
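
The allowance for a given target follows from the uptime formula: allowed downtime = period × (1 - target / 100). Below is a sketch assuming a 720-hour (30-day) month; `downtime_budget_min` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Allowed downtime in minutes for an uptime target over a period given in hours.
downtime_budget_min() {
  local target_pct="$1" period_hours="$2"
  awk -v p="$target_pct" -v h="$period_hours" \
    'BEGIN { printf "%.1f\n", h * 60 * (100 - p) / 100 }'
}

for target in 99 99.5 99.9 99.99; do
  echo "$target% over 720 hours: $(downtime_budget_min "$target" 720) minutes"
done
```

For a 720-hour month this gives 432.0, 216.0, 43.2, and 4.3 minutes respectively. Passing 8760 as the period gives the yearly allowance instead.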

Troubleshooting Guide

Still stuck? Here's how to get help

Before contacting support, gather:

Debug output:
```
xitogent debug > xitogent-debug.txt
```
Error description:
- What are you trying to do?
- What happened instead?
- When did it start?
Screenshots:
- Of the issue in dashboard
- Of error messages
Reproduction steps:
- Step 1: Click X
- Step 2: Fill in Y
- Step 3: Expected Z but got W

Create a support ticket:

Go to Support Tickets in account
Click Create New Ticket
Include the 4 items above
Send

Response time: Usually 12-24 hours (business hours)

Email support: support@xitoring.com

Warning

Never share debug output publicly—it contains API keys and credentials. Only share with Xitoring support.

Key Resources

Topic	Link
Products	Server Monitoring • Uptime Monitoring • SSL Monitoring
Setup	Getting Started • Linux Installation • Windows Installation
Automation	Auto-Discovery • Auto-Triggers • Auto Fault Tolerance
Alerts	Notifications • Notification Roles • Incidents
Integrations	+30 Integrations • Nginx • MySQL
Advanced	Status Pages • Custom Dashboards • API
Glossary	Complete Terminology

Still Have Questions?

We're here to help! Try these:

Search the Glossary - 60+ terms with definitions
Check Documentation - Detailed guides for every feature
Review Release Notes - What's new and changed
Create Support Ticket - Our team responds within 24 hours

Tips

Pro tip: Bookmark the Glossary and FAQ for quick reference during setup!