FAQ & Troubleshooting
This page covers frequently asked questions and common issues you may encounter when using Xitoring. If you don't find your answer here, check our documentation or create a support ticket.
Understanding Xitoring
What is Xitoring and why choose it?
Xitoring is an all-in-one infrastructure monitoring platform that combines:
- Server Monitoring - Real-time metrics from your servers (CPU, memory, disk)
- Uptime Monitoring - External availability checks from global nodes
- SSL Monitoring - Certificate expiration and validity tracking
- Status Pages - Public/private pages showing your service status
Tips
Why all-in-one matters: You manage one dashboard, one billing account, and one monitoring strategy—not juggling 3-4 separate tools.
What's the difference between Server Monitoring and Uptime Monitoring?
| Aspect | Server Monitoring | Uptime Monitoring |
|---|---|---|
| How it works | Xitogent installed on your server | External checks from global nodes |
| What it measures | Internal metrics: CPU, memory, disk, I/O, processes | Service availability: response time, status codes, uptime % |
| Detects | Resource exhaustion before it becomes a problem | Issues users experience when accessing your service |
| Examples | Database hitting 95% memory | HTTP endpoint returning 500 errors |
| Update frequency | Every minute (real-time) | Every 1-10 minutes based on interval |
Best practice: Use both together. Xitogent tells you "your server is under stress"; uptime checks tell you "your users can't reach you."
What makes Xitoring different from other monitoring tools?
Automated Everything
- Auto-Discovery finds running services automatically
- Auto-Triggers creates monitoring rules for you
- Auto Fault Tolerance reduces false alerts during brief blips
Root Cause Reporting (unique feature)
- Most tools say "your site is down"
- Xitoring says "down because: SSL handshake failed" or "response missing expected data"
- Cuts incident resolution time in half
Global Probing Nodes
- Worldwide monitoring locations ensure reliable detection
- Even if your ISP has issues, other nodes see the problem
- Detects issues before your customers notice
For Everyone
- Non-technical teams can use it (simple UI)
- Technical teams love the API, CLI, and automation
- One platform for solopreneurs to enterprises
How do the automation features save me time?
Without automation (traditional tools):
- Find each running service manually (1-2 hours)
- Create monitoring checks for each (30-60 min per check)
- Configure triggers manually (10-15 min per check)
- Set up notifications (5-10 min per role)
- Create monitoring for integrations (database, web server, cache)
Total: 4-6 hours per server
With Xitoring automation:
- Run one command → Xitogent installs
- Auto-Discovery scans → creates uptime checks and shows recommendations (5-10 min)
- Enable integrations you need (optional, minutes)
- Auto-Triggers recommends thresholds after baseline learning (~24 hours)
Total: Initial monitoring in ~15 minutes; optimized triggers after ~24 hours
Tips
According to the Xitoring website, setup for 10 servers + 40 checks + a status page takes less than 1 hour.
Getting Started
How do I add my first server?
For Linux:
- Go to New Monitoring → Linux Server
- Copy the provided command
- SSH to your server and run as root
- Xitogent installs and registers automatically (< 1 min)
- You'll receive a confirmation email
For Windows:
- Go to New Monitoring → Windows Server
- Copy the command (PowerShell format)
- Run as Administrator on your server
- Same automated registration as Linux
For multiple servers: Use our Ansible playbook for bulk deployment in minutes.
See Linux Installation or Windows Installation for detailed steps.
What is Xitogent and why do I need it?
Xitogent is a Go-based agent installed directly on your servers. It's very lightweight and collects:
- CPU usage, memory usage, disk space, disk I/O
- Network traffic and connections
- Running processes and services
- Integration metrics (database, web server, cache)
Why it matters:
- Detects resource exhaustion (before crashes happen)
- Identifies which service is consuming CPU/memory
- Provides context for incidents
- Minimal resource footprint - typically uses < 1% CPU, < 50MB memory
- Automatic updates without you lifting a finger
How long until I see data?
| Milestone | Timeline | Why |
|---|---|---|
| Server registered | Instant | Dashboard shows "Waiting for data" |
| First data arrives | 2-5 minutes | Agent needs time to collect and send metrics |
| Auto-Discovery completes | 5-10 minutes | Xitogent scans all running services |
| Auto-Triggers recommended | ~24 hours | System needs a full-day baseline for recommendations |
| Full graphs available | 10+ minutes | Minimum 5-10 data points needed for graphs |
Pro tip: Don't refresh obsessively—give it 10 minutes and you'll have everything.
Can I deploy to multiple servers at once?
Yes! Three options:
Bash script in a loop: For servers that share the same SSH key
# xitogent-install-command is a placeholder for the command copied from the dashboard
for server in 192.168.1.10 192.168.1.11 192.168.1.12; do
  ssh "root@$server" "$(xitogent-install-command)"
done
Ansible playbook (recommended): Download from dashboard
ansible-playbook xitogent-playbook.yml -i inventory.txt
Manual (Cloud-init): For cloud VMs
- Add Xitogent command to user-data script
- Servers self-register on first boot
See Xitogent Installation for details.
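As a rough sketch of the loop option, the function below reads hostnames from a file and runs the install command on each host over SSH, reporting which hosts failed. The `deploy_xitogent` name, the hosts file, and the `INSTALL_CMD` and `DRY_RUN` variables are illustrative assumptions, not part of Xitoring; `INSTALL_CMD` must hold the exact command copied from your dashboard.

```shell
#!/bin/sh
# Hypothetical helper: run the dashboard-provided install command on every host in a file.
# INSTALL_CMD is an assumption: set it to the command copied from New Monitoring → Linux Server.
deploy_xitogent() {
    hosts_file=$1
    failed=""
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines and comment lines
        case $host in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would run on $host"
            continue
        fi
        # -n stops ssh from consuming the rest of the hosts file on stdin
        if ssh -n -o BatchMode=yes "root@$host" "$INSTALL_CMD"; then
            echo "ok: $host"
        else
            echo "failed: $host"
            failed="$failed $host"
        fi
    done < "$hosts_file"
    # Return non-zero if any host failed
    [ -z "$failed" ]
}
```

Running it first with `DRY_RUN=1 deploy_xitogent servers.txt` lists the hosts that would be touched without opening any SSH connection.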
What's the fastest way to get monitoring running?
Target: < 1 hour for 10 servers + 40 checks + status page
- Register account (2 min)
- Install on first server (2 min)
- Wait for Auto-Discovery (10 min) - let it scan
- Create 4 uptime checks (5 min) - set basic thresholds now; accept Auto-Trigger recommendations after ~24 hours
- Set notification role (5 min) - Email + SMS
- Deploy to 9 more servers (15 min using Ansible)
- Create status page (5 min) - just pick theme and domain
- Test notifications (5 min) - verify alerts work
Total: about 50 minutes, less than an hour!
Xitogent Issues & Troubleshooting
Agent installed but "Waiting for data" persists
Normal timeline:
- Installed 0-2 minutes ago → Expected, be patient
- Installed 2-5 minutes ago → Still loading, should arrive any moment
- Installed 5-10 minutes ago → Refresh page, check connection
- Installed 10+ minutes ago → Investigate (see below)
If data hasn't arrived after 10 minutes:
Check if service is running:
# Linux
systemctl status xitogent
# Windows (PowerShell as Administrator)
Get-Service -Name Xitogent
Should show "active (running)" or "Running"
Verify network connectivity:
# Can it reach monitoring nodes?
curl -I https://xitoring.com
ping xitoring.com
Check firewall:
- Agent needs outbound HTTPS (port 443) to *.xitoring.com
- Check security groups (AWS), firewall rules (Windows), iptables (Linux)
Run diagnostic:
xitogent diagnosis
Send output to support if unclear
Tips
See Xitogent Debug for detailed debugging steps.
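The first two checks can be chained into one small script so the first failing step is obvious. This is only a sketch for systemd-based Linux hosts; the `check_agent` function name is made up for illustration, and it covers the service and HTTPS checks only, not firewall rules.

```shell
#!/bin/sh
# Sketch: run the basic "Waiting for data" checks in order and stop at the first failure.
check_agent() {
    # 1. Is the service running? (systemd hosts only)
    if ! systemctl is-active --quiet xitogent; then
        echo "FAIL: xitogent service is not running"
        return 1
    fi
    # 2. Does outbound HTTPS (port 443) work from this server?
    if ! curl -sS -I --max-time 10 -o /dev/null https://xitoring.com; then
        echo "FAIL: cannot reach https://xitoring.com over HTTPS"
        return 1
    fi
    echo "OK: service running and outbound HTTPS reachable"
}
```

If both steps pass and data still does not arrive, the firewall rules and the diagnostic output are the next places to look.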
Agent crashes or keeps restarting
Possible causes:
- Corrupted configuration file
- Permission issues on log file
- Insufficient disk space
- Agent version conflict
Solutions:
# Check logs for errors
tail -50 /var/log/xitogent.log
# Verify disk space
df -h /var
# Check if xitogent binary has correct permissions
ls -la /usr/bin/xitogent
# Should show: -rwxr-xr-x
# Restart the service
systemctl restart xitogent
If still crashing, try uninstall and reinstall:
xitogent unregister
# Wait for removal to complete...
# Then reinstall
curl https://xitoring.com/install.sh | sh
Can't register server because "API key missing"
You need a Xitogent Register Key to add servers.
Solution:
- Go to Account → API Access
- Click Generate Key
- Copy it and use with your installation command
Note: Register keys are less powerful than full API keys—they can only add servers, not access all account data.
"Service has been suspended" message
This appears when your account has outstanding invoices.
Fix:
- Go to Billing & Subscription → Invoices
- Pay all outstanding invoices
- Monitoring resumes automatically (usually within minutes)
- Agents will resume sending data
Warning
Suspended accounts stop receiving metrics, but Xitogent keeps running. Unpaid time doesn't count against uptime %.
Xitogent using too much CPU or memory
Very rare—Xitogent typically uses < 1% CPU and < 50MB memory. If high:
Check for integration loop (rare):
xitogent debug
# Look for errors in integration output
Disable problematic integration:
- Go to Server Settings → Integrations
- Disable any recently enabled integration
- Restart:
systemctl restart xitogent
Check if runnable processes are stuck:
ps aux | grep xitogent
# Kill any stuck processes
kill -9 <pid>
Contact support with debug output
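To compare the agent against the typical figures quoted above (< 1% CPU, < 50MB memory), a quick measurement might look like the sketch below. It assumes a Linux host with procps `ps` and that the process is named `xitogent`; the `agent_usage` name is illustrative. Note that `pcpu` is CPU time averaged over the process lifetime, not an instantaneous reading.

```shell
#!/bin/sh
# Sketch: sum CPU% and resident memory across all xitogent processes (Linux procps ps).
agent_usage() {
    # pcpu= and rss= print the columns without headers; rss is in kilobytes
    ps -C xitogent -o pcpu=,rss= | awk '
        { cpu += $1; rss_kb += $2 }
        END { printf "cpu=%.1f%% mem=%dMB\n", cpu, rss_kb / 1024 }'
}
```

Attaching this output to a support ticket alongside the debug output gives support a concrete number rather than "high usage".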
Monitoring & Checks
Understanding Root Cause Reports (unique feature!)
When an incident occurs, Xitoring doesn't just say "down"—it tells you why.
Examples:
HTTP Check:
- ❌ Bad: "Your website is down"
- ✅ Xitoring: "HTTP check failed: SSL handshake error on certificate mismatch" → Immediately know: "Oh, our SSL cert renewed yesterday!"
Database Integration:
- ❌ Bad: "Database check failed"
- ✅ Xitoring: "MongoDB connection timeout—database not responding on port 27017" → Immediately know: "Check if MongoDB service crashed or is busy"
HTTP with response checks:
- ❌ Bad: "API down"
- ✅ Xitoring: "HTTP returning 500 (Server Error) with empty response body" → Immediately know: "Application is crashing, check logs"
This cuts incident resolution time by 50% on average.
HTTP/HTTPS check returns false positives or fails randomly
Possible causes:
Fault Tolerance too low
- Default is 1 minute
- Brief network blips trigger incidents
- Solution: Increase to 2-5 minutes
Response time threshold too tight
- Default 5000ms
- Legitimate variance (CDN, load) exceeds threshold
- Solution: Increase to 8000-10000ms for variable services
Service actually is unstable
- Check server metrics in Xitoring
- Review application logs
- Increase monitoring interval (1 min instead of 30 sec)
Application behind load balancer
- Different backends return different responses
- Use "contains" condition instead of exact match
- Or test each backend separately
Fix process:
- Click Edit on the check
- Click Run Test to see current status
- Adjust Response Time or Fault Tolerance
- Run Test again
- Save changes
Tips
Use the Run Test button before saving—it shows you exactly what the check sees.
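Before changing the response time threshold, it can help to sample the endpoint a few times and see how much it actually varies. The sketch below uses curl's `time_total` write-out variable; the function names are illustrative, and the numbers reflect latency from wherever you run it, not from Xitoring's probing nodes.

```shell
#!/bin/sh
# Sketch: sample total response time (in ms) to help choose a realistic threshold.
response_ms() {
    # %{time_total} is curl's total transfer time in seconds
    t=$(curl -s -o /dev/null --max-time 30 -w '%{time_total}' "$1") || return 1
    awk -v t="$t" 'BEGIN { printf "%d\n", t * 1000 + 0.5 }'
}

sample_response() {
    url=$1
    count=${2:-5}
    i=0
    max=0
    while [ "$i" -lt "$count" ]; do
        ms=$(response_ms "$url") || { echo "request failed"; return 1; }
        echo "sample $((i + 1)): ${ms}ms"
        [ "$ms" -gt "$max" ] && max=$ms
        i=$((i + 1))
    done
    echo "slowest: ${max}ms"
}
```

For example, `sample_response https://api.example.com/status 10` prints ten samples and the slowest one; a threshold comfortably above the slowest normal sample is a reasonable starting point.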
Check shows "Down" but service is actually running
Troubleshooting:
Verify check configuration:
- Click Run Test button
- See actual vs expected response
- Check Final Hostname is correct
Common HTTP issues:
- HTTPS check but certificate invalid → fix cert or use HTTP
- Custom port not reachable → verify port number and firewall
- Service requires authentication → add Authorization header
- Response body check failing → verify exact text/HTML
Integration issues:
- Database credentials wrong → test locally: mysql -h host -u user -p
- Firewall blocking port → check security group/iptables
- Service isn't running → systemctl status servicename
All checks look correct?
- Check Trigger condition
- Condition might say "response should NOT contain" when you want "should contain"
Example fix:
# MySQL check failing? Test locally:
mysql -h 192.168.1.10 -u monitoring -p -e "SELECT 1"
# If fails: wrong host, port, user, or password
What to do when "Check has been paused by system"
Xitoring automatically pauses checks stuck down for 3+ days to save resources.
Reason: A check that stays down keeps creating incidents, which leads to massive alert spam and wasted credits
Fix:
- Resolve the underlying issue on your service
- Go to the check → click Unpause
- Run Test to verify it's working
- Monitor for 5 minutes to ensure stable
Prevent future pauses:
- Review your Trigger configuration—is it too sensitive?
- Increase Fault Tolerance—maybe brief outages shouldn't trigger?
- For maintenance windows, disable the check or use Maintenance Schedule
Understanding check types (HTTP, DNS, Ping, TCP, UDP, etc.)
HTTP(S): Web services, APIs, REST endpoints
Example: Monitor https://api.example.com/status
Detects: Response time, status code, content matching
PING: Server reachability
Example: Monitor 192.168.1.1
Detects: ICMP packet loss, latency
DNS: Domain resolution
Example: Monitor resolving example.com → 1.2.3.4
Detects: DNS failures, wrong IP, NXDOMAIN
TCP: Port connectivity (any service)
Example: Monitor 192.168.1.1:3306 (MySQL)
Detects: Port open/accepting connections
UDP: Lightweight connectivity (DNS, DHCP, etc.)
Example: Monitor DNS server on 8.8.8.8:53
Detects: UDP port responding
FTP: File Transfer Protocol
Example: Monitor ftp.example.com
Detects: FTP server responsiveness
SMTP/IMAP/POP3: Email servers
Example: Monitor mail.example.com on SMTP:25
Detects: Email server connectivity
Heartbeat: Cron jobs, scheduled tasks
Example: A monitored cron job must ping the heartbeat URL every hour
Detects: Cron didn't run (or died mid-execution)
Cronjob: Specialized heartbeat for cron jobs with timeout detection
See Uptime Monitoring for detailed setup for each type.
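For Heartbeat checks, the job itself has to call the URL. One common pattern is a wrapper that pings only after the job succeeds, so a failed or killed run shows up as a missed heartbeat. In this sketch, `run_with_heartbeat` and `HEARTBEAT_URL` are hypothetical names; use the URL shown for your own Heartbeat check.

```shell
#!/bin/sh
# Sketch: run a job and ping the heartbeat URL only if the job succeeded.
# HEARTBEAT_URL is a placeholder for the URL of your Heartbeat check.
run_with_heartbeat() {
    if "$@"; then
        # -f treats HTTP errors as failures; --retry rides out brief network blips
        curl -fsS --max-time 10 --retry 3 -o /dev/null "$HEARTBEAT_URL"
    else
        status=$?
        echo "job failed with exit code $status; heartbeat not sent" >&2
        return "$status"
    fi
}
```

In the script that cron runs hourly, the job would then be invoked as `run_with_heartbeat /path/to/job.sh` instead of calling the job directly.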
Triggers, Automation & Notifications
What are Triggers and why do they matter?
A Trigger is a rule that says: "If X condition happens, create an Incident."
Examples:
- "If HTTP response time > 5000ms, create incident"
- "If HTTP status code is NOT 200, create incident"
- "If CPU usage > 85%, create incident"
- "If database query response > 10ms, create incident"
Why they matter:
- Without triggers = no incidents = no alerts
- Bad triggers = too many false incidents = alert fatigue
- Good triggers = right alerts, right time = fast incident response
Try at least 3 triggers per check for comprehensive monitoring.
Understanding Fault Tolerance (FT) - the "buffer"
Fault Tolerance is a time buffer (in minutes) before an incident is reported.
Example: FT = 5 minutes
- Service goes down at 1:00 PM
- Xitoring detects it immediately
- But doesn't alert until 1:05 PM (5 minute buffer)
- Why? Brief network blips shouldn't trigger alerts
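The buffer in this example boils down to a single comparison. The sketch below only illustrates the behavior described here; it is not Xitoring's actual implementation, and the `should_alert` name is made up.

```shell
#!/bin/sh
# Sketch of the Fault Tolerance buffer: alert only once the check has been
# down for at least FT minutes. Illustration only, not Xitoring's code.
should_alert() {
    down_minutes=$1
    ft_minutes=$2
    if [ "$down_minutes" -ge "$ft_minutes" ]; then
        echo "alert"
    else
        echo "wait"
    fi
}
```

With FT = 5, `should_alert 2 5` prints `wait` (a 2-minute blip raises nothing), while `should_alert 5 5` prints `alert`.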
Why this matters:
- FT = 1 min: Alert for every brief hiccup (false positives, alert fatigue)
- FT = 5 min: Only alert for real problems (sweet spot for most)
- FT = 15 min: Might miss actual problems (too forgiving)
When to adjust:
- Many false alerts → increase FT (5 or 10 min)
- Service very critical → decrease FT (1-2 min)
- Flaky service → increase FT (10-15 min) or disable check
Tips
Start with FT = 5 minutes. Adjust after seeing real incidents.
What are Auto-Triggers?
When Xitogent first scans your server, it can automatically create monitoring triggers.
Example:
- Detects MySQL running on port 3306
- Automatically creates: "Alert if MySQL query response > 200ms"
- System proposes threshold based on baseline metrics
Benefits:
- Don't have to manually create every trigger
- Baselines are data-driven, not guesses
- Reduces setup time from hours to minutes
What to do when recommended:
- Review the suggested triggers
- Click to accept the ones that make sense
- Edit/delete ones you don't like
- Can add more triggers later anytime
See Auto-Triggers for details.
What are Notification Roles?
A Notification Role defines who gets alerted, how, and when.
Example 1 - Engineering Team:
- Recipients: dev1@company.com, dev2@company.com
- Channels: Email + Slack
- Schedule: All day (any time)
Example 2 - On-Call Support:
- Recipients: oncall.pagerduty.com
- Channels: PagerDuty + SMS (if critical)
- Schedule: 8am-6pm weekdays only
Example 3 - Executive:
- Recipients: cto@company.com
- Channels: Email only
- Schedule: Critical incidents only
When assigning to triggers:
- Production check → use "On-Call" role
- Development check → use "Engineering Team" role
- Infrastructure → use "Admin" role
Each trigger can have multiple roles assigned.
Why are my notifications NOT arriving?
Checklist:
Is a Notification Role assigned?
- Go to check → Trigger Options
- Must have a Notification Role selected
- Without one = no alerts sent, ever
Is the channel configured?
- Click role name → verify channels (Email, SMS, Slack, etc.)
- For custom Email → must click confirmation link first
- For Slack → must have authorized Xitoring app
Is the channel actually working?
- Go to Notification Role
- Click Send Test button for each channel
- Verify test message arrives
- If test fails, channel isn't configured correctly
Are you hitting an incident?
- Check if trigger condition is actually true
- Go to check, click Run Test
- Does it meet the trigger condition?
Is the incident filtered out?
- Check Incident Policy
- Some configs throttle repeated incidents
- Check incident history to see if it was created but not alerted
Fix Example:
HTTP check (example.com) down for 10 minutes
→ Trigger: "response time > 5000ms" ✓ (met)
→ Fault Tolerance: 1 min ✓ (exceeded)
→ Notification Role assigned? ✓
→ Role has channels enabled? ✗ (PROBLEM!)
→ Fix: Enable Email or SMS in role
→ Test notification arrives
How can I reduce unnecessary notifications?
Use different channels for different severities:
- Email for "degraded performance"
- SMS for "service down"
- Create separate Notification Roles for different alert levels
Increase Fault Tolerance on non-critical checks:
- More FT = fewer false alerts
- Example: non-critical service FT=10 min instead of 1 min
- See Fault Tolerance definition
Reserve expensive channels for critical incidents:
- Use Email or Slack for low-priority alerts
- Reserve SMS/calls for critical incidents only
Monitor your notification usage:
- Go to Account → Account Usage
- Review which checks are generating alerts
- Adjust thresholds for overly-sensitive checks
Example Role Configuration:
Production Critical Service:
- Email: Yes (always)
- Slack: Yes (always)
- PagerDuty: Yes (when down > 5 min)
Non-Critical Service:
- Email: Yes (always)
- Slack: Yes (team awareness)
Integrations, Performance & Advanced
What are the 30+ integrations and why use them?
Integrations are pre-configured monitoring for specific software. Instead of generic "CPU is high", you get "MySQL slow query log shows 50 queries > 1 second."
By Category:
Databases:
- MySQL, PostgreSQL, MongoDB, CouchDB, Redis, KeyDB, InfluxDB, SQLServer
Web/Application Servers:
- Nginx, Apache, IIS, PHP-FPM, HAProxy, Varnish, LiteSpeed, OpenLiteSpeed
Message Queues:
- RabbitMQ, Kafka
DNS/Network:
- CoreDNS, Netstat, WireGuard, OpenVPN
System Tools:
- Supervisor, Dovecot, Postfix, Exim, Docker
Why enable integrations:
- Default monitoring shows "CPU 45%"
- With MySQL integration: "MySQL 234 connections, 10 slow queries, 125MB buffer pool"
- Critical for troubleshooting—now you know which app is eating resources
One-command setup:
xitogent integrate mysql --user monitoring --password secure123
See Integrations for setup steps for each.
Integration metrics not showing—how to fix?
First steps:
Is integration enabled in server settings?
- Go to Servers → your server → Integrations
- Verify integration is toggled ON
- Confirm credentials are saved correctly
Has enough time passed?
- First data takes 5-10 minutes to appear
- Graphs need 10+ data points (10-20 min total)
- Refresh page after waiting
Check Xitogent logs for errors:
tail -50 /var/log/xitogent.log
For integration-specific setup & troubleshooting:
Each integration has unique requirements (credentials, permissions, ports). Refer to the integration-specific documentation:
- MySQL Integration Guide
- PostgreSQL Integration Guide
- MongoDB Integration Guide
- Redis Integration Guide
- View all 30+ integrations
Each guide covers: setup steps, required credentials/permissions, network requirements, and troubleshooting common issues.
General checklist:
- Verify the service is running on expected port
- Confirm Xitogent has network access (firewall, security groups)
- Check monitoring user has proper permissions
- Run xitogent debug and send output to support if still stuck
How do Custom Dashboards help team management?
Instead of everyone seeing all servers, Custom Dashboards let each team see only what they need.
Examples:
Database Team:
- Widgets: MySQL graphs, PostgreSQL metrics, Replication status
- Hides: Web server metrics, API response times
DevOps Team:
- Widgets: Server uptime, incident list, deployment status
- Hides: Database internals, mail server details
Management Dashboard:
- Widgets: Uptime %, incident count, SLA status
- Hides: Technical details, raw metrics
To create:
- Go to Dashboards → Create New Dashboard
- Add widgets: metrics graphs, status cards, incident logs
- Share link with team or make default for sub-account
See Custom Dashboards for details.
Using Sub-Accounts for team members
Sub-Accounts give teammates access without sharing the main password.
Example:
Main account: billing@company.com (you)
Sub-account 1: alice@company (DevOps team, full access)
Sub-account 2: bob@company (Frontend only, can view/edit 3 servers)
Sub-account 3: charlie@company (Read-only, view only)
Access levels:
- Full Access: Can change everything, including billing
- Restricted: Can only manage assigned servers/checks
- View-Only: Can see data but not edit
Setup:
- Go to Account → Team Management
- Click Add Sub-Account
- Enter email and set access type
- They receive invitation email
See Team Management for detailed team management.
Creating public Status Pages for customers
Status Pages show customers your service status and incident history. They build trust and reduce support burden.
Benefits:
- Reduces support emails ("Is your site down?")
- Sets expectations (shows when maintenance is happening)
- Improves customer perception (transparency)
- Includes uptime % and SLA status
What you can customize:
- Logo and company branding
- Color scheme (light/dark mode)
- Custom domain (status.yourcompany.com)
- Announcement banner
- Which checks display
Setup:
- Go to Status Page → Create New
- Choose public or private (private = password protected)
- Select which checks to display
- Customize branding
- Share link with customers
See Status Pages for full setup.
Using the mobile app for on-the-go monitoring
The Xitoring mobile app (iOS/Android) gives you full monitoring access from anywhere.
Features:
- Real-time dashboard with service status
- Incident list and details
- Live metrics graphs
- Push notifications (instant alerts)
- Manual incident actions (resolve, add notes)
Notifications:
- Log in with your Xitoring account
- Enable push notifications in settings
- Alerts arrive instantly even if app is closed
Perfect for on-call engineers—phone buzzes when incidents happen.
Advanced Topics
Using the API for automation
The Xitoring API lets you programmatically:
- Create, update, delete servers and checks
- Manage Triggers
- Create Incidents manually
- Fetch metrics and history
- Manage Sub-Accounts
Common use cases:
# Auto-create HTTP check for new deployment
curl -X POST https://api.xitoring.com/v1/checks \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "http",
    "url": "https://newapp.example.com"
  }'
# Fetch all open incidents
curl https://api.xitoring.com/v1/incidents \
  -H "Authorization: Bearer $API_KEY"
# Manually create incident
curl -X POST https://api.xitoring.com/v1/incidents \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"check_id": 123, "reason": "Manual test"}'
See API Documentation for full reference.
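The check-creation request above extends naturally to a list of URLs. The following is a sketch under assumptions: `create_checks` is a made-up name, the JSON `Content-Type` header is assumed to be what the API expects, and the response body is ignored because its format is not described here. Confirm field names against the API reference before relying on it.

```shell
#!/bin/sh
# Sketch: create one HTTP check per URL listed in a file (one URL per line).
# Assumes API_KEY is set and that URLs contain no double quotes or backslashes,
# since they are placed into the JSON body without escaping.
create_checks() {
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue
        body=$(printf '{"type": "http", "url": "%s"}' "$url")
        # -f makes curl return non-zero on HTTP error status codes
        if curl -fsS -X POST https://api.xitoring.com/v1/checks \
            -H "Authorization: Bearer $API_KEY" \
            -H "Content-Type: application/json" \
            -d "$body" > /dev/null; then
            echo "created: $url"
        else
            echo "failed: $url"
        fi
    done < "$1"
}
```

Calling `create_checks urls.txt` from a deployment pipeline would then report one line per URL, which makes failed requests easy to spot.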
Setting up Maintenance Schedules
Maintenance Schedules pause monitoring during planned downtime to prevent false incidents.
Example:
- Database migration: 2024-03-15, 2:00 AM - 4:00 AM EST
- Create schedule for that window
- Xitoring won't create incidents during that time
- Customers see "Maintenance" status instead of "Down"
Setup:
- Go to Maintenance Schedules
- Click Create New
- Select servers/checks to pause
- Set date, time, duration
- Add optional description for status page
- Save
See Maintenance Schedules for details.
Understanding SLA and uptime calculations
Uptime % is calculated as: (total time - downtime) / total time × 100%
Examples:
- 1 hour downtime in 1 month (720 hours) = 99.86% uptime
- 3 hours downtime in 1 year (8760 hours) = 99.97% uptime (between "three nines", 99.9%, and "four nines", 99.99%)
During Maintenance Schedules:
- Downtime doesn't count against uptime %
- You can have "99.99% uptime" even with maintenance
Uptime % targets:
- 99% = ~7 hours downtime/month, ~88 hours/year (acceptable for internal tools)
- 99.5% = ~3.5 hours downtime/month, ~44 hours/year (good for business services)
- 99.9% = ~43 min downtime/month, ~8.8 hours/year (target for most SaaS)
- 99.99% = ~4 min downtime/month, ~53 min/year (required for mission-critical)
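The uptime formula and the downtime budgets are easy to check with a couple of helper functions. The names `uptime_pct` and `downtime_budget_min` are illustrative; both take hours as input and use awk for the floating-point arithmetic.

```shell
#!/bin/sh
# Uptime % = (total time - downtime) / total time * 100, both arguments in hours
uptime_pct() {
    awk -v total="$1" -v down="$2" \
        'BEGIN { printf "%.2f\n", (total - down) / total * 100 }'
}

# Allowed downtime in minutes for a target uptime % over a period given in hours
downtime_budget_min() {
    awk -v target="$1" -v hours="$2" \
        'BEGIN { printf "%.1f\n", (100 - target) / 100 * hours * 60 }'
}
```

For example, `uptime_pct 720 1` prints `99.86`, and `downtime_budget_min 99.9 720` prints `43.2`: a 99.9% target leaves about 43 minutes of downtime in a 30-day month, while the same target over a year (`downtime_budget_min 99.9 8760`) allows `525.6` minutes.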
Troubleshooting Guide
Still stuck? Here's how to get help
Before contacting support, gather:
Debug output:
xitogent debug > xitogent-debug.txt
Error description:
- What are you trying to do?
- What happened instead?
- When did it start?
Screenshots:
- Of the issue in dashboard
- Of error messages
Reproduction steps:
- Step 1: Click X
- Step 2: Fill in Y
- Step 3: Expected Z but got W
Create a support ticket:
- Go to Support Tickets in account
- Click Create New Ticket
- Include the 4 items above
- Send
Response time: Usually 12-24 hours (business hours)
Email support: support@xitoring.com
Warning
Never share debug output publicly—it contains API keys and credentials. Only share with Xitoring support.
Key Resources
| Topic | Link |
|---|---|
| Products | Server Monitoring • Uptime Monitoring • SSL Monitoring |
| Setup | Getting Started • Linux Installation • Windows Installation |
| Automation | Auto-Discovery • Auto-Triggers • Auto Fault Tolerance |
| Alerts | Notifications • Notification Roles • Incidents |
| Integrations | +30 Integrations • Nginx • MySQL |
| Advanced | Status Pages • Custom Dashboards • API |
| Glossary | Complete Terminology |
Still Have Questions?
We're here to help! Try these:
- Search the Glossary - 60+ terms with definitions
- Check Documentation - Detailed guides for every feature
- Review Release Notes - What's new and changed
- Create Support Ticket - Our team responds within 24 hours