| Parameter | Description | OK | WARNING | CRITICAL | Threshold Warning | Threshold Critical |
|---|---|---|---|---|---|---|
| Check run_tools script | Checks whether a specific system maintenance or utility script ran successfully. | Last execution completed successfully within the expected time | Execution completed with warnings or exceeded the expected runtime | Script failed, did not run, timed out, or returned an error | | |
| CPU Idle | Checks the percentage of CPU time that is not being used by any process. | CPU idle is within normal range | CPU idle is low for a sustained period | CPU idle is critically low; CPU is saturated | < 20% idle | < 10% idle |
| CPU Temperature | Monitors the temperature of the CPU cores and the server case for overheating. | All CPU cores < 60°C | Any CPU core >= 60°C | Any CPU core >= 70°C | >= 60°C | >= 70°C |
| Current Load | Monitors the average system load (processes running or waiting for CPU time) over 1, 5, and 15 minutes. | Load average below warning thresholds (1/5/15) | Load average >= 125,120,115 (1/5/15) | Load average >= 150,145,140 (1/5/15) | >= 125,120,115 | >= 150,145,140 |
| Current Users | Checks the number of users currently logged into the system. | Logged-in users <= 5 | Logged-in users > 5 | Logged-in users > 10 | > 5 users | > 10 users |
| Memory Usage | Monitors total system RAM usage (used vs. free memory). | Free/available memory is >= 10% | Free/available memory is < 10% | Free/available memory is < 5% | < 10% free | < 5% free |
| Swap Usage | Monitors the utilization of swap space (virtual memory on disk). | Swap free is >= 80% (or no active swapping) | Swap free is < 80% (or moderate active swapping) | Swap free is < 50% (or heavy active swapping) | < 80% free | < 50% free |
| Total Processes | Checks the total number of processes currently running on the system. | Total processes < 2000 | Total processes >= 2000 | Total processes >= 2500 | >= 2000 | >= 2500 |
| Zombie Processes | Monitors the number of processes that have terminated but have not yet been reaped by their parent process. | Zombie processes < 5 | Zombie processes >= 5 | Zombie processes >= 10 | >= 5 | >= 10 |
| APACHE Status | Monitors the status and performance metrics of the Apache web server (e.g., number of workers, idle threads). | Apache is running and responding normally | Apache is running but under high load or with degraded performance | Apache is down or not responding | | |
| Check FTP | Checks the availability and response time of the File Transfer Protocol (FTP) service. | FTP service reachable and responding | FTP responding slowly or intermittently | FTP service unreachable or down | | |
| Check Mailq | Monitors the size and status of the mail queue for pending emails. | Mail queue size within normal limits | Mail queue is growing or above the warning threshold | Mail queue extremely large or stuck (not decreasing) | | |
| Check MySQL | Checks the availability and basic health of the MySQL database server. | MySQL reachable and basic health checks pass | MySQL reachable but performance/replication indicators degraded | MySQL down, refusing connections, or health check failed | | |
| Check RBL | Checks whether the server's IP address is listed on any common Real-time Blackhole Lists (RBLs) for spam. | IP address is not listed on common RBLs | - | IP address is listed on one or more RBLs | | |
| Check SSL cert | Monitors the expiration date and validity of the SSL certificate for a specified domain. | Certificate valid and not close to expiry | Certificate expiring soon | Certificate expired, invalid, or hostname mismatch | | |
| Memcached Status | Checks the availability and performance metrics of the Memcached key-value store. | Memcached reachable and responding | Memcached responding slowly or intermittently | Memcached unreachable or down | > 200 ms | No response / connection failed |
| IPTABLES Status | Checks whether the iptables firewall service is running. | Firewall service is running and rules are loaded | - | Firewall service stopped and/or rules missing | | |
| PING | Checks whether the server is reachable on the network and measures packet loss and round-trip average time (RTA). | Host reachable with acceptable latency and no/low loss | Packet loss and/or latency above warning threshold | Host unreachable or loss/latency at critical level | | |
| Portscan | Checks whether specific (expected) ports are open or closed on the server. | All required ports are in the expected state | Non-critical port mismatch detected | Critical port mismatch detected (a required port is in the wrong open/closed state) | | |
| Check Raid | Checks the health and status of the server's hardware or software RAID array. | RAID array is healthy/optimal | RAID is degraded but still operational | RAID failed or data is at risk | | |
| Check Smart sda | Checks the Self-Monitoring, Analysis and Reporting Technology (SMART) status of disk drive sda. | SMART status OK | SMART indicates degrading attributes | SMART reports failure or imminent failure | | |
| Free Space Storage | Checks the available free disk space on a primary storage mount point. | Free space is >= 20% | Free space is < 20% | Free space is < 10% | < 20% free | < 10% free |
| Free Space home | Checks the available free disk space on the /home partition. | Free space is >= 20% | Free space is < 20% | Free space is < 10% | < 20% free | < 10% free |
| Free Space mysql | Checks the available free disk space on the partition hosting MySQL data. | Free space is >= 20% | Free space is < 20% | Free space is < 10% | < 20% free | < 10% free |
| Free Space root | Checks the available free disk space on the / (root) partition. | Free space is >= 20% | Free space is < 20% | Free space is < 10% | < 20% free | < 10% free |
| Free Space usr | Checks the available free disk space on the /usr partition. | Free space is >= 20% | Free space is < 20% | Free space is < 10% | < 20% free | < 10% free |
| Free Space var | Checks the available free disk space on the /var partition (often used for logs). | Free space is >= 20% | Free space is < 20% | Free space is < 10% | < 20% free | < 10% free |
| Free Space www | Checks the available free disk space on the partition hosting web content (e.g., /var/www). | Free space is >= 20% | Free space is < 20% | Free space is < 10% | < 20% free | < 10% free |
| I/O sda | Monitors input/output operations and performance for disk drive sda. | Disk I/O latency and utilization within normal limits | High disk I/O latency/utilization detected | Disk I/O saturated or timing out; severe performance impact | Latency > 400 ms and <= 800 ms | Latency > 800 ms |
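
The numeric thresholds in the table follow two patterns: "lower is worse" (percent free, CPU idle) and "higher is worse" (process counts, load averages). The sketch below shows one way such thresholds could be turned into a status. The names `Status`, `classify_low`, `classify_high`, and `classify_load` are hypothetical and are not taken from any monitoring tool; the numeric values 0/1/2 follow the common monitoring-plugin exit-code convention, which this document does not itself specify. Checks that use a strict comparison (such as Current Users, which alerts on `> 5`) would need a slightly different comparison.

```python
from enum import IntEnum
from typing import Sequence


class Status(IntEnum):
    # Assumed mapping: mirrors the usual plugin exit codes, not stated in the table.
    OK = 0
    WARNING = 1
    CRITICAL = 2


def classify_low(value: float, warn: float, crit: float) -> Status:
    """Lower is worse (e.g. % free space): alert when value drops below a threshold."""
    if value < crit:
        return Status.CRITICAL
    if value < warn:
        return Status.WARNING
    return Status.OK


def classify_high(value: float, warn: float, crit: float) -> Status:
    """Higher is worse (e.g. process count): alert when value reaches a threshold."""
    if value >= crit:
        return Status.CRITICAL
    if value >= warn:
        return Status.WARNING
    return Status.OK


def classify_load(
    load: Sequence[float], warn: Sequence[float], crit: Sequence[float]
) -> Status:
    """Return the worst status across the 1/5/15-minute load averages."""
    return max(classify_high(v, w, c) for v, w, c in zip(load, warn, crit))


if __name__ == "__main__":
    # Free Space root: 15% free with thresholds < 20% / < 10%
    print(classify_low(15.0, warn=20, crit=10).name)       # WARNING
    # Total Processes: 2500 processes with thresholds >= 2000 / >= 2500
    print(classify_high(2500, warn=2000, crit=2500).name)  # CRITICAL
    # Current Load: only the 1-minute average crosses its warning threshold
    print(classify_load((130, 100, 90), (125, 120, 115), (150, 145, 140)).name)  # WARNING
```

Checking the critical threshold first keeps the two levels from overlapping: a value that satisfies both conditions is always reported at the more severe level.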