| Parameter | Description | OK | WARNING | CRITICAL |
| --- | --- | --- | --- | --- |
| Inodes Free / | Monitors the percentage of free inodes on the root filesystem. | Free Inodes >= 20% | Free Inodes < 10% | Free Inodes < 5% |
| Context Switches | Monitors the number of CPU context switches per second. | < 5000 / sec | >= 10000 / sec | >= 20000 / sec |
| OOM Killer Events | Checks system logs for processes killed by the Out-Of-Memory killer. | No OOM events | - | OOM event detected in dmesg |
| Entropy Level | Monitors available entropy for cryptographic operations (SSL/SSH). | Entropy > 2500 | Entropy < 1000 | Entropy < 200 |
| File Descriptors | Percentage of open file descriptors against the system limit. | Usage < 60% | Usage >= 80% | Usage >= 95% |
| System Uptime | Monitors the time since the last system reboot. | Uptime < 180 days | Uptime > 365 days | - |
| NTP Offset | Measures the time difference between the server and NTP reference. | Offset < 50ms | Offset >= 100ms | Offset >= 500ms |
| Kernel Taints | Checks if the kernel has loaded non-standard or proprietary modules. | No taints | - | Kernel is TAINTED |
| Swap In Rate | Frequency of reading data from Swap back to RAM. | 0 - 5 pages/sec | > 20 pages/sec | > 100 pages/sec |
| Swap Out Rate | Frequency of writing data from RAM to Swap. | 0 - 10 pages/sec | > 50 pages/sec | > 200 pages/sec |
| Shared Memory | Monitors the utilization of shared memory (shm). | Usage < 50% | Usage >= 75% | Usage >= 90% |
| Process Limit | Total number of processes vs. the OS maximum limit. | Processes < 70% limit | Processes >= 85% limit | Processes >= 95% limit |
| Orphaned Processes | Monitors processes that remain active after their parent process has died. | Count < 5 | Count >= 10 | Count >= 20 |
| System Load 1 min | Average system load over the last 1 minute. | Load < CPU Cores | Load >= CPU Cores * 1.5 | Load >= CPU Cores * 2 |
| System Load 15 min | Average system load over the last 15 minutes (long-term trend). | Load < CPU Cores * 0.8 | Load >= CPU Cores | Load >= CPU Cores * 1.5 |
| Disk Read Latency | Response time for disk read operations. | < 10ms | >= 20ms | >= 50ms |
| Disk Write Latency | Response time for disk write operations. | < 15ms | >= 30ms | >= 70ms |
| I/O Wait % | Percentage of CPU time spent waiting for I/O operations. | < 5% | >= 12% | >= 25% |
| SMART Reallocated | Number of sectors moved due to hardware defects. | Count = 0 | Count > 0 | Count >= 10 |
| SSD Wear Leveling | Percentage of the SSD/NVMe drive's rated endurance already consumed. | Wear < 80% | Wear >= 90% | Wear >= 98% |
| Read-Only FS | Checks if any mounted partition has switched to read-only mode. | All partitions RW | - | Read-only partition found |
| Large Log Files | Searches for individual log files exceeding a size threshold. | Files < 2GB | Any file >= 5GB | Any file >= 10GB |
| Directory Size /tmp | Monitors the size of the temporary directory /tmp. | Size < 1GB | Size >= 2GB | Size >= 5GB |
| Disk Queue Depth | Number of pending requests to the disk controller. | Queue < 2 | Queue >= 5 | Queue >= 10 |
| IOPS Read | Number of read operations per second. | Within baseline | 2x Baseline | 5x Baseline |
| IOPS Write | Number of write operations per second. | Within baseline | 2x Baseline | 5x Baseline |
| Deleted Handles | Space taken by deleted files that are still held open by processes. | Space < 100MB | Space >= 1GB | Space >= 5GB |
| NFS Mount Status | Connectivity status of remote network mounts (NFS). | Connected & Responding | Responding slowly | Disconnected / Timeout |
| RAID Rebuild | Status of RAID array recovery after disk replacement. | No rebuild needed | Rebuild in progress | Rebuild Stalled / Failed |
| SMART Temperature | Internal temperature of the storage device. | Temp < 45°C | Temp >= 55°C | Temp >= 65°C |
| Interface RX Errors | Number of errors receiving packets on the network interface. | 0 errors | > 5 / min | > 50 / min |
| Interface TX Errors | Number of errors transmitting packets. | 0 errors | > 5 / min | > 50 / min |
| TCP Time Wait | Number of connections in the TIME_WAIT state. | < 2000 | >= 5000 | >= 10000 |
| DNS Latency | Time taken to resolve an external domain name. | < 50ms | >= 200ms | >= 1000ms |
| Retransmission | Percentage of TCP packets retransmitted (packet loss indicator). | < 0.5% | >= 2% | >= 5% |
| Bandwidth Inbound | Incoming traffic utilization of the network interface. | Usage < 70% | Usage >= 85% | Usage >= 95% |
| Bandwidth Outbound | Outgoing traffic utilization of the network interface. | Usage < 70% | Usage >= 85% | Usage >= 95% |
| UDP Buffer Errors | Errors in the UDP buffer (critical for VoIP/Streaming). | 0 errors | > 10 / min | > 100 / min |
| SSH Failed Logins | Number of failed SSH login attempts. | < 5 per 10 min | >= 20 per 10 min | >= 50 (Brute force) |
| Root Login Event | Detects if a session was successfully opened as 'root'. | No root logins | - | Root user logged in |
| New Listening Ports | Checks for unauthorized new open ports on the server. | No new ports | - | Unauthorized port open |
| Sudo Usage | Monitors execution of commands with sudo privileges. | Regular usage | - | Unusual sudo activity |
| World Writable | Number of files in system folders that are world-writable. | Count = 0 | Count > 0 | Count >= 5 |
| Security Updates | Number of pending security patches in the package manager. | 0 updates | > 0 updates | Critical patches pending |
| Modified Binaries | Integrity check of system binary files (e.g., /bin/ps). | Matches | - | Mismatch detected |
| SSL Expiry (Days) | Days remaining before the SSL certificate expires. | > 30 days | <= 15 days | <= 7 days |
| HTTP 5xx Rate | Frequency of server errors (Internal Server Error) in web logs. | 0 per min | >= 5 per min | >= 20 per min |
| HTTP 4xx Rate | Frequency of client errors (Not Found / Forbidden). | < 10 per min | >= 50 per min | >= 200 per min |
| PHP-FPM Workers | Percentage of active PHP-FPM worker processes. | Usage < 70% | Usage >= 85% | Usage >= 95% |
| PHP-FPM Slow Logs | Number of entries for slow-executing PHP scripts. | 0 logs | > 5 per min | > 20 per min |
| Nginx Active Conn | Number of current active connections to Nginx. | < 1000 | >= 5000 | >= 10000 |
| Varnish Hit Rate | Percentage of requests served from Varnish cache. | Hit Rate > 80% | Hit Rate < 60% | Hit Rate < 30% |
| TTFB | Time to First Byte measured from the web application. | < 200ms | >= 500ms | >= 2000ms |
| Apache Idle | Number of free Apache workers available for new requests. | Workers > 20 | Workers < 10 | Workers < 3 |
| SSL Protocol | Checks for weak or deprecated SSL/TLS protocols. | Only TLS 1.2/1.3 | - | Weak protocols enabled |
| Gzip Compression | Checks if Gzip/Brotli compression is active for web content. | Enabled | - | DISABLED |
| Mail Queue Size | Number of emails waiting in the system queue (Postfix). | < 50 emails | >= 100 emails | >= 500 emails |
| Bounced Emails | Number of returned (undelivered) emails in the last hour. | < 10 | >= 50 | >= 200 |
| Cron Job Status | Result of the last execution of critical system tasks. | Success | Warning in logs | Failed / Not started |
| Backup Age | Time elapsed since the last successful backup. | < 26 hours | >= 30 hours | >= 48 hours |
| MySQL Slow Queries | Number of queries taking longer than 2 seconds to execute. | 0 per min | >= 5 per min | >= 20 per min |
| MySQL Threads | Number of active database connections/threads. | < 100 | >= 300 | >= 500 |
| MySQL Buffer Hit | Efficiency of the InnoDB Buffer Pool cache. | Hit Rate > 95% | Hit Rate < 90% | Hit Rate < 80% |
| Slave Lag | Seconds the database slave is behind the master. | 0 sec | >= 60 sec | >= 300 sec |
| DB Size Growth | Database size increase rate over a 24-hour period. | < 1GB | >= 5GB | >= 10GB |
| Redis Frag. Ratio | Memory fragmentation ratio in Redis. | Ratio 1.0 - 1.2 | Ratio > 1.5 | Ratio > 2.0 |
| Redis Memory | Percentage of allocated RAM used by the Redis process. | Usage < 70% | Usage >= 85% | Usage >= 95% |
| Redis Evictions | Number of keys evicted due to memory limits. | 0 per min | > 100 per min | > 1000 per min |
| Postgres Deadlocks | Number of detected transaction deadlocks in PostgreSQL. | 0 deadlocks | > 2 | > 5 |
| Postgres Rollback | Percentage of transactions that were rolled back instead of committed. | < 1% | >= 5% | >= 10% |
| MongoDB Oplog | Remaining time coverage in the MongoDB operation log. | > 24 hours | < 6 hours | < 2 hours |
| Table Scans | Rate of full table scans (not using an index). | Low rate | High rate | Critical impact |
| Connection Time | Time taken to establish a connection to the DB socket. | < 10ms | >= 50ms | >= 200ms |
| Temp Tables | Number of temporary tables created on disk instead of RAM. | Low rate | Increase detected | High disk impact |
| Uptime DB | Time since the database service was last started. | > 24 hours | < 1 hour | Service crashed |
| Running Container | Number of active containers vs. expected count. | Match expected | Mismatch | None running |
| Restarts | Number of unexpected container restarts per hour. | 0 restarts | >= 3 restarts | >= 10 restarts |
| CPU Limit | CPU utilization relative to the container limit. | Usage < 80% | Usage >= 90% | Usage >= 100% |
| Image Usage | Total disk space used by Docker images and layers. | < 20GB | >= 50GB | >= 100GB |
| CPU Fan Speed | Rotational speed of the CPU cooling fan. | > 1000 RPM | < 500 RPM | 0 RPM (Failure) |
| PSU Status | Status of power supply units (for redundant setups). | Both PSUs OK | One PSU Failed | Both Offline / UPS |
| Voltage Vcore | Stability of the voltage supplied to the processor. | Within 5% range | Outside 5% range | Outside 10% range |
| Chassis Intrusion | Sensor detecting if the server case has been opened. | Closed | - | CASE OPENED |
| GPU Temperature | Temperature of the Graphics Processing Unit. | < 60°C | >= 75°C | >= 85°C |
| GPU VRAM | Memory utilization of the video RAM. | Usage < 70% | Usage >= 85% | Usage >= 95% |
| UPS Battery | Charge level of the Uninterruptible Power Supply. | 100% | < 50% | < 20% |
| UPS Load % | Current power load percentage on the UPS. | Load < 60% | Load >= 80% | Load >= 95% |
| Ambient Temp | Room temperature surrounding the server rack. | < 22°C | >= 26°C | >= 30°C |
| CPU Throttling | Detects if CPU is lowering frequency due to heat. | No throttling | - | Throttling ACTIVE |
| CMOS Battery | Status of the motherboard backup battery. | Voltage OK | Voltage Low | Replace Battery |
| Open Handles | Total number of open files across the whole OS. | < 50000 | >= 100000 | >= 200000 |
| Shell Access | Number of users with an active /bin/bash shell. | Expected count | - | Unauthorized user |
| Log Growth | Rate of log file growth per hour. | < 100MB | >= 500MB | >= 1GB (Flood) |
| Ghost Process | Processes running without an executable on disk. | 0 found | - | Ghost detected |
| Page In | Rate of pages read from disk into RAM. | Low rate | High rate | Degradation |
| Page Out | Rate of pages written to disk from RAM. | Low rate | High rate | Memory Exhaustion |
| SYN Floods | Monitors potential SYN Flood DoS attacks. | < 10 SYN_RECV | >= 50 SYN_RECV | >= 200 SYN_RECV |
| RBL Blacklist | Checks if the server IP is listed in RBLs. | Not listed | Listed in 1 RBL | Multiple RBLs |
| Root Mailbox | Size of the local mailbox for the root user. | < 10MB | >= 50MB | >= 100MB |
| Zombie Threads | Threads that have not been properly cleaned up. | Count < 10 | >= 50 | >= 100 |
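
The numeric rows of the table share one pattern: a measured value is compared against a WARNING and a CRITICAL limit, and the worst breached level wins. The following is a minimal sketch of that pattern in Python, not the implementation of any particular monitoring tool. The names `Status`, `Check`, and `CHECKS` and the dictionary keys are hypothetical. The table leaves a gap between the OK bound and the WARNING limit for several parameters (for example, 10% to 20% free inodes); the sketch assumes such values count as OK.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Check:
    """One numeric row of the table (hypothetical helper, not a real tool's API)."""
    name: str
    warning: float
    critical: float
    higher_is_worse: bool = True  # False for "free" / "remaining" style metrics
    strict: bool = False          # True when the table uses < or > instead of <= or >=

    def _breached(self, value: float, limit: float) -> bool:
        if self.higher_is_worse:
            return value > limit if self.strict else value >= limit
        return value < limit if self.strict else value <= limit

    def evaluate(self, value: float) -> Status:
        # Test CRITICAL first so the most severe breached level is returned.
        if self._breached(value, self.critical):
            return Status.CRITICAL
        if self._breached(value, self.warning):
            return Status.WARNING
        # Assumption: values in the gap between OK and WARNING are treated as OK.
        return Status.OK


# Limits copied from the table; the dictionary keys are made up for this example.
CHECKS = {
    "inodes_free_pct": Check("Inodes Free /", warning=10, critical=5,
                             higher_is_worse=False, strict=True),
    "fd_usage_pct": Check("File Descriptors", warning=80, critical=95),
    "ssl_days_left": Check("SSL Expiry (Days)", warning=15, critical=7,
                           higher_is_worse=False),
    "swap_in_pages": Check("Swap In Rate", warning=20, critical=100, strict=True),
}

if __name__ == "__main__":
    # Sample readings invented for the demonstration.
    samples = {"inodes_free_pct": 8.0, "fd_usage_pct": 42.0,
               "ssl_days_left": 7, "swap_in_pages": 20}
    for key, value in samples.items():
        check = CHECKS[key]
        print(f"{check.name}: {value} -> {check.evaluate(value).name}")
```

Rows whose states are events or text ("Kernel is TAINTED", "Read-only partition found") do not fit this numeric shape and would need a separate boolean check. Limits expressed relative to the host, such as "Load >= CPU Cores * 1.5", could be computed at startup (for example from `os.cpu_count()`) before constructing the `Check`.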