List of all additional parameters that you can add to the monitoring service for a Linux environment

| Parameter | Description | OK | WARNING | CRITICAL |
| --- | --- | --- | --- | --- |
| Inodes Free / | Monitors the percentage of free inodes on the root filesystem. | Free Inodes >= 20% | Free Inodes < 10% | Free Inodes < 5% |
| Context Switches | Monitors the number of CPU context switches per second. | < 5000 / sec | >= 10000 / sec | >= 20000 / sec |
| OOM Killer Events | Checks system logs for processes killed by the Out-Of-Memory killer. | No OOM events | - | OOM event detected in dmesg |
| Entropy Level | Monitors available entropy for cryptographic operations (SSL/SSH). | Entropy > 2500 | Entropy < 1000 | Entropy < 200 |
| File Descriptors | Percentage of open file descriptors against the system limit. | Usage < 60% | Usage >= 80% | Usage >= 95% |
| System Uptime | Monitors the time since the last system reboot. | Uptime < 180 days | Uptime > 365 days | - |
| NTP Offset | Measures the time difference between the server and the NTP reference. | Offset < 50ms | Offset >= 100ms | Offset >= 500ms |
| Kernel Taints | Checks if the kernel has loaded non-standard or proprietary modules. | No taints | - | Kernel is TAINTED |
| Swap In Rate | Rate at which data is read from swap back into RAM. | 0 - 5 pages/sec | > 20 pages/sec | > 100 pages/sec |
| Swap Out Rate | Rate at which data is written from RAM to swap. | 0 - 10 pages/sec | > 50 pages/sec | > 200 pages/sec |
| Shared Memory | Monitors the utilization of shared memory (shm). | Usage < 50% | Usage >= 75% | Usage >= 90% |
| Process Limit | Total number of processes vs. the OS maximum limit. | Processes < 70% limit | Processes >= 85% limit | Processes >= 95% limit |
| Orphaned Processes | Monitors processes that remain active after their parent has died. | Count < 5 | Count >= 10 | Count >= 20 |
| System Load 1 min | Average system load over the last 1 minute. | Load < CPU Cores | Load >= CPU Cores * 1.5 | Load >= CPU Cores * 2 |
| System Load 15 min | Average system load over the last 15 minutes (long-term trend). | Load < CPU Cores * 0.8 | Load >= CPU Cores | Load >= CPU Cores * 1.5 |
| Disk Read Latency | Response time for disk read operations. | < 10ms | >= 20ms | >= 50ms |
| Disk Write Latency | Response time for disk write operations. | < 15ms | >= 30ms | >= 70ms |
| I/O Wait % | Percentage of CPU time spent waiting for I/O operations. | < 5% | >= 12% | >= 25% |
| SMART Reallocated | Number of sectors reallocated due to hardware defects. | Count = 0 | Count > 0 | Count >= 10 |
| SSD Wear Leveling | Wear level (consumed rated life) of the SSD/NVMe drive. | Wear < 80% | Wear >= 90% | Wear >= 98% |
| Read-Only FS | Checks if any mounted partition has switched to read-only mode. | All partitions RW | - | Read-only partition found |
| Large Log Files | Searches for individual log files exceeding a size threshold. | Files < 2GB | Any file >= 5GB | Any file >= 10GB |
| Directory Size /tmp | Monitors the size of the temporary directory /tmp. | Size < 1GB | Size >= 2GB | Size >= 5GB |
| Disk Queue Depth | Number of pending requests to the disk controller. | Queue < 2 | Queue >= 5 | Queue >= 10 |
| IOPS Read | Number of read operations per second. | Within baseline | 2x Baseline | 5x Baseline |
| IOPS Write | Number of write operations per second. | Within baseline | 2x Baseline | 5x Baseline |
| Deleted Handles | Space taken by deleted files that are still held open by processes. | Space < 100MB | Space >= 1GB | Space >= 5GB |
| NFS Mount Status | Connectivity status of remote network mounts (NFS). | Connected & Responding | Responding slowly | Disconnected / Timeout |
| RAID Rebuild | Status of RAID array recovery after disk replacement. | No rebuild needed | Rebuild in progress | Rebuild Stalled / Failed |
| SMART Temperature | Internal temperature of the storage device. | Temp < 45°C | Temp >= 55°C | Temp >= 65°C |
| Interface RX Errors | Number of errors receiving packets on the network interface. | 0 errors | > 5 / min | > 50 / min |
| Interface TX Errors | Number of errors transmitting packets on the network interface. | 0 errors | > 5 / min | > 50 / min |
| TCP Time Wait | Number of connections in the TIME_WAIT state. | < 2000 | >= 5000 | >= 10000 |
| DNS Latency | Time taken to resolve an external domain name. | < 50ms | >= 200ms | >= 1000ms |
| Retransmission | Percentage of TCP packets retransmitted (packet loss indicator). | < 0.5% | >= 2% | >= 5% |
| Bandwidth Inbound | Incoming traffic utilization of the network interface. | Usage < 70% | Usage >= 85% | Usage >= 95% |
| Bandwidth Outbound | Outgoing traffic utilization of the network interface. | Usage < 70% | Usage >= 85% | Usage >= 95% |
| UDP Buffer Errors | Errors in the UDP buffer (critical for VoIP/Streaming). | 0 errors | > 10 / min | > 100 / min |
| SSH Failed Logins | Number of failed SSH login attempts. | < 5 per 10 min | >= 20 per 10 min | >= 50 (Brute force) |
| Root Login Event | Detects if a session was successfully opened as 'root'. | No root logins | - | Root user logged in |
| New Listening Ports | Checks for unauthorized new open ports on the server. | No new ports | - | Unauthorized port open |
| Sudo Usage | Monitors execution of commands with sudo privileges. | Regular usage | - | Unusual sudo activity |
| World Writable | Number of files in system folders that are world-writable. | Count = 0 | Count > 0 | Count >= 5 |
| Security Updates | Number of pending security patches in the package manager. | 0 updates | > 0 updates | Critical patches pending |
| Modified Binaries | Integrity check of system binary files (e.g., /bin/ps). | Matches | - | Mismatch detected |
| SSL Expiry (Days) | Days remaining before the SSL certificate expires. | > 30 days | <= 15 days | <= 7 days |
| HTTP 5xx Rate | Frequency of server errors (Internal Server Error) in web logs. | 0 per min | >= 5 per min | >= 20 per min |
| HTTP 4xx Rate | Frequency of client errors (Not Found / Forbidden). | < 10 per min | >= 50 per min | >= 200 per min |
| PHP-FPM Workers | Percentage of active PHP-FPM worker processes. | Usage < 70% | Usage >= 85% | Usage >= 95% |
| PHP-FPM Slow Logs | Number of entries for slow-executing PHP scripts. | 0 logs | > 5 per min | > 20 per min |
| Nginx Active Conn | Number of current active connections to Nginx. | < 1000 | >= 5000 | >= 10000 |
| Varnish Hit Rate | Percentage of requests served from Varnish cache. | Hit Rate > 80% | Hit Rate < 60% | Hit Rate < 30% |
| TTFB | Time to First Byte measured from the web application. | < 200ms | >= 500ms | >= 2000ms |
| Apache Idle | Number of free Apache workers available for new requests. | Workers > 20 | Workers < 10 | Workers < 3 |
| SSL Protocol | Checks for weak or deprecated SSL/TLS protocols. | Only TLS 1.2/1.3 | - | Weak protocols enabled |
| Gzip Compression | Checks if Gzip/Brotli compression is active for web content. | Enabled | - | DISABLED |
| Mail Queue Size | Number of emails waiting in the system queue (Postfix). | < 50 emails | >= 100 emails | >= 500 emails |
| Bounced Emails | Number of returned (undelivered) emails in the last hour. | < 10 | >= 50 | >= 200 |
| Cron Job Status | Result of the last execution of critical system tasks. | Success | Warning in logs | Failed / Not started |
| Backup Age | Time elapsed since the last successful backup. | < 26 hours | >= 30 hours | >= 48 hours |
| MySQL Slow Queries | Number of queries taking longer than 2 seconds to execute. | 0 per min | >= 5 per min | >= 20 per min |
| MySQL Threads | Number of active database connections/threads. | < 100 | >= 300 | >= 500 |
| MySQL Buffer Hit | Efficiency of the InnoDB Buffer Pool cache. | Hit Rate > 95% | Hit Rate < 90% | Hit Rate < 80% |
| Slave Lag | Number of seconds the database slave is behind the master. | 0 sec | >= 60 sec | >= 300 sec |
| DB Size Growth | Database size increase over a 24-hour period. | < 1GB | >= 5GB | >= 10GB |
| Redis Frag. Ratio | Memory fragmentation ratio in Redis. | Ratio 1.0 - 1.2 | Ratio > 1.5 | Ratio > 2.0 |
| Redis Memory | Percentage of allocated RAM used by the Redis process. | Usage < 70% | Usage >= 85% | Usage >= 95% |
| Redis Evictions | Number of keys evicted due to memory limits. | 0 per min | > 100 per min | > 1000 per min |
| Postgres Deadlocks | Number of detected transaction deadlocks in PostgreSQL. | 0 deadlocks | > 2 | > 5 |
| Postgres Rollback | Percentage of transactions that failed and were rolled back. | < 1% | >= 5% | >= 10% |
| MongoDB Oplog | Remaining time coverage in the MongoDB operation log. | > 24 hours | < 6 hours | < 2 hours |
| Table Scans | Rate of full table scans (not using an index). | Low rate | High rate | Critical impact |
| Connection Time | Time taken to establish a connection to the DB socket. | < 10ms | >= 50ms | >= 200ms |
| Temp Tables | Number of temporary tables created on disk instead of RAM. | Low rate | Increase detected | High disk impact |
| Uptime DB | Time since the database service was last started. | > 24 hours | < 1 hour | Service crashed |
| Running Container | Number of active containers vs. expected count. | Match expected | Mismatch | None running |
| Restarts | Number of unexpected container restarts per hour. | 0 restarts | >= 3 restarts | >= 10 restarts |
| CPU Limit | CPU utilization relative to the container limit. | Usage < 80% | Usage >= 90% | Usage >= 100% |
| Image Usage | Total disk space used by Docker images and layers. | < 20GB | >= 50GB | >= 100GB |
| CPU Fan Speed | Rotational speed of the CPU cooling fan. | > 1000 RPM | < 500 RPM | 0 RPM (Failure) |
| PSU Status | Status of power supply units (for redundant setups). | Both PSUs OK | One PSU Failed | Both Offline / UPS |
| Voltage Vcore | Stability of the voltage supplied to the processor. | Within 5% range | Outside 5% range | Outside 10% range |
| Chassis Intrusion | Sensor detecting if the server case has been opened. | Closed | - | CASE OPENED |
| GPU Temperature | Temperature of the Graphics Processing Unit. | < 60°C | >= 75°C | >= 85°C |
| GPU VRAM | Memory utilization of the video RAM. | Usage < 70% | Usage >= 85% | Usage >= 95% |
| UPS Battery | Charge level of the Uninterruptible Power Supply. | 100% | < 50% | < 20% |
| UPS Load % | Current power load percentage on the UPS. | Load < 60% | Load >= 80% | Load >= 95% |
| Ambient Temp | Room temperature surrounding the server rack. | < 22°C | >= 26°C | >= 30°C |
| CPU Throttling | Detects if the CPU is lowering its frequency due to heat. | No throttling | - | Throttling ACTIVE |
| CMOS Battery | Status of the motherboard backup battery. | Voltage OK | Voltage Low | Replace Battery |
| Open Handles | Total number of open files across the whole OS. | < 50000 | >= 100000 | >= 200000 |
| Shell Access | Number of users with an active /bin/bash shell. | Expected count | - | Unauthorized user |
| Log Growth | Rate of log file growth per hour. | < 100MB | >= 500MB | >= 1GB (Flood) |
| Ghost Process | Processes running without an executable on disk. | 0 found | - | Ghost detected |
| Page In | Rate of pages read from disk into RAM. | Low rate | High rate | Degradation |
| Page Out | Rate of pages written to disk from RAM. | Low rate | High rate | Memory Exhaustion |
| SYN Floods | Monitors potential SYN Flood DoS attacks. | < 10 SYN_RECV | >= 50 SYN_RECV | >= 200 SYN_RECV |
| RBL Blacklist | Checks if the server IP is listed in RBLs. | Not listed | Listed in 1 RBL | Multiple RBLs |
| Root Mailbox | Size of the local mailbox for the root user. | < 10MB | >= 50MB | >= 100MB |
| Zombie Threads | Threads that have not been properly cleaned up. | Count < 10 | Count >= 50 | Count >= 100 |
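
The OK / WARNING / CRITICAL columns above can be read as a simple threshold check. The following is a minimal Python sketch of that idea, not the monitoring service's actual implementation. The names `Threshold`, `classify`, and `free_inode_percent` are hypothetical and introduced only for illustration. One assumption is made explicit: several rows leave a gap between the OK and WARNING limits (for example, Context Switches between 5000 and 10000 per second), and this sketch treats such values as OK.

```python
import os
from dataclasses import dataclass
from typing import Optional

OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


@dataclass(frozen=True)
class Threshold:
    """Hypothetical container for the numeric limits of one table row."""
    warning: Optional[float]      # None where the table shows "-"
    critical: Optional[float]     # None where the table shows "-"
    higher_is_worse: bool = True  # False for rows such as free inodes


def classify(value: float, t: Threshold) -> str:
    """Return OK, WARNING, or CRITICAL for a measured value.

    Assumption: values that fall between the OK and WARNING limits
    (a range the table leaves undefined) are reported as OK.
    """
    def breached(limit: Optional[float]) -> bool:
        if limit is None:
            return False
        return value >= limit if t.higher_is_worse else value < limit

    if breached(t.critical):
        return CRITICAL
    if breached(t.warning):
        return WARNING
    return OK


# Limits copied from the "Context Switches" and "Inodes Free /" rows.
CONTEXT_SWITCHES = Threshold(warning=10000, critical=20000)
FREE_INODES_PCT = Threshold(warning=10, critical=5, higher_is_worse=False)


def free_inode_percent(path: str = "/") -> Optional[float]:
    """Percentage of free inodes on the filesystem containing `path`.

    Returns None when the filesystem reports no inode total.
    """
    st = os.statvfs(path)
    if st.f_files == 0:
        return None
    return 100.0 * st.f_ffree / st.f_files


if __name__ == "__main__":
    pct = free_inode_percent("/")
    if pct is not None:
        print(f"Inodes Free /: {pct:.1f}% -> {classify(pct, FREE_INODES_PCT)}")
```

Rows with non-numeric states (for example, Kernel Taints or Read-Only FS) would need a separate boolean check and do not fit this numeric helper.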