System Monitoring Indicator Observation Standards

Li Wei

August 10, 2025 · 3 min read

Metric Categories

  • Core Infrastructure Monitoring (CIM): average CPU utilization, duration of CPU peak usage, average memory usage, bandwidth input/output, etc.
  • Application-Level Monitoring (ALM): JVM process memory, number of internal threads, disk I/O, index read/write operations, user logs, request logs, request error counts, etc.
  • Service Quality Monitoring (SQM): maximum request latency, average request latency, average request rate per minute, peak daily request rate, order count, query count, etc.
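As a rough illustration of the Service Quality Monitoring metrics listed above, the sketch below derives maximum latency, average latency, and average request rate per minute from a batch of request records. The `Request` type, its field names, and the one-minute minimum window are assumptions made for this example, not part of the original standard.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Request:
    # Hypothetical record shape; field names are assumptions for this sketch.
    timestamp_s: float  # arrival time, in seconds
    latency_ms: float   # end-to-end latency, in milliseconds


def summarize_sqm(requests: Sequence[Request]) -> Dict[str, float]:
    """Return max latency, average latency, and average requests per minute."""
    if not requests:
        return {"max_latency_ms": 0.0, "avg_latency_ms": 0.0, "avg_rate_per_min": 0.0}

    latencies = [r.latency_ms for r in requests]
    first = min(r.timestamp_s for r in requests)
    last = max(r.timestamp_s for r in requests)

    # Assumption: treat the observation window as at least one minute,
    # so a short burst does not produce an inflated per-minute rate.
    window_min = max((last - first) / 60.0, 1.0)

    return {
        "max_latency_ms": max(latencies),
        "avg_latency_ms": sum(latencies) / len(latencies),
        "avg_rate_per_min": len(requests) / window_min,
    }


sample = [Request(0, 10.0), Request(60, 20.0), Request(120, 60.0)]
summary = summarize_sqm(sample)
```

In practice these values usually come from a metrics system rather than raw records; the point here is only to show how the three figures relate to the same underlying request stream.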

Alarm Metric References

| Category | Core Metric | Metric Description | Standard |
|---|---|---|---|
| Application | cpu | CPU utilization: proportion of time spent executing non‑idle processes (non‑idle CPU time ÷ total CPU time) | 60% |
| Application | memory | Memory usage: used vs. available space; watch total, used, and free. Available memory is free + buffers + cached. If available memory runs too low, it can trigger full GC (FGC) and slow system response | 60% |
| Application | disk | Disk I/O: how busy the disk is; I/O load reflects system load and can become an application bottleneck | 60% |
| Application | load | load.1minPerCPU, load.5minPerCPU: CPU load per core | (not specified) |
| Application | oldGC | full_gc_count: number of full GCs | 2 times per day |
| Application | swap | mem.swapused.percent: swap usage percentage | 10% |
| Service Quality | failure_rate | Interface failure count ÷ total interface calls | 0.01% |
| Service Quality | error_count | Number of interface errors | (not specified) |
| Service Quality | average_response_time | Total time from when a user sends a request to when the response is fully received | (not specified) |
| Service Quality | TP999 | 99.9th‑percentile response time: the latency within which 99.9% of requests are answered | (not specified) |
| Service Quality | qps | Queries (requests) per second | (not specified) |
| Service Quality | business_data | Drop‑zero alarm: trigger an alarm when the call volume is zero over a period | 0 |
| Service Quality | range_fluctuation | Set threshold ranges for metrics so that normal fluctuations stay within a defined interval | Context‑dependent |
| Service Quality | data_accuracy | Compare data across strongly related systems (e.g., compare B‑side and C‑side data of a group‑buying platform) | Context‑dependent |
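To make a few of the definitions in the table above concrete, here is a minimal sketch of how the ratios and alarms could be computed. It assumes that an alarm fires when a value strictly exceeds its standard, that TP999 uses the nearest‑rank percentile method, and that a drop‑zero alarm means every sample in the window is zero. All function names are hypothetical, and none of these choices are stated in the original.

```python
from typing import List, Optional, Sequence

# Standards taken from the table; "strictly greater than" is an assumption.
CPU_LIMIT = 0.60
FAILURE_RATE_LIMIT = 0.0001  # 0.01%


def cpu_utilization(non_idle_time: float, total_time: float) -> float:
    """Non-idle CPU time divided by total CPU time."""
    return non_idle_time / total_time if total_time > 0 else 0.0


def failure_rate(failures: int, total_calls: int) -> float:
    """Interface failure count divided by total interface calls."""
    return failures / total_calls if total_calls > 0 else 0.0


def tp999(latencies_ms: Sequence[float]) -> Optional[float]:
    """Nearest-rank 99.9th percentile: smallest sample covering 99.9% of requests."""
    if not latencies_ms:
        return None
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.999 * n, avoiding floating-point rounding.
    rank = -(-999 * len(ordered) // 1000)
    return ordered[rank - 1]


def drop_zero(call_counts: Sequence[int]) -> bool:
    """True when the window has samples and every one of them is zero."""
    return len(call_counts) > 0 and all(c == 0 for c in call_counts)


def evaluate(cpu: float, fail_rate: float, call_counts: Sequence[int]) -> List[str]:
    """Return the names of the metrics whose alarm condition is met."""
    alarms = []
    if cpu > CPU_LIMIT:
        alarms.append("cpu")
    if fail_rate > FAILURE_RATE_LIMIT:
        alarms.append("failure_rate")
    if drop_zero(call_counts):
        alarms.append("business_data")
    return alarms


fired = evaluate(cpu_utilization(75, 100), failure_rate(2, 10_000), [0, 0, 0])
```

The context‑dependent rows (range_fluctuation, data_accuracy) are left out on purpose, since their thresholds have to come from the business itself.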

Originally written by Li Wei (李唯_) and published in Chinese on 后端技术栈全书 (Full-Stack Backend Engineering). Translated and adapted for DriftSeas with permission.
