System Monitoring Metric Observation Standards

Core Infrastructure Monitoring (CIM): average CPU utilization, duration of peak CPU usage, average memory usage, inbound/outbound network bandwidth, etc.
Application-Level Monitoring (ALM): JVM process memory, number of internal threads, disk I/O, index read/write operations, user logs, request logs, request error counts, etc.
Service Quality Monitoring (SQM): maximum request latency, average request latency, average request rate per minute, peak daily request rate, order count, query count, etc.

Category	Metric	Description	Standard
Application	cpu	CPU utilization – proportion of time spent executing non‑idle processes (non‑idle CPU time ÷ total CPU time)	60%
	memory	Memory usage – used vs. available space; watch total, used, and free. `free + buffers + cached` represents available memory. Too little available memory can trigger full GC (FGC) and slow system response	60%
	disk	Disk I/O – how busy the disk is; I/O load reflects system load and can become an application bottleneck	60%
	load	`load.1minPerCPU`, `load.5minPerCPU` – CPU load per core	—
	oldGC	`full_gc_count` – number of full GCs	2 times per day
	swap	`mem.swapused.percent` – swap usage percentage	10%
Service Quality	failure_rate	Interface failure count ÷ total interface calls	0.01%
	error_count	Number of interface errors	—
	average_response_time	Total time from when a user sends a request to when the response is fully received	—
	TP999	99.9th-percentile latency – the smallest latency within which 99.9% of requests receive a response	—
	qps	Requests per second (queries per second)	—
	business_data	Drop‑zero alarm – trigger an alarm when the call volume is zero over a period	0
	range_fluctuation	Define a threshold range for each metric and trigger an alarm when its value fluctuates outside that range	Context‑dependent
	data_accuracy	Compare data across strongly related systems (e.g., compare B‑side and C‑side data of a group‑buying platform)	Context‑dependent
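
The standards in the table can be sketched as simple checks. The Python below is a minimal illustration, not the original author's tooling. The function names, the `THRESHOLDS` dictionary, and the input shapes are hypothetical. It assumes each standard is an upper bound that is breached only when exceeded, that `oldGC` is counted per day, and that TP999 is computed with the nearest-rank method, which is one common definition among several.

```python
# Standards copied from the table; the key names and units are illustrative.
THRESHOLDS = {
    "cpu": 60.0,           # percent CPU utilization
    "memory": 60.0,        # percent memory used
    "disk": 60.0,          # percent disk I/O busy
    "swap": 10.0,          # percent (mem.swapused.percent)
    "oldGC": 2,            # full GCs per day (full_gc_count)
    "failure_rate": 0.01,  # percent of interface calls that failed
}


def breaches(sample: dict[str, float]) -> list[str]:
    """Return the metrics in `sample` whose value exceeds its standard.

    Assumption: a value equal to the standard is still acceptable.
    """
    return [
        name
        for name, limit in THRESHOLDS.items()
        if name in sample and sample[name] > limit
    ]


def failure_rate_percent(failures: int, total_calls: int) -> float:
    """Interface failure count divided by total interface calls, as a percent."""
    if total_calls == 0:
        return 0.0  # no traffic; the drop-zero alarm covers this case
    return 100.0 * failures / total_calls


def tp_latency(latencies_ms: list[float], permille: int = 999) -> float:
    """Nearest-rank percentile; permille=999 gives TP999.

    Integer arithmetic avoids floating-point rounding in the rank.
    """
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    rank = (permille * len(ordered) + 999) // 1000  # ceil(permille * n / 1000)
    return ordered[max(rank, 1) - 1]


def drop_zero_alarm(call_counts: list[int]) -> bool:
    """True when every interval in a non-empty window recorded zero calls."""
    return len(call_counts) > 0 and all(count == 0 for count in call_counts)


if __name__ == "__main__":
    sample = {"cpu": 72.5, "memory": 41.0, "swap": 10.0, "oldGC": 3}
    print(breaches(sample))                       # ['cpu', 'oldGC']
    print(tp_latency(list(range(1, 1001))))       # 999
    print(drop_zero_alarm([0, 0, 0]))             # True
```

In practice the window length for the drop-zero alarm and the comparison direction for each standard would be tuned per service, in the same way the table marks `range_fluctuation` and `data_accuracy` as context-dependent.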

Originally written by Li Wei (李唯_) and published in Chinese on 后端技术栈全书 (Full-Stack Backend Engineering). Translated and adapted for DriftSeas with permission.