YES! WORKS/DELIVERS
No, DOESN’T WORK/DELIVER
Sure, WORKS BUT NO VALUE
Alerting (Tells us when something is wrong/broken/down)
Troubleshooting (Helps us find and fix the problem)
Tuning and Capacity Planning (Helps us make the apps and infrastructure better)
Rate (Request rate, in requests/sec)
Errors (Error rate, in errors/sec)
Latency (Response time, including queue/wait time, in milliseconds)
Saturation (How overloaded something is. This is related to utilization but is more directly measured by things like queue depth, or sometimes concurrency. As a queue measurement, it becomes non-zero when you are saturated, often not much before. Usually a counter.)
Utilization (How busy the resource or system is. Usually expressed as 0–100% and most useful for predictions, since Saturation is probably more useful for spotting overload as it happens. Note we are not using the Utilization Law to get this (~Rate x Service Time / Workers), but instead looking for more familiar direct measurements.)
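As a rough illustration of the units above, the sketch below turns two readings of a counter into a per-second rate and, for comparison only, evaluates the Utilization Law formula mentioned in the Utilization entry. The function names, the 60-second window, the counter values, the 20 ms service time, and the 8 workers are all hypothetical values chosen for the example, not figures from any real system.

```python
def per_second(count_start: int, count_end: int, window_sec: float) -> float:
    """Convert two readings of an increasing counter into a per-second rate."""
    if window_sec <= 0:
        raise ValueError("window_sec must be positive")
    return (count_end - count_start) / window_sec


def utilization_law(rate_per_sec: float, service_time_sec: float, workers: int) -> float:
    """Estimate utilization as Rate x Service Time / Workers (1.0 means fully busy)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return rate_per_sec * service_time_sec / workers


# Hypothetical sample: request and error counters read 60 seconds apart.
request_rate = per_second(120_000, 132_000, 60.0)  # requests/sec
error_rate = per_second(300, 330, 60.0)            # errors/sec

# Assumed 20 ms average service time spread across 8 workers.
estimated_utilization = utilization_law(request_rate, 0.02, 8)

print(
    f"rate={request_rate:.1f}/s "
    f"errors={error_rate:.2f}/s "
    f"utilization={estimated_utilization:.0%}"
)
```

In practice a direct measurement (for example, CPU busy percentage) would usually replace the estimate, as the Utilization entry notes; the formula is shown only to make the relationship between Rate and Utilization concrete.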
Database ease of integration