Prometheus Monitoring

Metric Types

Type       Description                                                        Example
Counter    Monotonically increasing; resets to zero only on restart           http_requests_total, errors_total
Gauge      Can go up and down                                                 memory_usage_bytes, active_connections
Histogram  Counts observations in buckets; quantiles estimated at query time  http_request_duration_seconds
Summary    Client-side quantile calculation over a sliding time window        rpc_duration_seconds
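
A histogram stores only cumulative bucket counts, so any quantile taken from it is an estimate. The Python sketch below illustrates the general idea of linear interpolation inside the bucket that contains the target rank. It is a simplified model, not Prometheus's actual code, and the function name and bucket layout (a list of (upper_bound, cumulative_count) pairs ending in a +Inf bucket) are assumptions made for this example.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    with a final (math.inf, total_count) entry. Hypothetical layout for
    illustration only.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Rank falls in the +Inf bucket (or an empty one):
                # the best available answer is the previous finite bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
    print(estimate_quantile(0.95, latency_buckets))  # approximately 0.8125
```

The estimate can only be as precise as the bucket boundaries allow, which is why bucket bounds are usually chosen around the latencies that matter for alerting.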

PromQL Examples

    # Instant vector - current value
    http_requests_total

    # With label filter
    http_requests_total{job="api", status="200"}

    # Range vector - last 5 minutes
    http_requests_total[5m]

    # Rate (per-second rate over 5m)
    rate(http_requests_total[5m])

    # Error rate
    rate(http_requests_errors_total[5m]) / rate(http_requests_total[5m])

    # 95th percentile latency
    histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

    # Sum by label
    sum(rate(http_requests_total[5m])) by (service)

    # Average memory usage
    avg(container_memory_usage_bytes) by (pod)

    # CPU usage percentage
    100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
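
To show what rate() does with a range vector, here is a rough Python sketch of a per-second rate over counter samples, including counter-reset handling. It is deliberately simplified: the real rate() also extrapolates to the edges of the window, which this version skips. The function name and the (timestamp, value) sample format are assumptions for illustration.

```python
def simple_rate(samples):
    """Approximate per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns None when there are too few samples to compute a rate.
    """
    if len(samples) < 2:
        return None

    increase = 0.0
    prev_value = samples[0][1]
    for _, value in samples[1:]:
        if value < prev_value:
            # The counter went backwards, so treat it as a reset to zero
            # and count everything accumulated since the reset.
            increase += value
        else:
            increase += value - prev_value
        prev_value = value

    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


if __name__ == "__main__":
    # 120 requests in 120 seconds
    print(simple_rate([(0, 100), (60, 160), (120, 220)]))  # 1.0
```

The reset handling is the reason rate() should be applied to counters only; on a gauge, every decrease would be misread as a restart.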

Alerting Rules

    # alerts.yml
    groups:
      - name: api-alerts
        rules:
          - alert: HighErrorRate
            expr: |
              rate(http_requests_errors_total[5m])
              / rate(http_requests_total[5m]) > 0.05
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "High error rate on {{ $labels.service }}"
              description: "Error rate is {{ $value | humanizePercentage }} (threshold: 5%)"

          - alert: HighLatency
            expr: |
              histogram_quantile(0.95,
                rate(http_request_duration_seconds_bucket[5m])
              ) > 2
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "High p95 latency: {{ $value }}s"

          - alert: InstanceDown
            expr: up == 0
            for: 1m
            labels:
              severity: critical
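
The `for` clause in the rules above delays firing until the expression has held continuously for the given duration. The Python sketch below models that as a small state machine. It is an illustrative approximation with hypothetical names, assuming evenly spaced rule evaluations; it does not cover every detail of how Prometheus evaluates rules.

```python
def alert_states(results, for_seconds, interval_seconds):
    """Return the alert state after each rule evaluation.

    results: list of booleans, True when the alert expression matched
             at that evaluation.
    for_seconds: the rule's `for` duration.
    interval_seconds: assumed fixed time between evaluations.
    """
    states = []
    active_since = None  # time of the first match in the current streak
    for i, matched in enumerate(results):
        now = i * interval_seconds
        if not matched:
            # Any gap resets the streak, so the `for` timer starts over.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


if __name__ == "__main__":
    # for: 1m, evaluated every 30s
    print(alert_states([True, True, True, False, True], 60, 30))
    # ['pending', 'pending', 'firing', 'inactive', 'pending']
```

A longer `for` value trades detection speed for fewer alerts on short spikes, which is why the warning-level latency rule waits longer than the critical ones.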