Monitoring and alerts
Metrics endpoint
- Prometheus metrics: GET /actuator/prometheus
- Health probes:
  - GET /actuator/health/liveness
  - GET /actuator/health/readiness
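The Helm chart may already wire these probes for you. For a hand-written Deployment, a minimal sketch of the container probe settings could look like the fragment below. It assumes the container listens on 8080 (the chart's default service port); the delay and period values are illustrative, not taken from the project.

```yaml
# Fragment of a container spec; the port and timings are assumptions.
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 30   # illustrative; tune to the actual startup time
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 10
```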
Configuration
- DWARVENPICK_METRICS_PROMETHEUS_ENABLED (default: true) enables the Prometheus endpoint.
- Helm chart value: metrics.prometheus.enabled
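In a Helm values file, the chart value above would be written as nested keys. A minimal sketch:

```yaml
# values.yaml override: toggles the Prometheus endpoint via the chart value.
metrics:
  prometheus:
    enabled: true
```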
Kubernetes scraping
The Helm chart exposes the backend on .Values.service.port (default 8080). Configure your Prometheus instance to scrape:
http://<service-name>:8080/actuator/prometheus
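For a Prometheus instance configured with a static target, the scrape job might look like the sketch below. The job name is a hypothetical choice, and <service-name> remains a placeholder for your backend Service.

```yaml
# prometheus.yml fragment; job_name is illustrative.
scrape_configs:
  - job_name: dwarvenpick
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["<service-name>:8080"]   # replace with the backend Service name
```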
If you use Prometheus Operator, create a ServiceMonitor that targets the backend Service and port.
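A ServiceMonitor for this setup could be sketched as follows. The resource name, the release label, the selector labels, and the Service port name (http) are all assumptions; check them against the labels and port names that your chart release and Prometheus Operator installation actually use.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dwarvenpick-backend            # hypothetical name
  labels:
    release: prometheus                # assumption: label your Prometheus selects on
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: dwarvenpick   # assumption: label on the backend Service
  endpoints:
    - port: http                       # assumption: name of the Service port
      path: /actuator/prometheus
      interval: 30s
```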
Key metrics
- Query lifecycle:
  - dwarvenpick_query_active{status="queued|running"}
  - dwarvenpick_query_execution_total{outcome=...}
  - dwarvenpick_query_duration_seconds{outcome=...}
  - dwarvenpick_query_cancel_total
  - dwarvenpick_query_timeout_total
- Exports:
  - dwarvenpick_query_export_attempts_total{outcome=...}
- Auth:
  - dwarvenpick_auth_login_attempts_total{provider=...,outcome=...}
- Pools:
  - dwarvenpick_pool_active
  - dwarvenpick_pool_idle
  - dwarvenpick_pool_total
Recommended alerts
- High query failure rate:
  - Trigger: sum(increase(dwarvenpick_query_execution_total{outcome="failed"}[5m])) / sum(increase(dwarvenpick_query_execution_total[5m])) > 0.2 (both sides are aggregated with sum() so that failures are divided by the total across all outcomes; without it, PromQL matches only series with identical labels and the ratio is always 1)
- Query timeout burst:
  - Trigger: increase(dwarvenpick_query_timeout_total[5m]) > 10
- High queue pressure:
  - Trigger: dwarvenpick_query_active{status="queued"} > 20 for 10m
- Pool saturation:
  - Trigger: dwarvenpick_pool_active / dwarvenpick_pool_total > 0.9 for 5m
- Login failure surge:
  - Trigger: increase(dwarvenpick_auth_login_attempts_total{outcome="failed"}[5m]) > 25
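The triggers above can be expressed as a Prometheus rule file along these lines. This is a sketch, not a file shipped by the project: the group name, alert names, severity labels, and summaries are hypothetical. The failure-rate expression wraps both sides in sum() so the ratio is taken across all outcome labels rather than between identically labelled series.

```yaml
groups:
  - name: dwarvenpick-alerts            # hypothetical group name
    rules:
      - alert: DwarvenpickHighQueryFailureRate
        expr: |
          sum(increase(dwarvenpick_query_execution_total{outcome="failed"}[5m]))
            /
          sum(increase(dwarvenpick_query_execution_total[5m]))
            > 0.2
        labels:
          severity: warning             # assumed severity
        annotations:
          summary: "More than 20% of queries failed over the last 5 minutes."

      - alert: DwarvenpickQueryTimeoutBurst
        expr: increase(dwarvenpick_query_timeout_total[5m]) > 10
        labels:
          severity: warning
        annotations:
          summary: "More than 10 query timeouts in the last 5 minutes."

      - alert: DwarvenpickHighQueuePressure
        expr: dwarvenpick_query_active{status="queued"} > 20
        for: 10m                        # condition must hold for 10 minutes
        labels:
          severity: warning
        annotations:
          summary: "More than 20 queries queued for 10 minutes."

      - alert: DwarvenpickPoolSaturation
        expr: dwarvenpick_pool_active / dwarvenpick_pool_total > 0.9
        for: 5m                         # condition must hold for 5 minutes
        labels:
          severity: warning
        annotations:
          summary: "Connection pool above 90% utilization for 5 minutes."

      - alert: DwarvenpickLoginFailureSurge
        expr: increase(dwarvenpick_auth_login_attempts_total{outcome="failed"}[5m]) > 25
        labels:
          severity: warning
        annotations:
          summary: "More than 25 failed logins in the last 5 minutes."
```

This is the plain rule-file format loaded through rule_files in prometheus.yml. With Prometheus Operator, the same groups would go under spec in a PrometheusRule resource.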