diff --git a/doc/admin/observability/alerts.md b/doc/admin/observability/alerts.md
index 6c4173a605d..4f9d832ad02 100644
--- a/doc/admin/observability/alerts.md
+++ b/doc/admin/observability/alerts.md
@@ -2221,7 +2221,7 @@ Generated query for critical alert: `min((sum by (app) (up{app=~".*(pgsql|codein
 
 **Descriptions**
 
-- critical precise-code-intel-worker: 18000s+ unprocessed upload record queue longest time in queue
+- warning precise-code-intel-worker: 18000s+ unprocessed upload record queue longest time in queue
 
 **Next steps**
 
@@ -2233,7 +2233,7 @@ count being required for the volume of uploads.
 
 ```json
 "observability.silenceAlerts": [
-  "critical_precise-code-intel-worker_codeintel_upload_queued_max_age"
+  "warning_precise-code-intel-worker_codeintel_upload_queued_max_age"
 ]
 ```
 
@@ -2242,7 +2242,7 @@ count being required for the volume of uploads.
 Technical details
-Generated query for critical alert: `max((max(src_codeintel_upload_queued_duration_seconds_total{job=~"^precise-code-intel-worker.*"})) >= 18000)`
+Generated query for warning alert: `max((max(src_codeintel_upload_queued_duration_seconds_total{job=~"^precise-code-intel-worker.*"})) >= 18000)`
diff --git a/monitoring/definitions/shared/codeintel.go b/monitoring/definitions/shared/codeintel.go
index 9649c09abee..b1e3ab3c7e2 100644
--- a/monitoring/definitions/shared/codeintel.go
+++ b/monitoring/definitions/shared/codeintel.go
@@ -66,7 +66,7 @@ func (codeIntelligence) NewUploadQueueGroup(containerName string) monitoring.Gro
 		},
 		QueueSize: NoAlertsOption("none"),
-		QueueMaxAge: CriticalOption(monitoring.Alert().GreaterOrEqual((time.Hour * 5).Seconds()), `
+		QueueMaxAge: WarningOption(monitoring.Alert().GreaterOrEqual((time.Hour * 5).Seconds()), `
 			An alert here could be indicative of a few things: an upload surfacing a pathological performance characteristic,
 			precise-code-intel-worker being underprovisioned for the required upload processing throughput, or a higher replica
 			count being required for the volume of uploads.