monitoring: Increase alert time range (#17269)

Increase the duration for which the growth rate of the
src_repoupdater_sched_update_queue_length metric must stay positive before
the critical alert fires, from 30 minutes to 2 hours.

This should stop false positives caused by expected spikes in the update
queue length, which occur when updates are queued due to age.
Ryan Slade 2021-01-14 15:26:08 +02:00 committed by GitHub
parent f449be875f
commit 41a72d82b3
2 changed files with 2 additions and 2 deletions

@@ -3150,7 +3150,7 @@ with your code hosts connections or networking issues affecting communication wi
**Descriptions:**
-- <span class="badge badge-critical">critical</span> repo-updater: 0+ rate of growth of update queue length over 5 minutes for 30m0s
+- <span class="badge badge-critical">critical</span> repo-updater: 0+ rate of growth of update queue length over 5 minutes for 2h0m0s
**Possible solutions:**

@@ -184,7 +184,7 @@ func RepoUpdater() *monitoring.Container {
Description: "rate of growth of update queue length over 5 minutes",
Query: `max(deriv(src_repoupdater_sched_update_queue_length[5m]))`,
// Alert if the derivative is positive for longer than 30 minutes
-Critical: monitoring.Alert().Greater(0).For(30 * time.Minute),
+Critical: monitoring.Alert().Greater(0).For(120 * time.Minute),
Panel: monitoring.Panel().Unit(monitoring.Number),
Owner: monitoring.ObservableOwnerCloud,
PossibleSolutions: "Check repo-updater logs for indications that the queue is not being processed. The queue length should trend downwards over time as items are sent to GitServer",