Currently, build-tracker keeps track of consecutive build failures
through an in-memory store of failed builds. As this gets deployed more
frequently on MSP, we lose state more frequently which would result in
incorrect results. Instead, we can use redis as our external store as
well as for locking using redsync
## Test plan
Unit tests have been updated, but proper testing will require live
traffic
## Changelog
<!-- OPTIONAL; info at
https://www.notion.so/sourcegraph/Writing-a-changelog-entry-dd997f411d524caabf0d8d38a24a878c
-->
2nd attempt of #63111, a follow up
https://github.com/sourcegraph/sourcegraph/pull/63085
rules_oci 2.0 brings a lot of performance improvement around oci_image
and oci_pull, which will benefit Sourcegraph. It will also make RBE
faster and have less load on remote cache.
However, 2.0 makes some breaking changes like
- oci_tarball's default output is no longer a tarball
- oci_image no longer compresses layers that are uncompressed, somebody
has to make sure all `pkg_tar` targets have a `compression` attribute
set to compress it beforehand.
- there is no curl fallback, but this is fine for sourcegraph as it
already uses bazel 7.1.
I checked all targets that use oci_tarball as much as i could to make
sure nothing depends on the default tarball output of oci_tarball. there
was one target which used the default output which i put a TODO for
somebody else (somebody who is more on top of the repo) to tackle
**later**.
## Test plan
CI. Also run delivery on this PR (don't land those changes)
---------
Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
This PR is a result/followup of the improvements we've made in the [SAMS repo](https://github.com/sourcegraph/sourcegraph-accounts/pull/199) that allows call sites to pass down a context (primarily to indicate deadline, and of course, cancellation if desired) and collects the error returned from `background.Routine`s `Stop` method.
Note that I did not adopt returning error from `Stop` method because I realize in monorepo, the more common (and arguably the desired) pattern is to hang on the call of `Start` method until `Stop` is called, so it is meaningless to collect errors from `Start` methods as return values anyway, and doing that would also complicate the design and semantics more than necessary.
All usages of the the `background.Routine` and `background.CombinedRoutines` are updated, I DID NOT try to interpret the code logic and make anything better other than fixing compile and test errors.
The only file that contains the core change is the [`lib/background/background.go`](https://github.com/sourcegraph/sourcegraph/pull/62136/files#diff-65c3228388620e91f8c22d91c18faac3f985fc67d64b08612df18fa7c04fafcd).
TIL BigQuery has a native "repeated" type, so we dont have to comma separate this out : )
Not impacting any existing data as the current version isnt live yet (due to an msp misconfiguration) and the bigquery tables not being created yet)
## Test plan
CI and live
This isnt exactly convenient lol
```
DEVX_TRIGGERED_FROM_BRANCH_URL="/tree/main"
DEVX_TRIGGERED_FROM_BUILD_ID="018f343b-5cfc-420c-9353-02821de45533"
DEVX_TRIGGERED_FROM_BUILD_NUMBER="271872"
DEVX_TRIGGERED_FROM_COMMIT="6d7082d26ee772be9cd3fd2b463c0f33a95ee7dc"
DEVX_TRIGGERED_FROM_COMMIT_URL="/commit/6d7082d26ee772be9cd3fd2b463c0f33a95ee7dc"
DEVX_TRIGGERED_FROM_PIPELINE_SLUG="sourcegraph"
DEVX_TRIGGERED_FROM_PR_NUMBER="0"
DEVX_TRIGGERED_FROM_PR_URL="/pull/0"
```
## Test plan
Live :letsgo:
As we're changing the devx metrics pipeline to checkout devx-service instead of sg/sg (as we've no need for sg/sg anymore, and do need to checkout devx-service to `bazel run`), we need to change up the values we set for buildtracker to trigger the build for due to it now checking out a different repo
## Test plan
Non-critical, will need to test live to see how a real system reacts to the values
This was causing panics in MSP. Unclear why this would ever be nil, but apparently it can be?
## Test plan
curl'd locally with payload from notification service log
Triggers a buildkite pipeline when a build.finished event is received from buildkite in order to collect metrics & information about the build (coming in a later PR).
The details we attach as part of the build can be iterated upon as needed. For the most part, we can get all the details of the build that ultimately is triggering this by using the buildkite cli to query the build by its ID from the `DEVX_TRIGGERED_FROM_BUILD_ID` env var
A drawback appears to be that it will always show the owner of the token as the author/creator of the build (although the committer is preserved in `BUILDKITE_BUILD_AUTHOR` env var of the jobs)
Depends on https://github.com/sourcegraph/managed-services/pull/1096
## Test plan
Tested with payloads fetched from the webhook log, and with a personal token (see builds here: https://buildkite.com/sourcegraph/devx-build-metrics/)
* build-tracker: deploy to MSP
* image test
* remove unused GITHUB_TOKEN
* fix image repo name
Co-authored-by: James Cotter <35706755+jac@users.noreply.github.com>
* determine if release then use Build Creator
* add test cases
* restore changes from main
* review comments and build fixes
* fix tests
---------
Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
Co-authored-by: Noah S-C <noah@sourcegraph.com>
Co-authored-by: James Cotter <35706755+jac@users.noreply.github.com>
We have a number of docs links in the product that point to the old doc site.
Method:
- Search the repo for `docs.sourcegraph.com`
- Exclude the `doc/` dir, all test fixtures, and `CHANGELOG.md`
- For each, replace `docs.sourcegraph.com` with `sourcegraph.com/docs`
- Navigate to the resulting URL ensuring it's not a dead link, updating the URL if necessary
Many of the URLs updated are just comments, but since I'm doing a manual audit of each URL anyways, I felt it was worth it to update these while I was at it.
* more tests and additional logging
* Fix !finished bug causing all Jobs to be considered inprogress
* Remove inprogress since it is not a terminal state and we only consider
terminal states.
* Rework how status is determined.
* Add comments
* fix testcase
* Update dev/build-tracker/build/build.go
The build can be finished and have failing jobs but a late job can come
in and make the build "not be finished", which is a problem since we
don't get a notification for the late job finishing, which leads to the
job perpetually being in in-progress.
We do not count in-progress jobs anymore because this can lead to
invalid build states even though the build is finished
* log: remove use of description paramter in Scoped
* temporarily point to sglog branch
* bazel configure + gazelle
* remove additional use of description param
* use latest versions of zoekt,log,mountinfo
* go.mod
* add consts for build and job events + state
* safer var access
* define JobInProgress status
* rename func to be more descriptive
* only notify if a build is finished + test case
* fix test name for fixed notification
The previous approach to enable race detection was too radical and
accidently led to build our binaries with the race flage enabled, which
caused issues when building images down the line.
This happened because putting a `test --something` in bazelrc also sets
it on `build` which is absolutely not what we wanted. Usually folks get
this one working by having a `--stamp` config setting that fixes this
when releasing binaries, which we don't at this stage, as we're still
learning Bazel.
Luckily, this was caught swiftly. The current approach insteads takes a
more granular approach, which makes the `go_test` rule uses our own
variant, which injects the `race = "on"` attribute, but only on
`go_test`.
## Test plan
<!-- All pull requests REQUIRE a test plan:
https://docs.sourcegraph.com/dev/background-information/testing_principles
-->
CI, being a main-dry-run, this will cover the container building jobs,
which were the ones failing.
---------
Co-authored-by: Alex Ostrikov <alex.ostrikov@sourcegraph.com>
Reintroduces the same changes as
https://github.com/sourcegraph/sourcegraph/pull/51104 minus
syntax-highlighter which we're unable to compile with the right
toolchain at the moment.
Tested as a full main-dry-run, as well as running the stack with compose
and checking indexing and syntax-highlighting.
Executors are also built correctly.
## Test plan
CI + manual test via compose.
---------
Co-authored-by: Jean-Hadrien Chabran <jh@chabran.fr>
Adds https://github.com/mvdan/unparam as a nogo linter
Without `//nolint:unparam`
```
bb //dev/linters/...
INFO: Analyzed 136 targets (0 packages loaded, 0 targets configured).
INFO: Found 136 targets...
ERROR: /Users/william/code/sourcegraph/dev/linters/unparam/BUILD.bazel:3:11: GoCompilePkg dev/linters/unparam/unparam.a failed: (Exit 1): builder failed: error executing command (from target //dev/linters/unparam:unparam) bazel-out/darwin_arm64-opt-exec-2B5CBBC6/bin/external/go_sdk/builder_reset/builder compilepkg -sdk external/go_sdk -installsuffix darwin_arm64 -src dev/linters/unparam/unparam.go -embedroot '' ... (remaining 30 arguments skipped)
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
compilepkg: nogo: errors found by nogo during build-time code analysis:
dev/linters/unparam/unparam.go:27:21: Test - b is unused (unparam)
INFO: Elapsed time: 0.374s, Critical Path: 0.17s
INFO: 3 processes: 2 internal, 1 darwin-sandbox.
FAILED: Build did NOT complete successfully
```
With `//nolint:unparam`
```
bb //dev/linters/...
INFO: Analyzed 136 targets (0 packages loaded, 0 targets configured).
INFO: Found 136 targets...
INFO: Elapsed time: 0.261s, Critical Path: 0.11s
INFO: 3 processes: 1 internal, 2 darwin-sandbox.
INFO: Build completed successfully, 3 total actions
```
## Test plan
* Checked that a small function in the linter is picked up when the
`//nolint:unparam` directive is removed
* Greem CI
<!-- All pull requests REQUIRE a test plan:
https://docs.sourcegraph.com/dev/background-information/testing_principles
-->