Commit Graph

91 Commits

Author SHA1 Message Date
Noah S-C
d4fa539b31
Revert "chore(ci): rework build-tracker to use redis instead of in-memory store of build results" (#64436)
Reverts sourcegraph/sourcegraph#64304

A number of Redis-related issues cropped up live

## Test plan

CI
2024-08-13 13:22:41 +02:00
Noah S-C
67f30a9d7a
chore(ci): rework build-tracker to use redis instead of in-memory store of build results (#64304)
Currently, build-tracker keeps track of consecutive build failures
through an in-memory store of failed builds. As this gets deployed more
frequently on MSP, we lose state more frequently which would result in
incorrect results. Instead, we can use redis as our external store as
well as for locking using redsync
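
A minimal sketch of the idea, assuming a hypothetical `FailureStore` interface (this name and its method are illustrative, not taken from build-tracker). The in-memory implementation below is the kind of state that is lost on redeploy; a Redis-backed implementation of the same interface, guarded by a distributed lock, would survive it. No Redis or redsync calls are shown.

```go
package main

import (
	"fmt"
	"sync"
)

// FailureStore is a hypothetical abstraction over where the
// consecutive-failure count is kept (process memory, Redis, ...).
type FailureStore interface {
	// RecordResult stores the outcome of a build and returns the
	// current number of consecutive failures for that branch.
	RecordResult(pipeline, branch string, failed bool) (int, error)
}

// memoryStore keeps counts in process memory, so every restart
// resets them to zero.
type memoryStore struct {
	mu     sync.Mutex
	counts map[string]int
}

func newMemoryStore() *memoryStore {
	return &memoryStore{counts: make(map[string]int)}
}

func (s *memoryStore) RecordResult(pipeline, branch string, failed bool) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	key := pipeline + "/" + branch
	if failed {
		s.counts[key]++
	} else {
		// A passing build breaks the streak.
		s.counts[key] = 0
	}
	return s.counts[key], nil
}

func main() {
	var store FailureStore = newMemoryStore()
	for _, failed := range []bool{true, true, false, true} {
		n, _ := store.RecordResult("sourcegraph", "main", failed)
		fmt.Println("consecutive failures:", n)
	}
}
```

Keeping the counter behind an interface means callers do not need to change when the backing store does.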

## Test plan

Unit tests have been updated, but proper testing will require live
traffic

2024-08-13 10:32:09 +00:00
Noah S-C
b9c4e2aae9
Revert "Revert "refactor: upgrade to rules_oci 2.0 (2nd attempt)"" (#64354)
Reverts sourcegraph/sourcegraph#64351

## Test plan

Need to test on main due to main-only CI steps (even with main dry-run)
2024-08-08 09:00:08 +00:00
Noah S-C
addba96f47
Revert "refactor: upgrade to rules_oci 2.0 (2nd attempt)" (#64351)
Reverts sourcegraph/sourcegraph#63829

Not working with Aspect Delivery

## Test plan

CI
2024-08-07 22:15:21 +00:00
Greg Magolan
be015c58c2
refactor: upgrade to rules_oci 2.0 (2nd attempt) (#63829)
2nd attempt of #63111, a follow-up to
https://github.com/sourcegraph/sourcegraph/pull/63085

rules_oci 2.0 brings a lot of performance improvements around oci_image
and oci_pull, which will benefit Sourcegraph. It will also make RBE
faster and put less load on the remote cache.

However, 2.0 makes some breaking changes like

- oci_tarball's default output is no longer a tarball
- oci_image no longer compresses layers that are uncompressed, somebody
has to make sure all `pkg_tar` targets have a `compression` attribute
set to compress it beforehand.
- there is no curl fallback, but this is fine for sourcegraph as it
already uses bazel 7.1.

I checked all targets that use oci_tarball as much as I could to make
sure nothing depends on the default tarball output of oci_tarball. There
was one target which used the default output, for which I left a TODO for
somebody else (somebody who is more on top of the repo) to tackle
**later**.

## Test plan

CI. Also run delivery on this PR (don't land those changes)

---------

Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
2024-08-07 22:21:49 +01:00
Bolaji Olajide
20b858f6c3
fix(build-tracker): Failed back-compat doesn't count towards branch-locking quota (#63911)
Closes
[DINF-51](https://linear.app/sourcegraph/issue/DINF-51/failed-back-compat-doesnt-count-towards-branch-locking-quota)

## Context

If a back-compat step on main fails, the build is marked as having
failed. However, we don't treat that as a failure in build-tracker,
resulting in no #buildkite-main post and not counting towards failed
build quota for locking main.

The reason why this was happening is that the Backcompat build wasn't
linked to the main Sourcegraph build in any way. However, when a
backcompat fails the main build reflects the status of this failure, but
we do not use this field when determining the status of a build, so it
doesn't work for our use case.

![CleanShot 2024-07-18 at 15 04
15@2x](https://github.com/user-attachments/assets/9553330a-ad98-45cc-b4ce-03a22ca1b99d)

We [instead do a walkthrough of all the jobs associated with a build to
figure
out](https://sourcegraph.sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/dev/build-tracker/main.go?L349-372)
if the build has failed, fixed or is passing.

With this logic, it means we have to link the steps from child builds
that a particular build triggers to its parent.
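
The linking described above can be sketched as follows. `Job`, `Build`, `allJobs`, and `status` are hypothetical, simplified names for illustration and are not the types used in `dev/build-tracker`. The point is only that a job walk sees a child build's failure once the child is attached to its parent.

```go
package main

import "fmt"

// Job and Build are hypothetical, simplified types.
type Job struct {
	Name   string
	Failed bool
}

type Build struct {
	Number   int
	Jobs     []Job
	Children []*Build // builds triggered by this build, e.g. a back-compat run
}

// allJobs returns the build's own jobs followed by the jobs of every
// linked child build.
func (b *Build) allJobs() []Job {
	jobs := append([]Job{}, b.Jobs...)
	for _, child := range b.Children {
		jobs = append(jobs, child.allJobs()...)
	}
	return jobs
}

// status walks all jobs and reports "failed", "fixed", or "passed".
// previousFailed says whether the preceding build on the branch failed.
func (b *Build) status(previousFailed bool) string {
	for _, j := range b.allJobs() {
		if j.Failed {
			return "failed"
		}
	}
	if previousFailed {
		return "fixed"
	}
	return "passed"
}

func main() {
	backcompat := &Build{Number: 2, Jobs: []Job{{Name: "back-compat tests", Failed: true}}}
	parent := &Build{Number: 1, Jobs: []Job{{Name: "unit tests"}, {Name: "trigger back-compat"}}}

	fmt.Println("before linking:", parent.status(false))
	parent.Children = append(parent.Children, backcompat)
	fmt.Println("after linking:", parent.status(false))
}
```

In this sketch the parent reports passed until the failing child is linked, which mirrors the missed-failure behaviour the fix addresses.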

## Test plan

* Create a build that'll have backcompat failing
* The build tracker event associated with the main build will be
reported with a state of failed to buildkite.

![CleanShot 2024-07-18 at 15 10
45@2x](https://github.com/user-attachments/assets/1bf503ab-0020-47bf-9512-b3a9ee5d4e36)


2024-07-25 06:45:09 -05:00
James Cotter
ea9c45df8f
msp/runtime: split contract into JobContract and ServiceContract (#63494)
Splits the runtime contract into a JobContract and a ServiceContract.
This lets us better handle initialisation, such as env vars, that is
conditional on the contract type.
## Test plan

ci
2024-06-26 19:46:10 +00:00
William Bezuidenhout
1a7e1b9686
build-tracker: remove old links (#63065)
2024-06-04 12:03:58 +01:00
Will Dollman
d1b71a0a8a
bazel: Cleanup oci_deps.bzl (#62769)
* security: Update dind base image to patch multiple CVEs

Patches CVE-2023-45288 CVE-2024-2511 CVE-2024-32002 CVE-2024-32004 CVE-2024-32020 CVE-2024-32021 CVE-2024-32465

* ci: Tweak automated security update PR title

* Remove unused image hashes from oci_deps

* Tweak oci_deps comment

* Fixup old @wolfi_base references

* Add wolfi_base load

* use the correct base image

* Remove unneeded wolfi_base call
2024-05-28 10:00:31 +01:00
Joe Chen
2589fef13e
lib/background: upgrade Routine interface with context and errors (#62136)
This PR is a result/follow-up of the improvements we've made in the [SAMS repo](https://github.com/sourcegraph/sourcegraph-accounts/pull/199) that allow call sites to pass down a context (primarily to indicate deadline, and of course, cancellation if desired) and collect the error returned from `background.Routine`'s `Stop` method.

Note that I did not adopt returning error from `Stop` method because I realize in monorepo, the more common (and arguably the desired) pattern is to hang on the call of `Start` method until `Stop` is called, so it is meaningless to collect errors from `Start` methods as return values anyway, and doing that would also complicate the design and semantics more than necessary.

All usages of `background.Routine` and `background.CombinedRoutines` are updated. I DID NOT try to interpret the code logic or improve anything beyond fixing compile and test errors.

The only file that contains the core change is the [`lib/background/background.go`](https://github.com/sourcegraph/sourcegraph/pull/62136/files#diff-65c3228388620e91f8c22d91c18faac3f985fc67d64b08612df18fa7c04fafcd).
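
A rough sketch of the pattern described above, assuming `Start` blocks until the routine is stopped and `Stop` is the method that accepts a context and returns an error. The `Routine` interface here is a simplified stand-in written for this example, not a copy of `lib/background`; `worker` and `stopWithTimeout` are hypothetical helpers.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Routine is a simplified stand-in for the interface discussed above.
type Routine interface {
	// Start blocks until Stop is called.
	Start()
	// Stop signals the routine to finish and waits until it has,
	// or until ctx is done, whichever comes first.
	Stop(ctx context.Context) error
}

// worker is a single-use routine: Stop must be called at most once.
type worker struct {
	quit chan struct{}
	done chan struct{}
}

func newWorker() *worker {
	return &worker{quit: make(chan struct{}), done: make(chan struct{})}
}

func (w *worker) Start() {
	defer close(w.done)
	<-w.quit // a real routine would run its work loop here
}

func (w *worker) Stop(ctx context.Context) error {
	close(w.quit)
	select {
	case <-w.done:
		return nil
	case <-ctx.Done():
		// The deadline passed before the routine finished.
		return ctx.Err()
	}
}

// stopWithTimeout shows a call site passing a deadline down to Stop.
func stopWithTimeout(r Routine, d time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	return r.Stop(ctx)
}

func main() {
	r := newWorker()
	go r.Start()
	fmt.Println("stop error:", stopWithTimeout(r, time.Second))
}
```

Passing the context into `Stop` lets each call site decide how long it is willing to wait for shutdown.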
2024-05-24 10:04:55 -04:00
Michael Bahr
e85028b8bd
fix: update links for dev docs (#62758)
* fix: license checker info is in docs-legacy

* fix: update remaining dev links
2024-05-17 13:47:34 +02:00
Noah S-C
9b6ba7741e
bazel: transcribe test ownership to bazel tags (#62664)
2024-05-16 15:51:16 +01:00
Noah S-C
d96745d78a
build-tracker: include error if failing to write to bigquery (#62699)
Without this, this error won't be logged to Sentry, resulting in us missing it unless we check GCP

## Test plan

Discussed with @jac
2024-05-15 17:32:53 +01:00
Noah S-C
e2814f5fdc
build-tracker: include timestamp in agent state change events (#62670)
🙃  would be useful to have..

## Test plan

Confirmed bigquery code handles time.Time natively by inspecting the code
2024-05-14 18:03:17 +00:00
Noah S-C
2c1fc163e6
build-tracker: fix handling of agent webhooks (#62632)
So the graphql API differs quite a bit 🙃 not using that as a reference anymore

## Test plan

updated unit test based on live data
2024-05-13 14:55:20 +00:00
Noah S-C
9a5eae8035
build-tracker: use repeated type for agent queues + deref strings (#62627)
TIL BigQuery has a native "repeated" type, so we don't have to comma-separate this out : )

This doesn't impact any existing data, as the current version isn't live yet (due to an MSP misconfiguration) and the BigQuery tables haven't been created yet.

## Test plan

CI and live
2024-05-13 12:00:55 +00:00
Noah S-C
0be15f8983
build-tracker: emit agent state-change webhook events to BigQuery (#62598)
Track when agents come on & offline in build-tracker

Closes https://github.com/sourcegraph/sourcegraph/issues/61275

## Test plan

Added unit test, the rest will be decided by the Prod Gods
2024-05-12 16:20:04 +02:00
Noah S-C
e2f279c7d9
build-tracker: fix convenience urls in env (#62340)
This isn't exactly convenient lol
```
DEVX_TRIGGERED_FROM_BRANCH_URL="/tree/main"
DEVX_TRIGGERED_FROM_BUILD_ID="018f343b-5cfc-420c-9353-02821de45533"
DEVX_TRIGGERED_FROM_BUILD_NUMBER="271872"
DEVX_TRIGGERED_FROM_COMMIT="6d7082d26ee772be9cd3fd2b463c0f33a95ee7dc"
DEVX_TRIGGERED_FROM_COMMIT_URL="/commit/6d7082d26ee772be9cd3fd2b463c0f33a95ee7dc"
DEVX_TRIGGERED_FROM_PIPELINE_SLUG="sourcegraph"
DEVX_TRIGGERED_FROM_PR_NUMBER="0"
DEVX_TRIGGERED_FROM_PR_URL="/pull/0"
```
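
The values above are relative paths, so they cannot be opened directly. One way to build absolute URLs is sketched below. The helper `triggerEnv` and the repository base URL are assumptions made for this example; only the variable names come from the listing above. Omitting the PR URL when there is no PR is also a choice made for the sketch, not something the commit states.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// triggerEnv is a hypothetical helper that builds the DEVX_TRIGGERED_FROM_*
// variables with absolute URLs rooted at repoURL.
func triggerEnv(repoURL, branch, commit string, prNumber int) map[string]string {
	env := map[string]string{
		"DEVX_TRIGGERED_FROM_BRANCH_URL": repoURL + "/tree/" + branch,
		"DEVX_TRIGGERED_FROM_COMMIT":     commit,
		"DEVX_TRIGGERED_FROM_COMMIT_URL": repoURL + "/commit/" + commit,
		"DEVX_TRIGGERED_FROM_PR_NUMBER":  strconv.Itoa(prNumber),
	}
	// Assumption for this sketch: skip the PR URL when the build has no PR.
	if prNumber > 0 {
		env["DEVX_TRIGGERED_FROM_PR_URL"] = repoURL + "/pull/" + strconv.Itoa(prNumber)
	}
	return env
}

func main() {
	env := triggerEnv(
		"https://github.com/sourcegraph/sourcegraph", // assumed base URL
		"main",
		"6d7082d26ee772be9cd3fd2b463c0f33a95ee7dc",
		0,
	)
	// Print in a stable order, since map iteration order is random.
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%q\n", k, env[k])
	}
}
```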

## Test plan

Live :letsgo:
2024-05-01 14:26:34 +00:00
Noah S-C
cd79dc8c90
build-tracker: change trigger-build values for not checking out sg/sg (#62291)
We're changing the devx metrics pipeline to check out devx-service instead of sg/sg (we no longer need sg/sg, and we do need devx-service checked out to `bazel run`). Because the pipeline now checks out a different repo, we need to change the values build-tracker sets when triggering the build.

## Test plan

Non-critical, will need to test live to see how a real system reacts to the values
2024-05-01 12:42:13 +01:00
Noah S-C
1656cdd17d
build-tracker: fix nil pointer in getting commit author (#61685)
This was causing panics in MSP. Unclear why this would ever be nil, but apparently it can be?

## Test plan

curl'd locally with payload from notification service log
2024-04-08 13:42:35 +00:00
Noah S-C
9b4107fc65
build-tracker: trigger build-metrics pipeline on build.finished (#61512)
Triggers a buildkite pipeline when a build.finished event is received from buildkite in order to collect metrics & information about the build (coming in a later PR).

The details we attach as part of the build can be iterated upon as needed. For the most part, we can get all the details of the build that ultimately is triggering this by using the buildkite cli to query the build by its ID from the `DEVX_TRIGGERED_FROM_BUILD_ID` env var

A drawback appears to be that it will always show the owner of the token as the author/creator of the build (although the committer is preserved in `BUILDKITE_BUILD_AUTHOR` env var of the jobs)

Depends on https://github.com/sourcegraph/managed-services/pull/1096

## Test plan

Tested with payloads fetched from the webhook log, and with a personal token (see builds here: https://buildkite.com/sourcegraph/devx-build-metrics/)
2024-04-05 18:50:56 +01:00
William Bezuidenhout
168561938b
build-tracker: uncomment that which was commented (#61553)
* build-tracker: uncomment that which was commented

* fixup
2024-04-03 16:09:47 +00:00
Noah S-C
01a7e66a8d
build-tracker: update healthz endpoint for MSP deployment expectations (#61554)
We can use `contract.RegisterDiagnosticsHandlers` later, requires more work due to gorilla/mux incompatibilities 

## Test plan

curl'd locally
2024-04-03 15:53:39 +00:00
William Bezuidenhout
a350a723e1
ci: use build creator when the build is a release build (#61545)
* build-tracker: deploy to MSP

* image test

* remove unused GITHUB_TOKEN

* fix image repo name

Co-authored-by: James Cotter <35706755+jac@users.noreply.github.com>

* determine if release then use Build Creator

* add test cases

* restore changes from main

* review comments and build fixes

* fix tests

---------

Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
Co-authored-by: Noah S-C <noah@sourcegraph.com>
Co-authored-by: James Cotter <35706755+jac@users.noreply.github.com>
2024-04-03 16:52:41 +02:00
Noah S-C
d7e4bc57db
build-tracker: fix nil pointer in old build purger & enabling auto-rollout with MSP (#61549)
James said I can enable this now :clueless:

## Test plan

Ran locally, no more panic
2024-04-03 15:19:48 +01:00
Noah S-C
2a26976193
build-tracker: fix BUILDTRACKER_DEBUG_PASSWORD env var key (#61542)
:sadge: 

## Test plan

to be tested again in CloudRun testing environment
2024-04-03 13:07:36 +01:00
Noah S-C
c3ec21e436
build-tracker: deploy to MSP (#61510)
Bring it into [year 2024] 🎉 will be building on this as part of #60455 OKR, so having a more convenient deploy method would be cool

MSP PR: https://github.com/sourcegraph/managed-services/pull/1011

## Test plan

testing this live 👁️
2024-04-03 11:49:57 +01:00
Petri-Johan Last
0b5e7fd490
Replace all traditional for-loops (#60988)
2024-03-11 16:05:47 +02:00
Camden Cheek
1ead945267
Docs: update links to point to new site (#60381)
We have a number of docs links in the product that point to the old doc site. 

Method:
- Search the repo for `docs.sourcegraph.com`
- Exclude the `doc/` dir, all test fixtures, and `CHANGELOG.md`
- For each, replace `docs.sourcegraph.com` with `sourcegraph.com/docs`
- Navigate to the resulting URL ensuring it's not a dead link, updating the URL if necessary

Many of the URLs updated are just comments, but since I'm doing a manual audit of each URL anyways, I felt it was worth it to update these while I was at it.
2024-02-13 00:23:47 +00:00
William Bezuidenhout
ad3530166e
build-tracker: only account for terminal job states (#58834)
* more tests and additional logging

* Fix !finished bug causing all Jobs to be considered inprogress

* Remove inprogress since it is not a terminal state and we only consider
terminal states.
* Rework how status is determined.
* Add comments

* fix testcase

* Update dev/build-tracker/build/build.go
2023-12-08 15:40:47 +02:00
William Bezuidenhout
b4af039716
buildtracker: ignore inprogress jobs when determining final status (#58778)
The build can be finished and have failing jobs, but a late job can come
in and make the build "not be finished". This is a problem because we
don't get a notification when the late job finishes, which leaves the
job perpetually in-progress.

We do not count in-progress jobs anymore because this can lead to
invalid build states even though the build is finished
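
A small sketch of that rule, using hypothetical state names and a hypothetical `finalStatus` helper rather than the actual build-tracker code. Jobs that are not in a terminal state are skipped, so a late in-progress job cannot mask an earlier failure.

```go
package main

import "fmt"

// JobState and its values are hypothetical names for this sketch.
type JobState string

const (
	JobPassed     JobState = "passed"
	JobFailed     JobState = "failed"
	JobInProgress JobState = "inprogress"
)

// isTerminal reports whether a job can no longer change state.
func isTerminal(s JobState) bool {
	return s == JobPassed || s == JobFailed
}

// finalStatus derives the build status from terminal jobs only.
func finalStatus(jobs []JobState) string {
	status := "passed"
	for _, s := range jobs {
		if !isTerminal(s) {
			continue // ignore in-progress jobs entirely
		}
		if s == JobFailed {
			status = "failed"
		}
	}
	return status
}

func main() {
	// A finished build with one failure and one late, still-running job.
	jobs := []JobState{JobPassed, JobFailed, JobInProgress}
	fmt.Println(finalStatus(jobs))
}
```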
2023-12-05 19:09:51 +02:00
William Bezuidenhout
56ee45ed4c
buildtracker: fix readme (#58706)
fix readme
2023-12-01 14:06:59 +02:00
William Bezuidenhout
3d7d0ff7b3
buildtracker: replace "Is this a flake" with "View test analytics" (#58297)
replace "Is this a flake" with "View test analytics"
2023-11-16 10:01:38 +01:00
William Bezuidenhout
1ae6cc6bfd
logger: update log lib and remove use of description (#57690)
* log: remove use of description parameter in Scoped

* temporarily point to sglog branch

* bazel configure + gazelle

* remove additional use of description param

* use latest versions of zoekt,log,mountinfo

* go.mod
2023-10-18 17:29:08 +02:00
Petri-Johan Last
10dca65499
[chore] Use consistent go-github versioning (#57391)
2023-10-06 10:48:18 +02:00
William Bezuidenhout
a4c75fe589
build-tracker: only send notifications when a build is finished (#56688)
* add consts for build and job events + state

* safer var access

* define JobInProgress status

* rename func to be more descriptive

* only notify if a build is finished + test case

* fix test name for fixed notification
2023-09-18 10:54:00 +02:00
Kota
2d676db1a8
fix: use of ioutil package (#53041)
2023-06-07 09:37:26 +00:00
Jean-Hadrien Chabran
3d36d34b3d
ci: re-enable race detection (#52776)
The previous approach to enabling race detection was too radical and
accidentally led to our binaries being built with the race flag enabled, which
caused issues when building images down the line.

This happened because putting a `test --something` in bazelrc also sets
it on `build` which is absolutely not what we wanted. Usually folks get
this one working by having a `--stamp` config setting that fixes this
when releasing binaries, which we don't at this stage, as we're still
learning Bazel.

Luckily, this was caught swiftly. The current approach is instead
more granular: the `go_test` rule now uses our own
variant, which injects the `race = "on"` attribute, but only on
`go_test`.


## Test plan


CI, being a main-dry-run, this will cover the container building jobs,
which were the ones failing.

---------

Co-authored-by: Alex Ostrikov <alex.ostrikov@sourcegraph.com>
2023-06-05 20:41:47 +02:00
Dave Try
321e0e9d01
ci: enable bazel builds for docker images (#51241)
Reintroduces the same changes as
https://github.com/sourcegraph/sourcegraph/pull/51104 minus
syntax-highlighter which we're unable to compile with the right
toolchain at the moment.

Tested as a full main-dry-run, as well as running the stack with compose
and checking indexing and syntax-highlighting.

Executors are also built correctly. 


## Test plan

CI + manual test via compose.

---------

Co-authored-by: Jean-Hadrien Chabran <jh@chabran.fr>
2023-04-28 10:41:13 +02:00
Dave Try
c5d638bfda
ci: revert bazel builds (#51190)
revert bazel changes due to errors with syntax-highlighter

## Test plan

CI
2023-04-26 23:19:36 +00:00
Dave Try
5b198be1b4
bazel: build all binaries with bazel for inclusion in docker images (#51104)
Build docker images with bazel compiled binaries

---------

Co-authored-by: Jean-Hadrien Chabran <jh@chabran.fr>
2023-04-26 14:18:05 -05:00
William Bezuidenhout
7938684f5d
bazel: add unparam nogo linter (#50730)
Adds https://github.com/mvdan/unparam as a nogo linter

Without `//nolint:unparam`
```
 bb //dev/linters/...
INFO: Analyzed 136 targets (0 packages loaded, 0 targets configured).
INFO: Found 136 targets...
ERROR: /Users/william/code/sourcegraph/dev/linters/unparam/BUILD.bazel:3:11: GoCompilePkg dev/linters/unparam/unparam.a failed: (Exit 1): builder failed: error executing command (from target //dev/linters/unparam:unparam) bazel-out/darwin_arm64-opt-exec-2B5CBBC6/bin/external/go_sdk/builder_reset/builder compilepkg -sdk external/go_sdk -installsuffix darwin_arm64 -src dev/linters/unparam/unparam.go -embedroot '' ... (remaining 30 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
compilepkg: nogo: errors found by nogo during build-time code analysis:
dev/linters/unparam/unparam.go:27:21: Test - b is unused (unparam)
INFO: Elapsed time: 0.374s, Critical Path: 0.17s
INFO: 3 processes: 2 internal, 1 darwin-sandbox.
FAILED: Build did NOT complete successfully
```
With `//nolint:unparam`
```
bb //dev/linters/...
INFO: Analyzed 136 targets (0 packages loaded, 0 targets configured).
INFO: Found 136 targets...
INFO: Elapsed time: 0.261s, Critical Path: 0.11s
INFO: 3 processes: 1 internal, 2 darwin-sandbox.
INFO: Build completed successfully, 3 total actions
```
## Test plan
* Checked that a small function in the linter is picked up when the
`//nolint:unparam` directive is removed
* Green CI
2023-04-18 10:03:35 +00:00
William Bezuidenhout
31e9d31220
bazel: add depguard as a nogo linter (#50585)
Add depguard as a nogo linter

## Test plan
* tested locally and made some fixes on the code it found
* green ci
2023-04-13 14:19:45 +02:00
Vincent
741c453ef1
Update to go 1.19.8 (#50341)
This will update our Go dependency to a newer version, resolving
CVE-2023-24532.

## Test plan

Ran `sg ci build main-dry-run`, everything passed:

- https://buildkite.com/sourcegraph/sourcegraph/builds/212016

- [x] ci tests
- [x] review

2023-04-05 17:39:02 +02:00
Vincent
9a2904203c
dep: resolve CVE-2023-0464 in base image 2/2 (#50261)
This PR updates the base images for our docker files to a version of
Alpine without vulnerabilities.

## Test plan
Pipelines from https://github.com/sourcegraph/sourcegraph/pull/50248
indicate that there are no vulnerabilities in the base image.

2023-04-02 18:24:34 +02:00
Vincent
ee981a6c2c
dep: use new docker base (#49706)
Use the new docker image as the base image for our images. This uses
the newly released `curl` version.

## Test plan
- [x] ci tests

2023-03-20 18:15:21 +01:00
Dave Try
2b8fa079f0
bazel: fix buf files (#49444)
fix protoc-gen-go version
2023-03-15 20:21:38 +00:00
Dave Try
293385d5dd
bazel: update timeouts to suppress warnings (#49399)
Updates all of the BUILD fields with timeouts to suppress warnings and
reduce log spam.


## Test plan

Green CI
2023-03-15 15:04:16 +02:00
William Bezuidenhout
ad664f733e
build-tracker: fix go to build link (#49312)
Go to build link was pointing to the API url ...
## Test plan
Did a curl request to the buildkite api and inspected the json - also
looked at the docs.
2023-03-14 14:11:13 +00:00
Valery Bugakov
d71d8e974d
bazel: bazel configure (#49278)
Run `bazel configure` and ignore `cody` client workspace.
2023-03-14 03:27:11 -07:00