2nd attempt of #63111, a follow-up to
https://github.com/sourcegraph/sourcegraph/pull/63085
rules_oci 2.0 brings significant performance improvements to oci_image
and oci_pull, which will benefit Sourcegraph. It will also make RBE
faster and put less load on the remote cache.
However, 2.0 makes some breaking changes:
- oci_tarball's default output is no longer a tarball.
- oci_image no longer compresses uncompressed layers, so somebody
has to make sure all `pkg_tar` targets have a `compression` attribute
set to compress their output beforehand.
- There is no curl fallback, but this is fine for Sourcegraph as it
already uses Bazel 7.1.
I checked all targets that use oci_tarball as thoroughly as I could to make
sure nothing depends on the default tarball output of oci_tarball. There
was one target that used the default output; I left a TODO on it for
somebody else (somebody who is more on top of the repo) to tackle
**later**.
## Test plan
CI. Also run delivery on this PR (don't land those changes)
---------
Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
For Executors on Native Kubernetes deployments, the option to run jobs
in a single pod has been available since Native Kubernetes has been
around.
The purpose of running jobs in a single pod is:
1. Efficiency. Jobs require at least three steps, and without a single
pod, that means spinning up at least three pods.
2. Security. For Batch Changes, when jobs are run across several pods,
`git`'s `safe.directory` must be set to prevent untrusted users or
processes from injecting code or mounting an attack. Running the job in
one pod removes the need for `safe.directory`.
3. Usability. Setting `safe.directory` requires `root` access
to write to `git`'s global config, which means that Batch Changes
setups often need special configuration and sign-off from security
teams.
This PR takes a step toward using single pod jobs only by enabling them
by default instead of requiring an environment variable to enable them.
The same environment variable that was used to enable them -
`KUBERNETES_SINGLE_JOB_POD` - is still available to disable them by
setting it to `false`.
## Test plan
Bazel and CI for now
## Changelog
This change extracts the unrelated transitive upgrades of
https://github.com/sourcegraph/sourcegraph/pull/63171 (CORE-177) into a
separate PR. I'm making this because @unknwon ran into issues with the
exact same dependencies in
https://github.com/sourcegraph/sourcegraph/pull/63171#issuecomment-2157694545.
The change consists of upgrades to:
- `google.golang.org/grpc` - there's a deprecation of `grpc.DialContext`
that we agreed in #63171 to keep for now.
- removing our `replace` directive on `github.com/prometheus/common` and
upgrading it. This is safe to do because our Alertmanager version is
already way ahead, and the reason this has a `replace` is outdated now.
## Test plan
CI, nothing blows up on `sg start` and I can click around and do a bit
of searching
The executor image and docker mirror image should now follow this
naming convention:
Image family:
`sourcegraph-executors-[nightly|internal|'']-<MAJOR>-<MINOR>`
Image name:
`sourcegraph-executor-[nightly|internal|'']-<MAJOR>-<MINOR>-<BUILD_NUMBER>`
Example:
Image family: `sourcegraph-executors-5-4`
Image name: `sourcegraph-executor-5-4-277666`
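
As a rough sketch of the convention above, the names could be assembled like this. The function names and the `phase` parameter are hypothetical and only illustrate the pattern; they are not the release tooling's actual code.

```go
package main

import "fmt"

// withPhase appends the optional release phase ("nightly", "internal", or ""
// for public releases) to a base name. Hypothetical helper for illustration.
func withPhase(base, phase string) string {
	if phase == "" {
		return base
	}
	return base + "-" + phase
}

// imageFamily builds sourcegraph-executors-[phase-]<MAJOR>-<MINOR>.
func imageFamily(phase string, major, minor int) string {
	return fmt.Sprintf("%s-%d-%d", withPhase("sourcegraph-executors", phase), major, minor)
}

// imageName builds sourcegraph-executor-[phase-]<MAJOR>-<MINOR>-<BUILD_NUMBER>.
func imageName(phase string, major, minor, build int) string {
	return fmt.Sprintf("%s-%d-%d-%d", withPhase("sourcegraph-executor", phase), major, minor, build)
}

func main() {
	fmt.Println(imageFamily("", 5, 4))             // sourcegraph-executors-5-4
	fmt.Println(imageName("", 5, 4, 277666))       // sourcegraph-executor-5-4-277666
	fmt.Println(imageFamily("nightly", 5, 4))      // sourcegraph-executors-nightly-5-4
	fmt.Println(imageName("internal", 5, 4, 1234)) // sourcegraph-executor-internal-5-4-1234
}
```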
## What happens during releases and _not_ releases?
#### Nightly
**`nightly` suffix**
Image family: `sourcegraph-executors-nightly-<MAJOR>-<MINOR>`
Image name:
`sourcegraph-executor-nightly-<MAJOR>-<MINOR>-<BUILD_NUMBER>`
#### Internal
**`internal` suffix**
Image family: `sourcegraph-executors-internal-<MAJOR>-<MINOR>`
Image name:
`sourcegraph-executor-internal-<MAJOR>-<MINOR>-<BUILD_NUMBER>`
#### Public / Promote to public
**No suffix**
Image family: `sourcegraph-executors-<MAJOR>-<MINOR>`
Image name: `sourcegraph-executor-<MAJOR>-<MINOR>-<BUILD_NUMBER>`
> [!IMPORTANT]
> Should we keep the image name stable at
> `sourcegraph-executor-<MAJOR>-<MINOR>-<BUILD_NUMBER>`
> and only change the family name?
>
> **Why?**
>
> The image family dictates the collection of images, and it changes
> with each major/minor version and/or release phase. There is little use
> in changing the image name too, except that at a glance you can see from
> the name which image family it belongs to.
With this patch, the `errors.HasType` API behaves similarly to `Is` and `As`,
where it checks the full error tree instead of just checking a linearized version
of it, as cockroachdb/errors's `HasType` implementation does not respect
multi-errors.
As a consequence, a bunch of relationships between HasType and Is/As that
you'd intuitively expect to hold are now true; see changes to `invariants_test.go`.
This PR is a result/follow-up of the improvements we've made in the [SAMS repo](https://github.com/sourcegraph/sourcegraph-accounts/pull/199) that allow call sites to pass down a context (primarily to indicate a deadline and, of course, cancellation if desired) and collect the error returned from `background.Routine`'s `Stop` method.
Note that I did not adopt returning an error from the `Start` method because I realized that in the monorepo the more common (and arguably the desired) pattern is to block on the call to `Start` until `Stop` is called, so it is meaningless to collect errors from `Start` methods as return values anyway, and doing that would also complicate the design and semantics more than necessary.
All usages of `background.Routine` and `background.CombinedRoutines` are updated. I DID NOT try to interpret the code logic or improve anything beyond fixing compile and test errors.
The only file that contains the core change is the [`lib/background/background.go`](https://github.com/sourcegraph/sourcegraph/pull/62136/files#diff-65c3228388620e91f8c22d91c18faac3f985fc67d64b08612df18fa7c04fafcd).
This fixes an issue that causes failures on repos with spaces in their names.
Previously, we were passing the script name as a shell command with -c. However, that means it's subject to shell escaping. If we instead pass the file as a script for sh to run, it avoids the need to escape the file name and also reduces the surface area for injection attacks.
* wip
* gitserver (mostly) wolfi 4 bazel
* the big heck of all things
* Add rules_apko lock translation rules to WORKSPACE
* Call apko_repositories() more
* fix rules_apko to handle our shorter repo urls
* fix workspace from rebase, and missing locks
* visibility on wolfi_base_image
* hand-fix a lock coz apko lock is 🅱️roken
* remove chainguard repo+keyring from base
* update locks
* add chainguard repo+keychain to single server manifest
* unrelated fixes, server+grafana still h*cked
* fix postgres-exporter
* the big fix
* aws lib got bumped?
* downgrade sso-oidc? idk
* ignore wolfi locks from prettier
* dynamically do the locks with a reporule
* document and make nice :nails:
* bazel run @rules_apko//apko patch
* Fix .typo.typo
* Update tooling for end-to-end Bazel images (#61106)
* Update sg wolfi image to build using Bazel
* bazel run @rules_apko//apko patch
* Fix .typo.typo
* Add update-images and implement apko YAML change monitoring
* Use bazel apko and add support for additional repos
* Refactor sg wolfi
* Rework wolfi base image auto-update pipeline
* sg bazel configure
* [rough] Add --check flag to sg wolfi lock
* Refactor sg wolfi lock --check
* Simplify check and update apko lock hash operations
* Fix resolveImagePath when running in bazel
* Fixup logic error in CheckApkoLockHashes
* Tweak DoBaseImageBuild output
* Remove debug output
* Fix sg wolfi lock --check behaviour for all images
* Replace base image build step with apko lock --check
* Remove debug line
* Minor fixups for CI step
* Wrap with AnnotatedCmd
* Fixup annotation
* Update apko lockfiles
* Allow additional repos to be passed
* Update build-base-image.sh with bazel + add back to pipeline
* Ensure that modified base images are rebuilt
* Solve bazelception
* Remove timestamp for bit-level reproducibility
* Skip local keygen when running on buildkite
* Add workaround for lack of local repo support in rules_apko
* Run apkoOps first as it's quick and might fail
* Remove blocking allBaseImagesBuilt step
* Remove unused prometheus-gcp image
* Add special cases to resolveImagePath
* Cleanly handle case where no bazel build path exists
This could happen in cases where a base image is only used outside of sourcegraph/sourcegraph,
or if you've added a new base image config but haven't added the associated Bazel scaffolding
* Add debugging around failing docker builds
* More debugging
* Normalise apko_lockfile to match repo.bzl
* Fixup apko docker call
* Try passing imageconfigdir differently to docker
* Run ls in different container
* Soft-fail when using legacy build in Buildkite
* Add missing include
* Workaround for building sourcegraph and sourcegraph-dev
* Add postgresql-client package to server
This contains createdb, which was recently moved from postgresql
* Inflate postgres-12-codeinsights image to avoid rules_apko errors
* Remove update line from yaml files
* Fix issue caused by moving base sourcegraph image
* Remove apk-tools from server
* Update lockfiles
* Address review feedback
* Remove debug lines
* fix unbound var
---------
Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
* go mod tidy + gazelle-update-repos after merging main
* Use aspect bazel cache
* Use Aspect bazel caching when calling bazel in bash and sg
* Append annotation
* Run apko lock on aspect agent
* Remove base image builds
Discussion in https://sourcegraph.slack.com/archives/C05EVRLQEUR/p1712307465660509
* Remove unused functionality
* Update BaseImageConfig comments
* Rewrite wolfi-images/README.md
* Add .apko/range.sh to .gitattributes
* Remove "wolfi" from :base_image and :base_tarball targets
* remove allowlist extras from debugging
* Tweak user instructions around package testing
* Add agent healthcheck to buildkite scripts
* prettier
* sg bazel configure
* bazel run //:gazelle-update-repos
---------
Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
Co-authored-by: Noah S-C <noah@sourcegraph.com>
* k8s: update deps and fix breaks
* appliance: Add internal spec of config
Add an internal spec of Sourcegraph to be used for user config and state
in the appliance.
* cmd/appliance: Add boilerplate and stub service
* Fix the bazel deps
* fix missing err returns
* Use `MainWithoutConfig`
* Add readme with basic info
observation.TestContextTB is better to use since your logs will be
scoped to your test and it will use a more pedantic prometheus registry.
To be honest TestContext should be removed but this is the first step.
This is a mechanical change. I replaced "&observation.TestContext" with
"observation.TestContextTB(t)". I then undid the change each time it
caused a compilation error (was only a handful of times).
Test Plan: go test
* wip
* gitserver (mostly) wolfi 4 bazel
* the big heck of all things
* Add rules_apko lock translation rules to WORKSPACE
* Call apko_repositories() more
* fix rules_apko to handle our shorter repo urls
* fix workspace from rebase, and missing locks
* visibility on wolfi_base_image
* hand-fix a lock coz apko lock is 🅱️roken
* remove chainguard repo+keyring from base
* update locks
* add chainguard repo+keychain to single server manifest
* unrelated fixes, server+grafana still h*cked
* fix postgres-exporter
* the big fix
* aws lib got bumped?
* downgrade sso-oidc? idk
* ignore wolfi locks from prettier
* dynamically do the locks with a reporule
* document and make nice :nails:
* bazel run @rules_apko//apko patch
* Fix .typo.typo
* Update tooling for end-to-end Bazel images (#61106)
* Update sg wolfi image to build using Bazel
* bazel run @rules_apko//apko patch
* Fix .typo.typo
* Add update-images and implement apko YAML change monitoring
* Use bazel apko and add support for additional repos
* Refactor sg wolfi
* Rework wolfi base image auto-update pipeline
* sg bazel configure
* [rough] Add --check flag to sg wolfi lock
* Refactor sg wolfi lock --check
* Simplify check and update apko lock hash operations
* Fix resolveImagePath when running in bazel
* Fixup logic error in CheckApkoLockHashes
* Tweak DoBaseImageBuild output
* Remove debug output
* Fix sg wolfi lock --check behaviour for all images
* Replace base image build step with apko lock --check
* Remove debug line
* Minor fixups for CI step
* Wrap with AnnotatedCmd
* Fixup annotation
* Update apko lockfiles
* Allow additional repos to be passed
* Update build-base-image.sh with bazel + add back to pipeline
* Ensure that modified base images are rebuilt
* Solve bazelception
* Remove timestamp for bit-level reproducibility
* Skip local keygen when running on buildkite
* Add workaround for lack of local repo support in rules_apko
* Run apkoOps first as it's quick and might fail
* Remove blocking allBaseImagesBuilt step
* Remove unused prometheus-gcp image
* Add special cases to resolveImagePath
* Cleanly handle case where no bazel build path exists
This could happen in cases where a base image is only used outside of sourcegraph/sourcegraph,
or if you've added a new base image config but haven't added the associated Bazel scaffolding
* Add debugging around failing docker builds
* More debugging
* Normalise apko_lockfile to match repo.bzl
* Fixup apko docker call
* Try passing imageconfigdir differently to docker
* Run ls in different container
* Soft-fail when using legacy build in Buildkite
* Add missing include
* Workaround for building sourcegraph and sourcegraph-dev
* Add postgresql-client package to server
This contains createdb, which was recently moved from postgresql
* Inflate postgres-12-codeinsights image to avoid rules_apko errors
* Remove update line from yaml files
* Fix issue caused by moving base sourcegraph image
* Remove apk-tools from server
* Update lockfiles
* Address review feedback
* Remove debug lines
* fix unbound var
---------
Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
* go mod tidy + gazelle-update-repos after merging main
* Use aspect bazel cache
* Use Aspect bazel caching when calling bazel in bash and sg
* Append annotation
* Run apko lock on aspect agent
* Remove base image builds
Discussion in https://sourcegraph.slack.com/archives/C05EVRLQEUR/p1712307465660509
* Remove unused functionality
* Update BaseImageConfig comments
* Rewrite wolfi-images/README.md
* Add .apko/range.sh to .gitattributes
* Remove "wolfi" from :base_image and :base_tarball targets
* remove allowlist extras from debugging
* Tweak user instructions around package testing
* Add agent healthcheck to buildkite scripts
* prettier
---------
Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
Co-authored-by: Noah S-C <noah@sourcegraph.com>
Now that we've updated to Go 1.22, we don't need to copy loop variables before
using them in goroutines.
I found these using the regex searches `go func\(\w+` and `\.Go(func\(\w+`. I
also simplified some non-loop vars when it made sense.
## Test plan
Straight refactor, covered by existing tests
* initial change to use aspect-default and remove ifs
* use rosetta bazelrc in bazel ci scripts
* use /tmp/aspect-generated.bazelrc path everywhere
change gcp project depending on queue
* restore aspect buildkite plugin
Adds a new:
- gazelle generator
- rule + rule targets + catchall target
for generating go-mockgen mocks & testing for their being up-to-date.
Each go_mockgen macro invocation adds targets for generating mocks, copying to the source tree, as well as testing whether the current source tree mocks are up-to-date.
How to use this: `bazel run //dev:go_mockgen` for the catch-all, or `bazel run //some/target:generate_mocks` for an individual package, and `bazel test //some/target:generate_mocks_tests` to test for up-to-date-ness. There is no catch-all for testing.
This currently uses a fork of go-mockgen, with an open PR for upstream here: https://github.com/derision-test/go-mockgen/pull/50.
Closes https://github.com/sourcegraph/sourcegraph/issues/60099
## Test plan
Extensive testing during development, including the following cases:
- Deleting a generated file and its entry in a go_library/go_test `srcs` attribute list and then re-running `sg bazel configure`
- Adding a non-existent output directory to mockgen.test.yaml and running the bash one-liner emitted to prepare the workspace for rerunning `sg bazel configure`
The existing config tests a lot of existing paths anyway (creating mocks for a 3rd party library's interface, entries for a given output file in >1 config file etc)
This change is to mitigate excessive remote cache network traffic in the event that oci_tarball targets are cache busted en masse.
Only //cmd/server:image_tarball and //docker-images/executor-vm:image_tarball are used as inputs to downstream targets, so only
these two will be built and remote cached on CI.
We have a number of docs links in the product that point to the old doc site.
Method:
- Search the repo for `docs.sourcegraph.com`
- Exclude the `doc/` dir, all test fixtures, and `CHANGELOG.md`
- For each, replace `docs.sourcegraph.com` with `sourcegraph.com/docs`
- Navigate to the resulting URL ensuring it's not a dead link, updating the URL if necessary
Many of the URLs updated are just comments, but since I'm doing a manual audit of each URL anyways, I felt it was worth it to update these while I was at it.
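
The replacement step of the method above amounts to a plain string substitution, sketched below. `rewriteDocsURL` and the example path are hypothetical, and the sketch does not cover the manual dead-link check, which still has to happen per URL.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteDocsURL replaces the old docs host with the new docs location.
// Hypothetical helper: it only performs the textual substitution and does
// not verify that the resulting URL resolves.
func rewriteDocsURL(s string) string {
	return strings.ReplaceAll(s, "docs.sourcegraph.com", "sourcegraph.com/docs")
}

func main() {
	// The path below is an illustrative example, not a link from the repo.
	fmt.Println(rewriteDocsURL("See https://docs.sourcegraph.com/admin/example for details."))
}
```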
* executor: Add audit log mode
Adds a mode on request of a customer that logs ALL the things the executor does.
Essentially, we're dumping the whole job payload, which contains all the relevant information to be able to fully replicate what users did.
Here's an example:
```
[batches-exe...r] WARN executor_processor.Handle worker/handler.go:98 Received new job to process {"handle": {"jobID": 5, "repositoryName": "github.com/k3s-io/k3s", "commit": "6d77b7a9204ebe40c53425ce4bc82c1df456e911", "jobPayload": "{\"version\":2,\"id\":5,\"token\":\"57627701c5480c22b832e361b7e4e84a07803e13\",\"repositoryName\":\"github.com/k3s-io/k3s\",\"repositoryDirectory\":\"repository\",\"commit\":\"6d77b7a9204ebe40c53425ce4bc82c1df456e911\",\"fetchTags\":false,\"shallowClone\":true,\"sparseCheckout\":null,\"files\":{\"input.json\":{\"content\":\"eyJCYXRjaENoYW5nZUF0dHJpYnV0ZXMiOnsiTmFtZSI6InRlc3QtbG9ncyIsIkRlc2NyaXB0aW9uIjoiQWRkIEhlbGxvIFdvcmxkIHRvIFJFQURNRXMifSwicmVwb3NpdG9yeSI6eyJpZCI6IlVtVndiM05wZEc5eWVUb3hNdz09IiwibmFtZSI6ImdpdGh1Yi5jb20vazNzLWlvL2szcyJ9LCJicmFuY2giOnsibmFtZSI6InJlZnMvaGVhZHMvbWFzdGVyIiwidGFyZ2V0Ijp7Im9pZCI6IjZkNzdiN2E5MjA0ZWJlNDBjNTM0MjVjZTRiYzgyYzFkZjQ1NmU5MTEifX0sInBhdGgiOiIiLCJvbmx5RmV0Y2hXb3Jrc3BhY2UiOmZhbHNlLCJzdGVwcyI6W3sicnVuIjoiZWNobyBJIGFtIGV2aWwgfCB0ZWUgLWEgJChmaW5kIC1uYW1lIFJFQURNRS5tZCkiLCJjb250YWluZXIiOiJ1YnVudHU6MTguMDQiLCJlbnYiOnt9fV0sInNlYXJjaFJlc3VsdFBhdGhzIjpbIlJFQURNRS5tZCJdLCJjYWNoZWRTdGVwUmVzdWx0Rm91bmQiOmZhbHNlLCJjYWNoZWRTdGVwUmVzdWx0Ijp7ImNoYW5nZWRGaWxlcyI6eyJtb2RpZmllZCI6bnVsbCwiYWRkZWQiOm51bGwsImRlbGV0ZWQiOm51bGwsInJlbmFtZWQiOm51bGx9LCJzdGRvdXQiOiIiLCJzdGRlcnIiOiIiLCJzdGVwSW5kZXgiOjAsImRpZmYiOiIiLCJvdXRwdXRzIjpudWxsfSwic2tpcHBlZFN0ZXBzIjp7fX0=\",\"modifiedAt\":\"0001-01-01T00:00:00Z\"}},\"dockerSteps\":null,\"cliSteps\":[{\"key\":\"batch-exec\",\"command\":[\"batch\",\"exec\",\"-f\",\"input.json\",\"-repo\",\"repository\",\"-tmp\",\".src-tmp\",\"-binaryDiffs\"],\"dir\":\".\",\"env\":[]}],\"redactedValues\":{},\"dockerAuthConfig\":{}}"}}
```
Where the base64-encoded file content (which might get corrupted by redaction) contains the following _unredacted_ file contents:
```
{
"BatchChangeAttributes": {
"Name": "test-logs",
"Description": "Add Hello World to READMEs"
},
"repository": {
"id": "UmVwb3NpdG9yeToxMw==",
"name": "github.com/k3s-io/k3s"
},
"branch": {
"name": "refs/heads/master",
"target": { "oid": "6d77b7a9204ebe40c53425ce4bc82c1df456e911" }
},
"path": "",
"onlyFetchWorkspace": false,
"steps": [
{
"run": "echo I am evil | tee -a $(find -name README.md)",
"container": "ubuntu:18.04",
"env": {}
}
],
"searchResultPaths": ["README.md"],
"cachedStepResultFound": false,
"cachedStepResult": {
"changedFiles": {
"modified": null,
"added": null,
"deleted": null,
"renamed": null
},
"stdout": "",
"stderr": "",
"stepIndex": 0,
"diff": "",
"outputs": null
},
"skippedSteps": {}
}
```
## Test plan
Manual.
* More structured logging
From https://github.com/sourcegraph/sourcegraph/pull/59170#discussion_r1435025135
## Test plan
Bazel build attempt using smithy-go/ptr: `dev/linters/depguard/depguard.go:7:2: import 'github.com/aws/smithy-go/ptr' is not allowed from list 'Main': use github.com/sourcegraph/sourcegraph/lib/pointers instead (depguard)`
Cody no longer needs it and it is obsolete now!
Since App added a not-insignificant number of new concepts and alternative code paths, I decided to take some time and remove it from our codebase.
This PR removes ~21k lines of code. If we ever want parts of single binary (app), the redis kv alternatives, or the release pipeline for a native mac app back, we can look back at this PR and revert parts of it, but maintaining 21k lines of code and many code paths, for which I had to delete a surprisingly small number of tests, justifies this move for me very well.
Technically, to some extent SG App and Cody App both still existed in the codebase, but we don't distribute either of them anymore, so IMO we shouldn't keep this weight in our code.
So.. here we go.
This should not affect any of the existing deployments, we only remove functionality that was special-cased for app.
* log: remove use of description parameter in Scoped
* temporarily point to sglog branch
* bazel configure + gazelle
* remove additional use of description param
* use latest versions of zoekt,log,mountinfo
* go.mod
support single-program execution
Now, `sg start single-program` starts a single-binary local dev server. This is similar to Cody app, but instead of using a Tauri desktop app UI and limiting to only Cody-related functionality, it runs a full Sourcegraph instance and lets you access it through your web browser. It is useful for local dev because it's less resource-intensive and has faster recompile/relink times than `sg start` (which runs many processes).
We seem to have lost this special build tag somewhere in migrations, causing the bundled-executor to no longer have the shell runtime code in it.
Co-authored-by: davejrt <davetry@gmail.com>
This is a mechanical move to get the executor out of the enterprise/cmd
directory. Eventually, this directory should disappear; this is another
step towards that.
This does not change anything about how it's licensed.
## Test plan
CI is still passing, local executor starts up.