sourcegraph/cmd/executor
docker-mirror bzl: do not fail if workdir exists already (#60708) 2024-03-15 12:57:43 +00:00
internal Simplify goroutine params (#61009) 2024-03-12 09:05:55 -07:00
kubernetes Move executor-kubernetes out of enterprise (#56449) 2023-09-08 16:24:05 +02:00
vm-image release: update PKR_VAR_name when building executor AMIs (#61293) 2024-03-20 16:27:25 +01:00
_binary.push.sh rfc795: new release process foundations (#60962) 2024-03-12 17:12:22 +01:00
BUILD.bazel Hackathon: Build images end-to-end using Bazel (#60785) 2024-04-05 13:57:45 +01:00
ci-should-rebuild.sh ci: fix incorrect usage of target determinator (#59171) 2023-12-21 15:50:29 +00:00
image_test.yaml Move executor to cmd/executor (#55700) 2023-08-10 02:06:12 +02:00
main.go Docs: update links to point to new site (#60381) 2024-02-13 00:23:47 +00:00
README.md Port executors building/pushing scripts to use Bazel (#58892) 2023-12-20 18:33:49 +00:00

Executor

The executor service polls the public frontend API for work to perform. The executor pulls a job from a particular queue (configured via the envvar EXECUTOR_QUEUE_NAME), then performs the job by running a sequence of Docker and src-cli commands. This service is horizontally scalable.

Since executors and Sourcegraph are separate deployments, we currently support a divergence of one minor version between them, in either direction. The following table gives examples:

Sourcegraph version | Executor version | Ok
3.43.0              | 3.43.*           | ✅
3.43.3              | 3.43.*           | ✅
3.43.0              | 3.44.*           | ✅
3.43.0              | 3.42.*           | ✅
3.43.0              | 3.41.*           | 🚫
3.43.0              | 3.45.*           | 🚫
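As a rough illustration of the rule behind this table, the check below compares only the major and minor components of two version strings. It is a sketch under assumptions: the function names majorMinor and Compatible are hypothetical, and requiring the major versions to match is an assumption of this example, since the table only covers 3.x versions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor extracts the major and minor components from a version string
// such as "3.43.0" or "3.43.*". Anything after the minor component is ignored.
func majorMinor(version string) (major, minor int, err error) {
	parts := strings.Split(version, ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("invalid version %q: want MAJOR.MINOR[.PATCH]", version)
	}
	major, err = strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid major version in %q: %w", version, err)
	}
	minor, err = strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid minor version in %q: %w", version, err)
	}
	return major, minor, nil
}

// Compatible reports whether the executor version is within one minor
// version of the Sourcegraph version.
// Assumption: versions with different major components are treated as
// incompatible; the README does not state this case.
func Compatible(sourcegraph, executor string) (bool, error) {
	sMajor, sMinor, err := majorMinor(sourcegraph)
	if err != nil {
		return false, err
	}
	eMajor, eMinor, err := majorMinor(executor)
	if err != nil {
		return false, err
	}
	if sMajor != eMajor {
		return false, nil
	}
	diff := sMinor - eMinor
	if diff < 0 {
		diff = -diff
	}
	return diff <= 1, nil
}

func main() {
	for _, executor := range []string{"3.43.1", "3.44.0", "3.42.2", "3.41.0", "3.45.0"} {
		ok, err := Compatible("3.43.0", executor)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("sourcegraph 3.43.0, executor %s: compatible=%t\n", executor, ok)
	}
}
```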

See the executor queue for a complete list of queues.

Building and releasing

Building and releasing is handled automatically by the CI pipeline.

Binary

The executor binary is simply built with bazel build //cmd/executor:executor.

To publish it, use bazel run //cmd/executor:binary.push:

  • In every scenario, the binary will be uploaded to gcs://sourcegraph-artifacts/executors/$GIT_COMMIT/.
  • If the current branch is main when this target is run, it will also be copied over to gcs://sourcegraph-artifacts/executors/latest.
  • If the env var EXECUTOR_IS_TAGGED_RELEASE is set to true, it will also be copied over to gcs://sourcegraph-artifacts/executors/$BUILDKITE_TAG.
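The three rules above can be summarised as a small function that lists the upload destinations for a given build. This is only a sketch of the decision logic, not the contents of _binary.push.sh: the function name UploadDestinations and its parameters are hypothetical, and the way the real script determines the current branch is not shown in this README.

```go
package main

import "fmt"

// bucketPrefix is the destination prefix quoted in the README.
const bucketPrefix = "gcs://sourcegraph-artifacts/executors"

// UploadDestinations returns every location the binary would be written to.
//   - commit corresponds to $GIT_COMMIT.
//   - branch is the current branch name.
//   - taggedRelease corresponds to EXECUTOR_IS_TAGGED_RELEASE being "true".
//   - tag corresponds to $BUILDKITE_TAG and is only used for tagged releases.
func UploadDestinations(commit, branch string, taggedRelease bool, tag string) []string {
	// The commit-specific path is always used.
	dests := []string{bucketPrefix + "/" + commit + "/"}
	if branch == "main" {
		dests = append(dests, bucketPrefix+"/latest")
	}
	if taggedRelease {
		dests = append(dests, bucketPrefix+"/"+tag)
	}
	return dests
}

func main() {
	// Example values only; the commit hash and tag are made up.
	for _, d := range UploadDestinations("0123abc", "main", true, "v0.0.0-example") {
		fmt.Println(d)
	}
}
```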

VM image

The VM image is built with Packer. It also depends on an OCI image, //docker-images/executor-vm:image_tarball, which it uses as a base for Firecracker. That OCI image is a legacy image; see docker-images/executor-vm/README.md.

Because we're producing an AMI for both AWS and GCP, there are two steps involved:

  • bazel run //cmd/executor/vm-image:ami.build creates the AMI and names it according to the CI runtype.
  • bazel run //cmd/executor/vm-image:ami.push takes the AMIs from above and publishes them (adjusts permissions and naming).

While gcloud is provided by Bazel, the AWS CLI is expected to be available on the host running Bazel.

Building AMIs on GCP is rather quick, but it is notoriously slow on AWS (about 20 minutes), so we use target-determinator to detect when the image needs rebuilding. See ci-should-rebuild.sh, which the pipeline generator uses to skip the build when nothing has changed since the parent commit.

Docker Mirror

As with the VM image, we're producing an AMI for both AWS and GCP, so there are two steps involved:

  • bazel run //cmd/executor/docker-mirror:ami.build creates the AMI and names it according to the CI runtype.
  • bazel run //cmd/executor/docker-mirror:ami.push takes the AMIs from above and publishes them (adjusts permissions and naming).

While gcloud is provided by Bazel, the AWS CLI is expected to be available on the host running Bazel.