doc/dev: migrate continuous_integration.md to ci/index.md (#31905)

Robert Lin 2022-02-28 08:16:20 -08:00 committed by GitHub
parent 8488f4b1eb
commit 447a61f902
19 changed files with 352 additions and 355 deletions

View File

@ -41,7 +41,7 @@ jobs:
team-reviewers: 'sourcegraph/dev-experience'
body: |
This is an automated pull request generated by [this run](https://github.com/sourcegraph/sourcegraph/actions/runs/${{ github.run_id }}).
Learn more about our GitHub Actions for managing licenses [here](https://docs.sourcegraph.com/dev/background-information/continuous_integration#third-party-licenses).
Learn more about our GitHub Actions for managing licenses [here](https://docs.sourcegraph.com/dev/background-information/ci#third-party-licenses).
You're safe to merge this pull request when the required checks are passing.

View File

@ -1,6 +1,6 @@
# Integration tests
This directory is home to the integration tests that run in [Sourcegraph's Buildkite pipelines](https://docs.sourcegraph.com/dev/background-information/continuous_integration#buildkite-pipelines).
This directory is home to the integration tests that run in [Sourcegraph's Buildkite pipelines](https://docs.sourcegraph.com/dev/background-information/ci#buildkite-pipelines).
## Test structure

View File

@ -2,4 +2,4 @@
The folder contains the scripts that our CI pipeline uses to run vulnerability scans with [Trivy](https://aquasecurity.github.io/trivy/).
See https://docs.sourcegraph.com/dev/background-information/continuous_integration for more information.
See https://docs.sourcegraph.com/dev/background-information/ci for more information.

View File

@ -3,7 +3,7 @@
# This script either generates a report of third-party dependencies, or runs a check that fails
# if there are any unapproved dependencies ('action items').
#
# Please refer to the handbook entry for more details: https://docs.sourcegraph.com/dev/background-information/continuous_integration#third-party-licenses
# Please refer to the handbook entry for more details: https://docs.sourcegraph.com/dev/background-information/ci#third-party-licenses
set -euf -o pipefail
@ -38,5 +38,5 @@ license_finder ignored_dependencies list
license_finder dependencies list
# run license check
echo "Running license_finder - if this fails, refer to our handbook: https://docs.sourcegraph.com/dev/background-information/continuous_integration#third-party-licenses"
echo "Running license_finder - if this fails, refer to our handbook: https://docs.sourcegraph.com/dev/background-information/ci#third-party-licenses"
license_finder ${COMMAND} --columns=package_manager name version licenses homepage approved

View File

@ -232,3 +232,5 @@
/admin/install/docker-compose/update /admin/install/docker-compose/operations#upgrade 308
/admin/install/docker-compose/configure /admin/install/docker-compose/operations#configure 308
/admin/install/kubernetes/overlays /admin/install/kubernetes/configure 308
/dev/background-information/continuous_integration /dev/background-information/ci 308

View File

@ -1,3 +1,330 @@
# Continuous integration
The contents of [continuous integration](../continuous_integration.md) will soon be migrated to this page.
<span class="badge badge-note">SOC2/GN-105</span> <span class="badge badge-note">SOC2/GN-106</span>
Sourcegraph uses a continuous integration and delivery tool, [Buildkite](#buildkite-pipelines), to help ensure a [consistent](#pipeline-health) build, test and deploy process. Software changes are systematically required to complete all steps within the continuous integration tool workflow prior to production deployment, in addition to being [peer reviewed](../pull_request_reviews.md).
Sourcegraph also maintains a variety of tooling on [GitHub Actions](#github-actions) for continuous integration and repository maintenance purposes.
> NOTE: To learn more about testing in particular, see our [testing principles](../testing_principles.md).
## Buildkite pipelines
[Tests](../../how-to/testing.md) are automatically run in our [various Buildkite pipelines](https://buildkite.com/sourcegraph) when you push your changes to GitHub.
Pipeline steps are generated using the [pipeline generator](https://sourcegraph.com/github.com/sourcegraph/sourcegraph@main/-/tree/enterprise/dev/ci).
To see what checks will get run against your current branch, use [`sg`](../../setup/quickstart.md):
```sh
sg ci preview
```
A complete reference of all available pipeline types and steps is available in the generated [Pipeline reference](./reference.md).
You can also see these docs locally with `sg ci docs`.
You can also request builds for your changes using `sg ci build`.
To learn about making changes to our Buildkite pipelines, see [Pipeline development](#pipeline-development).
### Pipeline steps
#### Soft failures
<span class="badge badge-note">SOC2/GN-106</span>
Many steps in Sourcegraph's Buildkite pipelines allow for [soft failures](https://buildkite.com/changelog/56-command-steps-can-now-be-made-to-soft-fail), which means that even if they fail, they do not cause the entire build to fail.
In the Buildkite UI, soft failures currently look like the following, with a _triangular_ warning sign (not to be mistaken for a hard failure!):
![soft fail in Buildkite UI](https://user-images.githubusercontent.com/23356519/150558751-d8e0da19-0b6f-4645-aa12-7547d375330f.png)
We use soft failures for the following reasons only:
- Steps that determine whether a subsequent step should run, where soft failures are the only technical way to communicate that a later step should be skipped in this manner using Buildkite.
- Examples: [hash comparison steps that determine if a build should run](https://sourcegraph.com/search?q=context:%40sourcegraph/all+repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+file:%5Eenterprise/dev/ci/internal/ci/operations%5C.go+compare-hash.sh&patternType=literal)
- Regular analysis tasks, where soft failures serve as a monitoring indicator to warn the team responsible for fixing issues.
- Examples: [image vulnerability scanning](#image-vulnerability-scanning), linting tasks for catching deprecation warnings
- Temporary exceptions to accommodate experimental or in-progress work.
You can find all usages of soft failures [with the following queries](https://sourcegraph.com/notebooks/Tm90ZWJvb2s6NTc=):
- [Soft failures in the Sourcegraph pipeline generator](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+%7B...bk.SoftFail...%7D+OR+%28...bk.SoftFail...%29+count:all&patternType=structural)
- [Soft failures in Buildkite YAML pipelines](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/.*+soft_fail+lang:yaml+count:all&patternType=literal)
All other failures are hard failures.
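As a rough illustration of the distinction, a step's exit code can be sorted into "passed", "soft-failed", or "hard-failed". Buildkite's `soft_fail` setting can be limited to specific exit statuses, which is what makes a dedicated "skip the next step" code possible. The sketch below is hypothetical: the exit code `222`, the `SOFT_FAIL_CODES` variable, and the `classify_exit` helper are assumptions for illustration and are not taken from `compare-hash.sh` or the pipeline generator.

```sh
# Hypothetical sketch: classify an exit code the way a soft-fail-aware step would be treated.
# SOFT_FAIL_CODES is an assumed, illustrative list; real steps declare their own statuses.
SOFT_FAIL_CODES="222"

classify_exit() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "passed"
    return 0
  fi
  for soft in $SOFT_FAIL_CODES; do
    if [ "$code" -eq "$soft" ]; then
      # The step is flagged with a warning, but the build is not failed.
      echo "soft-failed"
      return 0
    fi
  done
  # Any other non-zero exit code fails the build.
  echo "hard-failed"
}

classify_exit 0
classify_exit 222
classify_exit 1
```

Reserving a single, unusual exit code for the soft-fail case keeps genuine errors (such as a crash with exit code `1`) as hard failures.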
#### Image vulnerability scanning
Our CI pipeline uses [Trivy](https://aquasecurity.github.io/trivy/) to scan our Docker images for security vulnerabilities.
Refer to our [Pipeline reference](./reference.md) to see what pipelines Trivy checks run in.
If there are any `HIGH` or `CRITICAL` severities in a Docker image that have a known fix:
1. The CI pipeline will create an annotation that contains links to reports that describe the vulnerabilities
2. The Trivy scanning step will [soft fail](#soft-failures). Note that soft failures **do not fail builds or block deployments**. They simply highlight the failing step for further analysis.
> NOTE: Our vulnerability management process (including this workflow) is under active development and in its early stages. All of the above is subject to change. See [https://github.com/sourcegraph/sourcegraph/pull/25756](https://github.com/sourcegraph/sourcegraph/pull/25756) for more context.
We also run [separate vulnerability scans for our infrastructure](https://handbook.sourcegraph.com/departments/product-engineering/engineering/cloud/security/checkov).
### Pipeline health
Maintaining [Buildkite pipeline](#buildkite-pipelines) health is a critical part of ensuring we ship a stable product - changes that make it to the `main` branch may be deployed to various Sourcegraph instances, and having a reliable and predictable pipeline is crucial to ensuring bugs do not make it to production environments.
To enable this, we [address flakes as they arise](#flakes) and mitigate the impacts of pipeline instability with [branch locks](#branch-locks).
> NOTE: Sourcegraph teammates should refer to the [CI incidents playbook](https://handbook.sourcegraph.com/departments/product-engineering/engineering/process/incidents/playbooks/ci#scenarios) for help managing issues with pipeline health.
#### Branch locks
> WARNING: **A red `main` build is not okay and must be fixed.** Learn more about our `main` branch policy in [Testing principles: Failures on the `main` branch](../testing_principles.md#failures-on-the-main-branch).
[`buildchecker`](#buildchecker) is a tool that responds to periods of consecutive build failures on the `main` branch of the Sourcegraph Buildkite pipeline. If it detects a series of failures on the `main` branch, merges to `main` will be restricted to members of the Sourcegraph team who authored the failing commits until the issue is resolved - this is referred to as a "branch lock". When a build passes on `main` again, `buildchecker` will automatically unlock the branch.
**Authors of the most recent failed builds are responsible for investigating failures.** Please refer to the [Continuous integration playbook](https://handbook.sourcegraph.com/departments/product-engineering/engineering/process/incidents/playbooks/ci#build-has-failed-on-the-main-branch) for step-by-step guides on what to do in various scenarios.
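The lock and unlock behaviour described above can be pictured as counting consecutive failures starting from the most recent `main` build. This is only a sketch under stated assumptions: the `should_lock` helper, the threshold of `3`, and the `passed`/`failed` labels are hypothetical, and `buildchecker`'s actual logic lives in its source code.

```sh
# Hypothetical sketch of a branch-lock decision, not buildchecker's actual implementation.
# Usage: should_lock <threshold> <newest-result> [older-results...]
# Each result is "passed" or "failed", ordered from newest to oldest.
should_lock() {
  threshold="$1"
  shift
  consecutive=0
  for result in "$@"; do
    if [ "$result" = "failed" ]; then
      consecutive=$((consecutive + 1))
    else
      break # a passing build ends the failure streak
    fi
  done
  if [ "$consecutive" -ge "$threshold" ]; then
    echo "lock"
  else
    echo "unlock"
  fi
}

should_lock 3 failed failed failed passed
should_lock 3 passed failed failed failed
```

Because the streak is counted from the newest build, a single passing build on `main` is enough to produce `unlock` in this sketch, regardless of how many failures came before it.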
#### Flakes
A *flake* is defined as a test or script that is unreliable or non-deterministic, i.e. it exhibits both a passing and a failing result with the same code. In other words: something that sometimes fails, but if you retry it enough times, it passes, *eventually*.
Tests are not the only things that can be flaky - flakes can also encompass [sporadic infrastructure issues](#flaky-infrastructure) and [unreliable steps](#flaky-steps).
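One way to check whether a suspect command fits this definition is to rerun it several times on unchanged code and compare the results. The following is a minimal sketch; the `classify_runs` helper and its output labels are hypothetical and are not part of `sg` or the pipeline generator.

```sh
# Hypothetical helper: run a command several times and classify the outcome.
# Usage: classify_runs <attempts> <command> [args...]
classify_runs() {
  attempts="$1"
  shift
  passes=0
  failures=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      passes=$((passes + 1))
    else
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  if [ "$failures" -eq 0 ]; then
    echo "stable-pass"
  elif [ "$passes" -eq 0 ]; then
    echo "stable-fail"
  else
    # Mixed results on the same code: this is a flake.
    echo "flaky"
  fi
}

classify_runs 3 true
```

A `stable-pass` result from a handful of attempts does not prove the command is reliable; it only means no flake was observed in that sample.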
##### Flaky tests
> WARNING: **We do not tolerate flaky tests of any kind.** Learn more about our flaky test policy in [Testing principles: Flaky tests](../testing_principles.md#flaky-tests).
Typical reasons why a test may be flaky:
- Race conditions or timing issues
- Caching or inconsistent state between tests
- Unreliable test infrastructure (such as CI)
- Reliance on third-party services that are inconsistent
If a flaky test is discovered, immediately use language-specific functionality to skip the test and open a PR to disable it:
- Go: [`testing.T.Skip`](https://pkg.go.dev/testing#hdr-Skipping)
- TypeScript: [`.skip()`](https://mochajs.org/#inclusive-tests)
If the language or framework allows for a skip reason, include a link to the issue tracking re-enabling the test, or leave a docstring with a link.
Then open an issue to investigate the flaky test (use the [flaky test issue template](https://github.com/sourcegraph/sourcegraph/issues/new/choose)), and assign it to the most likely owner.
##### Flaky steps
If a step is flaky, we need to get the build back to a reliable state as soon as possible. If there is not already a discussion in `#buildkite-main`, create one and link the steps you take. Here are the recommended approaches, in order:
1. Revert the PR if a recent change introduced the instability. Ping the author.
2. Use the `Skip` StepOpt when creating the step. Include a reason and a link to context. This will still show the step on builds so we don't forget about it.
An example use of `Skip`:
```diff
--- a/enterprise/dev/ci/internal/ci/operations.go
+++ b/enterprise/dev/ci/internal/ci/operations.go
@@ -260,7 +260,9 @@ func addGoBuild(pipeline *bk.Pipeline) {
func addDockerfileLint(pipeline *bk.Pipeline) {
pipeline.AddStep(":docker: Lint",
bk.Cmd("./dev/ci/docker-lint.sh"),
+ bk.Skip("2021-09-29 example message https://github.com/sourcegraph/sourcegraph/issues/123"),
)
}
```
##### Flaky infrastructure
If the [build or test infrastructure itself is flaky](https://handbook.sourcegraph.com/departments/product-engineering/engineering/enablement/dev-experience#build-pipeline-support), then [open an issue with the `team/devx` label](https://github.com/sourcegraph/sourcegraph/issues/new?labels=team/devx) and notify the [Developer Experience team](https://handbook.sourcegraph.com/departments/product-engineering/engineering/enablement/dev-experience#contact).
Also see [Buildkite infrastructure](#buildkite-infrastructure).
### Pipeline development
The source code of the pipeline generator is in [`/enterprise/dev/ci`](https://sourcegraph.com/github.com/sourcegraph/sourcegraph@main/-/tree/enterprise/dev/ci).
Internally, the pipeline generator determines what gets run over contributions based on:
1. [Run types](#run-types), determined by branch naming conventions, tags, and environment variables
2. [Diff types](#diff-types), determined by what files have been changed in a given branch
The above factors are then used to determine the appropriate [operations](#operations), composed of [step options](#step-options), that translate into steps in the resulting pipeline.
> WARNING: Sourcegraph's pipeline generator and its generated output are under the [Sourcegraph Enterprise license](https://github.com/sourcegraph/sourcegraph/blob/main/LICENSE.enterprise).
#### Run types
<div class="embed">
<iframe src="https://sourcegraph.com/embed/notebooks/Tm90ZWJvb2s6MTU5"
style="width:100%;height:720px" frameborder="0" sandbox="allow-scripts allow-same-origin allow-popups">
</iframe>
</div>
#### Diff types
<div class="embed">
<iframe src="https://sourcegraph.com/embed/notebooks/Tm90ZWJvb2s6MTYw"
style="width:100%;height:720px" frameborder="0" sandbox="allow-scripts allow-same-origin allow-popups">
</iframe>
</div>
#### Operations
<div class="embed">
<iframe src="https://sourcegraph.com/embed/notebooks/Tm90ZWJvb2s6MTYx"
style="width:100%;height:720px" frameborder="0" sandbox="allow-scripts allow-same-origin allow-popups">
</iframe>
</div>
##### Developing PR checks
To create a new check that can run on pull requests on relevant files, refer to how [diff types](#diff-types) work to get started.
Then, you can add a new check to [`CoreTestOperations`](https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+file:%5Eenterprise/dev/ci/internal/ci+CoreTestOperations+type:symbol+&patternType=literal).
Make sure to follow the best practices outlined in its docstring.
For more advanced pipelines, see [Run types](#run-types).
#### Step options
> NOTE: Coming soon!
##### Creating annotations
Annotations get rendered in the Buildkite UI to present the viewer notices about the build.
The pipeline generator provides an API for this that, at a high level, works like this:
1. In your script, leave a file in `./annotations`:
```sh
if [ $EXIT_CODE -ne 0 ]; then
echo -e "$OUT" >./annotations/docsite
fi
```
1. In your pipeline operation, replace the usual `bk.Cmd` with `bk.AnnotatedCmd`:
```go
pipeline.AddStep(":memo: Check and build docsite",
bk.AnnotatedCmd("./dev/check/docsite.sh", bk.AnnotatedCmdOpts{}))
```
1. That's it!
For more details about best practices and additional features and capabilities, please refer to [the `bk.AnnotatedCmd` docstring](https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+file:%5Eenterprise/dev/ci/internal/buildkite+AnnotatedCmd+type:symbol&patternType=literal).
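The script side of this flow (step 1) can be sketched as a small wrapper that runs a check, captures its output, and leaves an annotation file only on failure. This is an illustrative sketch, not the contents of `./dev/check/docsite.sh`: the `run_with_annotation` helper is hypothetical, and creating `./annotations` with `mkdir -p` is an assumption in case the directory does not already exist.

```sh
# Hypothetical wrapper around the annotation pattern from step 1.
# Usage: run_with_annotation <annotation-name> <command> [args...]
run_with_annotation() {
  name="$1"
  shift
  # Assumption: the annotations directory may not exist yet, so create it.
  mkdir -p ./annotations
  # Capture stdout and stderr; the && / || form stays safe under `set -e`.
  out="$("$@" 2>&1)" && exit_code=0 || exit_code=$?
  if [ "$exit_code" -ne 0 ]; then
    # Only leave an annotation when the command failed.
    printf '%s\n' "$out" >"./annotations/$name"
  fi
  return "$exit_code"
}

# Example (commented out because the script path only exists in the repository):
# run_with_annotation docsite ./dev/check/docsite.sh
```

Returning the original exit code keeps the step's pass or fail status unchanged, so the annotation only adds context and does not mask the failure.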
##### Caching build artefacts
For caching artefacts in steps to speed up steps, see [How to cache CI artefacts](../../how-to/cache_ci_artefacts.md).
#### Observability
##### Pipeline command tracing
Every successful build of the `sourcegraph/sourcegraph` repository comes with an annotation pointing at the full trace of the build on [Honeycomb.io](https://honeycomb.io).
See the [Buildkite board on Honeycomb](https://ui.honeycomb.io/sourcegraph/board/sqPvYj5BXNy/Buildkite) for an overview.
Individual commands are tracked from the perspective of a given [step](#step-options):
```go
pipeline.AddStep(":memo: Check and build docsite",
bk.AnnotatedCmd("./dev/check/docsite.sh", bk.AnnotatedCmdOpts{}))
```
This results in a single trace span for the `./dev/check/docsite.sh` script. The following, however, results in an individual trace span for each command:
```go
pipeline.AddStep(fmt.Sprintf(":%s: Puppeteer tests for %s extension", browser, browser),
// ...
bk.Cmd("yarn --frozen-lockfile --network-timeout 60000"),
bk.Cmd("yarn --cwd client/browser -s run build"),
bk.Cmd("yarn run cover-browser-integration"),
bk.Cmd("yarn nyc report -r json"),
bk.Cmd("dev/ci/codecov.sh -c -F typescript -F integration"),
```
Therefore, it's beneficial for tracing purposes to split the step into multiple commands, if possible.
##### Test analytics
Our test analytics is currently powered by [Buildkite Analytics](https://buildkite.com/test-analytics), a tool that Buildkite released in beta to analyse individual tests across builds.
This tool lets you observe how each individual test evolves on two metrics: duration and flakiness.
Browse the [dashboard](https://buildkite.com/organizations/sourcegraph/analytics) to explore the metrics and optionally set monitors that will alert if a given test or a test suite is deviating from its historical duration or flakiness.
In order to track a new test suite, the test output must be converted to JUnit XML and then uploaded to Buildkite. You can find the instructions for the upload by creating a new Test Suite in the Buildkite Analytics UI.
### Buildkite infrastructure
Our continuous integration system is composed of two parts: a central server controlled by Buildkite, and agents that are operated by Sourcegraph within our own infrastructure.
In order to provide strong isolation across builds, so that a previous build cannot affect the next one, our agents are stateless jobs.
When a build is dispatched by Buildkite, each individual job will be assigned to an agent in a pristine state. Each agent will execute its assigned job, automatically report back to Buildkite, and finally shut itself down. A fresh agent will then be created and will stand in line for the next job.
This means that our agents are totally **stateless**, exactly like the runners used in GitHub Actions.
Also see [Flaky infrastructure](#flaky-infrastructure), [Continuous integration infrastructure](https://handbook.sourcegraph.com/departments/product-engineering/engineering/tools/infrastructure/ci), and the [Continuous integration changelog](https://handbook.sourcegraph.com/departments/product-engineering/engineering/tools/infrastructure/ci/changelog).
#### Pipeline setup
To set up Buildkite to use the rendered pipeline, add the following step in the [pipeline settings](https://buildkite.com/sourcegraph/sourcegraph/settings):
```shell
go run ./enterprise/dev/ci/gen-pipeline.go | buildkite-agent pipeline upload
```
#### Managing secrets
The term _secret_ refers to authentication credentials like passwords, API keys, tokens, etc. which are used to access a particular service. Our CI pipeline must never leak secrets:
- to add a secret, use the Secret Manager on Google Cloud and then inject it at deployment time as an environment variable in the CI agents, which will make it available to every step.
- use an environment variable name with one of the following suffixes to ensure it gets redacted in the logs: `*_PASSWORD, *_SECRET, *_TOKEN, *_ACCESS_KEY, *_SECRET_KEY, *_CREDENTIALS`
- while environment variables can be assigned when declaring steps, they should never be used for secrets, because they won't get redacted, even if they match one of the above patterns.
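Before adding a secret, it can help to check that the chosen environment variable name ends in one of the suffixes listed above. The helper below is a hypothetical sketch for local use (`has_redacted_suffix` and the example variable names are assumptions). It only checks the name: as noted above, variables assigned when declaring steps are not redacted even if their name matches.

```sh
# Hypothetical helper: succeed if a variable name ends in one of the redacted suffixes.
has_redacted_suffix() {
  case "$1" in
    *_PASSWORD | *_SECRET | *_TOKEN | *_ACCESS_KEY | *_SECRET_KEY | *_CREDENTIALS) return 0 ;;
    *) return 1 ;;
  esac
}

# Example names are illustrative only.
for name in GITHUB_TOKEN DOCKER_PASSWORD MY_API_KEY; do
  if has_redacted_suffix "$name"; then
    echo "$name: suffix is redacted in logs"
  else
    echo "$name: suffix is NOT redacted, rename before storing a secret"
  fi
done
```

Note that a name such as `MY_API_KEY` does not qualify, because `_KEY` alone is not in the list; only `_ACCESS_KEY` and `_SECRET_KEY` are.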
#### Feature flags
Enabling a feature flag on the CI pipeline is achieved by setting the corresponding `CI_FEATURE_FLAG_*` environment variable to `true`.
- `CI_FEATURE_FLAG_STATELESS`: schedule the build on stateless agents instead of normal agents (this forces a `main-dry-run` run type).
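For illustration, a script could test such a flag as follows. This is a sketch, not how the pipeline generator reads flags: the `feature_flag_enabled` helper is hypothetical, and the `CI_FEATURE_FLAG_` prefix is taken from the `CI_FEATURE_FLAG_STATELESS` example above. It relies on `printenv`, so the variable must be exported to the environment.

```sh
# Hypothetical helper: succeed only if the named CI feature flag is exactly "true".
# Usage: feature_flag_enabled STATELESS
feature_flag_enabled() {
  [ "$(printenv "CI_FEATURE_FLAG_$1")" = "true" ]
}

if feature_flag_enabled STATELESS; then
  echo "stateless agents requested"
else
  echo "normal agents requested"
fi
```

Comparing against the literal string `true` means values such as `1` or `yes` leave the flag disabled in this sketch.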
## GitHub Actions
### `buildchecker`
[![buildchecker](https://github.com/sourcegraph/sourcegraph/actions/workflows/buildchecker.yml/badge.svg)](https://github.com/sourcegraph/sourcegraph/actions/workflows/buildchecker.yml)
[`buildchecker`](https://github.com/sourcegraph/sourcegraph/actions/workflows/buildchecker.yml), our [branch lock management tool](#branch-locks), runs in GitHub Actions - see the [workflow specification](https://github.com/sourcegraph/sourcegraph/blob/main/.github/workflows/buildchecker.yml).
To learn more about `buildchecker`, refer to the [`buildchecker` source code and documentation](https://github.com/sourcegraph/sourcegraph/tree/main/dev/buildchecker).
### `pr-auditor`
[![pr-auditor](https://github.com/sourcegraph/sourcegraph/actions/workflows/pr-auditor.yml/badge.svg)](https://github.com/sourcegraph/sourcegraph/actions/workflows/pr-auditor.yml)
[`pr-auditor`](https://github.com/sourcegraph/sourcegraph/actions/workflows/pr-auditor.yml), our [PR audit tool](../testing_principles.md#policy), runs in GitHub Actions - see the [workflow specification](https://github.com/sourcegraph/sourcegraph/blob/main/.github/workflows/pr-auditor.yml).
To learn more about `pr-auditor`, refer to the [`pr-auditor` source code and documentation](https://github.com/sourcegraph/sourcegraph/tree/main/dev/pr-auditor).
### Third-party licenses
[![Licenses Update](https://github.com/sourcegraph/sourcegraph/actions/workflows/licenses-update.yml/badge.svg)](https://github.com/sourcegraph/sourcegraph/actions/workflows/licenses-update.yml) [![Licenses Check](https://github.com/sourcegraph/sourcegraph/actions/workflows/licenses-check.yml/badge.svg)](https://github.com/sourcegraph/sourcegraph/actions/workflows/licenses-check.yml)
We use the [`license_finder`](https://github.com/pivotal/LicenseFinder) tool to check third-party dependencies for their licenses. It runs as a [GitHub Action on pull requests](https://github.com/sourcegraph/sourcegraph/actions?query=workflow%3A%22Licenses+Check%22), which will fail if one of the following occurs:
- If the license for a dependency cannot be inferred. To resolve:
- Use `license_finder licenses add <dep> <license>` to set the license manually
- If the license for a new or updated dependency is not on the list of approved licenses. To resolve, either:
- Remove the dependency
- Use `license_finder ignored_dependencies add <dep> --why="Some reason"` to ignore it
- Use `license_finder permitted_licenses add <license> --why="Some reason"` to allow the offending license
The `license_finder` tool can be installed using `gem install license_finder`. You can run the script locally using:
```sh
# updates ThirdPartyLicenses.csv
./dev/licenses.sh
# runs the same check as the one used in CI, returning status 1
# if there are any unapproved dependencies ('action items')
LICENSE_CHECK=true ./dev/licenses.sh
```
The `./dev/licenses.sh` script will also output some `license_finder` configuration for debugging purposes - this configuration is based on the `doc/dependency_decisions.yml` file, which tracks decisions made about licenses and dependencies.
For more details, refer to the [`license_finder` documentation](https://github.com/pivotal/LicenseFinder#usage).

View File

@ -6,7 +6,7 @@ This is a reference outlining what CI pipelines we generate under different cond
To preview the pipeline for your branch, use `sg ci preview`.
For a higher-level overview, please refer to the [continuous integration docs](https://docs.sourcegraph.com/dev/background-information/continuous_integration).
For a higher-level overview, please refer to the [continuous integration docs](https://docs.sourcegraph.com/dev/background-information/ci).
## Run types

View File

@ -1,332 +0,0 @@
# Continuous integration
<span class="badge badge-note">SOC2/GN-105</span> <span class="badge badge-note">SOC2/GN-106</span>
Sourcegraph uses a continuous integration and delivery tool, [Buildkite](#buildkite-pipelines), to help ensure a [consistent](#pipeline-health) build, test and deploy process. Software changes are systematically required to complete all steps within the continuous integration tool workflow prior to production deployment, in addition to being [peer reviewed](pull_request_reviews.md).
Sourcegraph also maintains a variety of tooling on [GitHub Actions](#github-actions) for continuous integration and repository maintenance purposes.
> NOTE: To learn more about testing in particular, see our [testing principles](testing_principles.md).
## Buildkite pipelines
[Tests](../how-to/testing.md) are automatically run in our [various Buildkite pipelines](https://buildkite.com/sourcegraph) when you push your changes to GitHub.
Pipeline steps are generated using the [pipeline generator](https://sourcegraph.com/github.com/sourcegraph/sourcegraph@main/-/tree/enterprise/dev/ci).
To see what checks will get run against your current branch, use [`sg`](../setup/quickstart.md):
```sh
sg ci preview
```
A complete reference of all available pipeline types and steps is available in the generated [Pipeline reference](ci/reference.md).
You can also see these docs locally with `sg ci docs`.
You can also request builds for your changes using `sg ci build`.
To learn about making changes to our Buildkite pipelines, see [Pipeline development](#pipeline-development).
### Pipeline steps
#### Soft failures
<span class="badge badge-note">SOC2/GN-106</span>
Many steps in Sourcegraph's Buildkite pipelines allow for [soft failures](https://buildkite.com/changelog/56-command-steps-can-now-be-made-to-soft-fail), which means that even if they fail, they do not cause the entire build to fail.
In the Buildkite UI, soft failures currently look like the following, with a _triangular_ warning sign (not to be mistaken for a hard failure!):
![soft fail in Buildkite UI](https://user-images.githubusercontent.com/23356519/150558751-d8e0da19-0b6f-4645-aa12-7547d375330f.png)
We use soft failures for the following reasons only:
- Steps that determine whether a subsequent step should run, where soft failures are the only technical way to communicate that a later step should be skipped in this manner using Buildkite.
- Examples: [hash comparison steps that determine if a build should run](https://sourcegraph.com/search?q=context:%40sourcegraph/all+repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+file:%5Eenterprise/dev/ci/internal/ci/operations%5C.go+compare-hash.sh&patternType=literal)
- Regular analysis tasks, where soft failures serve as a monitoring indicator to warn the team responsible for fixing issues.
- Examples: [image vulnerability scanning](#image-vulnerability-scanning), linting tasks for catching deprecation warnings
- Temporary exceptions to accommodate experimental or in-progress work.
You can find all usages of soft failures [with the following queries](https://sourcegraph.com/notebooks/Tm90ZWJvb2s6NTc=):
- [Soft failures in the Sourcegraph pipeline generator](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+%7B...bk.SoftFail...%7D+OR+%28...bk.SoftFail...%29+count:all&patternType=structural)
- [Soft failures in Buildkite YAML pipelines](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/.*+soft_fail+lang:yaml+count:all&patternType=literal)
All other failures are hard failures.
#### Image vulnerability scanning
Our CI pipeline uses [Trivy](https://aquasecurity.github.io/trivy/) to scan our Docker images for security vulnerabilities.
Refer to our [Pipeline reference](ci/reference.md) to see what pipelines Trivy checks run in.
If there are any `HIGH` or `CRITICAL` severities in a Docker image that have a known fix:
1. The CI pipeline will create an annotation that contains links to reports that describe the vulnerabilities
2. The Trivy scanning step will [soft fail](#soft-failures). Note that soft failures **do not fail builds or block deployments**. They simply highlight the failing step for further analysis.
> NOTE: Our vulnerability management process (including this workflow) is under active development and in its early stages. All of the above is subject to change. See [https://github.com/sourcegraph/sourcegraph/pull/25756](https://github.com/sourcegraph/sourcegraph/pull/25756) for more context.
We also run [separate vulnerability scans for our infrastructure](https://handbook.sourcegraph.com/departments/product-engineering/engineering/cloud/security/checkov).
### Pipeline health
Maintaining [Buildkite pipeline](#buildkite-pipelines) health is a critical part of ensuring we ship a stable product - changes that make it to the `main` branch may be deployed to various Sourcegraph instances, and having a reliable and predictable pipeline is crucial to ensuring bugs do not make it to production environments.
To enable this, we [address flakes as they arise](#flakes) and mitigate the impacts of pipeline instability with [branch locks](#branch-locks).
> NOTE: Sourcegraph teammates should refer to the [CI incidents playbook](https://handbook.sourcegraph.com/departments/product-engineering/engineering/process/incidents/playbooks/ci#scenarios) for help managing issues with pipeline health.
#### Branch locks
> WARNING: **A red `main` build is not okay and must be fixed.** Learn more about our `main` branch policy in [Testing principles: Failures on the `main` branch](testing_principles.md#failures-on-the-main-branch).
[`buildchecker`](#buildchecker) is a tool responding to periods of consecutive build failures on the `main` branch Sourcegraph Buildkite pipeline. If it detects a series of failures on the `main` branch, merges to `main` will be restricted to members of the Sourcegraph team who authored the failing commits until the issue is resolved - this is referred to as a "branch lock". When a build passes on `main` again, `buildchecker` will automatically unlock the branch.
**Authors of the most recent failed builds are responsible for investigating failures.** Please refer to the [Continuous integration playbook](https://handbook.sourcegraph.com/departments/product-engineering/engineering/process/incidents/playbooks/ci#build-has-failed-on-the-main-branch) for step-by-step guides on what to do in various scenarios.
#### Flakes
A *flake* is defined as a test or script that is unreliable or non-deterministic, i.e. it exhibits both a passing and a failing result with the same code. In other words: something that sometimes fails, but if you retry it enough times, it passes, *eventually*.
Tests are not the only things that can be flaky - flakes can also encompass [sporadic infrastructure issues](#flaky-infrastructure) and [unreliable steps](#flaky-steps).
##### Flaky tests
> WARNING: **We do not tolerate flaky tests of any kind.** Learn more about our flaky test policy in [Testing principles: Flaky tests](testing_principles.md#flaky-tests).
Typical reasons why a test may be flaky:
- Race conditions or timing issues
- Caching or inconsistent state between tests
- Unreliable test infrastructure (such as CI)
- Reliance on third-party services that are inconsistent
If a flaky test is discovered, immediately use language-specific functionality to skip a test and open a PR to disable the test:
- Go: [`testing.T.Skip`](https://pkg.go.dev/testing#hdr-Skipping)
- TypeScript: [`.skip()`](https://mochajs.org/#inclusive-tests)
If the language or framework allows for a skip reason, include a link to the issue tracking re-enabling the test, or leave a docstring with a link.
Then open an issue to investigate the flaky test (use the [flaky test issue template](https://github.com/sourcegraph/sourcegraph/issues/new/choose)), and assign it to the most likely owner.
##### Flaky steps
If a step is flaky, we need to get the build back to a reliable state as soon as possible. If there is not already a discussion in `#buildkite-main`, create one and link the steps you take. Here are the recommended approaches, in order:
1. Revert the PR if a recent change introduced the instability. Ping the author.
2. Use the `Skip` StepOpt when creating the step. Include a reason and a link to context. This will still show the step on builds so we don't forget about it.
An example use of `Skip`:
```diff
--- a/enterprise/dev/ci/internal/ci/operations.go
+++ b/enterprise/dev/ci/internal/ci/operations.go
@@ -260,7 +260,9 @@ func addGoBuild(pipeline *bk.Pipeline) {
 func addDockerfileLint(pipeline *bk.Pipeline) {
 	pipeline.AddStep(":docker: Lint",
 		bk.Cmd("./dev/ci/docker-lint.sh"),
+		bk.Skip("2021-09-29 example message https://github.com/sourcegraph/sourcegraph/issues/123"),
 	)
 }
```
##### Flaky infrastructure
If the [build or test infrastructure itself is flaky](https://handbook.sourcegraph.com/departments/product-engineering/engineering/enablement/dev-experience#build-pipeline-support), then [open an issue with the `team/devx` label](https://github.com/sourcegraph/sourcegraph/issues/new?labels=team/devx) and notify the [Developer Experience team](https://handbook.sourcegraph.com/departments/product-engineering/engineering/enablement/dev-experience#contact).
Also see [Buildkite infrastructure](#buildkite-infrastructure).
### Pipeline development
The source code of the pipeline generator is in [`/enterprise/dev/ci`](https://sourcegraph.com/github.com/sourcegraph/sourcegraph@main/-/tree/enterprise/dev/ci).
Internally, the pipeline generator determines what gets run over contributions based on:
1. [Run types](#run-types), determined by branch naming conventions, tags, and environment variables
2. [Diff types](#diff-types), determined by what files have been changed in a given branch
The above factors are then used to determine the appropriate [operations](#operations), composed of [step options](#step-options), that translate into steps in the resulting pipeline.
> WARNING: Sourcegraph's pipeline generator and its generated output are under the [Sourcegraph Enterprise license](https://github.com/sourcegraph/sourcegraph/blob/main/LICENSE.enterprise).
#### Run types
<div class="embed">
<iframe src="https://sourcegraph.com/embed/notebooks/Tm90ZWJvb2s6MTU5"
style="width:100%;height:720px" frameborder="0" sandbox="allow-scripts allow-same-origin allow-popups">
</iframe>
</div>
#### Diff types
<div class="embed">
<iframe src="https://sourcegraph.com/embed/notebooks/Tm90ZWJvb2s6MTYw"
style="width:100%;height:720px" frameborder="0" sandbox="allow-scripts allow-same-origin allow-popups">
</iframe>
</div>
#### Operations
<div class="embed">
<iframe src="https://sourcegraph.com/embed/notebooks/Tm90ZWJvb2s6MTYx"
style="width:100%;height:720px" frameborder="0" sandbox="allow-scripts allow-same-origin allow-popups">
</iframe>
</div>
##### Developing PR checks
To create a new check that can run on pull requests on relevant files, refer to how [diff types](#diff-types) work to get started.
Then, you can add a new check to [`CoreTestOperations`](https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+file:%5Eenterprise/dev/ci/internal/ci+CoreTestOperations+type:symbol+&patternType=literal).
Make sure to follow the best practices outlined in the docstring.
For more advanced pipelines, see [Run types](#run-types).
#### Step options
> NOTE: Coming soon!
##### Creating annotations
Annotations get rendered in the Buildkite UI to present the viewer notices about the build.
The pipeline generator provides an API for this that, at a high level, works like this:
1. In your script, leave a file in `./annotations`:
   ```sh
   if [ $EXIT_CODE -ne 0 ]; then
     echo -e "$OUT" >./annotations/docsite
   fi
   ```
1. In your pipeline operation, replace the usual `bk.Cmd` with `bk.AnnotatedCmd`:
   ```go
   pipeline.AddStep(":memo: Check and build docsite",
     bk.AnnotatedCmd("./dev/check/docsite.sh", bk.AnnotatedCmdOpts{}))
   ```
1. That's it!
For more details about best practices and additional features and capabilities, please refer to [the `bk.AnnotatedCmd` docstring](https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+file:%5Eenterprise/dev/ci/internal/buildkite+AnnotatedCmd+type:symbol&patternType=literal).
##### Caching build artefacts
For caching artefacts in steps to speed up steps, see [How to cache CI artefacts](../how-to/cache_ci_artefacts.md).
#### Observability
##### Pipeline command tracing
Every successful build of the `sourcegraph/sourcegraph` repository comes with an annotation pointing at the full trace of the build on [Honeycomb.io](https://honeycomb.io).
See the [Buildkite board on Honeycomb](https://ui.honeycomb.io/sourcegraph/board/sqPvYj5BXNy/Buildkite) for an overview.
Individual commands are tracked from the perspective of a given [step](#step-options):
```go
pipeline.AddStep(":memo: Check and build docsite",
  bk.AnnotatedCmd("./dev/check/docsite.sh", bk.AnnotatedCmdOpts{}))
```
This results in a single trace span for the `./dev/check/docsite.sh` script, whereas the following produces an individual trace span for each command:
```go
pipeline.AddStep(fmt.Sprintf(":%s: Puppeteer tests for %s extension", browser, browser),
  // ...
  bk.Cmd("yarn --frozen-lockfile --network-timeout 60000"),
  bk.Cmd("yarn --cwd client/browser -s run build"),
  bk.Cmd("yarn run cover-browser-integration"),
  bk.Cmd("yarn nyc report -r json"),
  bk.Cmd("dev/ci/codecov.sh -c -F typescript -F integration"),
)
```
Therefore, it's beneficial for tracing purposes to split a step into multiple commands where possible.
##### Test analytics
Our test analytics is currently powered by [Buildkite Analytics](https://buildkite.com/test-analytics), a tool that Buildkite released in beta to analyse individual tests across builds.
This tool makes it possible to observe how each individual test evolves on two metrics: duration and flakiness.
Browse the [dashboard](https://buildkite.com/organizations/sourcegraph/analytics) to explore the metrics and optionally set monitors that alert when a given test or test suite deviates from its historical duration or flakiness.
In order to track a new test suite, the test output must be converted to JUnit XML and then uploaded to Buildkite. You can find the instructions for the upload by creating a new Test Suite in the Buildkite Analytics UI.
### Buildkite infrastructure
Our continuous integration system is composed of two parts: a central server controlled by Buildkite, and agents that are operated by Sourcegraph within our own infrastructure.
To provide strong isolation across builds and prevent a previous build from affecting the next one, our agents are stateless jobs.
When a build is dispatched by Buildkite, each individual job will be assigned to an agent in a pristine state. Each agent will execute its assigned job, automatically report back to Buildkite, and finally shut itself down. A fresh agent will then be created and will stand in line for the next job.
This means that our agents are totally **stateless**, exactly like the runners used in GitHub Actions.
Also see [Flaky infrastructure](#flaky-infrastructure), [Continuous integration infrastructure](https://handbook.sourcegraph.com/departments/product-engineering/engineering/tools/infrastructure/ci), and the [Continuous integration changelog](https://handbook.sourcegraph.com/departments/product-engineering/engineering/tools/infrastructure/ci/changelog).
#### Pipeline setup
To set up Buildkite to use the rendered pipeline, add the following step in the [pipeline settings](https://buildkite.com/sourcegraph/sourcegraph/settings):
```shell
go run ./enterprise/dev/ci/gen-pipeline.go | buildkite-agent pipeline upload
```
#### Managing secrets
The term _secret_ refers to authentication credentials like passwords, API keys, tokens, etc. which are used to access a particular service. Our CI pipeline must never leak secrets:
- To add a secret, use the Secret Manager on Google Cloud and then inject it at deployment time as an environment variable in the CI agents, which will make it available to every step.
- Use an environment variable name with one of the following suffixes to ensure it gets redacted in the logs: `*_PASSWORD`, `*_SECRET`, `*_TOKEN`, `*_ACCESS_KEY`, `*_SECRET_KEY`, `*_CREDENTIALS`.
- While environment variables can be assigned when declaring steps, they should never be used for secrets, because they won't get redacted, even if they match one of the above patterns.
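When choosing a name for a new secret, the suffix rule above can be checked mechanically. The helper below is a hypothetical sketch (`IsRedactedName` is not an existing function, and the variable names in `main` are made up): it only verifies the naming convention, while the actual redaction happens in the CI tooling and, as noted above, does not apply to variables assigned on steps.

```go
package main

import (
	"fmt"
	"strings"
)

// redactedSuffixes mirrors the suffixes listed in the documentation above.
var redactedSuffixes = []string{
	"_PASSWORD",
	"_SECRET",
	"_TOKEN",
	"_ACCESS_KEY",
	"_SECRET_KEY",
	"_CREDENTIALS",
}

// IsRedactedName reports whether an environment variable name ends with one
// of the suffixes that get redacted in the logs.
func IsRedactedName(name string) bool {
	for _, suffix := range redactedSuffixes {
		if strings.HasSuffix(name, suffix) {
			return true
		}
	}
	return false
}

func main() {
	// Both names are examples only.
	for _, name := range []string{"EXAMPLE_SERVICE_TOKEN", "EXAMPLE_API_KEY"} {
		fmt.Printf("%s redacted: %t\n", name, IsRedactedName(name))
	}
}
```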
#### Feature flags
Enabling a feature flag on the CI pipeline is achieved by setting the corresponding `CI_FEATURE_FLAG_*` environment variable to `true`.
- `CI_FEATURE_FLAG_STATELESS`: schedule the build on stateless agents instead of normal agents (this forces a `main-dry-run` run type).
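A flag of this kind can be read with a plain string comparison against `true`. The sketch below is an assumption about the general shape, not the generator's actual code: `FeatureFlags` and `parseFeatureFlags` are hypothetical names, and the lookup function is injected so that the parsing can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// FeatureFlags is a hypothetical container for CI feature flags.
type FeatureFlags struct {
	// StatelessBuild schedules the build on stateless agents.
	StatelessBuild bool
}

// parseFeatureFlags reads the flags through getenv, typically os.Getenv.
// Only the exact value "true" enables a flag.
func parseFeatureFlags(getenv func(string) string) FeatureFlags {
	return FeatureFlags{
		StatelessBuild: getenv("CI_FEATURE_FLAG_STATELESS") == "true",
	}
}

func main() {
	flags := parseFeatureFlags(os.Getenv)
	fmt.Printf("stateless agents: %t\n", flags.StatelessBuild)
}
```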
## GitHub Actions
### `buildchecker`
[![buildchecker](https://github.com/sourcegraph/sourcegraph/actions/workflows/buildchecker.yml/badge.svg)](https://github.com/sourcegraph/sourcegraph/actions/workflows/buildchecker.yml)
[`buildchecker`](https://github.com/sourcegraph/sourcegraph/actions/workflows/buildchecker.yml), our [branch lock management tool](#branch-locks), runs in GitHub Actions. See the [workflow specification](https://github.com/sourcegraph/sourcegraph/blob/main/.github/workflows/buildchecker.yml).
To learn more about `buildchecker`, refer to the [`buildchecker` source code and documentation](https://github.com/sourcegraph/sourcegraph/tree/main/dev/buildchecker).
### `pr-auditor`
[![pr-auditor](https://github.com/sourcegraph/sourcegraph/actions/workflows/pr-auditor.yml/badge.svg)](https://github.com/sourcegraph/sourcegraph/actions/workflows/pr-auditor.yml)
[`pr-auditor`](https://github.com/sourcegraph/sourcegraph/actions/workflows/pr-auditor.yml), our [PR audit tool](testing_principles.md#policy), runs in GitHub Actions. See the [workflow specification](https://github.com/sourcegraph/sourcegraph/blob/main/.github/workflows/pr-auditor.yml).
To learn more about `pr-auditor`, refer to the [`pr-auditor` source code and documentation](https://github.com/sourcegraph/sourcegraph/tree/main/dev/pr-auditor).
### Third-party licenses
[![Licenses Update](https://github.com/sourcegraph/sourcegraph/actions/workflows/licenses-update.yml/badge.svg)](https://github.com/sourcegraph/sourcegraph/actions/workflows/licenses-update.yml) [![Licenses Check](https://github.com/sourcegraph/sourcegraph/actions/workflows/licenses-check.yml/badge.svg)](https://github.com/sourcegraph/sourcegraph/actions/workflows/licenses-check.yml)
We use the [`license_finder`](https://github.com/pivotal/LicenseFinder) tool to check third-party dependencies for their licenses. It runs as a [GitHub Action on pull requests](https://github.com/sourcegraph/sourcegraph/actions?query=workflow%3A%22Licenses+Check%22), which will fail if one of the following occurs:
- If the license for a dependency cannot be inferred. To resolve:
  - Use `license_finder licenses add <dep> <license>` to set the license manually
- If the license for a new or updated dependency is not on the list of approved licenses. To resolve, either:
  - Remove the dependency
  - Use `license_finder ignored_dependencies add <dep> --why="Some reason"` to ignore it
  - Use `license_finder permitted_licenses add <license> --why="Some reason"` to allow the offending license
The `license_finder` tool can be installed using `gem install license_finder`. You can run the script locally using:
```sh
# updates ThirdPartyLicenses.csv
./dev/licenses.sh
# runs the same check as the one used in CI, returning status 1
# if there are any unapproved dependencies ('action items')
LICENSE_CHECK=true ./dev/licenses.sh
```
The `./dev/licenses.sh` script will also output some `license_finder` configuration for debugging purposes - this configuration is based on the `doc/dependency_decisions.yml` file, which tracks decisions made about licenses and dependencies.
For more details, refer to the [`license_finder` documentation](https://github.com/pivotal/LicenseFinder#usage).

View File

@ -56,7 +56,7 @@
## Testing
- [Continuous Integration](continuous_integration.md)
- [Continuous Integration](ci/index.md)
- [Testing Principles](testing_principles.md)
- [Testing Go code](languages/testing_go_code.md)
- [Testing web code](testing_web_code.md)

View File

@ -9,7 +9,7 @@ GitHub repositories are configured to prevent merging without a review (includin
Our goal is to have a pull request review process and culture that everyone would opt-in to even if reviews weren't required.
In addition to being peer-reviewed, all contributions must pass our [continuous integration](./continuous_integration.md).
In addition to being peer-reviewed, all contributions must pass our [continuous integration](./ci/index.md).
## Why do we require peer reviews?

View File

@ -208,7 +208,7 @@ sg rfc open 420
### `sg ci` - Interact with Sourcegraph's continuous integration
Interact with Sourcegraph's [continuous integration](https://docs.sourcegraph.com/dev/background-information/continuous_integration) pipelines on [Buildkite](https://buildkite.com/sourcegraph).
Interact with Sourcegraph's [continuous integration](https://docs.sourcegraph.com/dev/background-information/ci) pipelines on [Buildkite](https://buildkite.com/sourcegraph).
```bash
# Preview what a CI run for your current changes will look like

View File

@ -148,11 +148,11 @@ No review required: deploys tested changes.
### Flaky tests
**We do not tolerate flaky tests of any kind.** Any engineer that sees a flaky test in [continuous integration](./continuous_integration.md) should immediately [disable the flaky test](continuous_integration.md#flaky-tests).
**We do not tolerate flaky tests of any kind.** Any engineer that sees a flaky test in [continuous integration](./ci/index.md) should immediately [disable the flaky test](ci/index.md#flaky-tests).
Why are flaky tests undesirable? Because these tests stop being an informative signal that the engineering team can rely on, and if we keep them around then we eventually train ourselves to ignore them and become blind to their results. This can hide real problems under the cover of flakiness.
Other kinds of flakes include [flaky steps](continuous_integration.md#flaky-steps) and [flaky infrastructure](continuous_integration.md#laky-infrastructure)
Other kinds of flakes include [flaky steps](ci/index.md#flaky-steps) and [flaky infrastructure](ci/index.md#laky-infrastructure)
## Ownership
@ -162,7 +162,7 @@ Other kinds of flakes include [flaky steps](continuous_integration.md#flaky-step
## Reference
- [Continuous integration](continuous_integration.md)
- [Continuous integration](ci/index.md)
- [How to write and run tests](../how-to/testing.md)
- [Testing Go code](languages/testing_go_code.md)
- [Testing web code](testing_web_code.md)

View File

@ -11,7 +11,7 @@ This page outlines how to accept a contribution to the [Sourcegraph repository](
## Buildkite
To request a [Buildkite build](../background-information/continuous_integration.md#buildkite-pipelines) for a pull request from a fork, a build must be manually requested after reviewing the contributor's changes. A successful Buildkite build is required for a pull request to be merged.
To request a [Buildkite build](../background-information/ci/index.md#buildkite-pipelines) for a pull request from a fork, a build must be manually requested after reviewing the contributor's changes. A successful Buildkite build is required for a pull request to be merged.
> WARNING: Builds do not happen automatically for forks for security reasons - Buildkite build runs have access to a variety of secrets used in testing. When reviewing, ensure that there are no unexpected usages of secrets or attempts to expose secrets in logs or external services.

View File

@ -1,12 +1,12 @@
# How to cache CI artefacts
This guide documents how to cache build artefacts in order to speed up build times in Sourcegraph's [Buildkite CI pipelines](../background-information/continuous_integration.md#buildkite-pipelines).
This guide documents how to cache build artefacts in order to speed up build times in Sourcegraph's [Buildkite CI pipelines](../background-information/ci/index.md#buildkite-pipelines).
> NOTE: Before getting started, we recommend familiarize yourself with [Pipeline development](../background-information/continuous_integration.md#pipeline-development) and [Buildkite infrastructure](../background-information/continuous_integration.md#buildkite-infrastructure).
> NOTE: Before getting started, we recommend familiarize yourself with [Pipeline development](../background-information/ci/index.md#pipeline-development) and [Buildkite infrastructure](../background-information/ci/index.md#buildkite-infrastructure).
## The need for caching
Because [Buildkite agents are stateless](../background-information/continuous_integration.md#buildkite-infrastructure) and start with a blank slate, this means that all dependencies and binaries have to rebuild on each job. This is the price to pay for complete isolation from one job to the other.
Because [Buildkite agents are stateless](../background-information/ci/index.md#buildkite-infrastructure) and start with a blank slate, this means that all dependencies and binaries have to rebuild on each job. This is the price to pay for complete isolation from one job to the other.
A common strategy to address this problem of having to rebuild everything is to store objects that are commonly reused accross jobs and to download them again rather than rebuilding everything from scratch.
@ -22,7 +22,7 @@ In order to determine what we can cache and when to do it, we need to make sure
## How to write a step that caches an artefact?
In the [CI pipeline generator](../background-information/continuous_integration.md#pipeline-development), when defining a step you can use the `buildkite.Cache()` function to define what needs to be cached and under which key to store it.
In the [CI pipeline generator](../background-information/ci/index.md#pipeline-development), when defining a step you can use the `buildkite.Cache()` function to define what needs to be cached and under which key to store it.
For example: we want to cache the `node_modules` folder to avoid dowloading again all dependencies for the front-end.

View File

@ -29,7 +29,7 @@
## Testing Sourcegraph & CI
- [How to run tests](testing.md)
- See also [Testing Principles](../background-information/testing_principles.md) and [Continuous Integration](../background-information/continuous_integration.md)
- See also [Testing Principles](../background-information/testing_principles.md) and [Continuous Integration](../background-information/ci/index.md)
- [Configure a test instance of Phabricator and Gitolite](configure_phabricator_gitolite.md)
- [Test a Phabricator and Gitolite instance](test_phabricator.md)
- [How to test changes in dogfood](testing_in_dogfood.md)

View File

@ -4,7 +4,7 @@
<span class="virtual-br"></span>
> NOTE: To learn more about our CI pipelines where these tests get run, please see "[Buildkite pipelines](../background-information/continuous_integration.md#buildkite-pipelines)".
> NOTE: To learn more about our CI pipelines where these tests get run, please see "[Buildkite pipelines](../background-information/ci/index.md#buildkite-pipelines)".
## Backend tests

View File

@ -103,7 +103,7 @@ Clarification and discussion about key concepts, architecture, and development s
- [Code host connections on local dev environment](background-information/code-host.md)
- [Testing](#testing)
- [Testing principles and guidelines](background-information/testing_principles.md)
- [Continuous integration](background-information/continuous_integration.md)
- [Continuous integration](background-information/ci/index.md)
- [How to write and run tests](how-to/testing.md)
- [Testing Go code](background-information/languages/testing_go_code.md)
- [Testing web code](background-information/testing_web_code.md)

View File

@ -1,4 +1,4 @@
# Buildkite Pipeline for sourcegraph/sourcegraph
We dynamically generate our CI pipeline for [Buildkite](https://buildkite.com/sourcegraph/sourcegraph) based on the output of [gen-pipeline.go](./gen-pipeline.go).
To learn more, refer to the [continuous integration docs](https://docs.sourcegraph.com/dev/background-information/continuous_integration).
To learn more, refer to the [continuous integration docs](https://docs.sourcegraph.com/dev/background-information/ci).

View File

@ -111,7 +111,7 @@ func renderPipelineDocs(w io.Writer) {
fmt.Fprintln(w, "# Pipeline types reference")
fmt.Fprintln(w, "\nThis is a reference outlining what CI pipelines we generate under different conditions.")
fmt.Fprintln(w, "\nTo preview the pipeline for your branch, use `sg ci preview`.")
fmt.Fprintln(w, "\nFor a higher-level overview, please refer to the [continuous integration docs](https://docs.sourcegraph.com/dev/background-information/continuous_integration).")
fmt.Fprintln(w, "\nFor a higher-level overview, please refer to the [continuous integration docs](https://docs.sourcegraph.com/dev/background-information/ci).")
fmt.Fprintln(w, "\n## Run types")