sourcegraph

mirror of https://github.com/sourcegraph/sourcegraph.git synced 2026-02-06 18:31:54 +00:00

Author	SHA1	Message	Date
Noah S-C	bb178ba729	chore(tooling): bump Go version to 1.22.4 (#63124 ) Bump for @evict ## Test plan CI passes with no complaints ## Changelog - Bumped version of Go used to build to 1.22.4	2024-06-06 15:19:03 +00:00
Varun Gandhi	2955bb6cfb	chore: Change errors.HasType to respect multi-errors (#63024 ) With this patch, the `errors.HasType` API behaves similar to `Is` and `As`, where it checks the full error tree instead of just checking a linearized version of it, as cockroachdb/errors's `HasType` implementation does not respect multi-errors. As a consequence, a bunch of relationships between HasType and Is/As that you'd intuitively expect to hold are now true; see changes to `invariants_test.go`.	2024-06-06 13:02:14 +00:00
Robert Lin	6302955caf	feat/sg-msp-pg: add suggestion to check msp-ops page on perms error (#63118 ) I think finding the right permissions confuses people pretty often when first interacting with MSP. This adds a helper for annotating errors returned from points where we might be able to help out @DaedalusG, specifically for the situation in https://sourcegraph.slack.com/archives/C05GJPTSZCZ/p1717629546727829 😉 ## Test plan It's a little wordy but: ``` sg msp pg connect sams prod ❌ possible permissions error, ensure you have the prerequisite Entitle grants mentioned in https://sourcegraph.notion.site/3e59b9ac3d414a5f8fb5911eed1e418a: find IAM output: gcloud: failed to access secret "iam_operator_access_service_account" from "sams-prod-ywuz": rpc error: code = PermissionDenied desc = Permission 'secretmanager.versions.access' denied for resource 'projects/sams-prod-ywuz/secrets/iam_operator_access_service_account/versions/latest' (or it may not exist). ``` ## Changelog - `sg msp pg connect` will tell you about your service's generated Notion page if you run into a permissions-looking error during command setup, where there is guidance about the required Entitle requests.	2024-06-05 18:55:59 -07:00
James Cotter	bcc4367f86	msp/deploy: add 'author' and 'commit_message' annotations (#63108 ) Add 'author' and 'commit_message' annotations on release ## Test plan CI	2024-06-05 11:43:02 -07:00
Robert Lin	27211dea73	feat/msp: update handbook link in alerts dashboard, sort custom alerts first (#63089 ) 1. The dashboard link still points to the old `go/msp-ops/...` which no longer works (CORE-105) 2. Alerts defined on top of the MSP defaults are probably of more interest, so let's sort these in front of the others ## Test plan Unit/golden tests	2024-06-05 09:09:22 -07:00
Noah S-C	4a93f29755	chore(bazel): enable rules_esbuild sandbox with object-inspect workaround (#61969 ) Sandbox escapes be-gone ## Test plan Tested in CI and locally with `bazel build //client/...` as well as a lot of blood, sweat n tears tearing through failed sandboxes ## Changelog	2024-06-05 15:34:29 +01:00
William Bezuidenhout	605b2305eb	chore(sg): move `registry list` cmd to `release list` (#63094 ) Follow up from https://github.com/sourcegraph/sourcegraph/pull/63079 ## Test plan Tested locally ## Changelog	2024-06-05 10:25:38 +02:00
William Bezuidenhout	e4eec6668a	feat(sg): respect the context when executing interrupt hooks (#63069 ) During testing I found that sometimes some hooks would just hang and not complete. In this PR we execute all hooks within a timeout context, ensuring we give _some_ time for hooks to execute but also making sure we eventually exit if some hook is misbehaving. Additional changes: - Global timeout for all hook execution is 2 seconds - We hard exit after 5 interrupts instead of 2 - Hooks are split into two groups: sequential and concurrent. As per their names the hooks are executed differently depending how they were registered. ## Test plan Tested locally ``` ^C⚠️ Interrupt received, executing hook groups for graceful shutdown... ⚠️ Executing 16 'cleanup' hooks for graceful shutdown... [ repo-updater] INFO repo-updater.repo-updater.grpcserver grpcserver/grpcserver.go:76 Shutting down gRPC server [ repo-updater] INFO sync_worker workerutil/worker.go:252 Shutting down dequeue loop {"name": "repo_sync_worker", "reason": ""} worker stopped due to context error: context canceled gitserver-1 stopped due to context error: context canceled searcher stopped due to context error: context canceled gitserver-0 stopped due to context error: context canceled blobstore stopped due to context error: context canceled symbols stopped due to context error: context canceled caddy stopped due to context error: context canceled repo-updater stopped due to context error: context canceled embeddings stopped due to context error: context canceled frontend stopped due to context error: context canceled zoekt-index-0 stopped due to context error: context canceled syntax-highlighter stopped due to context error: context canceled zoekt-web-1 stopped due to context error: context canceled web stopped due to context error: context canceled zoekt-web-0 stopped due to context error: context canceled ⚠️ Executing 6 'general' hooks for for graceful shutdown... ❌ failed to run zoekt-index-1. 
stderr: INFO server zoekt-sourcegraph-indexserver/main.go:1017 removing tmp dir {"tmpRoot": "/Users/william/.sourcegraph/zoekt/index-1/.indexserver.tmp"} 2024/06/04 09:15:03 updating index 6 github.com/sourcegraph/sourcegraph@HEAD=e55003da894490122546f876452f651aae65bb55 reason=content-mismatch INFO server zoekt-sourcegraph-indexserver/main.go:432 updated index {"repo": "github.com/sourcegraph/sourcegraph", "id": 6, "branches": ["HEAD=e55003da894490122546f876452f651aae65bb55"], "duration": "19.21403925s"} ``` ## Changelog - Hard exit sg when 5 interrupts are received - Respect the context while executing interrupt hooks to ensure we still exit if some hook is misbehaving	2024-06-05 10:06:58 +02:00
Robert Lin	a3fe573b59	fix/msp: flatten custom alert promQL query for GCP (#63084 ) The GCP monitoring alert configuration expects, for some reason, a single-line PromQL query only, otherwise the threshold doesn't work. In configuration, however, we may want to write a multi-line query, for ease of readability. This change automatically flattens the PromQL query into a single line and strips extra spaces. Part of CORE-161 ## Test plan Unit tests	2024-06-04 14:37:51 -07:00
William Bezuidenhout	8f3a9d5260	sg: add command to fetch versions from release registry (#63079 ) added a command to list versions from the release registry	2024-06-04 17:42:47 +02:00
William Bezuidenhout	9bbfd25fc4	feat(sg: add `list-build` subcommand to ci (#63071 ) * sg: add `list-build` subcommand to ci Add command to list builds in various states on a pipeline * bazel remove trailing '...' from commit printing	2024-06-04 13:41:44 +02:00
Greg Magolan	2d3d918ffa	chore(bazel): upgrade to rules_js 2.0 RC (#63022 ) Bumps rules_js (and friends) to 2.0 RCs. This brings in performance improvements for analysis phase since npm package depsets are now much smaller. It also adds support for pnpm v9 and allows for linking js_library targets as 1p deps instead of npm_package targets. See https://github.com/aspect-build/rules_js/issues/1671 for more details. ## Test plan CI ## Changelog	2024-06-04 11:26:42 +00:00
William Bezuidenhout	1a7e1b9686	build-tracker: remove old links (#63065 )	2024-06-04 12:03:58 +01:00
Robert Lin	908d7119ea	chore/msp: blindly retry Notion page deletion (#63052 ) Deleting Notion pages takes a very long time, and is prone to breaking in the page deletion step, where we must delete blocks one at a time because Notion does not allow for bulk block deletions. The errors seem to generally just be random Notion internal errors. This is very bad because it leaves go/msp-ops pages in an unusable state. To try and mitigate, we add several places to blindly retry: 1. At the Notion SDK level, where a config option is available for retrying 429 errors 2. At the "reset page" helper level, where a failure to reset a page will prompt a retry of the whole helper 3. At the "delete blocks" helper level, where individual block deletion failures will be retried Attempt to mitigate https://linear.app/sourcegraph/issue/CORE-119 While here, I also made some other QOL tweaks: - Fix timing of sub-tasks in CLI output - Bump default concurrency to 5 (our retries will handle if this is too aggressive, hopefully) - Fix a missing space in generated docs ## Test plan ``` sg msp ops generate-handbook-pages ```	2024-06-03 22:32:06 +00:00
Joe Chen	dd8ff6013f	worker: add SAMS notifications subscriber (#63051 ) Part of CORE-92 This PR adds a new worker for subscribing to [SAMS notifications](https://www.notion.so/sourcegraph/SAMS-notifications-distribution-system-0d174480e0044b05b545d37d24263d5a). The current use case is to automatically (hard-)delete users on Sourcegraph.com when the corresponding user is deleted from SAMS. This worker is only started when running in the Sourcegraph.com mode and the credentials file (`service_account.json`) is provided, which has been configured since https://github.com/sourcegraph/deploy-sourcegraph-cloud/pull/18591. Co-authored-by: Robert Lin <robert@bobheadxi.dev>	2024-06-03 18:01:19 -04:00
Robert Lin	617d2f766c	chore/msp/spec: tidy up custom alerts spec (#63050 ) Follow-ups for #62885: - Better docstrings for `mql`, `promql` - `duration` -> `durationMinutes` to align with other config - `alertpolicy.ResponseCodeMetric` -> `spec.CustomAlertCondition`: they're effectively the same type Test plan: CI	2024-06-03 13:53:01 -07:00
Bolaji Olajide	9e2b56119f	feat(release): allow creation of multiple patch release events (#63034 ) * allow creation of multiple patch release events * skip old month releases * update config	2024-06-03 11:14:24 -04:00
Bolaji Olajide	bab01ccaac	feat(release): rename code freeze event to `branch cut` event (#63033 ) rename code freeze event to branch cut	2024-06-03 05:13:32 -05:00
William Bezuidenhout	4cf94e9e8c	sg: speed up interrupt execution (#63032 )	2024-06-03 09:54:51 +00:00
Greg Magolan	a3afa08161	chore(bazel): bump to aspect_bazel_lib 2.7.7 (#63012 )	2024-05-31 23:08:52 +01:00
Robert Lin	012db75133	fix/msp: make deadlineSeconds job-level configuration, apply in timeout (#63017 ) In a rushed POC of MSP jobs, I did some pretty bad copy-pasting (evidenced by all the service-specific docstrings I have removed in this PR) and made a bad configuration decision here, resulting in a few issues: 1. `schedule.deadline` is not actually applied to Cloud Run jobs, causing jobs to time out earlier than desired 2. `schedule.deadline` is not the right place to configure a deadline, because _all_ jobs need a configurable deadline, not just those with schedules. This change moves `schedule.deadline` to `deadlineSeconds`. Closes CORE-145 ## Test plan ``` $ sg msp generate gatekeeper prod $ git diff ``` ```diff diff --git a/services/gatekeeper/service.yaml b/services/gatekeeper/service.yaml index fd6a3812..ce4b02e3 100644 --- a/services/gatekeeper/service.yaml +++ b/services/gatekeeper/service.yaml @@ -48,4 +48,4 @@ environments: - "primary" schedule: cron: 0 * * * * - deadline: 1800 # 30 minutes + deadlineSeconds: 1800 # 30 minutes diff --git a/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json b/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json index 3c2c295e..f83b32b9 100644 --- a/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json +++ b/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json @@ -281,7 +281,7 @@ }, { "name": "JOB_EXECUTION_DEADLINE", - "value": "600s" + "value": "1800s" } ], "image": "us.gcr.io/sourcegraph-dev/abuse-ban-bot:${var.resolved_image_tag}", @@ -302,7 +302,7 @@ } ], "service_account": "${data.terraform_remote_state.cross-stack-reference-input-iam.outputs.cross-stack-output-google_service_accountiam-workload-accountemail}", - "timeout": "300s", + "timeout": "1800s", "volumes": [ ], "vpc_access": { @@ -341,7 +341,7 @@ "uniqueId": "job_scheduler" } }, - "attempt_deadline": "600s", + "attempt_deadline": "1800s", "depends_on": [ 
"google_cloud_run_v2_job_iam_member.cloudrun_scheduler_job_invoker" ], ``` ## Changelog - MSP jobs: `schedule.deadline` is deprecated, use the top-level `deadlineSeconds` instead. Configured deadlines are now correctly applied as the Cloud Run job execution timeout as well.	2024-05-31 21:15:31 +00:00
Greg Magolan	bbae7a4954	build(bazel): bump to rules_esbuild 0.16.0 (#63005 ) * build(bazel): pin bazel fetched esbuild version to 0.19.2 * build(bazel): bump to rules_esbuild 0.16.0 * Update WORKSPACE Co-authored-by: Noah S-C <noah@sourcegraph.com> --------- Co-authored-by: Noah S-C <noah@sourcegraph.com>	2024-05-31 11:20:23 -07:00
Robert Lin	7170d4bd2b	feat/msp: add link to ops page in Slack channel description (#63011 ) Minor QOL improvement, when you're in the Slack channel the chances are good that you might want the ops docs at some point. ## Test plan n/a ## Changelog - MSP-provisioned alerts Slack channels now include a link to the service's generated operational docs for a service (go/msp-ops) in the channel description.	2024-05-31 17:59:22 +00:00
William Bezuidenhout	0fcffdd657	fix(sg): cloud eph - do not fail just because we cannot parse reason (#62989 ) * do not fail just because we cannot parse reason * fix tests * whitespace	2024-05-31 14:18:56 +00:00
Noah S-C	e1974fe9f5	chore(bazel): update ownership tags to increase coverage (#63001 ) Brings us up to 73%, a bit of buffer room ## Test plan `./dev/check-test-ownership.sh` prints out 73 ## Changelog	2024-05-31 14:10:29 +00:00
Noah S-C	79fce8c73e	feat(ci): add GHA to report when Bazel test ownership drops below 70% threshold (#62985 ) This PR adds a non-blocking GHA check to report when a branch's Bazel test ownership drops below 70%. See example messaging below to see what it looks like: https://github.com/sourcegraph/sourcegraph/pull/62985#issuecomment-2139439084. The message will be updated if the threshold is reached/breached whenever the branch changes. ## Test plan Extensive iteration in this PR, see below message https://github.com/sourcegraph/sourcegraph/pull/62985#issuecomment-2139439084 ## Changelog	2024-05-31 14:46:01 +01:00
William Bezuidenhout	bc73643a5d	chore(sg): cloud ephemeral fix instance check (#62988 ) * fix instance check * fix name sanitization	2024-05-31 13:39:46 +00:00
James Cotter	2f4e3b9272	sg/msp: fix nil `domain` and `EnvironmentDomainTypeNone` in diagram gen (#62982 )	2024-05-30 17:58:00 +01:00
Robert Lin	9e4a8e8033	feat/sg/msp: add 'sg msp validate' for validating service specifications (#62973 )	2024-05-30 09:11:36 -07:00
Robert Lin	27f0d725ac	feat/msp/spec: require notionPageID if a production env is provisioned (#62972 )	2024-05-30 09:01:42 -07:00
Robert Lin	cb62afa2c2	fix/msp: test for cron interval changes based on time, add more restrictions (#62969 ) Addresses problem noticed in https://github.com/sourcegraph/managed-services/pull/1486#issuecomment-2137887423 ## Test plan Unit tests ## Changelog - Fixed an issue with output of `sg msp generate` for MSP jobs with particular schedules changing throughout the week - MSP jobs schedules now must be between 15 minutes at the most frequent, and every week at the least frequent	2024-05-29 18:24:39 -07:00
Anish Lakhwara	de920065ea	feat(sg): add version=auto for `sg release cut` (#62970 )	2024-05-29 14:57:18 -07:00
Robert Lin	28324a3d95	feat/sg/enterprise-portal: use externalSecret to configure SAMS client secret (#62953 )	2024-05-28 15:27:30 -07:00
Robert Lin	de9a31aa89	feat/sg: add 'sg sams' commands 'create-client-token' and 'introspect-token' (#62883 ) Right now, developing SAMS clients involves raw cURL commands (see [operator cheat sheet](https://sourcegraph.notion.site/Sourcegraph-Accounts-infrastructure-operations-b90a571da30443a8b1e7c31ade3594fb)) (which is fine), but other steps like "testing auth" require using [accounts-clients-example](https://github.com/sourcegraph/sourcegraph-accounts/tree/main/cmd/accounts-client-example), which isn't very well documented and requires a bit of hand-wringing to get to and start using. We previously talked about making a SAMS-specific CLI, but IMO that's a pretty big pain point if we want SAMS integration adoption when everything else lives in `sg`, and all the nice tooling lives here as well. This PR migrates the next steps after using cURL to set up clients (`create-client-token` and `introspect-token`) from [accounts-clients-example](https://github.com/sourcegraph/sourcegraph-accounts/tree/main/cmd/accounts-client-example) to a new `sg sams` toolchain for better DX (docs, completions, flags) ## Test plan ```sh export SG_SAMS_CLIENT_ID="..." export SG_SAMS_CLIENT_SECRET="..." sg sams create-client-token -s 'enterprise_portal::codyaccess::read' ``` --------- Co-authored-by: Joe Chen <joe@sourcegraph.com>	2024-05-28 21:08:42 +00:00
William Bezuidenhout	462ba3de0b	fix(sg): fix error condition for cloud eph deployment that already exists (#62947 ) fix error condition for deploy already existing	2024-05-28 15:25:13 +00:00
William Bezuidenhout	acf051ad66	feat(local): add cloud ephemeral dashboard command (#62945 ) * sg: add cloud ephemeral dashboard command `sg cloud ephemeral dashboard` will open the ops dashboard `sg cloud ephemeral ops --name <instance>` will open the ops page for the given instance	2024-05-28 16:12:14 +02:00
Jean-Hadrien Chabran	21b2918ef2	chore(local): catch bazel-do issues before push (#62943 )	2024-05-28 15:16:13 +02:00
Will Dollman	d1b71a0a8a	bazel: Cleanup oci_deps.bzl (#62769 ) * security: Update dind base image to patch multiple CVEs Patches CVE-2023-45288 CVE-2024-2511 CVE-2024-32002 CVE-2024-32004 CVE-2024-32020 CVE-2024-32021 CVE-2024-32465 * ci: Tweak automated security update PR title * Remove unused image hashes from oci_deps * Tweak oci_deps comment * Fixup old @wolfi_base references * Add wolfi_base load * use the correct base image * Remove unneeded wolfi_base call	2024-05-28 10:00:31 +01:00
Noah S-C	7009f1dfe4	bazel: add utility macro for wrapping single-file tools (#62930 ) Currently, we provide single-file tools such as `ctags`, `gsutil` etc via an `sh_binary` wrapper, to have a single target to reference that automatically does platform selection of the underlying tool. Due to some [unfortunate reason](https://github.com/bazelbuild/bazel/issues/11820), the underlying srcs (which is [a single file](https://bazel.build/reference/be/shell#sh_binary.srcs)) of an `sh_binary` are also exposed as outputs (rather than just as typical runfiles) alongside the script that wraps. This is _sometimes_ problematic when doing location expansion (e.g. `$(location ...)`) due to these only allowing a single output (don't ask why this works in some contexts but not others, I don't know). To address this, we create a wrapper macro + rule to replicate what we want from `sh_binary` (automatic platform selection + tool naming), while only exposing a singular file. See example of currently required approach to consuming a tool: [BUILD.bazel](https://github.com/sourcegraph/sourcegraph/pull/62801/files#diff-e2a562c2e13908933b2ee24f0ac596829b38a5325cc69a4aee05c383aaa2e494R8) & [main_test.go](https://github.com/sourcegraph/sourcegraph/pull/62801/files#diff-7a91cb5143064bfc8993ef97baf68b718ef49747ccc1d3c5e1150d4696b88305R66). With this change, `rlocationpath` (singular) can be used instead (or any of the other singular nouns in different contexts), as well as no `strings.Split/strings.Fields` being required ## Test plan `bazel cquery --output=files //dev/tools:dropdb` yields 1 vs 2 files. Also updated the rule behind `//internal/database:generate_schemas` due to the workaround in it for the fact that the underlying srcs was also exposed. The correctness is verified by running said target (locally + CI)	2024-05-27 16:53:51 +00:00
William Bezuidenhout	57824e6374	sg: cloud ephemeral - handle multiple job reasons (#62929 ) * sg: cloud ephemeral handle multiple job reasons * update cloud printers to show overall job status * nogo	2024-05-27 18:50:40 +02:00
Jean-Hadrien Chabran	75bd631412	fix(local): panic in sg ci preview (#62928 )	2024-05-27 15:06:25 +02:00
James McNamara	69b1bfb4d0	feat(ci): docker-images runtype (#62708 ) --------- Co-authored-by: Jean-Hadrien Chabran <jh@chabran.fr>	2024-05-27 14:45:01 +02:00
William Bezuidenhout	1cc764ebb8	chore(sg): cloud ephemeral - account for conclusion field (#62925 )	2024-05-27 13:22:14 +02:00
James Cotter	cb71a2d529	sg/msp: support for super-simple alerts on custom metrics (#62885 ) --------- Co-authored-by: Joe Chen <joe@sourcegraph.com> Co-authored-by: Robert Lin <robert@bobheadxi.dev>	2024-05-24 20:47:19 +01:00
James Cotter	d4a6b27403	sg/msp: fix init prompts breaking when encountering whitespace (#62898 )	2024-05-24 15:08:35 +01:00
Joe Chen	2589fef13e	lib/background: upgrade `Routine` interface with context and errors (#62136 ) This PR is a result/followup of the improvements we've made in the [SAMS repo](https://github.com/sourcegraph/sourcegraph-accounts/pull/199) that allows call sites to pass down a context (primarily to indicate deadline, and of course, cancellation if desired) and collects the error returned from `background.Routine`s `Stop` method. Note that I did not adopt returning error from `Stop` method because I realize in monorepo, the more common (and arguably the desired) pattern is to hang on the call of `Start` method until `Stop` is called, so it is meaningless to collect errors from `Start` methods as return values anyway, and doing that would also complicate the design and semantics more than necessary. All usages of the `background.Routine` and `background.CombinedRoutines` are updated, I DID NOT try to interpret the code logic and make anything better other than fixing compile and test errors. The only file that contains the core change is the [`lib/background/background.go`](https://github.com/sourcegraph/sourcegraph/pull/62136/files#diff-65c3228388620e91f8c22d91c18faac3f985fc67d64b08612df18fa7c04fafcd).	2024-05-24 10:04:55 -04:00
William Bezuidenhout	d485d76ee9	sg: cloud use new status reason format (#62881 )	2024-05-23 21:04:10 +02:00
Jean-Hadrien Chabran	7c15db348d	chore(rel): fix tests not waiting for push prod (#62089 )	2024-05-23 16:15:33 +02:00
William Bezuidenhout	0732f33c2d	sg: cloud eph - api now requires env during list (#62875 )	2024-05-23 10:04:53 +00:00
William Bezuidenhout	4f72c222bf	sg: remove debugging printlns (#62854 ) remove debugging printlns	2024-05-23 09:46:32 +00:00
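
Note on a3fe573b59 (fix/msp: flatten custom alert promQL query for GCP): the commit message says a multi-line PromQL query is flattened into a single line and extra spaces are stripped. The Go sketch below shows one way that could be done. It is not the code from that commit; the function name `flattenPromQL` and the sample query are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// flattenPromQL is a hypothetical helper (not taken from the commit).
// It splits the query on any run of whitespace, including newlines and
// tabs, and rejoins the pieces with single spaces, which yields one line
// with no leading, trailing, or repeated spaces.
//
// Caveat: this also collapses whitespace inside quoted label values,
// so it assumes label matchers do not depend on repeated spaces.
func flattenPromQL(query string) string {
	return strings.Join(strings.Fields(query), " ")
}

func main() {
	// A multi-line query as it might be written in a service spec for
	// readability (sample query invented for this sketch).
	multiLine := `
sum(
  rate(http_requests_total{code="500"}[5m])
) > 10
`
	fmt.Println(flattenPromQL(multiLine))
}
```

`strings.Fields` already drops empty fields, so no separate trimming step is needed before joining.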


3595 Commits