Commit Graph

3624 Commits

Author SHA1 Message Date
Quinn Slack
91bc23d8e1
support fast, simple sg start single-program-experimental-blame-sqs for local dev (#63435)
This makes it easier to run Sourcegraph in local dev by compiling a few
key services (frontend, searcher, repo-updater, gitserver, and worker)
into a single Go binary and running that.

Compared to `sg start` (which compiles and runs ~10 services), it's
faster to start up (by ~10% or a few seconds), takes a lot less memory
and CPU when running, has less log noise, and rebuilds faster. It is
slower to recompile for changes just to `frontend` because it needs to
link in more code on each recompile, but it's faster for most other Go
changes that require recompilation of multiple services.

This is only intended for local dev as a convenience. There may be
different behavior in this mode that could result in problems when your
code runs in the normal deployment. Usually our e2e tests should catch
this, but to be safe, you should run in the usual mode if you are making
sensitive cross-service changes.

Partially reverts "svcmain: Simplify service setup (#61903)" (commit
9541032292).


## Test plan

Existing tests cover any regressions to existing behavior. This new
behavior is for local dev only.
2024-06-24 21:12:47 +00:00
Craig Furman
b47c376cbe
fix(appliance): source versions from release registry (#63387)
Source versions from the release registry rather than hardcoding a few.
Present the user with versions up to 2 minor revisions back from the
version of the appliance itself, which should be in lock-step with the
rest of the monorepo.


Closes
https://linear.app/sourcegraph/issue/REL-199/populate-accurate-list-of-versions-to-install
2024-06-24 09:48:50 +00:00
Robert Lin
cb3a1e4dc8
feat/sg: add 'sg enterprise' commands for Cody Analytics (#63414)
Closes CORE-194 - added a bit more than strictly needed here, but this
PR adds:

- `sg enterprise subscription list`
- `sg enterprise subscription set-instance-domain`
- `sg enterprise update-membership`
- `sg enterprise license list`

## Test plan

![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/48ec40b0-fbac-4513-9ad8-fc3174774ada)


![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/806fd054-806b-4ecb-a969-32900112f368)
2024-06-21 16:29:31 -07:00
Noah S-C
7a9d2b02e4
chore(ci): emit compact execution log in CI (#63420)
Second attempt at https://github.com/sourcegraph/sourcegraph/pull/61760.
We can start using these logs to dig into action cache misses, etc.

## Test plan

CI passes green


## Changelog
2024-06-21 19:50:35 +01:00
Craig Furman
4641bc5023
chore(sg): extract releaseregistry client package (#63382)
In preparation for reuse elsewhere.
2024-06-21 10:34:52 +01:00
Robert Lin
78dcd57221
fix/sg: fix mangled log output from sg start and sg run (#63405)
Right now `sg run` / `sg start` can horribly mangle multi-line output. A
nicely annotated report from @unknwon:


![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/38acbaf9-89dc-4d4b-9fd7-b601f5654240)

Replacing the "buffered process logger" thing with
https://github.com/bobheadxi/streamline which powers `sourcegraph/run`
etc (fairly reliably if I do say so myself) fixes this for a few cases
where I can reliably repro wonky misordered output 😁

## Test plan

`sg start dotcom` with `sg.config.overwrite.yaml`:

```yaml
commands:
  enterprise-portal:
    env:
      SRC_LOG_LEVEL: debug
      PG_QUERY_LOGGING: true
```

Log scope `pgx.devtracer` is consistently formatted, even with a high
volume of logs.


![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/5c46f94f-e388-477a-94d3-151d5a3c7468)

I also don't see anything suspicious happening after running it for a while.
2024-06-20 16:07:27 -07:00
Will Dollman
e24226a764
Publish images from patch release branches (#63379)
We currently don't publish images from the new-style patch release
branches like `5.4.5099`, as this is all performed using the new release
tooling.

In order to improve the release process, we (Security) would like to run
a daily scan of the current set of images built from the patch release
branch. Currently we only scan images built from `main`, but these
slowly diverge from the patch release branch in the 2-week window
between a monthly release and the patch release.

To give a specific example, we currently have no easy/automated way to
scan images from the `5.4.5099` branch that a release will be cut from
this afternoon until the release team run the internal release process.

This PR updates the pipeline so that whenever a new commit is pushed to
the patch release branch, it publishes a new set of images that
include the tag `<branch>-insiders`. Currently this only pushes to
us.gcr.io, but it could equally push to Docker Hub.
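
A sketch of the branch check and the extra tag follows. It assumes new-style patch release branches look like `5.4.5099` (`MAJOR.MINOR.BUILD`). The regular expression and the `insidersTag` helper are hypothetical and are not the pipeline generator's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// patchReleaseBranch matches branches such as "5.4.5099". The exact pattern
// used by the pipeline is an assumption here.
var patchReleaseBranch = regexp.MustCompile(`^\d+\.\d+\.\d+$`)

// insidersTag returns the extra image tag for pushes to a patch release
// branch, and false for any other branch.
func insidersTag(branch string) (string, bool) {
	if !patchReleaseBranch.MatchString(branch) {
		return "", false
	}
	return branch + "-insiders", true
}

func main() {
	for _, b := range []string{"5.4.5099", "main"} {
		tag, ok := insidersTag(b)
		fmt.Println(b, tag, ok)
	}
}
```

In the example build linked below, the pattern was temporarily changed to match `will/5.4.9999`, which is why that tag appears in the jobfile.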

Example of the jobfile for a matching branch after this PR:

```sh
bazel --bazelrc=/tmp/aspect-generated.bazelrc \
  --bazelrc=.aspect/bazelrc/ci.sourcegraph.bazelrc \
  run //cmd/batcheshelper:candidate_push \
  --stamp --workspace_status_command=./dev/bazel_stamp_vars.sh \
  -- --tag dc438648b0 --tag dc438648b0cc_2024-06-20 --tag dc438648b0cc_279230 \
  --tag will/5.4.9999-insiders \
  --repository us.gcr.io/sourcegraph-dev/batcheshelper \
  && echo -e '<tr><td>batcheshelper</td><td><code>us.gcr.io/sourcegraph-dev</code></td><td><code>dc438648b0cc</code>, <code>dc438648b0cc_2024-06-20</code>, <code>dc438648b0cc_279230</code>, <code>will/5.4.9999-insiders</code></td></tr>' \
  >>./annotations/pushed_images.md
```

[Example buildkite
run](https://buildkite.com/sourcegraph/sourcegraph/builds/279230#_)
where the pattern was updated to match this branch, and pushing
non-candidate images was disabled.

This resolves one part of
[SEC-1734](https://linear.app/sourcegraph/issue/SEC-1734/scan-images-from-patch-release-branches)



## Test plan

- Manual testing of buildkite pipeline



## Changelog

2024-06-20 15:46:37 +01:00
Robert Lin
2958abc326
fix/msp/postgresqlroles: wait for databases to be provisioned (#63362)
Wait for databases to be provisioned before granting database-specific
roles to the operator access user.

## Test plan

Re-applying fixed the failure in
https://sourcegraph.slack.com/archives/C05E2LHPQLX/p1718850688397579,
which indicated a race condition on database creation. The diff looks good:

```diff
@@ -1447,10 +1472,15 @@
             "path": "cloudrun/cloudrun-postgresqlroles-msp_iam-operator_access_service_account_table_grant",
             "uniqueId": "cloudrun-postgresqlroles-msp_iam-operator_access_service_account_table_grant"
           }
         },
         "database": "msp_iam",
+        "depends_on": [
+          "google_sql_database.postgresql-database-enterprise-portal",
+          "google_sql_database.postgresql-database-enterprise_portal",
+          "google_sql_database.postgresql-database-msp_iam"
+        ],
         "object_type": "table",
         "objects": [
         ],
         "privileges": [
           "SELECT"
```

## Changelog

- MSP Cloud SQL: Fix race condition between database creation and role
grants for the read-only operator access user
2024-06-20 07:43:14 -07:00
Keegan Carruthers-Smith
d42a99b5a3
nix: use go1.22.4 (#63372)
Tired of seeing the go toolchain being easier to use than nix.

Test Plan: nix develop on linux amd64 and macbook arm64 followed by
running "go test ./internal/search" working. Also confirming that "go
env GOROOT" points into the nix store.
2024-06-20 11:12:17 +02:00
Joe Chen
b717fd518a
enterprise-portal: implement basic MSP IAM and RPCs (#63173)
Closes CORE-99, closes CORE-176

This PR is based on (and also served as a PoC for) [RFC 962: MSP IAM
framework](https://docs.google.com/document/d/1ItJlQnpR5AHbrfAholZqjH8-8dPF1iQcKh99gE6SSjs/edit).
It comes with two main parts:

1. The initial version of the MSP IAM SDK:
   `lib/managedservicesplatform/iam`
   - Embeds the [OpenFGA server
     implementation](https://github.com/openfga/openfga/tree/main/pkg/server)
     and exposes a `ClientV1` for interacting with it.
   - Automagically manages both MSP IAM's and OpenFGA's database
     migrations upon initializing the `ClientV1`.
     ![CleanShot 2024-06-18 at 15 09 24@2x](https://github.com/sourcegraph/sourcegraph/assets/2946214/387e0e28-a6c2-4664-b946-0ea4a1dd0804)
   - Ensures the specified OpenFGA store and authorization model DSL
     exist.
   - Utility types and helpers to avoid easy mistakes (i.e. making the
     relation tuples a bit more strongly typed).
   - Decided to put all types and pre-defined values together to simulate a
     "central registry" and act as a forcing function for services to form
     some sort of convention. Then, when we migrate the OpenFGA server to a
     separate standalone service, there will be less of a headache
     consolidating types/relations that have similar meanings but different
     string literals.
2. The first use case of the MSP IAM:
   `cmd/enterprise-portal/internal/subscriptionsservice`
   - Added/updated RPCs:
     - Listing enterprise subscriptions via permissions
     - Updating enterprise subscriptions to assign instance domains
     - Updating enterprise subscriptions membership to assign roles (and
       permissions)
   - A database table for enterprise subscriptions, only storing the extra
     instance domains, as Enterprise Portal is not the
     writeable-source-of-truth.
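
The "strongly typed relation tuples" and "central registry" ideas might look roughly like the following. This is a sketch that assumes OpenFGA's `type:id` object notation. Every Go name here (`TupleType`, `TupleRelation`, `ToObject`, the registry constants) is hypothetical and is not the actual `lib/managedservicesplatform/iam` API.

```go
package main

import "fmt"

// Distinct string types: passing a TupleRelation where a TupleType is
// expected does not compile.
type (
	TupleType     string
	TupleRelation string
	TupleObject   string
)

// A hypothetical "central registry" of pre-defined values.
const (
	TypeUser         TupleType     = "user"
	TypeSubscription TupleType     = "subscription"
	RelationView     TupleRelation = "view"
)

// ToObject renders an OpenFGA object identifier of the form "type:id".
func ToObject(t TupleType, id string) TupleObject {
	return TupleObject(fmt.Sprintf("%s:%s", t, id))
}

// Tuple states that User has Relation on Object.
type Tuple struct {
	Object   TupleObject
	Relation TupleRelation
	User     TupleObject
}

// String formats the tuple as object#relation@user for logging.
func (t Tuple) String() string {
	return fmt.Sprintf("%s#%s@%s", t.Object, t.Relation, t.User)
}

func main() {
	t := Tuple{
		Object:   ToObject(TypeSubscription, "es_123"),
		Relation: RelationView,
		User:     ToObject(TypeUser, "alice"),
	}
	fmt.Println(t)
}
```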

## Other minor changes

- Moved `internal/redislock` to `lib/redislock` to be used in MSP IAM
SDK.
- Call `createdb ...` as part of the `enterprise-portal` install script in
`sg.config.yaml` (the `msp_iam` database is a hard requirement of the MSP IAM
framework).

## Test plan

Tested with gRPC UI:

- `UpdateEnterpriseSubscription` to assign an instance domain
- `UpdateEnterpriseSubscriptionMembership` to assign roles
- `ListEnterpriseSubscriptions`:
	- List by subscription ID
	- List by instance domain
	- List by view cody analytics permissions

---------

Co-authored-by: Robert Lin <robert@bobheadxi.dev>
2024-06-19 21:46:48 -04:00
Noah S-C
d237975918
chore(ci): instrument push_all.sh commands in honeycomb (#63350)
So I can measure the impact of changes on the individual `bazel run`
invocations

## Test plan

main dry-run and seeing the output
https://ui.honeycomb.io/sourcegraph/datasets/buildkite-pushall/result/bCLzgquaSdV?hideCompare

## Changelog
2024-06-19 18:16:21 +01:00
Camden Cheek
db7a268c34
Chore: remove search console (#63322)
The search console page is broken, is not used or maintained, and is
only referenced by a series of blog posts years ago. We have product
support to remove it.
2024-06-19 11:05:03 -06:00
Jean-Hadrien Chabran
b3b7936ffa
chore(local): simplify 'sg db' inline help (#63344)
Follow-up to https://github.com/sourcegraph/sourcegraph/pull/63320 as I
noticed that the `UsageText` didn't include `sg db default-site-admin`.
Additionally, it was quite verbose without providing much info, so I
just dropped it in favour of highlighting notable commands.
2024-06-19 14:56:13 +00:00
Noah S-C
a5a6a0dd23
feat(sg): command to add default site-admin with predefined access token (#63320)
Adds a subcommand to `sg db` called `default-site-admin` that creates a
site-admin user with user:pass `sourcegraph:sourcegraph` and a
predefined hard-coded token
`sgp_local_f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0`


## Test plan

- `go run ./dev/sg -- db default-site-admin` with a clean database
- The same command after having run it once (when everything should already be set)
- The same command when the user exists but the token doesn't

## Changelog
2024-06-19 15:02:55 +01:00
Varun Gandhi
3437f8253d
chore: Centralize languages package as source-of-truth (#63292)
This patch does a few things:

- Adds `go-enry` packages to depguard, so that people do not
  accidentally use enry APIs instead of the corresponding APIs
  in the `languages` package.
- Adds more tests for different functions in the languages package
  to ensure mutual consistency in how language<->extension mappings
  are handled.
- Adds tests for enry upgrades
- Adds comments with IDs so that related parts in the code can be
  pieced together easily
2024-06-18 13:10:24 +00:00
Noah S-C
8412e6b45d
chore(ci): remove buildchecker sunday summary posts (#63289)
https://linear.app/sourcegraph/issue/DINF-36/kill-automated-ci-report

## Test plan

Still compiles, meaning that at best everything unused is gone, and at worst
some unused code is left over but nothing necessary has been removed.


## Changelog
2024-06-17 13:05:39 +00:00
Vincent
add4baa455
chore(security): update dependencies (#63197)
This PR upgrades a bunch of Golang dependencies that have known security
issues.

## Test plan
CI tests, ran `sg start`.
2024-06-11 16:14:24 +01:00
William Bezuidenhout
9b37349981
sg: cloud eph - set max deployment name (#63202)
Encountered this error while doing my demo
```
{"SeverityText":"ERROR","Timestamp":1718110348252114099,"InstrumentationScope":"mi2.instance.create","Caller":"mi2/instance.go:478","Function":"main.glob..func26","Body":"new instance validation failed: slug (displayName) must be between 4 to 30 characters. Allowed characters are: lowercase letters, numbers, hyphen. Current: christoph-resolve-syntactic-symbol-at-request-range","Resource":{"service.name":"mi2","service.version":"2024-06-11-09-50-
```
So now we limit it to 30 chars and print a notice to inform the user
that it has been truncated
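
The truncation step might look like the sketch below. `truncateDeploymentName` is a hypothetical helper. The sketch assumes ASCII names, so byte and character counts match, and it does not cover the other validation rules from the error (minimum length, allowed characters).

```go
package main

import "fmt"

// maxDeploymentNameLen mirrors the 30-character limit from the error above.
const maxDeploymentNameLen = 30

// truncateDeploymentName shortens name to at most maxLen bytes and reports
// whether it was truncated, so the caller can print a notice.
func truncateDeploymentName(name string, maxLen int) (string, bool) {
	if len(name) <= maxLen {
		return name, false
	}
	return name[:maxLen], true
}

func main() {
	name, truncated := truncateDeploymentName("christoph-resolve-syntactic-symbol-at-request-range", maxDeploymentNameLen)
	if truncated {
		fmt.Printf("👉 Your deployment name has been truncated to be %q\n", name)
	}
}
```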

## Test plan
Tested locally
```
go run ./dev/sg cloud eph deploy --name 'christoph-resolve-syntactic-symbol-at-request-range_277899_2024-06-11_5.4-f04d3b973a19' --version 'christoph-resolve-syntactic-symbol-at-request-range_277899_2024-06-11_5.4-f04d3b973a19'
 Version "christoph-resolve-syntactic-symbol-at-request-range_277899_2024-06-11_5.4-f04d3b973a19" found in Cloud ephemeral registry
👉 Your deployment name has been truncated to be "christoph-resolve-syntactic-sy"
```

## Changelog
- sg - set a max length for cloud ephemeral deployment names
2024-06-11 16:12:17 +02:00
William Bezuidenhout
4f910fb360
sg: cloud eph - improve missing tag/version message (#63195)
The previous message didn't give you steps to get the version added. The new
message directs you to the #discuss-dev-infra Slack channel, which will
ultimately result in a run of
https://buildkite.com/sourcegraph/cloud-ephemeral-images


## Test plan
CI and tested locally 
```
go run ./dev/sg cloud eph deploy --version 1.1.1
⚠️ Whoops! Version "1.1.1" seems to be missing from the Cloud ephemeral registry. Please ask in #discuss-dev-infra to get the it added to the registry
 tag/version not in Cloud Ephemeral registry
exit status 1
```


## Changelog
* sg - improve messaging when an image is missing from Cloud ephemeral
registry
2024-06-11 10:53:52 +02:00
James Cotter
1712928bc5
msp/deploy: encode commit_message as base64 (#63165)
Encodes the commit_message as base64 to avoid issues with special
characters breaking the deploy command
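
Base64 round-tripping with Go's standard library is sketched below. The helper names are hypothetical. The message does not say which base64 variant msp/deploy uses, so standard encoding is an assumption here.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeCommitMessage turns an arbitrary commit message into a value made only
// of base64 characters, so quotes, newlines, and shell metacharacters cannot
// break the command it is passed to.
func encodeCommitMessage(msg string) string {
	return base64.StdEncoding.EncodeToString([]byte(msg))
}

// decodeCommitMessage reverses encodeCommitMessage on the consuming side.
func decodeCommitMessage(encoded string) (string, error) {
	b, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	msg := "fix(msp): handle \"quotes\", $VARS and\nmulti-line messages"
	encoded := encodeCommitMessage(msg)
	decoded, err := decodeCommitMessage(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded)
	fmt.Println(decoded == msg)
}
```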

Part of CORE-172

## Test Plan
CI

[_Created by Sourcegraph batch change
`jac/msp-rollout-base64`._](https://sourcegraph.sourcegraph.com/users/jac/batch-changes/msp-rollout-base64)
2024-06-07 23:31:42 +01:00
Robert Lin
7e9d8ec8dc
feat/cody-gateway: use Enterprise Portal for actor/productsubscriptions (#62934)
Migrates Cody Gateway to use the new Enterprise Portal's "read-only"
APIs. For the most part, this is an in-place replacement - a lot of the
diff is in testing and minor changes. Some changes, such as the removal
of model allowlists, were made down the PR stack in
https://github.com/sourcegraph/sourcegraph/pull/62911.

At a high level, we replace the data requested by
`cmd/cody-gateway/internal/dotcom/operations.graphql` with
Enterprise Portal RPCs:

- `codyaccessv1.GetCodyGatewayAccess`
- `codyaccessv1.ListCodyGatewayAccesses`

Use cases that previously required retrieving the active license tags
now:

1. Use the display name provided by the Cody Access API
https://github.com/sourcegraph/sourcegraph/pull/62968
2. Depend on the connected Enterprise Portal dev instance to only return
dev subscriptions https://github.com/sourcegraph/sourcegraph/pull/62966

Closes https://linear.app/sourcegraph/issue/CORE-98
Related to https://linear.app/sourcegraph/issue/CORE-135
(https://github.com/sourcegraph/sourcegraph/pull/62909,
https://github.com/sourcegraph/sourcegraph/pull/62911)
Related to https://linear.app/sourcegraph/issue/CORE-97

## Local development

This change also adds Enterprise Portal to `sg start dotcom`. For local
development, we set up Cody Gateway to connect to Enterprise Portal such
that zero configuration is needed - all the required secrets are sourced
from the `sourcegraph-local-dev` GCP project automatically when you run
`sg start dotcom`, and local Cody Gateway will talk to local Enterprise
Portal to do the Enterprise subscriptions sync.

This is actually an upgrade from the current experience where you need
to provide Cody Gateway a Sourcegraph user access token to test
Enterprise locally, though the Sourcegraph user access token is still
required for the PLG actor source.

The credential is configured in
https://console.cloud.google.com/security/secret-manager/secret/SG_LOCAL_DEV_SAMS_CLIENT_SECRET/overview?project=sourcegraph-local-dev,
and I've included documentation in the secret annotation about what it
is for and what to do with it:


![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/c61ad4e0-3b75-408d-a930-076a414336fb)

## Rollout plan

I will open PRs to set up the necessary configuration for Cody Gateway
dev and prod. Once reviews taper down I'll cut an image from this branch
and deploy it to Cody Gateway dev, and monitor it closely + do some
manual testing. Once verified, I'll land this change and monitor a
rollout to production.

Cody Gateway dev SAMS client:
https://github.com/sourcegraph/infrastructure/pull/6108
Cody Gateway prod SAMS client update (this one already exists):

```
accounts=> UPDATE idp_clients
SET scopes = scopes || '["enterprise_portal::subscription::read", "enterprise_portal::codyaccess::read"]'::jsonb
WHERE id = 'sams_cid_018ea062-479e-7342-9473-66645e616cbf';
UPDATE 1
accounts=> select name, scopes from idp_clients WHERE name = 'Cody Gateway (prod)';
        name         |                                                              scopes                                                              
---------------------+----------------------------------------------------------------------------------------------------------------------------------
 Cody Gateway (prod) | ["openid", "profile", "email", "offline_access", "enterprise_portal::subscription::read", "enterprise_portal::codyaccess::read"]
(1 row)
```

Configuring the target Enterprise Portal instances:
https://github.com/sourcegraph/infrastructure/pull/6127

## Test plan

Start the new `dotcom` runset, now including Enterprise Portal, and
observe logs from both `enterprise-portal` and `cody-gateway`:

```
sg start dotcom
```

I reused the test plan from
https://github.com/sourcegraph/sourcegraph/pull/62911: set up Cody
Gateway external dependency secrets, then set up an enterprise
subscription + license with a high seat count (for a high quota), and
force a Cody Gateway sync:

```
curl -v -H 'Authorization: bearer sekret' http://localhost:9992/-/actor/sync-all-sources
```

This should indicate the new sync against "local dotcom" fetches the
correct number of actors and whatnot.

Using the local enterprise subscription's access token, we run the QA
test suite:

```sh
$ bazel test --runs_per_test=2 --test_output=all //cmd/cody-gateway/qa:qa_test --test_env=E2E_GATEWAY_ENDPOINT=http://localhost:9992 --test_env=E2E_GATEWAY_TOKEN=$TOKEN
INFO: Analyzed target //cmd/cody-gateway/qa:qa_test (0 packages loaded, 0 targets configured).
INFO: From Testing //cmd/cody-gateway/qa:qa_test (run 1 of 2):
==================== Test output for //cmd/cody-gateway/qa:qa_test (run 1 of 2):
PASS
================================================================================
INFO: From Testing //cmd/cody-gateway/qa:qa_test (run 2 of 2):
==================== Test output for //cmd/cody-gateway/qa:qa_test (run 2 of 2):
PASS
================================================================================
INFO: Found 1 test target...
Target //cmd/cody-gateway/qa:qa_test up-to-date:
  bazel-bin/cmd/cody-gateway/qa/qa_test_/qa_test
Aspect @@rules_rust//rust/private:clippy.bzl%rust_clippy_aspect of //cmd/cody-gateway/qa:qa_test up-to-date (nothing to build)
Aspect @@rules_rust//rust/private:rustfmt.bzl%rustfmt_aspect of //cmd/cody-gateway/qa:qa_test up-to-date (nothing to build)
INFO: Elapsed time: 13.653s, Critical Path: 13.38s
INFO: 7 processes: 1 internal, 6 darwin-sandbox.
INFO: Build completed successfully, 7 total actions
//cmd/cody-gateway/qa:qa_test                                            PASSED in 11.7s
  Stats over 2 runs: max = 11.7s, min = 11.7s, avg = 11.7s, dev = 0.0s

Executed 1 out of 1 test: 1 test passes.
```
2024-06-07 11:46:01 -07:00
William Bezuidenhout
8bb0ab54eb
release: never use build number in image family (#63157)
The executor image and Docker mirror image should now follow this
naming convention:

Image family:
`sourcegraph-executors-[nightly|internal|'']-<MAJOR>-<MINOR>`
Image name:
`sourcegraph-executor-[nightly|internal|'']-<MAJOR>-<MINOR>-<BUILD_NUMBER>`

example:
Image family: `sourcegraph-executors-5-4`
Image name: `sourcegraph-executor-5-4-277666`

## What happens during releases and _not_ releases?
#### Nightly
**`nightly` suffix**
Image family: `sourcegraph-executors-nightly-<MAJOR>-<MINOR>`
Image name:
`sourcegraph-executor-nightly-<MAJOR>-<MINOR>-<BUILD_NUMBER>`
#### Internal
**`internal` suffix**
Image family: `sourcegraph-executors-internal-<MAJOR>-<MINOR>`
Image name:
`sourcegraph-executor-internal-<MAJOR>-<MINOR>-<BUILD_NUMBER>`
#### Public / Promote to public

**No suffix**

Image family: `sourcegraph-executors-<MAJOR>-<MINOR>`
Image name: `sourcegraph-executor-<MAJOR>-<MINOR>-<BUILD_NUMBER>`
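
The convention above can be expressed as two small functions. This is a sketch only: `imageFamily`, `imageName` and `phasePrefix` are hypothetical helpers, not the release tooling's code. Note that the family uses `executors` and the image name uses `executor`, as in the examples.

```go
package main

import "fmt"

// phasePrefix returns "nightly-", "internal-", or "" for public releases.
func phasePrefix(phase string) string {
	if phase == "" {
		return ""
	}
	return phase + "-"
}

// imageFamily groups images by release phase and MAJOR-MINOR version.
func imageFamily(phase string, major, minor int) string {
	return fmt.Sprintf("sourcegraph-executors-%s%d-%d", phasePrefix(phase), major, minor)
}

// imageName identifies a single image within its family by build number.
func imageName(phase string, major, minor, build int) string {
	return fmt.Sprintf("sourcegraph-executor-%s%d-%d-%d", phasePrefix(phase), major, minor, build)
}

func main() {
	for _, phase := range []string{"nightly", "internal", ""} {
		fmt.Println(imageFamily(phase, 5, 4), imageName(phase, 5, 4, 277666))
	}
}
```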

> [!IMPORTANT]
> Should we keep the image name stable at
> `sourcegraph-executor-<MAJOR>-<MINOR>-<BUILD_NUMBER>`
> and only change the family name?
>
> **Why?**
>
> The image family dictates the collection of images, and it changes
> with each major/minor version and/or release phase, so there is really
> no use in changing the image name too. The only benefit might be that
> you can see at a glance from the name which image family it belongs to.

## Test plan



## Changelog

2024-06-07 17:23:24 +02:00
Erik Seliger
1287243cae
gitserver: Framework to support integration testing against gitserver (#62801)
This PR tinkers a bit with building a test helper to run integration
tests that are still ~lightweight against a real gitserver.
The caller can either clone a real repo to disk / embed it in the git
repo, or can create a small repo on the fly, and then get a running
gitserver gRPC server that returns all the data required.

These tests should only exist outside of cmd/ and internal/, as there is
a big potential to do cross-cmd imports from here, which can cause bad
coupling. But for just these tests, that should be fine.

The most trivial rockskip indexing job that I put in here to POC this
runs in 6.3s, including all setup and teardown. That seems very
reasonable to me.

Test plan:

The POC test passes.
2024-06-07 17:01:12 +02:00
William Bezuidenhout
f7271701d5
fix(sg): fix cloud eph suggested commands (#63093)
- Suggested commands didn't include the `ephemeral` subcommand
- Show the expiry time as the duration until expiry

## Test plan
Tested locally + CI

## Changelog
- fix suggested cloud ephemeral commands
- show duration till expiry for cloud ephemeral
2024-06-07 12:50:43 +02:00
Jan Hartman
aa615bc37f
feat(sg): add command to generate a dotcom user gateway access token (#63125)
We can now generate gateway access tokens from sg instead of having to
manually wrangle a script to do it every time. This will help with
making Cody Gateway easier to run locally.

## Test plan
Tested locally.

---------

Co-authored-by: William Bezuidenhout <william.bezuidenhout@sourcegraph.com>
2024-06-07 10:40:51 +00:00
Greg Magolan
27da7890fc
chore(bazel): fixup custom eslint test rule after bump to rules_js 2 (#63143)
This regressed in https://github.com/sourcegraph/sourcegraph/pull/63022
where the custom `gather_files_from_js_providers` function that was
copied over from rules_js 1.x was including runfiles.

Resolves eslint failures seen in
https://buildkite.com/sourcegraph/sourcegraph/builds/277072#018fe743-abac-44d8-911b-d5a7ed425413
and observed locally:

```
(07:19:15) INFO: From ESLint client/wildcard/wildcard_lib_eslint-output.txt:
  |  
  | Oops! Something went wrong! :(
  |  
  | ESLint: 8.57.0
  |  
  | ESLint couldn't find a configuration file. To set up a configuration file for this project, please run:
  |  
  | npm init @eslint/config
  |  
  | ESLint looked for configuration files in /tmp/bazel-working-directory/__main__/bazel-out/k8-fastbuild/bin/client/wildcard/src/components/Alert and its ancestors. If it found none, it then looked in your home directory.
  |  
  | If you think you already have a configuration file or if you need more help, please stop by the ESLint Discord server: https://eslint.org/chat

```

## Test plan

CI (check test logs)

## Changelog
2024-06-06 23:32:34 +01:00
Robert Lin
1aeb9c93f1
chore/msp: document gRPC notes in spec docstrings (#63140)
Lessons learned from
https://sourcegraph.slack.com/archives/C05E2LHPQLX/p1717703306405529

## Test plan

n/a
2024-06-06 14:20:50 -07:00
James McNamara
4077b3ec22
feat(ci): Adds playwright tests for sveltekit to bazel (#62560)
This runs Playwright tests with Bazel. It changes how the
app is served in the tests: Playwright now intercepts all
network calls to the local server and serves the static assets directly,
or serves the root index.html file if nothing matches.

---------

Co-authored-by: bahrmichael <michael.bahr@sourcegraph.com>
Co-authored-by: Jean-Hadrien Chabran <jh@chabran.fr>
Co-authored-by: Michael Bahr <1830132+bahrmichael@users.noreply.github.com>
Co-authored-by: Jean-Hadrien Chabran <jean-hadrien.chabran@sourcegraph.com>
Co-authored-by: Camden Cheek <camden@ccheek.com>
2024-06-06 12:45:05 -06:00
William Bezuidenhout
c458e14b9f
sg: deny cloud ephemeral builds from main (#63127)
Triggering Cloud Ephemeral builds on main is problematic, since a number of
specific conditions get triggered when the branch is
`main`.


## Test plan
Tested locally
`sg cloud eph build`
```
./sg cloud eph build
⚠️ Triggering Cloud Ephemeral builds from "main" is not supported.

  Alternatively, if you still want to deploy "main" you can do:

  1. create a new branch off main by running  git switch <branch-name>
  2. push the branch to the remote by running  git push -u origin <branch-name>
  3. trigger the build by running  sg cloud ephemeral build


 failed to trigger epehemeral build for branch: cannot trigger a Cloud Ephemeral build for main branch
```
`./sg cloud eph deploy`
```
⚠️ Triggering Cloud Ephemeral builds from "main" is not supported.

  Alternatively, if you still want to deploy "main" you can do:

  1. create a new branch off main by running  git switch <branch-name>
  2. push the branch to the remote by running  git push -u origin <branch-name>
  3. trigger the build by running  sg cloud ephemeral build


 cloud ephemeral deployment failure: cannot trigger a Cloud Ephemeral build for main branch

```

## Changelog
- deny cloud ephemeral deployments triggered from 'main'
2024-06-06 15:27:16 +00:00
Noah S-C
bb178ba729
chore(tooling): bump Go version to 1.22.4 (#63124)
Bump for @evict 

## Test plan

CI passes with no complaints

## Changelog

- Bumped version of Go used to build to 1.22.4
2024-06-06 15:19:03 +00:00
Varun Gandhi
2955bb6cfb
chore: Change errors.HasType to respect multi-errors (#63024)
With this patch, the `errors.HasType` API behaves similarly to `Is` and `As`:
it checks the full error tree instead of just a linearized version
of it, since cockroachdb/errors's `HasType` implementation does not respect
multi-errors.

As a consequence, a number of relationships between HasType and Is/As that
you'd intuitively expect to hold are now true; see the changes to `invariants_test.go`.
2024-06-06 13:02:14 +00:00
Robert Lin
6302955caf
feat/sg-msp-pg: add suggestion to check msp-ops page on perms error (#63118)
I think finding the right permissions confuses people pretty often when
first interacting with MSP. This adds a helper for annotating errors
returned from points where we might be able to help out @DaedalusG,
specifically for the situation in
https://sourcegraph.slack.com/archives/C05GJPTSZCZ/p1717629546727829 😉

## Test plan

It's a little wordy but:

```
sg msp pg connect sams prod
 possible permissions error, ensure you have the prerequisite Entitle grants mentioned in https://sourcegraph.notion.site/3e59b9ac3d414a5f8fb5911eed1e418a: find IAM output: gcloud: failed to access secret "iam_operator_access_service_account" from "sams-prod-ywuz": rpc error: code = PermissionDenied desc = Permission 'secretmanager.versions.access' denied for resource 'projects/sams-prod-ywuz/secrets/iam_operator_access_service_account/versions/latest' (or it may not exist).
```

## Changelog

- `sg msp pg connect` will tell you about your service's generated
Notion page if you run into a permissions-looking error during command
setup, where there is guidance about the required Entitle requests.
2024-06-05 18:55:59 -07:00
James Cotter
bcc4367f86
msp/deploy: add 'author' and 'commit_message' annotations (#63108)
Add 'author' and 'commit_message' annotations on release

## Test plan
CI
2024-06-05 11:43:02 -07:00
Robert Lin
27211dea73
feat/msp: update handbook link in alerts dashboard, sort custom alerts first (#63089)
1. The dashboard link still points to the old `go/msp-ops/...`, which no
longer works (CORE-105)
2. Alerts defined on top of the MSP defaults are probably of more
interest, so let's sort these in front of the others
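
Sorting custom alerts ahead of the defaults can be sketched with a stable sort, which keeps the existing order inside each group. The `alert` type and its `Custom` field are hypothetical stand-ins for the real alert-policy data.

```go
package main

import "sort"

// alert is a hypothetical, simplified alert-policy record.
type alert struct {
	Name   string
	Custom bool // true if defined on top of the MSP defaults
}

// sortCustomFirst orders custom alerts before default ones while keeping
// the relative order within each group (stable sort).
func sortCustomFirst(alerts []alert) {
	sort.SliceStable(alerts, func(i, j int) bool {
		// i sorts before j only when i is custom and j is not.
		return alerts[i].Custom && !alerts[j].Custom
	})
}
```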

## Test plan

Unit/golden tests
2024-06-05 09:09:22 -07:00
Noah S-C
4a93f29755
chore(bazel): enable rules_esbuild sandbox with object-inspect workaround (#61969)
Sandbox escapes, be gone

## Test plan

Tested in CI and locally with `bazel build //client/...` as well as a
lot of blood, sweat n tears tearing through failed sandboxes

## Changelog
2024-06-05 15:34:29 +01:00
William Bezuidenhout
605b2305eb
chore(sg): move registry list cmd to release list (#63094)
Follow up from https://github.com/sourcegraph/sourcegraph/pull/63079

## Test plan
Tested locally

## Changelog
2024-06-05 10:25:38 +02:00
William Bezuidenhout
e4eec6668a
feat(sg): respect the context when executing interrupt hooks (#63069)
During testing I found that some hooks would occasionally hang and never
complete. In this PR we execute all hooks within a timeout context,
ensuring we give hooks _some_ time to execute while still making sure we
eventually exit if a hook is misbehaving.

Additional changes:
- Global timeout for all hook execution is 2 seconds
- We hard exit after 5 interrupts instead of 2
- Hooks are split into two groups: sequential and concurrent. As their
names suggest, hooks are executed differently depending on how they were
registered.


## Test plan
Tested locally

```
^C⚠️ Interrupt received, executing hook groups for graceful shutdown...
⚠️ Executing 16 'cleanup' hooks for graceful shutdown...
[   repo-updater] INFO repo-updater.repo-updater.grpcserver grpcserver/grpcserver.go:76 Shutting down gRPC server
[   repo-updater] INFO sync_worker workerutil/worker.go:252 Shutting down dequeue loop {"name": "repo_sync_worker", "reason": ""}
worker stopped due to context error: context canceled
gitserver-1 stopped due to context error: context canceled
searcher stopped due to context error: context canceled
gitserver-0 stopped due to context error: context canceled
blobstore stopped due to context error: context canceled
symbols stopped due to context error: context canceled
caddy stopped due to context error: context canceled
repo-updater stopped due to context error: context canceled
embeddings stopped due to context error: context canceled
frontend stopped due to context error: context canceled
zoekt-index-0 stopped due to context error: context canceled
syntax-highlighter stopped due to context error: context canceled
zoekt-web-1 stopped due to context error: context canceled
web stopped due to context error: context canceled
zoekt-web-0 stopped due to context error: context canceled
⚠️ Executing 6 'general' hooks for for graceful shutdown...
 failed to run zoekt-index-1.
stderr:
INFO server zoekt-sourcegraph-indexserver/main.go:1017 removing tmp dir {"tmpRoot": "/Users/william/.sourcegraph/zoekt/index-1/.indexserver.tmp"}
2024/06/04 09:15:03 updating index 6 github.com/sourcegraph/sourcegraph@HEAD=e55003da894490122546f876452f651aae65bb55 reason=content-mismatch
INFO server zoekt-sourcegraph-indexserver/main.go:432 updated index {"repo": "github.com/sourcegraph/sourcegraph", "id": 6, "branches": ["HEAD=e55003da894490122546f876452f651aae65bb55"], "duration": "19.21403925s"}
```


## Changelog
- Hard exit sg when 5 interrupts are received
- Respect the context while executing interrupt hooks to ensure we still
exit if some hook is misbehaving
2024-06-05 10:06:58 +02:00
Robert Lin
a3fe573b59
fix/msp: flatten custom alert promQL query for GCP (#63084)
The GCP monitoring alert configuration expects, for some reason, a
single-line PromQL query only, otherwise the threshold doesn't work. In
configuration, however, we may want to write a multi-line query, for
ease of readability. This change automatically flattens the PromQL query
into a single line and strips extra spaces.
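
A minimal sketch of the flattening step follows. It assumes GCP only needs each whitespace run collapsed into a single space. `flattenPromQL` is a hypothetical name and not necessarily the function added in this change.

```go
package main

import "strings"

// flattenPromQL collapses a multi-line PromQL query into a single line,
// replacing every run of whitespace (spaces, tabs, newlines) with one space
// and trimming both ends.
//
// Caveat: this naive version would also collapse repeated spaces inside
// quoted label values; a stricter version would need to skip string literals.
func flattenPromQL(query string) string {
	return strings.Join(strings.Fields(query), " ")
}
```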

Part of CORE-161

## Test plan

Unit tests
2024-06-04 14:37:51 -07:00
William Bezuidenhout
8f3a9d5260
sg: add command to fetch versions from release registry (#63079)
Added a command to list versions from the release registry.
2024-06-04 17:42:47 +02:00
William Bezuidenhout
9bbfd25fc4
feat(sg): add list-build subcommand to ci (#63071)
* sg: add `list-build` subcommand to ci

Add command to list builds in various states on a pipeline

* bazel

remove trailing '...' from commit printing
2024-06-04 13:41:44 +02:00
Greg Magolan
2d3d918ffa
chore(bazel): upgrade to rules_js 2.0 RC (#63022)
Bumps rules_js (and friends) to the 2.0 RCs.

This brings in performance improvements for the analysis phase, since npm package depsets are now much smaller. It also adds support for pnpm v9 and allows for linking js_library targets as 1p deps instead of npm_package targets. See https://github.com/aspect-build/rules_js/issues/1671 for more details.

## Test plan

CI

## Changelog
2024-06-04 11:26:42 +00:00
William Bezuidenhout
1a7e1b9686
build-tracker: remove old links (#63065) 2024-06-04 12:03:58 +01:00
Robert Lin
908d7119ea
chore/msp: blindly retry Notion page deletion (#63052)
Deleting Notion pages takes a very long time, and is prone to breaking in the page deletion step, where we must delete blocks one at a time because Notion does not allow for bulk block deletions. The errors seem to generally just be random Notion internal errors. This is very bad because it leaves go/msp-ops pages in an unusable state.

To try and mitigate, we add several places to blindly retry:

1. At the Notion SDK level, where a config option is available for retrying 429 errors
2. At the "reset page" helper level, where a failure to reset a page will prompt a retry of the whole helper
3. At the "delete blocks" helper level, where individual block deletion failures will be retried

Attempt to mitigate https://linear.app/sourcegraph/issue/CORE-119

While here, I also made some other QOL tweaks:

- Fix timing of sub-tasks in CLI output
- Bump default concurrency to 5 (our retries will handle if this is too aggressive, hopefully)
- Fix a missing space in generated docs

## Test plan

```
sg msp ops generate-handbook-pages   
```
2024-06-03 22:32:06 +00:00
Joe Chen
dd8ff6013f
worker: add SAMS notifications subscriber (#63051)
Part of CORE-92

This PR adds a new worker for subscribing to [SAMS notifications](https://www.notion.so/sourcegraph/SAMS-notifications-distribution-system-0d174480e0044b05b545d37d24263d5a). The current use case is to automatically (hard-)delete users on Sourcegraph.com when the corresponding user is deleted from SAMS.

This worker is only started when running in Sourcegraph.com mode and the credentials file (`service_account.json`) is provided, which has been configured since https://github.com/sourcegraph/deploy-sourcegraph-cloud/pull/18591.

Co-authored-by: Robert Lin <robert@bobheadxi.dev>
2024-06-03 18:01:19 -04:00
Robert Lin
617d2f766c
chore/msp/spec: tidy up custom alerts spec (#63050)
Follow-ups for #62885:

- Better docstrings for `mql`, `promql`
- `duration` -> `durationMinutes` to align with other config
- `alertpolicy.ResponseCodeMetric` -> `spec.CustomAlertCondition`: they're effectively the same type

Test plan: CI
2024-06-03 13:53:01 -07:00
Bolaji Olajide
9e2b56119f
feat(release): allow creation of multiple patch release events (#63034)
* allow creation of multiple patch release events

* skip old month releases

* update config
2024-06-03 11:14:24 -04:00
Bolaji Olajide
bab01ccaac
feat(release): rename code freeze event to branch cut event (#63033)
rename code freeze event to branch cut
2024-06-03 05:13:32 -05:00
William Bezuidenhout
4cf94e9e8c
sg: speed up interrupt execution (#63032) 2024-06-03 09:54:51 +00:00
Greg Magolan
a3afa08161
chore(bazel): bump to aspect_bazel_lib 2.7.7 (#63012) 2024-05-31 23:08:52 +01:00
Robert Lin
012db75133
fix/msp: make deadlineSeconds job-level configuration, apply in timeout (#63017)
In a rushed POC of MSP jobs, I did some pretty bad copy-pasting (evidenced by all the service-specific docstrings I have removed in this PR) and made a bad configuration decision here, resulting in a few issues:

1. `schedule.deadline` is not actually applied to Cloud Run jobs, causing jobs to time out earlier than desired
2. `schedule.deadline` is not the right place to configure a deadline, because _all_ jobs need a configurable deadline, not just those with schedules. This change moves `schedule.deadline` to `deadlineSeconds`.
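
The deadline resolution could be sketched as below. The sketch assumes the deprecated `schedule.deadline` is still honored as a fallback and that the caller supplies a default. `jobSpec`, `scheduleSpec`, and `resolveDeadline` are hypothetical. The `"1800s"` format matches the generated Terraform in the test plan.

```go
package main

import "fmt"

// jobSpec is a hypothetical, trimmed-down view of an MSP job spec.
type jobSpec struct {
	DeadlineSeconds *int // new top-level deadlineSeconds
	Schedule        *scheduleSpec
}

// scheduleSpec holds the (hypothetical) schedule block.
type scheduleSpec struct {
	Cron     string
	Deadline *int // deprecated schedule.deadline
}

// resolveDeadline picks the job deadline: the top-level deadlineSeconds wins,
// then the deprecated schedule.deadline (assumed fallback), then the
// caller-supplied default. The result is rendered as e.g. "1800s".
func resolveDeadline(spec jobSpec, defaultSeconds int) string {
	secs := defaultSeconds
	switch {
	case spec.DeadlineSeconds != nil:
		secs = *spec.DeadlineSeconds
	case spec.Schedule != nil && spec.Schedule.Deadline != nil:
		secs = *spec.Schedule.Deadline
	}
	return fmt.Sprintf("%ds", secs)
}
```

The same string would then feed `JOB_EXECUTION_DEADLINE`, the job `timeout`, and the scheduler's `attempt_deadline`. Those are the three values the diff changes together.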

Closes CORE-145

## Test plan

```
$ sg msp generate gatekeeper prod
$ git diff
```

```diff                    
diff --git a/services/gatekeeper/service.yaml b/services/gatekeeper/service.yaml
index fd6a3812..ce4b02e3 100644
--- a/services/gatekeeper/service.yaml
+++ b/services/gatekeeper/service.yaml
@@ -48,4 +48,4 @@ environments:
           - "primary"
     schedule:
       cron: 0 * * * *
-      deadline: 1800 # 30 minutes
+    deadlineSeconds: 1800 # 30 minutes
diff --git a/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json b/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json
index 3c2c295e..f83b32b9 100644
--- a/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json
+++ b/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json
@@ -281,7 +281,7 @@
                   },
                   {
                     "name": "JOB_EXECUTION_DEADLINE",
-                    "value": "600s"
+                    "value": "1800s"
                   }
                 ],
                 "image": "us.gcr.io/sourcegraph-dev/abuse-ban-bot:${var.resolved_image_tag}",
@@ -302,7 +302,7 @@
               }
             ],
             "service_account": "${data.terraform_remote_state.cross-stack-reference-input-iam.outputs.cross-stack-output-google_service_accountiam-workload-accountemail}",
-            "timeout": "300s",
+            "timeout": "1800s",
             "volumes": [
             ],
             "vpc_access": {
@@ -341,7 +341,7 @@
             "uniqueId": "job_scheduler"
           }
         },
-        "attempt_deadline": "600s",
+        "attempt_deadline": "1800s",
         "depends_on": [
           "google_cloud_run_v2_job_iam_member.cloudrun_scheduler_job_invoker"
         ],
```

## Changelog

- MSP jobs: `schedule.deadline` is deprecated, use the top-level `deadlineSeconds` instead. Configured deadlines are now correctly applied as the Cloud Run job execution timeout as well.
2024-05-31 21:15:31 +00:00