Currently the matrix is hardcoded in the msp repo.
Service operators can forget to add or remove their service from the
list.
GitHub supports dynamically generating the matrix from a previous job's
output
([example](https://josh-ops.com/posts/github-actions-dynamic-matrix/)).
This PR adds an `sg msp subscription-matrix` command, which generates
the matrix we need.
Part of CORE-202
## Test plan
Output
```
{"service":[{"id":"cloud-ops","env":"prod","category":"internal"},{"id":"gatekeeper","env":"prod","category":"internal"},{"id":"linearhooks","env":"prod","category":"internal"}]}
```
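As a rough illustration of the JSON shape shown above, here is a minimal Go sketch. The type and function names (`matrixEntry`, `subscriptionMatrix`, `renderMatrix`) are hypothetical and are not taken from the actual `sg msp subscription-matrix` implementation; only the field names `id`, `env` and `category` come from the output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// matrixEntry mirrors one element of the "service" array in the output above.
// The Go type itself is an assumption made for illustration.
type matrixEntry struct {
	ID       string `json:"id"`
	Env      string `json:"env"`
	Category string `json:"category"`
}

// subscriptionMatrix is the top-level object a workflow matrix could consume.
type subscriptionMatrix struct {
	Service []matrixEntry `json:"service"`
}

// renderMatrix serializes the entries as compact, single-line JSON.
func renderMatrix(entries []matrixEntry) (string, error) {
	out, err := json.Marshal(subscriptionMatrix{Service: entries})
	if err != nil {
		return "", fmt.Errorf("marshal matrix: %w", err)
	}
	return string(out), nil
}

func main() {
	m, err := renderMatrix([]matrixEntry{
		{ID: "cloud-ops", Env: "prod", Category: "internal"},
		{ID: "gatekeeper", Env: "prod", Category: "internal"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(m)
}
```

A later job could then read this value from the generating job's output, as in the linked example.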
I think finding the right permissions confuses people pretty often when
first interacting with MSP. This adds a helper for annotating errors
returned from points where we might be able to help out @DaedalusG,
specifically for the situation in
https://sourcegraph.slack.com/archives/C05GJPTSZCZ/p1717629546727829 😉
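A minimal sketch of what such an annotation helper could look like, assuming a simple substring match on the error message. The names `looksLikePermissionsError` and `annotatePermissionsError`, the matched substrings, and the example docs URL are all assumptions for illustration, not the real helper.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// looksLikePermissionsError reports whether a message resembles a GCP
// permission failure. The matched substrings are an assumption based on the
// example errors in this description.
func looksLikePermissionsError(msg string) bool {
	return strings.Contains(msg, "PermissionDenied") ||
		strings.Contains(msg, "PERMISSION_DENIED")
}

// annotatePermissionsError wraps err with a pointer to the docs when it looks
// permissions-related, and returns it unchanged otherwise.
func annotatePermissionsError(err error, docsURL string) error {
	if err == nil || !looksLikePermissionsError(err.Error()) {
		return err
	}
	return fmt.Errorf("possible permissions error, ensure you have the prerequisite Entitle grants mentioned in %s: %w", docsURL, err)
}

func main() {
	err := errors.New("rpc error: code = PermissionDenied desc = Permission 'secretmanager.versions.access' denied")
	// The URL below is a placeholder, not a real docs page.
	fmt.Println(annotatePermissionsError(err, "https://example.com/service-docs"))
}
```

Wrapping with `%w` keeps the original error available to `errors.Is` and `errors.As` callers.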
## Test plan
It's a little wordy but:
```
sg msp pg connect sams prod
❌ possible permissions error, ensure you have the prerequisite Entitle grants mentioned in https://sourcegraph.notion.site/3e59b9ac3d414a5f8fb5911eed1e418a: find IAM output: gcloud: failed to access secret "iam_operator_access_service_account" from "sams-prod-ywuz": rpc error: code = PermissionDenied desc = Permission 'secretmanager.versions.access' denied for resource 'projects/sams-prod-ywuz/secrets/iam_operator_access_service_account/versions/latest' (or it may not exist).
```
## Changelog
- `sg msp pg connect` will tell you about your service's generated
Notion page if you run into a permissions-looking error during command
setup, where there is guidance about the required Entitle requests.
Deleting Notion pages takes a very long time, and is prone to breaking in the page deletion step, where we must delete blocks one at a time because Notion does not allow for bulk block deletions. The errors seem to generally just be random Notion internal errors. This is very bad because it leaves go/msp-ops pages in an unusable state.
To try and mitigate, we add several places to blindly retry:
1. At the Notion SDK level, where a config option is available for retrying 429 errors
2. At the "reset page" helper level, where a failure to reset a page will prompt a retry of the whole helper
3. At the "delete blocks" helper level, where individual block deletion failures will be retried
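The helper-level retries in items 2 and 3 can be sketched roughly as below. This is a generic blind-retry loop under stated assumptions: `retry` and `errFlaky` are hypothetical names, and the real helpers and the Notion SDK configuration are not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errFlaky stands in for a random Notion internal error (hypothetical).
var errFlaky = errors.New("notion: internal error")

// retry calls fn up to attempts times, sleeping for delay between failed
// attempts, and returns the last error if every attempt fails.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Simulate a block deletion that fails twice before succeeding.
	err := retry(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errFlaky
		}
		return nil
	})
	fmt.Println(calls, err) // prints "3 <nil>"
}
```

Nesting such a loop at both the block level and the page level multiplies the worst-case number of calls, which is worth keeping in mind alongside the concurrency bump.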
Attempt to mitigate https://linear.app/sourcegraph/issue/CORE-119
While here, I also made some other QOL tweaks:
- Fix timing of sub-tasks in CLI output
- Bump default concurrency to 5 (our retries will handle if this is too aggressive, hopefully)
- Fix a missing space in generated docs
## Test plan
```
sg msp ops generate-handbook-pages
```
#62704 introduced a regression due to the changing of the semantics of `rollouts` configuration in code: previously, only the final stage would get it, but with #62704 this became available on all environments, and to infer the final stage a nil-safe helper `rollout.IsFinalStage()` was introduced.
This change fixes a missed check migration that causes additional assets to be incorrectly generated for non-final environments.
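For context, a nil-safe helper of this kind might look like the sketch below. The types and the `envID` parameter are assumptions for illustration; the actual `rollout.IsFinalStage()` signature may differ.

```go
package main

import "fmt"

// rolloutStage and rolloutSpec are hypothetical stand-ins for the spec types.
type rolloutStage struct {
	EnvironmentID string
}

type rolloutSpec struct {
	Stages []rolloutStage
}

// IsFinalStage reports whether envID is the last stage of the rollout.
// It is safe to call on a nil *rolloutSpec, in which case it returns false.
func (r *rolloutSpec) IsFinalStage(envID string) bool {
	if r == nil || len(r.Stages) == 0 {
		return false
	}
	return r.Stages[len(r.Stages)-1].EnvironmentID == envID
}

func main() {
	var unset *rolloutSpec
	fmt.Println(unset.IsFinalStage("prod")) // false: no rollout configured

	spec := &rolloutSpec{Stages: []rolloutStage{{"test"}, {"prod"}}}
	fmt.Println(spec.IsFinalStage("test"), spec.IsFinalStage("prod")) // false true
}
```

Callers that only checked whether the rollout spec was non-nil would need to switch to a check like this once the spec became available in every environment.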
## Test plan
`sg msp generate -all`
Closes CORE-23 - this change removes the manual `gcloud deploy apply` step previously required to enable MSP rollouts, thanks to a recent release of the Google Terraform provider.
## Test plan
https://github.com/sourcegraph/managed-services/pull/1403
This change adds a `locations: { gcpRegion: "...", gcpLocation: "..." }` configuration to centralize all location-related options. `gcpRegion` specifies regional preferences, while `gcpLocation` specifies multi-regional preferences (for resources that support it - only BigQuery in most cases).
Closes CORE-24 - see issue for some context.
## Test plan
```
sg msp generate -all # no diff
```
```
sg msp schema -output='../managed-services/schema/service.schema.json'
```
When using https://github.com/sourcegraph/sourcegraph/pull/62565, we override test environments that are in CLI mode, which can cause infra to be rolled out by surprise via VCS mode on switch - this change adds an option to respect the existing run mode configuration via `-workspace-run-mode=ignore`.
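The intended behaviour can be sketched as a small resolution function. This is an illustration only: `resolveRunMode` is a hypothetical name, and accepting `cli` as a flag value is an assumption (the description only confirms `vcs` and `ignore`).

```go
package main

import "fmt"

// resolveRunMode returns the run mode to apply to a workspace, given the
// -workspace-run-mode flag value and the workspace's existing mode.
// "ignore" keeps whatever mode the workspace already has.
func resolveRunMode(flag, existing string) (string, error) {
	switch flag {
	case "vcs", "cli":
		return flag, nil
	case "ignore":
		return existing, nil
	default:
		return "", fmt.Errorf("unsupported -workspace-run-mode %q", flag)
	}
}

func main() {
	mode, _ := resolveRunMode("ignore", "cli")
	fmt.Println(mode) // cli: the test environment stays in CLI mode

	mode, _ = resolveRunMode("vcs", "cli")
	fmt.Println(mode) // vcs: the workspace is switched
}
```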
Thread: https://sourcegraph.slack.com/archives/C06JENN2QBF/p1715256898022469?thread_ts=1715251558.736709&cid=C06JENN2QBF
## Test plan
```
sg msp tfc sync -all
👉 Syncing all environments for all services, including setting ALL workspaces to use run mode "vcs" (use '-workspace-run-mode=ignore' to respect the existing run mode) - are you sure? (y/N) N
❌ aborting
Projects/sourcegraph/managed-services 1 » sg msp tfc sync -all -workspace-run-mode=ignore
👉 Syncing all environments for all services - are you sure? (y/N) y
// ...
```
Despite recent efforts to surface service options in `sg msp` commands better, such as https://github.com/sourcegraph/sourcegraph/pull/61620, it seems "what is a service argument" remains a point of confusion. It doesn't help that some of the command help texts are not very helpful.
This change:
1. Adds explicit lists of available services in help text when we can get it
2. Improves service ID completion so that it works in any subdirectory in the managed-services repo
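One way to make completion work from any subdirectory is to walk up the directory tree until a `services` directory is found. The sketch below assumes the `services/<service ID>/service.yaml` layout seen in error messages elsewhere; `findServicesDir`, `listServiceIDs` and `demo` are hypothetical names, not the real implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findServicesDir walks up from start until a directory containing a
// "services" subdirectory is found. Note that an unrelated "services"
// directory in a parent would also match in this simplified sketch.
func findServicesDir(start string) (string, error) {
	dir, err := filepath.Abs(start)
	if err != nil {
		return "", err
	}
	for {
		candidate := filepath.Join(dir, "services")
		if info, err := os.Stat(candidate); err == nil && info.IsDir() {
			return candidate, nil
		}
		parent := filepath.Dir(dir)
		if parent == dir {
			return "", fmt.Errorf("no services directory found in or above %q", start)
		}
		dir = parent
	}
}

// listServiceIDs returns the subdirectories that contain a service.yaml.
func listServiceIDs(servicesDir string) ([]string, error) {
	entries, err := os.ReadDir(servicesDir)
	if err != nil {
		return nil, err
	}
	var ids []string
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		spec := filepath.Join(servicesDir, e.Name(), "service.yaml")
		if _, err := os.Stat(spec); err == nil {
			ids = append(ids, e.Name())
		}
	}
	return ids, nil
}

// demo builds a throwaway repo layout and resolves IDs from a nested directory.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "msp-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	for _, id := range []string{"sams", "gatekeeper"} {
		dir := filepath.Join(root, "services", id)
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(filepath.Join(dir, "service.yaml"), nil, 0o644); err != nil {
			return nil, err
		}
	}
	servicesDir, err := findServicesDir(filepath.Join(root, "services", "sams"))
	if err != nil {
		return nil, err
	}
	return listServiceIDs(servicesDir)
}

func main() {
	ids, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```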
## Test plan
```sh
Projects/sourcegraph/managed-services » cd services
sourcegraph/managed-services/services » sg msp ops -h
NAME:
sg managed-services-platform operations - Generate operational reference for a service
USAGE:
sg msp ops [command options] <service ID>
DESCRIPTION:
Directly view operational reference documentation for a service - also available in go/msp-ops.
Available services:
- build-tracker
- cloud-ops
- cloud-relay
- cody-analytics
- entitler
- gatekeeper
- msp-testbed
- pings
- releaseregistry
- sams
- sourcegraph-accounts
- support-integration
- telemetry-gateway
COMMANDS:
help, h Shows a list of commands or help for one command
OPTIONS:
--pretty Render syntax-highlighed Markdown (default: true)
--help, -h show help
sourcegraph/managed-services/services » sg msp ops # <tab>
help h -- Shows a list of commands or help for one command
build-tracker cody-analytics msp-testbed sams telemetry-gateway
cloud-ops entitler pings sourcegraph-accounts
cloud-relay gatekeeper releaseregistry support-integration
```
As titled - we can now prompt you through service setup. This is a small QOL improvement that makes providing the required parameters a bit easier, as the setup flags experience has been a point of feedback from several MSP adopters.
As a follow-up, we can probably remove the flag-based setup entirely, as it should generally be a human-operator-only setup. Then we can expand the setup to include e.g. resource setup (postgres), and maybe an initial generate step as well.
## Test plan

This change migrates `generate` and `tfc sync` to use our service/env argument getters so that we return more consistent error messages. Errors around non-existent or missing service/env arguments now also provide relevant lists of possible values, such as all available services or all available environments for a valid service argument (see test plan examples).
Hopefully this makes errors easier to understand, as the possible values should give a better hint as to what arguments the command expects.
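A minimal sketch of an argument getter that reports the possible values, assuming the caller already has the list of valid services or environments. `validateChoice` and the exact message wording are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// validateChoice checks that got is one of the available values and, if not,
// returns an error that lists every valid option.
func validateChoice(kind, got string, available []string) error {
	options := strings.Join(available, ", ")
	if got == "" {
		return fmt.Errorf("argument <%s> is required, available: [%s]", kind, options)
	}
	for _, a := range available {
		if a == got {
			return nil
		}
	}
	return fmt.Errorf("%s %q not found, available: [%s]", kind, got, options)
}

func main() {
	envs := []string{"dev", "prod"}
	fmt.Println(validateChoice("environment ID", "staging", envs))
	fmt.Println(validateChoice("environment ID", "", envs))
	fmt.Println(validateChoice("environment ID", "prod", envs))
}
```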
If you open a generated MSP service file in your editor as part of a different workspace/repo, VS Code doesn't pick up which schema file to use for it, resulting in no IntelliSense. The YAML language server supports a magic comment that points to a relative schema file, which works in this case 🙂
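The magic comment is the YAML language server's `# yaml-language-server: $schema=<path>` modeline. Below is a minimal sketch of prepending it to generated output; `withSchemaComment` is a hypothetical helper, and the relative path in `main` is an assumed example rather than the path the generator actually writes.

```go
package main

import (
	"fmt"
	"strings"
)

// schemaCommentPrefix is the modeline recognized by the YAML language server.
const schemaCommentPrefix = "# yaml-language-server: $schema="

// withSchemaComment prepends the schema modeline to a spec unless one is
// already present at the top of the file.
func withSchemaComment(spec, schemaPath string) string {
	if strings.HasPrefix(spec, schemaCommentPrefix) {
		return spec
	}
	return schemaCommentPrefix + schemaPath + "\n\n" + spec
}

func main() {
	spec := "service:\n  id: example\n"
	// Assumed relative location of the schema from services/<id>/service.yaml.
	fmt.Print(withSchemaComment(spec, "../../schema/service.schema.json"))
}
```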
## Test plan
Validated locally with local sg build and `sg msp init ...`
MSP rollouts (#59956) currently require an additional manual step to provision via a `gcloud deploy apply` using a generated configuration YAML file. This is required because at the time, the following were not available via Terraform:
1. Cloud Deploy custom target _types_: define entities in Cloud Deploy describing the existence of custom targets using custom Skaffold scripts.
2. Cloud Deploy targets, _using_ custom target types: the Terraform resource only supports native target types, not custom targets.
In a recent GCP Terraform provider release, support for 1 was added, and this change migrates the definition currently in the generated Cloud Deploy YAML file. However, 2 is not yet supported, so we can't yet remove the manual `gcloud deploy apply` step - this is tracked in https://github.com/sourcegraph/managed-services/issues/940. This PR also improves the docstrings to better indicate what we expect to change in the future.
Closes https://github.com/sourcegraph/managed-services/issues/932
## Test plan
https://github.com/sourcegraph/managed-services/pull/939
More feedback from recent `sg msp pg` usage, starting with https://sourcegraph.slack.com/archives/C05GJPTSZCZ/p1710932987694719?thread_ts=1709911173.644899&cid=C05GJPTSZCZ:
1. **operationdocs**: Stronger wording on first-time `managed-services` repo and tooling setup, in particular saying you're going to need to clone the repo.
2. **operationdocs**: Note that write-access Entitle is required even for read-only database connection (both cases require IAM impersonation, which _can_ grant write access, so it's gated behind the write-access request)
3. **sg msp**: Throw special error when additional args are provided in commands that don't expect it, reminding users that flags need to be placed before args.
4. **sg msp**: Render warning with link to generated docs if permissions-related error is detected in `cloud-sql-proxy` output.
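Item 3 exists because Go-style flag parsing (for example the standard `flag` package) stops at the first non-flag argument, so trailing flags end up as extra positional arguments. A minimal sketch of such a check follows; `checkNoExtraArgs` is a hypothetical name, and the message is modelled on the test plan output.

```go
package main

import (
	"fmt"
	"strings"
)

// checkNoExtraArgs returns an error when more than want positional arguments
// were provided, reminding the user about flag placement.
func checkNoExtraArgs(args []string, want int) error {
	if len(args) <= want {
		return nil
	}
	extra := strings.Join(args[want:], " ")
	return fmt.Errorf("got unexpected additional arguments %q - note that flags must be placed BEFORE arguments, i.e. '<flags> <arguments>'", extra)
}

func main() {
	// Positional arguments as they would appear after flag parsing stopped.
	args := []string{"sourcegraph-accounts", "prod", "--session.timeout", "foobar"}
	fmt.Println(checkNoExtraArgs(args, 2))
}
```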
## Test plan
```
sg msp pg connect sourcegraph-accounts prod --session.timeout foobar
❌ got unexpected additional arguments "--session.timeout foobar" - note that flags must be placed BEFORE arguments, i.e. '<flags> <arguments>'
```
```
[cloud-sql-proxy] 2024/03/25 08:06:36 [sourcegraph-accounts-prod-csvc:us-central1:postgresql-e6bc] failed to connect to instance: failed to get instance: Refresh error: failed to get instance metadata (connection name = "sourcegraph-accounts-prod-csvc:us-central1:postgresql-e6bc"): Get "https://sqladmin.googleapis.com/sql/v1beta4/projects/sourcegraph-accounts-prod-csvc/instances/postgresql-e6bc/connectSettings?alt=json&prettyPrint=false": impersonate: status code 403: {
[cloud-sql-proxy] "error": {
[cloud-sql-proxy] "code": 403,
[cloud-sql-proxy] "message": "Permission 'iam.serviceAccounts.getAccessToken' denied on resource (or it may not exist).",
[cloud-sql-proxy] "status": "PERMISSION_DENIED",
[cloud-sql-proxy] "details": [
[cloud-sql-proxy] {
[cloud-sql-proxy] "@type": "type.googleapis.com/google.rpc.ErrorInfo",
[cloud-sql-proxy] "reason": "IAM_PERMISSION_DENIED",
[cloud-sql-proxy] "domain": "iam.googleapis.com",
[cloud-sql-proxy] "metadata": {
[cloud-sql-proxy] "permission": "iam.serviceAccounts.getAccessToken"
[cloud-sql-proxy] }
[cloud-sql-proxy] }
[cloud-sql-proxy] ]
[cloud-sql-proxy] }
[cloud-sql-proxy] }
⚠️ Permissions error detected - do you have the prerequisite Entitle permissions grant? See go/msp-ops/sourcegraph-accounts#prod for more details.
```
https://github.com/sourcegraph/handbook/pull/8767 updates the handbook with the new output
Previously, we'd ask users to run the command again with the `-download` flag. This is kind of annoying especially because of flag positioning quirks. Instead, let's just ask the user if they'd actually like us to install it for them, in case we don't find the binary in the cache.
## Test plan
```sh
$ sg msp pg connect sams dev
⚠️ cloud-sql-proxy binary not found at "/Users/robert@sourcegraph.com/Library/Caches/sourcegraph/bin/cloud-sql-proxy/2.8.1/cloud-sql-proxy"
👉 Would you like me to install cloud-sql-proxy for you? n
❌ failed to find cloud-sql-proxy: stat /Users/robert@sourcegraph.com/Library/Caches/sourcegraph/bin/cloud-sql-proxy/2.8.1/cloud-sql-proxy: no such file or directory
$ sg msp pg connect sams dev
⚠️ cloud-sql-proxy binary not found at "/Users/robert@sourcegraph.com/Library/Caches/sourcegraph/bin/cloud-sql-proxy/2.8.1/cloud-sql-proxy"
👉 Would you like me to install cloud-sql-proxy for you? y
✅ cloud-sql-proxy binary saved to "/Users/robert@sourcegraph.com/Library/Caches/sourcegraph/bin/cloud-sql-proxy/2.8.1/cloud-sql-proxy"
💡 Preparing a connection with read-only access - for write access, use the '-write-access' flag.
👉 Use this command to connect to database "accounts":
psql -U operatoraccess-a55c85@sams-dev-bfec.iam -d accounts -h 127.0.0.1 -p 5433
👉 Use this command to connect to database "cody_management":
psql -U operatoraccess-a55c85@sams-dev-bfec.iam -d cody_management -h 127.0.0.1 -p 5433
⚠️ The current session will terminate in 300 seconds. Use '-session.timeout' to increase the session duration.
[cloud-sql-proxy] 2024/03/11 03:35:04 Impersonating service account with Application Default Credentials
[cloud-sql-proxy] 2024/03/11 03:35:05 [sams-dev-bfec:us-central1:postgresql-26ca] Listening on 127.0.0.1:5433
[cloud-sql-proxy] 2024/03/11 03:35:05 The proxy has started successfully and is ready for new connections!
^C [cloud-sql-proxy] 2024/03/11 03:35:06 SIGINT signal received. Shutting down...
```
---------
Co-authored-by: James Cotter <35706755+jac@users.noreply.github.com>
Allows services to define a `rollout` spec that ensures new image releases go through a specified sequence and flow. We do this using Cloud Deploy and custom targets that update the Cloud Run service image and configuring Terraform to ignore image changes.
> [!NOTE]
> We use a custom target (as opposed to using the native Cloud Deploy + Cloud Run integration, which wants the entire spec in YAML for releases - see https://github.com/sourcegraph/managed-services/issues/186#issuecomment-1915196511) because everything else we have is generated in Terraform, and the core Cloud Run configuration extensively references Terraform values. It would be an extensive undertaking to change how this works. For the most part, this is to deploy a new version of the service code, and it can be beneficial to tie that to the service repository's CI to make it clear where a piece of code goes - building the custom target to _only_ roll out images allows us to do that.
Custom targets are not yet supported by the GCP Terraform provider, which is unfortunate - instead we have to render some YAML that can be applied with a `gcloud` command. For the most part, this should be a one-time operation. There is generated guidance on what to do with the generated output, and also how to create releases.
Closes https://github.com/sourcegraph/managed-services/issues/186
Kinda rambly, high-level Loom overview: https://www.loom.com/share/55bfa34d173c40a9b78708de2029f34f?sid=6f1b062d-ba02-4bb9-8abe-c9f8f8f9a8fe
### Configuring rollouts
In the top-level service spec:
```yaml
rollout:
stages:
- environment: test
- environment: robert
```
And in each relevant environment:
```yaml
- id: robert
projectID: msp-testbed-robert-7be9
category: test
deploy:
type: rollout
```
`sg msp generate` will render resources for the "last" stage to house Cloud Deploy infrastructure.
### Creating releases
Creating a release triggers a rollout, which progresses through the specified stages, like so:
<img width="1347" alt="image" src="https://github.com/sourcegraph/sourcegraph/assets/23356519/9df0e510-08eb-4fd4-bbd4-1d58c6817bba">
Creating releases is intended to be run using `gcloud` commands for now - we could introduce a `sg msp` command for this later. The command creates a release targeting the Cloud Deploy pipeline that exists in the final-stage project. Example command (one is also generated in the pipeline YAML file docstrings):
```sh
gcloud deploy releases create manual-test-04-2024-01-31 \
--project=msp-testbed-robert-7be9 \
--region=us-central1 \
--delivery-pipeline=msp-testbed-us-central1-rollout \
--source='gs://msp-testbed-robert-7be9-cloudrun-skaffold/source.tar.gz' \
--labels="commit=abc123,author=foo" \
--deploy-parameters="customTarget/tag=dd34d1be076e_2024-01-31"
```
Promotions can happen at any time - not every release needs to be promoted to the subsequent stage - and currently must happen manually for each stage except the first.
A secret/output is provisioned with a "release creator" SA, which can be used to create a [workload identity pool](https://sourcegraph.sourcegraph.com/github.com/sourcegraph/infrastructure/-/blob/managed-services/continuous-deployment-pipeline/main.tf?L5-20) for running the `gcloud deploy releases create` command in CI.
After the first apply, which now assumes an `insiders` tag, Terraform no longer touches the image, thanks to a [lifecycle ignore](https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle).
### Rollout execution
Rollouts happen via a new `clouddeploy-executor` SA in the last stage, which is granted sufficient IAM roles to deploy Cloud Run revisions.
The "render" step in Skaffold prepares a release - in our case, generating a `deploy.sh` with the prerequisite arguments. A record is available in the relevant "rollout" page:
<img width="1694" alt="image" src="https://github.com/sourcegraph/sourcegraph/assets/23356519/53b70923-c0ce-4661-8e6f-d3444cf256e1">

The "deploy" step just downloads the artifact and executes it.
### Tracing a release
You can include arbitrary labels on releases - these show up in the release entity in the GCP console, but we don't yet propagate anything very well down to the Cloud Run revision. In particular, it seems like we don't get the tag information in the revision UI, but if you click "edit" you see the correct tag populated:
<img width="500" alt="image" src="https://github.com/sourcegraph/sourcegraph/assets/23356519/856ccd92-9e0d-41d9-b84c-1846b30a3f79"> <img width="500" alt="image" src="https://github.com/sourcegraph/sourcegraph/assets/23356519/f917be4b-f714-4fef-8871-2006fbf83901">
See `skaffold.yaml` - we're mostly just executing commands with `gcloud`, and reporting expected outputs. We can extend this with more detailed outputs and additional tagging or scripting if we want - examples I've seen often build a custom binary/image to execute more advanced use cases. Also see https://cloud.google.com/deploy/docs/custom-targets
### Rollbacks
[Cloud Deploy has a concept of rollbacks](https://cloud.google.com/deploy/docs/roll-back), which you can apply via UI - it seems this just runs the previous configuration:
<img width="868" alt="image" src="https://github.com/sourcegraph/sourcegraph/assets/23356519/6bdc8459-61b7-4ce6-9397-c2f9b3a29e8b">
<img width="1426" alt="image" src="https://github.com/sourcegraph/sourcegraph/assets/23356519/778241a7-3a97-45f9-b4a6-31bf81f5a8d5">
## Test plan
See https://github.com/sourcegraph/managed-services/pull/454 and https://console.cloud.google.com/deploy/delivery-pipelines/us-central1/msp-testbed-us-central1-rollout?project=msp-testbed-robert-7be9 . I also specifically tested that deploying a particular image, and then deploying a change in Terraform, does not overwrite the image, and we do not have infinite drift on the Terraform when releases deploy images.
Also https://github.com/sourcegraph/managed-services/actions/runs/7744296405
There are a few issues with unstable output by default:
1. Rolling out TF changes can inadvertently deploy new revisions
2. Some services have private images that the operator might not have access to - we don't want image update to block `sg msp generate` by default
3. Updating images via subscription should primarily be done via automation, or manually and explicitly
This change toggles `-stable=true` by default. I will update our image update workflow to use `-stable=false` explicitly: https://github.com/sourcegraph/managed-services/pull/392
The new configuration is mostly based on Cody Gateway - if a service has an external domain, we create an uptime check and alert on failures.
The uptime check uses MSP standards, which depend on whether or not service health probes are configured. Since we use this in several places now, I've also reworked the health probes configuration to make it easier to reason about:
1. `healthProbes` now configures all healthchecks. `startupProbe` and `livenessProbe` have been removed
2. `disabled` is now `healthzProbes` - this configures if MSP healthchecks should be used, instead of default `/` ones.
3. By default, if no config is provided, MSP healthchecks are not used
4. If config is provided, MSP healthchecks must be explicitly disabled
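The defaulting rules in items 2 to 4 can be sketched as a nil-safe accessor. The struct below is a simplified, hypothetical stand-in for the real spec type; only the `healthProbes` and `healthzProbes` names come from this description.

```go
package main

import "fmt"

// healthProbesSpec is a simplified stand-in for the `healthProbes` block.
type healthProbesSpec struct {
	// HealthzProbes toggles MSP healthchecks; nil means "not set".
	HealthzProbes *bool
}

// UseHealthzProbes applies the defaults described above:
//   - no healthProbes config at all: MSP healthchecks are not used
//   - config present, healthzProbes unset: MSP healthchecks are used
//   - config present, healthzProbes set: the explicit value wins
func (s *healthProbesSpec) UseHealthzProbes() bool {
	if s == nil {
		return false
	}
	if s.HealthzProbes == nil {
		return true
	}
	return *s.HealthzProbes
}

func main() {
	var none *healthProbesSpec
	disabled := false
	fmt.Println(none.UseHealthzProbes())                                        // false
	fmt.Println((&healthProbesSpec{}).UseHealthzProbes())                       // true
	fmt.Println((&healthProbesSpec{HealthzProbes: &disabled}).UseHealthzProbes()) // false
}
```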
Closes https://github.com/sourcegraph/managed-services/issues/350
This is required for our upcoming vendor evaluations as well.
This PR also includes a variety of internal improvements to alert policies.
Addresses some feedback from [this thread](https://sourcegraph.slack.com/archives/C05GJPTSZCZ/p1705074439174099):
1. `init-env` might be easily confused for `init`. This change adds an up-front check that all arguments are present, and returns an error message suggesting `init` in case you haven't created a service yet
2. If a service spec can't be opened, we now return an error message `service does not exist`
3. All callsites of `spec.Open` now wrap the error with the ID of the service they are expecting to open
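Items 2 and 3 amount to two layers of error wrapping, roughly as sketched below. `openSpec`, `loadService` and `errServiceNotExist` are hypothetical names standing in for `spec.Open` and its call sites; the file path layout follows the error shown in the test plan.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errServiceNotExist is a sentinel for a missing service spec (hypothetical).
var errServiceNotExist = errors.New("service does not exist")

// openSpec reads a spec file and translates "file not found" into a
// friendlier error, keeping the underlying message for context.
func openSpec(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		if errors.Is(err, os.ErrNotExist) {
			return nil, fmt.Errorf("%w: %v", errServiceNotExist, err)
		}
		return nil, err
	}
	return data, nil
}

// loadService is a call site that adds the ID of the service it expected.
func loadService(id string) ([]byte, error) {
	data, err := openSpec(filepath.Join("services", id, "service.yaml"))
	if err != nil {
		return nil, fmt.Errorf("load service %q: %w", id, err)
	}
	return data, nil
}

func main() {
	_, err := loadService("asdfasdf")
	fmt.Println(err)
	fmt.Println(errors.Is(err, errServiceNotExist))
}
```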
## Test plan
```
$ sg msp init-env dev
❌ exactly 2 arguments required, '<service ID>' and '<env ID>' - this command is for adding an environment to an existing service, did you mean to use 'sg msp init' instead?
$ sg msp init-env asdfasdf dev
❌ load service "asdfasdf": service does not exist: open services/asdfasdf/service.yaml: no such file or directory
```
Expand the new `sg msp ops` command to create an entire directory tree in https://github.com/sourcegraph/handbook. For now, we assume someone will update this by hand from time to time - environments should generally be fairly static.
## Test plan
```
sg msp operations generate-handbook-pages
```
https://github.com/sourcegraph/handbook/pull/8429
---------
Co-authored-by: James Cotter <35706755+jac@users.noreply.github.com>
This change adds a static list of all workspaces we have. This is unlikely to change much more in the future. This static list can be used for:
1. Syncing Terraform Cloud workspaces (we no longer need to render the stack to do so)
2. CLI completions where appropriate
To make sure the static list holds, I've also removed the option to not use TFC as the backend.
We will likely need architecture diagrams eventually, especially for SOC2, and I thought it might be good to explore what we can generate. Since MSP infra architecture is a bit conditional on service specification, documenting it by hand can prove rather difficult outside of `sg msp operations`.
I tried walking the `cdktf.TerraformStack` graph, but couldn't figure out how to get dependencies correctly. In the end @michaellzc pointed me to `terraform graph`, which uses TF plans to prepare a graph. It includes stuff that doesn't feel very important, so I added a bunch of crude filtering to make the graph a bit more usable.
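The crude filtering could be as simple as dropping dot lines that mention uninteresting nodes. The sketch below is an assumption about the approach, not the actual filter: `filterDot` is a hypothetical name, and the sample graph and drop list are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// filterDot removes every line of a dot graph that contains one of the given
// substrings, leaving the remaining nodes and edges untouched.
func filterDot(dot string, drop []string) string {
	var kept []string
	for _, line := range strings.Split(dot, "\n") {
		skip := false
		for _, d := range drop {
			if strings.Contains(line, d) {
				skip = true
				break
			}
		}
		if !skip {
			kept = append(kept, line)
		}
	}
	return strings.Join(kept, "\n")
}

func main() {
	// Made-up sample input loosely resembling `terraform graph` output.
	dot := `digraph {
	"google_cloud_run_v2_service.default" -> "google_service_account.workload"
	"google_cloud_run_v2_service.default" -> "provider[\"registry.terraform.io/hashicorp/google\"]"
	"google_service_account.workload" -> "var.project_id"
}`
	fmt.Println(filterDot(dot, []string{"provider[", "var."}))
}
```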
The layout is not _great_ - I tried the various [dot layout engines](https://graphviz.org/docs/layouts/) and `unflatten` but none of them worked very well for our use case - but the information is actually kind of useful, and does illustrate a realistic graph of the various pieces involved.
A default rendering of the graph is available with `sg msp tfc graph`, and you can get the dot-format configuration with `sg msp tfc graph -dot`. I think the grouping under the `tfc` commands makes sense because you do need TFC to generate this.
Part of https://github.com/sourcegraph/managed-services/issues/361 and https://github.com/sourcegraph/managed-services/issues/328
## Test plan
```
sg msp tfc graph sams dev cloudrun
```

Previously, commands like `sg msp pg connect` required TFC access to run. Now, we just use the outputs exported to GSM directly (#59341), so that these commands can run if you have access to the GCP project only.
This lifts stacks from `dev/managedservicesplatform/internal/stack` to `dev/managedservicesplatform/stacks`, making it easier to share consts (namely outputs) directly with MSP tooling.
Right now, commands like `sg msp db connect` need to access TFC workspace outputs. This is clunky because it requires another Entitle roundtrip to get credentials and access to TFC.
Now that we configure project ID up-front, `sg msp` can just reach out to the service environment's project ID for secrets - by adding all local variables/TF outputs to GSM as well, we can now get access to everything with just one Entitle request on the environment project or folder.
This change only emits StackLocals as GSM secrets - I'll make the actual tooling changes in a follow-up.
Fix path generation and arguments from https://github.com/sourcegraph/sourcegraph/pull/59220 for the `sg msp init-env` command.
Also updates the initial example service spec to match the one generated from writing the YAML back to disk - there's not too much control over the indentation offered by the library.
This is a big diff, but they all tie together, so hear me out:
The only way to get the project ID right now is to query the appropriate Terraform Cloud workspace outputs. However, to do that, you need access to `sourcegraph-secrets`, to get the appropriate TFC access token.
This is awkward because as an operator, you would follow the instructions to request `mspServiceEditor` on your desired project - but now, to use various MSP tooling like `sg msp pg connect`, you must _also_ request access to `sourcegraph-secrets`, so that we can get a TFC token to find the project ID and other stuff. Because we might have a large number of services it's not feasible to manually set up Entitle bundles (they cannot be programmatically created).
The approach I want to take is to copy the MSP team TFC token from `sourcegraph-secrets` into each individual MSP environment project. Then, we can get the MSP team TFC token from the _environment_ project instead, access for which will be granted by the `mspServiceEditor` role. To do this however, we must know the project ID up front. So this PR makes the following changes:
1. Makes it so that the randomized project ID isn't managed by Terraform, but generated statically, and configured in `environments[].projectID`.
2. This requires changes to `sg msp init` to create a project ID the same way we create it in-Terraform today, but in addition to service initialization, we must now also have tooling to start configuring a new environment as well, so that we can generate a project ID for the operator. This is done via a new command, `sg msp init-env`, which inserts a new environment into a service spec.
- MSP service specs are intended to be operator-written and hopefully include lots of docstrings on configuration, so we take special care to preserve formatting and comments by manipulating `yaml.Node` directly.
4. In order to use `yaml.Node`, however, we must switch over to `gopkg.in/yaml.v3` - previously, we used the K8S YAML library, mostly as a carry-over from what is used in Cloud. In order to use `gopkg.in/yaml.v3`, we need to:
- Replace all `json` struct tags with `yaml`, as the YAML library does not support JSON tags
- Upgrade `github.com/invopop/jsonschema` so that we can point the JSON schema generator to use the `yaml` tags as well
5. Now that we have `projectID` statically available, we can remove code that queries TFC workspaces for the project ID and replace them with references to the spec instead.
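To illustrate item 1, here is a rough sketch of generating a project ID statically. The `<service>-<env>-<4 random characters>` format is inferred from project IDs that appear in related output (for example `sams-dev-bfec`), and the suffix alphabet and function names are assumptions. The 30-character limit is GCP's documented maximum for project IDs.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// maxProjectIDLength is the GCP limit on project ID length.
const maxProjectIDLength = 30

// randomSuffix returns n random lowercase alphanumeric characters.
// The modulo introduces a slight bias, which is acceptable for a sketch.
func randomSuffix(n int) (string, error) {
	const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("read random bytes: %w", err)
	}
	for i, b := range buf {
		buf[i] = alphabet[int(b)%len(alphabet)]
	}
	return string(buf), nil
}

// newProjectID builds an ID of the assumed form <service>-<env>-<suffix>.
func newProjectID(serviceID, envID string) (string, error) {
	suffix, err := randomSuffix(4)
	if err != nil {
		return "", err
	}
	id := fmt.Sprintf("%s-%s-%s", serviceID, envID, suffix)
	if len(id) > maxProjectIDLength {
		return "", fmt.Errorf("project ID %q exceeds %d characters", id, maxProjectIDLength)
	}
	return id, nil
}

func main() {
	id, err := newProjectID("msp-testbed", "dev")
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Because the ID is generated once and written to `environments[].projectID`, tooling can read it from the spec without querying Terraform Cloud.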
## Test plan
1. Unit tests on `sg msp init`'s generated output
2. Unit tests on inserting environment
3. Unit tests on project ID generator
4. https://github.com/sourcegraph/managed-services/pull/295
We generate all stacks to get a list of stacks for which we need TFC workspaces - we need to do this in "stable mode" to avoid image access on subscription types (same thing we do for CI)