sourcegraph

mirror of https://github.com/sourcegraph/sourcegraph.git synced 2026-02-06 20:31:48 +00:00

Author	SHA1	Message	Date
Robert Lin	2958abc326	fix/msp/postgresqlroles: wait for databases to be provisioned (#63362 ) Wait for databases to be provisioned before granting database-specific roles to the operator access user. ## Test plan Re-apply fixed https://sourcegraph.slack.com/archives/C05E2LHPQLX/p1718850688397579, indicating a race condition on database creation. Diff looks good: ```diff @@ -1447,10 +1472,15 @@ "path": "cloudrun/cloudrun-postgresqlroles-msp_iam-operator_access_service_account_table_grant", "uniqueId": "cloudrun-postgresqlroles-msp_iam-operator_access_service_account_table_grant" } }, "database": "msp_iam", + "depends_on": [ + "google_sql_database.postgresql-database-enterprise-portal", + "google_sql_database.postgresql-database-enterprise_portal", + "google_sql_database.postgresql-database-msp_iam" + ], "object_type": "table", "objects": [ ], "privileges": [ "SELECT" ``` ## Changelog - MSP Cloud SQL: Fix race condition between database creation and role grants for the read-only operator access user	2024-06-20 07:43:14 -07:00
Robert Lin	1aeb9c93f1	chore/msp: document gRPC notes in spec docstrings (#63140 ) Lessons learned from https://sourcegraph.slack.com/archives/C05E2LHPQLX/p1717703306405529 ## Test plan n/a	2024-06-06 14:20:50 -07:00
Robert Lin	27211dea73	feat/msp: update handbook link in alerts dashboard, sort custom alerts first (#63089 ) 1. The dashboard link still points to the old `go/msp-ops/...` which no longer works (CORE-105) 2. Alerts defined on top of the MSP defaults are probably of more interest, so let's sort these in front of the others ## Test plan Unit/golden tests	2024-06-05 09:09:22 -07:00
Robert Lin	a3fe573b59	fix/msp: flatten custom alert promQL query for GCP (#63084 ) The GCP monitoring alert configuration expects, for some reason, a single-line PromQL query only, otherwise the threshold doesn't work. In configuration, however, we may want to write a multi-line query, for ease of readability. This change automatically flattens the PromQL query into a single line and strips extra spaces. Part of CORE-161 ## Test plan Unit tests	2024-06-04 14:37:51 -07:00
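The flattening described in #63084 can be illustrated with a minimal sketch. This is not the code from the PR: `flattenPromQL` is a hypothetical helper name, and the approach (splitting on whitespace and rejoining with single spaces) is an assumption about one simple way to get a single-line query.

```go
package main

import (
	"fmt"
	"strings"
)

// flattenPromQL is a hypothetical helper: it collapses a multi-line query
// into one line by splitting on any run of whitespace (spaces, tabs,
// newlines) and rejoining the pieces with single spaces.
//
// Caveat: this also collapses repeated spaces inside quoted label values,
// which a real implementation may want to preserve.
func flattenPromQL(query string) string {
	return strings.Join(strings.Fields(query), " ")
}

func main() {
	// A readable multi-line query, as one might write it in configuration.
	query := `
		sum(rate(http_requests_total{code=~"5.."}[5m]))
		  /
		sum(rate(http_requests_total[5m]))
	`
	fmt.Println(flattenPromQL(query))
}
```

`strings.Fields` already discards leading and trailing whitespace, so no separate trim step is needed.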
Robert Lin	908d7119ea	chore/msp: blindly retry Notion page deletion (#63052 ) Deleting Notion pages takes a very long time, and is prone to breaking in the page deletion step, where we must delete blocks one at a time because Notion does not allow for bulk block deletions. The errors seem to generally just be random Notion internal errors. This is very bad because it leaves go/msp-ops pages in an unusable state. To try and mitigate, we add several places to blindly retry: 1. At the Notion SDK level, where a config option is available for retrying 429 errors 2. At the "reset page" helper level, where a failure to reset a page will prompt a retry of the whole helper 3. At the "delete blocks" helper level, where individual block deletion failures will be retried Attempt to mitigate https://linear.app/sourcegraph/issue/CORE-119 While here, I also made some other QOL tweaks: - Fix timing of sub-tasks in CLI output - Bump default concurrency to 5 (our retries will handle if this is too aggressive, hopefully) - Fix a missing space in generated docs ## Test plan ``` sg msp ops generate-handbook-pages ```	2024-06-03 22:32:06 +00:00
Robert Lin	617d2f766c	chore/msp/spec: tidy up custom alerts spec (#63050 ) Follow-ups for #62885: - Better docstrings for `mql`, `promql` - `duration` -> `durationMinutes` to align with other config - `alertpolicy.ResponseCodeMetric` -> `spec.CustomAlertCondition`: they're effectively the same type Test plan: CI	2024-06-03 13:53:01 -07:00
Robert Lin	012db75133	fix/msp: make deadlineSeconds job-level configuration, apply in timeout (#63017 ) In a rushed POC of MSP jobs, I did some pretty bad copy-pasting (evidenced by all the service-specific docstrings I have removed in this PR) and made a bad configuration decision here, resulting in a few issues: 1. `schedule.deadline` is not actually applied to Cloud Run jobs, causing jobs to time out earlier than desired 2. `schedule.deadline` is not the right place to configure a deadline, because _all_ jobs need a configurable deadline, not just those with schedules. This change moves `schedule.deadline` to `deadlineSeconds`. Closes CORE-145 ## Test plan ``` $ sg msp generate gatekeeper prod $ git diff ``` ```diff diff --git a/services/gatekeeper/service.yaml b/services/gatekeeper/service.yaml index fd6a3812..ce4b02e3 100644 --- a/services/gatekeeper/service.yaml +++ b/services/gatekeeper/service.yaml @@ -48,4 +48,4 @@ environments: - "primary" schedule: cron: 0 * * * * - deadline: 1800 # 30 minutes + deadlineSeconds: 1800 # 30 minutes diff --git a/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json b/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json index 3c2c295e..f83b32b9 100644 --- a/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json +++ b/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json @@ -281,7 +281,7 @@ }, { "name": "JOB_EXECUTION_DEADLINE", - "value": "600s" + "value": "1800s" } ], "image": "us.gcr.io/sourcegraph-dev/abuse-ban-bot:${var.resolved_image_tag}", @@ -302,7 +302,7 @@ } ], "service_account": "${data.terraform_remote_state.cross-stack-reference-input-iam.outputs.cross-stack-output-google_service_accountiam-workload-accountemail}", - "timeout": "300s", + "timeout": "1800s", "volumes": [ ], "vpc_access": { @@ -341,7 +341,7 @@ "uniqueId": "job_scheduler" } }, - "attempt_deadline": "600s", + "attempt_deadline": "1800s", "depends_on": [ "google_cloud_run_v2_job_iam_member.cloudrun_scheduler_job_invoker" ], ``` ## Changelog - MSP jobs: `schedule.deadline` is deprecated, use the top-level `deadlineSeconds` instead. Configured deadlines are now correctly applied as the Cloud Run job execution timeout as well.	2024-05-31 21:15:31 +00:00
Robert Lin	7170d4bd2b	feat/msp: add link to ops page in Slack channel description (#63011 ) Minor QOL improvement, when you're in the Slack channel the chances are good that you might want the ops docs at some point. ## Test plan n/a ## Changelog - MSP-provisioned alerts Slack channels now include a link to the service's generated operational docs (go/msp-ops) in the channel description.	2024-05-31 17:59:22 +00:00
James Cotter	2f4e3b9272	sg/msp: fix nil `domain` and `EnvironmentDomainTypeNone` in diagram gen (#62982 )	2024-05-30 17:58:00 +01:00
Robert Lin	9e4a8e8033	feat/sg/msp: add 'sg msp validate' for validating service specifications (#62973 )	2024-05-30 09:11:36 -07:00
Robert Lin	27f0d725ac	feat/msp/spec: require notionPageID if a production env is provisioned (#62972 )	2024-05-30 09:01:42 -07:00
Robert Lin	cb62afa2c2	fix/msp: test for cron interval changes based on time, add more restrictions (#62969 ) Addresses problem noticed in https://github.com/sourcegraph/managed-services/pull/1486#issuecomment-2137887423 ## Test plan Unit tests ## Changelog - Fixed an issue with output of `sg msp generate` for MSP jobs with particular schedules changing throughout the week - MSP jobs schedules now must be between 15 minutes at the most frequent, and every week at the least frequent	2024-05-29 18:24:39 -07:00
James Cotter	cb71a2d529	sg/msp: support for super-simple alerts on custom metrics (#62885 ) --------- Co-authored-by: Joe Chen <joe@sourcegraph.com> Co-authored-by: Robert Lin <robert@bobheadxi.dev>	2024-05-24 20:47:19 +01:00
Robert Lin	e0a7c0d3a6	fix/msp/spec: validate against LivenessInterval that is too high (#62872 ) Guards against another one of those "only fails at apply time" things (https://github.com/sourcegraph/managed-services/pull/1459)	2024-05-22 22:09:37 +00:00
Robert Lin	6c59b02534	feat/msp: do not use tfvars file outside of deploy-type 'subscription' (#62704 ) Closes CORE-121 The dependency on the generated `tfvars` file is frustrating for first-time MSP setup because it currently requires `-stable=false` to update, and doesn't actually serve any purpose for deploy types other than `subscription` (which uses it to isolate image changes that happen via GitHub Actions). This makes it so that we don't generate, or depend on, the dynamic `tfvars` file unless you are using `subscription`. I've also added a rollout spec configuration, `initialImageTag`, to make the initial tag we provision environments with configurable (as some services might not publish `insiders` images) - see the docstring. ## Test plan Inspect output of `sg msp generate -all`	2024-05-16 09:43:47 -07:00
Noah S-C	9b6ba7741e	bazel: transcribe test ownership to bazel tags (#62664 )	2024-05-16 15:51:16 +01:00
James Cotter	d1404951eb	sg/msp: fix `CustomTargetType` reference in `Target` definition (#62727 )	2024-05-16 13:32:37 +01:00
James Cotter	75356f8606	sg/msp: clarify `repository` annotation meaning in delivery pipeline (#62703 ) PR feedback from: https://github.com/sourcegraph/sourcegraph/pull/62702	2024-05-15 21:46:39 +01:00
James Cotter	3b394e7954	sg/msp: add repo annotation to delivery pipeline (#62702 )	2024-05-15 12:35:00 -07:00
Robert Lin	cb15cea2b0	msp/cloudrun: use GA launch stage (#62685 ) VPC direct egress is now GA: see example in https://registry.terraform.io/providers/hashicorp/google/5.29.0/docs/resources/cloud_run_v2_service#example-usage---cloudrunv2-service-directvpc and https://cloud.google.com/run/docs/configuring/vpc-direct-vpc This also fixes the infinite `GA` -> `BETA` drift we have in TFC	2024-05-15 17:32:54 +01:00
Robert Lin	cc6cfd8499	msp/rollouts: remove Cloud Deploy target import (#62687 ) Now that #62644 (CORE-23) is rolled out, this import block is no longer needed (and may even be disruptive when provisioning new rollout pipelines). The change was rolled out in: - https://github.com/sourcegraph/managed-services/pull/1416 - https://github.com/sourcegraph/managed-services/pull/1417 - https://github.com/sourcegraph/managed-services/pull/1403 ## Test plan n/a	2024-05-15 17:32:54 +01:00
Robert Lin	456315b54d	msp/rollouts: use new in-terraform custom target provisioning (#62644 ) Closes CORE-23 - this change removes the manual `gcloud deploy apply` step previously required to enable MSP rollouts, thanks to a recent release of the Google Terraform provider. ## Test plan https://github.com/sourcegraph/managed-services/pull/1403	2024-05-14 18:51:33 -07:00
Robert Lin	7308d16db9	msp/terraform: upgrade to 1.7.5 (#62650 ) According to https://developer.hashicorp.com/terraform/language/v1.7.x/upgrade-guides this should be compatible with our current version, 1.3.10 We need to upgrade to use `import` blocks (TF 1.5), which will make https://github.com/sourcegraph/sourcegraph/pull/62644 and CORE-23 capable of a smooth rollout (otherwise we encounter conflict with the previously hand-deployed resources). This also requires our CDKTF modules to be regenerated with the new Terraform version: https://github.com/sourcegraph/managed-services-platform-cdktf/pull/10 ## Test plan n/a - will do a staged rollout per https://www.notion.so/sourcegraph/MSP-infrastructure-upgrades-1808e7e45bd54f419dd93af542d99238?pvs=4	2024-05-14 12:33:06 -07:00
James Cotter	1d2076fc87	sg/msp: fix typo in exernal_health_check description (#62659 )	2024-05-14 12:31:02 -07:00
Robert Lin	71555cc0b1	msp/operationdocs: fix bad formatting (#62641 ) Noticed some leftover awkward formatting from https://github.com/sourcegraph/sourcegraph/pull/62607 ## Test plan Golden tests	2024-05-13 14:31:01 -07:00
James Cotter	cf9bcb3d80	sg/msp: upgrade sentry (#62636 )	2024-05-13 10:38:18 -07:00
Robert Lin	fdf0bf9a02	msp/operationdocs: add incident response starter guide, Notion-specific formatting (#62607 ) Closes CORE-20: adds a small per-service "incident response" section near the alerts reference section of each service, providing some simple starter context and linking to other relevant guidance. This change also makes some Notion-oriented formatting tweaks: putting all paragraphs on a single line (because of https://github.com/sourcegraph/notionreposync/issues/9) and also rendering callouts with appropriate background colors (https://github.com/sourcegraph/notionreposync/pull/11). ## Test plan Golden tests, roll out to Notion: ```sh GITHUB_ACTIONS=true sg msp ops generate-handbook-pages ``` Incident response: ![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/d07e0071-870f-4acb-b4a4-2246b40850a3) Callouts: ![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/6ec7dbea-cafd-40e0-b50c-780c4e9cbd22)	2024-05-10 23:56:41 +00:00
Robert Lin	7b6dd9080e	msp: centralize and expose locations configuration (#62604 ) This change adds a `locations: { gcpRegion: "...", gcpLocation: "..." }` configuration to centralize all location-related options. `gcpRegion` specifies regional preferences, while `gcpLocation` specifies multi-regional preferences (for resources that support it - only BigQuery in most cases). Closes CORE-24 - see issue for some context. ## Test plan ``` sg msp generate -all # no diff ``` ``` sg msp schema -output='../managed-services/schema/service.schema.json' ```	2024-05-10 15:50:07 -07:00
James Cotter	2d5ed2e735	sg/msp: add cloud deploy pubsub notifications (#62596 ) --------- Co-authored-by: Joe Chen <joe@sourcegraph.com> Co-authored-by: Robert Lin <robert@bobheadxi.dev>	2024-05-10 22:51:47 +01:00
Robert Lin	022b4ad95f	msp/terraformcloud: add option to respect existing run mode (#62580 ) When using https://github.com/sourcegraph/sourcegraph/pull/62565, we override test environments that are in CLI mode, which can cause infra to be rolled out by surprise via VCS mode on switch - this change adds an option to respect the existing run mode configuration via `-workspace-run-mode=ignore`. Thread: https://sourcegraph.slack.com/archives/C06JENN2QBF/p1715256898022469?thread_ts=1715251558.736709&cid=C06JENN2QBF ## Test plan ``` sg msp tfc sync -all 👉 Syncing all environments for all services, including setting ALL workspaces to use run mode "vcs" (use '-workspace-run-mode=ignore' to respect the existing run mode) - are you sure? (y/N) N ❌ aborting Projects/sourcegraph/managed-services 1 » sg msp tfc sync -all -workspace-run-mode=ignore 👉 Syncing all environments for all services - are you sure? (y/N) y // ... ```	2024-05-09 14:57:40 -07:00
Robert Lin	4d6455996c	msp: add infra and runtime support for job checkins (#62508 ) Closes CORE-21 - allows jobs to register check-ins using Sentry when they are configured as cron jobs: https://docs.sentry.io/product/crons/, for a nice view of "is my job running or nah" without using GCP's less-than-beautiful console views 1. Adds the configured schedule and deadline as environment variables for MSP jobs 2. Adds a contract mechanism for checking in, for example: ```go func work(ctx context.Context) (err error) { done, err := c.Diagnostics.JobExecutionCheckIn(ctx) if err != nil { /* failed to register check-in */ } defer done(err) // ... do work } ``` ## Test plan ```sh TestJobExecutionCheckIn_SENTRY_DSN='...' go test -v ./runtime/contract ``` ![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/8998af89-e74a-44a5-939a-92c8b63ea262) In Slack: ![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/0677e2db-5a33-4751-ae86-d43e5b1e159f) It appears the message is not currently customizable: https://develop.sentry.dev/sdk/check-ins/ --------- Co-authored-by: Joe Chen <joe@sourcegraph.com>	2024-05-09 10:11:48 -07:00
Robert Lin	1463f6724f	msp/terraformcloud: grant 'sso' team read access to MSP workspaces (#62559 )	2024-05-09 09:53:20 -07:00
Robert Lin	e1fa3393b5	msp: update msp-ops links (#62435 ) With https://github.com/sourcegraph/sourcegraph/pull/62325 landed, this updates a few references: 1. Service golinks don't work anymore: CORE-105 2. Alerts docs now point to the service Notion page	2024-05-06 10:36:15 -07:00
Robert Lin	f444774570	msp/operationdocs: write to Notion instead of Markdown (#62325 ) Closes https://github.com/sourcegraph/managed-services/issues/1076 aka closes CORE-28 - https://github.com/sourcegraph/managed-services/pull/1332 also automated the updates. Notes: 1. Notion anchors are ID-based, so we strip all anchor links because we cannot generate them in one pass. `notionreposync` may implement a mechanism for this in the future (filed https://github.com/sourcegraph/notionreposync/issues/8), but for now we don't have a great way around this, especially because of the next point 2. In-place updates are hard because of the block structure, so we destroy page contents and recreate the page every time. This causes a "flicker" as a viewer may see the page disappear slowly before their eyes (we can only delete things 1 block at a time). `notionreposync` may implement an improved mechanism for this in the future (filed https://github.com/sourcegraph/notionreposync/issues/7) 3. There's something funky going wrong with line breaks, filed https://github.com/sourcegraph/notionreposync/issues/9 - it hurts readability but it's not unmanageable. 4. Sadly Notion does not allow API file uploads (😡 https://developers.notion.com/docs/working-with-files-and-media#uploading-files-and-media-via-the-notion-api), so we generate them into the managed-services repo (https://github.com/sourcegraph/managed-services/pull/1332) and then just link to those diagrams from the generated page. We use a Markdown file that renders the SVG because the native SVG viewer sucks. 5. Made misc changes to operationdocs output where Notion's version is noticeably worse, or difficult to support (tables, lists-in-admonitions, etc) Depends on various improvements upstream in https://github.com/sourcegraph/notionreposync: - https://github.com/sourcegraph/notionreposync/pull/4 - https://github.com/sourcegraph/notionreposync/pull/5 - https://github.com/sourcegraph/notionreposync/pull/6 Follow-up improvements: - CORE-105 - CORE-106 ## Test plan ``` sg msp ops generate-handbook-pages ``` ![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/9d314a97-5370-4123-9534-f9f897002110) https://sourcegraph.notion.site/Build-Tracker-infrastructure-operations-bd66bf25d65d41b4875874a6f4d350cc ![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/69e1eb48-2fa9-421b-b2fd-969e25f37fee) https://github.com/sourcegraph/managed-services/pull/1332 --------- Co-authored-by: Jean-Hadrien Chabran <jh@chabran.fr>	2024-05-03 21:17:16 +00:00
James Cotter	6d7082d26e	sg/msp: architecture diagrams (#62213 )	2024-05-01 13:57:34 +01:00
Robert Lin	fd2b746b02	msp/redis: disable HA by default (#62210 ) https://github.com/sourcegraph/managed-services/pull/1307 "pinned" all services that should have HA Redis - this change flips the default to use the lower-cost alternative by default. ## Test plan https://github.com/sourcegraph/managed-services/actions/runs/8884108494	2024-04-30 11:26:48 -07:00
James Cotter	4909533715	sg/msp: upgrade google/google_beta to v5.26.0 (#62251 )	2024-04-29 19:34:28 +00:00
Robert Lin	54245a7a0d	msp/spec: exclude populated README from YAML (#62215 )	2024-04-26 20:47:09 +00:00
Robert Lin	8986f9cd99	msp/redis: make tier changes graceful (#62211 ) Makes the downgrade option introduced in https://github.com/sourcegraph/sourcegraph/pull/62137 usable, since a tier change in Redis is a force-recreate situation (unlike Cloud SQL) ## Test plan - [x] Applied downgrade directly in msp-testbed-robert https://app.terraform.io/app/sourcegraph/workspaces/msp-msp-testbed-robert-cloudrun/runs - [x] Switched robert back to VCS mode, monitored upgrade https://app.terraform.io/app/sourcegraph/workspaces/msp-msp-testbed-robert-cloudrun/runs/run-PFTNBXLBdZGWTm1Z No alerts fired during either of the above	2024-04-26 12:22:50 -07:00
Robert Lin	8069fabdc3	managedservicesplatform: add generalized 'HA' toggle for Redis, Cloud SQL (#62137 ) This change adds an explicit 'HA' toggle for Redis and Cloud SQL, as part of investigation into https://github.com/sourcegraph/managed-services/issues/311: - `redis.highAvailability`: sets the standard HA mode. Right now, we do this by default to preserve existing behaviour - a follow-up to this PR would be to explicitly set the HA mode on production services, then make this default `false`. - `postgreSQL.highAvailability`: enables regional mode, and also point-in-time-recovery as required. This could be quite expensive - we have ~$200/mo of Cloud SQL expenses, mostly in CPU before discounts, so this is projected to ~double that bill, but would be the simplest and most reliable way to maintain uptime in the event of a zonal failover. The plan is not necessarily to immediately make use of `postgreSQL.highAvailability` but have the option open, as the configuration is trivial - we could still develop a manual failover process to adhere to requirements outlined in https://github.com/sourcegraph/managed-services/issues/311 Preparing a summary here: https://www.notion.so/sourcegraph/MSP-service-availability-655e89d164b24727803f5e5a603226d8?pvs=4 ## Test plan `sg msp generate -all` has no diff	2024-04-25 20:48:23 +00:00
Robert Lin	3623ecb2cf	sg/msp: filter generated environments by category (#62131 ) Part of https://github.com/sourcegraph/managed-services/issues/599 See https://github.com/sourcegraph/managed-services/pull/1288 for how this mechanism will be used. ## Test plan ```sh sg msp generate -all -category=test ``` ![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/d27c5df5-6d13-43fa-96c2-5523da24693a)	2024-04-24 09:44:16 -07:00
James Cotter	cd44cf24db	sg/msp: remove flaky projectid test (#62155 )	2024-04-24 15:04:51 +00:00
James Cotter	738a37c7a9	sg/msp: add alert policy documentation to generated ops pages (#61939 )	2024-04-18 19:46:19 +00:00
Robert Lin	2189b2991f	msp/cloudrun: use VPC direct egress (#60466 ) Adopts [Cloud Run VPC direct egress](https://cloud.google.com/run/docs/configuring/vpc-direct-vpc) for private networks. Private networks are used by MSP services that connect to Cloud SQL, Memorystore (Redis), or other MSP services via VPC-SC perimeters. On paper, this should give us: - Likely smaller bill, as we no longer pay for serverless VPC connector VMs - Reduced latency on traffic through private network - Reduced latency on traffic spikes as serverless VPC connector no longer needs to scale out There are some caveats we are discussing in Slack: https://sourcegraph.slack.com/archives/C05E2LHPQLX/p1713324250520539 Closes https://github.com/sourcegraph/managed-services/issues/317 ## Test plan Rolled this out without downtime to `msp-testbed: robert`. The VPC private networking test used in https://github.com/sourcegraph/managed-services/pull/1024 still works.	2024-04-17 17:13:50 +00:00
James Cotter	c9a53faea1	msp: GCP Monitoring Dashboard (#61761 )	2024-04-17 16:51:36 +01:00
Robert Lin	c02266057c	msp/project: enable accesscontextmanager APIs (#61679 ) Prep work for https://github.com/sourcegraph/managed-services/issues/660 - required to manage some VPC SC perimeter APIs ## Test plan n/a	2024-04-08 18:12:41 -04:00
James Cotter	d26be1349b	msp: update ops with deployment info (#61610 ) * msp: update ops with deployment info * Update dev/managedservicesplatform/operationdocs/operationdocs.go Co-authored-by: Robert Lin <robert@bobheadxi.dev> * add rollout section * sentence case * Update dev/managedservicesplatform/operationdocs/operationdocs.go Co-authored-by: Robert Lin <robert@bobheadxi.dev> * update test files --------- Co-authored-by: Robert Lin <robert@bobheadxi.dev>	2024-04-08 13:18:58 +01:00
Robert Lin	3dcfd4c53d	msp/privatenetwork: allow PrivateIpGoogleAccess from subnet (#61648 ) Minor change to allow option 2 as outlined in https://github.com/sourcegraph/managed-services/issues/660#issuecomment-2027394872, but also could be beneficial in the future to investigate using private google access for e.g. Telemetry Gateway publishing to Cloud Run, which is pretty high-volume, and even for simple things like traces/metric export (tracked in https://github.com/sourcegraph/managed-services/issues/1093) ## Test plan Deployed to msp-testbed-robert	2024-04-06 00:57:31 +09:00
Robert Lin	514506de4d	sg/msp: retrieve service/env more consistently, provide possible values in errors (#61620 ) This change migrates `generate` and `tfc sync` to use our service/env argument getters so that we return more consistent error messages. Errors around non-existent or missing service/env arguments now also provide relevant lists of possible values, such as all available services or all available environments for a valid service argument (see test plan examples). Hopefully this makes errors easier to understand, as the possible values should give a better hint as to what arguments the command expects.	2024-04-05 22:05:35 +09:00
Robert Lin	df7ecc189b	msp: support README.md in managed-services repo as mixins for generated docs (#61409 ) This allows MSP operators to define a custom introduction to their service by writing a `README.md` file adjacent to their service specification in https://github.com/sourcegraph/managed-services. The README is parsed, stripped of the first header (which must be an H1 with the service name), all H2+ headers get indented (to fit in nicely into the generated docs), and the result is included under the `Service overview` section. This closes https://github.com/sourcegraph/managed-services/issues/382 for now - @chrsmith suggested a `/.msp/...` convention in source code repos, but it's a bit finicky to reach upstream for the required files. The custom README can be a large, detailed all-in-one page such that `go/msp-ops/<my-service>` can be used as a go-to reference for everything about that service, or just be a few links to other relevant documentation, pages, and/or places to get help. Examples, from https://github.com/sourcegraph/handbook/pull/8781 and https://github.com/sourcegraph/managed-services/pull/984: - Minimal details: ![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/4f36e8fc-001d-440c-b4aa-facb8f8fee65) - Lots of details: ![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/8e669963-abbe-442d-9cd1-e59b0f4719b6) ## Test plan Above examples + simple golden testing	2024-03-27 21:01:30 +08:00
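The README handling described in #61409 can be sketched as follows. This is an illustration only: `mixinReadme` is a hypothetical name, and the assumption that "indented" means pushing each H2+ header one level deeper is mine, not something the commit message states.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// mixinReadme is a hypothetical helper: it requires the README to start with
// an H1, drops that header, and pushes every H2+ header one level deeper
// (an assumed amount) so the content can nest under a generated section.
//
// Caveat: lines starting with "##" inside fenced code blocks are not
// special-cased in this sketch.
func mixinReadme(readme string) (string, error) {
	lines := strings.Split(strings.TrimSpace(readme), "\n")
	if !strings.HasPrefix(lines[0], "# ") {
		return "", errors.New("README must start with an H1 header")
	}
	out := make([]string, 0, len(lines))
	for _, line := range lines[1:] {
		if strings.HasPrefix(line, "##") {
			line = "#" + line // H2 becomes H3, H3 becomes H4, and so on
		}
		out = append(out, line)
	}
	return strings.TrimSpace(strings.Join(out, "\n")), nil
}

func main() {
	readme := "# My service\n\nA short introduction.\n\n## Getting help\n\nAsk in Slack.\n"
	body, err := mixinReadme(readme)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```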

181 Commits