In the future, this will allow us to attribute stack traces collected by
pprof to a tenant. This only sets the label in the HTTP middleware; I am
unsure how to achieve the same thing for gRPC, since that uses
propagators.
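For context, a minimal sketch of what the HTTP-middleware approach looks like using the standard `runtime/pprof` label API; the package, handler, and tenant-extraction helper names are assumptions, not the actual middleware:
```go
package middleware

import (
    "context"
    "net/http"
    "runtime/pprof"
)

// tenantFromContext is a hypothetical stand-in for however the request
// context actually carries the tenant.
func tenantFromContext(ctx context.Context) string { return "unknown" }

// WithTenantPprofLabel wraps a handler so that goroutines spawned while
// serving the request carry a pprof "tenant" label, making them
// attributable in collected profiles.
func WithTenantPprofLabel(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        labels := pprof.Labels("tenant", tenantFromContext(r.Context()))
        pprof.Do(r.Context(), labels, func(ctx context.Context) {
            next.ServeHTTP(w, r.WithContext(ctx))
        })
    })
}
```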
Test Plan: captured a goroutine profile and saw some goroutines with a
tenant label
---------
Co-authored-by: Erik Seliger <erikseliger@me.com>
This only contains one commit which reduces how often we call scan in
indexserver on dotcom.
- acacc5eda1 shards: only trigger rescan on .zoekt files changing
Test Plan: tested in zoekt CI
I noticed we were using a much older version, so code nav is a bit janky
on S2: the old version had index data from lsif-go rather than scip-go,
which caused cross-repo nav from sg/sg to sg/log not to work.
This PR brings https://github.com/sourcegraph/sgtail back into `sg`,
plus a few adjustments to make it easier to use. I'll archive that repo
once this PR lands.
@camdencheek mentioned you here as you've been the most recent beta
tester; this is more an FYI than a request for a review, though a review
is welcome if you want to spend a bit of time reading this.
Closes DINF-155
## Test plan
Locally tested + new unit test + CI
## Changelog
- Adds a new `sg tail` command that provides a better UI to tail and
filter log messages from `sg start --tail`.
This only contains one commit which has a performance improvement
experiment hidden behind an environment variable.
- https://github.com/sourcegraph/zoekt/commit/12ce07a298 index:
experiment to limit ngram lookups for large snippets
Test Plan: CI
As it says on the tin - various commands related to SAMS can now target
dev services integrated against SAMS-dev directly. See test plan for
examples.
I've also refactored the `sg sams introspect-token` etc. commands in
preparation for introducing more `sg sams` commands - the existing
commands are now collapsed into `sg sams token introspect` and `sg sams
token introspect -p`.
Part of https://linear.app/sourcegraph/issue/CORE-220, a spike into
polishing some local-dev DX for SAMS.
I also upgraded the glamour library because I noticed the JSON
pretty-printing was no longer colored - the upgrade fixed that.
## Test plan
All the below now work with no additional effort:
```sh
# get token details and print a temporary token
sg sams token introspect -p
# list enterprise-portal-dev data
sg enterprise subscription list -member.cody-analytics-viewer 'robert@sourcegraph.com'
```
You can use it against locally running services that connect to SAMS-dev
as well; for example, the below also works with no additional
flags/envvars:
```sh
sg start dotcom # includes enterprise-portal
sg enterprise subscription list -enterprise-portal-server=http://localhost:6081
```
## Changelog
- `sg` commands requiring SAMS client credentials now load shared
SAMS-dev client credentials by default.
This PR is stacked on top of all the prior work @chrsmith has done for
shuffling configuration data around; it implements the new "Self hosted
models" functionality.
## Configuration
Configuring a Sourcegraph instance to use self-hosted models basically
involves adding some configuration like this to the site config (if you
set `modelConfiguration`, you are opting in to the new system which is
in early access):
```
// Setting this field means we are opting into the new Cody model configuration system.
"modelConfiguration": {
  // Disable use of Sourcegraph's servers for model discovery
  "sourcegraph": null,
  // Create two model providers
  "providerOverrides": [
    {
      // Our first model provider "mistral" will be a Huggingface TGI deployment which hosts our
      // mistral model for chat functionality.
      "id": "mistral",
      "displayName": "Mistral",
      "serverSideConfig": {
        "type": "huggingface-tgi",
        "endpoints": [{"url": "https://mistral.example.com/v1"}]
      }
    },
    {
      // Our second model provider "bigcode" will be a Huggingface TGI deployment which hosts our
      // bigcode/starcoder model for code completion functionality.
      "id": "bigcode",
      "displayName": "Bigcode",
      "serverSideConfig": {
        "type": "huggingface-tgi",
        "endpoints": [{"url": "http://starcoder.example.com/v1"}]
      }
    }
  ],
  // Make these two models available to Cody users
  "modelOverridesRecommendedSettings": [
    "mistral::v1::mixtral-8x7b-instruct",
    "bigcode::v1::starcoder2-7b"
  ],
  // Configure which models Cody will use by default
  "defaultModels": {
    "chat": "mistral::v1::mixtral-8x7b-instruct",
    "fastChat": "mistral::v1::mixtral-8x7b-instruct",
    "codeCompletion": "bigcode::v1::starcoder2-7b"
  }
}
```
More advanced configurations are possible; the above is our blessed
configuration for today.
## Hosting models
Another major component of this work is starting to build up
recommendations around how to self-host models, which ones to use, how
to configure them, etc.
For now, we've been testing with these two on a machine with dual A100s:
* Huggingface TGI (this is a Docker container for model inference, which provides an OpenAI-compatible API - and is widely popular)
* Two models:
  * Starcoder2 for code completion; specifically `bigcode/starcoder2-15b` with `eetq` 8-bit quantization.
  * Mixtral 8x7b instruct for chat; specifically `casperhansen/mixtral-instruct-awq` which uses `awq` 4-bit quantization.
This is our 'starter' configuration. Other models - specifically other
Starcoder 2 and Mixtral instruct variants - certainly work too, and
higher-parameter versions may of course provide better results.
Documentation for how to deploy Huggingface TGI, suggested configuration
and debugging tips - coming soon.
## Advanced configuration
As part of this effort, I have added a quite extensive set of
configuration knobs to the client-side model configuration (see `type
ClientSideModelConfigOpenAICompatible` in this PR).
Some of these configuration options are needed for things to work at a
basic level, while others (e.g. prompt customization) are not needed for
basic functionality, but are very important for customers interested in
self-hosting their own models.
Today, Cody clients have a number of different _autocomplete provider
implementations_ that tie the model-specific logic needed for
autocomplete to a particular provider. For example, if you use a GPT
model through Azure OpenAI, the autocomplete provider is entirely
different from what you'd get if you used a GPT model through OpenAI
directly. This can lead to some subtle issues for us, so it is worth
exploring ways to have a _generalized autocomplete provider_. Since
self-hosted models _force_ us to address this problem, the configuration
knobs fed to the client from the server are a pathway to doing that -
initially just for self-hosted models, but in the future possibly
generalized to other providers.
## Debugging facilities
Working with customers to adopt OpenAI-compatible APIs in the past,
we've learned that debugging can be quite a pain: if you can't see what
requests the Sourcegraph backend is making and what it is getting back,
it's hard to figure out what's going wrong.
This PR implements quite extensive logging, and a `debugConnections`
flag which can be turned on to enable logging of the actual request
payloads and responses. This is critical when a customer is trying to
add support for a new model, their own custom OpenAI API service, etc.
## Robustness
Working with customers in the past, we also learned that various parts
of our backend `openai` provider were not super robust. For example, [if
more than one message was present it was a fatal
error](https://github.com/sourcegraph/sourcegraph/blob/main/internal/completions/client/openai/openai.go#L305),
and if the SSE stream yielded `{"error"}` payloads, they were silently
ignored. Similarly, the SSE event stream parser we use is heavily
tailored towards [the exact response
structure](https://github.com/sourcegraph/sourcegraph/blob/main/internal/completions/client/openai/decoder.go#L15-L19)
which OpenAI's official API returns, and is therefore quite brittle when
connecting to a different SSE stream.
For this work, I have _started by forking_ our
`internal/completions/client/openai` - and made a number of major
improvements to it to make it more robust, handle errors better, etc.
I have also replaced the usage of a custom SSE event stream parser -
which was not spec compliant and brittle - with a proper SSE event
stream parser that recently popped up in the Go community:
https://github.com/tmaxmax/go-sse
My intention is that after more extensive testing, this new
`internal/completions/client/openaicompatible` provider will be more
robust, more correct, and all around better than
`internal/completions/client/openai` (and possibly the azure one) so
that we can just supersede those with this new `openaicompatible` one
entirely.
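To make the robustness concerns concrete, here is a hedged sketch of the kind of handling the new client aims for when reading an OpenAI-compatible SSE stream - surfacing in-band `{"error"}` payloads instead of dropping them. The struct fields and function are illustrative only, not the actual `openaicompatible` code (which uses the go-sse parser mentioned above):
```go
package openaicompatible

import (
    "bufio"
    "encoding/json"
    "fmt"
    "io"
    "strings"
)

// chunk models only the parts of an OpenAI-compatible streaming payload that
// matter for this sketch; the "error" field is what the old client dropped.
type chunk struct {
    Error *struct {
        Message string `json:"message"`
    } `json:"error"`
    Choices []struct {
        Delta struct {
            Content string `json:"content"`
        } `json:"delta"`
    } `json:"choices"`
}

// readStream scans "data:" lines from an SSE body, surfaces in-band error
// payloads instead of ignoring them, and stops cleanly at the [DONE] sentinel.
func readStream(r io.Reader, onContent func(string)) error {
    sc := bufio.NewScanner(r)
    for sc.Scan() {
        line := strings.TrimSpace(sc.Text())
        if !strings.HasPrefix(line, "data:") {
            continue // event names, comments, and blank separators
        }
        data := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
        if data == "[DONE]" {
            return nil
        }
        var c chunk
        if err := json.Unmarshal([]byte(data), &c); err != nil {
            return fmt.Errorf("decode SSE payload: %w", err)
        }
        if c.Error != nil {
            return fmt.Errorf("upstream error: %s", c.Error.Message)
        }
        for _, choice := range c.Choices {
            onContent(choice.Delta.Content)
        }
    }
    return sc.Err()
}
```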
## Client implementation
Much of the work done in this PR is just "let the site admin configure
things, and broadcast that config to the client through the new model
config system."
Actually getting the clients to respect the new configuration is a task
I am tackling in future `sourcegraph/cody` PRs.
## Test plan
1. This change currently lacks any unit/regression tests, which is a
major noteworthy point. I will follow up with those in a future PR.
* However, these changes are **incredibly** isolated, clearly only
affecting customers who opt in to this new self-hosted models
configuration.
* Most of the heavy lifting (SSE streaming, shuffling data around) is
done in other well-tested codebases.
2. Manual testing has played a big role here, specifically:
* Running a dev instance with the new configuration, actually connected
to Huggingface TGI deployed on a remote server.
* Using the new `debugConnections` mechanism (which customers would use)
to directly confirm requests are going to the right places, with the
right data and payloads.
* Confirming with a new client (changes not yet landed) that
autocomplete and chat functionality work.
Can we use more testing? Hell yeah, and I'm going to add it soon. Does
it work quite well, with little room for error? Also yes.
## Changelog
Cody Enterprise: added a new configuration for self-hosting models.
Reach out to support if you would like to use this feature as it is in
early access.
---------
Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
Secrets fetched from GSM should probably not be stored locally. As we
increase our usage of fetched external secrets, this data becomes
increasingly sensitive, particularly for SAMS - every time a secret is
used, we should ensure that the user has the required permissions, and
we should only store external secrets in memory.
It looks like several other callsites rely on the persistence of
other secrets, e.g. those prompted from users, so this change
specifically targets the `GetExternal` method. Additionally, I added a
check on load that deletes any legacy external secrets stored on disk -
we can remove this after a few weeks.
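As a rough illustration only - the types and method signature below are made up, not `sg`'s actual secrets store - the intended behaviour is that external secrets live in an in-memory map for the lifetime of the process and are never written to the on-disk store:
```go
package secrets

import "context"

// Store is an illustrative stand-in for sg's secrets store.
type Store struct {
    persisted map[string]string // saved to disk elsewhere (prompted secrets, etc.)
    external  map[string]string // in-memory only; never written to disk
}

// GetExternal fetches an external secret (e.g. from GSM), caching it only in
// memory so nothing sensitive is persisted between sg invocations.
func (s *Store) GetExternal(ctx context.Context, name string, fetch func(context.Context) (string, error)) (string, error) {
    if v, ok := s.external[name]; ok {
        return v, nil
    }
    v, err := fetch(ctx) // re-fetching implies the user still has the required permissions
    if err != nil {
        return "", err
    }
    if s.external == nil {
        s.external = map[string]string{}
    }
    s.external[name] = v
    return v, nil
}
```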
## Test plan
Unit tests assert the old behaviour and the new desired behaviour
`sg start -cmd cody-gateway` uses external secrets and works as expected
After running `sg`, `sg secret list` has no external secrets anymore
This PR stubs out the URI needed for the React UI to interface with the
appliance, and removes the previously implemented UI and React
components that were only around for a demo.
A number of helper and safety methods have also been added for
interfacing with JSON reads/writes and handling common errors.
While the HTTP handlers are still only stubs, this PR was growing in
size so I decided to cut it here and break apart the rest in upcoming
PRs. React UI is able to parse status and auth correctly at this time.
## Test plan
Unit tests
## Changelog
This is intended for simple, MVP integrations with Telemetry Gateway,
and also serves as a reference. See the example - it can be consumed with:
```go
import telemetrygatewayv1 "github.com/sourcegraph/sourcegraph/lib/telemetrygateway/v1"
```
## Test plan
n/a
The OTEL upgrade https://github.com/sourcegraph/sourcegraph/pull/63171
bumps the `prometheus/common` package too far via transitive deps,
causing us to generate configuration for Alertmanager that Alertmanager
doesn't accept, at least until the Alertmanager project cuts a new
release with a newer version of `prometheus/common`.
For now we forcibly downgrade with a `replace` directive. Everything
still builds, so we should be good to go.
## Test plan
`sg start` and `sg run prometheus`. On `main`, editing
`observability.alerts` will cause Alertmanager to refuse to accept the
generated configuration. With this patch, all seems well - config
changes go through as expected. This is a similar test plan to the one in
https://github.com/sourcegraph/sourcegraph/pull/63329
## Changelog
- Fix Prometheus Alertmanager configuration failing to apply
`observability.alerts` from site config
We created a decoder that was never used, and the package is otherwise
unused. It recently had a CVE, so this just removes it so it's no longer
part of our security surface area.
Removes the existing `sg analytics` command and replaces it with a
one-event-per-invocation, SQLite-backed approach. This is local storage
for invocation events before they're pushed to BigQuery.
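A minimal sketch of the storage shape. The column names are guesses from the dump in the test plan below, and the driver choice and helper names are assumptions rather than the actual `sg` code:
```go
package analytics

import (
    "database/sql"
    "encoding/json"
    "time"

    _ "github.com/mattn/go-sqlite3" // assumption: any SQLite driver works here
)

// invocation is a hypothetical shape for one sg invocation's event payload.
type invocation struct {
    Command string    `json:"command"`
    Args    []string  `json:"args"`
    EndTime time.Time `json:"end_time"`
    Success bool      `json:"success"`
}

// store appends one row per invocation; rows are later drained to BigQuery.
func store(dbPath, id string, version int, ev invocation) error {
    db, err := sql.Open("sqlite3", dbPath)
    if err != nil {
        return err
    }
    defer db.Close()
    if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS analytics (id TEXT, version INTEGER, payload TEXT)`); err != nil {
        return err
    }
    payload, err := json.Marshal(ev)
    if err != nil {
        return err
    }
    _, err = db.Exec(`INSERT INTO analytics VALUES (?, ?, ?)`, id, version, string(payload))
    return err
}
```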
## Test plan
```
sqlite> select * from analytics;
0190792e-af38-751a-b93e-8481290a18b6|1|{"args":[],"command":"sg help","flags":{"help":null,"sg":null},"nargs":0,"end_time":"2024-07-03T15:20:21.069837706Z","success":true}
0190792f-4e2b-7c35-98d6-ad73cab82391|1|{"args":["dotcom"],"command":"sg live","flags":{"live":null,"sg":null},"nargs":1,"end_time":"2024-07-03T15:21:04.563232429Z","success":true}
```
## Changelog
---------
Co-authored-by: William Bezuidenhout <william.bezuidenhout@sourcegraph.com>
Adds a new `postgreSQL.logicalReplication` configuration to allow MSP to
generate prerequisite setup for integration with Datastream:
https://cloud.google.com/datastream/docs/sources-postgresql. Integration
with Datastream allows the Data Analytics team to self-serve data
enrichment needs for the Telemetry V2 pipeline.
Enabling this feature entails downtime (Cloud SQL instance restart), so
enabling the logical replication feature at the Cloud SQL level
(`cloudsql.logical_decoding`) is gated behind
`postgreSQL.logicalReplication: {}`.
Setting up the required pieces in Postgres is a bit complicated,
requiring 3 Postgres provider instances:
1. The default admin one, authenticated with our admin user
2. New: a workload identity provider, using
https://github.com/cyrilgdn/terraform-provider-postgresql/pull/448 /
https://github.com/sourcegraph/managed-services-platform-cdktf/pull/11.
This is required for creating a publication on selected tables, which
requires being the owner of said tables. Because tables are created by
the application using e.g. auto-migrate, the workload identity is always
the table owner, so we need to impersonate the IAM user.
3. New: a "replication user" which is created with the replication
permission. Replication does not seem to be a propagated permission, so
we need a role/user that has replication enabled.
A bit more context scattered here and there in the docstrings.
Beyond the Postgres configuration we also introduce some additional
resources to enable easy Datastream configuration:
1. Datastream Private Connection, which peers to the service private
network
2. Cloud SQL Proxy VM, which only allows connections to `:5432` from the
range specified in 1, allowing a connection to the Cloud SQL instance
3. Datastream Connection Profile attached to 1
From there, the Data Analytics team can click-ops or manage the
Datastream Stream and BigQuery destination on their own.
Closes CORE-165
Closes CORE-212
Sample config:
```yaml
resources:
  postgreSQL:
    databases:
      - "primary"
    logicalReplication:
      publications:
        - name: testing
          database: primary
          tables:
            - users
```
## Test plan
https://github.com/sourcegraph/managed-services/pull/1569
## Changelog
- MSP services can now configure `postgreSQL.logicalReplication` to
enable Data Analytics team to replicate selected database tables into
BigQuery.
**feat(appliance): landing page with no-op authorization check**
Refactor appliance HTTP handler functions to return http.Handlers,
rather than being type-converted to a particular implementation of
Handler (HandlerFunc). This is done to make the CheckAuthorization
middleware (introduced here as a no-op) slightly more idiomatic,
operating on the more general http.Handler rather than on one particular
implementing type.
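A hedged sketch of the shape of that refactor; the handler and middleware names here are illustrative, not the appliance's exact code:
```go
package appliance

import "net/http"

// checkAuthorization is the middleware introduced here as a no-op. Because it
// operates on the general http.Handler interface, it composes with any handler.
func checkAuthorization(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // No-op for now; a real implementation would validate the session here.
        next.ServeHTTP(w, r)
    })
}

// Handler functions now return http.Handler instead of being declared as
// http.HandlerFunc, so wrapping them is straightforward:
func getStatus() http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        _, _ = w.Write([]byte(`{"status":"ok"}`))
    })
}

// Registration then looks like:
//   mux.Handle("/api/v1/status", checkAuthorization(getStatus()))
```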
**feat(appliance): admin password gates most pages**
- /appliance and /appliance/setup redirect to the login page if the
browser does not present a cookie containing a valid JWT for the
appliance.
- On first boot, the appliance generates a JWT signing key and saves it
in a backing secret.
- The admin must create a particularly-named secret containing the
password, which on first boot will be hashed and stored in the same
backing secret that holds the JWT key.
- This is documented internally as we do not yet have a place for
user-facing docs pre-release.
- The password is checked for strength, and a new one must be
configured by a Kubernetes admin if it is insufficiently strong (a rough
sketch of this flow follows the list).
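The sketch referenced above: bcrypt, the strength rule, and all names are assumptions for illustration, not the appliance's actual implementation:
```go
package appliance

import (
    "crypto/rand"
    "errors"

    "golang.org/x/crypto/bcrypt"
)

// bootstrapSecrets hashes the admin-provided password and generates a JWT
// signing key; both would be written to the backing secret on first boot.
func bootstrapSecrets(adminPassword string) (passwordHash, jwtSigningKey []byte, err error) {
    if len(adminPassword) < 12 { // illustrative strength rule only
        return nil, nil, errors.New("password too weak: must be at least 12 characters")
    }
    passwordHash, err = bcrypt.GenerateFromPassword([]byte(adminPassword), bcrypt.DefaultCost)
    if err != nil {
        return nil, nil, err
    }
    jwtSigningKey = make([]byte, 32)
    if _, err = rand.Read(jwtSigningKey); err != nil {
        return nil, nil, err
    }
    return passwordHash, jwtSigningKey, nil
}
```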
---
Closes
https://linear.app/sourcegraph/issue/REL-20/maintenance-ui-admin-must-configure-initial-password-on-first-boot
I recommend reviewing this as 2 commits, since the first is a refactor
that enables the second. The second is quite large though (sorry!)
## Test plan
Unit tests included for authorization middleware. Browser testing split
to
https://linear.app/sourcegraph/issue/REL-216/appliance-frontend-testing.
Manual tests on some UI flows:
1. Booting the appliance against an empty namespace and navigating to
localhost:8080 displays an error message.
2. Creating the password secret as instructed and navigating to the
appliance shows a login form.
3. Entering the password incorrectly shows an error message.
4. Entering the password correctly grants access to the setup page, from
which you can proceed to install SG.
Please kick the tires on these too!
## Changelog
---------
Co-authored-by: Anish Lakhwara <anish+git@lakhwara.com>
Co-authored-by: Anish Lakhwara <anish+github@lakhwara.com>
Description:
This PR introduces support for counting tokens within the Azure code and
updating these counts in Redis. The token counting logic is embedded
directly in the Azure code rather than at a standardized point shared by
all providers.
Reasoning:
• Azure does not currently support obtaining token usage from their
streaming endpoint, unlike OpenAI.
• To enable immediate functionality, the token counting logic is placed
within the Azure code itself.
• The implementation supports GPT-4o.
Future Considerations:
• When Azure eventually adds support for token usage from the streaming
endpoint, we will migrate to using Azure’s built-in capabilities.
• This will ensure full utilization of Azure OpenAI features as they
achieve parity with OpenAI.
Changes:
• Added token counting logic to the Azure code.
• Updated Redis with the token counts.
Testing:
• Verified the implementation works with GPT-4o.
Conclusion:
This is a temporary solution to enable token counting in Azure. We will
adapt our approach as Azure enhances its feature set to include token
usage from their streaming endpoint.
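A hedged sketch of the overall shape: a locally computed token count accumulated in Redis, because the streaming endpoint reports no usage. The tokenizer stand-in, key names, and Redis client choice are assumptions, not the code in this PR:
```go
package azure

import (
    "context"

    "github.com/redis/go-redis/v9"
)

// countTokens is a stand-in for a real GPT-4o tokenizer; the actual change
// counts tokens inside the Azure client since Azure's streaming endpoint
// does not report usage.
func countTokens(text string) int { return len([]rune(text)) / 4 } // crude estimate for illustration

// recordUsage accumulates prompt/completion token counts in Redis so usage is
// tracked even without Azure-provided usage data. Key names are made up.
func recordUsage(ctx context.Context, rdb *redis.Client, feature, prompt, completion string) error {
    if err := rdb.IncrBy(ctx, "completions:"+feature+":prompt_tokens", int64(countTokens(prompt))).Err(); err != nil {
        return err
    }
    return rdb.IncrBy(ctx, "completions:"+feature+":completion_tokens", int64(countTokens(completion))).Err()
}
```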
## Test plan
Tested locally with debugger
## Changelog
This replaces our small internal slices package with an open-source
package that has more features.
Ran:
```
sg bazel configure
bazel run //:gazelle-update-repos
```
Between writing that little slices package and wanting more from it in
an upcoming PR, I found out that this third-party library existed. It
seems very good.
[Linear
Issue](https://linear.app/sourcegraph/project/claude-3-on-gcp-8c014e1a3506/overview)
This PR adds support for Anthropic models in the Google provider through
Google Vertex.
NOTE: The current code only supported the Google Gemini API and had
boilerplate code for Google Vertex (only for the Gemini model). This PR
adds proper Google Vertex support for Anthropic models, so the Google
provider can now be run in 3 different configurations:
1. Google Gemini API (this works, but only for chat and not for
completions, which is the intended behaviour for now)
2. Google Vertex API, Anthropic model (this works perfectly, is added in
this PR, and has been tested for both chat and completions)
3. Google Vertex API, Gemini model (this doesn't work yet and can
eventually be added, but we'd need a new decoder for the streaming
responses of the Gemini model through this API; we can take care of this
later)
Sense of urgency: this is a P0 because of enterprise requirements, so I
would appreciate a fast approval and merge.
## Test plan
- Run this branch for Cody instance ->
https://github.com/sourcegraph/cody/pull/4606
- Ask @arafatkatze to dm you the siteadmin config to make things work
- Check the logs and play with completions and chat
## Changelog
---------
Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
Co-authored-by: Beatrix <beatrix@sourcegraph.com>
Co-authored-by: Stephen Gutekanst <stephen@sourcegraph.com>
- Updates enry to include languages from Linguist v7.29.0, such as Mojo.
- Updates auto-complete filters in frontend code.
- Updates Zoekt to pick up newer version with bumped enry dep.
- Updates language extension overrides to avoid ambiguity for `.json` and `.yml`.
- Updates snapshot tests.
Closes CORE-99, closes CORE-176
This PR is based off (and also served as a PoC of) [RFC 962: MSP IAM
framework](https://docs.google.com/document/d/1ItJlQnpR5AHbrfAholZqjH8-8dPF1iQcKh99gE6SSjs/edit).
It comes with two main parts:
1. The initial version of the MSP IAM SDK:
`lib/managedservicesplatform/iam`
- Embeds the [OpenFGA server
implementation](https://github.com/openfga/openfga/tree/main/pkg/server)
and exposes a `ClientV1` for interacting with it.
- Automagically manages both MSP IAM's and OpenFGA's database
migrations upon initializing the `ClientV1`.
- Ensures the specified OpenFGA store and authorization model DSL
exist.
- Utility types and helpers to avoid easy mistakes (i.e. make the
relation tuples a bit more strongly-typed) - see the sketch after this
list.
- Decided to put all types and pre-defined values together to simulate a
"central registry", acting as a forcing function for services to form
some sort of convention. Then, when we migrate the OpenFGA server to a
separate standalone service, there will be less headache consolidating
types/relations that have similar meanings but different string
literals.
2. The first use case of the MSP IAM:
`cmd/enterprise-portal/internal/subscriptionsservice`
- Added/updated RPCs:
  - Listing enterprise subscriptions via permissions
  - Updating enterprise subscriptions to assign instance domains
  - Updating enterprise subscription membership to assign roles (and
permissions)
- A database table for enterprise subscriptions, only storing the extra
instance domains, as Enterprise Portal is not the writeable source of
truth.
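The sketch referenced above, illustrating what "more strongly-typed" relation tuples mean in practice. The type and constant names are invented for illustration; the real ones live in `lib/managedservicesplatform/iam`:
```go
package iam

import "fmt"

// TupleType and TupleRelation are small enum-like types so that relation
// tuples are assembled from pre-defined values rather than ad-hoc strings.
type (
    TupleType     string
    TupleRelation string
)

const (
    TupleTypeUser                      TupleType     = "user"
    TupleTypeSubscriptionCodyAnalytics TupleType     = "subscription_cody_analytics"
    TupleRelationViewer                TupleRelation = "view"
)

// TupleObject renders the "type:id" form that OpenFGA expects, keeping the
// string assembly in one place.
func TupleObject(t TupleType, id string) string { return fmt.Sprintf("%s:%s", t, id) }

// Example: a tuple granting a SAMS user the viewer relation on a
// subscription's Cody analytics could then be built from
// TupleObject(TupleTypeUser, samsAccountID), TupleRelationViewer, and
// TupleObject(TupleTypeSubscriptionCodyAnalytics, subscriptionID).
```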
## Other minor changes
- Moved `internal/redislock` to `lib/redislock` to be used in the MSP
IAM SDK.
- Call `createdb ...` as part of the `enterprise-portal` install script
in `sg.config.yaml` (the `msp_iam` database is a hard requirement of the
MSP IAM framework).
## Test plan
Tested with gRPC UI:
- `UpdateEnterpriseSubscription` to assign an instance domain
- `UpdateEnterpriseSubscriptionMembership` to assign roles
- `ListEnterpriseSubscriptions`:
- List by subscription ID
- List by instance domain
- List by view cody analytics permissions
---------
Co-authored-by: Robert Lin <robert@bobheadxi.dev>
This change extracts the unrelated transitive upgrades of
https://github.com/sourcegraph/sourcegraph/pull/63171 (CORE-177) into a
separate PR. I'm making this because @unknwon ran into issues with the
exact same dependencies in
https://github.com/sourcegraph/sourcegraph/pull/63171#issuecomment-2157694545.
The change consists of upgrades to:
- `google.golang.org/grpc` - there's a deprecation of `grpc.DialContext`
that we agreed in #63171 to keep for now.
- removing our `replace` directive on `github.com/prometheus/common` and
upgrading it. This is safe to do because our Alertmanager version is
already way ahead, and the reason this has a `replace` is outdated now.
## Test plan
CI, nothing blows up on `sg start` and I can click around and do a bit
of searching
Updates the Goldmark markdown renderer to v1.7.2, which includes
https://github.com/yuin/goldmark/pull/455, fixing the issue with
single-tilde strikethroughs not rendering as strikethroughs as described
by the GitHub Flavored Markdown spec.
There was a goroutine leak in the Linux observation logic:
`add4baa455/internal/memcmd/observer_linux.go` (L122-L141)
In observe, it was possible for the goroutine that produces collection
events on a channel to block forever: if we exited early from the
function, the channel could still be full (because the consumer exited
early), which caused the producer goroutine to block forever since there
was no room in the channel.
This PR fixes the issue by adding a `defer` statement that ensures the
collection channel is drained before we exit, removing the leak.
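The pattern of the fix, sketched (not the exact `memcmd` code): the deferred drain keeps reading until the producer closes the channel, so a producer blocked on a full buffer can always finish its send.
```go
package memcmdsketch

import "fmt"

type event struct{ rssBytes int64 }

// handle stands in for processing one memory-collection sample.
func handle(e event) error {
    if e.rssBytes < 0 {
        return fmt.Errorf("bad sample")
    }
    return nil
}

// consume reads events produced by the polling goroutine, which closes the
// channel when the observed command exits. Even if we return early on error,
// the deferred drain guarantees the producer never blocks forever on a send.
func consume(events <-chan event) error {
    defer func() {
        for range events {
            // discard any remaining events so the producer can finish and exit
        }
    }()
    for e := range events {
        if err := handle(e); err != nil {
            return err
        }
    }
    return nil
}
```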
## Test plan
Added a goroutine leak detector to the Linux tests - they now pass.
## Changelog
A goroutine leak in the experimental Linux memory observation logic has
been fixed.
This PR adds a new package, memcmd, which provides an abstraction called
"Observer" that allows you to track the memory that a command (and all
of its children) is using. (This package uses a polling approach with
procfs, since [maxRSS on Linux is otherwise
unreliable](https://jkz.wtf/random-linux-oddity-1-ru_maxrss) for our
purposes.)
Example usage
```go
import (
    "context"
    "fmt"
    "os/exec"
    "time"

    "github.com/sourcegraph/sourcegraph/internal/memcmd"
)

func Example() {
    const template = `
#!/usr/bin/env bash
set -euo pipefail
word=$(head -c "$((10 * 1024 * 1024))" </dev/zero | tr '\0' '\141') # 10MB worth of 'a's
sleep 1
echo ${#word}
`
    cmd := exec.Command("bash", "-c", template)

    err := cmd.Start()
    if err != nil {
        panic(err)
    }

    observer, err := memcmd.NewLinuxObserver(context.Background(), cmd, 1*time.Millisecond)
    if err != nil {
        panic(err)
    }

    observer.Start()
    defer observer.Stop()

    err = cmd.Wait()
    if err != nil {
        panic(err)
    }

    memoryUsage, err := observer.MaxMemoryUsage()
    if err != nil {
        panic(err)
    }

    fmt.Println((0 < memoryUsage && memoryUsage < 50*1024*1024)) // Output should be between 0 and 50MB

    // Output:
    // true
}
```
## Test plan
Unit tests
Note that some tests only work on darwin, so you'll have to run those
locally.
## Changelog
This feature adds a package that allows us to track the memory usage of
commands invoked via exec.Cmd.
---------
Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
Part of CORE-99
This PR scaffolds the database schema and code structure based on a
[CORE-99
comment](https://linear.app/sourcegraph/issue/CORE-99/enterprise-portal-design-sams-user-to-subscription-rpcs#comment-8105ac31)
with some modifications. See the inline comments for more elaboration.
- It uses GORM ONLY for auto-migration, just to kick things off; we may
migrate to file-based migrations like we are planning for SAMS.
- It then uses the `*pgxpool.Pool` as the DB interface for executing
business logic queries.
Additionally, refactored `subscriptionsservice/v1.go` to use a `Store`
that provides a single interface for accessing the data(base), as we
have been doing in SAMS and SSC.
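A minimal sketch of that split, assuming stock GORM and pgx APIs; the model and function names are illustrative, not the Enterprise Portal code:
```go
package database

import (
    "context"

    "github.com/jackc/pgx/v5/pgxpool"
    "gorm.io/driver/postgres"
    "gorm.io/gorm"
)

// Subscription is an illustrative model; the real schema lives in the
// Enterprise Portal database package.
type Subscription struct {
    ID             string `gorm:"primaryKey"`
    InstanceDomain string
}

// newStore sketches the split described above: GORM is used only to
// auto-migrate the schema, while business-logic queries go through
// *pgxpool.Pool directly.
func newStore(ctx context.Context, dsn string) (*pgxpool.Pool, error) {
    gormDB, err := gorm.Open(postgres.Open(dsn), &gorm.Config{})
    if err != nil {
        return nil, err
    }
    if err := gormDB.AutoMigrate(&Subscription{}); err != nil {
        return nil, err
    }
    return pgxpool.New(ctx, dsn)
}
```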
## Test plan
Enterprise Portal starts locally, and the database is initialized.
Deleting Notion pages takes a very long time, and is prone to breaking in the page deletion step, where we must delete blocks one at a time because Notion does not allow for bulk block deletions. The errors seem to generally just be random Notion internal errors. This is very bad because it leaves go/msp-ops pages in an unusable state.
To try to mitigate this, we blindly retry in several places:
1. At the Notion SDK level, where a config option is available for retrying 429 errors
2. At the "reset page" helper level, where a failure to reset a page will prompt a retry of the whole helper
3. At the "delete blocks" helper level, where individual block deletion failures will be retried
Attempt to mitigate https://linear.app/sourcegraph/issue/CORE-119
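The "blindly retry" helpers are roughly of this shape - a sketch only; the real helpers wrap the Notion SDK calls and have their own backoff, and the package/function names here are made up:
```go
package ops

import (
    "fmt"
    "time"
)

// retry runs do up to attempts times, sleeping a little longer between each
// try, and returns the last error if every attempt fails.
func retry(attempts int, do func() error) error {
    var err error
    for i := 0; i < attempts; i++ {
        if err = do(); err == nil {
            return nil
        }
        time.Sleep(time.Duration(i+1) * time.Second) // simple linear backoff
    }
    return fmt.Errorf("after %d attempts: %w", attempts, err)
}
```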
While here, I also made some other QOL tweaks:
- Fix timing of sub-tasks in CLI output
- Bump default concurrency to 5 (our retries will handle if this is too aggressive, hopefully)
- Fix a missing space in generated docs
## Test plan
```
sg msp ops generate-handbook-pages
```