sourcegraph/lib/background/background.go
Robert Lin 7e9d8ec8dc
feat/cody-gateway: use Enterprise Portal for actor/productsubscriptions (#62934)
Migrates Cody Gateway to use the new Enterprise Portal's "read-only"
APIs. For the most part this is an in-place replacement; most of the
diff is in tests and minor changes. Some changes, such as the removal
of model allowlists, were made further down the PR stack in
https://github.com/sourcegraph/sourcegraph/pull/62911.

At a high level, we replace the data previously requested via
`cmd/cody-gateway/internal/dotcom/operations.graphql` with Enterprise
Portal RPCs (see the sketch after this list):

- `codyaccessv1.GetCodyGatewayAccess`
- `codyaccessv1.ListCodyGatewayAccesses`
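
For illustration, here is a minimal sketch of what consuming the new RPCs
could look like. The service client constructor, request message, and
response accessors below assume standard protoc-gen-go(-grpc) naming for
the `codyaccessv1` package; they are assumptions, not the exact generated
API:

```go
func listAccesses(ctx context.Context, conn *grpc.ClientConn) error {
	// Assumed client constructor name, following protoc-gen-go-grpc conventions.
	client := codyaccessv1.NewCodyAccessServiceClient(conn)
	resp, err := client.ListCodyGatewayAccesses(ctx,
		&codyaccessv1.ListCodyGatewayAccessesRequest{})
	if err != nil {
		return errors.Wrap(err, "ListCodyGatewayAccesses")
	}
	// Each returned access replaces data previously assembled from the
	// dotcom GraphQL response (assumed accessor name).
	for _, access := range resp.GetAccesses() {
		_ = access // hydrate Cody Gateway actor state from the access record
	}
	return nil
}
```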

Use cases that previously required retrieving the active license tags
now:

1. Use the display name provided by the Cody Access API
https://github.com/sourcegraph/sourcegraph/pull/62968
2. Depend on the connected Enterprise Portal dev instance to only return
dev subscriptions https://github.com/sourcegraph/sourcegraph/pull/62966

Closes https://linear.app/sourcegraph/issue/CORE-98
Related to https://linear.app/sourcegraph/issue/CORE-135
(https://github.com/sourcegraph/sourcegraph/pull/62909,
https://github.com/sourcegraph/sourcegraph/pull/62911)
Related to https://linear.app/sourcegraph/issue/CORE-97

## Local development

This change also adds Enterprise Portal to `sg start dotcom`. For local
development, we set up Cody Gateway to connect to Enterprise Portal such
that zero configuration is needed - all the required secrets are sourced
from the `sourcegraph-local-dev` GCP project automatically when you run
`sg start dotcom`, and local Cody Gateway talks to local Enterprise
Portal to do the Enterprise subscriptions sync.

This is actually an upgrade over the current experience, where you need
to provide Cody Gateway with a Sourcegraph user access token to test
Enterprise locally. A Sourcegraph user access token is still required
for the PLG actor source, however.

The credential is configured in
https://console.cloud.google.com/security/secret-manager/secret/SG_LOCAL_DEV_SAMS_CLIENT_SECRET/overview?project=sourcegraph-local-dev,
and I've included documentation in the secret annotation about what it
is for and what to do with it:


![image](https://github.com/sourcegraph/sourcegraph/assets/23356519/c61ad4e0-3b75-408d-a930-076a414336fb)

## Rollout plan

I will open PRs to set up the necessary configuration for Cody Gateway
dev and prod. Once reviews taper down I'll cut an image from this branch
and deploy it to Cody Gateway dev, and monitor it closely + do some
manual testing. Once verified, I'll land this change and monitor a
rollout to production.

Cody Gateway dev SAMS client:
https://github.com/sourcegraph/infrastructure/pull/6108
Cody Gateway prod SAMS client update (this one already exists):

```
accounts=> UPDATE idp_clients
SET scopes = scopes || '["enterprise_portal::subscription::read", "enterprise_portal::codyaccess::read"]'::jsonb
WHERE id = 'sams_cid_018ea062-479e-7342-9473-66645e616cbf';
UPDATE 1
accounts=> select name, scopes from idp_clients WHERE name = 'Cody Gateway (prod)';
        name         |                                                              scopes                                                              
---------------------+----------------------------------------------------------------------------------------------------------------------------------
 Cody Gateway (prod) | ["openid", "profile", "email", "offline_access", "enterprise_portal::subscription::read", "enterprise_portal::codyaccess::read"]
(1 row)
```

Configuring the target Enterprise Portal instances:
https://github.com/sourcegraph/infrastructure/pull/6127

## Test plan

Start the new `dotcom` runset, now including Enterprise Portal, and
observe logs from both `enterprise-portal` and `cody-gateway`:

```
sg start dotcom
```

I reused the test plan from
https://github.com/sourcegraph/sourcegraph/pull/62911: set up Cody
Gateway external dependency secrets, then set up an enterprise
subscription + license with a high seat count (for a high quota), and
force a Cody Gateway sync:

```
curl -v -H 'Authorization: bearer sekret' http://localhost:9992/-/actor/sync-all-sources
```

The logs should indicate that the new sync against "local dotcom"
fetches the expected number of actors.

Using the local enterprise subscription's access token, we run the QA
test suite:

```sh
$ bazel test --runs_per_test=2 --test_output=all //cmd/cody-gateway/qa:qa_test --test_env=E2E_GATEWAY_ENDPOINT=http://localhost:9992 --test_env=E2E_GATEWAY_TOKEN=$TOKEN
INFO: Analyzed target //cmd/cody-gateway/qa:qa_test (0 packages loaded, 0 targets configured).
INFO: From Testing //cmd/cody-gateway/qa:qa_test (run 1 of 2):
==================== Test output for //cmd/cody-gateway/qa:qa_test (run 1 of 2):
PASS
================================================================================
INFO: From Testing //cmd/cody-gateway/qa:qa_test (run 2 of 2):
==================== Test output for //cmd/cody-gateway/qa:qa_test (run 2 of 2):
PASS
================================================================================
INFO: Found 1 test target...
Target //cmd/cody-gateway/qa:qa_test up-to-date:
  bazel-bin/cmd/cody-gateway/qa/qa_test_/qa_test
Aspect @@rules_rust//rust/private:clippy.bzl%rust_clippy_aspect of //cmd/cody-gateway/qa:qa_test up-to-date (nothing to build)
Aspect @@rules_rust//rust/private:rustfmt.bzl%rustfmt_aspect of //cmd/cody-gateway/qa:qa_test up-to-date (nothing to build)
INFO: Elapsed time: 13.653s, Critical Path: 13.38s
INFO: 7 processes: 1 internal, 6 darwin-sandbox.
INFO: Build completed successfully, 7 total actions
//cmd/cody-gateway/qa:qa_test                                            PASSED in 11.7s
  Stats over 2 runs: max = 11.7s, min = 11.7s, avg = 11.7s, dev = 0.0s

Executed 1 out of 1 test: 1 test passes.
```
2024-06-07 11:46:01 -07:00

235 lines
7.0 KiB
Go

package background

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"slices"
	"sync"
	"syscall"

	"github.com/sourcegraph/sourcegraph/lib/errors"
)

// Routine represents a background process: a long-running task with a
// graceful shutdown mechanism.
type Routine interface {
	// Name returns the human-readable name of the routine.
	Name() string
	// Start begins the long-running process. This routine may also implement a Stop
	// method that signals to this process that the application is going to shut down.
	Start()
	// Stop signals the Start method to stop accepting new work and complete its
	// current work. This method can, but is not required to, block until Start has
	// returned. The method should respect the context deadline passed to it for
	// proper graceful shutdown.
	Stop(ctx context.Context) error
}

// Monitor will start the given background routines in their own goroutines. If
// the given context is canceled or a signal is received, the Stop method of
// each routine will be called. This method blocks until the Stop methods of
// all routines have returned. Two signals will cause the app to shut down
// immediately.
//
// This function only returns once the routines have been signaled to stop,
// along with any errors collected while stopping them.
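//
// Example usage (illustrative; server and worker are hypothetical Routine
// implementations):
//
//	routines := []Routine{server, worker}
//	if err := Monitor(context.Background(), routines...); err != nil {
//		// handle shutdown errors
//	}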
func Monitor(ctx context.Context, routines ...Routine) error {
	signals := make(chan os.Signal, 2)
	signal.Notify(signals, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM)
	return monitorBackgroundRoutines(ctx, signals, routines...)
}

func monitorBackgroundRoutines(ctx context.Context, signals <-chan os.Signal, routines ...Routine) error {
	wg := &sync.WaitGroup{}
	startAll(wg, routines...)
	waitForSignal(ctx, signals)
	return stopAll(ctx, wg, routines...)
}

// startAll calls each routine's Start method in its own goroutine and registers
// each running goroutine with the given waitgroup. It DOES NOT wait for the
// routines to finish starting, so the caller must wait for the waitgroup (if
// desired).
func startAll(wg *sync.WaitGroup, routines ...Routine) {
	for _, r := range routines {
		t := r
		wg.Add(1)
		Go(func() { defer wg.Done(); t.Start() })
	}
}

// stopAll calls each routine's Stop method in its own goroutine and registers
// each running goroutine with the given waitgroup. It waits for all routines to
// stop or for the context to be canceled.
func stopAll(ctx context.Context, wg *sync.WaitGroup, routines ...Routine) error {
	var stopErrs error
	var stopErrsLock sync.Mutex
	for _, r := range routines {
		wg.Add(1)
		Go(func() {
			defer wg.Done()
			if err := r.Stop(ctx); err != nil {
				stopErrsLock.Lock()
				stopErrs = errors.Append(stopErrs,
					errors.Wrapf(err, "stop routine %q", errors.Safe(r.Name())))
				stopErrsLock.Unlock()
			}
		})
	}

	// Signal on done once all Stop calls have returned.
	done := make(chan struct{})
	go func() {
		wg.Wait()
		done <- struct{}{}
	}()

	select {
	case <-done:
		return stopErrs
	case <-ctx.Done():
		// The context was canceled before all routines stopped: surface any
		// errors collected so far alongside the context error.
		stopErrsLock.Lock()
		defer stopErrsLock.Unlock()
		if stopErrs != nil {
			return errors.Wrapf(ctx.Err(), "unable to stop routines gracefully with partial errors: %v", stopErrs)
		}
		return errors.Wrap(ctx.Err(), "unable to stop routines gracefully")
	}
}

// waitForSignal blocks until the given context is canceled or a signal has been
// received on the given channel. If two signals are received, os.Exit(0) will
// be called immediately.
func waitForSignal(ctx context.Context, signals <-chan os.Signal) {
	select {
	case <-ctx.Done():
		go exitAfterSignals(signals, 2)
	case <-signals:
		go exitAfterSignals(signals, 1)
	}
}

// exiter exits the process with a status code of zero. It is declared as a
// variable so that tests can replace it; otherwise a test run could abort the
// process without giving the calling program any indication that the tests
// did not in fact pass.
var exiter = func() { os.Exit(0) }

// exitAfterSignals waits for a number of signals on the given channel, then
// calls os.Exit(0) to exit the program.
func exitAfterSignals(signals <-chan os.Signal, numSignals int) {
	for range numSignals {
		<-signals
	}
	exiter()
}

// CombinedRoutine is a list of routines which are started and stopped in
// unison.
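//
// Example (illustrative; a and b are hypothetical Routines):
//
//	err := Monitor(ctx, CombinedRoutine{a, b})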
type CombinedRoutine []Routine

func (rs CombinedRoutine) Name() string {
	names := make([]string, 0, len(rs))
	for _, r := range rs {
		names = append(names, r.Name())
	}
	return fmt.Sprintf("combined%q", names) // ["one" "two" "three"] -> combined["one" "two" "three"]
}

// Start starts all routines. It does not wait for the routines to finish
// starting.
func (rs CombinedRoutine) Start() {
	startAll(&sync.WaitGroup{}, rs...)
}

// Stop attempts to gracefully stop all routines. It collects all the errors
// returned from the routines, respects the context deadline passed to it, and
// gives up waiting when the context deadline is exceeded.
func (rs CombinedRoutine) Stop(ctx context.Context) error {
	wg := &sync.WaitGroup{}
	return stopAll(ctx, wg, rs...)
}

// FIFOStopRoutine is a list of routines which are started in unison, but
// stopped sequentially in first-in-first-out order (the first Routine is
// stopped, and once it successfully stops, the next routine is stopped).
//
// This is the inverse of LIFOStopRoutine: it is useful for services where the
// first-listed routine should stop accepting new work before its downstream
// subprocessors are stopped for a graceful shutdown.
type FIFOStopRoutine []Routine

func (r FIFOStopRoutine) Name() string { return "fifo" }

func (r FIFOStopRoutine) Start() { CombinedRoutine(r).Start() }

func (r FIFOStopRoutine) Stop(ctx context.Context) error {
	// Reverse in place, then delegate to LIFOStopRoutine, which stops from the
	// end of the list - that is, in the original FIFO order.
	slices.Reverse(r)
	return LIFOStopRoutine(r).Stop(ctx)
}

// LIFOStopRoutine is a list of routines which are started in unison, but
// stopped sequentially in last-in-first-out order (the last Routine is
// stopped, and once it successfully stops, the next routine is stopped).
//
// This is useful for services where subprocessors should be stopped before the
// primary service stops for a graceful shutdown.
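//
// Example (illustrative; primary and subprocessor are hypothetical Routines):
//
//	// Stop stops subprocessor first, then primary.
//	r := LIFOStopRoutine{primary, subprocessor}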
type LIFOStopRoutine []Routine

func (r LIFOStopRoutine) Name() string { return "lifo" }

func (r LIFOStopRoutine) Start() { CombinedRoutine(r).Start() }

func (r LIFOStopRoutine) Stop(ctx context.Context) error {
	var stopErr error
	for i := len(r) - 1; i >= 0; i-- {
		err := r[i].Stop(ctx)
		if err != nil {
			stopErr = errors.Append(stopErr,
				errors.Wrapf(err, "stop routine %q", errors.Safe(r[i].Name())))
		}
	}
	return stopErr
}

// NoopRoutine returns a background routine that does nothing on start or stop.
// If the name is empty, it defaults to "noop".
func NoopRoutine(name string) Routine {
	if name == "" {
		name = "noop"
	}
	return CallbackRoutine{
		NameFunc: func() string { return name },
	}
}

// CallbackRoutine calls the StartFunc and StopFunc callbacks to implement a
// Routine. Each callback may be nil.
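//
// Example (illustrative; flushCache is a hypothetical cleanup function):
//
//	r := CallbackRoutine{
//		NameFunc: func() string { return "cache-flusher" },
//		StopFunc: func(ctx context.Context) error { return flushCache(ctx) },
//	}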
type CallbackRoutine struct {
	NameFunc  func() string
	StartFunc func()
	StopFunc  func(ctx context.Context) error
}

func (r CallbackRoutine) Name() string {
	if r.NameFunc != nil {
		return r.NameFunc()
	}
	return "callback"
}

func (r CallbackRoutine) Start() {
	if r.StartFunc != nil {
		r.StartFunc()
	}
}

func (r CallbackRoutine) Stop(ctx context.Context) error {
	if r.StopFunc != nil {
		return r.StopFunc(ctx)
	}
	return nil
}