sourcegraph/internal
Stephen Gutekanst dca1b9694d
self hosted models (#63899)
This PR is stacked on top of all the prior work @chrsmith has done for
shuffling configuration data around; it implements the new "Self hosted
models" functionality.

## Configuration

Configuring a Sourcegraph instance to use self-hosted models involves
adding configuration like the following to the site config. Setting
`modelConfiguration` opts you in to the new model configuration system,
which is in early access:

```
  // Setting this field means we are opting into the new Cody model configuration system.
  "modelConfiguration": {
    // Disable use of Sourcegraph's servers for model discovery
    "sourcegraph": null,

    // Create two model providers
    "providerOverrides": [
      {
        // Our first model provider "mistral" will be a Huggingface TGI deployment which hosts our
        // mistral model for chat functionality.
        "id": "mistral",
        "displayName": "Mistral",
        "serverSideConfig": {
          "type": "huggingface-tgi",
          "endpoints": [{"url": "https://mistral.example.com/v1"}]
        }
      },
      {
        // Our second model provider "bigcode" will be a Huggingface TGI deployment which hosts our
        // bigcode/starcoder model for code completion functionality.
        "id": "bigcode",
        "displayName": "Bigcode",
        "serverSideConfig": {
          "type": "huggingface-tgi",
          "endpoints": [{"url": "http://starcoder.example.com/v1"}]
        }
      }
    ],

    // Make these two models available to Cody users
    "modelOverridesRecommendedSettings": [
      "mistral::v1::mixtral-8x7b-instruct",
      "bigcode::v1::starcoder2-7b"
    ],

    // Configure which models Cody will use by default
    "defaultModels": {
      "chat": "mistral::v1::mixtral-8x7b-instruct",
      "fastChat": "mistral::v1::mixtral-8x7b-instruct",
      "codeCompletion": "bigcode::v1::starcoder2-7b"
    }
  }
```

More advanced configurations are possible; the above is our blessed
configuration for today.
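
As a rough illustration (not code from this PR), the model references used in `modelOverridesRecommendedSettings` and `defaultModels` above are three `::`-separated segments. The sketch below splits one apart; the `ModelRef` type, its field names, and the reading of the middle segment as a version identifier are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// ModelRef is a hypothetical type for this example only. It mirrors the
// "provider::version::model" shape of the strings in the config above.
type ModelRef struct {
	ProviderID string // e.g. "mistral", matching a provider's "id"
	Version    string // e.g. "v1" (assumed to be a version identifier)
	ModelID    string // e.g. "mixtral-8x7b-instruct"
}

// ParseModelRef splits a reference into its three segments and rejects
// strings that do not have exactly three non-empty parts.
func ParseModelRef(s string) (ModelRef, error) {
	parts := strings.Split(s, "::")
	if len(parts) != 3 {
		return ModelRef{}, fmt.Errorf("model reference %q: want 3 segments, got %d", s, len(parts))
	}
	for _, p := range parts {
		if p == "" {
			return ModelRef{}, fmt.Errorf("model reference %q: empty segment", s)
		}
	}
	return ModelRef{ProviderID: parts[0], Version: parts[1], ModelID: parts[2]}, nil
}

func main() {
	ref, err := ParseModelRef("mistral::v1::mixtral-8x7b-instruct")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("provider=%s version=%s model=%s\n", ref.ProviderID, ref.Version, ref.ModelID)
}
```

A check like this makes it easy to confirm that the provider segment of every reference matches an `id` declared under `providerOverrides`.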

## Hosting models

Another major component of this work is starting to build up
recommendations around how to self-host models, which ones to use, how
to configure them, etc.

For now, we've been testing with the following setup on a machine with
dual A100s:

* Huggingface TGI (a Docker container for model inference which
  provides an OpenAI-compatible API and is widely popular)
* Two models:
  * Starcoder2 for code completion; specifically `bigcode/starcoder2-15b`
    with `eetq` 8-bit quantization.
  * Mixtral 8x7b instruct for chat; specifically
    `casperhansen/mixtral-instruct-awq` which uses `awq` 4-bit quantization.

This is our 'starter' configuration. Other models, specifically other
Starcoder2 and Mixtral instruct variants, certainly work too, and
higher-parameter versions may provide better results.

Documentation for how to deploy Huggingface TGI, suggested configuration,
and debugging tips is coming soon.

## Advanced configuration

As part of this effort, I have added an extensive set of
configuration knobs to the client side model configuration (see `type
ClientSideModelConfigOpenAICompatible` in this PR).

Some of these configuration options are needed for things to work at a
basic level, while others (e.g. prompt customization) are not needed for
basic functionality, but are very important for customers interested in
self-hosting their own models.

Today, Cody clients have a number of different _autocomplete provider
implementations_ which tie the model-specific logic needed for
autocomplete to a particular provider. For example, if you use a GPT
model through Azure OpenAI, the autocomplete provider is entirely
different from the one you'd get using a GPT model through OpenAI
directly. This can lead to subtle issues for us, so it is worth
exploring ways to have a _generalized autocomplete provider_. With
self-hosted models we _must_ address this problem, and these
configuration knobs fed to the client from the server are a pathway to
doing that: initially just for self-hosted models, but in the future
possibly generalized to other providers.
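
To make the idea of server-provided knobs concrete, here is a minimal sketch of how a generalized autocomplete path could be driven purely by configuration. It is not taken from this PR: `AutocompleteKnobs` and its fields are invented for illustration and do not reflect the actual fields of `ClientSideModelConfigOpenAICompatible`, and the real clients are not written in Go. The fill-in-the-middle template shown uses the StarCoder-style `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// AutocompleteKnobs is a hypothetical set of per-model settings that a
// server could send to clients. The names are illustrative only.
type AutocompleteKnobs struct {
	// FIMTemplate contains the placeholders {prefix} and {suffix}.
	FIMTemplate string
	// StopSequences end a completion early when they appear in the output.
	StopSequences []string
}

// BuildPrompt fills the template with the code before and after the cursor.
// strings.NewReplacer substitutes in a single pass, so placeholder-like text
// inside the user's code is not substituted a second time.
func BuildPrompt(k AutocompleteKnobs, prefix, suffix string) string {
	return strings.NewReplacer("{prefix}", prefix, "{suffix}", suffix).Replace(k.FIMTemplate)
}

// TrimAtStop cuts the completion at the earliest configured stop sequence.
func TrimAtStop(k AutocompleteKnobs, completion string) string {
	cut := len(completion)
	for _, s := range k.StopSequences {
		if s == "" {
			continue
		}
		if i := strings.Index(completion, s); i >= 0 && i < cut {
			cut = i
		}
	}
	return completion[:cut]
}

func main() {
	knobs := AutocompleteKnobs{
		FIMTemplate:   "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>",
		StopSequences: []string{"\n\n"},
	}
	fmt.Println(BuildPrompt(knobs, "func add(a, b int) int {\n\treturn ", "\n}"))
	fmt.Println(TrimAtStop(knobs, "a + b\n\nfunc unrelated() {}"))
}
```

With this shape, supporting a different model means changing the knobs on the server rather than adding another provider-specific code path on the client.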

## Debugging facilities

Working with customers in the past to use OpenAI-compatible APIs, we've
learned that debugging can be quite a pain. If you can't see what
requests the Sourcegraph backend is making and what it is getting
back, it is hard to tell where things are going wrong.

This PR implements quite extensive logging, and a `debugConnections`
flag which can be turned on to enable logging of the actual request
payloads and responses. This is critical when a customer is trying to
add support for a new model, their own custom OpenAI API service, etc.
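
For readers unfamiliar with this kind of facility, the sketch below shows one common way to build it in Go: an `http.RoundTripper` wrapper that dumps requests and responses when a flag is set. This is not the implementation in this PR; `debugTransport`, its fields, and the `demo` helper are hypothetical names, and the example talks to a local `httptest` server rather than a real model endpoint.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"strings"
)

// debugTransport is a hypothetical wrapper that logs full request and
// response payloads when Debug is true, and is a pass-through otherwise.
type debugTransport struct {
	Base  http.RoundTripper
	Debug bool
	Logf  func(format string, args ...any)
}

func (t *debugTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if !t.Debug {
		return t.Base.RoundTrip(req)
	}
	// DumpRequestOut reads the body and then restores req.Body for sending.
	if dump, err := httputil.DumpRequestOut(req, true); err == nil {
		t.Logf("request:\n%s", dump)
	}
	resp, err := t.Base.RoundTrip(req)
	if err != nil {
		t.Logf("request failed: %v", err)
		return nil, err
	}
	// DumpResponse buffers the whole body, so a streaming response would
	// only be logged (and returned) once the stream has finished.
	if dump, err := httputil.DumpResponse(resp, true); err == nil {
		t.Logf("response:\n%s", dump)
	}
	return resp, nil
}

// demo sends one request through the wrapper to a local test server and
// returns whatever was logged.
func demo(debug bool) []string {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, `{"ok":true}`)
	}))
	defer srv.Close()

	var logs []string
	client := &http.Client{Transport: &debugTransport{
		Base:  http.DefaultTransport,
		Debug: debug,
		Logf:  func(f string, a ...any) { logs = append(logs, fmt.Sprintf(f, a...)) },
	}}
	resp, err := client.Post(srv.URL, "application/json", strings.NewReader(`{"prompt":"hi"}`))
	if err != nil {
		return append(logs, "error: "+err.Error())
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	return logs
}

func main() {
	for _, entry := range demo(true) {
		fmt.Println(entry)
	}
}
```

Because payload dumps can contain prompts and credentials, a flag like this is best left off by default and enabled only while diagnosing a connection.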

## Robustness

Working with customers in the past, we also learned that various parts
of our backend `openai` provider were not super robust. For example, [if
more than one message was present it was a fatal
error](https://github.com/sourcegraph/sourcegraph/blob/main/internal/completions/client/openai/openai.go#L305),
or if the SSE stream yielded `{"error"}` payloads, they would be
ignored. Similarly, the SSE event stream parser we use is heavily
tailored towards [the exact response
structure](https://github.com/sourcegraph/sourcegraph/blob/main/internal/completions/client/openai/decoder.go#L15-L19)
which OpenAI's official API returns, and is therefore quite brittle when
connecting to a different SSE stream.

For this work, I have _started by forking_ our
`internal/completions/client/openai` - and made a number of major
improvements to it to make it more robust, handle errors better, etc.

I have also replaced the usage of a custom SSE event stream parser -
which was not spec compliant and brittle - with a proper SSE event
stream parser that recently popped up in the Go community:
https://github.com/tmaxmax/go-sse
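
The sketch below illustrates the kind of robustness described here: reading an OpenAI-style SSE stream, surfacing `error` payloads instead of ignoring them, and tolerating chunks that carry no choices. It deliberately uses only the Go standard library rather than `go-sse`, so it is not the code in this PR. The `chunk` type and function names are hypothetical, and the JSON shape (`choices[].delta.content`, a top-level `error`, a final `[DONE]` event) is the commonly seen OpenAI-compatible streaming format, assumed here.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chunk models only the fields this sketch needs from a streamed event.
type chunk struct {
	Error   json.RawMessage `json:"error"`
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// readCompletion concatenates streamed content and returns an error if any
// event carries an "error" payload. It is a simplification: it relies on
// bufio.Scanner line splitting (\n or \r\n, lines up to 64 KiB).
func readCompletion(r io.Reader) (string, error) {
	var out strings.Builder
	var data []string

	// dispatch handles one complete event built from its "data:" lines.
	dispatch := func() (done bool, err error) {
		if len(data) == 0 {
			return false, nil
		}
		payload := strings.Join(data, "\n")
		data = data[:0]
		if payload == "[DONE]" {
			return true, nil
		}
		var c chunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			return false, fmt.Errorf("decoding event %q: %w", payload, err)
		}
		if len(c.Error) > 0 && string(c.Error) != "null" {
			return false, fmt.Errorf("upstream error: %s", c.Error)
		}
		for _, ch := range c.Choices {
			out.WriteString(ch.Delta.Content)
		}
		return false, nil
	}

	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "": // a blank line terminates the current event
			if done, err := dispatch(); err != nil || done {
				return out.String(), err
			}
		case strings.HasPrefix(line, ":"): // comment or keep-alive line
		case strings.HasPrefix(line, "data:"):
			v := strings.TrimPrefix(line, "data:")
			data = append(data, strings.TrimPrefix(v, " "))
		}
	}
	if err := sc.Err(); err != nil {
		return out.String(), err
	}
	// The SSE spec discards an unterminated final event; this sketch
	// processes it anyway to be lenient with servers that omit the blank line.
	_, err := dispatch()
	return out.String(), err
}

// readCompletionString is a small convenience wrapper for string input.
func readCompletionString(s string) (string, error) {
	return readCompletion(strings.NewReader(s))
}

func main() {
	stream := "data: {\"choices\":[{\"delta\":{\"content\":\"Hel\"}}]}\n\n" +
		": keep-alive\n\n" +
		"data: {\"choices\":[{\"delta\":{\"content\":\"lo\"}}]}\n\n" +
		"data: [DONE]\n\n"
	text, err := readCompletionString(stream)
	fmt.Printf("text=%q err=%v\n", text, err)
}
```

A spec-compliant library additionally handles details this sketch skips, such as `event:`, `id:` and `retry:` fields and bare `\r` line endings.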

My intention is that after more extensive testing, this new
`internal/completions/client/openaicompatible` provider will be more
robust, more correct, and all around better than
`internal/completions/client/openai` (and possibly the azure one) so
that we can just supersede those with this new `openaicompatible` one
entirely.

## Client implementation

Much of the work done in this PR is just "let the site admin configure
things, and broadcast that config to the client through the new model
config system."

Actually getting the clients to respect the new configuration is a task
I am tackling in future `sourcegraph/cody` PRs.

## Test plan

1. This change currently lacks any unit/regression tests, which is a
   major noteworthy point. I will follow up with those in a future PR.
   * However, these changes are **incredibly** isolated, clearly only
     affecting customers who opt in to this new self-hosted models
     configuration.
   * Most of the heavy lifting (SSE streaming, shuffling data around) is
     done in other well-tested codebases.
2. Manual testing has played a big role here, specifically:
   * Running a dev instance with the new configuration, actually connected
     to Huggingface TGI deployed on a remote server.
   * Using the new `debugConnections` mechanism (which customers would use)
     to directly confirm requests are going to the right places, with the
     right data and payloads.
   * Confirming with a new client (changes not yet landed) that
     autocomplete and chat functionality work.

Can we use more testing? Hell yeah, and I'm going to add it soon. Does
it work quite well and leave little room for error? Also yes.

## Changelog

Cody Enterprise: added a new configuration for self-hosting models.
Reach out to support if you would like to use this feature as it is in
early access.

---------

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
2024-07-19 01:34:02 +00:00
accesstoken bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
actor bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
adminanalytics bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
api various improvements to saved searches (#63539) 2024-07-15 20:12:34 +00:00
appliance fix(appliance): resource kind present on reconciler logs (#63919) 2024-07-18 13:29:18 -04:00
audit bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
auth feat: show current email during password reset and auto-populate text-box after successful completion (#59645) 2024-07-12 15:45:35 +01:00
authbearer bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
authz internal/database/sub_repo_permissions: modify store to be able to insert ip based permissions (#63811) 2024-07-18 14:05:30 -07:00
batches gating: Add individual switches for disabling tools features (#63686) 2024-07-16 15:45:38 +02:00
binary
bytesize bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
byteutils Backend: add line index (#63726) 2024-07-09 19:59:42 +00:00
clientconfig Add a better Cody client server-sent configuration mechanism (#63591) 2024-07-03 22:57:31 +00:00
cloud bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
cmd chore(search): update search API call sites to set the version explicitly (#63782) 2024-07-12 10:01:47 +02:00
codeintel refactor(codeintel): Extracts a MappedIndex abstraction over uploads (#63781) 2024-07-18 05:54:48 +02:00
codemonitors chore(worker): disable jobs based on ENVs (#63853) 2024-07-16 18:07:22 +02:00
codygateway feat/enterpriseportal: implement GetCodyGatewayUsage RPC (#63555) 2024-07-02 09:39:15 -07:00
collections chore: Add collection type - OrderedSet (#63469) 2024-06-25 13:13:14 +00:00
comby bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
completions self hosted models (#63899) 2024-07-19 01:34:02 +00:00
compute chore: Centralize languages package as source-of-truth (#63292) 2024-06-18 13:10:24 +00:00
conf self hosted models (#63899) 2024-07-19 01:34:02 +00:00
cookie bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
ctags_config feat(codeintel): Add scip-ctags support for Magik (#63504) 2024-07-08 09:24:36 -04:00
database Prompt Library (#63872) 2024-07-18 16:04:55 -07:00
debugserver chore/deps: upgrade grpc, prometheus/common (#63328) 2024-06-19 09:55:44 -04:00
deviceid
diskcache all: use observation.TestContextTB instead of TestContext (#61751) 2024-04-10 14:07:39 +02:00
diskusage bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
dotcom dotcom: MockSourcegraphDotComMode requires a T for cleanup (#61172) 2024-03-14 20:27:21 +00:00
download
embeddings bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
encryption bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
endpoint bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
env chore(config): clarify env var already registered panic (#63784) 2024-07-11 13:59:29 +00:00
errcode Chore: remove gorilla/schema (#63738) 2024-07-10 15:36:37 +00:00
eventlogger chore(analytics): remove Cody characters events from inclusion in pin… (#63557) 2024-06-29 01:23:17 +00:00
executor worker: Reduce frequency of very frequently run jobs (#62864) 2024-05-23 18:31:20 +02:00
extsvc webhooks: Deterministically match webhook events to repos (#63668) 2024-07-16 06:50:13 +02:00
featureflag Feature flags: relax some constraints (#61343) 2024-03-25 10:39:01 -06:00
fileutil gitserver: Implement RefHash in backend (#62612) 2024-05-13 16:05:16 +02:00
github_apps rcache: Explicitly pass redis pool to use (#63644) 2024-07-10 01:23:19 +02:00
gitserver chore: Add doc comment for DiffOptions.Paths (#63385) 2024-06-20 19:26:34 +08:00
goroutine rcache: Explicitly pass redis pool to use (#63644) 2024-07-10 01:23:19 +02:00
gosyntect feat(search): Add Syntax Highlighting for Magik language (#62919) 2024-06-06 16:49:07 -04:00
gqltestutil Search: expose path matches on FileMatch (#63396) 2024-06-26 08:23:28 -06:00
gqlutil
grpc chore/deps: upgrade grpc, prometheus/common (#63328) 2024-06-19 09:55:44 -04:00
guardrails bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
hashutil
highlight chore(search): Add test case covering hack lang detection behavior (#63136) 2024-06-07 09:35:39 -04:00
honey honey: add read locking to event.Fields call for NonSendingReader (#61886) 2024-04-15 15:19:38 +00:00
hostmatcher gomod: update or vendor buildkit, docker, hostmatcher and saml to resolve CVEs (#60130) 2024-02-05 13:14:15 +02:00
hostname
htmlutil Web: add mermaid diagram rendering (#62678) 2024-05-16 14:54:43 -04:00
httpcli rcache: Explicitly pass redis pool to use (#63644) 2024-07-10 01:23:19 +02:00
httpserver lib/background: upgrade Routine interface with context and errors (#62136) 2024-05-24 10:04:55 -04:00
httptestutil Remove GitHub proxy service (#56485) 2023-09-14 19:43:40 +02:00
insights gating: Add individual switches for disabling tools features (#63686) 2024-07-16 15:45:38 +02:00
instrumentation chore: upgrade otel SDK packages (#59564) 2024-01-15 20:08:54 +00:00
ipynb Render Jupyter notebooks (#62583) 2024-05-10 12:21:10 -04:00
jsonc
k8s/resource feat(appliance): local developer mode (#63417) 2024-06-24 16:19:27 +01:00
lazyregexp
license bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
licensing chore: Change errors.HasType to respect multi-errors (#63024) 2024-06-06 13:02:14 +00:00
limiter bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
logging fix: update links for dev docs (#62758) 2024-05-17 13:47:34 +02:00
luasandbox bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
mapfs bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
maps Revert "Remove unused internal/k8s package" (#61835) 2024-04-12 09:35:40 -04:00
markdown Render Jupyter notebooks (#62583) 2024-05-10 12:21:10 -04:00
memcmd fix/internal/memcmd: close the explicit stop channel before cancelling context (#63214) 2024-06-12 06:08:29 -07:00
memo
metrics chore/deps: upgrade grpc, prometheus/common (#63328) 2024-06-19 09:55:44 -04:00
modelconfig self hosted models (#63899) 2024-07-19 01:34:02 +00:00
notebooks gating: Add individual switches for disabling tools features (#63686) 2024-07-16 15:45:38 +02:00
oauthtoken bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
oauthutil bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
observation Correctly re-map source ranges in new SCIP-based APIs (#63630) 2024-07-11 06:55:46 +00:00
oobmigration backend/appliance: Introduce a basic utils package for appliance sourcegraph upgrades (#63529) 2024-07-04 01:48:54 +00:00
opencodegraph fix: update links for dev docs (#62758) 2024-05-17 13:47:34 +02:00
otlpenv
own gating: Add individual switches for disabling tools features (#63686) 2024-07-16 15:45:38 +02:00
packagefilters bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
paths Replace all traditional for-loops (#60988) 2024-03-11 16:05:47 +02:00
pbt chore(codenav): Resolve repo and commit in common code (#63072) 2024-06-07 21:58:36 +08:00
perforce bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
productsubscription bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
profiler bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
pubsub lib/background: upgrade Routine interface with context and errors (#62136) 2024-05-24 10:04:55 -04:00
randstring Replace all traditional for-loops (#60988) 2024-03-11 16:05:47 +02:00
ratelimit chore: Change errors.HasType to respect multi-errors (#63024) 2024-06-06 13:02:14 +00:00
rbac bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
rcache Remove old cache cleanup method (#63645) 2024-07-10 02:04:53 +02:00
redislock enterprise-portal: implement basic MSP IAM and RPCs (#63173) 2024-06-19 21:46:48 -04:00
redispool gateway: Don't panic because of duplicate env var registration (#63787) 2024-07-11 17:58:28 +00:00
releaseregistry feat(appliance): self-update (#63780) 2024-07-11 17:59:39 +01:00
repos fix(code hosts): Use more deterministic API endpoints for GitHub code host connections (#63445) 2024-07-02 16:25:11 +02:00
repoupdater dotcom: Remove on-demand cloning of repositories (#63321) 2024-06-26 14:53:14 -07:00
requestclient Replace all traditional for-loops (#60988) 2024-03-11 16:05:47 +02:00
requestinteraction requestinteraction: add X-Sourcegraph-Interaction-ID propagation (#58016) 2023-11-22 20:09:39 +00:00
sams bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
sanitycheck
search gating: Add individual switches for disabling tools features (#63686) 2024-07-16 15:45:38 +02:00
searcher searcher: Modernize entrypoint and gRPC server (#63700) 2024-07-09 21:10:11 +02:00
security Remove unused package (#63646) 2024-07-10 02:30:28 +02:00
service support fast, simple sg start single-program-experimental-blame-sqs for local dev (#63435) 2024-06-24 21:12:47 +00:00
session chore: Change errors.HasType to respect multi-errors (#63024) 2024-06-06 13:02:14 +00:00
settings Chore: remove search console (#63322) 2024-06-19 11:05:03 -06:00
siteid bazel: first pass at moving moving logging linting into nogo (#58910) 2024-01-02 10:07:25 -08:00
slack
sourcegraphoperator bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
src-cli Bump minimum src-cli version required (#62700) 2024-05-16 09:52:46 +00:00
src-prometheus fix: update links for dev docs (#62758) 2024-05-17 13:47:34 +02:00
suspiciousnames bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
symbols symbols: Minor code cleanup (#63708) 2024-07-10 01:22:03 +02:00
sysreq
telemetry V2-telemetry: Simplify sensitive metadata allowlist to accept feature only (#63325) 2024-06-27 15:22:58 -04:00
telemetrygateway chore/deps: upgrade grpc, prometheus/common (#63328) 2024-06-19 09:55:44 -04:00
temporarysettings
testutil build-tracker: fix convenience urls in env (#62340) 2024-05-01 14:26:34 +00:00
timeutil
trace chore: Break dependency of internal/trace on conf (#62177) 2024-04-30 21:12:39 +02:00
tracer chore: Break dependency of internal/trace on conf (#62177) 2024-04-30 21:12:39 +02:00
ttlcache bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
txemail fix/internal/txemail: add timeout for SMTP connection establishment (#63759) 2024-07-10 12:34:06 -07:00
types Prompt Library (#63872) 2024-07-18 16:04:55 -07:00
unpack
updatecheck Prompt Library (#63872) 2024-07-18 16:04:55 -07:00
uploadhandler Syntactic indexing produce scip files (#63580) 2024-07-09 13:49:55 +02:00
uploadstore Unexport some externally irrelevant symbols from uploadstore (#63647) 2024-07-10 02:45:02 +02:00
usagestats Prompt Library (#63872) 2024-07-18 16:04:55 -07:00
users scim: Fix user updates when SCIM was previously enabled (#63135) 2024-06-06 22:24:00 +02:00
vcs bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
version backend/appliance: Introduce a basic utils package for appliance sourcegraph upgrades (#63529) 2024-07-04 01:48:54 +00:00
webhooks/outbound bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
workerutil lib/background: upgrade Routine interface with context and errors (#62136) 2024-05-24 10:04:55 -04:00
wrexec rcache: Explicitly pass redis pool to use (#63644) 2024-07-10 01:23:19 +02:00
buf.yaml
BUILD.bazel