sourcegraph/schema
Gary Lee b8e31fde27
Allow embeddings job to exclude failed files from the index (#55180)
When a text input is submitted for generating embeddings, the response
may be null. If retries still fail to generate embeddings for that input
text, we return an error, which fails the entire embed repo job.

[Slack thread](https://sourcegraph.slack.com/archives/C053L1AQ0BC/p1688676751106069)

[Issue](https://github.com/sourcegraph/sourcegraph/issues/55469)

This PR introduces a configuration `ExcludeChunkOnError`. When set to
true, an embed repo job continues past these embedding-generation errors
instead of failing. However, the file that produced the failing input
text is excluded from the index, so as to avoid partially indexing the
file.
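
A minimal sketch of the behaviour described above, not the code from this PR. The names `indexFiles`, `embedFunc`, and `fakeEmbed` are hypothetical, and retries are left out for brevity; only the flag name `excludeChunkOnError` mirrors the configuration.

```go
package main

import (
	"errors"
	"fmt"
)

// embedFunc is a hypothetical stand-in for the call that turns one text
// chunk into an embedding vector.
type embedFunc func(chunk string) ([]float32, error)

var errNoEmbedding = errors.New("no embedding returned")

// indexFiles embeds every chunk of every file. When excludeChunkOnError is
// true, a file with any failing chunk is left out of the index entirely and
// the job continues. When it is false, the first failure aborts the job.
func indexFiles(files map[string][]string, embed embedFunc, excludeChunkOnError bool) (map[string][][]float32, []string, error) {
	index := make(map[string][][]float32)
	var excluded []string
	for name, chunks := range files {
		vectors := make([][]float32, 0, len(chunks))
		failed := false
		for _, chunk := range chunks {
			v, err := embed(chunk)
			if err != nil {
				if !excludeChunkOnError {
					return nil, nil, fmt.Errorf("embedding %s: %w", name, err)
				}
				// Discard the vectors already collected for this file so
				// that it is never partially indexed.
				failed = true
				break
			}
			vectors = append(vectors, v)
		}
		if failed {
			excluded = append(excluded, name)
			continue
		}
		index[name] = vectors
	}
	return index, excluded, nil
}

// fakeEmbed simulates a provider that returns nothing for the chunk "bad".
func fakeEmbed(chunk string) ([]float32, error) {
	if chunk == "bad" {
		return nil, errNoEmbedding
	}
	return []float32{float32(len(chunk))}, nil
}

func main() {
	files := map[string][]string{
		"a.go": {"func A()", "func B()"},
		"b.go": {"ok", "bad", "ok"},
	}
	index, excluded, err := indexFiles(files, fakeEmbed, true)
	fmt.Println(len(index), excluded, err) // prints: 1 [b.go] <nil>
}
```

With the flag set to false, the same input makes `indexFiles` return an error as soon as the failing chunk of `b.go` is reached, which corresponds to the previous behaviour of failing the whole job.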

I'll add more details on the first iteration of this solution and the
trade-offs in a separate comment.

## Test plan

Embed test cases added
2023-08-01 13:39:50 -07:00
aws_codecommit.schema.json do not show errors for trailing commas in web JSON editors (#4100) 2019-05-16 23:29:12 -07:00
azuredevops.schema.json Move dash in regular expression (#49229) 2023-03-13 14:52:51 -04:00
batch_spec.schema.json batches: add fork attribute to changeset template (#51572) 2023-05-31 13:19:07 -04:00
bitbucket_cloud.schema.json fix Bitbucket Cloud exclude regex to make it work with AJV. (#54494) 2023-06-30 17:58:30 +02:00
bitbucket_server_util.go authz/github: validate provider against default github URL if not set (#24598) 2021-09-06 12:37:33 -04:00
bitbucket_server.schema.json webhooks: create new namespace for incoming + outgoing (#49570) 2023-03-17 16:52:18 -07:00
bitbucketcloud_util.go Add Bitbucket Cloud as an auth provider with Perms syncing (#46309) 2023-01-16 14:20:35 +02:00
BUILD.bazel fix Bitbucket Cloud exclude regex to make it work with AJV. (#54494) 2023-06-30 17:58:30 +02:00
changeset_spec.schema.json Support binary patches (#44779) 2022-11-29 03:22:01 +01:00
extension_schema.go remove extension registry UI and related GraphQL API (#45891) 2022-12-22 00:10:56 -08:00
gen.go schema: cleanup schema gen (#12394) 2020-07-23 09:13:47 +02:00
gen.sh update github.com/sourcegraph/go-jsonschema dep (#45983) 2022-12-28 10:44:47 -10:00
gerrit.schema.json Add Gerrit as an officially supported code host with permissions syncing (#46763) 2023-01-27 15:33:24 +00:00
github_util.go authz/github: validate provider against default github URL if not set (#24598) 2021-09-06 12:37:33 -04:00
github.schema.json [GitHub App] Connections can clone all installation repositories (#53869) 2023-06-23 14:29:07 +02:00
gitlab_util.go authz/github: validate provider against default github URL if not set (#24598) 2021-09-06 12:37:33 -04:00
gitlab.schema.json Add exclude pattern support for GitLab (#51862) 2023-05-12 15:18:18 +03:00
gitolite.schema.json Unremoving phabricator integration fields, adding lines to changelog (#32573) 2022-03-15 10:01:39 -04:00
go-modules.schema.json extsvc: Change default rate limits of npm and Go external services (#34042) 2022-04-19 11:50:46 +00:00
json-schema-draft-07.schema.json use existing spec file 2018-10-28 13:24:42 -07:00
jvm-packages.schema.json packages: improve and expand docs (#49774) 2023-03-21 17:47:57 +00:00
localgit.schema.json app: Dedicated local repo external service (#51805) 2023-06-07 15:14:36 +02:00
npm-packages.schema.json npm: Bump rate limit. (#37018) 2022-06-10 15:00:51 +00:00
other_external_service.schema.json extsvc: Other defines root path for git discovery (#47779) 2023-03-07 15:29:43 -08:00
package.json web: sync TS project refenreces (#46407) 2023-01-16 18:55:10 -08:00
pagure.schema.json repos: add Pagure code host support (#28084) 2021-11-23 18:03:35 +01:00
perforce.schema.json Clarify Perforce p4.passwd / ticket format (#44205) 2022-11-10 15:27:19 +00:00
phabricator.schema.json do not show errors for trailing commas in web JSON editors (#4100) 2019-05-16 23:29:12 -07:00
python-packages.schema.json repos: Introduce Python dependency repos integration (#34886) 2022-05-05 13:24:25 +02:00
README.md remove extension registry UI and related GraphQL API (#45891) 2022-12-22 00:10:56 -08:00
ruby-packages.schema.json Packages: add RubyGems support (#42817) 2022-10-17 09:48:18 +02:00
rust-packages.schema.json packages: improve and expand docs (#49774) 2023-03-21 17:47:57 +00:00
schema.go Allow embeddings job to exclude failed files from the index (#55180) 2023-08-01 13:39:50 -07:00
settings.schema.json batches: upade doc string for OrgsAllMembersBatchChangesAdmin (#51370) 2023-05-03 04:35:10 +02:00
site.schema.json Allow embeddings job to exclude failed files from the index (#55180) 2023-08-01 13:39:50 -07:00
stringdata.go app: Dedicated local repo external service (#51805) 2023-06-07 15:14:36 +02:00
tsconfig.json web: fix pnpm-lock issue (#47478) 2023-02-09 22:04:31 -08:00
validation_test.go fix Bitbucket Cloud exclude regex to make it work with AJV. (#54494) 2023-06-30 17:58:30 +02:00

Sourcegraph JSON Schemas

JSON Schema is a way to define the structure of a JSON document. It enables typechecking and code intelligence on JSON documents.

Sourcegraph uses the following JSON Schemas:

Modifying a schema

  1. Edit the *.schema.json file in this directory.
  2. Run go generate to update the *_stringdata.json file.
  3. Commit the changes to both files.
  4. Run sg start to automatically update TypeScript schema files.

Known issues

  • The JSON Schema IDs (URIs) are of the form https://sourcegraph.com/v1/*.schema.json#, but these are not actually valid URLs. This means you generally need to supply them to JSON Schema validation libraries manually instead of having the validator fetch the schema from the web.
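
One way to work around this is to keep the schema text in the program and resolve IDs from a local table rather than over HTTP. The sketch below uses only the Go standard library; `schemaRegistry`, `lookup`, and `exampleSchema` are hypothetical names, and the schema body is a placeholder rather than the contents of a real Sourcegraph schema. A real validator library would expose its own way to register preloaded schemas.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schemaRegistry maps a JSON Schema ID to schema text that is held locally.
type schemaRegistry map[string]string

// lookup returns the parsed schema registered under id. It never touches the
// network, so it works even though the ID is not a fetchable URL.
func (r schemaRegistry) lookup(id string) (map[string]any, error) {
	raw, ok := r[id]
	if !ok {
		return nil, fmt.Errorf("schema %q is not registered", id)
	}
	var doc map[string]any
	if err := json.Unmarshal([]byte(raw), &doc); err != nil {
		return nil, fmt.Errorf("parsing schema %q: %w", id, err)
	}
	return doc, nil
}

// exampleSchema is a placeholder used only for illustration.
const exampleSchema = `{
  "$id": "https://sourcegraph.com/v1/example.schema.json#",
  "type": "object"
}`

func main() {
	registry := schemaRegistry{
		"https://sourcegraph.com/v1/example.schema.json#": exampleSchema,
	}
	doc, err := registry.lookup("https://sourcegraph.com/v1/example.schema.json#")
	if err != nil {
		panic(err)
	}
	fmt.Println(doc["type"]) // prints: object
}
```

Looking up an ID that was never registered returns an error instead of attempting a download, which makes a missing schema visible immediately.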