doc/dev: Add code intelligence development documentation (#12949)

Eric Fritz 2020-08-14 13:14:05 -05:00 committed by GitHub
parent 94671fb198
commit 380718a262
39 changed files with 566 additions and 125 deletions

1
.github/CODEOWNERS vendored
View File

@ -206,6 +206,7 @@ Dockerfile @sourcegraph/distribution
/doc/ @sourcegraph/distribution
/doc/dev/ @nicksnyder
/doc/dev/web/ @felixfbecker @sourcegraph/web
/doc/dev/codeintel/ @efritz @sourcegraph/code-intel
# Browser extensions
/browser/ @sourcegraph/web

View File

@ -347,6 +347,7 @@
<a class="content-nav-section-header" href="/dev">Developing Sourcegraph</a>
<ul class="content-nav-section-group">
<li><a href="/dev/campaigns_development">Developing campaigns</a></li>
<li><a href="/dev/codeintel">Developing code intelligence</a></li>
<li><a href="/dev/graphql_api">Developing the Sourcegraph GraphQL API</a></li>
<li><a href="/dev/local_development">Getting started with developing Sourcegraph</a></li>
<li><a href="/dev/postgresql">PostgreSQL storage tips</a></li>

View File

@ -85,7 +85,7 @@ digraph architecture {
code_intel [
label="Code intel processes\n(click to expand)"
fillcolor="9"
URL="https://github.com/sourcegraph/sourcegraph/tree/master/doc/dev/architecture/precise-code-intel.svg"
URL="https://docs.sourcegraph.com/dev/codintel/architecture"
]
subgraph cluster_zoekt {

View File

@ -169,7 +169,7 @@
<!-- code_intel -->
<g id="node9" class="node">
<title>code_intel</title>
<g id="a_node9"><a xlink:href="https://github.com/sourcegraph/sourcegraph/tree/master/doc/dev/architecture/precise-code-intel.svg" xlink:title="Code intel processes\n(click to expand)" target="_blank">
<g id="a_node9"><a xlink:href="https://docs.sourcegraph.com/dev/codintel/architecture" xlink:title="Code intel processes\n(click to expand)" target="_blank">
<polygon fill="#d9d9d9" stroke="black" points="813.5,-432.73 698.5,-432.73 698.5,-388.73 813.5,-388.73 813.5,-432.73"/>
<text text-anchor="middle" x="756" y="-413.73" font-family="Iosevka" font-size="10.00">Code intel processes</text>
<text text-anchor="middle" x="756" y="-402.73" font-family="Iosevka" font-size="10.00">(click to expand)</text>

View File

@ -1,4 +0,0 @@
package architecture
//go:generate sh -c "dot architecture.dot -Tsvg > architecture.svg"
//go:generate sh -c "dot precise-code-intel.dot -Tsvg > precise-code-intel.svg"

View File

@ -0,0 +1,5 @@
#!/bin/bash
set -ex
dot architecture.dot -Tsvg >architecture.svg

View File

@ -6,13 +6,6 @@ You can click on each component to jump to its respective code repository or sub
<object data="/dev/architecture/architecture.svg" type="image/svg+xml" style="width:100%; height: 100%">
</object>
The code intelligence processes (the LSIF-based code intelligence service) have been extracted into their own diagram.
<object data="/dev/architecture/codeintel.svg" type="image/svg+xml" style="width:100%; height: 100%">
</object>
To re-generate the architecture diagram from the `architecture.dot` file with Graphviz, run: `dot -Tsvg -o architecture.svg architecture.dot` (and similar for `codeintel.dot`).
## Clients
We maintain multiple Sourcegraph clients:
@ -48,11 +41,13 @@ Our backend is composed of multiple services:
Here are some guides to help you understand how multiple systems fit together:
- [Life of a search query](life-of-a-search-query.md)
- [Life of an LSIF upload](life-of-an-lsif-upload.md)
- [Life of a code intelligence query](life-of-a-code-intelligence-query.md)
- [Life of a repository](life-of-a-repository.md)
- [Life of a ping](life-of-a-ping.md)
- [Search pagination](search-pagination.md)
- Code intelligence
- [Uploads](../codeintel/queries.md)
- [Queries](../codeintel/queries.md)
- [Extensions](../codeintel/queries.md)
- Future topics we will cover here:
- Sourcegraph extension architecture
- Web app and browser extension architecture

View File

@ -1,71 +0,0 @@
# Life of a code intelligence query
This document describes how our backend systems serve code intelligence results to clients. There are multiple kinds of code intelligence queries:
- _Hover queries_ retrieve the hover text (documentation) associated with a symbol;
- _Definitions queries_ retrieve a list of definitions (generally one) of a symbol, possibly defined in a different repository; and
- _References queries_ retrieve a list of uses of a symbol, possibly defined across multiple repositories.
The results of each query can be _precise_ or _fuzzy_, depending on the quality of data available. This document will detail the conditions required in order for results to be precise.
## Clients
There are a few ways to perform a code intelligence query with Sourcegraph:
1. Using the GraphQL API served by our [frontend](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/cmd/frontend) service. The API only serves _precise_ code intelligence queries.
2. Using the [basic-code-intel](https://github.com/sourcegraph/sourcegraph-basic-code-intel) extension in the Sourcegraph UI. The extension attempts to serve _precise_ code intelligence via the GraphQL API, falling back to _fuzzy_ code intelligence based on search queries when no precise results are available.
3. Using a browser extension on a codehost such as GitHub or Bitbucket. The (browser) extension will use the basic-code-intel (Sourcegraph) extension to retrieve results.
These clients are discussed in turn.
### GraphQL API
All GraphQL queries for precise code intelligence must first resolve a `GitTree`, which is a specific path or directory in a repository at a specific commit. All code intelligence operations (definitions, references, and hovers) are nested under an `lsif` field, which resolves to a null value when there is no LSIF upload present for the git tree.
For information about how LSIF data is uploaded and processed, see [life of an LSIF upload](life-of-an-lsif-upload.md).
```graphql
query {
repository(name: "github.com/foo/bar") {
commit(rev: "0123456789012345678901234567890123456789") {
blob(path: "/baz/bonk.go") {
lsif {
definitions(line: 10, character: 42) {
...
}
}
}
}
}
}
```
The example above resolves the [git tree](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects) (via `blob`) for the file `/baz/bonk.go`, then asks for the definitions under the cursor position `10:42`. The resolver for the `lsif` field can be found [here](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+%22func+%28r+*gitTreeEntryResolver%29+LSIF%28%22).
The resolvers for the definitions, references, and hover fields within the `lsif` field are defined in the enterprise codeintel package. For example, the definition resolver is [a method of `lsifQueryResolver`](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+lsifQueryResolver%29+definitions+file:codeintel+&patternType=literal). These resolvers are thin: each calls a method on the [lsif-server client](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+%22%29+Definitions%28%22+file:lsifserver/.*.go), which makes an HTTP request to the precise-code-intel-api-server (discussed below).
It may be the case that the `lsif` field is not null, but there is no known definition for a particular source location. This happens in particular when a symbol is defined in a repository that does not also have (properly correlated) LSIF data.
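The three outcomes described above (no upload, an upload with no known definition, and a precise result) can be sketched as follows. This is only an illustration: the response shape below the `lsif` field is elided in the example query, so `definitions` as a flat array and the `DefinitionLocation` fields are assumptions, not the actual schema.

```typescript
// Hypothetical, simplified response shape. Only the nullable `lsif` field is
// taken from the text; everything below it is assumed for illustration.
interface DefinitionLocation {
    path: string
    line: number
    character: number
}

interface BlobResponse {
    lsif: { definitions: DefinitionLocation[] } | null
}

type Outcome =
    | { kind: 'no-upload' } // `lsif` resolved to null: no LSIF upload for this git tree
    | { kind: 'no-definition' } // upload present, but no definition known for the position
    | { kind: 'precise'; locations: DefinitionLocation[] }

function classify(blob: BlobResponse): Outcome {
    if (blob.lsif === null) {
        return { kind: 'no-upload' }
    }
    if (blob.lsif.definitions.length === 0) {
        return { kind: 'no-definition' }
    }
    return { kind: 'precise', locations: blob.lsif.definitions }
}
```

A client would typically treat the first two outcomes the same way and fall back to another source of results.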
Code intelligence resolvers exist only in the enterprise version of the product. The OSS version will return a canned message for all LSIF requests.
### Basic code intel
The [basic-code-intel](https://github.com/sourcegraph/sourcegraph-basic-code-intel) repository automatically generates extensions for most languages. The [Go extension](https://github.com/sourcegraph/sourcegraph-go) and the [TypeScript extension](https://github.com/sourcegraph/sourcegraph-typescript) require the basic code intel package as a dependency, but are published to the extension registry separately (due to special support for language servers).
Code intelligence extensions register _providers_ which are called from the UI or browser extensions to get hover text for a symbol or for the locations of its definitions or references. For example, the definition provider is defined [here](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph-basic-code-intel%24+registerDefinitionProvider). This provider will first query the GraphQL API for LSIF data. If the query returns a result, it is returned. Otherwise, either the position is not defined in that LSIF upload, or there is no LSIF upload that can provide intelligence for that commit and path. In this case, the extension falls back to _fuzzy_ code intelligence. A [search query](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph-basic-code-intel%24+async+definition%28+file:handler.ts) is constructed with the symbol name and search results are filtered for obvious non-relevance (e.g. the target module name not matching a source import statement). The hover and references providers work similarly.
Code intelligence queries are displayed in a [hover overlay](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+%3CWebHoverOverlay+file:Blob.tsx) over the code blob in the UI. Overlays are shown after the user hovers over a particular token on a line, which will immediately fire a hover and definition (preload) request to the basic code intel extension. If _Go to Definition_ is clicked, the user is navigated to the preloaded location. If _Find References_ is clicked, a subsequent references request is made to the basic code intel extension, and a file match results panel is populated.
### Browser extensions
The browser extension will display the same [hover overlay](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+class+HoverOverlayContainer) as the UI does.
## LSIF API Server
The [precise-code-intel-api-server](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/precise-code-intel) accepts code intelligence queries via HTTP requests. The payload of each query is a repository ID, a commit hash, a file path, and a position in the source file. The definition endpoint is defined [here](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+definitions+file:precise-code-intel/.*/routes), for example.
Each query attempts to open an LSIF upload file (formatted as a SQLite database) from disk. Each LSIF upload is associated with a repository, a commit, and a _root_. The target LSIF upload is the upload with the same repository, commit, and a root that is a prefix of the request file path. If there is no LSIF upload for that exact commit, we try to load the _closest_ database by traversing ancestor and descendant commits until we find an upload with a matching repository and root.
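The selection rule above can be sketched roughly as follows. This is a minimal illustration, not the server's implementation: the `Upload` shape, the in-memory `CommitGraph`, and all function names are assumptions, and the real server consults Postgres and git data rather than in-memory maps.

```typescript
// Illustrative types only; not the api-server's actual data model.
interface Upload {
    repositoryId: number
    commit: string
    root: string // directory prefix covered by the upload, e.g. "cmd/"
}

// commit -> parent commits (hypothetical stand-in for real git data)
type CommitGraph = Map<string, string[]>

function parentsOf(graph: CommitGraph, commit: string): string[] {
    return graph.get(commit) ?? []
}

function childrenOf(graph: CommitGraph, commit: string): string[] {
    const children: string[] = []
    graph.forEach((parents, child) => {
        if (parents.indexOf(commit) !== -1) {
            children.push(child)
        }
    })
    return children
}

// Move one step further away from the start commit, skipping visited commits.
function step(commits: string[], next: (commit: string) => string[], seen: Set<string>): string[] {
    const result: string[] = []
    for (const commit of commits) {
        for (const candidate of next(commit)) {
            if (!seen.has(candidate) && result.indexOf(candidate) === -1) {
                result.push(candidate)
            }
        }
    }
    return result
}

function findClosestUpload(
    uploads: Upload[],
    graph: CommitGraph,
    repositoryId: number,
    commit: string,
    path: string
): Upload | undefined {
    // An upload matches when repository and commit agree and its root prefixes the path.
    const matching = (c: string): Upload | undefined =>
        uploads.find(u => u.repositoryId === repositoryId && u.commit === c && path.startsWith(u.root))

    const seen = new Set<string>()
    let ancestors = [commit]
    let descendants = [commit]
    // Check the exact commit first, then walk ancestors and descendants one step at a time.
    while (ancestors.length > 0 || descendants.length > 0) {
        for (const c of ancestors.concat(descendants)) {
            const upload = matching(c)
            if (upload) {
                return upload
            }
            seen.add(c)
        }
        ancestors = step(ancestors, c => parentsOf(graph, c), seen)
        descendants = step(descendants, c => childrenOf(graph, c), seen)
    }
    return undefined
}
```

Ancestors and descendants are walked separately so that sibling commits (which are neither) are never considered.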
This SQLite file is then queried and returns the hover text or the set of locations associated with that source position. This is generally enough for hover text, but not enough for definition and reference queries in the presence of multiple mutually-referential repositories.
After all _local_ location results are found in the SQLite file, the server will query Postgres with the repository and commit to find the set of packages that it defines (for remote references) or for the set of dependencies that it has (for remote definitions). This will return a list of additional repository and commit pairs, which are queried in a similar way.

View File

@ -1,28 +0,0 @@
# Life of an LSIF upload
This document describes how an LSIF data file is uploaded to a Sourcegraph instance and processed. This document does **not** cover the data file generation, which is covered in the [user docs](https://docs.sourcegraph.com/user/code_intelligence/lsif), on [lsif.dev](https://lsif.dev), and in the documentation for individual indexers.
## Uploading
Data files are uploaded via the [lsif upload](https://sourcegraph.com/github.com/sourcegraph/src-cli/-/blob/cmd/src/lsif_upload.go) command in the Sourcegraph command line utility. This command gzip-encodes the file on-disk and sends it to the Sourcegraph instance via an unauthenticated HTTP POST. This request is handled by a [proxy handler](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+%22func+uploadProxyHandler%28%22), which will redirect the file to [precise-code-intel-api-server](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/cmd/lsif-server) via the [lsif-server client](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+%22%29+Upload%28%22+file:lsifserver/.*.go).
This handler exists only in the enterprise version of the product. The OSS version does not register this route and will return 404 for all requests.
Prior to proxying the payload to precise-code-intel-api-server, the frontend will ensure that the target repository is cloned and the target [commit exists](https://sourcegraph.com/search?q=repo:^github\.com/sourcegraph/sourcegraph%24+"%29+ResolveRev%28"). This latter operation may cause a remote fetch to occur in gitserver.
Additionally, this endpoint will authorize a request via a code host token when `LsifEnforceAuth` is true in the site settings. This is enabled in particular for the dot-com deployment so that LSIF uploads to a public repository are only allowed from requests using an access token with collaborator access to that repository. It is not generally expected for a private instance to enable this setting. See an example of the auth flow [here](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+%22func+enforceAuthGithub%28%22).
## Processing
Once the upload payload is received via the precise-code-intel-api-server [upload endpoint](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+%22%27/upload%27%22+file:precise-code-intel/.*/routes/.*.ts), it is written to disk and an _unprocessed_ LSIF [upload record](https://sourcegraph.com/search?q=repo:^github\.com/sourcegraph/sourcegraph%24+"class+LsifUpload"+file:precise-code-intel/.*.ts) is added to the `lsif_uploads` table in Postgres.
Each upload record has a state which can be one of the following:
- queued
- processing
- completed
- errored
The [precise-code-intel-worker](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+%22Selected+upload+to+convert%22) process polls for _queued_ uploads. Once it selects (and locks) an upload for processing, it sets the upload's state to _processing_ and converts the raw LSIF input on disk into a SQLite database that the precise-code-intel-api-server can use to answer code intelligence queries. On success, the upload's state is set to _completed_. On failure, the upload's state is set to _errored_, and an error message and stack trace are recorded. An upload in the _completed_ state is visible to the precise-code-intel-api-server for answering queries.
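The state transitions described above can be sketched as a small function. This is an assumption-laden illustration: `UploadRecord`, `failureMessage`, and `processNext` are hypothetical names, the queue is an in-memory array rather than the `lsif_uploads` table, and row locking is omitted.

```typescript
type UploadState = 'queued' | 'processing' | 'completed' | 'errored'

// Hypothetical record shape; the real table has more columns.
interface UploadRecord {
    id: number
    state: UploadState
    failureMessage?: string
}

// Select the next queued upload, run the conversion, and record the outcome.
// `convert` stands in for the LSIF-to-SQLite conversion step.
function processNext(uploads: UploadRecord[], convert: (upload: UploadRecord) => void): UploadRecord | undefined {
    const upload = uploads.find(u => u.state === 'queued')
    if (!upload) {
        return undefined // nothing to do until the next poll
    }
    upload.state = 'processing'
    try {
        convert(upload)
        upload.state = 'completed'
    } catch (err) {
        upload.state = 'errored'
        upload.failureMessage = err instanceof Error ? err.message : String(err)
    }
    return upload
}
```

Only records that reach `completed` would be considered when answering queries; `errored` records keep their message for later inspection.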
See [life of a code intelligence query](life-of-a-code-intelligence-query.md) for additional documentation on how the SQLite data file is read.

View File

@ -0,0 +1,6 @@
# Code intelligence subsystem architecture
This is a high-level overview of Sourcegraph's code intelligence subsystem architecture so you can understand how our systems fit together.
You can click on each component to jump to its respective code repository or subtree.
<object data="diagrams/architecture.svg" type="image/svg+xml" style="width:100%; height: 100%"></object>

View File

@ -0,0 +1,3 @@
# Deploying code intelligence services
Code intelligence services currently deploy with the rest of Sourcegraph. However, once [RFC 199: User code execution in the auto-indexer](https://docs.google.com/document/d/1rCduWqaLAbMu2s43RwJTBbRlhL6qS3oqq4iawiGdoVE) has been implemented, we will have a Cloud-only auto-indexer service that must be deployed out-of-band (outside of Kubernetes). Documentation about this process will reside here once that effort is complete.

View File

@ -57,7 +57,7 @@ digraph architecture {
indexer [
label="Indexer"
fillcolor="3"
URL="https://github.com/sourcegraph/sourcegraph/tree/master/enterprise/cmd/precise-code-intel-worker"
URL="https://github.com/sourcegraph/sourcegraph/tree/master/enterprise/cmd/precise-code-intel-indexer"
]
worker [

View File

@ -62,7 +62,7 @@
<!-- indexer -->
<g id="node3" class="node">
<title>indexer</title>
<g id="a_node3"><a xlink:href="https://github.com/sourcegraph/sourcegraph/tree/master/enterprise/cmd/precise-code-intel-worker" xlink:title="Indexer" target="_blank">
<g id="a_node3"><a xlink:href="https://github.com/sourcegraph/sourcegraph/tree/master/enterprise/cmd/precise-code-intel-indexer" xlink:title="Indexer" target="_blank">
<polygon fill="#bebada" stroke="black" points="285.5,-198.38 230.5,-198.38 230.5,-162.38 285.5,-162.38 285.5,-198.38"/>
<text text-anchor="middle" x="258" y="-177.88" font-family="Iosevka" font-size="10.00">Indexer</text>
</a>

View File

@ -0,0 +1,34 @@
sequenceDiagram
Caller ->>+ Resolvers: Definitions(repo, commit, file, position)
Resolvers ->>+ Code Intel API: FindClosestDumps(repo, commit, file)
Code Intel API ->>+ Store: FindClosestDumps(repo, commit, file)
Store -->>- Code Intel API: dumps
Code Intel API -->>- Resolvers: dumps
loop for each dumps[i] (while locations is empty)
Resolvers -->>+ Position Adjuster: AdjustPosition(file, position, from: commit, to: dumps[i].commit)
Position Adjuster -->>- Resolvers: adjusted file, adjusted position
Resolvers ->>+ Code Intel API: Definitions(dumps[i], adjusted file, adjusted position)
Code Intel API ->>+ Bundle Manager: Definitions(dumps[i], adjusted file, adjusted position)
Bundle Manager -->>- Code Intel API: locations
alt if locations is empty
Code Intel API ->>+ Bundle Manager: MonikersByPosition(dump, adjusted file, adjusted position)
Bundle Manager -->>- Code Intel API: monikers
loop for each monikers[i] (while locations is empty)
Code Intel API ->>+ Bundle Manager: PackageInformation(monikers[i])
Bundle Manager -->>- Code Intel API: package information
Code Intel API ->>+ Store: GetPackage(package information)
Store -->>- Code Intel API: package dump
Code Intel API ->>+ Bundle Manager: MonikerResults(package dump, monikers[i])
Bundle Manager -->>- Code Intel API: locations
end
end
Code Intel API -->>- Resolvers: locations
end
Resolvers -->>+ Position Adjuster: AdjustLocations(locations)
Position Adjuster -->>- Resolvers: adjusted locations
Resolvers -->>- Caller: adjusted locations

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,25 @@
sequenceDiagram
Providers ->>+ LSIF Provider: DefinitionsAndHover(textDocument, position)
par
LSIF Provider ->>+ GraphQL API: LSIF.Ranges(position +/- W)
GraphQL API -->>- LSIF Provider: ranges
and
LSIF Provider ->>+ GraphQL API: LSIF.DefinitionAndHover(position)
GraphQL API -->>- LSIF Provider: {definitions, hover text}
end
LSIF Provider -->>- Providers: {definitions, hover text}
alt if no definitions
Providers ->>+ Search Provider: Definitions(textDocument, position)
Search Provider ->>+ GraphQL API: Symbol Search "repo:^repo$@commit"
GraphQL API -->>- Search Provider: definitions
alt if no definitions
Search Provider ->>+ GraphQL API: Symbol Search "-repo:^repo$"
GraphQL API -->>- Search Provider: definitions
end
Search Provider -->>- Providers: definitions
end

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,21 @@
sequenceDiagram
Providers ->>+ LSIF Provider: DefinitionsAndHover(textDocument, position)
par
LSIF Provider ->>+ GraphQL API: LSIF.Ranges(position +/- W)
GraphQL API -->>- LSIF Provider: ranges
and
LSIF Provider ->>+ GraphQL API: LSIF.Definition+Hover(position)
GraphQL API -->>- LSIF Provider: {definitions, hover text}
end
LSIF Provider -->>- Providers: {definitions, hover text}
alt if no hover text
Providers ->>+ Search Provider: Hover(textDocument, position)
Search Provider ->>+ Providers: Definition(textDocument, position)
Providers -->>- Search Provider: definition
Search Provider ->>+ GraphQL API: GetFileContent(definition)
GraphQL API -->>- Search Provider: file content
Search Provider -->>- Providers: hover text
end

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,19 @@
sequenceDiagram
Providers ->>+ LSIF Provider: References(textDocument, position)
loop
LSIF Provider ->>+ GraphQL API: LSIF.References(position)
GraphQL API -->>- LSIF Provider: references
LSIF Provider -->>- Providers: references
end
par
Providers ->>+ Search Provider: References(textDocument, position)
Search Provider ->>+ GraphQL API: Regexp Search "repo:^repo$@commit"
GraphQL API -->>- Search Provider: local references
and
Search Provider ->>+ GraphQL API: Regexp Search "-repo:^repo$" index:true
GraphQL API -->>- Search Provider: remote references
end
Search Provider -->>- Providers: local references + remote references

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,28 @@
#!/bin/bash
set -ex
declare mermaid_diagrams=(
definitions
references
resolve-page
hover
upload
extension-definitions
extension-references
extension-hover
)
# Install mermaid util
yarn
mermaid="../../../../node_modules/.bin/mmdc"
# Generate mermaid diagrams
for diagram in "${mermaid_diagrams[@]}"; do
"${mermaid}" -i "${diagram}.mermaid" -o "${diagram}.svg"
# Make the generated id deterministic so CI won't see superfluous changes
sed -i '' "s/mermaid-[0-9]\{1,\}/mermaid-${diagram}/g" "${diagram}.svg"
done
dot architecture.dot -Tsvg >architecture.svg

View File

@ -0,0 +1,26 @@
sequenceDiagram
Caller ->>+ Resolvers: Hover(repo, commit, file, position)
Resolvers ->>+ Code Intel API: FindClosestDumps(repo, commit, file)
Code Intel API ->>+ Store: FindClosestDumps(repo, commit, file)
Store -->>- Code Intel API: dumps
Code Intel API -->>- Resolvers: dumps
loop for each dumps[i] (while hover text is empty)
Resolvers -->>+ Position Adjuster: AdjustPosition(file, position, from: commit, to: dumps[i].commit)
Position Adjuster -->>- Resolvers: adjusted file, adjusted position
Resolvers ->>+ Code Intel API: Hover(dumps[i], adjusted file, adjusted position)
Code Intel API ->>+ Bundle Manager: Hover(dumps[i], adjusted file, adjusted position)
Bundle Manager -->>- Code Intel API: hover text, range
alt if no hover text
Code Intel API ->> Code Intel API: dump', position' = Definition(repo, commit, adjusted file, adjusted position)
Code Intel API ->>+ Bundle Manager: Hover(dump', position')
Bundle Manager -->>- Code Intel API: hover text, range
end
Code Intel API -->>- Resolvers: hover text, range
end
Resolvers -->>+ Position Adjuster: AdjustRange(range)
Position Adjuster -->>- Resolvers: adjusted range
Resolvers -->>- Caller: hover text, adjusted range

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,34 @@
sequenceDiagram
Caller ->>+ Resolvers: References(repo, commit, file, position)
Resolvers ->>+ Code Intel API: FindClosestDumps(repo, commit, file)
Code Intel API ->>+ Store: FindClosestDumps(repo, commit, file)
Store -->>- Code Intel API: dumps
Code Intel API -->>- Resolvers: dumps
loop for each dumps[i]
Resolvers -->>+ Position Adjuster: AdjustPosition(file, position)
Position Adjuster -->>- Resolvers: adjusted file, adjusted position
alt if cursor is supplied
Note right of Resolvers: cursor is decoded from request
else
Resolvers ->>+ Bundle Manager: MonikersForPosition(dumps[i], adjusted file, adjusted position)
Bundle Manager -->>- Resolvers: monikers
Note right of Resolvers: cursor is created from <dump, monikers, adjusted file, adjusted position>
end
Resolvers ->>+ Code Intel API: References(cursor)
loop while under page limit
Code Intel API ->>+ Reference Page Resolver: resolvePage(cursor)
Reference Page Resolver -->>- Code Intel API: locations, cursor'
Note right of Code Intel API: cursor = cursor' on subsequent iteration
end
Code Intel API -->>- Resolvers: locations[0], ..., locations[n], cursor'
end
Resolvers -->>+ Position Adjuster: AdjustLocations(locations[0], ..., locations[n])
Position Adjuster -->>- Resolvers: adjusted locations[i]
Resolvers -->>- Caller: adjusted locations, cursor'

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,53 @@
sequenceDiagram
alt sameDumpCursor
Code Intel API ->>+ Reference Page Resolver: resolvePage(cursor)
Reference Page Resolver ->>+ Bundle Manager: References(cursor.dump, cursor.adjusted file, cursor.adjusted position)
Bundle Manager -->>- Reference Page Resolver: locations
Reference Page Resolver -->>- Code Intel API: locations, cursor'
end
alt sameDumpMonikersCursor
Code Intel API ->>+ Reference Page Resolver: resolvePage(cursor)
loop for each cursor.monikers[i]
Reference Page Resolver ->>+ Bundle Manager: MonikerResults(cursor.dump, cursor.monikers[i])
Bundle Manager -->>- Reference Page Resolver: locations[i]
end
Reference Page Resolver -->>- Code Intel API: locations[0] + ... + locations[n], cursor'
end
alt definitionMonikersCursor
Code Intel API ->>+ Reference Page Resolver: resolvePage(cursor)
loop for each cursor.monikers[i], (while no dump')
Reference Page Resolver ->>+ Bundle Manager: PackageInformation(cursor.monikers[i])
Bundle Manager -->>- Reference Page Resolver: package information
Reference Page Resolver ->>+ Store: GetPackage(package information)
Store -->>- Reference Page Resolver: dump'
end
Note right of Reference Page Resolver: cursor.monikers[k] has package information
Reference Page Resolver ->>+ Bundle Manager: MonikerResults(dump', cursor.monikers[k])
Bundle Manager ->>- Reference Page Resolver: locations
Reference Page Resolver -->>- Code Intel API: locations, cursor'
end
alt sameRepoCursor
Code Intel API ->>+ Reference Page Resolver: resolvePage(cursor)
Reference Page Resolver ->>+ Store: SameRepoPager(cursor.monikers[k])
Store -->>- Reference Page Resolver: dumps'
loop for each dumps'[i]
Reference Page Resolver ->>+ Bundle Manager: MonikerResults(dumps'[i], cursor.monikers[k])
Bundle Manager -->>- Reference Page Resolver: locations[i]
end
Reference Page Resolver -->>- Code Intel API: locations[0] + ... + locations[n], cursor'
end
alt remoteRepoCursor
Code Intel API ->>+ Reference Page Resolver: resolvePage(cursor)
Reference Page Resolver ->>+ Store: PackageReferencePager(cursor.monikers[k])
Store -->>- Reference Page Resolver: dumps'
loop for each dumps'[i]
Reference Page Resolver ->>+ Bundle Manager: MonikerResults(dumps'[i], cursor.monikers[k])
Bundle Manager -->>- Reference Page Resolver: locations[i]
end
Reference Page Resolver -->>- Code Intel API: locations[0] + ... + locations[n], cursor'
end

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,34 @@
sequenceDiagram
src-cli ->>+ Frontend: handleEnqueue?repository,commit,root
Frontend ->>+ Store: InsertUpload(repository, commit, root)
Store -->>- Frontend: uploadID
Frontend -->>- src-cli: 202 Accepted: {"id": uploadID}
loop
src-cli ->>+ Frontend: handleEnqueue?uploadID,index
Frontend ->> Store: AddUploadPart(uploadID, index)
Frontend ->> Bundle Manager: SendUploadPart(uploadID, index)
Frontend -->>- src-cli: 204 No Content
end
src-cli ->>+ Frontend: handleEnqueue?uploadID,done
Frontend ->> Bundle Manager: StitchParts(uploadID)
Frontend ->> Store: MarkQueued(uploadID)
Frontend -->>- src-cli: 204 No Content
Worker ->>+ Store: Dequeue
Store -->>- Worker: upload
Worker ->>+ Store: BeginTx
Worker ->>+ Bundle Manager: GetUpload(upload.id)
Bundle Manager ->>- Worker: raw LSIF data
Note over Store,Worker: Convert data into sqlite database, defined packages, and referenced packages
Worker ->> Store: UpdatePackages(defined packages)
Worker ->> Store: UpdatePackageReferences(referenced packages)
Worker ->> Store: DeleteOverlappingDumps(upload.repository, upload.commit, upload.root)
Worker ->> Store: MarkRepositoryAsDirty(upload.repository)
Worker ->> Bundle Manager: SendDB(upload.id, sqlite database)
Worker -->> Store: Commit
Store -->>- Worker: { }

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,85 @@
# How code intelligence extensions resolve hovers
Definition, reference, and hover providers are invoked from the extension host when the user hovers over a symbol in a code view. See the documentation for [authoring extensions](http://localhost:5080/extensions/authoring) for more details about the general extension architecture.
These providers receive the current text document (denoting a repository, commit, and path) and the position the user is hovering (a line and column offset within the file). The providers return results as an asynchronous iterator, which allows additional results to be streamed into the UI as they are received from the backend.
Code intelligence queries are resolved favoring [precise](http://localhost:5080/user/code_intelligence/lsif) code intelligence, if available, then falling back to [search-based](http://localhost:5080/user/code_intelligence/basic_code_intelligence).
## Definitions
The definitions provider returns the set of locations that define the symbol at the hover location. This provider supports the `Go to definition` button on the hover tooltip. If there is only one result, the button will act as a link to that location. If there are multiple results, they are shown in a definitions panel at the bottom of the screen.
<a href="/dev/codeintel/diagrams/extension-definitions.svg" target="_blank">
<img src="/dev/codeintel/diagrams/extension-definitions.svg">
</a>
The LSIF provider is invoked first. This provider will make two types of queries:
1. A `ranges` query that requests data for all symbols within a window centered around a hover position. Range query results return only _local_ data. This includes hover text and locations in the same index, but will exclude any cross-repository locations. Range query results are cached so that subsequent hover actions (within _W_ lines of a previous query) are likely to already have the data they need available in memory.
1. A `definition + hover` query that requests the definition and hover text for the exact hover position. This will resolve cross-repository definitions.
The LSIF provider will first check if there is a range window in memory for the target position. If so, then it will try to extract the definition from that data and will fall back to an explicit request for that position if no data is available. If there is no range data for the target position, both queries are made in parallel. This populates the cache with the window for subsequent queries without slowing down the first result within a fresh window.
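The decision described above can be sketched as a pure function that picks which requests to make. All names here (`RangeWindow`, `SymbolRange`, `planDefinitionQuery`) are hypothetical, and symbol ranges are simplified to a single line; the extension's actual cache structure is not shown in this document.

```typescript
interface CursorPosition {
    line: number
    character: number
}

// Simplified, assumed shape of one cached symbol within a window.
interface SymbolRange {
    line: number
    start: number // inclusive character offset
    end: number // exclusive character offset
    definition: string
}

// A cached result of one `ranges` query covering [startLine, endLine).
interface RangeWindow {
    startLine: number
    endLine: number
    ranges: SymbolRange[]
}

type QueryPlan =
    | { kind: 'cached'; definition: string } // answered from memory
    | { kind: 'explicit' } // window present but no data: request this exact position
    | { kind: 'parallel' } // no window: issue ranges and definition + hover queries together

function planDefinitionQuery(windows: RangeWindow[], position: CursorPosition): QueryPlan {
    const window = windows.find(w => w.startLine <= position.line && position.line < w.endLine)
    if (!window) {
        return { kind: 'parallel' }
    }
    const hit = window.ranges.find(
        r => r.line === position.line && r.start <= position.character && position.character < r.end
    )
    return hit ? { kind: 'cached', definition: hit.definition } : { kind: 'explicit' }
}
```

Separating the plan from the network calls keeps the caching rule easy to test in isolation.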
If the LSIF provider did not return any results (due to the repository or the target definition code not being fully indexed), the search providers are invoked as a fallback. The search providers perform a set of symbol searches using the text of the hovered symbol as the base of the query. The first search will look only within the same repository and commit in order to favor local declarations of the symbol. If the repository does not define this symbol, a second search is made that _excludes_ the source repository.
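A minimal sketch of this two-step fallback follows. The `SearchFn` callback, the function names, and the exact query strings are assumptions for illustration (the `type:symbol` filter and the regexp escaping are guesses at how such a query might be built), and the search is shown as synchronous for brevity even though a real search is a network request.

```typescript
// Hypothetical search callback: takes a query string, returns matching locations.
type SearchFn = (query: string) => string[]

// Escape regexp metacharacters so a repository name matches literally.
function escapeRegExp(value: string): string {
    return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

function searchDefinitions(search: SearchFn, symbol: string, repo: string, commit: string): string[] {
    const repoPattern = `^${escapeRegExp(repo)}$`

    // First search: same repository and commit, to favor local declarations.
    const local = search(`repo:${repoPattern}@${commit} type:symbol ${symbol}`)
    if (local.length > 0) {
        return local
    }

    // Second search: everything except the source repository.
    return search(`-repo:${repoPattern} type:symbol ${symbol}`)
}
```

The second search only runs when the first returns nothing, so a symbol defined locally never triggers the broader query.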
## References
The references provider returns the set of locations that reference the symbol at the hover location. This provider supports the `Find references` button on the hover tooltip. These results are shown in a references panel at the bottom of the screen.
<a href="/dev/codeintel/diagrams/extension-references.svg" target="_blank">
<img src="/dev/codeintel/diagrams/extension-references.svg">
</a>
The LSIF provider is invoked first. This provider will make a paginated `References` query, returning each page of results to the extension host as they are resolved (up to a maximum number of pages).
This result set is then supplemented by results from the search provider. The search provider will perform two regexp searches using the text of the hovered symbol as the base of the query. One search will look only within the same repository and commit, and the other search will _exclude_ the source repository. Both searches are made in parallel. Search results that fall in a file that also contains a precise result are filtered out before being sent to the extension host to avoid littering the result set.
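The file-level filtering can be sketched as below. The dictionary shape of a location (`repo`, `path`, `line`) and the function name are assumptions made for illustration.

```python
def filter_search_results(precise, search):
    """Drop search results located in files that already have a precise result."""
    precise_files = {(loc["repo"], loc["path"]) for loc in precise}
    return [
        loc for loc in search
        if (loc["repo"], loc["path"]) not in precise_files
    ]
```

Filtering by file rather than by exact position is what keeps imprecise matches from appearing next to precise ones in the same document.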
## Hover
The hover provider returns the hover text associated with the symbol at the hover location. This provider populates the text shown in the hover tooltip.
<a href="/dev/codeintel/diagrams/extension-hover.svg" target="_blank">
<img src="/dev/codeintel/diagrams/extension-hover.svg">
</a>
The LSIF provider is invoked first. This provider will make two types of queries:
1. A `ranges` query that requests data for all symbols within a window centered around a hover position. Range query results return only _local_ data. This includes hover text and locations in the same index, but excludes any cross-repository locations. Range query results are cached so that subsequent hover actions (within _W_ lines of a previous query) are likely to already have the data they need available in memory.
1. A `definition + hover` query that requests the definition and hover text for the exact hover position. This will resolve hover text for symbols defined in an external repository. For most indexers, the local range data is enough to completely resolve the hover data; we have, however, seen indexes in which cross-repository symbols do not link their hover text correctly. An explicit hover query is required in these circumstances.
The LSIF provider will first check if there is a range window in memory for the target position. If so, then it will try to extract the hover text from that data and will fall back to an explicit request for that position if no data is available. If there is no range data for the target position, both queries are made in parallel. This populates the cache with the window for subsequent queries without slowing down the first result within a fresh window.
If the LSIF provider did not return any results (due to the repository or the target definition code not being fully indexed), the search providers are invoked as a fallback. The search providers perform a recursive definitions request (note that this may invoke an LSIF provider). The hover text is then extracted from the source code around the definition.
## Query appendix
Definition queries take the following form, where `searchToken` and `ext[i]` are replaced with the symbol the user is hovering over and the set of file extensions for the current text document's language, respectively.
```
^{searchToken}$ type:symbol patternType:regexp case:yes file:.({ext[0]}|{ext[1]}|...)$
```
Reference queries take the following form, using the same placeholders as described above.
```
\b{searchToken}\b type:file patternType:regexp case:yes file:.({ext[0]}|{ext[1]})$
```
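As a rough sketch, the two templates above could be filled in like this. The function names are hypothetical, and the token is substituted as-is; whether the extensions escape regexp metacharacters in the token is not covered here.

```python
def definition_query(search_token, extensions):
    """Fill in the definition template for a token and file extensions."""
    exts = "|".join(extensions)
    return f"^{search_token}$ type:symbol patternType:regexp case:yes file:.({exts})$"


def reference_query(search_token, extensions):
    """Fill in the reference template for a token and file extensions."""
    exts = "|".join(extensions)
    # Raw f-string so that \b reaches the query as a regexp word boundary.
    return rf"\b{search_token}\b type:file patternType:regexp case:yes file:.({exts})$"
```

For example, `definition_query("NewServer", ["go"])` yields a symbol search anchored on the exact name and restricted to `.go` files.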
#### Indexed search queries
The definition and reference queries performed above are _first_ performed unindexed so that the commit hash included in the repo term is respected. This will yield results within the same git tree rather than yielding results on a distinct commit, which is favorable.
After a five-second delay we _also_ perform the same query with the commit hash suffix removed from the `repo` filter, and the term `index:yes` added to the query. The first request to return will be used (and if the unindexed search returns before this delay, only one request is made).
In some deployments with very large repositories, unindexed search may always take longer than this delay. In these situations, the setting `basicCodeIntel.indexOnly` can be set to completely disable unindexed searches from the code intel extensions.
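The delayed fallback can be sketched with `asyncio`. This is an illustration of the timing only: `unindexed` and `indexed` are hypothetical coroutine functions, and cancelling the slower request is an assumption rather than documented behavior.

```python
import asyncio


async def search_with_fallback(unindexed, indexed, delay=5.0):
    """Start the unindexed search; after `delay` seconds also start the indexed one."""
    first = asyncio.create_task(unindexed())
    done, _ = await asyncio.wait({first}, timeout=delay)
    if done:
        # The unindexed search finished in time, so only one request was made.
        return first.result()

    second = asyncio.create_task(indexed())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # assumption: the slower request is simply abandoned
    return done.pop().result()
```

With `basicCodeIntel.indexOnly` set, a caller would skip this function and issue only the indexed request.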
#### Repository type filtering
Queries will also include the term `fork:yes` if the setting `basicCodeIntel.includeForks` is set to true, and the term `archived:yes` if the setting `basicCodeIntel.includeArchives` is set to true.
## Code appendix
- LSIF providers: [definitionAndHover](https://sourcegraph.com/github.com/sourcegraph/code-intel-extensions@master/-/blob/shared/lsif/providers.ts#L98:10), [references](https://sourcegraph.com/github.com/sourcegraph/code-intel-extensions@master/-/blob/shared/lsif/providers.ts#L134:10)
- Search providers: [definition](https://sourcegraph.com/github.com/sourcegraph/code-intel-extensions@master/-/blob/shared/search/providers.ts#L112:11), [references](https://sourcegraph.com/github.com/sourcegraph/code-intel-extensions@master/-/blob/shared/search/providers.ts#L163:11), [hover](https://sourcegraph.com/github.com/sourcegraph/code-intel-extensions@master/-/blob/shared/search/providers.ts#L209:11)
- Combined providers: [createDefinitionProvider](https://sourcegraph.com/github.com/sourcegraph/code-intel-extensions@master/-/blob/shared/providers.ts#L174:17), [createReferencesProvider](https://sourcegraph.com/github.com/sourcegraph/code-intel-extensions@master/-/blob/shared/providers.ts#L242:17), [createHoverProvider](https://sourcegraph.com/github.com/sourcegraph/code-intel-extensions@master/-/blob/shared/providers.ts#L313:17)


@ -0,0 +1,9 @@
# Developing code intelligence
This guide documents our approach to developing code intelligence-related features in our codebase. This includes the code intelligence [subsystems](https://github.com/sourcegraph/sourcegraph/tree/main/enterprise/cmd) and the [extensions](https://github.com/sourcegraph/code-intel-extensions) that provide code intelligence to the web UI, browser extension, and code host integrations.
- [Architecture documentation](architecture.md)
- [Deployment documentation](deployment.md)
- [How LSIF indexes are processed](uploads.md)
- [How precise code intelligence queries are resolved](queries.md)
- [How code intelligence extensions resolve hovers](extensions.md)


@ -0,0 +1,91 @@
# How precise code intelligence queries are resolved
Precise code intelligence results are obtained by making [GraphQL requests](https://sourcegraph.com/api/console#%7B%22operationName%22%3A%22DefinitionAndHover%22%2C%22query%22%3A%22query%20DefinitionAndHover\(%24repository%3A%20String!%2C%20%24commit%3A%20String!%2C%20%24path%3A%20String!%2C%20%24line%3A%20Int!%2C%20%24character%3A%20Int!\)%20%7B%5Cn%20%20repository\(name%3A%20%24repository\)%20%7B%5Cn%20%20%20%20commit\(rev%3A%20%24commit\)%20%7B%5Cn%20%20%20%20%20%20blob\(path%3A%20%24path\)%20%7B%5Cn%20%20%20%20%20%20%20%20lsif%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20definitions\(line%3A%20%24line%2C%20character%3A%20%24character\)%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20nodes%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20resource%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20path%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20repository%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20name%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20commit%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20oid%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20range%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20start%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20line%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20character%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20end%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20line%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20character%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%20%20hover\(line%3A%20%24line%2C%20character%3A%20%24character\)%
20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20markdown%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20text%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%20%20%20%20range%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20start%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20line%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20character%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20end%20%7B%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20line%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20character%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%20%20%7D%5Cn%20%20%20%20%20%20%7D%5Cn%20%20%20%20%7D%5Cn%20%20%7D%5Cn%7D%5Cn%22%2C%22variables%22%3A%22%7B%5Cn%20%20%5C%22repository%5C%22%3A%20%5C%22github.com%2Fsourcegraph%2Fsourcegraph%5C%22%2C%5Cn%20%20%5C%22commit%5C%22%3A%20%5C%2288ba1ebe3422fd93c07cbf0084dc177dea393df4%5C%22%2C%5Cn%20%20%5C%22path%5C%22%3A%20%5C%22monitoring%2Fprecise_code_intel_indexer.go%5C%22%2C%5Cn%20%20%5C%22line%5C%22%3A%2012%2C%5Cn%20%20%5C%22character%5C%22%3A%2012%5Cn%7D%22%7D) to the frontend service. The [code intelligence extensions](https://github.com/sourcegraph/code-intel-extensions) are example consumer of this API, and its [documentation](./extensions.md) details how code intelligence results are used.
<!-- TODO(efritz): range queries -->
<!-- TODO(efritz): diagnostic queries -->
## Definitions
A definitions request returns the set of locations that define the symbol at a particular location (defined uniquely by a repository, commit, path, line offset, and character offset). The sequence of actions required to resolve a definitions query is shown below (click to enlarge).
<a href="/dev/codeintel/diagrams/definitions.svg" target="_blank">
<img src="/dev/codeintel/diagrams/definitions.svg">
</a>
First, the repository, commit, and path inputs are used to determine the set of LSIF uploads that can answer queries for that data. Such an upload may have been indexed on another commit. In this case, the output of `git diff` between the two commits is used to adjust the input path and line number.
The adjusted path and position are used to query the definitions at that position using the selected upload data. If a definition is local to the upload, the bundle manager can resolve the query without any additional data. If the definition is remote (defined in a different root of the same repository, or defined in a different repository), the _import_ monikers of the symbol at the adjusted path and position in the selected upload are determined, as are the package information data of those monikers. Using an upload that provides one of the selected packages, definitions of the associated moniker are queried from the bundle manager.
Finally, if the resulting locations were provided by an upload that was indexed on a commit distinct from the input commit, `git diff` is used to again re-adjust the results to the target commit.
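The line adjustment can be illustrated with a simplified hunk model. This is a sketch under stated assumptions: each hunk is a hypothetical `(old_start, old_count, new_count)` tuple meaning that `old_count` lines starting at 1-indexed `old_start` were replaced by `new_count` lines. It is not the exact format of `git diff` output, nor the logic of the real `AdjustPosition`.

```python
def adjust_line(line, hunks):
    """Map a 1-indexed line from the source commit to the target commit.

    `hunks` must be sorted by old_start. Returns None when the line itself
    was modified or removed, in which case no adjusted position exists.
    """
    offset = 0
    for old_start, old_count, new_count in hunks:
        if line < old_start:
            break  # this hunk and all later ones lie after the line
        if line < old_start + old_count:
            return None  # the line falls inside a changed region
        offset += new_count - old_count
    return line + offset
```

Mapping results back to the requested commit uses the same idea with the roles of the two commits swapped.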
### Code appendix
- Resolvers: [QueryResolver](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/resolvers/resolver.go#L73:20), [Definitions](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/resolvers/query.go#L138:25)
- CodeIntelAPI: [FindClosestDumps](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/exists.go#L18:26), [Definitions](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/definitions.go#L21:26)
- Store: [FindClosestDumps](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/dumps.go#L99:17), [GetPackage](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/packages.go#L11:17)
- Position Adjuster: [AdjustPosition](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/resolvers/position.go#L63:28), [AdjustRange](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/resolvers/position.go#L77:28)
- Bundle Manager: [Definitions](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/database/database.go#L166:25), [MonikersByPosition](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/database/database.go#L284:25), [PackageInformation](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/database/database.go#L358:25), [MonikerResults](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/database/database.go#L316:25)
## References
A references request returns the set of locations that reference the symbol at a particular location (defined uniquely by a repository, commit, path, line offset, and character offset). Unlike the set of definitions, which should generally have only one member, the set of references can be unbounded for popular repositories. The resolution of references is therefore done in chunks, allowing the user to request reference results page-by-page. The sequence of actions required to resolve a references query is shown below (click to enlarge).
<a href="/dev/codeintel/diagrams/references.svg" target="_blank">
<img src="/dev/codeintel/diagrams/references.svg">
</a>
First, the repository, commit, and path inputs are used to determine the set of LSIF uploads that can answer queries for that data. Such an upload may have been indexed on another commit. In this case, the output of `git diff` between the two commits is used to adjust the input path and line number.
A references request optionally supplies a cursor that encodes the state of the previous request (the first request supplies no cursor). If a cursor is supplied, it is decoded and validated. Otherwise, one is created with the input repository, commit, adjusted path, adjusted position, the selected upload identifier, and the monikers of the symbol at the adjusted path and position in the selected upload. Note that this step may be repeated over multiple uploads: each upload returned in the previous step will have its own cursor, encoded/decoded independently at the GraphQL resolver layer.
The cursor decoded or created above is used to drive the resolution of the current page of results. While the number of results in the current page is less than the requested number of results, another batch of locations is requested using the current cursor and appended to the current page. Resolving a page also returns a new cursor. This cursor is ultimately sent back to the client so they can make a subsequent request, and is also used as the new _current_ cursor if a subsequent batch of locations is requested.
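The page-filling loop can be sketched as follows. The `resolve_page` callable and its `(batch, next_cursor)` return shape are assumptions for illustration. Here a cursor of `None` stands for "no more results".

```python
def resolve_references(resolve_page, cursor, limit):
    """Fill one page of up to `limit` results, returning (results, next_cursor)."""
    results = []
    while cursor is not None and len(results) < limit:
        # Ask only for what is still missing from the current page.
        batch, cursor = resolve_page(cursor, limit - len(results))
        results.extend(batch)
    return results, cursor
```

The returned cursor is what the client would send back with its next request.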
Finally, if the resulting locations were provided by an upload that was indexed on a commit distinct from the input commit, `git diff` is used to again re-adjust the results to the target commit.
---
The sequence of actions required to resolve a page of references given a cursor is shown below (click to enlarge).
<a href="/dev/codeintel/diagrams/resolve-page.svg" target="_blank">
<img src="/dev/codeintel/diagrams/resolve-page.svg">
</a>
The cursor can be in one of five _phases_, ordered as follows. Each phase handles a distinct segment of the result set. A phase may return no results, or it may return multiple pages worth of results. In the latter case, the cursor encodes sufficient information (e.g. number of uploads, references previously returned in the phase) to be able to skip duplicate results.
1. The `sameDumpCursor` phase retrieves reference results from the upload in which the target symbol is indexed. This phase will return local references to symbols defined in the same upload. This phase will also, for some but not all indexer output, return references to remote symbols.
1. The `sameDumpMonikersCursor` phase retrieves reference results by the moniker of the target symbol from the upload in which the target symbol is indexed. This excludes the reference results that are returned from the previous phase. This phase is necessary as not all indexer output uniquely correlates the references of symbols defined externally.
1. The `definitionMonikersCursor` phase retrieves reference results by moniker from the upload in which the symbol definition is indexed (if it is distinct from the upload in which the target symbol is indexed).
1. The `sameRepoCursor` phase retrieves reference results by moniker from all uploads for the same repository. This includes uploads only for roots which are distinct from the root of the upload in which the target symbol is indexed. This handles results from large repositories that are split into multiple, separately-indexed projects.
1. The `remoteRepoCursor` phase retrieves reference results by moniker from all uploads for distinct repositories. This enables true cross-repository reference results.
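The phase ordering above behaves like a small state machine. The sketch below assumes a cursor is a `(phase_index, offset)` pair and that each phase has a hypothetical handler returning `(results, next_offset)`, with `next_offset` set to `None` once the phase is exhausted. The real cursor carries more state than this.

```python
PHASES = [
    "sameDumpCursor",
    "sameDumpMonikersCursor",
    "definitionMonikersCursor",
    "sameRepoCursor",
    "remoteRepoCursor",
]


def resolve_page(cursor, handlers, limit):
    """Resolve one batch for the cursor's phase and compute the next cursor."""
    phase_index, offset = cursor
    results, next_offset = handlers[PHASES[phase_index]](offset, limit)
    if next_offset is not None:
        # The phase has more results: stay in it and remember the position.
        return results, (phase_index, next_offset)
    if phase_index + 1 < len(PHASES):
        # The phase is exhausted: move to the start of the next phase.
        return results, (phase_index + 1, 0)
    return results, None  # the last phase is exhausted
```

A phase that returns an empty batch simply advances the cursor, which is how a phase with no results is skipped.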
### Code appendix
- PositionAdjuster: [AdjustPosition](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/resolvers/position.go#L63:28), [AdjustRange](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/resolvers/position.go#L77:28)
- Resolvers: [QueryResolver](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/resolvers/resolver.go#L73:20), [References](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/resolvers/query.go#L167:25)
- CodeIntelAPI: [FindClosestDumps](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/exists.go#L18:26), [References](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/references.go#L24:26), [DecodeOrCreateCursor](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/cursor.go#L54:6)
- Bundle Manager: [References](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/database/database.go#L189:25), [MonikersByPosition](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/database/database.go#L284:25), [PackageInformation](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/database/database.go#L358:25), [MonikerResults](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/database/database.go#L316:25)
- Store: [FindClosestDumps](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/dumps.go#L99:17), [GetPackage](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/packages.go#L11:17), [SameRepoPager](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/references.go#L39:17), [PackageReferencePager](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/references.go#L77:17)
- ReferencePageResolver: [resolvePage](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/references.go#L50:33), [handleSameDumpCursor](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/references.go#L91:33), [handleSameDumpMonikersCursor](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/references.go#L137:33), [handleDefinitionMonikersCursor](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/references.go#L218:33), [handleSameRepoCursor](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/references.go#L283:33), [handleRemoteRepoCursor](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/references.go#L311:33)
## Hover
A hover request returns the hover text associated with the symbol at a particular location (defined uniquely by a repository, commit, path, line offset, and character offset), as well as the range of the hovered symbol. The sequence of actions required to resolve a hover query is shown below (click to enlarge).
<a href="/dev/codeintel/diagrams/hover.svg" target="_blank">
<img src="/dev/codeintel/diagrams/hover.svg">
</a>
First, the repository, commit, and path inputs are used to determine the set of LSIF uploads that can answer queries for that data. Such an upload may have been indexed on another commit. In this case, the output of `git diff` between the two commits is used to adjust the input path and line number.
The adjusted path and position are used to query the hover at that position using the selected upload data. For most indexers, this is enough to completely resolve the hover data; we have, however, seen indexes in which cross-repository symbols do not link their hover text correctly. In these cases, the definition of the symbol at the same location is determined, and another hover query is performed using the definition symbol's position (if exactly one such definition is found).
Finally, if the hover result was provided by an upload that was indexed on a commit distinct from the input commit, `git diff` is used to re-adjust the resulting range to the target commit.
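The fallback can be sketched as follows, assuming that "these cases" means the first hover query returned no text. `hover_at` and `definitions_at` are hypothetical callables standing in for the bundle manager queries, and the sketch ignores that the definition may live in a different upload.

```python
def resolve_hover(hover_at, definitions_at, position):
    """Return hover text for a position, falling back to the definition's hover."""
    text = hover_at(position)
    if text:
        return text
    definitions = definitions_at(position)
    if len(definitions) == 1:
        # Exactly one definition: query the hover text at its position instead.
        return hover_at(definitions[0])
    return None  # zero or several definitions: no unambiguous fallback
```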
### Code appendix
- Resolvers: [QueryResolver](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/resolvers/resolver.go#L73:20), [Hover](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/resolvers/query.go#L236:25)
- CodeIntelAPI: [FindClosestDumps](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/exists.go#L18:26), [Hover](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/api/hover.go#L13:26)
- Store: [FindClosestDumps](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/dumps.go#L99:17)
- PositionAdjuster: [AdjustPosition](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/resolvers/position.go#L63:28), [AdjustRange](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/resolvers/position.go#L77:28)
- Bundle Manager: [Hover](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/database/database.go#L213:25)


@ -0,0 +1,48 @@
# How LSIF indexes are processed
An LSIF indexer produces a file containing the definition, reference, hover, and diagnostic data for a project. Users upload this index file to a Sourcegraph instance, which converts it into an internal format that can support [code intelligence queries](./queries.md).
The sequence of actions required to upload and convert this data is shown below (click to enlarge).
<a href="/dev/codeintel/diagrams/upload.svg" target="_blank">
<img src="/dev/codeintel/diagrams/upload.svg">
</a>
## Uploading
The API used to upload an LSIF index is modeled after the [S3 multipart upload API](https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html). Many LSIF uploads can be fairly large and the [network is generally not reliable](https://aphyr.com/posts/288-the-network-is-reliable). To get around frequent failure of large uploads (and to get around upload limits in Cloudflare), the upload is broken into multiple, independently gzipped chunks. Each chunk is uploaded in sequence to the instance, where it is concatenated into a single file on the remote end. This allows us to retry chunks independently in the case of an upload failure without sacrificing the entire operation.
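One property that makes independently gzipped chunks convenient is that concatenated gzip members form a valid gzip stream. The sketch below demonstrates this with Python's standard library. Whether the bundle manager concatenates the compressed bytes directly is an assumption here, and the function names are hypothetical.

```python
import gzip


def split_and_compress(payload: bytes, chunk_size: int) -> list:
    """Split a payload into chunks and gzip each chunk independently."""
    return [
        gzip.compress(payload[i:i + chunk_size])
        for i in range(0, len(payload), chunk_size)
    ]


def stitch(parts) -> bytes:
    """Concatenate compressed parts, in order, into a single gzip stream."""
    return b"".join(parts)
```

Decompressing the stitched bytes with `gzip.decompress` yields the original payload, so a failed chunk can be re-sent without recompressing the rest.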
An initial request adds an upload into the database with the `uploading` state and marks the number of upload chunks it expects to see. The subsequent requests specify the upload identifier (returned in the initial request), and the index of the chunk that is being uploaded. If this upload part successfully makes it to disk, it is marked as received in the upload record. The last request is a request marking upload completion from the client. At this point, the frontend ensures that all the expected chunks have been received and reside on disk. The frontend informs the bundle manager to concatenate the files, and the upload record is moved from the `uploading` state to the `queued` state, where it is made visible to the worker process.
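The bookkeeping on the upload record can be sketched as below. The class and method names are hypothetical, and only the `uploading` and `queued` states come from the description above.

```python
class UploadRecord:
    """Tracks which parts of a multipart upload have been received."""

    def __init__(self, num_parts):
        self.state = "uploading"
        self.num_parts = num_parts
        self.received = set()

    def add_part(self, index):
        if not 0 <= index < self.num_parts:
            raise ValueError(f"part index {index} out of range")
        # Using a set makes a retried part harmless.
        self.received.add(index)

    def complete(self):
        missing = sorted(set(range(self.num_parts)) - self.received)
        if missing:
            raise ValueError(f"missing upload parts: {missing}")
        self.state = "queued"  # now visible to the worker process
```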
## Processing
The worker process polls Postgres for upload records in the `queued` state. When such a record is available, it is marked as `processing` and is locked in a transaction to ensure that it is not double-processed by another worker instance. The worker asks the bundle manager for the raw LSIF upload data. Because this data is generally large, the data is streamed to the worker while it is being processed (and retry logic inside the bundle manager client will retry the request from the last byte it received on transient failures).
The worker then converts the raw LSIF data into a SQLite database, producing a set of packages that the indexed source code _defines_ and a set of packages that the indexed source code _depends on_. This [portion of the conversion](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-worker/internal/correlation/correlate.go#L20:6) is omitted from the diagram as it remains within the worker process (with one exception), but is explained below.
1. The [correlateFromReader](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-worker/internal/correlation/correlate.go#L73:6) step streams raw LSIF data from the bundle manager and produces a stream of JSON objects. Each object in the stream is interpreted as an LSIF vertex or edge. Objects are validated, then inserted into an in-memory representation of the graph.
1. The [canonicalize](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-worker/internal/correlation/canonicalize.go#L12:6) step collapses the in-memory representation of the graph produced by the previous step. Most notably, it ensures that the data attached to a range vertex _transitively_ is now attached to the range vertex _directly_.
1. The [prune](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-worker/internal/correlation/prune.go#L14:6) step determines the set of documents that are present in the index but do not exist in git (via an efficient batch of calls to gitserver) and removes references to them from the in-memory representation of the graph. This prevents us from attempting to navigate to locations that are not visible within the instance (generated or vendored paths that are not committed).
1. The [groupBundleData](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-worker/internal/correlation/group.go#L34:6) step converts the canonicalized and pruned in-memory representation of the graph into the shape that will reside within a SQLite bundle. This _rotates_ the data so that it can be efficiently read based on our [query access patterns](./queries.md).
1. The [sqlite writer](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/bundles/persistence/sqlite/writer.go) writes the grouped bundle data from the previous step into a new SQLite database on disk.
The set of packages defined by and depended on by this index can be constructed from reading the package information attached to export and import monikers, respectively, from the correlated data. This data is inserted into Postgres to enable cross-repository definition and reference queries.
Duplicate uploads (with the same repository, commit, and root) are removed to prevent the frontend from querying multiple indexes for the same data. This can happen if a user re-uploads the same index, or if an index is re-uploaded as part of a CI step that was re-run. In these cases we prefer to keep the newest upload.
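A minimal sketch of this de-duplication rule follows, assuming each upload is a dictionary with hypothetical `repository`, `commit`, `root`, and `uploaded_at` fields.

```python
def remove_duplicates(uploads):
    """Keep only the newest upload for each (repository, commit, root)."""
    newest = {}
    for upload in uploads:
        key = (upload["repository"], upload["commit"], upload["root"])
        current = newest.get(key)
        if current is None or upload["uploaded_at"] > current["uploaded_at"]:
            newest[key] = upload
    return list(newest.values())
```

Uploads for different roots of the same commit are not duplicates and are all kept.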
The repository is marked as _dirty_, which informs a process that runs periodically to re-calculate the set of uploads visible to each commit. This process will refresh the commit graph for this repository stored in Postgres.
The SQLite database is sent to the bundle manager in chunks, as described in the previous section.
Finally, if the previous steps have all completed without error, the transaction is committed, moving the upload record from the `processing` state to the `completed` state, where it is made visible to the frontend to answer code intelligence queries. If an error does occur, the upload record is instead moved to the `errored` state and marked with a failure reason.
## Code appendix
- src-cli: [lsif upload command](https://sourcegraph.com/github.com/sourcegraph/src-cli@master/-/blob/cmd/src/lsif_upload.go#L153:2)
- Worker: [abstract process](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/internal/workerutil/worker.go#L16:6), [upload processor](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-worker/internal/worker/handler.go#L43:19), [correlator](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-worker/internal/correlation/correlate.go#L20:6) (the heavy hitter)
- Store: [Dequeue](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/internal/workerutil/dbworker/store/store.go#L202:17), [InsertUpload](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/uploads.go#L294:17), [AddUploadPart](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/uploads.go#L330:17), [UpdatePackages](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/packages.go#L37:17), [UpdatePackageReferences](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/references.go#L115:17), [DeleteOverlappingDumps](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/dumps.go#L192:17), [MarkRepositoryAsDirty](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/store/commits.go#L62:17)
- Bundle Manager:
  - SendUploadPart - [client](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/bundles/client/bundle_manager_client.go#L142:35), [server](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/server/handler.go#L83:18)
  - StitchParts - [client](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/bundles/client/bundle_manager_client.go#L154:35), [server](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/server/handler.go#L92:18)
  - GetUpload - [client](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/bundles/client/bundle_manager_client.go#L175:350), [server](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/server/handler.go#L53:18)
  - SendDB - [client](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/internal/codeintel/bundles/client/bundle_manager_client.go#L244:35), [server (send part)](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/server/handler.go#L114:18), [server (stitch)](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/cmd/precise-code-intel-bundle-manager/internal/server/handler.go#L123:18)
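
The SendUploadPart and StitchParts entries above suggest a two-phase flow: parts are sent individually, then joined into one payload. The sketch below illustrates that general pattern only. The `memoryPartStore` type and its `sendPart` and `stitch` methods are hypothetical names invented for this example, and the in-memory map is an assumption; the real client and server code is at the links above.

```go
package main

import (
	"bytes"
	"fmt"
)

// memoryPartStore is a hypothetical in-memory stand-in for part storage.
// It is not the bundle manager's actual implementation.
type memoryPartStore struct {
	parts map[int][]byte
}

func newMemoryPartStore() *memoryPartStore {
	return &memoryPartStore{parts: map[int][]byte{}}
}

// sendPart records a copy of one part under its zero-based index.
// Parts may arrive in any order.
func (s *memoryPartStore) sendPart(index int, payload []byte) {
	s.parts[index] = append([]byte(nil), payload...)
}

// stitch concatenates parts 0..numParts-1 in index order and returns an
// error if any part is missing.
func (s *memoryPartStore) stitch(numParts int) ([]byte, error) {
	var buf bytes.Buffer
	for i := 0; i < numParts; i++ {
		part, ok := s.parts[i]
		if !ok {
			return nil, fmt.Errorf("missing upload part %d", i)
		}
		buf.Write(part)
	}
	return buf.Bytes(), nil
}

func main() {
	store := newMemoryPartStore()
	store.sendPart(1, []byte("world")) // out-of-order arrival
	store.sendPart(0, []byte("hello "))

	out, err := store.stitch(2)
	if err != nil {
		fmt.Println("stitch failed:", err)
		return
	}
	fmt.Printf("%s\n", out)
}
```

Keying parts by index rather than appending them in arrival order lets the sender retry or reorder individual parts without corrupting the stitched result.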

@ -21,6 +21,7 @@ Sourcegraph development is open source at [github.com/sourcegraph/sourcegraph](h
- [Developing the GraphQL API](graphql_api.md)
- [Developing indexed search](zoekt.md)
- [Developing campaigns](campaigns_development.md)
- [Developing code intelligence](codeintel/index.md)
- [Using PostgreSQL](postgresql.md)
- [Testing](testing.md)
- [Go style guide](https://about.sourcegraph.com/handbook/engineering/languages/go)

@ -1,5 +1,8 @@
#!/usr/bin/env bash
# Make the Google Cloud SDK use the Python 3 interpreter
export CLOUDSDK_PYTHON=/usr/bin/python3
set -eu
cd "$(dirname "${BASH_SOURCE[0]}")/../../../.."
DATADIR=$(realpath './internal/cmd/precise-code-intel-tester/testdata')

@ -14,7 +14,6 @@ INDEXFILE="${INDEXDIR}/${NAME}.${REV}.dump"
# Early-out if there's already a dump file
if [ -f "${INDEXFILE}" ]; then
echo "YAY"
exit 0
fi
@ -24,7 +23,6 @@ mkdir -p "${INDEXDIR}"
# Copy repo to temporary directory
cp -r "${REPODIR}/${NAME}" "${REVDIR}"
cleanup() {
echo "REMOVING A GUY -- ${REVDIR}"
rm -rf "${REVDIR}"
}
trap cleanup EXIT
@ -36,5 +34,4 @@ git checkout "${REV}" 2>/dev/null
# Index revision
go mod vendor && lsif-go -o "${INDEXFILE}"
V=$?
echo "!!!! $V"
exit $V

@ -78,7 +78,11 @@
"@gql2ts/from-schema": "^1.10.1",
"@gql2ts/language-typescript": "^1.9.0",
"@gql2ts/types": "^1.9.0",
"@graphql-codegen/cli": "1.17.7",
"@graphql-codegen/typescript": "1.17.7",
"@graphql-codegen/typescript-operations": "1.17.7",
"@istanbuljs/nyc-config-typescript": "^1.0.1",
"@mermaid-js/mermaid-cli": "^8.7.0",
"@octokit/rest": "^16.36.0",
"@percy/puppeteer": "^1.1.0",
"@pollyjs/adapter-puppeteer": "^5.0.0",
@ -198,9 +202,6 @@
"gql2ts": "^1.10.1",
"graphql": "^14.7.0",
"graphql-schema-linter": "^0.5.0",
"@graphql-codegen/cli": "1.17.7",
"@graphql-codegen/typescript": "1.17.7",
"@graphql-codegen/typescript-operations": "1.17.7",
"gulp": "^4.0.2",
"identity-obj-proxy": "^3.0.0",
"jest": "^25.5.4",

@ -2058,6 +2058,15 @@
resolved "https://registry.npmjs.org/@mdx-js/react/-/react-1.6.6.tgz#71ece2a24261eed0e184c0ef9814fcb77b1a4aee"
integrity sha512-zOOdNreHUNSFQ0dg3wYYg9sOGg2csf7Sk8JGBigeBq+4Xk4LO0QdycGAmgKNfeme+SyBV5LBIPjt1NNsScyWEQ==
"@mermaid-js/mermaid-cli@^8.7.0":
version "8.7.0"
resolved "https://registry.npmjs.org/@mermaid-js/mermaid-cli/-/mermaid-cli-8.7.0.tgz#8c798996b9fa0bec8acf6b946cbafc6407e51870"
integrity sha512-Eh5ivob7wS4YCbL9K90/I0TYTqrknCCWAIYipeU8T1Klrpqb6/1LszkoBFzDogM9UK94Lg8vH1XBYv5J4f3/EA==
dependencies:
chalk "^4.1.0"
commander "^6.0.0"
puppeteer "^5.0.0"
"@mrmlnc/readdir-enhanced@^2.2.1":
version "2.2.1"
resolved "https://registry.npmjs.org/@mrmlnc/readdir-enhanced/-/readdir-enhanced-2.2.1.tgz#524af240d1a360527b730475ecfa1344aa540dde"
@ -2674,7 +2683,8 @@
integrity sha512-KWxkyphmlwam8kfYPSmoitKQRMGQCsr1ZRmNZgijT7ABKaVyk/+I5ezt2J213tM04Hi0vyg4L7iH1VCkNvm2Jw==
"@sourcegraph/extension-api-types@link:packages/@sourcegraph/extension-api-types":
version "2.1.0"
version "0.0.0"
uid ""
"@sourcegraph/prettierrc@^3.0.3":
version "3.0.3"
@ -7655,6 +7665,11 @@ commander@^5.1.0:
resolved "https://registry.npmjs.org/commander/-/commander-5.1.0.tgz#46abbd1652f8e059bddaef99bbdcb2ad9cf179ae"
integrity sha512-P0CysNDQ7rtVw4QIQtm+MRxV66vKFSvlsQvGYXZWR3qFU0jlMKHZZZgw8e+8DSah4UDKMqnknRDQz+xuQXQ/Zg==
commander@^6.0.0:
version "6.0.0"
resolved "https://registry.npmjs.org/commander/-/commander-6.0.0.tgz#2b270da94f8fb9014455312f829a1129dbf8887e"
integrity sha512-s7EA+hDtTYNhuXkTlhqew4txMZVdszBmKWSPEMxGr8ru8JXR7bLUFIAtPhcSuFdJQ0ILMxnJi8GkQL0yvDy/YA==
comment-parser@^0.7.5:
version "0.7.5"
resolved "https://registry.npmjs.org/comment-parser/-/comment-parser-0.7.5.tgz#06db157a3b34addf8502393743e41897e2c73059"
@ -18480,7 +18495,7 @@ puppeteer-firefox@^0.5.1:
rimraf "^2.6.1"
ws "^6.1.0"
puppeteer@5.2.1:
puppeteer@5.2.1, puppeteer@^5.0.0:
version "5.2.1"
resolved "https://registry.npmjs.org/puppeteer/-/puppeteer-5.2.1.tgz#7f0564f0a5384f352a38c8cc42af875cd87f4ea6"
integrity sha512-PZoZG7u+T6N1GFWBQmGVG162Ak5MAy8nYSVpeeQrwJK2oYUlDWpHEJPcd/zopyuEMTv7DiztS1blgny1txR2qw==
@ -20905,7 +20920,8 @@ sourcegraph@^24.0.0:
integrity sha512-PlGvkdBy5r5iHdKAVNY/jsPgWb3oY+2iAdIQ3qR83UHhvBFVgoctDAnyfJ1eMstENY3etBWtAJ8Kleoar3ecaA==
"sourcegraph@link:packages/sourcegraph-extension-api":
version "24.7.0"
version "0.0.0"
uid ""
space-separated-tokens@^1.0.0:
version "1.1.2"