In codenav, we use very few APIs from gitserver, so let's use a more
narrow interface (which we can potentially fake in the future) instead
of depending on gitserver.Client directly.
The utils are currently used outside of cmd/frontend, so to break the
internal -> cmd import I'm moving and merging the utils package with
internal/gqlutil.
Test plan: Just moved around and renamed a package, the go compiler
doesn't complain.
Consolidates the two test helper/util modules, and moves one test file
to align with the file it's actually testing.
## Test plan
Tests continue to pass
Previously, the QueuedCount method was confusing because:
1. By default, it actually returned the count for both the 'queued' and
'errored' states (despite the name just have 'Queued').
2. There was an additional boolean flag for also returning entries in
the 'processing' state, but reduced clarity at call-sites.
So I've changed the method to take a bitset instead, mirroring the
just-added Exists API, and renamed the method to a more
generic 'CountByState'.
While this does make call-sites a bit more verbose, I think the
clarity win makes the change an overall positive one.
This change should not change any behaviour, but it prepares for syntactic/search-based pagination.
Both syntactic and search based usages now return a "NextCursor", which is `Some` if there are more results of their provenance to be had. For now they always return `None` which preserves the current behaviour. The logic in `root_resolver` implements a state machine that transitions the cursor through the provenances and makes sure to skip provenances if the cursor in the request starts a certain state.
## Test plan
Tests continue to pass, no behavioral changes
Reduce the exposed surface area with a smaller interface
minimalRepoStore, and make sure we're using fakes in the
tests which better document the intent, instead of using a
mock repo store. The fake makes sure that the methods
are mutually consistent, which is easier to do when working
with a smaller interface.
Instead of re-generating and re-compiling the mocks 3 times,
consolidate them into a single storemocks package.
This should also provide better code navigation in the editor.
The `getSourceRange` function can have a range translation failure,
in which case it would be wrong to propagate the range directly
without the commit information. So skip the ranges in that case.
When determining the set of visible uploads at a given commit,
we group uploads from scip-K and lsif-K for ranking purposes
to allow them to shadow each other. Generally, scip-K will end
up shadowing lsif-K, avoiding sending the client a mix of data
from a much older lsif-K index when newer data from a scip-K
index is available.
Fixes https://linear.app/sourcegraph/issue/GRAPH-787
The logic requires some final tie-breaking to avoid non-determinism
in case when there is a scip-K and lsif-K index at the same distance,
which I've done based on the upload_id in descending order
(upload IDs are monotonically increasing). Earlier, some code was
doing tie-breaking based on upload_id in _ascending_ order,
so I've changed that for consistency. This caused some test failures,
so I've updated the corresponding struct literals.
I've left some PR review comments to describe each modification
to the existing test cases.
## Test plan
Added a new dedicated test checking for shadowing behavior.
## Changelog
Fixes a bug where old LSIF uploads would also be used for code
navigation even when newer SCIP uploads were available for the
same language, potentially leading to duplicate results in the reference
panel. With this change, scip-go uploads shadow the uploads
for lsif-go, and similarly for other indexers following the
scip-X/lsif-X naming
convention.
A large sub-query was duplicated between two queries, making
the code harder to understand. This patch factors that out.
Ancillary changes:
- Added more doc comments
- Tweak tests to make failures easier to follow.
Fixes GRAPH-782
Specifically this allows us avoid updating syntactic_indexing_enabled
when we don't have the value available.
This is done to solve the problem where not providing that field crashes
the resolver. I confirmed it by both via tests and via manual testing of
the API

## Test plan
- Added a new test for resolvers specifically to test the extra logic
around this field
This speeds up precise find usages from about 600ms to 400ms
for a test workload involving 16 precise usages across 5 files.
(Specifically for the `codenav.Service` type)
Fixes GRAPH-781
I have some more thoughts on speeding this up further, but
I'll likely look into that after adding tests.
I think the span will not be terminated if we don't call `endObservation`.
Not the end of the world since this span likely won't be useful,
but we should avoid skipping calls to `endObservation` for consistency.
This patch wires up the newly changed APIs in #64118
to the GraphQL API, enabling precise support in the
usagesForSymbol API. It also handles pagination.
Fixes https://linear.app/sourcegraph/issue/GRAPH-573
## Test plan
Manually tested. Will add automated tests in follow-up PR.
## Changelog
- Adds support for precise code navigation to the experimental
`usagesForSymbol` API, which can be used to implement a
reference panel or similar functionality.
This is a register call that is easy to forget. When forgotten, all queries against the repo store will block forever.
In addition, this adds a hard-dependency on conf to every services startup, plus a busy loop. With multi-tenant, this will not work great because authz providers would be a global, and we instead want most things to be ephemeral so they're per-provider. This is a step toward that, but doesn't yet remove the providers global variable.
Good news, it turns out that we don't actually need to register the providers in every service! The reason they were required was to check if zero providers are configured, or if authzbypass mode is enabled.
Authz bypass mode is usually ON, except when there are problems with the authz providers, meaning some authz providers might not be able to sync permissions. Bypassing of permissions is only ever happening if there are ALSO zero providers configured.
So this is basically an optimization for the case where an instance has zero authz configured so that the SQL queries are a bit simpler. This also helps in tests because with bypass mode on and no providers configured, authz enforcement is effectively off in the repo store.
This makes it so that in tests we need to do slightly more work, but also makes for a more realistic test vs at runtime setup. Also, it's highly recommended to use mocks for DB wherever possible in more high-level components to keep tests fast.
To never have a scenario where we accidentally mess up here and enable bypass mode erroneously, this PR drops that entirely. Authz is always enforced, but when a code host connection is unrestricted (i.e., will not spawn a provider) the repos are still visible, so this should be no change over before.
## Test plan
The stack starts and works, and all CI tests are still passing. Code review should help as well.
For the implementation of precise usagesForSymbol, I need to be
able to access some of these types in the codenav package directly, so move
a bunch of types there to avoid an import cycle: codenav -> resolvers -> codenav.
This patch changes the code so that the core APIs we need
for precise usagesForSymbol actually return (Upload)Usages instead
of just (Upload)Locations, so that we have the usage kind information
as well as the symbol names available. I will be handling the GraphQL
layer for converting the Usages to appropriate Resolvers in a follow-up PR.
## Test plan
Modified a bunch of existing mock tests. 🤷🏽
At the heart of the loop for extracting usages across a Sourcegraph
instance is the `extractLocationsFromPosition` function, which
extracts related symbols and source ranges from a single SCIP
Document. (Source ranges for returning to the user directly,
and related symbols to do further lookups, e.g. in the case
of inheritance.)
Since we want to perform matching based on symbol names in the upcoming
precise usagesForSymbol API, and also return symbol names for each
associated source range, this function needs to be updated to:
1. Be able to take a symbol name for doing lookups. This is done using
the new `FindUsagesKey` type which allows two cases - position-based and
symbol-based.
2. Be able to return symbol names associated with every source range.
This is done by creating a new `UsageBuilder` type which somewhat subsumes
the `Location` type. We avoid copying the same 'UploadID' and 'Path'
fields eagerly for clarity; that will be handled by callers in the future when
they mix `UsageBuilder` values across different Documents (by first calling `build`).
For the above, I've introduced a new func `extractRelatedUsagesAndSymbolNames`,
and `extractLocationsFromPosition` delegates to that. In the future,
`extractLocationsFromPosition` will be removed.
For precise usagesForSymbols, we want to propagate usages everywhere
(with associated symbol names, not just 'Location' values). This PR
introduces the new Usage type, and unifies the old GetBulkSymbolUsages and
GetMinimalBulkSymbolUsages APIs into a single GetSymbolUsages API.
We convert the Usage values to Location to avoid changing a lot of code
at once.
We also change the DB query to do grouping and aggregation for us
instead of doing it in Go code.
---------
Co-authored-by: Christoph Hegemann <christoph.hegemann@sourcegraph.com>
Closes
https://linear.app/sourcegraph/issue/GRAPH-765/use-surroundingcontent-returned-from-the-search-client
The primary reason for doing this is to avoid sending a bunch of extra
calls for file contents to gitserver, which dominate the profile/trace
after implementing the remaining opts from
https://github.com/sourcegraph/sourcegraph/pull/63971
Also limits the API to only return the line containing the match for now
as discussed with Camden.
We might need to bring back the `lineGetter` for precise usages, but it
should be easy enough to revive at that point /cc @varungandhi-src
## Test plan
Updated unit tests, some manual testing in the API console
Monikers carry a bunch of redundant fields which basically consist of
information parsed out of the symbol name. This patch cleans up
some APIs to work with symbol names directly instead of monikers.
This patch changes the location querying code so that:
1. We're populating structures corresponding to SCIP instead of LSIF
(with "scheme" and "identifier" inside "MonikerData")
2. Avoid repeatedly allocating a constant string 'scip' for the scheme
only to throw it away later.
3. Makes the two queries and their scanning code more similar for easier
comparison. When I land precise usagesForSymbol, I will de-duplicate
some of the scanning code between these two queries.
I have avoided renaming all of the local variables to avoid creating
more noise.
## Test plan
Covered by existing tests.
Also consolidates mocks to avoid wasting time generating
and compiling them repeatedly, and reducing pollution of
Find Usages results in the editor (yes, I know these can be
filtered out, but it's silly to have multiple copies).
Closes
https://linear.app/sourcegraph/issue/GRAPH-742/simplify-gittreetranslator-implementation
- use unified diff format and 0 context lines to allow translation by
just using hunk headers
- only request hunks per file once, and sync follow-up requests for the
same file
- don't bother LRU caching the full hunk contents, just store 4 int32's
per hunk
- replace linear search through sorted hunks with binary search
Current PR keeps the old PR and just proxies to the new implementation.
Once this PR is reviewed and merged I'll remove the old API and update
all callsites.
## Test plan
New/existing tests
Pass down semantic information down as much as possible,
delaying lossy conversion to table names till the very end.
I will be introducing the `exhaustive` linter in a future PR
(https://golangci-lint.run/usage/linters/#exhaustive)
so that we can get a static error if you don't switch exhaustively.