Commit Graph

787 Commits

Author SHA1 Message Date
Varun Gandhi
f403bfc1ff
chore: Bump autoindexing image SHAs (#64472) 2024-08-14 16:39:08 +00:00
Christoph Hegemann
a80ad938ce
fix: return all search-based results if no syntactic provenance is requested (#64330)
Closes
https://linear.app/sourcegraph/issue/GRAPH-797/return-all-search-based-results-if-syntactic-is-not-requested

Making sum-types like a caveman...

## Test plan

Manual testing via API. I can't make the web app do a search-based
usages request at the moment.
2024-08-13 10:52:56 +02:00
Varun Gandhi
74f4831aa3
chore: Replace gitserver.Client -> minimalGitserver in codenav (#64416)
In codenav, we use very few APIs from gitserver, so let's use a more
narrow interface (which we can potentially fake in the future) instead
of depending on gitserver.Client directly.
2024-08-12 16:40:36 -04:00
Erik Seliger
55cdb4961c
graphqlbackend: Move utils to internal/ (#64117)
The utils are currently used outside of cmd/frontend, so to break the
internal -> cmd import I'm moving and merging the utils package with
internal/gqlutil.

Test plan: Just moved around and renamed a package, the go compiler
doesn't complain.
2024-08-08 14:18:37 +02:00
Erik Seliger
8de09ddbc7
chore: Cleanup more cross-cmd imports (#64259)
This PR fixes a few more imports from /internal/ packages using /cmd/... contents.

Test plan: Mainly moved code around and CI still passes.
2024-08-08 10:10:58 +02:00
Christoph Hegemann
81fa144f4f
refactor: steps the usage cursor provenance state in a single place (#64321)
Adds a second function to determine the initial cursor based on the
requested provenances.

## Test plan
N/A
2024-08-07 10:39:12 +02:00
Christoph Hegemann
780b5dbbc4
chore: Move and clean up test code for syntactic usages (#64318)
Consolidates the two test helper/util modules, and moves one test file
to align with the file it's actually testing.

## Test plan

Tests continue to pass
2024-08-07 09:15:58 +01:00
Varun Gandhi
930033ca9b
chore: Replace QueuedCount -> CountByState with bitset parameter (#64302)
Previously, the QueuedCount method was confusing because:

1. By default, it actually returned the count for both the 'queued' and
   'errored' states (despite the name just have 'Queued').
2. There was an additional boolean flag for also returning entries in
    the 'processing' state, but reduced clarity at call-sites.

So I've changed the method to take a bitset instead, mirroring the
just-added Exists API, and renamed the method to a more
generic 'CountByState'.

While this does make call-sites a bit more verbose, I think the
clarity win makes the change an overall positive one.
2024-08-07 16:09:18 +08:00
Christoph Hegemann
f1d5d779c3
refactor: handles Cursor uniformly over all usage provenances (#64319)
This change should not change any behaviour, but it prepares for syntactic/search-based pagination.

Both syntactic and search based usages now return a "NextCursor", which is `Some` if there are more results of their provenance to be had. For now they always return `None` which preserves the current behaviour. The logic in `root_resolver` implements a state machine that transitions the cursor through the provenances and makes sure to skip provenances if the cursor in the request starts a certain state.

## Test plan

Tests continue to pass, no behavioral changes
2024-08-07 08:26:31 +02:00
Varun Gandhi
99f7d97dc6
chore: Switch over to fake RepoStore in codenav tests (#64284)
Reduce the exposed surface area with a smaller interface
minimalRepoStore, and make sure we're using fakes in the
tests which better document the intent, instead of using a
mock repo store. The fake makes sure that the methods
are mutually consistent, which is easier to do when working
with a smaller interface.
2024-08-06 22:09:43 +08:00
Varun Gandhi
8f058838ce
chore: Consolidate mocks for dbworker/store.Store type (#64294)
Let's not regenerate it 4 extra times -- it's already generated
once in a central dbworker/store/mocks package which we can reuse.
2024-08-06 18:40:25 +08:00
Christoph Hegemann
4ebb805bd3
fix: uses the same base64 for decoding we use for encoding the UsageCursor (#64290)
At the moment we decode via the "Std" encoding, which expects padding
but we encode with "Raw" which doesn't add any.

## Test plan

Manual testing
2024-08-06 17:38:00 +08:00
Varun Gandhi
38633daef0
chore: Consolidate mocks for uploads's Store type (#64286)
Instead of re-generating and re-compiling the mocks 3 times,
consolidate them into a single storemocks package.

This should also provide better code navigation in the editor.
2024-08-06 10:15:17 +01:00
Varun Gandhi
a0b2a1dd45
chore: Remove incorrectly logged warning (#64267)
The cursor state can be "" when the data is exhausted.
2024-08-05 09:43:43 +00:00
Varun Gandhi
c8c8f32aff
fix: Handle potential nil reference properly (#64265) 2024-08-05 17:41:09 +08:00
Varun Gandhi
7595615f07
fix: Don't propagate un-translated source ranges (#64263)
The `getSourceRange` function can have a range translation failure,
in which case it would be wrong to propagate the range directly
without the commit information. So skip the ranges in that case.
2024-08-05 15:03:55 +08:00
Varun Gandhi
ef77a1e0e1
fix: Handle sub-repo permissions in CodeGraphData API (#64241) 2024-08-05 12:31:07 +08:00
Christoph Hegemann
7fc92a5538
codeintel: make usage-range non-optional (#64236)
Closes
https://linear.app/sourcegraph/issue/GRAPH-727/make-range-for-usagerange-non-optional

## Test plan

Tests continue to pass
2024-08-02 14:51:57 +00:00
Varun Gandhi
26c309916e
chore: Use binary search over symbols array (#64240)
Right now, perf is dominated by slowness of gitserver, but let's
avoid the pessimization in the old code with multiple linear lookups.
2024-08-02 10:29:17 +00:00
Varun Gandhi
4f21a5a4b9
chore: De-duplicate CTE for visible uploads (#64232)
The visible uploads detection was copied in yet another place,
so de-duplicate that.
2024-08-02 13:58:33 +08:00
Varun Gandhi
f9bc3cf1df
fix: Prefer SCIP uploads over LSIF uploads (#64217)
When determining the set of visible uploads at a given commit,
we group uploads from scip-K and lsif-K for ranking purposes
to allow them to shadow each other. Generally, scip-K will end
up shadowing lsif-K, avoiding sending the client a mix of data
from a much older lsif-K index when newer data from a scip-K
index is available.

Fixes https://linear.app/sourcegraph/issue/GRAPH-787

The logic requires some final tie-breaking to avoid non-determinism
in case when there is a scip-K and lsif-K index at the same distance,
which I've done based on the upload_id in descending order
(upload IDs are monotonically increasing). Earlier, some code was
doing tie-breaking based on upload_id in _ascending_ order,
so I've changed that for consistency. This caused some test failures,
so I've updated the corresponding struct literals.

I've left some PR review comments to describe each modification
to the existing test cases.

## Test plan

Added a new dedicated test checking for shadowing behavior.

## Changelog

Fixes a bug where old LSIF uploads would also be used for code
navigation even when newer SCIP uploads were available for the
same language, potentially leading to duplicate results in the reference
panel. With this change, scip-go uploads shadow the uploads
for lsif-go, and similarly for other indexers following the
scip-X/lsif-X naming
convention.
2024-08-02 13:36:50 +08:00
Varun Gandhi
554fd33eff
chore: Factor out sub-query for locating nearest uploads (#64210)
A large sub-query was duplicated between two queries, making
the code harder to understand. This patch factors that out.

Ancillary changes:
- Added more doc comments
- Tweak tests to make failures easier to follow.
2024-08-01 21:05:03 +08:00
Anton Sviridov
7aba4732ee
Policy patch - leave fields unchanged if they're missing from request (#64174)
Fixes GRAPH-782

Specifically this allows us avoid updating syntactic_indexing_enabled
when we don't have the value available.

This is done to solve the problem where not providing that field crashes
the resolver. I confirmed it by both via tests and via manual testing of
the API
![CleanShot 2024-07-31 at 15 51
54](https://github.com/user-attachments/assets/2e82ff20-5541-4a7d-9dd6-17196274d59a)

## Test plan

- Added a new test for resolvers specifically to test the extra logic
around this field
2024-08-01 12:01:26 +01:00
Christoph Hegemann
a64832ab44
refactor: renames usage cursor and adds fields for future syntactic cursor (#64208)
Landing this early to avoid accumulating conflicts

## Test plan

Shouldn't change functionality as new fields are unused for now
2024-08-01 10:36:19 +00:00
Christoph Hegemann
5affc2dba5
chore(codeintel): replaces lsp.Range uses with scip.Range (#64178)
Collecting some red-line brownie points

## Test plan
Existing tests continue to pass
2024-07-31 12:25:33 +02:00
Varun Gandhi
ab2697a0d6
fix: De-dup and concurrent-ify file content requests & splitting (#64169)
This speeds up precise find usages from about 600ms to 400ms
for a test workload involving 16 precise usages across 5 files.

(Specifically for the `codenav.Service` type)

Fixes GRAPH-781

I have some more thoughts on speeding this up further, but
I'll likely look into that after adding tests.
2024-07-31 17:46:33 +08:00
Varun Gandhi
43b4341a25
chore: Unconditionally call endObservation (#64150)
I think the span will not be terminated if we don't call `endObservation`.
Not the end of the world since this span likely won't be useful,
but we should avoid skipping calls to `endObservation` for consistency.
2024-07-31 06:15:18 +00:00
Varun Gandhi
8f2479edd2
feat: Add support for precise usagesForSymbol (#64126)
This patch wires up the newly changed APIs in #64118 
to the GraphQL API, enabling precise support in the
usagesForSymbol API. It also handles pagination.

Fixes https://linear.app/sourcegraph/issue/GRAPH-573

## Test plan

Manually tested. Will add automated tests in follow-up PR.

## Changelog

- Adds support for precise code navigation to the experimental
`usagesForSymbol` API, which can be used to implement a
  reference panel or similar functionality.
2024-07-31 13:55:07 +08:00
Erik Seliger
c917330d6b
authz: Drop requirement for installing authz providers in every service (#63743)
This is a register call that is easy to forget. When forgotten, all queries against the repo store will block forever.

In addition, this adds a hard-dependency on conf to every services startup, plus a busy loop. With multi-tenant, this will not work great because authz providers would be a global, and we instead want most things to be ephemeral so they're per-provider. This is a step toward that, but doesn't yet remove the providers global variable.

Good news, it turns out that we don't actually need to register the providers in every service! The reason they were required was to check if zero providers are configured, or if authzbypass mode is enabled.

Authz bypass mode is usually ON, except when there are problems with the authz providers, meaning some authz providers might not be able to sync permissions. Bypassing of permissions is only ever happening if there are ALSO zero providers configured.

So this is basically an optimization for the case where an instance has zero authz configured so that the SQL queries are a bit simpler. This also helps in tests because with bypass mode on and no providers configured, authz enforcement is effectively off in the repo store.
This makes it so that in tests we need to do slightly more work, but also makes for a more realistic test vs at runtime setup. Also, it's highly recommended to use mocks for DB wherever possible in more high-level components to keep tests fast.

To never have a scenario where we accidentally mess up here and enable bypass mode erroneously, this PR drops that entirely. Authz is always enforced, but when a code host connection is unrestricted (i.e., will not spawn a provider) the repos are still visible, so this should be no change over before.

## Test plan

The stack starts and works, and all CI tests are still passing. Code review should help as well.
2024-07-31 01:23:34 +02:00
Varun Gandhi
3f0a85219c
chore: Move codenav types to lower-level package (#64141)
For the implementation of precise usagesForSymbol, I need to be
able to access some of these types in the codenav package directly, so move
a bunch of types there to avoid an import cycle: codenav -> resolvers -> codenav.
2024-07-30 12:25:11 +08:00
Varun Gandhi
963792f36e
chore: Bubble precise Usages instead of Locations (#64118)
This patch changes the code so that the core APIs we need
for precise usagesForSymbol actually return (Upload)Usages instead
of just (Upload)Locations, so that we have the usage kind information
as well as the symbol names available. I will be handling the GraphQL
layer for converting the Usages to appropriate Resolvers in a follow-up PR.

## Test plan

Modified a bunch of existing mock tests. 🤷🏽
2024-07-29 20:12:23 +08:00
Christoph Hegemann
3f8620508b
codeintel: Speed up syntactic and search-based usages using batch APIs (#64078)
Closes
https://linear.app/sourcegraph/issue/GRAPH-771/chunk-syntactic-usage-tasks-and-use-batch-apis-to-handle-chunks

This is the last step for properly implementing #63971. We split the
`candidateFiles` search produces into chunks and process these in
parallel using the new batch api on `MappedIndex`.

## Test plan
Existing tests continue passing

---------

Co-authored-by: Varun Gandhi <varun.gandhi@sourcegraph.com>
2024-07-29 04:09:05 +02:00
Vincent
2333ddeaab
fix: add extra check for code intelligence inference (#64083) 2024-07-26 15:22:52 +01:00
Varun Gandhi
6d981c60ad
chore: Update main occurrence extraction code to allow for symbol-based matching (#64082)
At the heart of the loop for extracting usages across a Sourcegraph
instance is the `extractLocationsFromPosition` function, which
extracts related symbols and source ranges from a single SCIP
Document. (Source ranges for returning to the user directly,
and related symbols to do further lookups, e.g. in the case
of inheritance.)

Since we want to perform matching based on symbol names in the upcoming
precise usagesForSymbol API, and also return symbol names for each
associated source range, this function needs to be updated to:
1. Be able to take a symbol name for doing lookups. This is done using
    the new `FindUsagesKey` type which allows two cases - position-based and
    symbol-based.
2. Be able to return symbol names associated with every source range.
    This is done by creating a new `UsageBuilder` type which somewhat subsumes
    the `Location` type. We avoid copying the same 'UploadID' and 'Path'
    fields eagerly for clarity; that will be handled by callers in the future when
    they mix `UsageBuilder` values across different Documents (by first calling `build`).

For the above, I've introduced a new func `extractRelatedUsagesAndSymbolNames`,
and `extractLocationsFromPosition` delegates to that. In the future,
`extractLocationsFromPosition` will be removed.
2024-07-26 18:28:43 +08:00
Anton Sviridov
7e5b52ea67
Syntactic indexing policy toggle: UI and supporting APIs (with backcompat) (#64075)
Fixes GRAPH-751
Fixes GRAPH-636

Reverts sourcegraph/sourcegraph#64046 which itself was a revert of
https://github.com/sourcegraph/sourcegraph/pull/63876 due to backwards
compatibility concerns.

This new PR adds two extra commits which restore the `indexingEnabled`
field, and make `syntacticIndexingEnabled` field optional.

---

Fixes GRAPH-636
Fixes GRAPH-751

This PR aims to allow admins to enable syntactic indexing on per-policy
basis using the existing UI and APIs with the following amendments:

- New UI toggle for syntactic indexing – **only visible if the
experimental feature is enabled** (see screenshots below)
- New field in GraphQL API for syntactic indexing
- New Go resolvers to handle the new field

To support this on the backend, we also update the policies handling
code and SQL queries to ensure we can retrieve and update the
`syntactic_indexing_enabled` field.

<!-- PR description tips:
https://www.notion.so/sourcegraph/Write-a-good-pull-request-description-610a7fd3e613496eb76f450db5a49b6e
-->

## Test plan

- Backend part covered by updated policies tests
- Manual testing of UI

With experimental feature **enabled**:

![CleanShot 2024-07-22 at 12 56
56](https://github.com/user-attachments/assets/e95aa224-be9a-40a6-9e42-6f409478e2fc)

With experimental feature **disabled**:
![CleanShot 2024-07-22 at 12 57
21](https://github.com/user-attachments/assets/0d46a65d-95bc-4695-a3df-ad9aa86dbd36)

<!-- REQUIRED; info at
https://docs-legacy.sourcegraph.com/dev/background-information/testing_principles
-->

## Changelog

- In site-admin APIs for updating code intelligence policies:
  - field `indexingEnabled` is renamed to `preciseIndexingEnabled`
- a required `syntacticIndexingEnabled` is added (only takes effect if
experimental feature is enabled)
  - field `forIndexing` is renamed to `forPreciseIndexing`

<!-- OPTIONAL; info at
https://www.notion.so/sourcegraph/Writing-a-changelog-entry-dd997f411d524caabf0d8d38a24a878c
-->
2024-07-25 20:03:56 +01:00
Varun Gandhi
cadb6d8e70
chore: Unify LsifStore.*SymbolUsages APIs into one (#64076)
For precise usagesForSymbols, we want to propagate usages everywhere
(with associated symbol names, not just 'Location' values). This PR
introduces the new Usage type, and unifies the old GetBulkSymbolUsages and
GetMinimalBulkSymbolUsages APIs into a single GetSymbolUsages API.

We convert the Usage values to Location to avoid changing a lot of code
at once.

We also change the DB query to do grouping and aggregation for us
instead of doing it in Go code.

---------

Co-authored-by: Christoph Hegemann <christoph.hegemann@sourcegraph.com>
2024-07-25 20:45:47 +08:00
Christoph Hegemann
77fbfbda16
codeintel: Adds a batch api for fetching multiple documents to MappedIndex (#64073)
Closes
https://linear.app/sourcegraph/issue/GRAPH-770/batch-api-for-mappedindex

Depends on #64024 

## Test plan

Added unit test
2024-07-25 10:32:59 +02:00
Christoph Hegemann
f7688ff5cb
codeintel: allows batch retrieval of SCIP documents (#64024)
Closes
https://linear.app/sourcegraph/issue/GRAPH-766/batch-api-for-fetching-scip-documents

## Test plan
Tested by being used for the single-path retrieval path, will be tested
more once its used in follow-up PRs with more batching
2024-07-25 08:04:15 +00:00
Christoph Hegemann
e0502149cc
codeintel: returns surroundingContent from search result (#64022)
Closes
https://linear.app/sourcegraph/issue/GRAPH-765/use-surroundingcontent-returned-from-the-search-client

The primary reason for doing this is to avoid sending a bunch of extra
calls for file contents to gitserver, which dominate the profile/trace
after implementing the remaining opts from
https://github.com/sourcegraph/sourcegraph/pull/63971

Also limits the API to only return the line containing the match for now
as discussed with Camden.

We might need to bring back the `lineGetter` for precise usages, but it
should be easy enough to revive at that point /cc @varungandhi-src

## Test plan

Updated unit tests, some manual testing in the API console
2024-07-25 09:50:33 +02:00
Michael Lin
56817a7a9e
Revert "Syntactic indexing policy toggle: UI and supporting APIs" (#64046)
Reverts sourcegraph/sourcegraph#63876
https://sourcegraph.slack.com/archives/CHXHX7XAS/p1721835847010889

### Test plan

CI
2024-07-24 15:59:48 +00:00
Anton Sviridov
c14f892e9f
Syntactic indexing policy toggle: UI and supporting APIs (#63876)
Fixes GRAPH-636
Fixes GRAPH-751

This PR aims to allow admins to enable syntactic indexing on per-policy
basis using the existing UI and APIs with the following amendments:

- New UI toggle for syntactic indexing – **only visible if the
experimental feature is enabled** (see screenshots below)
- New field in GraphQL API for syntactic indexing
- New Go resolvers to handle the new field

To support this on the backend, we also update the policies handling
code and SQL queries to ensure we can retrieve and update the
`syntactic_indexing_enabled` field.

<!-- PR description tips:
https://www.notion.so/sourcegraph/Write-a-good-pull-request-description-610a7fd3e613496eb76f450db5a49b6e
-->

## Test plan

- Backend part covered by updated policies tests
- Manual testing of UI

With experimental feature **enabled**:

![CleanShot 2024-07-22 at 12 56
56](https://github.com/user-attachments/assets/e95aa224-be9a-40a6-9e42-6f409478e2fc)

With experimental feature **disabled**:
![CleanShot 2024-07-22 at 12 57
21](https://github.com/user-attachments/assets/0d46a65d-95bc-4695-a3df-ad9aa86dbd36)

<!-- REQUIRED; info at
https://docs-legacy.sourcegraph.com/dev/background-information/testing_principles
-->

## Changelog

- In site-admin APIs for updating code intelligence policies:
  - field `indexingEnabled` is renamed to `preciseIndexingEnabled`
- a required `syntacticIndexingEnabled` is added (only takes effect if
experimental feature is enabled)
  - field `forIndexing` is renamed to `forPreciseIndexing`

<!-- OPTIONAL; info at
https://www.notion.so/sourcegraph/Writing-a-changelog-entry-dd997f411d524caabf0d8d38a24a878c
-->
2024-07-24 13:57:33 +01:00
Varun Gandhi
2663704fed
chore: Change some APIs to use symbols instead of monikers (#64029)
Monikers carry a bunch of redundant fields which basically consist of
information parsed out of the symbol name. This patch cleans up
some APIs to work with symbol names directly instead of monikers.
2024-07-24 20:11:03 +08:00
Christoph Hegemann
19db59b72c
Removes the old GitTreeTranslator API (#64027)
Following through after #63938 

## Test plan

Existing tests continue to pass
2024-07-24 09:22:42 +00:00
Varun Gandhi
dc7da57edb
chore: Make location fetching queries more uniform (#64026)
This patch changes the location querying code so that:
1. We're populating structures corresponding to SCIP instead of LSIF
   (with "scheme" and "identifier" inside "MonikerData")
2. Avoid repeatedly allocating a constant string 'scip' for the scheme
    only to throw it away later.
3. Makes the two queries and their scanning code more similar for easier
    comparison. When I land precise usagesForSymbol, I will de-duplicate
    some of the scanning code between these two queries.

I have avoided renaming all of the local variables to avoid creating
more noise.

## Test plan

Covered by existing tests.
2024-07-24 11:14:22 +02:00
Varun Gandhi
1b0e004d03
chore(codeintel): Rename uploads/../Store -> codegraph/../DataStore (#64001)
Also consolidates mocks to avoid wasting time generating
and compiling them repeatedly, and reducing pollution of
Find Usages results in the editor (yes, I know these can be
filtered out, but it's silly to have multiple copies).
2024-07-24 08:24:20 +01:00
Christoph Hegemann
eefcc7ea14
codeintel: fetches diffs from gitserver in batches (#64025)
Closes
https://linear.app/sourcegraph/issue/GRAPH-769/fetch-diffs-in-batches

## Test plan
Covered by existing tests (noticeable because tests need updates as we
now use the path out of the returned diff)
2024-07-24 06:12:47 +00:00
Christoph Hegemann
46e589d6c8
codeintel: runs occurrence and symbol search in parallel (#64023)
Closes
https://linear.app/sourcegraph/issue/GRAPH-767/search-based-run-occurrence-search-and-symbol-search-in-parallel

## Test plan

Covered by existing tests and checking the trace to see executes
overlapping:

![image](https://github.com/user-attachments/assets/75d9644f-cae7-46b9-a1bf-d3cdbb396dc8)
2024-07-24 04:36:49 +00:00
Christoph Hegemann
34e40c372c
codeintel: GitTreeTranslator rewrite (#63938)
Closes
https://linear.app/sourcegraph/issue/GRAPH-742/simplify-gittreetranslator-implementation

- use unified diff format and 0 context lines to allow translation by
just using hunk headers
- only request hunks per file once, and sync follow-up requests for the
same file
- don't bother LRU caching the full hunk contents, just store 4 int32's
per hunk
- replace linear search through sorted hunks with binary search

Current PR keeps the old PR and just proxies to the new implementation.
Once this PR is reviewed and merged I'll remove the old API and update
all callsites.

## Test plan
New/existing tests
2024-07-24 04:13:36 +02:00
Varun Gandhi
d93c0fdf90
chore: Document tech debt wrt multiple doc traversals (#64006) 2024-07-23 15:19:20 +02:00
Varun Gandhi
978abd5020
chore(codeintel): Pass UsageKind instead of raw column names (#64003)
Pass down semantic information down as much as possible,
delaying lossy conversion to table names till the very end.

I will be introducing the `exhaustive` linter in a future PR
(https://golangci-lint.run/usage/linters/#exhaustive)
so that we can get a static error if you don't switch exhaustively.
2024-07-23 12:28:55 +02:00