In this PR (https://github.com/sourcegraph/sourcegraph/pull/45042), it
is noted that:
> Removed lifecycle configuration from uploadstore, instead relying just
> on our own builtin background worker to expire objects.
As a result, the `TTL` field on the GCS Client was not used anywhere. So
this patch removes the TTL field.
- The `internal/uploadstore` package is renamed to `object` indicating
that it is meant to provide a generic object storage wrapper.
- The `search/exhaustive/uploadstore` package is used in very few places
so I've merged into the `internal/search` package similar to
`internal/embeddings`.
There are a few reasons to do the renaming.
1. The word `upload` in a more general context is ambiguous (just in
`internal/`) - in the codeintel context, it means "SCIP index" but it
can also be interpreted generically ("upload of _some_ data").
2. Better readability - `object.Storage` is much shorter than
`uploadstore.Store`. Additionally, we use the term `Store` A LOT
in the codebase, and usually, these refer to wrappers over some
tables in some DB.
Making things worse, some of our code also has:
```
uploadsstore
"github.com/sourcegraph/sourcegraph/internal/codeintel/uploads/internal/store"
```
And code which says `uploadsstore.Store` (notice the extra `s` 😢), which
is actually a wrapper over some key DB tables like `lsif_uploads`.
In our backend, we mostly follow the convention that type names
for the resolvers match up with the types in the GraphQL API,
with the extra `Resolver` suffix. However, this is broken by the
`GitTreeEntryResolver` where there were no types called
`GitBlobResolver` or `GitTreeResolver`, since all functionality
was implemented on top of `GitTreeEntryResolver`.
Since Go supports implicit method forwarding for embedded structs,
I think we can improve the code consistency & readability by adding
these two stub resolver types.
As struct embedding is not a super common pattern, I've explicitly
mentioned that in the doc comments for `GitBlobResolver` and
`GitTreeResolver`.
With this patch, the `errors.HasType` API behaves similar to `Is` and `As`,
where it checks the full error tree instead of just checking a linearized version
of it, as cockroachdb/errors's `HasType` implementation does not respect
multi-errors.
As a consequence, a bunch of relationships between HasType and Is/As that
you'd intuitively expect to hold are now true; see changes to `invariants_test.go`.
This PR migrates the ref specific calls to a gRPC endpoint.
While looking at this, I realized that we have many methods that want the same thing in the end: refs.
So instead of adding various endpoints (that currently use porcelain commands in the client implementation), we now use the plumbing command `for-each-ref` for all of them.
By that, we can combine the following APIs:
- `ListRefs` -> `ListRefs` (stays the same)
- `ListBranches` -> `ListBranches(headsOnly: true)`
- `ListTags` -> `ListBranches(tagsOnly: true)`
- For endpoints that were interested in both branches and tags, we allow the combination of the two `only` flags
- `BranchesContaining` -> `ListRefs(contains: sha)`
- `RefDescriptions` -> `ListRefs(headsOnly: true, tagsOnly: true)`
Through one simple call to `for-each-ref`, we also get to enjoy consistent ordering throughout. This also fixes an issue and improves perf for the branch picker where ordering would be random once the count hits 1000.
Closes https://github.com/sourcegraph/sourcegraph/issues/55074
Closes https://github.com/sourcegraph/sourcegraph/issues/60415
## Test plan
Added tests at various layers, for gitcli, for the gRPC server, for the new client.
Also, existing tests still pass after this migration, including integration and E2E tests.
Manually ran my instance and clicked around as well, to verify.
I _ALWAYS_ stumble over this double negation. "If not noensurerevision". huh?
Also, we should not encourage people to accidentally make the server do a resolve revision call which is generally costly, so the cheap option as the default seems more desirable.
For now, I haven't changed any of the values, but we should also do another review of the cases where we explicitly set it to true.
Test plan:
Just converted a bool, CI and E2E tests should catch any issues.
This PR consolidates two methods that were effectively doing the same thing: `ReadFile` and `NewFileReader` - where `ReadFile` used the latter under the hood, and wrapped an `io.ReadAll`.
Given entire files are returned from this function that can be of arbitrary size, exposing a simple high-memory-use footgun made me uneasy here, and having to use io.ReadAll explicitly as the consumer hopefully makes the person using this API feel the same way.
Sometimes, we can currently not avoid it (GQL layer), but where possible, using the reader is very likely the better option.
Since there would've been a slight change in how errors are reported, I also fixed that. Previously, `NewFileReader` would NOT return `os.ErrNotExist` errors, those would only be discovered when the returned reader is called.
Now, like `ReadFile`, and like `os.Open` we immediately return that error.
This comes at the cost of making a second gRPC request. This is not great, but we will migrate the FileReader API to a proper gRPC hopefully ASAP, which will let us optimize it better on the server-side.
Since Varun also asked for perf improvements here, this is probably one of the first things that would be easy to switch over to go-git, eliminating probably 50%+ of the overhead here.
Most of our gitserver client methods require a subrepo permissions checker, but we require the caller to pass one in every time. This adds the checker to the gitserver client struct so that it can be used for all methods. Since it's one more thing that needs to be overridden for some tests, I added some more general helpers around constructing a test client as well.
Some history: the authz.DefaultSubrepoPermsClient was written as an global variable that was overridden during the enterprise init step. However, we no longer have an enterprise init step, so this does not need to be a global, mutable variable. This takes the first step towards making that not the case by removing many of the sites that reference the global authz.DefaultSubrepoPermsClient, consolidating all those references to the NewClient() method, which eventually should be made to require a subrepo perms client as a dependency rather than relying on the global.
Continuation of work started in #54788. Some more to follow in a
separate PR.
Primarily migrated usages in internal and enterprise/internal packages
and as a result the code changes that follow in other packages are
related to propagate database.DB/gitserver.Client.
Also changes to internal/repos as well to do the migration.
Part of #54741.
Co-authored-by: Keegan Carruthers-Smith <keegan.csmith@gmail.com>
To colocate forks and parent repositories on the same shard we will need
to consult the database when doing AddrForRepo. The is a partial
migration to injecting the DB into the gitserver client for that use
case. This PR does not actually use the field yet since we won't be able
to use it until all call sites are migrated. There are individual
commits containing mostly mechanical refactors to pass in a DB. There
are some other commits which manually migrated our use of NewClient. But
there are still many call sites which we didn't look at yet / do not
have a mechanical transformation. Those will be tackled in follow up
commits. For now all non-migrated sites now use the telling constructor
name NewClientDeprecatedNeedsDB.
Alternatively we considered introducing DB as an argument to
AddrForRepo, but this similarly will be a large change. We might still
do that since ClientSource is a singleton in practice. Additionally the
client having this dependency feels like how we would design it when
creating this from scratch.
Test Plan: go test. All changes which introduced the use of a DB used an
existing one. The only risk here is we introduce a new way to access the
db => something is nil along the way and there is no test coverage of
it. We only did that in one or two places, and they are safe. The vast
majority are mechanical changes which do not exercise anything new.
Co-authored-by: @indradhanush
Movin enterprise codeintel stuff (and ~two others that had to be dragged
along) out of `enterprise/internal` and into `internal` as part of the
shift towards enterprise-only
## Test plan
Successfully built frontend with bazel, CI will check the rest
😎 no logic changed, just shufflin things around
We want to get rid of special casing src-cli. This is the first step towards that. src-cli will still be required for batch changes for the time being but I am working on resolving that as well, but it'll be more involved.
To make sure that customers using custom registries can still use this just fine, I added 2 new site settings that make this image configurable. This will also allow to ship a src hot fix to a specific customer, if src is version compatible with that backend.
The service pattern is a great long term goal, but it currently makes the assumption that we can freely use services with each other, and that is not true yet, sadly. For example, it is impossible to import from graphqlbackend because it caused a cyclic dependency. That is required though to extend the API.
To unblock, this will revert that new pattern for now and we can look at it more wholistically later.
Since the new code intel pattern requires some of the `graphqlbackend` internals, but shouldn't import from a `cmd` directory, I've moved these into a separate internal package to get rid of 5x the same code duplicated across the codebase.
Those can be useful generally, so moving them out of the frontend command. Also, they're not specific to it. Looking at the updated imports, it has already been used outside, but created a bad dependency on frontend code.
Next step would be to take a generic DB type thus getting rid of the dependency on internal/database, so we can also use that package in there.