This PR introduces the /internal/tenant package, to be used by all Sourcegraph cluster-internal services to properly scope data visibility to the correct requesting tenant.
For now, we only expose methods we know we will DEFINITELY need.
This PR also adds the required middlewares so we can start to tinker with it in implementations.
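To make the intended shape concrete, here is a minimal sketch of what such a middleware could look like. The header name, `FromContext`, and `HTTPMiddleware` are illustrative assumptions, not the actual API of this package.

```go
// Package tenant sketch — all names here are assumptions for illustration.
package tenant

import (
	"context"
	"net/http"
	"strconv"
)

type contextKey struct{}

// FromContext returns the tenant ID stored in ctx, if any.
func FromContext(ctx context.Context) (id int, ok bool) {
	id, ok = ctx.Value(contextKey{}).(int)
	return id, ok
}

// HTTPMiddleware propagates the tenant ID from a request header into the
// context. For now the only enforcement is rejecting unparseable IDs.
func HTTPMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if raw := r.Header.Get("X-Sourcegraph-Tenant-ID"); raw != "" {
			id, err := strconv.Atoi(raw)
			if err != nil {
				http.Error(w, "unparseable tenant ID", http.StatusBadRequest)
				return
			}
			r = r.WithContext(context.WithValue(r.Context(), contextKey{}, id))
		}
		next.ServeHTTP(w, r)
	})
}
```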
## Test plan
CI passes. For now, the only thing we enforce is rejecting unparseable tenant IDs, which should be fine.
Second attempt at #63111, a follow-up to https://github.com/sourcegraph/sourcegraph/pull/63085.
rules_oci 2.0 brings a lot of performance improvements around oci_image and oci_pull, which will benefit Sourcegraph. It will also make RBE faster and put less load on the remote cache.
However, 2.0 makes some breaking changes:
- oci_tarball's default output is no longer a tarball.
- oci_image no longer compresses uncompressed layers; somebody has to make sure all `pkg_tar` targets have a `compression` attribute set so layers are compressed beforehand.
- There is no curl fallback, but this is fine for Sourcegraph as it already uses Bazel 7.1.
I checked all targets that use oci_tarball as thoroughly as I could to make sure nothing depends on its default tarball output. There was one target that still used the default output; I left a TODO for somebody else (somebody who is more on top of the repo) to tackle it **later**.
## Test plan
CI. Also run delivery on this PR (don't land those changes)
---------
Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
During an investigation, we saw that Rockskip was not using scip-ctags for symbol parsing when applicable. This means that:
1. Rockskip is getting less-than-optimal symbols for certain languages (like Go)
2. Rockskip is getting no symbols for languages not supported by universal-ctags (like Magik)

This PR attempts to solve this problem by updating Rockskip to re-use the ctags parser pool logic from the symbols service.
### Key Changes
- Update the parser pool to be re-usable (see the sketch after this list)
- Push common logic for parser type detection into the parser pool
module
- Update rockskip service config to take a parser pool
- Update and add unit/integration tests
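For orientation, the re-usable pool could be shaped roughly like this, assuming the go-ctags `Parser` type; the real interface in the parser pool module may differ:

```go
package pool

import (
	"context"

	ctags "github.com/sourcegraph/go-ctags"
)

// ParserPool is a rough sketch of the shared pool shape.
type ParserPool interface {
	// Get blocks until a parser of the requested type (universal-ctags or
	// scip-ctags) is available, or the context is canceled.
	Get(ctx context.Context, parserType string) (ctags.Parser, error)
	// Done returns a healthy parser to the pool for reuse by the next
	// caller, whether that is the symbols service or Rockskip.
	Done(parser ctags.Parser, parserType string)
}
```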
## Questions
- What performance impact will using this pooled parser have compared to
its previous behavior of spawning a new ctags process each time?
## Test plan
- [x] Add unit tests
- [x] Update integration tests
- [x] Manually test rockskip
- [x] Manually test symbolservice (in case of regression)
---------
Co-authored-by: Keegan Carruthers-Smith <keegan.csmith@gmail.com>
This PR restructures the packages to move all symbols-only code into the symbols service. This makes it easier to reason about which service accesses which datastores.
Test plan:
Just moved code, compiler and CI are happy.
This PR tinkers a bit with building a test helper to run integration
tests that are still ~lightweight against a real gitserver.
The caller can either clone a real repo to disk / embed it in the git repo, or create a small repo on the fly, and then gets a running gitserver gRPC server that returns all the data required.
These tests should only exist outside of cmd/ and internal/, as there is
a big potential to do cross-cmd imports from here, which can cause bad
coupling. But for just these tests, that should be fine.
The most trivial rockskip indexing job that I put in here to POC this
runs in 6.3s, including all setup and teardown. That seems very
reasonable to me.
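For a feel of the ergonomics, the POC test looks roughly like this; `gitservertest` and all of its helpers are hypothetical names standing in for the actual helper API:

```go
package rockskip_test

import (
	"testing"

	// Hypothetical import path for the test helper described above.
	"github.com/sourcegraph/sourcegraph/testing/gitservertest"
)

func TestRockskipIndexesSmallRepo(t *testing.T) {
	// Create a small repo on the fly instead of cloning a real one.
	repo := gitservertest.NewRepo(t,
		gitservertest.Commit("initial commit",
			gitservertest.File("main.go", "package main\n\nfunc main() {}\n"),
		),
	)

	// Get a running gitserver gRPC server that serves this repo; setup and
	// teardown are handled via t.Cleanup.
	addr := gitservertest.StartGRPCServer(t, repo)

	// Point the code under test at the test server and run the job.
	runRockskipIndex(t, addr, repo.Name) // hypothetical POC driver
}
```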
Test plan:
The POC test passes.
This PR is a result of, and follow-up to, the improvements we made in the [SAMS repo](https://github.com/sourcegraph/sourcegraph-accounts/pull/199), which allow call sites to pass down a context (primarily to indicate a deadline and, if desired, cancellation) and collect the error returned from a `background.Routine`'s `Stop` method.
Note that I did not adopt returning an error from the `Start` method because, in the monorepo, the more common (and arguably the desired) pattern is to block on the `Start` call until `Stop` is called, so it is meaningless to collect errors from `Start` methods as return values anyway, and doing so would also complicate the design and semantics more than necessary.
All usages of `background.Routine` and `background.CombinedRoutines` are updated. I DID NOT try to interpret the code logic and make anything better beyond fixing compile and test errors.
The only file that contains the core change is the [`lib/background/background.go`](https://github.com/sourcegraph/sourcegraph/pull/62136/files#diff-65c3228388620e91f8c22d91c18faac3f985fc67d64b08612df18fa7c04fafcd).
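For reference, my reading of the change boils down to roughly this shape (a sketch, not a verbatim copy of the file):

```go
package background

import (
	"context"
	"time"
)

// Routine sketch after this change; details may differ from the real file.
type Routine interface {
	// Start begins the routine; the common monorepo pattern is to block
	// here until Stop is called.
	Start()
	// Stop signals shutdown, honoring ctx's deadline or cancellation, and
	// returns any error encountered while stopping.
	Stop(ctx context.Context) error
}

// shutdown shows a typical call site: bound the shutdown duration and
// collect the error from Stop.
func shutdown(r Routine) error {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	return r.Stop(ctx)
}
```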
We want to migrate the LogReverseEach call to gRPC methods as well.
However, it isn't great practice to keep this process running for potentially hours, as any server restart forces the entire process to start over.
With one call per commit, we're not occupying a process slot on gitserver, and gitserver rollouts should affect Rockskip much less, as individual commits can be retried before being returned to the Rockskip code.
Closes #62100
Test plan:
Existing tests still pass. I let it index a repo locally; that was fast, and I get results for symbols instantly now. I'm not a Rockskip expert though, so I'm requesting review from the owners.
This will be useful for commit searches, and is generally supported by git so why not support it here. That also lets you express `A..B` as `B, ^A` if you want to do less string concatenation, for example.
Test plan:
Made sure all existing callers pass a non-empty string to the slice if they specify the slice at all, and all existing tests still pass.
This adds support for T statuses (type changed) to ChangedFiles. This was previously hard-erroring, which was causing problems for code insights and hybrid search.
This commit refactors the Rockskip metrics logic to use `observationCtx`. In
tests, we now pass `TestContext` which has a no-op metrics registerer.
This change lets us run more than one test that spins up a Rockskip service.
Without it, tests fail with "duplicate metrics registered" errors.
This PR replaces the gitserver client method RevList by the Commits method.
The Commits method can do the same, but this saves us from implementing and testing a second API. Also, the name RevList sounded very generic, yet the method was quite opinionated in the flags it used: it was really "ListCommitsForRockskip". That is too application-specific, so Commits is probably just as good.
We'll learn a bunch about our API endpoints as we finish migrating them all, so we can do proper optimizations afterwards.
Closes https://github.com/sourcegraph/sourcegraph/issues/62098
Test plan:
Integration and unit test for rockskip are still passing.
We want to migrate this call to gRPC. However, it isn't good practice to keep this process running for potentially hours, as any server restart forces the process to start over.
With pagination, we're not occupying a process slot on gitserver, and gitserver rollouts should affect Rockskip much less, as individual pages can be retried before being returned to the Rockskip code.
Test plan:
Existing tests still pass, but I'm not a Rockskip expert, so I'm requesting review from the owners.
This PR adds an initializer function that will be called from the svcmain package through the call to tracer.Init.
The reasoning behind this is that it's very easy to accidentally use a package that internally uses conf from a service like Cody Gateway, Appliance, Migrator, or other MSP services, and the mere absence of a config source for the trace ID should not stall the process entirely.
To make it absolutely clear that a dependency is safe to use from a conf perspective, this change drops the dependency on conf entirely, making it another package's concern to pass this type down.
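Conceptually the decoupling looks something like the following; `ConfigurationSource` and this `Init` signature are illustrative assumptions rather than the exact types in the PR:

```go
package tracer

// ConfigurationSource is whatever the tracer needs in order to read tracing
// configuration; it is now the caller's concern to provide one.
type ConfigurationSource interface {
	Config() Config
}

// Config holds the subset of site config the tracer cares about.
type Config struct {
	Enabled bool
	Sampler string
}

// Init no longer imports conf itself. svcmain passes a conf-backed source;
// MSP services without a conf source pass a static one and never stall.
func Init(source ConfigurationSource) {
	cfg := source.Config()
	_ = cfg // ... set up the tracer from cfg ...
}
```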
Test plan:
All tests for Sourcegraph are still passing.
---------
Co-authored-by: Robert Lin <robert@bobheadxi.dev>
We are getting "fatal error: concurrent map writes" in production when
enabling rockskip at large customers. The root cause is the repoToSize
map which is read for every search request (and written every "cache
miss").
This was the easiest change to make. We should probably also be single
flighting this request, but that can be changed at another time.
Additionally, this fixes another bug: in the cache-miss case we would do `size :=`, shadowing the outer variable, so size was never updated from 0 and we fell through to sqliteSearch. This meant the very first request to a Rockskip repo always did a sqliteSearch, which is also a waste of resources.
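A minimal sketch of both fixes together, assuming a cache keyed by repo name (the `fetch` field is a stand-in for the real size lookup):

```go
package rockskip

import (
	"context"
	"sync"
)

type sizeCache struct {
	mu         sync.RWMutex
	repoToSize map[string]int64
	fetch      func(ctx context.Context, repo string) (int64, error)
}

func (c *sizeCache) get(ctx context.Context, repo string) (int64, error) {
	c.mu.RLock()
	size, ok := c.repoToSize[repo]
	c.mu.RUnlock()
	if ok {
		return size, nil
	}
	// Cache miss: assign to the outer size instead of `size :=`, which
	// previously shadowed the variable and left it at 0.
	var err error
	size, err = c.fetch(ctx, repo)
	if err != nil {
		return 0, err
	}
	c.mu.Lock()
	c.repoToSize[repo] = size
	c.mu.Unlock()
	return size, nil
}
```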
Test Plan: Stefan manually testing :)
* wip
* gitserver (mostly) wolfi 4 bazel
* the big heck of all things
* Add rules_apko lock translation rules to WORKSPACE
* Call apko_repositories() more
* fix rules_apko to handle our shorter repo urls
* fix workspace from rebase, and missing locks
* visibility on wolfi_base_image
* hand-fix a lock coz apko lock is 🅱️roken
* remove chainguard repo+keyring from base
* update locks
* add chainguard repo+keychain to single server manifest
* unrelated fixes, server+grafana still h*cked
* fix postgres-exporter
* the big fix
* aws lib got bumped?
* downgrade sso-oidc? idk
* ignore wolfi locks from prettier
* dynamically do the locks with a reporule
* document and make nice :nails:
* bazel run @rules_apko//apko patch
* Fix .typo.typo
* Update tooling for end-to-end Bazel images (#61106)
* Update sg wolfi image to build using Bazel
* bazel run @rules_apko//apko patch
* Fix .typo.typo
* Add update-images and implement apko YAML change monitoring
* Use bazel apko and add support for additional repos
* Refactor sg wolfi
* Rework wolfi base image auto-update pipeline
* sg bazel configure
* [rough] Add --check flag to sg wolfi lock
* Refactor sg wolfi lock --check
* Simplify check and update apko lock hash operations
* Fix resolveImagePath when running in bazel
* Fixup logic error in CheckApkoLockHashes
* Tweak DoBaseImageBuild output
* Remove debug output
* Fix sg wolfi lock --check behaviour for all images
* Replace base image build step with apko lock --check
* Remove debug line
* Minor fixups for CI step
* Wrap with AnnotatedCmd
* Fixup annotation
* Update apko lockfiles
* Allow additional repos to be passed
* Update build-base-image.sh with bazel + add back to pipeline
* Ensure that modified base images are rebuilt
* Solve bazelception
* Remove timestamp for bit-level reproducibility
* Skip local keygen when running on buildkite
* Add workaround for lack of local repo support in rules_apko
* Run apkoOps first as it's quick and might fail
* Remove blocking allBaseImagesBuilt step
* Remove unused prometheus-gcp image
* Add special cases to resolveImagePath
* Cleanly handle case where no bazel build path exists
This could happen in cases where a base image is only used outside of sourcegraph/sourcegraph,
or if you've added a new base image config but haven't added the associated Bazel scaffolding
* Add debugging around failing docker builds
* More debugging
* Normalise apko_lockfile to match repo.bzl
* Fixup apko docker call
* Try passing imageconfigdir differently to docker
* Run ls in different container
* Soft-fail when using legacy build in Buildkite
* Add missing include
* Workaround for building sourcegraph and sourcegraph-dev
* Add postgresql-client package to server
This contains createdb, which was recently moved from postgresql
* Inflate postgres-12-codeinsights image to avoid rules_apko errors
* Remove update line from yaml files
* Fix issue caused by moving base sourcegraph image
* Remove apk-tools from server
* Update lockfiles
* Address review feedback
* Remove debug lines
* fix unbound var
---------
Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
* go mod tidy + gazelle-update-repos after merging main
* Use aspect bazel cache
* Use Aspect bazel caching when calling bazel in bash and sg
* Append annotation
* Run apko lock on aspect agent
* Remove base image builds
Discussion in https://sourcegraph.slack.com/archives/C05EVRLQEUR/p1712307465660509
* Remove unused functionality
* Update BaseImageConfig comments
* Rewrite wolfi-images/README.md
* Add .apko/range.sh to .gitattributes
* Remove "wolfi" from :base_image and :base_tarball targets
* remove allowlist extras from debugging
* Tweak user instructions around package testing
* Add agent healthcheck to buildkite scripts
* prettier
* sg bazel configure
* bazel run //:gazelle-update-repos
---------
Co-authored-by: Noah Santschi-Cooney <noah@santschi-cooney.ch>
Co-authored-by: Noah S-C <noah@sourcegraph.com>
Previously, some of these validators would not fire, because the packages contributing them aren't necessarily imported from cmd/frontend.
A package that contributes validators MUST be imported into frontend, which was not guaranteed. Colocating the GetWarnings func and the validators in one package effectively enforces this.
There are a few left that were a little harder to migrate, but this feels like a useful first step (and the end of my time window for this rn 😬)
Test plan:
Moved code around, my instance can still start fine and doesn't report new errors.
Understanding how Parse works (and its callers) with respect to
goroutines and channels is a sure path to the asylum. This is a short
term fix around how it behaves when the context is canceled.
From what I can tell, a large number of parse requests can go through handleParseRequest even after the context has been canceled, leading to large amounts of error logging where every error amounts to context canceled. This is a short-term fix that drains the channel if we know the call to handleParseRequest is just going to fail.
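The pattern of the fix, sketched against the names used above (illustrative, not the exact code):

```go
func drainOnCancel(ctx context.Context, parseRequests <-chan parseRequest) {
	for req := range parseRequests {
		if ctx.Err() != nil {
			// The context is already canceled: keep draining so senders
			// don't block, but skip handleParseRequest, which would only
			// log "context canceled" for every remaining request.
			continue
		}
		handleParseRequest(ctx, req)
	}
}
```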
Test Plan: go test. Didn't manual test, but the code change is minor
enough that I trust it (famous last words).
observation.TestContextTB is better to use since your logs will be
scoped to your test and it will use a more pedantic prometheus registry.
To be honest TestContext should be removed but this is the first step.
This is a mechanical change. I replaced "&observation.TestContext" with
"observation.TestContextTB(t)". I then undid the change each time it
caused a compilation error (was only a handful of times).
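The mechanical change, in miniature (`NewService` and `db` are placeholders):

```go
// Before: shared global context; logs leak across tests and metrics go to
// the shared registry.
svc := NewService(&observation.TestContext, db)

// After: logs are scoped to this test and a more pedantic prometheus
// registry is used.
svc := NewService(observation.TestContextTB(t), db)
```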
Test Plan: go test
We are seeing lots of log spam related to cancelation of parse requests.
We haven't fully tracked this down yet, but the behaviour of spamming
logs in this case hides the real issue.
This adjusts the implementation in two places:
- We introduce a parseCanceled metric so we can still track cancelations. It is used instead of parseFailed in the case of cancelation, and in this case we also do not emit the "Closing failed parser" log message, even though we do close the parser, to avoid the log spam. (See the sketch after this list.)
- We introduce an ErrorFilter just for handleParseRequest that will
elide Canceled from the logs. This is called per file. Note: we keep
the error logging for canceled in the high level Parse handler which
runs across all files.
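Roughly the per-file shape after this change; parseCanceled and parseFailed are the metrics named above, the rest is illustrative:

```go
func handleOneFile(ctx context.Context, req parseRequest) {
	_, err := handleParseRequest(ctx, req)
	if err == nil {
		return
	}
	if errors.Is(err, context.Canceled) {
		// Still tracked via the new metric, but not logged per file.
		parseCanceled.Inc()
		return
	}
	parseFailed.Inc()
	logger.Error("parse request failed", log.Error(err))
}
```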
Test Plan: go test. Unfortunately, that is as good as it gets for now; as I investigate the root cause further I will exercise this code path (and only merge once I have manually triggered and tested it).
This PR removes the unused `ctagsBinary` field from symbols server and its
set-up commands. The binary that's actually used is loaded through a separate
codepath.
We can remove parseRequests since it is only ever created and then closed, never read from or written to. It seems that at some point we moved to using parseRequestOrErrors and didn't remove parseRequests.
Test Plan: go test
Previously, errors from the Symbol service were not propagated due to a regression that occurred when we switched to grpc. This change fixes that. Additionally, we now signal whether the upper limit of symbol matches was hit so that we can show a notification.
Fixes #60698
This fixes the following bugs for unindexed symbol search:
- We now respect the limit given by "frontend". Before, the upper limit for unindexed symbol search was hard-coded to 500 and could not be overridden with the `count:` filter.
- Symbol matches are now merged properly, applying the same logic as indexed search (see the sketch below). Before, symbol matches within the same file were simply appended, which created duplicates for AND/OR queries when separate terms of the query matched the same symbol.
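To illustrate the merge (purely a sketch; the real types live in the search result package):

```go
type symbol struct {
	Name string
	Line int
}

// mergeFileSymbols dedupes symbol matches within a single file by identity,
// mirroring what indexed search does, instead of blindly appending.
func mergeFileSymbols(existing, incoming []symbol) []symbol {
	type key struct {
		name string
		line int
	}
	seen := make(map[key]struct{}, len(existing))
	for _, s := range existing {
		seen[key{s.Name, s.Line}] = struct{}{}
	}
	for _, s := range incoming {
		k := key{s.Name, s.Line}
		if _, ok := seen[k]; !ok {
			seen[k] = struct{}{}
			existing = append(existing, s)
		}
	}
	return existing
}
```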
Test plan:
- New unit test
- Partially covered by existing tests
The symbol service already stores each file's detected language. Now, when the
'search-content-based-lang-detection' feature is enabled, the symbols service
filters directly on the language column.
This PR doesn't update Rockskip, since it doesn't store the detected language.
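The idea, in a purely illustrative sketch (the symbols service's actual query construction differs):

```go
import (
	"context"
	"database/sql"
	"strings"
)

func querySymbols(ctx context.Context, db *sql.DB, pattern string, langs []string) (*sql.Rows, error) {
	query := "SELECT name, path, line FROM symbols WHERE name LIKE ?"
	args := []any{pattern}
	if len(langs) > 0 {
		// Filter directly on the stored language column instead of
		// post-filtering by file extension in Go.
		query += " AND language IN (?" + strings.Repeat(",?", len(langs)-1) + ")"
		for _, l := range langs {
			args = append(args, l)
		}
	}
	return db.QueryContext(ctx, query, args...)
}
```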
Adds a new:
- gazelle generator
- rule + rule targets + catch-all target

for generating go-mockgen mocks and testing that they are up to date.
Each go_mockgen macro invocation adds targets for generating mocks, copying them into the source tree, and testing whether the mocks currently in the source tree are up to date.
How to use this: `bazel run //dev:go_mockgen` for the catch-all, or `bazel run //some/target:generate_mocks` for an individual package, and `bazel test //some/target:generate_mocks_tests` to test for up-to-dateness. There is no catch-all for testing.
This currently uses a fork of go-mockgen, with an open PR for upstream here: https://github.com/derision-test/go-mockgen/pull/50.
Closes https://github.com/sourcegraph/sourcegraph/issues/60099
## Test plan
Extensive testing during development, including the following cases:
- Deleting a generated file and its entry in a go_library/go_test `srcs` attribute list and then re-running `sg bazel configure`
- Adding a non-existent output directory to mockgen.test.yaml and running the bash one-liner emitted to prepare the workspace for rerunning `sg bazel configure`
The existing config tests a lot of existing paths anyway (creating mocks for a 3rd party library's interface, entries for a given output file in >1 config file etc)
This change mitigates excessive remote cache network traffic in the event that oci_tarball targets are cache-busted en masse.
Only //cmd/server:image_tarball and //docker-images/executor-vm:image_tarball are used as inputs to downstream targets, so only these two will be built and remote-cached in CI.
This PR migrates the ArchiveReader function's use of the exec endpoint to be in line with the rest of #59738
Additionally, sub-repo permission checks are now done on the server instead of in the client code.
---------
Co-authored-by: Erik Seliger <erikseliger@me.com>
We have a number of docs links in the product that point to the old doc site.
Method:
- Search the repo for `docs.sourcegraph.com`
- Exclude the `doc/` dir, all test fixtures, and `CHANGELOG.md`
- For each, replace `docs.sourcegraph.com` with `sourcegraph.com/docs`
- Navigate to the resulting URL ensuring it's not a dead link, updating the URL if necessary
Many of the updated URLs are just in comments, but since I was doing a manual audit of each URL anyway, I felt it was worth updating these while I was at it.
Changes needed due to Tree-sitter grammar updates (implicitly updated when updating the smacker/go-tree-sitter bindings, since it bundles the grammars):
- Previously, the Java grammar lumped all comments under the 'comment'
marker, but it changed the queries to use 'line_comment' and
'block_comment', which broke one of the tests for locating hovers.
- We need to extract the identifier out of an as_pattern because, with how the grammar is now written, the 'as' in the exception clause is no longer treated as a separate anonymous keyword but as part of an 'as' expression.
This PR consolidates two methods that were effectively doing the same thing: `ReadFile` and `NewFileReader` - where `ReadFile` used the latter under the hood, and wrapped an `io.ReadAll`.
Given that this function returns entire files, which can be of arbitrary size, exposing such a simple high-memory-use footgun made me uneasy, and having to call io.ReadAll explicitly will hopefully make consumers of this API feel the same way.
Sometimes we currently can't avoid it (the GQL layer), but where possible, using the reader is very likely the better option.
This also slightly changes how errors are reported, which I fixed along the way: previously, `NewFileReader` did NOT return `os.ErrNotExist` itself; that error was only surfaced when the returned reader was read.
Now, like `ReadFile` and like `os.Open`, we return that error immediately.
This comes at the cost of a second gRPC request. That is not great, but we will hopefully migrate the FileReader API to a proper gRPC method ASAP, which will let us optimize it better on the server side.
Since Varun also asked for perf improvements here, this is probably one of the first things that would be easy to switch over to go-git, eliminating probably 50%+ of the overhead here.
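The call pattern consumers are left with looks roughly like this (the exact `NewFileReader` signature may differ):

```go
func readWholeFile(ctx context.Context, client gitserver.Client, repo api.RepoName, commit api.CommitID, path string) ([]byte, error) {
	// os.ErrNotExist is now returned here immediately, like os.Open.
	r, err := client.NewFileReader(ctx, repo, commit, path)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	// Needing the whole file is now an explicit opt-in via io.ReadAll,
	// which makes the memory cost visible at the call site.
	return io.ReadAll(r)
}
```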
In the upcoming release, we will only support gRPC going forward. This PR removes the old HTTP client and server implementations and a few leftovers from the transition.
* First pass at moving logging linting into Bazel
* fixed negation operators
* Update dev/linters/logging/logging.go
Co-authored-by: William Bezuidenhout <william.bezuidenhout@sourcegraph.com>
* added more exceptions and refactored one or two impls
* added nogo lint pragmas to offending files
* ran configure
* reverted git-combine refactor
* ran configure
* reverted test as well
---------
Co-authored-by: William Bezuidenhout <william.bezuidenhout@sourcegraph.com>
To determine the icon to show for a file, we need to know the language. The code previously relied on the getFileInfo API, which used file extensions, leading to incorrect results for header files with the '.h' extension, which could contain C, C++, or Objective-C code.
Instead, we now get the language from the server,
where the go-enry library can optionally use the file
contents to return a more appropriate result.
Consequently, we can remove the getFileInfo API altogether.
Updates both the React and Svelte versions of the code.
Other changes in this patch:
- I've added icons for a bunch of languages which were missing.
- I've centralized the icon lists (and the bulk of the related logic) for
both React and Svelte in one place.