This PR implements support for draining a zoekt replica via including its hostname in the comma-separated environment variable INDEXED_SEARCH_DRAIN_SERVERS on sourcegraph-frontend.
The way this functionality is implemented is via adjusting the endpoint map we use when making assignment of repos. We still report the hostname as part of the list of endpoints. However, the endpoint is left out of the consistent hash which maps the repositories to endpoints.
Our interactions with zoekt are already designed to do smooth rebalancing when the set of endpoints changes. We have logic to only remove repos from a replica once its new endpoint has it, and we support deduplication of search results across endpoints.
Co-authored-by: Stefan Hengl <stefan@sourcegraph.com>
* log: remove use of description paramter in Scoped
* temporarily point to sglog branch
* bazel configure + gazelle
* remove additional use of description param
* use latest versions of zoekt,log,mountinfo
* go.mod
Originally I started working on this because of
[comment on another
PR](https://github.com/sourcegraph/sourcegraph/pull/53373#discussion_r1228058455).
I quickly wrote an implementation of Ptr and NonZeroPtr. Then I started
refactoring existing code only to find out that @efritz already beat me
to it. But Erics implementation was hidden in
internal/codeintel/resolvers/utils.go
Moved all of the Ptr functions to `pointers` package instead and
refactored all existing code that I could find to use it instead of
redefining the same functions all the time.
Usage is mostly in tests, so hopefully the impact is not as huge as the
diff size might suggest.
## Test plan
A whole lot of unit tests.
The previous approach to enable race detection was too radical and
accidently led to build our binaries with the race flage enabled, which
caused issues when building images down the line.
This happened because putting a `test --something` in bazelrc also sets
it on `build` which is absolutely not what we wanted. Usually folks get
this one working by having a `--stamp` config setting that fixes this
when releasing binaries, which we don't at this stage, as we're still
learning Bazel.
Luckily, this was caught swiftly. The current approach insteads takes a
more granular approach, which makes the `go_test` rule uses our own
variant, which injects the `race = "on"` attribute, but only on
`go_test`.
## Test plan
<!-- All pull requests REQUIRE a test plan:
https://docs.sourcegraph.com/dev/background-information/testing_principles
-->
CI, being a main-dry-run, this will cover the container building jobs,
which were the ones failing.
---------
Co-authored-by: Alex Ostrikov <alex.ostrikov@sourcegraph.com>
The only service discovery we support is via k8s. We want to ensure that
is working, so we have the invariant that if its empty we make all Map
lookups fail. However, this means that even if we explicitly wanted an
empty Map those would also fail (once communicated via config's
ServiceConnections).
This moves the enforcement of the invariant from the discovery loop to
just the k8s discovery code. Now our ServiceConnections based discovery
can return an empty list.
Additionally this allows us to simplify the condition around how we take
discovered endpoints/errors and update the map. This is because in all
functions on Map if err is non-nil we return the error.
This commit also allows an empty value for zoekt address. Explicitly setting
an empty value disables zoekt.
Test Plan: go test and manual testing with zoekt disabled.
* migrate to sg/log
* Update internal/endpoint/k8s.go
Co-authored-by: William Bezuidenhout <william.bezuidenhout@sourcegraph.com>
* Update internal/endpoint/k8s.go
Co-authored-by: William Bezuidenhout <william.bezuidenhout@sourcegraph.com>
* fix top and child scope
* fix frontend call to new endpont
* fix backticks
* fix symbols
* refactor
* last fixes after feedback
* white space
* fix var assignment
* refactor new
* Revert "endpoint: migrate to sg/log (#45210)"
This reverts commit ab82c6ab1e.
Co-authored-by: William Bezuidenhout <william.bezuidenhout@sourcegraph.com>
* migrate to sg/log
* Update internal/endpoint/k8s.go
Co-authored-by: William Bezuidenhout <william.bezuidenhout@sourcegraph.com>
* Update internal/endpoint/k8s.go
Co-authored-by: William Bezuidenhout <william.bezuidenhout@sourcegraph.com>
* fix top and child scope
* fix frontend call to new endpont
* fix backticks
* fix symbols
* refactor
* last fixes after feedback
* white space
* fix var assignment
Co-authored-by: William Bezuidenhout <william.bezuidenhout@sourcegraph.com>
Additionally we remove a bunch of benchmarking code which existed in
validating this was the best hash for us to use. This will also remove
some log spam about which hash we are using.
Test Plan: CI
Now that we require go1.18, we can use a builtin type alias for
"interface{}": "any". I've updated all code except lib, which can be
imported by external modules. That is currently pinned at go1.16.
Additionally I had to rerun generate since rewriting generated go code
will fail CI.
find -name '*.go' | xargs gofmt -s -r 'interface{} -> any' -w
git checkout lib
go generate ./...
Test Plan: CI will exercise that the code still compiles. Otherwise this
is a noop.
If an error is returned by discovery we never recover from it. Currently
a customer is working around this by restarting our frontend. This can
happen frequently at smaller scales if a roll out temporary takes down
all endpoints in a service.
Test Plan: Added a unit test which exercised the bug. The test failed
until the fix.
We incorrectly assumed the serviceName was always the same as the
StatefulSet name. We now correctly construct the pod name as described
by the kubernetes documentation.
* gitserver: discover endpoints dynamically in k8s
This commit makes it so GitServers in ServiceConnections are discovered
dynamically in Kubernetes clusters. We re-use the service discovery code
from the endpoint package, without leveraging the consistent hashing
part yet, which is for the time being still handled in gitserver client
(i.e. AddrForRepo).
* fixup! address feedback
* endpoint: add rendezvous hashing support
This commit adds rendezvous hashing support to the endpoint package with
the aim to make it the default in a follow up PR. It's currently
protected by an environment variable which I plan to set on dogfood for
testing.
Additionally, it paves the way for increasing the replication factor by
adding a `GetN` method to map, which I used to replace the previous
`exclude` paramater to `Get` (used only by searcher). Actually raising
replication factor for Zoekt will come later.
Here is the output of the tests, which shows much better distribution,
and illustrates the data movement needed when changing hash function and
cluster sizes.
I think it's OK for us to shepherd this to prod next week and treat it like
and index version bump, and to write some release notes for customers to expect
some downtime in the next upgrade.
* fixup! Lint
* Revert "Revert "endpoint: improve k8s stateful set support (#23852)""
This reverts commit 7dc29e29db.
* endpoints: construct endpoint URL correctly
The last PR introduced a bug by not using u.endpointURL like before. So
we were missing port numbers and the service name in each endpoint.
Facepalm.
* Revert "Revert "endpoints: properly name the discovery function (#23883)""
This reverts commit acb5c1f728.
* endpoint: improve k8s stateful set support
This commit improves the k8s stateful set support by not using service
endpoints and instead generating a static list from the replica count.
It also refactors the package to isolate k8s things from the Map.
* fixup! Fix tests
* fixup! Lazy discovery
* fixup! Get rid of NotReadyAddresses
* endpoint: include not ready k8s endpoints
This commit adds support for including not ready kubernetes endpoints
discovered via the k8s api, which we need to avoid temporary reshuffling
of repo assignments to indexed search replicas.
* fixup! Fix test
In this commit, along with moving to
github.com/cockroachdb/errors, we update a few tests and some
error checks in non-test code.
List of files that have code changes apart from changes in the
error package import:
enterprise/internal/batches/sources/gitlab.go
enterprise/internal/batches/sources/gitlab_test.go
internal/conf/client.go
internal/store/store_test.go
Co-authored-by: ᴜɴᴋɴᴡᴏɴ <joe@sourcegraph.com>
We currently use gopkg.in/inconshreveable/log15.v2, which points to
github.com/inconshreveable/log15. However, recently goimports started inserting
github.com domain instead of the gopkg.in domain. This also seems to be the
preferred import path based on the documentation / import paths in the log15.
In production we are getting empty hostnames from the searcher endpoints. When
manually running `kubectl get endpoints -o json searcher` the `hostname` field
is not set. However, the k8s client library we are using may be marshalling
the field anyways for some reason. Either way this is causing searcher
unavailability in production.
Pods in a stateful set have DNS addresses of the form podname.servicename. We
can discover this by inspecting the endpoints from the headless service. They
will have the Hostname field set on the endpoint address object. Note: only if
podname.servicename is a valid address will hostname be set.
Additionally we support `rpc://` scheme. When using the rpc scheme, we return
address strings instead (ie hostname:port). This is what our rpc package
expects.
This allows us to dynamically discover the pods in the indexed-search
statefulset. This will allow kubernetes administrators to scale the
indexed-search statefulset without needing to update the INDEXED_SEARCH_SERVERS
configuration. Note: This commit does not update the default for that yet.
In future we could use this for gitservers as well (more testing required
though, for example we don't want to use this if a gitserver has to be marked as
ready).
Static is the same behaviour as calling endpoint.New with space separated
arguments, but instead the type is lifted from parsing a string into a go
slice.
GetMany will be used by our horizontal sharding to more efficiently map the
set of repositories to shards.
Both changes are useful for the changes in zoekt horizontal sharding.