We dived into our go postgres driver and when executing a migration it
is executed as a "simple query". Postgres in this case automatically
wraps the collection of statements in a transaction, unless it contains
transaction statements. So our last attempt at removing the transaction
failed.
In this attempt we use COMMIT AND CHAIN after each table alter. What
this does is commit the current transaction and then starts it up again.
From the perspective of the go driver, it is as if there was only one
transaction. We then switch the migration to using a transaction to
ensure the go drivers clean up the postgres connection in case of
failure.
IE if a query manually starts a transaction and does not clean up, the
connection will be marked as broken for the next person who gets the
connection from the pool. By wrapping in go's transaction code the
connection will be properly cleaned up.
Test Plan: All continuous environments have already succeeded or failed
on this migration number. So we will manually run this again against
them with the migrator code to ensure the same code paths. If they
succeed we will keep code as is, otherwise we will rollback.
Additionally we did lots of adhoc testing to understand the
characteristics of go and transaction handling.
Co-authored-by: Erik Seliger <erikseliger@me.com>
* add patch to turn of race in backcompat
* git checkout with no-overlay
* fix patch
* move go_defs patch to own dir for 5.1.0
* add no-overlay to more git checkouts
* add 5.0.0 patches
* rm 5.0.0 patches since bazel was incomplete then
Main motivation is to see the effect on performance for attribution
search.
Note that this included a bump in the otel version used since zoekt is
using a new version. This had a bit of a cascading effect on our third
party deps since they removed a package. So this bumps most 3rd party
packages that directly interacted with otel.
The changed commits are
7643f3b313...45f608ff95
- a176bde1a3 go get -u -t ./...
- e2e8aede00 Fix template documentation comments
- 25c1ea5177 all: observe missing Stats RegexpsConsidered and FlushReason
- 9abbb8b0d3 zoekt-indexserver: Prevent invalid config from causing an NPE
- 008a775ba8 zoekt-indexserver: use value format directive for bad conf warning
- f9d3a0e2e4 zoekt: add fgprof for full profiling
- 3d0bdd5c9c remove ngram offset code
- 45f608ff95 sort ngrams before looking them up
Test Plan: tested in the zoekt repo. Our CI will handle the dep updates.
I eyeballed them and they all look low risk.
Co-authored-by: Jean-Hadrien Chabran <jh@chabran.fr>
Because we at least need one file to be present or the embed won't work,
this replaces CHANGELOG.md by CONTRIBUTING.md as a placeholder. This
will prevent PRs updating the changelog from invalidating the cache for
nothing for these slow targets.
## Test plan
<!-- All pull requests REQUIRE a test plan:
https://docs.sourcegraph.com/dev/background-information/testing_principles
-->
CI
This PR moves the target release for back compat tests to 5.1 and fixes
the various issues that arose.
- We don't need to handle otel breaking updates, as 5.1 and HEAD are on
the same version
- We now need to explicitly remove `/client/*` as we're properly
targeting the `v5.1.0` tag and it doesn't contain my crude patches
anymore
- Bumped pinned back compat scip deps, as it frequently breaks due to
having a lot of API changes across time
- Added a new patch, to handle the consequences of having removed the
client tree: previous release didn't have the refactored ui/assets
explicit oss/enterprise targets and thus didn't explicitly depend on the
client. Now we do, so it needed some patching to make the build working.
- The buildkite command itself now substracts a few steps that are not
playing very well with the backcompat approach, I'm not sure to
understand why, the the `//cmd/migrator:schema_descriptions` `genrule`
doesn't like very much being built in a different place. I suspect
that's because the tree is one folder below, but it's harder to patch
this than it is to simply remove it from the test targets entirely.
- And finally, the buildkite command also filters things to only run go
tests.
## Test plan
<!-- All pull requests REQUIRE a test plan:
https://docs.sourcegraph.com/dev/background-information/testing_principles
-->
CI + local fun
Clones and patches Sourcegraph git repository under
@sourcegraph_back_compat so we can run tests targets from that
particular release against the new database schema.
Flakes can be defined to skip known problematic tests which are either
flaky or simply cannot run against the new schema in case of a breaking
change. See //dev/backcompat:flakes.bzl for more details about how to
define them.
The final result is the definition of a @sourcegraph_back_compat target,
whose test targets are exactly the same as back then, but with instead a
new schema.
Example: `bazel test
@sourcegraph_back_compat//enterprise/internal/batches/...`.
See
https://github.com/sourcegraph/sourcegraph/pull/50932/files#diff-2f07315ec320aa4080768fec54f32ebb2cbf4e3e6df7c51a314beda827c48c41R104
for the command generating the mandatory patch file in CI for these
tests to run.
If the patch file were to be missing, a placeholder diff is in place to
make it explicit (in the eventuality of someone running those locally).
A new CI job has been added because this one is fully independent from
the other targets to build, and will be cached most of the time.
Depending how it goes, I might bring that one over the main bazel job to
avoid getting an agent just for that.
Because it's cached, we can run this within PRs and have it take 10s max
for 99% of the PRs and main builds.
TODO: The flakefiles are still there, they're just defined in different
place. I need to port the annotation as well, that has proven to be
really useful.
Also right now it's a bit manual as the 5.0.0 target is a bit peculiar,
but that'll get much simpler as we progress, as the next old release
will be requiring less and less patching.
This turns about a dozen of 10m individual jobs into a single one
ranging from 30s if there's not database changes to 10m if there's some
of them and cold cache.
## Test plan
<!-- All pull requests REQUIRE a test plan:
https://docs.sourcegraph.com/dev/background-information/testing_principles
-->
QA: fails on old code when a new migration breaks something:
-
https://buildkite.com/sourcegraph/sourcegraph/builds/214283#01879f2d-67be-4caf-b5d5-93f045e19348/118-126