sourcegraph/migrations
Stefan Hengl cd38adb4a7
chore(search_jobs): add janitor job (#64186)
Fixes SPLF-119

This adds a background job to Search Jobs that periodically scans for
finished jobs to aggregate the status, upload logs, and clean up the
tables. This drastically reduces the size of the tables and improves the
performance of the Search Jobs GQL API.

For example, with this change, a finished search job on .com only has 1
entry in the database, whereas before it could have several millions if
we searched each repository.

Notes:
- the diff seems larger than it actually is. I left a couple of comments
to help the reviewers.

## Test plan:
- new unit tests
- manual testing:

I ran a couple of search jobs locally (with the janitor job interval set
to 1 min) and checked that
 - logs are uploaded to `blobstore-go/buckets/search-jobs`
 - repo jobs are deleted from `exhaustive_repo_jobs`
 - logs are served from the blobstore after the janitor ran
 - downloading logs while the job is running still works

## Changelog
The new background job drastically reduces the size of the
`exhaustive_*` tables and improves performance of the Search Jobs GQL
API.
2024-08-01 15:29:10 +02:00
codeinsights insights: persist patternType in db (#63579) 2024-07-03 09:51:48 +02:00
codeintel bzl: rework migration schemas generation (#57511) 2023-10-10 17:19:47 +02:00
frontend chore(search_jobs): add janitor job (#64186) 2024-08-01 15:29:10 +02:00
BUILD.bazel build: add buildifier check to Aspect Workflows (#58566) 2023-11-27 14:58:01 +02:00
embed.go chore: Simplify embed files (#30248) 2022-01-27 10:49:43 -06:00
README.md fix: update links for dev docs (#62758) 2024-05-17 13:47:34 +02:00

Postgres Migrations

The children of this directory contain migrations for each Postgres database instance:

  • frontend is the main database (things should go here unless there is a good reason)
  • codeintel is a database containing only processed LSIF data (which can become extremely large)
  • codeinsights is a database containing only Code Insights time series data

The migration path for each database instance is the same and is described below. Each of the database instances described here is deployed separately, but they are designed to be overlayable to reduce friction during development. That is, we assume that the names in each database do not overlap, so that the same connection parameters can be used for all database instances.

Migrating up and down

Up migrations will happen automatically in development on service startup. In production environments, they are run by the migrator instance. You can run migrations manually during development via sg:

  • sg migration up runs all migrations to the latest version
  • sg migration up -db=frontend -target=<version> runs up migrations (relative to the current database version) on the frontend database until it hits the target version
  • sg migration undo -db=codeintel runs one down migration (relative to the current database version) on the codeintel database

Adding a migration

IMPORTANT: All migrations must be backwards-compatible, meaning that existing code must be able to operate successfully against the new (post-migration) database schema. Consult Writing database migrations in our developer documentation for additional context.

To create a new migration file, run the following command.

$ sg migration add -db=<db_name> <my_migration_name>
Migration files created
 Up query file: ~/migrations/codeintel/1644260831/up.sql
 Down query file: ~/migrations/codeintel/1644260831/down.sql
 Metadata file: ~/migrations/codeintel/1644260831/metadata.yaml

This will create an up and down pair of migration files, along with a metadata file (the paths are printed by the command, as shown above). Add SQL statements to the up and down files that will perform the desired migration. After adding SQL statements to those files, update the schema doc via go generate ./internal/database/ (or regenerate everything via sg generate).

To pass CI, you'll additionally need to:

  • Ensure that your new migrations run against the current Go unit tests
  • Ensure that your new migrations can be run up, then down, then up again (idempotency test)
  • Ensure that your new migrations do not break the Go unit tests published with the previous release (backwards-compatibility test)
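
As a rough illustration of what the up, then down, then up again check is looking for, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for Postgres, and the table name, index name, and helper functions are all hypothetical; this is not how CI runs the check. The point is that guards such as IF NOT EXISTS and IF EXISTS let each script be applied cleanly regardless of what ran before it.

```python
import sqlite3

# Hypothetical migration pair. The table and index names are illustrative
# only, and SQLite stands in for Postgres in this sketch.
UP_SQL = """
CREATE TABLE IF NOT EXISTS example_widgets (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS example_widgets_name_idx ON example_widgets (name);
"""

DOWN_SQL = """
DROP INDEX IF EXISTS example_widgets_name_idx;
DROP TABLE IF EXISTS example_widgets;
"""


def schema_objects(conn):
    """Return the sorted names of user-defined tables and indexes."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    return [name for (name,) in rows]


def up_down_up(conn):
    """Apply up, down, up and return the schema snapshot after each step."""
    snapshots = []
    for script in (UP_SQL, DOWN_SQL, UP_SQL):
        conn.executescript(script)
        snapshots.append(schema_objects(conn))
    return snapshots


conn = sqlite3.connect(":memory:")
after_up, after_down, after_second_up = up_down_up(conn)

# The down migration removes everything the up migration created, and the
# second up restores the same schema as the first.
print(after_up)
print(after_down)
print(after_second_up == after_up)
```

If the down script left the index or table behind, or the up script lacked its guards, the second up would either fail or produce a different schema than the first.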

Reverting a migration

If a reverted PR contains a DB migration, it may still have been applied to Sourcegraph.com, k8s.sgdev.org, etc. due to their rollout schedules. In some cases, it may also have been part of a Sourcegraph release. To fix this, you should create a PR to revert the migrations of that commit. The sg migration revert <commit> command automates all the necessary changes to the migration definitions.