sourcegraph/dev/schema_migrations.bzl

"""
Provide a custom repository_rule to fetch database migrations from previous versions from
a GCS bucket.
The "updated_at" attribute allows to manually invalidate the cache, because the rule itself
cannot know when to do so, as it will simply skip listing the bucket otherwise.
"""
def _schema_migrations(rctx):
    """
    This repository downloads the schema migrations from GCS.

    We use the GCS JSON API directly instead of gsutil or gcloud because:
    - gsutil may spend up to ~1m20s trying to contact metadata.google.internal,
      and we have not found a way to disable that
    - gcloud disallows unauthenticated access even to a public bucket
    """
jq_path = rctx.path(Label("@jq//:jq"))
rctx.file("BUILD.bazel", content = """
package(default_visibility = ["//visibility:public"])
exports_files(["archives"])
filegroup(
name = "srcs",
srcs = glob(["**"]),
)
""")
rctx.report_progress("Listing GCS bucket contents")
rctx.download("https://storage.googleapis.com/storage/v1/b/schemas-migrations/o?prefix=migrations/migrations-", "bucket_contents.json")
result = rctx.execute([
jq_path,
".items | map({name, mediaLink, generation})",
"bucket_contents.json",
])
if result.return_code != 0:
fail("Failed to extract bucket data from GCS API: {}".format(result.stderr))
rctx.delete("bucket_contents.json")
output = json.decode(result.stdout)
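
    # Illustrative shape of `output` after the jq projection; the values below
    # are made up, but the field names match the GCS JSON API object resource:
    #
    #   [
    #       {
    #           "name": "migrations/migrations-v5.2.0.tar.gz",
    #           "mediaLink": "https://storage.googleapis.com/download/storage/v1/b/schemas-migrations/o/...",
    #           "generation": "1700000000000000",
    #       },
    #   ]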
rctx.execute(["mkdir", "archives"])
rctx.report_progress("Downloading schema migrations from GCS")
download_tokens = []
for file in output:
download_tokens.append(rctx.download(
file["mediaLink"],
"archives/" + file["name"].split("/")[-1],
canonical_id = file["generation"],
block = False,
))
for token in download_tokens:
token.wait()
schema_migrations = repository_rule(
    implementation = _schema_migrations,
    attrs = {
        "updated_at": attr.string(mandatory = True),
    },
)
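
# A minimal usage sketch, assuming this file is loaded from a WORKSPACE (the
# repository name and date below are illustrative):
#
#   load("//dev:schema_migrations.bzl", "schema_migrations")
#
#   schema_migrations(
#       name = "schema_migrations",
#       updated_at = "2024-01-01",  # bump to force a fresh bucket listing
#   )
#
# Other targets can then consume the downloaded archives through the generated
# BUILD file, e.g. via "@schema_migrations//:srcs".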