ALTER TABLE with a foreign key constraint will _always_ add the
constraint, which means we always take a table lock even if this
migration has already run. So we check if the column exists first.
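A sketch of what the guard might look like, using `repo` and the
`tenant_id` column with its foreign key to `tenants(id)` purely as an
illustrative example:

```sql
DO $$
BEGIN
    IF NOT EXISTS (
        SELECT 1
        FROM information_schema.columns
        WHERE table_name = 'repo' AND column_name = 'tenant_id'
    ) THEN
        -- Only this branch takes the lock needed to add the column and constraint.
        ALTER TABLE repo ADD COLUMN tenant_id integer REFERENCES tenants(id);
    END IF;
END
$$;
```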
The motivation for this change was me noticing slow migrations against
S2, even though it had been migrated already. This speeds it up since it
avoids a bunch of locks.
In testing I kept running into issues with the lsif_* tables. We agreed
to only migrate those later.
Test Plan: ran against my local instance and S2 multiple times, plus CI.
Also manually ran the down and up migrations against dotcom.
The improvement to the triggers I tried to ship yesterday failed on
dotcom. This is because it has the same issue as the tenant_id column:
we can't acquire the ShareRowExclusiveLock for multiple important
tables in a transaction without deadlocking. We use the same technique
where we commit and restart the transaction. This is safe to do since,
even if it fails, the migration is idempotent and a partial migration
still works.
Additionally, after reading the documentation on triggers more deeply
[1], I noticed we can add a WHEN clause to triggers. We already have
this condition in the trigger functions, but a WHEN clause should be
more efficient.
I didn't want to remove the condition from the functions just yet, since
that would make the migration more risky under partial failure.
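A sketch of what the per-table step might look like (trigger and
function names are taken from the `\d+ repo` output below; the exact
statements in the migration file are my assumption):

```sql
DROP TRIGGER IF EXISTS trig_delete_user_repo_permissions_on_repo_soft_delete ON repo;
CREATE TRIGGER trig_delete_user_repo_permissions_on_repo_soft_delete
    AFTER UPDATE OF deleted_at ON repo
    FOR EACH ROW
    -- The WHEN clause skips the function call entirely unless this update
    -- actually soft-deletes the row.
    WHEN (NEW.deleted_at IS NOT NULL AND OLD.deleted_at IS NULL)
    EXECUTE FUNCTION delete_user_repo_permissions_on_repo_soft_delete();

-- Commit and restart the migration's transaction before moving on to the
-- next table, so we never hold ShareRowExclusiveLock on several hot tables
-- at once.
COMMIT AND CHAIN;
```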
## Test Plan
CI. Also added BEGIN and COMMIT to the migration, then ran it locally with
``` shell
psql -f migrations/frontend/1723557836_faster_soft_delete_trigger_for_permissions/up.sql
```
Then inspected the DB state for repo and saw a good-looking trigger come
back:
```
\d+ repo
...snip
Triggers:
trig_delete_user_repo_permissions_on_repo_soft_delete AFTER UPDATE
OF deleted_at ON repo FOR EACH ROW WHEN (new.deleted_at IS NOT NULL
AND old.deleted_at IS NULL) EXECUTE FUNCTION
delete_user_repo_permissions_on_repo_soft_delete()
```
Additionally, before landing I am going to manually run this migration on
S2 first, then on dotcom, to see how it behaves under real traffic and at
scale.
We believe the migration to add tenant_id is failing on dotcom due to
this trigger, which runs for any possible change to a row. It only
needs to run when the deleted_at column changes, so this migration
adjusts each trigger to only run if deleted_at is changed.
We don't need to change the trigger functions, since they only read
deleted_at.
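A sketch of the per-table adjustment (names are taken from the trigger
output shown earlier in this log; the WHEN clause visible there was
added in the later change above):

```sql
DROP TRIGGER IF EXISTS trig_delete_user_repo_permissions_on_repo_soft_delete ON repo;
-- Previously: AFTER UPDATE ON repo, i.e. the trigger fired for every row update.
CREATE TRIGGER trig_delete_user_repo_permissions_on_repo_soft_delete
    AFTER UPDATE OF deleted_at ON repo
    FOR EACH ROW
    EXECUTE FUNCTION delete_user_repo_permissions_on_repo_soft_delete();
```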
If this doesn't resolve our issue, we will need to look into
statement-level triggers for all our triggers.
Test Plan: CI and then will test safely on larger instances.
We dug into our Go Postgres driver: when executing a migration, it
is executed as a "simple query". In this case Postgres automatically
wraps the collection of statements in a transaction, unless it contains
transaction statements. So our last attempt at removing the transaction
failed.
In this attempt we use COMMIT AND CHAIN after each table alter. This
commits the current transaction and then starts a new one.
From the perspective of the Go driver, it is as if there was only one
transaction. We then switch the migration to using a transaction to
ensure the Go driver cleans up the Postgres connection in case of
failure.
I.e. if a query manually starts a transaction and does not clean it up,
the connection will be marked as broken for the next client that gets
the connection from the pool. By wrapping it in Go's transaction code
the connection will be properly cleaned up.
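A sketch of the resulting migration shape. The table names are
hypothetical, and the tenant_id foreign key is just the example from the
surrounding commits:

```sql
ALTER TABLE table_a ADD COLUMN IF NOT EXISTS tenant_id integer REFERENCES tenants(id);
-- Commit the current transaction (releasing the locks taken so far) and
-- immediately start a new one with the same characteristics.
COMMIT AND CHAIN;
ALTER TABLE table_b ADD COLUMN IF NOT EXISTS tenant_id integer REFERENCES tenants(id);
COMMIT AND CHAIN;
ALTER TABLE table_c ADD COLUMN IF NOT EXISTS tenant_id integer REFERENCES tenants(id);
```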
Test Plan: All continuous environments have already succeeded or failed
on this migration number, so we will manually run this again against
them with the migrator code to ensure the same code paths. If they
succeed we will keep the code as is, otherwise we will roll back.
Additionally we did lots of ad hoc testing to understand the
characteristics of Go and its transaction handling.
Co-authored-by: Erik Seliger <erikseliger@me.com>
We hit a deadlock when deploying this migration to S2. This is because
within the migration's transaction we likely didn't obtain table
locks in the same order as a transaction in our application code.
So this commit introduces a new migration metadata field,
"noTransaction". The migrator documentation says you should create a
migration per needed transaction. However, this would require us to
create hundreds of migrations. We believe the better approach is to
introduce this field and barely advertise it.
Reading the code which actually runs migrations, there is no extra
logic done outside of BEGIN; run_migration; COMMIT; so this change is
safe.
We update the migrations to avoid duplicating the function name we
introduce, in case something goes wrong (now that, without a wrapping
transaction, the function name could leak out).
Test Plan: The actual migrations are tested by go test. I added a test
assertion that we don't call Transact, but to be honest that is super
sketchy. However, we couldn't find any test fixtures which actually run
against the DB, so a proper test would require a much deeper investment
than the simplicity of the code change warrants.
Co-authored-by: Erik Seliger <erikseliger@me.com>
This reverts commit 43e06cef0c.
This migration deadlocked on S2. We didn't expect to have processes
taking table locks, but apparently we do. Reverting for now; we will
manually fix up S2 once the build has gone through. We probably need to
make this one migration per table :rip:
## Test plan
Revert.
We introduce the tenant_id column to every table without any magic
enforcement or magic default value. Instead we default to NULL and plan
to have an out-of-band migration. From our testing this change is
instant since it is only a table metadata change. According to the
documentation this sort of ALTER TABLE has been instant since Postgres 11.
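A minimal sketch of the shape of this migration for a couple of tables
(the table names here are only illustrative; the real migration repeats
this for every table):

```sql
-- Nullable column, no DEFAULT, no NOT NULL, no foreign key: adding it is a
-- catalog-only change, which is why it is effectively instant on Postgres 11+.
ALTER TABLE repo  ADD COLUMN IF NOT EXISTS tenant_id integer;
ALTER TABLE users ADD COLUMN IF NOT EXISTS tenant_id integer;
```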
Test Plan: ran this migration against a clone of the s2 database.
---------
Co-authored-by: Keegan Carruthers-Smith <keegan.csmith@gmail.com>
In some deployments, these are running in separate instances. This
migration makes sure that we can still reference the tenants from the
tenant_id column reliably everywhere.
It's critical that this migration is a noop on instances that already
have this table, so please double check me here during code review.
Test plan: Integration and E2E tests still pass, as this should be a
noop.
This PR adds a basic table for storing tenants in the database, and automatically inserts tenant 1 for existing instances.
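A minimal sketch of what such a migration could look like. The exact
column set is my assumption; only the table name and the seed row for
tenant 1 come from the description above:

```sql
CREATE TABLE IF NOT EXISTS tenants (
    id         bigint PRIMARY KEY,
    name       text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Existing instances get a default tenant with id 1.
INSERT INTO tenants (id, name)
VALUES (1, 'default')
ON CONFLICT (id) DO NOTHING;
```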
Test plan: Adds a new, not yet used table. The migration is very simple, so it shouldn't cause any issues for existing instances.
Fixes SPLF-119
This adds a background job to Search Jobs that periodically scans for
finished jobs to aggregate the status, upload logs, and clean up the
tables. This drastically reduces the size of the tables and improves the
performance of the Search Jobs GQL API.
For example, with this change, a finished search job on .com only has 1
entry in the database, whereas before it could have several million if
we searched each repository.
Notes:
- the diff seems larger than it actually is. I left a couple of comments
to help the reviewers.
## Test plan:
- new unit tests
- manual testing: I ran a couple of search jobs locally (with the
  janitor job interval set to 1 min) and checked that
  - logs are uploaded to `blobstore-go/buckets/search-jobs`
  - repo jobs are deleted from `exhaustive_repo_jobs`
  - logs are served from the blobstore after the janitor ran
  - downloading logs while the job is running still works
## Changelog
The new background job drastically reduces the size of the
`exhaustive_*` tables and improves performance of the Search Jobs GQL
API.
Closes SRCH-741
Closes SRCH-716
This PR moves the GitHub App installation code from a redirect-based flow
to a webhook-based approach. We expect that the GitHub server calls the
webhook when the installation is ready, and therefore we shouldn't see
the errors explained in the issues above.
To handle the potential delay until the webhook is called and the
credential is set up, I added a scrappy info notice that the user should
refresh their page:
<img width="928" alt="Screenshot 2024-07-24 at 13 48 24"
src="https://github.com/user-attachments/assets/4d298f2a-d7b8-423b-9e2f-2ae53fbce1ac">
Below is what you see after you refresh (or if the webhook was called
before the user was redirected back to the settings):
<img width="929" alt="Screenshot 2024-07-24 at 13 50 14"
src="https://github.com/user-attachments/assets/b6826158-8561-476d-b20e-e36f8cfb86fd">
I'm able to create PRs for sourcegraph-testing with an app that was
created this way.
<img width="1171" alt="Screenshot 2024-07-24 at 16 16 06"
src="https://github.com/user-attachments/assets/86e20acb-136f-4a46-a33b-bdfdd0d51d71">
I'm seeing an error when getting an access token with a personal github
app to run a batch change, but that will be handled with another PR.
<img width="1053" alt="Screenshot 2024-07-24 at 16 38 38"
src="https://github.com/user-attachments/assets/5655ba91-1ae4-453a-8d5c-1bcdbe34bc17">
## Test plan
Manual testing locally, and more testing to be done on S2 where we have
a more production-like environment.
## Changelog
- When installing a GitHub app for batch changes, the instance now waits
for a callback from GitHub to complete the installation to avoid issues
from eventual consistency.
---------
Co-authored-by: Peter Guy <peter.guy@sourcegraph.com>
This adds support for searching repo metadata with a regex pattern.
Background: repo metadata is a useful feature for shoehorning
business-specific information into the search query language. It allows
tagging repos with arbitrary metadata (think ownership info, quality
info, 3rd-party system IDs, etc.). This ends up being a useful escape
hatch to shim in functionality that is not natively supported in
Sourcegraph.
However it's currently limited to searching with an exact key/value
pair. We've had a few requests to extend this to allow searching by
pattern because it enables ingesting semi-structured data and making it
searchable.
This adds the ability to use a `/.../`-delimited regex pattern to match
against both keys and values. For example,
`repo:has.meta(team:/^my\/org/)`
The Prompt Library lets you create, share, and browse chat prompts for
use with Cody. Prompts are owned by users or organizations, and site
admins can make prompts public so that all users on the instance can see
and use them.
A prompt is just plain text for now, and you can see a list of prompts
in your Prompt Library from within Cody chat
(https://github.com/sourcegraph/cody/pull/4903).
See https://www.loom.com/share/f3124269300c481ebfcbd0a1e300be1b.
Depends on https://github.com/sourcegraph/cody/pull/4903.

## Test plan
Add a prompt on the web. Ensure you can access it from Cody.
## Changelog
- The Prompt Library lets you create, share, and browse chat prompts for
use with Cody. Prompts are owned by users or organizations, and site
admins can make prompts public so that all users on the instance can see
and use them. To use a prompt from your Prompt Library in Cody, select
it in the **Prompts** dropdown in the Cody chat message field.
**Public saved searches will let us make global saved searches for
dotcom and for customers to help them discover and share awesome search
queries!**
Saved searches now have:
- Visibility (public vs. secret). Only site admins may make a saved
search public. Secret saved searches are visible only to their owners
(either a user, or all members of the owning org). A public saved search
can be viewed by everyone on the instance.
- Draft status: If a saved search's "draft" checkbox is checked, that
means that other people shouldn't use that saved search yet. You're
still working on it.
- Timestamps: The last user to update a saved search and the creator of
the saved search are now recorded.
Also adds a lot more tests for saved search UI and backend code.



## Test plan
Create a saved search. Ensure it's in secret visibility to begin with.
As a site admin, make it public. Ensure other users can view it, and no
edit buttons are shown. Try changing visibility back and forth.
## Changelog
- Saved searches can now be made public (by site admins), which means
all users can view them. This is a great way to share useful search
queries with all users of a Sourcegraph instance.
- Saved searches can be marked as a "draft", which is a gentle indicator
that other people shouldn't use it yet.
This PR modifies the sub_repo_permissions table to store the IP
addresses associated with each Perforce path rule, as part of the
IP-based sub-repo permissions work.
The new IP column is implemented as a `text[]`, similar to the paths
column. The IP address associated with `paths[0]` is stored in the ips
column in `ips[0]`.
For example, the following protections table
```
Protections:
read user emily * //depot/elm_proj/...
write group devgrp * //...
write user * 192.168.41.0/24 -//...
write user * [2001:db8:1:2::]/64 -//...
write user joe * -//...
write user lisag * -//depot/...
write user lisag * //depot/doc/...
super user edk * //...
```
turns into the following rows in the sub_repo_permissions table
| repo_id | user_id | version | updated_at | paths | ips |
|---------|---------|---------|------------|-------|-----|
| 1 | 1 | 1 | 2023-07-01 10:00:00 | {`"//depot/elm_proj/..."`} | {`"*"`} |
| 1 | 2 | 1 | 2023-07-01 10:00:00 | {`"//..."`} | {`"*"`} |
| 1 | 3 | 1 | 2023-07-01 10:00:00 | {`"-//..."`} | {`"192.168.41.0/24"`} |
| 1 | 4 | 1 | 2023-07-01 10:00:00 | {`"-//..."`} | {`"[2001:db8:1:2::]/64"`} |
| 1 | 5 | 1 | 2023-07-01 10:00:00 | {`"-//..."`} | {`"*"`} |
| 1 | 6 | 1 | 2023-07-01 10:00:00 | {`"-//depot/..."`, `"//depot/doc/..."`} | {`"*"`, `"*"`} |
| 1 | 7 | 1 | 2023-07-01 10:00:00 | {`"//..."`} | {`"*"`} |
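A minimal sketch of the schema change this implies, assuming the new
column simply mirrors `paths` as a `text[]`:

```sql
-- ips[i] holds the IP/CIDR (or "*") that applies to paths[i].
ALTER TABLE sub_repo_permissions ADD COLUMN IF NOT EXISTS ips text[];
```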
## Test plan
The unit tests are in the sub_repo_permissions store PR that is built
on top of this PR:
https://app.graphite.dev/github/pr/sourcegraph/sourcegraph/63811/internal-database-sub_repo_permissions-modify-store-to-be-able-to-insert-ip-based-permissions
## Changelog
- The sub_repo_permissions table now has an ips column to store the IP
address associated with each path rule.
- Remove long-deprecated and long-ineffective notifications for saved
searches (removed in
de8ae5ee28
2.5 years ago). Note that code monitors were the replacement for saved
searches and work great.
- Clean up UI.
- Make the UI global instead of in the user/org area.
- Convert React class components to function components.
- Add default `patterntype:` because it's required.
- Use `useQuery` and `useMutation` instead of `requestGraphQL`.
- Use a single namespace `owner` GraphQL arg instead of separating out
`userID` and `orgID`.
- Clean up GraphQL resolver code and factor out common auth checking.
- Support transferring ownership of saved searches among owners (the
user's own user account and the orgs they're a member of).
(I know this is not in Svelte.)
SECURITY: There is one substantive change. Site admins may now view any
user's and any org's saved searches. This is so that they can audit and
delete them if needed.




## Test plan
Try creating, updating, and deleting saved searches, and transferring
ownership of them.
## Changelog
- Improved the saved searches feature, which lets you save search
queries to easily reuse them later and share them with other people in
an organization.
- Added the ability to transfer ownership of a saved search to a user's
organizations or from an organization to a user's account.
- Removed the long-deprecated and ineffective `search.savedQueries`
settings field. You can manage saved searches in a user's or
organization's profile area (e.g., at `/user/searches`).
Relates to https://github.com/sourcegraph/sourcegraph/pull/63472
This changes the default patternType of Notebooks from "standard" to
"keyword". As a consequence, all new notebooks will default to keyword
search. Existing notebooks will keep using standard search.
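A minimal sketch of the migration, assuming the `notebooks.pattern_type`
column described further down in this log; only the column default
changes, so existing rows keep their stored value:

```sql
ALTER TABLE notebooks ALTER COLUMN pattern_type SET DEFAULT 'keyword';
```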
Test plan:
- I ran the migration locally and verified that new notebooks use
"keyword" as default.
This PR refactors insights to support the upcoming Keyword Search GA.
New insights are always persisted with `patternType:`.
Queries of existing insights are updated with `patternType:standard
<query>` if they don't already specify a pattern type. This reflects the
current default of the Stream API and thus merely makes the pattern
type explicit.
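A sketch of the backfill, under the assumption that insight queries live
in an `insight_series.query` column (the real table and column names may
differ):

```sql
-- Prepend an explicit pattern type to queries that don't already carry one.
UPDATE insight_series
SET query = 'patternType:standard ' || query
WHERE query !~* 'patterntype:';
```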
Notes:
- The Insights UI has a lot of query input fields, but there are only 2
fields which allow the user to freely specify a pattern type: the
interactive insight on the landing page and the query input field in the
"Track Changes" creation form.
Test plan:
- new unit test
- existing tests pass without changes
- manual testing:
- I created a couple of insights and checked that the pattern type is
persisted in the db.
- I experimented with different default pattern types.
- I tested the workflow to create an insight from the search results
page.
- I created a "language" insights and ran the migration to make sure we
don't set pattern type in those cases.
This PR refactors notebooks to support the upcoming Keyword Search GA. The main goal is to make it easier to switch to a new default pattern type without breaking existing notebooks.
**Before**
- pattern type and version were hardcoded in several places
**After**
- Each notebook has a read-only pattern type as determined by the new column `notebooks.pattern_type` (defaults to "standard").
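A minimal sketch of the column addition; the `text` type is my
assumption, while the column name and default come from the line above:

```sql
ALTER TABLE notebooks
    ADD COLUMN IF NOT EXISTS pattern_type text NOT NULL DEFAULT 'standard';
```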
**Notes**
- Notebooks call the Stream API via various helper functions. `patternType` and `version` are both required parameters, which is redundant, because version acts as a default pattern type already. I left a TODO in the code for that. I don't want to change this as part of this PR because the change would get very big and affect too much code outside of Notebooks.
- We support rendering notebooks with `.snb.md` extension. Unlike notebooks stored in our db, we cannot migrate those files.
**Q&A**
Q: How does this help for Keyword Search GA?
A: Once we default to keyword search, we can change the default of `notebooks.pattern_type` from "standard" to "keyword". Existing notebooks will still work with "standard". New Notebooks will use "keyword".
Q: How can customers migrate existing notebooks to a new version?
A: Use the existing "Copy to My Notebooks" function of Notebooks. The copied notebook will have the current default pattern type.
Test plan:
- existing tests pass
- manual testing
- I created a couple of notebooks with all the different block types and verified via the network tab that all requests to the Stream API have the proper pattern type. I played around with different values in `notebooks.pattern_type` to make sure that the request parameters change.
When a user's external account entry for SCIM has been deleted in the
past, updating the user again fails: the soft-deleted record is found
and then the insertion fails.
This PR fixes that by adjusting the unique index to only consider
non-soft-deleted external accounts.
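A sketch of the index adjustment. The index and column names are
assumptions; the substantive change is the partial-index predicate that
excludes soft-deleted rows:

```sql
DROP INDEX IF EXISTS user_external_accounts_account;
CREATE UNIQUE INDEX user_external_accounts_account
    ON user_external_accounts (service_type, service_id, client_id, account_id)
    WHERE deleted_at IS NULL;
```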
Closes SRC-383
Test plan:
Added a test suite for the DB method - it was untested before :(
The search jobs tables are missing an index on the state column that
most worker tables have; the dequeue query on dotcom takes 12.07s at
the moment.
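A sketch of the missing index, assuming the worker table is
`exhaustive_search_jobs` (the other search-jobs worker tables would get
the same treatment):

```sql
CREATE INDEX IF NOT EXISTS exhaustive_search_jobs_state
    ON exhaustive_search_jobs (state);
```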
Test plan:
`sg migration up --db frontend; sg migration undo`
Previously we failed a code monitor search if any of the revisions could not be resolved. For example, this happens if an empty repo is added to Sourcegraph. Now we log the error and continue with the job. The log message is shown in the Code Monitors log view.
This fixes two small pain points with automated treatment of feature flags:
1) A user cannot add an override unless the feature flag has already been created and a default has been set. However, in all our code, we require the caller to specify a default if the feature flag doesn't exist, so this is not a necessary limitation. It's also particularly annoying because only site admins can create a feature flag.
2) In order to update an override, a user has to first fetch the override to see if it exists, then update it if it exists or create it if it doesn't. This modifies create to just overwrite if the override already exists.
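In SQL terms, point 2 turns the create path into an upsert. A sketch
with hypothetical table and column names, assuming a unique constraint
on (namespace_user_id, flag_name):

```sql
INSERT INTO feature_flag_overrides (namespace_user_id, flag_name, flag_value)
VALUES (42, 'my-feature', true)
ON CONFLICT (namespace_user_id, flag_name)
DO UPDATE SET flag_value = excluded.flag_value;
```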
These fields were used in the cloud v1 implementation, but are no longer used. All the records have been removed from Sourcegraph.com. As a precaution, I've added ANOTHER round of deletes here.
Closes #43115
We have a number of docs links in the product that point to the old doc site.
Method:
- Search the repo for `docs.sourcegraph.com`
- Exclude the `doc/` dir, all test fixtures, and `CHANGELOG.md`
- For each, replace `docs.sourcegraph.com` with `sourcegraph.com/docs`
- Navigate to the resulting URL ensuring it's not a dead link, updating the URL if necessary
Many of the URLs updated are just comments, but since I'm doing a manual audit of each URL anyway, I felt it was worth updating these while I was at it.
Adds a basic schema for syntactic code intel indexes (up and down migration)
Sets up everything required to use dbworker interface
Adds a stub indexing worker implementation
Adds tests to ensure conformance with dbworker interface (run against real postgres)
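A rough sketch of the kind of queue table the dbworker interface
expects. The table name and domain columns are hypothetical, and the
state/bookkeeping columns follow the convention used by other worker
tables, so treat them as assumptions:

```sql
CREATE TABLE IF NOT EXISTS syntactic_indexing_jobs (
    id              bigserial PRIMARY KEY,
    -- Domain columns (hypothetical):
    repository_id   integer NOT NULL,
    commit          text NOT NULL,
    -- Bookkeeping columns conventionally required by dbworker:
    state           text NOT NULL DEFAULT 'queued',
    failure_message text,
    queued_at       timestamptz NOT NULL DEFAULT now(),
    started_at      timestamptz,
    finished_at     timestamptz,
    process_after   timestamptz,
    num_resets      integer NOT NULL DEFAULT 0,
    num_failures    integer NOT NULL DEFAULT 0,
    execution_logs  json[],
    worker_hostname text NOT NULL DEFAULT '',
    cancel          boolean NOT NULL DEFAULT false
);

CREATE INDEX IF NOT EXISTS syntactic_indexing_jobs_state
    ON syntactic_indexing_jobs (state);
```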
* Add telemetry events for file preview and filters panels
* Generate allow list event migrations
* Fix prettier and eslint problems
* Fix eslint and rebase fixes
* Fix failing search jobs site_test.go
* Fix telemetry service missing prop problem
Implement expiry for access tokens; by default all newly created access tokens will expire. There is site configuration available at auth.accessToken to enable tokens without expiration and to configure the expiration values (in days) that are presented to users and selected by default.
Co-authored-by: Erik Seliger <erikseliger@me.com>
As part of the embeddings policy framework, a worker periodically checks what
repos can be embedded. For every candidate repo, it queries the DB to see if
there's a new revision to embed. This runs every minute and becomes
increasingly expensive as the jobs table fills up with more entries over time.
This change makes small optimizations to improve this:
* Add an index to make selecting on `repo_id` and `revision` much faster
(see the sketch after this list)
* Check the repos every 5 minutes instead of 1 minute. This shouldn't make a
huge difference in user experience, since by default embeddings jobs aren't
allowed to be scheduled within 24h of the last run
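A sketch of the first bullet's index, assuming the jobs table is
`repo_embedding_jobs`:

```sql
CREATE INDEX IF NOT EXISTS repo_embedding_jobs_repo_id_revision
    ON repo_embedding_jobs (repo_id, revision);
```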
This removes contention around the single table. If you have a small
number of shards and lots of repositories that are cloned/recloned, they
all try to write to the same table concurrently.
What this does is change the `gitserver_repos_statistics` table to work
like `repo_statistics`.
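I'm assuming `repo_statistics` follows the usual append-only delta
pattern; if so, the change amounts to roughly the sketch below (column
names are hypothetical): writers insert small delta rows instead of all
updating one row per shard, and readers aggregate.

```sql
CREATE TABLE IF NOT EXISTS gitserver_repos_statistics (
    shard_id text NOT NULL,
    total    bigint NOT NULL DEFAULT 0,
    cloning  bigint NOT NULL DEFAULT 0,
    cloned   bigint NOT NULL DEFAULT 0
);

-- A shard records "one repo finished cloning" as a delta row:
INSERT INTO gitserver_repos_statistics (shard_id, total, cloning, cloned)
VALUES ('gitserver-1', 0, -1, 1);

-- Readers aggregate the deltas:
SELECT shard_id, SUM(total) AS total, SUM(cloning) AS cloning, SUM(cloned) AS cloned
FROM gitserver_repos_statistics
GROUP BY shard_id;
```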
Co-authored-by: Erik Seliger <erikseliger@me.com>
Fix the slow "list indexers" GraphQL endpoint, which is especially
problematic on the upload/index list view when not scoped to a
particular repository.
Previously, querying repos by topic was pretty messy, especially after we added support for GitLab topics. We had to dig down into the metadata, which is not particularly nicely structured JSON for GitHub, and we had to do this both for the index and for queries.
This adds a generated column `topics`, which isolates the complexity of extracting the topics from the metadata while maintaining normalization. It replaces the codehost-specific indexes with a single GIN index on that column so that querying for repos with a specific topic is both fast and easy.
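A sketch of the idea, assuming (for illustration only) that topics live
under a `topics` key of the `metadata` JSON; the real extraction is
codehost-specific:

```sql
-- Generated columns require an immutable expression, so the extraction is
-- wrapped in an immutable helper function.
CREATE OR REPLACE FUNCTION extract_topics(metadata jsonb) RETURNS text[]
LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE
        WHEN jsonb_typeof(metadata->'topics') = 'array'
            THEN ARRAY(SELECT jsonb_array_elements_text(metadata->'topics'))
        ELSE '{}'::text[]
    END
$$;

ALTER TABLE repo
    ADD COLUMN IF NOT EXISTS topics text[]
    GENERATED ALWAYS AS (extract_topics(metadata)) STORED;

-- One GIN index replaces the codehost-specific indexes; topic lookups become
-- a fast array-containment check.
CREATE INDEX IF NOT EXISTS repo_topics_idx ON repo USING GIN (topics);
```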
Gazelle wrongly assumes that all files embedded with
embedsrcs are present on disk, while the go_library rule
fully supports using targets that generate those inputs.
This has been confusing for teammates. Alas, the only option
we have is to add a gazelle:ignore pragma so we don't get the
confusing warning.
I added a comment in that BUILD file to reflect that and to inform
teammates that if they add a new Go file there, they have
to update the rule manually.