gitserver

Mirrors repositories from their code host. All other Sourcegraph services talk to gitserver when they need data from git. Requests for fetch operations, however, go through repo-updater.

gitserver exposes an "exec" API over HTTP for running git commands against clones of repositories. gitserver also exposes APIs for the management of clones.

The management of clones comprises most of the complexity in gitserver since:

  • We want to avoid concurrent clones and fetches of the same repository.
  • We want to limit the number of concurrent clones and fetches.
  • When adding/removing/modifying a clone, concurrent attempts to run commands need to be handled gracefully.
  • We need to be robust against the many ways git clones can degrade (for example, gc or interrupted clones).
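
The first two points can be illustrated with a minimal sketch. It assumes a hypothetical `cloneLimiter` type (not gitserver's actual implementation) that tracks in-flight repositories in a map and uses a buffered channel as a counting semaphore:

```go
package main

import (
	"fmt"
	"sync"
)

// cloneLimiter is a hypothetical helper for illustration only; it is not
// the type gitserver uses.
type cloneLimiter struct {
	mu       sync.Mutex
	inFlight map[string]struct{} // repositories currently being cloned or fetched
	slots    chan struct{}       // buffered channel used as a counting semaphore
}

func newCloneLimiter(maxConcurrent int) *cloneLimiter {
	return &cloneLimiter{
		inFlight: make(map[string]struct{}),
		slots:    make(chan struct{}, maxConcurrent),
	}
}

// tryAcquire returns false if repo is already being worked on. Otherwise it
// marks repo as in flight and blocks until a global slot is free.
func (l *cloneLimiter) tryAcquire(repo string) bool {
	l.mu.Lock()
	if _, busy := l.inFlight[repo]; busy {
		l.mu.Unlock()
		return false
	}
	l.inFlight[repo] = struct{}{}
	l.mu.Unlock()

	l.slots <- struct{}{} // blocks while maxConcurrent operations are running
	return true
}

// release frees the global slot and clears the in-flight marker for repo.
func (l *cloneLimiter) release(repo string) {
	<-l.slots
	l.mu.Lock()
	delete(l.inFlight, repo)
	l.mu.Unlock()
}

func main() {
	l := newCloneLimiter(2)
	const repo = "github.com/example/repo" // placeholder repository name
	fmt.Println(l.tryAcquire(repo))        // first caller gets the repository
	fmt.Println(l.tryAcquire(repo))        // second caller is turned away
	l.release(repo)
}
```

Rejecting a duplicate request instead of queueing it is one possible design choice; a real implementation might instead let the second caller wait for the first clone to finish.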

Additionally, we have invested heavily in the observability of gitserver. Because nearly every operation Sourcegraph does runs one or more git commands, we have detailed observability in prometheus, net/event, jaeger, honeycomb and stderr logs.

We normalize repository names when storing them on disk. Always use protocol.NormalizeRepo. The $GIT_DIR of a repository is at reposRoot/normalized_name/.git.

When doing an operation on a file or directory which may be concurrently read/written please use atomic filesystem patterns. This usually involves heavy use of os.Rename. Search for existing uses of os.Rename to see examples.

Scaling

gitserver's memory usage consists of short-lived git subprocesses.

This is an IO- and compute-heavy service since most Sourcegraph requests will trigger one or more git commands. As such, we shard requests for a repo to a specific replica. This allows us to horizontally scale out the service.
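
To make the idea of sharding concrete, here is a minimal sketch that maps a repository name to one replica by hashing. The hash function, the modulo scheme, and the name `replicaFor` are all assumptions for illustration; this document does not describe how Sourcegraph actually picks a replica:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// replicaFor is a hypothetical helper: it deterministically picks one
// replica address for a repository name. It returns "" if there are no
// replicas.
func replicaFor(repo string, replicas []string) string {
	if len(replicas) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(repo)) // hash.Hash writes never return an error
	return replicas[int(h.Sum32()%uint32(len(replicas)))]
}

func main() {
	// Placeholder replica addresses.
	replicas := []string{"gitserver-0:3178", "gitserver-1:3178", "gitserver-2:3178"}
	fmt.Println(replicaFor("github.com/example/repo", replicas))
}
```

With plain modulo hashing, changing the number of replicas moves most repositories to a different replica, which is one reason a real deployment might prefer a different scheme.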

The service is stateful (maintaining git clones). However, it only contains data mirrored from upstream code hosts.

Perforce depots

Syncing of Perforce depots is accomplished by either p4-fusion or git p4 (deprecated), both of which clone Perforce depots into Git repositories in gitserver.

p4-fusion in development

To use p4-fusion while developing Sourcegraph, there are a couple of options.

Docker

Run gitserver in a Docker container. This is the option that gives an experience closest to a deployed Sourcegraph instance, and will work for any platform/OS on which you're developing (running sg start).

Bazel

Native binaries are provided through Bazel, built via Nix in our fork of p4-fusion. p4-fusion can be invoked either through ./dev/p4-fusion-dev or directly with bazel run //dev/tools:p4-fusion.

Native binary executable

The p4-fusion native binary has been built on Linux and macOS, but is untested on Windows.

Read the comprehensive instructions.

If you do go the native binary route, you may also want to use the wrapper shell script, which detects when the process has been killed and outputs an error so that the calling process can handle it.

That wrapper shell script is p4-fusion-wrapper-detect-kill.sh, and in order to use it:

  1. Rename the p4-fusion binary executable to p4-fusion-binary and move it to a location in the PATH.
  2. Copy the shell script p4-fusion-wrapper-detect-kill.sh to a location in the PATH, renaming it p4-fusion.
  3. Copy the shell script process-stats-watcher.sh to a location in the PATH.
  4. Ensure all three of those are executable.

After those steps, when a native gitserver process runs p4-fusion, it runs the wrapper shell script, which in turn runs both the p4-fusion-binary executable and the process-stats-watcher.sh executable.
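
One way to check the setup above is a small program that looks up the three names in the PATH. This is only a sketch; the helper `missingExecutables` is hypothetical and not part of the repository:

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingExecutables is a hypothetical helper: it returns the names that
// cannot be found as executables via the PATH.
func missingExecutables(names []string) []string {
	var missing []string
	for _, name := range names {
		// exec.LookPath fails if the file is absent or not executable.
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Names as they should exist after the renaming steps above.
	required := []string{"p4-fusion", "p4-fusion-binary", "process-stats-watcher.sh"}
	if missing := missingExecutables(required); len(missing) > 0 {
		fmt.Println("not found in PATH or not executable:", missing)
		return
	}
	fmt.Println("all wrapper components found")
}
```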