search: create and document git-stats script (#32663)

This is a script we have previously shared directly with customers to
understand the scale of their monorepos. This commit stores it in our
repository and documents it under our monorepo documentation.

Test Plan: ran git-stats on the sourcegraph repo. Note the links in the
documentation will only work once this PR has landed.
Keegan Carruthers-Smith 2022-03-16 13:41:17 +02:00 committed by GitHub
parent 811d62f5f3
commit 2b9cd22718
2 changed files with 72 additions and 0 deletions

dev/git-stats Executable file

@@ -0,0 +1,45 @@
#!/usr/bin/env bash
# This script outputs statistics for the current git repository. This script
# is used by the search-core team to understand the size and shape of a
# repository. In particular we use this when understanding the scale of a
# monorepo to help us guide our work.
set -e
# Do everything from the gitdir. This also ensures ls-tree below lists the
# whole tree rather than only the subdirectory the script was run from.
cd "$(git rev-parse --git-dir)"
# The size of the git store (not the working copy).
echo "$(du -sh .)" gitdir
# The number of commits reachable from HEAD
echo "$(git rev-list --count HEAD)" commits
# Some awk which extracts statistics on the files in the latest commit.
echo
echo HEAD statistics
git ls-tree -r --long HEAD | awk '
  BEGIN {
    base = 10
    logbase = log(base)
  }
  $4 != "-" {
    if ($4 == 0) {
      hist[0]++
    } else {
      hist[int(log($4) / logbase) + 1]++
    }
    total += $4
    count++
  }
  END {
    printf("%.3fGiB\n%d files\n", total / 1024 / 1024 / 1024, count)
    printf("histogram:\n")
    for (x in hist) {
      printf("%d^%d\t%d\n", base, x, hist[x])
    }
  }
'


@@ -24,3 +24,30 @@ Sourcegraph's code search index scales horizontally with the number of files bei
Sourcegraph clones code from your code host via the usual `git clone` or `git fetch` commands. Some organisations use custom `git` binaries or commands to speed up these operations. Sourcegraph supports using alternative git binaries to allow cloning. This can be done by inheriting from the `gitserver` docker image and installing the custom `git` onto the `$PATH`.
Some monorepos use a custom command for `git fetch` to speed up fetch. Sourcegraph provides the `experimentalFeatures.customGitFetch` site setting to specify the custom command.
## Statistics
You can help the Sourcegraph developers understand the scale of your monorepo by sharing some statistics with the team. The bash script [`git-stats`](https://github.com/sourcegraph/sourcegraph/blob/main/dev/git-stats) calculates these statistics when run inside your git repository.
Example output on the Sourcegraph repository:
``` shellsession
$ wget https://raw.githubusercontent.com/sourcegraph/sourcegraph/main/dev/git-stats
$ chmod +x git-stats
$ ./git-stats
725M . gitdir
19671 commits
HEAD statistics
0.096GiB
8638 files
histogram:
10^0 6
10^1 69
10^2 667
10^3 2236
10^4 4589
10^5 971
10^6 86
10^7 14
```
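
To read the histogram: each `10^n` row counts the files in `HEAD` whose size in bytes is at least `10^(n-1)` and below `10^n`, and the `10^0` row counts empty files. The sketch below applies the same bucketing rule to a handful of made-up sizes so you can see where a given file would land. The `bucket` helper and the sample sizes are illustrative assumptions and are not part of `git-stats`. Unlike the script, it sorts the rows, since awk does not guarantee iteration order.

```shell
#!/usr/bin/env bash
# Sketch only: reproduces the git-stats bucketing rule on sizes read from stdin.
# bucket is a hypothetical helper, not part of git-stats.
bucket() {
  awk '
    {
      # Empty files go to bucket 0; otherwise bucket n holds sizes
      # with 10^(n-1) <= size < 10^n.
      n = ($1 == 0) ? 0 : int(log($1) / log(10)) + 1
      hist[n]++
    }
    END {
      for (n in hist) {
        printf("10^%d\t%d\n", n, hist[n])
      }
    }
  ' | sort -t^ -k2,2n  # sort rows numerically by exponent
}

# Made-up file sizes in bytes, one per line.
printf '%s\n' 0 5 7 42 512 900 2048 70000 | bucket
```

With these sample sizes, the two single-digit files (5 and 7 bytes) share the `10^1` row and the 512 and 900 byte files share the `10^3` row. Sizes that are exact powers of ten sit on a bucket boundary and may fall on either side of it because of floating-point rounding in `log`.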