From 2b9cd2271876c4dffa481576287b1e45d43bb020 Mon Sep 17 00:00:00 2001
From: Keegan Carruthers-Smith
Date: Wed, 16 Mar 2022 13:41:17 +0200
Subject: [PATCH] search: create and document git-stats script (#32663)

This is a script we have shared directly with customers before to
understand the scale of their monorepos. This commit stores it in our
repository and documents it under our monorepo documentation.

Test Plan: ran git-stats on the sourcegraph repo. Note the links in the
documentation will only work once this PR has landed.
---
 dev/git-stats         | 45 +++++++++++++++++++++++++++++++++++++++++++
 doc/admin/monorepo.md | 27 ++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)
 create mode 100755 dev/git-stats

diff --git a/dev/git-stats b/dev/git-stats
new file mode 100755
index 00000000000..a8b22dd758e
--- /dev/null
+++ b/dev/git-stats
@@ -0,0 +1,45 @@
+#!/usr/bin/env bash
+
+# This script outputs statistics for the current git repository. This script
+# is used by the search-core team to understand the size and shape of a
+# repository. In particular we use this when understanding the scale of a
+# monorepo to help us guide our work.
+
+set -e
+
+# Do everything from the gitdir. This also ensures that ls-tree below covers
+# the whole repository rather than only a subtree.
+cd "$(git rev-parse --git-dir)"
+
+# The size of the git store (not the working copy).
+echo "$(du -sh .)" gitdir
+
+# The number of commits reachable from HEAD
+echo "$(git rev-list --count HEAD)" commits
+
+
+# Some awk which extracts statistics on the files in the latest commit.
+echo
+echo HEAD statistics
+git ls-tree -r --long HEAD | awk '
+BEGIN {
+  base = 10
+  logbase = log(base)
+}
+$4 != "-" {
+  if ($4 == 0) {
+    hist[0]++
+  } else {
+    hist[int(log($4) / logbase) + 1]++
+  }
+  total += $4
+  count++
+}
+END {
+  printf("%.3fGiB\n%d files\n", total / 1024 / 1024 / 1024, count)
+  printf("histogram:\n")
+  for (x in hist) {
+    printf("%d^%d\t%d\n", base, x, hist[x])
+  }
+}
+'
diff --git a/doc/admin/monorepo.md b/doc/admin/monorepo.md
index ea38d86d00f..98889af3991 100644
--- a/doc/admin/monorepo.md
+++ b/doc/admin/monorepo.md
@@ -24,3 +24,30 @@ Sourcegraph's code search index scales horizontally with the number of files bei
 Sourcegraph clones code from your code host via the usual `git clone` or `git fetch` commands. Some organisations use custom `git` binaries or commands to speed up these operations. Sourcegraph supports using alternative git binaries to allow cloning. This can be done by inheriting from the `gitserver` docker image and installing the custom `git` onto the `$PATH`.
 
 Some monorepos use a custom command for `git fetch` to speed up fetch. Sourcegraph provides the `experimentalFeatures.customGitFetch` site setting to specify the custom command.
+
+## Statistics
+
+You can help the Sourcegraph developers understand the scale of your monorepo by sharing some statistics with the team. The bash script [`git-stats`](https://github.com/sourcegraph/sourcegraph/blob/main/dev/git-stats) calculates these statistics when run in your git repository.
+
+Example output on the Sourcegraph repository:
+
+``` shellsession
+$ wget https://raw.githubusercontent.com/sourcegraph/sourcegraph/main/dev/git-stats
+$ chmod +x git-stats
+$ ./git-stats
+725M . gitdir
+19671 commits
+
+HEAD statistics
+0.096GiB
+8638 files
+histogram:
+10^0 6
+10^1 69
+10^2 667
+10^3 2236
+10^4 4589
+10^5 971
+10^6 86
+10^7 14
+```
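
A note on reading the histogram: the awk program files each blob under `int(log(size) / log(10)) + 1`, so a row labelled `10^k` counts files smaller than roughly `10^k` bytes (and at least `10^(k-1)` bytes), while `10^0` counts empty files. The sketch below reproduces that calculation in isolation. The `bucket` helper is hypothetical and is not part of `git-stats`; it only mirrors the expression used in the script.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of git-stats): print the histogram bucket
# that the awk program in git-stats would assign to a file of the given
# size in bytes.
bucket() {
  awk -v size="$1" 'BEGIN {
    if (size == 0) {
      # Empty files get their own bucket, printed by git-stats as 10^0.
      print 0
    } else {
      # Same expression as git-stats: the number of decimal digits in size.
      print int(log(size) / log(10)) + 1
    }
  }'
}

bucket 0     # empty file
bucket 7     # single-digit size
bucket 42    # two-digit size
bucket 1500  # four-digit size
```

Under this reading, the `10^4 4589` row in the example output means that about 4589 files are between 1,000 and 9,999 bytes. Because the bucket comes from floating-point logarithms, a size that is an exact power of ten may land one bucket lower than expected, so treat the boundaries as approximate.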