sourcegraph/cmd
Michael Bahr f61e637062
feat(code insights): language stats speed improvements by using archive loading (#62946)
We previously improved the performance of Language Stats Insights by
introducing parallel requests to gitserver:
https://github.com/sourcegraph/sourcegraph/pull/62011

This PR replaces the previous approach, in which we iterated through the
repository and requested each file from gitserver individually, with one
where we request just a single archive. This eliminates a lot of network
traffic and gives us an additional(!) performance improvement of 70-90%.
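A minimal sketch of the single-archive idea, assuming the archive arrives as a tar stream. Instead of one request per file, the code walks the entries of one stream and aggregates byte counts per language. The extension table, `languageStats`, and `buildTar` are hypothetical names for illustration only, not Sourcegraph's implementation or gitserver's API.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
	"path/filepath"
)

// extToLanguage is a hypothetical, tiny extension-to-language table for
// illustration only; real language detection is far more involved.
var extToLanguage = map[string]string{
	".go": "Go",
	".ts": "TypeScript",
	".md": "Markdown",
}

// languageStats reads a single tar stream (standing in for the one archive
// requested from gitserver) and sums file sizes in bytes per language.
func languageStats(r io.Reader) (map[string]int64, error) {
	stats := make(map[string]int64)
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return stats, nil // end of archive
		}
		if err != nil {
			return nil, err
		}
		// Only regular files contribute to the totals.
		if hdr.Typeflag != tar.TypeReg {
			continue
		}
		lang, ok := extToLanguage[filepath.Ext(hdr.Name)]
		if !ok {
			continue // unknown extension; Next() skips the unread content
		}
		// Consume the entry's contents from the same stream; a real
		// implementation could inspect them (for example, to count lines).
		n, err := io.Copy(io.Discard, tr)
		if err != nil {
			return nil, err
		}
		stats[lang] += n
	}
}

// buildTar creates an in-memory tar archive from name -> content pairs.
func buildTar(files map[string]string) *bytes.Buffer {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for name, content := range files {
		hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(content)), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write([]byte(content)); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return &buf
}

func main() {
	archive := buildTar(map[string]string{
		"main.go":   "package main\n",
		"README.md": "# hi\n",
		"app.ts":    "export {}\n",
	})
	stats, err := languageStats(archive)
	if err != nil {
		panic(err)
	}
	fmt.Println(stats)
}
```

The point of the design is that every file is read sequentially from one response, so the per-file round trips to gitserver disappear.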

Even a repository as large as chromium (42GB) can now be processed, in
about one minute on my machine.

---

Caching: We dropped most of the caching and kept only the top-level
caching (repo@commit). This means that we only need to compute the
language stats once per commit, and subsequent users/requests can see
the cached data. We dropped the file/directory level caching because
(1) the code to do that got very complex and (2) we can assume that most
repositories can be computed within the 5-minute timeout (which can be
increased via the environment variable `GET_INVENTORY_TIMEOUT`). The
timeout is also no longer bound to the user's request. Before, the
frontend would request the stats up to three times so the computation
could move forward and pick up where the last request aborted. While we
still have this frontend retry mechanism, we no longer need an
abort-and-continue mechanism in the backend.

---

Credits for the code to @eseliger:
https://github.com/sourcegraph/sourcegraph/issues/62019#issuecomment-2119278481

I've taken the diff and updated the caching methods to allow for more
advanced use cases, should we decide to introduce more caching. We can
take that out again if the current caching is sufficient.

Todos:

- [x] Check if CI passes (manual testing seems to be fine)
- [x] Verify that insights are cached at the top level

---

Test data:

- sourcegraph/sourcegraph: 9.07s (main) -> 1.44s (current): 84% better
- facebook/react: 17.52s (main) -> 0.87s (current): 95% better
- godotengine/godot: 28.92s (main) -> 1.98s (current): 93% better
- chromium/chromium: ~1 minute: 100% better, because it didn't compute
before
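The percentages in the test data read as relative reductions in runtime, (before - after) / before. A small sketch that recomputes them from the timings listed above; chromium is omitted because it had no baseline. The `improvement` helper is a hypothetical name for illustration.

```go
package main

import "fmt"

// improvement returns the relative reduction in runtime, in percent:
// (before - after) / before * 100.
func improvement(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	runs := []struct {
		repo          string
		before, after float64 // seconds
	}{
		{"sourcegraph/sourcegraph", 9.07, 1.44},
		{"facebook/react", 17.52, 0.87},
		{"godotengine/godot", 28.92, 1.98},
	}
	for _, r := range runs {
		fmt.Printf("%s: %.1f%% faster\n", r.repo, improvement(r.before, r.after))
	}
}
```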

## Changelog

- Language stats queries now request one archive from gitserver instead
of individual files. This leads to a huge performance improvement.
Even extra-large repositories like chromium can now be computed within
one minute; previously they timed out.

## Test plan

- New unit tests
- Plenty of manual testing
2024-07-18 08:40:48 +02:00
appliance chore(appliance): Stub out react UI expected URIs and JSON API (#63741) 2024-07-15 14:48:38 -04:00
batcheshelper bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
blobstore bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
bundled-executor bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
cody-gateway Update flagging.go 2024-07-16 07:15:40 -07:00
cody-gateway-config Several fixes around merging modelconfig, and the current Cody Gateway data (#63814) 2024-07-15 17:14:28 +00:00
embeddings lib/background: upgrade Routine interface with context and errors (#62136) 2024-05-24 10:04:55 -04:00
enterprise-portal fix/enterpriseportal: drop old gorm fk constraints (#63864) 2024-07-17 14:16:02 -07:00
executor chore/deps: upgrade grpc, prometheus/common (#63328) 2024-06-19 09:55:44 -04:00
executor-kubernetes bazel: transcribe test ownership to bazel tags (#62664) 2024-05-16 15:51:16 +01:00
frontend feat(code insights): language stats speed improvements by using archive loading (#62946) 2024-07-18 08:40:48 +02:00
gitserver Update comment and decode bytes instead (#63754) 2024-07-11 09:40:51 +02:00
loadtest chore(bazel): update ownership tags to increase coverage (#63001) 2024-05-31 14:10:29 +00:00
migrator chore(ci): conditionally stamp genrules (#63204) 2024-06-12 15:04:43 +02:00
msp-example msp/runtime: split contract into JobContract and ServiceContract (#63494) 2024-06-26 19:46:10 +00:00
pings msp/runtime: split contract into JobContract and ServiceContract (#63494) 2024-06-26 19:46:10 +00:00
precise-code-intel-worker chore: Change errors.HasType to respect multi-errors (#63024) 2024-06-06 13:02:14 +00:00
repo-updater scheduler: Simplify query for uncloned repos (#63681) 2024-07-10 02:24:32 +02:00
searcher Structural search: fix precise lang filtering (#63791) 2024-07-15 09:20:21 +02:00
server feat/bazel: //cmd/{frontend,server} targets that don't include client bundle for backend integration tests (#62877) 2024-05-28 14:32:48 +01:00
sourcegraph support fast, simple sg start single-program-experimental-blame-sqs for local dev (#63435) 2024-06-24 21:12:47 +00:00
symbols symbols: Make symbols specific code internal (#63736) 2024-07-10 01:26:22 +02:00
syntactic-code-intel-worker Syntactic indexing produce scip files (#63580) 2024-07-09 13:49:55 +02:00
telemetry-gateway chore/telemetrygateway: gracefully handle sams introspectToken cancelation (#63809) 2024-07-15 10:45:00 -07:00
worker chore(worker): disable jobs based on ENVs (#63853) 2024-07-16 18:07:22 +02:00
README.md Reminder to keep architecture diagram in-sync (#36869) 2022-06-08 19:40:36 -07:00

This directory contains Sourcegraph services and binaries.

When a service is added or removed, or when a service's dependencies change, update our architecture diagram.