This change adds:
- telemetry export background jobs, gated behind `TELEMETRY_GATEWAY_EXPORTER_EXPORT_ADDR` (default empty, i.e. disabled; see the sketch after this list)
- telemetry redaction: configured in package `internal/telemetry/sensitivemetadataallowlist`
- telemetry-gateway service receiving events and forwarding them to a pub/sub topic (or just logging them, as configured in local dev)
- utilities for easily creating an event recorder: `internal/telemetry/telemetryrecorder`
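As a rough illustration of the env-based gating mentioned above, here is a minimal Go sketch; the `exporterAddr` helper is illustrative only, not the actual worker code:
```go
package main

import (
	"fmt"
	"os"
)

// exporterAddr returns the configured Telemetry Gateway address and whether
// the export jobs should run at all; an empty value disables them.
// (Illustrative only; the real wiring lives in the worker's exporter setup.)
func exporterAddr() (string, bool) {
	addr := os.Getenv("TELEMETRY_GATEWAY_EXPORTER_EXPORT_ADDR")
	return addr, addr != ""
}

func main() {
	if addr, enabled := exporterAddr(); enabled {
		fmt.Printf("telemetry export enabled, target: %s\n", addr)
	} else {
		fmt.Println("telemetry export disabled")
	}
}
```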
Notes:
- all changes are feature-flagged to some degree and off by default, so the merge should be fairly low-risk.
- we decided that transmitting the full license key continues to be the way to go: we transmit it once per stream and attach it to all events in the telemetry-gateway. There is no auth mechanism at the moment (see the sketch after this list).
- GraphQL return type `EventLog.Source` is now a plain string instead of a string enum. This should not be a breaking change for our clients, but is required so that our generated V2 events do not break querying of event logs.
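To make the per-stream license key handling concrete, here is a hedged Go sketch of a client-side export; the `RecordEventsRequest`, `Metadata`, and `Event` types and the stream interface are hypothetical stand-ins, not the actual generated protobuf/gRPC types:
```go
// Illustrative sketch only: the types below are hypothetical stand-ins for
// the generated telemetry-gateway protobuf/gRPC types.
package telemetryexportsketch

import "context"

type Metadata struct{ LicenseKey string }

type Event struct{ Feature, Action string }

// RecordEventsRequest mimics a streaming request that carries either
// stream-level metadata (first message) or a batch of events.
type RecordEventsRequest struct {
	Metadata *Metadata
	Events   []*Event
}

// recordEventsStream is a stand-in for the client side of a gRPC stream.
type recordEventsStream interface {
	Send(*RecordEventsRequest) error
}

// exportEvents sends the license key exactly once, as the first message on
// the stream; the gateway is then expected to attach it to every event it
// forwards. There is no additional auth mechanism at the moment.
func exportEvents(ctx context.Context, stream recordEventsStream, licenseKey string, batches [][]*Event) error {
	if err := stream.Send(&RecordEventsRequest{Metadata: &Metadata{LicenseKey: licenseKey}}); err != nil {
		return err
	}
	for _, batch := range batches {
		if err := stream.Send(&RecordEventsRequest{Events: batch}); err != nil {
			return err
		}
	}
	return nil
}
```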
Stacked on https://github.com/sourcegraph/sourcegraph/pull/56520
Closes https://github.com/sourcegraph/sourcegraph/issues/56289
Closes https://github.com/sourcegraph/sourcegraph/issues/56287
## Test plan
Add an override to make the export super frequent:
```yaml
env:
  TELEMETRY_GATEWAY_EXPORTER_EXPORT_INTERVAL: "10s"
  TELEMETRY_GATEWAY_EXPORTER_EXPORTED_EVENTS_RETENTION: "5m"
```
Start sourcegraph:
```
sg start
```
Enable the `telemetry-export` feature flag (from https://github.com/sourcegraph/sourcegraph/pull/56520)
Emit some events in GraphQL:
```gql
mutation {
  telemetry {
    recordEvents(
      events: [
        {
          feature: "foobar"
          action: "view"
          source: { client: "WEB" }
          parameters: { version: 0 }
        }
      ]
    ) {
      alwaysNil
    }
  }
}
```
See a series of log events:
```
[ worker] INFO worker.telemetrygateway-exporter telemetrygatewayexporter/telemetrygatewayexporter.go:61 Telemetry Gateway export enabled - initializing background routines
[ worker] INFO worker.telemetrygateway-exporter telemetrygatewayexporter/exporter.go:99 exporting events {"maxBatchSize": 10000, "count": 1}
[telemetry-g...y] INFO telemetry-gateway.pubsub pubsub/topic.go:115 Publish {"TraceId": "7852903434f0d2f647d397ee83b4009b", "SpanId": "8d945234bccf319b", "message": "{\"event\":{\"id\":\"dc96ae84-4ac4-4760-968f-0a0307b8bb3d\",\"timestamp\":\"2023-09-19T01:57:13.590266Z\",\"feature\":\"foobar\", ....
```
Build and push the telemetry-gateway image:
```
export VERSION="insiders"
bazel run //cmd/telemetry-gateway:candidate_push --config darwin-docker --stamp --workspace_status_command=./dev/bazel_stamp_vars.sh -- --tag $VERSION --repository us.gcr.io/sourcegraph-dev/telemetry-gateway
```
Deploy: https://github.com/sourcegraph/managed-services/pull/7
Add an override pointing the exporter at the deployed telemetry-gateway:
```yaml
env:
  # Port required. TODO: What's the best way to provide gRPC addresses, such that a
  # localhost address is also possible?
  TELEMETRY_GATEWAY_EXPORTER_EXPORT_ADDR: "https://telemetry-gateway.sgdev.org:443"
```
Repeat the above (`sg start` and emit some events):
```
[ worker] INFO worker.telemetrygateway-exporter telemetrygatewayexporter/exporter.go:94 exporting events {"maxBatchSize": 10000, "count": 6}
[ worker] INFO worker.telemetrygateway-exporter telemetrygatewayexporter/exporter.go:113 events exported {"maxBatchSize": 10000, "succeeded": 6}
[ worker] INFO worker.telemetrygateway-exporter telemetrygatewayexporter/exporter.go:94 exporting events {"maxBatchSize": 10000, "count": 1}
[ worker] INFO worker.telemetrygateway-exporter telemetrygatewayexporter/exporter.go:113 events exported {"maxBatchSize": 10000, "succeeded": 1}
```
This service is being replaced by a `redsync.Mutex` that lives directly in the GitHub client.
With this change we:
- Simplify deployments by removing one service
- Centralize GitHub access control in the client instead of splitting it across services
- Remove the dependency on a non-HA service to talk to GitHub.com successfully
Other repos referencing this service will be updated once this has shipped to dotcom and proven to work over the course of a couple of days. A hedged sketch of the new locking approach follows.
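Here is a minimal sketch of guarding GitHub.com access with a `redsync.Mutex`, using the open-source go-redsync/redsync and go-redis libraries; the lock name `github.com:api` and the `withGitHubLock` wrapper are illustrative, not the actual client code:
```go
package main

import (
	"context"
	"fmt"

	"github.com/go-redsync/redsync/v4"
	redsyncgoredis "github.com/go-redsync/redsync/v4/redis/goredis/v9"
	"github.com/redis/go-redis/v9"
)

// withGitHubLock runs fn while holding a Redis-backed distributed lock, so
// that concurrent services serialize their access to GitHub.com.
// The lock name is illustrative only.
func withGitHubLock(ctx context.Context, rs *redsync.Redsync, fn func(context.Context) error) error {
	mu := rs.NewMutex("github.com:api", redsync.WithTries(3))
	if err := mu.LockContext(ctx); err != nil {
		return fmt.Errorf("acquire GitHub lock: %w", err)
	}
	defer mu.UnlockContext(ctx)
	return fn(ctx)
}

func main() {
	client := redis.NewClient(&redis.Options{Addr: "127.0.0.1:6379"})
	rs := redsync.New(redsyncgoredis.NewPool(client))

	_ = withGitHubLock(context.Background(), rs, func(ctx context.Context) error {
		// Talk to GitHub.com here.
		return nil
	})
}
```
Because the mutex lives in the GitHub client itself, every service that talks to GitHub.com serializes on the same Redis key instead of going through a separate proxy service.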
Our first usage of [the recently stabilized OpenTelemetry
metrics](https://opentelemetry.io/docs/specs/otel/metrics/) 😁 Currently
this is Cody-Gateway-specific; nothing is added for Sourcegraph as a
whole.
We add the following:
- If a GCP project is configured, we set up a GCP exporter that pushes
metrics periodically and on shutdown. It's important that this is
push-based, as Cloud Run instances are ephemeral.
- Otherwise, we set up a Prometheus exporter that works the same as
using the Prometheus SDK: metrics are exposed at `/metrics` (set up by
debugserver) and Prometheus scrapes them periodically.
To start off, I've added a simple gauge that records concurrent
in-flight requests to upstream Cody Gateway services; see the test plan
below and the sketch that follows.
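For illustration, a minimal sketch of the exporter selection and an in-flight-requests instrument using the OpenTelemetry Go metrics SDK; the `GOOGLE_CLOUD_PROJECT` check, the `codygateway` meter name, and the `upstream.concurrent_requests` instrument name are assumptions, not the actual Cody Gateway wiring:
```go
package main

import (
	"context"
	"log"
	"os"

	gcpmetric "github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric"
	"go.opentelemetry.io/otel"
	otelprometheus "go.opentelemetry.io/otel/exporters/prometheus"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func newMeterProvider() (*sdkmetric.MeterProvider, error) {
	// If a GCP project is configured, push metrics periodically (Cloud Run
	// instances are ephemeral, so pull-based scraping is not reliable).
	if project := os.Getenv("GOOGLE_CLOUD_PROJECT"); project != "" {
		exp, err := gcpmetric.New(gcpmetric.WithProjectID(project))
		if err != nil {
			return nil, err
		}
		return sdkmetric.NewMeterProvider(
			sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exp)),
		), nil
	}
	// Otherwise expose metrics for Prometheus to scrape (served on /metrics).
	exp, err := otelprometheus.New()
	if err != nil {
		return nil, err
	}
	return sdkmetric.NewMeterProvider(sdkmetric.WithReader(exp)), nil
}

func main() {
	ctx := context.Background()
	mp, err := newMeterProvider()
	if err != nil {
		log.Fatal(err)
	}
	defer mp.Shutdown(ctx) // flushes pending metrics on shutdown
	otel.SetMeterProvider(mp)

	// A gauge-like instrument for concurrent in-flight upstream requests.
	meter := otel.Meter("codygateway")
	inflight, err := meter.Int64UpDownCounter("upstream.concurrent_requests")
	if err != nil {
		log.Fatal(err)
	}

	// Around each upstream request:
	inflight.Add(ctx, 1)
	// ... do the request ...
	inflight.Add(ctx, -1)
}
```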
Closes https://github.com/sourcegraph/sourcegraph/issues/53775
## Test plan
I've only tested the Prometheus exporter. Hopefully the GCP one will
"just work" - the configuration is very similar to the one used in the
tracing equivalent, and that one "just worked".
```
sg start dotcom
sg run prometheus
```
See target picked up:
<img width="1145" alt="Screenshot 2023-07-19 at 7 09 31 PM"
src="https://github.com/sourcegraph/sourcegraph/assets/23356519/c9aa4c06-c817-400e-9086-c8ed6997844e">
Talk to Cody aggressively:
<img width="1705" alt="image"
src="https://github.com/sourcegraph/sourcegraph/assets/23356519/fbda23c7-565f-4a11-ae1b-1bdd8fbceca1">
This PR ships our freshly rewritten container images built with
rules_oci and Wolfi, which for now will only be used on S2.
*What is this about*
This work is the conjunction of [hardening container
images](https://github.com/orgs/sourcegraph/projects/302?pane=issue&itemId=25019223)
and fully building our container images with Bazel.
* All base images are now distroless and based on Wolfi, meaning we
fully control every package version and are no longer subject to, for
example, Alpine maintainers dropping a Postgres version.
* Container images are now built with `rules_oci`, meaning we no longer
have Dockerfiles; images are instead created through [Bazel
rules](https://sourcegraph.sourcegraph.com/github.com/sourcegraph/sourcegraph@bzl/oci_wolfi/-/blob/enterprise/cmd/gitserver/BUILD.bazel).
Don't be scared: while this will look a bit strange at first, it's much
saner and simpler than our Dockerfiles and their muddy shell scripts
calling each other in cascade.
:spiral_note_pad: *Plan*:
*1/ (NOW) We merge our branch into `main` today; here is what changes
for you 👇:*
* On `main`:
* It will introduce a new job on `main`, _Bazel Push_, which will push
these new images to our registries with all tags prefixed by `bazel-`.
* These new images will be picked up by S2 and S2 only.
* The existing jobs that build and push Docker images will stay in
place until we have QA'ed the new images enough and are confident
rolling them out on Dotcom.
* Because we'll be building both sets of images, there will be more jobs
running on `main`, but this should not affect the wall-clock time.
* On all branches (so your PRs and `main`):
* The _Bazel Test_ job will now run: Backend Integration Tests, E2E
Tests and CodeIntel QA
* This will increase the duration of your test jobs in PRs, but as we
haven't yet removed the `sg lint` step, it should not affect the
wall-clock time of your PRs too much.
* But it will also increase your confidence in your changes, as the
coverage is vastly increased compared to before.
* If you have ongoing branches that affect the Docker images (e.g.
adding a new binary, like the recent `scip-tags`), reach out to us on
#job-fair-bazel so we can help you port your changes. It's much, much
simpler than before, but it will be unfamiliar at first.
* If something goes awfully wrong, we'll roll back and update this
thread.
*2/ (EOW / early next week) Once we're confident enough with what we've
seen on S2, we'll roll the new images out on Dotcom.*
* After the first successful deploy and a few sanity checks, we will
drop the old image-building jobs.
* At this point, we'll reach out to all TLs asking for their help to
exercise all features of our product to ensure we catch any potential
breakage.
## Test plan
* We tested our new images on `scale-testing` and they worked.
* The new container-building rules come with _container tests_, which
ensure that the produced images contain and are configured with what
should be in there:
[example](https://sourcegraph.sourcegraph.com/github.com/sourcegraph/sourcegraph@bzl/oci_wolfi/-/blob/enterprise/cmd/gitserver/image_test.yaml).
---------
Co-authored-by: Dave Try <davetry@gmail.com>
Co-authored-by: Will Dollman <will.dollman@sourcegraph.com>
This might be useful for some customers, but it will definitely be
useful for us when writing an E2E pipeline for embeddings.
cc @sourcegraph/dev-experience for a quick glance at the
debugserver/prometheus part of this.
## Test plan
I will build the image locally and check that it works correctly with
the env var set.
* add initial dashboard for otel
* add failed sent dashboard
* extra panels
* use sum and rate for resource queries
* review comments
* add warning alerts
* Update monitoring/definitions/otel_collector.go
* review comments
* run go generate
* Update monitoring/definitions/otel_collector.go
Co-authored-by: Robert Lin <robert@bobheadxi.dev>
* review comments
* review feedback also drop two panels
* remove brackets in metrics
* update docs
* fix goimport
* gogenerate
Co-authored-by: Robert Lin <robert@bobheadxi.dev>
Co-authored-by: Jean-Hadrien Chabran <jh@chabran.fr>
This PR moves all the executor queue code into the frontend service. The service no longer needs to run as a singleton, and we save one proxy layer between the executor and the queue.
* monitoring: network dashboard narrow by client and by gitserver
* rename var
* fix test
* more dashboards
* prettier
* zoekt indexserver dashboard
* gitserver dashboard
* turn on zoekt-indexer target in dev
* job label for prom targets, new grafana image
Move the long-running and CPU-bound LSIF conversion step into a separate process that consumes a work queue kept in Redis.
This will allow us to scale server and worker replicas independently without worrying about resource over-commit (workers will need more RAM/CPU), and will _eventually_ allow us to scale workers without worrying about write contention in shared SQLite databases. That last step will require that only one worker attaches to a particular queue to handle such work. A rough sketch of such a queue consumer follows.
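As a hedged illustration of the queue mechanics (not the actual implementation), a worker can block on a Redis list and convert one upload at a time; the `lsif:conversion-queue` key and the `convertLSIF` helper are hypothetical, and go-redis is used here only as a stand-in client:
```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/redis/go-redis/v9"
)

// convertLSIF is a hypothetical stand-in for the long-running, CPU-bound
// conversion step.
func convertLSIF(ctx context.Context, uploadID string) error {
	// ... read the raw upload, convert it, write the result ...
	return nil
}

func main() {
	ctx := context.Background()
	client := redis.NewClient(&redis.Options{Addr: "127.0.0.1:6379"})

	for {
		// Block until a job is pushed onto the (hypothetical) queue key.
		res, err := client.BLPop(ctx, 0, "lsif:conversion-queue").Result()
		if err != nil {
			log.Printf("dequeue failed: %v", err)
			time.Sleep(time.Second) // avoid a hot loop on persistent errors
			continue
		}
		uploadID := res[1] // res[0] is the key name, res[1] the payload
		if err := convertLSIF(ctx, uploadID); err != nil {
			log.Printf("conversion of %s failed: %v", uploadID, err)
		}
	}
}
```
Having exactly one such consumer attached to a given queue is what avoids write contention on the shared SQLite databases mentioned above.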
* metrics: custom prometheus/grafana docker images
* transfer work to maxx
* Dockerfile config refinements
* dev launch use new prom image
* cleanup prom after goreman ctrl-c
* code review stephen
* add new grafana image
* single container use new sg prom/graf images
* npm run prettier
* docker image READMEs
* grafana tweaks (datasources provisioning)
* forgot to commit this
* dockerfile lints and code review
* go.mod
* revert back to initial versioning approach
* code review stephen
* dev env: launch prometheus if desired
* remove network
* pair up prometheus with grafana
* declare new files in code owners
* prettier
* sh comment cleanup