mirror of
https://github.com/sourcegraph/sourcegraph.git
synced 2026-02-06 18:51:59 +00:00
docs: document new telemetry and telemetry gateway (#56868)
This PR adds development guidance for the new telemetry system and Telemetry Gateway, and a little bit of context.
This commit is contained in:
parent
8f7d6c2b4e
commit
ec99949a4e
5
cmd/telemetry-gateway/README.md
Normal file
@ -0,0 +1,5 @@
|
||||
# telemetry-gateway
|
||||
|
||||
Telemetry Gateway is a managed service that ingests events exported from Sourcegraph instances, manipulates them as needed, and exports them to designated Pub/Sub topics or other destinations for processing.
|
||||
|
||||
Refer to [the Telemetry Gateway development documentation](https://docs.sourcegraph.com/dev/how-to/telemetry_gateway) for more development guidance.
|
||||
@ -1,4 +1,6 @@
|
||||
# Adding, changing, and debugging user event data
|
||||
# DEPRECATED: Adding, changing, and debugging user event data
|
||||
|
||||
> WARNING: **This process is deprecated.** To export Telemetry events from Sourcegraph instances, refer to the new [telemetry reference](./telemetry/index.md).
|
||||
|
||||
This document outlines the process for adding or changing the raw user event data collected from Sourcegraph instances. This is limited to certain managed instances (cloud) where the customer has signed a corresponding data collection agreement.
|
||||
|
||||
|
||||
@ -1,4 +1,6 @@
|
||||
# Event level data usage pipeline
|
||||
# DEPRECATED: Event level data usage pipeline
|
||||
|
||||
> WARNING: **This process is deprecated.** To export Telemetry events from Sourcegraph instances, refer to the new [telemetry reference](./telemetry/index.md).
|
||||
|
||||
This document outlines information about the ability to export raw user event data from Sourcegraph. This is limited
|
||||
to certain managed instances (cloud) where the customer has signed a corresponding data collection agreement.
|
||||
|
||||
33
doc/dev/background-information/telemetry/architecture.md
Normal file
@ -0,0 +1,33 @@
|
||||
# Telemetry export architecture
|
||||
|
||||
> WARNING: This is a guide intended for development reference.
|
||||
>
|
||||
> Additionally, export capabilities are **not yet enabled by default**.
|
||||
|
||||
This page outlines the architecture and components involved in Sourcegraph's new telemetry export system.
|
||||
|
||||
## Storing events
|
||||
|
||||
Once [recorded](./index.md#recording-events), telemetry events are stored in two places:
|
||||
|
||||
1. The structured `event_logs` table, for use in [admin analytics](../../../admin/analytics.md).
|
||||
2. The unstructured `telemetry_events_export_queue` table, which stores raw event payloads in Protobuf wire format for export.
|
||||
|
||||
## Exporting events
|
||||
|
||||
The [`telemetrygatewayexporter`](https://github.com/sourcegraph/sourcegraph/blob/main/enterprise/cmd/worker/internal/telemetrygatewayexporter/telemetrygatewayexporter.go) running in the worker service spawns a set of background jobs that handle:
|
||||
|
||||
1. Reporting metrics on the `telemetry_events_export_queue`
|
||||
2. Cleaning up already-exported entries in the `telemetry_events_export_queue`
|
||||
3. Exporting batches of not-yet-exported entries in the `telemetry_events_export_queue` to the Telemetry Gateway service
|
||||
|
||||
When exporting events, we explicitly only mark an event as successfully exported when the Telemetry Gateway returns a response with a particular event's generated ID. This ensures we always export events at least once.
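The at-least-once behaviour described above can be illustrated with a minimal sketch. This is not the actual exporter implementation: `Event`, `Queue`, `SendFunc`, and `ExportBatch` are hypothetical names, and an in-memory slice stands in for the `telemetry_events_export_queue` table. The point it shows is that only events whose IDs come back from the gateway are marked as exported, so anything unacknowledged is retried in a later batch.

```go
package main

// Event is a hypothetical stand-in for a queued telemetry event.
type Event struct {
	ID      string
	Payload []byte
}

// Queue is a hypothetical in-memory stand-in for the
// telemetry_events_export_queue table.
type Queue struct {
	events   []Event
	exported map[string]bool
}

// NewQueue creates a queue holding the given not-yet-exported events.
func NewQueue(events ...Event) *Queue {
	return &Queue{events: events, exported: map[string]bool{}}
}

// Pending returns up to limit events that are not marked as exported.
func (q *Queue) Pending(limit int) []Event {
	var batch []Event
	for _, e := range q.events {
		if len(batch) >= limit {
			break
		}
		if !q.exported[e.ID] {
			batch = append(batch, e)
		}
	}
	return batch
}

// MarkExported marks the events with the given IDs as exported.
func (q *Queue) MarkExported(ids []string) {
	for _, id := range ids {
		q.exported[id] = true
	}
}

// SendFunc is a hypothetical stand-in for the call to Telemetry Gateway.
// It returns the IDs of the events the gateway acknowledged.
type SendFunc func(batch []Event) (succeeded []string, err error)

// ExportBatch sends one batch and marks only acknowledged events as
// exported. Unacknowledged events stay pending and are sent again later,
// which is what makes delivery at-least-once.
func ExportBatch(q *Queue, send SendFunc, batchSize int) (int, error) {
	batch := q.Pending(batchSize)
	if len(batch) == 0 {
		return 0, nil
	}
	succeeded, err := send(batch)
	// Record partial progress even if the call reported an error.
	q.MarkExported(succeeded)
	return len(succeeded), err
}
```

Because an event can be sent more than once under this scheme (for example, if the acknowledgement is lost), downstream consumers would need to tolerate duplicates, for instance by deduplicating on the event ID.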
|
||||
|
||||
Note that before export, [sensitive attributes are stripped](./index.md#sensitive-attributes).
|
||||
|
||||
## Telemetry Gateway
|
||||
|
||||
The Telemetry Gateway is a managed Sourcegraph service that ingests event exports from all Sourcegraph instances, and handles manipulating the events and publishing raw payloads to a Pub/Sub topic.
|
||||
It exposes a gRPC API defined in [`telemetrygateway/v1`](https://github.com/sourcegraph/sourcegraph/tree/main/internal/telemetrygateway/v1).
|
||||
|
||||
Also see [How to set up Telemetry Gateway locally](../../how-to/telemetry_gateway.md).
|
||||
@ -1,16 +1,19 @@
|
||||
# Telemetry
|
||||
# DEPRECATED: Telemetry
|
||||
|
||||
1. [Action telemetry](#action-telemetry)
|
||||
2. [Browser extension telemetry](#browser-extension-telemetry)
|
||||
1. [Action Telemetry (a.k.a User event logs)](#action-telemetry-aka-user-event-logs)
|
||||
1. [UTM markers](#utm-markers)
|
||||
1. [Error logging](#error-logging)
|
||||
> WARNING: In Sourcegraph 5.2 and later, there is a new framework for writing telemetry events - see [new telemetry documentation](./index.md) for more details.
|
||||
> Existing telemetry mechanisms continue to coexist with the new telemetry framework, so this page is retained for reference.
|
||||
>
|
||||
> Some parts of this page may remain valid - they should be migrated to one of the new pages.
|
||||
|
||||
- [Browser extension telemetry](#browser-extension-telemetry)
|
||||
- [Action telemetry (a.k.a User event logs)](#action-telemetry-aka-user-event-logs)
|
||||
- [UTM markers](#utm-markers)
|
||||
- [Error logging](#error-logging)
|
||||
|
||||
> NOTE: This document is a work-in-progress.
|
||||
|
||||
Telemetry describes the logging of user events, such as a page view or search. Telemetry data is collected by each Sourcegraph instance and is not sent to Sourcegraph.com (except in aggregate form as documented in "[Pings](../../admin/pings.md)"). Some select managed instances enable
|
||||
event level (non-aggregated) [telemetry](./data-usage-pipeline.md).
|
||||
Telemetry describes the logging of user events, such as a page view or search. Telemetry data is collected by each Sourcegraph instance and is not sent to Sourcegraph.com (except in aggregate form as documented in "[Pings](../../../admin/pings.md)"). Some select managed instances enable
|
||||
event level (non-aggregated) [telemetry](../data-usage-pipeline.md).
|
||||
|
||||
## Browser extension telemetry
|
||||
|
||||
@ -18,7 +21,7 @@ event level (non-aggregated) [telemetry](./data-usage-pipeline.md).
|
||||
|
||||
#### Action telemetry (a.k.a User event logs)
|
||||
|
||||
Browser extension telemetry data is sent only to the connected Sourcegraph instance URL (except in aggregate form as documented in "[Pings](../../admin/pings.md)").
|
||||
Browser extension telemetry data is sent only to the connected Sourcegraph instance URL (except in aggregate form as documented in "[Pings](../../../admin/pings.md)").
|
||||
|
||||
- In Chrome and Safari telemetry is always enabled
|
||||
- In Firefox telemetry is enabled if
|
||||
@ -58,7 +61,6 @@ Currently following UTM markers are generated by browser extension:
|
||||
- `?utm_source={platform-name}&utm_campaign=hover`: All external links from hover overlay
|
||||
- `?utm_source={platform-name}&utm_campaign=browser-extension-uninstall&utm_content={extension-version}`: When redirecting to about.sourcegraph.com/uninstall on browser extension uninstall
|
||||
|
||||
|
||||
#### Error logging
|
||||
|
||||
We use **Sentry** to automatically log any errors in background/content scripts if `Allow Error Reporting` is enabled.
|
||||
129
doc/dev/background-information/telemetry/index.md
Normal file
@ -0,0 +1,129 @@
|
||||
# Telemetry
|
||||
|
||||
> WARNING: This is a guide intended for development reference.
|
||||
>
|
||||
> Additionally, export capabilities are **not yet enabled by default**.
|
||||
|
||||
Telemetry describes the logging of user events, such as a page view or search, from various components of the Sourcegraph and Cody applications.
|
||||
There are currently two ways to log product telemetry:
|
||||
|
||||
- legacy mechanisms outlined in [DEPRECATED: Telemetry](deprecated.md), including writing directly to the `event_logs` database table or using `mutation { logEvent }`.
|
||||
- the new telemetry framework introduced in Sourcegraph 5.2 and later (documented on this page)
|
||||
|
||||
All usages of old telemetry mechanisms should be migrated to the new framework.
|
||||
|
||||
- [Why a new framework and APIs?](#why-a-new-framework-and-apis)
|
||||
- [Event lifecycle](#event-lifecycle)
|
||||
- [Recording events](#recording-events)
|
||||
- [Backend services](#backend-services)
|
||||
- [Clients](#clients)
|
||||
- [Exporting events](#exporting-events)
|
||||
- [Sensitive attributes](#sensitive-attributes)
|
||||
- [Exported event schema](#exported-event-schema)
|
||||
- [Enabling telemetry export](#enabling-telemetry-export)
|
||||
|
||||
## Why a new framework and APIs?
|
||||
|
||||
The new telemetry framework and APIs aim to address the following issues:
|
||||
|
||||
- The existing `event_logs` parameters are arbitrarily shaped. To provide stronger guarantees against accidentally exporting sensitive data, the new APIs enforce stricter requirements - see [recording events](#recording-events) for more details.
|
||||
- The shape of existing `event_logs` entries has grown organically over time without a clear structured schema.
|
||||
Callsites must construct full events on their own, and we cannot easily prune event objects of potentially [sensitive attributes](#sensitive-attributes) before export.
|
||||
|
||||
Events recorded in the new framework and APIs are still translated into the existing `event_logs` table for admin analytics on a best-effort basis - see [event lifecycle](#event-lifecycle) for more details.
|
||||
|
||||
## Event lifecycle
|
||||
|
||||
All events stay in the instance where they are recorded until they are exported - users of standalone Sourcegraph instances should no longer report any telemetry directly to the [Sourcegraph.com](https://sourcegraph.com/search) deployment, and should instead report events to their own Sourcegraph instance.
|
||||
|
||||
In general, the lifecycle of an event in the new system looks like this:
|
||||
|
||||
1. [A telemetry event is recorded](#recording-events). This can happen in clients using SDKs like [`@sourcegraph/telemetry`](https://github.com/sourcegraph/telemetry), or using [`internal/telemetry/telemetryrecorder`](https://github.com/sourcegraph/sourcegraph/blob/main/internal/telemetry/telemetryrecorder/telemetryrecorder.go) in the backend.
|
||||
2. Within each telemetry SDK, additional metadata is automatically injected - in clients through [processors](https://github.com/sourcegraph/telemetry/blob/main/src/processors/index.ts) and [the GraphQL mutation](https://github.com/sourcegraph/sourcegraph/blob/main/cmd/frontend/internal/telemetry/resolvers/telemetrygateway.go), and in the backend through [the events adapter](https://github.com/sourcegraph/sourcegraph/blob/main/internal/telemetry/telemetrygateway.go).
|
||||
3. The telemetry event is [translated into the existing `event_logs` table](https://github.com/sourcegraph/sourcegraph/blob/main/internal/telemetry/teestore/teestore.go) (for use in [admin analytics](../../../admin/analytics.md)), and stored in a temporary queue for export.
|
||||
4. Periodically, events are [read from the export queue](https://github.com/sourcegraph/sourcegraph/blob/main/internal/telemetry/export/export.go) and exported to Sourcegraph's Telemetry Gateway service, which forwards them to our data warehouse - see [exporting events](#exporting-events).
|
||||
|
||||
## Recording events
|
||||
|
||||
Note that recording APIs are intentionally stricter and have a smaller surface area than [the full events we end up exporting](#exported-event-schema).
|
||||
This is to help prevent accidental export of sensitive data, and to make it clear what properties should be injected in a uniform manner instead of being constructed ad-hoc by callers - see [event lifecycle](#event-lifecycle) for details.
|
||||
|
||||
### Backend services
|
||||
|
||||
In the backend, events are recorded using `EventRecorder` instances created from the `internal/telemetry/telemetryrecorder` package. For example:
|
||||
|
||||
```go
import (
	"github.com/sourcegraph/sourcegraph/internal/database"
	"github.com/sourcegraph/sourcegraph/internal/telemetry"
	"github.com/sourcegraph/sourcegraph/internal/telemetry/telemetryrecorder"
)

func doMyThing(db database.DB) error {
	recorder := telemetryrecorder.New(db)

	if err := recorder.Record("myFeature", "myAction", telemetry.EventParameters{
		Version:  0,
		Metadata: telemetry.EventMetadata{"my_metadata": 12},
		// See 'Sensitive attributes'
		PrivateMetadata: map[string]any{"my_private_metadata": 42},
	}); err != nil {
		return err
	}
	return nil
}
```
|
||||
|
||||
If you don't care about failures to record telemetry, you can use `telemetryrecorder.NewBestEffort(log.Logger, database.DB)` to automatically have errors logged and not returned.
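To illustrate the best-effort pattern, here is a minimal sketch of a wrapper that logs recording errors instead of returning them. It is not the `telemetryrecorder` implementation: `Recorder`, `RecorderFunc`, `WrapBestEffort`, and `errQueueUnavailable` are hypothetical names, and the recording interface is simplified to a feature and an action.

```go
package main

import "errors"

// errQueueUnavailable is a hypothetical error used for illustration.
var errQueueUnavailable = errors.New("export queue unavailable")

// Recorder is a hypothetical, simplified recording interface.
type Recorder interface {
	Record(feature, action string) error
}

// RecorderFunc adapts a plain function to the Recorder interface.
type RecorderFunc func(feature, action string) error

// Record calls the underlying function.
func (f RecorderFunc) Record(feature, action string) error {
	return f(feature, action)
}

// BestEffortRecorder wraps a Recorder and logs failures rather than
// returning them, so callers never need to handle telemetry errors.
type BestEffortRecorder struct {
	inner Recorder
	logf  func(format string, args ...any)
}

// WrapBestEffort builds a BestEffortRecorder from a log function and an
// underlying Recorder.
func WrapBestEffort(logf func(format string, args ...any), inner Recorder) *BestEffortRecorder {
	return &BestEffortRecorder{inner: inner, logf: logf}
}

// Record attempts to record the event; any error is logged and dropped.
func (r *BestEffortRecorder) Record(feature, action string) {
	if err := r.inner.Record(feature, action); err != nil {
		r.logf("failed to record telemetry event %s.%s: %v", feature, action, err)
	}
}
```

The trade-off is that a caller using the wrapper cannot react to a failed recording, which is reasonable when telemetry must never block or fail the surrounding operation.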
|
||||
|
||||
Note that not all attributes are exported - see [Sensitive attributes](#sensitive-attributes) for details.
|
||||
|
||||
### Clients
|
||||
|
||||
Clients should use [`@sourcegraph/telemetry`](https://github.com/sourcegraph/telemetry), providing client-specific metadata and implementation for exporting to a Sourcegraph instance's `mutation { telemetry { recordEvent(...) }}` GraphQL mutation.
|
||||
|
||||
> NOTE: More guidance coming soon!
|
||||
|
||||
## Exporting events
|
||||
|
||||
See [telemetry export architecture](./architecture.md) for more details on how exports work.
|
||||
|
||||
### Sensitive attributes
|
||||
|
||||
There are two core attributes in events that are considered potentially sensitive, and thus not exported from individual Sourcegraph instances:
|
||||
|
||||
- `parameters.privateMetadata`: this field allows the recording of arbitrarily shaped metadata, as opposed to the integer values supported in `parameters.metadata`. Due to the risk of sensitive data and PII exposure, we do not export this field by default.
  - Certain events may be allowlisted to have this field exported - this is defined in [`internal/telemetry/sensitivemetadataallowlist`](https://github.com/sourcegraph/sourcegraph/blob/main/internal/telemetry/sensitivemetadataallowlist/sensitiviemetadataallowlist.go). Adding events to this list requires review and approval from Legal.
|
||||
- `marketingTracking`: this field tracks a lot of properties around URLs visited and marketing tracking that may contain sensitive data. This is only exported from the [Sourcegraph.com](https://sourcegraph.com/search) instance.
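The stripping rules above can be sketched as a single redaction step applied before export. This is a simplified illustration rather than the real implementation: the `Event`, `Parameters`, and `Allowlist` types, the `feature.action` allowlist key, and the `isDotcom` flag are all assumptions made for the example.

```go
package main

// Parameters is a hypothetical, simplified view of event parameters.
type Parameters struct {
	Metadata        map[string]int64 // integer-only values, always exported
	PrivateMetadata map[string]any   // arbitrary shape, stripped by default
}

// Event is a hypothetical, simplified telemetry event.
type Event struct {
	Feature           string
	Action            string
	Parameters        Parameters
	MarketingTracking map[string]string
}

// Allowlist names the events whose private metadata may be exported.
// Keying by "feature.action" is an assumption made for this sketch.
type Allowlist map[string]bool

// Redact returns a copy of the event with sensitive attributes removed:
// privateMetadata is dropped unless the event is allowlisted, and
// marketingTracking is dropped unless the instance is Sourcegraph.com.
func Redact(e Event, allow Allowlist, isDotcom bool) Event {
	if !allow[e.Feature+"."+e.Action] {
		e.Parameters.PrivateMetadata = nil
	}
	if !isDotcom {
		e.MarketingTracking = nil
	}
	return e
}
```

Since `Redact` takes the event by value and only clears fields on its copy, the stored event keeps its private metadata for use within the instance while the exported copy does not carry it.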
|
||||
|
||||
### Exported event schema
|
||||
|
||||
The full event schema is intentionally a significant superset of the shape of the [event-recording APIs](#recording-events).
|
||||
Standardized metadata (users, feature flags, etc.) is automatically added at various points in an event's lifecycle - callsites should only be concerned with properties associated with the specific event.
|
||||
|
||||
The full event schema that ends up getting exported is defined in [`telemetrygateway.proto`](https://github.com/sourcegraph/sourcegraph/blob/main/internal/telemetrygateway/v1/telemetrygateway.proto)'s `Event` message type. The event forwarded from Telemetry Gateway currently has the following shape:
|
||||
|
||||
```json
{
  "metadata": {
    "identifier": {
      // ... telemetrygatewayv1.Identifier
    }
  },
  "event": {
    // ... telemetrygatewayv1.Event
  }
}
```
|
||||
|
||||
> NOTE: In the Sourcegraph application, the new events being exported using `internal/telemetry` are sometimes loosely referred to as "V2", as they supersede the existing mechanisms of writing directly to the `event_logs` database table.
|
||||
> The *Telemetry Gateway* schema, however, is `telemetrygateway/v1`, as it is the first iteration of the service's API.
|
||||
|
||||
## Enabling telemetry export
|
||||
|
||||
> NOTE: Telemetry export is currently experimental, and disabled by default.
|
||||
|
||||
Telemetry export can be enabled by making the following configuration changes:
|
||||
|
||||
- Set environment variable `TELEMETRY_GATEWAY_EXPORTER_EXPORT_ADDR="https://telemetry-gateway.sourcegraph.com:443"`
|
||||
- Enable feature flag `telemetry-export` on the entire instance, or on a subset of users that you want to export telemetry for
|
||||
|
||||
Our defaults for the above may change in the future.
|
||||
@ -1,6 +1,20 @@
|
||||
# How to set up Telemetry Gateway locally
|
||||
|
||||
By default, exports of Telemetry V2 events to a local Telemetry Gateway instance is enabled in `sg start` and `sg start dotcom`.
|
||||
> WARNING: This is a guide intended for development reference.
|
||||
|
||||
Telemetry Gateway is a managed service that ingests events exported from Sourcegraph instances, manipulates them as needed, and exports them to designated Pub/Sub topics or other destinations for processing.
|
||||
|
||||
It exposes a gRPC API defined in [`telemetrygateway/v1`](https://github.com/sourcegraph/sourcegraph/tree/main/internal/telemetrygateway/v1), and the service itself is implemented in [`cmd/telemetry-gateway`](https://github.com/sourcegraph/sourcegraph/tree/main/cmd/telemetry-gateway).
|
||||
|
||||
To learn more about Sourcegraph's new telemetry framework, refer to [the telemetry documentation](../background-information/telemetry/index.md).
|
||||
|
||||
> NOTE: In the Sourcegraph application, the [new events being exported using `internal/telemetry`](../background-information/telemetry/index.md) are sometimes loosely referred to as "V2", as they supersede the existing mechanisms of writing directly to the `event_logs` database table.
|
||||
> The *Telemetry Gateway* schema, however, is `telemetrygateway/v1`, as it is the first iteration of the service's API.
|
||||
|
||||
## Running Telemetry Gateway locally
|
||||
|
||||
Export of [telemetry events](../background-information/telemetry/index.md) to a local Telemetry Gateway instance is enabled as part of `sg start` and `sg start dotcom`.
|
||||
By default, the local Telemetry Gateway instance will simply log any events it receives.
|
||||
|
||||
You can increase the frequency of exports by setting the following in `sg.config.yaml`:
|
||||
|
||||
@ -11,3 +25,13 @@ env:
|
||||
```
|
||||
|
||||
In development, a gRPC UI for Telemetry Gateway is also available at `http://127.0.0.1:10085/debug/grpcui/`.
|
||||
|
||||
## Testing against a remote Telemetry Gateway
|
||||
|
||||
A test deployment is available at `telemetry-gateway.sgdev.org`, which publishes events to a test dataset.
|
||||
In local development, you can configure Sourcegraph to export to this test deployment by setting the following in `sg.config.yaml`:
|
||||
|
||||
```yaml
env:
  TELEMETRY_GATEWAY_EXPORTER_EXPORT_ADDR: "https://telemetry-gateway.sgdev.org:443"
```
|
||||
|
||||
@ -160,10 +160,10 @@ Clarification and discussion about key concepts, architecture, and development s
|
||||
|
||||
### Other
|
||||
|
||||
- [Telemetry](background-information/telemetry.md)
|
||||
- [Telemetry](background-information/telemetry/index.md)
|
||||
- [Adding, changing and debugging pings](background-information/adding_ping_data.md)
|
||||
- [Event level data usage pipeline](background-information/data-usage-pipeline.md)
|
||||
- [Adding, changing and debugging user event data](background-information/adding_event_level_data.md)
|
||||
- [DEPRECATED: Event level data usage pipeline](background-information/data-usage-pipeline.md)
|
||||
- [DEPRECATED: Adding, changing and debugging user event data](background-information/adding_event_level_data.md)
|
||||
- [Deploy Sourcegraph with Helm chart (BETA)](../../../admin/deploy/kubernetes/helm.md)
|
||||
- [GitHub API oddities](background-information/github-api-oddities.md)
|
||||
|
||||
|
||||