# Sourcegraph cAdvisor

We ship a custom [cAdvisor](https://github.com/google/cadvisor) image as part of the standard Sourcegraph Kubernetes and docker-compose distribution.
cAdvisor exports container monitoring metrics scraped by [Prometheus](./prometheus.md) and visualized in [Grafana](./grafana.md).

The image is defined in [`docker-images/cadvisor`](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/docker-images/cadvisor).

## Monitoring

Monitoring on cAdvisor metrics is defined in the [monitoring generator](./monitoring-generator.md).
cAdvisor observables are generally defined as [shared observables](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/monitoring/definitions/shared).

When adding monitoring on cAdvisor metrics, please ensure that the [metric can be identified](#identifying-containers) (if not, it is likely the [metric is not supported](#available-metrics)).

## Identifying containers

How relevant containers are identified from exported cAdvisor metrics is documented in [`CadvisorNameMatcher`](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+type:symbol+CadvisorNameMatcher&patternType=literal), which generates the label matcher for [monitoring observables](#monitoring).

Because cAdvisor runs on a *machine* and exports *container* metrics, standard strategies for identifying which container a metric belongs to (such as Prometheus scrape target labels) cannot be used: all the metrics look like they belong to cAdvisor.
To complicate matters, how containers are identified varies between environments (namely Kubernetes and docker-compose), sometimes due to characteristics of the environments and sometimes due to naming inconsistencies within Sourcegraph.
Variations in how cAdvisor generates the `name` label it provides also make things difficult (in some environments, it cannot generate one at all!).
This means that cAdvisor can pick up non-Sourcegraph metrics, which can be problematic. See [known issues](#known-issues) for more details and current workarounds.
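
To make the problem concrete, here is a minimal Python sketch of name-based matching. It is not the `CadvisorNameMatcher` implementation: the prefix pattern, the function names, and every label value are assumptions made up for illustration. It only shows why matching on the `name` label can also select containers that do not belong to Sourcegraph, and why series without a `name` label cannot be matched at all.

```python
import re


# Hypothetical helper: the real matcher may use a different pattern.
def name_matcher(container_name):
    """Compile a prefix match on the `name` label for one container."""
    return re.compile("^" + re.escape(container_name) + ".*")


def select_series(series, container_name):
    """Keep only series whose `name` label matches the container name."""
    pattern = name_matcher(container_name)
    # Series without a `name` label can never match.
    return [s for s in series if "name" in s and pattern.match(s["name"])]


# Made-up series: every one carries the same scrape target (cAdvisor itself),
# so only the `name` label can tell containers apart.
series = [
    {"name": "gitserver-0", "instance": "cadvisor:8080"},
    {"name": "gitserver-mirror-tool", "instance": "cadvisor:8080"},  # unrelated container
    {"name": "redis-cache", "instance": "cadvisor:8080"},
    {"instance": "cadvisor:8080"},  # environment where no `name` was generated
]

matched = [s["name"] for s in select_series(series, "gitserver")]
print(matched)  # ['gitserver-0', 'gitserver-mirror-tool']
```

In this sketch the unrelated `gitserver-mirror-tool` container is selected alongside the intended one, which is the kind of false positive described above.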

## Available metrics

Exported metrics are documented in the [cAdvisor Prometheus metrics list](https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md#prometheus-container-metrics).
In the list, the column `-disable_metrics parameter` indicates the "group" the metric belongs to.

Container runtime and deployment environment compatibility for various metrics seems to follow these groups. Before using a metric, ensure that it is supported in all relevant environments (for example, both the Docker and `containerd` container runtimes).
Support is generally poorly documented, but a search through the [cAdvisor repository issues](https://github.com/google/cadvisor/issues) might provide some hints.
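
One way to check support empirically is to compare the metric names that cAdvisor actually exports in each environment. The Python sketch below assumes you have saved the text output of cAdvisor's metrics endpoint from each environment; the function names, environment keys, and sample values are hypothetical, and the samples are trimmed to a few lines.

```python
import re


def metric_names(exposition_text):
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and `# HELP` / `# TYPE` comments.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" or whitespace.
        names.add(re.split(r"[{\s]", line, maxsplit=1)[0])
    return names


def supported_everywhere(samples_by_env):
    """Return the metric names present in every environment's sample."""
    sets = [metric_names(text) for text in samples_by_env.values()]
    return set.intersection(*sets) if sets else set()


# Made-up samples; real output contains many more series and labels.
samples = {
    "docker": """
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{name="gitserver-0"} 12.5
container_fs_inodes_total{name="gitserver-0"} 1024
""",
    "containerd": """
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{name="gitserver-0"} 9.1
""",
}

print(sorted(supported_everywhere(samples)))  # ['container_cpu_usage_seconds_total']
```

A metric that is missing from the result is exported in at most some of the sampled environments, so it is a poor candidate for a shared observable.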

## Known issues

- cAdvisor can pick up non-Sourcegraph metrics due to how we currently [identify containers](#identifying-containers). This can cause issues with [our built-in observability](../../../admin/observability/index.md) and, in extreme cases, cAdvisor and Prometheus performance issues if the number of metrics is very large: [sourcegraph#17365](https://github.com/sourcegraph/sourcegraph/issues/17365) ([Kubernetes workaround](../../../admin/deploy/kubernetes/configure.md#filtering-cadvisor-metrics))
- Metrics issues
  - `disk` metrics are not available in `containerd`: [cadvisor#2785](https://github.com/google/cadvisor/issues/2785)
  - `diskIO` metrics do not seem to be available in Kubernetes: [sourcegraph#12163](https://github.com/sourcegraph/sourcegraph/issues/12163)
- When using a Kustomize non-privileged overlay in a deployment, cAdvisor is disabled by default and cannot scrape container metrics for visualization in Grafana. cAdvisor requires elevated privileges to collect this data, so it will not work with this overlay.