# Sourcegraph cAdvisor
We ship a custom [cAdvisor](https://github.com/google/cadvisor) image as part of the standard Sourcegraph Kubernetes and docker-compose distribution.
cAdvisor exports container monitoring metrics scraped by [Prometheus](./prometheus.md) and visualized in [Grafana](./grafana.md).
The image is defined in [`docker-images/cadvisor`](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/docker-images/cadvisor).
## Monitoring
Monitoring on cAdvisor metrics is defined in the [monitoring generator](./monitoring-generator.md).
cAdvisor observables are generally defined as [shared observables](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/monitoring/definitions/shared).
When adding monitoring on cAdvisor metrics, please ensure that the [metric can be identified](#identifying-containers) (if not, it is likely the [metric is not supported](#available-metrics)).
## Identifying containers
How relevant containers are identified from exported cAdvisor metrics is documented in [`CadvisorNameMatcher`](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+type:symbol+CadvisorNameMatcher&patternType=literal), which generates the label matcher for [monitoring observables](#monitoring).
Because cAdvisor runs on a *machine* and exports *container* metrics, standard strategies for identifying which container a metric belongs to (such as Prometheus scrape target labels) cannot be used: every metric appears to belong to cAdvisor itself.
Further complicating matters, the way containers are identified varies across environments (namely Kubernetes and docker-compose), sometimes due to characteristics of the environments themselves and sometimes due to naming inconsistencies within Sourcegraph.
Variations in how cAdvisor generates its `name` label also make things difficult (in some environments, it cannot generate one at all!).
This means that cAdvisor can pick up non-Sourcegraph metrics, which can be problematic—see [known issues](#known-issues) for more details and current workarounds.
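
To make the matching problem above concrete, here is a minimal Go sketch of a regex matcher on cAdvisor's `name` label. It is not Sourcegraph's `CadvisorNameMatcher`: the helper names (`containerNamePattern`, `labelMatcher`, `matches`) are hypothetical, and so are the two naming schemes it accepts. It assumes docker-compose containers are named after the service plus a suffix such as `gitserver-0`, and that Kubernetes pods on the Docker runtime produce container names of the form `k8s_<container>_<pod>_<namespace>_<uid>_<attempt>`. `matches` anchors the regex at both ends because Prometheus fully anchors regex label matchers.

```go
package main

import (
	"fmt"
	"regexp"
)

// containerNamePattern is a hypothetical helper, not Sourcegraph's
// CadvisorNameMatcher. It returns a regex for cAdvisor's `name` label that
// accepts two assumed naming schemes:
//   - docker-compose: "<service>" optionally followed by a suffix, e.g. "gitserver-0"
//   - Kubernetes (Docker runtime): "k8s_<container>_<pod>_<namespace>_<uid>_<attempt>"
// It assumes service names contain no characters that need regex escaping.
func containerNamePattern(service string) string {
	return fmt.Sprintf(`(k8s_)?%s([-_].*)?`, service)
}

// labelMatcher renders the pattern as a PromQL label matcher. Prometheus
// anchors regex matchers at both ends, so no ^ or $ is added here.
func labelMatcher(service string) string {
	return fmt.Sprintf(`name=~"%s"`, containerNamePattern(service))
}

// matches applies the pattern with the same full-string anchoring Prometheus uses.
func matches(service, name string) bool {
	re := regexp.MustCompile("^(?:" + containerNamePattern(service) + ")$")
	return re.MatchString(name)
}

func main() {
	fmt.Println(labelMatcher("gitserver"))
	for _, name := range []string{
		"gitserver-0", // docker-compose style
		"k8s_gitserver_gitserver-0_default_abc123_0", // Kubernetes style
		"k8s_POD_gitserver-0_default_abc123_0",       // assumed pause container, not matched
		"sourcegraph-frontend-0",                     // a different service, not matched
		"",                                           // no name label at all, not matched
	} {
		fmt.Printf("%q -> %v\n", name, matches("gitserver", name))
	}
}
```

A pattern this permissive also accepts an unrelated container that happens to be named, say, `gitserver-legacy`, which is the kind of false positive described above. Conversely, an empty `name` label never matches, so metrics from environments where cAdvisor cannot generate a name are silently excluded by a matcher like this.
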
## Available metrics
Exported metrics are documented in the [cAdvisor Prometheus metrics list](https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md#prometheus-container-metrics).
In the list, the column `-disable_metrics parameter` indicates the "group" the metric belongs to.
Compatibility with container runtimes and deployment environments appears to follow these groups, so before using a metric, ensure that it is supported in all relevant environments (for example, both the Docker and `containerd` container runtimes).
Support is generally poorly documented, but a search through the [cAdvisor repository issues](https://github.com/google/cadvisor/issues) might provide some hints.
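
One way to apply this advice is to record known gaps per metric group and check them against every environment you need to support before adding an observable. The Go sketch below is hypothetical: the `knownUnsupported` table only mirrors the metrics issues listed under [Known issues](#known-issues), and the environment keys (`docker`, `containerd`, `kubernetes`) are illustrative labels, not cAdvisor configuration values. An empty result means no gap is recorded, not that support has been verified.

```go
package main

import (
	"fmt"
	"sort"
)

// knownUnsupported maps a cAdvisor metric group (the value accepted by
// -disable_metrics) to environments where that group is known not to work.
// Entries mirror the known issues on this page; environment keys are
// hypothetical labels chosen for this sketch.
var knownUnsupported = map[string][]string{
	"disk":   {"containerd"},
	"diskIO": {"kubernetes"},
}

// unsupportedIn returns the target environments in which a metric group is
// known to be unavailable, sorted for stable output. An empty result only
// means no problem is recorded, not that support is confirmed.
func unsupportedIn(group string, targets []string) []string {
	var missing []string
	for _, env := range knownUnsupported[group] {
		for _, t := range targets {
			if env == t {
				missing = append(missing, env)
			}
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	targets := []string{"docker", "containerd", "kubernetes"}
	for _, group := range []string{"cpu", "disk", "diskIO"} {
		fmt.Printf("%s: unsupported in %v\n", group, unsupportedIn(group, targets))
	}
}
```
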
## Known issues
- cAdvisor can pick up non-Sourcegraph metrics due to how we currently [identify containers](#identifying-containers). This can interfere with [our built-in observability](../../../admin/observability/index.md) and, in extreme cases where the number of metrics is very large, cause performance issues in cAdvisor and Prometheus: [sourcegraph#17365](https://github.com/sourcegraph/sourcegraph/issues/17365) ([Kubernetes workaround](../../../admin/deploy/kubernetes/configure.md#filtering-cadvisor-metrics))
- Metrics issues
  - `disk` metrics are not available in `containerd`: [cadvisor#2785](https://github.com/google/cadvisor/issues/2785)
  - `diskIO` metrics do not seem to be available in Kubernetes: [sourcegraph#12163](https://github.com/sourcegraph/sourcegraph/issues/12163)
- When a deployment uses the Kustomize non-privileged overlay, cAdvisor is disabled by default, so container metrics are not collected for visualization in Grafana. cAdvisor requires elevated privileges to collect this data and therefore cannot work with this overlay.
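
For the first known issue, one possible mitigation is to drop series that do not come from Sourcegraph's own containers, for example by keeping only series whose Kubernetes namespace is the one Sourcegraph runs in. The Go sketch below only illustrates the semantics of such a Prometheus `keep` relabeling rule; it is not the configuration from the linked Kubernetes workaround. The `Series` type, the `keepMatching` helper, the `sourcegraph` namespace, and the `container_label_io_kubernetes_pod_namespace` label name are all assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// Series is a minimal stand-in for the label set of one scraped time series.
type Series map[string]string

// keepMatching mimics a Prometheus relabeling rule with action "keep" on a
// single source label: a series survives only if that label's value fully
// matches the regex. A missing label is treated as the empty string, as
// Prometheus does for absent source labels.
func keepMatching(series []Series, label, pattern string) []Series {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	var kept []Series
	for _, s := range series {
		if re.MatchString(s[label]) {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	// Assumed label name: cAdvisor is assumed to expose the Kubernetes pod
	// namespace container label with a "container_label_" prefix.
	const nsLabel = "container_label_io_kubernetes_pod_namespace"
	series := []Series{
		{"name": "k8s_gitserver_gitserver-0_sourcegraph_abc_0", nsLabel: "sourcegraph"},
		{"name": "k8s_gitserver_gitserver-0_other-team_def_0", nsLabel: "other-team"},
		{"name": "some-host-process"}, // no namespace label, so it is dropped
	}
	for _, s := range keepMatching(series, nsLabel, "sourcegraph") {
		fmt.Println(s["name"])
	}
}
```

The second series illustrates why filtering on the `name` label alone is not enough: another team's `gitserver` container would pass a name matcher but is removed by the namespace filter.
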