mirror of https://github.com/sourcegraph/sourcegraph.git (synced 2026-02-06 17:51:57 +00:00)
doc: improve cadvisor metrics issues docs and workaround (#23817)
parent 9e38d89d1b
commit 9e7a4a7947
@@ -800,6 +800,35 @@ spec:
value: bob
```

## Filtering cAdvisor metrics

Because cAdvisor runs on each node and exports metrics for every container on it, Sourcegraph's cAdvisor deployment can pick up metrics for unrelated services running on the same nodes as Sourcegraph services.
[Learn more](../../../dev/background-information/observability/cadvisor.md#identifying-containers).

To work around this, update your `prometheus.ConfigMap.yaml` to target your [namespaced Sourcegraph deployment](#namespaced-overlay) by uncommenting the `metric_relabel_configs` entry below and updating it with the appropriate namespace.
Prometheus will then drop all metrics *from cAdvisor* that do not come from services in that namespace.

```yaml
apiVersion: v1
data:
  prometheus.yml: |
    # ...

      metric_relabel_configs:
        # cAdvisor-specific customization. Drop container metrics exported by cAdvisor
        # not in the same namespace as Sourcegraph.
        # Uncomment this if you have problems with certain dashboards or cAdvisor itself
        # picking up non-Sourcegraph services. Ensure all Sourcegraph services are running
        # within the Sourcegraph namespace you have defined.
        # The regex must keep matches on '^$' (empty string) to ensure other metrics do not
        # get dropped.
        - source_labels: [container_label_io_kubernetes_pod_namespace]
          regex: ^$|ns-sourcegraph # ensure this matches with namespace declarations
          action: keep

    # ...
```
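
To make the effect of the `keep` rule concrete, here is a minimal Python sketch of how such a rule decides which series survive. It relies on two documented Prometheus relabeling behaviors: the regex is anchored at both ends, and a missing label is treated as an empty string. The helper name `keeps_series`, the metric names, and the sample label sets are hypothetical and exist only for illustration.

```python
import re
from typing import Dict

# Same pattern as the `regex` field in the config above. Prometheus anchors
# relabel regexes at both ends, which re.fullmatch emulates here.
KEEP_REGEX = re.compile(r"^$|ns-sourcegraph")
NAMESPACE_LABEL = "container_label_io_kubernetes_pod_namespace"


def keeps_series(labels: Dict[str, str]) -> bool:
    """Return True if a `keep` rule on NAMESPACE_LABEL would retain the series."""
    # A label that is absent is treated as the empty string.
    value = labels.get(NAMESPACE_LABEL, "")
    return KEEP_REGEX.fullmatch(value) is not None


# Hypothetical label sets, for illustration only.
samples = [
    {"__name__": "container_metric_in_sourcegraph_ns", NAMESPACE_LABEL: "ns-sourcegraph"},
    {"__name__": "container_metric_in_other_ns", NAMESPACE_LABEL: "kube-system"},
    {"__name__": "some_non_cadvisor_metric"},  # no namespace label at all
]

# Names of the series that would be kept by the rule.
kept = [s["__name__"] for s in samples if keeps_series(s)]
```

In this sketch, removing the `^$` alternative would make the rule discard every series that lacks the namespace label, which is why the config comment insists on keeping it.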

## Troubleshooting

See the [Troubleshooting docs](troubleshoot.md).

@@ -65,11 +65,10 @@ This indicates the instance is getting rate-limited by Docker Hub([link](https:/
- [**OPTIONAL**] Upgrade your account to a Docker Pro or Team subscription ([See Docker Hub for more information](https://www.docker.com/increase-rate-limits))

### Prometheus Pod is constantly down when using the namespace overlays.

This is most likely due to cAdvisor picking up other metrics from the cluster.
You can confirm this theory by checking your [prometheus.ConfigMap.yaml](https://sourcegraph.com/github.com/sourcegraph/deploy-sourcegraph@3.27/-/blob/base/prometheus/prometheus.ConfigMap.yaml#L248-250) file: the `source_labels: [container_label_io_kubernetes_pod_namespace]` entry under `metric_relabel_configs` is commented out by default, and must be uncommented with its `regex` field updated to your namespace.

### Irrelevant cAdvisor metrics are causing strange alerts and performance issues.

This is most likely due to cAdvisor picking up other metrics from the cluster.
A workaround is available: [Filtering cAdvisor metrics](./configure.md#filtering-cadvisor-metrics).
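
One way to check whether cAdvisor really is picking up other workloads is to count its series per namespace label. The sketch below uses the Prometheus HTTP API (`/api/v1/query`). It assumes that cAdvisor metric names start with `container_` and that Prometheus is reachable at an address you supply (for example through a port-forward). The function names are hypothetical, and `http://localhost:9090` is only a placeholder.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

NAMESPACE_LABEL = "container_label_io_kubernetes_pod_namespace"

# PromQL: number of series per namespace label value, restricted to metrics
# whose name starts with "container_" (an assumption about cAdvisor naming).
QUERY = f'count by ({NAMESPACE_LABEL}) ({{__name__=~"container_.+"}})'


def build_query_url(base_url: str, query: str = QUERY) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": query})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def series_per_namespace(response: dict) -> Dict[str, int]:
    """Map each namespace label value to its series count in a query response."""
    counts: Dict[str, int] = {}
    for item in response["data"]["result"]:
        namespace = item["metric"].get(NAMESPACE_LABEL, "")
        # Instant-vector values arrive as [timestamp, "<number as string>"].
        counts[namespace] = int(float(item["value"][1]))
    return counts


def fetch_series_per_namespace(base_url: str) -> Dict[str, int]:
    """Run the query against a live Prometheus and return the counts."""
    with urllib.request.urlopen(build_query_url(base_url), timeout=10) as resp:
        return series_per_namespace(json.load(resp))


# Example call (placeholder address, not executed here):
# fetch_series_per_namespace("http://localhost:9090")
```

If the result lists namespaces other than your Sourcegraph namespace with large counts, the filtering workaround is likely to help.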

### I don't see any metrics on my Grafana Dashboard.

@@ -18,7 +18,8 @@ How relevant containers are identified from exported cAdvisor metrics is documen

Because cAdvisor runs on a *machine* and exports *container* metrics, standard strategies for identifying what container a metric belongs to (such as Prometheus scrape target labels) cannot be used, because all the metrics look like they belong to cAdvisor.
Complicating matters, how containers are identified varies across environments (namely Kubernetes and docker-compose), sometimes due to characteristics of the environments and sometimes due to naming inconsistencies within Sourcegraph.
Variations in how cAdvisor generates the `name` label it provides also make things difficult (in some environments, it cannot generate one at all!), so we might have to create a custom naming strategy.
Variations in how cAdvisor generates the `name` label it provides also make things difficult (in some environments, it cannot generate one at all!).
This means that cAdvisor can pick up non-Sourcegraph metrics, which can be problematic; see [known issues](#known-issues) for more details and current workarounds.
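
The identification problem described above can be illustrated with a small Python sketch. All label values here are made up for illustration. The point is only that scrape-target labels such as `job` and `instance` are identical for every container on a node, while the `name` label may be missing.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical samples scraped from one cAdvisor instance; values are invented.
samples: List[Dict[str, str]] = [
    {"job": "cadvisor", "instance": "node-1:8080", "name": "sourcegraph-frontend-0"},
    {"job": "cadvisor", "instance": "node-1:8080", "name": "unrelated-service"},
    {"job": "cadvisor", "instance": "node-1:8080"},  # no `name` label generated
]


def group_by(label: str, items: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group samples by the value of one label, using '' when it is absent."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for item in items:
        groups[item.get(label, "")].append(item)
    return dict(groups)


# Scrape-target labels cannot tell containers apart: everything is "cadvisor".
by_job = group_by("job", samples)

# The `name` label separates containers, but unnamed ones collapse into ''.
by_name = group_by("name", samples)
```

Grouping by `job` yields a single bucket, and grouping by `name` leaves a bucket of unidentified containers. In this toy model, that is how unrelated workloads end up mixed into the exported metrics.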

## Available metrics

@@ -28,7 +29,9 @@ In the list, the column `-disable_metrics parameter` indicates the "group" the m
Container runtime and deployment environment compatibility for various metrics seems to follow these groups. Before using a metric, ensure that it is supported in all relevant environments (for example, both Docker and `containerd` container runtimes).
Support is generally poorly documented, but a search through the [cAdvisor repository issues](https://github.com/google/cadvisor/issues) might provide some hints.

### Known issues
## Known issues

- `disk` metrics are not available in `containerd`: [cadvisor#2785](https://github.com/google/cadvisor/issues/2785)
- `diskIO` metrics do not seem to be available in Kubernetes: [sourcegraph#12163](https://github.com/sourcegraph/sourcegraph/issues/12163)
- cAdvisor can pick up non-Sourcegraph metrics due to how we currently [identify containers](#identifying-containers). This can cause issues with [our built-in observability](../../../admin/observability/index.md) and, in extreme cases, cAdvisor and Prometheus performance issues if the number of metrics is very large: [sourcegraph#17365](https://github.com/sourcegraph/sourcegraph/issues/17365) ([Kubernetes workaround](../../../admin/install/kubernetes/configure.md#filtering-cadvisor-metrics))
- Metrics issues
  - `disk` metrics are not available in `containerd`: [cadvisor#2785](https://github.com/google/cadvisor/issues/2785)
  - `diskIO` metrics do not seem to be available in Kubernetes: [sourcegraph#12163](https://github.com/sourcegraph/sourcegraph/issues/12163)