otel: add collector dashboard (#45009)

* add initial dashboard for otel

* add failed sent dashboard

* extra panels

* use sum and rate for resource queries

* review comments

* add warning alerts

* Update monitoring/definitions/otel_collector.go

* review comments

* run go generate

* Update monitoring/definitions/otel_collector.go

Co-authored-by: Robert Lin <robert@bobheadxi.dev>

* Update monitoring/definitions/otel_collector.go

Co-authored-by: Robert Lin <robert@bobheadxi.dev>

* Update monitoring/definitions/otel_collector.go

Co-authored-by: Robert Lin <robert@bobheadxi.dev>

* Update monitoring/definitions/otel_collector.go

Co-authored-by: Robert Lin <robert@bobheadxi.dev>

* Update monitoring/definitions/otel_collector.go

Co-authored-by: Robert Lin <robert@bobheadxi.dev>

* Update monitoring/definitions/otel_collector.go

Co-authored-by: Robert Lin <robert@bobheadxi.dev>

* review comments

* review feedback also drop two panels

* remove brackets in metrics

* update docs

* fix goimport

* gogenerate

Co-authored-by: Robert Lin <robert@bobheadxi.dev>
Co-authored-by: Jean-Hadrien Chabran <jh@chabran.fr>
William Bezuidenhout 2022-12-19 14:18:51 +02:00 committed by GitHub
parent 61251ab989
commit 6c7389f37c
8 changed files with 616 additions and 0 deletions

View File

@@ -63,3 +63,8 @@
  targets:
    # github proxy
    - host.docker.internal:6090
- labels:
    job: otel-collector
  targets:
    # opentelemetry collector
    - host.docker.internal:8888

View File

@@ -63,3 +63,8 @@
  targets:
    # github proxy
    - 127.0.0.1:6090
- labels:
    job: otel-collector
  targets:
    # opentelemetry collector
    - host.docker.internal:8888

View File

@@ -7851,3 +7851,161 @@ Generated query for warning alert: `max((rate(src_telemetry_job_total{op="SendEv
<br />
## otel-collector: otel_span_refused
<p class="subtitle">spans refused per receiver</p>
**Descriptions**
- <span class="badge badge-warning">warning</span> otel-collector: 1+ spans refused per receiver for 5m0s
**Next steps**
- Check the logs of the collector and the configuration of the receiver
- More help interpreting this metric is available in the [dashboards reference](./dashboards.md#otel-collector-otel-span-refused).
- **Silence this alert:** If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
```json
"observability.silenceAlerts": [
"warning_otel-collector_otel_span_refused"
]
```
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Generated query for warning alert: `max((sum by(receiver) (rate(otelcol_receiver_refused_spans[1m]))) > 1)`
</details>
<br />
## otel-collector: otel_span_export_failures
<p class="subtitle">span export failures by exporter</p>
**Descriptions**
- <span class="badge badge-warning">warning</span> otel-collector: 1+ span export failures by exporter for 5m0s
**Next steps**
- Check the configuration of the exporter and that the export destination is up
- More help interpreting this metric is available in the [dashboards reference](./dashboards.md#otel-collector-otel-span-export-failures).
- **Silence this alert:** If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
```json
"observability.silenceAlerts": [
"warning_otel-collector_otel_span_export_failures"
]
```
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Generated query for warning alert: `max((sum by(exporter) (rate(otelcol_exporter_send_failed_spans[1m]))) > 1)`
</details>
<br />
## otel-collector: container_cpu_usage
<p class="subtitle">container cpu usage total (1m average) across all cores by instance</p>
**Descriptions**
- <span class="badge badge-warning">warning</span> otel-collector: 99%+ container cpu usage total (1m average) across all cores by instance
**Next steps**
- **Kubernetes:** Consider increasing CPU limits in the relevant `Deployment.yaml` (see the example below).
- **Docker Compose:** Consider increasing `cpus:` of the otel-collector container in `docker-compose.yml`.
- Learn more about the related dashboard panel in the [dashboards reference](./dashboards.md#otel-collector-container-cpu-usage).
- **Silence this alert:** If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
```json
"observability.silenceAlerts": [
"warning_otel-collector_container_cpu_usage"
]
```
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Generated query for warning alert: `max((cadvisor_container_cpu_usage_percentage_total{name=~"^otel-collector.*"}) >= 99)`
</details>
<br />
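For the Kubernetes remediation above (and for the memory alert that follows), the limits live in the container spec of the relevant `Deployment.yaml`. A minimal sketch; the values are illustrative, not sizing guidance:
```yaml
# Fragment of the otel-collector container spec in Deployment.yaml (illustrative values)
resources:
  requests:
    cpu: "1"
    memory: 1Gi
  limits:
    cpu: "2"
    memory: 2Gi
```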
## otel-collector: container_memory_usage
<p class="subtitle">container memory usage by instance</p>
**Descriptions**
- <span class="badge badge-warning">warning</span> otel-collector: 99%+ container memory usage by instance
**Next steps**
- **Kubernetes:** Consider increasing the memory limit in the relevant `Deployment.yaml` (see the example above).
- **Docker Compose:** Consider increasing `memory:` of the otel-collector container in `docker-compose.yml`.
- Learn more about the related dashboard panel in the [dashboards reference](./dashboards.md#otel-collector-container-memory-usage).
- **Silence this alert:** If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
```json
"observability.silenceAlerts": [
"warning_otel-collector_container_memory_usage"
]
```
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Generated query for warning alert: `max((cadvisor_container_memory_usage_percentage_total{name=~"^otel-collector.*"}) >= 99)`
</details>
<br />
## otel-collector: pods_available_percentage
<p class="subtitle">percentage pods available</p>
**Descriptions**
- <span class="badge badge-critical">critical</span> otel-collector: less than 90% percentage pods available for 10m0s
**Next steps**
- Determine if the pod was OOM killed using `kubectl describe pod otel-collector` (look for `OOMKilled: true`) and, if so, consider increasing the memory limit in the relevant `Deployment.yaml`.
- Check the logs before the container restarted to see if there are `panic:` messages or similar using `kubectl logs -p otel-collector`.
- Learn more about the related dashboard panel in the [dashboards reference](./dashboards.md#otel-collector-pods-available-percentage).
- **Silence this alert:** If you are aware of this alert and want to silence notifications for it, add the following to your site configuration and set a reminder to re-evaluate the alert:
```json
"observability.silenceAlerts": [
"critical_otel-collector_pods_available_percentage"
]
```
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Generated query for critical alert: `min((sum by(app) (up{app=~".*otel-collector"}) / count by(app) (up{app=~".*otel-collector"}) * 100) <= 90)`
</details>
<br />

View File

@@ -24485,3 +24485,301 @@ Query: `rate(src_telemetry_job_total{op="SendEvents"}[1h]) / on() group_right()
<br />
## OpenTelemetry Collector
<p class="subtitle">The OpenTelemetry collector ingests OpenTelemetry data from Sourcegraph and exports it to the configured backends.</p>
To see this dashboard, visit `/-/debug/grafana/d/otel-collector/otel-collector` on your Sourcegraph instance.
### OpenTelemetry Collector: Receivers
#### otel-collector: otel_span_receive_rate
<p class="subtitle">Spans received per receiver per minute</p>
Shows the rate of spans accepted by the configured receiver.
A trace is a collection of spans; a span represents a unit of work or operation. Spans are the building blocks of traces.
The spans have only been accepted by the receiver, which means they still have to move through the configured pipeline to be exported.
For more information on tracing and the configuration of an OpenTelemetry receiver, see https://opentelemetry.io/docs/collector/configuration/#receivers.
See the Exporters section for spans that have made it through the pipeline and been exported.
Depending on the configured processors, received spans might be dropped and not exported. For more information on configuring processors, see
https://opentelemetry.io/docs/collector/configuration/#processors.
This panel has no related alerts.
To see this panel, visit `/-/debug/grafana/d/otel-collector/otel-collector?viewPanel=100000` on your Sourcegraph instance.
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Query: `sum by (receiver) (rate(otelcol_receiver_accepted_spans[1m]))`
</details>
<br />
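A receiver is declared in the collector configuration and wired into a trace pipeline; spans accepted by such a receiver are what this panel counts. The sketch below is illustrative only (the endpoints and the `logging` exporter are assumptions, not Sourcegraph's shipped configuration):
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

exporters:
  logging:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging]
```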
#### otel-collector: otel_span_refused
<p class="subtitle">Spans refused per receiver</p>
Shows the rate of spans refused by a receiver.
A trace is a collection of spans; a span represents a unit of work or operation. Spans are the building blocks of traces.
Spans can be rejected either because of a misconfigured receiver or because spans arrive in the wrong format. The collector's logs will have more information on why a span was rejected.
For more information on tracing and the configuration of an OpenTelemetry receiver, see https://opentelemetry.io/docs/collector/configuration/#receivers.
Refer to the [alerts reference](./alerts.md#otel-collector-otel-span-refused) for 1 alert related to this panel.
To see this panel, visit `/-/debug/grafana/d/otel-collector/otel-collector?viewPanel=100001` on your Sourcegraph instance.
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Query: `sum by (receiver) (rate(otelcol_receiver_refused_spans[1m]))`
</details>
<br />
### OpenTelemetry Collector: Exporters
#### otel-collector: otel_span_export_rate
<p class="subtitle">Spans exported per exporter per minute</p>
Shows the rate of spans being sent by the exporter.
A trace is a collection of spans; a span represents a unit of work or operation. Spans are the building blocks of traces.
The rate of spans here indicates spans that have made it through the configured pipeline and have been sent to the configured export destination.
For more information on configuring an exporter for the OpenTelemetry collector, see https://opentelemetry.io/docs/collector/configuration/#exporters.
This panel has no related alerts.
To see this panel, visit `/-/debug/grafana/d/otel-collector/otel-collector?viewPanel=100100` on your Sourcegraph instance.
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Query: `sum by (exporter) (rate(otelcol_exporter_sent_spans[1m]))`
</details>
<br />
#### otel-collector: otel_span_export_failures
<p class="subtitle">Span export failures by exporter</p>
Shows the rate of spans that the configured exporter failed to send. A value above 0 for a sustained period can indicate a problem with the exporter configuration or with the service being exported to.
For more information on configuring an exporter for the OpenTelemetry collector, see https://opentelemetry.io/docs/collector/configuration/#exporters.
Refer to the [alerts reference](./alerts.md#otel-collector-otel-span-export-failures) for 1 alert related to this panel.
To see this panel, visit `/-/debug/grafana/d/otel-collector/otel-collector?viewPanel=100101` on your Sourcegraph instance.
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Query: `sum by (exporter) (rate(otelcol_exporter_send_failed_spans[1m]))`
</details>
<br />
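An exporter is likewise declared in the collector configuration and attached to a pipeline, so failures in this panel usually trace back to the exporter block or the destination it points at. A hedged sketch, assuming an OTLP-speaking backend (the endpoint and exporter choice are illustrative):
```yaml
exporters:
  otlp:
    endpoint: "tempo:4317" # illustrative destination
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp] # assumes the otlp receiver from the receiver sketch above
      exporters: [otlp]
```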
### OpenTelemetry Collector: Collector resource usage
#### otel-collector: otel_cpu_usage
<p class="subtitle">Cpu usage of the collector</p>
Shows the CPU usage of the OpenTelemetry collector.
This panel has no related alerts.
To see this panel, visit `/-/debug/grafana/d/otel-collector/otel-collector?viewPanel=100200` on your Sourcegraph instance.
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Query: `sum by (job) (rate(otelcol_process_cpu_seconds{job=~"^.*"}[1m]))`
</details>
<br />
#### otel-collector: otel_memory_resident_set_size
<p class="subtitle">Memory allocated to the otel collector</p>
Shows the memory Resident Set Size (RSS) allocated to the OpenTelemetry collector
This panel has no related alerts.
To see this panel, visit `/-/debug/grafana/d/otel-collector/otel-collector?viewPanel=100201` on your Sourcegraph instance.
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Query: `sum by (job) (rate(otelcol_process_memory_rss{job=~"^.*"}[1m]))`
</details>
<br />
#### otel-collector: otel_memory_usage
<p class="subtitle">Memory used by the collector</p>
Shows how much memory is being used by the otel collector.
High memory usage might indicate that:
* the configured pipeline is keeping a lot of spans in memory for processing
* spans are failing to be sent and the exporter is configured to retry
* a batch processor is configured to hold a large number of spans per batch
For more information on configuring processors for the OpenTelemetry collector, see https://opentelemetry.io/docs/collector/configuration/#processors.
This panel has no related alerts.
To see this panel, visit `/-/debug/grafana/d/otel-collector/otel-collector?viewPanel=100202` on your Sourcegraph instance.
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Query: `sum by (job) (rate(otelcol_process_runtime_total_alloc_bytes{job=~"^.*"}[1m]))`
</details>
<br />
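If memory keeps climbing, the processors are usually the place to look. A hedged sketch of the two processors most relevant to memory behavior, with illustrative values:
```yaml
processors:
  # Refuses or drops data once the collector approaches the configured ceiling.
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
  # Buffers spans before export; larger batches keep more spans in memory.
  batch:
    send_batch_size: 512
    timeout: 5s

service:
  pipelines:
    traces: # receivers and exporters omitted for brevity
      processors: [memory_limiter, batch]
```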
### OpenTelemetry Collector: Container monitoring (not available on server)
#### otel-collector: container_missing
<p class="subtitle">Container missing</p>
This value is the number of times a container has not been seen for more than one minute. If you observe this
value change independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.
- **Kubernetes:**
- Determine if the pod was OOM killed using `kubectl describe pod otel-collector` (look for `OOMKilled: true`) and, if so, consider increasing the memory limit in the relevant `Deployment.yaml`.
- Check the logs before the container restarted to see if there are `panic:` messages or similar using `kubectl logs -p otel-collector`.
- **Docker Compose:**
- Determine if the pod was OOM killed using `docker inspect -f '{{json .State}}' otel-collector` (look for `"OOMKilled":true`) and, if so, consider increasing the memory limit of the otel-collector container in `docker-compose.yml`.
- Check the logs before the container restarted to see if there are `panic:` messages or similar using `docker logs otel-collector` (note this will include logs from the previous and currently running container).
This panel has no related alerts.
To see this panel, visit `/-/debug/grafana/d/otel-collector/otel-collector?viewPanel=100300` on your Sourcegraph instance.
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Query: `count by(name) ((time() - container_last_seen{name=~"^otel-collector.*"}) > 60)`
</details>
<br />
#### otel-collector: container_cpu_usage
<p class="subtitle">Container cpu usage total (1m average) across all cores by instance</p>
Refer to the [alerts reference](./alerts.md#otel-collector-container-cpu-usage) for 1 alert related to this panel.
To see this panel, visit `/-/debug/grafana/d/otel-collector/otel-collector?viewPanel=100301` on your Sourcegraph instance.
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Query: `cadvisor_container_cpu_usage_percentage_total{name=~"^otel-collector.*"}`
</details>
<br />
#### otel-collector: container_memory_usage
<p class="subtitle">Container memory usage by instance</p>
Refer to the [alerts reference](./alerts.md#otel-collector-container-memory-usage) for 1 alert related to this panel.
To see this panel, visit `/-/debug/grafana/d/otel-collector/otel-collector?viewPanel=100302` on your Sourcegraph instance.
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Query: `cadvisor_container_memory_usage_percentage_total{name=~"^otel-collector.*"}`
</details>
<br />
#### otel-collector: fs_io_operations
<p class="subtitle">Filesystem reads and writes rate by instance over 1h</p>
This value indicates the number of filesystem read and write operations by containers of this service.
When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with otel-collector issues.
This panel has no related alerts.
To see this panel, visit `/-/debug/grafana/d/otel-collector/otel-collector?viewPanel=100303` on your Sourcegraph instance.
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Query: `sum by(name) (rate(container_fs_reads_total{name=~"^otel-collector.*"}[1h]) + rate(container_fs_writes_total{name=~"^otel-collector.*"}[1h]))`
</details>
<br />
### OpenTelemetry Collector: Kubernetes monitoring (only available on Kubernetes)
#### otel-collector: pods_available_percentage
<p class="subtitle">Percentage pods available</p>
Refer to the [alerts reference](./alerts.md#otel-collector-pods-available-percentage) for 1 alert related to this panel.
To see this panel, visit `/-/debug/grafana/d/otel-collector/otel-collector?viewPanel=100400` on your Sourcegraph instance.
<sub>*Managed by the [Sourcegraph Cloud DevOps team](https://handbook.sourcegraph.com/departments/engineering/teams/devops).*</sub>
<details>
<summary>Technical details</summary>
Query: `sum by(app) (up{app=~".*otel-collector"}) / count by (app) (up{app=~".*otel-collector"}) * 100`
</details>
<br />

View File

@@ -24,6 +24,9 @@ extensions:
    endpoint: ":55679"

service:
  telemetry:
    metrics:
      address: ":8888"
  extensions: [health_check,zpages]
  pipelines:
    traces:

View File

@@ -31,6 +31,7 @@ func Default() Dashboards {
CodeIntelRanking(),
CodeIntelUploads(),
Telemetry(),
OtelCollector(),
}
}

View File

@@ -0,0 +1,145 @@
package definitions
import (
"time"
"github.com/sourcegraph/sourcegraph/monitoring/definitions/shared"
"github.com/sourcegraph/sourcegraph/monitoring/monitoring"
)
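// OtelCollector defines the monitoring dashboard for the OpenTelemetry collector.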
func OtelCollector() *monitoring.Dashboard {
containerName := "otel-collector"
return &monitoring.Dashboard{
Name: containerName,
Title: "OpenTelemetry Collector",
Description: "The OpenTelemetry collector ingests OpenTelemetry data from Sourcegraph and exports it to the configured backends.",
Groups: []monitoring.Group{
{
Title: "Receivers",
Hidden: false,
Rows: []monitoring.Row{
{
{
Name: "otel_span_receive_rate",
Description: "spans received per receiver per minute",
Panel: monitoring.Panel().Unit(monitoring.Number).LegendFormat("receiver: {{receiver}}"),
Owner: monitoring.ObservableOwnerDevOps,
Query: "sum by (receiver) (rate(otelcol_receiver_accepted_spans[1m]))",
NoAlert: true,
Interpretation: `
Shows the rate of spans accepted by the configured receiver.
A trace is a collection of spans; a span represents a unit of work or operation. Spans are the building blocks of traces.
The spans have only been accepted by the receiver, which means they still have to move through the configured pipeline to be exported.
For more information on tracing and the configuration of an OpenTelemetry receiver, see https://opentelemetry.io/docs/collector/configuration/#receivers.
See the Exporters section for spans that have made it through the pipeline and been exported.
Depending on the configured processors, received spans might be dropped and not exported. For more information on configuring processors, see
https://opentelemetry.io/docs/collector/configuration/#processors.`,
},
{
Name: "otel_span_refused",
Description: "spans refused per receiver",
Panel: monitoring.Panel().Unit(monitoring.Number).LegendFormat("receiver: {{receiver}}"),
Owner: monitoring.ObservableOwnerDevOps,
Query: "sum by (receiver) (rate(otelcol_receiver_refused_spans[1m]))",
Warning: monitoring.Alert().Greater(1).For(5 * time.Minute),
NextSteps: "Check the logs of the collector and the configuration of the receiver",
Interpretation: `
Shows the rate of spans refused by a receiver.
A trace is a collection of spans; a span represents a unit of work or operation. Spans are the building blocks of traces.
Spans can be rejected either because of a misconfigured receiver or because spans arrive in the wrong format. The collector's logs will have more information on why a span was rejected.
For more information on tracing and the configuration of an OpenTelemetry receiver, see https://opentelemetry.io/docs/collector/configuration/#receivers.`,
},
},
},
},
{
Title: "Exporters",
Hidden: false,
Rows: []monitoring.Row{
{
{
Name: "otel_span_export_rate",
Description: "spans exported per exporter per minute",
Panel: monitoring.Panel().Unit(monitoring.Number).LegendFormat("exporter: {{exporter}}"),
Owner: monitoring.ObservableOwnerDevOps,
Query: "sum by (exporter) (rate(otelcol_exporter_sent_spans[1m]))",
NoAlert: true,
Interpretation: `
Shows the rate of spans being sent by the exporter.
A trace is a collection of spans; a span represents a unit of work or operation. Spans are the building blocks of traces.
The rate of spans here indicates spans that have made it through the configured pipeline and have been sent to the configured export destination.
For more information on configuring an exporter for the OpenTelemetry collector, see https://opentelemetry.io/docs/collector/configuration/#exporters.`,
},
{
Name: "otel_span_export_failures",
Description: "span export failures by exporter",
Panel: monitoring.Panel().Unit(monitoring.Number).LegendFormat("exporter: {{exporter}}"),
Owner: monitoring.ObservableOwnerDevOps,
Query: "sum by (exporter) (rate(otelcol_exporter_send_failed_spans[1m]))",
Warning: monitoring.Alert().Greater(1).For(5 * time.Minute),
NextSteps: "Check the configuration of the exporter and that the export destination is up",
Interpretation: `
Shows the rate of spans that the configured exporter failed to send. A value above 0 for a sustained period can indicate a problem with the exporter configuration or with the service being exported to.
For more information on configuring an exporter for the OpenTelemetry collector, see https://opentelemetry.io/docs/collector/configuration/#exporters.`,
},
},
},
},
{
Title: "Collector resource usage",
Hidden: false,
Rows: []monitoring.Row{
{
{
Name: "otel_cpu_usage",
Description: "cpu usage of the collector",
Panel: monitoring.Panel().Unit(monitoring.Seconds).LegendFormat("{{job}}"),
Owner: monitoring.ObservableOwnerDevOps,
Query: "sum by (job) (rate(otelcol_process_cpu_seconds{job=~\"^.*\"}[1m]))",
NoAlert: true,
Interpretation: `
Shows the CPU usage of the OpenTelemetry collector.`,
},
{
Name: "otel_memory_resident_set_size",
Description: "memory allocated to the otel collector",
Panel: monitoring.Panel().Unit(monitoring.Bytes).LegendFormat("{{job}}"),
Owner: monitoring.ObservableOwnerDevOps,
Query: "sum by (job) (rate(otelcol_process_memory_rss{job=~\"^.*\"}[1m]))",
NoAlert: true,
Interpretation: `
Shows the memory Resident Set Size (RSS) allocated to the OpenTelemetry collector`,
},
{
Name: "otel_memory_usage",
Description: "memory used by the collector",
Panel: monitoring.Panel().Unit(monitoring.Bytes).LegendFormat("{{job}}"),
Owner: monitoring.ObservableOwnerDevOps,
Query: "sum by (job) (rate(otelcol_process_runtime_total_alloc_bytes{job=~\"^.*\"}[1m]))",
NoAlert: true,
Interpretation: `
Shows how much memory is being used by the otel collector.
High memory usage might indicate that:
* the configured pipeline is keeping a lot of spans in memory for processing
* spans are failing to be sent and the exporter is configured to retry
* a batch processor is configured to hold a large number of spans per batch
For more information on configuring processors for the OpenTelemetry collector, see https://opentelemetry.io/docs/collector/configuration/#processors.`,
},
},
},
},
shared.NewContainerMonitoringGroup("otel-collector", monitoring.ObservableOwnerDevOps, nil),
shared.NewKubernetesMonitoringGroup("otel-collector", monitoring.ObservableOwnerDevOps, nil),
},
}
}

View File

@@ -752,6 +752,7 @@ commands:
docker container rm otel-collector
docker run --rm --name=otel-collector $DOCKER_NET $DOCKER_ARGS \
-p 4317:4317 -p 4318:4318 -p 55679:55679 -p 55670:55670 \
-p 8888:8888 \
-e JAEGER_HOST=$JAEGER_HOST \
-e HONEYCOMB_API_KEY=$HONEYCOMB_API_KEY \
-e HONEYCOMB_DATASET=$HONEYCOMB_DATASET \