sourcegraph/internal/tracer/policy_sampler.go
Robert Lin 4556b317f1
tracer: enforce trace policy entirely via Sampler (#58068)
This change updates our internal tracer to enforce policy only via a `Sampler` implementation. This has the following benefits:

1. Even when a trace should not be sampled, contexts are still populated with valid spans, rather than no-op ones. This is important to make use of trace IDs for non-tracing purposes, e.g. https://github.com/sourcegraph/sourcegraph/pull/57774 and https://github.com/sourcegraph/sourcegraph/pull/58060
2. We enforce trace policies the way they were meant to be enforced in OpenTelemetry: by simply indicating that the span should not be exported.

This was not possible before because OpenTracing did not use context propagation, hence we did not have a way to use trace policy flags set in context - but thanks to @camdencheek's work removing OpenTracing entirely, we can now do this in a more idiomatic fashion.

Thanks to this, I've removed a few places that prevented trace context from being populated based on trace policy (HTTP and GraphQL middleware, and `internal/trace`). This delegates sampling decisions to the sampler, and ensures we accept valid trace context everywhere.

## Test plan

Unit tests on a TracerProvider configured with the new sampler.

Manual testing:

```
sg run jaeger otel-collector 
sg start
```

With `observability.tracing.debug` set to `true`, the logs show the desired traits for non-`ShouldTrace` traces:

```
[         worker] INFO tracer tracer/logged_otel.go:63 Start {"spanName": "workerutil.dbworker.store.insights_query_runner_jobs_store.dequeue", "isRecording": false, "isSampled": false, "isValid": true}
```

With `observability.tracing.sampling` set to `none`, running a search with `&trace=1` only gives us spans from zoekt, which seems to have always been outside our rules here.

With `observability.tracing.sampling` set to `selective`, running a search with `&trace=1` gives us a full trace.

With `observability.tracing.sampling` set to `all`, Jaeger instantly gets loads of traces, and in logs we see:

```
[         worker] INFO tracer tracer/logged_otel.go:63 Start {"spanName": "workerutil.dbworker.store.exhaustive_search_worker_store.dequeue", "isRecording": true, "isSampled": true, "isValid": true}
```

---------

Co-authored-by: William Bezuidenhout <william.bezuidenhout@sourcegraph.com>
2023-11-02 09:00:14 -07:00


package tracer

import (
	oteltracesdk "go.opentelemetry.io/otel/sdk/trace"

	"github.com/sourcegraph/sourcegraph/internal/trace/policy"
)

var (
	// Use upstream samplers to ensure we return the right thing in our
	// custom Sampler implementation.
	alwaysSampleSampler = oteltracesdk.AlwaysSample()
	neverSampleSampler  = oteltracesdk.NeverSample()
)

// tracePolicySampler implements the oteltracesdk.Sampler interface and
// indicates whether a trace should be sampled based on the global trace
// policy, comparing it against the policy indicated in the parent context
// where relevant.
type tracePolicySampler struct{}

var _ oteltracesdk.Sampler = tracePolicySampler{}

func (tracePolicySampler) ShouldSample(p oteltracesdk.SamplingParameters) oteltracesdk.SamplingResult {
	switch policy.GetTracePolicy() {
	case policy.TraceAll:
		// Retain and export all events.
		return alwaysSampleSampler.ShouldSample(p)
	case policy.TraceNone:
		// Drop all events.
		return neverSampleSampler.ShouldSample(p)
	default:
		// By default, enforce policy.TraceSelective, which means that we only
		// sample if the parent context is marked for tracing.
		if policy.ShouldTrace(p.ParentContext) {
			return alwaysSampleSampler.ShouldSample(p)
		}
	}
	// Otherwise, indicate this span should be dropped and not exported.
	return neverSampleSampler.ShouldSample(p)
}

func (tracePolicySampler) Description() string { return "internal/tracer.tracePolicySampler" }
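
The branching in `ShouldSample` amounts to a small decision table. The following is a simplified, standard-library-only model of that table, not the real implementation: `tracePolicy`, its constants, and `shouldExport` are hypothetical stand-ins for `policy.TracePolicy`, the `policy.Trace*` values, and the sampler's decision, and the boolean parameter stands in for `policy.ShouldTrace(p.ParentContext)`. The string values are assumed to match the `observability.tracing.sampling` settings mentioned in the test plan.

```go
package main

import "fmt"

// tracePolicy is a hypothetical stand-in for the global trace policy type.
type tracePolicy string

const (
	traceNone      tracePolicy = "none"
	traceSelective tracePolicy = "selective"
	traceAll       tracePolicy = "all"
)

// shouldExport models the sampler's decision: it returns true when a span
// would be sampled (and therefore exported) under the given global policy.
// parentShouldTrace models whether the parent context is marked for tracing.
func shouldExport(global tracePolicy, parentShouldTrace bool) bool {
	switch global {
	case traceAll:
		// Everything is sampled, regardless of the parent context.
		return true
	case traceNone:
		// Nothing is sampled, regardless of the parent context.
		return false
	default:
		// Selective (and any unrecognized value): follow the parent context.
		return parentShouldTrace
	}
}

func main() {
	// Print the full decision table for all policy and parent combinations.
	for _, p := range []tracePolicy{traceNone, traceSelective, traceAll} {
		for _, parent := range []bool{false, true} {
			fmt.Printf("policy=%-9s parentShouldTrace=%-5t export=%t\n",
				p, parent, shouldExport(p, parent))
		}
	}
}
```

Note that in the real sampler a "do not export" decision still leaves a valid span in the context; the model above only captures whether the span is exported.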