Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
93 changes: 93 additions & 0 deletions .claude/commands/review-ddtrace.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
# /review-ddtrace — Code review for dd-trace-go

Review code changes against the patterns and conventions that dd-trace-go reviewers consistently enforce. This captures the implicit standards that live in reviewers' heads but aren't in CONTRIBUTING.md.

Run this on a diff, a set of changed files, or a PR.

## How to use

If `$ARGUMENTS` contains a PR number or URL, fetch and review that PR's diff.
If `$ARGUMENTS` contains file paths, review those files.
If `$ARGUMENTS` is empty, review the current unstaged and staged git diff.

## Review approach

1. Read the diff to understand what changed and why.
2. Determine which reference files to consult based on what's in the diff:
- **Always read** `.claude/review-ddtrace/style-and-idioms.md` — these patterns apply to all Go code in this repo.
- **Read if the diff touches concurrency** (mutexes, atomics, goroutines, channels, sync primitives, or shared state): `.claude/review-ddtrace/concurrency.md`
- **Read if the diff touches `contrib/`**: `.claude/review-ddtrace/contrib-patterns.md`
- **Read if the diff touches hot paths** (span creation, serialization, sampling, payload encoding, tag setting) or adds/changes benchmarks: `.claude/review-ddtrace/performance.md`
3. Review the diff against the loaded guidance. Focus on issues the guidance specifically calls out — these come from real review feedback that was given repeatedly over the past 3 months.
4. Report findings using the output format below.

## Universal checklist

These are the highest-frequency review comments across the repo. Check every diff against these:

### Happy path left-aligned
The single most repeated review comment. Guard clauses and error returns should come first so the main logic stays at the left edge. If you see an `if err != nil` or an edge-case check that wraps the happy path in an else block, flag it.

```go
// Bad: happy path nested
if condition {
// lots of main logic
} else {
return err
}

// Good: early return, happy path left-aligned
if !condition {
return err
}
// main logic here
```

### Regression tests for bug fixes
If the PR fixes a bug, there should be a test that reproduces the original bug. Reviewers ask for this almost every time it's missing.

### Don't silently drop errors
If a function returns an error, handle it. Logging at an appropriate level counts as handling. Silently discarding errors (especially from marshaling, network calls, or state mutations) is a recurring source of review comments.

### Named constants over magic strings/numbers
Use constants from `ddtrace/ext`, `instrumentation`, or define new ones. Don't scatter raw string literals like `"_dd.svc_src"` or protocol names through the code. If the constant already exists somewhere in the repo, import and use it.

### Don't add unused API surface
If a function, type, or method is not yet called anywhere, don't add it. Reviewers consistently push back on speculative API additions.

### Don't export internal-only functions
Functions meant for internal use should not follow the `WithX` naming pattern or be exported. `WithX` is the public configuration option convention — don't use it for internal plumbing.

### Extract shared/duplicated logic
If you see the same 3+ lines repeated across call sites, extract a helper. But don't create premature abstractions for one-time operations.

### Config through proper channels
- Environment variables must go through `internal/env` (or `instrumentation/env` for contrib), never raw `os.Getenv`. Note: `internal.BoolEnv` and similar helpers in the top-level `internal` package are **not** the same as `internal/env` — they are raw `os.Getenv` wrappers that bypass the validated config pipeline. Code should use `internal/env.Get`/`internal/env.Lookup` or the config provider, not `internal.BoolEnv`.
- Config loading belongs in `internal/config/config.go`'s `loadConfig`, not scattered through `ddtrace/tracer/option.go`.
- See CONTRIBUTING.md for the full env var workflow.

### Nil safety and type assertion guards
Multiple P1 bugs in this repo come from nil-typed interface values and unguarded type assertions. When casting a concrete type to an interface (like `context.Context`), a nil pointer of the concrete type produces a non-nil interface that panics on method calls. Guard with a nil check before the cast. Similarly, prefer type switches or comma-ok assertions over bare type assertions in code paths that handle user-provided or externally-sourced values.

### Error messages should describe impact
When logging a failure, explain what the user loses — not just what failed. Reviewers flag vague messages like `"failed to create admin client: %s"` and ask for impact context like `"failed to create admin client for cluster ID; cluster.id will be missing from DSM spans: %s"`. This helps operators triage without reading source code.

### Encapsulate internal state behind methods
When a struct has internal fields that could change representation (like a map being replaced with a typed struct), consumers should access data through methods, not by reaching into fields directly. Reviewers flag `span.meta[key]` style access and ask for `span.meta.Get(key)` — this decouples callers from the internal layout and makes migrations easier.

### Don't check in local/debug artifacts
Watch for `.claude/settings.local.json`, debugging `fmt.Println` leftovers, or commented-out test code. These get flagged immediately.

## Output format

Group findings by severity. Use inline code references (`file:line`).

**Blocking** — Must fix before merge (correctness bugs, data races, silent error drops, API surface problems).

**Should fix** — Strong conventions that reviewers will flag (happy path alignment, missing regression tests, magic strings, naming).

**Nits** — Style preferences that improve readability but aren't blocking (import grouping, comment wording, minor naming).

For each finding, briefly explain *why* (what could go wrong, or what convention it violates) rather than just stating the rule. Keep findings concise — one or two sentences each.

If the code looks good against all loaded guidance, say so. Don't manufacture issues.
141 changes: 141 additions & 0 deletions .claude/review-ddtrace/concurrency.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,141 @@
# Concurrency Reference

Concurrency bugs are the highest-severity class of review feedback in dd-trace-go. Reviewers catch data races, lock misuse, and unsafe shared state frequently. This file covers the patterns they flag.

## Mutex discipline

### Use checklocks annotations
This repo uses the `checklocks` static analyzer. When a struct field is guarded by a mutex, annotate it:

```go
type myStruct struct {
mu sync.Mutex
// +checklocks:mu
data map[string]string
}
```

When you add a new field that's accessed under an existing lock, add the annotation. When you add a new method that accesses locked fields, the analyzer will verify correctness at compile time. Reviewers explicitly ask for `checklocks` and `checkatomic` annotations.

### Use assert.RWMutexLocked for helpers called under lock
When a helper function expects to be called with a lock already held, add a runtime assertion at the top:

```go
func (ps *prioritySampler) getRateLocked(spn *Span) float64 {
assert.RWMutexLocked(&ps.mu)
// ...
}
```

This documents the contract and catches violations at runtime. Import from `internal/locking/assert`.

### Don't acquire the same lock multiple times
A recurring review comment: "We're now getting the locking twice." If a function needs two values protected by the same lock, get both in one critical section:

```go
// Bad: two lock acquisitions
rate := ps.getRate(spn) // locks ps.mu
loaded := ps.agentRatesLoaded // needs ps.mu again

// Good: one acquisition
ps.mu.RLock()
rate := ps.getRateLocked(spn)
loaded := ps.agentRatesLoaded
ps.mu.RUnlock()
```

### Don't invoke callbacks under a lock
Calling external code (callbacks, hooks, provider functions) while holding a mutex risks deadlocks if that code ever calls back into the locked structure. Capture what you need under the lock, release it, then invoke the callback:

```go
// Bad: callback under lock
mu.Lock()
cb := state.callback
if buffered != nil {
cb(*buffered) // dangerous: cb might call back into state
}
mu.Unlock()

// Good: release lock before calling
mu.Lock()
cb := state.callback
buffered := state.buffered
state.buffered = nil
mu.Unlock()

if buffered != nil {
cb(*buffered)
}
```

## Atomic operations

### Prefer atomic.Value for write-once fields
When a field is set once from a goroutine and read concurrently, reviewers suggest `atomic.Value` over `sync.RWMutex` — it's simpler and sufficient:

```go
type Tracer struct {
clusterID atomic.Value // stores string, written once
}

func (tr *Tracer) ClusterID() string {
v, _ := tr.clusterID.Load().(string)
return v
}
```

### Mark atomic fields with checkatomic
Similar to `checklocks`, use annotations for fields accessed atomically.

## Shared slice mutation

Appending to a shared slice is a race condition even if it looks safe:

```go
// Bug: r.config.spanOpts is shared across concurrent requests.
// If the underlying array has spare capacity, append writes into it directly,
// corrupting reads happening concurrently on other goroutines.
options := append(r.config.spanOpts, tracer.ServiceName(serviceName))
```

Always allocate a fresh slice before appending per-request values:

```go
options := make([]tracer.StartSpanOption, len(r.config.spanOpts), len(r.config.spanOpts)+1)
copy(options, r.config.spanOpts)
options = append(options, tracer.ServiceName(serviceName))
```

## Global state

### Avoid adding global state
Reviewers push back on global variables that make test isolation or restart behavior difficult. When you need process-level config, prefer passing it through struct fields or function parameters.

### Global state must reset on tracer restart
This repo supports `tracer.Start()` -> `tracer.Stop()` -> `tracer.Start()` cycles. Any global state that is set during `Start()` must be cleaned up or reset during `Stop()`, or the second `Start()` will operate on stale values.

**When reviewing code that uses global flags, `sync.Once`, or package-level variables, actively check:** does `Stop()` reset this state? If not, a restart cycle will silently reuse old values.

Common variants:
- A `sync.Once` guarding initialization: won't re-run after restart because `Once` is consumed
- A boolean flag like `initialized`: if not reset in `Stop()`, the next `Start()` skips init
- A cached value (e.g., an env var read once): if the value changed between stop and start, the stale value persists

Also: `sync.Once` consumes the once even on failure. If initialization can fail, subsequent calls return nil without retrying.

### Map iteration order nondeterminism
Go map iteration order is randomized. When code iterates a map and writes state based on specific keys, check whether the final state depends on iteration order. If it does, process the order-sensitive keys explicitly rather than relying on map iteration.

## Race-prone patterns in this repo

### Span field access during serialization
Spans are accessed concurrently (user goroutine sets tags, serialization goroutine reads them). All span field access after `Finish()` must go through the span's mutex. Watch for:
- Stats pipeline holding references to span maps (`s.meta`, `s.metrics`) that get cleared by pooling
- Benchmarks calling span methods without acquiring the lock

### Trace-level operations during partial flush
When the trace lock is released to acquire a span lock (lock ordering), recheck state after reacquiring the trace lock — another goroutine may have flushed or modified the trace in the interim.

### time.Time fields
`time.Time` is not safe for concurrent read/write. Fields like `lastFlushedAt` that are read from a worker goroutine and written from `Flush()` need synchronization.

105 changes: 105 additions & 0 deletions .claude/review-ddtrace/contrib-patterns.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
# Contrib Integration Patterns Reference

Patterns specific to `contrib/` packages.

## API design for integrations

### Prefer library-native extension points over wrapper types
When the instrumented library supports hooks, middleware, or options, use those rather than returning a custom wrapper type. Custom wrappers complicate Orchestrion (automatic instrumentation) and force users to change their code.

### WithX is for user-facing options only
The `WithX` naming convention is reserved for public configuration options. Don't use it for internal plumbing:

```go
// Bad: internal-only function using public naming convention
func WithClusterID(id string) Option { ... }

// Good: unexported setter for internal use
func (tr *Tracer) setClusterID(id string) { ... }
```

### Service name conventions
- Most integrations use optional `WithService(name)` — the service name is NOT a mandatory argument
- Default service names should derive from the package's `componentName` (via `instrumentation.PackageXxx`)
- Track where the service name came from using `_dd.svc_src` (service source). Import the tag key from `ext` or `instrumentation`, don't hardcode it

### Span options must be request-local
Never append to a shared slice of span options from concurrent request handlers:

```go
// Bug: races when concurrent requests append to shared slice
options := append(r.config.spanOpts, tracer.ServiceName(svc))
```

Copy the options slice before appending per-request values.

## Async work and lifecycle

### Async work must be cancellable on Close
When an integration starts background goroutines, they must be cancellable when the user calls `Close()`. Use a context with cancellation and cancel it in `Close()`.

### Don't block user code for observability
Users don't expect their observability library to add latency. Question any synchronous wait in a startup or request path — if the data being fetched is optional (metadata, IDs), fetch it asynchronously.

### Suppress expected cancellation noise
When `Close()` cancels a background operation, don't log the cancellation as a warning:

```go
if err != nil && !errors.Is(err, context.Canceled) {
log.Warn("failed to fetch metadata: %s", err)
}
```

### Error messages should describe impact
When logging failures, explain what the user loses — not just what failed:

```go
// Bad:
log.Warn("failed to create admin client: %s", err)

// Good:
log.Warn("failed to create admin client; cluster metadata will be missing from spans: %s", err)
```

## Data Streams Monitoring (DSM) patterns
Comment thread
bm1549 marked this conversation as resolved.

These patterns apply anywhere DSM code appears — in `contrib/`, `ddtrace/tracer/`, or `datastreams/`.

### Gate DSM work on processor availability
Don't tag spans with DSM metadata or do DSM processing when DSM is disabled:

```go
// Bad: tags spans even when DSM is off
tagActiveSpan(ctx, id, name)
if p := datastreams.GetProcessor(ctx); p != nil {
p.TrackTransaction(...)
}

// Good: check first
if p := datastreams.GetProcessor(ctx); p != nil {
tagActiveSpan(ctx, id, name)
p.TrackTransaction(...)
}
```

## Consistency across similar integrations

When a feature exists in one integration, implementations in related integrations should follow the same patterns. Check for:
- Same synchronization approach (don't use `map + sync.Mutex` in one package and `sync.Map` in another)
- Same error handling strategy for the same failure mode
- Same input normalization (e.g., trimming whitespace from addresses)

## Span tags and metadata

### Required tags for integration spans
Per the contrib README:
- `span.kind`: set in root spans (`client`, `server`, `producer`, `consumer`). Omit if `internal`.
- `component`: set in all spans, value is the integration's full package path

### Resource name changes
Changing the resource name format is a potential breaking change for the backend. Verify backward compatibility before changing resource name formatting.

### Orchestrion compatibility
Be aware of Orchestrion (automatic instrumentation) implications:
- The `orchestrion.yml` in contrib packages defines instrumentation weaving
- Guard against nil typed interface values: a nil pointer cast to an interface produces a non-nil interface that panics on method calls
Loading
Loading