diff --git a/.claude/commands/review-ddtrace.md b/.claude/commands/review-ddtrace.md new file mode 100644 index 00000000000..385d9e0be21 --- /dev/null +++ b/.claude/commands/review-ddtrace.md @@ -0,0 +1,93 @@ +# /review-ddtrace — Code review for dd-trace-go + +Review code changes against the patterns and conventions that dd-trace-go reviewers consistently enforce. This captures the implicit standards that live in reviewers' heads but aren't in CONTRIBUTING.md. + +Run this on a diff, a set of changed files, or a PR. + +## How to use + +If `$ARGUMENTS` contains a PR number or URL, fetch and review that PR's diff. +If `$ARGUMENTS` contains file paths, review those files. +If `$ARGUMENTS` is empty, review the current unstaged and staged git diff. + +## Review approach + +1. Read the diff to understand what changed and why. +2. Determine which reference files to consult based on what's in the diff: + - **Always read** `.claude/review-ddtrace/style-and-idioms.md` — these patterns apply to all Go code in this repo. + - **Read if the diff touches concurrency** (mutexes, atomics, goroutines, channels, sync primitives, or shared state): `.claude/review-ddtrace/concurrency.md` + - **Read if the diff touches `contrib/`**: `.claude/review-ddtrace/contrib-patterns.md` + - **Read if the diff touches hot paths** (span creation, serialization, sampling, payload encoding, tag setting) or adds/changes benchmarks: `.claude/review-ddtrace/performance.md` +3. Review the diff against the loaded guidance. Focus on issues the guidance specifically calls out — these come from real review feedback that was given repeatedly over the past 3 months. +4. Report findings using the output format below. + +## Universal checklist + +These are the highest-frequency review comments across the repo. Check every diff against these: + +### Happy path left-aligned +The single most repeated review comment. Guard clauses and error returns should come first so the main logic stays at the left edge. If you see an `if err != nil` or an edge-case check that wraps the happy path in an else block, flag it. + +```go +// Bad: happy path nested +if condition { + // lots of main logic +} else { + return err +} + +// Good: early return, happy path left-aligned +if !condition { + return err +} +// main logic here +``` + +### Regression tests for bug fixes +If the PR fixes a bug, there should be a test that reproduces the original bug. Reviewers ask for this almost every time it's missing. + +### Don't silently drop errors +If a function returns an error, handle it. Logging at an appropriate level counts as handling. Silently discarding errors (especially from marshaling, network calls, or state mutations) is a recurring source of review comments. + +### Named constants over magic strings/numbers +Use constants from `ddtrace/ext`, `instrumentation`, or define new ones. Don't scatter raw string literals like `"_dd.svc_src"` or protocol names through the code. If the constant already exists somewhere in the repo, import and use it. + +### Don't add unused API surface +If a function, type, or method is not yet called anywhere, don't add it. Reviewers consistently push back on speculative API additions. + +### Don't export internal-only functions +Functions meant for internal use should not follow the `WithX` naming pattern or be exported. `WithX` is the public configuration option convention — don't use it for internal plumbing. + +### Extract shared/duplicated logic +If you see the same 3+ lines repeated across call sites, extract a helper. But don't create premature abstractions for one-time operations. + +### Config through proper channels +- Environment variables must go through `internal/env` (or `instrumentation/env` for contrib), never raw `os.Getenv`. Note: `internal.BoolEnv` and similar helpers in the top-level `internal` package are **not** the same as `internal/env` — they are raw `os.Getenv` wrappers that bypass the validated config pipeline. Code should use `internal/env.Get`/`internal/env.Lookup` or the config provider, not `internal.BoolEnv`. +- Config loading belongs in `internal/config/config.go`'s `loadConfig`, not scattered through `ddtrace/tracer/option.go`. +- See CONTRIBUTING.md for the full env var workflow. + +### Nil safety and type assertion guards +Multiple P1 bugs in this repo come from nil-typed interface values and unguarded type assertions. When casting a concrete type to an interface (like `context.Context`), a nil pointer of the concrete type produces a non-nil interface that panics on method calls. Guard with a nil check before the cast. Similarly, prefer type switches or comma-ok assertions over bare type assertions in code paths that handle user-provided or externally-sourced values. + +### Error messages should describe impact +When logging a failure, explain what the user loses — not just what failed. Reviewers flag vague messages like `"failed to create admin client: %s"` and ask for impact context like `"failed to create admin client for cluster ID; cluster.id will be missing from DSM spans: %s"`. This helps operators triage without reading source code. + +### Encapsulate internal state behind methods +When a struct has internal fields that could change representation (like a map being replaced with a typed struct), consumers should access data through methods, not by reaching into fields directly. Reviewers flag `span.meta[key]` style access and ask for `span.meta.Get(key)` — this decouples callers from the internal layout and makes migrations easier. + +### Don't check in local/debug artifacts +Watch for `.claude/settings.local.json`, debugging `fmt.Println` leftovers, or commented-out test code. These get flagged immediately. + +## Output format + +Group findings by severity. Use inline code references (`file:line`). + +**Blocking** — Must fix before merge (correctness bugs, data races, silent error drops, API surface problems). + +**Should fix** — Strong conventions that reviewers will flag (happy path alignment, missing regression tests, magic strings, naming). + +**Nits** — Style preferences that improve readability but aren't blocking (import grouping, comment wording, minor naming). + +For each finding, briefly explain *why* (what could go wrong, or what convention it violates) rather than just stating the rule. Keep findings concise — one or two sentences each. + +If the code looks good against all loaded guidance, say so. Don't manufacture issues. diff --git a/.claude/review-ddtrace/concurrency.md b/.claude/review-ddtrace/concurrency.md new file mode 100644 index 00000000000..1b1d3e299af --- /dev/null +++ b/.claude/review-ddtrace/concurrency.md @@ -0,0 +1,141 @@ +# Concurrency Reference + +Concurrency bugs are the highest-severity class of review feedback in dd-trace-go. Reviewers catch data races, lock misuse, and unsafe shared state frequently. This file covers the patterns they flag. + +## Mutex discipline + +### Use checklocks annotations +This repo uses the `checklocks` static analyzer. When a struct field is guarded by a mutex, annotate it: + +```go +type myStruct struct { + mu sync.Mutex + // +checklocks:mu + data map[string]string +} +``` + +When you add a new field that's accessed under an existing lock, add the annotation. When you add a new method that accesses locked fields, the analyzer will verify correctness at compile time. Reviewers explicitly ask for `checklocks` and `checkatomic` annotations. + +### Use assert.RWMutexLocked for helpers called under lock +When a helper function expects to be called with a lock already held, add a runtime assertion at the top: + +```go +func (ps *prioritySampler) getRateLocked(spn *Span) float64 { + assert.RWMutexLocked(&ps.mu) + // ... +} +``` + +This documents the contract and catches violations at runtime. Import from `internal/locking/assert`. + +### Don't acquire the same lock multiple times +A recurring review comment: "We're now getting the locking twice." If a function needs two values protected by the same lock, get both in one critical section: + +```go +// Bad: two lock acquisitions +rate := ps.getRate(spn) // locks ps.mu +loaded := ps.agentRatesLoaded // needs ps.mu again + +// Good: one acquisition +ps.mu.RLock() +rate := ps.getRateLocked(spn) +loaded := ps.agentRatesLoaded +ps.mu.RUnlock() +``` + +### Don't invoke callbacks under a lock +Calling external code (callbacks, hooks, provider functions) while holding a mutex risks deadlocks if that code ever calls back into the locked structure. Capture what you need under the lock, release it, then invoke the callback: + +```go +// Bad: callback under lock +mu.Lock() +cb := state.callback +if buffered != nil { + cb(*buffered) // dangerous: cb might call back into state +} +mu.Unlock() + +// Good: release lock before calling +mu.Lock() +cb := state.callback +buffered := state.buffered +state.buffered = nil +mu.Unlock() + +if buffered != nil { + cb(*buffered) +} +``` + +## Atomic operations + +### Prefer atomic.Value for write-once fields +When a field is set once from a goroutine and read concurrently, reviewers suggest `atomic.Value` over `sync.RWMutex` — it's simpler and sufficient: + +```go +type Tracer struct { + clusterID atomic.Value // stores string, written once +} + +func (tr *Tracer) ClusterID() string { + v, _ := tr.clusterID.Load().(string) + return v +} +``` + +### Mark atomic fields with checkatomic +Similar to `checklocks`, use annotations for fields accessed atomically. + +## Shared slice mutation + +Appending to a shared slice is a race condition even if it looks safe: + +```go +// Bug: r.config.spanOpts is shared across concurrent requests. +// If the underlying array has spare capacity, append writes into it directly, +// corrupting reads happening concurrently on other goroutines. +options := append(r.config.spanOpts, tracer.ServiceName(serviceName)) +``` + +Always allocate a fresh slice before appending per-request values: + +```go +options := make([]tracer.StartSpanOption, len(r.config.spanOpts), len(r.config.spanOpts)+1) +copy(options, r.config.spanOpts) +options = append(options, tracer.ServiceName(serviceName)) +``` + +## Global state + +### Avoid adding global state +Reviewers push back on global variables that make test isolation or restart behavior difficult. When you need process-level config, prefer passing it through struct fields or function parameters. + +### Global state must reset on tracer restart +This repo supports `tracer.Start()` -> `tracer.Stop()` -> `tracer.Start()` cycles. Any global state that is set during `Start()` must be cleaned up or reset during `Stop()`, or the second `Start()` will operate on stale values. + +**When reviewing code that uses global flags, `sync.Once`, or package-level variables, actively check:** does `Stop()` reset this state? If not, a restart cycle will silently reuse old values. + +Common variants: +- A `sync.Once` guarding initialization: won't re-run after restart because `Once` is consumed +- A boolean flag like `initialized`: if not reset in `Stop()`, the next `Start()` skips init +- A cached value (e.g., an env var read once): if the value changed between stop and start, the stale value persists + +Also: `sync.Once` consumes the once even on failure. If initialization can fail, subsequent calls return nil without retrying. + +### Map iteration order nondeterminism +Go map iteration order is randomized. When code iterates a map and writes state based on specific keys, check whether the final state depends on iteration order. If it does, process the order-sensitive keys explicitly rather than relying on map iteration. + +## Race-prone patterns in this repo + +### Span field access during serialization +Spans are accessed concurrently (user goroutine sets tags, serialization goroutine reads them). All span field access after `Finish()` must go through the span's mutex. Watch for: +- Stats pipeline holding references to span maps (`s.meta`, `s.metrics`) that get cleared by pooling +- Benchmarks calling span methods without acquiring the lock + +### Trace-level operations during partial flush +When the trace lock is released to acquire a span lock (lock ordering), recheck state after reacquiring the trace lock — another goroutine may have flushed or modified the trace in the interim. + +### time.Time fields +`time.Time` is not safe for concurrent read/write. Fields like `lastFlushedAt` that are read from a worker goroutine and written from `Flush()` need synchronization. + diff --git a/.claude/review-ddtrace/contrib-patterns.md b/.claude/review-ddtrace/contrib-patterns.md new file mode 100644 index 00000000000..b436bb70507 --- /dev/null +++ b/.claude/review-ddtrace/contrib-patterns.md @@ -0,0 +1,105 @@ +# Contrib Integration Patterns Reference + +Patterns specific to `contrib/` packages. + +## API design for integrations + +### Prefer library-native extension points over wrapper types +When the instrumented library supports hooks, middleware, or options, use those rather than returning a custom wrapper type. Custom wrappers complicate Orchestrion (automatic instrumentation) and force users to change their code. + +### WithX is for user-facing options only +The `WithX` naming convention is reserved for public configuration options. Don't use it for internal plumbing: + +```go +// Bad: internal-only function using public naming convention +func WithClusterID(id string) Option { ... } + +// Good: unexported setter for internal use +func (tr *Tracer) setClusterID(id string) { ... } +``` + +### Service name conventions +- Most integrations use optional `WithService(name)` — the service name is NOT a mandatory argument +- Default service names should derive from the package's `componentName` (via `instrumentation.PackageXxx`) +- Track where the service name came from using `_dd.svc_src` (service source). Import the tag key from `ext` or `instrumentation`, don't hardcode it + +### Span options must be request-local +Never append to a shared slice of span options from concurrent request handlers: + +```go +// Bug: races when concurrent requests append to shared slice +options := append(r.config.spanOpts, tracer.ServiceName(svc)) +``` + +Copy the options slice before appending per-request values. + +## Async work and lifecycle + +### Async work must be cancellable on Close +When an integration starts background goroutines, they must be cancellable when the user calls `Close()`. Use a context with cancellation and cancel it in `Close()`. + +### Don't block user code for observability +Users don't expect their observability library to add latency. Question any synchronous wait in a startup or request path — if the data being fetched is optional (metadata, IDs), fetch it asynchronously. + +### Suppress expected cancellation noise +When `Close()` cancels a background operation, don't log the cancellation as a warning: + +```go +if err != nil && !errors.Is(err, context.Canceled) { + log.Warn("failed to fetch metadata: %s", err) +} +``` + +### Error messages should describe impact +When logging failures, explain what the user loses — not just what failed: + +```go +// Bad: +log.Warn("failed to create admin client: %s", err) + +// Good: +log.Warn("failed to create admin client; cluster metadata will be missing from spans: %s", err) +``` + +## Data Streams Monitoring (DSM) patterns + +These patterns apply anywhere DSM code appears — in `contrib/`, `ddtrace/tracer/`, or `datastreams/`. + +### Gate DSM work on processor availability +Don't tag spans with DSM metadata or do DSM processing when DSM is disabled: + +```go +// Bad: tags spans even when DSM is off +tagActiveSpan(ctx, id, name) +if p := datastreams.GetProcessor(ctx); p != nil { + p.TrackTransaction(...) +} + +// Good: check first +if p := datastreams.GetProcessor(ctx); p != nil { + tagActiveSpan(ctx, id, name) + p.TrackTransaction(...) +} +``` + +## Consistency across similar integrations + +When a feature exists in one integration, implementations in related integrations should follow the same patterns. Check for: +- Same synchronization approach (don't use `map + sync.Mutex` in one package and `sync.Map` in another) +- Same error handling strategy for the same failure mode +- Same input normalization (e.g., trimming whitespace from addresses) + +## Span tags and metadata + +### Required tags for integration spans +Per the contrib README: +- `span.kind`: set in root spans (`client`, `server`, `producer`, `consumer`). Omit if `internal`. +- `component`: set in all spans, value is the integration's full package path + +### Resource name changes +Changing the resource name format is a potential breaking change for the backend. Verify backward compatibility before changing resource name formatting. + +### Orchestrion compatibility +Be aware of Orchestrion (automatic instrumentation) implications: +- The `orchestrion.yml` in contrib packages defines instrumentation weaving +- Guard against nil typed interface values: a nil pointer cast to an interface produces a non-nil interface that panics on method calls diff --git a/.claude/review-ddtrace/performance.md b/.claude/review-ddtrace/performance.md new file mode 100644 index 00000000000..ad4e9d75f45 --- /dev/null +++ b/.claude/review-ddtrace/performance.md @@ -0,0 +1,58 @@ +# Performance Reference + +dd-trace-go runs in every instrumented Go service. Performance regressions directly impact customer applications. Reviewers are vigilant about hot-path changes. + +## Benchmark before and after + +When changing code in hot paths (span creation, tag setting, serialization, sampling), include before/after benchmark comparisons in the PR description. Run `go test -bench` on both the old and new code. + +## Inlining cost awareness + +On hot-path functions in `ddtrace/tracer/`, reviewers sometimes verify inlining with `go build -gcflags="-m=2"`. If a change grows a function past the compiler's inlining budget, wrap cold-path code (like error logging) in a `//go:noinline` helper to keep the hot caller inlineable. + +## Avoid allocations in hot paths + +### Pre-compute sizes +When building slices for serialization, compute the size upfront rather than growing dynamically. + +### Avoid unnecessary byte slice allocation +When appending to a byte buffer, don't allocate intermediate slices: + +```go +// Bad: allocates a temporary slice +tmp := make([]byte, 0, idLen+9) +tmp = append(tmp, checkpointID) +// ... +dst = append(dst, tmp...) + +// Good: append directly to destination +dst = append(dst, checkpointID) +dst = binary.BigEndian.AppendUint64(dst, uint64(timestamp)) +dst = append(dst, byte(idLen)) +dst = append(dst, transactionID[:idLen]...) +``` + +### String building +Per CONTRIBUTING.md: favor `strings.Builder` or string concatenation (`a + "b" + c`) over `fmt.Sprintf` in hot paths. + +## Lock contention in hot paths + +### Cache config reads outside hot loops +Don't call lock-acquiring config accessors (like `TracerConf()`) on every span. Cache what you need at a higher level to avoid per-span lock contention and allocations. + +### Minimize critical section scope +Get in and out of critical sections quickly. Don't do I/O, allocations, or complex logic while holding a lock. + +## Serialization correctness + +### Array header counts must match actual entries +When encoding msgpack arrays, the declared count must match the number of entries actually written. If entries can be conditionally skipped (e.g., a value fails to serialize), the count will be wrong and downstream decoders will corrupt. Either pre-validate entries, use a two-pass approach (serialize then count), or adjust the header retroactively. + +## Profiler-specific concerns + +### Measure overhead for new profile types +New profile types can impact application performance through STW pauses or GC triggers. Include overhead analysis and reference relevant benchmarks when introducing profile types that interact with GC or runtime internals. + +### Concurrent profile capture ordering +Be aware of how profile types interact when captured concurrently. For example, a goroutine leak profile that waits for a GC cycle will cause the heap profile to reflect the *previous* cycle's data, not the current one. + diff --git a/.claude/review-ddtrace/style-and-idioms.md b/.claude/review-ddtrace/style-and-idioms.md new file mode 100644 index 00000000000..00592a09750 --- /dev/null +++ b/.claude/review-ddtrace/style-and-idioms.md @@ -0,0 +1,142 @@ +# Style and Idioms Reference + +dd-trace-go-specific patterns reviewers consistently enforce. General Go conventions (naming, formatting, error handling) are covered by [Effective Go](https://go.dev/doc/effective_go) — this file focuses on what's specific to this repo. + +## Happy path left-aligned (highest frequency) + +Error/edge-case handling should return early, keeping the main logic at the left margin. + +```go +// Bad: +if cond { + doMainWork() +} else { + return err +} + +// Good: +if !cond { + return err +} +doMainWork() +``` + +## Naming conventions + +### Go initialisms +Use standard Go capitalization for initialisms: `OTel` not `Otel`, `ID` not `Id`. This applies to struct fields, function names, and comments. + +```go +logsOTelEnabled // not logsOtelEnabled +LogsOTelEnabled() // not LogsOtelEnabled() +``` + +### Function/method naming +- Prefer `getRateLocked` over `getRate2` — the suffix should convey intent (in this case, that the lock must be held) +- Functions that expect to be called with a lock already held should be named `*Locked` (e.g., `getRateLocked`) so the contract is visible at call sites + +## Constants and magic values + +Use named constants instead of inline literals: + +```go +// Reviewers flag: +if u.Scheme == "unix" || u.Scheme == "http" || u.Scheme == "https" { ... } + +// Preferred: define or reuse constants +const ( + schemeUnix = "unix" + schemeHTTP = "http" + schemeHTTPS = "https" +) +``` + +Specific patterns: +- String tag keys: import from `ddtrace/ext` or `instrumentation` rather than hardcoding `"_dd.svc_src"` +- Protocol identifiers, retry intervals, and timeout values should be named constants with comments explaining the choice +- If a constant already exists in `ext`, `instrumentation`, or elsewhere in the repo, use it rather than defining a new one + +### Bit flags and magic numbers +Name bitmap values and numeric constants. + +## Comments and documentation + +### Godoc accuracy +Comments that appear in godoc should be precise. Reviewers flag comments that are slightly wrong or misleading, like `// IsSet returns true if the key is set` when the actual behavior checks for non-empty values. + +### Don't pin comments to specific files +```go +// Bad: "A zero value uses the default from option.go" +// Good: "A zero value uses defaultAgentInfoPollInterval." +``` +Files move. Reference the constant or concept, not the file location. + +### Explain "why" for non-obvious config +For feature flags, polling intervals, and other tunables, add a brief comment explaining the rationale, not just what the field does: +```go +// agentInfoPollInterval controls how often we refresh /info. +// A zero value uses defaultAgentInfoPollInterval. +agentInfoPollInterval time.Duration +``` + +### Comments for hooks and callbacks +When implementing interface methods that serve as hooks or callbacks, add a comment explaining when the hook is called and what it does — these aren't obvious to someone reading the code later. + +## Avoid unnecessary aliases and indirection + +Don't create type aliases or function wrappers that don't add value: + +```go +// Bad: +type myAlias = somePackage.Type +func doThing() { somePackage.DoThing() } + +// Only alias when genuinely needed (import cycles, cleaner public API) +``` + +## Avoid `init()` functions + +Avoid `init()` in this repo. Use named helper functions called from variable initialization instead: + +```go +// Bad: +func init() { + cfg.rootSessionID = computeSessionID() +} + +// Good: +var cfg = &config{ + rootSessionID: computeRootSessionID(), +} +``` + +The exception is `instrumentation.Load()` calls in contrib packages, which are expected to use `init()` per the contrib README. + +## Embed interfaces for forward compatibility + +When wrapping a type that implements an interface, embed the interface rather than proxying every method individually. This way, new methods added to the interface in future versions are automatically forwarded: + +```go +// Fragile: must manually add every new method +type telemetryExporter struct { + inner metric.Exporter +} +func (t *telemetryExporter) Export(ctx context.Context, rm *metricdata.ResourceMetrics) error { + return t.inner.Export(ctx, rm) +} + +// Better: embed so new methods are forwarded automatically +type telemetryExporter struct { + metric.Exporter // embed the interface +} +``` + +## Deprecation markers +When marking functions as deprecated, use the Go-standard `// Deprecated:` comment prefix so that linters and IDEs flag usage: +```go +// Deprecated: Use [Wrap] instead. +func Middleware(service string, opts ...Option) echo.MiddlewareFunc { +``` + +## Generated files +Maintain ordering in generated files. If a generated file like `supported_configurations.gen.go` has sorted keys, don't hand-edit in a way that breaks the sort — it'll cause confusion when the file is regenerated. diff --git a/.gitignore b/.gitignore index 47ece489df7..9c14c521bf0 100644 --- a/.gitignore +++ b/.gitignore @@ -32,4 +32,5 @@ coverage-*.txt /.vscode /.claude/* !/.claude/commands/ +!/.claude/review-ddtrace/ !/.claude/settings.json