
feat(contrib/otel): add OpenTelemetry tracing submodule #79

Merged: xuxife merged 13 commits into main from xuxife/26/05/13/add-contrib on May 15, 2026
@xuxife commented May 15, 2026

Summary

  • New independent Go submodule contrib/otel (github.com/Azure/go-workflow/contrib/otel, package flowotel) that integrates OpenTelemetry traces via the existing StepInterceptor / AttemptInterceptor extension points; no new core hooks are needed.
  • Two factories: flowotel.NewStepInterceptor(opts...) (one span per step lifetime, across retries) and flowotel.NewAttemptInterceptor(opts...) (one span per attempt). They can be used independently or together; when both are registered, the attempt span becomes a child of the step span.
  • Six functional options: WithTracerProvider, WithTracerName, WithStepSpanNamer, WithAttemptSpanNamer, WithStepAttributes, WithAttemptAttributes. Canonical attributes (workflow.step.name, workflow.step.status, workflow.step.attempt) always win over user-supplied keys.
  • The submodule depends only on the OpenTelemetry API at runtime (go.opentelemetry.io/otel, …/otel/trace); SDK / tracetest / stdouttrace are test-only. Core go.mod/go.sum are byte-identical, so the OTel dependency does not enter core's transitive graph.
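
To illustrate how the two layers relate, here is a minimal, self-contained sketch. It does not use the real flowotel or OpenTelemetry APIs: the span, recorder, and runStep names are hypothetical stand-ins, the retry loop is simplified, and numbering attempts from 1 is an assumption of the sketch. It only shows the intended shape: one step span for the whole step lifetime, plus one child span per attempt.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a sample error used to force retries in this sketch.
var errTransient = errors.New("transient failure")

// span is a hypothetical stand-in for an OpenTelemetry span; it records
// only its own name and the name of its parent ("" means no parent).
type span struct {
	Name   string
	Parent string
}

// recorder collects spans in the order they end.
type recorder struct {
	spans []span
}

// runStep mimics a step-level interceptor wrapping an attempt-level one.
// Every attempt ends its own child span; the step span ends exactly once,
// after the last attempt. Attempts are numbered from 1 (an assumption).
func runStep(r *recorder, step string, maxAttempts int, do func(attempt int) error) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = do(attempt)
		r.spans = append(r.spans, span{
			Name:   fmt.Sprintf("%s (attempt %d)", step, attempt),
			Parent: step,
		})
		if err == nil {
			break
		}
	}
	r.spans = append(r.spans, span{Name: step})
	return err
}

func main() {
	r := &recorder{}
	// Fail the first two attempts, succeed on the third.
	err := runStep(r, "build", 5, func(attempt int) error {
		if attempt < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println("error:", err)
	for _, s := range r.spans {
		fmt.Printf("span=%q parent=%q\n", s.Name, s.Parent)
	}
}
```

With only the step layer registered, the sketch would emit just the final span; with only the attempt layer, the three attempt spans would have no step parent.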

Why a separate submodule?

If contrib/otel were a subpackage of core, every core user would inherit the OpenTelemetry dependency tree. Releasing it as a separate Go module (versioned independently as contrib/otel/v0.x.y) keeps that cost opt-in, mirroring how gin-contrib, otelhttp, etc. are structured.

What's in the change

  • contrib/otel/options.go: Option + six With* constructors + (*config).resolveTracer() (resolves provider once at factory call time).
  • contrib/otel/step.go + attempt.go: the two factories, ~70 LOC each, mirroring each other's structure.
  • contrib/otel/consts.go: shared attribute keys / status values.
  • 18 unit + integration tests + 1 godoc Example covering: success / retries-as-one-step-span / per-attempt-span / final-error / context.Canceled / Skipped-condition-bypass / custom namer / custom attrs / canonical-attrs-win regression / parent-child relation / retry-attempt count / factory-time provider snapshot.
  • contrib/otel/doc.go, contrib/otel/README.md, root README.md updated.
  • Full openspec change at openspec/changes/contrib-otel-tracing/ (proposal + design + tasks + spec deltas under capability contrib-otel).
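
The functional-options shape and the "resolve once at factory call time" behavior can be sketched as follows. This is an illustration under stated assumptions, not the code in options.go: tracerProvider, globalProvider, withTracerProvider, withTracerName, and newInterceptor are hypothetical stand-ins, and the real package works with OpenTelemetry's TracerProvider rather than a labeled struct.

```go
package main

import "fmt"

// tracerProvider is a hypothetical stand-in for an OpenTelemetry
// TracerProvider; it is identified only by a label.
type tracerProvider struct{ label string }

// globalProvider stands in for the process-wide default provider.
var globalProvider = &tracerProvider{label: "global-v1"}

// config holds the settings that options can adjust.
type config struct {
	provider   *tracerProvider
	tracerName string
}

// Option mutates a config; this is the usual functional-options shape.
type Option func(*config)

// withTracerProvider and withTracerName are illustrative counterparts of
// two of the With* constructors named in the PR, not their real code.
func withTracerProvider(p *tracerProvider) Option {
	return func(c *config) { c.provider = p }
}

func withTracerName(name string) Option {
	return func(c *config) { c.tracerName = name }
}

// newInterceptor applies the options and resolves the provider once, at
// factory call time. The returned function reports what it resolved.
func newInterceptor(opts ...Option) func() string {
	cfg := config{tracerName: "example/tracer"} // default name is made up
	for _, opt := range opts {
		opt(&cfg)
	}
	if cfg.provider == nil {
		cfg.provider = globalProvider // snapshot taken here, not per call
	}
	return func() string {
		return cfg.tracerName + "@" + cfg.provider.label
	}
}

func main() {
	usesGlobal := newInterceptor()
	explicit := newInterceptor(
		withTracerProvider(&tracerProvider{label: "custom"}),
		withTracerName("my/tracer"),
	)

	// Swapping the global afterwards does not affect existing interceptors.
	globalProvider = &tracerProvider{label: "global-v2"}

	fmt.Println(usesGlobal())        // example/tracer@global-v1
	fmt.Println(explicit())          // my/tracer@custom
	fmt.Println(newInterceptor()()) // example/tracer@global-v2
}
```

Resolving at construction time keeps the per-interception path free of global lookups, at the cost that callers must install their provider before building the interceptors.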

Test plan

  • go test ./... from repo root passes (core unchanged).
  • cd contrib/otel && go test -race -count=1 ./... passes (18 tests + Example).
  • go vet ./... and gofmt -l . clean inside contrib/otel/.
  • cd contrib/otel && go list -deps -test=false ./... | grep -E 'otel/sdk|stdouttrace|tracetest' produces no output (SDK / exporter / tracetest stay test-only).
  • Root go.mod/go.sum byte-identical to before (git diff fca6b68c..HEAD -- go.mod go.sum empty).
  • openspec validate contrib-otel-tracing --strict reports valid.
  • Reviewer to check the package godoc renders cleanly on pkg.go.dev (after merge / tag).

Follow-ups (intentionally not in this PR; documented in contrib/otel/README.md)

  • Add a CI job that runs go test ./... (and -race) inside contrib/otel/ in addition to the root-module job.
  • Before tagging contrib/otel/v0.1.0, drop or pin the replace github.com/Azure/go-workflow => ../.. directive in contrib/otel/go.mod so go get …/contrib/otel@v0.1.0 resolves cleanly.

Xingfei Xu and others added 13 commits May 14, 2026 11:41
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements Task 3 of the contrib-otel-tracing change: NewStepInterceptor
emits exactly one OpenTelemetry span per Step lifetime (covering all
retry attempts), with workflow.step.name + workflow.step.status
attributes and codes.Error / RecordError on failure (context.Canceled
included). Skipped/Canceled-by-Condition steps bypass the chain and
produce no span.

Replaces deps_test.go: helpers_test.go now imports the SDK +
tracetest packages directly, anchoring them to the test graph.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ests

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements NewAttemptInterceptor: one OTel span per attempt, default name
"<step> (attempt N)", canonical attrs workflow.step.name and
workflow.step.attempt (int64), error path records via RecordError +
SetStatus(codes.Error). User-supplied attributes are appended but the
canonical pair always wins (last-write-wins). WithAttemptAttributes godoc
in options.go updated to document this precedence symmetric with
WithStepAttributes.
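
As a rough illustration of the naming and precedence rules in this commit message, the sketch below builds the default attempt span name and merges attributes so that the canonical pair is written last. The keyValue type and both helper functions are hypothetical; the real interceptor sets attributes on an OpenTelemetry span rather than building a map.

```go
package main

import (
	"fmt"
	"sort"
)

// keyValue is a hypothetical stand-in for an OpenTelemetry attribute.
type keyValue struct {
	Key   string
	Value any
}

// defaultAttemptSpanName follows the "<step> (attempt N)" format.
func defaultAttemptSpanName(step string, attempt int64) string {
	return fmt.Sprintf("%s (attempt %d)", step, attempt)
}

// attemptAttributes places the canonical pair after the user-supplied
// attributes and resolves duplicate keys so that later entries replace
// earlier ones (last-write-wins).
func attemptAttributes(step string, attempt int64, user []keyValue) map[string]any {
	all := append([]keyValue{}, user...)
	all = append(all,
		keyValue{Key: "workflow.step.name", Value: step},
		keyValue{Key: "workflow.step.attempt", Value: attempt},
	)
	merged := make(map[string]any, len(all))
	for _, kv := range all {
		merged[kv.Key] = kv.Value
	}
	return merged
}

func main() {
	user := []keyValue{
		{Key: "team", Value: "platform"},
		{Key: "workflow.step.attempt", Value: int64(99)}, // tries to override
	}
	fmt.Println(defaultAttemptSpanName("build", 2))

	attrs := attemptAttributes("build", 2, user)
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Printf("%s=%v\n", k, attrs[k])
	}
}
```

The user's workflow.step.attempt value of 99 is overwritten by the canonical value 2, while the unrelated team attribute is preserved.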

Tests (package otel_test, reusing helpers from step_test.go):
- TestAttemptInterceptor_OneSpanPerAttempt
- TestAttemptInterceptor_DefaultName
- TestAttemptInterceptor_FailingAttemptRecorded
- TestAttemptInterceptor_ChildOfCallerSpan
- TestAttemptInterceptor_CustomNamer
- TestAttemptInterceptor_CustomAttributes (regression: user cannot
  override workflow.step.attempt)
- TestBothLayers_AttemptIsChildOfStep: attempt span shares TraceID with step
  and has step span as parent.
- TestBothLayers_RetryAttemptCount: one step span + N attempt spans across
  retries, all in the same trace.
- TestProviderResolutionAtFactoryTime: locks in that NewStepInterceptor and
  NewAttemptInterceptor snapshot the global TracerProvider at construction
  time, not on every interception.
- Example: runnable godoc example that wires both interceptors with a
  stdouttrace exporter on a 2-step pipeline. The "// Output:" comment is
  omitted because span IDs and timestamps are non-deterministic.
- Add stdouttrace as a test-only dependency; runtime audit confirms no
  production deps on sdk/stdouttrace.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move openspec/changes/contrib-otel-tracing/ to archive/2026-05-15-* and
copy the delta spec into openspec/specs/contrib-otel/spec.md so the new
capability is part of the main spec set.

@XiangyuKuangMSFT XiangyuKuangMSFT left a comment


LGTM

@xuxife xuxife added this pull request to the merge queue May 15, 2026
Merged via the queue into main with commit 6221bd7 May 15, 2026
2 checks passed