Skip to content

agents,lib,src,test: add traceSampleRate support#430

Open
santigimeno wants to merge 1 commit intonode-v24.x-nsolid-v6.xfrom
santi/tracing_rate
Open

agents,lib,src,test: add traceSampleRate support#430
santigimeno wants to merge 1 commit intonode-v24.x-nsolid-v6.xfrom
santi/tracing_rate

Conversation

@santigimeno
Copy link
Member

@santigimeno santigimeno commented Mar 3, 2026

Add end-to-end traceSampleRate handling across config, runtime propagation, tracing decisions, and regression tests.

Why:

  • Enable configurable probabilistic trace sampling with predictable behavior.
  • Ensure consistent semantics across all config entry points.
  • Prevent invalid updates from corrupting current sampling behavior.
  • Keep transaction consistency by deciding sampling at the root span only.

What changed:

  • Added traceSampleRate parsing and normalization in JS config paths with explicit default fallback and finite/range validation in [0, 1].
  • Added native config sanitization for traceSampleRate to reject invalid values before merge, preserving previous valid configuration.
  • Ensured runtime sampling state is synchronized from effective current config after updates to avoid stale shared-memory sample rates.
  • Added gRPC reconfigure support for traceSampleRate in proto and agent mapping, including generated protobuf updates.
  • Updated tracing logic so root spans perform the sampling decision and child spans inherit parent traceFlags.
  • Extended tests for:
    • invalid value handling (including NaN/Infinity)
    • env/package bootstrap behavior
    • partial updates preserving existing traceSampleRate
    • gRPC invalid-update fallback behavior
    • sampling behavior at 0%, 50% (tolerance), and 100%
    • worker-thread sampling behavior
    • explicit parent/child trace consistency assertions

Summary by CodeRabbit

Release Notes

  • New Features
    • Added trace sampling rate feature for probabilistic sampling control of root spans. Configuration supported via environment variables, package.json, or runtime API. Valid values range from 0 to 1; invalid values are rejected automatically.

Add end-to-end traceSampleRate handling across config, runtime
propagation, tracing decisions, and regression tests.

Why:
- Enable configurable probabilistic trace sampling with predictable
  behavior.
- Ensure consistent semantics across all config entry points.
- Prevent invalid updates from corrupting current sampling behavior.
- Keep transaction consistency by deciding sampling at the root span
  only.

What changed:
- Added traceSampleRate parsing and normalization in JS config paths
  with explicit default fallback and finite/range validation in [0, 1].
- Added native config sanitization for traceSampleRate to reject invalid
  values before merge, preserving previous valid configuration.
- Ensured runtime sampling state is synchronized from effective current
  config after updates to avoid stale shared-memory sample rates.
- Added gRPC reconfigure support for traceSampleRate in proto and agent
  mapping, including generated protobuf updates.
- Updated tracing logic so root spans perform the sampling decision and
  child spans inherit parent traceFlags.
- Extended tests for:
  - invalid value handling (including NaN/Infinity)
  - env/package bootstrap behavior
  - partial updates preserving existing traceSampleRate
  - gRPC invalid-update fallback behavior
  - sampling behavior at 0%, 50% (tolerance), and 100%
  - worker-thread sampling behavior
  - explicit parent/child trace consistency assertions
@santigimeno santigimeno requested a review from RafaelGSS March 3, 2026 16:01
@santigimeno santigimeno self-assigned this Mar 3, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 3, 2026

Walkthrough

This PR introduces probabilistic trace sampling control to the NSolid/OpenTelemetry integration. A new traceSampleRate configuration parameter (0–1 range) controls root span sampling, propagated through protocol buffers, the gRPC reconfiguration API, runtime environment management, and the JavaScript tracing layer. Comprehensive tests verify configuration precedence, validation, and sampling behavior.

Changes

Cohort / File(s) Summary
Protocol Buffer Schema & Generation
agents/grpc/proto/reconfigure.proto, agents/grpc/src/proto/reconfigure.pb.cc, agents/grpc/src/proto/reconfigure.pb.h
Added new optional traceSampleRate field (double, field 14) to ReconfigureBody message. Updated generated C++ code with serialization, deserialization, has-bit tracking, and field accessors.
gRPC Agent Integration
agents/grpc/src/grpc_agent.cc
Added serialization of traceSampleRate from incoming reconfigure requests to output config, enabling propagation of trace sampling settings through the gRPC channel.
JavaScript Configuration & OpenTelemetry
lib/nsolid.js, lib/internal/otel/trace.js
Introduced traceSampleRate config parameter with validation (0–1 range). Root spans now sampled probabilistically based on binding rate; child spans inherit parent flags. Added parseTraceSampleRate() helper with undefined fallback for invalid values.
Runtime Environment API
src/nsolid/nsolid_api.cc, src/nsolid/nsolid_api.h
Added trace_sample_rate_ member to EnvInst; exposed via new Float64Array binding. Introduced validation (validate_trace_sample_rate), config merging, and propagation (update_tracing_sample_rate) on EnvList. Invalid rates are sanitized before applying.
Test Fixtures
test/fixtures/nsolid-trace-sample-rate-package.json, test/fixtures/test-nsolid-config-trace-sample-rate-env-script.js
Added JSON fixture with sample traceSampleRate: 0.7 and a startup script that exports the runtime's configured trace sampling rate.
Integration & Validation Tests
test/parallel/test-nsolid-config-trace-sample-rate.js, test/parallel/test-nsolid-config-trace-sample-rate-env.js, test/parallel/test-nsolid-trace-sample-rate-sampling.js, test/addons/nsolid-tracing/test-otel-basic2.js, test/agents/test-grpc-reconfigure.mjs
Added tests for configuration via environment variables, package.json precedence, invalid-rate rejection, sampling count verification across worker threads, and child–parent span flag consistency.

Sequence Diagram

sequenceDiagram
    participant App as JavaScript App
    participant Config as nsolid.config
    participant gRPC as gRPC Agent
    participant EnvList as EnvList (C++)
    participant Tracer as OpenTelemetry Tracer
    participant Span as Root Span

    App->>Config: updateConfig({traceSampleRate: 0.5})
    Config->>Config: parseTraceSampleRate(0.5)
    Config->>gRPC: serialize & send reconfigure
    gRPC->>EnvList: validate_trace_sample_rate(config)
    EnvList->>EnvList: sanitize & check [0, 1]
    EnvList->>EnvList: update_tracing_sample_rate(0.5)
    EnvList->>EnvList: store in trace_sample_rate_
    EnvList->>Config: propagate via binding
    Note over Config: binding.trace_sample_rate = 0.5

    App->>Tracer: create root span (no parent)
    Tracer->>Span: sample decision: MathRandom() < 0.5?
    alt Sampled (50% probability)
        Span->>Span: spanContext.flags = SAMPLED
    else Not Sampled (50% probability)
        Span->>Span: spanContext.flags = NOT_SAMPLED
    end
    Span-->>Tracer: return span
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Suggested reviewers

  • RafaelGSS
  • EHortua

Poem

🐰 Hops through traces with jubilation,
Sampling spans with precision inflation,
From 0 to 1, the rate now flows,
Root and child—tracing wisdom grows!
Configured, validated, propagated bright,
The telemetry shines with just the right light!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.88% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and directly summarizes the main change: adding traceSampleRate support across the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch santi/tracing_rate

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
test/agents/test-grpc-reconfigure.mjs (1)

171-208: Consider adding NaN/Infinity invalid-rate cases in this gRPC path test.

This block currently checks only out-of-range finite numbers. Extending it with Number.NaN and ±Infinity would better guard the gRPC reconfigure edge cases already covered in non-gRPC tests.

✅ Minimal test extension
-        const invalidRates = [2, -0.5];
+        const invalidRates = [2, -0.5, Number.NaN, Number.POSITIVE_INFINITY, Number.NEGATIVE_INFINITY];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/agents/test-grpc-reconfigure.mjs` around lines 171 - 208, Update the
test "should preserve previous traceSampleRate for invalid values" to include
NaN and Infinity cases: add Number.NaN, Number.POSITIVE_INFINITY and
Number.NEGATIVE_INFINITY (or +/-Infinity) to the invalidRates array used with
grpcServer.reconfigure and client.config assertions so
grpcServer.reconfigure(agentId, { traceSampleRate: invalidRate }) is exercised
for NaN/Infinity and the existing assert.strictEqual checks that
nsolidConfig.traceSampleRate remained 0.4 still apply.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agents/grpc/src/grpc_agent.cc`:
- Around line 1761-1763: PopulateReconfigureEvent currently omits the
traceSampleRate field when building the outgoing reconfigure event body; update
PopulateReconfigureEvent to set body.traceSampleRate from the internal config so
the outgoing event mirrors inbound updates (i.e., when you previously read
body.tracesamplerate() on incoming updates, ensure PopulateReconfigureEvent
calls the corresponding setter to populate body.traceSampleRate in the event).
Locate the PopulateReconfigureEvent function in grpc_agent.cc and add the
traceSampleRate assignment using the same source/field used for other mapped
settings so reconfigure responses include traceSampleRate.

In `@lib/nsolid.js`:
- Around line 1144-1154: The parseTraceSampleRate function currently coerces
unintended types via unary +; update parseTraceSampleRate to only accept inputs
that are typeof 'number' or 'string', explicitly reject booleans and other
types, trim string inputs and return undefined for empty/whitespace-only
strings, then convert the trimmed numeric string (or the number input) to a
numeric value and validate with NumberIsFinite and range checks (>=0 && <=1);
ensure the function returns undefined for invalid types, whitespace-only
strings, NaN, non-finite numbers, or values outside the 0–1 range while
returning the numeric rate for valid inputs.

In `@test/parallel/test-nsolid-config-trace-sample-rate-env.js`:
- Around line 15-18: The test currently spreads process.env into the child env
(env: {...process.env, ...envVars}), which can leak ambient NSOLID_* variables
and make the test nondeterministic; update the env construction in
test/parallel/test-nsolid-config-trace-sample-rate-env.js to explicitly filter
out any keys starting with "NSOLID_" (e.g., build a filteredEnv =
Object.fromEntries(Object.entries(process.env).filter(([k]) =>
!k.startsWith('NSOLID_'))) and then use env: { ...filteredEnv, ...envVars } so
the child process is isolated from ambient NSOLID_* variables while still
allowing envVars to override.

---

Nitpick comments:
In `@test/agents/test-grpc-reconfigure.mjs`:
- Around line 171-208: Update the test "should preserve previous traceSampleRate
for invalid values" to include NaN and Infinity cases: add Number.NaN,
Number.POSITIVE_INFINITY and Number.NEGATIVE_INFINITY (or +/-Infinity) to the
invalidRates array used with grpcServer.reconfigure and client.config assertions
so grpcServer.reconfigure(agentId, { traceSampleRate: invalidRate }) is
exercised for NaN/Infinity and the existing assert.strictEqual checks that
nsolidConfig.traceSampleRate remained 0.4 still apply.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 369ce34 and 5048dab.

📒 Files selected for processing (15)
  • agents/grpc/proto/reconfigure.proto
  • agents/grpc/src/grpc_agent.cc
  • agents/grpc/src/proto/reconfigure.pb.cc
  • agents/grpc/src/proto/reconfigure.pb.h
  • lib/internal/otel/trace.js
  • lib/nsolid.js
  • src/nsolid/nsolid_api.cc
  • src/nsolid/nsolid_api.h
  • test/addons/nsolid-tracing/test-otel-basic2.js
  • test/agents/test-grpc-reconfigure.mjs
  • test/fixtures/nsolid-trace-sample-rate-package.json
  • test/fixtures/test-nsolid-config-trace-sample-rate-env-script.js
  • test/parallel/test-nsolid-config-trace-sample-rate-env.js
  • test/parallel/test-nsolid-config-trace-sample-rate.js
  • test/parallel/test-nsolid-trace-sample-rate-sampling.js

Comment on lines +1761 to +1763
if (body.has_tracesamplerate()) {
out["traceSampleRate"] = body.tracesamplerate();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

traceSampleRate is accepted but not returned in reconfigure event body.

Lines 1761-1763 map inbound updates, but PopulateReconfigureEvent does not set body.traceSampleRate. This makes reconfigure responses incomplete even when the update is applied.

💡 Suggested fix
diff --git a/agents/grpc/src/grpc_agent.cc b/agents/grpc/src/grpc_agent.cc
@@ void PopulateReconfigureEvent(grpcagent::ReconfigureEvent* reconfigure_event,
   it = config.find("assetsEnabled");
   if (it != config.end()) {
     body->set_assetsenabled(*it);
   }
+  it = config.find("traceSampleRate");
+  if (it != config.end()) {
+    body->set_tracesamplerate(*it);
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/grpc/src/grpc_agent.cc` around lines 1761 - 1763,
PopulateReconfigureEvent currently omits the traceSampleRate field when building
the outgoing reconfigure event body; update PopulateReconfigureEvent to set
body.traceSampleRate from the internal config so the outgoing event mirrors
inbound updates (i.e., when you previously read body.tracesamplerate() on
incoming updates, ensure PopulateReconfigureEvent calls the corresponding setter
to populate body.traceSampleRate in the event). Locate the
PopulateReconfigureEvent function in grpc_agent.cc and add the traceSampleRate
assignment using the same source/field used for other mapped settings so
reconfigure responses include traceSampleRate.

Comment on lines +1144 to +1154
function parseTraceSampleRate(value) {
if (value === undefined || value === null || value === '') {
return undefined;
}

const normalized = +value;
if (!NumberIsFinite(normalized) || normalized < 0 || normalized > 1) {
return undefined;
}

return normalized;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

JavaScript unary plus operator coercion behavior true false empty string symbols

💡 Result:

In JavaScript, unary plus (+x) coerces its operand using ToNumber (spec: it evaluates the operand, gets its value, then applies ToNumber). [3]

Common coercions

Operand Result of +operand Notes
true 1 Boolean → number coercion [1]
false 0 Boolean → number coercion [1]
"" (empty string) 0 String numeric coercion; empty string becomes 0 [1]
Symbol("x") (any Symbol) throws TypeError ToNumber throws on Symbol (and also on BigInt) [2]

Examples: +true === 1, +false === 0, +"" === 0, +"hello" === NaN, +Symbol() throws TypeError. [1][2]

Sources:
[1] MDN: Unary plus (+) operator coercion examples and behavior
[2] ECMAScript spec: ToNumber throws on Symbol (and BigInt)
[3] ECMAScript spec: unary + evaluation uses ToNumber(GetValue(...))


Reject non-numeric, non-string inputs and properly validate whitespace in traceSampleRate.

+value coerces unintended inputs (true1, false0) and whitespace-only strings (e.g., ' '0) into valid rates, causing invalid configuration to silently alter sampling instead of being rejected. Type-check to accept only numbers and strings, trim and reject empty/whitespace-only strings, and reject booleans and other types.

Suggested fix
 const {
   ArrayIsArray,
   Date,
   DateNow,
   JSONParse,
   JSONStringify,
   NumberIsFinite,
   NumberParseInt,
   ObjectAssign,
   ObjectDefineProperty,
   ObjectGetOwnPropertyNames,
   ObjectPrototype,
+  StringPrototypeTrim,
 } = primordials;
 ...
 function parseTraceSampleRate(value) {
-  if (value === undefined || value === null || value === '') {
+  if (value === undefined || value === null) {
     return undefined;
   }
 
-  const normalized = +value;
+  if (typeof value !== 'number' && typeof value !== 'string') {
+    return undefined;
+  }
+  if (typeof value === 'string' && StringPrototypeTrim(value) === '') {
+    return undefined;
+  }
+
+  const normalized = Number(value);
   if (!NumberIsFinite(normalized) || normalized < 0 || normalized > 1) {
     return undefined;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/nsolid.js` around lines 1144 - 1154, The parseTraceSampleRate function
currently coerces unintended types via unary +; update parseTraceSampleRate to
only accept inputs that are typeof 'number' or 'string', explicitly reject
booleans and other types, trim string inputs and return undefined for
empty/whitespace-only strings, then convert the trimmed numeric string (or the
number input) to a numeric value and validate with NumberIsFinite and range
checks (>=0 && <=1); ensure the function returns undefined for invalid types,
whitespace-only strings, NaN, non-finite numbers, or values outside the 0–1
range while returning the numeric rate for valid inputs.

Comment on lines +15 to +18
env: {
...process.env,
...envVars,
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Isolate child test env from ambient NSOLID_* variables.

Inheriting process.env directly can make this test nondeterministic if the parent already exports NSOLID_TRACE_SAMPLE_RATE or NSOLID_PACKAGE_JSON.

Suggested hardening
 function runWithEnv(envVars) {
+  const baseEnv = { ...process.env };
+  delete baseEnv.NSOLID_TRACE_SAMPLE_RATE;
+  delete baseEnv.NSOLID_PACKAGE_JSON;
+
   const result = spawnSync(process.execPath, [script], {
     env: {
-      ...process.env,
+      ...baseEnv,
       ...envVars,
     },
     encoding: 'utf8',
   });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
env: {
...process.env,
...envVars,
},
function runWithEnv(envVars) {
const baseEnv = { ...process.env };
delete baseEnv.NSOLID_TRACE_SAMPLE_RATE;
delete baseEnv.NSOLID_PACKAGE_JSON;
const result = spawnSync(process.execPath, [script], {
env: {
...baseEnv,
...envVars,
},
encoding: 'utf8',
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/parallel/test-nsolid-config-trace-sample-rate-env.js` around lines 15 -
18, The test currently spreads process.env into the child env (env:
{...process.env, ...envVars}), which can leak ambient NSOLID_* variables and
make the test nondeterministic; update the env construction in
test/parallel/test-nsolid-config-trace-sample-rate-env.js to explicitly filter
out any keys starting with "NSOLID_" (e.g., build a filteredEnv =
Object.fromEntries(Object.entries(process.env).filter(([k]) =>
!k.startsWith('NSOLID_'))) and then use env: { ...filteredEnv, ...envVars } so
the child process is isolated from ambient NSOLID_* variables while still
allowing envVars to override.

Copy link
Member

@RafaelGSS RafaelGSS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RSLGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants