feat: add request layer lifecycle TracingChannels#196
feat: add request layer lifecycle TracingChannels#196logaretm wants to merge 11 commits intopillarjs:masterfrom
Conversation
Add a single `express:request` TracingChannel that emits structured
lifecycle events (start, end, asyncStart, asyncEnd, error) for every
middleware, route handler, and error handler execution.
Context shape: { req, res, layer } where layer is the Layer instance,
giving subscribers access to layer.name, layer.handle.length (for error
handler detection), and layer.route. Consumers decide how to classify
and name spans based on these properties.
Route dispatch wrapper layers (internal glue that calls route.dispatch)
are excluded from tracing to avoid duplicate events. The actual user
handlers inside the route are traced individually.
Zero overhead when no subscribers are registered. The hasSubscribers
check gates all context allocation and tracePromise wrapping.
Refs: pillarjs#96
Refs: expressjs/express#6353
Extract handlePromise and unify the traced/untraced paths into a single try/catch and .then handler.
Express TracingChannel PR opened: pillarjs/router#196
There was a problem hiding this comment.
Pull request overview
Adds request-layer lifecycle instrumentation using Node.js diagnostics_channel TracingChannel, emitting structured events for middleware, route handlers, and error handlers to support APM-style tracing with context propagation.
Changes:
- Add
express:requestTracingChannel integration inLayerrequest/error handling. - Introduce shared invocation helper to wrap handler execution and promise handling with tracing hooks.
- Add a new test suite validating emitted events, context shape, and basic ordering/behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lib/layer.js | Adds TracingChannel wiring and refactors handler invocation through a tracing-aware helper. |
| test/tracing.js | New mocha tests covering tracing context, handler types, and lifecycle event expectations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Qard
left a comment
There was a problem hiding this comment.
Generally LGTM. I think we might want to validate if handled errors are still captured on the span prior to the handler though.
| }) | ||
|
|
||
| describe('error handler tracing', function () { | ||
| it('should trace error handlers (fn.length === 4)', function (done) { |
There was a problem hiding this comment.
Should this one capture the error even though it is considered handled overall?
There was a problem hiding this comment.
Not entirely sure, APMs will usually have a global error handler or whatever to catch them so having them traced might be an overkill and may duplicate error emissions for those who don't dedup them.
WDYT?
There was a problem hiding this comment.
But does it having a handler here result in the error not being present on the span for the layer that lead to the error handler? Whatever the behaviour is, I think it would be useful to validate it more explicitly in this particular test. We should have a clear model of what errors happen where.
There was a problem hiding this comment.
Thanks! I added more tests to pin down various behaviors. Here is how errors flow right now:
No error handler (no app.use):
- Route throws sync ->
errorchannel publishes for the route layer. Router responds 500. - Route promise rejects ->
errorchannel publishes for the route layer. Router responds 500.
With error handler:
- Route throws sync ->
errorchannel publishes for the route, handler runs, nothing on theerrorchannel for the handler. - Route rejects async -> same,
errorchannel publishes for the route, handler clean.
With error handler that also throws/rejects:
- Route throws sync, handler throws ->
errorchannel publishes twice, once per layer. Two legitimately failed executions. - Route rejects async, handler throws -> same.
For an APM: both throw cases produce two error events on the channel — one with ctx.layer pointing at the route, one at the error handler. Each event targets its own span, so the APM just marks both spans as failed.
In the case for next(err) then the child error handler span will be marked, not the parent one. Which is fine I suppose since it technically recovered as you've said.
Tests in 1ad1b9c.
30c87c5 to
1ad1b9c
Compare
Qard
left a comment
There was a problem hiding this comment.
Hmm...this handling seems a bit incorrect. If an error is unhandled it should definitely be reported. And even if it is handled we should still probably be reporting it so we can capture cases like a too permissive error handler silencing legitimate failures.
I think probably what we should be doing is always capturing the error and just adding some handled flag to the context to mark when it has been handled somehow. That way we can still report it with the detail that the application thinks it has been handled while still allowing human judgement to look at the span and see if that assessment actually makes sense.
| }) | ||
| }) | ||
|
|
||
| it('should not emit error on the originating layer when next(err) is unhandled', function (done) { |
There was a problem hiding this comment.
This seems problematic. Unhandled errors should definitely always be observable.
There was a problem hiding this comment.
Okay agreed, this could've become a real gap and it's always easier to dedup rather than miss events.
I pushed a fix to that in 90ed7e2 and adjusted the tests to pin that behavior.
Qard
left a comment
There was a problem hiding this comment.
I think it's in a reasonable state now. Might be good to also get a review from @rochdev or @Flarna for some more perspectives from other APM vendors. The specific error semantics I think might be a bit up to interpretation how they should be handled, but I think this shape should generally provide the context that is needed for the different ways we think about route errors. 🤔
| * @private | ||
| */ | ||
|
|
||
| const requestChannel = dc.tracingChannel && dc.tracingChannel('express:request') |
There was a problem hiding this comment.
Why is express:request channel used here?
I think it would be better to use a dedicated channel for router to allow consumers to decide on what to receive and how to handle it.
I think router component can be also used without express.
There was a problem hiding this comment.
Good point, I don't think it has to be express prefixed but it needs a unique prefix for sure which is why I wanted some opinions on what it should be. My concern is router:request is too generic and can easily conflict with other names in the ecosystem.
I had this in the PR description:
The channel name right now is express:request, in the original PR it was request but I think channel names need to have a relationship with the module name otherwise we risk duplication in the ecosystem.
WDYT? Maybe pillarjs:router:request?
There was a problem hiding this comment.
There seems to be a guideline.
So I guess the string to use would be pillarjs.router.request as tracing: prefix and event name postfix is added by dc.tracingChannel().
There was a problem hiding this comment.
Doesn't seem like there is a consistent standard, docs do suggest that but others just went with lib:something:action formats.
Anyways I have no strong opinions here and I think as long as the name is unique enough, it is arbitrary. I renamed it to your suggestion 👍
Follows the Node.js TracingChannel naming guideline (dot-separated, module-scoped) and avoids the express-specific prefix since router is usable outside express.
|
The actual data published by the tracing channel is the If I understand the router module correct Also I don't find in the docs here that In general I would love some section in the docs describing the diag channel and it's emitted data. |
I can't answer that as it usually up to maintainers to decide this but I would say it does imply a public API contract and should be subject to smever. As for the The
I think we have 3 options here:
I think that would be up to the maintainers to decide, but all options work for us. |
This PR adds a single tracing channel that emits structured lifecycle events (start, end, asyncStart, asyncEnd, error) for every middleware, route handler, and error handler execution.
This builds on #96 but uses tracing channels instead which enables context propagation out of the box.
Context shape
The context object is pretty minimal, providing both
reqandresenables APMs to figure out how to react to that specific payload.The
layeris the Layer instance. Subscribers uselayer.name,layer.handle.length(4 for error handlers), andreq.route(set when inside a matched route) to classify and name spans however they see fit.Usage Example:
This is an example I've set up locally using a Sentry setup with this PR patched in:
This produces a waterfall view looking like this:
Note: The three spans that look the same at the bottom actually represent the three handlers executing for that route, it is up to the subscriber to decide which one is the route handler and how to name the other ones since it has access to
layer.Related work
Caveats
pillarjs.router.request, in the original PR it wasrequestbut I think channel names need to have a relationship with the module name otherwise we risk duplication in the ecosystem.hasSubscriberswill assume that there is a subscriber and will execute the traced path (with context object allocation). Multiple libraries likeredisandmysql2accepted this limitation.tracingChannelsupport will just not offer tracing of any kind. The fallback here is NOOP.tracingChannelwill be skipped viadescribe.skip.