22 changes: 22 additions & 0 deletions .changeset/async-latency-causes.md
@@ -0,0 +1,22 @@
---
"@lanterna-profiler/core": minor
"@lanterna-profiler/detectors": minor
"@lanterna-profiler/cli": patch
---

Make async profiling pinpoint *which* code is slow, *what* the latency is, and *why*.

- Decompose async latency: capture `firstRunAtMs` (when the operation's callback first ran, the input to the scheduling-delay calculation) and derive `waitMs` (time spent waiting, not on CPU) and `scheduleDelayMs` per operation, plus per-family latency percentiles (`summary.byKindLatency`).
- Classify the root cause of each operation's latency (`latencyCause` + `causeConfidence` + `causeEvidence`) by overlapping its wait windows with event-loop stalls, GC pauses, and downstream async activity, and by taking into account its I/O kind and any CPU-bound execution.
- Improve "which code" attribution: when an operation's own stack has no user frame, inherit the nearest one via the trigger ancestry (`attributedFrameOrigin`), and raise the default init-stack capture depth.
- Make CPU→async attribution more precise and honest: attribute samples in overlapping ancestor/descendant run windows to the innermost async context instead of dropping them, grade CPU-attribution confidence by the unrelated-overlap ratio instead of collapsing on the first ambiguous sample, and report a real `clockSyncUncertaintyMs` (CDP jitter / clock resolution) instead of a placeholder.
- Enrich the `long-await` finding with the latency decomposition and cause-specific guidance, and add a new `event-loop-blocked-async` detector that ties a slow async operation to the synchronous frame blocking the event loop.
- Reliability refinements: classify long-lived idle resources as a distinct `background` latency cause (instead of mis-reading their incidental stall overlap as `event-loop-blocked`); when the event-loop heartbeat is unavailable, mark an `unknown` cause with `causeEvidence.basis = "no-eventloop-signal"` and add a quality reason, so missing signal is not conflated with "no problem"; and under `--async-max-events` pressure, evict the shortest-duration completed record instead of FIFO so the slow/long operations that matter for latency survive.
- Cause-classification hardening (from an empirical audit against real targets):
  - GC-pause overlap now uses each GC event's **actual pause duration** instead of a ±20ms padded window. Padding made dense sub-millisecond scavenges tile the whole timeline and blanket nearly every wait with a spurious ~100% GC overlap, so most operations (even on event-loop-blocked or I/O workloads) were mislabelled `gc-pause`. `gc-pause` is now correctly rare.
  - The documented priority is now actually applied: a blocked event loop outranks a coincidental GC/downstream overlap rather than losing to whichever signal had the higher raw percentage.
  - `event-loop-blocked` now requires the loop to have **still been stalled when the callback became runnable** (around `firstRunAtMs`); a stall that ended well before the operation ran is treated as a coincidental overlap, eliminating false `event-loop-blocked` labels on genuinely slow I/O whose wait merely spans an unrelated stall.
  - Orphans (resources still in flight at capture end) are excluded from `topOperations` and `summary.byKindLatency`, because their capture-clamped, fictional duration was dominating the ranking and skewing the percentiles; they remain reported in `orphans[]`.
  - The `event-loop-blocked-async` detector stands down when no CPU hotspot identifies a culprit frame, instead of emitting a critical finding anchored at a placeholder `(event-loop)` location.
  - Persistent/multiplexed handles (keep-alive sockets, HTTP parsers, pools, intervals) that activated more than once and stayed alive for roughly the whole capture are now classified `background` instead of having their capture-length aggregate `waitMs` reported as a single `event-loop-blocked`/`long-await` finding. The `runCount > 1` discriminator preserves genuine single long operations (a discrete delayed callback runs at most once). Validated on a real HTTP server under load: the blocking `pbkdf2Sync` handler is still correctly surfaced, without the misleading multi-second findings on keep-alive connections.
  - The `event-loop-blocked-async` detector now attributes the blocking frame **per stall** instead of stamping the single globally dominant CPU hotspot on every blocked op. `profiles.cpu.eventLoop.stallIntervals[]` gains an optional `topFrame` (the user frame that dominated CPU during that specific stall), and the detector matches each delayed op to the stall active when its callback became runnable (`firstRunAtMs`), falling back to the global hotspot only when no stall matches. With several distinct blocking call sites, each delayed operation now points at its own culprit.
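
A minimal TypeScript sketch of how the cause-classification rules above compose. It is an illustration under stated assumptions, not Lanterna's implementation: the `Interval`, `OpTiming`, and `Signals` shapes, the `other` fallback label, and the three tuning constants (100 ms stall slack, 50% overlap threshold, 90%-of-capture lifetime for `background`) are all hypothetical.

```typescript
// Hypothetical shapes for illustration; Lanterna's real report schema may differ.
interface Interval {
  startMs: number;
  endMs: number;
}

interface OpTiming {
  createdAtMs: number;
  firstRunAtMs: number; // when the operation's callback first ran
  endedAtMs: number;
  runCount: number; // how many times the callback ran
  waitWindows: Interval[]; // spans where the operation waited off-CPU
}

interface Signals {
  captureMs: number;
  stalls: Interval[] | null; // null when the event-loop heartbeat is unavailable
  gcPauses: Interval[]; // actual pause spans, no padding
}

// "other" is a hypothetical catch-all for causes this sketch does not model.
type Cause = "background" | "unknown" | "event-loop-blocked" | "gc-pause" | "other";

// Assumed tuning values, not Lanterna's real thresholds.
const STALL_SLACK_MS = 100;
const OVERLAP_THRESHOLD = 0.5;
const BACKGROUND_CAPTURE_FRACTION = 0.9;

function overlapMs(a: Interval, b: Interval): number {
  return Math.max(0, Math.min(a.endMs, b.endMs) - Math.max(a.startMs, b.startMs));
}

// Fraction of total wait time covered by the signal intervals
// (assumes the signal intervals do not overlap one another).
function overlapRatio(waits: Interval[], signal: Interval[]): number {
  let total = 0;
  let covered = 0;
  for (const w of waits) {
    total += w.endMs - w.startMs;
    for (const s of signal) covered += overlapMs(w, s);
  }
  return total > 0 ? Math.min(1, covered / total) : 0;
}

function classifyCause(op: OpTiming, sig: Signals): Cause {
  // Repeatedly activated handles alive for roughly the whole capture are background;
  // a discrete delayed callback runs at most once, so runCount > 1 is required.
  const lifetimeMs = op.endedAtMs - op.createdAtMs;
  if (op.runCount > 1 && lifetimeMs >= BACKGROUND_CAPTURE_FRACTION * sig.captureMs) {
    return "background";
  }
  // A missing heartbeat means "unknown", not "no problem".
  const stalls = sig.stalls;
  if (stalls === null) return "unknown";
  // A blocked loop outranks GC, but only if a stall was still active when the
  // callback became runnable; a stall that ended well before is coincidental.
  const stillStalled = stalls.some(
    (s) => s.startMs <= op.firstRunAtMs && s.endMs >= op.firstRunAtMs - STALL_SLACK_MS,
  );
  if (stillStalled && overlapRatio(op.waitWindows, stalls) >= OVERLAP_THRESHOLD) {
    return "event-loop-blocked";
  }
  // GC overlap uses real pause spans, so dense sub-millisecond scavenges stay small.
  if (overlapRatio(op.waitWindows, sig.gcPauses) >= OVERLAP_THRESHOLD) return "gc-pause";
  return "other";
}

const signals: Signals = {
  captureMs: 10000,
  stalls: [{ startMs: 1100, endMs: 1495 }],
  gcPauses: [{ startMs: 1200, endMs: 1200.4 }],
};
const op: OpTiming = {
  createdAtMs: 1000,
  firstRunAtMs: 1500,
  endedAtMs: 1510,
  runCount: 1,
  waitWindows: [{ startMs: 1000, endMs: 1500 }],
};
console.log(classifyCause(op, signals)); // event-loop-blocked
```

The check order encodes the documented priority: `background` and the missing-signal case are settled first, and a still-active stall is tested before GC overlap, so a coincidental GC pause cannot outrank a blocked loop.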
11 changes: 6 additions & 5 deletions README.md
@@ -28,15 +28,15 @@ Most Node.js profilers were designed for a human staring at a flamegraph. That's
Lanterna takes a different stance:

- **Structured JSON, not pixels.** The `LanternaReport` is a stable schema — hotspots, allocators, async chains, GC pauses, event-loop lag, and findings — that an agent can read, correlate, and act on directly.
- **Detectors, not just data.** 18 built-in detectors emit categorized `findings` (sync crypto, blocking I/O, deopt loops, memory growth, orphan async resources, …) with `confidence` and `proofLevel` so consumers know when to trust a hypothesis vs. require corroboration.
- **Detectors, not just data.** 19 built-in detectors emit categorized `findings` (sync crypto, blocking I/O, deopt loops, memory growth, orphan async resources, …) with `confidence` and `proofLevel` so consumers know when to trust a hypothesis vs. require corroboration.
- **CPU + memory + async in one capture.** Combine kinds in a single run; cross-kind detectors like `alloc-in-hot-path` and `hot-async-context` surface the highest-priority fixes (something flamegraph tools can't represent).
- **Spawn or attach.** Profile a CLI, a server under load, or a live production process — same report shape, same detector surface.
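
A `cpu + async` detector like `hot-async-context` needs CPU samples attributed to async contexts. The sketch below shows one way to apply an innermost-context rule, assuming each run window carries an `asyncId` and a nesting `depth` derived from trigger ancestry (hypothetical fields and function names, not Lanterna's schema): when a sample falls inside several overlapping windows, it goes to the deepest one instead of being dropped.

```typescript
// Hypothetical shapes; Lanterna's real data model may differ.
interface RunWindow {
  asyncId: number;
  depth: number; // nesting depth in the trigger ancestry (root = 0)
  startMs: number;
  endMs: number;
}

interface Attribution {
  byContext: Record<number, number>; // asyncId -> attributed sample count
  unattributed: number; // samples outside every run window
}

// Attribute each CPU sample timestamp to the innermost (deepest) run window
// that contains it, instead of dropping samples where windows overlap.
function attributeSamples(sampleTimesMs: number[], windows: RunWindow[]): Attribution {
  const byContext: Record<number, number> = {};
  let unattributed = 0;
  for (const t of sampleTimesMs) {
    let best: RunWindow | undefined;
    for (const w of windows) {
      if (t >= w.startMs && t < w.endMs && (best === undefined || w.depth > best.depth)) {
        best = w;
      }
    }
    if (best === undefined) {
      unattributed += 1;
    } else {
      byContext[best.asyncId] = (byContext[best.asyncId] ?? 0) + 1;
    }
  }
  return { byContext, unattributed };
}

const windows: RunWindow[] = [
  { asyncId: 1, depth: 0, startMs: 0, endMs: 100 }, // parent context
  { asyncId: 7, depth: 1, startMs: 40, endMs: 60 }, // child nested inside the parent's window
];
const { byContext, unattributed } = attributeSamples([10, 45, 50, 90, 150], windows);
console.log(byContext[1], byContext[7], unattributed); // 2 2 1
```

Picking the deepest window assumes overlapping windows belong to ancestor/descendant contexts; windows from unrelated contexts that merely overlap in time would need a separate ambiguity check.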

### Compared to other Node.js profilers

| Tool | Primary output | CPU | Memory | Async | Findings / detectors | Agent-friendly |
| --- | --- | :-: | :-: | :-: | :-: | :-: |
| **Lanterna** | Structured JSON (+ text/markdown/agent renderers) | ✅ | ✅ | ✅ (experimental) | ✅ 18 built-in, pluggable | ✅ |
| **Lanterna** | Structured JSON (+ text/markdown/agent renderers) | ✅ | ✅ | ✅ (experimental) | ✅ 19 built-in, pluggable | ✅ |
| `node --prof` / `--cpu-prof` | V8 isolate log / `.cpuprofile` | ✅ | — | — | — | ⚠️ raw, post-processing required |
| [0x](https://github.com/davidmarkclements/0x) | HTML flamegraph | ✅ | — | — | — | ❌ |
| [Clinic.js](https://github.com/clinicjs/node-clinic) (Doctor / Flame / Bubbleprof) | HTML dashboards | ✅ | ⚠️ via Doctor | ⚠️ via Bubbleprof | ⚠️ heuristic recommendations | ❌ |
@@ -54,7 +54,7 @@ Lanterna is the right fit when the consumer of the report is **an agent or an au
- **Two capture modes** — `lanterna run` to spawn & profile a command, `lanterna attach` to connect to a live process via the inspector. `lanterna ps` lists live `node`/`nodejs` processes (table or JSON) when you need to find a PID first.
- **Three profile kinds** — opt in with `--kind`: `cpu` (V8 sampling profiler, default), `memory` (heap allocation profile + RSS series), and `async` (experimental async-resource profiling). Combine kinds by repeating `--kind` (`--kind cpu --kind memory`) or using commas (`--kind cpu,memory`).
- **Enriched `LanternaReport`** — categorized hotspots, hot stacks, GC pauses, event-loop lag, allocator ranking, async chains, capture-integrity flags.
- **18 built-in detectors** across CPU, memory, and async kinds, including 2 cross-kind detectors (`alloc-in-hot-path`, `hot-async-context`) — see the [Built-in detectors](#built-in-detectors) section below.
- **19 built-in detectors** across CPU, memory, and async kinds, including 3 cross-kind detectors (`alloc-in-hot-path`, `hot-async-context`, `event-loop-blocked-async`) — see the [Built-in detectors](#built-in-detectors) section below.
- **Stable JSON schema** with finding `confidence` and `proofLevel` fields so consumers can distinguish direct sampled evidence from heuristics.
- **Extensible** — ship your own detectors and profile kinds as plugins.

@@ -87,7 +87,7 @@ lanterna run --kind memory --heap-snapshot-analysis --duration 60s -- node app.j

## Built-in detectors

Lanterna ships 18 detectors out of the box, including 2 cross-kind detectors (`alloc-in-hot-path` for `cpu + memory`, `hot-async-context` for `cpu + async`). Each emits a `Finding` in the report with `confidence` and `proofLevel` so consumers can distinguish direct sampled evidence from heuristics.
Lanterna ships 19 detectors out of the box, including 3 cross-kind detectors (`alloc-in-hot-path` for `cpu + memory`, `hot-async-context` and `event-loop-blocked-async` for `cpu + async`). Each emits a `Finding` in the report with `confidence` and `proofLevel` so consumers can distinguish direct sampled evidence from heuristics.

**CPU kind** (9)

@@ -112,7 +112,7 @@ Lanterna ships 18 detectors out of the box, including 2 cross-kind detectors (`a
| `external-buffer-pressure` | Off-heap pressure (Buffers, ArrayBuffers) |
| `alloc-in-hot-path` | Allocators that are also CPU hot stacks — double impact, top-priority fix (cross-kind: requires both `cpu` and `memory`, auto-skips otherwise) |

**Async kind** (experimental, 5)
**Async kind** (experimental, 6)

| ID | What it flags |
| --- | --- |
@@ -121,6 +121,7 @@ Lanterna ships 18 detectors out of the box, including 2 cross-kind detectors (`a
| `deep-async-chain` | Deeply nested await chains amplifying latency |
| `microtask-flood` | Microtask queue saturation starving the event loop |
| `hot-async-context` | Async contexts dominating CPU (cross-kind: requires both `cpu` and `async`, auto-skips otherwise) |
| `event-loop-blocked-async` | An async op's wait overlaps an event-loop stall — latency is a blocked loop, not slow I/O; anchored on the synchronous CPU frame (cross-kind: requires both `cpu` and `async`, auto-skips otherwise) |
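
The per-stall culprit lookup behind `event-loop-blocked-async` can be sketched as below. The `StallInterval` shape is loosely modelled on `profiles.cpu.eventLoop.stallIntervals[]` and its optional `topFrame`; the 5 ms tolerance, the `culpritFrame` name, and the frame strings are hypothetical. An `undefined` result means the detector should stand down rather than anchor a finding on a placeholder location.

```typescript
// Hypothetical shape loosely modelled on `stallIntervals[]` and its optional `topFrame`.
interface StallInterval {
  startMs: number;
  endMs: number;
  topFrame?: string; // user frame that dominated CPU during this stall
}

// Assumed tolerance for "stall active when the callback became runnable".
const RUNNABLE_SLACK_MS = 5;

// Returns the frame to anchor the finding on, or undefined to stand down.
function culpritFrame(
  firstRunAtMs: number,
  stalls: StallInterval[],
  globalHotspot: string | undefined,
): string | undefined {
  for (const s of stalls) {
    // A callback cannot run while the loop is stalled, so it becomes runnable
    // at or just after the end of the stall that delayed it.
    const active = s.startMs <= firstRunAtMs && firstRunAtMs <= s.endMs + RUNNABLE_SLACK_MS;
    if (active && s.topFrame !== undefined) return s.topFrame;
  }
  // Fall back to the global hotspot when no matching stall names a frame;
  // if that is also missing, the caller emits no finding.
  return globalHotspot;
}

const stalls: StallInterval[] = [
  { startMs: 100, endMs: 400, topFrame: "hashPassword (auth.js:12)" },
  { startMs: 900, endMs: 1200, topFrame: "renderReport (report.js:88)" },
];
console.log(culpritFrame(402, stalls, "main (app.js:1)")); // hashPassword (auth.js:12)
console.log(culpritFrame(1201, stalls, "main (app.js:1)")); // renderReport (report.js:88)
console.log(culpritFrame(700, stalls, undefined)); // undefined
```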

Built-in thresholds are exported as `DETECTOR_THRESHOLDS` for detector authors — see [docs/extending/detectors.md](docs/extending/detectors.md#thresholds). `.lanterna.json` configures capture options and plugin loading; see [docs/configuration.md](docs/configuration.md). To ship your own detectors, see [docs/extending/detectors.md](docs/extending/detectors.md).

1 change: 1 addition & 0 deletions docs/extending/detectors.md
@@ -167,6 +167,7 @@ The default pack lives in `@lanterna-profiler/detectors` and pre-wires detectors
| `orphan-async-resource` | Async resources never resolved or destroyed during capture. |
| `microtask-flood` | Microtask volume crosses a per-window threshold (requires `--async-include-microtasks`). |
| `hot-async-context:<rootAsyncId>` | Same async context repeatedly entered. |
| `event-loop-blocked-async:<asyncId>` | An async op's `waitMs` overlaps an event-loop stall, with the loop still blocked when the callback became runnable — latency is a blocked loop, not slow I/O. Anchored on the synchronous CPU frame. Requires `--kind cpu,async`. |
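
Because the IDs in this table follow a `<detector>:<key>` pattern, a consumer can recover the detector name and key with a simple split. A sketch, assuming only that each finding exposes an `id` string in that form; the `FindingLike` type and both helpers are hypothetical, not part of the Lanterna API.

```typescript
// Hypothetical minimal shape; the real `Finding` type carries more fields.
interface FindingLike {
  id: string;
}

interface ParsedFindingId {
  detector: string;
  key?: string; // undefined for IDs without a ":<key>" suffix
}

// Split "<detector>:<key>" on the first colon.
function parseFindingId(id: string): ParsedFindingId {
  const i = id.indexOf(":");
  return i === -1 ? { detector: id } : { detector: id.slice(0, i), key: id.slice(i + 1) };
}

// Collect the asyncIds named by `event-loop-blocked-async:<asyncId>` findings.
function blockedAsyncIds(findings: FindingLike[]): number[] {
  const ids: number[] = [];
  for (const f of findings) {
    const parsed = parseFindingId(f.id);
    const key = parsed.key;
    if (parsed.detector === "event-loop-blocked-async" && key !== undefined && /^\d+$/.test(key)) {
      ids.push(Number(key));
    }
  }
  return ids;
}

const findings: FindingLike[] = [
  { id: "event-loop-blocked-async:42" },
  { id: "hot-async-context:7" },
  { id: "microtask-flood" },
  { id: "event-loop-blocked-async:108" },
];
console.log(blockedAsyncIds(findings).join(",")); // 42,108
```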

### Cross-kind
