arkerone · May 29, 2026
diff --git a/.changeset/async-latency-causes.md b/.changeset/async-latency-causes.md
@@ -0,0 +1,22 @@
+---
+"@lanterna-profiler/core": minor
+"@lanterna-profiler/detectors": minor
+"@lanterna-profiler/cli": patch
+---
+
+Make async profiling pinpoint *which* code is slow, *what* the latency is, and *why*.
+
+- Decompose async latency: capture `firstRunAtMs` (the timestamp from which the scheduling delay is derived) and derive `waitMs` (time spent waiting off-CPU) and `scheduleDelayMs` per operation, plus per-family latency percentiles (`summary.byKindLatency`).
+- Classify the root cause of each operation's latency (`latencyCause` + `causeConfidence` + `causeEvidence`) by overlapping its wait windows with event-loop stalls, GC pauses, downstream-async activity, I/O kind, or CPU-bound execution.
+- Improve "which code" attribution: when an operation's own stack has no user frame, inherit the nearest one via the trigger ancestry (`attributedFrameOrigin`), and raise the default init-stack capture depth.
+- Make CPU→async attribution more precise and honest: attribute samples in overlapping ancestor/descendant run windows to the innermost async context instead of dropping them, grade CPU-attribution confidence by the unrelated-overlap ratio instead of collapsing on the first ambiguous sample, and report a real `clockSyncUncertaintyMs` (CDP jitter / clock resolution) instead of a placeholder.
+- Enrich the `long-await` finding with the latency decomposition and cause-specific guidance, and add a new `event-loop-blocked-async` detector that ties a slow async operation to the synchronous frame blocking the event loop.
+- Reliability refinements: classify long-lived idle resources as a distinct `background` latency cause (instead of misreading their incidental stall overlap as `event-loop-blocked`); when the event-loop heartbeat is unavailable, mark an `unknown` cause with `causeEvidence.basis = "no-eventloop-signal"` and add a quality reason, so that a missing signal is not conflated with "no problem"; and under `--async-max-events` pressure, evict the shortest-duration completed record instead of the oldest one (FIFO), so the slow/long operations that matter for latency survive.
+- Cause-classification hardening (from an empirical audit against real targets):
+  - GC-pause overlap now uses each GC event's **actual pause duration** instead of a ±20ms padded window. Padding made dense sub-millisecond scavenges tile the whole timeline and blanket nearly every wait with a spurious ~100% GC overlap, so most operations — even on event-loop-blocked or I/O workloads — were mislabelled `gc-pause`. `gc-pause` is now correctly rare.
+  - The documented priority is now actually applied: a blocked event loop outranks a coincidental GC/downstream overlap rather than losing to whichever signal had the higher raw percentage.
+  - `event-loop-blocked` now requires the loop to have **still been stalled when the callback became runnable** (around `firstRunAtMs`); a stall that ended well before the operation ran is treated as a coincidental overlap, eliminating false `event-loop-blocked` labels on genuinely slow I/O whose wait merely spans an unrelated stall.
+  - Orphans (resources still in flight at capture end) are excluded from `topOperations` and `summary.byKindLatency` — their capture-clamped, fictional duration was dominating the ranking and skewing the percentiles — and remain reported in `orphans[]`.
+  - The `event-loop-blocked-async` detector stands down when no CPU hotspot identifies a culprit frame, instead of emitting a critical finding anchored at a placeholder `(event-loop)` location.
+  - Persistent/multiplexed handles (keep-alive sockets, HTTP parsers, pools, intervals) that activated more than once and stayed alive for nearly the whole capture are now classified `background` instead of having their capture-length aggregate `waitMs` reported as a single `event-loop-blocked`/`long-await` finding. The `runCount > 1` discriminator preserves genuine single long operations (a discrete delayed callback runs at most once). Validated on a real HTTP server under load: the blocking `pbkdf2Sync` handler is still correctly surfaced, without the misleading multi-second findings on keep-alive connections.
+  - The `event-loop-blocked-async` detector now attributes the blocking frame **per stall** instead of stamping the single globally-dominant CPU hotspot on every blocked op. `profiles.cpu.eventLoop.stallIntervals[]` gains an optional `topFrame` (the user frame that dominated CPU during that specific stall), and the detector matches each delayed op to the stall active when its callback became runnable (`firstRunAtMs`), falling back to the global hotspot only when no stall matches. With several distinct blocking call sites, each delayed operation now points at its own culprit.
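
The GC-overlap fix described in the changeset above can be illustrated with a small interval computation. This is a minimal sketch, not Lanterna's implementation: the `Interval` shape and the `mergeIntervals` and `overlapRatio` helpers are hypothetical names introduced here, and the workload (a 0.5 ms scavenge every 30 ms across a 300 ms wait) is made up for illustration. It compares the overlap ratio computed from actual pause durations with the one computed from pauses padded by ±20 ms.

```typescript
// Hypothetical shape for illustration; not Lanterna's actual types.
interface Interval {
  startMs: number;
  endMs: number;
}

// Merge overlapping intervals so that overlapping pauses are not double-counted.
function mergeIntervals(intervals: Interval[]): Interval[] {
  const sorted = [...intervals].sort((a, b) => a.startMs - b.startMs);
  const merged: Interval[] = [];
  for (const cur of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && cur.startMs <= last.endMs) {
      last.endMs = Math.max(last.endMs, cur.endMs);
    } else {
      merged.push({ ...cur });
    }
  }
  return merged;
}

// Fraction of the wait window covered by the given pauses, in [0, 1].
function overlapRatio(wait: Interval, pauses: Interval[]): number {
  const waitMs = wait.endMs - wait.startMs;
  if (waitMs <= 0) return 0;
  let coveredMs = 0;
  for (const p of mergeIntervals(pauses)) {
    const start = Math.max(wait.startMs, p.startMs);
    const end = Math.min(wait.endMs, p.endMs);
    if (end > start) coveredMs += end - start;
  }
  return coveredMs / waitMs;
}

// Made-up workload: ten 0.5 ms scavenges, one every 30 ms, during a 300 ms wait.
const wait: Interval = { startMs: 0, endMs: 300 };
const scavenges: Interval[] = Array.from({ length: 10 }, (_, i) => ({
  startMs: i * 30,
  endMs: i * 30 + 0.5,
}));

// Previous behaviour as described in the changeset: pad each GC event by ±20 ms.
const padded: Interval[] = scavenges.map((g) => ({
  startMs: g.startMs - 20,
  endMs: g.endMs + 20,
}));

const actualRatio = overlapRatio(wait, scavenges);
const paddedRatio = overlapRatio(wait, padded);
console.log({ actualRatio, paddedRatio });
```

With these inputs the padded windows tile almost the whole wait (about 97% overlap), while the actual pauses cover under 2% of it, which matches the changeset's explanation of why `gc-pause` was over-reported before the fix.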
diff --git a/README.md b/README.md
@@ -28,15 +28,15 @@ Most Node.js profilers were designed for a human staring at a flamegraph. That's
 Lanterna takes a different stance:
 
 - **Structured JSON, not pixels.** The `LanternaReport` is a stable schema — hotspots, allocators, async chains, GC pauses, event-loop lag, and findings — that an agent can read, correlate, and act on directly.
-- **Detectors, not just data.** 18 built-in detectors emit categorized `findings` (sync crypto, blocking I/O, deopt loops, memory growth, orphan async resources, …) with `confidence` and `proofLevel` so consumers know when to trust a hypothesis vs. require corroboration.
+- **Detectors, not just data.** 19 built-in detectors emit categorized `findings` (sync crypto, blocking I/O, deopt loops, memory growth, orphan async resources, …) with `confidence` and `proofLevel` so consumers know when to trust a hypothesis vs. require corroboration.
 - **CPU + memory + async in one capture.** Combine kinds in a single run; cross-kind detectors like `alloc-in-hot-path` and `hot-async-context` surface the highest-priority fixes (something flamegraph tools can't represent).
 - **Spawn or attach.** Profile a CLI, a server under load, or a live production process — same report shape, same detector surface.
 
 ### Compared to other Node.js profilers
 
 | Tool | Primary output | CPU | Memory | Async | Findings / detectors | Agent-friendly |
 | --- | --- | :-: | :-: | :-: | :-: | :-: |
-| **Lanterna** | Structured JSON (+ text/markdown/agent renderers) | ✅ | ✅ | ✅ (experimental) | ✅ 18 built-in, pluggable | ✅ |
+| **Lanterna** | Structured JSON (+ text/markdown/agent renderers) | ✅ | ✅ | ✅ (experimental) | ✅ 19 built-in, pluggable | ✅ |
 | `node --prof` / `--cpu-prof` | V8 isolate log / `.cpuprofile` | ✅ | — | — | — | ⚠️ raw, post-processing required |
 | [0x](https://github.com/davidmarkclements/0x) | HTML flamegraph | ✅ | — | — | — | ❌ |
 | [Clinic.js](https://github.com/clinicjs/node-clinic) (Doctor / Flame / Bubbleprof) | HTML dashboards | ✅ | ⚠️ via Doctor | ⚠️ via Bubbleprof | ⚠️ heuristic recommendations | ❌ |
@@ -54,7 +54,7 @@ Lanterna is the right fit when the consumer of the report is **an agent or an au
 - **Two capture modes** — `lanterna run` to spawn & profile a command, `lanterna attach` to connect to a live process via the inspector. `lanterna ps` lists live `node`/`nodejs` processes (table or JSON) when you need to find a PID first.
 - **Three profile kinds** — opt in with `--kind`: `cpu` (V8 sampling profiler, default), `memory` (heap allocation profile + RSS series), and `async` (experimental async-resource profiling). Combine kinds by repeating `--kind` (`--kind cpu --kind memory`) or using commas (`--kind cpu,memory`).
 - **Enriched `LanternaReport`** — categorized hotspots, hot stacks, GC pauses, event-loop lag, allocator ranking, async chains, capture-integrity flags.
-- **18 built-in detectors** across CPU, memory, and async kinds, including 2 cross-kind detectors (`alloc-in-hot-path`, `hot-async-context`) — see the [Built-in detectors](#built-in-detectors) section below.
+- **19 built-in detectors** across CPU, memory, and async kinds, including 3 cross-kind detectors (`alloc-in-hot-path`, `hot-async-context`, `event-loop-blocked-async`) — see the [Built-in detectors](#built-in-detectors) section below.
 - **Stable JSON schema** with finding `confidence` and `proofLevel` fields so consumers can distinguish direct sampled evidence from heuristics.
 - **Extensible** — ship your own detectors and profile kinds as plugins.
 
@@ -87,7 +87,7 @@ lanterna run --kind memory --heap-snapshot-analysis --duration 60s -- node app.j
 
 ## Built-in detectors
 
-Lanterna ships 18 detectors out of the box, including 2 cross-kind detectors (`alloc-in-hot-path` for `cpu + memory`, `hot-async-context` for `cpu + async`). Each emits a `Finding` in the report with `confidence` and `proofLevel` so consumers can distinguish direct sampled evidence from heuristics.
+Lanterna ships 19 detectors out of the box, including 3 cross-kind detectors (`alloc-in-hot-path` for `cpu + memory`, `hot-async-context` and `event-loop-blocked-async` for `cpu + async`). Each emits a `Finding` in the report with `confidence` and `proofLevel` so consumers can distinguish direct sampled evidence from heuristics.
 
 **CPU kind** (9)
 
@@ -112,7 +112,7 @@ Lanterna ships 18 detectors out of the box, including 2 cross-kind detectors (`a
 | `external-buffer-pressure` | Off-heap pressure (Buffers, ArrayBuffers) |
 | `alloc-in-hot-path` | Allocators that are also CPU hot stacks — double impact, top-priority fix (cross-kind: requires both `cpu` and `memory`, auto-skips otherwise) |
 
-**Async kind** (experimental, 5)
+**Async kind** (experimental, 6)
 
 | ID | What it flags |
 | --- | --- |
@@ -121,6 +121,7 @@ Lanterna ships 18 detectors out of the box, including 2 cross-kind detectors (`a
 | `deep-async-chain` | Deeply nested await chains amplifying latency |
 | `microtask-flood` | Microtask queue saturation starving the event loop |
 | `hot-async-context` | Async contexts dominating CPU (cross-kind: requires both `cpu` and `async`, auto-skips otherwise) |
+| `event-loop-blocked-async` | An async op's wait overlaps an event-loop stall, so the latency comes from a blocked loop rather than slow I/O; anchored on the synchronous CPU frame (cross-kind: requires both `cpu` and `async`, auto-skips otherwise) |
 
 Built-in thresholds are exported as `DETECTOR_THRESHOLDS` for detector authors — see [docs/extending/detectors.md](docs/extending/detectors.md#thresholds). `.lanterna.json` configures capture options and plugin loading; see [docs/configuration.md](docs/configuration.md). To ship your own detectors, see [docs/extending/detectors.md](docs/extending/detectors.md).
 

diff --git a/docs/extending/detectors.md b/docs/extending/detectors.md
@@ -167,6 +167,7 @@ The default pack lives in `@lanterna-profiler/detectors` and pre-wires detectors
 | `orphan-async-resource` | Async resources never resolved or destroyed during capture. |
 | `microtask-flood` | Microtask volume crosses a per-window threshold (requires `--async-include-microtasks`). |
 | `hot-async-context:<rootAsyncId>` | Same async context repeatedly entered. |
+| `event-loop-blocked-async:<asyncId>` | An async op's `waitMs` overlaps an event-loop stall, with the loop still blocked when the callback became runnable, so the latency comes from a blocked loop rather than slow I/O. Anchored on the synchronous CPU frame. Requires `--kind cpu,async`. |
 
 ### Cross-kind