diff --git a/.changeset/async-latency-causes.md b/.changeset/async-latency-causes.md
new file mode 100644
index 0000000..16706f5
--- /dev/null
+++ b/.changeset/async-latency-causes.md
@@ -0,0 +1,22 @@
+---
+"@lanterna-profiler/core": minor
+"@lanterna-profiler/detectors": minor
+"@lanterna-profiler/cli": patch
+---
+
+Make async profiling pinpoint *which* code is slow, *what* the latency is, and *why*.
+
+- Decompose async latency: capture `firstRunAtMs` (scheduling delay precursor) and derive `waitMs` (time waiting, not on CPU) and `scheduleDelayMs` per operation, plus per-family latency percentiles (`summary.byKindLatency`).
+- Classify the root cause of each operation's latency (`latencyCause` + `causeConfidence` + `causeEvidence`) by overlapping its wait windows with event-loop stalls, GC pauses, downstream-async activity, I/O kind, or CPU-bound execution.
+- Improve "which code" attribution: when an operation's own stack has no user frame, inherit the nearest one via the trigger ancestry (`attributedFrameOrigin`), and raise the default init-stack capture depth.
+- Make CPU→async attribution more precise and honest: attribute samples in overlapping ancestor/descendant run windows to the innermost async context instead of dropping them, grade CPU-attribution confidence by the unrelated-overlap ratio instead of collapsing on the first ambiguous sample, and report a real `clockSyncUncertaintyMs` (CDP jitter / clock resolution) instead of a placeholder.
+- Enrich the `long-await` finding with the latency decomposition and cause-specific guidance, and add a new `event-loop-blocked-async` detector that ties a slow async operation to the synchronous frame blocking the event loop.
+- Reliability refinements: classify long-lived idle resources as a distinct `background` latency cause (instead of mis-reading their incidental stall overlap as `event-loop-blocked`); when the event-loop heartbeat is unavailable, mark an `unknown` cause with `causeEvidence.basis = "no-eventloop-signal"` and add a quality reason, so missing signal is not conflated with "no problem"; and under `--async-max-events` pressure, evict the shortest-duration completed record instead of FIFO so the slow/long operations that matter for latency survive.
+- Cause-classification hardening (from an empirical audit against real targets):
+  - GC-pause overlap now uses each GC event's **actual pause duration** instead of a ±20ms padded window. Padding made dense sub-millisecond scavenges tile the whole timeline and blanket nearly every wait with a spurious ~100% GC overlap, so most operations — even on event-loop-blocked or I/O workloads — were mislabelled `gc-pause`. `gc-pause` is now correctly rare.
+  - The documented priority is now actually applied: a blocked event loop outranks a coincidental GC/downstream overlap rather than losing to whichever signal had the higher raw percentage.
+  - `event-loop-blocked` now requires the loop to have **still been stalled when the callback became runnable** (around `firstRunAtMs`); a stall that ended well before the operation ran is treated as a coincidental overlap, eliminating false `event-loop-blocked` labels on genuinely slow I/O whose wait merely spans an unrelated stall.
+  - Orphans (resources still in flight at capture end) are excluded from `topOperations` and `summary.byKindLatency` — their capture-clamped, fictional duration was dominating the ranking and skewing the percentiles — and remain reported in `orphans[]`.
+  - The `event-loop-blocked-async` detector stands down when no CPU hotspot identifies a culprit frame, instead of emitting a critical finding anchored at a placeholder `(event-loop)` location.
+  - Persistent/multiplexed handles (keep-alive sockets, HTTP parsers, pools, intervals) that activated more than once and stayed alive for ~the whole capture are now classified `background` instead of having their capture-length aggregate `waitMs` reported as a single `event-loop-blocked`/`long-await` finding. The `runCount > 1` discriminator preserves genuine single long operations (a discrete delayed callback runs at most once). Validated on a real HTTP server under load: the blocking `pbkdf2Sync` handler is still correctly surfaced, without the misleading multi-second findings on keep-alive connections.
+  - The `event-loop-blocked-async` detector now attributes the blocking frame **per stall** instead of stamping the single globally-dominant CPU hotspot on every blocked op. `profiles.cpu.eventLoop.stallIntervals[]` gains an optional `topFrame` (the user frame that dominated CPU during that specific stall), and the detector matches each delayed op to the stall active when its callback became runnable (`firstRunAtMs`), falling back to the global hotspot only when no stall matches. With several distinct blocking call sites, each delayed operation now points at its own culprit.
diff --git a/README.md b/README.md
index 87c5b17..cdbf330 100644
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@ Most Node.js profilers were designed for a human staring at a flamegraph. That's
 Lanterna takes a different stance:
 
 - **Structured JSON, not pixels.** The `LanternaReport` is a stable schema — hotspots, allocators, async chains, GC pauses, event-loop lag, and findings — that an agent can read, correlate, and act on directly.
-- **Detectors, not just data.** 18 built-in detectors emit categorized `findings` (sync crypto, blocking I/O, deopt loops, memory growth, orphan async resources, …) with `confidence` and `proofLevel` so consumers know when to trust a hypothesis vs. require corroboration.
+- **Detectors, not just data.** 19 built-in detectors emit categorized `findings` (sync crypto, blocking I/O, deopt loops, memory growth, orphan async resources, …) with `confidence` and `proofLevel` so consumers know when to trust a hypothesis vs. require corroboration.
 - **CPU + memory + async in one capture.** Combine kinds in a single run; cross-kind detectors like `alloc-in-hot-path` and `hot-async-context` surface the highest-priority fixes (something flamegraph tools can't represent).
 - **Spawn or attach.** Profile a CLI, a server under load, or a live production process — same report shape, same detector surface.
 
@@ -36,7 +36,7 @@ Lanterna takes a different stance:
 
 | Tool | Primary output | CPU | Memory | Async | Findings / detectors | Agent-friendly |
 | --- | --- | :-: | :-: | :-: | :-: | :-: |
-| **Lanterna** | Structured JSON (+ text/markdown/agent renderers) | ✅ | ✅ | ✅ (experimental) | ✅ 18 built-in, pluggable | ✅ |
+| **Lanterna** | Structured JSON (+ text/markdown/agent renderers) | ✅ | ✅ | ✅ (experimental) | ✅ 19 built-in, pluggable | ✅ |
 | `node --prof` / `--cpu-prof` | V8 isolate log / `.cpuprofile` | ✅ | — | — | — | ⚠️ raw, post-processing required |
 | [0x](https://github.com/davidmarkclements/0x) | HTML flamegraph | ✅ | — | — | — | ❌ |
 | [Clinic.js](https://github.com/clinicjs/node-clinic) (Doctor / Flame / Bubbleprof) | HTML dashboards | ✅ | ⚠️ via Doctor | ⚠️ via Bubbleprof | ⚠️ heuristic recommendations | ❌ |
@@ -54,7 +54,7 @@ Lanterna is the right fit when the consumer of the report is **an agent or an au
 - **Two capture modes** — `lanterna run` to spawn & profile a command, `lanterna attach` to connect to a live process via the inspector. `lanterna ps` lists live `node`/`nodejs` processes (table or JSON) when you need to find a PID first.
 - **Three profile kinds** — opt in with `--kind`: `cpu` (V8 sampling profiler, default), `memory` (heap allocation profile + RSS series), and `async` (experimental async-resource profiling). Combine kinds by repeating `--kind` (`--kind cpu --kind memory`) or using commas (`--kind cpu,memory`).
 - **Enriched `LanternaReport`** — categorized hotspots, hot stacks, GC pauses, event-loop lag, allocator ranking, async chains, capture-integrity flags.
-- **18 built-in detectors** across CPU, memory, and async kinds, including 2 cross-kind detectors (`alloc-in-hot-path`, `hot-async-context`) — see the [Built-in detectors](#built-in-detectors) section below.
+- **19 built-in detectors** across CPU, memory, and async kinds, including 3 cross-kind detectors (`alloc-in-hot-path`, `hot-async-context`, `event-loop-blocked-async`) — see the [Built-in detectors](#built-in-detectors) section below.
 - **Stable JSON schema** with finding `confidence` and `proofLevel` fields so consumers can distinguish direct sampled evidence from heuristics.
 - **Extensible** — ship your own detectors and profile kinds as plugins.
 
@@ -87,7 +87,7 @@ lanterna run --kind memory --heap-snapshot-analysis --duration 60s -- node app.j
 
 ## Built-in detectors
 
-Lanterna ships 18 detectors out of the box, including 2 cross-kind detectors (`alloc-in-hot-path` for `cpu + memory`, `hot-async-context` for `cpu + async`). Each emits a `Finding` in the report with `confidence` and `proofLevel` so consumers can distinguish direct sampled evidence from heuristics.
+Lanterna ships 19 detectors out of the box, including 3 cross-kind detectors (`alloc-in-hot-path` for `cpu + memory`, `hot-async-context` and `event-loop-blocked-async` for `cpu + async`). Each emits a `Finding` in the report with `confidence` and `proofLevel` so consumers can distinguish direct sampled evidence from heuristics.
 
 **CPU kind** (9)
 
@@ -112,7 +112,7 @@ Lanterna ships 18 detectors out of the box, including 2 cross-kind detectors (`a
 | `external-buffer-pressure` | Off-heap pressure (Buffers, ArrayBuffers) |
 | `alloc-in-hot-path` | Allocators that are also CPU hot stacks — double impact, top-priority fix (cross-kind: requires both `cpu` and `memory`, auto-skips otherwise) |
 
-**Async kind** (experimental, 5)
+**Async kind** (experimental, 6)
 
 | ID | What it flags |
 | --- | --- |
@@ -121,6 +121,7 @@ Lanterna ships 18 detectors out of the box, including 2 cross-kind detectors (`a
 | `deep-async-chain` | Deeply nested await chains amplifying latency |
 | `microtask-flood` | Microtask queue saturation starving the event loop |
 | `hot-async-context` | Async contexts dominating CPU (cross-kind: requires both `cpu` and `async`, auto-skips otherwise) |
+| `event-loop-blocked-async` | An async op's wait overlaps an event-loop stall — latency is a blocked loop, not slow I/O; anchored on the synchronous CPU frame (cross-kind: requires both `cpu` and `async`, auto-skips otherwise) |
 
 Built-in thresholds are exported as `DETECTOR_THRESHOLDS` for detector authors — see [docs/extending/detectors.md](docs/extending/detectors.md#thresholds). `.lanterna.json` configures capture options and plugin loading; see [docs/configuration.md](docs/configuration.md). To ship your own detectors, see [docs/extending/detectors.md](docs/extending/detectors.md).
 
diff --git a/docs/extending/detectors.md b/docs/extending/detectors.md
index cdb558e..7837237 100644
--- a/docs/extending/detectors.md
+++ b/docs/extending/detectors.md
@@ -167,6 +167,7 @@ The default pack lives in `@lanterna-profiler/detectors` and pre-wires detectors
 | `orphan-async-resource` | Async resources never resolved or destroyed during capture. |
 | `microtask-flood` | Microtask volume crosses a per-window threshold (requires `--async-include-microtasks`). |
 | `hot-async-context:<rootAsyncId>` | Same async context repeatedly entered. |
+| `event-loop-blocked-async:<asyncId>` | An async op's `waitMs` overlaps an event-loop stall, with the loop still blocked when the callback became runnable — latency is a blocked loop, not slow I/O. Anchored on the synchronous CPU frame. Requires `--kind cpu,async`. |
 
 ### Cross-kind
 
diff --git a/docs/kinds/async.md b/docs/kinds/async.md
index 8779476..7ac5101 100644
--- a/docs/kinds/async.md
+++ b/docs/kinds/async.md
@@ -41,15 +41,16 @@ Async-specific options:
 
 `profiles.async.*` exposes:
 
-- **`summary`** — availability, `collectedVia`, operation counts by kind, duration stats, concurrency summary, orphan count, dropped record count, and optional `topAsyncHotFile`.
+- **`summary`** — availability, `collectedVia`, operation counts by kind, duration stats, concurrency summary, orphan count, dropped record count, optional `topAsyncHotFile`, and **`byKindLatency`** (per-family `p50/p95/p99/max` of total-lifetime `durationMs` plus `meanWaitMs`, computed over **completed operations only** — orphans are excluded so their capture-clamped, fictional duration cannot skew the percentiles — e.g. compare `http` p99 against `fs` p99).
 - **`topOperations`**, **`hotFiles`**, and **`cpuAttribution.topChains`** — ranked async operations, hot user files, and CPU-over-window chains. Entries include `userCaller` when an existing user frame can anchor the work; CPU-window execution frames use `basis: "async-cpu-window"`, otherwise stack-derived anchors use `basis: "async-stack"`.
+- **`topOperations[]` latency decomposition** — each operation carries `durationMs` (total lifetime), `runMs` (time on CPU), **`waitMs`** (time alive but *not* on CPU — the real latency), **`scheduleDelayMs`** (init → first run, i.e. queue/scheduling delay), and **`firstRunAtMs`**. The classified **`latencyCause`** (`event-loop-blocked` | `gc-pause` | `downstream-async` | `io-wait` | `cpu-bound` | `background` | `unknown`) plus `causeConfidence` and `causeEvidence` explain *why* the operation was slow, and **`attributedFrameOrigin`** records where the anchored user frame came from (`self`, `inherited-trigger`, `cpu-window`, or `cdp`). `background` marks a long-lived resource that is not a latency bug: either idle (never ran, alive ~the whole capture) or a **persistent/multiplexed** handle (keep-alive socket, HTTP parser, pool, interval) that activated many times across most of the capture — its aggregate `waitMs` is the idle gap between activations, not a single delayed callback. When `latencyCause` is `unknown`, `causeEvidence.basis` is `no-eventloop-signal` if the event-loop heartbeat was unavailable (the loop could not be checked) versus `none` if it simply did not overlap any signal. Orphans (resources still in flight at capture end) are **excluded** from `topOperations` and listed in `orphans[]` instead, so their fictional capture-clamped duration does not dominate the ranking.
 - **`chains`** — async parent chains rooted at user-code, with depth and frame counts. Drives `deep-async-chain` findings.
 - **`topOperations[].awaitFrame` / `primaryReason: "await"`** — await-boundary attribution when available. Drives `long-await` findings.
 - **`orphans`** — resources that never resolved or destroyed during capture. Drives `orphan-async-resource` findings.
 - **`concurrencyTimeline`** — timeline of inflight and active async work at the configured cadence.
 - **`filteredCounts`** — counts of async resources filtered from the public rankings.
 - **`cdpAsyncContexts`** — supplemental CDP async stacks, when CDP provided them.
-- **`quality`** — `attachPartialCapture`, `cdpAsyncStackCoverageRatio`, `recordsDropped`, CPU attribution coverage, clock-sync uncertainty, `reasons[]`, and `recommendations[]`. Full-instrumentation rewrite counters live under `meta.kinds.async.transformStats`. See [signal-quality.md](../signal-quality.md#profilesasync-quality).
+- **`quality`** — `attachPartialCapture`, `sampledStackRatio`, **`attributedStackRatio`** (fraction of operations with a user-editable frame, from their own stack or inherited via the trigger ancestry), `cdpAsyncStackCoverageRatio`, `recordsDropped`, CPU attribution coverage, **`ambiguousRatio`** (CPU samples that fell in overlapping *unrelated* run windows), a real measured `clockSyncUncertaintyMs`, `reasons[]`, and `recommendations[]`. Full-instrumentation rewrite counters live under `meta.kinds.async.transformStats`. See [signal-quality.md](../signal-quality.md#profilesasync-quality).
 
 ## Findings
 
@@ -60,6 +61,7 @@ Async-specific options:
 | `orphan-async-resource` | Async resources that initialized during capture but never resolved or destroyed. |
 | `microtask-flood` | Microtask volume crosses a per-window threshold (requires `--async-include-microtasks`). |
 | `hot-async-context:<rootAsyncId>` | Same async context repeatedly entered — suggests a hot route that should be batched or memoized. |
+| `event-loop-blocked-async:<asyncId>` | A slow async operation whose `waitMs` overlaps an event-loop stall — the latency is a blocked loop, not slow I/O. Anchored on the synchronous CPU frame that blocked the loop (needs `--kind cpu,async`). |
 
 ## Reading order
 
@@ -76,5 +78,12 @@ Async-specific options:
 - **`--async-instrumentation full` is experimental.** It rewrites `await` sites in modules loaded **after** registration. Code loaded earlier is not covered. It can interact poorly with bundlers, source maps, or other instrumentation hooks. Stick to `safe` unless `safe` cannot identify the await sites you need.
 - **Microtasks default to off.** Enabling `--async-include-microtasks` produces very noisy reports. Use it only for the `microtask-flood` finding.
 - **Dropped records are sampled, not lost forever.** `quality.recordsDropped > 0` means raise `--async-max-events` for the next run if completeness matters.
-- **User callers are anchors, not proof.** Async `userCaller` is derived from already captured user frames. Prefer high-confidence CPU-window attribution when present; stack-only callers should guide inspection rather than be treated as the definitive line to edit.
+- **User callers are anchors, not proof.** Async `userCaller` is derived from already captured user frames. Prefer high-confidence CPU-window attribution when present; stack-only callers should guide inspection rather than be treated as the definitive line to edit. When an operation's own stack has no user frame, the frame may be inherited from the trigger ancestry — `attributedFrameOrigin: "inherited-trigger"` flags this (lower confidence than `self`).
+- **Latency cause is a windowed correlation.** `latencyCause` is derived by overlapping an operation's wait windows with event-loop stalls, GC pauses, and downstream-async activity (or by I/O kind / CPU ratio). It is a directional explanation with `causeConfidence` + `causeEvidence`, not proof of causation — treat `unknown` as "not enough signal", not "no problem". Per-cause limits worth knowing:
+  - **`event-loop-blocked`** requires the loop to have still been stalled when the callback became *runnable* (around `firstRunAtMs`). A stall that ended well before the operation ran is treated as a coincidental overlap, not the cause — so a genuinely slow I/O whose wait merely *spans* an unrelated stall is not mislabelled.
+  - **`gc-pause`** is matched against the *actual* GC pause durations (not padded windows), so it only fires when GC genuinely dominates a wait — which is rare, because most GC pauses are sub-millisecond. Expect `gc-pause` to be uncommon; its absence is not evidence that GC is cheap.
+  - **`downstream-async`** only fires when a **trigger-descendant runs on CPU** during the parent's wait. Work you `await` that is itself *waiting* (e.g. a timer or socket resolved on a sibling resource, not a trigger-descendant) is **not** counted and typically shows as `unknown`. Do not read the absence of `downstream-async` as "nothing downstream".
+  - **One `await` fragments into several promise resources** (the async function's result promise, the awaited promise, intermediate reactions). Only the resource that actually carries the work is classified — the one running CPU shows as `cpu-bound`, while its sibling/parent promises that merely wait on it commonly show as `unknown`. Read `topOperations` at the level of the resource that carries `runMs`/`waitMs`, not the count of `unknown` siblings.
+- **The blocking frame is attributed per stall.** The `event-loop-blocked-async` finding anchors on the user frame that dominated CPU during the *specific* stall that delayed each operation — matched by when the callback became runnable (`firstRunAtMs`) — so several distinct blocking call sites each point at their own culprit rather than one globally-dominant frame (`profiles.cpu.eventLoop.stallIntervals[].topFrame` carries the per-stall culprit). It falls back to the globally-dominant hotspot only when an op's run time matches no stall, and stands down entirely when no CPU hotspot correlates.
+- **CPU↔async attribution is statistical, with a reported bound.** CPU sample times are profile-relative (≈ capture-relative); the residual skew versus the async timeline is the small `Profiler.start`↔capture-start startup gap, surfaced as `quality.clockSyncUncertaintyMs` (a real measured bound — CDP round-trip jitter / `performance.now()` resolution, replacing the former misleading value). The precision win is in attribution, not the clock: samples in overlapping ancestor/descendant run windows are attributed to the innermost async context, and only genuinely *unrelated* overlapping windows count toward `quality.ambiguousRatio`, which lowers CPU-attribution confidence proportionally instead of dropping the sample.
 - **Public async file paths are normalized.** When V8/CDP reports `file://` URLs, Lanterna converts them to normal filesystem paths before grouping hot files, chains, and finding evidence. Virtual bundler URLs are kept as-is.
diff --git a/docs/report-schema.md b/docs/report-schema.md
index 70bcb7b..6485197 100644
--- a/docs/report-schema.md
+++ b/docs/report-schema.md
@@ -115,7 +115,7 @@ When present, prefer `source.file:source.line` for human diagnosis and patching,
 | `hotStacks` | Most frequent complete sampled stacks with `weightPct` and `frames[]`. |
 | `hotStackClusters` | Optional hot-stack groups anchored on the nearest user-code frame. |
 | `gc` | Pause totals, counts, `longestPauseMs`, `pausesOver10ms`, `correlatedHotspots`. |
-| `eventLoop` | `available`, `measurementBasis` (`both`/`heartbeats`/`histogram`/`none`), `confidence`, lag percentiles, `stallIntervals`, `correlatedHotspots`. |
+| `eventLoop` | `available`, `measurementBasis` (`both`/`heartbeats`/`histogram`/`none`), `confidence`, lag percentiles, `stallIntervals` (each with an optional `topFrame` — the user frame that dominated CPU during that specific stall), `correlatedHotspots`. |
 | `deopts` | V8 deoptimisation clusters — populated only when `meta.kinds.cpu.deep === true`. |
 
 Detail: [kinds/cpu.md](./kinds/cpu.md).
@@ -136,6 +136,8 @@ Detail: [kinds/memory.md](./kinds/memory.md).
 
 Async resource lifecycle summaries, `topOperations`, `hotFiles`, `chains`, `orphans`, `concurrencyTimeline`, `filteredCounts`, `cdpAsyncContexts`, `cpuAttribution`, and quality metadata. Only present when `--kind async` was selected. In attach mode, capture is intentionally partial — the section's `quality` records this.
 
+`summary.byKindLatency` adds per-family latency percentiles (`p50/p95/p99/maxMs` + `meanWaitMs`). Each `topOperations[]` entry decomposes latency into `durationMs` (total), `runMs` (on-CPU), `waitMs` (waiting, not on CPU), `scheduleDelayMs`, and `firstRunAtMs`, plus a classified `latencyCause` (`event-loop-blocked` | `gc-pause` | `downstream-async` | `io-wait` | `cpu-bound` | `background` | `unknown`) with `causeConfidence`/`causeEvidence` (where `causeEvidence.basis` distinguishes `no-eventloop-signal` from `none` for `unknown`), and `attributedFrameOrigin` (`self` | `inherited-trigger` | `cpu-window` | `cdp`). `quality` adds `attributedStackRatio` and `ambiguousRatio`, and `clockSyncUncertaintyMs` is now a real measured bound (CDP jitter / clock resolution) rather than a placeholder. These fields are additive and optional within schema v2.
+
 Detail: [kinds/async.md](./kinds/async.md).
 
 ## `findings[]`
diff --git a/docs/signal-quality.md b/docs/signal-quality.md
index ec34fd6..258718b 100644
--- a/docs/signal-quality.md
+++ b/docs/signal-quality.md
@@ -83,8 +83,12 @@ Memory-specific signals to inspect alongside `profiles.memory.quality`:
 The async kind reports its own quality fields:
 
 - `quality.attachPartialCapture` — `true` in attach mode, signaling that resources created before hook installation cannot be observed.
+- `quality.sampledStackRatio` — fraction of operations whose own init stack contained a user frame.
+- `quality.attributedStackRatio` — fraction of operations with a user-editable frame from their own stack **or** inherited via the trigger ancestry. Higher than `sampledStackRatio` because inheritance recovers call sites for operations created deep inside dependencies; drives how often a finding can point at editable code.
 - `quality.cdpAsyncStackCoverageRatio` — fraction of resources for which Lanterna obtained a CDP async stack. Low coverage weakens chain-related findings.
 - `quality.recordsDropped` — number of records discarded once `--async-max-events` was reached. A non-zero value means the report is sampled, not exhaustive.
+- `quality.ambiguousRatio` — fraction of CPU samples that fell in overlapping *unrelated* async run windows (related ancestor/descendant windows are resolved to the innermost context, not counted). CPU-attribution confidence is graded by this ratio rather than collapsing to `low` on the first ambiguous sample.
+- `quality.clockSyncUncertaintyMs` — a real measured bound on CPU↔async clock alignment (CDP round-trip jitter / `performance.now()` resolution), not a placeholder. CPU sample times are treated as capture-relative; this bounds the residual `Profiler.start`↔capture-start skew.
 - `meta.kinds.async.transformStats.failed` — counter for `--async-instrumentation full` rewrite failures. These are non-fatal; when full instrumentation is partial, `quality.reasons[]` and `quality.recommendations[]` explain how to interpret the report.
 
 ## Failure and degradation modes
diff --git a/examples/async-latency/README.md b/examples/async-latency/README.md
new file mode 100644
index 0000000..a56f0fc
--- /dev/null
+++ b/examples/async-latency/README.md
@@ -0,0 +1,44 @@
+# Async latency example — one script, five latency causes
+
+A Node script that repeatedly triggers five distinct async latency patterns so you can see how Lanterna decomposes latency (`waitMs` / `scheduleDelayMs` vs `runMs`) and classifies its **root cause** per operation.
+
+## Run
+
+From this directory:
+
+```bash
+npx -y @lanterna-profiler/cli run --kind cpu,async --duration 20s --output report.json -- node app.js
+npx -y @lanterna-profiler/cli report report.json --format text
+```
+
+## What you should see
+
+In `profiles.async.topOperations[]`, operations classified by `latencyCause`. The reliably-classified cases:
+
+- **`event-loop-blocked`** — a timer scheduled for ~30ms that fires ~350ms late because a synchronous busy-loop blocked the event loop. Its `waitMs` is large, `runMs` ~0, and a top-level **`event-loop-blocked-async`** finding points at `blockLoopFor` (the synchronous culprit), *not* the timer.
+- **`cpu-bound`** — `cpuBoundCase` where `runMs` ≈ `durationMs` (the awaited work is CPU, not waiting).
+- **`io-wait`** — the `fs.readFile` of `package-lock.json` (kind `fs`, when no loop stall overlaps it).
+
+Two cases deliberately show the **limits** of cause classification (see [docs/kinds/async.md](../../docs/kinds/async.md) → Caveats):
+
+- **`downstreamCase`** does **not** become `downstream-async`. It awaits `inner()`'s 250ms timer, but a timer only *waits* — it never *runs* — so there are no descendant run windows to overlap. It typically shows as **`unknown`**. `downstream-async` only fires when a trigger-descendant burns CPU during the parent's wait.
+- **`gcCase`** rarely becomes `gc-pause`. GC pauses are matched against their real (mostly sub-millisecond) durations, so they seldom cover ≥50% of a wait. Expect **`cpu-bound`** (allocation is CPU) or **`unknown`**.
+
+Also inspect:
+
+- `profiles.async.summary.byKindLatency` — per-family p50/p95/p99 and `meanWaitMs`.
+- `profiles.async.quality.clockSyncUncertaintyMs` — small but non-zero (CPU↔async clock anchoring is active).
+- `profiles.async.quality.attributedStackRatio` / `ambiguousRatio`.
+
+## Inspect the JSON directly
+
+```bash
+jq '.profiles.async.topOperations[] | {kind, durationMs, waitMs, scheduleDelayMs, runMs, latencyCause, causeConfidence, attributedFrameOrigin}' report.json
+jq '.profiles.async.summary.byKindLatency' report.json
+jq '.findings[] | select(.category=="event-loop-blocked-async")' report.json
+```
+
+## What to try next
+
+- Replace `blockLoopFor(350)` with `await new Promise(r => setTimeout(r, 350))` — the `event-loop-blocked` cause and the `event-loop-blocked-async` finding should clear.
+- Move `cpuBoundCase`'s loop to a worker thread — its operation should drop out of the CPU-bound bucket.
diff --git a/examples/async-latency/app.js b/examples/async-latency/app.js
new file mode 100644
index 0000000..53caf02
--- /dev/null
+++ b/examples/async-latency/app.js
@@ -0,0 +1,86 @@
+// Async latency demo — exercises the async kind's latency-cause classifier.
+// Run with `--kind cpu,async`. Reliably classified, per profiles.async.topOperations:
+//   - latencyCause: "event-loop-blocked"  (callback ready, loop busy) plus a
+//                                          top-level event-loop-blocked-async finding
+//   - latencyCause: "cpu-bound"           (the handler itself burns CPU)
+//   - latencyCause: "io-wait"             (genuine fs read, no stall overlap)
+// The other two cases illustrate the classifier's LIMITS (see docs/kinds/async.md):
+//   - downstreamCase -> usually "unknown": the awaited inner work only *waits*
+//                       (a timer), so there are no descendant *run* windows.
+//   - gcCase         -> usually "cpu-bound"/"unknown": real GC pauses rarely
+//                       cover >=50% of a wait, so "gc-pause" seldom fires.
+
+import { readFile } from 'node:fs/promises';
+
+const PKG_LOCK = new URL('../../package-lock.json', import.meta.url);
+
+function blockLoopFor(ms) {
+  const end = Date.now() + ms;
+  // Synchronous busy-loop: nothing else on the loop can run until this returns.
+  while (Date.now() < end) {
+    /* spin */
+  }
+}
+
+// 1) event-loop-blocked: a timer is scheduled, then the loop is blocked so the
+//    callback fires hundreds of ms late — the latency is the block, not the timer.
+async function eventLoopBlockedCase() {
+  const ready = new Promise((resolve) => setTimeout(resolve, 30));
+  blockLoopFor(350);
+  await ready;
+}
+
+// 2) downstreamCase: outer awaits a slow inner timer. Classifies as "unknown",
+//    not "downstream-async" — the inner only waits (a timer), it never runs, so
+//    there are no descendant run windows to attribute the wait to.
+async function inner() {
+  await new Promise((resolve) => setTimeout(resolve, 250));
+}
+async function downstreamCase() {
+  await inner();
+}
+
+// 3) cpu-bound: the awaited work is CPU, not waiting.
+async function cpuBoundCase() {
+  await Promise.resolve();
+  let acc = 0;
+  for (let i = 0; i < 5_000_000; i += 1) acc += Math.sqrt(i) * Math.sin(i);
+  return acc;
+}
+
+// 4) io-wait: a real filesystem read with no loop stall around it.
+async function ioWaitCase() {
+  const buf = await readFile(PKG_LOCK, 'utf8');
+  return buf.length;
+}
+
+// 5) gcCase: allocate aggressively to provoke GC. Usually "cpu-bound"/"unknown":
+//    real GC pauses rarely cover >=50% of a wait, so "gc-pause" seldom fires.
+async function gcCase() {
+  const sink = [];
+  for (let i = 0; i < 40; i += 1) {
+    sink.push(new Array(50_000).fill(i));
+    await Promise.resolve();
+  }
+  return sink.length;
+}
+
+let running = true;
+async function tick() {
+  await eventLoopBlockedCase();
+  await downstreamCase();
+  await cpuBoundCase();
+  await ioWaitCase();
+  await gcCase();
+}
+
+(async () => {
+  while (running) {
+    await tick();
+  }
+})();
+
+setTimeout(() => {
+  running = false;
+  console.log('async-latency demo done');
+}, 20_000);
diff --git a/packages/cli/src/commands/profile-command.ts b/packages/cli/src/commands/profile-command.ts
index e6ab899..e940df5 100644
--- a/packages/cli/src/commands/profile-command.ts
+++ b/packages/cli/src/commands/profile-command.ts
@@ -114,7 +114,7 @@ function formatProfileQualityWarning(report: {
         recommendations?: string[];
         recordsDropped?: number;
         attachPartialCapture?: boolean;
-        cpuAmbiguousSamples?: number;
+        ambiguousRatio?: number;
         cdpAsyncStackCoverageRatio?: number;
         instrumentationMode?: string;
       };
@@ -144,7 +144,7 @@ function shouldWarnAsyncQuality(quality: {
   confidence?: string;
   recordsDropped?: number;
   attachPartialCapture?: boolean;
-  cpuAmbiguousSamples?: number;
+  ambiguousRatio?: number;
   cdpAsyncStackCoverageRatio?: number;
   instrumentationMode?: string;
 }): boolean {
@@ -152,7 +152,7 @@ function shouldWarnAsyncQuality(quality: {
     quality.confidence === 'low' ||
     (quality.recordsDropped ?? 0) > 0 ||
     Boolean(quality.attachPartialCapture) ||
-    (quality.cpuAmbiguousSamples ?? 0) > 0 ||
+    (quality.ambiguousRatio ?? 0) > 0.33 ||
     (quality.cdpAsyncStackCoverageRatio ?? 1) < 0.2
   );
 }
diff --git a/packages/core/src/analysis/model/event-loop-report.ts b/packages/core/src/analysis/model/event-loop-report.ts
index afa67b9..76e2c83 100644
--- a/packages/core/src/analysis/model/event-loop-report.ts
+++ b/packages/core/src/analysis/model/event-loop-report.ts
@@ -1,4 +1,4 @@
-import type { CaptureBundle } from '../../capture/core/types.js';
+import type { CaptureBundle, EventLoopSample } from '../../capture/core/types.js';
 import type {
   EventLoopReport,
   MeasurementBasis,
@@ -72,11 +72,29 @@ function deriveStallIntervals(
   input: EventLoopInput,
   resolutionMs: number,
 ): EventLoopReport['stallIntervals'] {
-  const samples = input.runtimeSignals.eventLoopSamples;
+  return buildEventLoopStallWindows(
+    input.runtimeSignals.eventLoopSamples,
+    input.durationMs,
+    resolutionMs,
+  );
+}
+
+/**
+ * Pure stall-window builder shared by the event-loop report and async
+ * latency cause-classification. A stall window is `[atMs - lagMs, atMs]` for
+ * every sample whose lag crosses `thresholdMs`, plus a trailing-lag window
+ * when the loop was still stalled at capture end, then merged.
+ */
+export function buildEventLoopStallWindows(
+  samples: readonly EventLoopSample[],
+  durationMs: number,
+  resolutionMs: number,
+  thresholdMs: number = EVENT_LOOP_STALL_INTERVAL_MS,
+): EventLoopReport['stallIntervals'] {
   const intervals: EventLoopReport['stallIntervals'] = [];
 
   for (const sample of samples) {
-    if (sample.lagMs < EVENT_LOOP_STALL_INTERVAL_MS) continue;
+    if (sample.lagMs < thresholdMs) continue;
     intervals.push({
       startMs: Math.max(0, sample.atMs - sample.lagMs),
       endMs: sample.atMs,
@@ -84,14 +102,16 @@ function deriveStallIntervals(
     });
   }
 
-  const trailingLagMs = inferTrailingLag(input);
-  if (trailingLagMs >= EVENT_LOOP_STALL_INTERVAL_MS) {
-    const lastSampleAtMs = samples[samples.length - 1]?.atMs ?? 0;
-    intervals.push({
-      startMs: Math.max(0, lastSampleAtMs + resolutionMs),
-      endMs: input.durationMs,
-      maxLagMs: trailingLagMs,
-    });
+  const lastSampleAtMs = samples[samples.length - 1]?.atMs;
+  if (lastSampleAtMs !== undefined) {
+    const trailingLagMs = Math.max(0, durationMs - lastSampleAtMs - resolutionMs);
+    if (trailingLagMs >= thresholdMs) {
+      intervals.push({
+        startMs: Math.max(0, lastSampleAtMs + resolutionMs),
+        endMs: durationMs,
+        maxLagMs: trailingLagMs,
+      });
+    }
   }
 
   return mergeIntervals(intervals);
diff --git a/packages/core/src/capture/coordinator.ts b/packages/core/src/capture/coordinator.ts
index 0e151aa..aaf12ed 100644
--- a/packages/core/src/capture/coordinator.ts
+++ b/packages/core/src/capture/coordinator.ts
@@ -94,7 +94,9 @@ export async function runCapture<TSourceOptions>(
   try {
     const target = await fetchTargetInfo(cdp, { pid: connected.target.pid });
     await markCaptureStart(cdp);
+    const clockReadStartHr = performance.now();
     const runtimeCaptureStartMs = await readRuntimeClockNow(cdp);
+    const cdpClockJitterMs = (performance.now() - clockReadStartHr) / 2;
     const startedAtHr = performance.now();
     emitCaptureProgress(options.sourceOptions, {
       stage: 'start-capture',
@@ -152,6 +154,7 @@ export async function runCapture<TSourceOptions>(
       durationMs,
       captureIntegrity,
       runtimeSignals,
+      cdpClockJitterMs,
       kinds: kindsData as CaptureBundle['kinds'],
     };
   } finally {
diff --git a/packages/core/src/capture/core/types.ts b/packages/core/src/capture/core/types.ts
index fd14082..476ecb1 100644
--- a/packages/core/src/capture/core/types.ts
+++ b/packages/core/src/capture/core/types.ts
@@ -137,6 +137,8 @@ export interface CaptureBundle {
   captureIntegrity: CaptureIntegrity;
   runtimeSignals: RuntimeSignalsData;
   kinds: Partial<CaptureKindDataMap>;
+  /** Half the CDP round-trip span of the capture-start clock read (ms) — a bound on cross-kind clock-alignment uncertainty. */
+  cdpClockJitterMs?: number;
 }
 
 /**
diff --git a/packages/core/src/index.ts b/packages/core/src/index.ts
index bbfacca..96420ad 100644
--- a/packages/core/src/index.ts
+++ b/packages/core/src/index.ts
@@ -164,11 +164,14 @@ export { LANTERNA_REPORT_SCHEMA_VERSION } from './report/meta.js';
 export { buildReportSchema } from './report/schema.js';
 export type {
   AlternativeHotspotEvidence,
+  AsyncAttributedFrameOrigin,
+  AsyncByKindLatency,
   AsyncChainSummary,
   AsyncConcurrencyTimelineSample,
   AsyncCpuAttribution,
   AsyncCpuAttributionEntry,
   AsyncHotFile,
+  AsyncLatencyCause,
   AsyncOperationKindReport,
   AsyncOrphan,
   AsyncProfileQuality,
@@ -181,6 +184,7 @@ export type {
   BlockingIoEvidenceExtra,
   BuiltinFinding,
   BuiltinFindingCategory,
+  CorrelatedHotspot,
   CpuHotspotEvidenceExtra,
   CpuProfileReport,
   CpuSummary,
diff --git a/packages/core/src/kinds/async/analysis.ts b/packages/core/src/kinds/async/analysis.ts
index 1adb109..6310487 100644
--- a/packages/core/src/kinds/async/analysis.ts
+++ b/packages/core/src/kinds/async/analysis.ts
@@ -1,4 +1,6 @@
 import { fileURLToPath } from 'node:url';
+import { buildGcCorrelationWindows, type TimeWindow } from '../../analysis/model/correlations.js';
+import { buildEventLoopStallWindows } from '../../analysis/model/event-loop-report.js';
 import type { SourceMapResolver } from '../../analysis/sourcemap/resolver.js';
 import type { CaptureBundle, RawCpuProfile } from '../../capture/core/types.js';
 import type {
@@ -16,8 +18,17 @@ import type {
   ProfileConfidence,
   UserCallerAttribution,
 } from '../../report/types.js';
+import { HEARTBEAT_RESOLUTION_MS } from '../../shared/config.js';
 import type { KindAnalysisContext, KindAnalysisContributor } from '../core/types.js';
 import { firstCdpAsyncContextFrame } from './cdp-stack.js';
+import {
+  buildByKindLatency,
+  buildWaitWindows,
+  classifyLatencyCause,
+  collectDescendantRunWindows,
+  deriveLatency,
+  resolveAttributedFrame,
+} from './latency.js';
 import type {
   AsyncCdpContext,
   AsyncChainNode,
@@ -120,23 +131,67 @@ function analyzeInner(
   const chains = buildChains(chainNodes, recordById);
   const orphans = buildOrphans(data.records, bundle.durationMs);
 
+  // Signal windows used to classify why each operation spent its latency
+  // (event-loop stalls, GC pauses, downstream async work). CPU sample times are
+  // profile-relative (≈ capture-relative); the residual skew is the small
+  // Profiler.start↔capture-start startup gap, reported via clockSyncUncertaintyMs.
+  const clockSyncUncertaintyMs = Math.max(
+    bundle.cdpClockJitterMs ?? 0,
+    data.clockResolutionMs ?? 0,
+  );
+  const stallWindows: TimeWindow[] = buildEventLoopStallWindows(
+    bundle.runtimeSignals.eventLoopSamples,
+    bundle.durationMs,
+    bundle.runtimeSignals.eventLoopResolutionMs ?? HEARTBEAT_RESOLUTION_MS,
+  );
+  // Exact GC pause windows (no ±lookaround padding): for latency attribution we
+  // ask "did GC run *during* this wait", so each window must be the real pause,
+  // not padded. Padding sub-millisecond scavenges by tens of ms makes them tile
+  // the whole timeline and blanket every wait with a spurious 100% gc overlap.
+  const gcWindows = buildGcCorrelationWindows(bundle.runtimeSignals.gcEvents, bundle.durationMs, 0);
+  // Which runtime signals were actually observed — so cause classification can
+  // say *why* it could not explain a wait rather than silently guessing.
+  const signals = {
+    eventLoop:
+      Boolean(bundle.captureIntegrity.eventLoopTimed) &&
+      bundle.runtimeSignals.eventLoopSamples.length > 0,
+    gc: Boolean(bundle.captureIntegrity.gcTimed),
+  };
+
   const cpuAttribution = buildCpuAttribution({
     records: data.records,
     recordById,
     rootByAsyncId,
+    chainNodes,
     cpuKind: bundle.kinds.cpu as { cpuProfile: RawCpuProfile } | undefined,
+    clockSyncUncertaintyMs,
   });
   const userCallerByRootId = new Map<number, UserCallerAttribution>();
   for (const entry of cpuAttribution.topChains) {
     if (entry.userCaller) userCallerByRootId.set(entry.rootAsyncId, entry.userCaller);
   }
-  const topOperations = buildTopOperations(
-    sortedByDuration,
-    bundle.durationMs,
+  const topOperations = buildTopOperations({
+    // Orphans (resources still in flight at capture end) have a fictional
+    // capture-clamped duration that would otherwise dominate this ranking; they
+    // are reported separately in `orphans[]`.
+    sorted: sortedByDuration.filter((rec) => !rec.orphan),
+    captureDurationMs: bundle.durationMs,
     rootByAsyncId,
     userCallerByRootId,
-  );
-  const quality = buildQuality(data, cpuAttribution);
+    recordById,
+    chainNodes,
+    stallWindows,
+    gcWindows,
+    signals,
+    clockSyncUncertaintyMs,
+  });
+  const quality = buildQuality({
+    data,
+    cpuAttribution,
+    recordById,
+    clockSyncUncertaintyMs,
+    eventLoopSignalAvailable: signals.eventLoop,
+  });
   const hotFiles = buildHotFiles({
     records: data.records,
     captureDurationMs: bundle.durationMs,
@@ -284,16 +339,42 @@ function buildSummary(
     };
   }
   if (concurrencyStats) summary.concurrency = concurrencyStats;
-  void captureDurationMs;
+  // Only completed operations have a real latency; orphans carry a fictional
+  // capture-clamped duration that would corrupt the percentiles.
+  const byKindLatency = buildByKindLatency(
+    data.records.filter((rec) => !rec.orphan),
+    captureDurationMs,
+  );
+  if (Object.keys(byKindLatency).length > 0) summary.byKindLatency = byKindLatency;
   return summary;
 }
 
-function buildTopOperations(
-  sorted: AsyncOperationRecord[],
-  captureDurationMs: number,
-  rootByAsyncId: Map<number, number>,
-  userCallerByRootId: Map<number, UserCallerAttribution>,
-): AsyncTopOperation[] {
+interface BuildTopOperationsArgs {
+  sorted: AsyncOperationRecord[];
+  captureDurationMs: number;
+  rootByAsyncId: Map<number, number>;
+  userCallerByRootId: Map<number, UserCallerAttribution>;
+  recordById: Map<number, AsyncOperationRecord>;
+  chainNodes: Map<number, AsyncChainNode>;
+  stallWindows: TimeWindow[];
+  gcWindows: TimeWindow[];
+  signals: { eventLoop: boolean; gc: boolean };
+  clockSyncUncertaintyMs: number;
+}
+
+function buildTopOperations(args: BuildTopOperationsArgs): AsyncTopOperation[] {
+  const {
+    sorted,
+    captureDurationMs,
+    rootByAsyncId,
+    userCallerByRootId,
+    recordById,
+    chainNodes,
+    stallWindows,
+    gcWindows,
+    signals,
+    clockSyncUncertaintyMs,
+  } = args;
   const out: AsyncTopOperation[] = [];
   for (const rec of sorted) {
     if (out.length >= MAX_TOP_OPERATIONS) break;
@@ -358,7 +439,7 @@ function buildTopOperations(
       op.executionConfidence = rec.executionConfidence ?? 'medium';
       op.cpuAttributedSamples = rec.cpuAttributedSamples ?? 0;
       op.cpuAmbiguousSamples = rec.cpuAmbiguousSamples ?? 0;
-      op.clockSyncUncertaintyMs = 0;
+      op.clockSyncUncertaintyMs = clockSyncUncertaintyMs;
     }
     if (cdpAsyncContextFrame) {
       op.cdpAsyncContextFrame = cdpAsyncContextFrame;
@@ -381,16 +462,50 @@ function buildTopOperations(
       op.overallConfidence =
         op.awaitConfidence ?? op.executionConfidence ?? op.creationConfidence ?? 'medium';
     }
+    const end = rec.initAtMs + durationMs;
+    const latency = deriveLatency(rec, end);
+    op.waitMs = latency.waitMs;
+    if (latency.scheduleDelayMs !== undefined) op.scheduleDelayMs = latency.scheduleDelayMs;
+    if (rec.firstRunAtMs !== undefined) op.firstRunAtMs = rec.firstRunAtMs;
+    const cause = classifyLatencyCause({
+      waitWindows: buildWaitWindows(rec, end),
+      stallWindows,
+      gcWindows,
+      descendantWindows: collectDescendantRunWindows(rec.asyncId, chainNodes, recordById),
+      kind: rec.kind,
+      runMs: rec.runMs,
+      runCount: rec.runCount,
+      durationMs,
+      captureDurationMs,
+      signals,
+      firstRunAtMs: rec.firstRunAtMs,
+    });
+    op.latencyCause = cause.cause;
+    op.causeConfidence = cause.confidence;
+    op.causeEvidence = cause.evidence;
+
+    const attributed = resolveAttributedFrame(rec, recordById);
+    if (attributed.origin) op.attributedFrameOrigin = attributed.origin;
+
     const rootId = rootByAsyncId.get(rec.asyncId) ?? rec.asyncId;
     const cpuCaller = userCallerByRootId.get(rootId);
-    op.userCaller =
-      cpuCaller ??
-      userCallerFromAsyncFrame(primaryFrame, {
+    if (cpuCaller) {
+      op.userCaller = cpuCaller;
+    } else if (attributed.origin === 'inherited-trigger' && attributed.frame) {
+      op.userCaller = userCallerFromAsyncFrame(toReportFrame(attributed.frame), {
+        profilePct: 0,
+        supportPct: 100,
+        confidence: 'low',
+        basis: 'async-stack',
+      });
+    } else {
+      op.userCaller = userCallerFromAsyncFrame(primaryFrame, {
         profilePct: 0,
         supportPct: 100,
         confidence: op.overallConfidence ?? 'medium',
         basis: 'async-stack',
       });
+    }
     out.push(op);
   }
   return out;
@@ -636,13 +751,31 @@ function buildOrphans(records: AsyncOperationRecord[], captureDurationMs: number
   return orphans.slice(0, MAX_ORPHANS);
 }
 
-function buildQuality(
-  data: AsyncKindData,
-  cpuAttribution: AsyncCpuAttribution,
-): AsyncProfileQuality {
+interface BuildQualityArgs {
+  data: AsyncKindData;
+  cpuAttribution: AsyncCpuAttribution;
+  recordById: Map<number, AsyncOperationRecord>;
+  clockSyncUncertaintyMs: number;
+  eventLoopSignalAvailable: boolean;
+}
+
+function buildQuality(args: BuildQualityArgs): AsyncProfileQuality {
+  const { data, cpuAttribution, recordById, clockSyncUncertaintyMs, eventLoopSignalAvailable } =
+    args;
   const operationCount = data.records.length;
   const recordsWithStacks = data.records.filter((rec) => rec.initStack.length > 0).length;
   const sampledStackRatio = operationCount > 0 ? recordsWithStacks / operationCount : 0;
+  const attributedStackRatio =
+    operationCount > 0
+      ? data.records.filter((rec) => {
+          const origin = resolveAttributedFrame(rec, recordById).origin;
+          return origin === 'self' || origin === 'inherited-trigger';
+        }).length / operationCount
+      : 0;
+  const cpuSamplesConsidered =
+    cpuAttribution.cpuAttributedSamples + cpuAttribution.cpuAmbiguousSamples;
+  const ambiguousRatio =
+    cpuSamplesConsidered > 0 ? cpuAttribution.cpuAmbiguousSamples / cpuSamplesConsidered : 0;
   const cdpAsyncStackCoverageRatio =
     operationCount > 0 ? Math.min(1, (data.cdpAsyncContexts?.length ?? 0) / operationCount) : 0;
   const runWindowCount = data.records.reduce((sum, rec) => sum + rec.runWindows.length, 0);
@@ -675,6 +808,14 @@ function buildQuality(
     reasons.push('attach mode can only observe async resources created after hooks were installed');
     recommendations.add('Use run mode for complete startup async lifecycle coverage.');
   }
+  if (operationCount > 0 && !eventLoopSignalAvailable) {
+    reasons.push(
+      'event-loop signal was unavailable, so latency causes cannot distinguish a blocked loop from genuine I/O wait (such operations are reported as `unknown` with basis `no-eventloop-signal`)',
+    );
+    recommendations.add(
+      'Capture in spawn mode (or ensure the event-loop heartbeat is available) to classify event-loop-blocked latency.',
+    );
+  }
   if (cpuAttribution.cpuAmbiguousSamples > 0) {
     reasons.push(
       `${cpuAttribution.cpuAmbiguousSamples} CPU samples overlapped multiple async run windows and were marked ambiguous`,
@@ -683,9 +824,9 @@ function buildQuality(
       'Treat CPU-to-async attribution as directional when async windows overlap.',
     );
   }
-  if ((data.clockSyncUncertaintyMs ?? 0) > 10) {
+  if (clockSyncUncertaintyMs > 10) {
     reasons.push(
-      `runtime/CDP clock synchronization uncertainty was ${data.clockSyncUncertaintyMs?.toFixed(1)}ms`,
+      `runtime/CDP clock synchronization uncertainty was ${clockSyncUncertaintyMs.toFixed(1)}ms`,
     );
   }
   if (cpuAttribution.available && runWindowCount === 0) {
@@ -710,7 +851,7 @@ function buildQuality(
       recordsDropped: data.integrity.recordsDropped,
       collectedVia: data.collectedVia,
       attachPartialCapture,
-      cpuAmbiguousSamples: cpuAttribution.cpuAmbiguousSamples,
+      cpuAmbiguousRatio: ambiguousRatio,
       fullTransformPartial: instrumentationMode === 'full' && Boolean(data.transformStats?.partial),
     }),
     instrumentationMode,
@@ -718,13 +859,15 @@ function buildQuality(
     operationCount,
     sampledStackRatio,
     initStackCoverageRatio: sampledStackRatio,
+    attributedStackRatio,
     cdpAsyncStackCoverageRatio,
     recordsDropped: data.integrity.recordsDropped,
     maxRecords: data.maxRecords,
     runWindowCount,
     cpuAttributionCoveragePct: cpuAttribution.attributedCpuPct,
     cpuAmbiguousSamples: cpuAttribution.cpuAmbiguousSamples,
-    clockSyncUncertaintyMs: data.clockSyncUncertaintyMs ?? cpuAttribution.clockSyncUncertaintyMs,
+    ambiguousRatio,
+    clockSyncUncertaintyMs,
     reasons,
     recommendations: Array.from(recommendations),
   };
@@ -736,10 +879,10 @@ function scoreAsyncConfidence(input: {
   recordsDropped: number;
   collectedVia: AsyncKindData['collectedVia'];
   attachPartialCapture: boolean;
-  cpuAmbiguousSamples: number;
+  cpuAmbiguousRatio: number;
   fullTransformPartial: boolean;
 }): ProfileConfidence {
-  if (input.attachPartialCapture || input.cpuAmbiguousSamples > 0 || input.fullTransformPartial) {
+  if (input.attachPartialCapture || input.cpuAmbiguousRatio > 0.5 || input.fullTransformPartial) {
     return 'low';
   }
   if (
@@ -902,7 +1045,10 @@ interface BuildAttributionArgs {
   records: AsyncOperationRecord[];
   recordById: Map<number, AsyncOperationRecord>;
   rootByAsyncId: Map<number, number>;
+  chainNodes: Map<number, AsyncChainNode>;
   cpuKind: { cpuProfile: RawCpuProfile } | undefined;
+  /** Real clock-alignment uncertainty (CDP jitter / perf.now resolution) to report. */
+  clockSyncUncertaintyMs: number;
 }
 
 /**
@@ -915,7 +1061,7 @@ interface BuildAttributionArgs {
  * the CPU profile lacks per-sample timestamps, or no run windows exist.
  */
 function buildCpuAttribution(args: BuildAttributionArgs): AsyncCpuAttribution {
-  const { records, recordById, rootByAsyncId, cpuKind } = args;
+  const { records, recordById, rootByAsyncId, chainNodes, cpuKind } = args;
   if (!cpuKind) {
     return emptyAttribution('cpu kind not captured');
   }
@@ -992,11 +1138,19 @@ function buildCpuAttribution(args: BuildAttributionArgs): AsyncCpuAttribution {
       windowCursor += 1;
     }
     activeWindows = activeWindows.filter((w) => w.endMs >= tMs);
+    let win: Window | undefined;
     if (activeWindows.length > 1) {
-      ambiguousCount += 1;
-      continue;
+      // Overlapping windows from one ancestor/descendant chain belong to the
+      // innermost (deepest) async context; only unrelated concurrent chains
+      // are genuinely ambiguous.
+      win = resolveOverlappingWindow(activeWindows, recordById, chainNodes);
+      if (!win) {
+        ambiguousCount += 1;
+        continue;
+      }
+    } else {
+      win = findLatestStartedWindow(activeWindows);
     }
-    const win = findLatestStartedWindow(activeWindows);
     if (!win) continue;
     attributedCount += 1;
     const bucket = cpuByRoot.get(win.rootId) ?? {
@@ -1032,6 +1186,10 @@ function buildCpuAttribution(args: BuildAttributionArgs): AsyncCpuAttribution {
   const totalCpuMs = samples.length * sampleIntervalMs;
   const attributedCpuPct =
     totalCpuMs > 0 ? (attributedCount * sampleIntervalMs * 100) / totalCpuMs : 0;
+  const samplesConsidered = attributedCount + ambiguousCount;
+  const ambiguousRatio = samplesConsidered > 0 ? ambiguousCount / samplesConsidered : 0;
+  const executionConfidence: ProfileConfidence =
+    ambiguousRatio < 0.1 ? 'high' : ambiguousRatio < 0.33 ? 'medium' : 'low';
 
   const topChains: AsyncCpuAttributionEntry[] = [];
   for (const [rootId, bucket] of cpuByRoot.entries()) {
@@ -1050,7 +1208,7 @@ function buildCpuAttribution(args: BuildAttributionArgs): AsyncCpuAttribution {
     if (executionFrame) {
       const reportFrame = toReportFrame(executionFrame);
       entry.executionFrame = reportFrame;
-      entry.executionConfidence = ambiguousCount > 0 ? 'low' : 'high';
+      entry.executionConfidence = executionConfidence;
       root.executionStack = [executionFrame];
       root.executionConfidence = entry.executionConfidence;
       root.cpuAttributedSamples = bucket.sampleNodeIds.length;
@@ -1075,7 +1233,7 @@ function buildCpuAttribution(args: BuildAttributionArgs): AsyncCpuAttribution {
     totalCpuMs,
     cpuAttributedSamples: attributedCount,
     cpuAmbiguousSamples: ambiguousCount,
-    clockSyncUncertaintyMs: 0,
+    clockSyncUncertaintyMs: args.clockSyncUncertaintyMs,
     topChains: topChains.slice(0, MAX_CPU_ATTRIBUTION_CHAINS),
   };
 }
@@ -1133,6 +1291,51 @@ function emptyAttribution(reason: string): AsyncCpuAttribution {
   };
 }
 
+/** True when `ancestorId` is `descendantId` or one of its trigger ancestors. */
+function isAsyncAncestor(
+  ancestorId: number,
+  descendantId: number,
+  recordById: Map<number, AsyncOperationRecord>,
+  maxHops = 256,
+): boolean {
+  if (ancestorId === descendantId) return true;
+  const seen = new Set<number>();
+  let current = recordById.get(descendantId);
+  let hops = 0;
+  while (current && !seen.has(current.asyncId) && hops < maxHops) {
+    seen.add(current.asyncId);
+    if (current.triggerAsyncId === ancestorId) return true;
+    current = recordById.get(current.triggerAsyncId);
+    hops += 1;
+  }
+  return false;
+}
+
+/**
+ * When several run windows overlap a CPU sample, attribute it to the innermost
+ * (deepest) async context — but only if all active windows lie on a single
+ * ancestor/descendant chain. Unrelated concurrent chains return undefined so
+ * the caller marks the sample ambiguous.
+ */
+function resolveOverlappingWindow<W extends { asyncId: number; order: number }>(
+  windows: readonly W[],
+  recordById: Map<number, AsyncOperationRecord>,
+  chainNodes: Map<number, AsyncChainNode>,
+): W | undefined {
+  let deepest = windows[0];
+  if (!deepest) return undefined;
+  for (const w of windows) {
+    const depth = chainNodes.get(w.asyncId)?.depth ?? 0;
+    const bestDepth = chainNodes.get(deepest.asyncId)?.depth ?? 0;
+    if (depth > bestDepth || (depth === bestDepth && w.order > deepest.order)) deepest = w;
+  }
+  for (const w of windows) {
+    if (w.asyncId === deepest.asyncId) continue;
+    if (!isAsyncAncestor(w.asyncId, deepest.asyncId, recordById)) return undefined;
+  }
+  return deepest;
+}
+
 function mean(values: readonly number[]): number {
   if (values.length === 0) return 0;
   let sum = 0;
diff --git a/packages/core/src/kinds/async/latency.ts b/packages/core/src/kinds/async/latency.ts
new file mode 100644
index 0000000..83028f4
--- /dev/null
+++ b/packages/core/src/kinds/async/latency.ts
@@ -0,0 +1,407 @@
+import type { TimeWindow } from '../../analysis/model/correlations.js';
+import { scoreConfidence } from '../../analysis/model/correlations.js';
+import type { ProfileConfidence } from '../../report/types.js';
+import {
+  ASYNC_BACKGROUND_DURATION_RATIO,
+  ASYNC_CAUSE_MIN_OVERLAP_PCT,
+  ASYNC_CPU_BOUND_RATIO,
+  ASYNC_LONGLIVED_DURATION_RATIO,
+  ASYNC_STALL_READINESS_MARGIN_MS,
+} from '../../shared/config.js';
+import { percentile } from '../../shared/percentile.js';
+import {
+  ASYNC_IO_KINDS,
+  type AsyncAttributedFrameOrigin,
+  type AsyncChainNode,
+  type AsyncLatencyCause,
+  type AsyncOperationKind,
+  type AsyncOperationRecord,
+  type AsyncStackFrame,
+} from './types.js';
+
+/**
+ * Pure latency analysis for the async kind: decomposes an operation's
+ * wall-clock latency, classifies its root cause by overlapping the time the
+ * resource spent *waiting* with event-loop / GC / downstream-async signals,
+ * and attributes a user-code frame by walking the trigger ancestry. No source
+ * maps, no global state — callers (analysis.ts) handle frame normalization.
+ */
+
+export interface DerivedLatency {
+  /** Wall-clock time the resource was alive but NOT executing on CPU. */
+  waitMs: number;
+  /** init → first `before`: queue/scheduling delay before the resource first ran. */
+  scheduleDelayMs?: number;
+}
+
+export interface LatencyCauseInput {
+  waitWindows: TimeWindow[];
+  stallWindows: TimeWindow[];
+  gcWindows: TimeWindow[];
+  descendantWindows: TimeWindow[];
+  kind: AsyncOperationKind;
+  runMs: number;
+  durationMs: number;
+  /** Total capture window, to detect long-lived background resources. */
+  captureDurationMs: number;
+  /** Which runtime signals were actually observed — lets `unknown` say *why* it could not classify. */
+  signals: { eventLoop: boolean; gc: boolean };
+  /** When the resource first ran (capture-relative ms), if it ran — used to require a stall to be active when the callback became runnable, not merely overlapping the wait. */
+  firstRunAtMs?: number;
+  /** Number of times the resource ran (before/after pairs). >1 over ~the whole capture marks a persistent/multiplexed handle, not a single delayed callback. */
+  runCount?: number;
+}
+
+export interface LatencyCauseResult {
+  cause: AsyncLatencyCause;
+  confidence: ProfileConfidence;
+  evidence: { overlapPct: number; basis: string; windowMs: number };
+}
+
+export interface ByKindLatencyEntry {
+  count: number;
+  p50Ms: number;
+  p95Ms: number;
+  p99Ms: number;
+  maxMs: number;
+  meanWaitMs: number;
+}
+
+export interface AttributedFrame {
+  frame?: AsyncStackFrame;
+  origin?: AsyncAttributedFrameOrigin;
+}
+
+/** End timestamp (capture-relative ms) of a resource's lifetime. */
+export function effectiveEndMs(rec: AsyncOperationRecord, captureDurationMs: number): number {
+  if (rec.durationMs !== undefined) return rec.initAtMs + rec.durationMs;
+  return Math.max(rec.initAtMs, captureDurationMs);
+}
+
+export function deriveLatency(rec: AsyncOperationRecord, endMs: number): DerivedLatency {
+  const durationMs = Math.max(0, endMs - rec.initAtMs);
+  const result: DerivedLatency = { waitMs: Math.max(0, durationMs - rec.runMs) };
+  if (rec.firstRunAtMs !== undefined) {
+    result.scheduleDelayMs = Math.max(0, rec.firstRunAtMs - rec.initAtMs);
+  }
+  return result;
+}
+
+/** The complement of a resource's run windows within `[initAtMs, endMs]` — the time it was waiting. */
+export function buildWaitWindows(rec: AsyncOperationRecord, endMs: number): TimeWindow[] {
+  const start = rec.initAtMs;
+  const end = Math.max(start, endMs);
+  if (rec.runWindows.length === 0) {
+    return end > start ? [{ startMs: start, endMs: end }] : [];
+  }
+  const runs = mergeWindows(
+    rec.runWindows
+      .map((w) => ({
+        startMs: clamp(w.startMs, start, end),
+        endMs: clamp(w.endMs, start, end),
+      }))
+      .filter((w) => w.endMs > w.startMs),
+  );
+  const gaps: TimeWindow[] = [];
+  let cursor = start;
+  for (const run of runs) {
+    if (run.startMs > cursor) gaps.push({ startMs: cursor, endMs: run.startMs });
+    cursor = Math.max(cursor, run.endMs);
+  }
+  if (cursor < end) gaps.push({ startMs: cursor, endMs: end });
+  return gaps;
+}
+
+/** Run windows of every descendant in the trigger subtree (cycle-guarded, size-capped). */
+export function collectDescendantRunWindows(
+  asyncId: number,
+  chainNodes: Map<number, AsyncChainNode>,
+  recordById: Map<number, AsyncOperationRecord>,
+  capNodes = 5000,
+): TimeWindow[] {
+  const windows: TimeWindow[] = [];
+  const visited = new Set<number>([asyncId]);
+  const stack = [...(chainNodes.get(asyncId)?.childIds ?? [])];
+  let count = 0;
+  while (stack.length > 0 && count < capNodes) {
+    const id = stack.pop();
+    if (id === undefined || visited.has(id)) continue;
+    visited.add(id);
+    count += 1;
+    const rec = recordById.get(id);
+    if (rec) {
+      for (const w of rec.runWindows) windows.push({ startMs: w.startMs, endMs: w.endMs });
+    }
+    const node = chainNodes.get(id);
+    if (node) {
+      for (const child of node.childIds) if (!visited.has(child)) stack.push(child);
+    }
+  }
+  return windows;
+}
+
+export function classifyLatencyCause(input: LatencyCauseInput): LatencyCauseResult {
+  const { waitWindows, kind, runMs, durationMs, captureDurationMs, signals } = input;
+  const cpuRatio = durationMs > 0 ? runMs / durationMs : 0;
+
+  // Long-lived resource that never ran and stayed alive for ~the whole capture:
+  // a keep-alive / idle handle, not a latency bug. Classify it as such so its
+  // incidental overlap with stalls is not mis-read as event-loop-blocked.
+  if (runMs === 0 && durationMs >= captureDurationMs * ASYNC_BACKGROUND_DURATION_RATIO) {
+    return {
+      cause: 'background',
+      confidence: 'high',
+      evidence: { overlapPct: 0, basis: 'long-lived', windowMs: round(durationMs) },
+    };
+  }
+
+  // Persistent/multiplexed handle (keep-alive socket, HTTP parser, pool, interval):
+  // it activated more than once and stayed alive for ~the whole capture. A single
+  // delayed callback runs at most once, so runCount>1 + capture-spanning marks
+  // infrastructure whose aggregate waitMs is the idle gap between many activations,
+  // not one blocked callback. CPU-dominated handles fall through to the CPU rule.
+  if (
+    (input.runCount ?? 0) > 1 &&
+    durationMs >= captureDurationMs * ASYNC_LONGLIVED_DURATION_RATIO &&
+    cpuRatio < ASYNC_CPU_BOUND_RATIO
+  ) {
+    return {
+      cause: 'background',
+      confidence: 'high',
+      evidence: { overlapPct: 0, basis: 'long-lived-multiplexed', windowMs: round(durationMs) },
+    };
+  }
+
+  // Dominated by its own execution → it is computing, not waiting.
+  if (cpuRatio >= ASYNC_CPU_BOUND_RATIO) {
+    return {
+      cause: 'cpu-bound',
+      confidence: cpuRatio >= 0.8 ? 'high' : 'medium',
+      evidence: { overlapPct: round(cpuRatio * 100), basis: 'cpu-ratio', windowMs: round(runMs) },
+    };
+  }
+
+  const waitTotalMs = totalMs(waitWindows);
+  if (waitTotalMs <= 0) {
+    return {
+      cause: 'unknown',
+      confidence: 'low',
+      evidence: { overlapPct: 0, basis: 'none', windowMs: 0 },
+    };
+  }
+
+  const mergedStalls = mergeWindows(input.stallWindows);
+  const stallMs = overlapMs(waitWindows, mergedStalls);
+  const gcMs = overlapMs(waitWindows, mergeWindows(input.gcWindows));
+  const descMs = overlapMs(waitWindows, mergeWindows(input.descendantWindows));
+  const stallPct = (stallMs / waitTotalMs) * 100;
+  const gcPct = (gcMs / waitTotalMs) * 100;
+  const descPct = (descMs / waitTotalMs) * 100;
+
+  // Priority: a blocked loop explains a late callback even for an I/O resource,
+  // so it wins whenever the wait overlaps a stall past the threshold — a
+  // coincidental GC or downstream overlap must not outrank it on raw percentage.
+  // But require the loop to have still been blocked when the callback became
+  // runnable: a stall that ended long before the resource ran merely overlapped
+  // a genuine I/O wait, it did not cause it.
+  if (stallPct >= ASYNC_CAUSE_MIN_OVERLAP_PCT && stallActiveAtReadiness(mergedStalls, input)) {
+    return {
+      cause: 'event-loop-blocked',
+      confidence: scoreConfidence(Math.min(100, stallPct), Math.max(gcPct, descPct)),
+      evidence: {
+        overlapPct: round(Math.min(100, stallPct)),
+        basis: 'event-loop-stall',
+        windowMs: round(stallMs),
+      },
+    };
+  }
+  // Otherwise GC pauses and downstream async work compete on overlap share.
+  const ranked = [
+    { cause: 'gc-pause' as const, pct: gcPct, basis: 'gc', windowMs: gcMs },
+    {
+      cause: 'downstream-async' as const,
+      pct: descPct,
+      basis: 'downstream-async',
+      windowMs: descMs,
+    },
+  ];
+  ranked.sort((a, b) => b.pct - a.pct);
+  const top = ranked[0];
+  if (top && top.pct >= ASYNC_CAUSE_MIN_OVERLAP_PCT) {
+    const second = ranked[1]?.pct ?? 0;
+    return {
+      cause: top.cause,
+      confidence: scoreConfidence(Math.min(100, top.pct), second),
+      evidence: {
+        overlapPct: round(Math.min(100, top.pct)),
+        basis: top.basis,
+        windowMs: round(top.windowMs),
+      },
+    };
+  }
+
+  if (ASYNC_IO_KINDS.has(kind)) {
+    return {
+      cause: 'io-wait',
+      confidence: 'medium',
+      evidence: { overlapPct: 0, basis: 'io-kind', windowMs: round(waitTotalMs) },
+    };
+  }
+
+  // It waited, but nothing we observed explains it. Be honest about *why* we
+  // could not classify: if the event-loop heartbeat was unavailable we could
+  // not even check for loop blocking, so this is not the same as "no overlap".
+  return {
+    cause: 'unknown',
+    confidence: 'low',
+    evidence: {
+      overlapPct: 0,
+      basis: signals.eventLoop ? 'none' : 'no-eventloop-signal',
+      windowMs: round(waitTotalMs),
+    },
+  };
+}
+
+export function buildByKindLatency(
+  records: readonly AsyncOperationRecord[],
+  captureDurationMs: number,
+): Partial<Record<AsyncOperationKind, ByKindLatencyEntry>> {
+  const buckets = new Map<AsyncOperationKind, { durations: number[]; waits: number[] }>();
+  for (const rec of records) {
+    const end = effectiveEndMs(rec, captureDurationMs);
+    const durationMs = Math.max(0, end - rec.initAtMs);
+    const { waitMs } = deriveLatency(rec, end);
+    const bucket = buckets.get(rec.kind) ?? { durations: [], waits: [] };
+    bucket.durations.push(durationMs);
+    bucket.waits.push(waitMs);
+    buckets.set(rec.kind, bucket);
+  }
+  const out: Partial<Record<AsyncOperationKind, ByKindLatencyEntry>> = {};
+  for (const [kind, bucket] of buckets) {
+    const sorted = [...bucket.durations].sort((a, b) => a - b);
+    out[kind] = {
+      count: sorted.length,
+      p50Ms: round(percentile(sorted, 0.5)),
+      p95Ms: round(percentile(sorted, 0.95)),
+      p99Ms: round(percentile(sorted, 0.99)),
+      maxMs: round(sorted[sorted.length - 1] ?? 0),
+      meanWaitMs: round(mean(bucket.waits)),
+    };
+  }
+  return out;
+}
+
+/**
+ * Best user-code frame for an operation: its own init stack if it has a
+ * user-editable frame, else the nearest such frame walking the trigger
+ * ancestry, else the CPU-attributed execution frame, else a CDP async frame.
+ */
+export function resolveAttributedFrame(
+  rec: AsyncOperationRecord,
+  recordById: Map<number, AsyncOperationRecord>,
+  maxHops = 64,
+): AttributedFrame {
+  const selfFrame = rec.initStack.find((f) => isUserEditableFile(f.file));
+  if (selfFrame) return { frame: selfFrame, origin: 'self' };
+
+  const visited = new Set<number>([rec.asyncId]);
+  let current = recordById.get(rec.triggerAsyncId);
+  let hops = 0;
+  while (current && !visited.has(current.asyncId) && hops < maxHops) {
+    visited.add(current.asyncId);
+    const frame = current.initStack.find((f) => isUserEditableFile(f.file));
+    if (frame) return { frame, origin: 'inherited-trigger' };
+    current = recordById.get(current.triggerAsyncId);
+    hops += 1;
+  }
+
+  const executionFrame = rec.executionStack?.[0];
+  if (executionFrame) return { frame: executionFrame, origin: 'cpu-window' };
+
+  const cdpFrame =
+    rec.cdpAsyncContext?.frames[0] ??
+    rec.cdpAsyncContext?.asyncStack.find((segment) => segment.frames[0])?.frames[0];
+  if (cdpFrame) return { frame: cdpFrame, origin: 'cdp' };
+
+  return {};
+}
+
+export function isUserEditableFile(file: string): boolean {
+  if (file === '') return false;
+  return !(
+    file.startsWith('node:') ||
+    file.includes('/node_modules/') ||
+    file.includes('/pnpm-store/') ||
+    file.includes('/.pnpm/') ||
+    file.includes('/caches/pnpm-store/')
+  );
+}
+
+// --- internal helpers ---
+
+function clamp(value: number, min: number, max: number): number {
+  return Math.max(min, Math.min(max, value));
+}
+
+function totalMs(windows: readonly TimeWindow[]): number {
+  let total = 0;
+  for (const w of windows) total += Math.max(0, w.endMs - w.startMs);
+  return total;
+}
+
+function overlapMs(windows: readonly TimeWindow[], against: readonly TimeWindow[]): number {
+  let total = 0;
+  for (const w of windows) {
+    for (const a of against) {
+      total += Math.max(0, Math.min(w.endMs, a.endMs) - Math.max(w.startMs, a.startMs));
+    }
+  }
+  return total;
+}
+
+/**
+ * True when a stall was still active at the moment the resource became runnable
+ * (its first run, else the end of its wait). A stall that ended well before the
+ * resource ran did not cause its latency — it only coincided with the wait.
+ */
+function stallActiveAtReadiness(
+  stalls: readonly TimeWindow[],
+  input: Pick<LatencyCauseInput, 'waitWindows' | 'firstRunAtMs'>,
+): boolean {
+  let readiness = input.firstRunAtMs;
+  if (readiness === undefined) {
+    readiness = 0;
+    for (const w of input.waitWindows) readiness = Math.max(readiness, w.endMs);
+  }
+  const point = readiness;
+  return stalls.some(
+    (s) => s.startMs <= point && point <= s.endMs + ASYNC_STALL_READINESS_MARGIN_MS,
+  );
+}
+
+function mergeWindows(windows: readonly TimeWindow[]): TimeWindow[] {
+  if (windows.length === 0) return [];
+  const sorted = [...windows].sort((a, b) => a.startMs - b.startMs);
+  const first = sorted[0];
+  if (!first) return [];
+  const out: TimeWindow[] = [{ startMs: first.startMs, endMs: first.endMs }];
+  for (let i = 1; i < sorted.length; i += 1) {
+    const cur = sorted[i];
+    const prev = out[out.length - 1];
+    if (!cur || !prev) continue;
+    if (cur.startMs <= prev.endMs) prev.endMs = Math.max(prev.endMs, cur.endMs);
+    else out.push({ startMs: cur.startMs, endMs: cur.endMs });
+  }
+  return out;
+}
+
+function mean(values: readonly number[]): number {
+  if (values.length === 0) return 0;
+  let sum = 0;
+  for (const v of values) sum += v;
+  return sum / values.length;
+}
+
+function round(value: number): number {
+  return Math.round(value * 100) / 100;
+}
diff --git a/packages/core/src/kinds/async/probe.ts b/packages/core/src/kinds/async/probe.ts
index f6447cc..d6d0478 100644
--- a/packages/core/src/kinds/async/probe.ts
+++ b/packages/core/src/kinds/async/probe.ts
@@ -64,7 +64,7 @@ export function createAsyncProbe(options: AsyncProbeOptions): CaptureProbe<Async
           filteredCounts: {},
           instrumentationMode: read?.instrumentationMode ?? 'safe',
           attachPartialCapture: true,
-          clockSyncUncertaintyMs: read?.clockSyncUncertaintyMs ?? 0,
+          clockResolutionMs: read?.clockResolutionMs,
           cdpAsyncStackSupport: asyncStackSupport,
           cdpAsyncStackDepthRequested: options.asyncStackDepth,
           cdpAsyncContexts,
@@ -80,7 +80,7 @@ export function createAsyncProbe(options: AsyncProbeOptions): CaptureProbe<Async
         filteredCounts: read.filteredCounts,
         instrumentationMode: read.instrumentationMode ?? 'safe',
         attachPartialCapture: Boolean(read.attachPartialCapture),
-        clockSyncUncertaintyMs: read.clockSyncUncertaintyMs ?? 0,
+        clockResolutionMs: read.clockResolutionMs,
         cdpAsyncStackSupport: asyncStackSupport,
         cdpAsyncStackDepthRequested: options.asyncStackDepth,
         transformStats: read.transformStats,
diff --git a/packages/core/src/kinds/async/types.ts b/packages/core/src/kinds/async/types.ts
index 6a1c742..e438b26 100644
--- a/packages/core/src/kinds/async/types.ts
+++ b/packages/core/src/kinds/async/types.ts
@@ -21,6 +21,34 @@ export type AsyncOperationKind =
   | 'microtask'
   | 'other';
 
+/**
+ * Async families that represent genuine external I/O. Used by latency
+ * cause-classification to distinguish real I/O waits from loop/GC stalls.
+ */
+export const ASYNC_IO_KINDS: ReadonlySet<AsyncOperationKind> = new Set([
+  'tcp',
+  'udp',
+  'fs',
+  'http',
+  'http2',
+  'tls',
+  'dns',
+  'pipe',
+]);
+
+/** Why an async operation spent its wall-clock latency (root-cause classification). */
+export type AsyncLatencyCause =
+  | 'event-loop-blocked'
+  | 'gc-pause'
+  | 'downstream-async'
+  | 'io-wait'
+  | 'cpu-bound'
+  | 'background'
+  | 'unknown';
+
+/** Provenance of the user-code frame attributed to an async operation. */
+export type AsyncAttributedFrameOrigin = 'self' | 'inherited-trigger' | 'cpu-window' | 'cdp';
+
 /**
  * One async-resource lifecycle, post-aggregated in the preload hook from the
  * raw async_hooks callbacks. We deliberately drop sub-callback granularity
@@ -71,6 +99,8 @@ export interface AsyncOperationRecord {
   destroyedAtMs?: number;
   /** Total elapsed time the resource was alive (resolved or destroyed - init). */
   durationMs?: number;
+  /** ms from capture-start to the first `before` (first time the resource ran). */
+  firstRunAtMs?: number;
   /** Sum of (after - before) over the resource's run windows. */
   runMs: number;
   /** Number of times the resource ran (before/after pair count). */
@@ -129,6 +159,8 @@ export interface AsyncKindData {
   instrumentationMode?: AsyncInstrumentationMode;
   attachPartialCapture?: boolean;
   clockSyncUncertaintyMs?: number;
+  /** Measured `performance.now()` tick resolution in the target (ms). Feeds the reported clock-sync uncertainty. */
+  clockResolutionMs?: number;
   /**
    * Outcome of `Debugger.setAsyncCallStackDepth` at start. `unsupported`
    * means CDP rejected the call (older Node) — CPU samples will lack their
diff --git a/packages/core/src/kinds/cpu/analysis.ts b/packages/core/src/kinds/cpu/analysis.ts
index 263c740..411a8b5 100644
--- a/packages/core/src/kinds/cpu/analysis.ts
+++ b/packages/core/src/kinds/cpu/analysis.ts
@@ -95,6 +95,15 @@ export function createCpuAnalysisContributor(
       if (eventLoopCorrelation.coverage.windowCount > 0) {
         eventLoop.correlationCoverage = eventLoopCorrelation.coverage;
       }
+      // Per-stall blocking frame: the user frame that dominated CPU during *each*
+      // stall, so a delayed async op can be attributed to the specific stall that
+      // blocked it rather than the single globally-dominant frame.
+      for (const interval of eventLoop.stallIntervals) {
+        const top = correlateUserHotspotsWithCoverage(timedSamples, tree, [interval], {
+          topN: 1,
+        }).hotspots[0];
+        if (top) interval.topFrame = top;
+      }
 
       const summary = buildCpuSummary(tree);
       const quality = buildCpuProfileQuality({
diff --git a/packages/core/src/report/schema/async-profile.ts b/packages/core/src/report/schema/async-profile.ts
index 580c018..47c82d3 100644
--- a/packages/core/src/report/schema/async-profile.ts
+++ b/packages/core/src/report/schema/async-profile.ts
@@ -43,6 +43,19 @@ const asyncSummarySchema = z.object({
     .optional(),
   orphanCount: z.number().int().nonnegative(),
   recordsDropped: z.number().int().nonnegative(),
+  byKindLatency: z
+    .partialRecord(
+      asyncOperationKindSchema,
+      z.object({
+        count: z.number().int().nonnegative(),
+        p50Ms: z.number().nonnegative(),
+        p95Ms: z.number().nonnegative(),
+        p99Ms: z.number().nonnegative(),
+        maxMs: z.number().nonnegative(),
+        meanWaitMs: z.number().nonnegative(),
+      }),
+    )
+    .optional(),
   topAsyncHotFile: z
     .object({
       function: z.string(),
@@ -92,6 +105,25 @@ const asyncTopOperationSchema = z.object({
   initAtMs: z.number(),
   triggerAsyncId: z.number().int(),
   orphan: z.boolean(),
+  firstRunAtMs: z.number().optional(),
+  waitMs: z.number().nonnegative().optional(),
+  scheduleDelayMs: z.number().nonnegative().optional(),
+  latencyCause: z
+    .enum([
+      'event-loop-blocked',
+      'gc-pause',
+      'downstream-async',
+      'io-wait',
+      'cpu-bound',
+      'background',
+      'unknown',
+    ])
+    .optional(),
+  causeConfidence: z.enum(['low', 'medium', 'high']).optional(),
+  causeEvidence: z
+    .object({ overlapPct: z.number(), basis: z.string(), windowMs: z.number() })
+    .optional(),
+  attributedFrameOrigin: z.enum(['self', 'inherited-trigger', 'cpu-window', 'cdp']).optional(),
   initFrame: asyncStackFrameSchema.optional(),
   primaryFrame: asyncStackFrameSchema.optional(),
   primaryReason: z
@@ -170,12 +202,14 @@ const asyncQualitySchema = z.object({
   operationCount: z.number().int().nonnegative(),
   sampledStackRatio: z.number().min(0).max(1),
   initStackCoverageRatio: z.number().min(0).max(1),
+  attributedStackRatio: z.number().min(0).max(1),
   cdpAsyncStackCoverageRatio: z.number().min(0).max(1),
   recordsDropped: z.number().int().nonnegative(),
   maxRecords: z.number().int().nonnegative(),
   runWindowCount: z.number().int().nonnegative(),
   cpuAttributionCoveragePct: z.number().nonnegative(),
   cpuAmbiguousSamples: z.number().int().nonnegative(),
+  ambiguousRatio: z.number().min(0).max(1),
   clockSyncUncertaintyMs: z.number().nonnegative(),
   reasons: z.array(z.string()),
   recommendations: z.array(z.string()),
diff --git a/packages/core/src/report/schema/primitives.ts b/packages/core/src/report/schema/primitives.ts
index cb6e21b..b2dd094 100644
--- a/packages/core/src/report/schema/primitives.ts
+++ b/packages/core/src/report/schema/primitives.ts
@@ -112,4 +112,6 @@ export const stallIntervalSchema = z.object({
   startMs: z.number(),
   endMs: z.number(),
   maxLagMs: z.number(),
+  /** The user frame that dominated CPU during this specific stall (the culprit blocking the loop here), when CPU was captured. Lets consumers attribute a delayed op to the stall that actually blocked it rather than the globally-dominant frame. */
+  topFrame: correlatedHotspotSchema.optional(),
 });
diff --git a/packages/core/src/report/types.ts b/packages/core/src/report/types.ts
index 3334008..82a9cc7 100644
--- a/packages/core/src/report/types.ts
+++ b/packages/core/src/report/types.ts
@@ -207,7 +207,13 @@ export interface EventLoopReport {
   p50LagMs: number;
   meanLagMs: number;
   sampleCount: number;
-  stallIntervals: Array<{ startMs: number; endMs: number; maxLagMs: number }>;
+  stallIntervals: Array<{
+    startMs: number;
+    endMs: number;
+    maxLagMs: number;
+    /** User frame that dominated CPU during this specific stall (the culprit here), when CPU was captured. */
+    topFrame?: CorrelatedHotspot;
+  }>;
   available: boolean;
   measurementBasis: MeasurementBasis;
   confidence: MeasurementConfidence;
@@ -611,6 +617,28 @@ export type AsyncOperationKindReport =
   | 'microtask'
   | 'other';
 
+/** Why an async operation spent its wall-clock latency (mirrors kinds/async/types). */
+export type AsyncLatencyCause =
+  | 'event-loop-blocked'
+  | 'gc-pause'
+  | 'downstream-async'
+  | 'io-wait'
+  | 'cpu-bound'
+  | 'background'
+  | 'unknown';
+
+/** Provenance of the user-code frame attributed to an async operation. */
+export type AsyncAttributedFrameOrigin = 'self' | 'inherited-trigger' | 'cpu-window' | 'cdp';
+
+export interface AsyncByKindLatency {
+  count: number;
+  p50Ms: number;
+  p95Ms: number;
+  p99Ms: number;
+  maxMs: number;
+  meanWaitMs: number;
+}
+
 export interface AsyncSummary {
   available: boolean;
   collectedVia: 'async-hooks' | 'cdp-only' | 'unavailable';
@@ -631,6 +659,8 @@ export interface AsyncSummary {
   };
   orphanCount: number;
   recordsDropped: number;
+  /** Per-family latency percentiles + mean wait time (e.g. http p99 vs fs p99). */
+  byKindLatency?: Partial<Record<AsyncOperationKindReport, AsyncByKindLatency>>;
   topAsyncHotFile?: {
     function: string;
     file: string;
@@ -672,6 +702,19 @@ export interface AsyncTopOperation {
   initAtMs: number;
   triggerAsyncId: number;
   orphan: boolean;
+  /** ms from capture-start to first execution (first `before`) — scheduling/queue delay precursor. */
+  firstRunAtMs?: number;
+  /** Wall-clock time alive but NOT executing on CPU (durationMs - runMs). The real "latency". */
+  waitMs?: number;
+  /** init → first run: time the resource sat queued before it first executed. */
+  scheduleDelayMs?: number;
+  /** Classified root cause of the operation's latency. */
+  latencyCause?: AsyncLatencyCause;
+  causeConfidence?: ProfileConfidence;
+  /** Supporting numbers for the classified cause (overlap %, basis, overlap ms). */
+  causeEvidence?: { overlapPct: number; basis: string; windowMs: number };
+  /** Where the attributed user-code frame came from (self stack, trigger ancestry, CPU window, CDP). */
+  attributedFrameOrigin?: AsyncAttributedFrameOrigin;
   /** First user-code frame at init, when available. */
   initFrame?: AsyncStackFrameReport;
   primaryFrame?: AsyncStackFrameReport;
@@ -756,12 +799,16 @@ export interface AsyncProfileQuality {
   operationCount: number;
   sampledStackRatio: number;
   initStackCoverageRatio: number;
+  /** Fraction of operations with a user-editable frame (own stack or inherited via trigger chain). */
+  attributedStackRatio: number;
   cdpAsyncStackCoverageRatio: number;
   recordsDropped: number;
   maxRecords: number;
   runWindowCount: number;
   cpuAttributionCoveragePct: number;
   cpuAmbiguousSamples: number;
+  /** cpuAmbiguousSamples / (attributed + ambiguous) — drives graded CPU-attribution confidence. */
+  ambiguousRatio: number;
   clockSyncUncertaintyMs: number;
   reasons: string[];
   recommendations: string[];
diff --git a/packages/core/src/runtime-signals/hooks/installers/async-operations.ts b/packages/core/src/runtime-signals/hooks/installers/async-operations.ts
index 5ba8998..a1d92ab 100644
--- a/packages/core/src/runtime-signals/hooks/installers/async-operations.ts
+++ b/packages/core/src/runtime-signals/hooks/installers/async-operations.ts
@@ -42,7 +42,7 @@ export function createAsyncOperationsInstaller(
   const maxRecords = options.maxRecords ?? 50_000;
   const concurrencyIntervalMs = options.concurrencyIntervalMs ?? 100;
   const includeMicrotasks = Boolean(options.includeMicrotasks);
-  const stackDepth = options.stackDepth ?? 8;
+  const stackDepth = options.stackDepth ?? 16;
   const maxRunWindows = options.maxRunWindows ?? 8;
   const instrumentationMode = options.instrumentationMode ?? 'safe';
   const attachPartialCapture = Boolean(options.attachPartialCapture);
@@ -152,6 +152,7 @@ interface RawRecord {
   resolvedAtMs: number | undefined;
   destroyedAtMs: number | undefined;
   durationMs: number | undefined;
+  firstRunAtMs: number | undefined;
   runMs: number;
   runCount: number;
   orphan: boolean;
@@ -199,21 +200,20 @@ function installAsyncOperations(
         filteredCounts: {},
         instrumentationMode,
         attachPartialCapture,
-        clockSyncUncertaintyMs: 0,
+        clockResolutionMs: 0,
       }),
       disable,
     });
     return;
   }
 
-  // Measure performance.now() tick resolution. NOTE this is a lower bound
-  // on jitter between consecutive in-target observations — it does NOT
-  // capture the offset between the V8 sampling profiler's zero-point
-  // (Profiler.start) and the async installer's zero-point (captureStartMs
-  // below). Those two events fire at slightly different instants in the
-  // same V8 process; the skew is typically tens of ms. Cross-clock
-  // attribution in `kinds/async/analysis.ts` accepts that imprecision —
-  // run windows are ms-granularity by design.
+  // Measure performance.now() tick resolution (a lower bound on jitter
+  // between consecutive in-target observations), reported as
+  // `clockResolutionMs`. CPU↔async alignment is not exact: the CPU sampler's
+  // clock (Profiler.start) and the async installer's clock (captureStartMs)
+  // share the V8 monotonic source but have different zero-points, so analysis
+  // treats CPU sample times as ~capture-relative and reports the residual
+  // skew via `quality.clockSyncUncertaintyMs`.
   const measureClockResolutionMs = (): number => {
     let smallest = Number.POSITIVE_INFINITY;
     for (let i = 0; i < 200; i += 1) {
@@ -229,7 +229,7 @@ function installAsyncOperations(
     }
     return Number.isFinite(smallest) ? smallest : 0;
   };
-  const clockSyncUncertaintyMs = measureClockResolutionMs();
+  const clockResolutionMs = measureClockResolutionMs();
 
   let captureStartMs = api.performance.now();
   const records = new Map<number, RawRecord>();
@@ -253,6 +253,9 @@ function installAsyncOperations(
   // stack (which the previous FIFO-only design allowed).
   const pendingAwaitStacks: Array<{ stack: RawFrame[]; triggerId: number }> = [];
   const PENDING_AWAIT_CAP = 256;
+  // How many of the oldest completed records to scan when choosing an eviction
+  // victim under buffer pressure (bounded so the hot init path stays cheap).
+  const EVICTION_SCAN_LIMIT = 64;
   const transformStats = {
     transformed: 0,
     skipped: 0,
@@ -330,12 +333,26 @@ function installAsyncOperations(
         return;
       }
       if (records.size >= maxRecords) {
-        // Evict the oldest already-completed record (Set preserves insertion
-        // order) so a long-running capture keeps room for fresh data instead
-        // of blindly dropping new observations.
-        const oldestCompleted = completedRecords.values().next();
-        if (!oldestCompleted.done) {
-          const evictId = oldestCompleted.value;
+        // Under buffer pressure, evict the SHORTEST-duration record among the
+        // oldest completed candidates (bounded scan). Slow/long-lived ops are
+        // the ones that matter for latency analysis, so they survive load
+        // instead of being dropped FIFO. Only completed records are eligible
+        // (Set is insertion = completion order); if none have completed yet,
+        // drop the new observation as a last resort.
+        let evictId: number | undefined;
+        let evictDurationMs = Number.POSITIVE_INFINITY;
+        let scanned = 0;
+        for (const id of completedRecords) {
+          if (scanned >= EVICTION_SCAN_LIMIT) break;
+          scanned += 1;
+          const candidate = records.get(id);
+          const durationMs = candidate?.durationMs ?? 0;
+          if (durationMs < evictDurationMs) {
+            evictDurationMs = durationMs;
+            evictId = id;
+          }
+        }
+        if (evictId !== undefined) {
           completedRecords.delete(evictId);
           records.delete(evictId);
         } else {
@@ -353,6 +370,7 @@ function installAsyncOperations(
         resolvedAtMs: undefined,
         destroyedAtMs: undefined,
         durationMs: undefined,
+        firstRunAtMs: undefined,
         runMs: 0,
         runCount: 0,
         orphan: false,
@@ -373,6 +391,7 @@ function installAsyncOperations(
       const rec = records.get(asyncId);
       if (!rec) return;
       const now = api.performance.now() - captureStartMs;
+      if (rec.firstRunAtMs === undefined) rec.firstRunAtMs = now;
       openRuns.set(asyncId, now);
       activeCount += 1;
     },
@@ -859,7 +878,7 @@ function installAsyncOperations(
         filteredCounts: { ...filteredCounts },
         instrumentationMode,
         attachPartialCapture,
-        clockSyncUncertaintyMs,
+        clockResolutionMs,
         transformStats: { ...transformStats },
       };
       clearRetainedState();
diff --git a/packages/core/src/runtime-signals/readers/async-operations.ts b/packages/core/src/runtime-signals/readers/async-operations.ts
index c60f35b..60560fc 100644
--- a/packages/core/src/runtime-signals/readers/async-operations.ts
+++ b/packages/core/src/runtime-signals/readers/async-operations.ts
@@ -41,6 +41,7 @@ const recordSchema = z.object({
   resolvedAtMs: z.number().optional(),
   destroyedAtMs: z.number().optional(),
   durationMs: z.number().optional(),
+  firstRunAtMs: z.number().optional(),
   runMs: z.number().nonnegative(),
   runCount: z.number().int().nonnegative(),
   orphan: z.boolean(),
@@ -76,7 +77,7 @@ const readSchema = z.object({
   filteredCounts: z.record(z.string(), z.number().int().nonnegative()),
   instrumentationMode: z.enum(['off', 'safe', 'full']).optional(),
   attachPartialCapture: z.boolean().optional(),
-  clockSyncUncertaintyMs: z.number().nonnegative().optional(),
+  clockResolutionMs: z.number().nonnegative().optional(),
   transformStats: z
     .object({
       transformed: z.number().int().nonnegative(),
diff --git a/packages/core/src/shared/config.ts b/packages/core/src/shared/config.ts
index 2baaff7..4b30268 100644
--- a/packages/core/src/shared/config.ts
+++ b/packages/core/src/shared/config.ts
@@ -2,6 +2,17 @@ export const HEARTBEAT_RESOLUTION_MS = 20;
 export const GC_CORRELATION_LOOKAROUND_MS = 20;
 export const EVENT_LOOP_STALL_INTERVAL_MS = 200;
 
+/** Minimum % of an async op's wait time overlapping a signal window to attribute the cause to it. */
+export const ASYNC_CAUSE_MIN_OVERLAP_PCT = 50;
+/** runMs/durationMs at or above this classifies an async op's latency as CPU-bound (work, not waiting). */
+export const ASYNC_CPU_BOUND_RATIO = 0.6;
+/** An op that never ran and stayed alive past this fraction of the capture is a long-lived/idle background resource, not a latency bug. */
+export const ASYNC_BACKGROUND_DURATION_RATIO = 0.9;
+/** A resource that activated more than once and stayed alive past this fraction of the capture is a persistent/multiplexed handle (keep-alive socket, HTTP parser, pool, interval), not a discrete delayed callback. Lower than the idle ratio because runCount>1 is a strong persistence signal and connections start shortly after capture begins. */
+export const ASYNC_LONGLIVED_DURATION_RATIO = 0.8;
+/** Margin (ms) between a stall ending and a callback running that still counts as "the block delayed it" — covers heartbeat resolution and the scheduling hop, and separates a causal block from a coincidental overlap. */
+export const ASYNC_STALL_READINESS_MARGIN_MS = 50;
+
 export const INSPECTOR_STARTUP_TIMEOUT_MS = 5_000;
 export const TERMINATE_GRACE_MS = 500;
 export const TERMINATE_SIGTERM_WAIT_MS = 2_000;
diff --git a/packages/core/test/async-kind.test.ts b/packages/core/test/async-kind.test.ts
index 2173146..9f5434d 100644
--- a/packages/core/test/async-kind.test.ts
+++ b/packages/core/test/async-kind.test.ts
@@ -1370,4 +1370,189 @@ describe('async installer eviction', () => {
     expect(read.records.length).toBe(2);
     expect(read.integrity.recordsDropped).toBe(2);
   });
+
+  it('evicts the shortest-duration completed record under buffer pressure', () => {
+    let now = 0;
+    let globalValue: { read: () => AsyncKindData } | undefined;
+    let callbacks: Partial<AsyncHookCallbacks> = {};
+    vi.stubGlobal('setInterval', () => ({ unref() {} }));
+    vi.stubGlobal('clearInterval', () => {});
+    const api = {
+      performance: { now: () => now },
+      registerGlobal: (_name: string, value: unknown) => {
+        globalValue = value as typeof globalValue;
+      },
+      addResetHook: () => {},
+      getBuiltin: (name: string) =>
+        name === 'async_hooks'
+          ? {
+              createHook: (cbs: Partial<AsyncHookCallbacks>) => {
+                callbacks = cbs;
+                return { enable() {}, disable() {} };
+              },
+            }
+          : null,
+    };
+
+    const installer = createAsyncOperationsInstaller({ maxRecords: 3, instrumentationMode: 'off' });
+    new Function('__lanterna', installer.source)(api);
+    if (!globalValue || !callbacks.init || !callbacks.promiseResolve) {
+      throw new Error('install failed');
+    }
+
+    // Three completed records with durations 100 (id1), 10 (id2), 50 (id3).
+    now = 0;
+    callbacks.init(1, 'PROMISE', 0);
+    now = 5;
+    callbacks.init(2, 'PROMISE', 0);
+    now = 10;
+    callbacks.init(3, 'PROMISE', 0);
+    now = 15;
+    callbacks.promiseResolve(2); // dur 10
+    now = 60;
+    callbacks.promiseResolve(3); // dur 50
+    now = 100;
+    callbacks.promiseResolve(1); // dur 100
+    // Buffer is full; the new init must evict the SHORTEST completed (id2),
+    // keeping the slow/long ops that matter for latency analysis.
+    now = 200;
+    callbacks.init(4, 'PROMISE', 0);
+
+    const ids = globalValue
+      .read()
+      .records.map((r) => r.asyncId)
+      .sort((a, b) => a - b);
+    expect(ids).toEqual([1, 3, 4]);
+  });
+});
+
+describe('async CPU attribution precision', () => {
+  function cpuProfileOf(sampleCount: number): CaptureBundle['kinds']['cpu'] {
+    return {
+      cpuProfile: {
+        nodes: [
+          {
+            id: 1,
+            callFrame: {
+              functionName: '(root)',
+              scriptId: '0',
+              url: '',
+              lineNumber: -1,
+              columnNumber: -1,
+            },
+            hitCount: 0,
+            children: [],
+          },
+        ],
+        startTime: 0,
+        endTime: sampleCount * 1000,
+        samples: Array.from({ length: sampleCount }, () => 1),
+        timeDeltas: Array.from({ length: sampleCount }, () => 1000),
+      },
+      deopts: [],
+      samplesTimed: true,
+    };
+  }
+
+  it('attributes nested ancestor/descendant overlap to the innermost context, not ambiguous', () => {
+    // child #2 (run [50,150]) is nested inside parent #1 (run [0,200]).
+    const records: AsyncOperationRecord[] = [
+      record(1, 0, 200, 0, [{ startMs: 0, endMs: 200 }]),
+      record(2, 1, 100, 50, [{ startMs: 50, endMs: 150 }]),
+    ];
+    const bundle = makeBundle(records);
+    bundle.kinds.cpu = cpuProfileOf(200);
+    const pipeline = createAnalysisPipeline({ kinds: [createAsyncProfileKind()] });
+    const result = pipeline.run(bundle, { command: ['node', 'app.js'], mode: 'spawn' });
+    const attribution = result.profiles.async?.cpuAttribution;
+    expect(attribution?.available).toBe(true);
+    expect(attribution?.cpuAmbiguousSamples).toBe(0);
+    expect(result.profiles.async?.quality.ambiguousRatio).toBe(0);
+    expect(attribution?.attributedCpuPct).toBeGreaterThan(50);
+  });
+
+  it('counts unrelated overlapping windows as ambiguous without collapsing confidence', () => {
+    // #1 and #2 share no ancestry (both triggerAsyncId 0); they overlap only in [190,200].
+    const records: AsyncOperationRecord[] = [
+      withFrame(record(1, 0, 200, 0, [{ startMs: 0, endMs: 200 }]), {
+        function: 'a',
+        file: 'file:///app/a.js',
+        line: 1,
+      }),
+      withFrame(record(2, 0, 20, 180, [{ startMs: 190, endMs: 200 }]), {
+        function: 'b',
+        file: 'file:///app/b.js',
+        line: 1,
+      }),
+    ];
+    const bundle = makeBundle(records);
+    bundle.kinds.cpu = cpuProfileOf(200);
+    const pipeline = createAnalysisPipeline({ kinds: [createAsyncProfileKind()] });
+    const result = pipeline.run(bundle, { command: ['node', 'app.js'], mode: 'spawn' });
+    const attribution = result.profiles.async?.cpuAttribution;
+    expect(attribution?.cpuAmbiguousSamples).toBeGreaterThan(0);
+    expect(attribution?.cpuAmbiguousSamples).toBeLessThan(40);
+    // Small ambiguous ratio must NOT force the whole kind to low confidence.
+    expect(result.profiles.async?.quality.confidence).not.toBe('low');
+  });
+});
+
+describe('async latency cause wiring (regression)', () => {
+  function signalBundle(
+    records: AsyncOperationRecord[],
+    signals: {
+      gcEvents?: Array<{ atMs: number; kind: string; durationMs: number }>;
+      eventLoopSamples?: Array<{ atMs: number; lagMs: number }>;
+    } = {},
+  ): CaptureBundle {
+    const base = makeBundle(records);
+    return {
+      ...base,
+      captureIntegrity: {
+        ...base.captureIntegrity,
+        gcTimed: (signals.gcEvents?.length ?? 0) > 0,
+        eventLoopTimed: (signals.eventLoopSamples?.length ?? 0) > 0,
+      },
+      runtimeSignals: {
+        ...base.runtimeSignals,
+        gcEvents: signals.gcEvents ?? [],
+        eventLoopSamples: signals.eventLoopSamples ?? [],
+        eventLoopAvailable: (signals.eventLoopSamples?.length ?? 0) > 0,
+      },
+    };
+  }
+
+  it('does not blanket-classify waits as gc-pause from many tiny scavenges', () => {
+    // 50 sub-millisecond GC events densely spread across a 200ms wait. Their real
+    // overlap is ~5ms (2.5%), so the completed op must NOT be gc-pause. Padding
+    // each event by ±20ms would tile the timeline and force a spurious gc-pause.
+    const gcEvents = Array.from({ length: 50 }, (_, i) => ({
+      atMs: i * 4,
+      kind: 'minor',
+      durationMs: 0.1,
+    }));
+    const op = record(1, 0, 200, 0); // completed promise, waited 200ms
+    const result = createAnalysisPipeline({ kinds: [createAsyncProfileKind()] }).run(
+      signalBundle([op], { gcEvents }),
+      { command: ['node', 'app.js'], mode: 'spawn' },
+    );
+    const top = result.profiles.async?.topOperations.find((o) => o.asyncId === 1);
+    expect(top?.latencyCause).not.toBe('gc-pause');
+  });
+
+  it('excludes orphans from topOperations and byKindLatency', () => {
+    const completed = record(1, 0, 100, 0); // promise, duration 100ms
+    const orphan = record(2, 0, undefined, 0); // never resolved -> capture-length duration
+    const result = createAnalysisPipeline({ kinds: [createAsyncProfileKind()] }).run(
+      signalBundle([completed, orphan]),
+      { command: ['node', 'app.js'], mode: 'spawn' },
+    );
+    const async = result.profiles.async;
+    expect(async?.topOperations.some((o) => o.asyncId === 2)).toBe(false);
+    expect(async?.orphans.some((o) => o.asyncId === 2)).toBe(true);
+    // byKindLatency reflects only the completed op, not the orphan's fictional
+    // capture-length duration.
+    expect(async?.summary.byKindLatency?.promise?.count).toBe(1);
+    expect(async?.summary.byKindLatency?.promise?.maxMs).toBe(100);
+  });
 });
diff --git a/packages/core/test/async-latency.test.ts b/packages/core/test/async-latency.test.ts
new file mode 100644
index 0000000..2ce070b
--- /dev/null
+++ b/packages/core/test/async-latency.test.ts
@@ -0,0 +1,336 @@
+import { describe, expect, it } from 'vitest';
+import { buildEventLoopStallWindows } from '../src/analysis/model/event-loop-report.js';
+import {
+  buildByKindLatency,
+  buildWaitWindows,
+  classifyLatencyCause,
+  collectDescendantRunWindows,
+  deriveLatency,
+  isUserEditableFile,
+  resolveAttributedFrame,
+} from '../src/kinds/async/latency.js';
+import type {
+  AsyncChainNode,
+  AsyncOperationRecord,
+  AsyncStackFrame,
+} from '../src/kinds/async/types.js';
+
+function rec(p: Partial<AsyncOperationRecord> & { asyncId: number }): AsyncOperationRecord {
+  return {
+    triggerAsyncId: 0,
+    kind: 'promise',
+    rawType: 'PROMISE',
+    initAtMs: 0,
+    runMs: 0,
+    runCount: 0,
+    orphan: false,
+    initStack: [],
+    runWindows: [],
+    ...p,
+  };
+}
+
+function frame(file: string, fn = 'fn', line = 1): AsyncStackFrame {
+  return { function: fn, file, line, column: 1 };
+}
+
+const userFile = '/app/src/handler.js';
+const depFile = '/app/node_modules/pg/lib/client.js';
+
+describe('buildWaitWindows', () => {
+  it('returns the whole lifetime when the resource never ran', () => {
+    expect(buildWaitWindows(rec({ asyncId: 1, initAtMs: 0 }), 100)).toEqual([
+      { startMs: 0, endMs: 100 },
+    ]);
+  });
+
+  it('emits the gaps around run windows', () => {
+    const r = rec({ asyncId: 1, initAtMs: 0, runWindows: [{ startMs: 20, endMs: 30 }] });
+    expect(buildWaitWindows(r, 100)).toEqual([
+      { startMs: 0, endMs: 20 },
+      { startMs: 30, endMs: 100 },
+    ]);
+  });
+
+  it('returns nothing when run windows cover the whole lifetime', () => {
+    const r = rec({ asyncId: 1, initAtMs: 0, runWindows: [{ startMs: 0, endMs: 100 }] });
+    expect(buildWaitWindows(r, 100)).toEqual([]);
+  });
+
+  it('clamps run windows that exceed the lifetime', () => {
+    const r = rec({ asyncId: 1, initAtMs: 10, runWindows: [{ startMs: 5, endMs: 200 }] });
+    expect(buildWaitWindows(r, 100)).toEqual([]);
+  });
+});
+
+describe('deriveLatency', () => {
+  it('computes waitMs as duration minus runMs', () => {
+    expect(deriveLatency(rec({ asyncId: 1, initAtMs: 0, runMs: 30 }), 100)).toEqual({ waitMs: 70 });
+  });
+
+  it('derives scheduleDelayMs from firstRunAtMs', () => {
+    const r = rec({ asyncId: 1, initAtMs: 10, firstRunAtMs: 35, runMs: 5 });
+    expect(deriveLatency(r, 100)).toEqual({ waitMs: 85, scheduleDelayMs: 25 });
+  });
+
+  it('clamps waitMs at zero', () => {
+    expect(deriveLatency(rec({ asyncId: 1, initAtMs: 0, runMs: 200 }), 100).waitMs).toBe(0);
+  });
+});
+
+describe('classifyLatencyCause', () => {
+  const base = {
+    waitWindows: [{ startMs: 0, endMs: 100 }],
+    stallWindows: [],
+    gcWindows: [],
+    descendantWindows: [],
+    kind: 'promise' as const,
+    runMs: 0,
+    durationMs: 100,
+    captureDurationMs: 100_000,
+    signals: { eventLoop: true, gc: true },
+  };
+
+  it('classifies a long-lived idle resource as background', () => {
+    const r = classifyLatencyCause({
+      ...base,
+      durationMs: 95_000,
+      captureDurationMs: 100_000,
+      waitWindows: [{ startMs: 0, endMs: 95_000 }],
+    });
+    expect(r.cause).toBe('background');
+  });
+
+  it('records no-eventloop-signal when the loop could not be observed', () => {
+    const blind = classifyLatencyCause({ ...base, signals: { eventLoop: false, gc: true } });
+    expect(blind.cause).toBe('unknown');
+    expect(blind.evidence.basis).toBe('no-eventloop-signal');
+    const seen = classifyLatencyCause({ ...base, signals: { eventLoop: true, gc: true } });
+    expect(seen.evidence.basis).toBe('none');
+  });
+
+  it('classifies cpu-bound when runMs dominates the duration', () => {
+    const r = classifyLatencyCause({ ...base, runMs: 80 });
+    expect(r.cause).toBe('cpu-bound');
+  });
+
+  it('classifies a long-lived multiplexed handle as background', () => {
+    // A keep-alive connection / interval: activated many times, alive ~the whole
+    // capture, low CPU. Its aggregate wait is idle gaps between activations, not a
+    // blocked callback — so even a full stall overlap must not make it event-loop-blocked.
+    const r = classifyLatencyCause({
+      ...base,
+      durationMs: 9000,
+      captureDurationMs: 10_000,
+      runMs: 200,
+      runCount: 50,
+      waitWindows: [{ startMs: 0, endMs: 9000 }],
+      stallWindows: [{ startMs: 0, endMs: 9000 }],
+    });
+    expect(r.cause).toBe('background');
+    expect(r.evidence.basis).toBe('long-lived-multiplexed');
+  });
+
+  it('does not treat a single-run long I/O as a multiplexed handle (preserves io-wait)', () => {
+    // One slow I/O that spans most of the capture but ran once stays io-wait; the
+    // runCount>1 discriminator protects genuine single slow operations.
+    const r = classifyLatencyCause({
+      ...base,
+      kind: 'fs',
+      durationMs: 8500,
+      captureDurationMs: 10_000,
+      runMs: 0,
+      runCount: 1,
+      waitWindows: [{ startMs: 0, endMs: 8500 }],
+    });
+    expect(r.cause).toBe('io-wait');
+  });
+
+  it('classifies event-loop-blocked when the wait overlaps a stall', () => {
+    const r = classifyLatencyCause({ ...base, stallWindows: [{ startMs: 0, endMs: 70 }] });
+    expect(r.cause).toBe('event-loop-blocked');
+    expect(r.evidence.basis).toBe('event-loop-stall');
+  });
+
+  it('classifies gc-pause when the wait overlaps GC windows', () => {
+    const r = classifyLatencyCause({ ...base, gcWindows: [{ startMs: 10, endMs: 80 }] });
+    expect(r.cause).toBe('gc-pause');
+  });
+
+  it('classifies downstream-async when descendants ran during the wait', () => {
+    const r = classifyLatencyCause({ ...base, descendantWindows: [{ startMs: 0, endMs: 90 }] });
+    expect(r.cause).toBe('downstream-async');
+  });
+
+  it('prefers event-loop-blocked over a competing GC overlap', () => {
+    const r = classifyLatencyCause({
+      ...base,
+      stallWindows: [{ startMs: 0, endMs: 80 }],
+      gcWindows: [{ startMs: 0, endMs: 60 }],
+    });
+    expect(r.cause).toBe('event-loop-blocked');
+  });
+
+  it('classifies io-wait for I/O kinds with no signal overlap', () => {
+    expect(classifyLatencyCause({ ...base, kind: 'fs' }).cause).toBe('io-wait');
+  });
+
+  it('falls back to unknown for a promise with no signal overlap', () => {
+    expect(classifyLatencyCause(base).cause).toBe('unknown');
+  });
+
+  it('rejects event-loop-blocked when the stall ended well before the callback ran', () => {
+    // A genuinely slow 500ms wait whose window happens to span a 260ms stall
+    // (52% overlap) that finished at 310ms — long before the op ran at 500ms.
+    // The block did not cause the latency, so this must not be event-loop-blocked.
+    const r = classifyLatencyCause({
+      ...base,
+      durationMs: 500,
+      waitWindows: [{ startMs: 0, endMs: 500 }],
+      stallWindows: [{ startMs: 50, endMs: 310 }],
+      firstRunAtMs: 500,
+    });
+    expect(r.cause).not.toBe('event-loop-blocked');
+  });
+
+  it('keeps event-loop-blocked when the loop was still stalled as the callback ran', () => {
+    const r = classifyLatencyCause({
+      ...base,
+      durationMs: 360,
+      waitWindows: [{ startMs: 0, endMs: 360 }],
+      stallWindows: [{ startMs: 0, endMs: 350 }],
+      firstRunAtMs: 360,
+    });
+    expect(r.cause).toBe('event-loop-blocked');
+  });
+
+  it('prefers event-loop-blocked over a larger GC overlap when the loop was blocked at run time', () => {
+    const r = classifyLatencyCause({
+      ...base,
+      stallWindows: [{ startMs: 40, endMs: 100 }], // 60%
+      gcWindows: [{ startMs: 0, endMs: 100 }], // 100% — must not win
+      firstRunAtMs: 100,
+    });
+    expect(r.cause).toBe('event-loop-blocked');
+    expect(r.evidence.basis).toBe('event-loop-stall');
+  });
+});
+
+describe('buildByKindLatency', () => {
+  it('buckets duration percentiles and mean wait per kind', () => {
+    const records = [
+      rec({ asyncId: 1, kind: 'fs', initAtMs: 0, durationMs: 100, runMs: 10 }),
+      rec({ asyncId: 2, kind: 'fs', initAtMs: 0, durationMs: 300, runMs: 10 }),
+      rec({ asyncId: 3, kind: 'http', initAtMs: 0, durationMs: 50, runMs: 0 }),
+    ];
+    const out = buildByKindLatency(records, 1000);
+    expect(out.fs?.count).toBe(2);
+    expect(out.fs?.maxMs).toBe(300);
+    expect(out.fs?.meanWaitMs).toBe(190); // ((100-10) + (300-10)) / 2
+    expect(out.http?.count).toBe(1);
+  });
+});
+
+describe('resolveAttributedFrame', () => {
+  it('uses the op own stack when it has a user frame', () => {
+    const r = rec({ asyncId: 1, initStack: [frame(userFile)] });
+    const map = new Map([[1, r]]);
+    expect(resolveAttributedFrame(r, map)).toMatchObject({ origin: 'self' });
+  });
+
+  it('inherits the nearest user frame from the trigger ancestry', () => {
+    const parent = rec({ asyncId: 1, initStack: [frame(userFile, 'parent')] });
+    const child = rec({ asyncId: 2, triggerAsyncId: 1, initStack: [frame(depFile)] });
+    const map = new Map([
+      [1, parent],
+      [2, child],
+    ]);
+    const out = resolveAttributedFrame(child, map);
+    expect(out.origin).toBe('inherited-trigger');
+    expect(out.frame?.function).toBe('parent');
+  });
+
+  it('falls back to the CPU execution frame', () => {
+    const r = rec({
+      asyncId: 1,
+      initStack: [frame(depFile)],
+      executionStack: [frame(userFile, 'hot')],
+    });
+    expect(resolveAttributedFrame(r, new Map([[1, r]]))).toMatchObject({ origin: 'cpu-window' });
+  });
+
+  it('returns no origin when nothing user-editable exists', () => {
+    const r = rec({ asyncId: 1, initStack: [frame(depFile)] });
+    expect(resolveAttributedFrame(r, new Map([[1, r]])).origin).toBeUndefined();
+  });
+
+  it('does not loop on a cyclic trigger chain', () => {
+    const a = rec({ asyncId: 1, triggerAsyncId: 2, initStack: [frame(depFile)] });
+    const b = rec({ asyncId: 2, triggerAsyncId: 1, initStack: [frame(depFile)] });
+    const map = new Map([
+      [1, a],
+      [2, b],
+    ]);
+    expect(resolveAttributedFrame(a, map).origin).toBeUndefined();
+  });
+});
+
+describe('collectDescendantRunWindows', () => {
+  it('gathers run windows across the trigger subtree, guarding cycles', () => {
+    const nodes = new Map<number, AsyncChainNode>([
+      [1, node(1, [2])],
+      [2, node(2, [3])],
+      [3, node(3, [1])], // cycle back to root
+    ]);
+    const records = new Map<number, AsyncOperationRecord>([
+      [1, rec({ asyncId: 1 })],
+      [2, rec({ asyncId: 2, runWindows: [{ startMs: 10, endMs: 20 }] })],
+      [3, rec({ asyncId: 3, runWindows: [{ startMs: 30, endMs: 40 }] })],
+    ]);
+    const windows = collectDescendantRunWindows(1, nodes, records);
+    expect(windows).toEqual([
+      { startMs: 10, endMs: 20 },
+      { startMs: 30, endMs: 40 },
+    ]);
+  });
+});
+
+describe('isUserEditableFile', () => {
+  it('accepts user paths and rejects deps/builtins', () => {
+    expect(isUserEditableFile(userFile)).toBe(true);
+    expect(isUserEditableFile(depFile)).toBe(false);
+    expect(isUserEditableFile('node:internal/timers')).toBe(false);
+    expect(isUserEditableFile('')).toBe(false);
+  });
+});
+
+describe('buildEventLoopStallWindows', () => {
+  it('emits [atMs - lagMs, atMs] for samples crossing the threshold', () => {
+    // durationMs == last sample atMs so no trailing-lag window is added.
+    const windows = buildEventLoopStallWindows([{ atMs: 500, lagMs: 400 }], 500, 20);
+    expect(windows).toEqual([{ startMs: 100, endMs: 500, maxLagMs: 400 }]);
+  });
+
+  it('ignores samples below the stall threshold', () => {
+    expect(buildEventLoopStallWindows([{ atMs: 500, lagMs: 50 }], 510, 20)).toEqual([]);
+  });
+
+  it('adds a trailing stall window when the loop was still blocked at capture end', () => {
+    const windows = buildEventLoopStallWindows([{ atMs: 100, lagMs: 10 }], 1000, 20);
+    expect(windows).toEqual([{ startMs: 120, endMs: 1000, maxLagMs: 880 }]);
+  });
+});
+
+function node(asyncId: number, childIds: number[]): AsyncChainNode {
+  return {
+    asyncId,
+    kind: 'promise',
+    rawType: 'PROMISE',
+    durationMs: 0,
+    runMs: 0,
+    initAtMs: 0,
+    depth: 0,
+    childIds,
+    orphan: false,
+  };
+}
diff --git a/packages/detectors/src/config.ts b/packages/detectors/src/config.ts
index 0be2ec0..61ff82d 100644
--- a/packages/detectors/src/config.ts
+++ b/packages/detectors/src/config.ts
@@ -114,6 +114,18 @@ export interface DetectorThresholds {
   readonly deepAsyncChain: DeepAsyncChainThresholds;
   readonly microtaskFlood: MicrotaskFloodThresholds;
   readonly hotAsyncContext: HotAsyncContextThresholds;
+  readonly eventLoopBlockedAsync: EventLoopBlockedAsyncThresholds;
+}
+
+export interface EventLoopBlockedAsyncThresholds {
+  /** Async op wait time (ms) that triggers a finding. */
+  readonly minWaitMs: number;
+  /** Above this wait the finding is `critical`. */
+  readonly criticalWaitMs: number;
+  /** Only blame a stall when the overlapping event-loop lag is at least this (ms). */
+  readonly minStallLagMs: number;
+  /** Cap on emitted findings. */
+  readonly maxFindings: number;
 }
 
 export interface HotAsyncContextThresholds {
@@ -372,4 +384,13 @@ export const DETECTOR_THRESHOLDS: DetectorThresholds = {
     minAttributedCoveragePct: 15,
     maxFindings: 3,
   },
+  // Async ops whose latency is explained by a blocked event loop (not the
+  // awaited I/O). 100ms wait blames the loop; 500ms is unambiguously bad. Only
+  // attribute when a stall ≥200ms (the event-loop stall threshold) overlaps.
+  eventLoopBlockedAsync: {
+    minWaitMs: 100,
+    criticalWaitMs: 500,
+    minStallLagMs: 200,
+    maxFindings: 5,
+  },
 };
diff --git a/packages/detectors/src/detectors/async-evidence.ts b/packages/detectors/src/detectors/async-evidence.ts
index 0ff6a40..f9c46fa 100644
--- a/packages/detectors/src/detectors/async-evidence.ts
+++ b/packages/detectors/src/detectors/async-evidence.ts
@@ -60,12 +60,26 @@ export function asyncEvidenceExtra(
     cdpAsyncStackCoverageRatio: report.quality.cdpAsyncStackCoverageRatio,
     instrumentationMode: report.quality.instrumentationMode,
     attachPartialCapture: report.quality.attachPartialCapture,
+    attributedStackRatio: report.quality.attributedStackRatio,
     cpuAttributionCoveragePct: report.quality.cpuAttributionCoveragePct,
     cpuAmbiguousSamples: report.quality.cpuAmbiguousSamples,
+    ambiguousRatio: report.quality.ambiguousRatio,
     clockSyncUncertaintyMs: report.quality.clockSyncUncertaintyMs,
   };
 }
 
+/** Returns the weaker of two confidence levels (an undefined input is ignored). */
+export function minConfidence(
+  a: BaseFinding['confidence'],
+  b: BaseFinding['confidence'],
+): BaseFinding['confidence'] {
+  if (a === undefined) return b;
+  if (b === undefined) return a;
+  const rank = (c: 'low' | 'medium' | 'high'): number =>
+    c === 'high' ? 2 : c === 'medium' ? 1 : 0;
+  return rank(a) <= rank(b) ? a : b;
+}
+
 export function resolveAsyncUserCaller(
   entity: { userCaller?: UserCallerAttribution } | undefined,
   fallbackFrame: AsyncStackFrameReport | undefined,
diff --git a/packages/detectors/src/detectors/async-index.ts b/packages/detectors/src/detectors/async-index.ts
index eeeec59..51e23f3 100644
--- a/packages/detectors/src/detectors/async-index.ts
+++ b/packages/detectors/src/detectors/async-index.ts
@@ -5,15 +5,16 @@ import {
   type ProfileSectionMap,
 } from '@lanterna-profiler/core';
 import { deepAsyncChainDetector } from './deep-async-chain.js';
+import { eventLoopBlockedAsyncDetector } from './event-loop-blocked-async.js';
 import { hotAsyncContextDetector } from './hot-async-context.js';
 import { longAwaitDetector } from './long-await.js';
 import { microtaskFloodDetector } from './microtask-flood.js';
 import { orphanAsyncResourceDetector } from './orphan-async-resource.js';
 
 /**
- * Async-scoped detectors. `hot-async-context` is multi-kind (`cpu` + `async`)
- * and auto-skipped when CPU isn't captured — same shape as `alloc-in-hot-path`
- * in the memory pack.
+ * Async-scoped detectors. `hot-async-context` and `event-loop-blocked-async`
+ * are multi-kind (`cpu` + `async`) and auto-skipped when CPU isn't captured —
+ * same shape as `alloc-in-hot-path` in the memory pack.
  */
 export const ASYNC_DETECTORS: ReadonlyArray<KindScopedDetector<keyof ProfileSectionMap>> = [
   longAwaitDetector,
@@ -21,6 +22,7 @@ export const ASYNC_DETECTORS: ReadonlyArray<KindScopedDetector<keyof ProfileSect
   deepAsyncChainDetector,
   microtaskFloodDetector,
   hotAsyncContextDetector,
+  eventLoopBlockedAsyncDetector,
 ];
 
 export function createBuiltInAsyncFindingAnalyzers(): FindingAnalyzer[] {
diff --git a/packages/detectors/src/detectors/event-loop-blocked-async.ts b/packages/detectors/src/detectors/event-loop-blocked-async.ts
new file mode 100644
index 0000000..8a33a79
--- /dev/null
+++ b/packages/detectors/src/detectors/event-loop-blocked-async.ts
@@ -0,0 +1,171 @@
+import type {
+  AsyncStackFrameReport,
+  AsyncTopOperation,
+  BaseFinding,
+  CorrelatedHotspot,
+  Finding,
+  KindScopedDetector,
+} from '@lanterna-profiler/core';
+import { DETECTOR_THRESHOLDS } from '../config.js';
+import {
+  anchorForFrame,
+  asyncConfidence,
+  asyncEvidenceExtra,
+  minConfidence,
+  resolveAsyncUserCaller,
+} from './async-evidence.js';
+
+/**
+ * Cross-kind detector: surfaces async operations whose latency the analysis
+ * classified as `event-loop-blocked`, and points at the synchronous CPU frame
+ * that blocked the loop — not the (innocent) async call site. This is the
+ * answer to "the await was slow but the I/O wasn't": the callback was ready,
+ * the loop was busy. Auto-skips when CPU isn't captured.
+ */
+export const eventLoopBlockedAsyncDetector: KindScopedDetector<'cpu' | 'async'> = {
+  id: 'event-loop-blocked-async',
+  kindIds: ['cpu', 'async'],
+  detect({ async, cpu }): Finding[] {
+    const thresholds = DETECTOR_THRESHOLDS.eventLoopBlockedAsync;
+    const eventLoop = cpu.report.eventLoop;
+    if (!eventLoop.available || eventLoop.stallIntervals.length === 0) return [];
+
+    const globalBlocking = [...(eventLoop.correlatedHotspots ?? [])].sort(
+      (a, b) => b.overlapPct - a.overlapPct,
+    )[0];
+    // This detector's whole value is naming the synchronous frame that blocked
+    // the loop. With no correlated CPU hotspot we have no culprit to point at,
+    // and the generic event-loop-stall finding already reports "the loop was
+    // blocked" — emitting a critical finding anchored at a placeholder
+    // `(event-loop)` frame would be noise, so stand down.
+    if (!globalBlocking) return [];
+    const maxStallLagMs = eventLoop.stallIntervals.reduce((m, s) => Math.max(m, s.maxLagMs), 0);
+    if (maxStallLagMs < thresholds.minStallLagMs) return [];
+
+    const dropped = async.report.summary.recordsDropped > 0;
+    const findings: Finding[] = [];
+    for (const op of async.report.topOperations) {
+      if (findings.length >= thresholds.maxFindings) break;
+      if (op.latencyCause !== 'event-loop-blocked') continue;
+      const waitMs = op.waitMs ?? 0;
+      if (waitMs < thresholds.minWaitMs) continue;
+
+      // Attribute to the *specific* stall that blocked this op (the one active
+      // when it became runnable), so several distinct blockers each point at
+      // their own frame instead of the globally-dominant one. Fall back to the
+      // global hotspot when the op's run time matches no stall.
+      const opStall = blockingStallForOp(op, eventLoop.stallIntervals);
+      const blocking = opStall?.topFrame ?? globalBlocking;
+      const stallMaxLagMs = opStall?.maxLagMs ?? maxStallLagMs;
+
+      const severity: BaseFinding['severity'] =
+        waitMs >= thresholds.criticalWaitMs ? 'critical' : 'warning';
+      const asyncFrame = op.primaryFrame ?? op.initFrame ?? op.creationFrame;
+      // `blocking` is always a real user frame (the per-stall culprit or the
+      // global fallback), so the finding anchors on an actionable location.
+      const blockingFrame = toFrame(blocking);
+      const anchor = anchorForFrame(async.report, blockingFrame);
+      const userCaller = resolveAsyncUserCaller(undefined, blockingFrame, {
+        profilePct: blocking.overlapPct,
+        supportPct: 100,
+        confidence: blocking.confidence,
+        basis: 'async-cpu-window',
+      });
+      const confidence: BaseFinding['confidence'] = dropped
+        ? 'low'
+        : minConfidence(
+            minConfidence(asyncConfidence(async.report, 'high'), op.causeConfidence),
+            blocking.confidence,
+          );
+
+      findings.push({
+        id: `event-loop-blocked-async:${op.asyncId}`,
+        profileKind: 'async',
+        severity,
+        category: 'event-loop-blocked-async',
+        title: `Async ${op.kind} stalled ${waitMs.toFixed(0)}ms by a blocked event loop in \`${blockingFrame.function}\``,
+        confidence,
+        proofLevel: 'correlated-window',
+        evidence: {
+          file: blockingFrame.file,
+          line: blockingFrame.line,
+          function: blockingFrame.function,
+          selfPct: blocking.samplePct,
+          ...(blockingFrame.source ? { source: blockingFrame.source } : {}),
+          extra: {
+            asyncId: op.asyncId,
+            kind: op.kind,
+            rawType: op.rawType,
+            waitMs,
+            scheduleDelayMs: op.scheduleDelayMs ?? null,
+            durationMs: op.durationMs,
+            runMs: op.runMs,
+            asyncCallSite: asyncFrame ?? null,
+            blockingFrame,
+            stallMaxLagMs,
+            causeEvidence: op.causeEvidence ?? null,
+            attributedFrameOrigin: op.attributedFrameOrigin ?? null,
+            ...(userCaller ? { userCaller } : {}),
+            ...asyncEvidenceExtra(async.report, anchor),
+          },
+        },
+        measurements: {
+          observed: { waitMs, durationMs: op.durationMs, stallMaxLagMs },
+          thresholds: {
+            minWaitMs: thresholds.minWaitMs,
+            criticalWaitMs: thresholds.criticalWaitMs,
+            minStallLagMs: thresholds.minStallLagMs,
+          },
+        },
+        why: buildWhy(op, waitMs, blockingFrame, stallMaxLagMs, dropped),
+        suggestion: `The async ${op.kind} was ready but the event loop was blocked. Open \`${blockingFrame.file}:${blockingFrame.line}\` (\`${blockingFrame.function}\`) — the synchronous work there is the real cause. Offload it to a worker thread, make it async, or chunk it. Patching the await site will not help.`,
+        references: ['https://nodejs.org/en/docs/guides/dont-block-the-event-loop'],
+      });
+    }
+    return findings;
+  },
+};
+
+/** Margin (ms) between a stall ending and the callback running that still counts as "this stall blocked it" — mirrors the latency classifier's readiness margin. */
+const STALL_MATCH_MARGIN_MS = 50;
+
+type StallInterval = {
+  startMs: number;
+  endMs: number;
+  maxLagMs: number;
+  topFrame?: CorrelatedHotspot;
+};
+
+/** The stall that delayed this op: the one still active when it became runnable. */
+function blockingStallForOp(
+  op: AsyncTopOperation,
+  stallIntervals: readonly StallInterval[],
+): StallInterval | undefined {
+  const ranAtMs = op.firstRunAtMs;
+  if (ranAtMs === undefined) return undefined;
+  return stallIntervals.find(
+    (s) => s.startMs <= ranAtMs && ranAtMs <= s.endMs + STALL_MATCH_MARGIN_MS,
+  );
+}
+
+function toFrame(hotspot: CorrelatedHotspot): AsyncStackFrameReport {
+  return {
+    function: hotspot.function,
+    file: hotspot.file,
+    line: hotspot.line,
+    column: 0,
+    ...(hotspot.source ? { source: hotspot.source } : {}),
+  };
+}
+
+function buildWhy(
+  op: AsyncTopOperation,
+  waitMs: number,
+  blockingFrame: AsyncStackFrameReport,
+  stallMaxLagMs: number,
+  dropped: boolean,
+): string {
+  const culprit = `synchronous work in \`${blockingFrame.function}\` at \`${blockingFrame.file}:${blockingFrame.line}\``;
+  const drop = dropped ? ' (recordsDropped > 0; lower bound.)' : '';
+  return `Async ${op.kind} #${op.asyncId} spent ~${waitMs.toFixed(0)}ms waiting, and its wait overlaps an event-loop stall (lag up to ${stallMaxLagMs.toFixed(0)}ms). The callback was ready but could not run because the loop was blocked by ${culprit}. The latency is a blocked event loop, not slow I/O.${drop}`;
+}
diff --git a/packages/detectors/src/detectors/long-await.ts b/packages/detectors/src/detectors/long-await.ts
index 989c59f..c802cc3 100644
--- a/packages/detectors/src/detectors/long-await.ts
+++ b/packages/detectors/src/detectors/long-await.ts
@@ -31,6 +31,9 @@ export const longAwaitDetector: KindScopedDetector<'async'> = {
     const findings: Finding[] = [];
     for (const op of rankLongAwaitOperations(report.topOperations)) {
       if (findings.length >= thresholds.maxFindings) break;
+      // Persistent/multiplexed handles (keep-alive sockets, parsers, intervals)
+      // carry a capture-length aggregate waitMs that is not a single long await.
+      if (op.latencyCause === 'background') continue;
       if (isBackgroundWindowOperation(op, captureDurationMs)) continue;
       if (isShadowedByResumedOperation(op, report.topOperations)) continue;
       if (op.durationMs < thresholds.minDurationMs) break; // sorted desc
@@ -44,6 +47,10 @@ export const longAwaitDetector: KindScopedDetector<'async'> = {
         confidence: op.overallConfidence ?? 'high',
         basis: 'async-stack',
       });
+      // Finding confidence reflects how sure we are this is a real long await
+      // (a direct duration measurement + capture quality). The latency *cause*
+      // is a separate axis surfaced in evidence.extra.causeConfidence — an
+      // `unknown` cause must not drag down a confidently-measured duration.
       const baseConfidence: BaseFinding['confidence'] = op.orphan ? 'medium' : 'high';
       const confidence: BaseFinding['confidence'] = dropped
         ? 'low'
@@ -88,6 +95,13 @@ export const longAwaitDetector: KindScopedDetector<'async'> = {
             cdpAsyncContextConfidence: op.cdpAsyncContextConfidence ?? null,
             cpuAttributedSamples: op.cpuAttributedSamples ?? null,
             cpuAmbiguousSamples: op.cpuAmbiguousSamples ?? null,
+            waitMs: op.waitMs ?? null,
+            scheduleDelayMs: op.scheduleDelayMs ?? null,
+            firstRunAtMs: op.firstRunAtMs ?? null,
+            latencyCause: op.latencyCause ?? null,
+            causeConfidence: op.causeConfidence ?? null,
+            causeEvidence: op.causeEvidence ?? null,
+            attributedFrameOrigin: op.attributedFrameOrigin ?? null,
             ...(userCaller ? { userCaller } : {}),
             ...asyncEvidenceExtra(report, anchor),
           },
@@ -153,6 +167,13 @@ function asyncFrameKey(frame: NonNullable<AsyncTopOperation['initFrame']>): stri
 function preferredLongAwaitFrame(
   op: AsyncTopOperation,
 ): AsyncTopOperation['initFrame'] | undefined {
+  if (
+    op.latencyCause === 'cpu-bound' &&
+    op.executionFrame &&
+    isUserEditableFrame(op.executionFrame)
+  ) {
+    return op.executionFrame;
+  }
   return (
     firstEditableFrame(op.initStack) ??
     (op.kind === 'promise'
@@ -170,6 +191,17 @@ function firstEditableFrame(
 function buildSuggestion(op: AsyncTopOperation, frame: AsyncTopOperation['initFrame']): string {
   const timeoutGuidance =
     'Network and database calls should always carry a deadline (`AbortController`, axios `timeout`, `pg` statement_timeout, etc.).';
+  if (op.latencyCause === 'event-loop-blocked') {
+    return 'The callback was ready but the event loop was blocked. Find and offload the synchronous work (CPU-heavy loops, sync I/O, large JSON.parse/stringify) — see the `event-loop-blocked-async` finding for the blocking frame. Fixing this call site will not help.';
+  }
+  if (op.latencyCause === 'cpu-bound') {
+    return frame
+      ? `\`${frame.file}:${frame.line}\` (\`${frame.function}\`) spends its time running, not waiting (~${(op.runMs ?? 0).toFixed(0)}ms on CPU). Move the heavy computation to a worker thread or chunk it so the event loop can breathe.`
+      : 'This operation spends its time running on CPU, not waiting. Move the heavy computation to a worker thread or chunk it.';
+  }
+  if (op.latencyCause === 'downstream-async') {
+    return `This operation was waiting on its own descendant async work (triggerAsyncId=${op.triggerAsyncId}). Trace the inner operations it awaits and optimize those, not this call site.`;
+  }
   if (!frame) {
     return `Trace the trigger chain (triggerAsyncId=${op.triggerAsyncId}) to find the call site. Add explicit timeouts on network/database calls and verify every promise has a path to resolve or reject.`;
   }
@@ -187,11 +219,32 @@ function buildWhy(
   const head = frame
     ? `\`${frame.function}\` at \`${frame.file}:${frame.line}\` started an async ${op.kind} resource that lived ${op.durationMs.toFixed(0)}ms`
     : `An async ${op.kind} resource (${op.rawType}) was alive for ${op.durationMs.toFixed(0)}ms`;
+  const split =
+    op.waitMs !== undefined
+      ? ` (~${op.waitMs.toFixed(0)}ms waiting, ${(op.runMs ?? 0).toFixed(0)}ms on CPU)`
+      : '';
   const tail = op.orphan ? ' and never resolved before the capture ended' : '';
   const drop = dropped
     ? ' Records were dropped during capture (`recordsDropped > 0`), so this is a lower bound.'
     : '';
-  return `${head}${tail}.${drop} Long-lived async operations are usually slow I/O, network calls without a timeout, or promises waiting on something that already finished.`;
+  return `${head}${split}${tail}.${drop} ${causeNarrative(op)}`;
+}
+
+function causeNarrative(op: AsyncTopOperation): string {
+  switch (op.latencyCause) {
+    case 'event-loop-blocked':
+      return 'Its wait overlaps an event-loop stall — the latency is a blocked loop, not slow I/O. Find the synchronous work hogging the loop.';
+    case 'gc-pause':
+      return 'Its wait overlaps garbage-collection pauses — reduce allocation pressure on the hot path.';
+    case 'downstream-async':
+      return 'It was waiting on its own descendant async work — optimize the inner operation it awaits.';
+    case 'cpu-bound':
+      return 'Most of its lifetime was spent running on CPU, not waiting — move the heavy work off the async path.';
+    case 'io-wait':
+      return 'It was genuinely waiting on external I/O — add a deadline/timeout and verify the remote side.';
+    default:
+      return 'Long-lived async operations are usually slow I/O, network calls without a timeout, or promises waiting on something that already finished.';
+  }
 }
 
 function isUserEditableFrame(frame: AsyncTopOperation['initFrame']): boolean {
diff --git a/packages/detectors/test/async-detectors.test.ts b/packages/detectors/test/async-detectors.test.ts
index 38ed0b9..11bae02 100644
--- a/packages/detectors/test/async-detectors.test.ts
+++ b/packages/detectors/test/async-detectors.test.ts
@@ -10,6 +10,7 @@ import {
 } from '@lanterna-profiler/core';
 import { describe, expect, it } from 'vitest';
 import { deepAsyncChainDetector } from '../src/detectors/deep-async-chain.js';
+import { eventLoopBlockedAsyncDetector } from '../src/detectors/event-loop-blocked-async.js';
 import { hotAsyncContextDetector } from '../src/detectors/hot-async-context.js';
 import { longAwaitDetector } from '../src/detectors/long-await.js';
 import { microtaskFloodDetector } from '../src/detectors/microtask-flood.js';
@@ -738,3 +739,256 @@ describe('async detector edge cases', () => {
     expect(result.findings).toEqual([]);
   });
 });
+
+describe('long-await latency cause', () => {
+  it('classifies a slow op as event-loop-blocked when its wait overlaps a stall', () => {
+    const records: AsyncOperationRecord[] = [];
+    for (let i = 0; i < 5; i++) {
+      records.push(
+        withFrame(makeRecord(10 + i, 1, 'promise', 10), {
+          function: 'warm',
+          file: '/app/src/warm.js',
+          line: 1,
+        }),
+      );
+    }
+    const slow = withFrame(makeRecord(999, 1, 'promise', 600), {
+      function: 'handler',
+      file: '/app/src/handler.js',
+      line: 5,
+    });
+    records.push(slow);
+    const bundle = makeBundle({ records });
+    bundle.captureIntegrity.eventLoopTimed = true;
+    bundle.runtimeSignals = {
+      gcEvents: [],
+      eventLoopSamples: [{ atMs: 500, lagMs: 400 }],
+      eventLoopAvailable: true,
+    };
+    const pipeline = createAnalysisPipeline({
+      kinds: [createAsyncProfileKind()],
+      findingAnalyzers: [createFindingAnalyzerFromKindScopedDetector(longAwaitDetector)],
+    });
+    const result = pipeline.run(bundle, { command: ['node', 'app.js'], mode: 'spawn' });
+    const finding = result.findings.find((f) => f.id === 'long-await:999');
+    expect(finding?.evidence.extra).toMatchObject({
+      latencyCause: 'event-loop-blocked',
+      waitMs: 600,
+    });
+    expect(finding?.suggestion).toContain('event loop');
+  });
+});
+
+describe('event-loop-blocked-async detector', () => {
+  function bundleWithStallAndCpu(): CaptureBundle {
+    const slow = withFrame(makeRecord(999, 0, 'promise', 600), {
+      function: 'handler',
+      file: '/app/src/handler.js',
+      line: 5,
+    });
+    const bundle = makeBundle({ records: [slow] });
+    bundle.captureIntegrity.eventLoopTimed = true;
+    bundle.runtimeSignals = {
+      gcEvents: [],
+      eventLoopSamples: [{ atMs: 400, lagMs: 400 }],
+      eventLoopAvailable: true,
+    };
+    bundle.kinds.cpu = {
+      cpuProfile: {
+        nodes: [
+          {
+            id: 1,
+            callFrame: {
+              functionName: '(root)',
+              scriptId: '0',
+              url: '',
+              lineNumber: -1,
+              columnNumber: -1,
+            },
+            hitCount: 0,
+            children: [2],
+          },
+          {
+            id: 2,
+            callFrame: {
+              functionName: 'blockingFn',
+              scriptId: '1',
+              url: 'file:///app/src/block.js',
+              lineNumber: 10,
+              columnNumber: 0,
+            },
+            hitCount: 400,
+            children: [],
+          },
+        ],
+        startTime: 0,
+        endTime: 400_000,
+        samples: Array.from({ length: 400 }, () => 2),
+        timeDeltas: Array.from({ length: 400 }, () => 1000),
+      },
+      deopts: [],
+      samplesTimed: true,
+    };
+    return bundle;
+  }
+
+  it('emits a correlated-window finding tying the slow op to the blocked loop', () => {
+    const pipeline = createAnalysisPipeline({
+      kinds: [
+        createCpuProfileKind({ readStderrSoFar: () => '', sampleIntervalMicros: 1000 }),
+        createAsyncProfileKind(),
+      ],
+      findingAnalyzers: [
+        createFindingAnalyzerFromKindScopedDetector(eventLoopBlockedAsyncDetector),
+      ],
+    });
+    const result = pipeline.run(bundleWithStallAndCpu(), {
+      command: ['node', 'app.js'],
+      mode: 'spawn',
+    });
+    const finding = result.findings.find((f) => f.id === 'event-loop-blocked-async:999');
+    expect(finding).toBeDefined();
+    expect(finding?.proofLevel).toBe('correlated-window');
+    expect(finding?.category).toBe('event-loop-blocked-async');
+    expect(finding?.evidence.extra).toMatchObject({ waitMs: 600 });
+  });
+
+  it('skips silently when the CPU kind is absent', () => {
+    const bundle = bundleWithStallAndCpu();
+    bundle.kinds.cpu = undefined;
+    const pipeline = createAnalysisPipeline({
+      kinds: [createAsyncProfileKind()],
+      findingAnalyzers: [
+        createFindingAnalyzerFromKindScopedDetector(eventLoopBlockedAsyncDetector),
+      ],
+    });
+    const result = pipeline.run(bundle, { command: ['node', 'app.js'], mode: 'spawn' });
+    expect(result.findings).toEqual([]);
+  });
+
+  it('stays silent when no CPU hotspot identifies the blocking frame', () => {
+    const bundle = bundleWithStallAndCpu();
+    // All CPU time is in (root): the correlation finds no user culprit frame, so
+    // this detector has nothing actionable to point at and must not emit a
+    // placeholder `(event-loop)` finding (the generic stall finding covers it).
+    bundle.kinds.cpu = {
+      cpuProfile: {
+        nodes: [
+          {
+            id: 1,
+            callFrame: {
+              functionName: '(root)',
+              scriptId: '0',
+              url: '',
+              lineNumber: -1,
+              columnNumber: -1,
+            },
+            hitCount: 400,
+            children: [],
+          },
+        ],
+        startTime: 0,
+        endTime: 400_000,
+        samples: Array.from({ length: 400 }, () => 1),
+        timeDeltas: Array.from({ length: 400 }, () => 1000),
+      },
+      deopts: [],
+      samplesTimed: true,
+    };
+    const pipeline = createAnalysisPipeline({
+      kinds: [
+        createCpuProfileKind({ readStderrSoFar: () => '', sampleIntervalMicros: 1000 }),
+        createAsyncProfileKind(),
+      ],
+      findingAnalyzers: [
+        createFindingAnalyzerFromKindScopedDetector(eventLoopBlockedAsyncDetector),
+      ],
+    });
+    const result = pipeline.run(bundle, { command: ['node', 'app.js'], mode: 'spawn' });
+    expect(result.findings.some((f) => f.category === 'event-loop-blocked-async')).toBe(false);
+  });
+
+  it('attributes each blocked op to its own stall frame, not one global frame', () => {
+    // Two distinct blockers: blockA stalls the loop during [0,300], blockB during
+    // [500,800]. opA becomes runnable at 300 (blocked by A), opB at 800 (by B).
+    const opA = { ...makeRecord(901, 0, 'timer', 300, 0), firstRunAtMs: 300 };
+    const opB = { ...makeRecord(902, 0, 'timer', 300, 500), firstRunAtMs: 800 };
+    const bundle = makeBundle({ records: [opA, opB], durationMs: 1000 });
+    bundle.captureIntegrity.eventLoopTimed = true;
+    bundle.runtimeSignals = {
+      gcEvents: [],
+      eventLoopSamples: [
+        { atMs: 300, lagMs: 300 },
+        { atMs: 800, lagMs: 300 },
+      ],
+      eventLoopAvailable: true,
+    };
+    bundle.kinds.cpu = {
+      cpuProfile: {
+        nodes: [
+          {
+            id: 1,
+            callFrame: {
+              functionName: '(root)',
+              scriptId: '0',
+              url: '',
+              lineNumber: -1,
+              columnNumber: -1,
+            },
+            hitCount: 0,
+            children: [2, 3],
+          },
+          {
+            id: 2,
+            callFrame: {
+              functionName: 'blockA',
+              scriptId: '1',
+              url: 'file:///app/blockA.js',
+              lineNumber: 1,
+              columnNumber: 0,
+            },
+            hitCount: 300,
+            children: [],
+          },
+          {
+            id: 3,
+            callFrame: {
+              functionName: 'blockB',
+              scriptId: '2',
+              url: 'file:///app/blockB.js',
+              lineNumber: 1,
+              columnNumber: 0,
+            },
+            hitCount: 300,
+            children: [],
+          },
+        ],
+        startTime: 0,
+        endTime: 1_000_000,
+        samples: [
+          ...Array(300).fill(2), // blockA during [0,300]
+          ...Array(200).fill(1), // gap
+          ...Array(300).fill(3), // blockB during [500,800]
+          ...Array(200).fill(1), // tail
+        ],
+        timeDeltas: Array.from({ length: 1000 }, () => 1000),
+      },
+      deopts: [],
+      samplesTimed: true,
+    };
+    const pipeline = createAnalysisPipeline({
+      kinds: [
+        createCpuProfileKind({ readStderrSoFar: () => '', sampleIntervalMicros: 1000 }),
+        createAsyncProfileKind(),
+      ],
+      findingAnalyzers: [
+        createFindingAnalyzerFromKindScopedDetector(eventLoopBlockedAsyncDetector),
+      ],
+    });
+    const result = pipeline.run(bundle, { command: ['node', 'app.js'], mode: 'spawn' });
+    const a = result.findings.find((f) => f.id === 'event-loop-blocked-async:901');
+    const b = result.findings.find((f) => f.id === 'event-loop-blocked-async:902');
+    expect(a?.evidence.function).toBe('blockA');
+    expect(b?.evidence.function).toBe('blockB');
+  });
+});
diff --git a/packages/detectors/test/findings.test.ts b/packages/detectors/test/findings.test.ts
index 9afc03f..e8f3342 100644
--- a/packages/detectors/test/findings.test.ts
+++ b/packages/detectors/test/findings.test.ts
@@ -755,11 +755,17 @@ describe('findings – event-loop-stall', () => {
     findFindingOrFail(report, (f) => f.id === 'event-loop-stall', 'event-loop-stall finding');
   });
 
-  it('derives real stall intervals from timed heartbeats', () => {
-    assert.deepEqual(getCpuProfile(report).eventLoop.stallIntervals, [
-      { startMs: 20, endMs: 320, maxLagMs: 300 },
-      { startMs: 340, endMs: 1000, maxLagMs: 660 },
-    ]);
+  it('derives real stall intervals (with per-stall culprit frame) from timed heartbeats', () => {
+    const intervals = getCpuProfile(report).eventLoop.stallIntervals;
+    assert.deepEqual(
+      intervals.map(({ startMs, endMs, maxLagMs }) => ({ startMs, endMs, maxLagMs })),
+      [
+        { startMs: 20, endMs: 320, maxLagMs: 300 },
+        { startMs: 340, endMs: 1000, maxLagMs: 660 },
+      ],
+    );
+    // Each stall carries the user frame that dominated CPU during it (the culprit).
+    assert.match(String(intervals[0]?.topFrame?.function), /hashPassword/);
   });
 
   it('includes correlated hotspot candidates in event-loop evidence', () => {
diff --git a/skills/lanterna-profiler/SKILL.md b/skills/lanterna-profiler/SKILL.md
index 750d862..2cb6fe2 100644
--- a/skills/lanterna-profiler/SKILL.md
+++ b/skills/lanterna-profiler/SKILL.md
@@ -18,7 +18,7 @@ Lanterna produces agent-facing Node.js profiling reports. Your job is not to sum
 ## Diagnostic Workflow
 
 0. **No report yet? Capture first.** If the user gave only a target (command, PID, inspector URL), drive a capture before diagnosing — see [Lanterna Capture & Rerun Commands](#lanterna-capture--rerun-commands).
-   - **Pick kinds from the symptom, do not ask blindly.** Start with `cpu`. Add `memory` for leaks, OOM, sustained RSS/heap growth, off-heap Buffer pressure, or allocation-heavy hot paths. Add `async` for long awaits, deep chains, orphan resources, or concurrency shape. Add `--heap-snapshot-analysis` only for retainer/leak retention. Confirm with the user only when the symptom is genuinely ambiguous across CPU/memory/async.
+   - **Pick kinds from the symptom, do not ask blindly.** Start with `cpu`. Add `memory` for leaks, OOM, sustained RSS/heap growth, off-heap Buffer pressure, or allocation-heavy hot paths. Add `async` for long awaits, deep chains, orphan resources, or concurrency shape; pair it with `cpu` (`--kind cpu,async`) to attribute an await's latency to a cause (blocked event loop vs slow I/O vs downstream CPU). Add `--heap-snapshot-analysis` only for retainer/leak retention. Confirm with the user only when the symptom is genuinely ambiguous across CPU/memory/async.
    - Ask only for inputs you cannot infer: the target (`run -- <cmd>` / `attach --pid` / `attach --inspect-url`) is always needed; a duration for long-running targets (finite scripts/jobs can run to exit, but a duration still helps for a fixed comparison window).
    - **Workload gate.** Before capturing a long-running target, ask whether representative load is already active, whether the user will launch it, or whether the agent should launch/propose it. For HTTP servers with no workload and no readiness URL/port, pause before capture and offer concrete choices: user-driven traffic, `curl`/`autocannon`/`artillery`, or an existing project load script; also ask for `--wait-for-url` or the port/health route to build it. For `run`, include `--workload` only after the user accepts an agent-launched command. For `attach`, explain that `attach` has no `--workload`, so traffic must run externally during capture. For batch jobs, queue consumers, workers, and cron tasks, ask whether representative jobs/backlog are already active; if not, propose enqueueing jobs, replaying fixtures/payloads, running a producer/scheduler, or using an existing job script.
    - **Attaching without a known PID? List, don't guess.** When the user wants to attach to a running process but gave no `--pid` or `--inspect-url`, enumerate candidates with `$LANTERNA ps --format json` — a JSON array of live direct `node`/`nodejs` runtimes shaped as `{ pid, runtime, attachMode, command, cwd, ageMs, cpu, memory }`. Present the candidates back readably (pid · command · cwd · attach mode · CPU) and ask which one to attach to; do not pick silently, even when one stands out. Do not assume the list separates applications from tooling: commands launched by `node` can appear. Prefer `cdp-ready` targets (an inspector is already open) over `pid-attach` ones (reached best-effort via `SIGUSR1`). If the array is empty there is nothing to attach to — fall back to `run -- <cmd>` or ask for an `--inspect-url`. Then capture with `attach --pid <chosen>`.
diff --git a/skills/lanterna-profiler/references/async-profiling.md b/skills/lanterna-profiler/references/async-profiling.md
index 23de241..77ac7fd 100644
--- a/skills/lanterna-profiler/references/async-profiling.md
+++ b/skills/lanterna-profiler/references/async-profiling.md
@@ -39,13 +39,13 @@ Attach reports can still be useful for new resources created during the capture
 
 These are targeted JSON lookup paths. For analysis, read the agent report first and use its frontmatter, `## Findings` table, `## Finding N` blocks, the `decision` column, `Kind Review`, and `Files To Read First` sections as the contract.
 
-- `profiles.async.summary`: availability, counts, top kinds, `collectedVia`, dropped record count, and optional `topAsyncHotFile`.
+- `profiles.async.summary`: availability, counts, top kinds, `collectedVia`, dropped record count, optional `topAsyncHotFile`, and `byKindLatency` (per-family `p50/p95/p99/max` of lifetime + `meanWaitMs`, completed ops only).
 - `profiles.async.quality`: confidence, reasons, recommendations, dropped records, `attachPartialCapture`, CDP stack coverage.
-- `profiles.async.topOperations[]`: ranked resource lifecycle summaries with duration, run counts, frames, and optional await/CPU attribution.
+- `profiles.async.topOperations[]`: ranked **completed** resource lifecycle summaries — `durationMs`/`runMs`/`waitMs`/`scheduleDelayMs`, frames, optional await/CPU attribution, and a classified `latencyCause` (+ `causeConfidence`/`causeEvidence`/`attributedFrameOrigin`). Orphans are excluded (see `orphans[]`).
 - `profiles.async.hotFiles[]`: ranked user files responsible for async work.
 - `profiles.async.chains[]`: parent/trigger chains built from async resource relationships.
 - `profiles.async.concurrencyTimeline[]`: active/inflight samples over time.
-- `profiles.async.orphans[]`: resources still unresolved/undestroyed at read time.
+- `profiles.async.orphans[]`: resources still unresolved/undestroyed at capture end — excluded from `topOperations`/`byKindLatency` because their capture-clamped duration is fictional.
 - `profiles.async.cdpAsyncContexts[]`: supplemental CDP async stacks.
 - `profiles.async.cpuAttribution.topChains[]`: CPU attributed to async run windows when CPU was captured.
 - `meta.kinds.async.*`: capture knobs such as instrumentation mode and stack depth.
@@ -73,13 +73,28 @@ Interpretation rules:
 - CPU attribution through run windows is approximate; prefer wording like "overlapped" unless CPU and async evidence both point to the same source.
 - CDP async stacks are supplemental. Low coverage does not invalidate lifecycle records, but it weakens source attribution.
 
+## Latency Decomposition And Cause
+
+With `--kind cpu,async`, each `topOperations[]` entry splits its lifetime into `runMs` (on CPU), `waitMs` (alive but not on CPU — the real latency), and `scheduleDelayMs` (init → first run), and carries a classified `latencyCause` with `causeConfidence` + `causeEvidence`. `latencyCause` is a windowed correlation (wait vs event-loop stalls, GC pauses, descendant CPU), not proof. Read each value with its real reliability:
+
+- `event-loop-blocked` — high-value and reliable. The callback was ready but the loop was still stalled when it became runnable. Pair with the `event-loop-blocked-async` finding, which names the synchronous CPU frame to fix — **not** the await site. Trust it over the (innocent) async call site.
+- `cpu-bound` — reliable. A single `await` fragments into several promise resources; only the work-carrying one is `cpu-bound` (`runMs ≈ durationMs`), while its sibling/parent promises commonly read `unknown`. Anchor on the resource that carries `runMs`/`waitMs`, not the count of `unknown` siblings.
+- `io-wait` — a *residual* for I/O-kind resources with no other explanation, not positive proof of remote slowness. Corroborate with the dependency/remote before concluding.
+- `gc-pause` — **rare by design**: GC pauses are matched against their real (sub-millisecond) durations, so they seldom dominate ≥50% of a wait. Its absence is not evidence GC is cheap — confirm GC pressure with `--kind memory`.
+- `downstream-async` — only when a *trigger-descendant runs on CPU* during the wait. Work you `await` that is itself waiting (a timer/socket resolved on a sibling resource) is **not** caught and shows as `unknown`; do not read its absence as "nothing downstream".
+- `background` — a long-lived handle that is not a latency bug: idle (never ran) or a persistent/multiplexed handle (keep-alive socket, HTTP parser, pool, interval) that activated many times across most of the capture. Its `waitMs` aggregates idle gaps between activations, so these are deliberately excluded from `event-loop-blocked-async` and `long-await` — a keep-alive connection spanning the capture is not a single multi-second blocked callback.
+- `unknown` — `causeEvidence.basis: "no-eventloop-signal"` means the heartbeat was unavailable (capture in run mode to classify); `"none"` means a real wait that matched no signal. Treat as "not enough signal", not "no problem".
+
+`summary.byKindLatency` gives per-family `p50/p95/p99/max` of lifetime + `meanWaitMs` over completed ops only. The `event-loop-blocked-async` finding anchors on the user frame that dominated CPU during the *specific* stall that delayed each op (matched by `firstRunAtMs`; per-stall culprits are in `profiles.cpu.eventLoop.stallIntervals[].topFrame`), so distinct blockers each point at their own frame; it falls back to the globally-dominant hotspot only when an op matches no stall, and stands down when no hotspot correlates.
+
 ## Findings
 
 Async findings usually include:
 
 - `deep-async-chain`: async parent chains exceed depth thresholds.
-- `long-await`: async operations or await frames are materially long.
+- `long-await`: async operations or await frames are materially long; carries the latency decomposition and cause-specific guidance.
 - `orphan-async-resource`: resources remain unresolved/undestroyed at read time.
+- `event-loop-blocked-async`: an async operation's wait overlaps an event-loop stall — latency is a blocked loop, not slow I/O — anchored on the synchronous CPU frame to fix (needs `--kind cpu,async`).
 - `hot-async-context` / async CPU attribution findings when combined with CPU data.
 - `microtask-flood` when microtask or TickObject volume dominates and microtasks were included.