diff --git a/CHANGELOG.md b/CHANGELOG.md index 4b87ed5..a3722e6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,28 @@ # Changelog +## [0.11.3] - 2026-05-27 + +### Fixed + +- **Cursor scanner: walk both legacy `.txt` and current Composer 2+ `.jsonl` layouts** ([#45](https://github.com/subinium/agf/pull/45), by @rooty0 / Stan) — current Cursor stores transcripts at `~/.cursor/projects/*/agent-transcripts//.jsonl` (depth 4) rather than the legacy `~/.cursor/projects/*/agent-transcripts/.txt` (depth 3). The scanner walked depth 3 with a `.txt`-only filter, so on current Cursor it returned **zero sessions**. Verified against live data: `~/.cursor/projects` had 2 JSONL transcripts at depth 4 and 0 TXT, and `agf list --agent cursor-agent` returned `No sessions found.` before this release. Closes [#35](https://github.com/subinium/agf/issues/35). +- **Cursor scanner: read chat metadata from the right table** ([#45](https://github.com/subinium/agf/pull/45), by @rooty0) — the previous code queried `SELECT value FROM cursorDiskKV WHERE key = 'composerData'`, which is the **IDE's** `state.vscdb` schema, not the CLI's `store.db`. Cursor CLI's `store.db` actually exposes a `meta(key TEXT PRIMARY KEY, value TEXT)` table with a single `key = '0'` row whose value is a hex-encoded JSON containing `agentId`, `name`, `createdAt`, `mode`, and `lastUsedModel`. Verified via `sqlite3` against a real store.db on disk. +- **Cursor scanner: skip JSONL transcripts whose `store.db` is missing** ([#45](https://github.com/subinium/agf/pull/45), by @rooty0) — `cursor-agent --resume` only surfaces sessions that have BOTH a transcript and a `~/.cursor/chats///store.db` entry; reporting orphaned transcripts that the CLI itself refuses to resume just confuses the listing. Legacy `.txt` sessions are unaffected (they predate the `chats/` directory). 
+- **Cursor scanner: fall back to the first user prompt when `store.db` has no usable metadata** ([#45](https://github.com/subinium/agf/pull/45), by @rooty0) — the JSONL is parsed for the first `role: user` text part, with `` system injections skipped and `` wrappers stripped. +- **`extract_first_prompt` no longer panics on inverted `` tags** — `str::find` returns the FIRST occurrence of each substring independently, so a text part where `` byte-precedes `` (e.g. a pasted log or AI-generated code sample) gave `start > end` and `text[s+12..e]` panicked with `begin > end`. Confirmed via a standalone rustc reproducer. The closing tag is now searched **after** the opening one. Regression test: `extract_first_prompt_does_not_panic_on_inverted_tags`. +- **`extract_first_prompt` no longer aborts on the first malformed or non-UTF-8 line** — both the per-line IO read and `serde_json::from_str` used `.ok()?`, which propagates `None` out of the whole function on the first error instead of skipping the bad line. A single corrupted/truncated/non-UTF-8 line at the top of the JSONL silently disabled the blank-summary fallback for the rest of the file. Replaced with `let Ok(...) else { continue; };` (matching `scanner/pi.rs`). Regression tests: `extract_first_prompt_skips_malformed_json_lines` and `extract_first_prompt_skips_invalid_utf8_lines`. +- **`extract_first_prompt` now bounded by a 512 KiB byte budget** — pi.rs added this safeguard in v0.11.2 after large Claude logs stalled the TUI; Cursor transcripts can carry multi-MB tool-result blobs, and the `CACHE_VERSION` bump in this release forces a cold rescan for every upgrader, so the same precaution applies. +- **Cursor delete: legacy `.txt` transcripts now actually get removed** — `delete_cursor_agent_session` called `remove_dirs_matching_name(&projects_dir, &session.session_id)`, but that helper filters on `path.is_dir()` AND `file_name == name`. 
Legacy sessions live at `agent-transcripts/.txt` (a file named `.txt`), so it never matched. Delete returned `Ok(())`, the orphan file persisted on disk, and the next scan resurrected it. A new sibling helper `remove_files_matching_name` removes the file form alongside the directory form. Regression test: `delete_cursor_agent_removes_legacy_txt_transcript`. +- **Cursor scanner: enforce stem == parent UUID invariant on the `.jsonl` arm** — the previous check only required the grandparent to be named `agent-transcripts`. A stray `agent-transcripts//.jsonl` would produce `session_id = uuidB`, which mismatches both the store.db lookup and what `cursor-agent --resume` expects. Real Cursor always writes them equal, but the invariant is now explicit. Regression test: `scan_from_rejects_jsonl_with_stem_mismatched_to_parent`. +- **`decode_dash_path` test coverage for hyphenated project segments** — added `decode_dash_path_resolves_hyphenated_segments` which places `agent`, `agent-tui`, and `agent-tui-finder` as sibling directories and asserts the backtracking decoder resolves to the longest existing match. This is the load-bearing case for this very repo's path. + +### Changed + +- **`CACHE_VERSION` bumped to 6** — the new orphan-skip rule fires only on fresh scans; cached `0.11.x` cursor entries persist with their old summaries until each transcript's mtime changes. Bumping the version forces a one-time rescan on upgrade so the "35 orphans → 0" effect actually lands for upgraders. + +### Docs + +- **README: Cursor CLI doc link + transcript paths updated** ([#45](https://github.com/subinium/agf/pull/45), by @rooty0) — `docs.cursor.com/agent` no longer resolves; switched to `cursor.com/docs/cli/overview`. Storage column now lists both the current JSONL layout and the legacy `.txt` form. 
+ ## [0.11.2] - 2026-05-23 ### Fixed diff --git a/Cargo.lock b/Cargo.lock index 22cd7a2..41feb79 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -4,7 +4,7 @@ version = 4 [[package]] name = "agf" -version = "0.11.2" +version = "0.11.3" dependencies = [ "anyhow", "chrono", diff --git a/Cargo.toml b/Cargo.toml index 0c9d7c2..bec218d 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "agf" -version = "0.11.2" +version = "0.11.3" edition = "2021" description = "Find and resume local AI coding-agent sessions across Claude Code, Codex, Gemini, Cursor CLI, OpenCode, Kiro, pi, and Hermes" license = "MIT" diff --git a/src/cache.rs b/src/cache.rs index b382d6b..761f238 100644 --- a/src/cache.rs +++ b/src/cache.rs @@ -7,6 +7,15 @@ use serde::{Deserialize, Serialize}; use crate::model::{Agent, Session}; use crate::plugin; +// Bumped to 6 in v0.11.3: +// - Cursor scanner now (a) walks both depth-3 .txt and depth-4 .jsonl layouts, +// (b) reads chat metadata from the `meta` table of store.db, and (c) drops +// .jsonl transcripts that have no matching store.db (orphans `cursor-agent` +// itself refuses to resume). Cache entries written by 0.11.x would surface +// the old orphan-laden list until each transcript's mtime happened to +// change. Bumping the version forces a one-time rescan on upgrade so the +// PR description's "35 orphans -> 0" claim actually holds for upgraders. +// // Bumped to 5 after v0.11.1: // - Pi scanner now keeps all user-message summaries instead of only the first // one, matching the History preview behavior of other agents. @@ -24,7 +33,7 @@ use crate::plugin; // entries written by 0.10.x would surface as stale "cli session (...)" // summaries until the source DB mtime happens to change. // Bumping the version forces a one-time rescan on first 0.11.0 launch. 
-const CACHE_VERSION: u32 = 5; +const CACHE_VERSION: u32 = 6; #[derive(Serialize, Deserialize)] struct CacheFile { diff --git a/src/delete.rs b/src/delete.rs index e92ed6e..e012dbc 100644 --- a/src/delete.rs +++ b/src/delete.rs @@ -107,6 +107,21 @@ fn remove_dirs_matching_name(base: &Path, name: &str) -> Result<(), io::Error> { Ok(()) } +/// Walk a directory tree and remove any regular file whose name (including +/// extension) matches the target. +fn remove_files_matching_name(base: &Path, name: &str) -> Result<(), io::Error> { + if !base.is_dir() { + return Ok(()); + } + for entry in WalkDir::new(base).into_iter().filter_map(|e| e.ok()) { + let path = entry.path(); + if path.is_file() && path.file_name().and_then(|n| n.to_str()) == Some(name) { + fs::remove_file(path)?; + } + } + Ok(()) +} + // --------------------------------------------------------------------------- // Codex // --------------------------------------------------------------------------- @@ -320,22 +335,33 @@ fn delete_kiro_session(session: &Session) -> Result<(), io::Error> { // Cursor Agent // --------------------------------------------------------------------------- -/// Cursor Agent sessions are stored in two locations: -/// 1. `~/.cursor/chats///store.db` (SQLite) -/// 2. `~/.cursor/projects/*/agent-transcripts//` (transcript directory) +/// Cursor Agent sessions are stored across two layouts: +/// - Current (Composer 2+) JSONL: directory at +/// `~/.cursor/projects/*/agent-transcripts//` containing +/// `.jsonl`, plus chat metadata under +/// `~/.cursor/chats///store.db`. +/// - Legacy: file at `~/.cursor/projects/*/agent-transcripts/.txt` +/// with no `chats/` counterpart. +/// +/// Both shapes are still surfaced by the scanner (see `scan_from`), so delete +/// must remove the directory AND the file form — otherwise legacy sessions +/// silently no-op (`remove_dirs_matching_name` filters on `is_dir()`) and the +/// next scan resurrects the orphan. 
fn delete_cursor_agent_session(session: &Session) -> Result<(), io::Error> { let cursor_dir = config::cursor_dir().map_err(io::Error::other)?; - // 1. Remove chat directory: ~/.cursor/chats/*// + // 1. Chat metadata: ~/.cursor/chats/*// let chats_dir = cursor_dir.join("chats"); if chats_dir.exists() { remove_dirs_matching_name(&chats_dir, &session.session_id)?; } - // 2. Remove transcript directory: ~/.cursor/projects/*/agent-transcripts// + // 2. Transcript: directory form (JSONL) and file form (legacy .txt). let projects_dir = cursor_dir.join("projects"); if projects_dir.exists() { remove_dirs_matching_name(&projects_dir, &session.session_id)?; + let legacy_txt = format!("{}.txt", session.session_id); + remove_files_matching_name(&projects_dir, &legacy_txt)?; } Ok(()) @@ -571,4 +597,36 @@ mod tests { assert!(!target_dir.exists(), "target session dir should be deleted"); assert!(sibling_dir.exists(), "sibling session dir must survive"); } + + /// Regression: legacy Cursor sessions live as plain files at + /// `projects//agent-transcripts/.txt`, not as directories. + /// `remove_dirs_matching_name` filters on `is_dir()`, so the dir-only + /// pass added in #45 left these files behind and the next scan + /// resurrected the orphan. Delete must remove both shapes. 
+ #[test] + fn delete_cursor_agent_removes_legacy_txt_transcript() { + let base = make_codex_dir("agf-test-cursor-legacy-txt"); + let transcripts = base.join("projects/encproj/agent-transcripts"); + fs::create_dir_all(&transcripts).unwrap(); + + let target_uuid = "eeeeeeeeeeeeeeeeeeeeeeeeeeeeeee1"; + let sibling_uuid = "eeeeeeeeeeeeeeeeeeeeeeeeeeeeeee2"; + + let target_file = transcripts.join(format!("{target_uuid}.txt")); + let sibling_file = transcripts.join(format!("{sibling_uuid}.txt")); + fs::write(&target_file, b"legacy transcript").unwrap(); + fs::write(&sibling_file, b"legacy transcript").unwrap(); + + let projects_dir = base.join("projects"); + remove_files_matching_name(&projects_dir, &format!("{target_uuid}.txt")).unwrap(); + + assert!( + !target_file.exists(), + "legacy .txt transcript should be deleted", + ); + assert!( + sibling_file.exists(), + "unrelated sibling legacy .txt must survive", + ); + } } diff --git a/src/scanner/cursor_agent.rs b/src/scanner/cursor_agent.rs index 3ae16ea..c803ae5 100644 --- a/src/scanner/cursor_agent.rs +++ b/src/scanner/cursor_agent.rs @@ -8,6 +8,21 @@ use crate::model::{Agent, Session}; use super::truncate; +/// Max chars stored per session summary. +const SUMMARY_MAX_CHARS: usize = 100; + +/// Max bytes read from a single JSONL transcript when extracting the first +/// prompt. Cursor transcripts can include multi-MB tool-result blobs, and the +/// CACHE_VERSION bump in this release forces a cold rescan for every upgrader, +/// so an unbounded read repeats the v0.10.1 heavy-log stall pattern that +/// `read_head_tail` was added to prevent for Claude. +const MAX_PARSE_BYTES: usize = 512 * 1024; + +/// Max lines scanned when extracting the first prompt. The first user message +/// almost always lands within the first few lines; the cap bounds work on +/// pathological transcripts that pad with non-`role=user` rows. 
+const MAX_PARSE_LINES: usize = 50; + pub fn scan() -> Result<Vec<Session>, AgfError> { scan_from(&crate::config::cursor_dir()?) } @@ -40,39 +55,53 @@ fn scan_from(cursor_dir: &Path) -> Result<Vec<Session>, AgfError> { let ext = path.extension().and_then(|e| e.to_str()); // Resolve (agent_transcripts_dir, session_id) for each format. - let (agent_transcripts_dir, session_id) = match ext { - Some("txt") => { - // Legacy: parent must be "agent-transcripts" - let parent = match path.parent().filter(|p| { - p.file_name().and_then(|n| n.to_str()) == Some("agent-transcripts") - }) { - Some(p) => p, - None => continue, - }; - let id = match path.file_stem().and_then(|n| n.to_str()) { - Some(id) => id.to_string(), - None => continue, - }; - (parent, id) - } - Some("jsonl") => { - // Current: grandparent must be "agent-transcripts" - let grandparent = match path - .parent() - .and_then(|p| p.parent()) - .filter(|p| p.file_name().and_then(|n| n.to_str()) == Some("agent-transcripts")) - { - Some(p) => p, - None => continue, - }; - let id = match path.file_stem().and_then(|n| n.to_str()) { - Some(id) => id.to_string(), - None => continue, - }; - (grandparent, id) - } - _ => continue, - }; + let (agent_transcripts_dir, session_id) = + match ext { + Some("txt") => { + // Legacy: parent must be "agent-transcripts" + let parent = match path.parent().filter(|p| { + p.file_name().and_then(|n| n.to_str()) == Some("agent-transcripts") + }) { + Some(p) => p, + None => continue, + }; + let id = match path.file_stem().and_then(|n| n.to_str()) { + Some(id) => id.to_string(), + None => continue, + }; + (parent, id) + } + Some("jsonl") => { + // Current layout: agent-transcripts//.jsonl + // The parent dir name must equal the file stem (both are the + // same session UUID). Without this invariant a stray jsonl + // would produce a session_id that mismatches both the + // store.db lookup key and what `cursor-agent --resume` expects. 
+ let parent = match path.parent() { + Some(p) => p, + None => continue, + }; + let parent_name = match parent.file_name().and_then(|n| n.to_str()) { + Some(n) => n, + None => continue, + }; + let id = match path.file_stem().and_then(|n| n.to_str()) { + Some(id) => id, + None => continue, + }; + if parent_name != id { + continue; + } + let grandparent = match parent.parent().filter(|p| { + p.file_name().and_then(|n| n.to_str()) == Some("agent-transcripts") + }) { + Some(p) => p, + None => continue, + }; + (grandparent, id.to_string()) + } + _ => continue, + }; // Parent of agent-transcripts is the dash-encoded project path let encoded_dir = match agent_transcripts_dir @@ -197,7 +226,7 @@ fn read_store_db(store_path: &Path) -> Option { let name = parsed .get("name") .and_then(|v| v.as_str()) - .map(|s| truncate(s, 100)); + .map(|s| truncate(s, SUMMARY_MAX_CHARS)); // createdAt is in milliseconds let created_at = parsed @@ -231,20 +260,39 @@ fn extract_first_prompt(jsonl_path: &Path) -> Option { let file = File::open(jsonl_path).ok()?; let reader = BufReader::new(file); - for line in reader.lines().take(50) { - let line = line.ok()?; + let mut bytes_read = 0usize; + for (lines_seen, line_result) in reader.lines().enumerate() { + if lines_seen >= MAX_PARSE_LINES || bytes_read >= MAX_PARSE_BYTES { + break; + } + + // A single bad line (invalid UTF-8, transient IO error, malformed + // JSON, partial flush from a crashed session) must skip just that + // line — never disable extraction for the rest of the file. The + // previous `.ok()?` pattern silently defeated this fallback for any + // transcript whose first 50 lines had even one unparseable row. 
+ let Ok(line) = line_result else { + continue; + }; + bytes_read += line.len() + 1; // +1 approximates the stripped newline let line = line.trim(); if line.is_empty() { continue; } - let value: serde_json::Value = serde_json::from_str(line).ok()?; + let Ok(value) = serde_json::from_str::<serde_json::Value>(line) else { + continue; + }; if value.get("role").and_then(|v| v.as_str()) != Some("user") { continue; } - let parts = value + let parts = match value .get("message") .and_then(|m| m.get("content")) - .and_then(|c| c.as_array())?; + .and_then(|c| c.as_array()) + { + Some(p) => p, + None => continue, + }; for part in parts { if part.get("type").and_then(|t| t.as_str()) != Some("text") { @@ -257,16 +305,25 @@ fn extract_first_prompt(jsonl_path: &Path) -> Option<String> { if text.trim_start().starts_with("") { continue; } - // Strip wrapper if present - let prompt = if let (Some(s), Some(e)) = - (text.find(""), text.find("")) - { - text[s + "".len()..e].trim() - } else { - text.trim() + // Strip the `...` wrapper if present. + // `str::find` returns the FIRST occurrence of each substring + // independently, so we must search for the closing tag AFTER + // the opening one; otherwise a text like + // `"foo..."` (which can + // occur in a pasted log or AI-generated code sample) gives + // s > e and `text[s+12..e]` panics with `begin > end`. + let prompt = match text.find("") { + Some(s) => { + let after = s + "".len(); + match text[after..].find("") { + Some(rel) => text[after..after + rel].trim(), + None => text.trim(), + } + } + None => text.trim(), }; if !prompt.is_empty() { - return Some(truncate(prompt, 100)); + return Some(truncate(prompt, SUMMARY_MAX_CHARS)); } } } @@ -320,6 +377,14 @@ mod tests { /// /// Labels must be alphanumeric only (no dashes) to avoid ambiguity in /// the backtracking decoder. + /// + /// Gated to Unix: the slug encoding (`canonical.to_str()` → + /// `trim_start_matches('/')` → `replace('/', "-")`) assumes POSIX paths. 
+ /// On Windows, `canonicalize()` returns `\\?\C:\…` with backslashes that + /// the dash decoder can't round-trip — every fixture-based test would + /// fail. The scanner code itself is platform-agnostic; only this test + /// fixture is Unix-shaped. + #[cfg(unix)] fn make_fixture(label: &str) -> (PathBuf, PathBuf, String) { let pid = std::process::id(); let cursor_dir = std::env::temp_dir().join(format!("agfcursor{pid}{label}")); @@ -342,6 +407,7 @@ mod tests { /// Write an empty `.jsonl` at the correct depth: /// `/projects//agent-transcripts//.jsonl` + #[cfg(unix)] fn place_session(cursor_dir: &Path, slug: &str, uuid: &str) { let session_dir = cursor_dir .join("projects") @@ -355,12 +421,14 @@ mod tests { /// Create a stub store.db at `/chats///store.db`. /// The file content is irrelevant; only its existence is checked. + #[cfg(unix)] fn place_store_db(cursor_dir: &Path, workspace: &str, uuid: &str) { let dir = cursor_dir.join("chats").join(workspace).join(uuid); fs::create_dir_all(&dir).unwrap(); fs::write(dir.join("store.db"), b"").unwrap(); } + #[cfg(unix)] #[test] fn scan_from_finds_jsonl_session_with_store_db() { let (cursor_dir, real_proj, encoded) = make_fixture("finds"); @@ -379,6 +447,7 @@ mod tests { let _ = fs::remove_dir_all(&real_proj); } + #[cfg(unix)] #[test] fn scan_from_skips_orphan_jsonl_without_store_db() { let (cursor_dir, real_proj, encoded) = make_fixture("orphan"); @@ -394,6 +463,7 @@ mod tests { let _ = fs::remove_dir_all(&real_proj); } + #[cfg(unix)] #[test] fn scan_from_finds_legacy_txt_session() { let (cursor_dir, real_proj, encoded) = make_fixture("txtigno"); @@ -417,6 +487,7 @@ mod tests { let _ = fs::remove_dir_all(&real_proj); } + #[cfg(unix)] #[test] fn scan_from_ignores_txt_nested_in_uuid_subdir() { let (cursor_dir, real_proj, encoded) = make_fixture("txtwrong"); @@ -437,6 +508,7 @@ mod tests { let _ = fs::remove_dir_all(&real_proj); } + #[cfg(unix)] #[test] fn scan_from_ignores_jsonl_directly_in_agent_transcripts() { let 
(cursor_dir, real_proj, encoded) = make_fixture("depth"); @@ -455,6 +527,7 @@ mod tests { let _ = fs::remove_dir_all(&real_proj); } + #[cfg(unix)] #[test] fn scan_from_skips_var_folders_encoded_dirs() { let (cursor_dir, real_proj, _) = make_fixture("varfold"); @@ -470,6 +543,7 @@ mod tests { let _ = fs::remove_dir_all(&real_proj); } + #[cfg(unix)] #[test] fn decode_dash_path_resolves_existing_directory() { let pid = std::process::id(); @@ -535,6 +609,7 @@ mod tests { let _ = fs::remove_file(&path); } + #[cfg(unix)] #[test] fn scan_from_falls_back_to_prompt_when_store_db_lacks_meta() { let (cursor_dir, real_proj, encoded) = make_fixture("nodb"); @@ -564,4 +639,142 @@ mod tests { let _ = fs::remove_dir_all(&cursor_dir); let _ = fs::remove_dir_all(&real_proj); } + + /// Write a standalone `.jsonl` to a unique temp path for tests that drive + /// `extract_first_prompt` directly (no `make_fixture` / `scan_from`, so + /// they stay portable — the existing fixture encodes paths Unix-style). + fn temp_jsonl(label: &str, body: &[u8]) -> PathBuf { + let pid = std::process::id(); + let path = std::env::temp_dir().join(format!("agf-cursor-{label}-{pid}.jsonl")); + let _ = fs::remove_file(&path); + fs::write(&path, body).unwrap(); + path + } + + /// Regression: a text part containing `` BEFORE `` + /// (e.g. a pasted log or AI-generated code sample) used to compute + /// `start > end` and panic with "begin > end" when slicing. The fix + /// searches for the closing tag AFTER the opening one. + #[test] + fn extract_first_prompt_does_not_panic_on_inverted_tags() { + // The text has the CLOSING tag first, then a normal opening+closing pair. 
+ let path = temp_jsonl( + "invtag", + br#"{"role":"user","message":{"content":[{"type":"text","text":"noisereal prompt"}]}}"#, + ); + let result = extract_first_prompt(&path); + let _ = fs::remove_file(&path); + assert_eq!(result.as_deref(), Some("real prompt")); + } + + /// Regression: previously, the FIRST malformed JSON line aborted the whole + /// per-line loop via `.ok()?`, so a single garbage line at the top of a + /// JSONL silently disabled the blank-summary fallback for the entire file. + /// Now each bad line is skipped individually. + #[test] + fn extract_first_prompt_skips_malformed_json_lines() { + // Line 1: garbage; Line 2: empty; Line 3: valid user prompt. + let body = b"not-json-at-all\n\n\ + {\"role\":\"user\",\"message\":{\"content\":[{\"type\":\"text\",\"text\":\"recovered\"}]}}\n"; + let path = temp_jsonl("malformed", body); + let result = extract_first_prompt(&path); + let _ = fs::remove_file(&path); + assert_eq!(result.as_deref(), Some("recovered")); + } + + /// Regression: a single line containing invalid UTF-8 bytes used to abort + /// the loop via `let line = line.ok()?;`. It should now be skipped like + /// any other bad line. + #[test] + fn extract_first_prompt_skips_invalid_utf8_lines() { + let mut bytes: Vec<u8> = Vec::new(); + // Line 1: an invalid two-byte sequence; `BufReader::lines` returns + // Err(InvalidData) for this row. + bytes.extend_from_slice(&[0xC3, 0x28]); // 0xC3 lead byte followed by a non-continuation byte + bytes.push(b'\n'); + // Line 2: well-formed JSON with a user prompt. + bytes.extend_from_slice( + br#"{"role":"user","message":{"content":[{"type":"text","text":"after garbage"}]}}"#, + ); + let path = temp_jsonl("badutf8", &bytes); + let result = extract_first_prompt(&path); + let _ = fs::remove_file(&path); + assert_eq!(result.as_deref(), Some("after garbage")); + } + + /// Invariant: a jsonl whose file stem mismatches its parent directory name + /// is silently dropped. 
Real Cursor always writes them equal; the guard + /// prevents a stray jsonl from producing a session_id that mismatches both + /// the store.db lookup key and what `cursor-agent --resume` expects. + /// + /// Gated to Unix because `make_fixture`'s slug encoding (joining canonical + /// path components with `-`) assumes Unix-style paths; Windows canonical + /// paths carry `\\?\C:\…` and backslashes that the dash decoder can't + /// round-trip. + #[cfg(unix)] + #[test] + fn scan_from_rejects_jsonl_with_stem_mismatched_to_parent() { + let (cursor_dir, real_proj, encoded) = make_fixture("stemmismatch"); + let parent_uuid = "aaaaaaaa0000000000000000000000c1"; + let stem_uuid = "bbbbbbbb0000000000000000000000c1"; + + let session_dir = cursor_dir + .join("projects") + .join(&encoded) + .join("agent-transcripts") + .join(parent_uuid); + fs::create_dir_all(&session_dir).unwrap(); + // stem != parent_uuid → must be skipped. + fs::write(session_dir.join(format!("{stem_uuid}.jsonl")), b"{}\n").unwrap(); + // Even with a store.db at the stem id, the scanner shouldn't surface it. + place_store_db(&cursor_dir, "ws", stem_uuid); + + let sessions = scan_from(&cursor_dir).unwrap(); + assert!(sessions.is_empty()); + + let _ = fs::remove_dir_all(&cursor_dir); + let _ = fs::remove_dir_all(&real_proj); + } + + /// Real-world load-bearing case: project paths regularly contain hyphens + /// (this very repo is `agent-tui-finder`). The backtracking decoder must + /// pick the segment that actually exists on disk when multiple + /// hyphen-split prefixes are valid directory candidates. + /// + /// Gated to Unix for the same reason as `make_fixture` — the encoded slug + /// is built from a canonical path with `'/'`-separated components. 
+ #[cfg(unix)] + #[test] + fn decode_dash_path_resolves_hyphenated_segments() { + let pid = std::process::id(); + let base = std::env::temp_dir() + .join(format!("agfdecode{pid}hyphen")) + .canonicalize() + .unwrap_or_else(|_| { + let p = std::env::temp_dir().join(format!("agfdecode{pid}hyphen")); + let _ = fs::remove_dir_all(&p); + fs::create_dir_all(&p).unwrap(); + p.canonicalize().unwrap() + }); + let _ = fs::remove_dir_all(&base); + fs::create_dir_all(&base).unwrap(); + // Decoy siblings the greedy-longest-first decoder must reject. + fs::create_dir_all(base.join("agent")).unwrap(); + fs::create_dir_all(base.join("agent-tui")).unwrap(); + // The real target. + let target = base.join("agent-tui-finder"); + fs::create_dir_all(&target).unwrap(); + + let encoded = base + .to_str() + .unwrap() + .trim_start_matches('/') + .replace('/', "-") + + "-agent-tui-finder"; + + let resolved = decode_dash_path(&encoded).unwrap(); + assert_eq!(resolved, target); + + let _ = fs::remove_dir_all(&base); + } }
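Reviewer note on the inverted-tag fix in `extract_first_prompt`: the ordered search can be isolated as a small helper, which makes the "closing tag is searched after the opening one" rule easier to check by eye. The sketch below is illustrative only and is not code from this PR. The function name `strip_wrapper` and the `<q>` / `</q>` tags used in the usage note are hypothetical placeholders, and the tags are passed as parameters so nothing is assumed about the tag names Cursor actually writes.

```rust
/// Hypothetical helper (not part of this PR): returns the trimmed text
/// between the first `open` tag and the first `close` tag that FOLLOWS it.
/// Falls back to the whole trimmed text when either tag is missing.
fn strip_wrapper<'a>(text: &'a str, open: &str, close: &str) -> &'a str {
    match text.find(open) {
        Some(start) => {
            // Byte offset just past the opening tag. `start` comes from
            // `str::find`, so `after` is always on a char boundary.
            let after = start + open.len();
            // Search only the remainder of the string. A stray closing tag
            // earlier in the text can therefore never produce an end offset
            // smaller than `after`, which is what caused the slice panic.
            match text[after..].find(close) {
                Some(rel) => text[after..after + rel].trim(),
                None => text.trim(),
            }
        }
        None => text.trim(),
    }
}
```

With the placeholder tags, `strip_wrapper("</q>noise<q>real prompt</q>", "<q>", "</q>")` yields `real prompt`, while a text with no opening tag, or with an opening tag that is never closed, comes back trimmed but otherwise unchanged.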