@@ -0,0 +1,4 @@
---
description: Analyze saved trajectories and recall audit events offline to record whether recalled guidelines influenced completed sessions.
---
Use the `evolve-lite-provenance` skill on the current conversation. Follow the skill's instructions exactly.
7 changes: 5 additions & 2 deletions platform-integrations/bob/evolve-lite/lib/audit.py
@@ -5,14 +5,17 @@
import pathlib


def append(project_root=".", **fields):
def append(project_root=".", evolve_dir=None, **fields):
    """Append a JSON audit entry to .evolve/audit.log.

    Args:
        project_root: Root directory that contains .evolve/.
        evolve_dir: Explicit evolve data directory. When set, writes directly
            to ``<evolve_dir>/audit.log`` instead of deriving it from
            ``project_root``.
        **fields: Arbitrary key-value fields to include in the log entry.
    """
    path = pathlib.Path(project_root) / ".evolve" / "audit.log"
    path = (
        pathlib.Path(evolve_dir) / "audit.log"
        if evolve_dir is not None
        else pathlib.Path(project_root) / ".evolve" / "audit.log"
    )
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = {**fields, "ts": datetime.datetime.now(datetime.UTC).isoformat().replace("+00:00", "Z")}
    with path.open("a", encoding="utf-8") as f:
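The effect of the new `evolve_dir` parameter can be sketched with a self-contained stand-in for the patched `append` (written here for illustration only; the real function lives in `lib/audit.py`):

```python
import datetime
import json
import pathlib
import tempfile

def append(project_root=".", evolve_dir=None, **fields):
    # Stand-in for the patched audit.append: evolve_dir, when given,
    # takes precedence over the project_root-derived default location.
    if evolve_dir is not None:
        path = pathlib.Path(evolve_dir) / "audit.log"
    else:
        path = pathlib.Path(project_root) / ".evolve" / "audit.log"
    path.parent.mkdir(parents=True, exist_ok=True)
    ts = datetime.datetime.now(datetime.timezone.utc).isoformat().replace("+00:00", "Z")
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({**fields, "ts": ts}) + "\n")
    return path

root = tempfile.mkdtemp()
default_path = append(project_root=root, event="recall", session_id="s1")
explicit_path = append(evolve_dir=f"{root}/data", event="recall", session_id="s1")
print(default_path)   # <root>/.evolve/audit.log
print(explicit_path)  # <root>/data/audit.log
```

This keeps the pre-existing `project_root` call sites working unchanged while letting callers that already know the evolve data directory skip the derivation.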
@@ -33,9 +33,15 @@ Unless that artifact happens to be:

## Workflow

### Step 0: Save and Load the Conversation

First, use the evolve-lite:save-trajectory skill to save the current conversation to `.evolve/trajectories/`. Capture the exact path from its output as `saved_trajectory_path`. You will attach this exact path to each entity's `trajectory` field in Step 6.

After saving, read `saved_trajectory_path` with the Read tool and analyze that saved trajectory rather than relying only on live context. If the trajectory cannot be saved or read, output zero entities and exit. Do not invent a trajectory path.

### Step 1: Analyze the Conversation

Identify from your current conversation:
Identify from the saved trajectory loaded in Step 0:

- **Task/Request**: What was the user asking for?
- **Steps Taken**: What reasoning, actions, and observations occurred?
@@ -76,6 +82,11 @@ Prefer one of these artifact forms:
- a small script, saved to a stable path in the workspace or plugin, such as `scripts/`, `tools/`, or another obvious helper location.
- a documented local workflow if code is not appropriate

When turning an ad hoc command or script into a reusable artifact, remove
incidental one-off inputs such as literal file names, IDs, answer values, or
temporary paths. Keep the reusable procedure that was actually exercised in the
session, and do not add capabilities that were not validated by the work.
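As a concrete illustration of this cleanup (the loader and names below are invented for the example, not taken from any session):

```python
import csv
import io

def load_rows(fileobj):
    """Toy stand-in for whatever loader the session actually exercised."""
    return list(csv.reader(fileobj))

# One-off form, as it might appear mid-session, with a literal
# session-specific path and row count baked in:
#   rows = load_rows(open("/tmp/run-2025-01-15/results_8841.csv"))[:10]
#
# Reusable form: the incidental inputs become parameters, while the
# behavior that was actually validated (parse CSV, take the first N
# rows) stays exactly the same -- no new capabilities are added.
def export_report(results_file, limit=10):
    return load_rows(results_file)[:limit]

sample = io.StringIO("a,1\nb,2\nc,3\n")
print(export_report(sample, limit=2))  # [['a', '1'], ['b', '2']]
```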

If you create an artifact, record:
- its path
- what it does
@@ -0,0 +1,64 @@
---
name: evolve-lite:provenance
description: Analyze saved trajectories and recall audit events offline to record whether recalled guidelines influenced completed sessions.
---

# Provenance Analyzer

## Overview

This skill runs after one or more sessions have completed. It reads saved trajectories from `.evolve/trajectories/`, matches them to `recall` events in `.evolve/audit.log`, and records post-hoc `influence` events for recalled guidelines.

Use this skill when you want to compute usage provenance without coupling the work to the live learn step.

## Workflow

### Step 1: Load Recall Events

Read `.evolve/audit.log` as JSONL. Find entries where `event == "recall"` and `entities` is a non-empty list.

Skip any recall event that already has `influence` entries for the same `session_id` and entity ids. Do not write duplicate influence records.
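A tolerant JSONL pass along these lines would do (hypothetical helper, not shipped with the plugin; it mirrors the skip-malformed-lines reading used by `existing_influence_keys` in the helper script):

```python
import json

def load_recall_events(audit_text):
    """Parse audit.log text (JSONL) and keep non-empty recall events."""
    events = []
    for line in audit_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Malformed lines are skipped rather than aborting the pass.
            continue
        if event.get("event") == "recall" and event.get("entities"):
            events.append(event)
    return events

log_text = "\n".join([
    '{"event": "recall", "session_id": "s1", "entities": ["guideline/foo"]}',
    '{"event": "recall", "session_id": "s2", "entities": []}',
    'not json',
    '{"event": "influence", "session_id": "s1", "entity": "guideline/foo"}',
])
print([e["session_id"] for e in load_recall_events(log_text)])  # ['s1']
```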

### Step 2: Locate Saved Trajectories

List `.evolve/trajectories/` and match each recall event to a trajectory by `session_id`.

Matching strategy (in order):
1. `claude-transcript_<session-id>.jsonl` - the stop-hook transcript dump; the session id is in the filename.
2. `trajectory_<timestamp>_<session-id>.json` - written by the evolve-lite:save-trajectory skill when a session id is available. Match on the `<session-id>` slice of the filename.
3. `trajectory_<timestamp>.json` - open the file and match its top-level `session_id` field against the recall event. Only fall back to this step when the filename alone does not identify the session.

If none of the above yields a confident match for a recall event, skip it. Do not guess.
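The first two tiers reduce to filename parsing. A simplified, hypothetical sketch (it assumes the timestamp shape used by the save script and ignores the `_<n>` collision suffix that script can append):

```python
import re

_TRANSCRIPT = re.compile(r"^claude-transcript_(?P<sid>.+)\.jsonl$")
_TRAJECTORY = re.compile(
    r"^trajectory_\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}_(?P<sid>.+)\.json$"
)

def session_id_from_filename(name):
    # Tier 1: stop-hook transcript dumps carry the session id directly.
    # Tier 2: save-trajectory filenames embed it after the timestamp.
    # Returning None tells the caller to fall back to tier 3 (open the
    # file and read its top-level session_id field).
    for pattern in (_TRANSCRIPT, _TRAJECTORY):
        m = pattern.match(name)
        if m:
            return m.group("sid")
    return None

print(session_id_from_filename("claude-transcript_abc123.jsonl"))              # abc123
print(session_id_from_filename("trajectory_2025-01-15T10-30-00_abc123.json"))  # abc123
print(session_id_from_filename("trajectory_2025-01-15T10-30-00.json"))         # None
```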

### Step 3: Read Recalled Entities

For each recalled entity id, open `.evolve/entities/<id>.md`. The id is a path relative to `.evolve/entities/` without the `.md` suffix, such as `guideline/foo` or `subscribed/alice/guideline/foo`.

Read the entity content and trigger. Skip ids whose files are missing.
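The id-to-path mapping is mechanical, since the id is already a relative path (illustrative helper):

```python
from pathlib import Path

def entity_path(evolve_dir, entity_id):
    # The id is a path relative to entities/; only the .md suffix is added.
    return Path(evolve_dir) / "entities" / f"{entity_id}.md"

print(entity_path(".evolve", "guideline/foo").as_posix())
# .evolve/entities/guideline/foo.md
print(entity_path(".evolve", "subscribed/alice/guideline/foo").as_posix())
# .evolve/entities/subscribed/alice/guideline/foo.md
```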

### Step 4: Assess Influence

Compare each recalled entity with the matched trajectory. Pick exactly one verdict:

- `followed` - the agent's actual actions are consistent with the guideline.
- `contradicted` - the guideline applied, but the agent did the opposite or repeated the avoidable dead end.
- `not_applicable` - the guideline was recalled but did not apply to this session.

Keep `evidence` to one short sentence citing a concrete action, tool call, or absence in the trajectory.

### Step 5: Write Influence Events

Pipe one JSON payload per assessed session to the helper:

```bash
echo '{
  "session_id": "<session-id>",
  "assessments": [
    {"entity": "guideline/<slug>", "verdict": "followed", "evidence": "Agent used the saved parser before trying shell fallbacks."}
  ]
}' | python3 .bob/skills/evolve-lite-provenance/scripts/log_influence.py
```

The `entity` value must match exactly what appeared in the recall event, including any `subscribed/<source>/` prefix.

It is valid to emit an empty `assessments` list when recall events exist but no recalled guideline can be assessed.
@@ -0,0 +1,122 @@
#!/usr/bin/env python3
"""Append post-hoc influence assessments to .evolve/audit.log.

Reads JSON from stdin of the form:

    {
      "session_id": "<transcript stem>",
      "assessments": [
        {"entity": "<qualified id>", "verdict": "followed|contradicted|not_applicable",
         "evidence": "<short justification>"},
        ...
      ]
    }
"""

import json
import sys
from pathlib import Path

# Walk up from the script location to find the installed plugin lib directory.
# claude/claw-code/codex/bob all ship a sibling lib/ next to skills/; bob's
# installer copies it to .bob/evolve-lib/, hence both names are checked.
_script = Path(__file__).resolve()
_lib = None
for _ancestor in _script.parents:
    for _candidate in (_ancestor / "lib", _ancestor / "evolve-lib"):
        if (_candidate / "entity_io.py").is_file():
            _lib = _candidate
            break
    if _lib is not None:
        break
if _lib is None:
    raise ImportError(f"Cannot find plugin lib directory above {_script}")
sys.path.insert(0, str(_lib))
from entity_io import get_evolve_dir, log as _log  # noqa: E402
import audit  # noqa: E402


_ALLOWED_VERDICTS = {"followed", "contradicted", "not_applicable"}


def log(message):
    _log("influence", message)


def existing_influence_keys(evolve_dir):
    audit_log = Path(evolve_dir) / "audit.log"
    if not audit_log.is_file():
        return set()

    keys = set()
    for line in audit_log.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if event.get("event") == "influence" and event.get("session_id") and event.get("entity"):
            keys.add((event["session_id"], event["entity"]))
    return keys


def main():
    try:
        payload = json.load(sys.stdin)
    except json.JSONDecodeError as exc:
        log(f"Invalid JSON input: {exc}")
        print(f"Error: invalid JSON input - {exc}", file=sys.stderr)
        sys.exit(1)

    if not isinstance(payload, dict):
        log(f"Bad payload type: {type(payload).__name__}")
        print("Error: payload must be a JSON object.", file=sys.stderr)
        sys.exit(1)

    session_id = payload.get("session_id")
    assessments = payload.get("assessments", [])
    if not isinstance(session_id, str) or not session_id or not isinstance(assessments, list):
        log(f"Bad payload shape: session_id={session_id!r} assessments_type={type(assessments).__name__}")
        print("Error: payload must include a string `session_id` and a list `assessments`.", file=sys.stderr)
        sys.exit(1)

    evolve_dir = get_evolve_dir().resolve()
    existing_keys = existing_influence_keys(evolve_dir)

    written = 0
    for assessment in assessments:
        if not isinstance(assessment, dict):
            log(f"Skipping non-dict assessment item: {assessment!r}")
            continue
        entity = assessment.get("entity")
        verdict = assessment.get("verdict")
        evidence = assessment.get("evidence", "")
        if not isinstance(entity, str) or not entity:
            log(f"Skipping assessment with non-string entity: {assessment!r}")
            continue
        if verdict not in _ALLOWED_VERDICTS:
            log(f"Skipping invalid assessment verdict: {assessment}")
            continue
        if not isinstance(evidence, str):
            evidence = str(evidence)
        key = (session_id, entity)
        if key in existing_keys:
            log(f"Skipping duplicate influence assessment: session_id={session_id} entity={entity}")
            continue
        audit.append(
            evolve_dir=str(evolve_dir),
            event="influence",
            session_id=session_id,
            entity=entity,
            verdict=verdict,
            evidence=evidence,
        )
        existing_keys.add(key)
        written += 1

    log(f"Wrote {written} influence record(s) for session {session_id}")
    print(f"Recorded {written} influence assessment(s).")


if __name__ == "__main__":
    main()
@@ -21,7 +21,8 @@
if _lib is None:
    raise ImportError(f"Cannot find plugin lib directory above {_script}")
sys.path.insert(0, str(_lib))
from entity_io import find_entities_dir, markdown_to_entity, log as _log  # noqa: E402
from entity_io import find_entities_dir, get_evolve_dir, markdown_to_entity, log as _log  # noqa: E402
import audit  # noqa: E402


def log(message):
@@ -81,6 +82,7 @@ def load_entities_with_source(entities_dir):
            continue

        entity.pop("_source", None)
        entity["_id"] = str(md.relative_to(entities_dir).with_suffix(""))
        parts = md.relative_to(entities_dir).parts
        if parts and parts[0] == "subscribed" and len(parts) > 1:
            entity["_source"] = parts[1]
@@ -139,6 +141,33 @@ def main():
    print(output)
    log(f"Output {len(output)} chars to stdout")

    # Audit which entity ids were served to this session. Logging is
    # intentionally best-effort so recall never fails because provenance
    # recording could not append to audit.log.
    try:
        if isinstance(input_data, dict):
            transcript_path = input_data.get("transcript_path", "")
        else:
            transcript_path = ""
        session_id = None
        if transcript_path:
            stem = Path(transcript_path).stem
            if stem.startswith("claude-transcript_"):
                session_id = stem.removeprefix("claude-transcript_")
        if not session_id and isinstance(input_data, dict) and isinstance(input_data.get("session_id"), str):
            session_id = input_data["session_id"]
        entity_ids = sorted({entity["_id"] for entity in entities if entity.get("_id")})
        if session_id and entity_ids:
            audit.append(
                evolve_dir=str(get_evolve_dir().resolve()),
                event="recall",
                session_id=session_id,
                entities=entity_ids,
            )
            log(f"Audit: recall session_id={session_id} entities={len(entity_ids)}")
    except Exception as exc:
        log(f"Audit append failed (non-fatal): {exc}")


if __name__ == "__main__":
    main()
@@ -99,12 +99,14 @@ Wrap the messages array in a trajectory envelope:
{
  "model": "<model-id-from-session>",
  "timestamp": "2025-01-15T10:30:00Z",
  "session_id": "<session-id-from-session>",
  "messages": [...]
}
```

- **model**: Use the exact model ID from the current session's environment context (e.g., the value after "You are powered by the model named …"). Do not hardcode a default — always read it from the session.
- **timestamp**: Current ISO 8601 timestamp
- **session_id**: The current session identifier. Read it from whatever the harness exposes — the `session_id` passed into the skill, the session id surfaced in the session context, or a runtime-provided environment variable. Include it verbatim so offline provenance can match this trajectory to `recall` audit events for the same session. Omit the field only if no session id is truly available in this environment.

### Step 5: Save via Helper Script

@@ -9,6 +9,7 @@
import getpass
import json
import os
import re
import sys
import tempfile
from pathlib import Path
@@ -65,15 +66,29 @@ def get_trajectories_dir():
    return base.resolve()


def open_trajectory_file(trajectories_dir):
_SAFE_SESSION_ID = re.compile(r"[^A-Za-z0-9._-]")


def _sanitize_session_id(session_id):
    """Return a filesystem-safe slice of ``session_id`` (empty if unusable)."""
    if not isinstance(session_id, str):
        return ""
    cleaned = _SAFE_SESSION_ID.sub("-", session_id.strip())
    return cleaned[:64]


def open_trajectory_file(trajectories_dir, session_id=None):
    """Atomically claim a timestamped trajectory file.

    Returns a ``(Path, fd)`` tuple. Uses ``O_CREAT | O_EXCL`` so two saves
    racing within the same second pick distinct filenames instead of one
    overwriting the other. When ``session_id`` is provided, it is embedded
    in the filename so offline provenance can match this trajectory to
    ``recall`` audit events for the same session without content inspection.
    """
    now = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    base_name = f"trajectory_{now}"
    sid = _sanitize_session_id(session_id)
    base_name = f"trajectory_{now}_{sid}" if sid else f"trajectory_{now}"

    for suffix in range(0, 1000):
        name = f"{base_name}.json" if suffix == 0 else f"{base_name}_{suffix}.json"
@@ -121,9 +136,11 @@ def main():

    log(f"Trajectory has {len(messages)} messages")

    # Atomically claim a unique output path (handles same-second races)
    # Atomically claim a unique output path (handles same-second races).
    # Embed session_id in the filename when present so offline provenance
    # can match recall events to trajectories deterministically.
    trajectories_dir = get_trajectories_dir()
    output_path, fd = open_trajectory_file(trajectories_dir)
    output_path, fd = open_trajectory_file(trajectories_dir, trajectory.get("session_id"))

    # Write formatted JSON via the already-opened owner-only fd
    try:
@@ -132,12 +132,21 @@ def main():
            remote=args.remote,
        )
    except Exception as exc:
        # Audit logging is best-effort: a failed append shouldn't roll back
        # an otherwise successful subscribe (the repo is cloned, the config
        # has the entry). Warn loudly so the user can fix the audit log
        # path without losing the subscription. Originally rolled back on
        # main's PR #245 (#244 e2e fix).
        print(f"Warning: failed to append audit entry for subscribe: {exc}", file=sys.stderr)
        repos.pop()
        set_repos(cfg, repos)
        try:
            save_config(cfg, project_root)
        except Exception as save_exc:
            print(
                f"Warning: rollback save_config failed under {project_root!r}: {save_exc}. "
                f"The clone was removed but evolve.config.yaml may still list '{args.name}' - "
                f"please inspect the file and remove the entry manually if present.",
                file=sys.stderr,
            )
        if dest.exists():
            shutil.rmtree(dest, ignore_errors=True)
        print(f"Error: failed to record subscription in audit log: {exc}", file=sys.stderr)
        sys.exit(1)

    print(f"Subscribed to '{args.name}' (scope={args.scope}) from {args.remote}")
