valuecodes · Jan 27, 2026 · Jan 26, 2026
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,5 @@
 node_modules
 .env
-tmp
+tmp
+.venv
+__pycache__
diff --git a/AGENTS.md b/AGENTS.md
@@ -9,9 +9,10 @@
 1. Start at `src/cli/<cli>/main.ts` and the matching `src/cli/<cli>/README.md`.
 2. Follow the pipeline classes under `src/cli/<cli>/clients/*` and schemas under `src/cli/<cli>/types/*`.
 3. Reuse shared helpers: `src/utils/parse-args.ts`, `src/utils/question-handler.ts`, `src/clients/logger.ts`.
-4. Keep changes minimal; add/update **Vitest** tests (`*.test.ts`) when behavior changes.
-5. Run: `pnpm typecheck`, `pnpm lint`, `pnpm test` (and `pnpm format:check` if formatting changed).
-6. All runtime artifacts go under `tmp/` (never commit them).
+4. Keep `main.ts` focused on the basic agent flow; move non-trivial logic into `clients/` or `utils/`.
+5. Keep changes minimal; add/update **Vitest** tests (`*.test.ts`) when behavior changes.
+6. Run: `pnpm typecheck`, `pnpm lint`, `pnpm test` (and `pnpm format:check` if formatting changed).
+7. All runtime artifacts go under `tmp/` (never commit them).
 
 **Scratch space:** Use `tmp/` for generated HTML/markdown/JSON/reports.
 
@@ -31,6 +32,13 @@
 - Install deps: `pnpm install`
 - Set `OPENAI_API_KEY` via env or `.env` (humans do this; agents must not read secrets)
 - If a task requires Playwright, follow the repo README for system deps
+- If a task requires Python (e.g., `etf-backtest`), set up the venv:
+  ```bash
+  # On Debian/Ubuntu, install venv support first: sudo apt install python3-venv
+  python3 -m venv .venv
+  source .venv/bin/activate
+  pip install numpy pandas torch
+  ```
 
 **Common scripts (see `package.json` for all):**
 
@@ -86,6 +94,9 @@ All file tools are sandboxed to `tmp/` using path validation (`src/tools/utils/f
 - **`listFiles`** (`src/tools/list-files/list-files-tool.ts`)
   - Lists files/dirs under `tmp/`.
   - Params: `{ path?: string }` (defaults to `tmp/` root)
+- **`runPython`** (`src/tools/run-python/run-python-tool.ts`)
+  - Runs a Python script from a configured scripts directory.
+  - Params: `{ scriptName: string, input: string }` (`input` is a JSON string; pass `""` for no input)
 
 ### Safe web fetch tool
 
@@ -99,9 +110,16 @@ All file tools are sandboxed to `tmp/` using path validation (`src/tools/utils/f
 ## 5) Coding conventions (how changes should look)
 
 - Initialize `Logger` in CLI entry points and pass it into clients/pipelines via constructor options.
+- Use `Logger` instead of `console.log`/`console.error` for output.
+- Use `AgentRunner` (`src/clients/agent-runner.ts`) as the default wrapper when running agents.
 - Prefer shared helpers in `src/utils` (`parse-args`, `question-handler`) over custom logic.
+- `main.ts` should stay focused on the **basic agent flow**: argument parsing → agent setup → run loop → final output. Move helper logic into `clients/` or `utils/`.
 - Prefer TypeScript path aliases over deep relative imports: `~tools/*`, `~clients/*`, `~utils/*`.
 - Use Zod schemas for CLI args and tool IO.
+- Keep object field names in `camelCase` (e.g., `trainSamples`), not `snake_case`.
+- Keep Zod schemas in a dedicated `schemas.ts` file for each CLI (avoid inline schemas in `main.ts`).
+- Keep constants in a dedicated `constants.ts` file for each CLI.
+- Move hardcoded numeric values into `constants.ts` (treat numbers as configuration).
 - For HTTP fetching in code, prefer `Fetch` (sanitized) or `PlaywrightScraper` for JS-heavy pages.
 - When adding tools that touch files, use `src/tools/utils/fs.ts` for path validation.
 - Comments should capture invariants or subtle behavior, not restate code.
@@ -127,3 +145,7 @@ All file tools are sandboxed to `tmp/` using path validation (`src/tools/utils/f
 - [ ] Any generated artifacts are in `tmp/` only
 
 ---
+
+# ExecPlans
+
+When writing complex features or significant refactors, use an ExecPlan (as described in `agent/PLANS.md`) from design to implementation.
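
Note on the `camelCase` convention added to AGENTS.md above: it matters most at the boundary with Python, where `snake_case` keys are conventional. The following is a minimal sketch of one way to normalize keys on the TypeScript side. `snakeToCamel` and `camelizeKeys` are hypothetical helpers, not functions from this repo, and every sample field name except `trainSamples` is made up for illustration.

```typescript
// Hypothetical helpers (not part of this repo): rename snake_case keys,
// e.g. from a Python script's JSON output, to the camelCase the repo prefers.

/** Converts one snake_case identifier to camelCase. */
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) =>
    ch.toUpperCase(),
  );
}

/** Recursively renames object keys; arrays are mapped, primitives pass through. */
function camelizeKeys(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => camelizeKeys(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[snakeToCamel(key)] = camelizeKeys(source[key]);
    }
    return out;
  }
  return value;
}

// Sample payload with made-up field names.
const rawPayload: unknown = JSON.parse(
  '{"train_samples": 120, "fold_stats": [{"test_samples": 30}]}',
);
console.log(JSON.stringify(camelizeKeys(rawPayload)));
// prints {"trainSamples":120,"foldStats":[{"testSamples":30}]}
```

Converting once at the boundary would keep Zod schemas and downstream code in a single naming style. Whether the repo converts keys this way or has the Python scripts emit `camelCase` directly is not stated in the diff.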
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # cli-agent-sandbox
 
-A minimal TypeScript CLI sandbox for testing agent workflows and safe web scraping. This is a single-package repo built with [`@openai/agents`](https://github.com/openai/openai-agents-js), and it includes a guestbook demo, a Finnish name explorer CLI, a publication scraping pipeline with a Playwright-based scraper for JS-rendered pages, and agent tools scoped to `tmp` with strong safety checks.
+A minimal TypeScript CLI sandbox for testing agent workflows and safe web scraping. This is a single-package repo built with [`@openai/agents`](https://github.com/openai/openai-agents-js), and it includes a guestbook demo, a Finnish name explorer CLI, a publication scraping pipeline with a Playwright-based scraper for JS-rendered pages, an ETF backtest CLI, and agent tools scoped to `tmp` with strong safety checks.
 
 ## Quick Start
 
@@ -11,20 +11,33 @@ A minimal TypeScript CLI sandbox for testing agent workflows and safe web scrapi
 5. Run the demo: `pnpm run:guestbook`
 6. (Optional) Explore Finnish name stats: `pnpm run:name-explorer -- --mode ai|stats`
 7. (Optional) Run publication scraping: `pnpm run:scrape-publications -- --url="https://example.com"`
+8. (Optional) Run ETF backtest: `pnpm run:etf-backtest -- --isin=IE00B5BMR087` (requires Python setup below)
+
+### Python Setup (for ETF backtest)
+
+```bash
+# On Debian/Ubuntu, install venv support first:
+sudo apt install python3-venv
+
+python3 -m venv .venv
+source .venv/bin/activate
+pip install numpy pandas torch
+```
 
 ## Commands
 
-| Command                        | Description                                       |
-| ------------------------------ | ------------------------------------------------- |
-| `pnpm run:guestbook`           | Run the interactive guestbook CLI demo            |
-| `pnpm run:name-explorer`       | Explore Finnish name statistics (AI Q&A or stats) |
-| `pnpm run:scrape-publications` | Scrape publication links and build a review page  |
-| `pnpm typecheck`               | Run TypeScript type checking                      |
-| `pnpm lint`                    | Run ESLint for code quality                       |
-| `pnpm lint:fix`                | Run ESLint and auto-fix issues                    |
-| `pnpm format`                  | Format code with Prettier                         |
-| `pnpm format:check`            | Check code formatting                             |
-| `pnpm test`                    | Run Vitest test suite                             |
+| Command                        | Description                                            |
+| ------------------------------ | ------------------------------------------------------ |
+| `pnpm run:guestbook`           | Run the interactive guestbook CLI demo                 |
+| `pnpm run:name-explorer`       | Explore Finnish name statistics (AI Q&A or stats)      |
+| `pnpm run:scrape-publications` | Scrape publication links and build a review page       |
+| `pnpm run:etf-backtest`        | Run ETF backtest + feature optimizer (requires Python) |
+| `pnpm typecheck`               | Run TypeScript type checking                           |
+| `pnpm lint`                    | Run ESLint for code quality                            |
+| `pnpm lint:fix`                | Run ESLint and auto-fix issues                         |
+| `pnpm format`                  | Format code with Prettier                              |
+| `pnpm format:check`            | Check code formatting                                  |
+| `pnpm test`                    | Run Vitest test suite                                  |
 
 ## Publication scraping
 
@@ -46,7 +59,7 @@ The publication pipeline uses `PlaywrightScraper` to render JavaScript-heavy pag
 
 The `run:name-explorer` script explores Finnish name statistics. It supports an AI Q&A mode (default) backed by SQL tools, plus a `stats` mode that generates an HTML report.
 
-![Name Explorer demo](src/cli/name-explorer/demo-1.png)
+<img src="src/cli/name-explorer/demo-1.png" alt="Name Explorer demo" width="820" />
 
 Usage:
 
@@ -56,22 +69,55 @@ pnpm run:name-explorer -- [--mode ai|stats] [--refetch]
 
 Outputs are written under `tmp/name-explorer/`, including `statistics.html` in stats mode.
 
+## ETF backtest
+
+The `run:etf-backtest` CLI fetches ETF history from justetf.com (via Playwright), caches it under
+`tmp/etf-backtest/<ISIN>/data.json`, and runs the Python experiment loop via the `runPython` tool.
+
+<img src="src/cli/etf-backtest/demo-1.png" alt="ETF Backtest demo" width="820" />
+
+Usage:
+
+```
+pnpm run:etf-backtest -- --isin=IE00B5BMR087 [--maxIterations=5] [--seed=42] [--refresh] [--verbose]
+```
+
+Notes:
+
+- `--refresh` forces a refetch; otherwise cached data is reused.
+- Python scripts live in `src/cli/etf-backtest/scripts/`.
+
 ## Tools
 
-File tools are sandboxed to the `tmp/` directory with path validation to prevent traversal and symlink attacks. The `fetchUrl` tool adds SSRF protections and HTML sanitization.
+File tools are sandboxed to the `tmp/` directory with path validation to prevent traversal and symlink attacks. The `fetchUrl` tool adds SSRF protections and HTML sanitization, and `runPython` executes whitelisted Python scripts from a configured directory.
+
+| Tool        | Location                                  | Description                                                                    |
+| ----------- | ----------------------------------------- | ------------------------------------------------------------------------------ |
+| `fetchUrl`  | `src/tools/fetch-url/fetch-url-tool.ts`   | Fetches URLs safely and returns sanitized Markdown/text                        |
+| `readFile`  | `src/tools/read-file/read-file-tool.ts`   | Reads file content from `tmp` directory                                        |
+| `writeFile` | `src/tools/write-file/write-file-tool.ts` | Writes content to files in `tmp` directory                                     |
+| `listFiles` | `src/tools/list-files/list-files-tool.ts` | Lists files and directories under `tmp`                                        |
+| `runPython` | `src/tools/run-python/run-python-tool.ts` | Runs Python scripts from a configured scripts directory (JSON stdin supported) |
+
+`runPython` details:
 
-| Tool        | Location                                  | Description                                             |
-| ----------- | ----------------------------------------- | ------------------------------------------------------- |
-| `fetchUrl`  | `src/tools/fetch-url/fetch-url-tool.ts`   | Fetches URLs safely and returns sanitized Markdown/text |
-| `readFile`  | `src/tools/read-file/read-file-tool.ts`   | Reads file content from `tmp` directory                 |
-| `writeFile` | `src/tools/write-file/write-file-tool.ts` | Writes content to files in `tmp` directory              |
-| `listFiles` | `src/tools/list-files/list-files-tool.ts` | Lists files and directories under `tmp`                 |
+- `scriptName` must be a `.py` file name in the configured scripts directory (no subpaths).
+- `input` is a JSON string passed to stdin (use `""` for no input).
 
 ## Project Structure
 
 ```
 src/
 ├── cli/
+│   ├── etf-backtest/
+│   │   ├── main.ts            # ETF backtest CLI entry point
+│   │   ├── README.md          # ETF backtest docs
+│   │   ├── constants.ts       # CLI constants
+│   │   ├── schemas.ts         # CLI args + agent output schemas
+│   │   ├── clients/           # Data fetcher + Playwright capture
+│   │   ├── utils/             # Scoring + formatting helpers
+│   │   ├── types/             # ETF data types
+│   │   └── scripts/           # Python backtest + prediction scripts
 │   ├── guestbook/
 │   │   ├── main.ts            # Guestbook CLI entry point
 │   │   └── README.md          # Guestbook CLI docs
@@ -90,15 +136,16 @@ src/
 ├── clients/
 │   ├── fetch.ts               # Shared HTTP fetch + sanitization
 │   ├── logger.ts              # Shared console logger
+│   ├── agent-runner.ts        # Default agent runner wrapper
 │   └── playwright-scraper.ts  # Playwright-based web scraper
 ├── utils/
 │   ├── parse-args.ts          # Shared CLI arg parsing helper
 │   └── question-handler.ts    # Shared CLI prompt + validation helper
 ├── tools/
-│   ├── index.ts               # Tool exports
 │   ├── fetch-url/             # Safe fetch tool
 │   ├── list-files/            # List files tool
 │   ├── read-file/             # Read file tool
+│   ├── run-python/            # Run Python scripts tool
 │   ├── write-file/            # Write file tool
 │   └── utils/
 │       ├── fs.ts              # Path safety utilities
@@ -111,6 +158,7 @@ tmp/                           # Runtime scratch space (tool I/O)
 ## CLI conventions
 
 - When using `Logger`, initialize it in the CLI entry point and pass it into clients/pipelines via constructor options.
+- Use `AgentRunner` (`src/clients/agent-runner.ts`) as the default wrapper when running agents.
 - Prefer shared helpers in `src/utils` (`parse-args`, `question-handler`) over custom argument parsing or prompt logic.
 - Use the TypeScript path aliases for shared modules: `~tools/*`, `~clients/*`, `~utils/*`.
   Example: `import { readFileTool } from "~tools/read-file/read-file-tool";`
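
Note on the `runPython` details in the README changes above: the two constraints (a bare `.py` file name with no subpaths, and `input` as a JSON string or `""`) could be checked along the following lines. This is a minimal sketch, not the actual implementation in `src/tools/run-python/run-python-tool.ts`. `isValidScriptName` and `parseToolInput` are hypothetical names, the allowed character set is an assumption, and the script names in the usage lines are made up.

```typescript
// Hypothetical validation sketch; the real tool may differ.

/**
 * Accepts only a bare `.py` file name: no directory separators, no `..`,
 * and (by assumption) only letters, digits, `_`, `-`, and `.` before the
 * extension.
 */
function isValidScriptName(scriptName: string): boolean {
  if (scriptName.includes("/") || scriptName.includes("\\")) return false;
  if (scriptName.includes("..")) return false;
  return /^[A-Za-z0-9_-][A-Za-z0-9_.-]*\.py$/.test(scriptName);
}

type ParsedInput =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

/** Treats "" as "no input"; any other string must parse as JSON. */
function parseToolInput(input: string): ParsedInput {
  if (input === "") return { ok: true, value: undefined };
  try {
    return { ok: true, value: JSON.parse(input) };
  } catch {
    return { ok: false, error: "input must be valid JSON or an empty string" };
  }
}

console.log(isValidScriptName("run_backtest.py")); // true
console.log(isValidScriptName("../secrets.py")); // false
console.log(parseToolInput('{"seed": 42}').ok); // true
```

Rejecting separators before any path resolution keeps the check independent of where the configured scripts directory lives. A real tool would presumably also confirm that the file exists in that directory.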

diff --git a/agent/PLANS.md b/agent/PLANS.md
@@ -0,0 +1,136 @@
+# ExecPlans for cli-agent-sandbox
+
+This repo is a minimal TypeScript CLI sandbox. ExecPlans exist to make larger changes safe, reproducible, and testable by a novice who only has the repo and the plan. Keep plans tailored to this repository, not a generic template.
+
+Use an ExecPlan only for complex features or significant refactors. For small, localized changes, skip the plan and just implement.
+
+## Non-negotiables
+
+- Self-contained: the plan must include all context needed to execute it without external docs or prior plans.
+- Observable outcomes: describe what a human can run and see to prove the change works.
+- Living document: update the plan as work proceeds; never let it drift from reality.
+- Repo-safe: never read `.env`, never write outside the repo or `tmp/`, never commit or push.
+- Minimal, test-covered changes: update or add Vitest tests when behavior changes.
+
+## Repository context to embed in every plan
+
+Include a short orientation paragraph naming the key paths and how they relate:
+
+- Entry points live in `src/cli/<cli>/main.ts` with a matching `src/cli/<cli>/README.md`.
+- Pipelines and clients live in `src/cli/<cli>/clients/*`; schemas in `src/cli/<cli>/types/*`.
+- Shared helpers: `src/utils/parse-args.ts`, `src/utils/question-handler.ts`, `src/clients/logger.ts`.
+- Tool sandboxing is under `src/tools/*` and path validation in `src/tools/utils/fs.ts`.
+- Runtime artifacts belong under `tmp/` only.
+
+If the plan adds a new CLI, state that it must be scaffolded via:
+
+    pnpm scaffold:cli -- --name=my-cli --description="What it does"
+
+Then add `"run:my-cli": "tsx src/cli/my-cli/main.ts"` to `package.json`.
+
+## Repo conventions to capture in plans (when relevant)
+
+- Initialize `Logger` in CLI entry points and pass it into clients/pipelines via constructor options.
+- Use Zod schemas for CLI args and tool IO; name the schema files in the plan.
+- Prefer TypeScript path aliases like `~tools/*`, `~clients/*`, `~utils/*` over deep relative imports.
+- Avoid `index.ts` barrel exports; use explicit module paths.
+- For HTTP fetching, prefer sanitized `Fetch` or `PlaywrightScraper` as appropriate.
+- Any file-touching tool must use path validation from `src/tools/utils/fs.ts`.
+
+## Required sections in every ExecPlan
+
+Use these headings, in this order, and keep them up to date:
+
+1. **Purpose / Big Picture** — what the user gains and how they can see it working.
+2. **Progress** — checklist with timestamps (UTC), split partial work into “done” vs “remaining”.
+3. **Surprises & Discoveries** — unexpected behaviors or constraints with short evidence.
+4. **Decision Log** — decision, rationale, date/author.
+5. **Outcomes & Retrospective** — what was achieved, gaps, lessons learned.
+6. **Context and Orientation** — repo-specific orientation and key files.
+7. **Conventions and Contracts** — logging, schemas, imports, and tool safety expectations.
+8. **Plan of Work** — prose describing edits, with precise file paths and locations.
+9. **Concrete Steps** — exact commands to run (cwd included) and expected short outputs.
+10. **Validation and Acceptance** — behavioral acceptance and tests; name new tests.
+11. **Idempotence and Recovery** — how to rerun safely; rollback guidance if needed.
+12. **Artifacts and Notes** — concise transcripts, diffs, or snippets as indented blocks.
+13. **Interfaces and Dependencies** — required modules, types, function signatures, and why.
+
+## Formatting rules
+
+- The ExecPlan is a normal Markdown document (no outer code fence).
+- Prefer prose over lists; the only mandatory checklist is in **Progress**.
+- Define any non-obvious term the first time you use it.
+- Use repo-relative paths and exact function/module names.
+- Do not point to external docs; embed the needed context in the plan itself.
+
+## Validation defaults for this repo
+
+State which of these apply, and include expected outcomes:
+
+- `pnpm typecheck`
+- `pnpm lint` (or `pnpm lint:fix` if auto-fixing is intended)
+- `pnpm test`
+- `pnpm format:check` (if formatting changes)
+
+If the change affects a CLI, include a concrete CLI invocation and expected output.
+
+## ExecPlan skeleton (copy and fill)
+
+    # <Short, action-oriented title>
+
+    This ExecPlan is a living document. Update **Progress**, **Surprises & Discoveries**, **Decision Log**, and **Outcomes & Retrospective** as work proceeds.
+
+    ## Purpose / Big Picture
+
+    Describe the user-visible behavior and how to observe it.
+
+    ## Progress
+
+    - [ ] (2026-01-25 00:00Z) Example incomplete step.
+
+    ## Surprises & Discoveries
+
+    - Observation: …
+      Evidence: …
+
+    ## Decision Log
+
+    - Decision: …
+      Rationale: …
+      Date/Author: …
+
+    ## Outcomes & Retrospective
+
+    Summarize results, gaps, and lessons learned.
+
+    ## Context and Orientation
+
+    Explain the relevant parts of `src/cli/...`, shared helpers, and tools.
+
+    ## Conventions and Contracts
+
+    Call out logging, Zod schemas, imports, and any tool safety expectations.
+
+    ## Plan of Work
+
+    Prose description of edits with precise file paths and locations.
+
+    ## Concrete Steps
+
+    State commands with cwd and short expected outputs.
+
+    ## Validation and Acceptance
+
+    Behavioral acceptance plus test commands and expectations.
+
+    ## Idempotence and Recovery
+
+    How to rerun safely and roll back if needed.
+
+    ## Artifacts and Notes
+
+    Short transcripts, diffs, or snippets as indented blocks.
+
+    ## Interfaces and Dependencies
+
+    Required types/modules/functions and why they exist.
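
Note on the "Required sections in every ExecPlan" list in `agent/PLANS.md` above: the fixed heading order lends itself to a mechanical check. The sketch below is illustrative only. `REQUIRED_HEADINGS` mirrors the thirteen headings from the file, `findHeadingProblems` is a hypothetical helper, and the assumption that each section is a level-2 (`## `) heading is taken from the skeleton.

```typescript
// Hypothetical checker (not part of this repo) for ExecPlan heading order.
const REQUIRED_HEADINGS: readonly string[] = [
  "Purpose / Big Picture",
  "Progress",
  "Surprises & Discoveries",
  "Decision Log",
  "Outcomes & Retrospective",
  "Context and Orientation",
  "Conventions and Contracts",
  "Plan of Work",
  "Concrete Steps",
  "Validation and Acceptance",
  "Idempotence and Recovery",
  "Artifacts and Notes",
  "Interfaces and Dependencies",
];

/** Returns one message per missing or out-of-order required heading. */
function findHeadingProblems(markdown: string): string[] {
  // Collect the text of every level-2 heading, in document order.
  const headings = markdown
    .split(/\r?\n/)
    .filter((line) => line.startsWith("## "))
    .map((line) => line.slice(3).trim());

  const problems: string[] = [];
  let lastIndex = -1;
  for (const required of REQUIRED_HEADINGS) {
    const index = headings.indexOf(required);
    if (index === -1) {
      problems.push("missing: " + required);
    } else if (index < lastIndex) {
      problems.push("out of order: " + required);
    } else {
      lastIndex = index;
    }
  }
  return problems;
}

// A plan that contains every heading in the required order has no problems.
const samplePlan = REQUIRED_HEADINGS.map((h) => "## " + h + "\n\nTODO").join("\n\n");
console.log(findHeadingProblems(samplePlan).length); // 0
```

A check like this could run in a Vitest test over plan files, though the diff does not say where ExecPlans are stored or whether they are validated automatically.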
diff --git a/eslint.config.ts b/eslint.config.ts
@@ -89,6 +89,22 @@ export default defineConfig(
           ],
         },
       ],
+      // Avoid template literals in logger calls for better structured logging
+      "no-restricted-syntax": [
+        "error",
+        {
+          selector:
+            "CallExpression[callee.type='MemberExpression'][callee.object.name='logger'][callee.property.name=/^(debug|info|warn|error|tool|question|answer)$/] > TemplateLiteral",
+          message:
+            "Avoid template literals in logger calls. Use a plain string and pass data as extra args (e.g. logger.info('Saved file', { path })).",
+        },
+        {
+          selector:
+            "CallExpression[callee.type='MemberExpression'][callee.object.type='MemberExpression'][callee.object.property.name='logger'][callee.property.name=/^(debug|info|warn|error|tool|question|answer)$/] > TemplateLiteral",
+          message:
+            "Avoid template literals in logger calls. Use a plain string and pass data as extra args (e.g. logger.info('Saved file', { path })).",
+        },
+      ],
     },
   },
   {
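
Note on the `no-restricted-syntax` rule above: both selectors match a template literal passed as a direct argument to a logger method, either on a bare `logger` identifier or on a member such as `this.logger`. The sketch below shows the call shape the rule steers toward. `SketchLogger` is a hypothetical stand-in that only mimics the `info(message, ...extra)` shape implied by the rule's message; the repo's real `Logger` in `src/clients/logger.ts` may differ, and the file path is made up.

```typescript
// Stand-in logger for illustration only; not the repo's Logger.
class SketchLogger {
  readonly entries: { level: string; message: string; extra: unknown[] }[] = [];

  info(message: string, ...extra: unknown[]): void {
    // Keep the message constant and the data separate, so entries stay
    // searchable by message and the data remains structured.
    this.entries.push({ level: "info", message, extra });
    console.log(message, ...extra);
  }
}

const logger = new SketchLogger();
const savedPath = "tmp/etf-backtest/report.json"; // hypothetical path

// Flagged by the rule: a template literal as a direct argument.
// logger.info(`Saved file ${savedPath}`);

// Preferred: a plain string plus structured data as an extra argument.
logger.info("Saved file", { path: savedPath });
```

Because the selectors use the child combinator (`>`), only template literals that are themselves call arguments are matched; one nested inside an object argument would not be flagged by this rule.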