Merged
19 commits
- `b2dc6d8` feat: add etf backtest cli (valuecodes, Jan 26, 2026)
- `6b57331` feat: implement agent runner with logging and event handling (valuecodes, Jan 26, 2026)
- `18d66db` refactor: simplify agent runner configuration and update model name (valuecodes, Jan 26, 2026)
- `04886e3` refactor: remove unused constants and clean up code (valuecodes, Jan 26, 2026)
- `a9b6ee7` refactor: update agent runner to accept prompt in options object (valuecodes, Jan 26, 2026)
- `6c459a6` feat: enhance agent runner with stateless execution and logging (valuecodes, Jan 26, 2026)
- `51485a9` refactor: streamline tool creation with logger integration (valuecodes, Jan 26, 2026)
- `40bcff9` refactor: enhance logging structure and clarity across multiple files (valuecodes, Jan 26, 2026)
- `d307a4e` feat: implement ETF data fetching with caching and logging (valuecodes, Jan 26, 2026)
- `cbcf4e2` chore: remove legacy backtest and prediction scripts (valuecodes, Jan 26, 2026)
- `6d48ede` refactor: remove ticker references from CLI and related scripts (valuecodes, Jan 26, 2026)
- `d19ff50` docs: update README and agent documentation for ETF backtest usage (valuecodes, Jan 26, 2026)
- `9eb364e` feat: implement learnings manager for ETF backtest optimization (valuecodes, Jan 27, 2026)
- `e3ae01b` refactor: remove REASONING_PREVIEW_LIMIT constant and update usage (valuecodes, Jan 27, 2026)
- `1d9adec` docs: update AGENTS and ETF backtest README for clarity and new features (valuecodes, Jan 27, 2026)
- `a7362b9` refactor: move ETF data schemas to schemas.ts and remove old types (valuecodes, Jan 27, 2026)
- `d5b92a4` test: update tool logging in AgentRunner tests for clarity (valuecodes, Jan 27, 2026)
- `36165ee` feat: add prompt builders for runPython usage and recovery messages (valuecodes, Jan 27, 2026)
- `3656fd2` refactor: remove redundant Python script tests from run-python-tool (valuecodes, Jan 27, 2026)

4 changes: 3 additions & 1 deletion .gitignore
@@ -1,3 +1,5 @@
node_modules
.env
-tmp
+tmp
+.venv
+__pycache__
28 changes: 25 additions & 3 deletions AGENTS.md
@@ -9,9 +9,10 @@
1. Start at `src/cli/<cli>/main.ts` and the matching `src/cli/<cli>/README.md`.
2. Follow the pipeline classes under `src/cli/<cli>/clients/*` and schemas under `src/cli/<cli>/types/*`.
3. Reuse shared helpers: `src/utils/parse-args.ts`, `src/utils/question-handler.ts`, `src/clients/logger.ts`.
-4. Keep changes minimal; add/update **Vitest** tests (`*.test.ts`) when behavior changes.
-5. Run: `pnpm typecheck`, `pnpm lint`, `pnpm test` (and `pnpm format:check` if formatting changed).
-6. All runtime artifacts go under `tmp/` (never commit them).
+4. Keep `main.ts` focused on the basic agent flow; move non-trivial logic into `clients/` or `utils/`.
+5. Keep changes minimal; add/update **Vitest** tests (`*.test.ts`) when behavior changes.
+6. Run: `pnpm typecheck`, `pnpm lint`, `pnpm test` (and `pnpm format:check` if formatting changed).
+7. All runtime artifacts go under `tmp/` (never commit them).

**Scratch space:** Use `tmp/` for generated HTML/markdown/JSON/reports.

@@ -31,6 +32,13 @@
- Install deps: `pnpm install`
- Set `OPENAI_API_KEY` via env or `.env` (humans do this; agents must not read secrets)
- If a task requires Playwright, follow the repo README for system deps
- If a task requires Python (e.g., `etf-backtest`), set up the venv:
```bash
# On Debian/Ubuntu, install venv support first: sudo apt install python3-venv
python3 -m venv .venv
source .venv/bin/activate
pip install numpy pandas torch
```

**Common scripts (see `package.json` for all):**

@@ -86,6 +94,9 @@ All file tools are sandboxed to `tmp/` using path validation (`src/tools/utils/f
- **`listFiles`** (`src/tools/list-files/list-files-tool.ts`)
- Lists files/dirs under `tmp/`.
- Params: `{ path?: string }` (defaults to `tmp/` root)
- **`runPython`** (`src/tools/run-python/run-python-tool.ts`)
- Runs a Python script from a configured scripts directory.
- Params: `{ scriptName: string, input: string }` (`input` is a JSON string; pass `""` for no input)

### Safe web fetch tool

@@ -99,9 +110,16 @@ All file tools are sandboxed to `tmp/` using path validation (`src/tools/utils/f
## 5) Coding conventions (how changes should look)

- Initialize `Logger` in CLI entry points and pass it into clients/pipelines via constructor options.
- Use `Logger` instead of `console.log`/`console.error` for output.
- Use `AgentRunner` (`src/clients/agent-runner.ts`) as the default wrapper when running agents.
- Prefer shared helpers in `src/utils` (`parse-args`, `question-handler`) over custom logic.
- `main.ts` should stay focused on the **basic agent flow**: argument parsing → agent setup → run loop → final output. Move helper logic into `clients/` or `utils/`.
- Prefer TypeScript path aliases over deep relative imports: `~tools/*`, `~clients/*`, `~utils/*`.
- Use Zod schemas for CLI args and tool IO.
- Keep object field names in `camelCase` (e.g., `trainSamples`), not `snake_case`.
- Keep Zod schemas in a dedicated `schemas.ts` file for each CLI (avoid inline schemas in `main.ts`).
- Keep constants in a dedicated `constants.ts` file for each CLI.
- Move hardcoded numeric values into `constants.ts` (treat numbers as configuration).
- For HTTP fetching in code, prefer `Fetch` (sanitized) or `PlaywrightScraper` for JS-heavy pages.
- When adding tools that touch files, use `src/tools/utils/fs.ts` for path validation.
- Comments should capture invariants or subtle behavior, not restate code.
@@ -127,3 +145,7 @@ All file tools are sandboxed to `tmp/` using path validation (`src/tools/utils/f
- [ ] Any generated artifacts are in `tmp/` only

---

# ExecPlans

When writing complex features or significant refactors, use an ExecPlan (as described in `agent/PLANS.md`) from design to implementation.
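The `runPython` parameters listed in the AGENTS.md tool section (a bare `scriptName` plus a JSON `input` string, `""` for none) can be illustrated with a minimal TypeScript sketch. It is not the code in `src/tools/run-python/run-python-tool.ts`: the helper names `resolveScriptPath`, `validateInput`, and `runPython`, the `python3` default, and the strict file-name pattern are all assumptions for illustration.

```typescript
import { spawn } from "node:child_process";
import path from "node:path";

// Hypothetical request shape mirroring the documented tool params.
interface RunPythonRequest {
  scriptName: string; // bare "*.py" file name, no subpaths
  input: string; // JSON string piped to stdin, or "" for no input
}

interface RunPythonResult {
  exitCode: number | null;
  stdout: string;
  stderr: string;
}

// Assumed rule: word characters or "-" followed by ".py". This rejects "/",
// "\\" and "..", so a script name cannot escape the scripts directory.
const SCRIPT_NAME_PATTERN = /^[\w-]+\.py$/;

export function resolveScriptPath(scriptsDir: string, scriptName: string): string {
  if (!SCRIPT_NAME_PATTERN.test(scriptName)) {
    throw new Error(`Invalid scriptName: ${scriptName}`);
  }
  return path.join(path.resolve(scriptsDir), scriptName);
}

export function validateInput(input: string): void {
  if (input === "") return; // empty string means "no stdin payload"
  JSON.parse(input); // throws SyntaxError if input is not valid JSON
}

export function runPython(
  scriptsDir: string,
  request: RunPythonRequest,
  pythonBin = "python3",
): Promise<RunPythonResult> {
  const scriptPath = resolveScriptPath(scriptsDir, request.scriptName);
  validateInput(request.input);

  return new Promise((resolve, reject) => {
    const child = spawn(pythonBin, [scriptPath]);
    let stdout = "";
    let stderr = "";
    // setEncoding avoids splitting multi-byte characters across chunks.
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => {
      stdout += chunk;
    });
    child.stderr.on("data", (chunk: string) => {
      stderr += chunk;
    });
    child.on("error", reject); // e.g. pythonBin not found
    child.on("close", (exitCode) => resolve({ exitCode, stdout, stderr }));
    // A script may exit without reading stdin; ignore only that EPIPE case.
    child.stdin.on("error", (err: Error) => {
      if ((err as { code?: string }).code !== "EPIPE") reject(err);
    });
    if (request.input !== "") child.stdin.write(request.input);
    child.stdin.end();
  });
}
```

Under these assumptions, `runPython("src/cli/etf-backtest/scripts", { scriptName: "backtest.py", input: "{}" })` would run that script with `{}` on stdin; `backtest.py` is a placeholder, not a confirmed script name. Validating the name before spawning keeps the safety check independent of how Python later resolves paths.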
90 changes: 69 additions & 21 deletions README.md
@@ -1,6 +1,6 @@
# cli-agent-sandbox

-A minimal TypeScript CLI sandbox for testing agent workflows and safe web scraping. This is a single-package repo built with [`@openai/agents`](https://github.com/openai/openai-agents-js), and it includes a guestbook demo, a Finnish name explorer CLI, a publication scraping pipeline with a Playwright-based scraper for JS-rendered pages, and agent tools scoped to `tmp` with strong safety checks.
+A minimal TypeScript CLI sandbox for testing agent workflows and safe web scraping. This is a single-package repo built with [`@openai/agents`](https://github.com/openai/openai-agents-js), and it includes a guestbook demo, a Finnish name explorer CLI, a publication scraping pipeline with a Playwright-based scraper for JS-rendered pages, an ETF backtest CLI, and agent tools scoped to `tmp` with strong safety checks.

## Quick Start

@@ -11,20 +11,33 @@ A minimal TypeScript CLI sandbox for testing agent workflows and safe web scrapi
5. Run the demo: `pnpm run:guestbook`
6. (Optional) Explore Finnish name stats: `pnpm run:name-explorer -- --mode ai|stats`
7. (Optional) Run publication scraping: `pnpm run:scrape-publications -- --url="https://example.com"`
8. (Optional) Run ETF backtest: `pnpm run:etf-backtest -- --isin=IE00B5BMR087` (requires Python setup below)

### Python Setup (for ETF backtest)

```bash
# On Debian/Ubuntu, install venv support first:
sudo apt install python3-venv

python3 -m venv .venv
source .venv/bin/activate
pip install numpy pandas torch
```

## Commands

-| Command | Description |
-| ------------------------------ | ------------------------------------------------- |
-| `pnpm run:guestbook` | Run the interactive guestbook CLI demo |
-| `pnpm run:name-explorer` | Explore Finnish name statistics (AI Q&A or stats) |
-| `pnpm run:scrape-publications` | Scrape publication links and build a review page |
-| `pnpm typecheck` | Run TypeScript type checking |
-| `pnpm lint` | Run ESLint for code quality |
-| `pnpm lint:fix` | Run ESLint and auto-fix issues |
-| `pnpm format` | Format code with Prettier |
-| `pnpm format:check` | Check code formatting |
-| `pnpm test` | Run Vitest test suite |
+| Command | Description |
+| ------------------------------ | ------------------------------------------------------ |
+| `pnpm run:guestbook` | Run the interactive guestbook CLI demo |
+| `pnpm run:name-explorer` | Explore Finnish name statistics (AI Q&A or stats) |
+| `pnpm run:scrape-publications` | Scrape publication links and build a review page |
+| `pnpm run:etf-backtest` | Run ETF backtest + feature optimizer (requires Python) |
+| `pnpm typecheck` | Run TypeScript type checking |
+| `pnpm lint` | Run ESLint for code quality |
+| `pnpm lint:fix` | Run ESLint and auto-fix issues |
+| `pnpm format` | Format code with Prettier |
+| `pnpm format:check` | Check code formatting |
+| `pnpm test` | Run Vitest test suite |

## Publication scraping

@@ -46,7 +59,7 @@ The publication pipeline uses `PlaywrightScraper` to render JavaScript-heavy pag

The `run:name-explorer` script explores Finnish name statistics. It supports an AI Q&A mode (default) backed by SQL tools, plus a `stats` mode that generates an HTML report.

-![Name Explorer demo](src/cli/name-explorer/demo-1.png)
+<img src="src/cli/name-explorer/demo-1.png" alt="Name Explorer demo" width="820" />

Usage:

@@ -56,22 +69,55 @@ pnpm run:name-explorer -- [--mode ai|stats] [--refetch]

Outputs are written under `tmp/name-explorer/`, including `statistics.html` in stats mode.

## ETF backtest

The `run:etf-backtest` CLI fetches ETF history from justetf.com (via Playwright), caches it under
`tmp/etf-backtest/<ISIN>/data.json`, and runs the Python experiment loop via the `runPython` tool.

<img src="src/cli/etf-backtest/demo-1.png" alt="ETF Backtest demo" width="820" />

Usage:

```
pnpm run:etf-backtest -- --isin=IE00B5BMR087 [--maxIterations=5] [--seed=42] [--refresh] [--verbose]
```

Notes:

- `--refresh` forces a refetch; otherwise cached data is reused.
- Python scripts live in `src/cli/etf-backtest/scripts/`.

## Tools

-File tools are sandboxed to the `tmp/` directory with path validation to prevent traversal and symlink attacks. The `fetchUrl` tool adds SSRF protections and HTML sanitization.
+File tools are sandboxed to the `tmp/` directory with path validation to prevent traversal and symlink attacks. The `fetchUrl` tool adds SSRF protections and HTML sanitization, and `runPython` executes whitelisted Python scripts from a configured directory.

+| Tool | Location | Description |
+| ----------- | ----------------------------------------- | ------------------------------------------------------------------------------ |
+| `fetchUrl` | `src/tools/fetch-url/fetch-url-tool.ts` | Fetches URLs safely and returns sanitized Markdown/text |
+| `readFile` | `src/tools/read-file/read-file-tool.ts` | Reads file content from `tmp` directory |
+| `writeFile` | `src/tools/write-file/write-file-tool.ts` | Writes content to files in `tmp` directory |
+| `listFiles` | `src/tools/list-files/list-files-tool.ts` | Lists files and directories under `tmp` |
+| `runPython` | `src/tools/run-python/run-python-tool.ts` | Runs Python scripts from a configured scripts directory (JSON stdin supported) |
+
+`runPython` details:
+
-| Tool | Location | Description |
-| ----------- | ----------------------------------------- | ------------------------------------------------------- |
-| `fetchUrl` | `src/tools/fetch-url/fetch-url-tool.ts` | Fetches URLs safely and returns sanitized Markdown/text |
-| `readFile` | `src/tools/read-file/read-file-tool.ts` | Reads file content from `tmp` directory |
-| `writeFile` | `src/tools/write-file/write-file-tool.ts` | Writes content to files in `tmp` directory |
-| `listFiles` | `src/tools/list-files/list-files-tool.ts` | Lists files and directories under `tmp` |
+- `scriptName` must be a `.py` file name in the configured scripts directory (no subpaths).
+- `input` is a JSON string passed to stdin (use `""` for no input).

## Project Structure

```
src/
├── cli/
│ ├── etf-backtest/
│ │ ├── main.ts # ETF backtest CLI entry point
│ │ ├── README.md # ETF backtest docs
│ │ ├── constants.ts # CLI constants
│ │ ├── schemas.ts # CLI args + agent output schemas
│ │ ├── clients/ # Data fetcher + Playwright capture
│ │ ├── utils/ # Scoring + formatting helpers
│ │ ├── types/ # ETF data types
│ │ └── scripts/ # Python backtest + prediction scripts
│ ├── guestbook/
│ │ ├── main.ts # Guestbook CLI entry point
│ │ └── README.md # Guestbook CLI docs
@@ -90,15 +136,16 @@ src/
├── clients/
│ ├── fetch.ts # Shared HTTP fetch + sanitization
│ ├── logger.ts # Shared console logger
│ ├── agent-runner.ts # Default agent runner wrapper
│ └── playwright-scraper.ts # Playwright-based web scraper
├── utils/
│ ├── parse-args.ts # Shared CLI arg parsing helper
│ └── question-handler.ts # Shared CLI prompt + validation helper
├── tools/
│ ├── index.ts # Tool exports
│ ├── fetch-url/ # Safe fetch tool
│ ├── list-files/ # List files tool
│ ├── read-file/ # Read file tool
│ ├── run-python/ # Run Python scripts tool
│ ├── write-file/ # Write file tool
│ └── utils/
│ ├── fs.ts # Path safety utilities
@@ -111,6 +158,7 @@ tmp/ # Runtime scratch space (tool I/O)
## CLI conventions

- When using `Logger`, initialize it in the CLI entry point and pass it into clients/pipelines via constructor options.
- Use `AgentRunner` (`src/clients/agent-runner.ts`) as the default wrapper when running agents.
- Prefer shared helpers in `src/utils` (`parse-args`, `question-handler`) over custom argument parsing or prompt logic.
- Use the TypeScript path aliases for shared modules: `~tools/*`, `~clients/*`, `~utils/*`.
Example: `import { readFileTool } from "~tools/read-file/read-file-tool";`
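The ETF backtest README notes describe a read-through cache: history is stored at `tmp/etf-backtest/<ISIN>/data.json` and reused unless `--refresh` is passed. A minimal TypeScript sketch of that behavior, assuming a hypothetical `loadEtfData` helper and an injected `fetchHistory` callback that stands in for the Playwright capture from justetf.com (the repo's actual client may be structured differently):

```typescript
import { mkdir, readFile, writeFile } from "node:fs/promises";
import path from "node:path";

// Stand-in for the Playwright-based justetf.com capture (hypothetical signature).
type FetchHistory = (isin: string) => Promise<unknown>;

interface LoadOptions {
  refresh?: boolean; // mirrors --refresh: skip the cache and refetch
  rootDir?: string; // cache root, defaults to tmp/etf-backtest
}

export async function loadEtfData(
  isin: string,
  fetchHistory: FetchHistory,
  { refresh = false, rootDir = path.join("tmp", "etf-backtest") }: LoadOptions = {},
): Promise<unknown> {
  const cachePath = path.join(rootDir, isin, "data.json");

  if (!refresh) {
    try {
      return JSON.parse(await readFile(cachePath, "utf8")) as unknown;
    } catch (err) {
      // Only a missing cache file falls through to a fetch; corrupt JSON or
      // permission errors are surfaced instead of silently refetching.
      if ((err as { code?: string }).code !== "ENOENT") throw err;
    }
  }

  const data = await fetchHistory(isin);
  await mkdir(path.dirname(cachePath), { recursive: true });
  await writeFile(cachePath, JSON.stringify(data, null, 2), "utf8");
  return data;
}
```

Injecting the fetcher keeps the cache logic testable without launching a browser, and treating only `ENOENT` as a cache miss avoids masking a corrupt cache file with a silent refetch.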
136 changes: 136 additions & 0 deletions agent/PLANS.md
@@ -0,0 +1,136 @@
# ExecPlans for cli-agent-sandbox

This repo is a minimal TypeScript CLI sandbox. ExecPlans exist to make larger changes safe, reproducible, and testable by a novice who only has the repo and the plan. Keep plans tailored to this repository, not a generic template.

Use an ExecPlan only for complex features or significant refactors. For small, localized changes, skip the plan and just implement.

## Non-negotiables

- Self-contained: the plan must include all context needed to execute it without external docs or prior plans.
- Observable outcomes: describe what a human can run and see to prove the change works.
- Living document: update the plan as work proceeds; never let it drift from reality.
- Repo-safe: never read `.env`, never write outside the repo or `tmp/`, never commit or push.
- Minimal, test-covered changes: update or add Vitest tests when behavior changes.

## Repository context to embed in every plan

Include a short orientation paragraph naming the key paths and how they relate:

- Entry points live in `src/cli/<cli>/main.ts` with a matching `src/cli/<cli>/README.md`.
- Pipelines and clients live in `src/cli/<cli>/clients/*`; schemas in `src/cli/<cli>/types/*`.
- Shared helpers: `src/utils/parse-args.ts`, `src/utils/question-handler.ts`, `src/clients/logger.ts`.
- Tool sandboxing is under `src/tools/*` and path validation in `src/tools/utils/fs.ts`.
- Runtime artifacts belong under `tmp/` only.

If the plan adds a new CLI, state that it must be scaffolded via:

    pnpm scaffold:cli -- --name=my-cli --description="What it does"

Then add `"run:my-cli": "tsx src/cli/my-cli/main.ts"` to `package.json`.

## Repo conventions to capture in plans (when relevant)

- Initialize `Logger` in CLI entry points and pass it into clients/pipelines via constructor options.
- Use Zod schemas for CLI args and tool IO; name the schema files in the plan.
- Prefer TypeScript path aliases like `~tools/*`, `~clients/*`, `~utils/*` over deep relative imports.
- Avoid `index.ts` barrel exports; use explicit module paths.
- For HTTP fetching, prefer sanitized `Fetch` or `PlaywrightScraper` as appropriate.
- Any file-touching tool must use path validation from `src/tools/utils/fs.ts`.

## Required sections in every ExecPlan

Use these headings, in this order, and keep them up to date:

1. **Purpose / Big Picture** — what the user gains and how they can see it working.
2. **Progress** — checklist with timestamps (UTC), split partial work into “done” vs “remaining”.
3. **Surprises & Discoveries** — unexpected behaviors or constraints with short evidence.
4. **Decision Log** — decision, rationale, date/author.
5. **Outcomes & Retrospective** — what was achieved, gaps, lessons learned.
6. **Context and Orientation** — repo-specific orientation and key files.
7. **Conventions and Contracts** — logging, schemas, imports, and tool safety expectations.
8. **Plan of Work** — prose describing edits, with precise file paths and locations.
9. **Concrete Steps** — exact commands to run (cwd included) and expected short outputs.
10. **Validation and Acceptance** — behavioral acceptance and tests; name new tests.
11. **Idempotence and Recovery** — how to rerun safely; rollback guidance if needed.
12. **Artifacts and Notes** — concise transcripts, diffs, or snippets as indented blocks.
13. **Interfaces and Dependencies** — required modules, types, function signatures, and why.

## Formatting rules

- The ExecPlan is a normal Markdown document (no outer code fence).
- Prefer prose over lists; the only mandatory checklist is in **Progress**.
- Define any non-obvious term the first time you use it.
- Use repo-relative paths and exact function/module names.
- Do not point to external docs; embed the needed context in the plan itself.

## Validation defaults for this repo

State which of these apply, and include expected outcomes:

- `pnpm typecheck`
- `pnpm lint` (or `pnpm lint:fix` if auto-fixing is intended)
- `pnpm test`
- `pnpm format:check` (if formatting changes)

If the change affects a CLI, include a concrete CLI invocation and expected output.

## ExecPlan skeleton (copy and fill)

# <Short, action-oriented title>

This ExecPlan is a living document. Update **Progress**, **Surprises & Discoveries**, **Decision Log**, and **Outcomes & Retrospective** as work proceeds.

## Purpose / Big Picture

Describe the user-visible behavior and how to observe it.

## Progress

- [ ] (2026-01-25 00:00Z) Example incomplete step.

## Surprises & Discoveries

- Observation: …
Evidence: …

## Decision Log

- Decision: …
Rationale: …
Date/Author: …

## Outcomes & Retrospective

Summarize results, gaps, and lessons learned.

## Context and Orientation

Explain the relevant parts of `src/cli/...`, shared helpers, and tools.

## Conventions and Contracts

Call out logging, Zod schemas, imports, and any tool safety expectations.

## Plan of Work

Prose description of edits with precise file paths and locations.

## Concrete Steps

State commands with cwd and short expected outputs.

## Validation and Acceptance

Behavioral acceptance plus test commands and expectations.

## Idempotence and Recovery

How to rerun safely and roll back if needed.

## Artifacts and Notes

Short transcripts, diffs, or snippets as indented blocks.

## Interfaces and Dependencies

Required types/modules/functions and why they exist.
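The **Progress** checklist expects UTC timestamps in the form shown by the skeleton, `(2026-01-25 00:00Z)`. If a plan is updated by a script rather than by hand, a small helper could produce that stamp. This is a sketch only: `progressStamp` and `progressItem` are hypothetical names, and marking completed steps with `[x]` is an assumption based on the Markdown task-list convention.

```typescript
// Formats a Date as "(YYYY-MM-DD HH:MMZ)"; toISOString() always reports UTC.
export function progressStamp(date: Date = new Date()): string {
  const iso = date.toISOString(); // e.g. "2026-01-25T00:00:00.000Z"
  return `(${iso.slice(0, 10)} ${iso.slice(11, 16)}Z)`;
}

// Builds one Progress checklist line; "[x]" marks a finished step.
export function progressItem(step: string, done = false, date: Date = new Date()): string {
  return `- [${done ? "x" : " "}] ${progressStamp(date)} ${step}`;
}

const example = progressItem(
  "Example incomplete step.",
  false,
  new Date(Date.UTC(2026, 0, 25, 0, 0)),
);
// example === "- [ ] (2026-01-25 00:00Z) Example incomplete step."
```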
16 changes: 16 additions & 0 deletions eslint.config.ts
@@ -89,6 +89,22 @@ export default defineConfig(
],
},
],
// Avoid template literals in logger calls for better structured logging
"no-restricted-syntax": [
"error",
{
selector:
"CallExpression[callee.type='MemberExpression'][callee.object.name='logger'][callee.property.name=/^(debug|info|warn|error|tool|question|answer)$/] > TemplateLiteral",
message:
"Avoid template literals in logger calls. Use a plain string and pass data as extra args (e.g. logger.info('Saved file', { path })).",
},
{
selector:
"CallExpression[callee.type='MemberExpression'][callee.object.type='MemberExpression'][callee.object.property.name='logger'][callee.property.name=/^(debug|info|warn|error|tool|question|answer)$/] > TemplateLiteral",
message:
"Avoid template literals in logger calls. Use a plain string and pass data as extra args (e.g. logger.info('Saved file', { path })).",
},
],
},
},
{
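The `no-restricted-syntax` rule in `eslint.config.ts` rejects a template literal passed directly as an argument to `logger.<level>(...)` or `<obj>.logger.<level>(...)`, in favor of a constant message plus data arguments. A minimal sketch of why that helps, using a hypothetical in-memory logger (not the repo's `src/clients/logger.ts`): because the message never varies, entries can be grouped or filtered by message while the changing values stay in structured fields.

```typescript
// Hypothetical logger that records entries instead of printing them.
interface LogEntry {
  level: "info";
  message: string;
  data: unknown[];
}

class MemoryLogger {
  readonly entries: LogEntry[] = [];

  info(message: string, ...data: unknown[]): void {
    this.entries.push({ level: "info", message, data });
  }
}

const logger = new MemoryLogger();

for (const filePath of ["tmp/report.html", "tmp/data.json"]) {
  // Rejected by the rule: logger.info(`Saved file ${filePath}`);
  logger.info("Saved file", { path: filePath });
}

// Both calls share one message; the paths live in the data payload.
const distinctMessages = new Set(logger.entries.map((entry) => entry.message));
```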