Merged
2 changes: 1 addition & 1 deletion .mcp.json
@@ -2,7 +2,7 @@
"mcpServers": {
"filigree": {
"type": "stdio",
"command": "/home/john/errorworks/.venv/bin/filigree-mcp",
"command": "/home/john/.local/bin/filigree-mcp",
"args": [
"--project",
"/home/john/errorworks"
33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,39 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.2] - 2026-03-23

### Fixed

- **Multi-worker startup crash**: All presets and default config set `workers=4`, but uvicorn
requires an import string (not a Python object) when `workers > 1`. Implemented factory pattern
with environment variable config serialization. Env var is cleaned up after uvicorn exits.
- **Metrics misclassification** (7 bugs): Both LLM and Web metrics classifiers gated
`connection_error` on `status_code is None`, but servers record some connection errors with
non-None status codes (timeout→504, incomplete_response→200). Classifiers now check `error_type`
first. Also removed `slow_response` from Web connection error set (it's a successful response
with extra delay) and added `redirect_loop_terminated` to the redirect category.
- **OpenAI API fidelity** (3 bugs): Added missing `param` field to all error responses, fixed
timeout 504 body to use standard format (`type: server_error`, `code: timeout`), and fixed echo
mode to extract text from multi-modal message content instead of dumping raw list representation.
- **CLI bugs** (6 bugs, 3 in each CLI): `show-config` YAML output no longer contains
`!!python/tuple` tags (uses `model_dump(mode="json")`), `--format` flag now validates input
and rejects unsupported formats, multi-worker env var cleaned up via `try/finally`.
- **MCP analysis logic** (4 bugs): Percentile calculation off-by-one (`int(n*p)` → `ceil(n*p)-1`),
trailing burst no longer silently dropped from `get_burst_events`, unfinished bursts excluded
from recovery time in `analyze_aimd_behavior`, `find_anomalies` now checks all 6 error columns
instead of only `rate_limited` and `capacity_error`.
- **Thread safety**: Added double-checked locking to `ContentGenerator._get_preset_bank()`,
porting the pattern already used in the LLM `ResponseGenerator`.
- **Memory scalability** (2 bugs): `get_stats()` loaded all latency values into Python memory
for percentile computation — replaced with SQL `LIMIT 1 OFFSET` queries (O(1) memory).
`export_data()` loaded entire database unbounded — added `limit`/`offset` parameters.
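
The percentile fix above is the nearest-rank formula with 0-indexed clamping; a minimal standalone sketch (illustrative, not the project's exact code):

```python
import math

def nearest_rank(sorted_values: list[float], p: float) -> float:
    """Nearest-rank percentile over an ascending-sorted list.

    The 0-indexed position is ceil(n * p) - 1, clamped to the valid
    range; the old int(n * p) overshot by one whenever n * p was exact.
    """
    n = len(sorted_values)
    index = max(0, min(math.ceil(n * p) - 1, n - 1))
    return sorted_values[index]
```

With n = 10 and p = 0.5, `ceil(5) - 1 = 4` selects the 5th smallest value, whereas the old `int(5) = 5` would have selected the 6th.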

### Changed

- `InjectionEngine` docstring updated to accurately explain why RNG thread safety is acceptable
in the current ASGI architecture (single-threaded event loop per worker, multi-worker forks).
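
The multi-worker fix above hinges on a factory-plus-env-var round trip, since a live Python object cannot cross the fork/spawn boundary as an import string. A self-contained sketch of the pattern (`FakeConfig` is an illustrative stand-in, not the project's pydantic model):

```python
import json
import os

class FakeConfig:
    """Stand-in for the real config model (illustrative only)."""
    def __init__(self, data: dict) -> None:
        self.data = data

def create_app_from_env() -> FakeConfig:
    # Each forked uvicorn worker imports the factory by string and calls
    # it, rebuilding its own config from the serialized env var.
    raw = os.environ["_ERRORWORKS_LLM_CONFIG"]
    return FakeConfig(json.loads(raw))

# The CLI side: serialize before uvicorn.run(..., factory=True), and
# clean up in a finally block so the env var never leaks afterwards.
os.environ["_ERRORWORKS_LLM_CONFIG"] = json.dumps({"workers": 4})
try:
    app = create_app_from_env()
finally:
    os.environ.pop("_ERRORWORKS_LLM_CONFIG", None)
```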

## [0.1.1] - 2026-03-17

### Added
35 changes: 32 additions & 3 deletions CLAUDE.md
@@ -97,7 +97,27 @@ Key fixture helpers: `post_completion()`, `fetch_page()`, `update_config()`, `ge
- `SIM108` (ternary) is ignored — prefer explicit if/else
- First-party import: `errorworks`

<!-- filigree:instructions:v1.5.0:bcb039c9 -->
## Epic Creation Workflow

When creating a new epic (a major capability or theme of work), always follow this process:

1. **Create the epic** — `type: epic` with a clear description of the capability and its key sub-capabilities
2. **Draft requirements** — Create `type: requirement` issues as children of the epic (`parent_id`). Each requirement should have:
- `req_type`: functional, non_functional, constraint, or interface
- `rationale`: why this requirement exists
- `acceptance_criteria`: testable conditions
- `stakeholder`: who needs it
3. **Add acceptance criteria** — For non-trivial requirements, create `type: acceptance_criterion` children with Given/When/Then fields
4. **Label the epic** — Add `future` label for backlog epics, or appropriate labels for active work

Requirements start in `drafted` state. As epics move out of backlog:
- Requirements go through `reviewing → approved` during scope refinement
- Tasks/features created during implementation link back to their requirements via dependencies
- Requirements move to `implementing → verified` as work completes (verification requires `verification_method`: test, inspection, analysis, or demonstration)

This ensures traceability from "why does this exist" through to "how was it verified."
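
The steps above can be sketched as `create_issue` payloads (the field names come from this workflow; the titles, example values, and exact tool signature are assumptions):

```python
# Hypothetical payload shapes -- illustrative only.
epic = {
    "type": "epic",
    "title": "Example capability",
    "description": "What the capability does and its key sub-capabilities.",
}

requirement = {
    "type": "requirement",
    "parent_id": "<epic-id>",   # child of the epic
    "req_type": "functional",   # functional | non_functional | constraint | interface
    "rationale": "Why this requirement exists.",
    "acceptance_criteria": "Testable conditions.",
    "stakeholder": "Who needs it.",
    "status": "drafted",        # requirements start here
}

criterion = {
    "type": "acceptance_criterion",
    "parent_id": "<requirement-id>",
    "given": "an initial state",
    "when": "an action occurs",
    "then": "an observable outcome holds",
}
```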

<!-- filigree:instructions:v1.5.1:63b4188e -->
## Filigree Issue Tracker

Use `filigree` for all task tracking in this project. Data lives in `.filigree/`.
@@ -112,10 +132,14 @@ faster and return structured data. Key tools:
- `create_issue` / `update_issue` / `close_issue` — manage issues
- `claim_issue` / `claim_next` — atomic claiming
- `add_comment` / `add_label` — metadata
- `list_labels` / `get_label_taxonomy` — discover labels and reserved namespaces
- `create_plan` / `get_plan` — milestone planning
- `get_stats` / `get_metrics` — project health
- `get_valid_transitions` — workflow navigation
- `observe` / `list_observations` / `dismiss_observation` / `promote_observation` — agent scratchpad
- `trigger_scan` / `trigger_scan_batch` / `get_scan_status` / `preview_scan` / `list_scanners` — automated code scanning
- `get_finding` / `list_findings` / `update_finding` / `batch_update_findings` — scan finding triage
- `promote_finding` / `dismiss_finding` — finding lifecycle (promote to issue or dismiss)

Observations are fire-and-forget notes that expire after 14 days. Use `list_issues --label=from-observation` to find promoted observations.

@@ -125,8 +149,8 @@ design concern. Don't stop what you're doing; just fire off the observation and
carry on. They're ideal for "I don't have time to investigate this right now, but
I want to come back to it." Include `file_path` and `line` when relevant so the
observation is anchored to code. At session end, skim `list_observations` and
either `dismiss` (not worth tracking) or `promote` (deserves an issue) anything
that's accumulated.
either `dismiss_observation` (not worth tracking) or `promote_observation`
(deserves an issue) for anything that's accumulated.

Fall back to CLI (`filigree <command>`) when MCP is unavailable.

@@ -137,6 +161,9 @@ Fall back to CLI (`filigree <command>`) when MCP is unavailable.
filigree ready # Show issues ready to work (no blockers)
filigree list --status=open # All open issues
filigree list --status=in_progress # Active work
filigree list --label=bug --label=P1 # Filter by multiple labels (AND)
filigree list --label-prefix=cluster/ # Filter by label namespace prefix
filigree list --not-label=wontfix # Exclude issues with label
filigree show <id> # Detailed issue view

# Creating & updating
@@ -155,6 +182,8 @@ filigree add-comment <id> "text" # Add comment
filigree get-comments <id> # List comments
filigree add-label <id> <label> # Add label
filigree remove-label <id> <label> # Remove label
filigree labels # List all labels by namespace
filigree taxonomy # Show reserved namespaces and vocabulary

# Workflow templates
filigree types # List registered types with state flows
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "errorworks"
version = "0.1.1"
version = "0.1.2"
description = "Composable chaos-testing services for various pipelines"
readme = "README.md"
requires-python = ">=3.12"
3 changes: 2 additions & 1 deletion src/errorworks/__init__.py
@@ -1,11 +1,12 @@
"""Composable chaos-testing servers for LLM and web scraping pipelines."""

__version__ = "0.1.1"
__version__ = "0.1.2"

__all__ = [
"__version__",
"engine",
"llm",
"llm_mcp",
"testing",
"web",
]
15 changes: 9 additions & 6 deletions src/errorworks/engine/admin.py
@@ -87,9 +87,10 @@ async def handle_admin_stats(request: Request, server: ChaosServer) -> JSONRespo
return denied
try:
return JSONResponse(server.get_stats())
except sqlite3.Error as e:
except sqlite3.Error:
logger.exception("admin_stats_database_error")
return JSONResponse(
{"error": {"type": "database_error", "message": f"Failed to retrieve stats: {e}"}},
{"error": {"type": "database_error", "message": "Failed to retrieve stats due to a database error"}},
status_code=503,
)

@@ -100,9 +101,10 @@ async def handle_admin_reset(request: Request, server: ChaosServer) -> JSONRespo
return denied
try:
new_run_id = server.reset()
except sqlite3.Error as e:
except sqlite3.Error:
logger.exception("admin_reset_database_error")
return JSONResponse(
{"error": {"type": "database_error", "message": f"Failed to reset metrics: {e}"}},
{"error": {"type": "database_error", "message": "Failed to reset metrics due to a database error"}},
status_code=503,
)
return JSONResponse({"status": "reset", "new_run_id": new_run_id})
@@ -114,8 +116,9 @@ async def handle_admin_export(request: Request, server: ChaosServer) -> JSONResp
return denied
try:
return JSONResponse(server.export_metrics())
except sqlite3.Error as e:
except sqlite3.Error:
logger.exception("admin_export_database_error")
return JSONResponse(
{"error": {"type": "database_error", "message": f"Failed to export metrics: {e}"}},
{"error": {"type": "database_error", "message": "Failed to export metrics due to a database error"}},
status_code=503,
)
2 changes: 1 addition & 1 deletion src/errorworks/engine/config_loader.py
@@ -36,7 +36,7 @@ def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]
if key in result and isinstance(result[key], dict) and isinstance(value, dict):
result[key] = deep_merge(result[key], value)
else:
result[key] = value
result[key] = copy.deepcopy(value)
return result
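
Why the `deepcopy` matters: without it, the merged dict aliases the override's nested containers, so a later mutation of one silently mutates the other. A standalone reproduction (the full function body is reconstructed for illustration, not copied from the module):

```python
import copy
from typing import Any

def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge override into base without sharing containers."""
    result = dict(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)  # the fix: copy, don't alias
    return result

override = {"server": {"hosts": ["a"]}}
merged = deep_merge({"server": {"port": 80}}, override)
override["server"]["hosts"].append("b")  # mutate the source after merging
# merged["server"]["hosts"] is unaffected thanks to the deepcopy
```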


18 changes: 12 additions & 6 deletions src/errorworks/engine/injection_engine.py
@@ -30,9 +30,14 @@
class InjectionEngine:
"""Burst state machine + priority/weighted error selection.

Thread-safe for burst state management. The RNG is not thread-safe —
callers are expected to snapshot the engine reference per-request (see
config snapshot pattern) rather than sharing concurrent calls to select().
Thread-safe for burst state management. The RNG (``random.Random``) is
not inherently thread-safe, but this is acceptable because:
- ASGI servers (uvicorn) use a single-threaded event loop per worker
- Multi-worker mode forks processes, giving each its own RNG instance
- The config snapshot pattern prevents mid-request component swaps

If the engine is used from a multi-threaded context (e.g. sync endpoints
on a ThreadPoolExecutor), callers should provide per-thread engines.

The engine handles:
- Periodic burst windows (is the system currently in a burst?)
@@ -170,9 +175,10 @@ def _select_weighted(self, specs: list[ErrorSpec]) -> ErrorSpec | None:
if roll < threshold:
return spec

# Unreachable: roll < total_weight (guarded above) guarantees a match
# in the loop. Defensive return for static analysis / type checkers.
return None # pragma: no cover
# Defensive return for static analysis / type checkers.
# Reachable via floating-point accumulation edge cases where
# cumulative sum never reaches roll due to precision loss.
return None

def reset(self) -> None:
"""Reset the engine state (clears burst timing)."""
50 changes: 37 additions & 13 deletions src/errorworks/engine/metrics_store.py
@@ -513,35 +513,59 @@ def get_stats(self) -> dict[str, Any]:
stats["requests_by_status_code"] = {row[0]: row[1] for row in cursor.fetchall()}

if "latency_ms" in col_names:
cursor = conn.execute("SELECT AVG(latency_ms), MAX(latency_ms) FROM requests WHERE latency_ms IS NOT NULL")
cursor = conn.execute("SELECT AVG(latency_ms), MAX(latency_ms), COUNT(latency_ms) FROM requests WHERE latency_ms IS NOT NULL")
row = cursor.fetchone()

cursor = conn.execute("SELECT latency_ms FROM requests WHERE latency_ms IS NOT NULL ORDER BY latency_ms")
latencies = [r[0] for r in cursor.fetchall()]
avg_latency, max_latency, count = row[0], row[1], row[2]

p50_latency = None
p95_latency = None
p99_latency = None

if latencies:
p50_latency = latencies[max(0, min(math.ceil(len(latencies) * 0.50) - 1, len(latencies) - 1))]
p95_latency = latencies[max(0, min(math.ceil(len(latencies) * 0.95) - 1, len(latencies) - 1))]
p99_latency = latencies[max(0, min(math.ceil(len(latencies) * 0.99) - 1, len(latencies) - 1))]
if count > 0:
# Compute percentiles via SQL LIMIT/OFFSET (no Python-side data transfer)
for pct, name in ((0.50, "p50"), (0.95, "p95"), (0.99, "p99")):
offset = max(0, min(math.ceil(count * pct) - 1, count - 1))
pcursor = conn.execute(
"SELECT latency_ms FROM requests WHERE latency_ms IS NOT NULL ORDER BY latency_ms LIMIT 1 OFFSET ?",
(offset,),
)
prow = pcursor.fetchone()
if name == "p50":
p50_latency = prow[0]
elif name == "p95":
p95_latency = prow[0]
else:
p99_latency = prow[0]

stats["latency_stats"] = {
"avg_ms": row[0],
"avg_ms": avg_latency,
"p50_ms": p50_latency,
"p95_ms": p95_latency,
"p99_ms": p99_latency,
"max_ms": row[1],
"max_ms": max_latency,
}

return stats

def export_data(self) -> dict[str, Any]:
"""Export raw requests and time-series data."""
def export_data(
self,
*,
limit: int | None = None,
offset: int = 0,
) -> dict[str, Any]:
"""Export raw requests and time-series data.

Args:
limit: Maximum number of request rows to return (default: all).
offset: Number of request rows to skip (default: 0).
"""
conn = self._get_connection()
requests = [dict(row) for row in conn.execute("SELECT * FROM requests ORDER BY timestamp_utc")]
if limit is not None:
requests = [
dict(row) for row in conn.execute("SELECT * FROM requests ORDER BY timestamp_utc LIMIT ? OFFSET ?", (limit, offset))
]
else:
requests = [dict(row) for row in conn.execute("SELECT * FROM requests ORDER BY timestamp_utc")]
timeseries = [dict(row) for row in conn.execute("SELECT * FROM timeseries ORDER BY bucket_utc")]
return {
"run_id": self._run_id,
6 changes: 6 additions & 0 deletions src/errorworks/engine/types.py
@@ -259,6 +259,12 @@ def __post_init__(self) -> None:
if ts_dupes:
raise ValueError(f"Duplicate timeseries column names: {sorted(ts_dupes)}")

# Check for duplicate index names
idx_names = [name for name, _col in self.request_indexes]
idx_dupes = {n for n in idx_names if idx_names.count(n) > 1}
if idx_dupes:
raise ValueError(f"Duplicate index names: {sorted(idx_dupes)}")

# Validate that index columns reference actual request columns
req_name_set = set(req_names)
for index_name, col_name in self.request_indexes:
12 changes: 11 additions & 1 deletion src/errorworks/engine/validators.py
@@ -86,9 +86,19 @@ def validate_error_decision(
ValueError: If any invariant is violated.
"""
if error_type is None:
# Success case: no other fields should be set
# Success case: no error-related fields should be set
if category is not None:
raise ValueError("Success decision must not have a category")
if status_code is not None:
raise ValueError(f"Success decision must not have a status_code, got {status_code}")
if retry_after_sec is not None:
raise ValueError("Success decision must not have retry_after_sec")
if delay_sec is not None:
raise ValueError("Success decision must not have delay_sec")
if start_delay_sec is not None:
raise ValueError("Success decision must not have start_delay_sec")
if malformed_type is not None:
raise ValueError("Success decision must not have malformed_type")
return

if category is None:
45 changes: 33 additions & 12 deletions src/errorworks/llm/cli.py
@@ -372,17 +372,35 @@ def serve(
)
raise typer.Exit(1) from e

from errorworks.llm.server import create_app
if config.server.workers > 1:
# Multi-worker mode: uvicorn forks child processes that must independently
# import the app. Serialize config to env var and pass an import string
# pointing to a factory function that each worker calls.
import os

app = create_app(config)

uvicorn.run(
app,
host=config.server.host,
port=config.server.port,
workers=config.server.workers,
log_level="info",
)
os.environ["_ERRORWORKS_LLM_CONFIG"] = config.model_dump_json()
try:
uvicorn.run(
"errorworks.llm.server:_create_app_from_env",
factory=True,
host=config.server.host,
port=config.server.port,
workers=config.server.workers,
log_level="info",
)
finally:
os.environ.pop("_ERRORWORKS_LLM_CONFIG", None)
else:
from errorworks.llm.server import create_app

server_app = create_app(config)
uvicorn.run(
server_app,
host=config.server.host,
port=config.server.port,
workers=1,
log_level="info",
)


@app.command()
@@ -453,12 +471,15 @@ def show_config(
typer.secho(f"Configuration error: {e}", fg=typer.colors.RED, err=True)
raise typer.Exit(1) from e

config_dict = config.model_dump()
config_dict = config.model_dump(mode="json")

if output_format == "json":
typer.echo(json.dumps(config_dict, indent=2))
else:
elif output_format == "yaml":
typer.echo(yaml.dump(config_dict, default_flow_style=False, sort_keys=False))
else:
typer.secho(f"Error: unsupported format '{output_format}'. Use 'json' or 'yaml'.", fg=typer.colors.RED, err=True)
raise typer.Exit(1)


# MCP server CLI - separate entry point