Merged
2 changes: 1 addition & 1 deletion .mcp.json
@@ -2,7 +2,7 @@
"mcpServers": {
"filigree": {
"type": "stdio",
"command": "/home/john/errorworks/.venv/bin/filigree-mcp",
"command": "/home/john/.local/bin/filigree-mcp",
"args": [
"--project",
"/home/john/errorworks"
33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,39 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.2] - 2026-03-23

### Fixed

- **Multi-worker startup crash**: All presets and default config set `workers=4`, but uvicorn
requires an import string (not a Python object) when `workers > 1`. Implemented factory pattern
with environment variable config serialization. Env var is cleaned up after uvicorn exits.
- **Metrics misclassification** (7 bugs): Both LLM and Web metrics classifiers gated
`connection_error` on `status_code is None`, but servers record some connection errors with
non-None status codes (timeout→504, incomplete_response→200). Classifiers now check `error_type`
first. Also removed `slow_response` from Web connection error set (it's a successful response
with extra delay) and added `redirect_loop_terminated` to the redirect category.
- **OpenAI API fidelity** (3 bugs): Added missing `param` field to all error responses, fixed
timeout 504 body to use standard format (`type: server_error`, `code: timeout`), and fixed echo
mode to extract text from multi-modal message content instead of dumping raw list representation.
- **CLI bugs** (6 bugs, 3 in each CLI): `show-config` YAML output no longer contains
`!!python/tuple` tags (uses `model_dump(mode="json")`), `--format` flag now validates input
and rejects unsupported formats, multi-worker env var cleaned up via `try/finally`.
- **MCP analysis logic** (4 bugs): Percentile calculation off-by-one (`int(n*p)` → `ceil(n*p)-1`),
trailing burst no longer silently dropped from `get_burst_events`, unfinished bursts excluded
from recovery time in `analyze_aimd_behavior`, `find_anomalies` now checks all 6 error columns
instead of only `rate_limited` and `capacity_error`.
- **Thread safety**: Added double-checked locking to `ContentGenerator._get_preset_bank()`,
porting the pattern already used in the LLM `ResponseGenerator`.
- **Memory scalability** (2 bugs): `get_stats()` loaded all latency values into Python memory
for percentile computation — replaced with SQL `LIMIT 1 OFFSET` queries (O(1) memory).
`export_data()` loaded entire database unbounded — added `limit`/`offset` parameters.
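
The percentile fix above is the nearest-rank formula with 0-indexed clamping; a minimal standalone sketch (illustrative, not the project's exact code):

```python
import math

def nearest_rank(sorted_values: list[float], p: float) -> float:
    """Nearest-rank percentile over an ascending-sorted list.

    The 0-indexed position is ceil(n * p) - 1, clamped to the valid
    range; the old int(n * p) overshot by one whenever n * p was exact.
    """
    n = len(sorted_values)
    index = max(0, min(math.ceil(n * p) - 1, n - 1))
    return sorted_values[index]
```

With n = 10 and p = 0.5, `ceil(5) - 1 = 4` selects the 5th smallest value, whereas the old `int(5) = 5` would have selected the 6th.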

### Changed

- `InjectionEngine` docstring updated to accurately explain why RNG thread safety is acceptable
in the current ASGI architecture (single-threaded event loop per worker, multi-worker forks).
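
The multi-worker fix above hinges on a factory-plus-env-var round trip, since a live Python object cannot cross the fork/spawn boundary as an import string. A self-contained sketch of the pattern (`FakeConfig` is an illustrative stand-in, not the project's pydantic model):

```python
import json
import os

class FakeConfig:
    """Stand-in for the real config model (illustrative only)."""
    def __init__(self, data: dict) -> None:
        self.data = data

def create_app_from_env() -> FakeConfig:
    # Each forked uvicorn worker imports the factory by string and calls
    # it, rebuilding its own config from the serialized env var.
    raw = os.environ["_ERRORWORKS_LLM_CONFIG"]
    return FakeConfig(json.loads(raw))

# The CLI side: serialize before uvicorn.run(..., factory=True), and
# clean up in a finally block so the env var never leaks afterwards.
os.environ["_ERRORWORKS_LLM_CONFIG"] = json.dumps({"workers": 4})
try:
    app = create_app_from_env()
finally:
    os.environ.pop("_ERRORWORKS_LLM_CONFIG", None)
```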

## [0.1.1] - 2026-03-17

### Added
35 changes: 32 additions & 3 deletions CLAUDE.md
@@ -97,7 +97,27 @@ Key fixture helpers: `post_completion()`, `fetch_page()`, `update_config()`, `ge
- `SIM108` (ternary) is ignored — prefer explicit if/else
- First-party import: `errorworks`

<!-- filigree:instructions:v1.5.0:bcb039c9 -->
## Epic Creation Workflow

When creating a new epic (a major capability or theme of work), always follow this process:

1. **Create the epic** — `type: epic` with a clear description of the capability and its key sub-capabilities
2. **Draft requirements** — Create `type: requirement` issues as children of the epic (`parent_id`). Each requirement should have:
- `req_type`: functional, non_functional, constraint, or interface
- `rationale`: why this requirement exists
- `acceptance_criteria`: testable conditions
- `stakeholder`: who needs it
3. **Add acceptance criteria** — For non-trivial requirements, create `type: acceptance_criterion` children with Given/When/Then fields
4. **Label the epic** — Add `future` label for backlog epics, or appropriate labels for active work

Requirements start in `drafted` state. As epics move out of backlog:
- Requirements go through `reviewing → approved` during scope refinement
- Tasks/features created during implementation link back to their requirements via dependencies
- Requirements move to `implementing → verified` as work completes (verification requires `verification_method`: test, inspection, analysis, or demonstration)

This ensures traceability from "why does this exist" through to "how was it verified."
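
The steps above can be sketched as `create_issue` payloads (the field names come from this workflow; the titles, example values, and exact tool signature are assumptions):

```python
# Hypothetical payload shapes -- illustrative only.
epic = {
    "type": "epic",
    "title": "Example capability",
    "description": "What the capability does and its key sub-capabilities.",
}

requirement = {
    "type": "requirement",
    "parent_id": "<epic-id>",   # child of the epic
    "req_type": "functional",   # functional | non_functional | constraint | interface
    "rationale": "Why this requirement exists.",
    "acceptance_criteria": "Testable conditions.",
    "stakeholder": "Who needs it.",
    "status": "drafted",        # requirements start here
}

criterion = {
    "type": "acceptance_criterion",
    "parent_id": "<requirement-id>",
    "given": "an initial state",
    "when": "an action occurs",
    "then": "an observable outcome holds",
}
```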

<!-- filigree:instructions:v1.5.1:63b4188e -->
## Filigree Issue Tracker

Use `filigree` for all task tracking in this project. Data lives in `.filigree/`.
@@ -112,10 +132,14 @@ faster and return structured data. Key tools:
- `create_issue` / `update_issue` / `close_issue` — manage issues
- `claim_issue` / `claim_next` — atomic claiming
- `add_comment` / `add_label` — metadata
- `list_labels` / `get_label_taxonomy` — discover labels and reserved namespaces
- `create_plan` / `get_plan` — milestone planning
- `get_stats` / `get_metrics` — project health
- `get_valid_transitions` — workflow navigation
- `observe` / `list_observations` / `dismiss_observation` / `promote_observation` — agent scratchpad
- `trigger_scan` / `trigger_scan_batch` / `get_scan_status` / `preview_scan` / `list_scanners` — automated code scanning
- `get_finding` / `list_findings` / `update_finding` / `batch_update_findings` — scan finding triage
- `promote_finding` / `dismiss_finding` — finding lifecycle (promote to issue or dismiss)

Observations are fire-and-forget notes that expire after 14 days. Use `list_issues --label=from-observation` to find promoted observations.

@@ -125,8 +149,8 @@ design concern. Don't stop what you're doing; just fire off the observation and
carry on. They're ideal for "I don't have time to investigate this right now, but
I want to come back to it." Include `file_path` and `line` when relevant so the
observation is anchored to code. At session end, skim `list_observations` and
either `dismiss` (not worth tracking) or `promote` (deserves an issue) anything
that's accumulated.
either `dismiss_observation` (not worth tracking) or `promote_observation`
(deserves an issue) for anything that's accumulated.

Fall back to CLI (`filigree <command>`) when MCP is unavailable.

@@ -137,6 +161,9 @@ Fall back to CLI (`filigree <command>`) when MCP is unavailable.
filigree ready # Show issues ready to work (no blockers)
filigree list --status=open # All open issues
filigree list --status=in_progress # Active work
filigree list --label=bug --label=P1 # Filter by multiple labels (AND)
filigree list --label-prefix=cluster/ # Filter by label namespace prefix
filigree list --not-label=wontfix # Exclude issues with label
filigree show <id> # Detailed issue view

# Creating & updating
@@ -155,6 +182,8 @@ filigree add-comment <id> "text" # Add comment
filigree get-comments <id> # List comments
filigree add-label <id> <label> # Add label
filigree remove-label <id> <label> # Remove label
filigree labels # List all labels by namespace
filigree taxonomy # Show reserved namespaces and vocabulary

# Workflow templates
filigree types # List registered types with state flows
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "errorworks"
version = "0.1.1"
version = "0.1.2"
description = "Composable chaos-testing services for various pipelines"
readme = "README.md"
requires-python = ">=3.12"
3 changes: 2 additions & 1 deletion src/errorworks/__init__.py
@@ -1,11 +1,12 @@
"""Composable chaos-testing servers for LLM and web scraping pipelines."""

__version__ = "0.1.1"
__version__ = "0.1.2"

__all__ = [
"__version__",
"engine",
"llm",
"llm_mcp",
"testing",
"web",
]
15 changes: 9 additions & 6 deletions src/errorworks/engine/admin.py
@@ -87,9 +87,10 @@ async def handle_admin_stats(request: Request, server: ChaosServer) -> JSONRespo
return denied
try:
return JSONResponse(server.get_stats())
except sqlite3.Error as e:
except sqlite3.Error:
logger.exception("admin_stats_database_error")
return JSONResponse(
{"error": {"type": "database_error", "message": f"Failed to retrieve stats: {e}"}},
{"error": {"type": "database_error", "message": "Failed to retrieve stats due to a database error"}},
status_code=503,
)

@@ -100,9 +101,10 @@ async def handle_admin_reset(request: Request, server: ChaosServer) -> JSONRespo
return denied
try:
new_run_id = server.reset()
except sqlite3.Error as e:
except sqlite3.Error:
logger.exception("admin_reset_database_error")
return JSONResponse(
{"error": {"type": "database_error", "message": f"Failed to reset metrics: {e}"}},
{"error": {"type": "database_error", "message": "Failed to reset metrics due to a database error"}},
status_code=503,
)
return JSONResponse({"status": "reset", "new_run_id": new_run_id})
@@ -114,8 +116,9 @@ async def handle_admin_export(request: Request, server: ChaosServer) -> JSONResp
return denied
try:
return JSONResponse(server.export_metrics())
except sqlite3.Error as e:
except sqlite3.Error:
logger.exception("admin_export_database_error")
return JSONResponse(
{"error": {"type": "database_error", "message": f"Failed to export metrics: {e}"}},
{"error": {"type": "database_error", "message": "Failed to export metrics due to a database error"}},
status_code=503,
)
2 changes: 1 addition & 1 deletion src/errorworks/engine/config_loader.py
@@ -36,7 +36,7 @@ def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]
if key in result and isinstance(result[key], dict) and isinstance(value, dict):
result[key] = deep_merge(result[key], value)
else:
result[key] = value
result[key] = copy.deepcopy(value)
return result
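
Why the `deepcopy` matters: without it, the merged dict aliases the override's nested containers, so a later mutation of one silently mutates the other. A standalone reproduction (the full function body is reconstructed for illustration, not copied from the module):

```python
import copy
from typing import Any

def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge override into base without sharing containers."""
    result = dict(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)  # the fix: copy, don't alias
    return result

override = {"server": {"hosts": ["a"]}}
merged = deep_merge({"server": {"port": 80}}, override)
override["server"]["hosts"].append("b")  # mutate the source after merging
# merged["server"]["hosts"] is unaffected thanks to the deepcopy
```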


18 changes: 12 additions & 6 deletions src/errorworks/engine/injection_engine.py
@@ -30,9 +30,14 @@
class InjectionEngine:
"""Burst state machine + priority/weighted error selection.

Thread-safe for burst state management. The RNG is not thread-safe —
callers are expected to snapshot the engine reference per-request (see
config snapshot pattern) rather than sharing concurrent calls to select().
Thread-safe for burst state management. The RNG (``random.Random``) is
not inherently thread-safe, but this is acceptable because:
- ASGI servers (uvicorn) use a single-threaded event loop per worker
- Multi-worker mode forks processes, giving each its own RNG instance
- The config snapshot pattern prevents mid-request component swaps

If the engine is used from a multi-threaded context (e.g. sync endpoints
on a ThreadPoolExecutor), callers should provide per-thread engines.

The engine handles:
- Periodic burst windows (is the system currently in a burst?)
@@ -170,9 +175,10 @@ def _select_weighted(self, specs: list[ErrorSpec]) -> ErrorSpec | None:
if roll < threshold:
return spec

# Unreachable: roll < total_weight (guarded above) guarantees a match
# in the loop. Defensive return for static analysis / type checkers.
return None # pragma: no cover
# Defensive return for static analysis / type checkers.
# Reachable via floating-point accumulation edge cases where
# cumulative sum never reaches roll due to precision loss.
return None

def reset(self) -> None:
"""Reset the engine state (clears burst timing)."""
50 changes: 37 additions & 13 deletions src/errorworks/engine/metrics_store.py
@@ -513,35 +513,59 @@ def get_stats(self) -> dict[str, Any]:
stats["requests_by_status_code"] = {row[0]: row[1] for row in cursor.fetchall()}

if "latency_ms" in col_names:
cursor = conn.execute("SELECT AVG(latency_ms), MAX(latency_ms) FROM requests WHERE latency_ms IS NOT NULL")
cursor = conn.execute("SELECT AVG(latency_ms), MAX(latency_ms), COUNT(latency_ms) FROM requests WHERE latency_ms IS NOT NULL")
row = cursor.fetchone()

cursor = conn.execute("SELECT latency_ms FROM requests WHERE latency_ms IS NOT NULL ORDER BY latency_ms")
latencies = [r[0] for r in cursor.fetchall()]
avg_latency, max_latency, count = row[0], row[1], row[2]

p50_latency = None
p95_latency = None
p99_latency = None

if latencies:
p50_latency = latencies[max(0, min(math.ceil(len(latencies) * 0.50) - 1, len(latencies) - 1))]
p95_latency = latencies[max(0, min(math.ceil(len(latencies) * 0.95) - 1, len(latencies) - 1))]
p99_latency = latencies[max(0, min(math.ceil(len(latencies) * 0.99) - 1, len(latencies) - 1))]
if count > 0:
# Compute percentiles via SQL LIMIT/OFFSET (no Python-side data transfer)
for pct, name in ((0.50, "p50"), (0.95, "p95"), (0.99, "p99")):
offset = max(0, min(math.ceil(count * pct) - 1, count - 1))
pcursor = conn.execute(
"SELECT latency_ms FROM requests WHERE latency_ms IS NOT NULL ORDER BY latency_ms LIMIT 1 OFFSET ?",
(offset,),
)
prow = pcursor.fetchone()
if name == "p50":
p50_latency = prow[0]
elif name == "p95":
p95_latency = prow[0]
else:
p99_latency = prow[0]

stats["latency_stats"] = {
"avg_ms": row[0],
"avg_ms": avg_latency,
"p50_ms": p50_latency,
"p95_ms": p95_latency,
"p99_ms": p99_latency,
"max_ms": row[1],
"max_ms": max_latency,
}

return stats

def export_data(self) -> dict[str, Any]:
"""Export raw requests and time-series data."""
def export_data(
self,
*,
limit: int | None = None,
offset: int = 0,
) -> dict[str, Any]:
"""Export raw requests and time-series data.

Args:
limit: Maximum number of request rows to return (default: all).
offset: Number of request rows to skip (default: 0).
"""
conn = self._get_connection()
requests = [dict(row) for row in conn.execute("SELECT * FROM requests ORDER BY timestamp_utc")]
if limit is not None:
requests = [
dict(row) for row in conn.execute("SELECT * FROM requests ORDER BY timestamp_utc LIMIT ? OFFSET ?", (limit, offset))
]
else:
requests = [dict(row) for row in conn.execute("SELECT * FROM requests ORDER BY timestamp_utc")]
timeseries = [dict(row) for row in conn.execute("SELECT * FROM timeseries ORDER BY bucket_utc")]
return {
"run_id": self._run_id,
6 changes: 6 additions & 0 deletions src/errorworks/engine/types.py
@@ -259,6 +259,12 @@ def __post_init__(self) -> None:
if ts_dupes:
raise ValueError(f"Duplicate timeseries column names: {sorted(ts_dupes)}")

# Check for duplicate index names
idx_names = [name for name, _col in self.request_indexes]
idx_dupes = {n for n in idx_names if idx_names.count(n) > 1}
if idx_dupes:
raise ValueError(f"Duplicate index names: {sorted(idx_dupes)}")

# Validate that index columns reference actual request columns
req_name_set = set(req_names)
for index_name, col_name in self.request_indexes:
12 changes: 11 additions & 1 deletion src/errorworks/engine/validators.py
@@ -86,9 +86,19 @@ def validate_error_decision(
ValueError: If any invariant is violated.
"""
if error_type is None:
# Success case: no other fields should be set
# Success case: no error-related fields should be set
if category is not None:
raise ValueError("Success decision must not have a category")
if status_code is not None:
raise ValueError(f"Success decision must not have a status_code, got {status_code}")
if retry_after_sec is not None:
raise ValueError("Success decision must not have retry_after_sec")
if delay_sec is not None:
raise ValueError("Success decision must not have delay_sec")
if start_delay_sec is not None:
raise ValueError("Success decision must not have start_delay_sec")
if malformed_type is not None:
raise ValueError("Success decision must not have malformed_type")
return

if category is None:
45 changes: 33 additions & 12 deletions src/errorworks/llm/cli.py
@@ -372,17 +372,35 @@ def serve(
)
raise typer.Exit(1) from e

from errorworks.llm.server import create_app
if config.server.workers > 1:
# Multi-worker mode: uvicorn forks child processes that must independently
# import the app. Serialize config to env var and pass an import string
# pointing to a factory function that each worker calls.
import os

app = create_app(config)

uvicorn.run(
app,
host=config.server.host,
port=config.server.port,
workers=config.server.workers,
log_level="info",
)
os.environ["_ERRORWORKS_LLM_CONFIG"] = config.model_dump_json()
try:
uvicorn.run(
"errorworks.llm.server:_create_app_from_env",
factory=True,
host=config.server.host,
port=config.server.port,
workers=config.server.workers,
log_level="info",
)
finally:
os.environ.pop("_ERRORWORKS_LLM_CONFIG", None)
else:
from errorworks.llm.server import create_app

server_app = create_app(config)
uvicorn.run(
server_app,
host=config.server.host,
port=config.server.port,
workers=1,
log_level="info",
)


@app.command()
@@ -453,12 +471,15 @@ def show_config(
typer.secho(f"Configuration error: {e}", fg=typer.colors.RED, err=True)
raise typer.Exit(1) from e

config_dict = config.model_dump()
config_dict = config.model_dump(mode="json")

if output_format == "json":
typer.echo(json.dumps(config_dict, indent=2))
else:
elif output_format == "yaml":
typer.echo(yaml.dump(config_dict, default_flow_style=False, sort_keys=False))
else:
typer.secho(f"Error: unsupported format '{output_format}'. Use 'json' or 'yaml'.", fg=typer.colors.RED, err=True)
raise typer.Exit(1)


# MCP server CLI - separate entry point