feat: activate inference feedback loop #14
Merged
Conversation
…s back

Phase 1 — Critical write-back:
- inference-tuner.ts: addKeywordMapping() is no longer a no-op. It reads routing-mappings.json, checks for conflicts, appends new keyword mappings, and writes back to disk. This single change activates the entire 3000-line analytics pipeline as an active learning system.
- inference-tuner.ts: wired routingRefiner into performTuning(). The tuning cycle now consumes refiner suggestions and applies them.
- inference-tuner.ts: imported the routingRefiner singleton.

Phase 2 — Runtime feedback:
- agent-delegator.ts: when top agent confidence < 0.85, consults predictiveAnalytics.predictSync() for a historically better routing suggestion. Logs the refinement and promotes the predicted agent.
- predictive-analytics.ts: added a predictSync() method for hot-path usage (no disk reload; operates on in-memory data).

Phase 3 — Dead code and stubs:
- predictive-analytics.ts: replaced an 11-line stub with a 190-line implementation. predict() uses keyword overlap plus historical success rate; predictOptimalAgent() returns the best agent with >= 3 samples.
- kernel/ directory: deleted. Standalone package with zero imports.
- package.json: removed the strray-analytics bin and analytics:daily scripts (both pointed to non-existent files).
- kernel-patterns.ts: learn() now writes to the this.assumptions and this.cascades Maps instead of the empty this.patterns Map. Includes a confidence increment (+0.05 on match, capped at 1.0) and decay (-0.02 on miss, floored at 0.1).

Phase 4 — Data quality:
- rule-registry.ts: addRule() is now idempotent — silently updates on duplicates instead of throwing. Fixes the codex-1 duplicate-registration error that fired 5x per test run.
- rule-registry.test.ts: updated tests for the idempotent behavior.
- outcome-tracker.ts: getPromptData() now computes real complexity (description length / 5, capped at 100) and extracts keywords (words > 3 chars, deduplicated, max 10) instead of returning 0/[].

All 2399 tests pass. Zero TS errors.
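The Phase 1 write-back can be sketched as a load–check–append–persist cycle. The function below is an illustration only: the KeywordMapping shape, the file path parameter, and the conflict rule (existing keyword wins) are assumptions, since the real method lives on the inference tuner and its schema is not shown in the PR.

```typescript
import * as fs from "fs";

// Hypothetical shape of one routing-mappings.json entry (an assumption;
// the real schema is not shown in the PR).
interface KeywordMapping {
  keyword: string;
  agent: string;
  confidence: number;
}

// Minimal sketch of the write-back: load existing mappings, reject on
// conflict, append the new mapping, persist to disk.
export function addKeywordMapping(
  filePath: string,
  mapping: KeywordMapping
): boolean {
  let mappings: KeywordMapping[] = [];
  if (fs.existsSync(filePath)) {
    mappings = JSON.parse(fs.readFileSync(filePath, "utf8"));
  }
  // Conflict check: an existing mapping for the same keyword wins.
  if (mappings.some((m) => m.keyword === mapping.keyword)) {
    return false;
  }
  mappings.push(mapping);
  fs.writeFileSync(filePath, JSON.stringify(mappings, null, 2));
  return true;
}
```

Returning a boolean instead of throwing on conflict keeps the tuning cycle able to skip already-learned keywords without exception handling on the hot path.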
OpenCode plugin (strray-codex-injection.ts):
- Added a module-level tool-call counter.
- After every tool.execute.after hook, increments the counter.
- Every 100 calls, dynamically imports inferenceTuner and runs a single tuning cycle (fire-and-forget, non-blocking).

Hermes plugin (__init__.py):
- Added an _INFERENCE_TUNE_INTERVAL = 100 counter.
- After every post_tool_call hook, checks the threshold.
- Shells out to npx strray-ai inference:tuner --run-once in a background daemon thread (30s timeout).
- Logs the result to activity.log.
- The counter resets on session_start.

Both plugins now auto-calibrate the routing feedback loop without manual intervention. 127 test files, 2399 tests green.
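The OpenCode side of this trigger reduces to a modulo counter plus a fire-and-forget promise. A minimal sketch, assuming illustrative names (TUNE_INTERVAL, onToolExecuteAfter, and the injected runTuningCycle callback are not the real plugin API):

```typescript
// Interval after which a tuning cycle is kicked off (mirrors the PR's
// every-100-calls behavior; the constant name is an assumption).
const TUNE_INTERVAL = 100;
let toolCallCount = 0;

// Called from the tool.execute.after hook. Returns true when a tuning
// cycle was triggered, which also makes the logic easy to test.
export function onToolExecuteAfter(
  runTuningCycle: () => Promise<void>
): boolean {
  toolCallCount += 1;
  if (toolCallCount % TUNE_INTERVAL !== 0) return false;
  // Fire-and-forget: never block the tool-call hot path on tuning.
  runTuningCycle().catch(() => {
    // Tuning failures are swallowed here; they must not surface to the tool call.
  });
  return true;
}
```

Injecting runTuningCycle as a parameter stands in for the dynamic import described above and keeps the counter logic testable in isolation.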
The inference tuner was dry — only the MCP orchestrator recorded outcomes, so the auto-tune at call #100 always hit the 'insufficient data' guard. Normal tool calls (write, edit, search, etc.) never fed into the analytics pipeline.

OpenCode plugin (strray-codex-injection.ts):
- Added TOOL_AGENT_MAP: maps tool names (write, edit, bash, search, read, glob, grep, ls) to agent/skill identifiers.
- After every tool.execute.after, imports routingOutcomeTracker and records the outcome with the tool name, args description, agent/skill mapping, confidence, and success status.

Hermes plugin (__init__.py):
- Added _TOOL_AGENT_MAP: the same mapping for Hermes tool names (write_file, patch, execute_code, terminal, search_files, etc.).
- Added _record_tool_outcome(): writes directly to logs/framework/routing-outcomes.json (same format as the TS tracker).
- Called from _on_post_tool_call after error detection.
- Circular buffer: keeps the last 1000 outcomes.
- Supports wildcard patterns (browser_*).

Both plugins now feed real data into the analytics pipeline. By call #100, the tuner has ~100 outcomes to analyze. Instance-level tuning is fully functional. Upstream tuning (sending calibration data to Jelly) still requires the Jelly API — tracked separately. 127 test files, 2399 tests green.
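Two of the mechanisms above — wildcard tool lookup and the 1000-entry circular buffer — can be sketched briefly. The mapping values below are illustrative placeholders, not the real agent/skill identifiers:

```typescript
// Illustrative tool→agent mapping; real identifiers are assumptions.
const TOOL_AGENT_MAP: Record<string, string> = {
  write: "code-lead/editing",
  edit: "code-lead/editing",
  bash: "testing-lead/execution",
  "browser_*": "research-lead/browsing", // wildcard pattern
};

// Exact match first, then prefix wildcards like "browser_*".
export function resolveAgent(toolName: string): string | undefined {
  if (TOOL_AGENT_MAP[toolName]) return TOOL_AGENT_MAP[toolName];
  for (const pattern of Object.keys(TOOL_AGENT_MAP)) {
    if (pattern.endsWith("*") && toolName.startsWith(pattern.slice(0, -1))) {
      return TOOL_AGENT_MAP[pattern];
    }
  }
  return undefined;
}

// Circular buffer: keep only the most recent `cap` outcomes, matching
// the last-1000 retention described in the PR.
export function appendOutcome<T>(buffer: T[], outcome: T, cap = 1000): T[] {
  buffer.push(outcome);
  return buffer.length > cap ? buffer.slice(buffer.length - cap) : buffer;
}
```

Unmapped tools return undefined so callers can skip recording rather than attribute an outcome to the wrong agent.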
Three changes to unblock the inference feedback loop:
1. determineAgents() now loads routing-mappings.json fresh on each call, keyword-matches against the operation string, and uses the learned mapping if confidence > 0.7. Falls back to the hardcoded mapping if nothing hits.
2. The predictive-analytics threshold dropped from 0.85 to 0.7 so the prediction layer actually fires instead of being suppressed by hardcoded high-confidence values.
3. Task-type classification added to outcome recording in both the OpenCode and Hermes plugins. Tool calls are now classified (testing, build, security, lint, git, etc.) instead of every terminal call being recorded as 'testing-lead/execution'. The RoutingOutcome interface gains an optional taskType field.

All backwards-compatible. 2399 tests passing, 5 pipelines green.
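Changes 1 and 3 above can be sketched together: a learned-mapping lookup gated on the 0.7 confidence threshold, and a first-match task-type classifier. The keyword lists, regexes, and category set are illustrative, not the real implementation:

```typescript
// Assumed shape of a learned entry from routing-mappings.json.
interface LearnedMapping {
  keyword: string;
  agent: string;
  confidence: number;
}

// Sketch of the learned-mapping path in determineAgents(): substring
// keyword match against the operation, gated on confidence > threshold,
// with a hardcoded fallback when nothing hits.
export function determineAgent(
  operation: string,
  learned: LearnedMapping[],
  fallback: string,
  threshold = 0.7
): string {
  const op = operation.toLowerCase();
  const hit = learned.find(
    (m) => op.includes(m.keyword) && m.confidence > threshold
  );
  return hit ? hit.agent : fallback;
}

// First-match task-type classifier for terminal commands; patterns and
// the "general" default are placeholders.
export function classifyTaskType(command: string): string {
  const rules: [RegExp, string][] = [
    [/\b(jest|vitest|pytest|test)\b/, "testing"],
    [/\b(build|tsc|webpack)\b/, "build"],
    [/\b(audit|snyk)\b/, "security"],
    [/\b(eslint|lint)\b/, "lint"],
    [/\bgit\b/, "git"],
  ];
  for (const [re, type] of rules) {
    if (re.test(command)) return type;
  }
  return "general";
}
```

Rule order matters in the classifier: testing patterns are checked before git so that a command like "git bisect run npm test" still lands in testing, which is usually the more useful label for routing.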
Summary
The analytics pipeline (3000+ lines across 15 files) was a one-way pipe — it collected data, analyzed it, and generated suggestions, but could never write anything back. This PR closes the loop.
What Changed
Phase 1: Critical write-back (the big one)
Phase 2: Runtime feedback
Phase 3: Dead code and stubs
Phase 4: Data quality
Test Results