feat: activate inference feedback loop #14

Merged
htafolla merged 5 commits into master from fix/inference-feedback-loop-activation
Mar 29, 2026
Conversation

@htafolla
Owner

Summary

The analytics pipeline (3000+ lines across 15 files) was a one-way pipe — it collected data, analyzed it, and generated suggestions, but could never write anything back. This PR closes the loop.

What Changed

Phase 1: Critical write-back (the big one)

  • inference-tuner.ts: addKeywordMapping() was a no-op that returned false. Now reads routing-mappings.json, checks for agent conflicts, appends new keyword mappings, and writes back to disk. This single change turns the analytics pipeline from read-only observability into an active learning system.
  • inference-tuner.ts: Wired routingRefiner into performTuning(). Tuning cycles now consume refiner suggestions and apply them.
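A minimal sketch of what the write-back plausibly looks like, assuming routing-mappings.json holds a flat array of keyword-to-agent entries (the real schema and field names aren't shown in this PR, so the shape here is illustrative):

```typescript
import * as fs from "node:fs";

// Illustrative entry shape; not the actual routing-mappings.json schema.
interface KeywordMapping {
  keyword: string;
  agent: string;
  confidence: number;
}

// Read the mapping file, reject a keyword that already routes to a different
// agent (the conflict check), append new mappings, and write back to disk.
function addKeywordMapping(filePath: string, mapping: KeywordMapping): boolean {
  const mappings: KeywordMapping[] = fs.existsSync(filePath)
    ? JSON.parse(fs.readFileSync(filePath, "utf8"))
    : [];
  const existing = mappings.find((m) => m.keyword === mapping.keyword);
  if (existing && existing.agent !== mapping.agent) {
    return false; // conflict: keyword already mapped to another agent
  }
  if (!existing) {
    mappings.push(mapping);
    fs.writeFileSync(filePath, JSON.stringify(mappings, null, 2));
  }
  return true;
}
```

The conflict check is what keeps the loop safe: a learned mapping can reinforce an existing route but never silently hijack a keyword that already belongs to another agent.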

Phase 2: Runtime feedback

  • agent-delegator.ts: When the top agent's confidence is below 0.85, consults predictiveAnalytics.predictSync() for a historically better routing suggestion, logs the refinement, and promotes the predicted agent.
  • predictive-analytics.ts: Added predictSync() for hot-path usage (no disk reload).
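A sketch of what a synchronous, in-memory prediction could look like. The outcome shape and the scoring formula are assumptions extrapolated from the "keyword overlap + historical success rate" description, not the actual implementation:

```typescript
// Illustrative outcome shape; the real RoutingOutcome interface has more fields.
interface Outcome {
  agent: string;
  keywords: string[];
  success: boolean;
}

// Score each agent by keyword overlap with the prompt, weighted by its
// historical success rate, using only in-memory data (no disk reload).
function predictSync(
  promptKeywords: string[],
  outcomes: Outcome[],
): { agent: string; confidence: number } | null {
  const stats = new Map<string, { overlap: number; wins: number; total: number }>();
  for (const o of outcomes) {
    const s = stats.get(o.agent) ?? { overlap: 0, wins: 0, total: 0 };
    s.overlap += o.keywords.filter((k) => promptKeywords.includes(k)).length;
    s.wins += o.success ? 1 : 0;
    s.total += 1;
    stats.set(o.agent, s);
  }
  let best: { agent: string; confidence: number } | null = null;
  for (const [agent, s] of stats) {
    const confidence = Math.min(1, (s.overlap / s.total) * (s.wins / s.total));
    if (!best || confidence > best.confidence) best = { agent, confidence };
  }
  return best;
}
```

In the delegator, a function like this would only be consulted when the top agent's confidence falls below the threshold, promoting the predicted agent if it scores higher.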

Phase 3: Dead code and stubs

  • predictive-analytics.ts: Replaced 11-line stub with full implementation (190 lines). Uses keyword overlap + historical success rate for predictions.
  • kernel/: Deleted standalone package — zero imports from src/.
  • package.json: Removed dead strray-analytics bin and analytics:daily scripts.
  • kernel-patterns.ts: learn() now updates this.assumptions/this.cascades confidence instead of writing to empty this.patterns Map. Includes confidence increment and decay.

Phase 4: Data quality

  • rule-registry.ts: addRule() is now idempotent — silently updates on duplicate instead of throwing. Fixes codex-1 duplicate registration error.
  • outcome-tracker.ts: getPromptData() now computes real complexity and extracts keywords instead of returning 0/[].

Test Results

  • 127 test files, 2399 tests — all passing
  • Zero TypeScript errors

…s back

Phase 1 — Critical write-back:
- inference-tuner.ts: addKeywordMapping() no longer a no-op. Reads
  routing-mappings.json, checks for conflicts, appends new keyword
  mappings, writes back to disk. This single change activates the
  entire 3000-line analytics pipeline as an active learning system.
- inference-tuner.ts: wired routingRefiner into performTuning().
  Tuning cycle now consumes refiner suggestions and applies them.
- inference-tuner.ts: imported routingRefiner singleton.

Phase 2 — Runtime feedback:
- agent-delegator.ts: when top agent confidence < 0.85, consults
  predictiveAnalytics.predictSync() for a historically-better
  routing suggestion. Logs the refinement and promotes predicted
  agent.
- predictive-analytics.ts: added predictSync() method for hot-path
  usage (no disk reload, operates on in-memory data).

Phase 3 — Dead code and stubs:
- predictive-analytics.ts: replaced 11-line stub with 190-line
  implementation. predict() uses keyword overlap + historical
  success rate. predictOptimalAgent() returns best agent with
  >= 3 samples.
- kernel/ directory: deleted. Standalone package with zero imports.
- package.json: removed strray-analytics bin and analytics:daily
  scripts (pointed to non-existent files).
- kernel-patterns.ts: learn() now writes to this.assumptions and
  this.cascades Maps instead of empty this.patterns Map. Includes
  confidence increment (+0.05 on match, cap 1.0) and decay
  (-0.02 on miss, floor 0.1).

Phase 4 — Data quality:
- rule-registry.ts: addRule() is now idempotent — silently
  updates on duplicate instead of throwing. Fixes codex-1
  duplicate registration error that fired 5x per test run.
- rule-registry.test.ts: updated tests for idempotent behavior.
- outcome-tracker.ts: getPromptData() now computes real complexity
  (description length / 5, capped at 100) and extracts keywords
  (words > 3 chars, deduplicated, max 10) instead of returning 0/[].

All 2399 tests pass. Zero TS errors.

OpenCode plugin (strray-codex-injection.ts):
- Added module-level tool call counter
- After every tool.execute.after hook, increments counter
- Every 100 calls, dynamically imports inferenceTuner and runs
  a single tuning cycle (fire-and-forget, non-blocking)
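The counter logic above can be sketched as below. The tuning cycle is injected as a callback here to keep the example self-contained; in the plugin it is a dynamic import of inferenceTuner:

```typescript
// Module-level state: one counter per plugin instance.
const TUNE_INTERVAL = 100;
let toolCallCount = 0;

// Called from the tool.execute.after hook. Every 100th call kicks off a
// single tuning cycle, fire-and-forget, so the tool call is never blocked.
function onToolExecuteAfter(runTuningCycle: () => Promise<void>): void {
  toolCallCount++;
  if (toolCallCount % TUNE_INTERVAL === 0) {
    runTuningCycle().catch(() => {
      // tuning failures are non-fatal; the next interval retries
    });
  }
}
```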

Hermes plugin (__init__.py):
- Added _INFERENCE_TUNE_INTERVAL = 100 counter
- After every post_tool_call hook, checks threshold
- Shells out to npx strray-ai inference:tuner --run-once
  in a background daemon thread (30s timeout)
- Logs result to activity.log
- Counter resets on session_start

Both plugins now auto-calibrate the routing feedback loop
without manual intervention. 127 test files, 2399 tests green.

The inference tuner was dry — only the MCP orchestrator recorded
outcomes, so the auto-tune at call #100 always hit the 'insufficient
data' guard. Normal tool calls (write, edit, search, etc.) never
fed into the analytics pipeline.

OpenCode plugin (strray-codex-injection.ts):
- Added TOOL_AGENT_MAP: maps tool names (write, edit, bash, search,
  read, glob, grep, ls) to agent/skill identifiers
- After every tool.execute.after, imports routingOutcomeTracker and
  records the outcome with tool name, args description, agent/skill
  mapping, confidence, and success status

Hermes plugin (__init__.py):
- Added _TOOL_AGENT_MAP: same mapping for Hermes tool names
  (write_file, patch, execute_code, terminal, search_files, etc.)
- Added _record_tool_outcome(): writes directly to
  logs/framework/routing-outcomes.json (same format as TS tracker)
- Called from _on_post_tool_call after error detection
- Circular buffer: keeps last 1000 outcomes
- Supports wildcard patterns (browser_*)

Both plugins now feed real data into the analytics pipeline.
By call #100, the tuner has ~100 outcomes to analyze.
Instance-level tuning is fully functional.

Upstream tuning (sending calibration data to Jelly) still requires
the Jelly API — tracked separately. 127 test files, 2399 tests green.

Three changes to unblock the inference feedback loop:

1. determineAgents() now loads routing-mappings.json fresh each call,
   keyword-matches against the operation string, and uses the learned
   mapping if confidence > 0.7. Falls back to hardcoded if nothing hits.

2. Predictive analytics threshold dropped from 0.85 to 0.7 so the
   prediction layer actually fires instead of being suppressed by
   hardcoded high-confidence values.

3. Task-type classification added to both OpenCode and Hermes plugin
   outcome recording. Tool calls are now classified (testing, build,
   security, lint, git, etc.) instead of every terminal call being
   recorded as 'testing-lead/execution'. RoutingOutcome interface
   gains optional taskType field.
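Change 1 might look like this in outline, assuming a flat array schema for routing-mappings.json and substring keyword matching (both assumptions; the real determineAgents() handles multiple agents):

```typescript
import * as fs from "node:fs";

// Illustrative mapping entry; the actual file schema isn't shown in the diff.
interface LearnedMapping {
  keyword: string;
  agent: string;
  confidence: number;
}

// Reload routing-mappings.json fresh on every call, keyword-match against the
// operation string, use the learned agent when confidence > 0.7, and fall back
// to the hardcoded default otherwise.
function determineAgent(
  operation: string,
  mappingsPath: string,
  hardcodedDefault: string,
): string {
  let mappings: LearnedMapping[] = [];
  try {
    mappings = JSON.parse(fs.readFileSync(mappingsPath, "utf8"));
  } catch {
    return hardcodedDefault; // no learned data yet
  }
  const op = operation.toLowerCase();
  const hit = mappings.find((m) => m.confidence > 0.7 && op.includes(m.keyword));
  return hit ? hit.agent : hardcodedDefault;
}
```

Reloading on every call trades a small amount of I/O for immediacy: a mapping written back by the tuner takes effect on the very next delegation, with no cache invalidation to coordinate.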

All backwards-compatible. 2399 tests passing, 5 pipelines green.
@htafolla htafolla merged commit df304b8 into master Mar 29, 2026
6 of 7 checks passed
@htafolla htafolla deleted the fix/inference-feedback-loop-activation branch March 29, 2026 21:27