Implement controlled activation for local steering inference provider #34

@mdheller

Goal

Implement a controlled local activation path for GPT-2 Small residual-stream SAE steering, using the already-registered gpt2-small.res-jb sourceset.

Current status

PR #38 landed the fail-closed preflight / real-path entrypoint tranche.

PR #39 landed the steering artifact receipt contract and pending fixture.

Issue #34 is not complete. It must remain open until a local smoke record with a real status: applied exists.

Landed by PR #38:

  • requirements-steering.txt optional runtime dependency list
  • agent-machine steer preflight --sourceset gpt2-small.res-jb
  • agent-machine steer serve --sourceset gpt2-small.res-jb --host 127.0.0.1 --port 8080
  • fail-closed not_configured behavior when dependencies/artifacts/receipts/admission are absent
  • docs/steering-activation-path.md
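The fail-closed behavior listed above can be pictured as a gate that reports not_configured unless every prerequisite is present. The following is a minimal sketch, not the code landed in PR #38: the names `Prerequisites` and `preflight` are hypothetical, the four flags simply mirror the dependencies/artifacts/receipts/admission wording, and the `ready` status is an assumed placeholder.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Prerequisites:
    """Hypothetical flags; the real preflight may check more than these."""
    dependencies: bool = False
    artifacts: bool = False
    receipts: bool = False
    admission: bool = False


def preflight(prereqs: Prerequisites) -> dict:
    """Return not_configured plus a diagnostic unless every flag is set."""
    missing = [f.name for f in fields(prereqs) if not getattr(prereqs, f.name)]
    if missing:
        # Fail closed: name what is absent and never fall through to "applied".
        return {"status": "not_configured", "missing": missing}
    # "ready" is an assumed placeholder, not a status defined in this issue.
    return {"status": "ready", "missing": []}


if __name__ == "__main__":
    print(preflight(Prerequisites(dependencies=True, artifacts=True)))
```

Defaulting every flag to `False` means a prerequisite that was never checked counts as missing, which is the point of failing closed.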

Landed by PR #39:

  • contracts/steering-artifact-receipt.schema.json
  • pending gpt2-small.res-jb artifact receipt fixture
  • validation wiring for SteeringArtifactReceipt
  • docs/steering-artifact-receipts.md
  • artifact receipt gate in docs/steering-activation-path.md

Still not landed:

  • complete artifact receipt with exact repo, file path, resolved revision, local path, size, and SHA-256 digest for every model/tokenizer/SAE file
  • storage receipt after artifact resolution
  • policy admission / agent-registry grant records
  • GPT-2 Small model loading
  • blocks.6.hook_resid_pre SAE loading
  • real activation injection
  • real baseline vs steered completion
  • local smoke record with status: applied
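The artifact receipt item above asks for a size and SHA-256 digest per file. A rough sketch of how one receipt entry could be produced and re-verified follows. The field names and the helpers `sha256_file`, `receipt_entry`, and `verify_entry` are illustrative assumptions; the authoritative shape is whatever contracts/steering-artifact-receipt.schema.json defines.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large weight files are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def receipt_entry(repo: str, file_path: str, revision: str, local_path: Path) -> dict:
    """Build one per-file entry; key names are assumed, not taken from the schema."""
    return {
        "repo": repo,
        "file_path": file_path,
        "resolved_revision": revision,
        "local_path": str(local_path),
        "size": local_path.stat().st_size,
        "sha256": sha256_file(local_path),
    }


def verify_entry(entry: dict) -> bool:
    """Re-check existence, size, and digest of the local file against the entry."""
    path = Path(entry["local_path"])
    return (
        path.is_file()
        and path.stat().st_size == entry["size"]
        and sha256_file(path) == entry["sha256"]
    )
```

Checking the size before the digest is a cheap first filter; the digest comparison is what actually establishes that the bytes match.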

Scope

This issue is GPT-2 Small only.

Gemma sourcesets are explicitly out of scope for closure because Gemma model access depends on operator acceptance of Google/Gemma terms. Gemma remains registered but blocked until terms/access and artifact readiness are verified separately.

Implement the path needed for:

agent-machine steer serve --sourceset gpt2-small.res-jb
POST /steer

The served endpoint must remain compatible with Noetica's existing local steering client shape:

NEURONPEDIA_BASE_URL=http://localhost:8080
Noetica /api/steer -> Agent Machine /steer

Required implementation work

  • Resolve the gpt2-small.res-jb sourceset from the SteeringSourceset registry.
  • Fetch or locate GPT-2 Small model artifacts from openai-community/gpt2 with digest verification.
  • Fetch or locate the residual-stream SAE artifact from jbloom/GPT2-Small-SAEs-Reformatted / SAELens release gpt2-small-res-jb / SAE id blocks.6.hook_resid_pre with digest verification.
  • Emit a storage receipt after successful artifact resolution.
  • Load GPT-2 Small.
  • Load the blocks.6.hook_resid_pre SAE.
  • Run a baseline completion.
  • Run a steered completion by injecting the requested SAE feature vector at the specified layer and strength.
  • Return a Noetica-compatible SteeringResult with status: applied, baseline, steered, diff_summary, feature_id, layer, and strength.
  • Preserve the stub path from Issue #32 (Define Neuronpedia-compatible local /steer endpoint contract) as a no-weights fallback.
  • Emit or stage evidence sufficient for a local smoke record.
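The injection and result steps above can be sketched in miniature. This assumes additive steering, where the feature's direction scaled by the strength is added to the residual vector at every token position; the issue does not spell out the exact formula, so treat that as an assumption. Plain Python lists stand in for tensors, and a real path would apply the same arithmetic inside a hook at blocks.6.hook_resid_pre. The `SteeringResult` field names come from the list above, but their types are guesses.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


def inject_feature(resid: List[Vector], direction: Vector, strength: float) -> List[Vector]:
    """Return resid with strength * direction added at every token position."""
    for row in resid:
        if len(row) != len(direction):
            raise ValueError("direction must match the residual width")
    return [[x + strength * d for x, d in zip(row, direction)] for row in resid]


@dataclass
class SteeringResult:
    """Field names follow the issue text; the types are assumptions."""
    status: str
    baseline: str
    steered: str
    diff_summary: str
    feature_id: int
    layer: str
    strength: float


if __name__ == "__main__":
    # Two token positions, residual width 2: toy numbers, not model activations.
    resid = [[1.0, 2.0], [0.0, -1.0]]
    print(inject_feature(resid, direction=[0.5, -0.5], strength=2.0))
    # prints [[2.0, 1.0], [1.0, -2.0]]
```

The function returns a new list rather than mutating its input, so the baseline activations stay available for the unsteered completion.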

Acceptance criteria

  • agent-machine steer serve --sourceset gpt2-small.res-jb starts the real local steering endpoint when required artifacts are present and verified.
  • agent-machine steer serve-stub ... remains available and returns not_configured or noop only.
  • If artifacts, storage receipts, grants, policy admission, or runtime dependencies are missing, the real serve path fails closed with a clear diagnostic and does not silently downgrade to a fake applied response.
  • A local smoke record is committed showing:
    • sourceset: gpt2-small.res-jb
    • feature index: 10200
    • layer: 6-res-jb
    • strength: 5
    • prompt
    • baseline text
    • steered text
    • status: applied
    • evidence hash / receipt reference
  • Noetica can set NEURONPEDIA_BASE_URL=http://localhost:8080, call /api/steer, and receive the applied response without code changes.
  • Native validation / CI passes.
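The smoke record fields listed above could be assembled and hashed roughly as follows. This is only a sketch: the key names, the helpers `evidence_hash`, `build_smoke_record`, and `verify_smoke_record`, and the choice of SHA-256 over canonical JSON as the evidence hash are all assumptions, since the issue does not define the record format. The prompt, texts, and status must come from a real run and are passed in rather than hard-coded.

```python
import hashlib
import json


def evidence_hash(record: dict) -> str:
    """SHA-256 over canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_smoke_record(prompt: str, baseline: str, steered: str, status: str) -> dict:
    """Assemble the record; fixed values are the ones named in the criteria."""
    record = {
        "sourceset": "gpt2-small.res-jb",
        "feature_index": 10200,
        "layer": "6-res-jb",
        "strength": 5,
        "prompt": prompt,
        "baseline": baseline,
        "steered": steered,
        "status": status,
    }
    # Hash the body first, then attach the hash as its own field.
    record["evidence_hash"] = evidence_hash(record)
    return record


def verify_smoke_record(record: dict) -> bool:
    """Recompute the hash over every field except evidence_hash itself."""
    body = {k: v for k, v in record.items() if k != "evidence_hash"}
    return record.get("evidence_hash") == evidence_hash(body)
```

Sorting keys before hashing keeps the digest stable regardless of insertion order, so a reviewer can recompute it from the committed record.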

Non-goals

  • Do not implement Gemma activation in this issue.
  • Do not bypass policy/grant/storage/evidence gates.
  • Do not store credentials or tokens in the repository.
  • Do not claim production readiness.
  • Do not remove the stub endpoint.
