feat(notebook): add /tinker skill and api-exploration template for no… by glassBead-tc · Pull Request #154 · Kastalien-Research/thoughtbox

glassBead-tc · 2026-03-09T05:10:08Z

…tebook-first workflows

Addresses the "coding blind" problem where agents implement against APIs they've never actually called. Adds three interconnected features:

api-exploration template: Structured notebook template with phases for assumption inventory, shape validation, edge case testing, and evidence export.
/tinker skill: Guides agents through notebook-based API exploration before implementation. Records validated mental models as belief_snapshot thoughts.
Server-initiated exploration via sampling: SamplingHandler.requestExploration() uses MCP sampling/createMessage to ask the agent's LLM to explore untested APIs in a notebook. The gateway appends non-blocking nudges when it detects implementation thoughts about unvalidated APIs.
Workflow integration: /workflow pipeline now includes a conditional Stage 3.5 (Tinker) between Planning and Implementation that gates on notebook evidence.

Inspired by @lateinteraction's observation that coding agents should tinker at small scale in notebooks before writing complex scripts.

https://claude.ai/code/session_012WEeqRaye2Exoc6cURRyex

…tebook-first workflows Addresses the "coding blind" problem where agents implement against APIs they've never actually called. Adds three interconnected features: 1. **api-exploration template**: Structured notebook template with phases for assumption inventory, shape validation, edge case testing, and evidence export. 2. **/tinker skill**: Guides agents through notebook-based API exploration before implementation. Records validated mental models as belief_snapshot thoughts. 3. **Server-initiated exploration via sampling**: SamplingHandler.requestExploration() uses MCP sampling/createMessage to ask the agent's LLM to explore untested APIs in a notebook. The gateway appends non-blocking nudges when it detects implementation thoughts about unvalidated APIs. 4. **Workflow integration**: /workflow pipeline now includes a conditional Stage 3.5 (Tinker) between Planning and Implementation that gates on notebook evidence. Inspired by @lateinteraction's observation that coding agents should tinker at small scale in notebooks before writing complex scripts. https://claude.ai/code/session_012WEeqRaye2Exoc6cURRyex

chatgpt-codex-connector · 2026-03-09T05:10:12Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

greptile-apps · 2026-03-09T05:13:05Z

Greptile Summary

This PR adds a notebook-first API exploration workflow to Thoughtbox, comprising a new /tinker skill, an api-exploration notebook template, a Stage 3.5 conditional gate in the /workflow pipeline, and server-side nudge logic in GatewayHandler that detects "coding blind" implementation thoughts and prompts agents to explore first.

The skill files, template, and workflow integration are well-designed. However, the core runtime implementation in gateway-handler.ts and sampling/handler.ts has three meaningful issues that should be resolved before merge:

exploredTargets is instance-scoped, not session-scoped — in a multi-agent server, one session's exploration suppresses nudges for all other sessions, inconsistent with how sessionsPrimed and sessionStages are managed.
SamplingHandler.requestExploration() is dead code — the method is implemented but never wired up to GatewayHandler, so the "server-initiated exploration via sampling" described in the PR description does not actually execute.
Null-target nudge loop — when extractApiTarget returns null (the common case for generic thought text), there is no suppression mechanism and the nudge fires on every qualifying thought indefinitely.

Confidence Score: 2/5

Not safe to merge as-is — the nudge system has a shared-state bug and unbounded-firing edge case, and the sampling integration is unwired.
The skill docs, template, and workflow table changes are solid. The runtime code in gateway-handler.ts has a cross-session contamination bug (instance-level Set vs session-keyed Map), an unbounded nudge loop when the target heuristic fails, and a public method that should be private. Additionally, the headline feature (server-initiated sampling exploration) is entirely unwired — requestExploration exists but is never called.
src/gateway/gateway-handler.ts and src/sampling/handler.ts require the most attention.

Important Files Changed

Filename	Overview
src/gateway/gateway-handler.ts	Adds exploration nudge system to handleThought. Key issues: exploredTargets is instance-scoped (not session-scoped), causing cross-session contamination; when extractApiTarget returns null nudges fire without bound; markExplored is unnecessarily public.
src/sampling/handler.ts	Adds requestExploration method but it is never called from GatewayHandler — the server-initiated sampling-based exploration described in the PR is dead code.
.claude/skills/tinker/SKILL.md	New /tinker skill with clear five-phase protocol for API exploration. Well-structured with anti-patterns and integration guidance. No issues.
.claude/skills/workflow/SKILL.md	Adds conditional Stage 3.5 (Tinker) to workflow pipeline with clear gate conditions and skip logic for plans with no external dependencies.
src/notebook/operations.ts	Extends the create notebook operation to accept 'api-exploration' as a valid template enum value. Clean, minimal change.
src/notebook/templates.generated.ts	Adds api-exploration template content and registers it in AVAILABLE_TEMPLATES. Template structure is comprehensive and well-organized across five phases.
templates/api-exploration-template.src.md	Source template for the api-exploration notebook. Matches the generated template in templates.generated.ts. No issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Agent submits thought via thoughtbox_gateway] --> B{result.isError?}
    B -- yes --> Z[Return result as-is]
    B -- no --> C[checkExplorationEvidence]
    C --> D{thoughtType is belief_snapshot\nor assumption_update?}
    D -- yes --> H[Skip nudge]
    D -- no --> E{Thought matches\nEXPLORATION_SIGNALS?}
    E -- yes --> H
    E -- no --> F{Thought matches\nIMPLEMENTATION_SIGNALS?}
    F -- no --> H
    F -- yes --> G{extractApiTarget returns\nnon-null AND target\nalready in exploredTargets?}
    G -- yes --> H
    G -- no --> I[Append text nudge to result.content]
    H --> J{thoughtType === belief_snapshot?}
    I --> J
    J -- yes --> K[extractApiTarget from thought\nif non-null → markExplored]
    J -- no --> L[Profile priming check]
    K --> L
    L --> M[Return result]

    subgraph DEAD_CODE ["⚠️ Never Called"]
        N[SamplingHandler.requestExploration]
    end

    I -.->|PR description implies\nthis triggers sampling| N

_{Last reviewed commit: b993e12}

greptile-apps · 2026-03-09T05:13:08Z

src/gateway/gateway-handler.ts

+  /** Track APIs/libraries that have been explored in notebooks (prevents repeated nudges) */
+  private exploredTargets = new Set<string>();


exploredTargets is instance-scoped, not session-scoped

exploredTargets is a single Set<string> shared across all MCP sessions on the same GatewayHandler instance. In a multi-agent environment, session A's exploration of, say, "stripe" will permanently suppress nudges for session B — even if session B has never explored Stripe at all.

Every other piece of per-session state in this class uses a Map keyed by session ID: sessionsPrimed, sessionStages. This Set should follow the same pattern to avoid cross-session contamination:

/** Track APIs/libraries that have been explored per session (prevents repeated nudges) */ private exploredTargets = new Map<string, Set<string>>();

Then in checkExplorationEvidence and markExplored, gate on the current mcpSessionId (which is already threaded through handleThought).

Prompt To Fix With AI

This is a comment left during a code review. Path: src/gateway/gateway-handler.ts Line: 211-212 Comment: **`exploredTargets` is instance-scoped, not session-scoped** `exploredTargets` is a single `Set<string>` shared across all MCP sessions on the same `GatewayHandler` instance. In a multi-agent environment, session A's exploration of, say, `"stripe"` will permanently suppress nudges for session B — even if session B has never explored Stripe at all. Every other piece of per-session state in this class uses a `Map` keyed by session ID: `sessionsPrimed`, `sessionStages`. This Set should follow the same pattern to avoid cross-session contamination: ``` /** Track APIs/libraries that have been explored per session (prevents repeated nudges) */ private exploredTargets = new Map<string, Set<string>>(); ``` Then in `checkExplorationEvidence` and `markExplored`, gate on the current `mcpSessionId` (which is already threaded through `handleThought`). How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-09T05:13:09Z

src/sampling/handler.ts

+  async requestExploration(
+    target: string,
+    assumptions: string[],
+    context: ThoughtData[]
+  ): Promise<string> {
+    const assumptionList = assumptions.length > 0
+      ? `\n\nAssumptions detected (validate ALL of these):\n${assumptions.map((a, i) => `${i + 1}. ${a}`).join("\n")}`
+      : "";
+
+    const contextText = context.slice(-3)
+      .map((t) => `Thought ${t.thoughtNumber}: ${t.thought}`)
+      .join("\n\n");
+
+    const message = `${contextText ? contextText + "\n\n" : ""}You are about to implement code that depends on: ${target}
+
+No notebook evidence was found validating your assumptions about this dependency.${assumptionList}
+
+Create a Thoughtbox notebook using the api-exploration template and validate your mental model before proceeding. Start with: thoughtbox_gateway { operation: "notebook", args: { operation: "create", title: "${target} Exploration", language: "typescript", template: "api-exploration" } }`;
+
+    try {
+      const result = (await this.protocol.request(
+        {
+          method: "sampling/createMessage",
+          params: {
+            messages: [{
+              role: "user" as const,
+              content: {
+                type: "text" as const,
+                text: message,
+              },
+            }],
+            systemPrompt: EXPLORATION_SYSTEM_PROMPT,
+            modelPreferences: {
+              hints: [{ name: "claude-sonnet-4-5-20250929" }],
+              intelligencePriority: 0.8,
+              costPriority: 0.4,
+            },
+            includeContext: "thisServer",
+            maxTokens: 2000,
+          } as SamplingParams,
+        },
+        null as any
+      )) as SamplingResult;
+
+      return result.content.text;
+    } catch (error: any) {
+      if (error.code === -32601) {
+        // Client doesn't support sampling — degrade gracefully
+        throw error;
+      }
+      throw new Error(`Exploration sampling request failed: ${error.message}`);
+    }
+  }
+
  // NOTE: Sampling capability detection happens automatically during MCP initialization
  // handshake via client capabilities. No need for runtime probing.
 }


requestExploration is dead code — never called

SamplingHandler.requestExploration() is implemented here and the PR description says "The gateway appends non-blocking nudges when it detects implementation thoughts about unvalidated APIs" via this method. However, GatewayHandler never imports SamplingHandler, never instantiates it, and never calls requestExploration. The gateway's nudge is entirely the static text block in checkExplorationEvidence.

This means the "server-initiated exploration via sampling" described in the PR is not actually wired up. The sampling-based flow (system prompt, model preferences, includeContext: "thisServer") never fires. Either:

The integration was deferred and the method should be removed or marked // TODO, or

There should be a call to requestExploration from GatewayHandler.handleThought when a nudge is warranted (in addition to or instead of the text nudge).

Prompt To Fix With AI

This is a comment left during a code review. Path: src/sampling/handler.ts Line: 186-242 Comment: **`requestExploration` is dead code — never called** `SamplingHandler.requestExploration()` is implemented here and the PR description says "The gateway appends non-blocking nudges when it detects implementation thoughts about unvalidated APIs" via this method. However, `GatewayHandler` never imports `SamplingHandler`, never instantiates it, and never calls `requestExploration`. The gateway's nudge is entirely the static text block in `checkExplorationEvidence`. This means the "server-initiated exploration via sampling" described in the PR is not actually wired up. The sampling-based flow (system prompt, model preferences, `includeContext: "thisServer"`) never fires. Either: 1. The integration was deferred and the method should be removed or marked `// TODO`, or 2. There should be a call to `requestExploration` from `GatewayHandler.handleThought` when a nudge is warranted (in addition to or instead of the text nudge). How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-09T05:13:10Z

src/gateway/gateway-handler.ts

+    // Check if the target has already been explored in this session
+    // (Simple heuristic: extract likely API name from thought)
+    const target = this.extractApiTarget(thought);
+    if (target && this.exploredTargets.has(target.toLowerCase())) {
+      return null;
+    }
+


Unbounded nudging when extractApiTarget returns null

When extractApiTarget returns null (which is common — many generic thoughts about integrations won't match the backtick/quote/capitalized-noun heuristics), the guard if (target && this.exploredTargets.has(...)) short-circuits to false. This means:

Every implementation-signal thought with an unextractable target fires a nudge — potentially on every thought in a long planning session.

The nudge degrades to \/tinker [target]`` (the literal placeholder), which is unhelpful.

There is no way for the agent to suppress these nudges by recording a belief_snapshot, because handleThought also calls extractApiTarget to find what to mark, and if it returns null there too, nothing is ever marked.

Consider adding a session-level flag (e.g. "generic implementation nudge already sent this session") so at minimum the null-target case only fires once per session.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/gateway/gateway-handler.ts Line: 1218-1224 Comment: **Unbounded nudging when `extractApiTarget` returns `null`** When `extractApiTarget` returns `null` (which is common — many generic thoughts about integrations won't match the backtick/quote/capitalized-noun heuristics), the guard `if (target && this.exploredTargets.has(...))` short-circuits to `false`. This means: 1. Every implementation-signal thought with an unextractable target fires a nudge — potentially on every thought in a long planning session. 2. The nudge degrades to `\`/tinker [target]\`` (the literal placeholder), which is unhelpful. 3. There is no way for the agent to suppress these nudges by recording a `belief_snapshot`, because `handleThought` also calls `extractApiTarget` to find what to mark, and if it returns `null` there too, nothing is ever marked. Consider adding a session-level flag (e.g. "generic implementation nudge already sent this session") so at minimum the `null`-target case only fires once per session. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-09T05:13:11Z

src/gateway/gateway-handler.ts

+  markExplored(target: string): void {
+    this.exploredTargets.add(target.toLowerCase());
+  }


markExplored should be private

This method is called only from within handleThought (line 631) and has no reason to be part of the public API. Exposing it publicly allows external callers to pre-suppress nudges for targets that have never been explored, which defeats the purpose of the exploration gate. Mark it private:

Suggested change

markExplored(target: string): void {

this.exploredTargets.add(target.toLowerCase());

}

private markExplored(target: string): void {

this.exploredTargets.add(target.toLowerCase());

}

Prompt To Fix With AI

This is a comment left during a code review. Path: src/gateway/gateway-handler.ts Line: 1253-1255 Comment: **`markExplored` should be `private`** This method is called only from within `handleThought` (line 631) and has no reason to be part of the public API. Exposing it publicly allows external callers to pre-suppress nudges for targets that have never been explored, which defeats the purpose of the exploration gate. Mark it `private`: ```suggestion private markExplored(target: string): void { this.exploredTargets.add(target.toLowerCase()); } ``` How can I resolve this? If you propose a fix, please make it concise.

augmentcode · 2026-03-09T05:23:15Z

🤖 Augment PR Summary

Summary: This PR introduces a notebook-first “tinker before you implement” workflow to reduce mistakes from coding against unvalidated APIs.

Changes:

Added a new /tinker skill that guides agents through assumption inventory, notebook-based API exploration, and exporting evidence.
Introduced an api-exploration notebook template and exposed it via the notebook create operation’s template enum.
Extended the /workflow conductor documentation with a conditional Stage 3.5 (“Tinker”) between Planning and Implementation.
Added an exploration “nudge” system in GatewayHandler that detects implementation-oriented thoughts and appends a non-blocking suggestion to run /tinker.
Extended SamplingHandler with requestExploration() to request notebook exploration via MCP sampling (sampling/createMessage).

Technical Notes: The nudge system uses regex-based signals + a session-local explored-targets set, and the sampling exploration prompt instructs creation of a notebook using the new api-exploration template.

_{🤖 Was this summary useful? React with 👍 or 👎}

augmentcode

Review completed. 3 suggestions posted.

Comment augment review to trigger a new review at any time.

augmentcode · 2026-03-09T05:23:16Z

src/gateway/gateway-handler.ts

+      return null;
+    }
+
+    return `\n---\n**Exploration nudge**: This thought mentions implementing against an external API/library, but no notebook evidence was found validating the assumed behavior. Consider running \`/tinker ${target || '[target]'}\` to validate your mental model in a notebook before writing production code. This catches shape mismatches, auth surprises, and undocumented edge cases early.`;


checkExplorationEvidence()'s nudge says “no notebook evidence was found”, but the current logic only uses regex heuristics + an in-memory exploredTargets set (no check of exported .tinker/* files or persisted session thoughts). This could produce misleading nudges even when evidence exists but isn’t detected by these heuristics.

Severity: medium

_{🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.}

augmentcode · 2026-03-09T05:23:16Z

src/sampling/handler.ts

+   * @returns The agent's exploration plan/acknowledgment
+   * @throws Error if sampling fails or is not supported
+   */
+  async requestExploration(


I can’t find any call sites for requestExploration() in the current codebase, so the “server-initiated exploration via sampling” behavior described in the PR may not actually trigger yet. If this is intended to be live in this PR, it likely needs wiring from the gateway/thought flow.

Severity: medium

_{🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.}

augmentcode · 2026-03-09T05:23:16Z

src/sampling/handler.ts

+
+No notebook evidence was found validating your assumptions about this dependency.${assumptionList}
+
+Create a Thoughtbox notebook using the api-exploration template and validate your mental model before proceeding. Start with: thoughtbox_gateway { operation: "notebook", args: { operation: "create", title: "${target} Exploration", language: "typescript", template: "api-exploration" } }`;


target is interpolated into a JSON-like tool invocation inside the prompt (e.g., title: "${target} Exploration"); if target contains quotes/newlines, the suggested command can become invalid/confusing for the model to follow. Consider sanitizing/escaping target (or avoiding embedding it in a pseudo-JSON snippet) to keep the instruction robust.

Severity: low

_{🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.}

greptile-apps bot reviewed Mar 9, 2026

View reviewed changes

augmentcode bot reviewed Mar 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(notebook): add /tinker skill and api-exploration template for no…#154

feat(notebook): add /tinker skill and api-exploration template for no…#154
glassBead-tc wants to merge 1 commit intomainfrom
claude/thoughtbox-notebooks-workflow-w7nQk

glassBead-tc commented Mar 9, 2026

Uh oh!

chatgpt-codex-connector bot commented Mar 9, 2026

Uh oh!

greptile-apps bot commented Mar 9, 2026

Uh oh!

greptile-apps bot Mar 9, 2026

Uh oh!

greptile-apps bot Mar 9, 2026

Uh oh!

greptile-apps bot Mar 9, 2026

Uh oh!

greptile-apps bot Mar 9, 2026

Uh oh!

augmentcode bot commented Mar 9, 2026

Uh oh!

augmentcode bot left a comment

Uh oh!

augmentcode bot Mar 9, 2026

Uh oh!

augmentcode bot Mar 9, 2026

Uh oh!

augmentcode bot Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		/** Track APIs/libraries that have been explored in notebooks (prevents repeated nudges) */
		private exploredTargets = new Set<string>();


		No notebook evidence was found validating your assumptions about this dependency.${assumptionList}

		Create a Thoughtbox notebook using the api-exploration template and validate your mental model before proceeding. Start with: thoughtbox_gateway { operation: "notebook", args: { operation: "create", title: "${target} Exploration", language: "typescript", template: "api-exploration" } }`;

Conversation

glassBead-tc commented Mar 9, 2026

Uh oh!

chatgpt-codex-connector bot commented Mar 9, 2026

Uh oh!

greptile-apps bot commented Mar 9, 2026

Greptile Summary

Confidence Score: 2/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

augmentcode bot commented Mar 9, 2026

Uh oh!

augmentcode bot left a comment

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants