Skip to content

feat(notebook): add /tinker skill and api-exploration template for no…#154

Open
glassBead-tc wants to merge 1 commit intomainfrom
claude/thoughtbox-notebooks-workflow-w7nQk
Open

feat(notebook): add /tinker skill and api-exploration template for no…#154
glassBead-tc wants to merge 1 commit intomainfrom
claude/thoughtbox-notebooks-workflow-w7nQk

Conversation

@glassBead-tc
Copy link
Member

…tebook-first workflows

Addresses the "coding blind" problem where agents implement against APIs they've never actually called. Adds three interconnected features:

  1. api-exploration template: Structured notebook template with phases for assumption inventory, shape validation, edge case testing, and evidence export.

  2. /tinker skill: Guides agents through notebook-based API exploration before implementation. Records validated mental models as belief_snapshot thoughts.

  3. Server-initiated exploration via sampling: SamplingHandler.requestExploration() uses MCP sampling/createMessage to ask the agent's LLM to explore untested APIs in a notebook. The gateway appends non-blocking nudges when it detects implementation thoughts about unvalidated APIs.

  4. Workflow integration: /workflow pipeline now includes a conditional Stage 3.5 (Tinker) between Planning and Implementation that gates on notebook evidence.

Inspired by @lateinteraction's observation that coding agents should tinker at small scale in notebooks before writing complex scripts.

https://claude.ai/code/session_012WEeqRaye2Exoc6cURRyex

…tebook-first workflows

Addresses the "coding blind" problem where agents implement against APIs they've
never actually called. Adds three interconnected features:

1. **api-exploration template**: Structured notebook template with phases for
   assumption inventory, shape validation, edge case testing, and evidence export.

2. **/tinker skill**: Guides agents through notebook-based API exploration before
   implementation. Records validated mental models as belief_snapshot thoughts.

3. **Server-initiated exploration via sampling**: SamplingHandler.requestExploration()
   uses MCP sampling/createMessage to ask the agent's LLM to explore untested APIs
   in a notebook. The gateway appends non-blocking nudges when it detects implementation
   thoughts about unvalidated APIs.

4. **Workflow integration**: /workflow pipeline now includes a conditional Stage 3.5
   (Tinker) between Planning and Implementation that gates on notebook evidence.

Inspired by @lateinteraction's observation that coding agents should tinker at small
scale in notebooks before writing complex scripts.

https://claude.ai/code/session_012WEeqRaye2Exoc6cURRyex
@chatgpt-codex-connector
Copy link

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 9, 2026

Greptile Summary

This PR adds a notebook-first API exploration workflow to Thoughtbox, comprising a new /tinker skill, an api-exploration notebook template, a Stage 3.5 conditional gate in the /workflow pipeline, and server-side nudge logic in GatewayHandler that detects "coding blind" implementation thoughts and prompts agents to explore first.

The skill files, template, and workflow integration are well-designed. However, the core runtime implementation in gateway-handler.ts and sampling/handler.ts has three meaningful issues that should be resolved before merge:

  • exploredTargets is instance-scoped, not session-scoped — in a multi-agent server, one session's exploration suppresses nudges for all other sessions, inconsistent with how sessionsPrimed and sessionStages are managed.
  • SamplingHandler.requestExploration() is dead code — the method is implemented but never wired up to GatewayHandler, so the "server-initiated exploration via sampling" described in the PR description does not actually execute.
  • Null-target nudge loop — when extractApiTarget returns null (the common case for generic thought text), there is no suppression mechanism and the nudge fires on every qualifying thought indefinitely.

Confidence Score: 2/5

  • Not safe to merge as-is — the nudge system has a shared-state bug and unbounded-firing edge case, and the sampling integration is unwired.
  • The skill docs, template, and workflow table changes are solid. The runtime code in gateway-handler.ts has a cross-session contamination bug (instance-level Set vs session-keyed Map), an unbounded nudge loop when the target heuristic fails, and a public method that should be private. Additionally, the headline feature (server-initiated sampling exploration) is entirely unwired — requestExploration exists but is never called.
  • src/gateway/gateway-handler.ts and src/sampling/handler.ts require the most attention.

Important Files Changed

Filename Overview
src/gateway/gateway-handler.ts Adds exploration nudge system to handleThought. Key issues: exploredTargets is instance-scoped (not session-scoped), causing cross-session contamination; when extractApiTarget returns null nudges fire without bound; markExplored is unnecessarily public.
src/sampling/handler.ts Adds requestExploration method but it is never called from GatewayHandler — the server-initiated sampling-based exploration described in the PR is dead code.
.claude/skills/tinker/SKILL.md New /tinker skill with clear five-phase protocol for API exploration. Well-structured with anti-patterns and integration guidance. No issues.
.claude/skills/workflow/SKILL.md Adds conditional Stage 3.5 (Tinker) to workflow pipeline with clear gate conditions and skip logic for plans with no external dependencies.
src/notebook/operations.ts Extends the create notebook operation to accept 'api-exploration' as a valid template enum value. Clean, minimal change.
src/notebook/templates.generated.ts Adds api-exploration template content and registers it in AVAILABLE_TEMPLATES. Template structure is comprehensive and well-organized across five phases.
templates/api-exploration-template.src.md Source template for the api-exploration notebook. Matches the generated template in templates.generated.ts. No issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Agent submits thought via thoughtbox_gateway] --> B{result.isError?}
    B -- yes --> Z[Return result as-is]
    B -- no --> C[checkExplorationEvidence]
    C --> D{thoughtType is belief_snapshot\nor assumption_update?}
    D -- yes --> H[Skip nudge]
    D -- no --> E{Thought matches\nEXPLORATION_SIGNALS?}
    E -- yes --> H
    E -- no --> F{Thought matches\nIMPLEMENTATION_SIGNALS?}
    F -- no --> H
    F -- yes --> G{extractApiTarget returns\nnon-null AND target\nalready in exploredTargets?}
    G -- yes --> H
    G -- no --> I[Append text nudge to result.content]
    H --> J{thoughtType === belief_snapshot?}
    I --> J
    J -- yes --> K[extractApiTarget from thought\nif non-null → markExplored]
    J -- no --> L[Profile priming check]
    K --> L
    L --> M[Return result]

    subgraph DEAD_CODE ["⚠️ Never Called"]
        N[SamplingHandler.requestExploration]
    end

    I -.->|PR description implies\nthis triggers sampling| N
Loading

Last reviewed commit: b993e12

Comment on lines +211 to +212
/** Track APIs/libraries that have been explored in notebooks (prevents repeated nudges) */
private exploredTargets = new Set<string>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exploredTargets is instance-scoped, not session-scoped

exploredTargets is a single Set<string> shared across all MCP sessions on the same GatewayHandler instance. In a multi-agent environment, session A's exploration of, say, "stripe" will permanently suppress nudges for session B — even if session B has never explored Stripe at all.

Every other piece of per-session state in this class uses a Map keyed by session ID: sessionsPrimed, sessionStages. This Set should follow the same pattern to avoid cross-session contamination:

/** Track APIs/libraries that have been explored per session (prevents repeated nudges) */
private exploredTargets = new Map<string, Set<string>>();

Then in checkExplorationEvidence and markExplored, gate on the current mcpSessionId (which is already threaded through handleThought).

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/gateway-handler.ts
Line: 211-212

Comment:
**`exploredTargets` is instance-scoped, not session-scoped**

`exploredTargets` is a single `Set<string>` shared across all MCP sessions on the same `GatewayHandler` instance. In a multi-agent environment, session A's exploration of, say, `"stripe"` will permanently suppress nudges for session B — even if session B has never explored Stripe at all.

Every other piece of per-session state in this class uses a `Map` keyed by session ID: `sessionsPrimed`, `sessionStages`. This Set should follow the same pattern to avoid cross-session contamination:

```
/** Track APIs/libraries that have been explored per session (prevents repeated nudges) */
private exploredTargets = new Map<string, Set<string>>();
```

Then in `checkExplorationEvidence` and `markExplored`, gate on the current `mcpSessionId` (which is already threaded through `handleThought`).

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +186 to 242
async requestExploration(
target: string,
assumptions: string[],
context: ThoughtData[]
): Promise<string> {
const assumptionList = assumptions.length > 0
? `\n\nAssumptions detected (validate ALL of these):\n${assumptions.map((a, i) => `${i + 1}. ${a}`).join("\n")}`
: "";

const contextText = context.slice(-3)
.map((t) => `Thought ${t.thoughtNumber}: ${t.thought}`)
.join("\n\n");

const message = `${contextText ? contextText + "\n\n" : ""}You are about to implement code that depends on: ${target}

No notebook evidence was found validating your assumptions about this dependency.${assumptionList}

Create a Thoughtbox notebook using the api-exploration template and validate your mental model before proceeding. Start with: thoughtbox_gateway { operation: "notebook", args: { operation: "create", title: "${target} Exploration", language: "typescript", template: "api-exploration" } }`;

try {
const result = (await this.protocol.request(
{
method: "sampling/createMessage",
params: {
messages: [{
role: "user" as const,
content: {
type: "text" as const,
text: message,
},
}],
systemPrompt: EXPLORATION_SYSTEM_PROMPT,
modelPreferences: {
hints: [{ name: "claude-sonnet-4-5-20250929" }],
intelligencePriority: 0.8,
costPriority: 0.4,
},
includeContext: "thisServer",
maxTokens: 2000,
} as SamplingParams,
},
null as any
)) as SamplingResult;

return result.content.text;
} catch (error: any) {
if (error.code === -32601) {
// Client doesn't support sampling — degrade gracefully
throw error;
}
throw new Error(`Exploration sampling request failed: ${error.message}`);
}
}

// NOTE: Sampling capability detection happens automatically during MCP initialization
// handshake via client capabilities. No need for runtime probing.
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requestExploration is dead code — never called

SamplingHandler.requestExploration() is implemented here and the PR description says "The gateway appends non-blocking nudges when it detects implementation thoughts about unvalidated APIs" via this method. However, GatewayHandler never imports SamplingHandler, never instantiates it, and never calls requestExploration. The gateway's nudge is entirely the static text block in checkExplorationEvidence.

This means the "server-initiated exploration via sampling" described in the PR is not actually wired up. The sampling-based flow (system prompt, model preferences, includeContext: "thisServer") never fires. Either:

  1. The integration was deferred and the method should be removed or marked // TODO, or
  2. There should be a call to requestExploration from GatewayHandler.handleThought when a nudge is warranted (in addition to or instead of the text nudge).
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/sampling/handler.ts
Line: 186-242

Comment:
**`requestExploration` is dead code — never called**

`SamplingHandler.requestExploration()` is implemented here and the PR description says "The gateway appends non-blocking nudges when it detects implementation thoughts about unvalidated APIs" via this method. However, `GatewayHandler` never imports `SamplingHandler`, never instantiates it, and never calls `requestExploration`. The gateway's nudge is entirely the static text block in `checkExplorationEvidence`.

This means the "server-initiated exploration via sampling" described in the PR is not actually wired up. The sampling-based flow (system prompt, model preferences, `includeContext: "thisServer"`) never fires. Either:
1. The integration was deferred and the method should be removed or marked `// TODO`, or
2. There should be a call to `requestExploration` from `GatewayHandler.handleThought` when a nudge is warranted (in addition to or instead of the text nudge).

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +1218 to +1224
// Check if the target has already been explored in this session
// (Simple heuristic: extract likely API name from thought)
const target = this.extractApiTarget(thought);
if (target && this.exploredTargets.has(target.toLowerCase())) {
return null;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unbounded nudging when extractApiTarget returns null

When extractApiTarget returns null (which is common — many generic thoughts about integrations won't match the backtick/quote/capitalized-noun heuristics), the guard if (target && this.exploredTargets.has(...)) short-circuits to false. This means:

  1. Every implementation-signal thought with an unextractable target fires a nudge — potentially on every thought in a long planning session.
  2. The nudge degrades to \/tinker [target]`` (the literal placeholder), which is unhelpful.
  3. There is no way for the agent to suppress these nudges by recording a belief_snapshot, because handleThought also calls extractApiTarget to find what to mark, and if it returns null there too, nothing is ever marked.

Consider adding a session-level flag (e.g. "generic implementation nudge already sent this session") so at minimum the null-target case only fires once per session.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/gateway-handler.ts
Line: 1218-1224

Comment:
**Unbounded nudging when `extractApiTarget` returns `null`**

When `extractApiTarget` returns `null` (which is common — many generic thoughts about integrations won't match the backtick/quote/capitalized-noun heuristics), the guard `if (target && this.exploredTargets.has(...))` short-circuits to `false`. This means:

1. Every implementation-signal thought with an unextractable target fires a nudge — potentially on every thought in a long planning session.
2. The nudge degrades to `\`/tinker [target]\`` (the literal placeholder), which is unhelpful.
3. There is no way for the agent to suppress these nudges by recording a `belief_snapshot`, because `handleThought` also calls `extractApiTarget` to find what to mark, and if it returns `null` there too, nothing is ever marked.

Consider adding a session-level flag (e.g. "generic implementation nudge already sent this session") so at minimum the `null`-target case only fires once per session.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +1253 to +1255
markExplored(target: string): void {
this.exploredTargets.add(target.toLowerCase());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

markExplored should be private

This method is called only from within handleThought (line 631) and has no reason to be part of the public API. Exposing it publicly allows external callers to pre-suppress nudges for targets that have never been explored, which defeats the purpose of the exploration gate. Mark it private:

Suggested change
markExplored(target: string): void {
this.exploredTargets.add(target.toLowerCase());
}
private markExplored(target: string): void {
this.exploredTargets.add(target.toLowerCase());
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/gateway-handler.ts
Line: 1253-1255

Comment:
**`markExplored` should be `private`**

This method is called only from within `handleThought` (line 631) and has no reason to be part of the public API. Exposing it publicly allows external callers to pre-suppress nudges for targets that have never been explored, which defeats the purpose of the exploration gate. Mark it `private`:

```suggestion
  private markExplored(target: string): void {
    this.exploredTargets.add(target.toLowerCase());
  }
```

How can I resolve this? If you propose a fix, please make it concise.

@augmentcode
Copy link

augmentcode bot commented Mar 9, 2026

🤖 Augment PR Summary

Summary: This PR introduces a notebook-first “tinker before you implement” workflow to reduce mistakes from coding against unvalidated APIs.

Changes:

  • Added a new /tinker skill that guides agents through assumption inventory, notebook-based API exploration, and exporting evidence.
  • Introduced an api-exploration notebook template and exposed it via the notebook create operation’s template enum.
  • Extended the /workflow conductor documentation with a conditional Stage 3.5 (“Tinker”) between Planning and Implementation.
  • Added an exploration “nudge” system in GatewayHandler that detects implementation-oriented thoughts and appends a non-blocking suggestion to run /tinker.
  • Extended SamplingHandler with requestExploration() to request notebook exploration via MCP sampling (sampling/createMessage).

Technical Notes: The nudge system uses regex-based signals + a session-local explored-targets set, and the sampling exploration prompt instructs creation of a notebook using the new api-exploration template.

🤖 Was this summary useful? React with 👍 or 👎

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 3 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

return null;
}

return `\n---\n**Exploration nudge**: This thought mentions implementing against an external API/library, but no notebook evidence was found validating the assumed behavior. Consider running \`/tinker ${target || '[target]'}\` to validate your mental model in a notebook before writing production code. This catches shape mismatches, auth surprises, and undocumented edge cases early.`;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkExplorationEvidence()'s nudge says “no notebook evidence was found”, but the current logic only uses regex heuristics + an in-memory exploredTargets set (no check of exported .tinker/* files or persisted session thoughts). This could produce misleading nudges even when evidence exists but isn’t detected by these heuristics.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

* @returns The agent's exploration plan/acknowledgment
* @throws Error if sampling fails or is not supported
*/
async requestExploration(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can’t find any call sites for requestExploration() in the current codebase, so the “server-initiated exploration via sampling” behavior described in the PR may not actually trigger yet. If this is intended to be live in this PR, it likely needs wiring from the gateway/thought flow.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.


No notebook evidence was found validating your assumptions about this dependency.${assumptionList}

Create a Thoughtbox notebook using the api-exploration template and validate your mental model before proceeding. Start with: thoughtbox_gateway { operation: "notebook", args: { operation: "create", title: "${target} Exploration", language: "typescript", template: "api-exploration" } }`;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

target is interpolated into a JSON-like tool invocation inside the prompt (e.g., title: "${target} Exploration"); if target contains quotes/newlines, the suggested command can become invalid/confusing for the model to follow. Consider sanitizing/escaping target (or avoiding embedding it in a pseudo-JSON snippet) to keep the instruction robust.

Severity: low

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants