Summary
The Socratic discovery system includes full conversation history in each AI prompt. With MAX_DISCOVERY_QUESTIONS=20, this can grow to ~10,000 tokens just for history, plus prompt overhead.
Current Behavior
Each call to _generate_next_discovery_question() builds a prompt containing:
- Full conversation history (all Q&A pairs)
- Structured answers by topic
- Uncovered categories
- Socratic guidelines
Worst-case scenario:
- 20 turns × ~500 tokens/turn = ~10,000 tokens for history
- Plus ~1,000 tokens for prompt template and metadata
- Total: ~11,000 input tokens per question generation
Cost impact:
- At $3/M input tokens (Claude 3.5 Sonnet): ~$0.033 per call
- Full 20-question discovery: ~$0.66 in input tokens alone (an upper bound that treats every call as worst case; early calls are cheaper since history grows turn by turn. See the sanity check below.)
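A back-of-envelope check of these figures (all inputs are the rough estimates above, not measurements):

TURNS = 20
TOKENS_PER_TURN = 500        # rough size of one Q&A pair
TEMPLATE_OVERHEAD = 1_000    # prompt template and metadata
PRICE_PER_M_INPUT = 3.00     # USD per million input tokens (Claude 3.5 Sonnet)

prompt_tokens = TURNS * TOKENS_PER_TURN + TEMPLATE_OVERHEAD      # 11,000
cost_per_call = prompt_tokens / 1_000_000 * PRICE_PER_M_INPUT    # ~$0.033
session_upper_bound = TURNS * cost_per_call                      # ~$0.66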
Proposed Optimizations
Option 1: Conversation Summarization
After N turns (e.g., 10), condense the earlier turns into a short summary and keep only the recent turns verbatim:
if len(conversation_history) > 10:
    summary = self._summarize_turns(conversation_history[:10])
    recent_turns = conversation_history[10:]
    prompt += f"## Earlier Conversation Summary\n{summary}\n\n"
    prompt += "## Recent Conversation\n"
    for turn in recent_turns:
        ...
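The _summarize_turns helper is referenced above but not yet defined. A minimal sketch, assuming a hypothetical self._call_model(prompt) wrapper around whatever chat client the system already uses:

def _summarize_turns(self, turns: list[dict]) -> str:
    """Condense earlier Q&A turns into a short summary via one extra model call."""
    transcript = "\n".join(f"Q: {t['question']}\nA: {t['answer']}" for t in turns)
    prompt = (
        "Summarize the key facts and decisions from this discovery "
        "conversation in at most 150 tokens:\n\n" + transcript
    )
    # self._call_model is a hypothetical thin wrapper around the chat API
    return self._call_model(prompt)

Note that the summary itself costs one extra model call, but it can be cached and reused for every subsequent question in the session.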
Option 2: Truncate to Recent K Turns
Keep only the most recent K turns (e.g., 5-7) plus category summaries:
MAX_HISTORY_TURNS = 7
recent_history = conversation_history[-MAX_HISTORY_TURNS:]
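A sketch of how the truncated history could be combined with the per-topic structured answers the prompt already includes (structured_answers, _build_history_section, and the section headings are illustrative names, not existing code):

def _build_history_section(self, conversation_history, structured_answers):
    lines = ["## Answers So Far (by topic)"]
    for topic, answer in structured_answers.items():
        lines.append(f"- {topic}: {answer}")
    lines.append("\n## Recent Conversation")
    for turn in conversation_history[-MAX_HISTORY_TURNS:]:
        lines.append(f"Q: {turn['question']}\nA: {turn['answer']}")
    return "\n".join(lines)

Since the structured answers by topic still capture each turn's substance, dropping older verbatim turns mostly loses phrasing, not coverage.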
Option 3: Token Budget with tiktoken
Count tokens and truncate when approaching a limit:
import tiktoken

def _build_discovery_question_prompt(self, conversation_history):  # other params elided
    # tiktoken ships OpenAI encodings only; cl100k_base approximates Claude's tokenizer
    enc = tiktoken.get_encoding("cl100k_base")
    MAX_PROMPT_TOKENS = 4000
    current_tokens = 0
    included_turns = []
    # Add turns newest-first until the budget is reached, restoring original order
    for turn in reversed(conversation_history):
        turn_text = f"Q: {turn['question']}\nA: {turn['answer']}\n"
        turn_tokens = len(enc.encode(turn_text))
        if current_tokens + turn_tokens > MAX_PROMPT_TOKENS:
            break
        current_tokens += turn_tokens
        included_turns.insert(0, turn)
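One caveat: tiktoken has no Claude tokenizer, so cl100k_base counts are an approximation; keeping MAX_PROMPT_TOKENS comfortably below the real limit absorbs the error, and Anthropic's token-counting endpoint can be used when an exact count matters.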
Option 4: Semantic Deduplication
Use embeddings to identify and remove redundant information across turns.
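A minimal sketch of this idea, assuming the sentence-transformers package and an arbitrary small embedding model (both are illustrative choices, not project dependencies):

import numpy as np
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def dedupe_turns(turns: list[dict], threshold: float = 0.9) -> list[dict]:
    """Drop turns whose answers are near-duplicates of an already-kept answer."""
    embeddings = _model.encode(
        [t["answer"] for t in turns], normalize_embeddings=True
    )
    kept, kept_embeddings = [], []
    for turn, emb in zip(turns, embeddings):
        # Normalized vectors make the dot product equal to cosine similarity
        if all(float(np.dot(emb, k)) < threshold for k in kept_embeddings):
            kept.append(turn)
            kept_embeddings.append(emb)
    return kept

This adds an embedding dependency and per-question latency, which is worth weighing against the simpler options.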
Recommendation
Start with Option 2 (truncate to recent K turns): it is the simplest to implement and provides an immediate benefit. If question quality degrades, move to Option 1 (summarization) for better context preservation.
Acceptance Criteria
- Implement token optimization strategy
- Add token counting/logging to monitor usage (see the sketch after this list)
- Ensure question quality doesn't degrade with truncation
- Update documentation with token budget guidance
- Add tests for truncation/summarization logic
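For the counting/logging criterion, a minimal sketch (_log_prompt_size is a hypothetical helper; the chars/4 heuristic is a placeholder until a real tokenizer count is wired in):

import logging

logger = logging.getLogger(__name__)

def _log_prompt_size(self, prompt: str) -> None:
    # len(prompt) // 4 is a crude chars-per-token heuristic; replace with a
    # tiktoken or API-based count once a tokenizer is adopted
    approx_tokens = len(prompt) // 4
    logger.info(
        "discovery prompt: %d chars, ~%d tokens", len(prompt), approx_tokens
    )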
Priority
P3 - Nice-to-have optimization; not blocking for MVP
Related
- PR #257: feat(discovery): implement AI-powered Socratic questioning system (initial Socratic discovery implementation)
- Issue #258: feat(discovery): Improve category coverage detection with AI classification