This module implements the AI engine abstraction layer — the core of the multi-AI system. Here's a thorough review.
Architecture
AIEngine (ABC)
├── GeminiEngine — Google genai SDK
├── OpenAIEngine — OpenAI-compatible (GPT, Grok, Local/Ollama)
└── ClaudeEngine — Anthropic SDK
All three share a common pattern:
call() → trim history → build messages → loop with auto-continue → update history → return
Key Design Pattern: Auto-Continue
Every engine implements a truncation recovery loop — if the model hits its output token limit, the engine automatically re-prompts to continue. This is a genuinely useful feature, but the implementation has issues.
Critical Issues
1. Duplicate initialize_engines() — config.py vs engines.py
There are two initialize_engines() functions: one in config.py and one in engines.py. They do the same thing differently:
| Aspect | config.py version | engines.py version |
|---|---|---|
| Gemini SDK | google.generativeai (old) | google.genai (new) |
| GeminiEngine args | 2 args (no client) | 3 args (with client) |
| max_tokens_key | Not passed for grok/local | Passed per-engine |
This is a conflict. Whichever runs last wins, but the config.py version would crash because GeminiEngine.__init__ requires 3 arguments (name, model_name, client), yet config.py only passes 2.
2. Gemini system prompt sent as "role": "model"
```python
if self.system_prompt:
    contents.append({"role": "model", "parts": [{"text": self.system_prompt}]})
```
This pretends the system prompt is a model response, which is semantically wrong. The Gemini API supports a dedicated system_instruction parameter in the generation config. This hack can cause the model to behave unpredictably.
Fix:
```python
response = self.client.models.generate_content(
    model=self.model_name,
    contents=contents,  # without system prompt
    config={
        "max_output_tokens": self.max_output_tokens,
        "system_instruction": self.system_prompt or None,
    },
)
```
3. GeminiEngine error handling doesn't wrap in AIError
```python
# GeminiEngine.call()
except Exception as e:
    logger.error(f"Gemini Error: {e}")
    raise  # ← raw exception
```
```python
# OpenAIEngine.call()
except Exception as e:
    logger.error(f"{self.name} API Error: {e}")
    raise AIError(f"{self.name} error: {e}")  # ← wrapped
```
This inconsistency means callers can't uniformly catch AIError for all engines.
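A small wrapper makes Gemini consistent with the other engines. A minimal sketch; AIError is restated here so the example is self-contained (the real class lives in the project), and the helper name is illustrative:

```python
# Sketch: wrap raw Gemini exceptions in AIError so callers can catch one
# exception type for every engine.
class AIError(Exception):
    """Uniform error type raised by all engines."""

def wrapped_call(raw_call, engine_name: str = "Gemini"):
    try:
        return raw_call()
    except AIError:
        raise  # already wrapped by a lower layer
    except Exception as e:
        # Chain the original exception so the traceback is preserved.
        raise AIError(f"{engine_name} error: {e}") from e
```

Re-raising AIError untouched avoids double-wrapping when a lower layer already produced one.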
4. Gemini auto-continue heuristic is fragile
````python
if answer_chunk.count("```") % 2 == 1:
    return True
if answer_chunk.rstrip().endswith((",", ":", "(", "[", "{")):
    return True
````
These heuristics check answer_chunk (the latest chunk), not full_answer. On round 2+, this only examines the continuation fragment, not the accumulated text. An odd number of triple-backticks in a chunk doesn't necessarily mean truncation — the opening backtick could be in a previous chunk.
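A hedged fix sketch: run the same two heuristics over the accumulated answer instead of the latest chunk (the helper name looks_truncated is illustrative, not from the codebase):

````python
# Sketch: evaluate truncation signals on full_answer, so backtick parity
# is counted across all auto-continue rounds rather than per chunk.
def looks_truncated(full_answer: str) -> bool:
    # An odd number of ``` fences in the whole answer means an unclosed
    # code block, which is only a reliable signal when counted globally.
    if full_answer.count("```") % 2 == 1:
        return True
    # Text that stops mid-expression is the other signal from the review.
    if full_answer.rstrip().endswith((",", ":", "(", "[", "{")):
        return True
    return False
````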
5. _trim_history is called twice per call()
```python
def call(self, prompt: str) -> str:
    self._trim_history()  # ← before building messages
    # ... API call ...
    self.history.append(...)
    self.history.append(...)
    self._trim_history()  # ← after appending
```
The first trim is redundant since history is always trimmed at the end of each call. Even when scrub() or load_persona() are interleaved, the first trim is still unnecessary, since those methods reset history entirely.
Moderate Issues
6. Decaying tail_chars only in Gemini
```python
# GeminiEngine only:
eff_tail_chars = max(300, int(tail_chars * (0.8 ** (round_idx - 1))))
```
OpenAI and Claude use a fixed tail_chars across rounds. Either document this inconsistency as intentional or unify it.
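One way to unify it, as a sketch: a single shared helper that all three engines call. The helper name and keyword defaults are illustrative; the 0.8 factor and 300-character floor are the values from the Gemini snippet above.

```python
# Sketch: one shared decay rule so every engine treats tail_chars the
# same way across auto-continue rounds.
def effective_tail_chars(tail_chars: int, round_idx: int,
                         decay: float = 0.8, floor: int = 300) -> int:
    # Round 1 uses tail_chars unchanged; later rounds shrink
    # geometrically but never drop below the floor.
    return max(floor, int(tail_chars * decay ** (round_idx - 1)))
```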
7. No timeout configuration
None of the API calls specify timeouts. A hung API call will block indefinitely. All three SDKs support timeout parameters.
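All three clients accept a client-level timeout at construction time (openai.OpenAI(timeout=...), anthropic.Anthropic(timeout=...), and google-genai's HttpOptions(timeout=...); verify the exact parameter against the SDK versions you pin). As an SDK-agnostic fallback, any blocking call can also be bounded with a stdlib thread pool; a sketch:

```python
# Sketch: bound a blocking SDK call with a thread-based timeout. The
# hung worker is not killed; this only unblocks the caller so it can
# fail fast or retry.
import concurrent.futures

# One shared pool: shutting a pool down per call would wait on the hung
# worker and defeat the timeout.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn, *args, timeout_s: float = 60.0, **kwargs):
    """Run fn in a worker thread; raise TimeoutError if it overruns."""
    future = _pool.submit(fn, *args, **kwargs)
    return future.result(timeout=timeout_s)
```

Client-level timeouts remain preferable where available, since the SDK can actually abort the underlying HTTP request.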
8. Claude passes empty string for system when no persona is set
```python
system=self.system_prompt if self.system_prompt else "",
```
The Anthropic API accepts system as optional. Passing an empty string is wasteful at best and could subtly influence behavior. Better:
```python
kwargs = {"model": self.model_name, "max_tokens": self.max_tokens, "messages": messages}
if self.system_prompt:
    kwargs["system"] = self.system_prompt
response = self.client.messages.create(**kwargs)
```
Suggested Refactoring: Extract the Auto-Continue Loop
All three engines duplicate the same loop structure. This can be extracted:
```python
class AIEngine(ABC):
    def call(self, prompt: str) -> str:
        self._trim_history()
        messages = self._build_messages(prompt)
        full_answer = self._auto_continue_loop(messages)
        self.history.append({"role": "user", "content": prompt})
        self.history.append({"role": "assistant", "content": full_answer})
        self._trim_history()
        return full_answer

    def _auto_continue_loop(self, messages) -> str:
        full_answer = ""
        for round_idx in range(1, self._max_rounds + 1):
            chunk, truncated = self._single_call(messages)
            full_answer += chunk
            if not truncated:
                break
            messages = self._append_continue(messages, chunk, full_answer, round_idx)
        else:
            full_answer += "\n\n[TRUNCATED: auto-continue limit reached]\n"
        return full_answer

    @abstractmethod
    def _build_messages(self, prompt: str) -> list: ...

    @abstractmethod
    def _single_call(self, messages) -> tuple[str, bool]: ...

    @abstractmethod
    def _append_continue(self, messages, chunk, full, round_idx) -> list: ...
```
This would eliminate roughly 60% of the duplicated code across the three engine classes.
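To illustrate, a minimal sketch of a subclass driving the extracted loop. FakeEngine and its canned chunks are invented for the demo; the loop body is restated from the proposal so the example is self-contained.

```python
from abc import ABC, abstractmethod

class AIEngine(ABC):
    _max_rounds = 5  # illustrative cap on auto-continue rounds

    def _auto_continue_loop(self, messages) -> str:
        full_answer = ""
        for round_idx in range(1, self._max_rounds + 1):
            chunk, truncated = self._single_call(messages)
            full_answer += chunk
            if not truncated:
                break
            messages = self._append_continue(messages, chunk, full_answer, round_idx)
        else:
            full_answer += "\n\n[TRUNCATED: auto-continue limit reached]\n"
        return full_answer

    @abstractmethod
    def _single_call(self, messages) -> tuple[str, bool]: ...

    @abstractmethod
    def _append_continue(self, messages, chunk, full, round_idx) -> list: ...

class FakeEngine(AIEngine):
    """Replies in two chunks to simulate one truncated round."""

    def __init__(self):
        self._chunks = [("Hello, ", True), ("world!", False)]

    def _single_call(self, messages):
        return self._chunks.pop(0)

    def _append_continue(self, messages, chunk, full, round_idx):
        return messages + [{"role": "user", "content": "Continue."}]
```

A subclass only supplies the transport-specific hooks; the loop, history handling, and truncation banner live once in the base class.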
Summary
| Severity | Issue | Location |
|---|---|---|
| 🔴 Critical | Duplicate & conflicting initialize_engines() | config.py vs engines.py |
| 🔴 Critical | System prompt as model role in Gemini | GeminiEngine.call() |
| 🟠 High | Inconsistent error wrapping (AIError) | GeminiEngine.call() |
| 🟠 High | Fragile truncation heuristics on chunk, not full | _hit_output_limit() |
| 🟡 Medium | No API timeouts | All engines |
| 🟡 Medium | Duplicated auto-continue loop | All three call() methods |
| 🟢 Low | Double _trim_history() calls | All engines |
| 🟢 Low | Empty string system prompt to Claude | ClaudeEngine.call() |