Analysis of engines.py #29

@albatrosary

Description

This module implements the AI engine abstraction layer — the core of the multi-AI system. Here's a thorough review.


Architecture

```
AIEngine (ABC)
├── GeminiEngine    — Google genai SDK
├── OpenAIEngine    — OpenAI-compatible (GPT, Grok, Local/Ollama)
└── ClaudeEngine    — Anthropic SDK
```

All three share a common pattern:

call() → trim history → build messages → loop with auto-continue → update history → return

Key Design Pattern: Auto-Continue

Every engine implements a truncation recovery loop — if the model hits its output token limit, the engine automatically re-prompts to continue. This is a genuinely useful feature, but the implementation has issues.


Critical Issues

1. Duplicate initialize_engines() — config.py vs engines.py

There are two initialize_engines() functions: one in config.py and one in engines.py. They do the same thing differently:

| Aspect | config.py version | engines.py version |
|---|---|---|
| Gemini SDK | google.generativeai (old) | google.genai (new) |
| GeminiEngine args | 2 args (no client) | 3 args (with client) |
| max_tokens_key | Not passed for grok/local | Passed per-engine |

This is a conflict. Whichever runs last wins, but the config.py version would crash because GeminiEngine.__init__ requires 3 arguments (name, model_name, client), yet config.py only passes 2.
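A minimal reproduction of the crash, with the 3-argument signature assumed from the review's description of engines.py rather than copied from the file:

```python
# Hypothetical stand-in for the engines.py class; only the argument
# count matters for demonstrating the conflict.
class GeminiEngine:
    def __init__(self, name, model_name, client):
        self.name = name
        self.model_name = model_name
        self.client = client

# engines.py-style construction: fine.
ok = GeminiEngine("gemini", "some-model", client=object())

# config.py-style construction: TypeError, missing required 'client'.
try:
    GeminiEngine("gemini", "some-model")
except TypeError as exc:
    print("config.py version crashes:", exc)
```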

2. Gemini system prompt sent as "role": "model"

```python
if self.system_prompt:
    contents.append({"role": "model", "parts": [{"text": self.system_prompt}]})
```

This pretends the system prompt is a model response, which is semantically wrong. The Gemini API supports a dedicated system_instruction parameter in the generation config. This hack can cause the model to behave unpredictably.

Fix:

```python
response = self.client.models.generate_content(
    model=self.model_name,
    contents=contents,  # without system prompt
    config={
        "max_output_tokens": self.max_output_tokens,
        "system_instruction": self.system_prompt or None,
    },
)
```

3. GeminiEngine error handling doesn't wrap in AIError

```python
# GeminiEngine.call()
except Exception as e:
    logger.error(f"Gemini Error: {e}")
    raise  # ← raw exception

# OpenAIEngine.call()
except Exception as e:
    logger.error(f"{self.name} API Error: {e}")
    raise AIError(f"{self.name} error: {e}")  # ← wrapped
```

This inconsistency means callers can't uniformly catch AIError for all engines.
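A sketch of the fix: apply the same wrapping pattern OpenAIEngine already uses to the Gemini path. The AIError class and logger below are stand-ins assumed to match the ones in engines.py:

```python
import logging

logger = logging.getLogger(__name__)

class AIError(Exception):
    """Stand-in; assumed to match the AIError already defined in engines.py."""

def call_with_uniform_errors(raw_call, engine_name):
    # Same try/except shape as OpenAIEngine.call(), so every engine
    # surfaces failures as AIError.
    try:
        return raw_call()
    except Exception as e:
        logger.error(f"{engine_name} API Error: {e}")
        raise AIError(f"{engine_name} error: {e}") from e
```

With this in place, callers can wrap every engine in a single `except AIError` handler.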

4. Gemini auto-continue heuristic is fragile

````python
if answer_chunk.count("```") % 2 == 1:
    return True
if answer_chunk.rstrip().endswith((",", ":", "(", "[", "{")):
    return True
````

These heuristics check answer_chunk (the latest chunk), not full_answer. On round 2+, this only examines the continuation fragment, not the accumulated text. An odd number of triple-backticks in a chunk doesn't necessarily mean truncation — the opening backtick could be in a previous chunk.
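A sketch of a sturdier check that runs on the accumulated answer instead of the latest chunk. The function name and the finish_reason values are assumptions, not taken from engines.py:

```python
TRIPLE_BACKTICK = "`" * 3  # built at runtime to avoid a literal fence here

def hit_output_limit(full_answer: str, finish_reason: str = "") -> bool:
    # Prefer the API's own truncation signal when the SDK exposes one
    # (the "MAX_TOKENS" value here is an assumption).
    if finish_reason == "MAX_TOKENS":
        return True
    # Backtick parity over the whole answer, so a fence opened in an
    # earlier chunk is paired with its close in a later one.
    if full_answer.count(TRIPLE_BACKTICK) % 2 == 1:
        return True
    # A trailing opener still suggests a mid-expression cut-off.
    if full_answer.rstrip().endswith((",", ":", "(", "[", "{")):
        return True
    return False
```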

5. _trim_history is called twice per call()

```python
def call(self, prompt: str) -> str:
    self._trim_history()       # ← before building messages
    # ... API call ...
    self.history.append(...)
    self.history.append(...)
    self._trim_history()       # ← after appending
```

The first trim is redundant: history is already trimmed at the end of every call, and the only other methods that touch history between calls, scrub() and load_persona(), reset it entirely, so there is never anything extra to trim on entry.


Moderate Issues

6. Decaying tail_chars only in Gemini

```python
# GeminiEngine only:
eff_tail_chars = max(300, int(tail_chars * (0.8 ** (round_idx - 1))))
```

OpenAI and Claude use a fixed tail_chars across rounds. Either document this divergence as intentional or unify the behavior across all three engines.
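If unification is the goal, the Gemini formula could be lifted into a shared helper. The defaults below mirror the constants quoted above; the function name is illustrative:

```python
def effective_tail_chars(tail_chars: int, round_idx: int,
                         decay: float = 0.8, floor: int = 300) -> int:
    """Decay the continuation tail each round, never dropping below floor.

    round_idx is 1-based, matching the existing loop; round 1 gets the
    full tail, later rounds shrink geometrically.
    """
    return max(floor, int(tail_chars * decay ** (round_idx - 1)))
```

All three engines could then call this instead of each carrying (or omitting) its own copy of the formula.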

7. No timeout configuration

None of the API calls specify timeouts. A hung API call will block indefinitely. All three SDKs support timeout parameters.

8. Claude passes empty string for system when no persona is set

```python
system=self.system_prompt if self.system_prompt else "",
```

The Anthropic API accepts system as optional. Passing an empty string is wasteful at best and could subtly influence behavior. Better:

```python
kwargs = {"model": self.model_name, "max_tokens": self.max_tokens, "messages": messages}
if self.system_prompt:
    kwargs["system"] = self.system_prompt
response = self.client.messages.create(**kwargs)
```

Suggested Refactoring: Extract the Auto-Continue Loop

All three engines duplicate the same loop structure. This can be extracted:

```python
class AIEngine(ABC):
    def call(self, prompt: str) -> str:
        self._trim_history()
        messages = self._build_messages(prompt)
        full_answer = self._auto_continue_loop(messages)
        self.history.append({"role": "user", "content": prompt})
        self.history.append({"role": "assistant", "content": full_answer})
        self._trim_history()
        return full_answer

    def _auto_continue_loop(self, messages) -> str:
        full_answer = ""
        for round_idx in range(1, self._max_rounds + 1):
            chunk, truncated = self._single_call(messages)
            full_answer += chunk
            if not truncated:
                break
            messages = self._append_continue(messages, chunk, full_answer, round_idx)
        else:
            full_answer += "\n\n[TRUNCATED: auto-continue limit reached]\n"
        return full_answer

    @abstractmethod
    def _build_messages(self, prompt: str) -> list: ...

    @abstractmethod
    def _single_call(self, messages) -> tuple[str, bool]: ...

    @abstractmethod
    def _append_continue(self, messages, chunk, full, round_idx) -> list: ...
```

This would eliminate ~60% of the duplicated code across the three engine classes.
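The extracted loop can be exercised without any API by subclassing with a fake engine whose "API" returns canned (chunk, truncated) pairs. The base class below is a condensed, self-contained copy of the sketch above (history trimming omitted):

```python
from abc import ABC, abstractmethod

class AIEngine(ABC):
    _max_rounds = 5

    def __init__(self):
        self.history = []

    def call(self, prompt):
        messages = self._build_messages(prompt)
        full_answer = self._auto_continue_loop(messages)
        self.history.append({"role": "user", "content": prompt})
        self.history.append({"role": "assistant", "content": full_answer})
        return full_answer

    def _auto_continue_loop(self, messages):
        full_answer = ""
        for round_idx in range(1, self._max_rounds + 1):
            chunk, truncated = self._single_call(messages)
            full_answer += chunk
            if not truncated:
                break
            messages = self._append_continue(messages, chunk, full_answer, round_idx)
        else:
            full_answer += "\n\n[TRUNCATED: auto-continue limit reached]\n"
        return full_answer

    @abstractmethod
    def _build_messages(self, prompt): ...
    @abstractmethod
    def _single_call(self, messages): ...
    @abstractmethod
    def _append_continue(self, messages, chunk, full, round_idx): ...

# Fake engine: each _single_call pops the next scripted (chunk, truncated) pair.
class FakeEngine(AIEngine):
    def __init__(self, chunks):
        super().__init__()
        self._chunks = iter(chunks)

    def _build_messages(self, prompt):
        return [{"role": "user", "content": prompt}]

    def _single_call(self, messages):
        return next(self._chunks)

    def _append_continue(self, messages, chunk, full, round_idx):
        return messages + [{"role": "user", "content": "Continue."}]

engine = FakeEngine([("part one ", True), ("part two", False)])
print(engine.call("hi"))  # part one part two
```

The same fake-engine trick would make the auto-continue behavior unit-testable, which the current triplicated loops are not.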


Summary

| Severity | Issue | Location |
|---|---|---|
| 🔴 Critical | Duplicate & conflicting initialize_engines() | config.py vs engines.py |
| 🔴 Critical | System prompt as model role in Gemini | GeminiEngine.call() |
| 🟠 High | Inconsistent error wrapping (AIError) | GeminiEngine.call() |
| 🟠 High | Truncation heuristics check chunk, not full answer | _hit_output_limit() |
| 🟡 Medium | No API timeouts | All engines |
| 🟡 Medium | Duplicated auto-continue loop | All three call() methods |
| 🟢 Low | Double _trim_history() calls | All engines |
| 🟢 Low | Empty string system prompt to Claude | ClaudeEngine.call() |
