## Description
This module handles all input parsing — CLI flags, prompt assembly, sequence/pipeline definitions, and shell command parsing. It's the most verbose module with significant duplication.
## Architecture

```
parse_cli_input()       → ParsedInput      (AI commands)
_parse_sh_input()       → ParsedShInput    (Shell commands)
build_ai_prompt()       → str              (Prompt assembly)
parse_sequence_steps()  → nested list      (Pipeline parsing)
    ├── smart_split_steps()        (split on ->)
    ├── smart_split_parallel()     (split on ||)
    ├── normalize_step()           (clean whitespace/comments)
    └── detect_parallel_block()    (detect [...])
```
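The flow above can be illustrated end to end. The following is a simplified, hypothetical sketch (`parse_pipeline_sketch` is invented for this illustration and uses naive `str.split`, so it ignores the quote/escape handling the real `smart_split_*` functions provide); it shows only the nested `list[list[list[str]]]` shape `parse_sequence_steps` is documented to return:

```python
import shlex

# Hypothetical sketch of the pipeline-parsing flow: split on "->",
# detect [...] blocks, split those on "||", then shlex-tokenize each branch.
# Simplified: naive str.split ignores quoted/escaped delimiters.
def parse_pipeline_sketch(text: str) -> list[list[list[str]]]:
    steps = []
    for raw_step in text.split("->"):
        step = raw_step.strip()
        if step.startswith("[") and step.endswith("]"):
            branches = [b.strip() for b in step[1:-1].split("||")]
        else:
            branches = [step]
        steps.append([shlex.split(branch) for branch in branches])
    return steps

result = parse_pipeline_sketch('@gemini "summarize" -> [@gpt "review" || @claude "critique"]')
# steps -> parallel branches -> tokens:
# [[['@gemini', 'summarize']], [['@gpt', 'review'], ['@claude', 'critique']]]
```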
## Critical Issues

### 1. `smart_split_steps` and `smart_split_parallel` are 95% identical

These two functions are character-by-character parsers that differ only in their delimiter (`->` vs `||`):

```python
# smart_split_steps:
if in_quote is None and ch == "-" and i + 1 < length and text[i + 1] == ">":

# smart_split_parallel:
if in_quote is None and ch == "|" and i + 1 < length and text[i + 1] == "|":
```

Everything else — escape handling, quote tracking, accumulation, final cleanup — is identical. This is a textbook case for extraction:
```python
def _smart_split(text: str, delimiter: str) -> list[str]:
    """Split text on a 2-char delimiter, respecting quotes and escapes."""
    assert len(delimiter) == 2
    d0, d1 = delimiter[0], delimiter[1]
    segments = []
    current = []
    in_quote = None
    i = 0
    length = len(text)
    while i < length:
        ch = text[i]
        if ch == "\\" and i + 1 < length:
            current.append(ch)
            current.append(text[i + 1])
            i += 2
            continue
        if ch in ('"', "'"):
            if in_quote is None:
                in_quote = ch
            elif in_quote == ch:
                in_quote = None
            current.append(ch)
            i += 1
            continue
        if in_quote is None and ch == d0 and i + 1 < length and text[i + 1] == d1:
            segments.append("".join(current).strip())
            current = []
            i += 2
            continue
        current.append(ch)
        i += 1
    if current:
        segments.append("".join(current).strip())
    return [s for s in segments if s]


def smart_split_steps(text: str) -> list[str]:
    return _smart_split(text, "->")


def smart_split_parallel(text: str) -> list[str]:
    return _smart_split(text, "||")
```

This eliminates ~50 lines of duplicated code.
### 2. `VALID_COMMANDS` is hardcoded, diverges from actual engine registry

```python
VALID_COMMANDS = {
    "gemini", "gpt", "claude", "grok", "local",
    "sh", "scrub", "flush", "efficient",
}
```

This set is defined inside `parse_sequence_steps` and duplicates the engine names from `config.py`/`engines.py`. If someone adds a new engine (e.g., "deepseek"), they must remember to update this set too. It should be derived dynamically:

```python
from .config import engines

VALID_COMMANDS = set(engines.keys()) | {"sh", "scrub", "flush", "efficient"}
```

However, there's a timing issue: `parse_sequence_steps` could be called before `initialize_engines()` populates the engines dict. The fix is to compute this at call time, not import time:
```python
def parse_sequence_steps(editor_content: str) -> list[list[list[str]]] | None:
    valid_commands = set(engines.keys()) | {"sh", "scrub", "flush", "efficient"}
    ...
```

### 3. `_parse_sh_input` silently allows duplicate `-r` and `-w` flags
```python
if parsed.run_file is not None:
    print("[!] @sh: -r specified more than once.")
    # ... but continues anyway, overwriting the value
parsed.run_file = parts[i + 1]
```

The warning is printed but the previous value is silently overwritten. Compare with `parse_cli_input`, which does the same for `-w`:

```python
if parsed.write_file is not None:
    print("[!] Warning: write flag specified multiple times. Overwriting.")
parsed.write_file = parts[i + 1]
```

Both parsers should either reject duplicates (return None) or explicitly document that last-wins is intentional.
## Significant Issues

### 4. `parse_cli_input` uses indices_to_skip pattern — fragile and hard to follow

The parser tracks which indices have been consumed via a set:

```python
indices_to_skip = {0}
# ... later:
indices_to_skip.update({i, i + 1})
# ... finally:
a1_tokens = [parts[j] for j in range(len(parts)) if j not in indices_to_skip]
```

This means bare tokens are collected by exclusion: whatever wasn't consumed by a flag. This works, but it's error-prone. If a new flag is added and the developer forgets to add its indices to `indices_to_skip`, those tokens leak into `a1`.
A cleaner approach:

```python
bare_tokens = []
i = 1
while i < len(parts):
    token = parts[i]
    if token in ("-r", "--read"):
        ...
        i += 2
    elif token in ("-e", "--edit"):
        ...
        i += 1
    else:
        bare_tokens.append(token)
        i += 1
parsed.a1 = " ".join(bare_tokens)
```

This is what `_parse_sh_input` already does with `bare_tokens`. The two parsers use different patterns for the same problem.
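For comparison, a runnable version of the bare_tokens pattern. `parse_flags` and its two flags are invented for this sketch; the real parser has more flags and error handling:

```python
def parse_flags(parts: list[str]) -> tuple[dict, str]:
    """Sketch: collect bare tokens by inclusion rather than index exclusion."""
    flags = {"read": None, "edit": False}
    bare_tokens = []
    i = 1                                   # parts[0] is the command itself
    while i < len(parts):
        token = parts[i]
        if token in ("-r", "--read") and i + 1 < len(parts):
            flags["read"] = parts[i + 1]    # flag consumes its own argument
            i += 2
        elif token in ("-e", "--edit"):
            flags["edit"] = True
            i += 1
        else:
            bare_tokens.append(token)       # anything unclaimed is a bare token
            i += 1
    return flags, " ".join(bare_tokens)
```

New flags are added as new branches; nothing can leak into the bare tokens by omission.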
### 5. `build_ai_prompt` raises RuntimeError but callers catch Exception

```python
# build_ai_prompt:
raise RuntimeError(f"Error reading input file '(unknown)': {e}")

# handle_ai_interaction (in handlers.py):
try:
    prompt_main = build_ai_prompt(parsed, editor_content)
except Exception as e:
    safe_print(f"[!] {e}")
```

Using `RuntimeError` for a file-not-found condition is too broad. A custom exception or `FileNotFoundError`/`IOError` would be more precise.
### 6. Escaped characters are preserved literally in `smart_split_*`

```python
if ch == "\\" and i + 1 < length:
    current.append(ch)          # keeps the backslash
    current.append(text[i + 1])
```

The backslash is not consumed — it's passed through to the output. So `\->` becomes the literal string `\->` in the step text, which `shlex.split` later processes. This may or may not be the desired behavior, but it means escape handling is split across two layers (this parser and shlex), making it hard to reason about.
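The two-layer behavior can be demonstrated with shlex alone, assuming the splitter passes `\->` through verbatim as the snippet above shows:

```python
import shlex

# The smart splitter leaves "\->" verbatim in the segment text; shlex then
# consumes the backslash, so the escape is only resolved at the second layer.
segment = r'echo \-> done'               # what smart_split_steps would emit
tokens = shlex.split(segment)
assert tokens == ["echo", "->", "done"]  # backslash gone, "->" is literal text
```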
### 7. `normalize_step` strips comments naively

```python
if not stripped or stripped.startswith("#"):
    continue
```

This strips full-line comments but not inline comments:

```
@gemini "prompt" # this comment is NOT stripped
```

The `# this comment...` would become part of the command tokens. Either document this limitation or handle inline comments (being careful about `#` inside quotes).
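If inline comments should be handled, a quote-aware sketch could look like this (`strip_inline_comment` is a hypothetical helper, and it ignores backslash escapes for brevity):

```python
def strip_inline_comment(line: str) -> str:
    """Drop an unquoted '#' and everything after it."""
    in_quote = None
    for i, ch in enumerate(line):
        if ch in ('"', "'"):
            if in_quote is None:
                in_quote = ch       # opening quote
            elif in_quote == ch:
                in_quote = None     # matching closing quote
        elif ch == "#" and in_quote is None:
            return line[:i].rstrip()
    return line
```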
### 8. `normalize_step` collapse loop is O(n²)

```python
while "  " in normalized:
    normalized = normalized.replace("  ", " ")
```

For very long texts, this repeatedly scans and replaces. A single regex is cleaner:

```python
normalized = re.sub(r" {2,}", " ", normalized)
```

## Moderate Issues
### 9. `detect_parallel_block` doesn't handle nested brackets

```python
if stripped.startswith("[") and stripped.endswith("]"):
    return True, stripped[1:-1].strip()
```

Input like `[@gemini "[test]" || @gpt "next"]` looks risky, but since the check inspects only the first and last characters, even unquoted inner brackets don't confuse it:

```
[@sh "echo ]" || @gpt "next"]
```

`endswith("]")` never looks at the inner `]`, so this case works. The real failure is trailing text:

```
[@sh "echo test" || @gpt "next"] extra_text
```

This isn't detected as parallel because the line doesn't end with `]`. Falling back to sequential parsing is arguably correct behavior, but there's no error message: the malformed block is silently reinterpreted as a sequential step.
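If the silent fallthrough is undesirable, the check could warn on trailing text. A sketch, assuming the current implementation is just the two-line check above:

```python
def detect_parallel_block(stripped: str) -> tuple[bool, str]:
    if stripped.startswith("[") and stripped.endswith("]"):
        return True, stripped[1:-1].strip()
    if stripped.startswith("[") and "]" in stripped:
        # Looks like a parallel block with trailing junk: say so instead of
        # silently reparsing it as a sequential step.
        print(f"[!] Trailing text after ']'; treating step as sequential: {stripped!r}")
    return False, stripped
```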
### 10. Type annotation uses `list[str]` in @dataclass default

```python
read_files: list[str] = None
```

The builtin-generic annotation `list[str]` is evaluated at class-definition time and requires Python 3.9+ (or `from __future__ import annotations` on older versions). The `__post_init__` workaround for the mutable default is correct, but the standard pattern is:

```python
from dataclasses import dataclass, field

@dataclass
class ParsedInput:
    read_files: list[str] = field(default_factory=list)
```

This eliminates the need for `__post_init__` entirely.
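A quick demonstration of why `default_factory` matters (simplified to a single field; the real `ParsedInput` has more):

```python
from dataclasses import dataclass, field

@dataclass
class ParsedInput:                  # simplified sketch: one field only
    read_files: list[str] = field(default_factory=list)

a = ParsedInput()
b = ParsedInput()
a.read_files.append("notes.txt")
assert b.read_files == []           # each instance gets its own fresh list
```

With a shared mutable default (or a `None` sentinel patched in `__post_init__`), this isolation has to be maintained by hand; `default_factory` gives it for free.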
## Comment Density Problem

This file has the worst comment-to-code ratio. Nearly every line is commented:

```python
steps = []          # List to hold split steps
current = []        # List for the current step being built
in_quote = None     # Track if currently inside a quote
i = 0               # Index for iteration
length = len(text)  # Length of the text
```

These comments add zero information. The variable names already say everything. Removing them would make the file ~30% shorter and significantly more readable.
## Summary Table

| Severity | Issue | Location |
|---|---|---|
| 🔴 Critical | smart_split_steps/smart_split_parallel are near-identical | Both functions |
| 🟠 High | VALID_COMMANDS hardcoded, diverges from engine registry | parse_sequence_steps() |
| 🟠 High | Inconsistent parsing patterns (indices_to_skip vs bare_tokens) | parse_cli_input vs _parse_sh_input |
| 🟡 Medium | Duplicate flags warned but not rejected | _parse_sh_input, parse_cli_input |
| 🟡 Medium | Escaped chars preserved literally (split responsibility) | smart_split_* |
| 🟡 Medium | Comments stripped only at line level, not inline | normalize_step() |
| 🟡 Medium | O(n²) whitespace collapse | normalize_step() |
| 🟢 Low | RuntimeError for file I/O errors | build_ai_prompt() |
| 🟢 Low | Mutable default via __post_init__ instead of field() | ParsedInput |
| 🟢 Low | Extreme comment density | Throughout |