Analysis of `parsers.py`

This module handles all input parsing — CLI flags, prompt assembly, sequence/pipeline definitions, and shell command parsing. It's the most verbose module with significant duplication.

---

## Architecture

```
parse_cli_input()     → ParsedInput      (AI commands)
_parse_sh_input()     → ParsedShInput    (Shell commands)
build_ai_prompt()     → str              (Prompt assembly)
parse_sequence_steps() → nested list     (Pipeline parsing)
  ├── smart_split_steps()    (split on ->)
  ├── smart_split_parallel() (split on ||)
  ├── normalize_step()       (clean whitespace/comments)
  └── detect_parallel_block() (detect [...])
```

---

## Critical Issues

### 1. **`smart_split_steps` and `smart_split_parallel` are 95% identical**

These two functions are character-by-character parsers that differ only in their delimiter (`->` vs `||`):

```python
# smart_split_steps:
if in_quote is None and ch == "-" and i + 1 < length and text[i + 1] == ">":

# smart_split_parallel:
if in_quote is None and ch == "|" and i + 1 < length and text[i + 1] == "|":
```

Everything else — escape handling, quote tracking, accumulation, final cleanup — is **identical**. This is a textbook case for extraction:

```python
def _smart_split(text: str, delimiter: str) -> list[str]:
    """Split text on a 2-char delimiter, respecting quotes and escapes."""
    assert len(delimiter) == 2
    d0, d1 = delimiter[0], delimiter[1]
    segments = []
    current = []
    in_quote = None
    i = 0
    length = len(text)

    while i < length:
        ch = text[i]

        if ch == "\\" and i + 1 < length:
            current.append(ch)
            current.append(text[i + 1])
            i += 2
            continue

        if ch in ('"', "'"):
            if in_quote is None:
                in_quote = ch
            elif in_quote == ch:
                in_quote = None
            current.append(ch)
            i += 1
            continue

        if in_quote is None and ch == d0 and i + 1 < length and text[i + 1] == d1:
            segments.append("".join(current).strip())
            current = []
            i += 2
            continue

        current.append(ch)
        i += 1

    if current:
        segments.append("".join(current).strip())

    return [s for s in segments if s]


def smart_split_steps(text: str) -> list[str]:
    return _smart_split(text, "->")

def smart_split_parallel(text: str) -> list[str]:
    return _smart_split(text, "||")
```

This eliminates ~50 lines of duplicated code.

### 2. **`VALID_COMMANDS` is hardcoded, diverges from actual engine registry**

```python
VALID_COMMANDS = {
    "gemini", "gpt", "claude", "grok", "local",
    "sh", "scrub", "flush", "efficient",
}
```

This set is defined **inside** `parse_sequence_steps` and duplicates the engine names from `config.py`/`engines.py`. If someone adds a new engine (e.g., `"deepseek"`), they must remember to update this set too. It should be derived dynamically:

```python
from .config import engines

VALID_COMMANDS = set(engines.keys()) | {"sh", "scrub", "flush", "efficient"}
```

**However**, there's a timing issue: `parse_sequence_steps` could be called before `initialize_engines()` populates the `engines` dict. The fix is to compute this at call time, not import time:

```python
def parse_sequence_steps(editor_content: str) -> list[list[list[str]]] | None:
    valid_commands = set(engines.keys()) | {"sh", "scrub", "flush", "efficient"}
    ...
```

### 3. **`_parse_sh_input` silently allows duplicate `-r` and `-w` flags**

```python
if parsed.run_file is not None:
    print("[!] @sh: -r specified more than once.")
# ... but continues anyway, overwriting the value
parsed.run_file = parts[i + 1]
```

The warning is printed but the previous value is silently overwritten. Compare with `parse_cli_input` which does the same for `-w`:

```python
if parsed.write_file is not None:
    print("[!] Warning: write flag specified multiple times. Overwriting.")
parsed.write_file = parts[i + 1]
```

Both parsers should either **reject** duplicates (return `None`) or explicitly document that last-wins is intentional.

---

## Significant Issues

### 4. **`parse_cli_input` uses `indices_to_skip` pattern — fragile and hard to follow**

The parser tracks which indices have been consumed via a set:

```python
indices_to_skip = {0}
# ... later:
indices_to_skip.update({i, i + 1})
# ... finally:
a1_tokens = [parts[j] for j in range(len(parts)) if j not in indices_to_skip]
```

This means bare tokens are collected by **exclusion** — whatever wasn't consumed by a flag. This works, but it's error-prone. If a new flag is added and the developer forgets to add indices to `indices_to_skip`, those tokens leak into `a1`.

A cleaner approach:

```python
bare_tokens = []
i = 1
while i < len(parts):
    token = parts[i]
    if token in ("-r", "--read"):
        ...
        i += 2
    elif token in ("-e", "--edit"):
        ...
        i += 1
    else:
        bare_tokens.append(token)
        i += 1

parsed.a1 = " ".join(bare_tokens)
```

This is what `_parse_sh_input` already does with `bare_tokens`. The two parsers use different patterns for the same problem.

### 5. **`build_ai_prompt` raises `RuntimeError` but callers catch `Exception`**

```python
# build_ai_prompt:
raise RuntimeError(f"Error reading input file '{filename}': {e}")

# handle_ai_interaction (in handlers.py):
try:
    prompt_main = build_ai_prompt(parsed, editor_content)
except Exception as e:
    safe_print(f"[!] {e}")
```

Using `RuntimeError` for a file-not-found condition is too broad. A custom exception or `FileNotFoundError` / `IOError` would be more precise.

### 6. **Escaped characters are preserved literally in `smart_split_*`**

```python
if ch == "\\" and i + 1 < length:
    current.append(ch)        # keeps the backslash
    current.append(text[i + 1])
```

The backslash is **not consumed** — it's passed through to the output. So `\->` becomes the literal string `\->` in the step text, which `shlex.split` later processes. This may or may not be the desired behavior, but it means escape handling is **split across two layers** (this parser and shlex), making it hard to reason about.

### 7. **`normalize_step` strips comments naively**

```python
if not stripped or stripped.startswith("#"):
    continue
```

This strips full-line comments but not inline comments:

```
@gemini "prompt"  # this comment is NOT stripped
```

The `# this comment...` would become part of the command tokens. Either document this limitation or handle inline comments (being careful about `#` inside quotes).

### 8. **`normalize_step` collapse loop is O(n²)**

```python
while "  " in normalized:
    normalized = normalized.replace("  ", " ")
```

For very long texts, this repeatedly scans and replaces. A single regex is cleaner:

```python
normalized = re.sub(r" {2,}", " ", normalized)
```

---

## Moderate Issues

### 9. **`detect_parallel_block` doesn't handle nested brackets**

```python
if stripped.startswith("[") and stripped.endswith("]"):
    return True, stripped[1:-1].strip()
```

Input like `[@gemini "[test]" || @gpt "next"]` would incorrectly match the inner `]` in `"[test]"` — but only if the quotes were stripped, which they aren't here since we're checking raw text. Still, unquoted brackets inside the block would break:

```
[@sh "echo ]" || @gpt "next"]
```

After `shlex.split` in the editor, `echo ]` would be a single token, but `detect_parallel_block` sees the first `]` and stops. However, since the check is `endswith("]")`, it actually only checks the **last** character, so this specific case works. The real failure would be:

```
[@sh "echo test" || @gpt "next"] extra_text
```

This wouldn't be detected as parallel because it doesn't end with `]`. That's actually correct behavior, but there's no error message — it falls through to sequential parsing.

### 10. **Type annotation uses `list[str]` in `@dataclass` default**

```python
read_files: list[str] = None
```

This works in Python 3.10+ but `list[str]` as a type hint in this position requires `from __future__ import annotations` in Python 3.9. The `__post_init__` workaround for the mutable default is correct, but the standard pattern is:

```python
from dataclasses import dataclass, field

@dataclass
class ParsedInput:
    read_files: list[str] = field(default_factory=list)
```

This eliminates the need for `__post_init__` entirely.

---

## Comment Density Problem

This file has the worst comment-to-code ratio. Nearly every line is commented:

```python
steps = []  # List to hold split steps
current = []  # List for the current step being built
in_quote = None  # Track if currently inside a quote
i = 0  # Index for iteration
length = len(text)  # Length of the text
```

These comments add zero information. The variable names already say everything. Removing them would make the file ~30% shorter and significantly more readable.

---

## Summary Table

| Severity | Issue | Location |
|----------|-------|----------|
| 🔴 Critical | `smart_split_steps`/`smart_split_parallel` are near-identical | Both functions |
| 🟠 High | `VALID_COMMANDS` hardcoded, diverges from engine registry | `parse_sequence_steps()` |
| 🟠 High | Inconsistent parsing patterns (`indices_to_skip` vs `bare_tokens`) | `parse_cli_input` vs `_parse_sh_input` |
| 🟡 Medium | Duplicate flags warned but not rejected | `_parse_sh_input`, `parse_cli_input` |
| 🟡 Medium | Escaped chars preserved literally (split responsibility) | `smart_split_*` |
| 🟡 Medium | Comments stripped only at line level, not inline | `normalize_step()` |
| 🟡 Medium | O(n²) whitespace collapse | `normalize_step()` |
| 🟢 Low | `RuntimeError` for file I/O errors | `build_ai_prompt()` |
| 🟢 Low | Mutable default via `__post_init__` instead of `field()` | `ParsedInput` |
| 🟢 Low | Extreme comment density | Throughout |

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Analysis of `parsers.py` #32

Architecture

Critical Issues

1. `smart_split_steps` and `smart_split_parallel` are 95% identical

2. `VALID_COMMANDS` is hardcoded, diverges from actual engine registry

3. `_parse_sh_input` silently allows duplicate `-r` and `-w` flags

Significant Issues

4. `parse_cli_input` uses `indices_to_skip` pattern — fragile and hard to follow

5. `build_ai_prompt` raises `RuntimeError` but callers catch `Exception`

6. **Escaped characters are preserved literally in `smart_split_*`**

7. `normalize_step` strips comments naively

8. `normalize_step` collapse loop is O(n²)

Moderate Issues

9. `detect_parallel_block` doesn't handle nested brackets

10. Type annotation uses `list[str]` in `@dataclass` default

Comment Density Problem

Summary Table

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Severity	Issue	Location
🔴 Critical	`smart_split_steps`/`smart_split_parallel` are near-identical	Both functions
🟠 High	`VALID_COMMANDS` hardcoded, diverges from engine registry	`parse_sequence_steps()`
🟠 High	Inconsistent parsing patterns (`indices_to_skip` vs `bare_tokens`)	`parse_cli_input` vs `_parse_sh_input`
🟡 Medium	Duplicate flags warned but not rejected	`_parse_sh_input`, `parse_cli_input`
🟡 Medium	Escaped chars preserved literally (split responsibility)	`smart_split_*`
🟡 Medium	Comments stripped only at line level, not inline	`normalize_step()`
🟡 Medium	O(n²) whitespace collapse	`normalize_step()`
🟢 Low	`RuntimeError` for file I/O errors	`build_ai_prompt()`
🟢 Low	Mutable default via `__post_init__` instead of `field()`	`ParsedInput`
🟢 Low	Extreme comment density	Throughout

Analysis of parsers.py #32

Description

Architecture

Critical Issues

1. smart_split_steps and smart_split_parallel are 95% identical

2. VALID_COMMANDS is hardcoded, diverges from actual engine registry

3. _parse_sh_input silently allows duplicate -r and -w flags

Significant Issues

4. parse_cli_input uses indices_to_skip pattern — fragile and hard to follow

5. build_ai_prompt raises RuntimeError but callers catch Exception

6. Escaped characters are preserved literally in smart_split_*

7. normalize_step strips comments naively

8. normalize_step collapse loop is O(n²)

Moderate Issues

9. detect_parallel_block doesn't handle nested brackets

10. Type annotation uses list[str] in @dataclass default

Comment Density Problem

Summary Table

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Analysis of `parsers.py` #32

1. `smart_split_steps` and `smart_split_parallel` are 95% identical

2. `VALID_COMMANDS` is hardcoded, diverges from actual engine registry

3. `_parse_sh_input` silently allows duplicate `-r` and `-w` flags

4. `parse_cli_input` uses `indices_to_skip` pattern — fragile and hard to follow

5. `build_ai_prompt` raises `RuntimeError` but callers catch `Exception`

6. **Escaped characters are preserved literally in `smart_split_*`**

7. `normalize_step` strips comments naively

8. `normalize_step` collapse loop is O(n²)

9. `detect_parallel_block` doesn't handle nested brackets

10. Type annotation uses `list[str]` in `@dataclass` default