Skip to content

Analysis of utils.py #33

@albatrosary

Description

@albatrosary

This is the utility belt — security, editor integration, console helpers, string manipulation. It's generally well-structured but has a few significant issues.


Architecture

utils.py
├── Security:     secure_resolve_path()
├── Editor:       open_editor_for_prompt()
├── Console:      safe_print(), clear_thinking_line(), print_welcome_banner()
├── Config:       _get_cfg_int()
├── AI helpers:   _make_continue_prompt(), _tail_of()
└── Extraction:   extract_code_block()

Critical Issues

1. secure_resolve_path has a path traversal bypass on certain platforms

if not os.path.commonpath([abs_base, target_path]) == abs_base:
    raise PermissionError(...)

os.path.commonpath raises ValueError on Windows if paths are on different drives:

os.path.commonpath(["C:\\data", "D:\\evil"])  # ValueError

This uncaught exception would crash the caller instead of raising PermissionError. More critically, on case-insensitive filesystems (macOS HFS+, Windows NTFS), os.path.commonpath does case-sensitive comparison, so:

abs_base = "/Users/data/Work_Data"
target   = "/Users/data/work_data/../../../etc/passwd"
# After abspath: "/etc/passwd"
# commonpath correctly catches this

That specific case works because abspath resolves ... But the real issue is symlinks:

# If /Users/data/work_data/link -> /etc
filename = "link/passwd"
target_path = os.path.abspath("/Users/data/work_data/link/passwd")
# = "/Users/data/work_data/link/passwd"  (abspath doesn't resolve symlinks!)
# commonpath check PASSES
# But the actual file is /etc/passwd

Fix: Use os.path.realpath instead of os.path.abspath:

abs_base = os.path.realpath(base_dir)
target_path = os.path.realpath(os.path.join(abs_base, filename))

if not target_path.startswith(abs_base + os.sep) and target_path != abs_base:
    raise PermissionError(...)

The startswith check with os.sep appended prevents /data_evil matching base /data.

2. extract_code_block misparses nested or indented code fences

if line.startswith("```"):
    if not in_block:
        in_block = True
    else:
        if line.strip() == "```":
            in_block = False
            ...
        else:
            current_block.append(line)

The logic is:

  • Opening: any line starting with ```
  • Closing: only lines that are exactly ``` after stripping

This means:

```python
def foo():
    pass
```python   ← this does NOT close the block (strip gives "```python")

The block never closes. More importantly, this line:

code here

Would be treated as an opening fence (starts with ```), not a 4-backtick fence. The parser doesn't handle variable-length fences per the CommonMark spec.

Also, the opening fence check skips the language identifier line, which is correct, but a line inside a code block that happens to start with ``` (e.g., showing markdown syntax) would be misinterpreted:

```markdown
Here's how to write a code block:
```python   ← parser thinks this is inside, treated as content
print("hello")
```          ← parser closes here, losing the real structure

Practical fix:

import re

def extract_code_block(text: str) -> str:
    pattern = re.compile(r"^(`{3,})(\w*)\s*\n(.*?)^\1\s*$", re.MULTILINE | re.DOTALL)
    blocks = [m.group(3) for m in pattern.finditer(text)]
    return "\n\n".join(blocks) if blocks else text

This handles variable-length fences and matches opening/closing fence lengths per CommonMark.


Significant Issues

3. open_editor_for_prompt has a TOCTOU vulnerability with temp files

tmp_fd, tmp_path = tempfile.mkstemp(suffix=".md", prefix="ai_prompt_")
with os.fdopen(tmp_fd, "w", encoding="utf-8") as f:
    tmp_fd = None
    f.write(header)

result = subprocess.run(editor_cmd + [tmp_path], check=False)

with open(tmp_path, encoding="utf-8") as f:
    raw_content = f.read()

Between the editor closing and the file being read back, another process could modify the temp file. This is a minor concern for a CLI tool, but the temp file is created with mkstemp defaults which on Unix gives 0o600 permissions — that's good.

More concerning: The tmp_fd = None trick to prevent double-close is clever but fragile. If os.fdopen raises an exception after taking ownership of the fd but before the assignment, the finally block would try to close an already-transferred fd. In practice, os.fdopen is unlikely to fail at that point, but the pattern is unusual.

4. clear_thinking_line doesn't account for multi-byte characters or ANSI codes

cols = shutil.get_terminal_size(fallback=(80, 20)).columns
print(" " * (cols - 1), end="\r", flush=True)

If the "thinking" line contained ANSI escape codes (which it doesn't currently, but could in the future), this wouldn't fully clear it. The standard approach is:

print(f"\r\033[K", end="", flush=True)  # ANSI: carriage return + clear to end of line

5. _get_cfg_int swallows all exceptions silently

def _get_cfg_int(config, section, key, fallback):
    try:
        return config.getint(section, key, fallback=fallback)
    except Exception:
        return fallback

This catches everything — including KeyboardInterrupt via the broad Exception (actually Exception doesn't catch KeyboardInterrupt, but it catches AttributeError if config is None, TypeError if fallback is wrong type, etc.). A malformed config value like max_tokens = "abc" would silently fall back with no warning. At minimum, log when this happens:

except (ValueError, configparser.Error) as e:
    logger.warning(f"Config error for [{section}]{key}: {e}. Using fallback={fallback}")
    return fallback

Moderate Issues

6. print_welcome_banner hardcodes the command list

print("[*] Commands: @model, @efficient, @scrub, @sequence, @sh, exit")

This duplicates the command registry in handlers.py and parsers.py. If commands change, three places need updating.

7. _make_continue_prompt is hardcoded English with no configurability

return (
    "The output was truncated due to an output limit.\n"
    "Continue EXACTLY from where you stopped.\n"
    ...
)

For a multi-engine tool, the continuation prompt could benefit from being engine-specific or configurable via the INI file. Some models respond better to different phrasing.

8. open_editor_for_prompt accepts logger parameter but other functions use the global logger

def open_editor_for_prompt(logger=None) -> str | None:
    if logger is None:
        logger = logging.getLogger("MultiAI")

This is the only utility function that accepts a logger parameter. Every other function in the codebase imports and uses the global logger from config.py. This inconsistency suggests an incomplete refactoring.

9. Comment noise continues

if not text:
    return ""  # Return empty string if input is empty
return text[-n:] if len(text) > n else text  # Return last n characters
extracted_blocks = []  # List to hold extracted code blocks
current_block = []  # List for the current block being constructed
in_block = False  # Flag to track if currently inside a code block

Summary Table

Severity Issue Location
🔴 Critical Symlink bypass in path traversal check secure_resolve_path()
🔴 Critical Code fence parser mishandles nested/language fences extract_code_block()
🟠 High commonpath raises ValueError on Windows cross-drive secure_resolve_path()
🟡 Medium _get_cfg_int silently swallows config errors _get_cfg_int()
🟡 Medium Clearing line without ANSI escape codes clear_thinking_line()
🟡 Medium Hardcoded command list in banner print_welcome_banner()
🟡 Medium Inconsistent logger parameter pattern open_editor_for_prompt()
🟢 Low Continue prompt not configurable _make_continue_prompt()
🟢 Low TOCTOU on temp file (minimal risk) open_editor_for_prompt()
🟢 Low Excessive comments Throughout

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions