
Refactor monolithic core.py into focused modules and add response caching#1

Open
JunjieAraoXiong wants to merge 2 commits into T3S1AMAX:main from JunjieAraoXiong:refactor/modular-core-and-caching

Conversation

@JunjieAraoXiong
Summary

  • Break up the 1,776-line core.py into single-responsibility modules:
    • providers.py (449 lines) — OpenRouter, Codex Exec backends and response parsing
    • compiler.py (208 lines) — Lean compilation, error extraction, error memory
    • evaluation.py (230 lines) — semantic fidelity grading prompts and payload parsing
    • cache.py (123 lines) — SHA256-based LLM response cache to avoid redundant API calls
  • core.py reduced from 1,776 → 831 lines, focused purely on pipeline orchestration
  • New CLI flags: --no-cache and --cache-dir for cache control

Cache details

  • Keyed by SHA256(prompt + model) to deduplicate identical API requests
  • Only caches successful responses (returncode == 0)
  • Sharded storage in .autolean_cache/ with 2-char subdirectories to avoid flat directory bloat
  • Saves API costs on repeated runs and re-evaluations
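The keying scheme described above could look roughly like the following sketch (the function name and the exact way prompt and model are combined are assumptions; the PR may use a separator or a different concatenation):

```python
import hashlib

def cache_key(prompt: str, model: str) -> str:
    # Hash prompt and model together so identical API requests map to
    # the same cache entry; the 64-char hex digest doubles as filename.
    h = hashlib.sha256()
    h.update(prompt.encode("utf-8"))
    h.update(model.encode("utf-8"))
    return h.hexdigest()
```

The first two characters of the digest give the shard subdirectory mentioned above.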

Module responsibility

| Module | Responsibility | Lines |
| --- | --- | --- |
| providers.py | "How do I talk to LLMs?" | 449 |
| compiler.py | "How do I compile Lean and track errors?" | 208 |
| evaluation.py | "How do I grade formalizations?" | 230 |
| cache.py | "How do I avoid redundant API calls?" | 123 |
| core.py | "How do I orchestrate the pipeline?" | 831 |

Test plan

  • All module imports verified (python -c "from autolean.core import ...")
  • CLI imports verified (from autolean.cli import main)
  • Cache unit tested (miss → put → hit, stats, failed responses not cached, clear)
  • Full integration test with a sample problem JSON

…hing

Break up the 1,776-line core.py into single-responsibility modules:
- providers.py (449 lines): OpenRouter, Codex Exec backends and response parsing
- compiler.py (208 lines): Lean compilation, error extraction, error memory
- evaluation.py (230 lines): semantic fidelity grading prompts and payload parsing
- cache.py (123 lines): SHA256-based LLM response cache to avoid redundant API calls

core.py is now 831 lines focused purely on pipeline orchestration.

Cache features:
- Keyed by (prompt_hash, model) to deduplicate identical requests
- Only caches successful responses (returncode == 0)
- Sharded storage in .autolean_cache/ with 2-char subdirectories
- New CLI flags: --no-cache, --cache-dir

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the core orchestration logic by modularizing the codebase into specialized modules for caching, compilation, evaluation, and provider interactions. It introduces a disk-backed LLM response cache to reduce API costs and adds CLI support for cache configuration. Review feedback identifies a logic error in Lean 4 proposition extraction, suggests optimizing filesystem operations for performance and atomicity, and recommends consolidating redundant constant definitions to improve maintainability.

Comment on lines +112 to +126
def extract_top_level_prop_from_theorem_header(header: str) -> Optional[str]:
    depth = 0
    last_colon = -1
    for i, ch in enumerate(header):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(0, depth - 1)
        elif ch == ":" and depth == 0:
            if i + 1 < len(header) and header[i + 1] == "=":
                continue
            last_colon = i
    if last_colon < 0:
        return None
    return header[last_colon + 1:].strip()


high

The logic for extracting the proposition from the theorem header incorrectly uses the last colon at depth 0. In Lean 4, propositions often contain colons at the top level (e.g., in quantifiers like ∀ x : Nat, ...). Using the last colon will truncate the proposition. It should use the first colon at depth 0, which acts as the separator between the theorem name/arguments and the proposition.

References
  1. Ensure logic correctly handles language-specific syntax edge cases, such as colons in Lean 4 propositions.
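The reviewer's suggestion can be sketched as follows: return at the first depth-0 colon instead of tracking the last one, so depth-0 colons inside the proposition (as in a top-level quantifier binder) are left untouched. This is an illustrative variant, not the code as merged:

```python
from typing import Optional

def extract_top_level_prop_from_theorem_header(header: str) -> Optional[str]:
    # Take the FIRST colon at bracket depth 0 as the separator between
    # the theorem name/arguments and the proposition. Colons inside
    # parenthesized binders like "(n : Nat)" sit at depth > 0, while
    # ":=" is skipped as in the original implementation.
    depth = 0
    for i, ch in enumerate(header):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(0, depth - 1)
        elif ch == ":" and depth == 0:
            if i + 1 < len(header) and header[i + 1] == "=":
                continue
            return header[i + 1:].strip()
    return None
```

On a header like `theorem bar : ∀ x : Nat, x = x`, this returns the full proposition rather than truncating at the binder's colon.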

Comment on lines +45 to +49
def _cache_path(self, key: str) -> Path:
    # Use first 2 chars as subdirectory to avoid flat directory with thousands of files
    subdir = self.cache_dir / key[:2]
    subdir.mkdir(parents=True, exist_ok=True)
    return subdir / f"{key}.json"


medium

Calling mkdir on every cache lookup is inefficient and unnecessary for a read operation. This should be moved to the put method and only executed when a write is actually performed.

References
  1. Avoid unnecessary filesystem operations in hot paths or read-only lookups.

Comment on lines +76 to +99
def put(self, prompt: str, model: str, result: CommandResult) -> None:
    """Store a successful response in the cache."""
    if not self.enabled:
        return
    # Only cache successful responses
    if result.returncode != 0:
        return

    key = self._cache_key(prompt, model)
    path = self._cache_path(key)

    data = {
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model": model,
        "cached_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "argv": result.argv,
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
    try:
        path.write_text(json.dumps(data, ensure_ascii=True, indent=2), encoding="utf-8")
    except OSError:
        pass  # Cache write failure is non-fatal


medium

Writing directly to the cache file can lead to corrupted entries if the process is interrupted or if there is concurrent access. It is safer to write to a temporary file and then perform an atomic replace. Additionally, indent=2 is generally unnecessary for cache files and increases storage usage.

References
  1. Use atomic file operations (write to temp + rename) to prevent data corruption.
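A minimal sketch of the suggested write path (the helper name is hypothetical): write the serialized entry to a temporary file in the same directory, then swap it into place with `os.replace`, which is atomic on both POSIX and Windows, so a reader never observes a half-written cache entry.

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    # Temp file lives in the target directory so os.replace() stays on
    # the same filesystem; compact JSON (no indent) keeps entries small.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(json.dumps(data, ensure_ascii=True))
        os.replace(tmp, path)
    except OSError:
        # Cache write failure is non-fatal, mirroring the original;
        # clean up the orphaned temp file if the replace never ran.
        if os.path.exists(tmp):
            os.unlink(tmp)
```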

import json
from typing import Optional

_EVAL_GRADES = {"A", "B", "C", "D"}


medium

This constant is used in multiple modules (core.py, cli.py). It should be made public (remove the leading underscore) so it can be imported, reducing code duplication across the project.

References
  1. Promote shared constants to public visibility to avoid duplication across modules.

primary_eval_reasoning_effort: str,
) -> None:
"""Merge double-check results into eval_payload in-place."""
_EVAL_GRADES = {"A", "B", "C", "D"}


medium

This local definition of _EVAL_GRADES is redundant as it is already defined in evaluation.py. It should be imported from there to maintain a single source of truth and improve maintainability.

References
  1. Avoid redundant definitions of constants across different modules.

…licate EVAL_GRADES

- Cache writes now use temp file + os.replace() for atomicity
- Removed mkdir from cache __init__ (deferred to put)
- Renamed _CODEX_EXEC_CODING_MODEL to CODEX_EXEC_CODING_MODEL (public)
- Consolidated _EVAL_GRADES into EVAL_GRADES in evaluation.py, imported elsewhere
- Removed indent=2 from cache JSON for compact storage