Refactor monolithic core.py into focused modules and add response caching #1
Conversation
Break up the 1,776-line `core.py` into single-responsibility modules:

- `providers.py` (449 lines): OpenRouter, Codex Exec backends and response parsing
- `compiler.py` (208 lines): Lean compilation, error extraction, error memory
- `evaluation.py` (230 lines): semantic fidelity grading prompts and payload parsing
- `cache.py` (123 lines): SHA256-based LLM response cache to avoid redundant API calls

`core.py` is now 831 lines focused purely on pipeline orchestration.

Cache features:

- Keyed by (prompt_hash, model) to deduplicate identical requests
- Only caches successful responses (returncode == 0)
- Sharded storage in `.autolean_cache/` with 2-char subdirectories
- New CLI flags: `--no-cache`, `--cache-dir`
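A minimal sketch of how such a (prompt, model)-keyed SHA256 cache key could be derived (the PR's diff uses a `_cache_key` helper; the exact hashing scheme and separator below are assumptions, not the actual implementation):

```python
import hashlib

def cache_key(prompt: str, model: str) -> str:
    """Derive a stable cache key from (prompt, model).

    Assumes SHA256 over both inputs, matching the "keyed by
    (prompt_hash, model)" description. A NUL separator avoids
    ambiguous concatenations like ("ab", "c") vs ("a", "bc").
    """
    h = hashlib.sha256()
    h.update(prompt.encode("utf-8"))
    h.update(b"\x00")
    h.update(model.encode("utf-8"))
    return h.hexdigest()  # 64 hex chars, safe as a filename
```

Because the key is a plain hex digest, its first two characters can be used directly as the shard subdirectory name.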
Code Review
This pull request refactors the core orchestration logic by modularizing the codebase into specialized modules for caching, compilation, evaluation, and provider interactions. It introduces a disk-backed LLM response cache to reduce API costs and adds CLI support for cache configuration. Review feedback identifies a logic error in Lean 4 proposition extraction, suggests optimizing filesystem operations for performance and atomicity, and recommends consolidating redundant constant definitions to improve maintainability.
```python
def extract_top_level_prop_from_theorem_header(header: str) -> Optional[str]:
    depth = 0
    last_colon = -1
    for i, ch in enumerate(header):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(0, depth - 1)
        elif ch == ":" and depth == 0:
            if i + 1 < len(header) and header[i + 1] == "=":
                continue
            last_colon = i
    if last_colon < 0:
        return None
    return header[last_colon + 1:].strip()
```
The logic for extracting the proposition from the theorem header incorrectly uses the last colon at depth 0. In Lean 4, propositions often contain colons at the top level (e.g., in quantifiers like ∀ x : Nat, ...). Using the last colon will truncate the proposition. It should use the first colon at depth 0, which acts as the separator between the theorem name/arguments and the proposition.
References
- Ensure logic correctly handles language-specific syntax edge cases, such as colons in Lean 4 propositions.
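A sketch of the reviewer's suggested fix: return at the first depth-0 colon instead of tracking the last one, so top-level colons inside the proposition survive (function name is illustrative, not the PR's):

```python
from typing import Optional

def extract_prop_first_colon(header: str) -> Optional[str]:
    """Return the statement after the FIRST top-level colon.

    The first depth-0 colon separates the theorem name/binders from
    the proposition, so later top-level colons (e.g. in a quantifier
    like `∀ x : Nat, ...`) are left intact rather than truncating it.
    """
    depth = 0
    for i, ch in enumerate(header):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(0, depth - 1)
        elif ch == ":" and depth == 0:
            # Skip `:=` (definition-body separator), as in the original.
            if i + 1 < len(header) and header[i + 1] == "=":
                continue
            return header[i + 1:].strip()
    return None

extract_prop_first_colon("theorem foo (n : Nat) : ∀ x : Nat, x + n = n + x")
# → "∀ x : Nat, x + n = n + x"  (the last-colon version would
#    truncate this to "Nat, x + n = n + x")
```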
```python
def _cache_path(self, key: str) -> Path:
    # Use first 2 chars as subdirectory to avoid flat directory with thousands of files
    subdir = self.cache_dir / key[:2]
    subdir.mkdir(parents=True, exist_ok=True)
    return subdir / f"{key}.json"
```
```python
def put(self, prompt: str, model: str, result: CommandResult) -> None:
    """Store a successful response in the cache."""
    if not self.enabled:
        return
    # Only cache successful responses
    if result.returncode != 0:
        return

    key = self._cache_key(prompt, model)
    path = self._cache_path(key)

    data = {
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model": model,
        "cached_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "argv": result.argv,
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
    try:
        path.write_text(json.dumps(data, ensure_ascii=True, indent=2), encoding="utf-8")
    except OSError:
        pass  # Cache write failure is non-fatal
```
Writing directly to the cache file can lead to corrupted entries if the process is interrupted or if there is concurrent access. It is safer to write to a temporary file and then perform an atomic replace. Additionally, indent=2 is generally unnecessary for cache files and increases storage usage.
References
- Use atomic file operations (write to temp + rename) to prevent data corruption.
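The suggested pattern could look like the following sketch (standalone helper for illustration; the PR's actual method keeps `self` and the `CommandResult` fields):

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON via temp file + os.replace so a reader never sees a
    partially written cache entry.

    The temp file is created in the target directory so the final
    rename stays on one filesystem; os.replace is then an atomic swap
    on both POSIX and Windows.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=True)  # no indent: compact storage
        os.replace(tmp, path)  # atomic swap into place
    except OSError:
        try:
            os.unlink(tmp)  # best-effort cleanup; failure stays non-fatal
        except OSError:
            pass
```

An interrupted write leaves at worst an orphaned `.tmp` file, never a truncated `.json` entry.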
src/autolean/evaluation.py
```python
import json
from typing import Optional

_EVAL_GRADES = {"A", "B", "C", "D"}
```
src/autolean/core.py
```python
    primary_eval_reasoning_effort: str,
) -> None:
    """Merge double-check results into eval_payload in-place."""
    _EVAL_GRADES = {"A", "B", "C", "D"}
```
…licate EVAL_GRADES

- Cache writes now use temp file + `os.replace()` for atomicity
- Removed `mkdir` from cache `__init__` (deferred to `put`)
- Renamed `_CODEX_EXEC_CODING_MODEL` to `CODEX_EXEC_CODING_MODEL` (public)
- Consolidated `_EVAL_GRADES` into `EVAL_GRADES` in evaluation.py, imported elsewhere
- Removed `indent=2` from cache JSON for compact storage
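Per the commit notes, the consolidated constant would live in one place and be imported by its consumers. A sketch under that assumption (the `is_valid_grade` helper is hypothetical, added only to show a use site):

```python
# src/autolean/evaluation.py — single public definition per the commit note
EVAL_GRADES = {"A", "B", "C", "D"}

def is_valid_grade(grade: str) -> bool:
    """Hypothetical helper: check a parsed grade against the shared set."""
    return grade.strip().upper() in EVAL_GRADES
```

Other modules such as `core.py` would then do `from autolean.evaluation import EVAL_GRADES` instead of redefining the set locally.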
Summary
- Break up `core.py` into single-responsibility modules:
  - `providers.py` (449 lines) — OpenRouter, Codex Exec backends and response parsing
  - `compiler.py` (208 lines) — Lean compilation, error extraction, error memory
  - `evaluation.py` (230 lines) — semantic fidelity grading prompts and payload parsing
  - `cache.py` (123 lines) — SHA256-based LLM response cache to avoid redundant API calls
- `core.py` reduced from 1,776 → 831 lines, focused purely on pipeline orchestration
- New CLI flags `--no-cache` and `--cache-dir` for cache control

Cache details

- Keyed by `SHA256(prompt + model)` to deduplicate identical API requests
- Only successful responses cached (`returncode == 0`)
- Sharded storage in `.autolean_cache/` with 2-char subdirectories to avoid flat directory bloat

Module responsibility

`providers.py` · `compiler.py` · `evaluation.py` · `cache.py` · `core.py`

Test plan

- (`python -c "from autolean.core import ..."`)
- (`from autolean.cli import main`)