
Refactor monolithic core.py into focused modules and add response caching#1

Open
JunjieAraoXiong wants to merge 2 commits into T3S1AMAX:main from JunjieAraoXiong:refactor/modular-core-and-caching

Conversation

@JunjieAraoXiong
Summary

  • Break up the 1,776-line core.py into single-responsibility modules:
    • providers.py (449 lines) — OpenRouter, Codex Exec backends and response parsing
    • compiler.py (208 lines) — Lean compilation, error extraction, error memory
    • evaluation.py (230 lines) — semantic fidelity grading prompts and payload parsing
    • cache.py (123 lines) — SHA256-based LLM response cache to avoid redundant API calls
  • core.py reduced from 1,776 → 831 lines, focused purely on pipeline orchestration
  • New CLI flags: --no-cache and --cache-dir for cache control

Cache details

  • Keyed by SHA256(prompt + model) to deduplicate identical API requests
  • Only caches successful responses (returncode == 0)
  • Sharded storage in .autolean_cache/ with 2-char subdirectories to avoid flat directory bloat
  • Saves API costs on repeated runs and re-evaluations
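The keying scheme described above could look roughly like the following sketch (the function name and the exact way prompt and model are combined are assumptions; the PR may use a separator or a different concatenation):

```python
import hashlib

def cache_key(prompt: str, model: str) -> str:
    # Hash prompt and model together so identical API requests map to
    # the same cache entry; the 64-char hex digest doubles as filename.
    h = hashlib.sha256()
    h.update(prompt.encode("utf-8"))
    h.update(model.encode("utf-8"))
    return h.hexdigest()
```

The first two characters of the digest give the shard subdirectory mentioned above.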

Module responsibility

| Module | Responsibility | Lines |
| --- | --- | --- |
| providers.py | "How do I talk to LLMs?" | 449 |
| compiler.py | "How do I compile Lean and track errors?" | 208 |
| evaluation.py | "How do I grade formalizations?" | 230 |
| cache.py | "How do I avoid redundant API calls?" | 123 |
| core.py | "How do I orchestrate the pipeline?" | 831 |

Test plan

  • All module imports verified (python -c "from autolean.core import ...")
  • CLI imports verified (from autolean.cli import main)
  • Cache unit tested (miss → put → hit, stats, failed responses not cached, clear)
  • Full integration test with a sample problem JSON

…hing

Break up the 1,776-line core.py into single-responsibility modules:
- providers.py (449 lines): OpenRouter, Codex Exec backends and response parsing
- compiler.py (208 lines): Lean compilation, error extraction, error memory
- evaluation.py (230 lines): semantic fidelity grading prompts and payload parsing
- cache.py (123 lines): SHA256-based LLM response cache to avoid redundant API calls

core.py is now 831 lines focused purely on pipeline orchestration.

Cache features:
- Keyed by (prompt_hash, model) to deduplicate identical requests
- Only caches successful responses (returncode == 0)
- Sharded storage in .autolean_cache/ with 2-char subdirectories
- New CLI flags: --no-cache, --cache-dir

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the core orchestration logic by modularizing the codebase into specialized modules for caching, compilation, evaluation, and provider interactions. It introduces a disk-backed LLM response cache to reduce API costs and adds CLI support for cache configuration. Review feedback identifies a logic error in Lean 4 proposition extraction, suggests optimizing filesystem operations for performance and atomicity, and recommends consolidating redundant constant definitions to improve maintainability.

Comment on lines +112 to +126
def extract_top_level_prop_from_theorem_header(header: str) -> Optional[str]:
    depth = 0
    last_colon = -1
    for i, ch in enumerate(header):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(0, depth - 1)
        elif ch == ":" and depth == 0:
            if i + 1 < len(header) and header[i + 1] == "=":
                continue
            last_colon = i
    if last_colon < 0:
        return None
    return header[last_colon + 1:].strip()


high

The logic for extracting the proposition from the theorem header incorrectly uses the last colon at depth 0. In Lean 4, propositions often contain colons at the top level (e.g., in quantifiers like ∀ x : Nat, ...). Using the last colon will truncate the proposition. It should use the first colon at depth 0, which acts as the separator between the theorem name/arguments and the proposition.

References
  1. Ensure logic correctly handles language-specific syntax edge cases, such as colons in Lean 4 propositions.
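The reviewer's suggestion can be sketched as follows: return at the first depth-0 colon instead of tracking the last one, so depth-0 colons inside the proposition (as in a top-level quantifier binder) are left untouched. This is an illustrative variant, not the code as merged:

```python
from typing import Optional

def extract_top_level_prop_from_theorem_header(header: str) -> Optional[str]:
    # Take the FIRST colon at bracket depth 0 as the separator between
    # the theorem name/arguments and the proposition. Colons inside
    # parenthesized binders like "(n : Nat)" sit at depth > 0, while
    # ":=" is skipped as in the original implementation.
    depth = 0
    for i, ch in enumerate(header):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(0, depth - 1)
        elif ch == ":" and depth == 0:
            if i + 1 < len(header) and header[i + 1] == "=":
                continue
            return header[i + 1:].strip()
    return None
```

On a header like `theorem bar : ∀ x : Nat, x = x`, this returns the full proposition rather than truncating at the binder's colon.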

Comment on lines +45 to +49
def _cache_path(self, key: str) -> Path:
    # Use first 2 chars as subdirectory to avoid flat directory with thousands of files
    subdir = self.cache_dir / key[:2]
    subdir.mkdir(parents=True, exist_ok=True)
    return subdir / f"{key}.json"


medium

Calling mkdir on every cache lookup is inefficient and unnecessary for a read operation. This should be moved to the put method and only executed when a write is actually performed.

References
  1. Avoid unnecessary filesystem operations in hot paths or read-only lookups.

Comment on lines +76 to +99
def put(self, prompt: str, model: str, result: CommandResult) -> None:
    """Store a successful response in the cache."""
    if not self.enabled:
        return
    # Only cache successful responses
    if result.returncode != 0:
        return

    key = self._cache_key(prompt, model)
    path = self._cache_path(key)

    data = {
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model": model,
        "cached_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "argv": result.argv,
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
    try:
        path.write_text(json.dumps(data, ensure_ascii=True, indent=2), encoding="utf-8")
    except OSError:
        pass  # Cache write failure is non-fatal


medium

Writing directly to the cache file can lead to corrupted entries if the process is interrupted or if there is concurrent access. It is safer to write to a temporary file and then perform an atomic replace. Additionally, indent=2 is generally unnecessary for cache files and increases storage usage.

References
  1. Use atomic file operations (write to temp + rename) to prevent data corruption.
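A minimal sketch of the suggested write path (the helper name is hypothetical): write the serialized entry to a temporary file in the same directory, then swap it into place with `os.replace`, which is atomic on both POSIX and Windows, so a reader never observes a half-written cache entry.

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    # Temp file lives in the target directory so os.replace() stays on
    # the same filesystem; compact JSON (no indent) keeps entries small.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(json.dumps(data, ensure_ascii=True))
        os.replace(tmp, path)
    except OSError:
        # Cache write failure is non-fatal, mirroring the original;
        # clean up the orphaned temp file if the replace never ran.
        if os.path.exists(tmp):
            os.unlink(tmp)
```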

import json
from typing import Optional

_EVAL_GRADES = {"A", "B", "C", "D"}


medium

This constant is used in multiple modules (core.py, cli.py). It should be made public (remove the leading underscore) so it can be imported, reducing code duplication across the project.

References
  1. Promote shared constants to public visibility to avoid duplication across modules.

primary_eval_reasoning_effort: str,
) -> None:
"""Merge double-check results into eval_payload in-place."""
_EVAL_GRADES = {"A", "B", "C", "D"}


medium

This local definition of _EVAL_GRADES is redundant as it is already defined in evaluation.py. It should be imported from there to maintain a single source of truth and improve maintainability.

References
  1. Avoid redundant definitions of constants across different modules.

…licate EVAL_GRADES

- Cache writes now use temp file + os.replace() for atomicity
- Removed mkdir from cache __init__ (deferred to put)
- Renamed _CODEX_EXEC_CODING_MODEL to CODEX_EXEC_CODING_MODEL (public)
- Consolidated _EVAL_GRADES into EVAL_GRADES in evaluation.py, imported elsewhere
- Removed indent=2 from cache JSON for compact storage