2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -17,3 +17,5 @@ tmp*

enqueue_jobs.sh
gen_*.slurm
test-output/
test-output.json
91 changes: 91 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,91 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

ParEval-Repo is an LLM benchmarking suite for repository-scale translation of HPC (parallel) codes. It translates codebases between parallel programming models (e.g., CUDA → Kokkos, OpenMP-offload → CUDA) using LLMs, then builds and validates the translated code on HPC systems.

## Setup

```bash
# Preferred (uses exact pinned versions)
uv sync && . .venv/bin/activate

# Alternative
pip install -r requirements.txt
```

Python 3.11.13+ required.

## Key Commands

### Translation (LLM inference)
```bash
python src/translate/translate.py --help

# Example: naive translation
python src/translate/translate.py \
    --input targets/XSBench/openmp-offload \
    --output /path/to/output \
    --src-model openmp-offload \
    --dst-model cuda \
    --method naive \
    --config config/perlmutter-config.json
```

### Running drivers (build and test translated repos)
```bash
python src/drivers/run-all.py --help

# Example
python src/drivers/run-all.py \
    --translations-root /path/to/translations \
    --output results.json \
    --config config/perlmutter-config.json
```

## Architecture

### Two-Phase Pipeline

1. **Translation phase** (`src/translate/`): Reads the source repo plus a `target.json` for both the source and destination models, then calls an LLM to produce translated code files.
2. **Driver phase** (`src/drivers/`): Builds and runs the translated repos, comparing outputs against expected values.

### Translation Methods (`src/translate/`)

Three strategies, selectable via `--method`:

- **naive** (`naive/`): Translates file-by-file with full repo context in a single LLM prompt. Uses `ChunkFileAgent` for large files.
- **top-down-agentic** (`top_down_agentic/`): Multi-agent pipeline: `DependencyAgent` builds a file dependency tree → `ChunkAgent` splits large files → `ContextAgent` gathers relevant context → translates each file.
- **swe-agent** (`swe_agent/`): Wraps the external SWE-agent tool for autonomous translation.

All methods inherit from `Translator` ABC (`translator.py`) and use `GeneratorMixin` (`generator_mixin.py`) for unified LLM access across backends (OpenAI, Gemini, HuggingFace, vLLM, local).
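The relationship between the `Translator` ABC and `GeneratorMixin` can be sketched as follows. The class and method names here are illustrative stand-ins, not the repository's actual API:

```python
from abc import ABC, abstractmethod


class GeneratorMixinSketch:
    """Illustrative stand-in for GeneratorMixin: one chat entry point,
    regardless of which LLM backend is configured."""

    def generate(self, prompt: str) -> str:
        # A real mixin would dispatch to OpenAI, Gemini, HuggingFace,
        # vLLM, or a local model; here we just return a marker string.
        return f"<translated from prompt of {len(prompt)} chars>"


class TranslatorSketch(ABC):
    """Illustrative stand-in for the Translator ABC."""

    @abstractmethod
    def translate_file(self, path: str, source: str) -> str: ...


class NaiveTranslatorSketch(TranslatorSketch, GeneratorMixinSketch):
    """A naive-style method: one prompt per file, full context inline."""

    def translate_file(self, path: str, source: str) -> str:
        prompt = f"Translate {path} to the target model:\n{source}"
        return self.generate(prompt)
```

The point of the split is that each strategy only decides *what* to prompt; the mixin owns *how* the prompt reaches whichever backend is configured.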

### Target Configuration (`target.json`)

Each `targets/<app>/<model>/` directory requires a `target.json` with:
- Build/run commands and timeouts
- Expected output strings for validation (`debug_outputs`, `debug_type`)
- File classifications (build files, main entry points)
- Dependency module names (resolved via system config)

The driver reads `target.json` to know how to build, run, and validate each translated repo.
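A `target.json` along these lines would cover the categories above. The field names and values here are hypothetical, assembled from the list above rather than copied from the repository's actual schema — check an existing target for the real keys:

```json
{
  "build_command": "make -j",
  "build_timeout": 600,
  "run_command": "./app -s small",
  "run_timeout": 300,
  "debug_outputs": ["Verification checksum"],
  "debug_type": "substring",
  "build_files": ["Makefile"],
  "main_files": ["main.c"],
  "dependencies": ["cuda"]
}
```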

### System Configuration (`config/`)

JSON files per HPC system (e.g., `perlmutter-config.json`) that map dependency names to module load commands and set GPU architecture (`sm`). Passed to both translation and driver scripts via `--config`.

### Driver Utilities (`src/drivers/util.py`)

Core utility classes used throughout drivers:
- `CommandExecutor` — runs shell commands with timeout and dry-run support
- `ConfigManager` — loads and resolves system config
- `DataManager` — persists results to JSON
- `ResultBuilder` — constructs structured build/run result objects
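The dry-run idea in `CommandExecutor` can be illustrated with a minimal sketch — a pattern sketch under assumed names, not the repository's actual implementation:

```python
import subprocess


class CommandExecutorSketch:
    """Minimal sketch of a CommandExecutor-style helper: runs a shell
    command with a timeout, or only echoes it back in dry-run mode."""

    def __init__(self, dry_run: bool = False):
        self.dry_run = dry_run

    def run(self, cmd: str, timeout: int = 60) -> dict:
        if self.dry_run:
            # Report what would run without executing anything.
            return {"cmd": cmd, "returncode": None, "stdout": "", "dry_run": True}
        proc = subprocess.run(
            cmd, shell=True, capture_output=True, text=True, timeout=timeout
        )
        return {
            "cmd": cmd,
            "returncode": proc.returncode,
            "stdout": proc.stdout,
            "dry_run": False,
        }
```

Dry-run support matters on HPC systems, where a driver may want to print the module-load and build commands it would issue before burning an allocation on them.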

## Adding a New Target

1. Create `targets/<app>/<model>/repo/` with source code
2. Create `targets/<app>/<model>/target.json` following the schema in existing targets
3. Provide a `target.json` for both source and destination models when translating
19 changes: 19 additions & 0 deletions config/nano_v3_reasoning_parser.py
@@ -0,0 +1,19 @@
from vllm.reasoning.abs_reasoning_parsers import ReasoningParserManager
from vllm.reasoning.deepseek_r1_reasoning_parser import DeepSeekR1ReasoningParser


@ReasoningParserManager.register_module("nano_v3")
class NanoV3ReasoningParser(DeepSeekR1ReasoningParser):
    def extract_reasoning(self, model_output, request):
        reasoning_content, final_content = super().extract_reasoning(
            model_output, request
        )
        # When thinking is disabled and the base parser produced no final
        # content, everything it labeled "reasoning" is actually the answer:
        # swap the two so the response is returned as final content.
        if (
            hasattr(request, "chat_template_kwargs")
            and request.chat_template_kwargs
            and request.chat_template_kwargs.get("enable_thinking") is False
            and final_content is None
        ):
            reasoning_content, final_content = final_content, reasoning_content

        return reasoning_content, final_content
8 changes: 8 additions & 0 deletions config/perlmutter-vllm-glm.yaml
@@ -0,0 +1,8 @@
# vLLM config file for Perlmutter when using GLM-4.7-GGUF:Q4_K_M

tensor-parallel-size: 4
max-model-len: 131072
max-num-seqs: 2
enable-auto-tool-choice: true
tool-call-parser: glm45
reasoning-parser: glm45
10 changes: 10 additions & 0 deletions config/perlmutter-vllm-nemo.yaml
@@ -0,0 +1,10 @@
# vLLM config file for Perlmutter when using nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

tensor-parallel-size: 4
max-model-len: 262144
max-num-seqs: 8
reasoning-parser-plugin: config/nano_v3_reasoning_parser.py
reasoning-parser: nano_v3
enable-auto-tool-choice: true
tool-call-parser: qwen3_coder
trust-remote-code: true
12 changes: 12 additions & 0 deletions config/perlmutter-vllm-oss.yaml
@@ -0,0 +1,12 @@
# vLLM config file for Perlmutter when using openai/gpt-oss-120b

tensor-parallel-size: 4
async-scheduling: true
no-enable-prefix-caching: true
max-model-len: 131072
gpu-memory-utilization: 0.95
max-num-seqs: 4
max-num-batched-tokens: 2048
tool-call-parser: openai
reasoning-parser: openai_gptoss
enable-auto-tool-choice: true
7 changes: 7 additions & 0 deletions config/perlmutter-vllm-qwen.yaml
@@ -0,0 +1,7 @@
# vLLM config file for Perlmutter when using Qwen/Qwen3-Coder-Next

tensor-parallel-size: 4
max-model-len: 262144
max-num-seqs: 2
enable-auto-tool-choice: true
tool-call-parser: qwen3_coder
1 change: 1 addition & 0 deletions pyproject.toml
@@ -11,4 +11,5 @@ dependencies = [
"langchain-text-splitters>=0.3.11",
"openai>=1.107.3",
"pandas>=2.3.2",
"tiktoken>=0.9.0",
]