2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -17,3 +17,5 @@ tmp*

enqueue_jobs.sh
gen_*.slurm
test-output/
test-output.json
91 changes: 91 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,91 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

ParEval-Repo is an LLM benchmarking suite for repository-scale translation of HPC (parallel) codes. It translates codebases between parallel programming models (e.g., CUDA → Kokkos, OpenMP-offload → CUDA) using LLMs, then builds and validates the translated code on HPC systems.

## Setup

```bash
# Preferred (uses exact pinned versions)
uv sync && . .venv/bin/activate

# Alternative
pip install -r requirements.txt
```

Python 3.11.13+ required.

## Key Commands

### Translation (LLM inference)
```bash
python src/translate/translate.py --help

# Example: naive translation
python src/translate/translate.py \
    --input targets/XSBench/openmp-offload \
    --output /path/to/output \
    --src-model openmp-offload \
    --dst-model cuda \
    --method naive \
    --config config/perlmutter-config.json
```

### Running drivers (build and test translated repos)
```bash
python src/drivers/run-all.py --help

# Example
python src/drivers/run-all.py \
    --translations-root /path/to/translations \
    --output results.json \
    --config config/perlmutter-config.json
```

## Architecture

### Two-Phase Pipeline

1. **Translation phase** (`src/translate/`): Reads the source repo plus a `target.json` for both the source and destination models, then calls an LLM to produce translated code files.
2. **Driver phase** (`src/drivers/`): Builds and runs the translated repos, comparing outputs against expected values.

### Translation Methods (`src/translate/`)

Three strategies, selectable via `--method`:

- **naive** (`naive/`): Translates file-by-file with full repo context in a single LLM prompt. Uses `ChunkFileAgent` for large files.
- **top-down-agentic** (`top_down_agentic/`): Multi-agent pipeline: `DependencyAgent` builds a file dependency tree → `ChunkAgent` splits large files → `ContextAgent` gathers relevant context → translates each file.
- **swe-agent** (`swe_agent/`): Wraps the external SWE-agent tool for autonomous translation.

All methods inherit from `Translator` ABC (`translator.py`) and use `GeneratorMixin` (`generator_mixin.py`) for unified LLM access across backends (OpenAI, Gemini, HuggingFace, vLLM, local).
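The relationship between the `Translator` ABC and `GeneratorMixin` can be sketched as follows. The class and method names here are illustrative stand-ins, not the repository's actual API:

```python
from abc import ABC, abstractmethod


class GeneratorMixinSketch:
    """Illustrative stand-in for GeneratorMixin: one chat entry point,
    regardless of which LLM backend is configured."""

    def generate(self, prompt: str) -> str:
        # A real mixin would dispatch to OpenAI, Gemini, HuggingFace,
        # vLLM, or a local model; here we just return a marker string.
        return f"<translated from prompt of {len(prompt)} chars>"


class TranslatorSketch(ABC):
    """Illustrative stand-in for the Translator ABC."""

    @abstractmethod
    def translate_file(self, path: str, source: str) -> str: ...


class NaiveTranslatorSketch(TranslatorSketch, GeneratorMixinSketch):
    """A naive-style method: one prompt per file, full context inline."""

    def translate_file(self, path: str, source: str) -> str:
        prompt = f"Translate {path} to the target model:\n{source}"
        return self.generate(prompt)
```

The point of the split is that each strategy only decides *what* to prompt; the mixin owns *how* the prompt reaches whichever backend is configured.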

### Target Configuration (`target.json`)

Each `targets/<app>/<model>/` directory requires a `target.json` with:
- Build/run commands and timeouts
- Expected output strings for validation (`debug_outputs`, `debug_type`)
- File classifications (build files, main entry points)
- Dependency module names (resolved via system config)

The driver reads `target.json` to know how to build, run, and validate each translated repo.
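A `target.json` along these lines would cover the categories above. The field names and values here are hypothetical, assembled from the list above rather than copied from the repository's actual schema — check an existing target for the real keys:

```json
{
  "build_command": "make -j",
  "build_timeout": 600,
  "run_command": "./app -s small",
  "run_timeout": 300,
  "debug_outputs": ["Verification checksum"],
  "debug_type": "substring",
  "build_files": ["Makefile"],
  "main_files": ["main.c"],
  "dependencies": ["cuda"]
}
```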

### System Configuration (`config/`)

JSON files per HPC system (e.g., `perlmutter-config.json`) that map dependency names to module load commands and set GPU architecture (`sm`). Passed to both translation and driver scripts via `--config`.

### Driver Utilities (`src/drivers/util.py`)

Core utility classes used throughout drivers:
- `CommandExecutor` — runs shell commands with timeout and dry-run support
- `ConfigManager` — loads and resolves system config
- `DataManager` — persists results to JSON
- `ResultBuilder` — constructs structured build/run result objects
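The dry-run idea in `CommandExecutor` can be illustrated with a minimal sketch — a pattern sketch under assumed names, not the repository's actual implementation:

```python
import subprocess


class CommandExecutorSketch:
    """Minimal sketch of a CommandExecutor-style helper: runs a shell
    command with a timeout, or only echoes it back in dry-run mode."""

    def __init__(self, dry_run: bool = False):
        self.dry_run = dry_run

    def run(self, cmd: str, timeout: int = 60) -> dict:
        if self.dry_run:
            # Report what would run without executing anything.
            return {"cmd": cmd, "returncode": None, "stdout": "", "dry_run": True}
        proc = subprocess.run(
            cmd, shell=True, capture_output=True, text=True, timeout=timeout
        )
        return {
            "cmd": cmd,
            "returncode": proc.returncode,
            "stdout": proc.stdout,
            "dry_run": False,
        }
```

Dry-run support matters on HPC systems, where a driver may want to print the module-load and build commands it would issue before burning an allocation on them.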

## Adding a New Target

1. Create `targets/<app>/<model>/repo/` with source code
2. Create `targets/<app>/<model>/target.json` following the schema in existing targets
3. Provide a `target.json` for both source and destination models when translating
19 changes: 19 additions & 0 deletions config/nano_v3_reasoning_parser.py
@@ -0,0 +1,19 @@
from vllm.reasoning.abs_reasoning_parsers import ReasoningParserManager
from vllm.reasoning.deepseek_r1_reasoning_parser import DeepSeekR1ReasoningParser


@ReasoningParserManager.register_module("nano_v3")
class NanoV3ReasoningParser(DeepSeekR1ReasoningParser):
    def extract_reasoning(self, model_output, request):
        reasoning_content, final_content = super().extract_reasoning(
            model_output, request
        )
        # When thinking is disabled and the base parser produced no final
        # content, everything it labeled "reasoning" is actually the answer:
        # swap the two so the response is returned as final content.
        if (
            hasattr(request, "chat_template_kwargs")
            and request.chat_template_kwargs
            and request.chat_template_kwargs.get("enable_thinking") is False
            and final_content is None
        ):
            reasoning_content, final_content = final_content, reasoning_content

        return reasoning_content, final_content
8 changes: 8 additions & 0 deletions config/perlmutter-vllm-glm.yaml
@@ -0,0 +1,8 @@
# vLLM config file for Perlmutter when using GLM-4.7-GGUF:Q4_K_M

tensor-parallel-size: 4
max-model-len: 131072
max-num-seqs: 2
enable-auto-tool-choice: true
tool-call-parser: glm45
reasoning-parser: glm45
10 changes: 10 additions & 0 deletions config/perlmutter-vllm-nemo.yaml
@@ -0,0 +1,10 @@
# vLLM config file for Perlmutter when using nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

tensor-parallel-size: 4
max-model-len: 262144
max-num-seqs: 8
reasoning-parser-plugin: config/nano_v3_reasoning_parser.py
reasoning-parser: nano_v3
enable-auto-tool-choice: true
tool-call-parser: qwen3_coder
trust-remote-code: true
12 changes: 12 additions & 0 deletions config/perlmutter-vllm-oss.yaml
@@ -0,0 +1,12 @@
# vLLM config file for Perlmutter when using openai/gpt-oss-120b

tensor-parallel-size: 4
async-scheduling: true
no-enable-prefix-caching: true
max-model-len: 131072
gpu-memory-utilization: 0.95
max-num-seqs: 4
max-num-batched-tokens: 2048
tool-call-parser: openai
reasoning-parser: openai_gptoss
enable-auto-tool-choice: true
7 changes: 7 additions & 0 deletions config/perlmutter-vllm-qwen.yaml
@@ -0,0 +1,7 @@
# vLLM config file for Perlmutter when using Qwen/Qwen3-Coder-Next

tensor-parallel-size: 4
max-model-len: 262144
max-num-seqs: 2
enable-auto-tool-choice: true
tool-call-parser: qwen3_coder
1 change: 1 addition & 0 deletions pyproject.toml
@@ -11,4 +11,5 @@ dependencies = [
"langchain-text-splitters>=0.3.11",
"openai>=1.107.3",
"pandas>=2.3.2",
"tiktoken>=0.9.0",
]