APPS env added #68

Open
SIMONLQY wants to merge 3 commits into openreasoner:main from SIMONLQY:main

Conversation

@SIMONLQY SIMONLQY commented Dec 2, 2024

  • Add the APPS environment under envs
  • Align its interfaces with the MATH env
  • Add an initial adaptation for APPS in the reason part

@YanSong97
Collaborator

Thanks for your contribution!

Could you also include a test dataset so that a demo can be run? See the MATH env for an example.

@JibanKumar-cloud

Hi! I would like to take on this issue and add code generation reasoning support to OpenR.

Proposed approach:

  1. Code execution environment (envs/CODE/) — A safe subprocess-based executor that runs generated code against test cases (with timeout + sandboxing), producing binary pass/fail rewards. This replaces the PRM for the code domain since ground truth is executable.

  2. Dataset loaders — HumanEval (164 problems) and MBPP (974 problems), with prompt formatting for step-by-step reasoning.

  3. Integration with existing search — A CodeVerifier that produces [0, 1] rewards matching OpenR's PRM interface, so Best-of-N, Beam Search, and MCTS work out of the box. Plus a CodeSearchConfig adapter for MCTS.

  4. Evaluation scripts: scripts/eval/code_greedy.sh, code_best_of_n.sh, and code_mcts.sh, following the same pattern as the math evaluation scripts.

  5. Benchmarks — I'll include pass@1 results on HumanEval using Qwen2.5-Coder-1.5B-Instruct across greedy, Best-of-N, and MCTS.
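
To make items 1 and 3 concrete, here is a minimal sketch of what a subprocess-based executor with a timeout could look like, and how its result could be mapped to a reward in [0, 1]. The names `run_candidate` and `code_reward` are hypothetical and do not come from OpenR or this PR. Returning the fraction of passed tests is an assumption; a strictly binary reward would be `float(passed == len(tests))`. A subprocess with a timeout is also not a real sandbox, so untrusted code would still need stronger isolation such as a container.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def run_candidate(code: str, test: str, timeout_s: float = 5.0) -> bool:
    """Return True if `code` followed by `test` exits with status 0 in time.

    Hypothetical helper: the candidate and one assert-style test are written
    to a temporary file and run in a fresh Python interpreter.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + test + "\n", encoding="utf-8")
        try:
            result = subprocess.run(
                # -I runs Python in isolated mode (ignores user site and env vars).
                [sys.executable, "-I", str(script)],
                cwd=tmp,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so nothing lingers.
            return False
        return result.returncode == 0


def code_reward(code: str, tests: Sequence[str], timeout_s: float = 5.0) -> float:
    """Fraction of tests passed, in [0, 1] (assumed reward shape, see lead-in)."""
    if not tests:
        return 0.0
    passed = sum(run_candidate(code, t, timeout_s) for t in tests)
    return passed / len(tests)
```

A search method such as Best-of-N could then score each sampled program with `code_reward` and keep the highest-scoring one, in the same place where a PRM score would otherwise be used.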

I have GPU access (T4 16GB) and can run the full evaluation pipeline. Estimated timeline: ~3 weeks.

Before I start coding, does this approach align with what you had in mind? Any feedback on the architecture or scope is welcome. Happy to adjust.

Thanks for building OpenR!

3 participants