[codex] Add Harbor covered benchmark mapping #728
@OpenHands /codereview
I'm on it! neubig can track my progress at all-hands.dev
@OpenHands /codereview Validation context for the re-review:
Note: the SDK
I'm on it! neubig can track my progress at all-hands.dev
After #727 was merged and the stacked base branch was deleted, this PR could not be reopened or retargeted by GitHub ( I created replacement PR #731 on top of
This comment was created by an AI agent (OpenHands) on behalf of the user.
Summary
Stacked on #727.
Closes #720.
Validation
python -m py_compile benchmarks/utils/harbor_compat.py benchmarks/terminalbench/config.py benchmarks/skillsbench/config.py
uv run pytest tests/test_harbor_compat.py tests/test_terminalbench.py tests/test_skillsbench_run_infer.py
uv run ruff check benchmarks/utils/harbor.py benchmarks/utils/harbor_compat.py benchmarks/terminalbench/run_infer.py benchmarks/skillsbench/run_infer.py tests/test_harbor_compat.py