Signed-off-by: Huy Do <huydhn@gmail.com>
Member
I see that we're just using what upstream vLLM is using.
atalman approved these changes on Oct 7, 2025
Contributor (Author)
Yeah, this is just to follow the same format that vLLM is using, with more models added.
yeqcharlotte reviewed on Oct 8, 2025
    model_name: str, tasks: List[str], tp_size: int, config: Dict[str, Any]
) -> Dict[str, Any]:
    trust_remote_code = config.get("trust_remote_code", False)
    max_model_len = config.get("max_model_len", 8192)
This will likely impact the result. Ideally it's set to auto.
Contributor (Author)
It looks like vLLM lm-eval doesn't like that value and ends up with this error:
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
INFO 10-10 01:22:45 [__init__.py:215] Automatically detected platform cuda.
Traceback (most recent call last):
File "/vllm-workspace/pytorch-integration-testing/vllm-eval-harness/run_vllm_eval_harness.py", line 186, in <module>
main()
File "/vllm-workspace/pytorch-integration-testing/vllm-eval-harness/run_vllm_eval_harness.py", line 182, in main
run_lm_eval(args.configs_dir, models, tasks)
File "/vllm-workspace/pytorch-integration-testing/vllm-eval-harness/run_vllm_eval_harness.py", line 164, in run_lm_eval
results = run(model_name, selected_tasks, tp_size, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/vllm-workspace/pytorch-integration-testing/vllm-eval-harness/run_vllm_eval_harness.py", line 105, in run
return lm_eval.simple_evaluate(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/utils.py", line 456, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/evaluator.py", line 245, in simple_evaluate
lm = lm_eval.api.registry.get_model(model).create_from_arg_string(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/api/model.py", line 155, in create_from_arg_string
return cls(**args, **args2)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/lm_eval/models/vllm_causallms.py", line 170, in __init__
"max_model_len": int(self._max_length) if self._max_length else None,
^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'auto'
Let me take a closer look.
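One possible workaround, sketched below but untested: only forward max_model_len to lm-eval when the config gives an explicit integer, so the int() cast never sees "auto" and vLLM derives the context length from the model itself. The model_args wiring here is my assumption about how run() could build its arguments, not necessarily what the script does today:

```python
from typing import Any, Dict, List

import lm_eval


def run(
    model_name: str, tasks: List[str], tp_size: int, config: Dict[str, Any]
) -> Dict[str, Any]:
    trust_remote_code = config.get("trust_remote_code", False)
    # Treat "auto" (or a missing value) as "let vLLM decide".
    max_model_len = config.get("max_model_len", "auto")

    model_args = {
        "pretrained": model_name,
        "tensor_parallel_size": tp_size,
        "trust_remote_code": trust_remote_code,
    }
    # Only pass max_model_len when it is an explicit integer; omitting the key
    # avoids lm-eval's int() cast and lets vLLM infer the context length from
    # the model config.
    if max_model_len != "auto":
        model_args["max_model_len"] = int(max_model_len)

    # lm-eval accepts model_args as a comma-separated key=value string.
    model_args_str = ",".join(f"{k}={v}" for k, v in model_args.items())
    return lm_eval.simple_evaluate(
        model="vllm",
        model_args=model_args_str,
        tasks=tasks,
    )
```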
The first iteration of the script is at vllm-eval-harness/run_vllm_eval_harness.py. The rest are lm-eval configurations modified from the format used by vLLM CI at https://github.com/vllm-project/vllm/tree/main/.buildkite/lm-eval-harness/configs. There are a couple of tweaks in the format (see the sketch after this list):
- Configs are grouped by device, e.g. B200, so that we can run the same task on multiple devices on CI if needed
- --tensor-parallel-size dictates how many devices are needed to evaluate the model
- Configs use the ORG/MODEL format to make it easier to find the right config
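For illustration, here is a rough sketch of how a config lookup following these tweaks might work; the directory layout, file extension, helper name, and example model name are assumptions for the sketch, not the script's actual code:

```python
from pathlib import Path

import yaml


def load_config(configs_dir: str, device: str, model_name: str) -> dict:
    # Assumed layout: <configs_dir>/<device>/<ORG>/<MODEL>.yaml,
    # e.g. <configs_dir>/B200/meta-llama/Llama-3.1-8B-Instruct.yaml
    # (the model name here is only an example).
    path = Path(configs_dir) / device / f"{model_name}.yaml"
    with path.open() as f:
        config = yaml.safe_load(f)
    # Fields we know from the snippet above; defaults mirror it.
    config.setdefault("trust_remote_code", False)
    config.setdefault("max_model_len", 8192)
    return config
```

Keeping the ORG/MODEL name as the relative config path means the right file can be found directly from the Hugging Face model name, without a separate mapping table.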
Testing

This script can be run locally on B200 with
Next steps
cc @zhewenl @yeqcharlotte