Open
Labels
bug, content_check_passed
Description
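Operating system and version
Ubuntu 20.04
Python environment where the tool is installed
A Python virtual environment created with anaconda/miniconda
Python version
3.10
AISBench tool version
Latest master branch
AISBench command executed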
ais_bench --models vllm_api_general_chat --datasets longbenchv2_gen --debug --num-warmups 0 --num-prompts 98
Model configuration file or custom configuration file contents
The API configuration is as follows:
root@localhost:~/benchmark# cat /root/benchmark/ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py
from ais_bench.benchmark.models import VLLMCustomAPIChat
from ais_bench.benchmark.utils.postprocess.model_postprocessors import extract_non_reasoning_content

models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-general-chat",
        path="/mnt/share/weights/Qwen3.5-397B-A17B-w8a8-org",
        model="qwen3.5",
        stream=False,
        request_rate=0,
        use_timestamp=False,
        retry=2,
        api_key="",
        host_ip="141.61.81.51",
        host_port=8010,
        url="",
        max_out_len=32768,
        batch_size=16,
        trust_remote_code=False,
        generation_kwargs=dict(
            temperature=0,
            ignore_eos=False,
        ),
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    )
]
Expected behavior
No response
Actual behavior
The console output of the evaluation run is:
root@localhost:~/benchmark# ais_bench --models vllm_api_general_chat --datasets longbenchv2_gen --debug --num-warmups 0 --num-prompts 98
[2026-03-27 13:03:08,618] [ais_bench] [INFO] Loading vllm_api_general_chat: /root/benchmark/ais_bench/benchmark/configs/./models/vllm_api/vllm_api_general_chat.py
[2026-03-27 13:03:08,623] [ais_bench] [INFO] Loading longbenchv2_gen: /root/benchmark/ais_bench/benchmark/configs/./datasets/longbenchv2/longbenchv2_gen.py
[2026-03-27 13:03:08,625] [ais_bench] [INFO] Loading example: /root/benchmark/ais_bench/benchmark/configs/./summarizers/example.py
[2026-03-27 13:03:08,650] [ais_bench] [INFO] Current exp folder: outputs/default/20260327_130300
[2026-03-27 13:03:08,651] [ais_bench] [INFO] Keeping the first 98 prompts for dataset [LongBenchv2]
[2026-03-27 13:03:08,705] [ais_bench] [INFO] Starting inference tasks...
[2026-03-27 13:03:08,708] [ais_bench] [INFO] Partitioned into 1 tasks.
[2026-03-27 13:03:08,738] [ais_bench] [INFO] Launch TasksMonitor, PID: 138125, Refresh interval: 0.5, Run in background: True
[2026-03-27 13:03:17,741] [ais_bench] [INFO] Debug mode, print progress directly
[2026-03-27 13:03:17,742] [ais_bench] [INFO] Task [vllm-api-general-chat/LongBenchv2]
[2026-03-27 13:05:04,279] [ais_bench] [INFO] Zero Retriever initialized, returning empty shot case for all queries
[2026-03-27 13:05:04,822] [ais_bench] [INFO] Apply ice template finished
[2026-03-27 13:05:04,826] [ais_bench] [INFO] Warmup size is 0, skip...
[2026-03-27 13:05:04,879] [ais_bench] [INFO] Dataset needed memory size: 64.17680168 MB
[2026-03-27 13:05:04,879] [ais_bench] [INFO] Memory usage check passed: 13.02% < 80% (Available: 1751.38 GB)
/usr/local/python3.11.10/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Bus error (core dumped)
[2026-03-27 13:05:10,060] [ais_bench] [INFO] Inference tasks completed.
[2026-03-27 13:05:10,066] [ais_bench] [INFO] Starting evaluation tasks...
[2026-03-27 13:05:10,069] [ais_bench] [INFO] Partitioned into 1 tasks.
[2026-03-27 13:05:10,088] [ais_bench] [INFO] Launch TasksMonitor, PID: 139489, Refresh interval: 0.5, Run in background: True
[2026-03-27 13:05:19,747] [ais_bench] [INFO] Debug mode, print progress directly
/usr/local/python3.11.10/lib/python3.11/site-packages/urllib3/connectionpool.py:1097: InsecureRequestWarning: Unverified HTTPS request is being made to host '141.0.180.100'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
warnings.warn(
[2026-03-27 13:07:04,262] [ais_bench] [WARNING] Task vllm-api-general-chat/LongBenchv2: No predictions found.
[2026-03-27 13:07:04,263] [ais_bench] [INFO] Evaluation task time elapsed: 104.52s
[2026-03-27 13:07:05,619] [ais_bench] [INFO] Evaluation tasks completed.
[2026-03-27 13:07:05,622] [ais_bench] [INFO] Summarizing evaluation results...
dataset      version    metric    mode    vllm-api-general-chat
-----------  ---------  --------  ------  -----------------------
LongBenchv2  -          -         -       -
[2026-03-27 13:07:05,626] [ais_bench] [INFO] write summary to /root/benchmark/outputs/default/20260327_130300/summary/summary_20260327_130300.txt
[2026-03-27 13:07:05,626] [ais_bench] [INFO] write csv to /root/benchmark/outputs/default/20260327_130300/summary/summary_20260327_130300.csv
The markdown format results is as below:
| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| LongBenchv2 | - | - | - | - |
[2026-03-27 13:07:05,626] [ais_bench] [INFO] write markdown summary to /root/benchmark/outputs/default/20260327_130300/summary/summary_20260327_130300.md
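With --num-prompts 96 the run completes normally; with --num-prompts 98 the problem above occurs.
There is no progress bar and no error log. The initial suspicion is insufficient shared memory: the LongBench v2 dataset has long contexts, so the data may have exceeded the shared memory size. Root cause still to be investigated.
Pre-submission checks
- I have read the Quick Start in the main documentation and it did not resolve the problem
- I have searched the FAQ and found no duplicate
- I have searched existing issues and found no duplicate
- I have updated to the latest version and the problem persists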
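One way to test the shared-memory hypothesis is to compare the size reported in the log ("Dataset needed memory size: 64.17680168 MB") with the free space on `/dev/shm`, the tmpfs that backs POSIX shared memory on Linux. The sketch below is a standalone diagnostic, not part of AISBench. The helper names `free_bytes` and `fits`, the 0.8 headroom factor, and the reading of "MB" as MiB are all assumptions made for illustration. If the benchmark runs inside a container, note that Docker's default `/dev/shm` size is 64 MB, which would be just under the reported requirement; whether a container is involved here is not stated in the report.

```python
import shutil

MIB = 1024 * 1024  # assumption: the "MB" in the log means MiB


def free_bytes(path: str = "/dev/shm") -> int:
    """Return the free bytes on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def fits(required_bytes: int, available_bytes: int, headroom: float = 0.8) -> bool:
    """Return True if the requirement fits within `headroom` of the available space.

    `headroom` is a hypothetical safety margin, not an AISBench setting.
    """
    return required_bytes <= available_bytes * headroom


if __name__ == "__main__":
    # Value copied from the log line "Dataset needed memory size".
    required = int(64.17680168 * MIB)
    try:
        available = free_bytes("/dev/shm")
    except FileNotFoundError:
        print("/dev/shm not found; this check only applies to Linux hosts")
    else:
        print(f"required:  {required / MIB:.2f} MiB")
        print(f"available: {available / MIB:.2f} MiB")
        print("fits" if fits(required, available) else "likely too small")
```

If the check reports that the space is too small, enlarging `/dev/shm` (for example with `docker run --shm-size=...` when a container is used) and rerunning with `--num-prompts 98` would confirm or rule out this explanation.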