[Bug] longbenchv2 accuracy evaluation produces no result output #220

@wenba0

Description

Operating system and version

Ubuntu 20.04

Python environment the tool is installed in

Python virtual environment created with anaconda/miniconda

Python version

3.10

AISBench tool version

Latest master branch

AISBench command executed

ais_bench --models vllm_api_general_chat --datasets longbenchv2_gen --debug --num-warmups 0 --num-prompts 98

Model configuration file or custom configuration file contents

The API configuration is as follows:

root@localhost:~/benchmark# cat /root/benchmark/ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py
from ais_bench.benchmark.models import VLLMCustomAPIChat
from ais_bench.benchmark.utils.postprocess.model_postprocessors import extract_non_reasoning_content

models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-general-chat",
        path="/mnt/share/weights/Qwen3.5-397B-A17B-w8a8-org",
        model="qwen3.5",
        stream=False,
        request_rate=0,
        use_timestamp=False,
        retry=2,
        api_key="",
        host_ip="141.61.81.51",
        host_port=8010,
        url="",
        max_out_len=32768,
        batch_size=16,
        trust_remote_code=False,
        generation_kwargs=dict(
            temperature=0,
            ignore_eos=False,
        ),
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    )
]

Expected behavior

No response

Actual behavior

The console output of the evaluation run is:

root@localhost:~/benchmark# ais_bench --models vllm_api_general_chat --datasets longbenchv2_gen --debug --num-warmups 0 --num-prompts 98
[2026-03-27 13:03:08,618] [ais_bench] [INFO] Loading vllm_api_general_chat: /root/benchmark/ais_bench/benchmark/configs/./models/vllm_api/vllm_api_general_chat.py
[2026-03-27 13:03:08,623] [ais_bench] [INFO] Loading longbenchv2_gen: /root/benchmark/ais_bench/benchmark/configs/./datasets/longbenchv2/longbenchv2_gen.py
[2026-03-27 13:03:08,625] [ais_bench] [INFO] Loading example: /root/benchmark/ais_bench/benchmark/configs/./summarizers/example.py
[2026-03-27 13:03:08,650] [ais_bench] [INFO] Current exp folder: outputs/default/20260327_130300
[2026-03-27 13:03:08,651] [ais_bench] [INFO] Keeping the first 98 prompts for dataset [LongBenchv2]
[2026-03-27 13:03:08,705] [ais_bench] [INFO] Starting inference tasks...
[2026-03-27 13:03:08,708] [ais_bench] [INFO] Partitioned into 1 tasks.
[2026-03-27 13:03:08,738] [ais_bench] [INFO] Launch TasksMonitor, PID: 138125, Refresh interval: 0.5, Run in background: True
[2026-03-27 13:03:17,741] [ais_bench] [INFO] Debug mode, print progress directly
[2026-03-27 13:03:17,742] [ais_bench] [INFO] Task [vllm-api-general-chat/LongBenchv2]
[2026-03-27 13:05:04,279] [ais_bench] [INFO] Zero Retriever initialized, returning empty shot case for all queries
[2026-03-27 13:05:04,822] [ais_bench] [INFO] Apply ice template finished
[2026-03-27 13:05:04,826] [ais_bench] [INFO] Warmup size is 0, skip...
[2026-03-27 13:05:04,879] [ais_bench] [INFO] Dataset needed memory size: 64.17680168 MB
[2026-03-27 13:05:04,879] [ais_bench] [INFO] Memory usage check passed: 13.02% < 80% (Available: 1751.38 GB)
/usr/local/python3.11.10/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Bus error (core dumped)
[2026-03-27 13:05:10,060] [ais_bench] [INFO] Inference tasks completed.
[2026-03-27 13:05:10,066] [ais_bench] [INFO] Starting evaluation tasks...
[2026-03-27 13:05:10,069] [ais_bench] [INFO] Partitioned into 1 tasks.
[2026-03-27 13:05:10,088] [ais_bench] [INFO] Launch TasksMonitor, PID: 139489, Refresh interval: 0.5, Run in background: True
[2026-03-27 13:05:19,747] [ais_bench] [INFO] Debug mode, print progress directly
/usr/local/python3.11.10/lib/python3.11/site-packages/urllib3/connectionpool.py:1097: InsecureRequestWarning: Unverified HTTPS request is being made to host '141.0.180.100'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
[2026-03-27 13:07:04,262] [ais_bench] [WARNING] Task vllm-api-general-chat/LongBenchv2: No predictions found.
[2026-03-27 13:07:04,263] [ais_bench] [INFO] Evaluation task time elapsed: 104.52s
[2026-03-27 13:07:05,619] [ais_bench] [INFO] Evaluation tasks completed.
[2026-03-27 13:07:05,622] [ais_bench] [INFO] Summarizing evaluation results...
dataset      version    metric    mode    vllm-api-general-chat
-----------  ---------  --------  ------  -----------------------
LongBenchv2  -          -         -       -
[2026-03-27 13:07:05,626] [ais_bench] [INFO] write summary to /root/benchmark/outputs/default/20260327_130300/summary/summary_20260327_130300.txt
[2026-03-27 13:07:05,626] [ais_bench] [INFO] write csv to /root/benchmark/outputs/default/20260327_130300/summary/summary_20260327_130300.csv


The markdown format results is as below:

| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| LongBenchv2 | - | - | - | - |

[2026-03-27 13:07:05,626] [ais_bench] [INFO] write markdown summary to /root/benchmark/outputs/default/20260327_130300/summary/summary_20260327_130300.md
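
In the log above, the only sign that inference failed is the bare "Bus error (core dumped)" line; the run then reports "Inference tasks completed." and the evaluator only warns "No predictions found.". A minimal sketch of scanning a saved console log for such crash signatures is shown below. The function name, the pattern list, and the abbreviated sample are illustrative assumptions, not part of AISBench.

```python
import re

# "Bus error", "core dumped" and the leaked shared_memory warning appear in
# the log above; "Segmentation fault" is an extra pattern added as an assumption.
CRASH_PATTERNS = [
    r"Bus error",
    r"core dumped",
    r"Segmentation fault",
    r"leaked shared_memory objects",
]


def find_crash_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines matching a crash signature."""
    pattern = re.compile("|".join(CRASH_PATTERNS))
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((number, line.strip()))
    return hits


# Abbreviated excerpt of the console output (timestamps removed).
sample = """\
[INFO] Memory usage check passed: 13.02% < 80% (Available: 1751.38 GB)
Bus error (core dumped)
[INFO] Inference tasks completed.
[WARNING] Task vllm-api-general-chat/LongBenchv2: No predictions found.
"""

for number, line in find_crash_lines(sample):
    print(f"line {number}: {line}")
```

On the sample this flags only the "Bus error (core dumped)" line, which points at the inference subprocess rather than the evaluator as the place to investigate.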

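With --num-prompts 96 the run completes normally; with --num-prompts 98 the problem above occurs.

There is no progress bar and no error log. The initial suspicion is insufficient shared memory: the longbenchv2 dataset has long contexts, so the data may exceed the shared memory size. Root cause still to be located and investigated.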
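
One way to test the shared-memory hypothesis is to compare the free space in /dev/shm with the "Dataset needed memory size" printed in the log (64.17680168 MB). The sketch below assumes that the dataset is placed in POSIX shared memory backed by /dev/shm, which the resource_tracker warning about leaked shared_memory objects suggests but does not prove. It also assumes the log's "MB" means MiB. The helper names are hypothetical.

```python
import os
import shutil

MIB = 1024 * 1024


def shm_shortfall_mb(required_mb: float, free_bytes: int) -> float:
    """Return how many MB are missing (0.0 if the data fits)."""
    free_mb = free_bytes / MIB
    return max(0.0, required_mb - free_mb)


def check_shm(required_mb: float, shm_path: str = "/dev/shm") -> None:
    """Print total and free space at shm_path and whether required_mb fits."""
    if not os.path.isdir(shm_path):
        print(f"{shm_path} not found; this check only applies to Linux tmpfs")
        return
    usage = shutil.disk_usage(shm_path)
    missing = shm_shortfall_mb(required_mb, usage.free)
    print(f"{shm_path}: total={usage.total / MIB:.2f} MB, "
          f"free={usage.free / MIB:.2f} MB")
    if missing > 0:
        print(f"short by {missing:.2f} MB for a {required_mb:.2f} MB dataset")
    else:
        print("enough free shared memory for the dataset")


# 64.17680168 MB is the "Dataset needed memory size" from the log above;
# interpreting that figure as MiB is an assumption.
check_shm(64.17680168)
```

If the run is inside a Docker container started with the default 64 MB /dev/shm, the 64.18 MB needed for 98 prompts would just exceed it, which would be consistent with 96 prompts passing and 98 failing. This is unverified for this environment. If the check does report a shortfall, starting the container with a larger --shm-size is the usual remedy.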

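Pre-submission checks

  • I have read the quick start in the homepage documentation, and it did not solve the problem
  • I have searched the FAQ and found no duplicate
  • I have searched existing issues and found no duplicate
  • I have updated to the latest version, and the problem persists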

Metadata

Labels

bug (Something isn't working), content_check_passed (issue content check passed)