https://github.com/WeOpenML/PandaLM/blob/05ad95b54f5dcd9db6c5eec46a6baca8a20e444c/pandalm/utils/evaluation_pipeline.py#L128C18-L128C18