Implement the MedXpertQA Text scenario in MedHELM #19

Open
chakravarthik27 wants to merge 8 commits into main from PAL-1263-implement-the-med-xpert-qa-text-scenario-in-the-medhelm

@chakravarthik27

This pull request introduces the MedXpertQA Text benchmark to the codebase, enabling the evaluation of medical question answering capabilities in large language models. It includes the implementation of the scenario, integration into run specifications, and updates to the configuration and dependencies to support the new benchmark. The most important changes are summarized below:

New Scenario Implementation:

  • Added the MedXpertQATextScenario class in medxpert_qa_text_scenario.py, which loads and processes the MedXpertQA Text dataset from HuggingFace, structures instances for evaluation, and provides scenario metadata.
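
As a rough illustration of the "structures instances for evaluation" step (a sketch, not the code in this PR): the `Reference` and `Instance` dataclasses below are simplified stand-ins for the framework's own types, and the record field names `question`, `options`, and `label` are assumptions about the dataset layout. The sample record is made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CORRECT_TAG = "correct"  # stand-in for the framework's correct-answer tag


@dataclass(frozen=True)
class Reference:
    """One answer option; tagged when it is the correct one."""
    text: str
    tags: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Instance:
    """One multiple-choice question with its answer options."""
    question: str
    references: List[Reference]
    split: str


def record_to_instance(record: Dict, split: str = "test") -> Instance:
    """Convert one dataset record (assumed field names) into an Instance."""
    options: Dict[str, str] = record["options"]
    references = [
        Reference(text=text, tags=[CORRECT_TAG] if letter == record["label"] else [])
        for letter, text in sorted(options.items())  # keep options in A, B, C... order
    ]
    return Instance(question=record["question"], references=references, split=split)


# Hypothetical record, written by hand for illustration only.
sample = {
    "question": "Which vitamin deficiency causes scurvy?",
    "options": {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D"},
    "label": "B",
}
instance = record_to_instance(sample)
```

Only the option whose letter matches `label` carries the correct tag, which is what a multiple-choice metric would compare against.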

Integration with Benchmarking Framework:

  • Registered a new run specification function get_medxpert_qa_text_spec() in medhelm_run_specs.py to define how the scenario should be run, including adapter and metric specs.

  • Updated schema_medhelm.yaml to add medxpert_qa_text to the list of run groups and provided its display name, description, metric groups, environment, and taxonomy information for the benchmark schema.

Dependency and Build Updates:

  • Relaxed and aligned version constraints for several dependencies in pyproject.toml, such as datasets, numba, and together, and added tiktoken as a new dependency to support the new scenario.

  • Pinned the setuptools version below 82 for openai-whisper extra build dependencies to avoid build issues.

@blidiselalin blidiselalin left a comment

  1. Re-pin numba and together in requirements.txt (don't leave them fully unpinned).
  2. Clarify whether tiktoken should be core vs. optional.
  3. Fix the citation and description mismatch in schema_medhelm.yaml.

@chakravarthik27 chakravarthik27 force-pushed the PAL-1263-implement-the-med-xpert-qa-text-scenario-in-the-medhelm branch from e22d464 to 5477d69 Compare May 21, 2026 11:07

@blidiselalin blidiselalin left a comment

Looks good

3 participants