Adding Bundle Schemas#100

Open
ericspod wants to merge 1 commit into master from add_bundle_schema

Conversation

@ericspod
Member

@ericspod ericspod commented May 15, 2026

Fixes Project-MONAI/MONAI#4048

This copies the existing schemas stored here into the website itself so that they can be accessed through a shorter, permanent URL. The expectation is that the "schema" field in metadata.json files can then use the shorter URL, e.g. "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20240725.json" becomes "schema": "https://project-monai.github.io/meta_schema_20240725.json".
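As a rough illustration of the URL change described above, the following sketch rewrites the "schema" field of a metadata dictionary. The helper name `shorten_schema_url` and the `SITE_ROOT` constant are hypothetical, and the code assumes the schema files are served directly from the site root, as in the example URL; it is not part of this PR.

```python
import json
from urllib.parse import urlsplit

# Assumption: short URLs live directly under the site root, as in the example above.
SITE_ROOT = "https://project-monai.github.io/"


def shorten_schema_url(metadata: dict) -> dict:
    """Return a copy of `metadata` whose "schema" URL points at the site root.

    Only the file name of the old URL is reused; all other keys are untouched.
    """
    updated = dict(metadata)
    old_url = metadata.get("schema")
    if isinstance(old_url, str) and old_url:
        filename = urlsplit(old_url).path.rsplit("/", 1)[-1]
        # Only rewrite URLs that look like bundle schema files.
        if filename.startswith("meta_schema") and filename.endswith(".json"):
            updated["schema"] = SITE_ROOT + filename
    return updated


old = {
    "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20240725.json",
    "version": "0.1.0",  # placeholder value for illustration
}
print(json.dumps(shorten_schema_url(old), indent=2))
```

The function returns a copy rather than editing in place, so the original metadata stays available for comparison.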

Summary by CodeRabbit

  • New Features
    • Added comprehensive metadata validation schemas for MONAI model packages, supporting multiple model types and framework versions.
    • Enabled standardized metadata structure validation to ensure model package compatibility and consistency across different model architectures.

Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
@ericspod ericspod requested a review from zephyrie May 15, 2026 15:17
@coderabbitai

coderabbitai Bot commented May 15, 2026

Walkthrough

This PR introduces eight JSON Schema files defining validation contracts for MONAI model bundle metadata across different model types and evolution periods. Schemas range from the foundational 2022 formats for general model metadata to later specialized variants (2022–2025) for generative models, LDM architectures, and platform integrations (HuggingFace, HoverNet).

Changes

MONAI Bundle Metadata Schema Catalog

| Layer | File(s) | Summary |
|---|---|---|
| Foundational bundle metadata schemas (2022) | meta_schema_20220324.json, meta_schema_20220729.json | Establishes core bundle metadata structure with version/framework fields, descriptive properties, and network_data_format defining inputs and outputs. Extended version adds support for latent tensor types and marks inputs/outputs as required. |
| Generator and LDM model metadata schemas (2022–2023) | meta_schema_generator_20220718.json, meta_schema_generator_ldm_20230507.json | Defines specialized schemas for generator-based and LDM models with required network_data_format.inputs.latent and outputs.pred; LDM extends with dual format blocks (autoencoder_data_format and generator_data_format) for encoder/decoder and generation network I/O. |
| Modern reusable bundle schema (2024) | meta_schema_20240725.json | Refactors bundle schema using JSON Schema $defs to define reusable tensor, argument, result, and network_io types; constrains network_data_format via patternProperties for flexible _data_format keys. |
| Advanced LDM model schema (2024) | meta_schema_generator_ldm_20240318.json | Extends LDM schema with domain-specific fields (body_region, anatomy_list) for autoencoder inputs, enabling granular metadata capture for specialized medical imaging generative models. |
| Integration and framework-specific schemas (2022–2025) | meta_schema_hf_20250321.json, meta_schema_hovernet_20221124.json | Provides HuggingFace and HoverNet integration schemas with platform-specific required metadata, framework versions, task descriptors, and network data format specifications. |
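
The 2024 entry above mentions `patternProperties` for `_data_format` keys. The sketch below shows how that keyword selects keys, using only the standard library. The regex and the metadata are hypothetical stand-ins, since the actual pattern in meta_schema_20240725.json is not shown in this conversation.

```python
import re

# Hypothetical fragment: the real pattern in meta_schema_20240725.json may differ.
schema_fragment = {
    "patternProperties": {
        "^.*_data_format$": {"type": "object"},
    }
}

# Hypothetical bundle metadata with two *_data_format blocks.
metadata = {
    "name": "example bundle",
    "network_data_format": {"inputs": {}, "outputs": {}},
    "autoencoder_data_format": {"inputs": {}, "outputs": {}},
}


def keys_matched_by_patterns(schema: dict, instance: dict) -> dict:
    """Map each patternProperties regex to the instance keys it applies to.

    JSON Schema regexes are not implicitly anchored, so re.search is used.
    """
    matched = {}
    for pattern in schema.get("patternProperties", {}):
        matched[pattern] = [key for key in instance if re.search(pattern, key)]
    return matched


print(keys_matched_by_patterns(schema_fragment, metadata))
```

Each matched key would then be validated against the subschema attached to its pattern, which is what lets one definition cover several differently named format blocks.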

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A warren of schemas, neatly arranged,
From bundles to models, all measured and changed,
With tensors and latents in patterns so neat,
The framework's foundation now perfectly sweet!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Adding Bundle Schemas' clearly and accurately reflects the core change: adding multiple bundle schema JSON files to the repository. |
| Linked Issues check | ✅ Passed | The PR successfully addresses #4048 by hosting bundle schema files (meta_schema files) at project-monai.github.io, enabling shorter permanent URLs as requested instead of long GitHub release URLs. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: the PR only adds 8 new bundle schema JSON files (meta_schema_*.json) with no modifications to unrelated code or functionality outside the stated objective. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@meta_schema_20220729.json`:
- Line 253: The "latent" property line uses a tab for indentation which is
inconsistent with the file's spaces; locate the "latent" key (the JSON property
"latent" shown in the diff) and replace the leading tab characters with the same
number of spaces used elsewhere in the file so indentation style matches the
rest of meta_schema_20220729.json.
- Line 168: Line 168 contains a tab character before the "latent" property which
is inconsistent with the file's spaces-based indentation; replace the tab with
the appropriate number of spaces to match the surrounding JSON nesting (use the
same 4-space multiples used elsewhere) so the "latent" key aligns with its
sibling properties and preserves consistent formatting.

In `@meta_schema_20240725.json`:
- Line 135: Fix the typo in the JSON description value for the metadata field
whose "description" currently reads "Version of NumPy the bundlewas generated
with." — update the string to "Version of NumPy the bundle was generated with."
so there is a space between "bundle" and "was" (edit the "description" property
in the same JSON object).

In `@meta_schema_hf_20250321.json`:
- Around line 3-17: The schema is missing a top-level "type" and "properties"
definition; update the JSON object that currently contains "required" to include
"type": "object" and a "properties" object that defines each required field
(e.g., "schema", "version", "changelog", "monai_version", "pytorch_version",
"numpy_version", "required_packages_version", "name", "task", "description",
"authors", "copyright", "huggingface_url") with appropriate simple types
(string/array/object as applicable) to match the other meta_schema files and
enforce field shapes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e5b59219-c720-4102-bece-4ea18906d79e

📥 Commits

Reviewing files that changed from the base of the PR and between 6345da2 and 0c25c2d.

📒 Files selected for processing (8)
  • meta_schema_20220324.json
  • meta_schema_20220729.json
  • meta_schema_20240725.json
  • meta_schema_generator_20220718.json
  • meta_schema_generator_ldm_20230507.json
  • meta_schema_generator_ldm_20240318.json
  • meta_schema_hf_20250321.json
  • meta_schema_hovernet_20221124.json

Comment thread meta_schema_20220729.json
"value_range"
]
},
"latent": {

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix inconsistent indentation (tabs vs spaces).

Line 168 uses tabs for indentation while the rest of the file uses spaces. This inconsistency can cause problems with code formatters and diff tools, and may lead to subtle issues in text-processing pipelines.

🔧 Suggested fix

Replace the tab characters with spaces to match the indentation style used throughout the rest of the file (appears to be using multiples of 4 spaces based on nesting level).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@meta_schema_20220729.json` at line 168, Line 168 contains a tab character
before the "latent" property which is inconsistent with the file's spaces-based
indentation; replace the tab with the appropriate number of spaces to match the
surrounding JSON nesting (use the same 4-space multiples used elsewhere) so the
"latent" key aligns with its sibling properties and preserves consistent
formatting.
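
One way to apply the fix described above is sketched below. It assumes each leading tab stands for one 4-space nesting level, which should be checked against the surrounding lines before use. The function name `replace_leading_tabs` and the sample text are illustrative only.

```python
import json

INDENT = " " * 4  # assumption: one tab stands for one 4-space nesting level


def replace_leading_tabs(text: str) -> tuple[str, list[int]]:
    """Replace tabs in each line's leading whitespace with spaces.

    Returns the fixed text and the 1-based numbers of the lines that changed.
    Tabs after the first non-whitespace character are left alone.
    """
    fixed_lines, changed = [], []
    for number, line in enumerate(text.split("\n"), start=1):
        body = line.lstrip(" \t")
        lead = line[: len(line) - len(body)]
        if "\t" in lead:
            changed.append(number)
            lead = lead.replace("\t", INDENT)
        fixed_lines.append(lead + body)
    return "\n".join(fixed_lines), changed


# Small stand-in for the schema file: the third line is indented with a tab.
sample = '{\n    "image": {},\n\t"latent": {}\n}'
fixed, changed = replace_leading_tabs(sample)
json.loads(fixed)  # the result must still parse as JSON
print(changed)
```

Reporting the changed line numbers makes it easy to confirm that only the flagged lines were touched.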

Comment thread meta_schema_20220729.json
"value_range"
]
},
"latent": {

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix inconsistent indentation (tabs vs spaces).

Line 253 uses tabs for indentation while the rest of the file uses spaces, matching the issue at line 168.

🔧 Suggested fix

Replace the tab characters with spaces to match the indentation style used throughout the rest of the file.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@meta_schema_20220729.json` at line 253, The "latent" property line uses a tab
for indentation which is inconsistent with the file's spaces; locate the
"latent" key (the JSON property "latent" shown in the diff) and replace the
leading tab characters with the same number of spaces used elsewhere in the file
so indentation style matches the rest of meta_schema_20220729.json.

Comment thread meta_schema_20240725.json
"type": "string"
},
"numpy_version": {
"description": "Version of NumPy the bundlewas generated with.",

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in description.

Missing space: "the bundlewas generated" should be "the bundle was generated".

✏️ Proposed fix
-            "description": "Version of NumPy the bundlewas generated with.",
+            "description": "Version of NumPy the bundle was generated with.",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            "description": "Version of NumPy the bundlewas generated with.",
+            "description": "Version of NumPy the bundle was generated with.",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@meta_schema_20240725.json` at line 135, Fix the typo in the JSON description
value for the metadata field whose "description" currently reads "Version of
NumPy the bundlewas generated with." — update the string to "Version of NumPy
the bundle was generated with." so there is a space between "bundle" and "was"
(edit the "description" property in the same JSON object).

Comment on lines +3 to +17
"required": [
"schema",
"version",
"changelog",
"monai_version",
"pytorch_version",
"numpy_version",
"required_packages_version",
"name",
"task",
"description",
"authors",
"copyright",
"huggingface_url"
]

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

find . -name "meta_schema_hf_20250321.json" -type f

Repository: Project-MONAI/project-monai.github.io

Length of output: 112


🏁 Script executed:

cat -n ./meta_schema_hf_20250321.json

Repository: Project-MONAI/project-monai.github.io

Length of output: 591


🏁 Script executed:

rg "meta_schema_hf_20250321" --type json --type js --type py --type yaml --type md -A 3 -B 3

Repository: Project-MONAI/project-monai.github.io

Length of output: 63


🏁 Script executed:

rg "meta_schema|huggingface_url" -A 2 -B 2 --type json --type md --type py

Repository: Project-MONAI/project-monai.github.io

Length of output: 680


🏁 Script executed:

cat -n scripts/process_models.py | head -100

Repository: Project-MONAI/project-monai.github.io

Length of output: 4818


🏁 Script executed:

cat -n scripts/process_models.py | sed -n '100,150p'

Repository: Project-MONAI/project-monai.github.io

Length of output: 1660


🏁 Script executed:

cat -n README.md | head -50

Repository: Project-MONAI/project-monai.github.io

Length of output: 1751


🏁 Script executed:

rg "json-schema|jsonschema|validate.*schema" -i --type py --type js --type json -A 3 -B 3

Repository: Project-MONAI/project-monai.github.io

Length of output: 2440


🏁 Script executed:

cat -n meta_schema_hovernet_20221124.json | head -30

Repository: Project-MONAI/project-monai.github.io

Length of output: 1391


🏁 Script executed:

git ls-files | grep "meta_schema.*\.json" | wc -l

Repository: Project-MONAI/project-monai.github.io

Length of output: 83


Add the missing type and properties keywords for consistency with the other schemas.

This file is the only meta_schema in the repository without type: object and properties definitions. The seven other schemas define both, which constrains field shapes and enforces the metadata contract. With only required present, the schema checks that the listed keys exist but not what type their values have. Add both keywords to align with the existing pattern and strengthen validation.

Proposed minimal hardening
 {
     "$schema": "https://json-schema.org/draft/2019-09/schema",
+    "type": "object",
+    "properties": {
+        "schema": { "type": "string" },
+        "version": { "type": "string" },
+        "changelog": { "type": "object" },
+        "monai_version": { "type": "string" },
+        "pytorch_version": { "type": "string" },
+        "numpy_version": { "type": "string" },
+        "required_packages_version": { "type": "object" },
+        "name": { "type": "string" },
+        "task": { "type": "string" },
+        "description": { "type": "string" },
+        "authors": { "type": "string" },
+        "copyright": { "type": "string" },
+        "huggingface_url": { "type": "string", "format": "uri" }
+    },
     "required": [
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    "required": [
-        "schema",
-        "version",
-        "changelog",
-        "monai_version",
-        "pytorch_version",
-        "numpy_version",
-        "required_packages_version",
-        "name",
-        "task",
-        "description",
-        "authors",
-        "copyright",
-        "huggingface_url"
-    ]
+{
+    "$schema": "https://json-schema.org/draft/2019-09/schema",
+    "type": "object",
+    "properties": {
+        "schema": { "type": "string" },
+        "version": { "type": "string" },
+        "changelog": { "type": "object" },
+        "monai_version": { "type": "string" },
+        "pytorch_version": { "type": "string" },
+        "numpy_version": { "type": "string" },
+        "required_packages_version": { "type": "object" },
+        "name": { "type": "string" },
+        "task": { "type": "string" },
+        "description": { "type": "string" },
+        "authors": { "type": "string" },
+        "copyright": { "type": "string" },
+        "huggingface_url": { "type": "string", "format": "uri" }
+    },
+    "required": [
+        "schema",
+        "version",
+        "changelog",
+        "monai_version",
+        "pytorch_version",
+        "numpy_version",
+        "required_packages_version",
+        "name",
+        "task",
+        "description",
+        "authors",
+        "copyright",
+        "huggingface_url"
+    ]
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@meta_schema_hf_20250321.json` around lines 3 - 17, The schema is missing a
top-level "type" and "properties" definition; update the JSON object that
currently contains "required" to include "type": "object" and a "properties"
object that defines each required field (e.g., "schema", "version", "changelog",
"monai_version", "pytorch_version", "numpy_version",
"required_packages_version", "name", "task", "description", "authors",
"copyright", "huggingface_url") with appropriate simple types
(string/array/object as applicable) to match the other meta_schema files and
enforce field shapes.
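
To illustrate why the finding above matters, the sketch below implements a deliberately tiny subset of JSON Schema ("type", "required", "properties") with the standard library and compares a required-only schema with a hardened one. The `check` helper and the two shortened schemas are hypothetical stand-ins, not the real validator or the real file contents.

```python
# Maps JSON Schema type names to the Python types json.load produces.
JSON_TYPES = {"object": dict, "array": list, "string": str}


def check(instance, schema: dict) -> list:
    """Tiny illustrative subset of JSON Schema: "type", "required", "properties".

    Not a replacement for a real validator; it only mirrors how these three
    keywords behave for the cases shown here.
    """
    errors = []
    expected = schema.get("type")
    if expected is not None and not isinstance(instance, JSON_TYPES[expected]):
        return [f"expected {expected}"]
    if not isinstance(instance, dict):
        return errors  # "required" and "properties" ignore non-objects
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing {key}")
    for key, subschema in schema.get("properties", {}).items():
        if key in instance:
            errors.extend(f"{key}: {e}" for e in check(instance[key], subschema))
    return errors


# Shortened stand-ins for the real schema, which lists thirteen required fields.
required_only = {"required": ["name", "huggingface_url"]}
hardened = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "huggingface_url": {"type": "string"}},
    "required": ["name", "huggingface_url"],
}

wrong_types = {"name": ["not", "a", "string"], "huggingface_url": {}}
print(check(wrong_types, required_only))      # required-only: wrong value types pass
print(check("not an object", required_only))  # required-only: a non-object passes too
print(check(wrong_types, hardened))
print(check("not an object", hardened))
```

The required-only schema accepts both malformed inputs, while the hardened one reports a type error for each, which is the gap the suggested "type" and "properties" entries would close.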



Development

Successfully merging this pull request may close these issues.

easier access to the bundle schema

1 participant