Validate XML from xml-model processing instructions with a GitHub Action and
CLI built for Relax NG and Schematron-heavy workflows such as TEI, DocBook,
JATS, and other complex XML repositories.
It is designed to feel native in GitHub:
- inline annotations on failing files
- a readable step summary for every run
- structured outputs for later workflow steps
- an optional JSON report for artifacts or downstream automation
This repository provides:
- a Java CLI
- a self-contained GitHub Action ready for GitHub Marketplace
Release artifacts are published on GitHub Releases as:
- xml-model-validator.jar
- xml-model-validator.jar.sha256
The validator:
- parses xml-model processing instructions with a proper XML parser
- validates Relax NG schemas with Jing
- validates Schematron with Saxon and SchXslt2
- supports Relax NG and Schematron xml-model declarations in the same file
- supports Relax NG XML syntax (.rng) and RELAX NG Compact Syntax (.rnc)
- supports standalone Schematron files and embedded Schematron inside Relax NG schemas, including .rnc via conversion to .rng
- supports Schematron phase selection from the xml-model processing instruction
- recognizes common schema hints from schematypens, type, and schema file extensions
- follows remote schema URLs and caches downloaded schemas in the workspace
- supports repository configuration from a single TOML file for schema aliases and rule-based xml-model handling
- supports optional rule-based xml-model configuration by directory and/or file extension, including fallback and full inline replacement modes
- resolves XML paths, caches, and optional aliases against the consuming repository when run as a GitHub Action
- can emit machine-readable JSON reports for automation
- emits GitHub annotations for validation failures and warnings
- writes GitHub step summaries for Action runs, including skipped changed-file checks
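To make the first point in the list concrete, here is a minimal Python sketch of reading xml-model processing instructions with a real XML parser rather than with text matching on the whole file. It is an illustration only: the validator itself is a Java CLI, and the function name xml_model_declarations and the pseudo-attribute regular expression are assumptions made for this example, not part of the project.

```python
import re
from xml.dom import minidom

# Pseudo-attributes look like name="value" or name='value' inside the PI data.
# This simple pattern does not handle character references; it is a sketch.
PSEUDO_ATTR = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def xml_model_declarations(xml_text):
    """Return one dict of pseudo-attributes per xml-model processing instruction."""
    document = minidom.parseString(xml_text)
    declarations = []
    # xml-model instructions sit at document level, before the root element.
    for node in document.childNodes:
        if node.nodeType != node.PROCESSING_INSTRUCTION_NODE:
            continue
        if node.target != "xml-model":
            continue
        attributes = {}
        for match in PSEUDO_ATTR.finditer(node.data):
            name, double_quoted, single_quoted = match.groups()
            attributes[name] = double_quoted if double_quoted is not None else single_quoted
        declarations.append(attributes)
    return declarations
```

Each returned dictionary would typically carry href and, optionally, schematypens, type, and phase, which are the fields the validator is described as reading.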
It is a good fit for repositories that keep validation rules in xml-model
processing instructions and want pull-request feedback directly in GitHub,
including scholarly editing, technical publishing, journal/article XML, and
other custom XML workflows.
What the Action gives you in GitHub:
- inline workflow annotations for validation errors and warnings
- a Markdown step summary with counts, run context, and issue details
- structured outputs for later workflow steps
- an optional saved JSON report for artifacts or downstream automation
Typical workflow:
- check out the repository
- run the Action against the whole repository, one directory, explicit files, or changed files only
- inspect annotations and the step summary in GitHub
- optionally use the structured outputs or saved JSON report in later steps

- uses: actions/checkout@v6
- uses: adunning/xml-model-validator@v2

The default run validates all matching files in the repository and reports the result through annotations and the job summary.
Save one of these workflows as .github/workflows/validate-xml.yml. Where a workflow names the
main branch, replace main with your repository's default branch name if it differs.
This is a simple configuration suitable for repositories with a small number of XML files (hundreds rather than thousands) or for those that want to validate everything on every run.
name: Validate XML
on:
  push:
    paths:
      - "**/*.xml"
  pull_request:
    paths:
      - "**/*.xml"
  workflow_dispatch:
permissions:
  contents: read
jobs:
  validate-xml:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Validate XML
        uses: adunning/xml-model-validator@v2

In a larger and frequently changing repository, a workflow that validates only the modified XML files reduces the validation time for pull requests and pushes. In the example below, a scheduled run validates the full repository once a week to catch any drift.
If your default branch is not main, replace that value with your repository's default branch name.
name: Validate XML
on:
  push:
    branches:
      - main
    paths:
      - "**/*.xml"
  pull_request:
    paths:
      - "**/*.xml"
  schedule:
    - cron: "0 3 * * 1"
  workflow_dispatch:
permissions:
  contents: read
jobs:
  validate-xml:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Validate changed XML
        if: github.event_name != 'schedule' && github.event_name != 'workflow_dispatch'
        uses: adunning/xml-model-validator@v2
        with:
          changed_files_only: true
      - name: Validate full XML set
        if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
        uses: adunning/xml-model-validator@v2

If validation also depends on schema or config files, expand the paths filter
or remove it so schema-only changes also trigger validation.
Version tag semantics:
- @v2 is a floating major tag that tracks the latest 2.x.y release.
- @v2.1.0 is an immutable exact release tag.
- This repository publishes releases from vX.Y.Z tags and then updates the matching major tag (vX) automatically.
Validate a directory recursively:
- uses: adunning/xml-model-validator@v2
  with:
    directory: collections

Validate XML files stored with a non-.xml extension:
- uses: adunning/xml-model-validator@v2
  with:
    directory: styles
    file_extensions: csl

Use changed-file mode in pull requests and upload a JSON report:
- uses: actions/checkout@v6
- id: xml-validate
  uses: adunning/xml-model-validator@v2
  with:
    changed_files_only: true
    json_report_path: reports/xml-validation.json
- uses: actions/upload-artifact@v4
  if: always() && steps.xml-validate.outputs.json_report_path != ''
  with:
    name: xml-validation-report
    path: ${{ steps.xml-validate.outputs.json_report_path }}

For repositories that mostly care about pull-request validation, changed_files_only: true
plus a saved JSON report is usually the best starting point. For repositories
that want a full repository check on every run, the default Action invocation is
usually enough.
If you want both fast review feedback and a periodic full-repository safety
check, use the recommended workflow above and switch between
changed_files_only: true and a default full run based on github.event_name.
Most users will probably want to use only the GitHub Action inputs. Use a repository config file when you need more than one rule, want stable repository-wide configuration, or want different schema sets for different directories.
Apply a fallback remote Relax NG schema to files in one directory with only Action inputs:
- uses: adunning/xml-model-validator@v2
  with:
    xml_model_rule_mode: fallback
    xml_model_rule_directory: styles
    xml_model_rule_extension: csl
    xml_model_declarations: |
      href="https://example.org/schema/styles.rng" schematypens="http://relaxng.org/ns/structure/1.0"
      href="https://example.org/schema/styles.sch" schematypens="http://purl.oclc.org/dsdl/schematron"

Because file_extensions is omitted here, the Action discovers both .xml
files and .csl files in styles. Set file_extensions: csl as well if you
want to restrict discovery to .csl only.
Replace inline declarations for one directory with remote Relax NG and Schematron rules with only Action inputs:
- uses: adunning/xml-model-validator@v2
  with:
    xml_model_rule_mode: replace
    xml_model_rule_directory: tei
    xml_model_rule_extension: xml
    xml_model_declarations: |
      href="https://example.org/schema/tei.rng" schematypens="http://relaxng.org/ns/structure/1.0"
      href="https://example.org/schema/tei.sch" schematypens="http://purl.oclc.org/dsdl/schematron"

Use a repository config file when you need multiple rules or want to pin a remote schema URL to a local file:
[schema_aliases]
"https://example.org/schema/styles.rng" = "schemas/styles.rng"
[[xml_model_rules]]
directory = "styles"
extension = "csl"
mode = "fallback"
[[xml_model_rules.declarations]]
href = "https://example.org/schema/styles.rng"
schematypens = "http://relaxng.org/ns/structure/1.0"
[[xml_model_rules]]
directory = "tei"
extension = "xml"
mode = "replace"
[[xml_model_rules.declarations]]
href = "https://example.org/schema/tei.rng"
schematypens = "http://relaxng.org/ns/structure/1.0"
[[xml_model_rules.declarations]]
href = "https://example.org/schema/tei.sch"
schematypens = "http://purl.oclc.org/dsdl/schematron"

Validate only the XML files changed by the current push or pull request:
- uses: adunning/xml-model-validator@v2
  with:
    changed_files_only: true

Choose how changed files are discovered:
- uses: adunning/xml-model-validator@v2
  with:
    changed_files_only: true
    changed_source: auto # auto | api | git

Validate explicit files and stop on the first failure:
- uses: adunning/xml-model-validator@v2
  with:
    files: |
      docs/a.xml
      docs/b.xml
    fail_fast: true

Action inputs:
- files: newline-delimited list of files to validate explicitly
- files_from: newline-delimited file list path
- directory: directory to scan recursively
- file_extensions: comma- or whitespace-separated file extensions to discover when scanning directories or changed files; a leading period is optional and the default is .xml
- changed_files_only: validate only files with matching extensions changed by the current push or pull request
- changed_source: source for changed_files_only file discovery (auto, api, git)
- jobs: number of workers, 0 means automatic
- config: optional TOML validator config file containing schema aliases and xml-model rules
- xml_model_rule_mode: optional inline xml-model rule mode (fallback or replace)
- xml_model_rule_directory: optional directory scope for the inline xml-model rule; when no selection input is set, this also becomes the default directory to validate
- xml_model_rule_extension: optional file extension scope for the inline xml-model rule; a leading period is optional, and when file_extensions is omitted the Action still discovers .xml files by default and adds this value to the discovery set
- xml_model_declarations: optional newline-delimited declarations for one inline xml-model rule; remote schema URLs are supported and are expected to be the most common case
- fail_fast: stop after the first failing file
- json_report_path: optional path, relative to the repository root or absolute, where the Action should save a JSON validation report
If you do not provide files, files_from, directory, or changed_files_only,
the action validates all matching files in the repository by default.
If xml_model_rule_directory is set and no selection input is provided, the
Action validates that directory by default.
The Action accepts at most one selection input at a time. If you provide more
than one of files, files_from, directory, or changed_files_only, the run
fails with an input error.
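The selection rules above can be summarised in a few lines of Python. This is a sketch of the documented behaviour, not the Action's own code; the function name resolve_selection, the returned labels, and the treatment of the string "false" as unset are assumptions made for illustration.

```python
SELECTION_INPUTS = ("files", "files_from", "directory", "changed_files_only")


def resolve_selection(inputs):
    """Return which selection applies, or raise if more than one is provided."""
    # Treat missing, empty, and false-like values as "not provided" (an assumption).
    provided = [
        name for name in SELECTION_INPUTS
        if inputs.get(name) not in (None, "", False, "false")
    ]
    if len(provided) > 1:
        raise ValueError("at most one selection input is allowed, got: " + ", ".join(provided))
    if provided:
        return provided[0]
    # With no selection input, an inline rule directory becomes the default scope.
    if inputs.get("xml_model_rule_directory"):
        return "xml_model_rule_directory"
    # Otherwise every matching file in the repository is validated.
    return "repository"
```

For example, passing only directory selects that directory, while passing both directory and files raises the input error.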
When changed_files_only: true:
- changed_source: auto (default) tries the GitHub API for pull request and push events, then falls back to git diff if API discovery is unavailable.
- changed_source: api requires GitHub event context and API access.
- changed_source: git uses local git diff logic.
- If no changed files with matching extensions are found, the action reports that validation was skipped, emits a GitHub notice and step summary, and exits successfully.
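The three changed_source modes amount to a small fallback strategy. The Python sketch below shows the shape of that logic under stated assumptions: from_api and from_git are hypothetical callables supplied by the caller, and "API discovery is unavailable" is modelled as a RuntimeError. It does not reproduce how the Action actually talks to GitHub or git.

```python
def changed_files_to_validate(changed_source, from_api, from_git, extensions=(".xml",)):
    """Pick a discovery strategy, then keep only files with matching extensions."""
    if changed_source == "api":
        files = from_api()          # errors propagate: api mode requires API access
    elif changed_source == "git":
        files = from_git()
    elif changed_source == "auto":
        try:
            files = from_api()
        except RuntimeError:        # assumption: unavailable API raises RuntimeError
            files = from_git()
    else:
        raise ValueError(f"unknown changed_source: {changed_source}")
    return [path for path in files if path.endswith(tuple(extensions))]
```

An empty result corresponds to the skipped case described above: nothing is validated and the run still succeeds.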
During GitHub Action runs, the default output format emits both workflow
annotations and a Markdown step summary. The summary includes counts, duration,
run context, and issue details when validation runs, and an explicit skip
message when changed_files_only finds nothing to validate.
The Action also exposes structured outputs you can use in later workflow steps:
- skipped: true when changed_files_only found no matching files and the Action exited successfully without validating
- files_checked: number of files actually validated
- failed_files: number of files that failed validation
- warning_count: number of warning-level issues reported
- json_report_path: absolute path to the saved JSON report when json_report_path is set; otherwise empty
Example:
- id: xml-validate
  uses: adunning/xml-model-validator@v2
  with:
    changed_files_only: true
    json_report_path: reports/xml-validation.json
- name: Show validation result
  if: always()
  run: |
    echo "Skipped: ${{ steps.xml-validate.outputs.skipped }}"
    echo "Checked: ${{ steps.xml-validate.outputs.files_checked }}"
    echo "Failed: ${{ steps.xml-validate.outputs.failed_files }}"
    echo "Warnings: ${{ steps.xml-validate.outputs.warning_count }}"
    echo "JSON report: ${{ steps.xml-validate.outputs.json_report_path }}"

Upload the saved JSON report as a workflow artifact:
- id: xml-validate
  uses: adunning/xml-model-validator@v2
  with:
    directory: collections
    json_report_path: reports/xml-validation.json
- uses: actions/upload-artifact@v4
  if: always() && steps.xml-validate.outputs.json_report_path != ''
  with:
    name: xml-validation-report
    path: ${{ steps.xml-validate.outputs.json_report_path }}

For repositories that need local schema aliases or rule-based xml-model
behaviour, provide .xml-validator/config.toml. Use --config or the
config Action input to override that location.
Example:
[schema_aliases]
"https://example.com/schema.rng" = "schemas/local.rng"
[[xml_model_rules]]
directory = "styles"
extension = "csl"
mode = "fallback"
[[xml_model_rules.declarations]]
href = "https://example.org/schema/styles.rng"
schematypens = "http://relaxng.org/ns/structure/1.0"
[[xml_model_rules]]
directory = "tei"
extension = "xml"
mode = "replace"
[[xml_model_rules.declarations]]
href = "schemas/tei.rng"
schematypens = "http://relaxng.org/ns/structure/1.0"
[[xml_model_rules.declarations]]
href = "schemas/tei.sch"
schematypens = "http://purl.oclc.org/dsdl/schematron"

schema_aliases maps remote schema URLs to local files relative to the config
file.
xml_model_rules lets you define different schema sets for different
directories or extensions. Rule directory values are resolved relative to the
repository root. Rule extension values may be written with or without a
leading period. Declaration href values may be remote http:// or https://
URLs, repository-root-relative local paths, or absolute local paths. Local
declaration href values in config and Action-supplied rules are resolved from
the repository root.
Each rule can match by directory and/or extension, and can either:
- fallback: apply only when the file has no inline xml-model declarations
- replace: ignore inline xml-model declarations and use the configured declarations instead
When multiple rules match a file, the most specific directory rule wins; an
extension-qualified rule beats the same directory without an extension. If two
rules match with the same specificity and precedence, validation fails with an
ambiguity error instead of choosing one implicitly. Inline xml-model
processing instructions inside the XML file still follow standard XML relative
resolution against the XML file itself.
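The matching and precedence rules above can be read as a small selection procedure. The Python sketch below is one way to express it, assuming that "most specific directory" means the deepest directory path and that extensions compare exactly; the Rule class and the function names are hypothetical and are not taken from the validator's source.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    mode: str                         # "fallback" or "replace"
    declarations: Tuple[str, ...]     # schema hrefs, simplified for the sketch
    directory: Optional[str] = None
    extension: Optional[str] = None


def matches(rule, path):
    file_path = PurePosixPath(path)
    if rule.directory is not None and PurePosixPath(rule.directory) not in file_path.parents:
        return False
    if rule.extension is not None and file_path.suffix != "." + rule.extension.lstrip("."):
        return False
    return True


def specificity(rule):
    # Deeper directories rank higher; an extension breaks ties within a directory.
    depth = len(PurePosixPath(rule.directory).parts) if rule.directory else 0
    return (depth, 1 if rule.extension else 0)


def select_rule(rules, path):
    candidates = [rule for rule in rules if matches(rule, path)]
    if not candidates:
        return None
    best = max(specificity(rule) for rule in candidates)
    winners = [rule for rule in candidates if specificity(rule) == best]
    if len(winners) > 1:
        raise ValueError(f"ambiguous xml-model rules for {path}")
    return winners[0]


def effective_declarations(inline, rule):
    """Combine inline declarations with the selected rule according to its mode."""
    if rule is None:
        return list(inline)
    if rule.mode == "replace":
        return list(rule.declarations)
    return list(inline) if inline else list(rule.declarations)   # fallback
```

With a directory-only rule for styles and an extension-qualified rule for styles plus csl, a file such as styles/apa.csl selects the extension-qualified rule, while styles/notes.xml selects the directory-only one.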
The GitHub Action’s inline override inputs define only one rule per run. Use the TOML config file when you need multiple directory-specific or extension-specific rules in the same workflow.
When an inline rule provides xml_model_rule_extension and file_extensions
is omitted, the effective discovery set is .xml plus the inline rule
extension. Set file_extensions explicitly when you want to narrow discovery
to a smaller set such as only .csl.
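The discovery rule in this paragraph is easy to get wrong, so here is a short Python sketch of it. It assumes only what the text states: file_extensions is comma- or whitespace-separated, a leading period is optional, and the default is .xml. The function names are illustrative and not part of the Action.

```python
import re


def normalize_extension(value):
    """Accept 'csl' or '.csl' and always return '.csl'."""
    value = value.strip()
    return value if value.startswith(".") else "." + value


def discovery_extensions(file_extensions=None, rule_extension=None):
    """Return the set of extensions used for file discovery."""
    if file_extensions and file_extensions.strip():
        # An explicit file_extensions value is used as given, narrowing discovery.
        parts = [part for part in re.split(r"[,\s]+", file_extensions.strip()) if part]
        return {normalize_extension(part) for part in parts}
    # Otherwise start from the default and add the inline rule extension, if any.
    extensions = {".xml"}
    if rule_extension:
        extensions.add(normalize_extension(rule_extension))
    return extensions
```

So an inline rule with extension csl and no file_extensions yields both .xml and .csl, while adding file_extensions: csl narrows the set to .csl alone.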
Configured remote schema URLs use the same download and cache behaviour as
remote schema URLs referenced from inline xml-model processing instructions.
Each declaration supports the same fields the validator reads from inline
xml-model instructions:
- href (required)
- schematypens (optional)
- type (optional)
- phase (optional)
The config schema is strict: unsupported keys are rejected so configuration mistakes fail clearly instead of being ignored silently.
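As a rough picture of what strict checking means, the Python sketch below rejects keys outside the ones this README documents. The key sets are taken from the examples and field list above and may not be the validator's complete schema; the function name config_errors and the error wording are assumptions. The input is a plain dictionary, such as one produced by Python's tomllib module on Python 3.11 or later.

```python
ALLOWED_TOP_LEVEL = {"schema_aliases", "xml_model_rules"}
ALLOWED_RULE_KEYS = {"directory", "extension", "mode", "declarations"}
ALLOWED_DECLARATION_KEYS = {"href", "schematypens", "type", "phase"}
ALLOWED_MODES = {"fallback", "replace"}


def config_errors(config):
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []
    for key in sorted(set(config) - ALLOWED_TOP_LEVEL):
        errors.append(f"unsupported key: {key}")
    for index, rule in enumerate(config.get("xml_model_rules", [])):
        where = f"xml_model_rules[{index}]"
        for key in sorted(set(rule) - ALLOWED_RULE_KEYS):
            errors.append(f"{where}: unsupported key: {key}")
        if "mode" in rule and rule["mode"] not in ALLOWED_MODES:
            errors.append(f"{where}: mode must be fallback or replace")
        for number, declaration in enumerate(rule.get("declarations", [])):
            inner = f"{where}.declarations[{number}]"
            for key in sorted(set(declaration) - ALLOWED_DECLARATION_KEYS):
                errors.append(f"{inner}: unsupported key: {key}")
            if "href" not in declaration:
                errors.append(f"{inner}: href is required")
    return errors
```

A misspelling such as schematypns in a declaration is reported instead of being silently dropped, which is the behaviour the strict schema is meant to guarantee.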
The action is intentionally focused on the schema types most commonly used with
xml-model in editorial workflows:
- Relax NG XML syntax
- RELAX NG Compact Syntax
- standalone Schematron
- embedded Schematron in Relax NG XML syntax schemas
- embedded Schematron in RELAX NG Compact Syntax schemas via conversion to XML syntax
Embedded Schematron extraction follows Relax NG include and externalRef
links for local and remote schemas. Relative references in remote .rnc
schemas are resolved against the original remote URL before conversion.
It does not currently attempt to validate every schema language that xml-model
can theoretically reference.
The GitHub Action sets up Java internally and runs the validator with
java -jar.
When the action is running from a published release ref such as @v2 or
@v2.1.0, the composite action first resolves the exact release tag from Maven
project metadata. On a cold runner it then queries the GitHub Releases API for
the matching release assets, downloads the published jar, and verifies it
against the published checksum before use. For branch refs and other unreleased
revisions, or if the release lookup or asset download fails, the action falls
back to building the shaded jar from source.
The built jar is cached under ~/.cache/xml-model-validator/jar, and its cache
key is derived from the action's build inputs so a cached binary is only reused
when the jar-producing contents match.
The action also caches Maven's local repository and wrapper directories under
~/.m2, keyed from Maven dependency inputs so dependency downloads are reused
until those inputs change.
Remote schema downloads and prepared Schematron artifacts are cached under
~/.cache/xml-model-validator/schema-downloads and
~/.cache/xml-model-validator/schematron. The action restores the latest
runtime cache and saves a fresh one at the end of each run so those artifacts
can accumulate safely over time.
The changed_files_only mode expects the repository to be available in the runner,
which normally means using actions/checkout@v6 earlier in the job.
Build the runnable jar:
./mvnw -q -DskipTests package

The CLI requires exactly one input source per invocation:
- --directory PATH to scan a directory recursively
- --files-from PATH to read a newline-delimited file list from a file
- --files-from - to read a newline-delimited file list from standard input
- FILES... to validate explicit file paths
The CLI rejects invocations that omit an input source or combine more than one input source.
--files-from expects one path per line. Blank lines are ignored. When you use
--files-from -, paths are read from standard input, which makes the CLI work
well in pipelines such as find ... | xml-model-validator --files-from -.
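The same pipeline can be driven from a script. The Python sketch below builds a newline-delimited list and passes it to the CLI on standard input. The jar location and the helper names are assumptions for illustration; only the --files-from - option comes from the description above, and Java must be available for the second function to run.

```python
import subprocess
from pathlib import Path


def newline_delimited_paths(root, suffix=".xml"):
    """Return one path per line for every file under root with the given suffix."""
    paths = sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.suffix == suffix
    )
    return "".join(f"{path}\n" for path in paths)


def validate_from_stdin(jar, file_list):
    """Run the CLI with --files-from - and return its exit status."""
    completed = subprocess.run(
        ["java", "-jar", jar, "--files-from", "-"],
        input=file_list,
        text=True,
    )
    return completed.returncode
```

A call such as validate_from_stdin("target/xml-model-validator.jar", newline_delimited_paths("path/to/xml")) mirrors the find pipeline, with the file selection done in Python instead.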
File discovery rules:
- --directory and --files-from apply --file-extensions
- if --file-extensions is omitted, discovery defaults to .xml
- if an inline rule sets --rule-extension and --file-extensions is omitted, discovery uses both .xml and that rule extension
- explicit FILES... arguments are validated as given and are not filtered by --file-extensions
Output formats:
- --format text writes human-readable diagnostics
- --format github writes GitHub workflow annotations and summaries
- --format json writes a machine-readable report to standard output
- if --format is omitted, the CLI defaults to text locally and github inside GitHub Actions
Inspection mode:
- --plan prints the resolved input source, config path, extensions, rules, and file set without running validation
- --plan succeeds even when the resolved file set is empty, so it can be used to debug discovery
Exit status:
- 0 means validation succeeded
- 1 means one or more files failed validation
- 2 means command-line usage was invalid
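Scripts that wrap the CLI can branch on these codes. The following Python sketch shows one way to do that; the jar path is an assumption, the helper names are hypothetical, and any status other than 0, 1, or 2 is simply reported as unexpected because this README does not define it.

```python
import subprocess

EXIT_MEANINGS = {
    0: "validation succeeded",
    1: "one or more files failed validation",
    2: "command-line usage was invalid",
}


def describe_exit(status):
    """Translate a CLI exit status into the meaning documented above."""
    return EXIT_MEANINGS.get(status, f"unexpected exit status {status}")


def run_validator(jar, *arguments):
    """Run the CLI and return (exit status, diagnostics written to standard error)."""
    completed = subprocess.run(
        ["java", "-jar", jar, *arguments],
        capture_output=True,
        text=True,
    )
    return completed.returncode, completed.stderr
```

For instance, a wrapper might call run_validator("target/xml-model-validator.jar", "--directory", "path/to/xml") and treat status 2 as a scripting bug rather than a content problem.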
Output behaviour:
- by default, successful runs are quiet
- validation warnings and errors are written to standard error
- use --verbose to print progress information and successful summaries
Run:
java -jar target/xml-model-validator.jar --directory path/to/xml -j 0
java -jar target/xml-model-validator.jar --plan --directory path/to/xml
java -jar target/xml-model-validator.jar --verbose --directory path/to/xml -j 0
find path/to/xml -name '*.xml' -print | java -jar target/xml-model-validator.jar --files-from - -j 0
java -jar target/xml-model-validator.jar --files-from path/to/files.txt -j 0
java -jar target/xml-model-validator.jar path/to/a.xml path/to/b.xml -j 0
java -jar target/xml-model-validator.jar --directory path/to/styles --file-extensions csl -j 0
java -jar target/xml-model-validator.jar --directory path/to/styles --file-extensions csl --config .xml-validator/config.toml -j 0

Show CLI usage:
java -jar target/xml-model-validator.jar --help

Show CLI version:
java -jar target/xml-model-validator.jar --version

Write a JSON report:
java -jar target/xml-model-validator.jar --format json path/to/a.xml path/to/b.xml

Preview the full validation plan:
java -jar target/xml-model-validator.jar --plan --format json --directory path/to/xml

Verify a published release artifact:
VERSION=v2.1.0 # replace with the release tag you want to verify
curl -LO "https://github.com/adunning/xml-model-validator/releases/download/${VERSION}/xml-model-validator.jar"
curl -LO "https://github.com/adunning/xml-model-validator/releases/download/${VERSION}/xml-model-validator.jar.sha256"
shasum -a 256 -c xml-model-validator.jar.sha256
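Where shasum is not available, the same comparison can be done with a few lines of Python. This sketch assumes the .sha256 file uses the usual shasum layout, with the hexadecimal digest as the first whitespace-separated field; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(artifact, checksum_file):
    """Return True when the artifact matches the digest recorded in checksum_file."""
    fields = Path(checksum_file).read_text().split()
    if not fields:
        return False
    return sha256_of(artifact) == fields[0].lower()
```

Calling verify_checksum("xml-model-validator.jar", "xml-model-validator.jar.sha256") after the two downloads returns True only when the jar is intact.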