
Implement a Schema-Specific XML Validator with Schematron Support #18

@Bergmann89

Description


In addition to the existing code generator, we propose implementing a generator that produces a strict XML validator tailored to a given schema. This validator would use a streaming approach based on quick_xml event parsing and would add a further layer of validation through Schematron rules.

Rationale

Currently, the generated deserialization code is designed to be robust and tolerant of errors. However, a stricter validation mechanism is needed to ensure that XML documents conform exactly to their schemas before deserialization or after serialization. A dedicated validator with Schematron support would:

  • Enforce strict schema validation before processing XML data.
  • Validate additional business-logic constraints via Schematron rules.
  • Improve data integrity and compliance with XML schema constraints.
  • Offer an independent validation step that can be used standalone.

Proposed Approach

  1. Streaming Validation Using quick_xml

    • Utilize the event-based parsing mechanism of quick_xml.
    • Validate elements, attributes, and structure in real-time as XML is parsed.
    • Minimize memory footprint by avoiding full document loading.
  2. Integration with Schematron

    • Support rule-based validation beyond structural checks.
    • Use XPath-based assertions for conditional constraints and cross-element relationships.
    • Allow users to define Schematron files alongside schemas for enhanced validation.
  3. Integration with Code Generation

    • The validator should be schema-specific and generated alongside the code produced by the existing code generator.
    • Ensure strict validation rules are enforced, while deserialization remains error-tolerant.
    • Provide validation hooks before deserialization and/or after serialization.
  4. Usage Scenarios

    • Pre-deserialization Validation: Ensures the XML document strictly follows the schema and business rules before it is deserialized.
    • Post-serialization Validation: Confirms that the generated XML conforms to the expected structure and constraints after serialization.
    • Standalone Validation: Can be used independently to test compliance with the schema and Schematron rules.
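The streaming approach from steps 1, 2 and 4 can be sketched roughly as follows. This is a minimal, dependency-free sketch under several assumptions: the `Event` enum is a local stand-in that mirrors quick_xml's `Start`, `Empty` and `End` events with attributes already decoded (a real generated validator would presumably consume `quick_xml::events::Event` values from a `quick_xml::Reader`), and the `order`/`item`/`ref` schema, the `allowed_children` and `required_attr` tables, and the `Validator`/`validate` names are all hypothetical, not part of the project. Structural checks run as each event arrives, keeping only the stack of open elements in memory, while a Schematron-style cross-element rule (every `ref/@to` must name an existing `item/@id`) collects its facts during the pass and is asserted at the end of input.

```rust
use std::collections::HashSet;

// Stand-in for the quick_xml event variants used here (attributes pre-decoded).
enum Event<'a> {
    Start(&'a str, Vec<(&'a str, &'a str)>),
    Empty(&'a str, Vec<(&'a str, &'a str)>),
    End(&'a str),
}

// Tables for a hypothetical schema: <order> holds any mix of <item id=".."/> and <ref to=".."/>.
fn allowed_children(parent: &str) -> &'static [&'static str] {
    match parent { "order" => &["item", "ref"], _ => &[] }
}
fn required_attr(name: &str) -> Option<&'static str> {
    match name { "item" => Some("id"), "ref" => Some("to"), _ => None }
}
fn attr<'a>(attrs: &[(&'a str, &'a str)], key: &str) -> Option<&'a str> {
    attrs.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

#[derive(Default)]
struct Validator {
    stack: Vec<String>,   // only the currently open elements are kept
    seen_root: bool,
    ids: HashSet<String>, // facts collected for the deferred cross-element rule
    refs: Vec<String>,
    errors: Vec<String>,
}

impl Validator {
    fn open(&mut self, name: &str, attrs: &[(&str, &str)]) {
        let allowed = match self.stack.last() {
            None => !self.seen_root && name == "order",
            Some(p) => allowed_children(p).iter().any(|c| *c == name),
        };
        if !allowed {
            self.errors.push(format!("<{name}> not allowed here"));
        }
        self.seen_root = true;
        if let Some(req) = required_attr(name) {
            match attr(attrs, req) {
                None => self.errors.push(format!("<{name}> missing @{req}")),
                Some(v) if name == "ref" => self.refs.push(v.to_string()),
                Some(v) => { self.ids.insert(v.to_string()); } // item/@id
            }
        }
        self.stack.push(name.to_string());
    }

    fn close(&mut self, name: &str) {
        if self.stack.pop().as_deref() != Some(name) {
            self.errors.push(format!("mismatched </{name}>"));
        }
    }

    fn feed(&mut self, event: &Event<'_>) {
        match event {
            Event::Start(name, attrs) => self.open(name, attrs),
            Event::Empty(name, attrs) => {
                self.open(name, attrs);
                self.close(name);
            }
            Event::End(name) => self.close(name),
        }
    }

    // End of input: report leftovers, then assert "every ref/@to names an existing item/@id".
    fn finish(mut self) -> Result<(), Vec<String>> {
        if !self.seen_root {
            self.errors.push("missing root element".to_string());
        }
        for open in &self.stack {
            self.errors.push(format!("unclosed <{open}>"));
        }
        for to in &self.refs {
            if !self.ids.contains(to) {
                self.errors.push(format!("ref to unknown id '{to}'"));
            }
        }
        if self.errors.is_empty() { Ok(()) } else { Err(self.errors) }
    }
}

// Standalone entry point; could equally run before deserialization or after serialization.
fn validate(events: &[Event<'_>]) -> Result<(), Vec<String>> {
    let mut v = Validator::default();
    events.iter().for_each(|e| v.feed(e));
    v.finish()
}

fn main() {
    // The <ref> precedes the <item> it names; the rule is only checked at the end.
    let doc = [Event::Start("order", vec![]), Event::Empty("ref", vec![("to", "a")]),
               Event::Empty("item", vec![("id", "a")]), Event::End("order")];
    assert_eq!(validate(&doc), Ok(()));
    let bad = [Event::Start("order", vec![]), Event::Empty("ref", vec![("to", "b")])];
    // prints: Err(["unclosed <order>", "ref to unknown id 'b'"])
    println!("{:?}", validate(&bad));
}
```

Rules whose XPath has to look at other parts of the document cannot, in general, be decided at the moment an element is seen in a single forward pass. Collecting the keys such a rule needs and asserting them at the end of input, as above, keeps memory proportional to those facts rather than to the whole document; arbitrary XPath expressions may still require buffering, so a real generator would likely have to decide which subset of Schematron it can evaluate in streaming mode.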

Expected Outcome

  • A lightweight and efficient XML validator for schema compliance and rule-based validation.
  • Improved error handling by distinguishing between strict validation and lenient deserialization.
  • Enhanced reliability of XML-based data processing with support for complex validation rules.

Metadata

  • Assignees: No one assigned
  • Labels: feature (Feature request)
  • Projects: No status
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
