Implement CDS XML generation #23

@PascalEgn

Description

The pipeline needs to generate the final MARCXML files used for the CDS upload: one XML file per Boite file and one combined XML file. Currently, output is always written to disk in a fixed location. We should refactor this to write to a specific output directory if one is provided, and to a temporary directory otherwise.

Work involved

  • Rewrite create_import_xml_files to accept an optional output_path parameter
  • If output_path is provided: write the per-Boite XML files and the combined XML to this path
  • If output_path is None: use a temporary directory and return the resulting paths so that later steps of the script can use them
  • Ensure that valid MARCXML files are generated
  • Add an output directory parameter to the script. This parameter may already exist for the Boite download at the beginning or for the log file output of the file matching step. If it does, reuse the same path with a different subfolder such as 'XMLs'.
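
The steps above could look roughly like the following sketch. It is only an illustration under several assumptions: the input shape (`records_by_boite`, a mapping from Boite file name to a list of already-built MARCXML record elements), the helper `build_collection`, the file name `combined.xml`, and the returned dictionary layout are all hypothetical and not taken from the existing code. Only the function name `create_import_xml_files` and the `output_path` parameter come from this issue.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Standard MARC21 slim namespace; registered as the default namespace
# so the serialized files use xmlns="..." without a prefix.
MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)


def build_collection(records):
    """Hypothetical helper: wrap record elements in a <collection> root."""
    collection = ET.Element(f"{{{MARC_NS}}}collection")
    collection.extend(records)
    return collection


def create_import_xml_files(records_by_boite, output_path=None):
    """Write one XML per Boite file plus one combined XML.

    Assumption: records_by_boite maps a Boite file name to a list of
    MARCXML <record> elements produced by earlier pipeline steps.
    """
    if output_path is None:
        # No path given: fall back to a fresh temporary directory.
        out_dir = Path(tempfile.mkdtemp(prefix="cds_xml_"))
    else:
        out_dir = Path(output_path)
        out_dir.mkdir(parents=True, exist_ok=True)

    per_boite = {}
    all_records = []
    for boite_name, records in records_by_boite.items():
        target = out_dir / f"{Path(boite_name).stem}.xml"
        ET.ElementTree(build_collection(records)).write(
            target, encoding="utf-8", xml_declaration=True
        )
        per_boite[boite_name] = target
        all_records.extend(records)

    combined = out_dir / "combined.xml"
    ET.ElementTree(build_collection(all_records)).write(
        combined, encoding="utf-8", xml_declaration=True
    )

    # Return every location so later steps do not need to guess paths.
    return {"output_dir": out_dir, "per_boite": per_boite, "combined": combined}
```

Returning the output directory together with the per-Boite and combined paths is one way to make the temporary-directory case usable by later steps. Note that `tempfile.mkdtemp` does not clean up after itself, so the caller would be responsible for removing the directory once the upload is done.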

Acceptance criteria

  • MARCXML files are generated for each Boite Excel file
  • A combined MARCXML is generated that merges all Boite file records
  • When --output-path is provided, files are saved to that path
  • When --output-path is not provided, files are generated in a temporary directory
  • Return value or output structure makes it clear where/how to access the generated XML(s)
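
As a rough sketch of how the `--output-path` behavior in these criteria might be wired into the script, the snippet below uses `argparse`. The function names `parse_args` and `resolve_xml_dir` are hypothetical, and the 'XMLs' subfolder name is only the example suggested in this issue; the existing script may already define the option differently.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Hypothetical CLI parsing; the real script may already have an equivalent."""
    parser = argparse.ArgumentParser(description="Generate MARCXML files for the CDS upload.")
    parser.add_argument(
        "--output-path",
        type=Path,
        default=None,
        help="Directory for generated files. If omitted, a temporary directory is used.",
    )
    return parser.parse_args(argv)


def resolve_xml_dir(output_path):
    """Return the XML subfolder of the shared output directory, or None.

    None signals that the XML step should fall back to a temporary directory.
    """
    if output_path is None:
        return None
    return output_path / "XMLs"
```

The value returned by `resolve_xml_dir` could then be passed as `output_path` to `create_import_xml_files`, so that a missing option and a temporary directory are handled in one place.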

Screenshots (Optional)

Metadata

Assignees

Labels

File Import Project: This task is related to the file import project of digitization
