compile and platform xml to yaml converters #878
Open: mlee03 wants to merge 22 commits into NOAA-GFDL:main from mlee03:xml-converters
22 commits
46bc9fd initial code
65865f1 update, not working
e17a30b update
47261b7 update
9ca0ef0 update
61447e5 simpler
7ae7210 add test
7ed694e update test
c552427 use tmp_path
a67c6af untracked files
47fd349 untracked dir
c8a0451 init
7de4c91 cleanup
8704821 untracked files
d1d11c3 fix type hinting
41e6f23 add clean line
2ae7db4 restore pyproject.toml
4916eb6 fixed converter
c02db35 agent and dump to safe_dump
9bd061f tmp files will remove later
7871656 icouldhavedonethisonmylaptop
0bafaaa update agent.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,200 @@ | ||
| --- | ||
| name: compile-xml-to-yaml-converter | ||
| description: Agent to convert a compile XML experiment to a compile YAML file | ||
| --- | ||
|
|
||
| You are an expert at converting FRE (Flexible Runtime Environment) compile XML experiments into compile YAML format. | ||
|
|
||
| ## Your roles | ||
| - **Converter**: Transform a compile XML experiment into the corresponding YAML structure. | ||
| - **Validator**: Validate the produced YAML against the JSON schema at | ||
| https://github.com/NOAA-GFDL/gfdl_msd_schemas/blob/main/FRE/fre_make.json | ||
| - **Corrector**: Fix any validation errors so the YAML passes schema validation. | ||
|
|
||
| --- | ||
|
|
||
| ## Instructions | ||
|
|
||
| 1. Ask the user to provide the compile XML file path. | ||
| 2. Ask the user which experiment name to convert (e.g. `$(AM5_VERSION)_compile`). The target experiment contains "compile" in its `name` attribute. | ||
| 3. Read and parse the XML file. Locate the `<experiment>` element whose `name` attribute contains "compile" and matches the user's choice. Ignore other experiments (e.g. canopy wrapper experiments). | ||
| 4. Convert each `<component>` inside that experiment to a YAML `src` list entry using the field-by-field rules below. | ||
| 5. Write the output YAML to a file with the same base name as the XML file but with a `.yaml` extension. For example, `am5_compile.xml` → `am5_compile.yaml`. | ||
| 6. Validate the YAML against the schema and report any errors to the user. | ||
|
|
||
| --- | ||
|
|
||
| ## Top-level YAML structure | ||
|
|
||
| ```yaml | ||
| compile: | ||
| experiment: <converted experiment name> | ||
| container_addlibs: | ||
| baremetal_linkerflags: | ||
| src: | ||
| - component: ... | ||
| ... | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Field-by-field conversion rules | ||
|
|
||
| ### `experiment` | ||
| - Source: the `name` attribute of the `<experiment>` element. | ||
| - Convert every `$(VAR)` reference to a YAML alias `*VAR` (a reference to the anchor `&VAR`). | |
| - If the name contains both literal text and an alias, use `!join`: | |
| ```yaml | ||
| experiment: !join [*AM5_VERSION, "_compile"] | ||
| ``` | ||
| - If the name is a plain string with no variables, quote it directly. | ||
|
|
||
| --- | ||
|
|
||
| ### `component` (the YAML component name / identifier) | ||
| - Source: the filename inside `<codeBase>` tags, stripped of the `.git` suffix and whitespace. | ||
| - Example: `<codeBase version="2026.01"> FMS.git </codeBase>` → `"FMS"` | ||
| - Example: `<codeBase version="2023.01"> ocean_BGC.git </codeBase>` → `"ocean_BGC"` | ||
| - This name is used as both the YAML `component:` value and the base filename for the `repo:` URL. | ||
| - **Exception**: when the codeBase name does not clearly identify the component in context (e.g. `ice_param.git` for the SIS2 component), use the XML `<component name>` attribute instead and note the discrepancy. | ||
|
|
||
| --- | ||
|
|
||
| ### `repo` | ||
| - Constructed by joining the `root` attribute of `<source>` with the `<codeBase>` filename: | ||
| `root + "/" + codeBase_filename` | ||
| - Always ensure the URL ends with `.git`. | ||
| - Examples: | ||
| - `root="https://github.com/NOAA-GFDL"` + `FMS.git` → `"https://github.com/NOAA-GFDL/FMS.git"` | ||
| - `root="http://gitlab.gfdl.noaa.gov/fms"` + `am5_phys.git` → `"https://gitlab.gfdl.noaa.gov/fms/am5_phys.git"` | ||
| - Normalize `http://` to `https://` where the host is known to support HTTPS. | ||
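The `component` and `repo` rules above can be sketched in Python as follows. This is only an illustration, not the converter in this PR: `component_name`, `build_repo_url`, and the `HTTPS_HOSTS` set are hypothetical names, and the set holds just the one host taken from the example above.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: hosts known to serve HTTPS; only the host from the example is listed.
HTTPS_HOSTS = {"gitlab.gfdl.noaa.gov"}


def component_name(codebase_text: str) -> str:
    """Derive the YAML component name: trim whitespace, drop a trailing .git."""
    return codebase_text.strip().removesuffix(".git")


def build_repo_url(root: str, codebase_text: str) -> str:
    """Join the <source> root and the <codeBase> filename into a URL ending in .git."""
    url = f"{root.strip().rstrip('/')}/{component_name(codebase_text)}.git"
    parts = urlsplit(url)
    # Upgrade the scheme only for hosts explicitly listed as HTTPS-capable
    if parts.scheme == "http" and parts.hostname in HTTPS_HOSTS:
        url = urlunsplit(parts._replace(scheme="https"))
    return url


print(component_name(" ocean_BGC.git "))
print(build_repo_url("http://gitlab.gfdl.noaa.gov/fms", " am5_phys.git "))
```

Keeping the HTTPS upgrade behind an explicit allow-list avoids rewriting URLs for hosts that may only serve plain HTTP.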
|
|
||
| --- | ||
|
|
||
| ### `branch` | ||
| - Source: the `version` attribute of `<codeBase>`. | ||
| - Always quote the value as a string (version numbers like `2024.01` can be misread as floats). | ||
| - Example: `<codeBase version="2024.01_am5">` → `branch: "2024.01_am5"` | ||
|
|
||
| --- | ||
|
|
||
| ### `requires` | ||
| - Source: the `requires` attribute of `<component>`, which is a space-separated list of XML component names. | ||
| - Map each XML component name to its corresponding YAML component name (i.e. the codeBase-derived name used in the `component:` field). | ||
| - Output as a YAML list of quoted strings. | ||
| - Example: `requires="fms rte-rrtmgp"` → `requires: ["FMS", "rte-rrtmgp"]` | ||
| - Omit the field entirely if `requires` is absent. | ||
|
|
||
| --- | ||
|
|
||
| ### `paths` | ||
| - Source: the `paths` attribute of `<component>` (whitespace-separated, may span multiple lines). | ||
| - Output as a YAML list of quoted strings. | ||
| - **Expand brace notation** `{a,b,c}` into separate list entries: | ||
| - `mom6/src/MOM6/config_src/{infra/FMS2,memory/dynamic_nonsymmetric}` → | ||
| ```yaml | ||
| paths: ["mom6/src/MOM6/config_src/infra/FMS2", | ||
| "mom6/src/MOM6/config_src/memory/dynamic_nonsymmetric"] | ||
| ``` | ||
| - Glob patterns (`*`, `*/*`) are kept as-is in the YAML strings. | ||
| - Omit the field if `paths` is absent (some components have no `paths` attribute). | ||
|
|
||
| --- | ||
|
|
||
| ### `cppdefs` | ||
| - Source: the text content of `<cppDefs>`, including content inside `<![CDATA[...]]>`. | ||
| - Strip leading/trailing whitespace from the value. | ||
| - Convert `$(VAR)` references: | ||
| - If the entire value is a single variable → `cppdefs: *VAR` | ||
| - If the value is a mix of a variable and literal flags → use `!join`: | ||
| ```yaml | ||
| cppdefs: !join [*F2003_FLAGS, " -DSPMD -DCLIMATE_NUDGE"] | ||
| ``` | ||
| - If no variables are present, quote the string directly: | ||
| ```yaml | ||
| cppdefs: "-heap-arrays -DRTE_USE_SP" | ||
| ``` | ||
| - Complex inline expressions such as `"'"\`git-version-string $<\`"'"` should be preserved verbatim as part of the string value — do not attempt to evaluate or simplify them. | ||
| - Omit the field if `<cppDefs>` is absent. | ||
|
|
||
| --- | ||
|
|
||
| ### `makeOverrides` | ||
| - Source: the text content of `<makeOverrides>`. | ||
| - Preserve the exact string, quoted with single quotes if it contains double-quote characters. | ||
| - Example: `<makeOverrides>OPENMP=""</makeOverrides>` → `makeOverrides: 'OPENMP=""'` | ||
| - Omit the field if `<makeOverrides>` is absent. | ||
|
|
||
| --- | ||
|
|
||
| ### `doF90Cpp` | ||
| - Source: the `doF90Cpp` attribute on the `<compile>` element. | ||
| - `doF90Cpp="yes"` → `doF90Cpp: True` | ||
| - Omit the field if the attribute is absent or not `"yes"`. | ||
|
|
||
| --- | ||
|
|
||
| ### `additionalInstructions` | ||
| - Source: the content inside `<csh><![CDATA[...]]></csh>` within a component's `<source>`. | ||
| - Output as a `!join` list where each element is either: | ||
| - A plain shell command line ending with `"\n"`, or | ||
| - A `*VAR` alias for any `$(VAR)` variable. | |
| - **Splitting rules**: | |
| - Split at actual newlines in the CDATA content. | |
| - Do **not** split inside a single shell command even if it spans conceptual units; keep each logical line together. | |
| - Append `"\n"` to each line element (except optionally the last). | |
| - Convert `$(VAR)` to `*VAR` alias references inside the join list. | |
| - Example: | ||
| ```xml | ||
| <csh><![CDATA[ | ||
| git clone https://github.com/NOAA-GFDL/MOM6-examples.git mom6 | ||
| pushd mom6 | ||
| git checkout $(MOM6_EXAMPLES_GIT_TAG) | ||
| ]]></csh> | ||
| ``` | ||
| becomes: | ||
| ```yaml | ||
| additionalInstructions: !join ["git clone https://github.com/NOAA-GFDL/MOM6-examples.git mom6 \n", | ||
| "pushd mom6\n", | ||
| "git checkout ", *MOM6_EXAMPLES_GIT_TAG, "\n"] | ||
| ``` | ||
| - Omit the field if no `<csh>` element is present. | ||
|
|
||
| --- | ||
|
|
||
| ### `otherFlags` | ||
| - This field is **not** directly present in the XML; it is derived from the include directory dependencies: | ||
| - Components that depend on FMS headers should include `otherFlags: *FMSincludes`. | ||
| - Components that also require MOM6 framework headers (e.g. `sis2`, `ocean_BGC`, `coupler`) should include `otherFlags: !join [*FMSincludes, " ", *momIncludes]`. | ||
| - The FMS component itself does not need `otherFlags`. | ||
| - The anchor definitions for `*FMSincludes` and `*momIncludes` are expected to exist elsewhere in the broader YAML document (e.g. a shared variables section). Do not define them; only reference them. | ||
| - If you are unsure whether `otherFlags` applies, leave it out and note it for the user. | ||
|
|
||
| --- | ||
|
|
||
| ## Variable anchor conventions | ||
|
|
||
| - XML uses `$(VARNAME)` syntax for make-style variables. | ||
| - In YAML, these become alias references: `*VARNAME`. | |
| - When an alias appears inside a string with other text, convert the whole value to a `!join` list: | |
| ```yaml | ||
| # Instead of: "$(F2003_FLAGS) -DSPMD" | ||
| cppdefs: !join [*F2003_FLAGS, " -DSPMD"] | ||
| ``` | ||
| - Anchor **definitions** (e.g. `AM5_VERSION: &AM5_VERSION "am5_p1"`) come from a separate variables/defaults section of the YAML, not from this conversion. Do not invent anchor definitions. | ||
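The conventions above amount to a three-way choice: a plain quoted string, a bare `*VAR` reference, or a `!join` list. A minimal sketch of that choice, rendering the result as YAML text, might look like this. `to_yaml_value` is a hypothetical helper, and it assumes `json.dumps` quoting is acceptable because JSON strings are valid YAML double-quoted scalars.

```python
import json
import re

_VAR = re.compile(r"\$\((\w+)\)")


def to_yaml_value(text: str) -> str:
    """Render a make-style string as a quoted scalar, a *VAR reference, or a !join list."""
    text = text.strip()
    parts = []
    pos = 0
    for match in _VAR.finditer(text):
        if match.start() > pos:
            parts.append(json.dumps(text[pos:match.start()]))
        parts.append("*" + match.group(1))
        pos = match.end()
    if pos < len(text):
        parts.append(json.dumps(text[pos:]))
    if not parts:
        return json.dumps("")  # empty input becomes an empty quoted string
    if len(parts) == 1:
        return parts[0]  # a lone literal or a lone reference needs no !join
    return "!join [" + ", ".join(parts) + "]"


for raw in ["$(F2003_FLAGS) -DSPMD", "$(F2003_FLAGS)", "-heap-arrays -DRTE_USE_SP"]:
    print("cppdefs:", to_yaml_value(raw))
```

The sketch only emits references; as noted above, the matching `&VAR` definitions are expected to live elsewhere in the YAML.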
|
|
||
| --- | ||
|
|
||
| ## Ordering of `src` components | ||
|
|
||
| Preserve the order of `<component>` elements as they appear in the XML. The build system may rely on this ordering for dependency resolution. | ||
|
|
||
| --- | ||
|
|
||
| ## Validation | ||
|
|
||
| After writing the YAML file: | ||
| 1. Fetch the schema from https://github.com/NOAA-GFDL/gfdl_msd_schemas/blob/main/FRE/fre_make.json (download the raw JSON; the `blob` URL itself returns an HTML page). | |
| 2. Validate the output YAML against the schema using `jsonschema` or equivalent. | ||
| 3. Print all validation errors with their JSON path location. | ||
| 4. Offer to correct any errors automatically. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,189 @@ | ||
| import re | ||
|
|
||
| import xml.etree.ElementTree as ET | ||
| import argparse | ||
| import yaml | ||
|
|
||
| def parse_component(component: ET.Element) -> dict[str, str | list | bool]: | |
| """Parse a single <component> XML element into a YAML-friendly dictionary.""" | ||
|
|
||
| def clean_text(string: str) -> str | None: | |
| clean_string = re.sub(r'\s+', ' ', string).strip() | |
| if clean_string: | ||
| return clean_string | ||
| return None | ||
|
|
||
| def get_compile_flag(flag: str) -> str | None: | ||
| """ | ||
| Return text for cppDefs or makeOverrides in compile element block | ||
| For example, | ||
| <compile> | ||
| <cppDefs>-DDEBUG -Iinclude</cppDefs> | ||
| <makeOverrides>-j8</makeOverrides> | ||
| </compile> | ||
| returns | |
| "-DDEBUG -Iinclude" for get_compile_flag("cppDefs") | |
| "-j8" for get_compile_flag("makeOverrides"), or | ||
| None if the flag is not defined. | ||
| """ | ||
| compile_elem = component.find('compile') | ||
| if compile_elem is not None: | ||
| elem = compile_elem.find(flag) | ||
| if elem is not None: | ||
| elem_text = clean_text(elem.text or '') | |
| if elem_text: | ||
| return elem_text | ||
| return None | ||
|
|
||
| def get_paths() -> list[str] | None: | ||
| """ | ||
| Return parsed component paths from the paths attribute. | ||
| For example, | ||
| <component name="am5_phys" paths="am5_phys/atmos_param am5_phys/atmos_shared"/> | ||
| returns | ||
| ["am5_phys/atmos_param", "am5_phys/atmos_shared"] or | ||
| None if paths is not defined. | ||
| """ | ||
| paths = component.attrib.get('paths') | ||
| if paths and paths.strip(): | ||
| return [p.strip() for p in paths.split() if p.strip()] | ||
| return None | ||
|
|
||
| def get_requires() -> list[str] | None: | ||
| """ | ||
| Return parsed component dependencies from the requires attribute. | ||
| For example, | |
| <component name="am5_phys" requires="FMS rte-rrtmgp rte-ecckd"/> | |
| returns | |
| ["FMS", "rte-rrtmgp", "rte-ecckd"] or | |
| None if requires is not defined. | ||
| """ | ||
| requires = component.attrib.get('requires') | ||
| if requires and requires.strip(): | ||
| return [r.strip() for r in requires.split() if r.strip()] | ||
| return None | ||
|
|
||
| def get_doF90Cpp() -> bool | None: | ||
| """ | ||
| Parse doF90Cpp from the compile block into a boolean when present. | ||
| For example, | ||
| <compile doF90Cpp="yes"/> | ||
| returns True or | ||
| None if doF90Cpp is not defined. | ||
| """ | ||
| compile_elem = component.find('compile') | ||
| if compile_elem is not None: | ||
| val = compile_elem.attrib.get('doF90Cpp') | ||
| map_to_bool = {'yes': True, 'no': False} | ||
| if val is not None: | ||
| return map_to_bool.get(val.strip().lower()) | ||
|
|
||
| return None | ||
|
|
||
| def get_additional_instructions() -> list[str] | None: | ||
| """Extract source/csh instructions as non-empty lines.""" | ||
| source_elem = component.find('source') | ||
| if source_elem is not None: | ||
| csh_elem = source_elem.find('csh') | ||
| if csh_elem is not None and csh_elem.text: | ||
| cleaned_lines = [] | ||
| for line in csh_elem.text.splitlines(): | ||
| # Remove tabs and normalize extra spaces | ||
| clean_line = clean_text(line) | ||
| if clean_line is not None: | ||
| cleaned_lines.append(clean_line) | ||
| return cleaned_lines if cleaned_lines else None | ||
| return None | ||
|
|
||
| def get_repo_and_branch() -> tuple[str | None, str | None]: | ||
| """ | ||
| Build repo URL and branch/version from source/codeBase tags. | ||
| For example, | ||
| <source versionControl="git" root="https://github.com/NOAA-GFDL"> | ||
| <codeBase version="2026.01"> FMS.git </codeBase> | ||
| </source> | ||
| returns | ||
| ("https://github.com/NOAA-GFDL/FMS.git", "2026.01") | ||
| """ | ||
|
|
||
| repo = None | ||
| branch = None | ||
| source_elem = component.find('source') | ||
| if source_elem is not None: | ||
| root = source_elem.attrib.get('root') | ||
| codebase_elem = source_elem.find('codeBase') | ||
| if root and codebase_elem is not None and codebase_elem.text: | ||
| repo = f"{root.rstrip('/')}/{codebase_elem.text.strip()}" | |
| branch = codebase_elem.attrib.get('version') | ||
| return repo, branch | ||
|
|
||
| repo, branch = get_repo_and_branch() | ||
| component_name = component.attrib.get('name') | ||
| if component_name is not None: | ||
| component_name = component_name.strip() or None | ||
|
|
||
| d = { | ||
| 'component': component_name, | ||
| 'repo': repo, | ||
| 'branch': branch, | ||
| 'paths': get_paths(), | ||
| 'requires': get_requires(), | ||
| 'cppdefs': get_compile_flag('cppDefs'), | ||
| 'makeOverrides': get_compile_flag('makeOverrides'), | ||
| 'doF90Cpp': get_doF90Cpp(), | ||
| 'additionalInstructions': get_additional_instructions(), | ||
| } | ||
| # Remove None values | ||
| return {k: v for k, v in d.items() if v is not None} | ||
|
|
||
| def parse_experiment(experiment: ET.Element) -> dict[str, str | list]: | |
| """Parse one <experiment> element into the compile YAML object.""" | ||
| components = [parse_component(c) for c in experiment.findall('component')] | ||
| return { | ||
| 'experiment': experiment.attrib.get('name'), | ||
| 'container_addlibs': '', | ||
| 'baremetal_linkerflags': '', | ||
| 'src': components if components else [], | ||
| } | ||
|
|
||
| def write_yaml(yamldict: dict, yaml_path: str): | ||
| """Write the YAML dictionary to a file.""" | ||
| with open(yaml_path, 'w', encoding='utf-8') as f: | ||
| yaml.safe_dump(yamldict, f, sort_keys=False) | ||
|
|
||
| def xml_to_yaml(xml_path: str, yaml_path: str, experiment_name: str | None = None): | |
|
Review comment from mlee03 (Contributor, Author; conversation marked as resolved): @copilot Can you add a test in test_converter.py that tests an xml with multiple ? |
||
| """ | ||
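| Convert compile XML to YAML. | |
| All experiments in the XML will be converted if experiment_name is None. | |
| Note: every experiment is written to the same yaml_path, so with multiple | |
| experiments only the last one remains in the output file. | |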
| """ | ||
| tree = ET.parse(xml_path) | ||
| root = tree.getroot() | ||
| experiments = root.findall('experiment') | ||
|
|
||
| if experiment_name is not None: | ||
| experiments = [exp for exp in experiments if exp.get('name') == experiment_name] | ||
|
|
||
| for exp in experiments: | ||
| print(f"Converting experiment '{exp.attrib.get('name')}' to YAML...") | ||
| yamldict = {'compile': parse_experiment(exp)} | ||
| write_yaml(yamldict, yaml_path) | ||
| print(f"Experiment '{exp.attrib.get('name')}' converted to YAML.") | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| parser = argparse.ArgumentParser(description="Convert compile XML to YAML") | ||
| parser.add_argument("-x", "--xmlfile", required=True, help="Input XML file") | ||
| parser.add_argument("-o", "--output", required=True, help="Output YAML file") | ||
| parser.add_argument("-e", "--experiment", default=None, help="Experiment name (optional)") | ||
| args = parser.parse_args() | ||
|
|
||
| xml_to_yaml(args.xmlfile, args.output, args.experiment) | ||
| print(f"\nConverted {args.xmlfile} to {args.output}.") | ||
| print(f"Experiment: {args.experiment if args.experiment else 'All experiments'}") | ||
| print("_______________") | ||
| print("WARNING: THIS CONVERTER OUTPUTS CLOSE-ENOUGH COMPILE YAML") | ||
| print("PLEASE CHECK THE FOLLOWING:") | ||
| print(" * PATHS") | ||
| print(" * CPPDEFS AND OTHER FLAGS") | ||
| print(" * ADDITIONALINSTRUCTIONS") | ||
| print(" * PLEASE ADD IN THE APPROPRIATE ANCHORS") | ||
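As written, `xml_to_yaml` passes the same `yaml_path` to `write_yaml` for every experiment, so when several experiments are converted each one overwrites the previous output. One possible way around this, sketched below under the assumption that one file per experiment is acceptable, is to derive a distinct output path from the experiment name. `experiment_output_path` is a hypothetical helper, not part of this PR.

```python
import re
from pathlib import Path


def experiment_output_path(yaml_path: str, experiment_name: str, total: int) -> Path:
    """Return yaml_path unchanged for a single experiment, else add a name suffix."""
    path = Path(yaml_path)
    if total <= 1:
        return path
    # Collapse characters such as $, ( and ) into single underscores
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", experiment_name).strip("_")
    return path.with_name(f"{path.stem}_{safe}{path.suffix}")


# Hypothetical experiment names, for illustration only
names = ["$(AM5_VERSION)_compile", "other_compile"]
for name in names:
    print(experiment_output_path("out/am5_compile.yaml", name, len(names)))
```

Inside the loop in `xml_to_yaml`, the result could replace the fixed `yaml_path` argument; merging all experiments into one YAML document would be a different design and is not covered here.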