1 change: 1 addition & 0 deletions fre/yamltools/converters/__init__.py
@@ -0,0 +1 @@

1 change: 1 addition & 0 deletions fre/yamltools/converters/compile/__init__.py
@@ -0,0 +1 @@

200 changes: 200 additions & 0 deletions fre/yamltools/converters/compile/agent.md
@@ -0,0 +1,200 @@
---
name: compile-xml-to-yaml-converter
description: Agent to convert a compile XML experiment to a compile YAML file
---

You are an expert at converting FRE (FMS Runtime Environment) compile XML experiments into compile YAML format.

## Your roles
- **Converter**: Transform a compile XML experiment into the corresponding YAML structure.
- **Validator**: Validate the produced YAML against the JSON schema at
https://github.com/NOAA-GFDL/gfdl_msd_schemas/blob/main/FRE/fre_make.json
- **Corrector**: Fix any validation errors so the YAML passes schema validation.

---

## Instructions

1. Ask the user to provide the compile XML file path.
2. Ask the user which experiment name to convert (e.g. `$(AM5_VERSION)_compile`). The target experiment contains "compile" in its `name` attribute.
3. Read and parse the XML file. Locate the `<experiment>` element whose `name` attribute contains "compile" and matches the user's choice. Ignore other experiments (e.g. canopy wrapper experiments).
4. Convert each `<component>` inside that experiment to a YAML `src` list entry using the field-by-field rules below.
5. Write the output YAML to a file with the same base name as the XML file but with a `.yaml` extension. For example, `am5_compile.xml` → `am5_compile.yaml`.
6. Validate the YAML against the schema and report any errors to the user.
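
Steps 3 and 4 can be sketched with the standard library's `xml.etree.ElementTree`. This is only an illustration: the helper name `find_compile_experiment`, the `<experimentSuite>` root element, and the sample XML are assumptions made for the example, not part of FRE.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real compile XMLs are much larger.
SAMPLE_XML = """
<experimentSuite>
  <experiment name="$(AM5_VERSION)_compile">
    <component name="fms" paths="FMS"/>
  </experiment>
  <experiment name="canopy_wrapper"/>
</experimentSuite>
"""


def find_compile_experiment(root: ET.Element, wanted: str) -> ET.Element:
    """Return the <experiment> whose name equals `wanted` and contains 'compile'."""
    for exp in root.iter("experiment"):
        name = exp.get("name", "")
        if name == wanted and "compile" in name:
            return exp
    raise LookupError(f"no compile experiment named {wanted!r}")


root = ET.fromstring(SAMPLE_XML)
experiment = find_compile_experiment(root, "$(AM5_VERSION)_compile")
# Components are read in document order, which the `src` list should preserve.
component_names = [c.get("name") for c in experiment.findall("component")]
```

Raising an error when nothing matches, instead of silently returning an empty result, makes a mistyped experiment name visible before any YAML is written.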

---

## Top-level YAML structure

```yaml
compile:
  experiment: <converted experiment name>
  container_addlibs:
  baremetal_linkerflags:
  src:
    - component: ...
      ...
```

---

## Field-by-field conversion rules

### `experiment`
- Source: the `name` attribute of the `<experiment>` element.
- Convert every `$(VAR)` reference to a YAML alias `*VAR` (a reference to the anchor `&VAR`).
- If the name contains both literal text and an anchor, use `!join`:
```yaml
experiment: !join [*AM5_VERSION, "_compile"]
```
- If the name is a plain string with no variables, quote it directly.

---

### `component` (the YAML component name / identifier)
- Source: the filename inside `<codeBase>` tags, stripped of the `.git` suffix and whitespace.
- Example: `<codeBase version="2026.01"> FMS.git </codeBase>` → `"FMS"`
- Example: `<codeBase version="2023.01"> ocean_BGC.git </codeBase>` → `"ocean_BGC"`
- This name is used as both the YAML `component:` value and the base filename for the `repo:` URL.
- **Exception**: when the codeBase name does not clearly identify the component in context (e.g. `ice_param.git` for the SIS2 component), use the `name` attribute of the XML `<component>` element instead and note the discrepancy.

---

### `repo`
- Constructed by joining the `root` attribute of `<source>` with the `<codeBase>` filename:
`root + "/" + codeBase_filename`
- Always ensure the URL ends with `.git`.
- Examples:
- `root="https://github.com/NOAA-GFDL"` + `FMS.git` → `"https://github.com/NOAA-GFDL/FMS.git"`
- `root="http://gitlab.gfdl.noaa.gov/fms"` + `am5_phys.git` → `"https://gitlab.gfdl.noaa.gov/fms/am5_phys.git"`
- Normalize `http://` to `https://` where the host is known to support HTTPS.
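
The `component` and `repo` rules above can be sketched as two small helpers. The function names and the `https_hosts` parameter are hypothetical; which hosts are "known to support HTTPS" is left to the caller, since the rules do not enumerate them.

```python
def codebase_name(codebase_text: str) -> str:
    """Strip surrounding whitespace and a trailing '.git' from <codeBase> text."""
    name = codebase_text.strip()
    return name[: -len(".git")] if name.endswith(".git") else name


def build_repo_url(root: str, codebase_text: str,
                   https_hosts: frozenset = frozenset()) -> str:
    """Join root and codeBase filename, ensure '.git', optionally upgrade to https."""
    url = f"{root.rstrip('/')}/{codebase_name(codebase_text)}.git"
    scheme, _, rest = url.partition("://")
    host = rest.split("/", 1)[0]
    # Only rewrite the scheme for hosts the caller has explicitly listed.
    if scheme == "http" and host in https_hosts:
        url = "https://" + rest
    return url


fms_name = codebase_name(" FMS.git ")
fms_repo = build_repo_url("https://github.com/NOAA-GFDL", " FMS.git ")
phys_repo = build_repo_url("http://gitlab.gfdl.noaa.gov/fms", "am5_phys.git",
                           frozenset({"gitlab.gfdl.noaa.gov"}))
```

Deriving the URL from the already-stripped name and then re-appending `.git` guarantees the suffix appears exactly once, whether or not the XML included it.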

---

### `branch`
- Source: the `version` attribute of `<codeBase>`.
- Always quote the value as a string (version numbers like `2024.01` can be misread as floats).
- Example: `<codeBase version="2024.01_am5">` → `branch: "2024.01_am5"`

---

### `requires`
- Source: the `requires` attribute of `<component>`, which is a space-separated list of XML component names.
- Map each XML component name to its corresponding YAML component name (i.e. the codeBase-derived name used in the `component:` field).
- Output as a YAML list of quoted strings.
- Example: `requires="fms rte-rrtmgp"` → `requires: ["FMS", "rte-rrtmgp"]`
- Omit the field entirely if `requires` is absent.
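
A minimal sketch of the name mapping, assuming a lookup table from XML component names to codeBase-derived names has already been built while walking the components. `map_requires` and the table contents are illustrative, not an existing FRE API.

```python
from typing import Optional


def map_requires(requires_attr: Optional[str],
                 xml_to_yaml_name: dict) -> Optional[list]:
    """Translate a space-separated `requires` attribute into YAML component names."""
    if not requires_attr or not requires_attr.strip():
        return None  # signals that the field should be omitted
    # Names missing from the table are passed through unchanged.
    return [xml_to_yaml_name.get(name, name) for name in requires_attr.split()]


# Hypothetical table: the XML component "fms" has codeBase "FMS.git".
name_table = {"fms": "FMS"}
mapped = map_requires("fms rte-rrtmgp", name_table)
```

Passing unknown names through unchanged keeps the output usable, but a stricter converter might prefer to report them so the user can check the dependency by hand.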

---

### `paths`
- Source: the `paths` attribute of `<component>` (whitespace-separated, may span multiple lines).
- Output as a YAML list of quoted strings.
- **Expand brace notation** `{a,b,c}` into separate list entries:
- `mom6/src/MOM6/config_src/{infra/FMS2,memory/dynamic_nonsymmetric}` →
```yaml
paths: ["mom6/src/MOM6/config_src/infra/FMS2",
"mom6/src/MOM6/config_src/memory/dynamic_nonsymmetric"]
```
- Glob patterns (`*`, `*/*`) are kept as-is in the YAML strings.
- Omit the field if `paths` is absent (some components have no `paths` attribute).
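
Brace expansion can be sketched with a short recursive helper. The names `expand_braces` and `convert_paths` are hypothetical, and the sketch assumes brace groups are not nested; nested groups would not expand the way a shell expands them.

```python
import re

# Matches one {...} group that contains no further braces.
_BRACE = re.compile(r"\{([^{}]*)\}")


def expand_braces(path: str) -> list:
    """Expand the leftmost non-nested {a,b,c} group, recursing until none remain."""
    match = _BRACE.search(path)
    if match is None:
        return [path]  # glob characters such as * are left untouched
    head, tail = path[: match.start()], path[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def convert_paths(paths_attr: str) -> list:
    """Split a whitespace-separated `paths` attribute and expand each entry."""
    result = []
    for token in paths_attr.split():
        result.extend(expand_braces(token))
    return result


mom6_paths = convert_paths(
    "mom6/src/MOM6/config_src/{infra/FMS2,memory/dynamic_nonsymmetric}"
)
```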

---

### `cppdefs`
- Source: the text content of `<cppDefs>`, including content inside `<![CDATA[...]]>`.
- Strip leading/trailing whitespace from the value.
- Convert `$(VAR)` references:
- If the entire value is a single variable → `cppdefs: *VAR`
- If the value is a mix of a variable and literal flags → use `!join`:
```yaml
cppdefs: !join [*F2003_FLAGS, " -DSPMD -DCLIMATE_NUDGE"]
```
- If no variables are present, quote the string directly:
```yaml
cppdefs: "-heap-arrays -DRTE_USE_SP"
```
- Complex inline expressions such as `"'"\`git-version-string $<\`"'"` should be preserved verbatim as part of the string value — do not attempt to evaluate or simplify them.
- Omit the field if `<cppDefs>` is absent.

---

### `makeOverrides`
- Source: the text content of `<makeOverrides>`.
- Preserve the exact string, quoted with single quotes if it contains double-quote characters.
- Example: `<makeOverrides>OPENMP=""</makeOverrides>` → `makeOverrides: 'OPENMP=""'`
- Omit the field if `<makeOverrides>` is absent.

---

### `doF90Cpp`
- Source: the `doF90Cpp` attribute on the `<compile>` element.
- `doF90Cpp="yes"` → `doF90Cpp: True`
- Omit the field if the attribute is absent or not `"yes"`.

---

### `additionalInstructions`
- Source: the content inside `<csh><![CDATA[...]]></csh>` within a component's `<source>`.
- Output as a `!join` list where each element is either:
- A plain shell command line ending with `"\n"`, or
- A `*VAR` anchor for any `$(VAR)` variable.
- **Splitting rules**:
- Split at actual newlines in the CDATA content.
- Do **not** split inside a single shell command, even if it spans several conceptual units; keep each logical line together.
- Append `"\n"` to each line element (except optionally the last).
- Convert `$(VAR)` to `*VAR` anchor references inside the join list.
- Example:
```xml
<csh><![CDATA[
git clone https://github.com/NOAA-GFDL/MOM6-examples.git mom6
pushd mom6
git checkout $(MOM6_EXAMPLES_GIT_TAG)
]]></csh>
```
becomes:
```yaml
additionalInstructions: !join ["git clone https://github.com/NOAA-GFDL/MOM6-examples.git mom6\n",
                               "pushd mom6\n",
                               "git checkout ", *MOM6_EXAMPLES_GIT_TAG, "\n"]
```
- Omit the field if no `<csh>` element is present.
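
The splitting rules above can be sketched as follows. The `Alias` class and `csh_to_join_items` are hypothetical names; `Alias` simply marks the places where a `*VAR` reference would appear in the `!join` list, and the sketch appends `"\n"` to every line, including the last.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Alias:
    """Placeholder for a YAML alias such as *MOM6_EXAMPLES_GIT_TAG."""
    name: str


_VAR = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def csh_to_join_items(cdata: str) -> list:
    """Turn <csh> text into the items of a !join list, one shell line at a time."""
    items = []
    for line in cdata.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines around the CDATA content
        pos = 0
        for m in _VAR.finditer(line):
            if m.start() > pos:
                items.append(line[pos:m.start()])
            items.append(Alias(m.group(1)))
            pos = m.end()
        # Whatever follows the last variable (possibly nothing) ends the line.
        items.append(line[pos:] + "\n")
    return items


CSH = """
git clone https://github.com/NOAA-GFDL/MOM6-examples.git mom6
pushd mom6
git checkout $(MOM6_EXAMPLES_GIT_TAG)
"""
join_items = csh_to_join_items(CSH)
```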

---

### `otherFlags`
- This field is **not** directly present in the XML; it is derived from the include directory dependencies:
- Components that depend on FMS headers should include `otherFlags: *FMSincludes`.
- Components that also require MOM6 framework headers (e.g. `sis2`, `ocean_BGC`, `coupler`) should include `otherFlags: !join [*FMSincludes, " ", *momIncludes]`.
- The FMS component itself does not need `otherFlags`.
- The anchor definitions for `*FMSincludes` and `*momIncludes` are expected to exist elsewhere in the broader YAML document (e.g. a shared variables section). Do not define them; only reference them.
- If you are unsure whether `otherFlags` applies, leave it out and note it for the user.

---

## Variable anchor conventions

- XML uses `$(VARNAME)` syntax for make-style variables.
- In YAML, these become anchor references: `*VARNAME`.
- When an anchor appears inside a string with other text, convert the whole value to a `!join` list:
```yaml
# Instead of: "$(F2003_FLAGS) -DSPMD"
cppdefs: !join [*F2003_FLAGS, " -DSPMD"]
```
- Anchor **definitions** (e.g. `AM5_VERSION: &AM5_VERSION "am5_p1"`) come from a separate variables/defaults section of the YAML, not from this conversion. Do not invent anchor definitions.
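
As a rough sketch of these conventions, the helper below renders an XML string as the text of a YAML value: a bare alias, a `!join` list, or a quoted string. `to_yaml_value` is a hypothetical name, and quoting literals with `json.dumps` is an assumption of this example (it relies on JSON strings being valid YAML double-quoted scalars).

```python
import json
import re

_VAR = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def to_yaml_value(text: str) -> str:
    """Render `text` as YAML source, replacing each $(VAR) with the alias *VAR."""
    text = text.strip()
    parts = []
    pos = 0
    for m in _VAR.finditer(text):
        if m.start() > pos:
            parts.append(json.dumps(text[pos:m.start()]))  # literal piece
        parts.append("*" + m.group(1))                     # alias piece
        pos = m.end()
    if pos < len(text):
        parts.append(json.dumps(text[pos:]))
    if not parts:
        return '""'
    if len(parts) == 1:
        return parts[0]  # a lone alias or a lone quoted string
    return "!join [" + ", ".join(parts) + "]"


mixed = to_yaml_value("$(F2003_FLAGS) -DSPMD")
single = to_yaml_value("$(F2003_FLAGS)")
plain = to_yaml_value("-heap-arrays -DRTE_USE_SP")
```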

---

## Ordering of `src` components

Preserve the order of `<component>` elements as they appear in the XML. The build system may rely on this ordering for dependency resolution.

---

## Validation

After writing the YAML file:
1. Fetch the schema from https://github.com/NOAA-GFDL/gfdl_msd_schemas/blob/main/FRE/fre_make.json
2. Validate the output YAML against the schema using `jsonschema` or equivalent.
3. Print all validation errors with their JSON path location.
4. Offer to correct any errors automatically.
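
One practical detail when fetching the schema from a script: a GitHub `blob` URL returns an HTML page, so the raw file URL is needed to obtain the JSON itself. The helper below is a hypothetical sketch of that rewrite; it assumes the branch or tag name contains no `/`, and it does not perform the download or the `jsonschema` validation.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url: str) -> str:
    """Rewrite github.com/<owner>/<repo>/blob/<ref>/<path> to its raw-file URL."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, ref, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, ref, *path])


schema_url = github_blob_to_raw(
    "https://github.com/NOAA-GFDL/gfdl_msd_schemas/blob/main/FRE/fre_make.json"
)
```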
189 changes: 189 additions & 0 deletions fre/yamltools/converters/compile/converter.py
@@ -0,0 +1,189 @@
import re

import xml.etree.ElementTree as ET
import argparse
import yaml

def parse_component(component: ET.Element) -> dict[str, str | list | bool]:
    """Parse a single <component> XML element into a YAML-friendly dictionary."""

    def clean_text(string: str | None) -> str | None:
        """Drop tabs, collapse whitespace runs, and return None if nothing is left."""
        if string is None:
            # An empty element such as <cppDefs/> has text None.
            return None
        clean_string = re.sub(r'\s+', ' ', string.replace('\t', '')).strip()
        if clean_string:
            return clean_string
        return None

    def get_compile_flag(flag: str) -> str | None:
        """
        Return text for cppDefs or makeOverrides in compile element block
        For example,
            <compile>
                <cppDefs>-DDEBUG -Iinclude</cppDefs>
                <makeOverrides>-j8</makeOverrides>
            </compile>
        returns
            "-DDEBUG -Iinclude" for get_compile_flag("cppDefs")
            "-j8" for get_compile_flag("makeOverrides"), or
            None if the flag is not defined.
        """
        compile_elem = component.find('compile')
        if compile_elem is not None:
            elem = compile_elem.find(flag)
            if elem is not None:
                elem_text = clean_text(elem.text)
                if elem_text:
                    return elem_text
        return None

    def get_paths() -> list[str] | None:
        """
        Return parsed component paths from the paths attribute.
        For example,
            <component name="am5_phys" paths="am5_phys/atmos_param am5_phys/atmos_shared"/>
        returns
            ["am5_phys/atmos_param", "am5_phys/atmos_shared"] or
            None if paths is not defined.
        """
        paths = component.attrib.get('paths')
        if paths and paths.strip():
            return [p.strip() for p in paths.split() if p.strip()]
        return None

    def get_requires() -> list[str] | None:
        """
        Return parsed component dependencies from the requires attribute.
        For example,
            <component name="am5_phys" requires="FMS rte-rrtmgp rte-ecckd"/>
        returns
            ["FMS", "rte-rrtmgp", "rte-ecckd"] or
            None if requires is not defined.
        """
        requires = component.attrib.get('requires')
        if requires and requires.strip():
            return [r.strip() for r in requires.split() if r.strip()]
        return None

    def get_doF90Cpp() -> bool | None:
        """
        Parse doF90Cpp from the compile block into a boolean when present.
        For example,
            <compile doF90Cpp="yes"/>
        returns True or
        None if doF90Cpp is not defined.
        """
        compile_elem = component.find('compile')
        if compile_elem is not None:
            val = compile_elem.attrib.get('doF90Cpp')
            map_to_bool = {'yes': True, 'no': False}
            if val is not None:
                # Values other than "yes"/"no" map to None and are dropped later.
                return map_to_bool.get(val.strip().lower())

        return None

    def get_additional_instructions() -> list[str] | None:
        """Extract source/csh instructions as non-empty lines."""
        source_elem = component.find('source')
        if source_elem is not None:
            csh_elem = source_elem.find('csh')
            if csh_elem is not None and csh_elem.text:
                cleaned_lines = []
                for line in csh_elem.text.splitlines():
                    # Remove tabs and normalize extra spaces
                    clean_line = clean_text(line)
                    if clean_line is not None:
                        cleaned_lines.append(clean_line)
                return cleaned_lines if cleaned_lines else None
        return None

    def get_repo_and_branch() -> tuple[str | None, str | None]:
        """
        Build repo URL and branch/version from source/codeBase tags.
        For example,
            <source versionControl="git" root="https://github.com/NOAA-GFDL">
                <codeBase version="2026.01"> FMS.git </codeBase>
            </source>
        returns
            ("https://github.com/NOAA-GFDL/FMS.git", "2026.01")
        """

        repo = None
        branch = None
        source_elem = component.find('source')
        if source_elem is not None:
            root = source_elem.attrib.get('root')
            codebase_elem = source_elem.find('codeBase')
            if root and codebase_elem is not None and codebase_elem.text:
                repo = f"{root.rstrip('/')}/{codebase_elem.text.strip()}"
                branch = codebase_elem.attrib.get('version')
        return repo, branch

    repo, branch = get_repo_and_branch()
    component_name = component.attrib.get('name')
    if component_name is not None:
        component_name = component_name.strip() or None

    d = {
        'component': component_name,
        'repo': repo,
        'branch': branch,
        'paths': get_paths(),
        'requires': get_requires(),
        'cppdefs': get_compile_flag('cppDefs'),
        'makeOverrides': get_compile_flag('makeOverrides'),
        'doF90Cpp': get_doF90Cpp(),
        'additionalInstructions': get_additional_instructions(),
    }
    # Remove None values
    return {k: v for k, v in d.items() if v is not None}

def parse_experiment(experiment: ET.Element) -> dict[str, str | list]:
    """Parse one <experiment> element into the compile YAML object."""
    components = [parse_component(c) for c in experiment.findall('component')]
    return {
        'experiment': experiment.attrib.get('name'),
        'container_addlibs': '',
        'baremetal_linkerflags': '',
        'src': components if components else [],
    }

def write_yaml(yamldict: dict, yaml_path: str):
    """Write the YAML dictionary to a file."""
    with open(yaml_path, 'w', encoding='utf-8') as f:
        yaml.safe_dump(yamldict, f, sort_keys=False)

def xml_to_yaml(xml_path: str, yaml_path: str, experiment_name: str | None = None):
@copilot Can you add a test in test_converter.py that tests an xml with multiple ?

    """
    Convert compile XML to YAML.
    All experiments in the XML will be converted if experiment_name is None.
    """
    tree = ET.parse(xml_path)
    root = tree.getroot()
    experiments = root.findall('experiment')

    if experiment_name is not None:
        experiments = [exp for exp in experiments if exp.get('name') == experiment_name]

    # Note: every experiment is written to the same yaml_path, so when several
    # experiments are converted only the last one remains in the output file.
    for exp in experiments:
        print(f"Converting experiment '{exp.attrib.get('name')}' to YAML...")
        yamldict = {'compile': parse_experiment(exp)}
        write_yaml(yamldict, yaml_path)
        print(f"Experiment '{exp.attrib.get('name')}' converted to YAML.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert compile XML to YAML")
    parser.add_argument("-x", "--xmlfile", required=True, help="Input XML file")
    parser.add_argument("-o", "--output", required=True, help="Output YAML file")
    parser.add_argument("-e", "--experiment", default=None, help="Experiment name (optional)")
    args = parser.parse_args()

    xml_to_yaml(args.xmlfile, args.output, args.experiment)
    print(f"\nConverted {args.xmlfile} to {args.output}.")
    print(f"Experiment: {args.experiment if args.experiment else 'All experiments'}")
    print("_______________")
    print("WARNING: THIS CONVERTER OUTPUTS CLOSE-ENOUGH COMPILE YAML")
    print("PLEASE CHECK THE FOLLOWING:")
    print(" * PATHS")
    print(" * CPPDEFS AND OTHER FLAGS")
    print(" * ADDITIONALINSTRUCTIONS")
    print(" * PLEASE ADD IN THE APPROPRIATE ANCHORS")