diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 00000000..e09f58c2
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,121 @@
+# HCL2 Parser — CLAUDE.md
+
+## Pipeline
+
+```
+Forward: HCL2 Text → Lark Parse Tree → LarkElement Tree → Python Dict/JSON
+Reverse: Python Dict/JSON → LarkElement Tree → Lark Tree → HCL2 Text
+```
+
+## Module Map
+
+| Module | Role |
+|---|---|
+| `hcl2/hcl2.lark` | Lark grammar definition |
+| `hcl2/api.py` | Public API (`load/loads/dump/dumps` + intermediate stages) |
+| `hcl2/parser.py` | Lark parser factory with caching |
+| `hcl2/transformer.py` | Lark parse tree → LarkElement tree |
+| `hcl2/deserializer.py` | Python dict → LarkElement tree |
+| `hcl2/formatter.py` | Whitespace alignment and spacing on LarkElement trees |
+| `hcl2/reconstructor.py` | LarkElement tree → HCL2 text via Lark |
+| `hcl2/builder.py` | Programmatic HCL document construction |
+| `hcl2/utils.py` | `SerializationOptions`, `SerializationContext`, string helpers |
+| `hcl2/const.py` | Constants: `IS_BLOCK`, `COMMENTS_KEY`, `INLINE_COMMENTS_KEY` |
+| `cli/helpers.py` | File/directory/stdin conversion helpers |
+| `cli/hcl_to_json.py` | `hcl2tojson` entry point |
+| `cli/json_to_hcl.py` | `jsontohcl2` entry point |
+
+`hcl2/__main__.py` is a thin wrapper that imports `cli.hcl_to_json:main`.
+
+### Rules (one class per grammar rule)
+
+| File | Domain |
+|---|---|
+| `rules/abstract.py` | `LarkElement`, `LarkRule`, `LarkToken` base classes |
+| `rules/tokens.py` | `StringToken` (cached factory), `StaticStringToken`, punctuation constants |
+| `rules/base.py` | `StartRule`, `BodyRule`, `BlockRule`, `AttributeRule` |
+| `rules/containers.py` | `TupleRule`, `ObjectRule`, `ObjectElemRule`, `ObjectElemKeyRule` |
+| `rules/expressions.py` | `ExprTermRule`, `BinaryOpRule`, `UnaryOpRule`, `ConditionalRule` |
+| `rules/literal_rules.py` | `IntLitRule`, `FloatLitRule`, `IdentifierRule`, `KeywordRule` |
+| `rules/strings.py` | `StringRule`, `InterpolationRule`, `HeredocTemplateRule` |
+| `rules/functions.py` | `FunctionCallRule`, `ArgumentsRule` |
+| `rules/indexing.py` | `GetAttrRule`, `SqbIndexRule`, splat rules |
+| `rules/for_expressions.py` | `ForTupleExprRule`, `ForObjectExprRule`, `ForIntroRule`, `ForCondRule` |
+| `rules/whitespace.py` | `NewLineOrCommentRule`, `InlineCommentMixIn` |
+
+## Public API (`api.py`)
+
+Follows the `json` module convention. All option parameters are keyword-only.
+
+- `load/loads` — HCL2 text → Python dict
+- `dump/dumps` — Python dict → HCL2 text
+- Intermediate stages: `parse/parses`, `parse_to_tree/parses_to_tree`, `transform`, `serialize`, `from_dict`, `from_json`, `reconstruct`
+
+### Option Dataclasses
+
+**`SerializationOptions`** (LarkElement → dict):
+`with_comments`, `with_meta`, `wrap_objects`, `wrap_tuples`, `explicit_blocks`, `preserve_heredocs`, `force_operation_parentheses`, `preserve_scientific_notation`
+
+**`DeserializerOptions`** (dict → LarkElement):
+`heredocs_to_strings`, `strings_to_heredocs`, `object_elements_colon`, `object_elements_trailing_comma`
+
+**`FormatterOptions`** (whitespace/alignment):
+`indent_length`, `open_empty_blocks`, `open_empty_objects`, `open_empty_tuples`, `vertically_align_attributes`, `vertically_align_object_elements`
+
+## CLI
+
+Console scripts defined in `pyproject.toml`. Each uses argparse flags that map directly to the option dataclass fields above.
+
+```
+hcl2tojson --json-indent 2 --with-meta file.tf
+jsontohcl2 --indent 4 --no-align file.json
+```
+
+Add new options as `parser.add_argument()` calls in the relevant entry point module.
+
+## Hard Rules
+
+These are project-specific constraints that must not be violated:
+
+1. **Always use the LarkElement IR.** Never transform directly from Lark parse tree to Python dict or vice versa.
+1. **Block vs object distinction.** Use `__is_block__` markers (`const.IS_BLOCK`) to preserve semantic intent during round-trips. The deserializer must distinguish blocks from regular objects.
+1. **Bidirectional completeness.** Every serialization path must have a corresponding deserialization path. Test round-trip integrity: Parse → Serialize → Deserialize → Serialize produces identical results.
+1. **One grammar rule = one `LarkRule` class.** Each class implements `lark_name()`, typed property accessors, `serialize()`, and declares `_children_layout: Tuple[...]` (annotation only, no assignment) to document child structure.
+1. **Token caching.** Use the `StringToken` factory in `rules/tokens.py` — never create token instances directly.
+1. **Interpolation context.** `${...}` generation depends on nesting depth — always pass and respect `SerializationContext`.
+1. **Update both directions.** When adding language features, update transformer.py, deserializer.py, formatter.py and reconstructor.py.
+
+## Adding a New Language Construct
+
+1. Add grammar rules to `hcl2.lark`
+1. Create rule class(es) in the appropriate `rules/` file
+1. Add transformer method(s) in `transformer.py`
+1. Implement `serialize()` in the rule class
+1. Update `deserializer.py`, `formatter.py` and `reconstructor.py` for round-trip support
+
+## Testing
+
+Framework: `unittest.TestCase` (not pytest).
+
+```
+python -m unittest discover -s test -p "test_*.py" -v
+```
+
+**Unit tests** (`test/unit/`): instantiate rule objects directly (no parsing).
+
+- `test/unit/rules/` — one file per rules module
+- `test/unit/cli/` — one file per CLI module
+- `test/unit/test_api.py`, `test_builder.py`, `test_deserializer.py`, `test_formatter.py`, `test_reconstructor.py`, `test_utils.py`
+
+Use concrete stubs when testing ABCs (e.g., `StubExpression(ExpressionRule)`).
+
+**Integration tests** (`test/integration/`): full-pipeline tests with golden files.
+
+- `test_round_trip.py` — iterates over all suites in `hcl2_original/`, tests HCL→JSON, JSON→JSON, JSON→HCL, and full round-trip
+- `test_specialized.py` — feature-specific tests with golden files in `specialized/`
+
+Always run the full round-trip test suite after any modification.
+
+## Keeping Docs Current
+
+Update this file when architecture, modules, API surface, or testing conventions change. Also update `README.md` and `docs/usage.md` when changes affect the public API, CLI flags, or option fields.
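
The round-trip rule stated in the file (Parse → Serialize → Deserialize → Serialize produces identical results) could be checked with a small `unittest` helper along these lines. This is only a sketch, not the project's actual test code: `round_trip_is_stable` and `RoundTripTest` are hypothetical names, and `json.loads`/`json.dumps` are stand-ins for the `loads`/`dumps` pair the file describes as following the `json` module convention.

```python
import json
import unittest
from typing import Any, Callable


def round_trip_is_stable(
    text: str,
    loads: Callable[[str], Any],
    dumps: Callable[[Any], str],
) -> bool:
    """Return True if text -> value -> text -> value yields the same value twice.

    `loads` and `dumps` are passed in so the same check can run against any
    parser/serializer pair; nothing here is specific to one library.
    """
    first = loads(text)          # forward pass: text to Python value
    regenerated = dumps(first)   # reverse pass: Python value back to text
    second = loads(regenerated)  # forward pass again on the regenerated text
    return first == second


class RoundTripTest(unittest.TestCase):
    """Hypothetical test case using json as a stand-in serializer pair."""

    def test_round_trip(self) -> None:
        source = '{"resource": {"name": "example", "count": 2}}'
        self.assertTrue(round_trip_is_stable(source, json.loads, json.dumps))
```

Swapping in the project's own `loads` and `dumps` is an assumption about their signatures; if they require keyword-only options, wrapping them in a small function with the options fixed would keep the helper unchanged.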