Conversation
Control fields (001-009) were stored in IndexMap<String, String>, causing repeated tags like 007 to overwrite earlier values. Changed storage to IndexMap<String, Vec<String>> to match the Vec-based pattern already used for data fields. Updated all layers: Rust core, serializers (ISO 2709, JSON, MARCXML, CSV), PyO3 bindings, and Python wrapper. Added test suite with pymarc parity checks for repeated 006/007 fields including round-trip serialization verification. Bead: bd-7dre Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merging this PR will degrade performance by 10.46%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | WallTime | test_pipeline_parallel_2x_10k_threaded |
44.8 ms | 50 ms | -10.46% |
Comparing bd-7dre-repeated-control-fields (6f120ea) with main (4de08b6)2
Footnotes
-
16 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
-
No successful run was found on
main(6f120ea) during the generation of this report, so 4de08b6 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩
CI environments don't have pymarc. Restructured tests so the 6 mrrc-only tests (parsing, round-trip, add_field) always run, while the 4 pymarc baseline/parity tests are skipped with pytest.mark.skipif. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Redundant with direct access to the public control_fields map. No callers outside its own test existed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Walkthrough: How this PR fixes repeated control fieldsPosting this as supplemental context for reviewers — a layer-by-layer explanation of the bug and fix. The bugMARC records can have repeated control fields (especially 006, 007). mrrc stored control fields in pymarc stores all fields in a flat list, so repetition is preserved naturally. mrrc's Layer 1: Rust core (
|
Summary
Fixes #77 — repeated control fields (e.g., multiple 007s) were silently dropped because
control_fieldsusedIndexMap<String, String>, causing later values to overwrite earlier ones.IndexMap<String, Vec<String>>across all three record types (Record,AuthorityRecord,HoldingsRecord)get_control_field_values()(Rust) andcontrol_field_values()(PyO3) for multi-value accessget_fields()to return all repeated control fieldsThe pymarc-compatible API surface (
get_fields(),record['007'],add_field(),get_fields()with no args) all correctly handle repeated control fields now. pymarc has noget_control_fieldmethod — that's an mrrc-internal API.22 files changed across Rust core, PyO3 bindings, and Python wrapper.
cc @acdha — this addresses the repeated 007 bug you reported. The test suite includes records with the same
cr|nn ||||||aa/fb|a bnnnnpattern from your report, plus additional cases with repeated 006 and triple 007 fields. Theto_unicode/permissiveflags enhancement is tracked separately in #78.Test plan
test_repeated_control_fields.pycovering parsing, round-trip, leader length, and pymarc parity.cargo/check.shpasses (rustfmt, clippy, ruff, doc tests, security audit)Bead: bd-7dre
🤖 Generated with Claude Code