Add to_unicode, permissive, and recovery_mode flags to MARCReader #80

Merged
dchud merged 1 commit into main from bd-y331-marcreader-flags
Apr 12, 2026

dchud commented Apr 12, 2026

Summary

Fixes #78: adds pymarc-compatible to_unicode and permissive kwargs to MARCReader, and exposes mrrc's existing RecoveryMode as a recovery_mode kwarg.

  • to_unicode: accepted for pymarc compatibility; mrrc always converts MARC-8 → UTF-8, so to_unicode=False emits a warning and otherwise has no effect
  • permissive=True: yields None for records that fail to parse, matching pymarc's behavior
  • recovery_mode: exposes mrrc's Rust-native RecoveryMode ("strict", "lenient", "permissive") for salvaging partial data from damaged records
  • Combining permissive=True with a non-strict recovery_mode raises ValueError, because the two are conflicting strategies
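
The flag rules in the list above can be sketched as a small standalone validator. This is an illustration only, not mrrc's actual code: the function name `check_reader_flags`, the default values, the warning category, and the return value are assumptions. Only the kwarg names, the three mode strings, the warning for to_unicode=False, and the ValueError on conflict come from the description.

```python
import warnings


def check_reader_flags(to_unicode=True, permissive=False, recovery_mode="strict"):
    """Hypothetical validator mirroring the flag rules described in the summary."""
    if not to_unicode:
        # The flag is accepted for compatibility but is inert, because
        # conversion from MARC-8 to UTF-8 always happens.
        warnings.warn(
            "to_unicode=False has no effect; records are always converted to UTF-8",
            UserWarning,  # assumed category, not confirmed by the PR
            stacklevel=2,
        )
    if permissive and recovery_mode != "strict":
        # Yielding None for bad records and salvaging partial records are
        # competing strategies, so the combination is rejected.
        raise ValueError(
            "permissive=True cannot be combined with recovery_mode="
            f"{recovery_mode!r}"
        )
    return {"permissive": permissive, "recovery_mode": recovery_mode}
```

Under these assumptions, `check_reader_flags(permissive=True, recovery_mode="lenient")` raises ValueError, while either setting on its own is accepted.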

No Rust core changes: permissive is handled in the Python wrapper's __next__, and recovery_mode is passed through PyO3 to the existing MarcReader::with_recovery_mode().
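
A minimal sketch of how a Python-level `__next__` could turn parse failures into None without touching the underlying reader. Every name here (`ParseError`, `FakeInnerReader`, `PermissiveReader`) is hypothetical; the real wrapper and the exception type it catches are not shown in this PR description.

```python
class ParseError(Exception):
    """Hypothetical stand-in for whatever error the native reader raises."""


class FakeInnerReader:
    """Stand-in for the native reader: raises ParseError on 'BAD' items."""

    def __init__(self, items):
        self._items = iter(items)

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._items)  # StopIteration ends iteration as usual
        if item == "BAD":
            raise ParseError("could not parse record")
        return item


class PermissiveReader:
    """Wraps an inner reader; in permissive mode bad records become None."""

    def __init__(self, inner, permissive=False):
        self._inner = inner
        self._permissive = permissive

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._inner)
        except ParseError:
            if self._permissive:
                return None  # caller sees a placeholder and keeps iterating
            raise
```

Callers iterating in permissive mode would need to skip None values, for example with `if record is None: continue`, which is the same pattern pymarc users already follow.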

Test plan

  • 15 new Python tests in test_marcreader_flags.py covering all kwargs, edge cases, and conflict validation
  • All 775 Rust tests pass
  • All 611 Python tests pass
  • Full .cargo/check.sh passes

Bead: bd-y331

🤖 Generated with Claude Code

…oses #78)

pymarc-compatible kwargs for MARCReader:
- to_unicode: accepted for compat, warns if False (mrrc always converts)
- permissive: yields None for bad records instead of raising (pymarc behavior)
- recovery_mode: exposes mrrc's RecoveryMode for salvaging partial data

Updated migration guide, reading tutorial, and quickstart with error
handling documentation.

Bead: bd-y331

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dchud self-assigned this Apr 12, 2026

codspeed-hq bot commented Apr 12, 2026

Merging this PR will improve performance by 28.68%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 8 improved benchmarks
✅ 52 untouched benchmarks
⏩ 16 skipped benchmarks [1]

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| WallTime | test_process_4_files_sequential | 85.7 ms | 75.7 ms | +13.26% |
| WallTime | test_pipeline_parallel_2x_10k_threaded | 50.7 ms | 41.8 ms | +21.23% |
| WallTime | test_pipeline_parallel_4x_10k_threaded | 104.4 ms | 88.1 ms | +18.54% |
| WallTime | test_process_4_files_parallel_4_threads | 108.7 ms | 93.7 ms | +16% |
| WallTime | test_pipeline_sequential_extraction_4x_10k | 105.5 ms | 94 ms | +12.17% |
| WallTime | test_pipeline_sequential_4x_10k | 84.5 ms | 76.6 ms | +10.28% |
| WallTime | test_file_parallel_4x_10k_with_extraction | 1,096.1 ms | 851.8 ms | +28.68% |
| WallTime | test_pipeline_sequential_1x_10k | 20.4 ms | 18.4 ms | +11.12% |
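
The Efficiency figures appear consistent with BASE / HEAD - 1 applied to the wall times. That formula is inferred from the numbers reported here, not taken from CodSpeed documentation, and the function name below is hypothetical.

```python
def relative_improvement_pct(base_ms, head_ms):
    """Percent improvement, assuming the formula base / head - 1."""
    return (base_ms / head_ms - 1.0) * 100.0


# Slowest benchmark reported: 1,096.1 ms on main vs 851.8 ms on the branch.
print(f"{relative_improvement_pct(1096.1, 851.8):.2f}%")
```

For that benchmark the sketch reproduces the headline 28.68%. Other rows can differ in the second decimal place because the displayed times are rounded.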

Comparing bd-y331-marcreader-flags (9ec3c31) with main (dd4c671)

Footnotes

  1. 16 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

dchud merged commit ae3a684 into main on Apr 12, 2026
47 checks passed
dchud added a commit that referenced this pull request Apr 15, 2026
Add MARCReader kwargs (to_unicode, permissive, recovery_mode) and
RecoveryMode documentation to python-api.md reference. Update CHANGELOG
unreleased section with PRs #79, #80, #82 and credit @acdha.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Add pymarc-compatible to_unicode and permissive flags to MARCReader