Skip to content

Codebase simplification: fix delete_subfield, reduce duplication#82

Merged
dchud merged 1 commit intomainfrom
bd-1dr9-codebase-simplification
Apr 14, 2026
Merged

Codebase simplification: fix delete_subfield, reduce duplication#82
dchud merged 1 commit intomainfrom
bd-1dr9-codebase-simplification

Conversation

@dchud
Copy link
Copy Markdown
Owner

@dchud dchud commented Apr 12, 2026

Summary

Addresses the high and medium priority items from #81. Net reduction of 253 lines while fixing a bug and improving consistency.

High priority

  • Fix delete_subfield() bug: Was returning the value without actually removing the subfield. Added Field::delete_subfield() in Rust core, exposed via PyO3, updated Python wrapper to delegate. Now matches pymarc's behavior exactly.
  • Fix recovery_mode validation inconsistency: AuthorityMARCReader and HoldingsMARCReader now reject invalid recovery_mode values with ValueError, matching MARCReader behavior from Add to_unicode, permissive, and recovery_mode flags to MARCReader #80.
  • Extract _wrap_field() / use _wrap_record() helpers: Replaced 16+ inconsistent field wrapping patterns and duplicated record wrapping in __init__.py with two shared helpers.
  • Extract shared reader_helpers.rs: Moved source detection (path/bytes/file), recovery mode parsing, and Python file I/O into a shared module. Reduced authority + holdings readers from 766 → 541 lines (~225 lines eliminated).

Medium priority

  • Extract control_field_char_at() utility: Replaced 5 duplicated length-check + chars().nth() patterns across authority and holdings record 008 field parsing.

Test plan

  • All 775 Rust tests pass (lib + integration + doc)
  • All 611 Python tests pass
  • Full .cargo/check.sh passes (rustfmt, clippy, ruff, tests, doc tests)
  • delete_subfield() manually verified to match pymarc behavior

Bead: bd-1dr9
Closes #81

🤖 Generated with Claude Code

High priority:
- Fix delete_subfield() bug: was returning value without actually deleting.
  Added Rust Field::delete_subfield() method, exposed via PyO3, updated
  Python wrapper to delegate. Now matches pymarc behavior.
- Fix recovery_mode validation: authority and holdings readers now reject
  invalid values with ValueError instead of silently defaulting to strict.
- Extract _wrap_field() and use _wrap_record() helpers in Python wrapper,
  replacing 16+ inconsistent field/record wrapping patterns.
- Extract shared reader_helpers.rs for PyO3 reader backends, reducing
  ~225 lines of copy-paste between authority and holdings readers.

Medium priority:
- Extract control_field_char_at() utility for 008 field parsing, replacing
  5 duplicated length-check-then-chars-nth patterns across authority and
  holdings record types.

Bead: bd-1dr9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Apr 12, 2026

Merging this PR will degrade performance by 20.11%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 10 improved benchmarks
❌ 7 (👁 7) regressed benchmarks
✅ 43 untouched benchmarks
⏩ 16 skipped benchmarks1

Performance Changes

Mode Benchmark BASE HEAD Efficiency
👁 WallTime test_pipeline_parallel_4x_10k_threaded 92.3 ms 114.5 ms -19.35%
👁 WallTime test_pipeline_parallel_extraction_4x_10k_threaded 125.7 ms 140.2 ms -10.36%
WallTime test_threaded_reading_1k 28.3 ms 25.3 ms +11.68%
WallTime test_threaded_reading_4x_1k 94.2 ms 83.8 ms +12.46%
👁 WallTime test_pipeline_sequential_extraction_4x_10k 94.8 ms 110.1 ms -13.9%
👁 WallTime test_process_4_files_parallel_4_threads 93.4 ms 116.9 ms -20.11%
WallTime test_threaded_reading_4x_10k 991.8 ms 854.3 ms +16.09%
👁 WallTime test_process_8_files_parallel_4_threads 197.2 ms 245.9 ms -19.81%
WallTime test_parallel_read_with_extract_4x_1k 100.7 ms 90 ms +11.86%
WallTime test_threading_speedup_4x_10k 992.8 ms 848.5 ms +17%
WallTime test_file_parallel_4x_10k_with_extraction 1,085.2 ms 853 ms +27.23%
👁 WallTime test_pipeline_sequential_1x_10k 18.7 ms 21.3 ms -12.04%
👁 WallTime test_pipeline_sequential_4x_10k 78.2 ms 87.5 ms -10.64%
WallTime test_file_parallel_4x_10k 1,037.4 ms 825.8 ms +25.62%
WallTime test_parallel_read_4x_10k 983.8 ms 864.2 ms +13.84%
WallTime test_parallel_read_with_extract_4x_10k 1,072.6 ms 908.2 ms +18.1%
WallTime test_threaded_with_title_extraction_4x_10k 1,008 ms 864.6 ms +16.59%

Comparing bd-1dr9-codebase-simplification (e808d71) with main (ae3a684)

Open in CodSpeed

Footnotes

  1. 16 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@dchud dchud merged commit 6668c4e into main Apr 14, 2026
48 checks passed
dchud added a commit that referenced this pull request Apr 15, 2026
Add MARCReader kwargs (to_unicode, permissive, recovery_mode) and
RecoveryMode documentation to python-api.md reference. Update CHANGELOG
unreleased section with PRs #79, #80, #82 and credit @acdha.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Codebase simplification audit: redundant code, duplication, and complexity

1 participant