Skip to content

fix: CHARGE entry round-trip with PDB-padded atom names (v1.3.10)#8

Open
matteoferla wants to merge 1 commit intomainfrom
fix/charge-roundtrip-whitespace
Open

fix: CHARGE entry round-trip with PDB-padded atom names (v1.3.10)#8
matteoferla wants to merge 1 commit intomainfrom
fix/charge-roundtrip-whitespace

Conversation

@matteoferla
Copy link
Copy Markdown
Owner

Summary

CHARGEEntry.__str__ renders the f-string with self.atom verbatim, so PDB-padded atom names (e.g. " OA ") produce CHARGE OA FORMAL -1 — multi-space gaps. The matching CHARGEEntry.from_str regex requires exactly one literal space between tokens, so the library cannot read back its own output for any molecule with a non-zero formal charge.

Surfaced while minting a Rosetta params file for a caprylate anion ligand: Params.dump → Params.load raised ValueError: CHARGE entry "OA FORMAL -1" is not formatted correctly.

Fix

Two-sided so existing .params files in the wild (single-space or multi-space, multi-digit charges) keep parsing while new dumps emit cleanly:

  • __str__ now strips self.atom on emit, so the dumped line is single-space separated regardless of input padding (e.g. CHARGE OA FORMAL -1 rather than CHARGE OA FORMAL -1).
  • from_str regex widened from r"(?P<atom>\S+) FORMAL (?P<charge>\-?\+?\d)" to r"(?P<atom>\S+)\s+FORMAL\s+(?P<charge>[+-]?\d+)" — tolerates any whitespace between tokens and accepts multi-digit signed charges (-12, +10, etc.). The cleaner [+-]? also rejects nonsense like -+1 that the old \-?\+? accepted.

Version bump 1.3.9 → 1.3.10 (patch — bug fix, no API change).

Tests

Added to TestCHARGEEntry and TestRegexPatterns:

  • test_charge_entry_from_str_multi_space"OA FORMAL -1" parses cleanly.
  • test_charge_entry_from_str_multi_digit"FE FORMAL -12" parses cleanly.
  • test_charge_entry_str_strips_paddingCHARGEEntry(atom=" OA ", charge="-1") emits "CHARGE OA FORMAL -1".
  • test_charge_entry_roundtrip_paddedCHARGEEntry(atom=" OA ", charge="-1") → str → from_str → CHARGEEntry(atom="OA", charge="-1").
  • test_charge_pattern_matches extended with the multi-space / multi-digit case.

pytest tests/test_internals.py — 57 passed (50 unrelated + 7 in TestCHARGEEntry).
ruff check rdkit_to_params/entries.py — clean.
ruff format — applied to touched files.

Test plan

  • pytest tests/test_internals.py passes
  • ruff check rdkit_to_params/entries.py clean
  • ruff format --check clean on touched files

CHARGEEntry.__str__ rendered the f-string with self.atom verbatim, so
PDB-padded atom names (e.g. " OA ") produced "CHARGE  OA   FORMAL -1"
with multi-space gaps. CHARGEEntry.from_str's regex required exactly
one space between tokens and only matched a single charge digit, so
Params.dump → Params.load round-trips fell over for any molecule with
a non-zero formal charge — surfaced when minting a Rosetta params for
a caprylate anion ligand.

Two-sided fix:

- __str__ strips self.atom on emit so the dumped line is single-space
  separated regardless of input padding.
- from_str regex tolerates any whitespace between tokens
  (\\s+ instead of literal " ") and accepts multi-digit signed
  charges ([+-]?\\d+ instead of \\-?\\+?\\d).

Adds regression tests for: multi-space body, multi-digit charge,
__str__ stripping padded input, and a dump→load round-trip on a
padded-atom CHARGE entry. Existing positive/negative parse tests
unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant