fix: CHARGE entry round-trip with PDB-padded atom names (v1.3.10)#8
Open
matteoferla wants to merge 1 commit intomainfrom
Open
fix: CHARGE entry round-trip with PDB-padded atom names (v1.3.10)#8matteoferla wants to merge 1 commit intomainfrom
matteoferla wants to merge 1 commit intomainfrom
Conversation
CHARGEEntry.__str__ rendered the f-string with self.atom verbatim, so PDB-padded atom names (e.g. " OA ") produced "CHARGE OA FORMAL -1" with multi-space gaps. CHARGEEntry.from_str's regex required exactly one space between tokens and only matched a single charge digit, so Params.dump → Params.load round-trips fell over for any molecule with a non-zero formal charge — surfaced when minting a Rosetta params for a caprylate anion ligand. Two-sided fix: - __str__ strips self.atom on emit so the dumped line is single-space separated regardless of input padding. - from_str regex tolerates any whitespace between tokens (\\s+ instead of literal " ") and accepts multi-digit signed charges ([+-]?\\d+ instead of \\-?\\+?\\d). Adds regression tests for: multi-space body, multi-digit charge, __str__ stripping padded input, and a dump→load round-trip on a padded-atom CHARGE entry. Existing positive/negative parse tests unchanged.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
CHARGEEntry.__str__renders the f-string withself.atomverbatim, so PDB-padded atom names (e.g." OA ") produceCHARGE OA FORMAL -1— multi-space gaps. The matchingCHARGEEntry.from_strregex requires exactly one literal space between tokens, so the library cannot read back its own output for any molecule with a non-zero formal charge.Surfaced while minting a Rosetta params file for a caprylate anion ligand:
Params.dump → Params.loadraisedValueError: CHARGE entry "OA FORMAL -1" is not formatted correctly.Fix
Two-sided so existing
.paramsfiles in the wild (single-space or multi-space, multi-digit charges) keep parsing while new dumps emit cleanly:__str__now stripsself.atomon emit, so the dumped line is single-space separated regardless of input padding (e.g.CHARGE OA FORMAL -1rather thanCHARGE OA FORMAL -1).from_strregex widened fromr"(?P<atom>\S+) FORMAL (?P<charge>\-?\+?\d)"tor"(?P<atom>\S+)\s+FORMAL\s+(?P<charge>[+-]?\d+)"— tolerates any whitespace between tokens and accepts multi-digit signed charges (-12,+10, etc.). The cleaner[+-]?also rejects nonsense like-+1that the old\-?\+?accepted.Version bump
1.3.9 → 1.3.10(patch — bug fix, no API change).Tests
Added to
TestCHARGEEntryandTestRegexPatterns:test_charge_entry_from_str_multi_space—"OA FORMAL -1"parses cleanly.test_charge_entry_from_str_multi_digit—"FE FORMAL -12"parses cleanly.test_charge_entry_str_strips_padding—CHARGEEntry(atom=" OA ", charge="-1")emits"CHARGE OA FORMAL -1".test_charge_entry_roundtrip_padded—CHARGEEntry(atom=" OA ", charge="-1") → str → from_str → CHARGEEntry(atom="OA", charge="-1").test_charge_pattern_matchesextended with the multi-space / multi-digit case.pytest tests/test_internals.py— 57 passed (50 unrelated + 7 inTestCHARGEEntry).ruff check rdkit_to_params/entries.py— clean.ruff format— applied to touched files.Test plan
pytest tests/test_internals.pypassesruff check rdkit_to_params/entries.pycleanruff format --checkclean on touched files