If the DataFrame contains invalid IRI/URL data, it is currently serialized without the problem being detected, at either mapping time or serialization time.
This leaves library users to implement functions such as:
import polars as pl

def sanitize_iris(df: pl.DataFrame, *cols: str) -> pl.DataFrame:
    """Sanitize IRI columns using native Polars expressions (columnar).

    1. Strips leading/trailing whitespace.
    2. Percent-encodes spaces (``%20``).
    3. Nulls out values that don't start with ``http``, contain non-ASCII
       characters, or contain characters that are illegal in an IRI
       (``" < > \\ ^ ` { | }`` and control chars), i.e. broken source URLs
       that RDFox/rdflib reject.
    """
    ...  # implementation omitted
Having a more ergonomic/strict mode of serialization that actively catches these cases (like rdflib does at read-time) would be useful.
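
To make the request concrete, here is a minimal sketch of the kind of check a strict mode could run before serializing. It is only an illustration under stated assumptions: `check_iris_strict` and `InvalidIRIError` are hypothetical names, not part of this library or of rdflib, and the rejected character set is taken from the docstring above. Unlike the workaround, the sketch does not reject non-ASCII characters, and it raises instead of silently nulling values.

```python
import re

# Characters the docstring above lists as illegal in an IRI: " < > \ ^ ` { | }
# plus whitespace and control characters (U+0000 to U+0020, and U+007F).
_ILLEGAL_IRI_CHARS = re.compile(r'[\x00-\x20\x7f"<>\\^`{|}]')


class InvalidIRIError(ValueError):
    """Hypothetical error type for this sketch; not an existing library class."""


def check_iris_strict(values):
    """Raise InvalidIRIError if any non-null value does not look like a valid IRI.

    `values` is any iterable of strings or None (for example, a column
    converted to a Python list). Nulls are skipped.
    """
    bad = []
    for index, value in enumerate(values):
        if value is None:
            continue
        has_scheme = value.startswith(("http://", "https://"))
        if not has_scheme or _ILLEGAL_IRI_CHARS.search(value):
            bad.append((index, value))
    if bad:
        first_index, first_value = bad[0]
        raise InvalidIRIError(
            f"{len(bad)} invalid IRI value(s); "
            f"first at index {first_index}: {first_value!r}"
        )


# Usage: valid values pass silently, an unescaped space is reported.
check_iris_strict(["http://example.org/a", None, "https://example.org/b%20c"])
try:
    check_iris_strict(["http://example.org/ok", "http://example.org/bad path"])
except InvalidIRIError as err:
    print(err)
```

Raising with the position of the first offending value, rather than nulling it, keeps the bad data visible to the caller, which is the behaviour this issue asks for.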