feat: propagate GEOMETRY CRS through PostGIS in both directions#418
feat: propagate GEOMETRY CRS through PostGIS in both directions#418jatorre wants to merge 2 commits intoduckdb:mainfrom
Conversation
e3d8ea5 to
774ea01
Compare
|
Hey @Maxxen — this PR adds EWKB writing to postgres_scanner so that A couple of things I'd appreciate your input on:
Also — maybe you're already looking at completing this work somewhere else, or have a different approach in mind. Let me know if for topics like this you'd prefer an issue first rather than a PR. I thought this was small enough that a PR would be more useful, but happy to adjust to whatever workflow you prefer. |
|
Unfortunately this isnt safe to do, the postgres srid is not guaranteed to be the same as the numeric part of the auth code. Ig also doesnt work for non-integer auth codes, like OGC:CRS84. We need to do a lookup in the postgres databases SPATIAL_REF_SYS table and find the best match (which I think we need PROJ for, which only spatial bundles) to get the SRID. This is not really easy to do in the type conversion code and we dont want to add proj as a dependency here right now. |
|
I have some thought on how to eventually solve this in the future, that would involve breaking PROJ out from spatial into its own separate extension, however we dont want to do that yet until we have a stable extension c-api, which we are working on. |
|
Thanks for the context @Maxxen — that makes total sense about the SRID ≠ EPSG code edge case and That said, I think there's a pragmatic middle ground here: only handle the One of the key values of DuckDB in the geospatial world is being the intermediary between data warehouses — BigQuery, Snowflake, Databricks, Redshift — and PostGIS. Those cloud warehouses don't have a Covering that 99% case with a few lines of code would already be extremely valuable — right now every For the remaining cases ( Would you be open to that scoped approach? The contract would be:
Happy to adjust the code or close this if you'd rather wait for the PROJ-based solution — just wanted to make the case that the simple version already covers almost all real-world usage. |
|
Alright, maybe we can do it for EPSG only, I just worry that the SRIDs in PostGIS spatial_ref_sys table don't actually match with the EPSG codes, even for EPSG defined projections, not sure how easy that would be to verify... Either way we still lose the CRS on import - but I guess we can deal with that separately. Let me get back to this later :) |
|
Good news — I checked a default PostGIS SELECT COUNT(*) as total,
SUM(CASE WHEN srid = auth_srid::INT THEN 1 ELSE 0 END) as matching,
SUM(CASE WHEN srid != auth_srid::INT THEN 1 ELSE 0 END) as mismatched
FROM spatial_ref_sys WHERE auth_name = 'EPSG';
-- total: 6184, matching: 6184, mismatched: 0So for the default table, EPSG code = PostGIS SRID is a safe assumption. A user could theoretically insert custom rows that break this, but that's an edge case they'd need to handle themselves anyway. Happy to wait until you have time to look at it properly — no rush. |
|
Follow-up: the read path has the symmetric issue. Two pieces needed for a full fix:
Happy to follow up with a separate PR for the read side. Keeping this PR focused on the write path since the two changes touch different code and have different edge cases (untyped |
DuckDB 1.5 introduced CRS metadata on the GEOMETRY type (e.g.,
GEOMETRY('EPSG:4326')). This change makes that CRS survive a full
round trip through PostgreSQL/PostGIS via postgres_scanner.
Write side (DuckDB → PostGIS):
- WriteGeometry detects CRS on the LogicalType and writes EWKB
instead of plain WKB so PostGIS receives the SRID:
WKB: [byte_order:1] [type:4] [payload...]
EWKB: [byte_order:1] [type|0x20000000:4] [srid:4] [payload...]
Falls back to plain WKB when no CRS is set, preserving the
previous behavior. (Uses GeoType::HasCRS / GeoType::GetCRS — the
earlier draft mistakenly referenced LogicalType::GetCRS, which
does not exist on DuckDB 1.5.)
Read side (PostGIS → DuckDB):
- TypeToLogicalType decodes the SRID from PostGIS's column-level
typmod for `geometry(TYPE, SRID)` columns and returns
LogicalType::GEOMETRY("EPSG:N") instead of plain GEOMETRY().
The typmod layout is:
bits 0 : has-M
bits 1 : has-Z
bits 2-7 : geometry type code (POINT=1, LINESTRING=2, ...)
bits 8-29 : SRID
Untyped `geometry` columns carry typmod -1 and fall back to plain
GEOMETRY (no CRS), matching the previous behavior.
Before: DuckDB GEOMETRY('EPSG:4326') → PostGIS geometry(SRID=0)
PostGIS geometry(POINT,4326) → DuckDB GEOMETRY (no CRS)
After: DuckDB GEOMETRY('EPSG:4326') → PostGIS geometry(SRID=4326)
PostGIS geometry(POINT,4326) → DuckDB GEOMETRY('EPSG:4326')
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
774ea01 to
99cd2f8
Compare
The previous commit added the EWKB write path and typmod read decode,
but on CREATE TABLE AS the CRS was being stripped before reaching
either side, so the resulting PostGIS column landed as plain
`geometry` (typmod -1) and the read-back lost the SRID.
Two strip points in this file:
1. ToPostgresType (called by AddCastToPostgresTypes during CTAS
planning) returned LogicalType::GEOMETRY() unconditionally,
dropping CRS before the projection that feeds the new physical
plan. Now returns the input unchanged for GEOMETRY so CRS
survives the cast/projection.
2. TypeToString returned plain "GEOMETRY" for any GEOMETRY type and
would have been short-circuited anyway by the HasAlias() early
return (GEOMETRY('EPSG:N') carries an alias of "GEOMETRY"). The
new branch runs *before* HasAlias, detects CRS via
GeoType::HasCRS, and emits `geometry(Geometry, N)` so PostGIS
records the SRID in pg_attribute.atttypmod and the existing
read-side typmod decode picks it up.
Verified end-to-end through the streamability harness in
duckdb-warehouse-transfer: postgres -> oracle and postgres -> snowflake
now round-trip GEOMETRY('EPSG:3857') with byte-perfect WKB and CRS
preserved at the destination, with no plan-level materialization.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed The previous commit ( There are two strip points in
Verified end-to-end with a small streamability harness that chains
Both round trips were failing before this commit (CRS dropped to This is the only change beyond the previous commit; the EWKB write and typmod read decode are unchanged. |
Summary
Make DuckDB 1.5's GEOMETRY CRS metadata (
GEOMETRY('EPSG:N')) survive a full round trip through PostgreSQL/PostGIS viapostgres_scanner.Write side (DuckDB → PostGIS)
PostgresBinaryWriter::WriteGeometrydetects CRS on theLogicalTypeand writes EWKB instead of plain WKB so PostGIS receives the SRID:Falls back to plain WKB when no CRS is set, preserving previous behaviour. Uses
GeoType::HasCRS/GeoType::GetCRS(the earlier draft mistakenly referencedLogicalType::GetCRS, which does not exist on the public DuckDB 1.5 API).Read side (PostGIS → DuckDB)
PostgresUtils::TypeToLogicalTypedecodes the SRID from PostGIS's column-level typmod forgeometry(TYPE, SRID)columns and returnsLogicalType::GEOMETRY("EPSG:N")instead of plainGEOMETRY(). The typmod layout (PostGIS source):Untyped
geometrycolumns carry typmod -1 and fall back to plainGEOMETRY(no CRS), matching the previous behaviour.Before / after
GEOMETRY('EPSG:4326')geometry(SRID=0)geometry(SRID=4326)geometry(POINT, 4326)GEOMETRY(no CRS)GEOMETRY('EPSG:4326')Test plan
make release).clang-formatclean (was a CI failure on the previous draft).npm run test:adbc-srid -- postgresql --srid=3857passes —remote_srid=3857,duckdb_crs=EPSG:3857,duckdb_type=GEOMETRY('EPSG:3857').