feat: TM1 12.6.1 Arrow/Parquet/Flight datasources + write_dataframe(use_parquet/use_arrow) #1431
Open
skido-cw wants to merge 2 commits into
Add first-class modelling for the three new TurboIntegrator input
datasource types introduced in TM1 v12 build 12.6.1 (Apache Arrow
IPC/Feather, Parquet, and Arrow Flight), verified end-to-end against a
live PA 12.6.1 server.
Process:
- four new datasource_flight_* scalars (constructor, from_dict, properties)
- ARROW/PARQUET branch reuses dataSourceNameForServer/Client and optional
jsonRootPointer / jsonVariableMapping (no ascii* fields)
- FLIGHT branch emits the server's verified wire fields: flightLocation /
flightDescriptorType / flightDescriptor / flightAuth
- columnar/Flight branches match Type case-insensitively, because the
server canonicalizes Type to title-case on read-back ("Arrow"/"Parquet"/
"Flight"); this keeps from_dict(server_response) round-trips from
collapsing to an empty DataSource block
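The two new branches and the case-insensitive Type match can be sketched as plain dict building. This is a minimal illustration, not TM1py's `_construct_body_as_dict`: `build_datasource_body` and `datasource_from_dict` are hypothetical helpers, and the value types of `jsonVariableMapping` and `flightAuth` are assumptions. The sketch shows why matching on `Type.lower()` matters: an exact comparison against `"ARROW"` would skip the branch for the server's title-case `"Arrow"` and drop the file-name keys on a `from_dict` round-trip.

```python
from typing import Any, Dict, Optional

# Hypothetical sketch of the ARROW/PARQUET and FLIGHT DataSource branches.
COLUMNAR_TYPES = {"arrow", "parquet"}


def build_datasource_body(
    datasource_type: str,
    name_for_server: Optional[str] = None,
    name_for_client: Optional[str] = None,
    json_root_pointer: Optional[str] = None,
    json_variable_mapping: Any = None,  # assumed type; passed through as-is
    flight_location: Optional[str] = None,
    flight_descriptor_type: Optional[str] = None,
    flight_descriptor: Optional[str] = None,
    flight_auth: Any = None,  # assumed type; passed through as-is
) -> Dict[str, Any]:
    """Return a "DataSource" block for ARROW/PARQUET/FLIGHT processes.

    Type is matched case-insensitively, so the title-case value the server
    returns on read-back ("Arrow", "Parquet", "Flight") takes the same branch.
    """
    body: Dict[str, Any] = {"Type": datasource_type}  # free string, no whitelist
    kind = datasource_type.lower()
    if kind in COLUMNAR_TYPES:
        # File name or http(s) URL; no ascii* keys on this branch.
        body["dataSourceNameForServer"] = name_for_server
        body["dataSourceNameForClient"] = name_for_client
        # Optional keys for nested columns.
        if json_root_pointer is not None:
            body["jsonRootPointer"] = json_root_pointer
        if json_variable_mapping is not None:
            body["jsonVariableMapping"] = json_variable_mapping
    elif kind == "flight":
        body["flightLocation"] = flight_location
        body["flightDescriptorType"] = flight_descriptor_type
        body["flightDescriptor"] = flight_descriptor
        body["flightAuth"] = flight_auth
    # Other types (ASCII, ODBC, ...) are outside the scope of this sketch.
    return body


def datasource_from_dict(ds: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the block from a server response whose Type may be title-case."""
    return build_datasource_body(
        ds.get("Type", ""),
        name_for_server=ds.get("dataSourceNameForServer"),
        name_for_client=ds.get("dataSourceNameForClient"),
        json_root_pointer=ds.get("jsonRootPointer"),
        json_variable_mapping=ds.get("jsonVariableMapping"),
        flight_location=ds.get("flightLocation"),
        flight_descriptor_type=ds.get("flightDescriptorType"),
        flight_descriptor=ds.get("flightDescriptor"),
        flight_auth=ds.get("flightAuth"),
    )
```

For example, `datasource_from_dict(dict(body, Type="Arrow"))` reproduces the same keys as `body` built with `"ARROW"`, which is the round-trip property the commit describes.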
ProcessService.get/get_all select the four flight fields.
datasource_type stays a free string (no Type whitelist) and no version
gating is added, since ProcessDataSource is an OData open type validated by
the engine at execute time.
Tests: offline unit tests for the body shapes, from_dict round-trips, and
the title-case Type round-trip (17 total), plus an optional live round-trip
test gated to TM1 >= 12.6.1. Offline suite and the live 12.6.1 round-trip
both pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
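The live round-trip test is gated to TM1 >= 12.6.1. A minimal sketch of such a gate, assuming the server reports a dotted numeric version string such as "12.6.1"; `parse_version` and `at_least` are hypothetical helpers, not TM1py API.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version such as '12.6.1' into integers: (12, 6, 1).

    Takes the leading digits of each dot-separated component and stops at
    the first component that does not start with a digit.
    """
    parts = []
    for piece in version.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(version: str, minimum: str = "12.6.1") -> bool:
    """Return True if `version` >= `minimum`, comparing numerically per component."""
    have, need = parse_version(version), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '12.6' compares like '12.6.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

Comparing integer tuples rather than raw strings matters here: as strings, `"12.10.0" >= "12.6.1"` is `False`.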
… 12.6.1+)
Add a pure-Python columnar write path to cells.write_dataframe, mirroring
the calling convention of tm1py-rust: pass use_parquet=True (with optional
parquet_compression) or use_arrow=True to load a DataFrame through a
TM1 12.6.1 ARROW / PARQUET TI datasource instead of the CSV/ASCII blob.
- Encodes the aligned DataFrame to an Arrow-IPC stream or a Parquet file
via pyarrow (fields v1..vN + vValue; a numeric value column -> float64 ->
a Numeric TI variable, no client-side CSV serialization/escaping).
- Uploads via FileService and runs an unbound ARROW/PARQUET-datasource
process (_build_columnar_blob_to_cube_process mirrors
_build_blob_to_cube_process), then deletes the blob. Gated to server
>= 12.6.1.
- pyarrow is an optional extra (pip install TM1py[arrow]); a guarded
import raises a clear error if use_arrow/use_parquet is used without it.
Builds on the ARROW/PARQUET Process datasource modelling.
Offline unit tests cover the encoder (Arrow/Parquet bytes, field
names/types) and the generated TI process body. Validated end-to-end
against a live PA 12.6.1 server.
Precision note: contrary to the tm1py-rust comments, on the live 12.6.1
server the columnar numeric path is NOT bit-exact for values at the edge
of double precision (read-back rounds to ~16 significant digits), whereas
the ASCII/CSV path round-trips exactly. Docstrings state precision is
~float64, not guaranteed bit-exact.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
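The precision note can be reproduced with plain Python floats, no server involved: some float64 values need 17 significant digits to survive a decimal round-trip, so a path that keeps ~16 digits (as measured on the 12.6.1 server) cannot be bit-exact for them. `round_trips` is a hypothetical helper; this illustrates the arithmetic only and says nothing about how TM1 ingests columnar data internally.

```python
def round_trips(x: float, significant_digits: int) -> bool:
    """True if formatting x with this many significant digits parses back to x."""
    return float(f"{x:.{significant_digits}g}") == x


x = 0.1 + 0.2  # 0.30000000000000004 as an IEEE-754 double
print(round_trips(x, 16))   # False: 16 digits give '0.3', a different double
print(round_trips(x, 17))   # True: 17 significant digits always suffice for float64
print(float(repr(x)) == x)  # True: repr() emits a shortest exact round-trip string
```

A text path that writes enough digits (17, or the shortest round-trip string) preserves every double, which is consistent with the ASCII/CSV path round-tripping exactly in the measurement above.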
Summary
Two related changes for the new TM1 v12, build 12.6.1 columnar data sources (Apache Arrow IPC, Parquet, Arrow Flight):
- `Process`: author/read processes whose `DataSource.Type` is `ARROW`/`PARQUET`/`FLIGHT` (previously these emitted an empty `"DataSource": {}`).
- `cells.write_dataframe(..., use_parquet=True | use_arrow=True)`: a pure-Python columnar blob-write path that encodes the DataFrame to a Parquet file / Arrow-IPC stream and loads it through an `ARROW`/`PARQUET` TI datasource instead of the CSV/ASCII blob.
Both commits are verified end-to-end against a live PA 12.6.1 server.
1 — Process datasource modelling
- `Process` gains four `datasource_flight_*` scalars (constructor, `from_dict`, properties) and two `_construct_body_as_dict` branches:
  - `ARROW`/`PARQUET`: file name or http(s) URL via `dataSourceNameForServer`/`dataSourceNameForClient`, plus optional `jsonRootPointer`/`jsonVariableMapping` for nested columns; no `ascii*` keys.
  - `FLIGHT`: `flightLocation`/`flightDescriptorType`/`flightDescriptor`/`flightAuth`.
- Both branches match `Type` case-insensitively because the server canonicalizes it to title-case on read-back (`Arrow`/`Parquet`/`Flight`); otherwise `from_dict(get(...))` would round-trip to an empty `DataSource`.
- `ProcessService.get`/`get_all` select the four flight fields.
- `datasource_type` stays a free string (no `Type` whitelist) and no version gating is added: `ProcessDataSource` is an OData open type validated by the engine at execute time.
2 — Columnar `write_dataframe`
- `use_arrow`/`use_parquet`/`parquet_compression` parameters on `cells.write_dataframe`.
- `_encode_dataframe_columnar` (pyarrow): DataFrame → Arrow-IPC stream or Parquet file; fields `v1..vN` (coordinates, String) + `vValue` (numeric column → `float64` → a Numeric TI variable, no client-side CSV serialization/escaping).
- `_build_columnar_blob_to_cube_process` mirrors `_build_blob_to_cube_process` but targets the columnar datasource; `_write_dataframe_through_columnar_blob` uploads via `FileService`, runs the unbound TI, and deletes the blob. Gated to server ≥ 12.6.1.
- `pyarrow` is an optional extra (`pip install TM1py[arrow]`); a guarded import raises a clear error if `use_arrow`/`use_parquet` is used without it.
Precision note (measured live)
On the live 12.6.1 server the columnar numeric path is not bit-exact for values at the edge of double precision: read-back rounds to ~16 significant digits, whereas the ASCII/CSV path round-trips exactly. The Parquet file itself preserves the exact `float64` (covered by a unit test), so the rounding happens server-side in TM1's columnar ingestion. Docstrings therefore state that precision is ~float64, not guaranteed bit-exact. The benefits here are the columnar datasource path, no client-side CSV building, and Parquet upload compression, not added precision.
Testing
- Offline: `Process` body shapes + `from_dict` round-trips incl. title-case `Type` (`Tests/Process_test.py`); the columnar encoder and generated TI process body (`Tests/CellService_columnar_test.py`). 28 tests, all pass.
- Live (PA 12.6.1): `ProcessService` round-trip test (gated `>= 12.6.1`); and the `write_dataframe(use_parquet/use_arrow)` path validated via a self-cleaning probe.
- `black --check .` and `ruff check .` pass.
Backward compatibility
No changes to existing datasource branches (`ASCII`/`None`/`ODBC`/`TM1CubeView`/`TM1DimensionSubset`/`JSON`) or to the default `write_dataframe` path; the new behavior is opt-in.
🤖 Generated with Claude Code