feat: TM1 12.6.1 Arrow/Parquet/Flight datasources + write_dataframe(use_parquet/use_arrow) by skido-cw · Pull Request #1431 · cubewise-code/tm1py

skido-cw · 2026-06-29T16:17:53Z

Summary

Two related changes for the new TM1 v12, build 12.6.1 columnar data sources (Apache Arrow IPC, Parquet, Arrow Flight):

1. Model the new TI datasource types on Process: author/read processes whose DataSource.Type is ARROW / PARQUET / FLIGHT (previously these emitted an empty "DataSource": {}).
2. cells.write_dataframe(..., use_parquet=True | use_arrow=True): a pure-Python columnar blob-write path that encodes the DataFrame to a Parquet file / Arrow-IPC stream and loads it through an ARROW/PARQUET TI datasource instead of the CSV/ASCII blob.

Both commits are verified end-to-end against a live PA 12.6.1 server.
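
For orientation, the opt-in call can be sketched as a thin wrapper. The keyword names use_parquet, use_arrow and parquet_compression come from this PR; the helper write_columnar and its fmt argument are hypothetical and exist only for illustration. The sketch assumes the cube name and the DataFrame are the first two positional arguments of write_dataframe.

```python
def write_columnar(cells, cube_name, df, fmt="parquet", compression=None):
    """Call cells.write_dataframe with the opt-in columnar flags.

    `cells` is expected to be a TM1py CellService (tm1.cells).
    `write_columnar` itself is a hypothetical helper, not part of TM1py.
    """
    kwargs = {}
    if fmt == "parquet":
        kwargs["use_parquet"] = True
        # Only forward a codec if the caller chose one explicitly.
        if compression is not None:
            kwargs["parquet_compression"] = compression
    elif fmt == "arrow":
        kwargs["use_arrow"] = True
    else:
        raise ValueError(f"unsupported columnar format: {fmt!r}")
    return cells.write_dataframe(cube_name, df, **kwargs)


# Usage against a live server (connection details omitted):
#     with TM1Service(...) as tm1:
#         write_columnar(tm1.cells, "Sales", df, fmt="parquet")
```

Leaving both flags unset keeps the existing default path, which matches the opt-in behavior described under Backward compatibility.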

1 — Process datasource modelling

Process gains four datasource_flight_* scalars (constructor, from_dict, properties) and two _construct_body_as_dict branches:
- ARROW/PARQUET: file name or http(s) URL via dataSourceNameForServer/Client, plus optional jsonRootPointer/jsonVariableMapping for nested columns; no ascii* keys.
- FLIGHT: flightLocation / flightDescriptorType / flightDescriptor / flightAuth.
Branches match Type case-insensitively because the server canonicalizes it to title-case on read-back (Arrow/Parquet/Flight) — otherwise from_dict(get(...)) would round-trip to an empty DataSource.
ProcessService.get/get_all select the four flight fields.
datasource_type stays a free string (no Type whitelist) and no version gating is added — ProcessDataSource is an OData open type validated by the engine at execute time.

Field names and the title-case canonicalization were verified against the live 12.6.1 server (the flight* names persist and round-trip; the dataSourceFlight* names are silently dropped).
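
A minimal sketch of the branch logic described above, not the actual _construct_body_as_dict code. The wire field names are the ones listed in this PR; the function name build_datasource_body, its parameter names, and the assumption that every field is a plain string are hypothetical.

```python
from typing import Any, Dict, Optional

COLUMNAR_TYPES = {"arrow", "parquet"}


def build_datasource_body(
    datasource_type: str,
    name_for_server: Optional[str] = None,
    name_for_client: Optional[str] = None,
    json_root_pointer: Optional[str] = None,
    json_variable_mapping: Optional[str] = None,
    flight_location: Optional[str] = None,
    flight_descriptor_type: Optional[str] = None,
    flight_descriptor: Optional[str] = None,
    flight_auth: Optional[str] = None,
) -> Dict[str, Any]:
    # Compare case-insensitively: the server returns "Arrow"/"Parquet"/"Flight"
    # even when the process was created with "ARROW"/"PARQUET"/"FLIGHT".
    kind = datasource_type.lower()
    if kind in COLUMNAR_TYPES:
        body: Dict[str, Any] = {
            "Type": datasource_type,
            "dataSourceNameForServer": name_for_server,
            "dataSourceNameForClient": name_for_client,
        }
        # The JSON keys are optional and only emitted when set; no ascii* keys.
        if json_root_pointer is not None:
            body["jsonRootPointer"] = json_root_pointer
        if json_variable_mapping is not None:
            body["jsonVariableMapping"] = json_variable_mapping
        return body
    if kind == "flight":
        return {
            "Type": datasource_type,
            "flightLocation": flight_location,
            "flightDescriptorType": flight_descriptor_type,
            "flightDescriptor": flight_descriptor,
            "flightAuth": flight_auth,
        }
    # Other datasource types are out of scope for this sketch.
    return {}
```

With an exact-match comparison against "PARQUET", a body rebuilt from a server response carrying "Parquet" would fall through to the empty dict, which is the round-trip failure the case-insensitive match avoids.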

2 — Columnar `write_dataframe`

New keyword params use_arrow / use_parquet / parquet_compression on cells.write_dataframe.
_encode_dataframe_columnar (pyarrow): DataFrame → Arrow-IPC stream or Parquet file; fields v1..vN (coordinates, String) + vValue (numeric column → float64 → a Numeric TI variable, no client-side CSV serialization/escaping).
_build_columnar_blob_to_cube_process mirrors _build_blob_to_cube_process but targets the columnar datasource; _write_dataframe_through_columnar_blob uploads via FileService, runs the unbound TI, and deletes the blob. Gated to server ≥ 12.6.1.
pyarrow is an optional extra (pip install TM1py[arrow]); a guarded import raises a clear error if use_arrow/use_parquet is used without it.
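
The encoding step and the guarded import can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's _encode_dataframe_columnar: the names encode_columnar and _require_pyarrow are hypothetical, the value column is assumed to be the last column and numeric, and the compression default shown is pyarrow's own, not necessarily the default of parquet_compression.

```python
def _require_pyarrow():
    """Import pyarrow lazily so it stays an optional dependency."""
    try:
        import pyarrow as pa
        import pyarrow.parquet as pq
    except ImportError as exc:
        raise ImportError(
            "use_arrow/use_parquet require pyarrow; "
            "install it with: pip install TM1py[arrow]"
        ) from exc
    return pa, pq


def encode_columnar(df, fmt="parquet", compression="snappy") -> bytes:
    """Encode a pandas DataFrame as Arrow-IPC stream bytes or Parquet bytes.

    Assumes the last column holds numeric values and all other columns
    hold element names (coordinates).
    """
    pa, pq = _require_pyarrow()
    *coord_cols, value_col = df.columns

    # Coordinates become string fields v1..vN.
    arrays = [pa.array(df[c].astype(str), type=pa.string()) for c in coord_cols]
    names = [f"v{i}" for i in range(1, len(coord_cols) + 1)]

    # The value column becomes a float64 field named vValue.
    arrays.append(pa.array(df[value_col].astype("float64"), type=pa.float64()))
    names.append("vValue")

    table = pa.Table.from_arrays(arrays, names=names)
    sink = pa.BufferOutputStream()
    if fmt == "arrow":
        with pa.ipc.new_stream(sink, table.schema) as writer:
            writer.write_table(table)
    elif fmt == "parquet":
        pq.write_table(table, sink, compression=compression)
    else:
        raise ValueError(f"unsupported columnar format: {fmt!r}")
    return sink.getvalue().to_pybytes()
```

The returned bytes are what would be uploaded as the blob; because the value travels as a typed float64 field, no CSV quoting or escaping is involved on the client side.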

Precision note (measured live)

On the live 12.6.1 server the columnar numeric path is not bit-exact for values at the edge of double precision — read-back rounds to ~16 significant digits, whereas the ASCII/CSV path round-trips exactly. The Parquet file itself preserves the exact float64 (covered by a unit test), so the rounding is server-side in TM1's columnar ingestion. Docstrings state precision is ~float64, not guaranteed bit-exact. The benefits here are the columnar datasource path, no client-side CSV building, and Parquet upload compression — not added precision.
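
To make the arithmetic concrete: a float64 needs up to 17 significant decimal digits to round-trip, so a value rounded to 16 digits can land on a neighbouring double. The snippet below only illustrates that effect in plain Python; it makes no claim about how the TM1 server performs its rounding internally.

```python
import struct


def bits(x: float) -> str:
    """Hex form of the IEEE 754 double, for bit-level comparison."""
    return struct.pack(">d", x).hex()


x = 0.1 + 0.2  # a double whose shortest exact repr needs 17 significant digits

via_17 = float(f"{x:.17g}")  # 17 significant digits: recovers the same double
via_16 = float(f"{x:.16g}")  # 16 significant digits: a different double

print(bits(x) == bits(via_17))  # True
print(bits(x) == bits(via_16))  # False
```

This is also why a text path can round-trip exactly, provided the value is written with enough digits.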

Testing

Offline unit tests (no server): Process body shapes + from_dict round-trips incl. title-case Type (Tests/Process_test.py); the columnar encoder and generated TI process body (Tests/CellService_columnar_test.py). 28 tests, all pass.
Live (TM1 12.6.1): the optional ProcessService round-trip test (gated >= 12.6.1); and the write_dataframe(use_parquet/use_arrow) path validated via a self-cleaning probe.
black --check . and ruff check . pass.
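
The ">= 12.6.1" gate depends on comparing versions numerically rather than as strings. A minimal sketch, using hypothetical helper names (this is not TM1py's own version utility):

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def supports_columnar(server_version: str, minimum: str = "12.6.1") -> bool:
    """True if the server version is at least the minimum columnar build."""
    return parse_version(server_version) >= parse_version(minimum)
```

Comparing integer tuples orders "12.10.0" after "12.6.1", which a plain string comparison would get wrong.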

Backward compatibility

No changes to existing datasource branches (ASCII/None/ODBC/TM1CubeView/TM1DimensionSubset/JSON) or to the default write_dataframe path; the new behavior is opt-in.

🤖 Generated with Claude Code

Add first-class modelling for the three new TurboIntegrator input datasource types introduced in TM1 v12 build 12.6.1 (Apache Arrow IPC/Feather, Parquet, and Arrow Flight), verified end-to-end against a live PA 12.6.1 server.

Process:
- four new datasource_flight_* scalars (constructor, from_dict, properties)
- ARROW/PARQUET branch reuses dataSourceNameForServer/Client and optional jsonRootPointer / jsonVariableMapping (no ascii* fields)
- FLIGHT branch emits the server's verified wire fields: flightLocation / flightDescriptorType / flightDescriptor / flightAuth
- columnar/Flight branches match Type case-insensitively, because the server canonicalizes Type to title-case on read-back ("Arrow"/"Parquet"/"Flight"); this keeps from_dict(server_response) round-trips from collapsing to an empty DataSource block

ProcessService.get/get_all select the four flight fields. datasource_type stays a free string (no Type whitelist) and no version gating is added, since ProcessDataSource is an OData open type validated by the engine at execute time.

Tests: offline unit tests for the body shapes, from_dict round-trips, and the title-case Type round-trip (17 total), plus an optional live round-trip test gated to TM1 >= 12.6.1. Offline suite and the live 12.6.1 round-trip both pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… 12.6.1+)

Add a pure-Python columnar write path to cells.write_dataframe, mirroring the calling convention of tm1py-rust: pass use_parquet=True (with optional parquet_compression) or use_arrow=True to load a DataFrame through a TM1 12.6.1 ARROW / PARQUET TI datasource instead of the CSV/ASCII blob.

- Encodes the aligned DataFrame to an Arrow-IPC stream or a Parquet file via pyarrow (fields v1..vN + vValue; a numeric value column -> float64 -> a Numeric TI variable, no client-side CSV serialization/escaping).
- Uploads via FileService and runs an unbound ARROW/PARQUET-datasource process (_build_columnar_blob_to_cube_process mirrors _build_blob_to_cube_process), then deletes the blob. Gated to server >= 12.6.1.
- pyarrow is an optional extra (pip install TM1py[arrow]); a guarded import raises a clear error if use_arrow/use_parquet is used without it.

Builds on the ARROW/PARQUET Process datasource modelling. Offline unit tests cover the encoder (Arrow/Parquet bytes, field names/types) and the generated TI process body. Validated end-to-end against a live PA 12.6.1 server.

Precision note: contrary to the tm1py-rust comments, on the live 12.6.1 server the columnar numeric path is NOT bit-exact for values at the edge of double precision (read-back rounds to ~16 significant digits) whereas the ASCII/CSV path round-trips exactly. Docstrings state precision is ~float64, not guaranteed bit-exact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

skido-cw and others added 2 commits June 26, 2026 16:00

skido-cw mentioned this pull request Jun 29, 2026

feat: model TM1 12.6.1 Arrow/Parquet/Flight TI datasources #1429

Closed

skido-cw wants to merge 2 commits into cubewise-code:master from skido-cw:feature/write-dataframe-columnar-blob
