feat: TM1 12.6.1 Arrow/Parquet/Flight datasources + write_dataframe(use_parquet/use_arrow) #1431

Open
skido-cw wants to merge 2 commits into cubewise-code:master from skido-cw:feature/write-dataframe-columnar-blob

@skido-cw

Summary

Two related changes for the columnar data sources introduced in TM1 v12 build 12.6.1 (Apache Arrow IPC, Parquet, Arrow Flight):

  1. Model the new TI datasource types on Process — author/read processes whose DataSource.Type is ARROW / PARQUET / FLIGHT (previously these emitted an empty "DataSource": {}).
  2. cells.write_dataframe(..., use_parquet=True | use_arrow=True) — a pure-Python columnar blob-write path: encode the DataFrame to a Parquet file / Arrow-IPC stream and load it through an ARROW/PARQUET TI datasource instead of the CSV/ASCII blob.

Both commits are verified end-to-end against a live PA 12.6.1 server.

1 — Process datasource modelling

  • Process gains four datasource_flight_* scalars (constructor, from_dict, properties) and two _construct_body_as_dict branches:
    • ARROW/PARQUET: file name or http(s) URL via dataSourceNameForServer/Client, plus optional jsonRootPointer/jsonVariableMapping for nested columns; no ascii* keys.
    • FLIGHT: flightLocation / flightDescriptorType / flightDescriptor / flightAuth.
  • Branches match Type case-insensitively because the server canonicalizes it to title-case on read-back (Arrow/Parquet/Flight) — otherwise from_dict(get(...)) would round-trip to an empty DataSource.
  • ProcessService.get/get_all select the four flight fields.
  • datasource_type stays a free string (no Type whitelist) and no version gating is added — ProcessDataSource is an OData open type validated by the engine at execute time.

Field names and the title-case canonicalization were verified against the live 12.6.1 server (the flight* names persist and round-trip; the dataSourceFlight* names are silently dropped).
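
The case-insensitive branch selection described in this section can be sketched roughly as follows. This is an illustration only, not the code in this PR: `build_datasource_body` and its parameters are hypothetical names, and only the wire keys (`Type`, `dataSourceNameForServer`, `dataSourceNameForClient`, `flightLocation`, `flightDescriptorType`, `flightDescriptor`, `flightAuth`) come from the description above.

```python
from typing import Any, Dict, Optional


def build_datasource_body(
    datasource_type: str,
    name_for_server: str = "",
    name_for_client: str = "",
    flight: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: choose a datasource branch regardless of Type casing."""
    # Normalise once so "PARQUET" (authored) and "Parquet" (server read-back) match.
    kind = datasource_type.upper()

    if kind in ("ARROW", "PARQUET"):
        return {
            "Type": datasource_type,
            "dataSourceNameForServer": name_for_server,
            "dataSourceNameForClient": name_for_client,
        }

    if kind == "FLIGHT":
        flight = flight or {}
        keys = ("flightLocation", "flightDescriptorType", "flightDescriptor", "flightAuth")
        body: Dict[str, Any] = {"Type": datasource_type}
        body.update({key: flight.get(key, "") for key in keys})
        return body

    # In this sketch an unmatched Type collapses to an empty block, which is
    # the failure mode a case-sensitive comparison would cause on read-back.
    return {}
```

With a case-sensitive comparison, a `Type` of `"Parquet"` returned by the server would fall through to the last branch; normalising the value first keeps both spellings on the same path.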

2 — Columnar write_dataframe

  • New keyword params use_arrow / use_parquet / parquet_compression on cells.write_dataframe.
  • _encode_dataframe_columnar (pyarrow): DataFrame → Arrow-IPC stream or Parquet file; fields v1..vN (coordinates, String) + vValue (numeric column → float64 → a Numeric TI variable, no client-side CSV serialization/escaping).
  • _build_columnar_blob_to_cube_process mirrors _build_blob_to_cube_process but targets the columnar datasource; _write_dataframe_through_columnar_blob uploads via FileService, runs the unbound TI, and deletes the blob. Gated to server ≥ 12.6.1.
  • pyarrow is an optional extra (pip install TM1py[arrow]); a guarded import raises a clear error if use_arrow/use_parquet is used without it.
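
A guarded import of an optional dependency, as mentioned in the last bullet, might look like the sketch below. The helper name `require_optional` and the exact error wording are assumptions for illustration; the PR's actual guard may differ.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with the install hint, keeping the original error as the cause.
        raise ImportError(
            f"'{module_name}' is required for this option. "
            f"Install it with: pip install TM1py[{extra}]"
        ) from exc
```

Calling `require_optional("pyarrow", "arrow")` only inside the `use_arrow` / `use_parquet` code path keeps the default CSV/ASCII path free of the dependency.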

Precision note (measured live)

On the live 12.6.1 server the columnar numeric path is not bit-exact for values at the edge of double precision — read-back rounds to ~16 significant digits, whereas the ASCII/CSV path round-trips exactly. The Parquet file itself preserves the exact float64 (covered by a unit test), so the rounding is server-side in TM1's columnar ingestion. Docstrings state precision is ~float64, not guaranteed bit-exact. The benefits here are the columnar datasource path, no client-side CSV building, and Parquet upload compression — not added precision.
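
To see why rounding to about 16 significant digits is not enough for a bit-exact float64 round-trip, consider the small standard-library sketch below. It only illustrates the arithmetic; it makes no claim about how the server's columnar ingestion is implemented.

```python
x = 0.1 + 0.2  # a float64 whose shortest exact text needs 17 significant digits

exact_text = repr(x)        # shortest text that parses back to the same float64
rounded_text = f"{x:.16g}"  # the same value cut to 16 significant digits

round_trip_exact = float(exact_text) == x      # the full text restores x
round_trip_rounded = float(rounded_text) == x  # the 16-digit text does not

print(exact_text, round_trip_exact)
print(rounded_text, round_trip_rounded)
```

Some float64 values need up to 17 significant digits to be identified uniquely, so any step that keeps only 16 can land on a neighbouring double.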

Testing

  • Offline unit tests (no server): Process body shapes + from_dict round-trips incl. title-case Type (Tests/Process_test.py); the columnar encoder and generated TI process body (Tests/CellService_columnar_test.py). 28 tests, all pass.
  • Live (TM1 12.6.1): the optional ProcessService round-trip test (gated >= 12.6.1); and the write_dataframe(use_parquet/use_arrow) path validated via a self-cleaning probe.
  • black --check . and ruff check . pass.
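
The `>= 12.6.1` gate used for the live test and the columnar write path can be sketched as a numeric version comparison. `version_tuple` and `supports_columnar` are hypothetical names, and the version strings are assumed to be dot-separated integers; how the PR actually reads and compares the server version is not shown here.

```python
from typing import Tuple

MIN_COLUMNAR_VERSION = (12, 6, 1)


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse '12.6.1' into (12, 6, 1) so the comparison is numeric, not lexical."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def supports_columnar(server_version: str) -> bool:
    """True if the server version is at least 12.6.1."""
    return version_tuple(server_version) >= MIN_COLUMNAR_VERSION
```

Comparing tuples of integers avoids the string-comparison pitfall where `"12.10.0"` sorts before `"12.6.1"`. A result like this could be passed to `unittest.skipUnless` to gate a live test.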

Backward compatibility

No changes to existing datasource branches (ASCII/None/ODBC/TM1CubeView/TM1DimensionSubset/JSON) or to the default write_dataframe path; the new behavior is opt-in.

🤖 Generated with Claude Code

skido-cw and others added 2 commits June 26, 2026 16:00
Add first-class modelling for the three new TurboIntegrator input
datasource types introduced in TM1 v12 build 12.6.1 (Apache Arrow
IPC/Feather, Parquet, and Arrow Flight), verified end-to-end against a
live PA 12.6.1 server.

Process:
- four new datasource_flight_* scalars (constructor, from_dict, properties)
- ARROW/PARQUET branch reuses dataSourceNameForServer/Client and optional
  jsonRootPointer / jsonVariableMapping (no ascii* fields)
- FLIGHT branch emits the server's verified wire fields: flightLocation /
  flightDescriptorType / flightDescriptor / flightAuth
- columnar/Flight branches match Type case-insensitively, because the
  server canonicalizes Type to title-case on read-back ("Arrow"/"Parquet"/
  "Flight"); this keeps from_dict(server_response) round-trips from
  collapsing to an empty DataSource block

ProcessService.get/get_all select the four flight fields.

datasource_type stays a free string (no Type whitelist) and no version
gating is added, since ProcessDataSource is an OData open type validated by
the engine at execute time.

Tests: offline unit tests for the body shapes, from_dict round-trips, and
the title-case Type round-trip (17 total), plus an optional live round-trip
test gated to TM1 >= 12.6.1. Offline suite and the live 12.6.1 round-trip
both pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… 12.6.1+)

Add a pure-Python columnar write path to cells.write_dataframe, mirroring the
calling convention of tm1py-rust: pass use_parquet=True (with optional
parquet_compression) or use_arrow=True to load a DataFrame through a TM1 12.6.1
ARROW / PARQUET TI datasource instead of the CSV/ASCII blob.

- Encodes the aligned DataFrame to an Arrow-IPC stream or a Parquet file via
  pyarrow (fields v1..vN + vValue; a numeric value column -> float64 -> a
  Numeric TI variable, no client-side CSV serialization/escaping).
- Uploads via FileService and runs an unbound ARROW/PARQUET-datasource process
  (_build_columnar_blob_to_cube_process mirrors _build_blob_to_cube_process),
  then deletes the blob. Gated to server >= 12.6.1.
- pyarrow is an optional extra (pip install TM1py[arrow]); a guarded import
  raises a clear error if use_arrow/use_parquet is used without it.

Builds on the ARROW/PARQUET Process datasource modelling.

Offline unit tests cover the encoder (Arrow/Parquet bytes, field names/types)
and the generated TI process body. Validated end-to-end against a live PA
12.6.1 server.

Precision note: contrary to the tm1py-rust comments, on the live 12.6.1 server
the columnar numeric path is NOT bit-exact for values at the edge of double
precision (read-back rounds to ~16 significant digits) whereas the ASCII/CSV
path round-trips exactly. Docstrings state precision is ~float64, not
guaranteed bit-exact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>