From f1b8cd33ef564aaba434f43149ca2e33c8e7f688 Mon Sep 17 00:00:00 2001 From: Rune Johansen Date: Tue, 12 May 2026 09:36:23 +0200 Subject: [PATCH 1/4] docs: add parquet chapter --- docs/PxWebApi/documentation/user-guide.md | 93 +++++++++++++++++++++++ 1 file changed, 93 insertions(+) diff --git a/docs/PxWebApi/documentation/user-guide.md b/docs/PxWebApi/documentation/user-guide.md index 6661bf2..9f7085c 100644 --- a/docs/PxWebApi/documentation/user-guide.md +++ b/docs/PxWebApi/documentation/user-guide.md @@ -1103,6 +1103,8 @@ The API can provide the result in 7 main formats: You select the format you want the response to be in by setting the parameter `outputFormat`. +### JSON-stat v2 + ??? info "About JSON-stat v2" JSON-stat is a format specifically developed to display statistical tables, that is, datasets with many dimensions. JSON-stat represents the values in @@ -1139,6 +1141,97 @@ You select the format you want the response to be in by setting the parameter `o - - . + +### Parquet + +New in this API is the [Apache Parquet](https://parquet.apache.org/) output format. + +We create a column for each variable and separate columns for `timestamp`, `value` +and `value_symbol`. When more than one content variable is selected, the `value` and +`value_symbol` columns are renamed with the `ContentsCode_` prefix. + +Inspecting this request with [parqeye](https://github.com/kaushiksrini/parqeye) +shows the following views. 
+ +Request + +```sh +https://data.qa.ssb.no/api/pxwebapi/v2/tables/04475/data?lang=en&outputFormat=parquet&valuecodes[Tid]=2025K1,2025K2,2025K3,2025K4&valuecodes[ContentsCode]=ForbrukVareliter&valuecodes[Alkohol]=03 +``` + +Visualize + +```sh + type of beverage quarter timestamp value value_symbol +──────┬───────────────────────────────────────────────────────────────────────── +1 │ "03" "2025K1" 2025-01-01 00:00:00 54185.0 NULL +2 │ "03" "2025K2" 2025-04-01 00:00:00 73012.0 NULL +3 │ "03" "2025K3" 2025-07-01 00:00:00 65806.0 NULL +4 │ "03" "2025K4" 2025-10-01 00:00:00 67327.0 NULL +``` + +Metadata + +```sh +╭────────────────────────────────File Metadata─────────────────────────────────╮ +│ Format version 1 │ +│ Created by Parquet.Net version 4.25.0 (build 687fbb462e94eddd1dc5a0aa26 +│ Rows 4 │ +│ Columns 5 │ +│ Row groups 1 │ +│ Size (raw) 411 B │ +│ Size (compressed) 394 B │ +│ Compression ratio 1.04x │ +│ Codecs (cols) SNAPPY(5) │ +│ Encodings BIT_PACKED, PLAIN, RLE │ +│ Avg row size 102 B │ +╰──────────────────────────────────────────────────────────────────────────────╯ +``` + +Schema + +```sh +╭───────Schema Tree───────╮╭─────────────────Column Statistics─────────────────╮ +│└─ root ││Repetition Physical Compressed Uncompressed │ +│ ├─ type of beverage ││OPTIONAL BYTE_ARRAY 71 B 67 B │ +│ ├─ quarter ││OPTIONAL BYTE_ARRAY 90 B 99 B │ +│ ├─ timestamp ││REQUIRED INT96 111 B 125 B │ +│ ├─ value ││REQUIRED DOUBLE 93 B 93 B │ +│ └─ value_symbol ││OPTIONAL BYTE_ARRAY 29 B 27 B │ +│ ││ │ +╰───────Leaf, Group───────╯╰───────────────────────────────────────────────────╯ +``` + +#### DuckDB example + +```sh +% duckdb +DuckDB v1.5.2 (Variegata) +Enter ".help" for usage hints. 
+memory D SELECT * FROM read_parquet('https://data.qa.ssb.no/api/pxwebapi/v2/tables/04475/data?lang=en&outputFormat=parquet&valuecodes[Tid]=2025K1,2025K2,2025K3,2025K4&valuecodes[ContentsCode]=ForbrukVareliter&valuecodes[Alkohol]=03'); +┌──────────────────┬─────────┬─────────────────────┬─────────┬──────────────┐ +│ type of beverage │ quarter │ timestamp │ value │ value_symbol │ +│ varchar │ varchar │ timestamp │ double │ varchar │ +├──────────────────┼─────────┼─────────────────────┼─────────┼──────────────┤ +│ 03 │ 2025K1 │ 2025-01-01 00:00:00 │ 54185.0 │ NULL │ +│ 03 │ 2025K2 │ 2025-04-01 00:00:00 │ 73012.0 │ NULL │ +│ 03 │ 2025K3 │ 2025-07-01 00:00:00 │ 65806.0 │ NULL │ +│ 03 │ 2025K4 │ 2025-10-01 00:00:00 │ 67327.0 │ NULL │ +└──────────────────┴─────────┴─────────────────────┴─────────┴──────────────┘ +``` + +#### Known issues + +!!! warning + We may have to change the format to fix some of these issues + +- [x] ~~[Multiple contents and time odering bug](https://github.com/PxTools/PxWebApi/issues/511)~~ +- [ ] Parquet serializer cannot create `timestamp` and fails when `TimeUnit=Other` +- [ ] Parquet serializer cannot create `timestamp` and fails on invalid `TimeUnit` +- [ ] Consider switching from `DataField` to `DecimalDataField` in [Parquet.Net](https://github.com/aloneguid/parquet-dotnet) +- [ ] Onyxia [Data Explorer](https://datalab.sspcloud.fr/data-explorer) + fail with parquet from PxWebApi. 
+ ### Additionally parameters Some of the output format can take extra parameters that determines how the From a7e85781d0311beadf7b1155d7508e9aa0a0b9b7 Mon Sep 17 00:00:00 2001 From: Rune Johansen Date: Wed, 13 May 2026 13:56:35 +0200 Subject: [PATCH 2/4] docs: update Parquet section with new issues and fixes --- docs/PxWebApi/documentation/user-guide.md | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/docs/PxWebApi/documentation/user-guide.md b/docs/PxWebApi/documentation/user-guide.md index 9f7085c..4af5fe6 100644 --- a/docs/PxWebApi/documentation/user-guide.md +++ b/docs/PxWebApi/documentation/user-guide.md @@ -1141,7 +1141,6 @@ You select the format you want the response to be in by setting the parameter `o - - . - ### Parquet New in this API is the [Apache Parquet](https://parquet.apache.org/) output format. @@ -1225,12 +1224,10 @@ memory D SELECT * FROM read_parquet('https://data.qa.ssb.no/api/pxwebapi/v2/tabl !!! warning We may have to change the format to fix some of these issues -- [x] ~~[Multiple contents and time odering bug](https://github.com/PxTools/PxWebApi/issues/511)~~ -- [ ] Parquet serializer cannot create `timestamp` and fails when `TimeUnit=Other` -- [ ] Parquet serializer cannot create `timestamp` and fails on invalid `TimeUnit` -- [ ] Consider switching from `DataField` to `DecimalDataField` in [Parquet.Net](https://github.com/aloneguid/parquet-dotnet) -- [ ] Onyxia [Data Explorer](https://datalab.sspcloud.fr/data-explorer) - fail with parquet from PxWebApi. 
+- [x] [Multiple contents and time odering bug](https://github.com/PxTools/PxWebApi/issues/511) +- [ ] [Parquet seralizer throws exception on TimeScaleType](https://github.com/PxTools/PxWebApi/issues/595) +- [ ] [Consider switching from `DataField` to `DecimalDataField`](https://github.com/PxTools/PxWebApi/issues/596) +- [ ] [Parquet does not work in Onyxia Data Explorer](https://github.com/PxTools/PxWebApi/issues/597) ### Additionally parameters From 6cfeef7baa7b06dfb51f9e61fbae9e664bd44a36 Mon Sep 17 00:00:00 2001 From: Rune Johansen Date: Wed, 13 May 2026 13:58:51 +0200 Subject: [PATCH 3/4] Update README.md https://github.com/squidfunk/mkdocs-material/issues/8478 --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 6aea017..7d99d71 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ We use [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/) for t ### Docker ``` sh -docker run --rm -it -p 8000:8000 -v ${PWD}:/docs squidfunk/mkdocs-material +docker run --rm -it -p 8000:8000 -v ${PWD}:/docs squidfunk/mkdocs-material:9.6.20 ``` Browse From 8b96123b5f4bd8d36f5e0cd0a4a1921fbe597147 Mon Sep 17 00:00:00 2001 From: Rune Johansen Date: Wed, 13 May 2026 14:13:18 +0200 Subject: [PATCH 4/4] docs: update Parquet section to indicate beta status and add known issues --- docs/PxWebApi/documentation/user-guide.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/PxWebApi/documentation/user-guide.md b/docs/PxWebApi/documentation/user-guide.md index 4af5fe6..788c5af 100644 --- a/docs/PxWebApi/documentation/user-guide.md +++ b/docs/PxWebApi/documentation/user-guide.md @@ -1099,7 +1099,7 @@ The API can provide the result in 7 main formats: - `xlsx` (Excel) - `html` - `json-px` -- `parquet` +- `parquet` (beta) You select the format you want the response to be in by setting the parameter `outputFormat`. 
@@ -1141,7 +1141,7 @@ You select the format you want the response to be in by setting the parameter `o - - . -### Parquet +### Parquet (beta) New in this API is the [Apache Parquet](https://parquet.apache.org/) output format. @@ -1219,7 +1219,7 @@ memory D SELECT * FROM read_parquet('https://data.qa.ssb.no/api/pxwebapi/v2/tabl └──────────────────┴─────────┴─────────────────────┴─────────┴──────────────┘ ``` -#### Known issues +#### Parquet known issues !!! warning We may have to change the format to fix some of these issues @@ -1365,3 +1365,5 @@ Possible error codes if the query does not return a response: to include all newer periods the next time you run it. In that case, you must adjust the URL to `valueCode[Time]=*` or `from(start time)`, alternatively `top(number of newest periods)`. + +- See also [known issues under the Parquet](#parquet-known-issues) output format
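
Review note: the Parquet chapter shows a single hand-written request URL. A minimal sketch of how such a URL could be assembled is given here, assuming only the query parameters visible in the patch (`lang`, `outputFormat`, `valuecodes[...]`). The helper name `build_parquet_url` and the `BASE_URL` constant are hypothetical and not part of PxWebApi; the host is the QA endpoint used in the patch's own example.

```python
from urllib.parse import urlencode

# Base path copied from the example request in the patch (QA environment).
BASE_URL = "https://data.qa.ssb.no/api/pxwebapi/v2/tables"


def build_parquet_url(table_id, selections, lang="en"):
    """Hypothetical helper: build a data URL that requests Parquet output.

    `selections` maps a variable code (e.g. "Tid") to a list of value codes.
    """
    params = {"lang": lang, "outputFormat": "parquet"}
    for variable, codes in selections.items():
        # Value codes for one variable are sent as a comma-separated list.
        params[f"valuecodes[{variable}]"] = ",".join(codes)
    # Leave brackets and commas unescaped so the result matches the
    # form of the URL shown in the documentation.
    query = urlencode(params, safe="[],")
    return f"{BASE_URL}/{table_id}/data?{query}"


url = build_parquet_url(
    "04475",
    {
        "Tid": ["2025K1", "2025K2", "2025K3", "2025K4"],
        "ContentsCode": ["ForbrukVareliter"],
        "Alkohol": ["03"],
    },
)
print(url)
```

The resulting string can be passed to any Parquet reader that accepts HTTP URLs, for example inside DuckDB's `read_parquet('...')` as in the DuckDB example added by this patch.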