You have compressed log archives on disk. To query them you normally decompress everything first — even if you only need one hour out of thirty days.
This extension changes that. Query .pfc files directly from DuckDB SQL. A block index tells the extension exactly which chunks of the file to decompress — the rest stays compressed.
Requires: The
pfc_jsonlbinary installed on your machine (Step 1 below). The extension calls it for decompression.Platform: Linux x86_64, macOS Apple Silicon (ARM64), macOS Intel (x64), Windows x64 — all platforms supported as of v5.6.5.
INSTALL pfc FROM community;
LOAD pfc;
LOAD json;
SELECT
line->>'$.level' AS level,
line->>'$.message' AS message
FROM read_pfc_jsonl('/var/log/events.pfc')
WHERE line->>'$.level' = 'ERROR';PFC-JSONL is a high-performance compressed log format built for structured (JSONL) data. It achieves better compression than gzip and zstd on real log data while supporting random block access — meaning you can decompress only the time range you need.
Key properties:
- Each file is split into independently compressible blocks
- A
.pfc.bidxbinary index stores the byte offset and timestamp range of every block - The PFC binary can decompress any subset of blocks in a single call
- Free for personal and open-source use — no account, no signup required
┌──────────────────────────────────────────────────────────────┐
│ DuckDB │
│ │
│ SELECT * FROM read_pfc_jsonl('events.pfc', ts_from=...) │
│ │ │
│ ┌────────▼──────────┐ reads ┌─────────────────────┐ │
│ │ pfc extension │─────────────▶│ events.pfc.bidx │ │
│ │ (MIT, open src) │ block index │ (block timestamps) │ │
│ └────────┬──────────┘ └─────────────────────┘ │
│ │ popen() / subprocess │
└───────────┼──────────────────────────────────────────────────┘
│
▼
┌─────────────────────┐
│ pfc_jsonl binary │ ← proprietary, closed source
│ (v3.4+, local) │ contains BWT+rANS compression
└─────────────────────┘
│
▼
decompressed JSON lines → back to DuckDB
The extension is a thin open-source wrapper — it reads the .bidx index in C++ to select which blocks are needed, then calls the PFC binary once to decompress only those blocks. The compression algorithm stays closed.
The extension calls the pfc_jsonl binary for decompression.
Download the latest release for your platform:
Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
-o /usr/local/bin/pfc_jsonl
chmod +x /usr/local/bin/pfc_jsonl
pfc_jsonl --help # verify installmacOS (Apple Silicon M1/M2/M3/M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
-o /usr/local/bin/pfc_jsonl
chmod +x /usr/local/bin/pfc_jsonl
pfc_jsonl --help # verify installmacOS Intel (x64):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-x64 \
-o /usr/local/bin/pfc_jsonl
chmod +x /usr/local/bin/pfc_jsonl
pfc_jsonl --help # verify installWindows x64 (native — no WSL2 required):
Invoke-WebRequest `
-Uri "https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-windows-x64.exe" `
-OutFile "C:\bin\pfc_jsonl.exe"
# Add C:\bin to your PATH, then:
pfc_jsonl --help # verify installSet
PFC_JSONL_BINARY=/path/to/pfc_jsonl(Linux/macOS) or$env:PFC_JSONL_BINARY="C:\bin\pfc_jsonl.exe"(Windows PowerShell) to override the default binary path.
INSTALL pfc FROM community;
LOAD pfc;git clone --recurse-submodules https://github.com/ImpossibleForge/pfc-duckdb
cd pfc-duckdb
GEN=ninja make release
# Extension at: build/release/extension/pfc/pfc.duckdb_extensionLOAD pfc;
SELECT line FROM read_pfc_jsonl('/path/to/file.pfc');Each row contains one raw JSON string in the line column.
Use the DuckDB json extension to parse fields:
LOAD json;
SELECT
line->>'$.timestamp' AS ts,
line->>'$.level' AS level,
line->>'$.message' AS message,
line->>'$.service' AS service
FROM read_pfc_jsonl('/path/to/file.pfc');PFC files include a .pfc.bidx index with the timestamp range of each block.
Pass ts_from and/or ts_to (Unix seconds) to skip entire blocks before decompression:
-- Only decompress blocks that overlap the given time window
SELECT line
FROM read_pfc_jsonl(
'/path/to/file.pfc',
ts_from = 1735689600, -- 2026-01-01 00:00:00 UTC
ts_to = 1735775999 -- 2026-01-01 23:59:59 UTC
);Convert a timestamp string to Unix seconds with epoch():
SELECT line
FROM read_pfc_jsonl(
'/path/to/file.pfc',
ts_from = epoch(TIMESTAMPTZ '2026-01-01 00:00:00+00'),
ts_to = epoch(TIMESTAMPTZ '2026-01-02 00:00:00+00')
);ts_from/ts_to skip entire blocks (coarse, fast).
Add a WHERE clause for row-level precision:
LOAD json;
SELECT line->>'$.message' AS msg
FROM read_pfc_jsonl(
'/var/log/api.pfc',
ts_from = epoch(TIMESTAMPTZ '2026-03-15 08:00:00+00'),
ts_to = epoch(TIMESTAMPTZ '2026-03-15 10:00:00+00')
)
WHERE line->>'$.level' = 'ERROR';LOAD json;
-- Error rate per hour
SELECT
strftime(to_timestamp((line->>'$.ts')::BIGINT), '%Y-%m-%d %H:00') AS hour,
count(*) FILTER (WHERE line->>'$.level' = 'ERROR') AS errors,
count(*) AS total
FROM read_pfc_jsonl('/var/log/api.pfc')
GROUP BY hour ORDER BY hour;
-- Top 10 slowest endpoints
SELECT
line->>'$.path' AS endpoint,
avg((line->>'$.duration_ms')::DOUBLE) AS avg_ms,
count(*) AS requests
FROM read_pfc_jsonl('/var/log/api.pfc')
GROUP BY endpoint ORDER BY avg_ms DESC LIMIT 10;| Parameter | Type | Default | Description |
|---|---|---|---|
path |
VARCHAR | — | Path to the .pfc file. A .pfc.bidx index must exist at path + ".bidx". |
ts_from |
BIGINT | 0 | Lower bound for block selection (Unix seconds). 0 = no lower bound. |
ts_to |
BIGINT | 0 | Upper bound for block selection (Unix seconds). 0 = no upper bound. |
Returns: table with one column line VARCHAR — one row per decompressed JSON line.
Block filtering semantics:
A block is included if its timestamp range [ts_start, ts_end] overlaps [ts_from, ts_to].
Blocks with unknown timestamps are always included.
If both ts_from and ts_to are 0, all blocks are read.
| File | Required | Description |
|---|---|---|
file.pfc |
yes | Compressed PFC-JSONL file |
file.pfc.bidx |
yes | Binary block index (requires PFC-JSONL v3.4+) |
Generate both with the PFC binary:
pfc_jsonl compress input.jsonl output.pfc
# Produces: output.pfc + output.pfc.bidxNote: The Docker image on Docker Hub (
impossibleforge/pfc-jsonl) is a server-side compression tool. It is not required for using the DuckDB extension — you only need the standalonepfc_jsonlbinary from GitHub Releases.
Block-level filtering can skip the majority of a file. Example: 30-day log file, 720 hourly blocks — a 1-hour query reads 1 block instead of 720.
| Query range | Blocks read | Speedup (720-block file) |
|---|---|---|
| 30 days | 720/720 | 1× |
| 1 day | ~24/720 | ~30× |
| 1 hour | ~1/720 | ~720× |
Cannot open index file: /path/to/file.pfc.bidx
The .pfc.bidx index is missing. Compress with PFC-JSONL v3.4+:
pfc_jsonl compress input.jsonl output.pfcPFC binary not found at '/usr/local/bin/pfc_jsonl'
Binary is missing or not executable. Re-run the curl install command, or set PFC_JSONL_BINARY=/path/to/pfc_jsonl.
popen() failed — could not start PFC binary subprocess
The extension uses popen() to call the PFC binary. Make sure the binary is installed and executable (see Step 1). Native Windows x64 binary is available as of v5.6.5.
ts_from (...) must be <= ts_to (...)
You passed an inverted time range. Swap the values so ts_from comes before ts_to.
ts_from/ts_to returns 0 rows even though data exists in that time range
This was a bug in PFC-JSONL v3.4 — the block index timestamps were off by one day, causing all block-filtered queries to return 0 rows. Upgrade to v5.6.5 to fix this.
Queries on old v3.4 archives return no results
PFC-JSONL v5.6.5 fully supports reading v3.4 archives via DuckDB. If queries on v3.4 files return 0 rows, upgrade the pfc_jsonl binary to v5.6.5.
→ View all PFC tools & integrations
| Direct integration | Why |
|---|---|
| pfc-gateway | HTTP alternative — query .pfc archives without a local DuckDB install |
PFC-DuckDB is an independent open-source project and is not affiliated with, endorsed by, or associated with the DuckDB Foundation or DuckDB Labs.
The pfc DuckDB extension (this repository) is released under the MIT License — see LICENSE.
The PFC-JSONL binary (pfc_jsonl) is proprietary software — free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com