3 changes: 3 additions & 0 deletions .gitignore
@@ -36,6 +36,9 @@ MANIFEST
pip-log.txt
pip-delete-this-directory.txt

# JDBC driver JARs auto-downloaded by Flight SQL test fixtures
tests/.cache/

# Unit test / coverage reports
htmlcov/
.tox/
14 changes: 14 additions & 0 deletions CLAUDE.md
@@ -128,6 +128,20 @@ poetry run ruff check slayer/ tests/
- **Tests use `pytest-asyncio`** with `asyncio_mode = "auto"` — test functions can be `async def` and `await` directly.
- **Sync wrappers**: `run_sync()` in `async_utils.py` bridges async→sync for CLI and MCP tools. Handles both "no event loop" and "inside Jupyter" cases.

## Flight SQL

- Port **5144** by default (one above the REST API's 5143). `slayer flight-serve [--host HOST] [--port PORT] [--storage PATH] [--token T] [--tls-cert C] [--tls-key K] [--demo]`. Wire-compatible with the upstream Apache `flight-sql-jdbc-driver` v18.3.0 — same JAR the dbt Semantic Layer connectors use. Lives in `slayer/flight/`.
- **Loopback no-token fallback** (auth.py): non-loopback binds without a `--token` (or `$SLAYER_FLIGHT_TOKEN`) are refused at startup. With `--demo` and no explicit `--host` or `--token`, the effective host defaults to `127.0.0.1` so the no-token fallback applies cleanly.
- **Stateless server**: the prepared-statement `handle` and Flight `Ticket.ticket` both carry the **original UTF-8 SQL bytes** (the ticket wraps them in `TicketStatementQuery` for ticket-shape conformance). `ActionClosePreparedStatementRequest` is a no-op — nothing to free.
- **Path A vs Path B** (the "LIMIT 0 two-round-trip" story): the JDBC driver always routes `executeQuery` through the prepared-statement triplet. The translator/handler chain runs three times per BI query — once on `CreatePreparedStatement`, once on `get_flight_info(CommandPreparedStatementQuery)`, once on `do_get`. Database round-trips stay at two (`LIMIT 0` for schema validation, then full).
- **Catalog convention**: dotted form end-to-end — `customers.regions.name`. Same form in `INFORMATION_SCHEMA.*`, in the BI-tool projection list, in `WHERE`, and in the SLayer DSL. No `__` → `.` rewrite step in the translator.
- **`Any` wrapping** (server.py / handlers.py): the Apache JDBC driver wraps every `do_action` body AND expects every `do_action` response body to be `google.protobuf.Any`-wrapped (`type_url` = the action class's full name); the pyarrow-flight Python client sends raw bytes. `_parse_action_body` accepts both; response always sends an `Any`. Don't strip the wrapper.
- **JDBC `token=X` is Phase 2** — the Apache driver pre-handshakes the bearer token. SLayer's middleware validates headers per RPC, not via handshake, so JDBC clients using `token=X` get `UNIMPLEMENTED` during handshake. The pyarrow Python client works because it sets per-call `Authorization` headers. `tests/integration/test_integration_flight.py::test_auth_positive` is `xfail(strict=True)` so a future handshake-handler implementation auto-promotes to PASSED.
- **JVM `--add-opens` for Arrow on Java 17+**: the upstream `flight-sql-jdbc-driver` reflectively pokes `java.nio.Buffer.address`, blocked by strict module access on Java 17+. The JayDeBeAPI integration tests pre-start JPype's JVM with `--add-opens=java.base/java.nio=ALL-UNNAMED` (+ `java.lang` + `java.util`) — see `tests/integration/conftest.py:_ensure_jvm_started_for_arrow`. Document this for DBeaver users.
- **Wire-capture story**: `tests/flight/fixtures/CAPTURE-FINDINGS.md` is the canonical record of what the upstream JDBC driver emits during a real session; `capture-latest.jsonl` holds the JSONL trace. Refresh by running `poetry run python tests/flight/capture_dbt_jdbc.py` (requires Java + Maven Central access for the JAR).
- **Test fixtures**: `jdbc_jar` auto-downloads + caches the JAR into `tests/.cache/`; `jaydebeapi_connect` is a connect factory; `capture_stub` boots a recording-only Flight stub. Java-free integration coverage is in `tests/integration/test_integration_flight_pyarrow_client.py`.
- **Wire schema is catalog-declared in Phase 1**: derived from `QueryResult.projection_types` (`Column.type` for dims, `ModelMeasure.type` for measures). The `LIMIT 0` engine call still runs for validation. A `ModelMeasure` with a wrong/absent `type` surfaces as `ArrowTypeError` over the wire — tighten by setting `ModelMeasure.type`. Phase 2 issue: drive the wire schema from the actual LIMIT-0 execution.

## CLI

- All commands accept `--storage` (directory for YAML, `.db` file for SQLite). Defaults to platform-appropriate path (`~/.local/share/slayer` on Linux, `~/Library/Application Support/slayer` on macOS). Override with `$SLAYER_STORAGE` env var. Legacy `--models-dir` still works.
2 changes: 1 addition & 1 deletion README.md
@@ -29,7 +29,7 @@ SLayer naturally evolves when the agent uses it. For example, if a query require

SLayer compiles queries into the correct SQL for your database, handling joins, aggregations, time-based calculations, and dialect differences. Its DSL is very expressive, [supporting](https://motley-slayer.readthedocs.io/en/latest/examples/04_time/time/) queries like _"month-on-month % increase in total revenue, compared to the previous year"_, [queries-as-models](https://motley-slayer.readthedocs.io/en/latest/examples/06_multistage_queries/multistage_queries/) and much more.

SLayer exposes [MCP](https://github.com/MotleyAI/slayer?tab=readme-ov-file#mcp-server), [REST API](https://github.com/MotleyAI/slayer?tab=readme-ov-file#rest-api), [CLI](https://github.com/MotleyAI/slayer?tab=readme-ov-file#cli) and [Python](https://github.com/MotleyAI/slayer?tab=readme-ov-file#python-client) interfaces and [supports](https://motley-slayer.readthedocs.io/en/latest/configuration/datasources/#supported-database-types) most popular databases.
SLayer exposes [MCP](https://github.com/MotleyAI/slayer?tab=readme-ov-file#mcp-server), [REST API](https://github.com/MotleyAI/slayer?tab=readme-ov-file#rest-api), [CLI](https://github.com/MotleyAI/slayer?tab=readme-ov-file#cli), [Python](https://github.com/MotleyAI/slayer?tab=readme-ov-file#python-client), and [Flight SQL](https://motley-slayer.readthedocs.io/en/latest/interfaces/flight-sql/) (JDBC, BI-tool compatible) interfaces and [supports](https://motley-slayer.readthedocs.io/en/latest/configuration/datasources/#supported-database-types) most popular databases.

### Example

124 changes: 124 additions & 0 deletions docs/getting-started/flight-sql.md
@@ -0,0 +1,124 @@
# Flight SQL Setup (BI Tools)

SLayer's Flight SQL endpoint speaks the same wire protocol the dbt Semantic Layer
JDBC connector uses. That means most modern BI tools can connect to SLayer with no
custom drivers — point them at SLayer's Flight SQL host:port and they treat it like
any other Flight SQL-compatible warehouse.

## Start the Server

```bash
# Quick demo — loopback, no auth, ingests the bundled Jaffle Shop dataset
slayer flight-serve --demo

# Production — non-loopback bind requires a bearer token
slayer flight-serve --host 0.0.0.0 --token "$(pass slayer-token)"
```

See [Flight SQL Interface](../interfaces/flight-sql.md) for the full flag reference,
auth model, TLS setup, and SQL subset.

## Per-Tool Connection Recipes

Each tool below is expected to work: the flows are wire-validated against the
upstream Apache `flight-sql-jdbc-driver`, and the tool-specific settings follow each
vendor's own dbt Semantic Layer connector documentation. Hand-testing is pending where noted.

### Power BI (via dbt Semantic Layer connector)

The dbt Semantic Layer connector ships as a Power BI custom connector and uses the
Apache Flight SQL JDBC driver under the hood.

* Host: `<slayer-host>`
* Port: `5144`
* `useEncryption`: `false` (or `true` if you set `--tls-cert`/`--tls-key`)
* Token: paste the value you passed to `--token`
* Database / Schema: leave blank — the SLayer catalog auto-resolves

> **Phase 1 caveat** for JDBC clients: passing `token=` currently fails with
> `UNIMPLEMENTED` during the driver's handshake; see [the JDBC token note in the
> protocol reference](../interfaces/flight-sql.md#connection-url). Until the handshake
> handler lands, run the server with `--demo` on loopback (no token needed).

### Sigma

In Sigma's connection setup, choose **dbt Semantic Layer** as the connector type and
fill in:

```text
Host: <slayer-host>
Port: 5144
Service token: <slayer --token value>
```

### Looker

Use Looker's **dbt Semantic Layer** connection profile:

```text
Server: <slayer-host>:5144
Auth: bearer token
```

### Tableau

Tableau treats Flight SQL identifiers as case-sensitive by default. When picking models
and dimensions, **match SLayer's casing exactly** (lowercase model + column names in
the demo dataset). Configure the connection as:

```text
Server: <slayer-host>
Port: 5144
Authentication: dbt Semantic Layer token
```

### DBeaver Community

Use the generic JDBC driver dialog:

```text
Driver class: org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver
URL: jdbc:arrow-flight-sql://<slayer-host>:5144/?useEncryption=false&token=<token>
JAR: https://repo1.maven.org/maven2/org/apache/arrow/flight-sql-jdbc-driver/18.3.0/flight-sql-jdbc-driver-18.3.0.jar
```

Java 17+ users must add the Arrow memory-access JVM args to the DBeaver `dbeaver.ini`
(or pass via the driver's "VM Arguments"):

```text
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
```

### Hex

In Hex's Connection settings, choose **dbt Semantic Layer**:

```text
Endpoint: <slayer-host>:5144
Token: <slayer --token value>
```

## Sanity-check the Connection

The fastest way to verify a working connection is to inspect the `INFORMATION_SCHEMA.METRICS`
table from the BI tool:

```sql
SELECT * FROM INFORMATION_SCHEMA.METRICS LIMIT 20;
```

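Then try a single-table SELECT against a real model. The synthetic `row_count` metric is available on every model (it is exposed as `_row_count` if the model defines its own `row_count` column):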

```sql
SELECT row_count FROM orders;
```

For a time-bucketed query:

```sql
SELECT month(ordered_at) AS m, row_count
FROM orders
WHERE ordered_at BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY m;
```
189 changes: 189 additions & 0 deletions docs/interfaces/flight-sql.md
@@ -0,0 +1,189 @@
# Flight SQL

SLayer exposes an [Arrow Flight SQL](https://arrow.apache.org/docs/format/FlightSql.html)
endpoint on port **5144** by default (one above the REST API's 5143). It is wire-compatible
with the upstream Apache `flight-sql-jdbc-driver`, which makes SLayer accessible from
JDBC-based BI tools (Power BI / Sigma / Looker / Tableau / Hex / DBeaver / dbt Semantic
Layer connectors) without any extra glue.

The endpoint is **read-only**: catalog introspection plus a constrained SQL subset that
translates to a `SlayerQuery` and executes against the engine. SQL `INSERT` / `UPDATE` /
`DELETE` / `MERGE` / `TRUNCATE` / `CREATE` / `ALTER` / `DROP` statements are refused with
a `read-only` error.

## Start the Server

```bash
# Local dev — loopback, no auth needed
slayer flight-serve --demo

# Production-ish — non-loopback bind requires a bearer token
slayer flight-serve --host 0.0.0.0 --token "$(pass slayer-token)"

# TLS-enabled
slayer flight-serve --host 0.0.0.0 --token TOK \
--tls-cert /etc/ssl/slayer.crt --tls-key /etc/ssl/slayer.key
```

Flags:

| Flag | Description |
|---|---|
| `--host HOST` | Bind address. Default `0.0.0.0`. With `--demo` and no explicit `--host` or `--token`, defaults to `127.0.0.1` so the loopback no-token fallback applies. |
| `--port PORT` | Default `5144`. |
| `--token T` | Bearer token. Falls back to `$SLAYER_FLIGHT_TOKEN`. Required for non-loopback binds. |
| `--tls-cert C` / `--tls-key K` | TLS certificate + key pair (must be supplied together). |
| `--demo` | Generate + ingest the bundled Jaffle Shop dataset before starting. |
| `--storage PATH` | Storage path (same as the REST + MCP servers). |

## Connection URL

The JDBC driver's connection URL follows the upstream Apache `flight-sql-jdbc-driver`
syntax:

```text
jdbc:arrow-flight-sql://<host>:<port>/?useEncryption=<bool>[&token=<bearer>][&environmentId=<id>]
```

* `useEncryption=true` requires a TLS-enabled server (`--tls-cert` / `--tls-key`).
* `token=<bearer>` adds an `Authorization: Bearer <bearer>` header. **Phase 1 caveat:**
the Apache JDBC driver calls `handshake()` before its first real RPC to exchange the
token. SLayer's Phase 1 facade validates bearer tokens via header-based middleware on
every RPC, not via a handshake handler — so JDBC clients using `token=` will get an
`UNIMPLEMENTED` error during the handshake step. Use the pyarrow-flight Python client
(which honours per-call `Authorization` headers) until the handshake handler lands;
it is tracked as a Phase 2 follow-up.
* `environmentId=<id>` is logged at INFO on each request and otherwise ignored.
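
As a rough illustration of the URL syntax above, the sketch below assembles a connection string from its parts. `build_jdbc_url` is a hypothetical helper and not part of SLayer or the Apache driver; it assumes standard query-string encoding is acceptable for the parameter values.

```python
from urllib.parse import urlencode


def build_jdbc_url(host, port=5144, use_encryption=False, token=None, environment_id=None):
    """Assemble a URL in the documented syntax (hypothetical helper, not a SLayer API)."""
    # useEncryption is always present; token and environmentId are optional.
    params = [("useEncryption", "true" if use_encryption else "false")]
    if token is not None:
        params.append(("token", token))
    if environment_id is not None:
        params.append(("environmentId", environment_id))
    return f"jdbc:arrow-flight-sql://{host}:{port}/?{urlencode(params)}"


if __name__ == "__main__":
    print(build_jdbc_url("localhost"))
    print(build_jdbc_url("bi.example.com", use_encryption=True, token="abc"))
```

Keep the Phase 1 caveat in mind: a URL that carries `token=` is well-formed, but JDBC clients using it currently fail during the handshake step.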

## Authentication

* No token configured → the server accepts unauthenticated requests **only** from a
loopback peer (`127.0.0.0/8` or `::1`). Non-loopback binds without a token are
refused at startup time.
* Token configured → every RPC must carry `Authorization: Bearer <token>`. Mismatched
or missing headers raise `UNAUTHENTICATED`.
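
The two rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in `auth.py`: the function names are hypothetical, hostnames such as `localhost` are treated as non-loopback because only literal IP addresses are parsed, and the constant-time comparison is a design choice of the sketch.

```python
import hmac
import ipaddress


def is_loopback(address):
    """True for 127.0.0.0/8 and ::1. Anything unparsable (e.g. a hostname) is False."""
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        return False


def check_startup(bind_host, token):
    """Refuse a non-loopback bind when no token is configured (hypothetical helper)."""
    if not token and not is_loopback(bind_host):
        raise ValueError(f"refusing to bind {bind_host} without a bearer token")


def authorize(peer_address, authorization_header, token):
    """Return True if the RPC may proceed; False corresponds to UNAUTHENTICATED."""
    if not token:
        # No token configured: only loopback peers are accepted.
        return is_loopback(peer_address)
    expected = f"Bearer {token}".encode()
    supplied = (authorization_header or "").encode()
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(supplied, expected)
```

With a token configured, the peer address no longer matters: a loopback caller without the header is rejected just like a remote one.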

## TLS

Pass `--tls-cert` and `--tls-key` together to enable TLS. The server advertises
`grpc+tls://<host>:<port>` and clients must connect with `useEncryption=true`. Supplying
only one of the pair is rejected at startup.
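
A minimal sketch of the pairing rule and the advertised location follows. `advertised_location` is a hypothetical name, and the `grpc+tcp` scheme for the non-TLS case is an assumption (the text above only states the TLS form).

```python
def advertised_location(host, port, tls_cert=None, tls_key=None):
    """Validate the cert/key pairing and return the location URI (illustrative only)."""
    # Exactly one of the two supplied is a configuration error.
    if (tls_cert is None) != (tls_key is None):
        raise ValueError("--tls-cert and --tls-key must be supplied together")
    # Assumption: plain connections are advertised as grpc+tcp.
    scheme = "grpc+tls" if tls_cert is not None else "grpc+tcp"
    return f"{scheme}://{host}:{port}"
```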

## Catalog Layout

SLayer exposes a single Flight catalog named **`slayer`** with one **schema per
datasource** and one **table per non-hidden `SlayerModel`** in that datasource. Each
table carries two fan-outs:

* **Metrics** — derived from each model's `columns` × eligible aggregations, plus saved
`ModelMeasure` formulas, plus custom aggregations on the model, plus a synthetic
`row_count` metric (`*:count`).
* **Dimensions** — every non-hidden column of the model, plus reachable join targets
walked up to depth 3.

Cross-model dimensions use **dotted** path syntax — `customers.regions.name` is a
multi-hop dimension on `orders` when `orders → customers → regions`. The same dotted
form is used in `INFORMATION_SCHEMA.*`, in the BI-tool projection list, in `WHERE`, and
in the SLayer DSL.
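
To make the dotted form concrete, here is a small sketch that walks a join graph and emits dimension names. It assumes that "depth" counts join hops and it ignores hidden columns and models; the function and the dictionary shapes are illustrative, not SLayer's catalog builder.

```python
def dotted_dimensions(model, columns, joins, max_depth=3):
    """Return the model's own columns plus dotted paths through joins.

    columns: model name -> list of column names (hypothetical input shape)
    joins:   model name -> list of directly joined model names
    """
    result = list(columns[model])
    frontier = [(model, ())]
    for _ in range(max_depth):
        next_frontier = []
        for current, path in frontier:
            for target in joins.get(current, []):
                new_path = path + (target,)
                # Each column of the join target is exposed under its dotted path.
                result.extend(".".join(new_path + (col,)) for col in columns[target])
                next_frontier.append((target, new_path))
        frontier = next_frontier
    return result


if __name__ == "__main__":
    columns = {"orders": ["status"], "customers": ["name"], "regions": ["name"]}
    joins = {"orders": ["customers"], "customers": ["regions"]}
    print(dotted_dimensions("orders", columns, joins))
```

The depth bound also keeps the walk finite when the join graph contains cycles, although this sketch does nothing to deduplicate the repeated paths a cycle would produce.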

`*:count` is exposed as a column literally named `row_count`. If a user-defined column
is also named `row_count`, SLayer renames the synthetic to `_row_count` and logs a
warning at catalog-build time.
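
The collision rule can be sketched as follows. The function and logger names are hypothetical; only the `row_count` / `_row_count` naming comes from the description above.

```python
import logging

logger = logging.getLogger("slayer.flight.catalog")  # hypothetical logger name


def synthetic_count_name(user_columns):
    """Pick the exposed name for the synthetic *:count metric (illustrative)."""
    if "row_count" in user_columns:
        logger.warning("column 'row_count' exists; exposing *:count as '_row_count'")
        return "_row_count"
    return "row_count"
```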

## SQL Subset

SLayer accepts a single-`FROM` `SELECT` that translates to a `SlayerQuery`:

| Feature | Notes |
|---|---|
| `SELECT <metric> [, ...]` | Each item must be a metric, dimension, or time-grain expression on the resolved table. |
| `month(<col>)`, `quarter(...)`, etc. | Time-grain wrappers on time-typed columns. Equivalent to `date_trunc('month', <col>)`. |
| `WHERE <col> BETWEEN '...' AND '...'` | On time-typed columns, lifts to `time_dimensions[*].date_range`. |
| `WHERE <col> >= '...'` / `<=` / `>` / `<` | Same lift for time bounds. |
| `WHERE ...` (everything else) | Passed verbatim into `SlayerQuery.filters`. |
| `GROUP BY` | Strict on extras, lenient on omissions. User items must be in the derived dimension set; missing ones are silently filled in from the projection. |
| `ORDER BY <col> [DESC \| ASC]` | Resolved against projected names. |
| `LIMIT N OFFSET M` | Integer literals only. |

**`SELECT *` is rejected** on Flight tables; the error includes a pointer to
`SELECT * FROM INFORMATION_SCHEMA.METRICS WHERE table_name=...` for discovery. `SELECT *`
**is** accepted on `INFORMATION_SCHEMA.*` itself.
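
The `GROUP BY` rule in the table ("strict on extras, lenient on omissions") is easy to misread, so here is a minimal sketch of one way to express it. `reconcile_group_by` is a hypothetical name, and the error type is an assumption; the real translator reports failures as Flight `INVALID_ARGUMENT`.

```python
def reconcile_group_by(user_items, derived_dimensions):
    """Apply the GROUP BY rule: reject extras, fill in omissions (illustrative)."""
    extras = [item for item in user_items if item not in derived_dimensions]
    if extras:
        # Strict on extras: grouping by something outside the projection is an error.
        raise ValueError(f"GROUP BY items not in the projected dimensions: {extras}")
    # Lenient on omissions: the effective grouping is always the full derived set.
    return list(derived_dimensions)
```

For example, a query that projects `status` and `month(ordered_at)` but writes only `GROUP BY status` would still be grouped by both dimensions under this rule.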

### Probe-query whitelist

Four whitelisted probe queries (one with an alternate spelling) return canned results.
Interactive clients issue them to test the connection:

* `SELECT 1`
* `SELECT NULL WHERE 1=0`
* `SELECT version()` (also `SELECT @@version`)
* `SELECT current_database()`
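
A sketch of how such a whitelist might be matched is shown below. The normalization (trimming, dropping a trailing semicolon, collapsing whitespace, lowercasing) is an assumption of this example, and the canned results themselves are deliberately left out because they are not specified here.

```python
import re

# The whitelisted probes from the list above, in normalized form.
PROBES = {
    "select 1",
    "select null where 1=0",
    "select version()",
    "select @@version",
    "select current_database()",
}


def normalize(sql):
    """Trim, drop a trailing semicolon, collapse whitespace, lowercase (assumed rules)."""
    trimmed = sql.strip().rstrip(";").strip()
    return re.sub(r"\s+", " ", trimmed).lower()


def is_probe(sql):
    """True when the statement matches one of the whitelisted probes."""
    return normalize(sql) in PROBES
```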

### Bare-name table resolution

`SELECT ... FROM orders` searches every schema:

* Exactly one match → use it.
* Multiple matches → error naming each `<schema>.<table>` candidate.
* Zero matches → `Unknown table`.

Or qualify explicitly as `<schema>.<table>` or `slayer.<schema>.<table>`.
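
The three outcomes can be sketched as a small lookup. The catalog shape (schema name mapped to a set of table names), the function name, and the exception type are assumptions made for illustration.

```python
def resolve_table(name, catalog):
    """Resolve a bare table name across schemas (illustrative sketch).

    catalog: schema name -> set of table names (hypothetical shape)
    Returns (schema, table) for a unique match; raises LookupError otherwise.
    """
    matches = sorted(schema for schema, tables in catalog.items() if name in tables)
    if len(matches) == 1:
        return matches[0], name
    if not matches:
        raise LookupError(f"Unknown table: {name}")
    # Ambiguous: name every <schema>.<table> candidate in the error.
    candidates = ", ".join(f"{schema}.{name}" for schema in matches)
    raise LookupError(f"Ambiguous table {name!r}; candidates: {candidates}")
```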

## INFORMATION_SCHEMA

The catalog exposes the following well-known introspection tables:

* `INFORMATION_SCHEMA.METRICS` — every metric in the catalog, keyed by table.
* `INFORMATION_SCHEMA.DIMENSIONS` — every dimension (including joined paths).
* `INFORMATION_SCHEMA.TABLES`, `COLUMNS`, `SCHEMATA` — JDBC-shaped equivalents of the
per-command Flight SQL RPCs.

## Prepared Statements

The Apache JDBC driver routes **every** `Statement.executeQuery` through the
prepared-statement triplet (`CreatePreparedStatement` → `GetFlightInfo` →
`do_get(<prepared-statement ticket>)`), not via `CommandStatementQuery`. SLayer's
implementation is stateless: the `prepared_statement_handle` is **the original
UTF-8 SQL bytes**, so `Close` is a no-op (nothing to free).

This means three translator runs per BI query (create-prepared + flight-info + do_get).
The database round-trip count is two: a `LIMIT 0` for schema validation on the
create-prepared step, then the full execution on `do_get`.
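
The stateless-handle idea is simple enough to show directly. The function names below are hypothetical; the only detail taken from the text is that the handle is the original SQL encoded as UTF-8, which is why closing has nothing to release.

```python
def create_prepared_statement(sql):
    """Return the handle for a statement: just the SQL as UTF-8 bytes."""
    return sql.encode("utf-8")


def sql_from_handle(handle):
    """Recover the SQL on later RPCs; no server-side lookup table is needed."""
    return handle.decode("utf-8")


def close_prepared_statement(handle):
    """No-op: the server holds no per-statement state to free."""
    return None
```

One consequence of this design is that any server instance can serve any step of the triplet, at the cost of re-translating the SQL on each step.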

## DML / DDL behaviour

Any `INSERT` / `UPDATE` / `DELETE` / `MERGE` / `TRUNCATE` / `CREATE` / `ALTER` / `DROP`
raises a Flight `INVALID_ARGUMENT` whose message contains `SLayer Flight SQL endpoint
is read-only`. `BEGIN` / `COMMIT` / `ROLLBACK` / `START TRANSACTION` / `SET ...` /
`SHOW ...` / `USE ...` / `RESET ...` succeed as no-ops (empty result, no side effects).
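
A rough sketch of this three-way split, keyed on the leading keyword, is shown below. Classifying by the first word is an assumption of the example (the actual translator may parse the statement properly), and the return labels are made up for illustration.

```python
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "MERGE", "TRUNCATE", "CREATE", "ALTER", "DROP"}
NOOP_KEYWORDS = {"BEGIN", "COMMIT", "ROLLBACK", "SET", "SHOW", "USE", "RESET"}


def classify(sql):
    """Return 'read_only_error', 'noop', or 'query' for a statement (illustrative)."""
    words = sql.strip().rstrip(";").upper().split()
    if not words:
        return "query"
    if words[0] in WRITE_KEYWORDS:
        # Would surface as INVALID_ARGUMENT with the read-only message.
        return "read_only_error"
    if words[0] in NOOP_KEYWORDS or words[:2] == ["START", "TRANSACTION"]:
        # Succeeds with an empty result and no side effects.
        return "noop"
    return "query"
```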

## Error Taxonomy

Translator errors → Flight `INVALID_ARGUMENT`. Auth failures → `UNAUTHENTICATED`.
Unhandled commands → `INVALID_ARGUMENT`. Engine errors propagate as the underlying
gRPC status.

## Wire-Format Schema (Phase 1)

The wire schema for a `SELECT ... FROM <flight-table>` is derived from the
**catalog-declared** `DataType` of each projected item (`Column.type` for dimensions,
`ModelMeasure.type` for measures). A `LIMIT 0` is still executed for engine-side query
validation, but `SlayerResponse.attributes` does not yet expose per-column Arrow types
so the catalog-declared types are the wire-schema source. Phase 2 will tighten this to
a real `LIMIT 0`-derived schema.

If a `ModelMeasure` has an incorrect or absent declared `type`, the wire-schema /
data-row type mismatch surfaces as `ArrowTypeError`. Set `ModelMeasure.type` on custom
formulas that surface over Flight SQL.

## Unobserved Commands

The Apache JDBC driver did not exercise these commands during the Phase 1.0 wire
capture; SLayer implements them with well-typed empty (or canned) responses for
compatibility:

* `CommandStatementQuery` `[unobserved]` (driver uses prepared statements instead)
* `CommandGetSqlInfo` `[unobserved]` (catalog introspection goes through other RPCs)
* `CommandGetXdbcTypeInfo` `[unobserved]` — stub returns 6 entries
* `CommandPreparedStatementQuery` round-trips were partially captured against the
Phase 1.0 capture-stub; the production handlers fill in the rest
* `ActionClosePreparedStatementRequest` is a no-op (stateless handle = SQL bytes)