DEV-1390: Flight SQL facade for SLayer (dbt-SL JDBC compatible)#129
DEV-1390: Flight SQL facade for SLayer (dbt-SL JDBC compatible)#129ZmeiGorynych wants to merge 1 commit into
Conversation
…tegration tests, docs Phase 1 of the Arrow Flight SQL endpoint that is wire-compatible with the upstream Apache flight-sql-jdbc-driver (the same JAR the dbt Semantic Layer connectors use). Adds the slayer flight-serve CLI on port 5144, a full translator + handler chain backed by SLayer's existing engine, JayDeBeAPI + pyarrow-flight integration test suites (live wire-validated), and a recording capture-corpus refresh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdded a comprehensive Arrow Flight SQL endpoint implementation (DEV-1390 Phase 1) enabling JDBC and BI-tool connectivity to the Semantic Layer. The implementation includes the Flight SQL protocol schema, bearer-token authentication, catalog/metadata generation from semantic models, SQL-to-query translation, RPC handlers, server setup, and full end-to-end integration tests with both JDBC and gRPC clients. ChangesFlight SQL Server Implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes The changes are substantial and intricate: they span protocol definition, async-to-sync bridging, catalog generation with complex join semantics, SQL parsing and translation with strict GROUP BY/ORDER BY validation, request dispatch with multiple message types, authentication middleware, and dual-path integration testing (JDBC and gRPC). The density of semantic logic and multi-layer interactions across handlers, translation, and catalog generation requires careful reasoning about correctness and round-tripping behavior, though the patterns are consistent and well-structured. Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
|
|
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
docs/getting-started/flight-sql.md (1)
47-51: 💤 Low valueConsider adding language specifiers to fenced code blocks.
The markdown linter flags these configuration blocks as missing language specifiers. While they're plaintext configuration examples (not strict code), you could add
```textto satisfy the linter and maintain consistency with markdown best practices.📝 Optional: Add text language specifier
For example, line 47-51:
-``` +```text Host: <slayer-host> Port: 5144Apply similarly to the other configuration blocks at lines 57, 68, 78, 87, and 97.
Also applies to: 57-60, 68-72, 78-82, 87-91, 97-100
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/getting-started/flight-sql.md` around lines 47 - 51, Update the fenced configuration blocks by adding a language specifier (use text) to each opening triple-backtick so the markdown linter stops flagging them; specifically update the blocks containing the example lines starting with "Host: <slayer-host>" and the other similar blocks that include "Port: 5144", "Service token: <slayer --token value>", and the subsequent configuration examples (the blocks currently at the locations shown in the review) to begin with ```text instead of just ```, leaving the block contents unchanged.tests/integration/conftest.py (1)
21-125: 💤 Low valueConsider extracting shared fixtures to a common module.
The fixtures
jdbc_jar,_format_flight_jdbc_url,_ensure_jvm_started_for_arrow, andjaydebeapi_connectare duplicated betweentests/flight/conftest.pyand this file. While the duplication is documented as intentional (line 38-40), consider extracting these to a shared module liketests/flight/fixtures_common.pythat both conftest files can import from. This would make maintenance easier and ensure consistency.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/conftest.py` around lines 21 - 125, The fixtures jdbc_jar, _format_flight_jdbc_url, _ensure_jvm_started_for_arrow, and jaydebeapi_connect are duplicated; extract them into a shared module (e.g., tests/flight/fixtures_common.py) and import them from both tests/flight/conftest.py and tests/integration/conftest.py; specifically move the implementations of jdbc_jar, _format_flight_jdbc_url, _ensure_jvm_started_for_arrow, and jaydebeapi_connect into the shared file, export them as fixtures (or plain helper functions where appropriate), then replace the local definitions in both conftest files with imports that expose the same fixture names so tests continue to reference jdbc_jar, flight_jdbc_url (aliasing _format_flight_jdbc_url if needed), and jaydebeapi_connect unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@slayer/flight/auth.py`:
- Around line 150-153: The no-auth branch currently returns
_BearerTokenMiddleware(...) unconditionally when self._expected is None; update
this to enforce the documented loopback-only rule by checking
_peer_is_loopback(info.peer) before accepting unauthenticated requests.
Concretely, in the method that constructs/returns the middleware (the branch
that tests self._expected is None), call _peer_is_loopback(info.peer) and only
return _BearerTokenMiddleware(environment_id=environment_id) if that check
passes; otherwise reject/raise or return a middleware that denies non-loopback
peers (consistent with existing error handling). Ensure you reference the same
info.peer, _peer_is_loopback, self._expected, and _BearerTokenMiddleware symbols
so the change matches surrounding logic.
In `@slayer/flight/info_schema.py`:
- Around line 70-72: The function match_info_schema is a public API with
multiple parameters but currently allows positional args; change its signature
to enforce keyword-only parameters by adding a leading * (e.g., def
match_info_schema(*, parsed: exp.Expression, catalog: FlightCatalog) ->
Optional[pa.Table]) and then update all call sites (notably in
slayer/flight/translator.py) to pass parsed=... and catalog=... accordingly so
callers use keyword arguments.
In `@tests/flight/capture_dbt_jdbc.py`:
- Around line 157-160: The test uses the wrong cross-model dimension syntax:
update the SQL passed to run_select inside the _try call (the lambda in the
"_try(\"cross-model dim SELECT\", lambda: run_select(...))" invocation) to use
dotted form for cross-model fields (customers.regions.name) instead of the
double-underscore form (customers__regions__name); keep the rest of the query
(revenue_sum) unchanged so the test aligns with the Flight SQL dotted
convention.
In `@tests/flight/test_translator.py`:
- Around line 71-72: Change positional calls to translate(...) to use explicit
keyword arguments: pass the SQL string as sql= and the catalog as catalog=
(e.g., replace translate("SELECT 1", _catalog()) with translate(sql="SELECT 1",
catalog=_catalog())). Update all occurrences in this test (the calls around the
shown diff and the additional calls noted at lines 77-79 and 95-96) to follow
the same pattern so every translate invocation with more than one parameter uses
named keywords.
In `@tests/integration/test_integration_flight_pyarrow_client.py`:
- Around line 44-47: The _client function currently accepts positional host and
port and an unused token; change its signature to require keyword-only
parameters (e.g., def _client(*, host: str, port: int) -> fl.FlightClient) and
remove the unused token parameter so auth is clearly handled per-RPC via
_bearer_options; update any callers to pass host=... and port=... (and similarly
make the related helper at the other occurrence keyword-only) so multi-argument
calls follow the repo rule and the function no longer implies implicit auth.
---
Nitpick comments:
In `@docs/getting-started/flight-sql.md`:
- Around line 47-51: Update the fenced configuration blocks by adding a language
specifier (use text) to each opening triple-backtick so the markdown linter
stops flagging them; specifically update the blocks containing the example lines
starting with "Host: <slayer-host>" and the other similar blocks that include
"Port: 5144", "Service token: <slayer --token value>", and the subsequent
configuration examples (the blocks currently at the locations shown in the
review) to begin with ```text instead of just ```, leaving the block contents
unchanged.
In `@tests/integration/conftest.py`:
- Around line 21-125: The fixtures jdbc_jar, _format_flight_jdbc_url,
_ensure_jvm_started_for_arrow, and jaydebeapi_connect are duplicated; extract
them into a shared module (e.g., tests/flight/fixtures_common.py) and import
them from both tests/flight/conftest.py and tests/integration/conftest.py;
specifically move the implementations of jdbc_jar, _format_flight_jdbc_url,
_ensure_jvm_started_for_arrow, and jaydebeapi_connect into the shared file,
export them as fixtures (or plain helper functions where appropriate), then
replace the local definitions in both conftest files with imports that expose
the same fixture names so tests continue to reference jdbc_jar, flight_jdbc_url
(aliasing _format_flight_jdbc_url if needed), and jaydebeapi_connect unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7f7f13fa-151b-4be3-9da9-0f0101568072
⛔ Files ignored due to path filters (1)
poetry.lockis excluded by!**/*.lock
📒 Files selected for processing (36)
.gitignoreCLAUDE.mdREADME.mddocs/getting-started/flight-sql.mddocs/interfaces/flight-sql.mdpyproject.tomlslayer/cli.pyslayer/flight/FlightSql.protoslayer/flight/__init__.pyslayer/flight/_capture_stub.pyslayer/flight/_flight_sql_pb2.pyslayer/flight/auth.pyslayer/flight/catalog.pyslayer/flight/cli.pyslayer/flight/handlers.pyslayer/flight/info_schema.pyslayer/flight/probe_queries.pyslayer/flight/server.pyslayer/flight/translator.pyslayer/flight/types.pyspecs/DEV-1390-RESUME.mdtests/flight/__init__.pytests/flight/capture_dbt_jdbc.pytests/flight/conftest.pytests/flight/fixtures/CAPTURE-FINDINGS.mdtests/flight/fixtures/capture-latest.jsonltests/flight/test_auth.pytests/flight/test_catalog.pytests/flight/test_handlers.pytests/flight/test_info_schema.pytests/flight/test_probe_queries.pytests/flight/test_translator.pytests/flight/test_types.pytests/integration/conftest.pytests/integration/test_integration_flight.pytests/integration/test_integration_flight_pyarrow_client.py
👮 Files not reviewed due to content moderation or server errors (8)
- tests/integration/test_integration_flight.py
- tests/flight/test_catalog.py
- slayer/flight/_flight_sql_pb2.py
- slayer/flight/_capture_stub.py
- slayer/flight/server.py
- tests/flight/test_handlers.py
- slayer/flight/translator.py
- slayer/flight/FlightSql.proto
| if self._expected is None: | ||
| # No-auth mode: loopback fallback. Server startup already rejects | ||
| # non-loopback without a token, but recheck here. | ||
| return _BearerTokenMiddleware(environment_id=environment_id) |
There was a problem hiding this comment.
Missing peer validation in no-auth mode.
The docstring (lines 114-118) states that when no token is configured, "requests from loopback peers are accepted unauthenticated; non-loopback peers are rejected" as a "belt-and-braces" defense. However, the code unconditionally returns the middleware without checking _peer_is_loopback(info.peer).
While validate_bind_address provides startup protection, the middleware should enforce the peer check to match the documented behavior and provide defense-in-depth.
🔒 Proposed fix to add peer validation
if self._expected is None:
# No-auth mode: loopback fallback. Server startup already rejects
# non-loopback without a token, but recheck here.
+ if not _peer_is_loopback(info.peer):
+ raise fl.FlightUnauthenticatedError(
+ "No token configured; only loopback peers accepted"
+ )
return _BearerTokenMiddleware(environment_id=environment_id)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@slayer/flight/auth.py` around lines 150 - 153, The no-auth branch currently
returns _BearerTokenMiddleware(...) unconditionally when self._expected is None;
update this to enforce the documented loopback-only rule by checking
_peer_is_loopback(info.peer) before accepting unauthenticated requests.
Concretely, in the method that constructs/returns the middleware (the branch
that tests self._expected is None), call _peer_is_loopback(info.peer) and only
return _BearerTokenMiddleware(environment_id=environment_id) if that check
passes; otherwise reject/raise or return a middleware that denies non-loopback
peers (consistent with existing error handling). Ensure you reference the same
info.peer, _peer_is_loopback, self._expected, and _BearerTokenMiddleware symbols
so the change matches surrounding logic.
| def match_info_schema( | ||
| parsed: exp.Expression, catalog: FlightCatalog, | ||
| ) -> Optional[pa.Table]: |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Use keyword-only parameters for public API.
Per coding guidelines, functions with more than 1 parameter should use keyword arguments. Since match_info_schema is exported in __all__ and called from other modules, it should enforce keyword-only parameters for clarity.
♻️ Proposed fix
def match_info_schema(
- parsed: exp.Expression, catalog: FlightCatalog,
+ *, parsed: exp.Expression, catalog: FlightCatalog,
) -> Optional[pa.Table]:And update call sites in slayer/flight/translator.py to use keyword arguments:
match_info_schema(parsed=..., catalog=...)As per coding guidelines: "Use keyword arguments for functions with more than 1 parameter"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@slayer/flight/info_schema.py` around lines 70 - 72, The function
match_info_schema is a public API with multiple parameters but currently allows
positional args; change its signature to enforce keyword-only parameters by
adding a leading * (e.g., def match_info_schema(*, parsed: exp.Expression,
catalog: FlightCatalog) -> Optional[pa.Table]) and then update all call sites
(notably in slayer/flight/translator.py) to pass parsed=... and catalog=...
accordingly so callers use keyword arguments.
| _try("cross-model dim SELECT", | ||
| lambda: run_select( | ||
| "SELECT customers__regions__name, revenue_sum FROM orders" | ||
| )) |
There was a problem hiding this comment.
Use dotted form for cross-model dimensions.
Line 159 uses customers__regions__name but the locked design decision (#12 in DEV-1390-RESUME.md) and coding guidelines specify that Flight SQL uses dotted form end-to-end: customers.regions.name. Update the query to align with the implementation.
📝 Proposed fix
_try("cross-model dim SELECT",
lambda: run_select(
- "SELECT customers__regions__name, revenue_sum FROM orders"
+ "SELECT customers.regions.name, revenue_sum FROM orders"
))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| _try("cross-model dim SELECT", | |
| lambda: run_select( | |
| "SELECT customers__regions__name, revenue_sum FROM orders" | |
| )) | |
| _try("cross-model dim SELECT", | |
| lambda: run_select( | |
| "SELECT customers.regions.name, revenue_sum FROM orders" | |
| )) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/flight/capture_dbt_jdbc.py` around lines 157 - 160, The test uses the
wrong cross-model dimension syntax: update the SQL passed to run_select inside
the _try call (the lambda in the "_try(\"cross-model dim SELECT\", lambda:
run_select(...))" invocation) to use dotted form for cross-model fields
(customers.regions.name) instead of the double-underscore form
(customers__regions__name); keep the rest of the query (revenue_sum) unchanged
so the test aligns with the Flight SQL dotted convention.
| result = translate("SELECT 1", _catalog()) | ||
| assert isinstance(result, ProbeResult) |
There was a problem hiding this comment.
Use keyword arguments for translate(...) calls with multiple parameters.
This file consistently calls translate positionally with 2 arguments; please switch these to explicit keywords throughout for guideline compliance and readability.
As per coding guidelines **/*.py: Use keyword arguments for functions with more than 1 parameter.
Also applies to: 77-79, 95-96
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/flight/test_translator.py` around lines 71 - 72, Change positional
calls to translate(...) to use explicit keyword arguments: pass the SQL string
as sql= and the catalog as catalog= (e.g., replace translate("SELECT 1",
_catalog()) with translate(sql="SELECT 1", catalog=_catalog())). Update all
occurrences in this test (the calls around the shown diff and the additional
calls noted at lines 77-79 and 95-96) to follow the same pattern so every
translate invocation with more than one parameter uses named keywords.
| def _client(host: str, port: int, *, token: str | None = None) -> fl.FlightClient: | ||
| """Construct a pyarrow Flight client, optionally with a bearer token header.""" | ||
| return fl.FlightClient(f"grpc://{host}:{port}") | ||
|
|
There was a problem hiding this comment.
Make _client explicit and keyword-only for multi-arg calls.
token in _client is unused and makes auth behavior look implicit, while auth is actually per-RPC via _bearer_options. Also, multi-parameter calls should be keyword-based per repo rules.
♻️ Proposed fix
-def _client(host: str, port: int, *, token: str | None = None) -> fl.FlightClient:
+def _client(*, host: str, port: int) -> fl.FlightClient:
"""Construct a pyarrow Flight client, optionally with a bearer token header."""
return fl.FlightClient(f"grpc://{host}:{port}")
- client = _client(host, port)
+ client = _client(host=host, port=port)As per coding guidelines **/*.py: Use keyword arguments for functions with more than 1 parameter.
Also applies to: 64-67
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/integration/test_integration_flight_pyarrow_client.py` around lines 44
- 47, The _client function currently accepts positional host and port and an
unused token; change its signature to require keyword-only parameters (e.g., def
_client(*, host: str, port: int) -> fl.FlightClient) and remove the unused token
parameter so auth is clearly handled per-RPC via _bearer_options; update any
callers to pass host=... and port=... (and similarly make the related helper at
the other occurrence keyword-only) so multi-argument calls follow the repo rule
and the function no longer implies implicit auth.



Summary
flight-sql-jdbc-driverv18.3.0 — the same JAR the dbt Semantic Layer connectors (Power BI, Sigma, Looker, Tableau, Hex, DBeaver) use. New CLI:slayer flight-serve.SlayerQuery), handlers, server, bearer-token auth, TLS — lives underslayer/flight/. The server is stateless: prepared-statement handles ARE the UTF-8 SQL bytes, soCloseis a no-op.tests/integration/test_integration_flight.py(17 JayDeBeAPI tests, 1 xfail for the known JDBC-handshake gap) andtest_integration_flight_pyarrow_client.py(21 pyarrow-flight tests, no JDK required). Demo-server fixture shared viatests/integration/conftest.py.docs/interfaces/flight-sql.md(protocol reference) +docs/getting-started/flight-sql.md(per-tool connect recipes) + Flight SQL section inCLAUDE.md+ README mention.Notable design decisions
Any-wrapping: the Apache JDBC driver wraps everydo_actionbody ingoogle.protobuf.Anyand requires the response to be Any-wrapped too; pyarrow-flight's Python client sends raw bytes.server._parse_action_bodyaccepts both shapes;handlers.handle_create_prepared_statementalways sends anAny-wrapped result. Hard-won lesson from the wire-capture trace.customers.regions.nameinINFORMATION_SCHEMA.*, the BI-tool projection,WHERE, and the SLayer DSL — no__→.rewrite step in the translator.Column.type/ModelMeasure.type. ALIMIT 0still runs for engine-side validation; a Phase 2 follow-up will tighten this to LIMIT-0-derived Arrow types.token=auth is Phase 2: the Apache driver pre-handshakes the bearer token; SLayer's middleware does per-RPC header validation.test_auth_positiveisxfail(strict=True)so a future handshake implementation auto-promotes to PASSED. The pyarrow Python client works today (per-callAuthorizationheaders).--add-openson Java 17+: the test fixture pre-starts JPype with--add-opens=java.base/java.nio=ALL-UNNAMED(plusjava.langandjava.util) for Arrow's reflective memory access. Documented in CLAUDE.md and in the DBeaver section of the getting-started guide.Test plan
poetry run ruff check slayer/ tests/— cleanpoetry run pytest tests/flight/— 173 unit tests passpoetry run pytest -m "not integration"— full unit suite (2395 passed)poetry run pytest tests/integration/test_integration_flight.py -m integration— 17 passed + 1 xfailed (JDBC handshake)poetry run pytest tests/integration/test_integration_flight_pyarrow_client.py -m integration— 21 passedslayer flight-serve --demo+ the prepared-statement recipe inspecs/DEV-1390-RESUME.md§5🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
slayer flight-serveCLI subcommand to run the Flight SQL server with bearer-token authentication.Documentation
Chores
pyarrowdependency and dev dependencies for Flight SQL integration testing.