Skip to content

polarityio/databricks

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Databricks SQL — Polarity Integration

Query Databricks SQL Warehouses directly from the Polarity overlay. Executes an admin-configured SQL statement with {{entity}} substitution for each searched indicator and surfaces results as structured key-value rows with Fields, Table, and JSON tab views.


Features

  • Universal entity support — IPv4, IPv6, MD5, SHA1, SHA256, email, domain, CVE, URL, MAC, and admin-defined custom types
  • Three tab views per result row — Fields (humanized key/value), Table (horizontal scroll), JSON (syntax-highlighted)
  • Summary bar tags — configurable column values surfaced in the collapsed Polarity bar, with optional field-name prefix
  • Bottleneck rate limiter — configurable maxConcurrent and minTime to stay within Databricks per-warehouse limits
  • Timeout safety — server-side wait_timeout on every statement; timed-out queries show a user-visible warning instead of a hard error
  • Polarity-assistant AI reducerreducers/details.json pipeline prunes and forwards row data to the AI assistant

Requirements

Requirement Notes
Polarity server v5.x or later
Node.js ≥ 18.0.0 (OpenSSL 3.x)
Databricks AWS, Azure, or GCP workspace
SQL Warehouse Classic or Serverless warehouse — must be running
PAT token User-level or service-principal Personal Access Token

Installation

Polarity server (recommended)

Upload the .tgz package via the Polarity admin UI under Integrations → Upload Integration.

Local development

cd ~/Documents/Polarity/databricks
npm install

Configuration Options

All options are admin-only unless noted.

Option Type Default Notes
Databricks Workspace Host text Full workspace URL, e.g. https://adb-1234567890123456.7.azuredatabricks.net. No trailing slash.
Personal Access Token (PAT) password Created in Databricks UI → User Settings → Developer → Access tokens.
SQL Warehouse ID text Found in SQL Warehouse settings → Connection Details.
SQL Query text Must contain {{entity}}. Example: SELECT * FROM logs.events WHERE ip = '{{entity}}' LIMIT 25
Query Wait Timeout number 50 Seconds (5–50). Queries exceeding this show a timeout warning.
Summary Fields text Comma-separated column names to surface as overlay summary tags, e.g. user_name,severity. Leave blank to show a Results: N count badge.
Include Field Name in Summary boolean true When enabled, tags display as field: value; when disabled, only the value is shown. (Users can view, admin sets)
Maximum Summary Tags number 4 Max tags before showing +N more. Set to 0 to show all.
Max Concurrent Queries number 5 Requires integration restart after change.
Min Time Between Queries (ms) number 200 Rate-limit throttle. Requires integration restart after change.

SQL Query Tips

Entity placeholder

Use {{entity}} (case-insensitive) as the parameter placeholder. The integration will replace it with the SQL-escaped indicator value before execution.

-- Good: single entity lookup with LIMIT
SELECT event_time, src_ip, user, action
FROM   catalog.schema.access_logs
WHERE  src_ip = '{{entity}}'
ORDER BY event_time DESC
LIMIT  50

Multiple columns in summary

Configure Summary Fields to the column names you want surfaced in the overlay bar:

user_name,action,severity

Parameterized queries vs string interpolation

The Databricks Statement Execution API used by this integration does not support parameterized bindings for all literal types in all SQL dialects. The {{entity}} value is SQL-escaped (single quotes are doubled) before interpolation. For maximum safety, wrap entity values in quotes in your SQL template:

WHERE ip_address = '{{entity}}'

Response Handling

API State Integration behaviour
SUCCEEDED Rows surfaced in overlay with Fields / Table / JSON tabs
CANCELED Timeout warning shown; no hard error
FAILED Logged at WARN level; overlay suppressed (no result)
PENDING / RUNNING Logged at WARN; overlay suppressed
HTTP 429 Logged + error returned; Bottleneck limiter protects against bursts
HTTP 401/403 Auth error surfaced in validateOptions

Packaging for Offline / Air-gapped Install

Run the included package:local script to produce a .tgz free of symlinks (avoids the Polarity extractor unsafe_symlink error):

cd ~/Documents/Polarity/databricks
npm run package:local
# Output: ../databricks-1.0.0.tgz

Verify no symlinks remain:

tar -tvzf ../databricks-1.0.0.tgz | grep "^l" && echo "❌ HAS SYMLINKS" || echo "✅ clean"

Upgrading

Change Version bump
Bug fix / CSS / data mapping 1.0.x patch
New entity type / new API endpoint 1.x.0 minor
Breaking config or API change x.0.0 major

Update the version in package.json in the same commit as the fix.


Troubleshooting

Overlay shows "No results found" but I expect data

  • Confirm the SQL query contains {{entity}} and returns rows when run manually in the Databricks SQL editor
  • Check that the warehouse is running (IDLE/RUNNING state in Databricks UI)
  • Review integration logs in Polarity for FAILED state messages

Overlay shows "Query timed out"

  • Increase the Query Wait Timeout option (up to 50 s)
  • Or optimize the query with appropriate indexes/partitions

validateOptions shows "Authentication failed (HTTP 401)"

  • The PAT may have expired; generate a new one in Databricks → User Settings → Access tokens

Overlay is blank / component fails to render

  • If upgrading from an older version, restart the Polarity platform to clear Ember's component cache

Linear Ticket

INT-1750 — New Integration: Databricks SQL — Admin-configurable SIEM-style query engine

Branch: danbi/int-1750-new-integration-databricks-sql-admin-configurable-siem-style


License

UNLICENSED — Polarity internal integration. Do not distribute.

About

Polarity integration for Databricks SQL

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors