PredQL (Predictive Query Language) is a Python framework for writing compact, expressive predictive queries over relational data, especially for Relational Deep Learning.
It keeps queries short by abstracting away temporal joins and complex aggregations.

- ANTLR-based parser: lexer and parser for PredQL syntax.
- Structured parse-tree visitor: converts parsed queries into normalized dictionaries with source positions.
- Semantic validation: schema-aware query validation with error reporting.
- Two converters:
  - `SConverter` for static prediction queries.
  - `TConverter` for temporal prediction queries with timestamp windows.
- Dual output mode: `execute=False` returns the generated SQL; `execute=True` executes the SQL and returns a `Table` object.

Install PredQL via pip:
pip install predql

1. Build your database as a RelBench `Database` object, or use PredQL's simplified `Database` and `Table` classes:
# path to classes
from predql.base import Database, Table

Static queries use `SConverter`:

from predql.converter import SConverter
# db is the Database object built in step 1
converter = SConverter(db)
predql_query = """
PREDICT COUNT_DISTINCT(votes.*
WHERE votes.votetypeid == 2)
FOR EACH posts.* WHERE posts.PostTypeId == 1
AND posts.OwnerUserId IS NOT NULL
AND posts.OwnerUserId != -1;
"""
# SQL only
sql_query = converter.convert(predql_query, execute=False)
# execute and get Table(fk, label)
table = converter.convert(predql_query, execute=True)

Temporal queries use `TConverter`, which also takes the timestamps at which predictions are made:

import pandas as pd
from predql.converter import TConverter
timestamps = pd.Series(...) # define timestamps for which prediction must be made
converter = TConverter(db, timestamps)
# also, it is possible to update prediction timestamps later without recreating converter
converter.set_timestamps(new_timestamps)
predql_query = """
PREDICT COUNT_DISTINCT(votes.*
WHERE votes.votetypeid == 2, 0, 91, DAYS)
FOR EACH posts.* WHERE posts.PostTypeId == 1
AND posts.OwnerUserId IS NOT NULL
AND posts.OwnerUserId != -1;
"""
# SQL only
sql_query = converter.convert(predql_query, execute=False)
# execute and get Table(fk, timestamp, label)
table = converter.convert(predql_query, execute=True)

Static query syntax:

PREDICT <aggregation | expression | table.column> [RANK TOP K | CLASSIFY]
FOR EACH <entity_table>.<primary_key>
[WHERE <static_condition | static_nested_expression>];

Temporal query syntax:

PREDICT <aggregation | temporal_expression> [RANK TOP K | CLASSIFY]
FOR EACH <entity_table>.<primary_key> [WHERE <static_condition | static_nested_expression>]
[ASSUMING <temporal_condition | temporal_nested_expression>]
[WHERE <temporal_condition | temporal_nested_expression>];

| Function | Meaning | Condition-Compatible |
|---|---|---|
| `AVG` | average | |
| `MAX` | maximum | |
| `MIN` | minimum | |
| `SUM` | sum | |
| `COUNT` | non-null count | |
| `COUNT_DISTINCT` | distinct count | |
| `FIRST` | earliest value by time | |
| `LAST` | latest value by time | |
| `LIST_DISTINCT` | list of distinct values | |

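To make the meanings in the table concrete, here is a minimal pure-Python sketch of what each aggregation computes over a few timestamped values. It only illustrates the semantics listed in the table and is not PredQL's implementation: the sample rows and the `aggregate` helper are hypothetical, and two behaviours are assumptions (nulls are ignored by every function, and `LIST_DISTINCT` returns values in time order).

```python
from datetime import date

# Hypothetical sample rows: (event time, value). None stands for a SQL NULL.
rows = [
    (date(2024, 1, 3), 5),
    (date(2024, 1, 1), 2),
    (date(2024, 1, 2), None),
    (date(2024, 1, 4), 5),
]


def aggregate(name, rows):
    """Illustrate the meaning of one aggregation over (time, value) rows."""
    # Assumption: nulls are dropped first, as in standard SQL aggregates.
    ordered = sorted((r for r in rows if r[1] is not None), key=lambda r: r[0])
    values = [value for _, value in ordered]  # non-null values in time order
    functions = {
        "AVG": lambda: sum(values) / len(values),
        "MAX": lambda: max(values),
        "MIN": lambda: min(values),
        "SUM": lambda: sum(values),
        "COUNT": lambda: len(values),               # non-null count
        "COUNT_DISTINCT": lambda: len(set(values)),
        "FIRST": lambda: values[0],                 # earliest value by time
        "LAST": lambda: values[-1],                 # latest value by time
        # dict.fromkeys drops duplicates while keeping first-seen order
        "LIST_DISTINCT": lambda: list(dict.fromkeys(values)),
    }
    return functions[name]()


print(aggregate("COUNT", rows))           # 3 (the null row is not counted)
print(aggregate("COUNT_DISTINCT", rows))  # 2
print(aggregate("FIRST", rows))           # 2 (value at 2024-01-01)
```

Sorting by time before aggregating is what separates `FIRST` and `LAST` from `MIN` and `MAX`: they depend on row order, not on the values themselves.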
- Window format: `<start>, <end>, <measure_unit>`.
- Supported units: `YEARS`, `MONTHS`, `WEEKS`, `DAYS`, `HOURS`, `MINUTES`, `SECONDS`.
- Window semantics are half-open: `(start, end]`.
- `PREDICT`/`WHERE`: `start` and `end` must be non-negative.
- `ASSUMING`: `start` and `end` must be non-positive.
- `start` must be strictly less than `end`.

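The window rules above can be sketched as two small helpers. This is an illustrative sketch, not PredQL's validator: the function names are hypothetical, offsets are assumed to be measured from the prediction timestamp, and `YEARS` and `MONTHS` are left out because they are not fixed-length durations.

```python
from datetime import datetime, timedelta

# Fixed-length units only; YEARS and MONTHS would need calendar arithmetic.
UNIT_SECONDS = {
    "WEEKS": 7 * 24 * 3600,
    "DAYS": 24 * 3600,
    "HOURS": 3600,
    "MINUTES": 60,
    "SECONDS": 1,
}


def validate_window(start, end, clause):
    """Check the sign and ordering rules for a window used in `clause`."""
    if start >= end:
        raise ValueError("start must be strictly less than end")
    if clause in ("PREDICT", "WHERE") and (start < 0 or end < 0):
        raise ValueError(f"{clause}: start and end must be non-negative")
    if clause == "ASSUMING" and (start > 0 or end > 0):
        raise ValueError("ASSUMING: start and end must be non-positive")


def in_window(event_time, prediction_time, start, end, unit):
    """Half-open (start, end]: the start bound is excluded, the end bound included."""
    step = timedelta(seconds=UNIT_SECONDS[unit])
    lower = prediction_time + start * step
    upper = prediction_time + end * step
    return lower < event_time <= upper


t0 = datetime(2024, 1, 1)
validate_window(0, 91, "PREDICT")    # passes: non-negative and start < end
validate_window(-30, 0, "ASSUMING")  # passes: non-positive and start < end
print(in_window(t0, t0, 0, 91, "DAYS"))                       # False: start is excluded
print(in_window(t0 + timedelta(days=91), t0, 0, 91, "DAYS"))  # True: end is included
```

Under these assumptions, the window `0, 91, DAYS` from the temporal example covers events strictly after the prediction timestamp and up to and including 91 days later.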
PredQL Query String
↓
[Lexer] -> Tokens
↓
[Parser] -> Parse Tree
↓
[Visitor] -> Structured Dictionary
↓
[Validator] -> Semantic Checks
↓
[Converter] -> SQL Query
↓ (optional execute=True)
[DuckDB] -> Result Table

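The stage order above can be illustrated with a deliberately tiny, self-contained pipeline. This is a toy sketch, not PredQL's code: a regular expression stands in for the ANTLR lexer and parser, only queries of the form `PREDICT COUNT(<table>.*) FOR EACH <table>.*;` are handled, and every function name, the `SCHEMA` layout, and the shape of the emitted SQL are assumptions made for illustration. The optional execution step is left out to keep the sketch dependency-free.

```python
import re

# Hypothetical schema: table -> primary key and foreign keys (column -> table).
SCHEMA = {
    "posts": {"pk": "Id", "fks": {}},
    "votes": {"pk": "Id", "fks": {"PostId": "posts"}},
}

TOKEN = re.compile(r"\s*([A-Za-z_]+|[().*;])")


def lex(query):
    """[Lexer] query string -> list of tokens."""
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """[Parser + Visitor] tokens -> structured dictionary (toy grammar only)."""
    # None marks a slot where any table name is accepted.
    expected = ["PREDICT", "COUNT", "(", None, ".", "*", ")",
                "FOR", "EACH", None, ".", "*", ";"]
    if len(tokens) != len(expected) or any(
        e is not None and e != t for e, t in zip(expected, tokens)
    ):
        raise SyntaxError("query does not match the toy grammar")
    return {"aggregation": "COUNT", "target": tokens[3], "entity": tokens[9]}


def validate(tree, schema):
    """[Validator] check that tables exist and the target references the entity."""
    for table in (tree["target"], tree["entity"]):
        if table not in schema:
            raise ValueError(f"unknown table: {table}")
    fks = [col for col, ref in schema[tree["target"]]["fks"].items()
           if ref == tree["entity"]]
    if not fks:
        raise ValueError("target table has no foreign key to the entity table")
    return {**tree, "fk": fks[0]}


def to_sql(tree, schema):
    """[Converter] validated dictionary -> SQL string."""
    pk = schema[tree["entity"]]["pk"]
    return (
        f"SELECT e.{pk} AS fk, COUNT(t.{tree['fk']}) AS label "
        f"FROM {tree['entity']} AS e "
        f"LEFT JOIN {tree['target']} AS t ON t.{tree['fk']} = e.{pk} "
        f"GROUP BY e.{pk}"
    )


def convert(query, schema=SCHEMA):
    """Run the stages in order; any stage may raise before SQL is produced."""
    return to_sql(validate(parse(lex(query)), schema), schema)


sql = convert("PREDICT COUNT(votes.*) FOR EACH posts.*;")
print(sql)
```

Because validation runs before conversion, a query that names an unknown table fails with an error instead of producing SQL that would only break at execution time.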
To set up a development environment, first install uv:

- macOS & Linux
wget -qO- https://astral.sh/uv/install.sh | sh
- Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Install the dependencies:

uv sync --all-extras

If you modify lexer or parser grammar files (`*.g4`), regenerate ANTLR outputs from the repo root:

./regenerate_parser.sh

Run the tests:

pytest

Run the linter:

ruff check .