PredQL

PredQL (Predictive Query Language) is a Python framework for writing compact, expressive predictive queries over relational data, especially for Relational Deep Learning.

It abstracts away temporal joins and complex aggregations, so a prediction task that would otherwise need long hand-written SQL fits in a few lines.

🧠 Features

🎯 ANTLR-based parser
- Lexer and parser for PredQL syntax.
🌳 Structured parse-tree visitor
- Converts parsed queries into normalized dictionaries with source positions.
🔍 Semantic validation
- Schema-aware query validation with error reporting.
🔀 Two converters
- 📌 SConverter for static prediction queries.
- ⏰ TConverter for temporal prediction queries with timestamp windows.
⚙️ Dual output mode
- execute=False returns generated SQL.
- execute=True executes SQL and returns a Table object.

⚙️ Installation

Install PredQL via pip:

pip install predql

🚀 Quickstart

1. Build your database as a RelBench `Database` object, or use PredQL's simplified version

# PredQL's simplified Database and Table classes
from predql.base import Database, Table

2. Static query with `SConverter`

from predql.converter import SConverter

converter = SConverter(db)

predql_query = """
    PREDICT COUNT_DISTINCT(votes.* 
        WHERE votes.votetypeid == 2)
    FOR EACH posts.* WHERE posts.PostTypeId == 1
                       AND posts.OwnerUserId IS NOT NULL
                       AND posts.OwnerUserId != -1;
"""

# SQL only
sql_query = converter.convert(predql_query, execute=False)

# execute and get Table(fk, label)
table = converter.convert(predql_query, execute=True)
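
To make the meaning of the static query above concrete, here is a hand-written SQL equivalent run on toy data. This is only a sketch under assumptions: the SQL is not PredQL's generated output, `sqlite3` stands in for DuckDB so the example needs nothing beyond the standard library, and the column names `posts.Id`, `votes.Id`, and `votes.PostId` are hypothetical. It also assumes that entities with no matching votes receive a label of 0.

```python
import sqlite3

# Toy stand-in data; table layout and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (Id INTEGER PRIMARY KEY, PostTypeId INTEGER, OwnerUserId INTEGER);
    CREATE TABLE votes (Id INTEGER PRIMARY KEY, PostId INTEGER, VoteTypeId INTEGER);
    INSERT INTO posts VALUES (1, 1, 10), (2, 1, NULL), (3, 2, 11), (4, 1, 12), (5, 1, -1);
    INSERT INTO votes VALUES (100, 1, 2), (101, 1, 2), (102, 1, 3), (103, 3, 2), (104, 2, 2);
""")

# Hand-written equivalent of the PredQL query, NOT the SQL PredQL emits.
sql = """
    SELECT p.Id AS fk, COUNT(DISTINCT v.Id) AS label
    FROM posts AS p
    LEFT JOIN votes AS v
           ON v.PostId = p.Id AND v.VoteTypeId = 2   -- condition inside the aggregation
    WHERE p.PostTypeId = 1                           -- entity filter from FOR EACH ... WHERE
      AND p.OwnerUserId IS NOT NULL
      AND p.OwnerUserId != -1
    GROUP BY p.Id
    ORDER BY p.Id
"""

rows = conn.execute(sql).fetchall()
print(rows)  # one (fk, label) pair per entity that passes the filter
```

The condition on `votes` sits in the join rather than in the outer `WHERE`, so a post with no qualifying votes is kept with a count of 0 instead of being dropped.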

3. Temporal query with `TConverter`

import pandas as pd
from predql.converter import TConverter

timestamps = pd.Series(...)  # timestamps at which predictions are made
converter = TConverter(db, timestamps)

# prediction timestamps can be updated later without recreating the converter
converter.set_timestamps(new_timestamps)

predql_query = """
    PREDICT COUNT_DISTINCT(votes.* 
        WHERE votes.votetypeid == 2, 0, 91, DAYS)
    FOR EACH posts.* WHERE posts.PostTypeId == 1
                       AND posts.OwnerUserId IS NOT NULL
                       AND posts.OwnerUserId != -1;
"""

# SQL only
sql_query = converter.convert(predql_query, execute=False)

# execute and get Table(fk, timestamp, label)
table = converter.convert(predql_query, execute=True)
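
The window `0, 91, DAYS` in the query above can be read as an interval around each prediction timestamp. The sketch below illustrates that reading, assuming offsets are measured from the prediction timestamp and using the half-open `(start, end]` semantics described under the temporal window rules. The helper names `window_bounds` and `in_window` are hypothetical and not part of PredQL. Only fixed-length units are covered, because `MONTHS` and `YEARS` need calendar arithmetic.

```python
from datetime import datetime, timedelta

# Units that map directly onto timedelta keyword arguments.
_FIXED_UNITS = {
    "WEEKS": "weeks",
    "DAYS": "days",
    "HOURS": "hours",
    "MINUTES": "minutes",
    "SECONDS": "seconds",
}


def window_bounds(ts, start, end, unit):
    """Return the (exclusive lower, inclusive upper) bounds of the window at ts."""
    key = _FIXED_UNITS[unit]
    return ts + timedelta(**{key: start}), ts + timedelta(**{key: end})


def in_window(event_time, ts, start, end, unit):
    """True if event_time falls in the half-open interval (ts+start, ts+end]."""
    lo, hi = window_bounds(ts, start, end, unit)
    return lo < event_time <= hi


ts = datetime(2021, 1, 1)
lo, hi = window_bounds(ts, 0, 91, "DAYS")
print(lo, hi)

print(in_window(ts, ts, 0, 91, "DAYS"))   # event exactly at the prediction timestamp
print(in_window(hi, ts, 0, 91, "DAYS"))   # event exactly at the upper bound
```

Under this reading, an event that happens exactly at the prediction timestamp is excluded, while one that happens exactly 91 days later is included.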

📐 Query Language

📌 Static query design

PREDICT <aggregation | expression | table.column> [RANK TOP K | CLASSIFY]
FOR EACH <entity_table>.<primary_key>
[WHERE <static_condition | static_nested_expression>];

⏰ Temporal query shape

PREDICT <aggregation | temporal_expression> [RANK TOP K | CLASSIFY]
FOR EACH <entity_table>.<primary_key> [WHERE <static_condition | static_nested_expression>]
[ASSUMING <temporal_condition | temporal_nested_expression>]
[WHERE <temporal_condition | temporal_nested_expression>];

🧮 Aggregations

| Function | Meaning | Condition-Compatible |
| --- | --- | --- |
| `AVG` | average | ✅ |
| `MAX` | maximum | ✅ |
| `MIN` | minimum | ✅ |
| `SUM` | sum | ✅ |
| `COUNT` | non-null count | ✅ |
| `COUNT_DISTINCT` | distinct count | ✅ |
| `FIRST` | earliest value by time | ✅ |
| `LAST` | latest value by time | ✅ |
| `LIST_DISTINCT` | list of distinct values | ❌ |
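
For readers less familiar with the less common aggregations, the following plain-Python sketch mirrors the meanings listed above on a handful of toy rows. It is an illustration only, not PredQL's implementation. It assumes `COUNT_DISTINCT` and `LIST_DISTINCT` skip nulls (as SQL conventions usually do) and that `LIST_DISTINCT` returns values in time order; neither detail is stated here, and the toy data avoids a null in the first or last position because null handling for `FIRST` and `LAST` is not specified either.

```python
from datetime import date

# Toy rows as (timestamp, value); None stands in for SQL NULL.
rows = [
    (date(2021, 1, 3), "b"),
    (date(2021, 1, 1), "a"),
    (date(2021, 1, 2), None),
    (date(2021, 1, 4), "a"),
]


def non_null(rows):
    """Keep only rows whose value is not null."""
    return [(t, v) for t, v in rows if v is not None]


def count(rows):
    """COUNT: number of non-null values."""
    return len(non_null(rows))


def count_distinct(rows):
    """COUNT_DISTINCT: number of distinct non-null values (assumed)."""
    return len({v for _, v in non_null(rows)})


def first(rows):
    """FIRST: value of the row with the earliest timestamp."""
    return min(rows, key=lambda r: r[0])[1] if rows else None


def last(rows):
    """LAST: value of the row with the latest timestamp."""
    return max(rows, key=lambda r: r[0])[1] if rows else None


def list_distinct(rows):
    """LIST_DISTINCT: distinct non-null values, in time order (assumed)."""
    ordered = sorted(non_null(rows), key=lambda r: r[0])
    return list(dict.fromkeys(v for _, v in ordered))


print(count(rows), count_distinct(rows), first(rows), last(rows), list_distinct(rows))
```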

🧭 Temporal window rules

- Window format: <start>, <end>, <measure_unit>.
- Supported units: YEARS, MONTHS, WEEKS, DAYS, HOURS, MINUTES, SECONDS.
- Window semantics are half-open: (start, end].
- PREDICT/WHERE: start and end must be non-negative.
- ASSUMING: start and end must be non-positive.
- start must be strictly less than end.
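
The rules above can be expressed as a small check. The sketch below is a hypothetical helper, `check_window`, written only to make the rules concrete. It is not PredQL's validator, and its error messages are invented for illustration.

```python
SUPPORTED_UNITS = {"YEARS", "MONTHS", "WEEKS", "DAYS", "HOURS", "MINUTES", "SECONDS"}


def check_window(start, end, unit, clause):
    """Return a list of rule violations for a window used in the given clause.

    clause is one of "PREDICT", "WHERE", or "ASSUMING".
    """
    errors = []
    if unit not in SUPPORTED_UNITS:
        errors.append(f"unsupported unit: {unit}")
    if clause in ("PREDICT", "WHERE") and (start < 0 or end < 0):
        errors.append(f"{clause}: start and end must be non-negative")
    if clause == "ASSUMING" and (start > 0 or end > 0):
        errors.append("ASSUMING: start and end must be non-positive")
    if not start < end:
        errors.append("start must be strictly less than end")
    return errors


print(check_window(0, 91, "DAYS", "PREDICT"))     # valid window
print(check_window(-30, 0, "DAYS", "ASSUMING"))   # valid window
print(check_window(-1, 5, "DAYS", "PREDICT"))     # negative start in PREDICT
print(check_window(7, 7, "FORTNIGHTS", "WHERE"))  # bad unit and empty window
```

Returning a list rather than raising on the first problem lets a caller report every violation at once.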

🏗️ Architecture

PredQL Query String
    ↓
[Lexer] -> Tokens
    ↓
[Parser] -> Parse Tree
    ↓
[Visitor] -> Structured Dictionary
    ↓
[Validator] -> Semantic Checks
    ↓
[Converter] -> SQL Query
    ↓ (optional execute=True)
[DuckDB] -> Result Table

🔧 Development

Install uv

macOS & Linux

wget -qO- https://astral.sh/uv/install.sh | sh

Windows

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Install dependencies

uv sync --all-extras

Regenerate parser files

If you modify lexer or parser grammar files (*.g4), regenerate ANTLR outputs from the repo root:

./regenerate_parser.sh

Run tests

pytest

Run linter

ruff check .
