
kolesole/PredQL

PredQL

PredQL (Predictive Query Language) is a Python framework for writing compact, expressive predictive queries over relational data, especially for Relational Deep Learning.

It lets you write shorter, more expressive queries by abstracting temporal joins and complex aggregations.

🧠 Features

  • 🎯 ANTLR-based parser

    • Lexer and parser for PredQL syntax.
  • 🌳 Structured parse-tree visitor

    • Converts parsed queries into normalized dictionaries with source positions.
  • 🔍 Semantic validation

    • Schema-aware query validation with error reporting.
  • 🔀 Two converters

    • 📌 SConverter for static prediction queries.
    • ⏰ TConverter for temporal prediction queries with timestamp windows.
  • ⚙️ Dual output mode

    • execute=False returns the generated SQL.
    • execute=True executes the SQL and returns a Table object.

⚙️ Installation

Install PredQL via pip:

pip install predql

🚀 Quickstart

1. Build your database as a RelBench Database object, or use PredQL's simplified Database and Table classes

# simplified Database and Table classes provided by PredQL
from predql.base import Database, Table

2. Static query with SConverter

from predql.converter import SConverter

converter = SConverter(db)  # db is the Database object built in step 1

predql_query = """
    PREDICT COUNT_DISTINCT(votes.* 
        WHERE votes.votetypeid == 2)
    FOR EACH posts.* WHERE posts.PostTypeId == 1
                       AND posts.OwnerUserId IS NOT NULL
                       AND posts.OwnerUserId != -1;
"""

# SQL only
sql_query = converter.convert(predql_query, execute=False)

# execute and get Table(fk, label)
table = converter.convert(predql_query, execute=True)

3. Temporal query with TConverter

import pandas as pd
from predql.converter import TConverter

timestamps = pd.Series(...)  # timestamps at which predictions must be made
converter = TConverter(db, timestamps)

# prediction timestamps can also be updated later without recreating the converter
# (new_timestamps is another pd.Series of prediction times)
converter.set_timestamps(new_timestamps)

predql_query = """
    PREDICT COUNT_DISTINCT(votes.* 
        WHERE votes.votetypeid == 2, 0, 91, DAYS)
    FOR EACH posts.* WHERE posts.PostTypeId == 1
                       AND posts.OwnerUserId IS NOT NULL
                       AND posts.OwnerUserId != -1;
"""

# SQL only
sql_query = converter.convert(predql_query, execute=False)

# execute and get Table(fk, timestamp, label)
table = converter.convert(predql_query, execute=True)
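
The timestamps argument is left open in the snippet above. As a minimal sketch, assuming you want evenly spaced prediction times, a pandas date range can fill it in. The start date, end date, and 91-day spacing are hypothetical values chosen only to line up with the example window; they are not defaults of PredQL.

```python
import pandas as pd

# Hypothetical range and spacing: one prediction time every 91 days in 2020.
timestamps = pd.Series(
    pd.date_range(start="2020-01-01", end="2020-12-31", freq="91D")
)

# Inspect the prediction times before handing them to TConverter.
print(timestamps.dt.date.tolist())
```

Spacing the prediction times by the window length keeps consecutive label windows from overlapping, which is one common choice rather than a requirement.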

📝 Query Language

📌 Static query shape

PREDICT <aggregation | expression | table.column> [RANK TOP K | CLASSIFY]
FOR EACH <entity_table>.<primary_key>
[WHERE <static_condition | static_nested_expression>];

⏰ Temporal query shape

PREDICT <aggregation | temporal_expression> [RANK TOP K | CLASSIFY]
FOR EACH <entity_table>.<primary_key> [WHERE <static_condition | static_nested_expression>]
[ASSUMING <temporal_condition | temporal_nested_expression>]
[WHERE <temporal_condition | temporal_nested_expression>];

🧮 Aggregations

| Function       | Meaning                 | Condition-compatible |
|----------------|-------------------------|----------------------|
| AVG            | average                 | ✅                   |
| MAX            | maximum                 | ✅                   |
| MIN            | minimum                 | ✅                   |
| SUM            | sum                     | ✅                   |
| COUNT          | non-null count          | ✅                   |
| COUNT_DISTINCT | distinct count          | ✅                   |
| FIRST          | earliest value by time  | ✅                   |
| LAST           | latest value by time    | ✅                   |
| LIST_DISTINCT  | list of distinct values | ❌                   |
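
To make the meanings in the table concrete, here is a plain-Python illustration over a handful of toy rows. It does not use PredQL. The data is invented, and two details are assumptions rather than documented behavior: nulls are skipped by every aggregation (the table only states this for COUNT), and LIST_DISTINCT is shown sorted.

```python
# Toy (timestamp, value) rows for one entity; None stands for a null value.
rows = [(3, 30), (1, 10), (2, None), (5, 30), (4, 20)]

# Order by time, then drop nulls (assumed null handling, see lead-in).
by_time = sorted(rows, key=lambda row: row[0])
values = [value for _, value in by_time if value is not None]

aggregations = {
    "AVG": sum(values) / len(values),
    "MAX": max(values),
    "MIN": min(values),
    "SUM": sum(values),
    "COUNT": len(values),                    # non-null count
    "COUNT_DISTINCT": len(set(values)),      # distinct count
    "FIRST": values[0],                      # earliest value by time
    "LAST": values[-1],                      # latest value by time
    "LIST_DISTINCT": sorted(set(values)),    # ordering is an assumption
}

for name, result in aggregations.items():
    print(f"{name}: {result}")
```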

🧭 Temporal window rules

  • Window format: <start>, <end>, <measure_unit>.
  • Supported units: YEARS, MONTHS, WEEKS, DAYS, HOURS, MINUTES, SECONDS.
  • Window semantics are half-open: (start, end].
  • PREDICT/WHERE: start and end must be non-negative.
  • ASSUMING: start and end must be non-positive.
  • start must be strictly less than end.
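
The rules above can be sketched as a small standalone check. This is not PredQL's validator: check_window and in_window_days are hypothetical helpers, and the sketch assumes that window offsets are measured relative to the prediction timestamp (called anchor here).

```python
from datetime import datetime, timedelta

SUPPORTED_UNITS = {"YEARS", "MONTHS", "WEEKS", "DAYS", "HOURS", "MINUTES", "SECONDS"}


def check_window(start: int, end: int, unit: str, clause: str) -> None:
    """Raise ValueError if the window breaks one of the listed rules."""
    if unit not in SUPPORTED_UNITS:
        raise ValueError(f"unsupported unit: {unit}")
    if start >= end:
        raise ValueError("start must be strictly less than end")
    if clause in ("PREDICT", "WHERE") and (start < 0 or end < 0):
        raise ValueError(f"{clause}: start and end must be non-negative")
    if clause == "ASSUMING" and (start > 0 or end > 0):
        raise ValueError("ASSUMING: start and end must be non-positive")


def in_window_days(event: datetime, anchor: datetime, start: int, end: int) -> bool:
    """Half-open membership (start, end]: excludes the start, includes the end."""
    return anchor + timedelta(days=start) < event <= anchor + timedelta(days=end)


anchor = datetime(2021, 1, 1)
check_window(0, 91, "DAYS", "PREDICT")  # the window used in the Quickstart query
print(in_window_days(anchor, anchor, 0, 91))                       # on the open edge
print(in_window_days(anchor + timedelta(days=91), anchor, 0, 91))  # on the closed edge
```

With a half-open window, an event exactly at the prediction timestamp is excluded, while an event exactly 91 days later is included.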

🏗️ Architecture

PredQL Query String
    ↓
[Lexer] -> Tokens
    ↓
[Parser] -> Parse Tree
    ↓
[Visitor] -> Structured Dictionary
    ↓
[Validator] -> Semantic Checks
    ↓
[Converter] -> SQL Query
    ↓ (optional execute=True)
[DuckDB] -> Result Table
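
To give a feel for how the first stages hand data to each other, here is a toy stand-in written with a single regular expression. It is not PredQL's ANTLR grammar, visitor, or validator: the one-line grammar, the dictionary layout, and the schema are all hypothetical, and SQL generation is left out on purpose.

```python
import re

# Hypothetical toy schema: table name -> column names.
SCHEMA = {"posts": {"Id", "OwnerUserId"}, "votes": {"Id", "PostId"}}

# Toy grammar only: PREDICT <AGG>(<table>.*) FOR EACH <table>.*;
TOY_QUERY = re.compile(
    r"PREDICT\s+(?P<agg>\w+)\((?P<target>\w+)\.\*\)\s+"
    r"FOR\s+EACH\s+(?P<entity>\w+)\.\*;"
)


def visit(query: str) -> dict:
    """Lexing, parsing, and visiting collapsed into one step for the toy grammar."""
    match = TOY_QUERY.fullmatch(query)
    if match is None:
        raise SyntaxError("query does not match the toy grammar")
    # Keep the source position of each part so errors can point back at the query.
    return {
        name: {"value": match.group(name), "pos": match.start(name)}
        for name in ("agg", "target", "entity")
    }


def validate(tree: dict, schema: dict) -> None:
    """Schema-aware check: every referenced table must exist."""
    for part in ("target", "entity"):
        table, pos = tree[part]["value"], tree[part]["pos"]
        if table not in schema:
            raise ValueError(f"unknown table {table!r} at position {pos}")


tree = visit("PREDICT COUNT(votes.*) FOR EACH posts.*;")
validate(tree, SCHEMA)
print(tree)
```

Carrying positions through the dictionary is what lets a later validation error refer to the exact place in the query string.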

🔧 Development

Install uv

  • macOS & Linux
wget -qO- https://astral.sh/uv/install.sh | sh
  • Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Install dependencies

uv sync --all-extras

Regenerate parser files

If you modify lexer or parser grammar files (*.g4), regenerate ANTLR outputs from the repo root:

./regenerate_parser.sh

Run tests

pytest

Run linter

ruff check .

About

PredQL is a Python framework for task generation in Relational Deep Learning. It provides a predictive query language to simplify task generation when working with complex static/temporal relational data.
