Polars Expression Transformer

Available on PyPI · Python 3.10+ · License: MIT

Transform string-based expressions into Polars DataFrame operations. Write simple, SQL-like expressions and let the library convert them to optimized Polars code.

Playground & docs

There is an interactive playground and function reference that runs the library in the browser through Pyodide. You can try expressions on sample data and see the generated Polars and FlowFrame code without installing anything.

The site lives in docs/; the function reference is generated from the docstrings with python generate_docs.py.

Quick Start

import polars as pl
from polars_expr_transformer import simple_function_to_expr

df = pl.DataFrame({
    'first_name': ['John', 'Jane', 'Bob'],
    'last_name': ['Doe', 'Smith', 'Johnson'],
    'age': [30, 25, 45],
    'salary': [50000, 60000, 75000]
})

# Concatenate columns
df.select(simple_function_to_expr('concat([first_name], " ", [last_name])').alias('full_name'))

# Conditional logic
df.select(simple_function_to_expr('if [age] > 30 then "Senior" else "Junior" endif').alias('level'))

# Math operations
df.select(simple_function_to_expr('[salary] * 1.1').alias('new_salary'))

# Combine multiple operations
df.select(simple_function_to_expr('uppercase(left([last_name], 3))').alias('code'))

Installation

pip install polars-expr-transformer

Why Use This Library?

| Use Case | Recommendation |
| --- | --- |
| Building applications with user-defined transformations | Yes - Users can write expressions without Python knowledge |
| SQL/Tableau users transitioning to Polars | Yes - Familiar syntax |
| Need a simple expression language for configs | Yes - Easy to serialize and store |
| Writing performance-critical Polars code | No - Use Polars directly |
| Need all Polars features | No - This covers common operations only |
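
As a rough illustration of the config use case, the sketch below stores expressions in a JSON document and turns them into new columns. The config layout (a "columns" list with "name" and "expression" keys) and the helper names load_transformations and apply_transformations are assumptions made for this example; they are not part of the library. Only simple_function_to_expr comes from the package.

```python
import json

# Hypothetical config layout: one entry per derived column.
CONFIG_TEXT = r'''
{
  "columns": [
    {"name": "full_name", "expression": "concat([first_name], \" \", [last_name])"},
    {"name": "new_salary", "expression": "[salary] * 1.1"}
  ]
}
'''


def load_transformations(config_text: str) -> list[tuple[str, str]]:
    """Parse the JSON config into (output column name, expression string) pairs."""
    config = json.loads(config_text)
    return [(entry["name"], entry["expression"]) for entry in config["columns"]]


def apply_transformations(df, transformations):
    """Add one column per (name, expression) pair to a Polars DataFrame."""
    # Imported here so the config helpers above work without the library installed.
    from polars_expr_transformer import simple_function_to_expr

    return df.with_columns(
        [simple_function_to_expr(expression).alias(name) for name, expression in transformations]
    )
```

With the Quick Start DataFrame, apply_transformations(df, load_transformations(CONFIG_TEXT)) would add the two configured columns. Because the expressions are plain strings, the same config can be kept in a file or a database field.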

Expression Syntax

Column References

Reference DataFrame columns using square brackets:

'[column_name]'           # Reference a column
'[Column With Spaces]'    # Columns with spaces work too
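
When expressions come from users, it can help to check the referenced columns against the DataFrame before building the expression. The sketch below is a simplified, regex-based approximation of the bracket syntax and not the library's own parser; the helper names referenced_columns and missing_columns are hypothetical. It also matches bracketed text inside string literals, so treat its result as a hint rather than a guarantee.

```python
import re

# Matches [name]; names may contain spaces but not brackets.
_COLUMN_REF = re.compile(r"\[([^\[\]]+)\]")


def referenced_columns(expression: str) -> list[str]:
    """Return the distinct column names in an expression, in order of first use."""
    seen: dict[str, None] = {}
    for name in _COLUMN_REF.findall(expression):
        seen.setdefault(name, None)
    return list(seen)


def missing_columns(expression: str, available: list[str]) -> list[str]:
    """Return referenced columns that are not present in `available`."""
    return [name for name in referenced_columns(expression) if name not in available]
```

For example, missing_columns(user_expression, df.columns) can be used to reject an expression with a clear message before it reaches Polars.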

Values & Data Types

Besides column references, you can write literal values directly. Five literal types are supported:

| Type | How to write it | Examples |
| --- | --- | --- |
| String | Single or double quotes | "hello", 'world' |
| Integer | Bare whole numbers (negatives allowed) | 42, -7 |
| Float | Bare decimal numbers | 3.14, -0.5 |
| Boolean | true or false (case-insensitive) | true, False |
| Null | null (case-insensitive), the missing value | null |

'if [active] = true then "yes" else "no" endif'   # boolean literal
'[price] * 1.1'                                     # float literal
'coalesce([nickname], null)'                        # null literal
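
The literal rules above can be restated as a small classifier. This is only a sketch of the rules as documented here, not the library's tokenizer; classify_literal is a hypothetical helper and expects a single token, not a full expression.

```python
import re

_INTEGER = re.compile(r"-?\d+")
_FLOAT = re.compile(r"-?\d+\.\d+")


def classify_literal(token: str) -> str:
    """Classify one literal token according to the five documented literal types."""
    token = token.strip()
    # Strings are wrapped in matching single or double quotes.
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        return "string"
    # Booleans and null are matched case-insensitively.
    if token.lower() in ("true", "false"):
        return "boolean"
    if token.lower() == "null":
        return "null"
    if _FLOAT.fullmatch(token):
        return "float"
    if _INTEGER.fullmatch(token):
        return "integer"
    return "unknown"  # e.g. a column reference or a function call
```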

Operators

| Operator | Description | Example |
| --- | --- | --- |
| + | Addition | [a] + [b] |
| - | Subtraction | [a] - 10 |
| * | Multiplication | [price] * [quantity] |
| / | Division | [total] / [count] |
| % | Modulo | [value] % 2 |
| = or == | Equals | [status] = "active" |
| != | Not equals | [type] != "deleted" |
| >, >=, <, <= | Comparisons | [age] >= 18 |
| and | Logical AND | [a] > 0 and [b] > 0 |
| or | Logical OR | [x] = 1 or [y] = 1 |

Conditional Expressions

# Simple if-then-else
'if [age] >= 18 then "Adult" else "Minor" endif'

# Multiple conditions with elseif
'if [score] >= 90 then "A" elseif [score] >= 80 then "B" elseif [score] >= 70 then "C" else "F" endif'

# Nested conditions
'if [type] = "A" then (if [value] > 100 then "High A" else "Low A" endif) else "Other" endif'

Comments

# Single-line comments with //
'[column] + 1 // This adds one to the column'

# Multi-line expressions with comments
'''
[price] * [quantity]  // Calculate subtotal
- [discount]          // Apply discount
'''
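
The library handles comments itself, so no preprocessing is needed. If you want to show users the expression that remains once comments are removed (for logging, say), the sketch below does that. It is a hypothetical helper, not the library's implementation, and it assumes string literals do not span lines or contain escaped quotes.

```python
def strip_line_comments(expression: str) -> str:
    """Remove // comments, leaving // inside quoted strings untouched."""
    cleaned_lines = []
    for line in expression.splitlines():
        quote = None  # quote character of the string we are inside, if any
        cut = len(line)
        i = 0
        while i < len(line):
            ch = line[i]
            if quote:
                if ch == quote:
                    quote = None  # closing quote ends the string
            elif ch in "\"'":
                quote = ch  # opening quote starts a string
            elif line.startswith("//", i):
                cut = i  # comment starts here, outside any string
                break
            i += 1
        stripped = line[:cut].strip()
        if stripped:
            cleaned_lines.append(stripped)
    return " ".join(cleaned_lines)
```

Tracking the open quote is what keeps a literal such as "https://" from being cut at its slashes.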

Available Functions

String Functions

| Function | Description | Example |
| --- | --- | --- |
| concat(a, b, ...) | Concatenate strings | concat([first], " ", [last]) |
| length(text) | String length | length([name]) |
| uppercase(text) | Convert to uppercase | uppercase([code]) |
| lowercase(text) | Convert to lowercase | lowercase([email]) |
| titlecase(text) | Convert to title case | titlecase([name]) |
| left(text, n) | First n characters | left([phone], 3) |
| right(text, n) | Last n characters | right([id], 4) |
| mid(text, start, len) | Substring from position | mid([code], 2, 3) |
| substring(text, start, len) | Alias for mid | substring([text], 0, 10) |
| trim(text) | Remove leading/trailing spaces | trim([input]) |
| left_trim(text) | Remove leading spaces | left_trim([text]) |
| right_trim(text) | Remove trailing spaces | right_trim([text]) |
| replace(text, find, replace) | Replace text | replace([name], ".", "") |
| find_position(text, search) | Find substring position | find_position([text], "@") |
| pad_left(text, len, char) | Pad string on left | pad_left([id], 5, "0") |
| pad_right(text, len, char) | Pad string on right | pad_right([code], 10, " ") |
| starts_with(text, prefix) | Check prefix | starts_with([url], "https") |
| ends_with(text, suffix) | Check suffix | ends_with([file], ".csv") |
| reverse(text) | Reverse string | reverse([text]) |
| repeat(text, n) | Repeat string n times | repeat("*", 5) |
| split(text, delimiter) | Split into list | split([tags], ",") |
| count_match(text, pattern) | Count occurrences | count_match([text], "a") |
| string_similarity(a, b, method) | Similarity score (0-1) | string_similarity([a], [b], "levenshtein") |

Math Functions

| Function | Description | Example |
| --- | --- | --- |
| abs(n) | Absolute value | abs([difference]) |
| round(n, decimals) | Round to decimals | round([price], 2) |
| ceil(n) | Round up | ceil([value]) |
| floor(n) | Round down | floor([value]) |
| power(base, exp) | Exponentiation | power([x], 2) |
| pow(base, exp) | Alias for power | pow(2, [n]) |
| sqrt(n) | Square root | sqrt([area]) |
| log(n) | Natural logarithm | log([value]) |
| log10(n) | Base-10 logarithm | log10([value]) |
| log2(n) | Base-2 logarithm | log2([value]) |
| exp(n) | e^n | exp([rate]) |
| mod(a, b) | Modulo | mod([value], 10) |
| sign(n) | Sign (-1, 0, 1) | sign([change]) |
| negation(n) | Negate value | negation([amount]) |
| sin(n), cos(n), tan(n) | Trigonometric | sin([angle]) |
| asin(n), acos(n), atan(n) | Inverse trig | asin([ratio]) |
| tanh(n) | Hyperbolic tangent | tanh([x]) |
| random_int(min, max) | Random integer | random_int(1, 100) |

Date Functions

| Function | Description | Example |
| --- | --- | --- |
| now() | Current datetime | now() |
| today() | Current date | today() |
| year(date) | Extract year | year([created_at]) |
| month(date) | Extract month (1-12) | month([date]) |
| day(date) | Extract day (1-31) | day([date]) |
| hour(datetime) | Extract hour (0-23) | hour([timestamp]) |
| minute(datetime) | Extract minute | minute([time]) |
| second(datetime) | Extract second | second([time]) |
| week(date) | ISO week number (1-53) | week([date]) |
| weekday(date) | Day of week (1=Mon, 7=Sun) | weekday([date]) |
| dayofweek(date) | Alias for weekday | dayofweek([date]) |
| quarter(date) | Quarter (1-4) | quarter([date]) |
| dayofyear(date) | Day of year (1-366) | dayofyear([date]) |
| add_days(date, n) | Add days | add_days([start], 30) |
| add_weeks(date, n) | Add weeks | add_weeks([date], 2) |
| add_months(date, n) | Add months | add_months([date], 6) |
| add_years(date, n) | Add years | add_years([birth], 18) |
| add_hours(dt, n) | Add hours | add_hours([time], 3) |
| add_minutes(dt, n) | Add minutes | add_minutes([time], 30) |
| add_seconds(dt, n) | Add seconds | add_seconds([time], 60) |
| date_diff_days(a, b) | Days between dates | date_diff_days([end], [start]) |
| datetime_diff_seconds(a, b) | Seconds between | datetime_diff_seconds([a], [b]) |
| format_date(date, fmt) | Format as string | format_date([date], "%Y-%m-%d") |
| start_of_month(date) | First of month | start_of_month([date]) |
| end_of_month(date) | Last of month | end_of_month([date]) |
| date_truncate(date, unit) | Truncate to unit | date_truncate([dt], "1day") |

Logic & Null Handling

| Function | Description | Example |
| --- | --- | --- |
| equals(a, b) | Check equality | equals([status], "active") |
| does_not_equal(a, b) | Check inequality | does_not_equal([type], "deleted") |
| is_empty(value) | Check if null | is_empty([email]) |
| is_not_empty(value) | Check if not null | is_not_empty([phone]) |
| coalesce(a, b, ...) | First non-null | coalesce([nickname], [name], "Unknown") |
| ifnull(value, default) | Replace null | ifnull([count], 0) |
| nvl(value, default) | Alias for ifnull | nvl([value], 0) |
| nullif(a, b) | Null if equal | nullif([value], 0) |
| between(val, min, max) | Range check (inclusive) | between([age], 18, 65) |
| greatest(a, b, ...) | Maximum value | greatest([a], [b], [c]) |
| least(a, b, ...) | Minimum value | least([price1], [price2]) |
| contains(text, search) | Contains substring | contains([desc], "sale") |
| _in(value, text) | Value in text | _in("admin", [roles]) |
| _not(value) | Logical NOT | _not([is_deleted]) |
| is_string(value) | Type check | is_string([field]) |

Type Conversions

| Function | Description | Example |
| --- | --- | --- |
| to_string(value) | Convert to string | to_string([id]) |
| to_integer(value) | Convert to integer | to_integer([count]) |
| to_float(value) | Convert to float | to_float([price]) |
| to_number(value) | Alias for to_float | to_number([value]) |
| to_boolean(value) | Convert to boolean | to_boolean([flag]) |
| to_date(text, format) | Parse date | to_date([date_str], "%Y-%m-%d") |
| to_datetime(text, format) | Parse datetime | to_datetime([ts], "%Y-%m-%d %H:%M:%S") |
| to_decimal(value, precision) | Convert with precision | to_decimal([amount], 2) |

API Reference

simple_function_to_expr(expression: str) -> pl.Expr

Converts a string expression to a Polars expression.

import polars as pl
from polars_expr_transformer import simple_function_to_expr

df = pl.DataFrame({'price': [9.99, 4.50], 'quantity': [2, 10]})  # sample data

expr = simple_function_to_expr('[price] * [quantity]')
df.select(expr.alias('total'))

build_func(expression: str) -> Func

Returns the intermediate function object for inspection/debugging.

from polars_expr_transformer import build_func

func = build_func('concat([a], [b])')
print(func.get_readable_pl_function())  # See the Polars translation

get_all_expressions() -> List[str]

Returns a list of all available function names.

from polars_expr_transformer import get_all_expressions

functions = get_all_expressions()
print(functions)  # ['concat', 'length', 'uppercase', ...]
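
One possible use of this list is suggesting corrections for a misspelled function name. The sketch below uses difflib from the standard library; suggest_function_names is a hypothetical helper, not part of the package, and takes the list of names as a parameter.

```python
import difflib


def suggest_function_names(name: str, available: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` known function names that closely resemble `name`."""
    # cutoff=0.6 is difflib's default similarity threshold; tune it as needed.
    return difflib.get_close_matches(name, available, n=limit, cutoff=0.6)
```

In practice you would call it as suggest_function_names("uppercas", get_all_expressions()) and show the result next to the error message.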

get_expression_overview() -> List[ExpressionsOverview]

Returns functions grouped by category with descriptions.

from polars_expr_transformer import get_expression_overview

for category in get_expression_overview():
    print(f"\n{category.category}:")
    for expr in category.expressions:
        print(f"  {expr.name}: {expr.description}")

Error Handling

The library validates expressions before parsing and raises ExpressionSyntaxError (a subclass of ValueError) with the exact position of the problem and a hint:

# Misspelled keyword
simple_function_to_expr('f [age] > 30 then "Senior" else "Junior" endif')
# ExpressionSyntaxError:
# Found 'then' at position 14, but there is no 'if' before it.
# f [age] > 30 then "Senior" else "Junior" endif
#              ^
# Hint: Every condition starts with 'if': if <condition> then <value> else <value> endif.
# Check that 'if' is present and spelled correctly.

# Unbalanced parentheses
simple_function_to_expr('((1)')
# ExpressionSyntaxError:
# Unbalanced parentheses: '(' at position 1 is never closed.
# ((1)
# ^
# Hint: Add a matching ')'.

# Unknown function
simple_function_to_expr('unknown_func([col])')
# ExpressionSyntaxError: Expected a single value, but found 2. This usually means
# a function name is misspelled or unknown, or an operator is missing between two values.

Catch errors with except ExpressionSyntaxError (importable from the package root) or simply except ValueError.
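
When validating several user-supplied expressions at once, it can be convenient to collect every error instead of stopping at the first one. The sketch below relies only on the fact stated above that ExpressionSyntaxError is a ValueError. The helper validate_expressions is hypothetical, and the compile function is passed in so the helper is not tied to a particular import.

```python
from typing import Callable, Iterable


def validate_expressions(
    expressions: Iterable[str],
    compile_fn: Callable[[str], object],
) -> dict[str, str]:
    """Return {expression: error message} for every expression that fails to compile."""
    errors: dict[str, str] = {}
    for expression in expressions:
        try:
            compile_fn(expression)
        except ValueError as exc:  # also catches ExpressionSyntaxError, a ValueError subclass
            errors[expression] = str(exc)
    return errors
```

Passing simple_function_to_expr as compile_fn checks each expression with the library. An empty result means every expression was accepted.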

Built on Polars

This library is built on top of Polars, a fast DataFrame library written in Rust. Every expression is converted to a native Polars expression (pl.Expr), so it runs inside the Polars engine rather than row by row in Python.

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests on GitHub.

License

MIT License - see LICENSE file for details.

Acknowledgements

Thanks to the Polars team for creating such an amazing library.
