feat(RFC): A richer Expr IR#2572

Draft
dangotbanned wants to merge 514 commits into main from oh-nodes

@dangotbanned
Member

@dangotbanned dangotbanned commented May 18, 2025

Will close #2571

What type of PR is this? (check all applicable)

  • ✨ Feature

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Important

See (#2571) for detail!!!!!!!

Very open to feedback

Tasks

Show 2025 May-July

@dangotbanned
Member Author

dangotbanned commented May 19, 2025

FunctionFlags, FunctionOptions, Function, FunctionExpr

Feel like there's more I need to understand on the correct way to propagate the flags to the top-level.

@MarcoGorelli whenever you get to this - I've left loads of notes w/ references - hoping that you'd be able to demystify the rust magic 😄

All 4 classes

They're a bit trimmed down from the polars versions; I couldn't see a need for us to mirror everything if we wouldn't use it

class FunctionFlags(enum.Flag):
    ALLOW_GROUP_AWARE = 1 << 0
    """> Raise if use in group by
    Not sure where this is disabled.
    """
    INPUT_WILDCARD_EXPANSION = 1 << 4
    """Appears on all the horizontal aggs.
    https://github.com/pola-rs/polars/blob/e8ad1059721410e65a3d5c1d84055fb22a4d6d43/crates/polars-plan/src/plans/options.rs#L49-L58
    """
    RETURNS_SCALAR = 1 << 5
    """Automatically explode on unit length if it ran as final aggregation."""
    ROW_SEPARABLE = 1 << 8
    """Not sure lol.
    https://github.com/pola-rs/polars/pull/22573
    """
    LENGTH_PRESERVING = 1 << 9
    """mutually exclusive with `RETURNS_SCALAR`"""
    # NOTE: `a in b` on an `enum.Flag` asks whether every bit of `a` is set in `b`,
    # so the required flags go on the left-hand side of each check.
    def is_elementwise(self) -> bool:
        return (FunctionFlags.ROW_SEPARABLE | FunctionFlags.LENGTH_PRESERVING) in self
    def returns_scalar(self) -> bool:
        return FunctionFlags.RETURNS_SCALAR in self
    def is_length_preserving(self) -> bool:
        return FunctionFlags.LENGTH_PRESERVING in self
    @staticmethod
    def default() -> FunctionFlags:
        return FunctionFlags.ALLOW_GROUP_AWARE

class FunctionOptions(Immutable):
    """`ExprMetadata` but less god object.
    https://github.com/pola-rs/polars/blob/3fd7ecc5f9de95f62b70ea718e7e5dbf951b6d1c/crates/polars-plan/src/plans/options.rs
    """
    __slots__ = ("flags",)
    flags: FunctionFlags
    def is_elementwise(self) -> bool:
        return self.flags.is_elementwise()
    def returns_scalar(self) -> bool:
        return self.flags.returns_scalar()
    def is_length_preserving(self) -> bool:
        return self.flags.is_length_preserving()
    def with_flags(self, flags: FunctionFlags, /) -> FunctionOptions:
        if (FunctionFlags.RETURNS_SCALAR | FunctionFlags.LENGTH_PRESERVING) in flags:
            msg = "A function cannot both return a scalar and preserve length, they are mutually exclusive."
            raise TypeError(msg)
        obj = FunctionOptions.__new__(FunctionOptions)
        object.__setattr__(obj, "flags", self.flags | flags)
        return obj
    def with_elementwise(self) -> FunctionOptions:
        return self.with_flags(
            FunctionFlags.ROW_SEPARABLE | FunctionFlags.LENGTH_PRESERVING
        )
    @staticmethod
    def default() -> FunctionOptions:
        obj = FunctionOptions.__new__(FunctionOptions)
        object.__setattr__(obj, "flags", FunctionFlags.default())
        return obj
    @staticmethod
    def elementwise() -> FunctionOptions:
        return FunctionOptions.default().with_elementwise()
    @staticmethod
    def row_separable() -> FunctionOptions:
        return FunctionOptions.groupwise().with_flags(FunctionFlags.ROW_SEPARABLE)
    @staticmethod
    def length_preserving() -> FunctionOptions:
        return FunctionOptions.default().with_flags(FunctionFlags.LENGTH_PRESERVING)
    @staticmethod
    def groupwise() -> FunctionOptions:
        return FunctionOptions.default()
    @staticmethod
    def aggregation() -> FunctionOptions:
        return FunctionOptions.groupwise().with_flags(FunctionFlags.RETURNS_SCALAR)

class Function(ExprIR):
    """Shared by expr functions and namespace functions.
    https://github.com/pola-rs/polars/blob/112cab39380d8bdb82c6b76b31aca9b58c98fd93/crates/polars-plan/src/dsl/expr.rs#L114
    """
    @property
    def function_options(self) -> FunctionOptions:
        from narwhals._plan.options import FunctionOptions
        return FunctionOptions.default()
    @property
    def is_scalar(self) -> bool:
        return self.function_options.returns_scalar()
    def to_function_expr(self, *inputs: ExprIR) -> FunctionExpr[Self]:
        from narwhals._plan.expr import FunctionExpr
        from narwhals._plan.options import FunctionOptions
        # NOTE: Still need to figure out how these should be generated
        # Feel like it should be the union of `input` & `function`
        PLACEHOLDER = FunctionOptions.default()  # noqa: N806
        return FunctionExpr(input=inputs, function=self, options=PLACEHOLDER)

class FunctionExpr(ExprIR, t.Generic[_FunctionT]):
    """**Representing `Expr::Function`**.
    https://github.com/pola-rs/polars/blob/dafd0a2d0e32b52bcfa4273bffdd6071a0d5977a/crates/polars-plan/src/dsl/expr.rs#L114-L120
    https://github.com/pola-rs/polars/blob/112cab39380d8bdb82c6b76b31aca9b58c98fd93/crates/polars-plan/src/dsl/function_expr/mod.rs#L123
    """
    __slots__ = ("function", "input", "options")
    input: Seq[ExprIR]
    function: _FunctionT
    """Enum type is named `FunctionExpr` in `polars`.
    Mirroring *exactly* doesn't make much sense in OOP.
    https://github.com/pola-rs/polars/blob/112cab39380d8bdb82c6b76b31aca9b58c98fd93/crates/polars-plan/src/dsl/function_expr/mod.rs#L123
    """
    options: FunctionOptions
    """Assuming this is **either**:
    1. `function.function_options`
    2. The union of (1) and any `FunctionOptions` in `inputs`
    """
    def with_options(self, options: FunctionOptions, /) -> Self:
        options = self.options.with_flags(options.flags)
        return type(self)(input=self.input, function=self.function, options=options)
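
The flag helpers in these classes depend on the direction of `enum.Flag` containment, so here is a minimal standalone sketch of that behaviour. The `Flags` enum, `ELEMENTWISE` constant, and `union` helper are hypothetical stand-ins that reuse the bit positions quoted above; they are not the PR's classes. On an `enum.Flag`, `a in b` is true when every bit set in `a` is also set in `b`.

```python
import enum


class Flags(enum.Flag):
    """Hypothetical stand-in reusing the bit layout quoted above."""

    ALLOW_GROUP_AWARE = 1 << 0
    RETURNS_SCALAR = 1 << 5
    ROW_SEPARABLE = 1 << 8
    LENGTH_PRESERVING = 1 << 9


ELEMENTWISE = Flags.ROW_SEPARABLE | Flags.LENGTH_PRESERVING


def is_elementwise(flags: Flags) -> bool:
    # Required bits on the left: are both of them set in `flags`?
    return ELEMENTWISE in flags


def union(*flags: Flags) -> Flags:
    # Combine flags with bitwise OR, starting from the empty flag.
    out = Flags(0)
    for flag in flags:
        out |= flag
    return out


elementwise = union(Flags.ALLOW_GROUP_AWARE, ELEMENTWISE)
aggregation = union(Flags.ALLOW_GROUP_AWARE, Flags.RETURNS_SCALAR)

print(is_elementwise(elementwise))  # True
print(is_elementwise(aggregation))  # False
# The reversed check asks "is `elementwise` a subset of ELEMENTWISE?"
print(elementwise in ELEMENTWISE)  # False, ALLOW_GROUP_AWARE is extra
```

Putting the required flags on the left keeps the check true when unrelated flags such as `ALLOW_GROUP_AWARE` are also set.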

dangotbanned added a commit that referenced this pull request May 20, 2025
- Mentioned in (#2391 (comment))
- Needed again for #2572
@dangotbanned dangotbanned mentioned this pull request May 20, 2025
10 tasks
Comment thread narwhals/_plan/common.py Outdated
@MarcoGorelli
Member

@MarcoGorelli whenever you get to this - I've left loads of notes w/ references - hoping that you'd be able to demystify the rust magic 😄

at the moment it looks like this adds a self-standing _plan, that's not integrated into the rest of Narwhals? if it's possible to integrate it with the rest to prove that this is feasible, i'd be very interested in taking a close look

MarcoGorelli pushed a commit that referenced this pull request May 21, 2025
* chore(typing): Add `_typing_compat.py`

- Mentioned in (#2391 (comment))
- Needed again for #2572

* refactor: Reuse `TypeVar` import

* refactor: Reuse `@deprecated` import

* refactor: Reuse `Protocol38` import

* docs: Add module-level docstring
dangotbanned added a commit that referenced this pull request May 21, 2025
Still need:
- reprs
- fix the hierarchy issue (#2572 (comment))
- Flag summing (#2572 (comment))
dangotbanned added a commit that referenced this pull request May 21, 2025
dangotbanned added a commit that referenced this pull request May 21, 2025
- 1 step closer to the understanding for (#2572 (comment))
- There's still some magic going on when `polars` serializes
  - Need to track down where `'collect_groups': 'ElementWise'` and `'collect_groups': 'GroupWise'` first appear
  - Seems like the flags get reduced
Comment thread narwhals/_plan/functions.py Outdated
@dangotbanned
Member Author

Thanks for peeking @MarcoGorelli

at the moment it looks like this adds a self-standing _plan, that's not integrated into the rest of Narwhals? if it's possible to integrate it with the rest to prove that this is feasible, i'd be very interested in taking a close look

That is definitely the eventual goal! 🤞

Despite how quickly things have progressed, I still feel I'm a few steps away from being ready for that.

General overview

I'm trying to focus on modeling these structures and how they interact:

My thought was that narwhals currently solves similar problems, but in different ways to polars.
By getting this subset of polars that we need translated from rust -> python first, we're then in a good position to make decisions on how to bridge any gaps that narwhals is occupying at the moment, if that makes sense?

So like what I have in narwhals/_plan/dummy.py is all about creating an accurate expression graph.
Consuming the graph (evaluating an expression) is a very important step - but the main difference I'm anticipating is:

Current

  • Expressions are based on evaluating callables
  • with other callables used for output names & aliases
  • They also optionally store an ExprMetadata object
  • Some backends do other things with function names, kwargs, depth
  • Lazy backends use more callables for handling windows

ExprIR

  • Expressions are (likely still) based on evaluating callables
  • All of the remaining details are either:
    • Encoded into a node on the graph
      • E.g. the nodes for Expr.var, Expr.std have the name and ddof already
      • Depth can be computed anywhere by traversing the graph
    • Or, can be expressed in terms of operating on a node
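
To make the ExprIR points concrete, here is a rough sketch of details living on a node and depth being derived by traversal. The `Node` class and `depth` function are hypothetical illustrations built on a frozen dataclass; they are not the classes in narwhals/_plan.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical immutable graph node; `inputs` are its child nodes."""

    name: str
    inputs: Tuple["Node", ...] = ()
    ddof: Optional[int] = None  # example of a detail stored on the node itself


def depth(node: Node) -> int:
    # A leaf has depth 1; otherwise one more than the deepest input.
    return 1 + max((depth(child) for child in node.inputs), default=0)


col = Node("col(a)")
std = Node("std", inputs=(col,), ddof=1)
expr = Node("abs", inputs=(std,))

print(depth(col))  # 1
print(depth(expr))  # 3
print(std.ddof)  # 1
```

Nothing about depth is stored anywhere; any consumer of the graph can recompute it from the nodes alone.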

I am confident we'll end up with something that's easier to maintain - but trying to integrate the two mid-solve and maintaining that branch over time seems like it'd be a real challenge 😔


FunctionOptions, FunctionFlags

The parts mentioned in (#2572 (comment)) were one of the main hurdles left.
However, I think it is really just a skill issue on my part in understanding the rust code. That was all I was hoping for some help with 🙏
In narwhals terms, it is closest to (https://github.com/narwhals-dev/narwhals/blob/1b93c0ed2dc2bf47d7b8e4b4cab4c9e9cad59800/narwhals/_expression_parsing.py) but only a subset of the rules - since it only concerns functions and how they compose.

Note

Right before I was about to send this comment, I managed to fix the issue I had by updating to 1.30.0 🤦‍♂️
See (diff), which removed the flags I was having trouble finding in (0982b3a)

Example

Now all the flags I've been using are propagated in the same way as in polars! 🎉

Repro code

import polars as pl

from narwhals._plan import demo as nwd  # noqa: F811
from narwhals._plan import meta  # noqa: F811

expr_pl = (
    pl.col("a")
    .sort()
    .fill_null(1)
    .shift(1)
    .abs()
    .drop_nulls()
    .skew()
    .alias("col->sort->fill_null->shift->abs->drop_nulls->skew->alias")
)
expr_nwd = (
    nwd.col("a")
    .sort()
    .fill_null(1)
    .shift(1)
    .abs()
    .drop_nulls()
    .skew()
    .alias("col->sort->fill_null->shift->abs->drop_nulls->skew->alias")
)
roundtrip_pl = meta.polars_expr_to_dict(expr_pl)
roundtrip_nw = str(expr_nwd._ir)

roundtrip_pl

I haven't added ALLOW_EMPTY_INPUTS - so that's an expected difference between the two

>>> roundtrip_pl
{'Alias': [{'Function': {'input': [{'Function': {'input': [{'Function': {'input': [{'Function': {'input': [{'Function': {'input': [{'Sort': {'expr': {'Column': 'a'},
                   'options': {'descending': False,
                    'nulls_last': False,
                    'multithreaded': True,
                    'maintain_order': False,
                    'limit': None}}},
                 {'Literal': {'Dyn': {'Int': 1}}}],
                'function': 'FillNull',
                'options': {'check_lengths': True,
                 'flags': 'ALLOW_GROUP_AWARE | ROW_SEPARABLE | LENGTH_PRESERVING'}}},
              {'Literal': {'Dyn': {'Int': 1}}}],
             'function': 'Shift',
             'options': {'check_lengths': True,
              'flags': 'ALLOW_GROUP_AWARE | LENGTH_PRESERVING'}}}],
          'function': 'Abs',
          'options': {'check_lengths': True,
           'flags': 'ALLOW_GROUP_AWARE | ROW_SEPARABLE | LENGTH_PRESERVING'}}}],
       'function': 'DropNulls',
       'options': {'check_lengths': True,
        'flags': 'ALLOW_GROUP_AWARE | ALLOW_EMPTY_INPUTS | ROW_SEPARABLE'}}}],
    'function': {'Skew': True},
    'options': {'check_lengths': True,
     'flags': 'ALLOW_GROUP_AWARE | RETURNS_SCALAR'}}},
  'col->sort->fill_null->shift->abs->drop_nulls->skew->alias']}
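
For comparing flags layer by layer, a small walker over the serialized dict can help. This is only a sketch: `iter_function_flags` is a hypothetical helper, and `serialized` is a trimmed copy of the two outermost `Function` layers above, with the inner input replaced by a plain column and the alias shortened.

```python
from typing import Any, Iterator, Tuple

# Trimmed copy of the two outermost `Function` layers from the output above.
serialized = {
    "Alias": [
        {
            "Function": {
                "input": [
                    {
                        "Function": {
                            "input": [{"Column": "a"}],
                            "function": "DropNulls",
                            "options": {
                                "check_lengths": True,
                                "flags": "ALLOW_GROUP_AWARE | ALLOW_EMPTY_INPUTS | ROW_SEPARABLE",
                            },
                        }
                    }
                ],
                "function": {"Skew": True},
                "options": {
                    "check_lengths": True,
                    "flags": "ALLOW_GROUP_AWARE | RETURNS_SCALAR",
                },
            }
        },
        "alias_name",
    ]
}


def iter_function_flags(obj: Any) -> Iterator[Tuple[str, str]]:
    """Yield `(function, flags)` pairs, outermost first (hypothetical helper)."""
    if isinstance(obj, dict):
        fn = obj.get("Function")
        if isinstance(fn, dict):
            func = fn["function"]
            # Some functions appear as a one-key dict, e.g. {"Skew": True} above.
            name = func if isinstance(func, str) else next(iter(func))
            yield name, fn["options"]["flags"]
        for value in obj.values():
            yield from iter_function_flags(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_function_flags(item)


for name, flags in iter_function_flags(serialized):
    print(f"{name}: {flags}")
```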

roundtrip_nw

To produce this I removed the outer ", " and pasted back to ruff to format

The overall shape is very similar and the deviations from polars have been documented

Alias(
    expr=FunctionExpr(
        function=Skew(),
        input=[
            FunctionExpr(
                function=DropNulls(),
                input=[
                    FunctionExpr(
                        function=Abs(),
                        input=[
                            FunctionExpr(
                                function=Shift(n=1),
                                input=[
                                    FunctionExpr(
                                        function=FillNull(),
                                        input=[
                                            Sort(
                                                expr=Column(name="a"),
                                                options=SortOptions(
                                                    descending=False, nulls_last=False
                                                ),
                                            ),
                                            Literal(
                                                value=ScalarLiteral(
                                                    dtype=Unknown, value=1
                                                )
                                            ),
                                        ],
                                        options=FunctionOptions(
                                            flags="ALLOW_GROUP_AWARE | ROW_SEPARABLE | LENGTH_PRESERVING"
                                        ),
                                    )
                                ],
                                options=FunctionOptions(
                                    flags="ALLOW_GROUP_AWARE | LENGTH_PRESERVING"
                                ),
                            )
                        ],
                        options=FunctionOptions(
                            flags="ALLOW_GROUP_AWARE | ROW_SEPARABLE | LENGTH_PRESERVING"
                        ),
                    )
                ],
                options=FunctionOptions(flags="ALLOW_GROUP_AWARE | ROW_SEPARABLE"),
            )
        ],
        options=FunctionOptions(flags="ALLOW_GROUP_AWARE | RETURNS_SCALAR"),
    ),
    name="col->sort->fill_null->shift->abs->drop_nulls->skew->alias",
)

@dangotbanned

This comment was marked as resolved.

dangotbanned added a commit that referenced this pull request May 23, 2025
Can't tell if this means `FirstT` will match the entry `firstt`, but preserve the `firstt` fix (https://github.com/codespell-project/codespell#ignoring-words)

(#2572 (comment))
Comment thread .pre-commit-config.yaml Outdated
@dangotbanned
Member Author

I should've expected this, but it was a nice surprise to find we get hashable selectors for free 😄

from narwhals._plan import selectors as ndcs

>>> ndcs.matches("[^z]a")._ir == ndcs.matches("[^z]a")._ir
True

>>> ndcs.matches("[^z]a")._ir == ndcs.matches("abc")._ir
False

@MarcoGorelli regarding (#2291)

from narwhals._plan import selectors as ndcs

>>> ndcs.all()._ir == ndcs.all()._ir
True

lhs = ndcs.all()
rhs = ndcs.all().mean()

>>> lhs._ir == rhs._ir
False

>>> lhs._ir == rhs._ir.expr
True

And the same holds for the non-selector all 🥳

from narwhals._plan import demo as nwd

lhs = nwd.all()
rhs = nwd.all().mean()
>>> lhs._ir == rhs._ir
False

>>> lhs._ir == rhs._ir.expr
True

>>> type(rhs._ir)
narwhals._plan.aggregation.Mean
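
As a rough illustration of why structural equality and hashing come for free once nodes hold only immutable data, here is a sketch using frozen dataclasses. `Matches` and `Mean` below are hypothetical stand-ins; the PR's nodes are not necessarily built this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Matches:
    """Hypothetical selector node holding only immutable data."""

    pattern: str


@dataclass(frozen=True)
class Mean:
    """Hypothetical aggregation node wrapping another node."""

    expr: object


lhs = Matches("[^z]a")
rhs = Mean(expr=Matches("[^z]a"))

print(lhs == Matches("[^z]a"))  # True
print(lhs == rhs)  # False
print(lhs == rhs.expr)  # True
# Equal nodes hash the same, so they collapse in a set.
print(len({lhs, Matches("[^z]a"), Matches("abc")}))  # 2
```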

Comment thread pyproject.toml
dangotbanned added a commit that referenced this pull request May 24, 2025
dangotbanned added a commit that referenced this pull request May 26, 2025
Comment thread tests/plan/expr_parsing_test.py Outdated
Comment on lines +85 to +110
def test_valid_windows() -> None:
    """Was planning to test this matched, but we seem to allow elementwise horizontal?

    https://github.com/narwhals-dev/narwhals/blob/63c8e4771a1df4e0bfeea5559c303a4a447d5cc2/tests/expression_parsing_test.py#L10-L45
    """
    ELEMENTWISE_ERR = re.compile(r"cannot use.+over.+elementwise", re.IGNORECASE)  # noqa: N806
    a = nwd.col("a")
    assert a.cum_sum()
    assert a.cum_sum().over(order_by="id")
    with pytest.raises(InvalidOperationError, match=ELEMENTWISE_ERR):
        assert a.cum_sum().abs().over(order_by="id")

    assert (a.cum_sum() + 1).over(order_by="id")
    assert a.cum_sum().cum_sum().over(order_by="id")
    assert a.cum_sum().cum_sum()
    assert nwd.sum_horizontal(a, a.cum_sum())
    with pytest.raises(InvalidOperationError, match=ELEMENTWISE_ERR):
        assert nwd.sum_horizontal(a, a.cum_sum()).over(order_by="a")

    assert nwd.sum_horizontal(a, a.cum_sum().over(order_by="i"))
    assert nwd.sum_horizontal(a.diff(), a.cum_sum().over(order_by="i"))
    with pytest.raises(InvalidOperationError, match=ELEMENTWISE_ERR):
        assert nwd.sum_horizontal(a.diff(), a.cum_sum()).over(order_by="i")

    with pytest.raises(InvalidOperationError, match=ELEMENTWISE_ERR):
        assert nwd.sum_horizontal(a.diff().abs(), a.cum_sum()).over(order_by="i")
Member Author

@MarcoGorelli quick question

This is adapted from an existing test:

tests.expression_parsing_test.test_window_kind

@pytest.mark.parametrize(
    ("expr", "expected"),
    [
        (nw.col("a"), 0),
        (nw.col("a").mean(), 0),
        (nw.col("a").cum_sum(), 1),
        (nw.col("a").cum_sum().over(order_by="id"), 0),
        (nw.col("a").cum_sum().abs().over(order_by="id"), 1),
        ((nw.col("a").cum_sum() + 1).over(order_by="id"), 1),
        (nw.col("a").cum_sum().cum_sum().over(order_by="id"), 1),
        (nw.col("a").cum_sum().cum_sum(), 2),
        (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()), 1),
        (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()).over(order_by="a"), 1),
        (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum().over(order_by="i")), 0),
        (
            nw.sum_horizontal(
                nw.col("a").diff(), nw.col("a").cum_sum().over(order_by="i")
            ),
            1,
        ),
        (
            nw.sum_horizontal(nw.col("a").diff(), nw.col("a").cum_sum()).over(
                order_by="i"
            ),
            2,
        ),
        (
            nw.sum_horizontal(nw.col("a").diff().abs(), nw.col("a").cum_sum()).over(
                order_by="i"
            ),
            2,
        ),
    ],
)
def test_window_kind(expr: nw.Expr, expected: int) -> None:
    assert expr._metadata.n_orderable_ops == expected

AFAICT, all of the expressions I've needed an InvalidOperationError for shouldn't be valid.

But they aren't raising in current narwhals 🤔

1

import narwhals as nw

a = nw.col("a")
a.cum_sum().abs().over(order_by="id")

This error explicitly mentions abs

if self.is_elementwise or self.is_filtration:
    msg = (
        "Cannot use `over` on expressions which are elementwise\n"
        "(e.g. `abs`) or which change length (e.g. `drop_nulls`)."
    )
    raise InvalidOperationError(msg)

2, 3, 4

These are all raising the same as (1), but the issue seems to be that horizontal functions aren't being treated as elementwise

import narwhals as nw

a = nw.col("a")
nw.sum_horizontal(a, a.cum_sum()).over(order_by="a")
nw.sum_horizontal(a.diff(), a.cum_sum()).over(order_by="i")
nw.sum_horizontal(a.diff().abs(), a.cum_sum()).over(order_by="i")

In polars, they all seem to be elementwise but with an additional flag

https://github.com/pola-rs/polars/blob/944bf553f1111c31259b55348a7dd0a512ae51a1/crates/polars-plan/src/dsl/function_expr/mod.rs#L1388-L1392

I've done the same in this PR, but I don't think that flag would factor into this?

class SumHorizontal(Function):
    @property
    def function_options(self) -> FunctionOptions:
        return FunctionOptions.elementwise().with_flags(
            FunctionFlags.INPUT_WILDCARD_EXPANSION
        )
    def __repr__(self) -> str:
        return "sum_horizontal"
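
To illustrate why the extra flag should not change the outcome, here is a hedged sketch of an `over` guard keyed only on the elementwise bits. Everything in it is hypothetical: `Flags`, `check_over`, and the local `InvalidOperationError` are stand-ins, and the flags assigned to `cum_sum` (length-preserving but not row-separable) are an assumption rather than something confirmed in this thread.

```python
import enum


class Flags(enum.Flag):
    """Hypothetical stand-in reusing the bit layout from this PR."""

    ALLOW_GROUP_AWARE = 1 << 0
    INPUT_WILDCARD_EXPANSION = 1 << 4
    ROW_SEPARABLE = 1 << 8
    LENGTH_PRESERVING = 1 << 9


ELEMENTWISE = Flags.ROW_SEPARABLE | Flags.LENGTH_PRESERVING


class InvalidOperationError(Exception):
    """Local stand-in for the narwhals exception of the same name."""


def check_over(flags: Flags) -> None:
    # Hypothetical guard: reject `over` when the outermost function is elementwise.
    if ELEMENTWISE in flags:
        msg = "Cannot use `over` on expressions which are elementwise"
        raise InvalidOperationError(msg)


sum_horizontal = Flags.ALLOW_GROUP_AWARE | ELEMENTWISE | Flags.INPUT_WILDCARD_EXPANSION
cum_sum = Flags.ALLOW_GROUP_AWARE | Flags.LENGTH_PRESERVING  # assumed flags

check_over(cum_sum)  # passes: not row-separable, so not elementwise
try:
    check_over(sum_horizontal)
except InvalidOperationError as exc:
    print(exc)
```

Under these assumptions the containment check ignores `INPUT_WILDCARD_EXPANSION` entirely, so a horizontal function flagged as elementwise is rejected the same way as `abs`.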

Comment thread narwhals/_namespace.py
Comment thread tests/plan/expr_parsing_test.py Outdated
* fix: Align `BinarySelector` repr w/ polars

Thought it looked a bit weird in a doctest
Turns out they use `()` in `Expr::BinaryExpr`,
but never in `Selector::*`

- https://github.com/pola-rs/polars/blob/7fc9f1875714fe9893c4d849b9593c1e4db1e854/crates/polars-plan/src/dsl/format.rs#L87
- https://github.com/pola-rs/polars/blob/7fc9f1875714fe9893c4d849b9593c1e4db1e854/crates/polars-plan/src/dsl/selector.rs#L641-L644

* docs: Explain `SelectorIR.to_dtype_selector`

Towards #3497

* test: Fix `issubclass` coverage

Not sure why this only came up recently

* refactor: Rename `_matches` -> `_matches_dtype`

* feat: Make `Empty` a concrete selector

* perf: Add `SelectorIR.invert` simplification

Couldn't get coverage for `AllDType`, `EmptyDType`

* chore: give up on flaky cov

* docs: Explain `SelectorIR.matches`

Towards #3497

* chore(typing): Align `Series.sum` return with new polars

pola-rs/polars#26629

* refactor: `iter_expand_names` -> `iter_expand_selector`

Documenting this is one of the last selectors parts in #3497

May as well pick the name first

* refactor: Simplify `expand_selectors` + friends

* docs: Explain `SelectorIR.iter_expand_selector`

Towards #3497

Adapted from https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.expand_selector

* feat(typing): Accept `Mapping[str, DType]` in `iter_expand_selector`

* perf: Cache imports from `into_version`

+ finish the partial API + use it everywhere

* docs: Align `BinaryExpr` with `BinarySelector`

* refactor: Move `iter_output_name` from `RootSelector` -> `ByName`

Wasn't possible in the (earlier) ADT version

* docs: Explain `Column`, `All`, `ByName`, `ByIndex`

Towards #3497

Highlights how this is based on the updated `polars` internals (pola-rs/polars#23351)

* docs: Use "Arguments" some more

Towards #3497

- pylance added support recently (can't find when) for
the text showing in both `__init__` and on attribute access
- there's still some larger docs I wanna keep on the attributes *for now*

* docs: Explain `SelectorIR`

Towards #3497

getting there indeed

* chore: Mention known selectors gaps

The time it would take to add tests is the only thing blocking these

* chore: Address exception todos

* test: Prepare for new combination expansion

- Planning to partially revert (#3029 (comment))
- I made the wrong call on `when`
- Still prefer the deviation for the other nodes

* revert: "disallow multi-output in when (for now)"

(b96dfd7)

* feat: Support combination expansion in `when`

Related: (90def5f),
(8303f70)

- Happy with it feature-wise
- Implementation + docs need more polish

* refactor: 2nd pass on `iter_expand_by_combination`

* perf: Add fastpath for single many combination

- Avoids the double zipping
- Covers the only valid expansion on `main`
  - + allows the expansion on a leaf

* test: Extra coverage for `ExprTraverser.names` cache

- Per-class (`{Binary,Ternary}Expr`), a cache hit can come from any instance
- This triggers another `(1, M, M)` case

* docs: Explain `has_multiple_outputs` behavior

See pola-rs/polars#23708

* refactor: Move, rename `seen_multi`

* abandon indices idea

added too much complexity for something that only avoids a 2-3 string list

* refactor: Make combination error self-documenting

+ display expansion sizes in the intuitive order

* refactor: 3rd pass on `iter_expand_by_combination`

Muuuuuch easier to read now

* refactor: Don't use a set, when it can only have 1 member

* docs: Start explaining combination expansion

Towards #3497
Related to #3029 (comment)

* refactor: Remove the dedicated `FillNan` + support in arrow

- No need for it now `when` accepts selectors in any position
- The impl is identical in
  - https://github.com/narwhals-dev/narwhals/blob/ca85e68dccbbba915d2f6c54483d48521ff91d3a/narwhals/_plan/arrow/functions/_multiplex.py#L186-L194
- Can still use that path for `ArrowSeries`

* chore: Make `by_{name,index}` reprs less noisy

- Defaults are omitted from repr
  - polars does this in a lot of places
  - I think it makes a lot of sense here since these are created *mostly* indirectly
- `__str__` still shows them in full

* fix(typing): Ensure `ExprNode` docstrings are visible

- Noticed while trying to write docs in `ExprTraverser`
- Quite a tricky problem to solve
- The union of concrete classes produced multiple signatures
- Landed on last solution because `SingleExpr.is_scalar` wasn't in the protocol
  - It didn't need to be there
  - New typing narrows just fine

* chore: More planning expansion docs

Towards #3497

Ideally there will be some (contextually relevant) bits
sprinkled in all over

* docs: Example for `ExprIR.iter_expand`

Towards #3497

* docs: Explain `Expander.iter_expand_expressions`

Towards #3497

* refactor: Remove `Expander.inner`

Farewell my short lived friend

* refactor: Move `iter_output_name` root semantics to `ExprTraverser`

Makes selectors + renaming the special-cases,
(rather than root nodes)

* chore: Add note on `RollingExpr` removal

* chore: De-prioritize `FunctionExpr` integration

Experimented a bit, but was becoming a time-sink

* docs: Explain expansion in `ExprNode`

Towards #3497
Renamed methods after finally finding
something that describes the relationship

* docs: Explain `ExprTraverser.iter_expand`

Towards #3497

Gonna save examples for `ExprIR.iter_expand`

* docs: "leaf" -> "branch"

* docs: Explain `ExprIR.iter_expand`

Towards #3497

* chore: Temp improve `Expr` repr

Remembered this lil idea #3213 (comment)

* fix: Avoid creating binary selectors in `fill_nan`

Quite a goof there!

I missed that my test added `as_expr` on the selector case.
Thrown in more selectors to be sure

* docs: Explain `ExprTraverser.iter_expand_by_combination`

Towards #3497

* chore: Add `IsScalar` elementwise note

Need to get this done, but not just yet

* refactor: Skip passing empty `ignored` for selector-only expansion

Only needed when coming from `prepare_projection` - which this path never does

* chore(typing): Widen `prepare_projection` from `Sequence`

* docs: Nit `parse_expand_selectors`

That detail was more important when collection logic wasn't inside `Expander`

* chore: Move and explain `expressions_to_schema`

Related to #3497

* docs: Polish `prepare_projection`, `expand_selectors`

Towards #3497

* refactor: `expressions_to_schema` -> `FrozenSchema.select_resolved`

2 birds

* refactor: Tighten up `_expansion` API boundaries

* docs: Explain `Expander`

Towards #3497

* test: Shrink some boilerplate

* fix: Reject non-length-preserving in `sort_by`

- Adds `ExprIR.is_length_preserving`
- Can integrate it more closely as a follow-up

* fix: Accept length-preserving, non-elementwise in binary expressions

`_is_filtration` is still incomplete,
but this is correct for `FunctionExpr` now at least

* fix: Don't mark `over` as non-length-preserving

Not sure why `polars` does that, but it doesn't reject the same expression

* fix: Reject length-changing in binary expressions

I'm sure there's a reason I'm missing, but currently baffled by `changes_length`, `is_length_preserving`, `is_scalar`

* perf: Simplify `FunctionExpr.changes_length`

- Previously
    - (per-instance) had a worst-case of 2x `Flag.__contains__` + `not`
- Now
    - the target (`_CHANGES_LENGTH`) is evaluated as a global
    - (per-instance) is a single `frozenset.__contains__`, which is cheaper than anything for `Flag`

* chore: Explain weirdness in `FunctionFlags.__str__`

* refactor: Split out `function_expr.py`

- Prep for documenting in #3497
- Also scopes the blanket `[misc]` ignore, so we still get that reported in `expr.py`

* refactor: Remove `RollingExpr`

- It wasn't consistent with `polars`
  - Uses that name to represent `Expr.rolling`
- I added it quite early (before dispatch)
  - `CumAgg` works fine without `CumExpr`
- The other `FunctionExpr`s have more motivation than grouping

* refactor: Deduplicate range validation

* chore: Add `{Function,FunctionFlags}.is_length_preserving`

* docs(DRAFT): Add some `FunctionExpr` basics

- (Eventually) towards #3497
- Need to tidy up the validation logic first

* feat: Add arity concept to `Function`

- So far this just cleans things up
- Fully utilizing it for expression dispatch should shrink things a lot
  - And be reusable across backends without subclassing

* docs: Explain `Parameters`

Towards #3497

* chore: todos

* feat(DRAFT): Use `Parameters` for dispatch

- (Mainly) performing some associated type *black magic*
- Did a few replacements for coverage
- Expect a lot more to change

* refactor: Limit dependencies on `.version`

* chore(typing): Use `HorizontalExpr` in compliant

* chore(typing): `FrameT_contra` -> `FrameT`

Can be updated in other places, overall dislike this lint
Doesn't compose well with non-single-letter variables

* feat: Add `FunctionExpr.dispatch_args`

Builds on the typing from (a09be54)
Avoids the need for navigating through `parameters`
and then passing `node` back in

* feat(typing): Restricted `dispatch_arg` for `Unary`

`mypy` seems to not understand the rest, but does handle the negative part of this

* chore(typing): bump `mypy==1.20.1`, still no fancy fix 😭

Haven't figured out a solution yet, currently got 27 errors

* todo

* fix(typing): Less broken mypy

* fix: Handle multi-inheritance of `DispatchOptions`

Found out it was broken when trying a fix for (c4bcea0)

* refactor: Add `Function` subclasses for arity

This should be much easier from a typing perspective
Unexpected bonus was some safety for `Expr._with_unary`

* fix(typing): Help mypy with `FunctionExpr[UnaryFunction]`

* refactor: Replace associated types with overloads

Well, I had to try at least

* feat(pyarrow): Add `unary` factory

Need to finish porting over `_unary_function`

* chore(pyarrow): Transition more to `unary`

* refactor: Skip straight to `node.function`

* ci(ruff): ignore some more names

forgot to push this a while ago oops

* chore: remove todo

* chore(DRAFT): Add an accessor version of `unary`

Featurewise, this is pretty close
Left a lot of stuff to deduplicate

* refactor(typing): Generalize from `UnaryFunction` -> `ExprIR`

* give up

* fix: stop requiring keyword-support in `CompliantExpr`

- This allows the use of `Callable` instead of callback protocols
- `Callable` has more special-casing in type checkers
- It does not restrict impls to use positional-only
  - Just defines that they will be passed them by position
  - Which is fine, they're three parameters of different types

* refactor: Remove some noise from signatures

* fix: Remove some more required keyword support

* chore: More `FunctionExpr` docs prep

Towards #3497

* refactor: More `unary` usage

* refactor(DRAFT): Rethink `dispatch`, `version`

The dispatching part is done
Still got a ways to go on factoring out `version` as state

* refactor: Make `version` a classvar for compliant

Part 2 of (e118036)

Huge diff as I got close to fixing the variance issue, but not quite

* chore: misc cleanup

* feat(typing): More covariance

* chore: remove `version`

* feat: More progress on versions packages

* chore: Add `Compliant{Expr,Scalar}.native`

christ that took long to fix the typing

* docs: Show type parameters on hover

Lots of stuff to fix, need more visibility of the issues

* refactor: Avoid referencing `CompliantSeries`

Towards getting covariance in more places

* fix(typing): Support defaults for `Scalar`

Had to move the `len` definition out, since `Expr.from_python` isn't defined

* chore: cleanup some experiments

* tidier

* docs: Remove outdated docs

* refactor: Use native types in `io` protocols

* woops, forgot that

* feat(DRAFT): Everything is a plugin

* test: commit that already

* chore: typos

* chore: imports lint

* more typos

* chore: not banned anymore

* ci: pin `pyarrow<24`

Related #3560, #3561

* feat: Implement plugin discovery

Continued from (f82f351)

* chore: cov

* go away

* fix(typing): Include `PolarsPlugin` in typing

* feat(typing): Add `load_plugin` overloads and test them

* rename, test, explain: `Plugin` dependency guards

Resolves the overlap with `EntryPoint.load`, which is about the plugin import itself

- `is_loaded`    -> `is_imported`
- `is_available` -> `can_import`
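One way the two guards could be written with the standard library. This is a sketch assuming they take a module name; the real methods live on `Plugin` and may differ:

```python
import importlib.util
import sys


def is_imported(module_name: str) -> bool:
    """True if the module has already been imported in this process."""
    return module_name in sys.modules


def can_import(module_name: str) -> bool:
    """True if the module could be imported, without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package or a module without a spec is treated as unavailable.
        return False
```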

* feat: More `_entry_points` validation

- `load_plugin` is only a temporary api
- need to share the validation with some alternative while I build them out

* feat(typing): Add versioning to `__narwhals_classes__`

* feat: Add `can_{eager,lazy}`

* refactor: `unsupported_backend_operation_error` -> `unsupported_error`

* explore replacing `LazyFrame.collect` w/ plugins

incredibly rough, spent way too long trying to get mypy to work but no dice

* prep to remove `PluginAny`

* chore: move `sys_modules_targets` to `Plugin`

* fix(typing): Replace `PluginAny`

I made everything covariant, but then expected it to work contravariantly
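A small reminder of the distinction, with illustrative classes (`Producer`, `Consumer` are not from the PR). Covariant parameters belong in output positions, contravariant ones in input positions:

```python
from typing import Generic, TypeVar

T_co = TypeVar("T_co", covariant=True)
T_contra = TypeVar("T_contra", contravariant=True)


class Producer(Generic[T_co]):
    """Only returns `T_co`, so `Producer[bool]` is usable as `Producer[int]`."""

    def __init__(self, value: T_co) -> None:
        self._value = value

    def get(self) -> T_co:
        return self._value


class Consumer(Generic[T_contra]):
    """Only accepts `T_contra`, so `Consumer[int]` is usable as `Consumer[bool]`."""

    def accept(self, value: T_contra) -> str:
        return type(value).__name__


def read_int(producer: Producer[int]) -> int:
    return producer.get()


def feed_bool(consumer: Consumer[bool]) -> str:
    return consumer.accept(True)
```

A type checker accepts `read_int(Producer(True))` and `feed_bool(Consumer[int]())`; swapping the variance of either class makes the corresponding call an error.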

* chore: remove `reveal_type`

* fix(typing): Avoid 1 `[var-annotated]`

Better than nothing

* refactor: Remove housekeeping from `compliant.plugins`

* refactor(typing): now those names are available

* feat: Integrate `__narwhals_classes__` versioning

Happy enough with this as a proof-of-concept
Need to start wrapping this up in some classes

* refactor: `Plugin.plugin_name` -> `Plugin.name`

* refactor: `sys_modules_targets` -> `requirements`

* feat(DRAFT): Add a plugin manager

Nothing too fancy, just avoiding repeating some work
and having consistent errors

* chore: Use tuples for `__all__`

can fold them and see on hover now

* chore: Expose `load_plugin`

* refactor: Move a bunch of types to `typing`

* fix(typing): Preserve class versions in most cases

Eventually need to do this with less complexity

* farewell `TemporaryPluginsType`

* chore: let's get repr'd

* refactor: rename to `PluginManager`

* test: cover `import_modules`

* refactor: Make the impl of `can_import` use the cache

* feat: Implement `known`, `imported`, `importable`

* sketch out `is_native_dataframe`

Needs more work before replacing `translate.from_native_*`

* add `SeriesV2`

* just once will do

* refactor: reuse `hasattrs_static`

* fix(typing): More fiddling with guards

* refactor: Move into a package

* feat(DRAFT): Add a "parsed" plugin representation

The big idea is to parse, don't validate
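A sketch of what "parse, don't validate" can look like here. `ParsedPlugin` and `parse_entry` are hypothetical stand-ins, not the actual `PluginIR`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedPlugin:
    """Hypothetical parsed form: holding an instance means the checks already passed."""

    name: str
    module: str
    attr: str


def parse_entry(name: str, value: str) -> ParsedPlugin:
    """Turn a raw `module:attr` string into a structured, trusted object."""
    module, sep, attr = value.partition(":")
    if not (name and module and sep and attr):
        msg = f"Expected 'module:attr' for plugin {name!r}, got {value!r}"
        raise ValueError(msg)
    return ParsedPlugin(name, module, attr)
```

Downstream code takes `ParsedPlugin` rather than raw strings, so it never needs to re-check the format.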

* chore(typing): Close the `str` gap

* parse into `PluginIR` on load

* chore: easy repr

* fix: Ensure both destructive paths parse plugins

* feat: Populate plugin registry with accessors

* feat: Add `PluginManager.dataframe`

Quite happy with the ergonomics of this 😅

* refactor: Wrangle version strings

* feat: Add `PluginManager.{lazyframe,series,evaluator}`

Actually needed to change a lot more to fix the tests

* refactor: Replace `_namespace.evaluator` with `PluginManager.evaluator`

* chore: Remove `compliant.package`

Worked well as an experiment and led to something I quite like now

* feat(typing): Add basic overloads on `PluginManager.<class-name>`

* test: Cover `is_native_dataframe`

* feat(typing): Add `PluginManager.plugin` overloads

* chore: Move `import_classes` out of runtime and deprecate

* refactor: Clean up accessors, rename

* feat(DRAFT): Universal `@singledispatch`????

* chore(typing): Reuse `compliant.typing` aliases

* refactor(typing): `IntoBackendExt` -> `IntoPlugin`

* refactor: `_backend_to_plugin_name` -> `_plugin_name`

* refactor: Remove `require`

Only had one use left since adding the registry

* chore: Update notes

* docs: Explain some of `PluginManager`

... and rename `_get_class` -> `_import_class`

* feat: Use registry dispatch for `Series.from_native`

* feat: Use registry dispatch for `DataFrame.from_native`

* fix(typing): Remove default from `NativeFrameT_co`

Was periodically causing these warnings:
# `PluginManager.dataframe` (overload 1)
> Could not specialize type "DataFrame[NativeDataFrameT_co@DataFrame, NativeSeriesT_co@DataFrame]"
>   "NativeFrame" is not assignable to "DataFrame"

# `PluginManager.dataframe` (overload 2)
> Could not specialize type "DataFrame[NativeDataFrameT_co@DataFrame, NativeSeriesT_co@DataFrame]"
>   "NativeFrame" is not assignable to "Table"

* feat: Use registry dispatch for `LazyFrame.from_native`

* fix: Make `backend`, `version` required in `ScanFile`

* chore(typing): replace `typing_can_eager_lazy_integration`

That's 1/2 done, then I can exorcise `import_classes`

* chore: Remove everything `import_classes`-related

* chore: move repr

* fix: Pre-emptively avoid toctou

* refactor: Make `from_native` less hacky

* refactor: Simplify `from_iterable`

* refactor: Simplify `from_dict`

* refactor: Simplify `concat_series*`

* refactor: Simplify `concat_df*`

* refactor: Simplify eager IO

* refactor: Simplify lazy IO

* perf: Use `__slots__` across the entire `Compliant*`-level

Only 2 classes have issues to resolve (and 1 is named `Resolve` 😉)
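For context, a generic illustration of what `__slots__` changes (the class names are made up, not the `Compliant*` classes themselves):

```python
class WithDict:
    def __init__(self, native: object) -> None:
        self.native = native


class WithSlots:
    # No per-instance `__dict__`; only the listed attributes can be set.
    __slots__ = ("native",)

    def __init__(self, native: object) -> None:
        self.native = native


def rejects_unknown_attribute(obj: object) -> bool:
    """Return True if assigning an undeclared attribute raises `AttributeError`."""
    try:
        obj.unexpected = 1  # type: ignore[attr-defined]
    except AttributeError:
        return True
    return False
```

The "issues to resolve" are typically classes that rely on a `__dict__`, for example via cached properties or multiple inheritance from slotted bases.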

* fix: Remove unused `NativeDataFrameT_co` from `EagerNamespace`

* refactor: Simplify eager range constructors

* refactor: Remove `eager_implementation`

* refactor: Remove `known_implementation`

* refactor: Move `namespace`

* refactor: Simplify `read_{csv,parquet}_schema`

Annoying that these don't fit in with the rest of the model yet
They really do need a separate API though,
since `pyarrow` can provide this natively for `pandas` as well

* chore: Remove superseded `unsupported_error`

* refactor: Rename `len` -> `len_star` at compliant level

Using the same name for both `ns.len` and `expr.len` was keeping the need for `Namespace` alive

* docs: Remove outdated `Lit.is_scalar` example

* refactor: Actually rename `Column`, `Len`

Left aliases behind, since `Column` is referenced literally everywhere

* refactor: Make `lit_series` a classmethod

* refactor: Make `len_star` a classmethod

* refactor: Make `lit` a classmethod

* refactor: Make `col` a classmethod

* typo

* chore: deprecate `namespaced`

* chore(typing): Add typing to `constructor` binding

* refactor: Move `concat_str` from namespace

* refactor: Move `mean_horizontal` from namespace

* refactor: Move all horizontal functions from namespace

* refactor: remove `namespace` helper

* refactor: Introduce `CompliantColumn`

Quite relieved with all the overrides that are gone now

* fix: CONTRAVARIANCE ONCE AGAIN!!!!

* refactor: Move `int_range` from namespace

* refactor: Move `date_range`, `linear_space` from namespace

* chore: Remove more traces of namespace

* chore(typing): Prep new dispatch typing

* fix(typing): Avoid LSP visibility bug

* chore: remove not planned

* refactor: Remove `ExprIR.__init_subclass__(dispatch="no_dispatch")`

Pretty deep into a refactor right now, but this part can go in early

* refactor: Move `DispatchOptions` merge

* perf: Zero-cost developer hints

* chore: all constructors are final

* remove `"no_dispatch"` there too

* refactor: Remove `LiteralExpr`

I'm adding a common base for `Lit`, `LitSeries`, `Col`, `LenStar`
This was just in the way

* stop being fancy and use classes!

* chore: Update more refs to dispatch

* refactor: just keep decoupling

* refactor: move `pascal_to_snake_case`
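A possible shape for such a helper, written from the name alone; the real implementation may handle edge cases differently:

```python
import re

# Split before a capitalized word ("LenStar" -> "Len_Star") ...
_WORD = re.compile(r"(.)([A-Z][a-z]+)")
# ... and between a lowercase letter or digit and an uppercase run ("ExprIR" -> "Expr_IR").
_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")


def pascal_to_snake_case(name: str) -> str:
    """Convert a PascalCase class name to snake_case, keeping acronyms together."""
    partial = _WORD.sub(r"\1_\2", name)
    return _BOUNDARY.sub(r"\1_\2", partial).lower()
```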

* refactor: Expression dispatch w/o `*Namespace`

- Truly the most insane commit (apologies future reader)
- Changes a very core assumption
  - that everywhere can access `__narwhals_namespace__`

* chore: remove more namespace

* chore: typos

* revert: modin filter warning

* surely not, right?

maybe fixes:
> TypeError: Parameters to generic types must be types. Got <narwhals._plan.translate.ParamSpec object at 0x7f72e4cf4dc0>.

* well, how about this?

> TypeError: type.__new__() takes exactly 3 arguments (0 given)

* empty parametrize?

> fixture 'lazyframe' not found
> fixture 'lazy' not found

* test: more empty fixtures plz

* plz

* pin pyright

commit 9d015e2031acd5e10404da72a7d698c632446c49
Author: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
Date:   Sat May 16 09:51:39 2026 +0100

    release: Bump version to 2.21.2 (#3630)

commit 8674e44f657110b45aedffb8c1343d5b0505bb01
Author: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
Date:   Sat May 16 09:37:56 2026 +0100

    release: Bump version to 2.21.1 (#3628)

commit 739dc579adfd38a75764d6d2c09ed42b0f2e887a
Author: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
Date:   Sat May 16 09:35:33 2026 +0100

    ci: remove `downstream_tests_slow` (#3629)

commit 1f36279bb07203628ddd18a691cf98d4c78780d3
Author: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
Date:   Sat May 16 09:17:10 2026 +0100

    Revert "fix: Allow `float('nan')` as value in join for duckdb (#3555)" (#3627)

    This reverts commit 0d7f352.

commit 1e50d020e3ca5e5c5dfe3486e53190e87c60b66c
Author: Pedro <pedro.villanueva@booking.com>
Date:   Fri May 15 13:05:01 2026 +0200

    [Enh]: Add the negation unary operator for expressions and series (#3625)

    negation unary operator for expressions and series

commit 37ea7953f6b455736bf6ae353ca257f468b3b083
Author: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
Date:   Wed May 13 10:38:42 2026 +0200

    ci: Unpin "temporary" CI pins (#3618)

    * ci: Unpin 'temporary' CI pins

    * rollback formulaic and pointblank

commit 4ff3a1f545924e5e38e76c4d8a71de2010a2f4d6
Author: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
Date:   Tue May 12 19:11:11 2026 +0200

    chore: Prepare for future pandas inplace deprecation (#3616)
