Sort categorical columns by category order in DataTables by Copilot · Pull Request #501 · mwouts/itables

Copilot · 2026-02-22T15:42:05Z

Categorical columns (pandas CategoricalDtype, polars Categorical/Enum) were sorted alphabetically as strings by DataTables, ignoring the defined category order.

Approach

Categorical columns are encoded as integer ranks. Each such column gets its own DataTables columnDefs entry with a render function that uses the integer for sorting and maps it back to the category name for display via an embedded categories array. Categories are 1-indexed, and null/missing values get rank 0, so they sort first.

import pandas as pd
from itables import show

# Ordered categorical: low < medium < high
df = pd.DataFrame({
    "priority": pd.Categorical(
        ["high", "low", "medium"],
        categories=["low", "medium", "high"],
        ordered=True,
    )
})
# Now sorts as low → medium → high instead of alphabetically
show(df)

Set add_rank_to_categories=False to revert to alphabetical sorting for categorical columns. The default is "auto", which adds the rank only when the column has no render function already defined in columnDefs.
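
The encoding described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the PR: encode_ranks and render are hypothetical names, and render mirrors in Python what the JavaScript render function is described as doing.

```python
from typing import List, Optional, Sequence


def encode_ranks(values: Sequence[Optional[str]], categories: Sequence[str]) -> List[int]:
    """Map each value to its 1-indexed category rank; missing values get rank 0."""
    category_to_rank = {cat: i + 1 for i, cat in enumerate(categories)}
    return [0 if v is None else category_to_rank[v] for v in values]


def render(rank: int, categories: Sequence[str], type_: str) -> object:
    """Python mirror of the JS render: the rank when sorting, the category name otherwise."""
    if type_ == "sort":
        return rank
    return None if rank == 0 else categories[rank - 1]


categories = ["low", "medium", "high"]
values = ["high", None, "low", "medium"]

ranks = encode_ranks(values, categories)
# Sorting on the ranks and then rendering for display puts the missing value
# first, followed by the categories in their declared order.
displayed = [render(r, categories, "display") for r in sorted(ranks)]
```

Only integers travel in the table data in this scheme; the category names appear once, in the array embedded in the render function.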

Changes

datatables_format.py
- _format_pandas_series: categorical check moved to the end of the function (no fmt.format_array duplication); when add_rank_to_categories=True, returns just integer ranks (0 for null/NaN, 1..N for categories); when False, falls through to the generic string formatting path
- _format_polars_series: pl.Categorical/pl.Enum return integer ranks using to_physical(); null gets rank 0 (sorts first)
- _format_narwhals_series: nw.Categorical/nw.Enum build a 1-indexed rank map from cat.get_categories() and return integer ranks; null gets rank 0
- datatables_rows: categorical_columns parameter renamed to categorical_columns_to_be_represented_through_their_rank
javascript.py
- New _remove_columns_with_render_in_columndefs shared helper (eliminates duplication between float and categorical column filtering)
- get_float_columns_to_be_formatted_in_python: refactored to use the shared helper
- New get_categorical_columns_to_be_represented_through_their_rank() function returning dict[int, list] (column index → ordered categories); mirrors get_float_columns_to_be_formatted_in_python with "auto" mode support
- get_itable_arguments(): float columns keep a single shared columnDefs entry; each categorical column gets its own columnDefs entry with the ordered categories array embedded in the JS render function: var categories = [...]; return type === 'sort' ? data : (data === 0 ? null : categories[data - 1]);
- New add_rank_to_categories option (default "auto") controls whether categorical columns get rank-based sorting
options.py / typing.py
- Added add_rank_to_categories: Union[bool, Literal["auto"]] = "auto" global option
Tests
- Updated test_polars_df_with_categorical_and_enums to reflect the new integer-rank wire format
- Updated test_format_pandas_series and test_format_polars_series to pass add_rank_to_categories explicitly
- Added test_pandas_categorical_with_missing_values and test_polars_categorical_with_missing_values asserting null sorts first (rank 0)
- test_extension_arguments.py updated to permit columnDefs/keys_to_be_evaluated in output
Changelog
- Added entry under 2.7.1-dev

Original prompt

This section details on the original issue you should resolve

<issue_title>Categorical types should be sorted according to the category order</issue_title>
<issue_description></issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes Categorical types should be sorted according to the category order #498


Co-authored-by: mwouts <29915202+mwouts@users.noreply.github.com>

mwouts

Great work. Please iterate on my comments, and also add an entry in the changelog.

mwouts · 2026-02-22T20:22:03Z

+    """
+    Return the set of column indices that have categorical dtypes
+    """
+    if df_module_name in ["pandas", "numpy"]:


If the module name is numpy, we will not get categorical types, I assume? At least not pd.CategoricalDtype. We should just return an empty set in that case and not import pandas.

Fixed in a072ccc — get_categorical_columns now returns set() early for "numpy" without importing pandas.
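
A minimal sketch of the early return discussed here, assuming a pandas DataFrame (not a Series) in the pandas branch. The function name and arguments come from the diff; the body is an assumption and not the PR's implementation.

```python
def get_categorical_columns(df_module_name: str, df) -> set:
    """Return the indices of the columns that have a categorical dtype (sketch)."""
    if df_module_name == "numpy":
        # NumPy arrays have no categorical dtype: return early, without importing pandas
        return set()
    if df_module_name == "pandas":
        import pandas as pd  # imported lazily, only when a pandas object is passed

        return {
            i
            for i, dtype in enumerate(df.dtypes)
            if isinstance(dtype, pd.CategoricalDtype)
        }
    # Other dataframe libraries are out of scope for this sketch
    return set()
```

Keeping the pandas import inside the branch means the NumPy path works even when pandas is not installed.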

mwouts · 2026-02-22T20:22:33Z

            )
+        categorical_columns = get_categorical_columns(df_module_name, df)
+        if categorical_columns:
+            dt_args["columnDefs"] = (


This is the same logic as for floats. Maybe we could mutualize the code?

Mutualized in a072ccc — both float and categorical columns now share a single _RENDER_SORT_FUNCTION and are collected into a single extra_column_defs list before being prepended to columnDefs.

mwouts · 2026-02-22T20:23:16Z

+    # high=2, low=0, medium=1 according to category order
+    assert (
+        dt_args["data_json"]
+        == '[[["high", 2], 1], [["low", 0], 2], [["medium", 1], 3], [["low", 0], 4]]'


Very nice that we have the rank repeated, i.e. the two 'low' have rank 0, well done.
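
For reference, the [display, rank] wire format asserted in this revision of the test can be reproduced with the standard library alone. The second column (1 to 4) is assumed to be an ordinary integer column, and the variable names are illustrative.

```python
import json

categories = ["low", "medium", "high"]
priority = ["high", "low", "medium", "low"]
other = [1, 2, 3, 4]  # assumed: a plain integer column next to the categorical one

# 0-indexed rank of each category, as in this revision of the test
category_to_rank = {cat: i for i, cat in enumerate(categories)}

# Each categorical cell becomes a [display_value, rank] pair
rows = [[[p, category_to_rank[p]], o] for p, o in zip(priority, other)]
data_json = json.dumps(rows)
```

Both "low" cells map to the same rank because the rank depends only on the category, not on the row.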

mwouts · 2026-02-22T20:24:08Z

        return [escape_non_finite_float(v) for v in x]

+    # Categorical and Enum types - return [display_value, category_code] for sorting
+    if isinstance(dtype, (nw.Categorical, nw.Enum)):


I would prefer this code path to be optional, as we do for floats. Can you add a new option like add_rank_to_categories which defaults to True?

Added add_rank_to_categories (default True) in a072ccc. It follows the same pattern as format_floats_in_python: it's a global option in options.py, typed in ITableOptions, popped from kwargs in get_itable_arguments, and when set to False the category columns are passed as plain strings (no rank encoding, alphabetical DataTables sorting).


Copilot · 2026-02-22T20:31:39Z

Great work. Please iterate on my comments, and also add an entry in the changelog.

Done — all review comments addressed in a072ccc. Changelog entry added under "2.7.1-dev".

mwouts

Nice, more comments attached

mwouts · 2026-02-22T21:22:14Z

+                        for i in categorical_columns
+                    ],
+                    "render": _RENDER_SORT_FUNCTION,
+                }


It seems simpler to just replace float_columns_to_be_formatted_in_python with the union of the latter with categorical_columns in the current code?

Done in b5ee3e7 — float and categorical column sets are now unioned into a single columns_with_rank set, producing ONE columnDefs entry with the shared render function.
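
The union suggested above might look like the following sketch. RENDER_SORT_FUNCTION is a hypothetical stand-in for the shared render function, whose actual body is not shown in this thread, and rank_column_defs is an illustrative name.

```python
# Hypothetical stand-in for the shared render function: for a [display, rank]
# cell, use the rank when sorting and the display value otherwise.
RENDER_SORT_FUNCTION = (
    "function (data, type) { return type === 'sort' ? data[1] : data[0]; }"
)


def rank_column_defs(float_columns: set, categorical_columns: set) -> list:
    """Build at most one columnDefs entry covering both kinds of columns."""
    columns_with_rank = float_columns | categorical_columns
    if not columns_with_rank:
        return []
    return [{"targets": sorted(columns_with_rank), "render": RENDER_SORT_FUNCTION}]


column_defs = rank_column_defs({0, 2}, {1})
```

A single entry is enough here because, in this revision, float and categorical cells share the same [display, rank] shape.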

mwouts · 2026-02-22T21:23:02Z

 format_floats_in_python: Union[bool, Literal["auto"]] = "auto"

+"""
+Should categorical values be sorted according to the category order?


"sorted by DataTables" rather than "sorted"?

Fixed in b5ee3e7 — docstring now reads "Set this to False to sort categories alphabetically by DataTables."

mwouts · 2026-02-22T21:28:33Z


+def get_categorical_columns(
+    df_module_name: DataFrameModuleName,
+    df: DataFrameOrSeries,


Can you make this more similar to, or even better, mutualize the code with get_float_columns_to_be_formatted_in_python, which ensures that when format_floats_in_python="auto" (the default), columns which are specified in columnDefs don't see their data formatted?

To be more explicit: the default value for add_rank_to_categories should be "auto", which behaves like True if the column is not in columnDefs.

Implemented in b5ee3e7. add_rank_to_categories defaults to "auto". The function get_categorical_columns_to_be_represented_through_their_rank mirrors get_float_columns_to_be_formatted_in_python exactly: when "auto", it removes columns that have a render function defined in columnDefs.
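
The "auto" behavior can be sketched as below. Both function names are hypothetical, and the sketch assumes that every columnDefs entry lists its targets as a list of integer column indices, which is also what the assert in the diff expects.

```python
from typing import Union


def remove_columns_with_render(columns: set, column_defs: list) -> set:
    """Drop the columns that already have a 'render' in a columnDefs entry."""
    remaining = set(columns)
    for column_def in column_defs:
        if "render" not in column_def:
            continue
        targets = column_def["targets"]
        assert isinstance(targets, list), targets
        remaining -= set(targets)
    return remaining


def columns_to_rank(
    add_rank_to_categories: Union[bool, str],
    categorical_columns: set,
    column_defs: list,
) -> set:
    """Resolve the option into the set of columns that get rank-based sorting."""
    if add_rank_to_categories == "auto":
        return remove_columns_with_render(categorical_columns, column_defs)
    return set(categorical_columns) if add_rank_to_categories else set()


user_column_defs = [
    {"targets": [1], "render": "function (data) { return data; }"},
    {"targets": [2], "className": "dt-left"},
]
auto_columns = columns_to_rank("auto", {1, 2}, user_column_defs)
```

In "auto" mode, column 1 is left untouched because the user already supplied a render function for it, while column 2 only has a class name and still gets ranks.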

mwouts · 2026-02-22T21:32:49Z

+        if add_rank_to_categories:
+            codes = x.cat.codes.tolist()
+            n_categories = len(x.cat.categories)
+            sort_keys = [n_categories if c == -1 else c for c in codes]


I see that, at least for Polars, we put missing values first for floats (rank=0). Can you do the same here?

Please also add a test for a category with missing values, for both Polars and Pandas

Done in b5ee3e7. Null now gets rank 0 (sorts first) for all backends. Added test_pandas_categorical_with_missing_values and test_polars_categorical_with_missing_values explicitly asserting null rank = 0.
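
One way to move missing values from last to first, sketched on a plain list of codes. pandas reports a missing categorical value as code -1, so shifting every code by one yields rank 0 for nulls and 1..N for the categories. This shows the idea and is not necessarily the exact expression used in the PR.

```python
# Codes as pandas would report them via x.cat.codes.tolist(): -1 marks a missing value
codes = [2, 0, -1, 1]
n_categories = 3

# Earlier revision (from the diff above): missing values sort last
ranks_null_last = [n_categories if c == -1 else c for c in codes]

# Null-first alternative: shift every code by one, so that -1 becomes 0
ranks_null_first = [c + 1 for c in codes]
```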

mwouts · 2026-02-22T21:34:33Z

+        if escape_html:
+            formatted = [escape_html_chars(i) for i in formatted]
+        if add_rank_to_categories:
+            category_to_rank = {cat: i for i, cat in enumerate(categories)}


Can't we access the underlying integer more directly?

Simplified in b5ee3e7 — category_to_rank is now built directly with 1-indexed ranks: {cat: i + 1 for i, cat in enumerate(categories)}, so null (missing) gets rank 0 and the lookup is a single-step dict access with no separate null_sort_key variable.

mwouts · 2026-02-22T21:35:06Z

+            formatted = [escape_html_chars(i) for i in formatted]
+        if add_rank_to_categories:
+            codes = x.to_physical().to_list()
+            null_sort_key = len(x.cat.get_categories())


Could we use rank rather than sort_key, etc., for the variable names?

Renamed in b5ee3e7 — all sort_keys variables are now ranks.

mwouts · 2026-02-22T21:35:46Z

    escape_html: bool,
    format_floats_in_python: bool,
    warn_on_polars_get_fmt_not_found: bool,
+    add_rank_to_categories: bool = True,


No default value here, thanks

Done in b5ee3e7 — add_rank_to_categories in _format_polars_series now has no default value.

mwouts · 2026-02-22T21:36:00Z


 def _format_narwhals_series(
-    x, escape_html: bool, format_floats_in_python: bool
+    x, escape_html: bool, format_floats_in_python: bool, add_rank_to_categories: bool = True


No default value here

Done in b5ee3e7 — add_rank_to_categories in _format_narwhals_series now has no default value.

mwouts · 2026-02-22T21:41:34Z

    column_count: Optional[int] = None,
    escape_html: bool = True,
    float_columns_to_be_formatted_in_python: Optional[set[int]] = None,
+    categorical_columns: Optional[set[int]] = None,


Could this argument be, instead, something like categorical_columns_to_be_represented_through_their_rank? Also I am thinking that we could simply pass the rank to datatables, and leave the function that converts to the display/search representation to do the mapping? The rank 0 should be mapped to Javascript null, and the other ranks to the corresponding category, that is, the JS function should use a simple array (= the ordered categories) to do the mapping.

Renamed in b5ee3e7 — parameter is now categorical_columns_to_be_represented_through_their_rank. The [display, rank] format is kept (consistent with floats) so the existing shared render function works for both.


mwouts · 2026-02-22T21:59:29Z

+        assert isinstance(targets, list), targets
+        for target in targets:
+            remove_if_present(target)
+    return categorical_columns


That duplicates the current code. Can we use a dedicated function instead?

Done in 0062c0f — extracted _remove_columns_with_render_in_columndefs shared helper used by both get_float_columns_to_be_formatted_in_python and get_categorical_columns_to_be_represented_through_their_rank.

mwouts · 2026-02-22T22:00:12Z


 def _format_pandas_series(
-    x, escape_html: bool, format_floats_in_python: bool
+    x, escape_html: bool, format_floats_in_python: bool, add_rank_to_categories: bool = True


No default value in the function signature, thanks

Done in 0062c0f — add_rank_to_categories parameter in _format_pandas_series now has no default value.

mwouts · 2026-02-22T22:01:28Z

+        except TypeError:
+            formatted = fmt.format_array(x._values, None, justify="all")
+        if escape_html:
+            formatted = [escape_html_chars(i) for i in formatted]


The two paragraphs above duplicate the existing code, can we reuse the current logic and just bifurcate at the end if the type is a category?

Done in 0062c0f — the categorical check is now at the END of _format_pandas_series, after the shared fmt.format_array + escape_html code. When add_rank_to_categories=True, we return early with just x.cat.codes (no fmt.format_array). When False, the function falls through to the general path.

mwouts · 2026-02-22T22:04:50Z

    df_module = get_dataframe_module_name(df)
    if float_columns_to_be_formatted_in_python is None:
        float_columns_to_be_formatted_in_python = set()
+    if categorical_columns_to_be_represented_through_their_rank is None:


categorical_columns_to_be_represented_through_their_rank conveys the meaning that we would pass just the rank to datatables.

Can you actually do that? And map the rank to the display value in the JavaScript rendering function?

Implemented in 0062c0f. Categorical columns now send just integer ranks (0 = null, 1..N = categories). get_categorical_columns_to_be_represented_through_their_rank returns dict[int, list] (col_idx → ordered categories). Each categorical column gets its own columnDef entry with the categories array embedded in the JS render function: var categories = ["a", "b", "c"]; return type === 'sort' ? data : (data === 0 ? null : categories[data - 1]);
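
A per-column columnDefs entry of this kind could be assembled as in the sketch below. categorical_column_def is a hypothetical helper, and the function (data, type) wrapper around the quoted body is an assumption; the body itself is the one quoted above.

```python
import json


def categorical_column_def(target: int, categories: list) -> dict:
    """Build a columnDefs entry whose render maps ranks back to category names."""
    render = (
        "function (data, type) { "
        # json.dumps turns the Python list into a valid JavaScript array literal
        f"var categories = {json.dumps(categories)}; "
        "return type === 'sort' ? data : (data === 0 ? null : categories[data - 1]); }"
    )
    return {"targets": [target], "render": render}


column_def = categorical_column_def(0, ["a", "b", "c"])
```

Because each column embeds its own categories array, the entries cannot be merged into one shared columnDefs entry the way the float columns are.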


Initial plan

0e0694c

Copilot AI assigned Copilot and mwouts Feb 22, 2026

Copilot started work on behalf of mwouts February 22, 2026 15:42 View session

Implement categorical sorting by category order

cbea17c

Co-authored-by: mwouts <29915202+mwouts@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Sort categorical types according to the category order~~ Sort categorical columns by category order in DataTables Feb 22, 2026

Copilot AI requested a review from mwouts February 22, 2026 15:58

Copilot finished work on behalf of mwouts February 22, 2026 15:58

mwouts requested changes Feb 22, 2026


Copilot started work on behalf of mwouts February 22, 2026 20:24 View session

Address review feedback: numpy fix, add_rank_to_categories option, mu…

a072ccc

…tualize columnDefs, changelog Co-authored-by: mwouts <29915202+mwouts@users.noreply.github.com>

Copilot AI requested a review from mwouts February 22, 2026 20:32

Copilot finished work on behalf of mwouts February 22, 2026 20:32

mwouts requested changes Feb 22, 2026


Copilot started work on behalf of mwouts February 22, 2026 21:42 View session

Address review: null-first rank, auto mode, union columnDefs, rename …

b5ee3e7

…params Co-authored-by: mwouts <29915202+mwouts@users.noreply.github.com>

Copilot finished work on behalf of mwouts February 22, 2026 21:53

Copilot AI requested a review from mwouts February 22, 2026 21:53

mwouts requested changes Feb 22, 2026


Copilot started work on behalf of mwouts February 22, 2026 22:05 View session

Address review: dedup helper, no default, no code dup, integer ranks …

0062c0f

…+ JS mapping Co-authored-by: mwouts <29915202+mwouts@users.noreply.github.com>

Copilot AI requested a review from mwouts February 22, 2026 22:17

Copilot finished work on behalf of mwouts February 22, 2026 22:17

mwouts added this to the 2.7.1 milestone Mar 7, 2026
