@beingamanforever

tabulate: categorical and string edge-case fixes

This change makes tabulate handle categorical and string edge cases correctly, bringing its behavior closer to MATLAB's.

Fixed

  • Categorical arrays with all values undefined now return zero counts and zero percentages (instead of NaN)
  • Categorical arrays with defined but unused categories correctly include zero-count rows
  • String arrays with all values missing now return an empty table
  • Categorical logic is guarded and only enabled when categorical support is available
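
The first two fixes can be illustrated with a minimal Python sketch. This is not the Octave code from this PR: the function name `tabulate_categories`, the use of `None` for an undefined value, and the choice to exclude undefined values from the percentage total are all assumptions made for illustration.

```python
def tabulate_categories(values, categories):
    """Return (category, count, percent) rows for a categorical-like input.

    Hypothetical helper: `None` stands in for an undefined value, and every
    defined value is assumed to be one of `categories`.
    """
    # Start every defined category at zero so unused ones still get a row.
    counts = {c: 0 for c in categories}
    for v in values:
        if v is not None:  # undefined values are not counted (assumption)
            counts[v] += 1

    total = sum(counts.values())
    rows = []
    for c in categories:
        # Guard the division: with no defined values, report 0% rather than NaN.
        percent = 100.0 * counts[c] / total if total > 0 else 0.0
        rows.append((c, counts[c], percent))
    return rows


if __name__ == "__main__":
    # All values undefined: zero counts and zero percentages.
    print(tabulate_categories([None, None], ["a", "b"]))
    # One category defined but unused: it still appears with a zero count.
    print(tabulate_categories(["a", "a", None], ["a", "b"]))
```

The guard on `total` is the key design point: without it, the all-undefined case divides zero by zero, which is where the NaN percentages came from.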

Tests

Added focused BISTs covering:

  • All-undefined categorical inputs
  • Empty categorical arrays with predefined categories
  • All-missing string arrays

Tests are automatically skipped when required datatypes are unavailable.

@beingamanforever force-pushed the fix-tabulate-categorical branch from 18c662d to 5b3af43 on January 15, 2026 05:17
@pr0m1th3as
Member

See my comment on #351

@beingamanforever force-pushed the fix-tabulate-categorical branch from 5b3af43 to 69849f4 on January 20, 2026 01:07
@beingamanforever force-pushed the fix-tabulate-categorical branch from 69849f4 to 9fa392d on January 20, 2026 01:10
@beingamanforever
Author

Updated to keep changes minimal:

  • Fixed division-by-zero when all categorical values are undefined (now returns zeros instead of NaNs)
  • Removed a duplicated code block in string handling
  • No guards were added, all existing BISTs are preserved, and the new tests cover only the fixed edge cases

Please let me know if you’d prefer these changes split differently or adjusted further.
