
Feature: Can we trivially speed up things for categorical X by aggregating Y's within categories? #174

@adw96


In the course of pursuing faster null fitting for categorical covariates, it occurred to me that we may be able to speed up fitting under the alternative using a trivial technique:

In any discrete X setting with p categories, I think we can immediately collapse Y from n x J to p x J by summing the rows of Y that share the same X. This may reduce computation, especially for large n, and it should work for any g().
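A minimal sketch of the collapse step described above, in plain Python rather than radEmu's R so that it runs without dependencies. The function name `collapse_counts` and the toy data are hypothetical and are not part of radEmu. It also returns the number of rows per category, on the assumption that any later standard error adjustment may need it.

```python
from collections import defaultdict


def collapse_counts(x_labels, Y):
    """Sum the rows of Y (n x J) that share a category label, giving p x J.

    x_labels: length-n list of category labels (one per row of Y).
    Y: list of n rows, each a list of J counts.
    Returns (categories, Y_agg, n_per_category), ordered by sorted label.
    """
    if len(x_labels) != len(Y):
        raise ValueError("x_labels and Y must have the same number of rows")
    sums = {}
    sizes = defaultdict(int)
    for label, row in zip(x_labels, Y):
        if label not in sums:
            sums[label] = [0] * len(row)
        # Element-wise add this row to the running total for its category.
        sums[label] = [a + b for a, b in zip(sums[label], row)]
        sizes[label] += 1
    categories = sorted(sums)
    Y_agg = [sums[c] for c in categories]
    n_per_category = [sizes[c] for c in categories]
    return categories, Y_agg, n_per_category


# Toy example: n = 6 samples, J = 3 taxa, p = 3 categories.
x = ["a", "b", "a", "c", "b", "a"]
Y = [[1, 0, 3], [2, 2, 0], [0, 5, 1], [4, 1, 1], [1, 1, 1], [2, 0, 0]]
categories, Y_agg, n_per_category = collapse_counts(x, Y)
print(categories, Y_agg, n_per_category)
```

In R, the same group-wise row sum could presumably be done with `rowsum(Y, group)`.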

I can't remember whether radEmu scales poorly in n, but this could be worth a try. There's no need to keep rows distinct: any time they appear in the likelihood, they are aggregated over common X's anyway.
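To illustrate the aggregation argument numerically, here is a small sketch that uses a multinomial-logit log-likelihood kernel as a stand-in. This is an assumption for illustration only and is not radEmu's actual objective, which may include other terms such as a penalty. The coefficient table `B` and the toy data are hypothetical. The kernel omits the multinomial coefficient terms, which do not depend on the parameters.

```python
import math


def softmax(eta):
    """Numerically stable softmax of a list of linear predictors."""
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]


def loglik_kernel(rows, labels, B):
    """Return sum_i sum_j Y_ij * log p_j(x_i), where p(x) = softmax(B[x])."""
    total = 0.0
    for label, row in zip(labels, rows):
        p = softmax(B[label])
        total += sum(y * math.log(pj) for y, pj in zip(row, p))
    return total


# Toy data: n = 6 samples, J = 3 taxa, p = 3 categories.
x = ["a", "b", "a", "c", "b", "a"]
Y = [[1, 0, 3], [2, 2, 0], [0, 5, 1], [4, 1, 1], [1, 1, 1], [2, 0, 0]]
# Hypothetical per-category linear predictors (one value per taxon).
B = {"a": [0.0, 0.4, -1.0], "b": [0.0, -0.3, 0.2], "c": [0.0, 1.1, 0.5]}

# Collapse Y to one row of summed counts per category.
agg = {}
for label, row in zip(x, Y):
    prev = agg.get(label, [0] * len(row))
    agg[label] = [a + b for a, b in zip(prev, row)]
cats = sorted(agg)

full = loglik_kernel(Y, x, B)                            # n x J data
collapsed = loglik_kernel([agg[c] for c in cats], cats, B)  # p x J data
print(full, collapsed)
```

Under this stand-in, the two values agree up to floating-point error for any `B`, because rows with the same X share the same probabilities. The sketch says nothing about standard errors or test statistics, which would still need to be checked separately.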

@svteichman would you have the bandwidth to investigate whether this is promising? This applies to fitting under both the null and the alternative (potentially best tested separately), and to any g(). I think we could first confirm that the fitted results are the same when Y is n x J as when Y is p x J (aggregated over the p categories).

There would need to be a standard error adjustment (for I? Dy?) to ensure the Wald test statistics are right. Presumably the same is true for the score test.

This is lower priority than #173.
