Skip to content

why is src/sampling.jl/seqsample_d!() not used? #998

@lampretl

Description

@lampretl

The algorithm D in
Jeffrey Scott Vitter. "Faster Methods for Random Sampling". Communications of the ACM, 27 (7), July 1984, page 716-17
is the main result of that article. In src/sampling.jl there are implementations of the A, C, D algorithms. Is there any special reason that B is not implemented?

The main interface function sample! uses algorithms A, C, but not D. Why? In the article on page 704 in table I, we see that D is by far the most performant.

Using sample!(a,x) where n=length(a) is much larger than k=length(x) gives awful performance. For example, when generating a random sparse nxn matrix, where n=10^7, with exactly k=10^8 nonzero entries, we sample k sorted unique elements from 1:n² = 1:10¹⁴ . Wouldn't it make more sense in

if ordered

to just use seqsample_d!?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions