Why is `seqsample_d!()` in `src/sampling.jl` not used?

Algorithm D in
[Jeffrey Scott Vitter. "Faster Methods for Random Sampling". Communications of the ACM, 27 (7), July 1984, pages 716-17](https://dl.acm.org/doi/epdf/10.1145/358105.893)
is the main result of that article. `src/sampling.jl` contains implementations of Algorithms A, C, and D. Is there any special reason that B is not implemented?

The main interface function `sample!` uses Algorithms A and C, but not D. Why? Table I on page 704 of the article shows that D is by far the fastest.
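To illustrate where the cost comes from, here is a minimal sketch of plain selection sampling (the method Vitter's article calls Algorithm S), whose running time grows with the population size `N` in the same way Algorithm A's does. This is my own illustrative code, not the StatsBase implementation, and `selection_sample` is a hypothetical name.

```julia
using Random

# Hypothetical helper, not part of StatsBase: plain selection sampling.
# Returns k sorted, unique indices drawn uniformly from 1:N.
function selection_sample(rng::AbstractRNG, N::Integer, k::Integer)
    0 <= k <= N || throw(ArgumentError("need 0 <= k <= N"))
    out = Vector{Int}(undef, k)
    needed = k   # items still to be selected
    j = 0        # items selected so far
    for i in 1:N
        needed == 0 && break
        remaining = N - i + 1   # candidates not yet inspected, including i
        # Select item i with probability needed / remaining.
        if rand(rng) * remaining < needed
            j += 1
            out[j] = i
            needed -= 1
        end
    end
    return out
end

s = selection_sample(MersenneTwister(1), 1_000, 10)
```

The loop visits the population item by item, so for `N = 10^14` it is hopeless no matter how small `k` is. Algorithm D avoids this by drawing the size of each skip directly.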

Calling `sample!(a, x)` when `N = length(a)` is much larger than `k = length(x)` gives awful performance. For example, to generate a random sparse n×n matrix with `n = 10^7` and exactly `k = 10^8` nonzero entries, we sample `k` sorted unique elements from `1:n² = 1:10¹⁴`. Wouldn't it make more sense in
https://github.com/JuliaStats/StatsBase.jl/blob/3e5382fa6b6ac90cf3d0b1904484668714d0e220/src/sampling.jl#L506
to just use `seqsample_d!`?
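A scaled-down reproducer sketch for the sparse-matrix case is below. It assumes that the internal, unexported `StatsBase.seqsample_d!` takes `(rng, population, output)` arguments; I have not checked this against every StatsBase version. The sizes are reduced from the ones above so it finishes quickly, and the decoding of linear indices into `(row, col)` pairs is my own addition for illustration.

```julia
using StatsBase, Random

n = 10^4      # matrix dimension (scaled down from 10^7)
N = n^2       # population size: number of matrix entries
k = 10^4      # number of nonzero entries to place (scaled down from 10^8)

x_default = Vector{Int}(undef, k)
x_d = Vector{Int}(undef, k)

# Public interface: sorted sample without replacement.
@time sample!(1:N, x_default; replace=false, ordered=true)

# Assumption: internal function with signature (rng, a, x); not public API.
@time StatsBase.seqsample_d!(Random.default_rng(), 1:N, x_d)

# Decode column-major linear indices into (row, col) positions.
rows = (x_d .- 1) .% n .+ 1
cols = (x_d .- 1) .÷ n .+ 1
```

Both calls should fill their output with `k` sorted, distinct linear indices; the point of interest is the difference in the two `@time` results as `N` grows relative to `k`.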
