faster prime sieve part 1 #120
Open
oscardssmith wants to merge 18 commits into
10613fb to 58ad969
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## main #120 +/- ##
==========================================
+ Coverage 94.38% 94.67% +0.28%
==========================================
Files 2 3 +1
Lines 463 563 +100
==========================================
+ Hits 437 533 +96
- Misses 26 30 +4
Split sieve tests into test/sieve_tests.jl for a fast dev loop; runtests.jl includes it. Tests cross-check primes/primesmask against an isprime-based reference across small, window-crossing, and boundary ranges. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
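A cross-check of this kind can be sketched as follows. This is only an illustration, not the contents of test/sieve_tests.jl: `naive_isprime`, `sieve_primes`, and `reference` are hypothetical names, and `sieve_primes` is a plain Eratosthenes sieve standing in for the package's `primes`.

```julia
using Test

# Hypothetical reference: trial division, independent of any sieve.
function naive_isprime(n::Integer)
    n < 2 && return false
    d = 2
    while d * d <= n
        n % d == 0 && return false
        d += 1
    end
    return true
end

# Hypothetical stand-in for the sieve under test (plain Eratosthenes sieve).
function sieve_primes(lo::Int, hi::Int)
    hi < 2 && return Int[]
    mask = trues(hi)
    mask[1] = false
    for p in 2:isqrt(hi)
        mask[p] || continue
        for m in p*p:p:hi
            mask[m] = false
        end
    end
    return [n for n in max(lo, 1):hi if mask[n]]
end

# Reference answer built only from the trial-division predicate.
reference(lo, hi) = [n for n in lo:hi if naive_isprime(n)]

@testset "sieve vs reference" begin
    # Small ranges, single-element ranges, and a range that starts mid-way.
    for (lo, hi) in [(1, 100), (1, 1), (2, 2), (7, 7), (90, 110), (1000, 1100)]
        @test sieve_primes(lo, hi) == reference(lo, hi)
    end
end
```

The reference is deliberately slow and simple so that a bug in the sieve is unlikely to be mirrored in the oracle.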
_sieve_range crosses off primes in cache-sized 64-bit-chunk windows, resuming each base prime's (next-multiple, phase) state across windows. _segment_primes collects base primes recursively. Added alongside the old _primesmask; callers are rewired in following commits. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
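For readers unfamiliar with the technique, here is a simplified sketch of a windowed sieve that resumes each base prime's next multiple across windows. It is not the PR's `_sieve_range`: it uses no wheel and no phase, stores one bit per integer, and `segmented_primes` and its `window` keyword are hypothetical names chosen for illustration.

```julia
# Simplified segmented sieve over [lo, hi]: no wheel, one bit per integer.
# `window` is a hypothetical tuning parameter, not the PR's SEGMENT_CHUNKS.
function segmented_primes(lo::Int, hi::Int; window::Int = 1 << 15)
    lo = max(lo, 2)
    hi < lo && return Int[]

    # Base primes up to isqrt(hi), found with a plain sieve.
    r = isqrt(hi)
    small = trues(r)
    base = Int[]
    for p in 2:r
        small[p] || continue
        push!(base, p)
        for m in p*p:p:r
            small[m] = false
        end
    end

    # First multiple of each base prime that is >= lo and >= p^2.
    next = [max(p * p, cld(lo, p) * p) for p in base]

    out = Int[]
    buf = falses(window)          # one reusable window buffer
    start = lo
    while start <= hi
        stop = min(start + window - 1, hi)
        fill!(buf, true)
        for i in eachindex(base)
            p, m = base[i], next[i]
            while m <= stop
                buf[m - start + 1] = false
                m += p
            end
            next[i] = m           # resume from here in the next window
        end
        for k in 1:(stop - start + 1)
            buf[k] && push!(out, start + k - 1)
        end
        start = stop + 1
    end
    return out
end
```

Because `next[i]` is carried over, no division is needed at window boundaries after the initial setup.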
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Both _primesmask methods are fully replaced by _sieve_range; no callers remain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pack each sieving prime's (prime, next-multiple, phase) into one tuple array so the crossing loop streams a single contiguous region; drop presieve primes and out-of-range primes at construction (_sieving_primes) to remove the per-window skip branch. Extract _presieve_fill!/_cross_window!. Restore 7,11,13,17,19 once after the loop instead of per-window. SEGMENT_CHUNKS 4096 -> 2048 (16 KiB). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace _sieve_range (which allocated the full O(hi) wheel mask) with a SegmentedSieve iterator holding only O(sqrt hi) base-prime state + one reusable SEGMENT_CHUNKS buffer. primes/primesmask/_segment_primes stream segments from it, scanning set bits with trailing_zeros (cost tracks prime count, not window width). SegmentedSieve accepts an optional precomputed base-prime list.

Back-to-back @btime vs 58ad969:
primes(10^7) 18.6ms -> 9.7ms
primes(10^8) 214ms -> 129ms
primes(10^9) 3.58s -> 1.39s (2.6x)
primesmask(10^9-10^7,10^9) 22ms -> 12.6ms

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
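The set-bit scan mentioned above can be sketched like this. The helper names `each_setbit` and `set_positions` are hypothetical and the mapping from bit index to integer is simplified (no wheel); only `trailing_zeros` and the `chunks` field of `BitVector` are taken as given.

```julia
# Call f(bit_index) for every set bit of a 64-bit chunk, lowest bit first.
# Bit indices are 0-based. The loop runs once per set bit, so its cost
# follows the number of set bits rather than the chunk width.
function each_setbit(f, chunk::UInt64)
    while chunk != 0
        f(trailing_zeros(chunk))
        chunk &= chunk - 1        # clear the lowest set bit
    end
    return nothing
end

# Collect the 1-based positions of all `true` entries of a BitVector
# by walking its underlying 64-bit chunks.
function set_positions(bits::BitVector)
    out = Int[]
    for (ci, chunk) in enumerate(bits.chunks)
        base = (ci - 1) * 64
        each_setbit(chunk) do b
            push!(out, base + b + 1)
        end
    end
    return out
end
```

For a `BitVector`, `set_positions(bits)` should agree with `findall(bits)`, which makes a convenient sanity check.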
Split iterate into the no-state (first window) and stateful (subsequent) methods so the 0x1f restore lives in the first-window path instead of being re-tested on every window. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- _small_sieve_primes: single-buffer self-referential sieve for n that fits one window; _segment_primes uses it instead of recursing into SegmentedSieve, so base-prime generation no longer nests segmented sieves for small isqrt(hi).
- SegmentedSieve(lo,hi,base_primes) now extends/recomputes the list when its max is below isqrt(hi); the no-base-primes constructor skips that check.
- Extract _presieve_mask; _scan_segment now takes the buffer directly (reused by both the iterator consumers and the base case).
- Collapse iterate back to one method with a once-per-window first-window check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Readability pass over the segmented sieve:
- rename _sieving_primes -> _crossing_state (it returns crossing records, not primes), _segment_primes -> _base_primes, _small_sieve_primes -> _small_primes; drop the _segmented_sieve helper for a single SegmentedSieve constructor.
- the window buffer is now a BitVector, using ordinary buf[b] / buf[b]=false and buf.chunks instead of hand-rolled _getbit/_clearbit!.
- rename cryptic locals (m -> hi_val, whi -> hi_wi, jj/r/L -> widx/q/start, bps -> base_primes, seg_base -> base) and trim the verbose comments.

No behavior change; 25 sieve tests pass, perf unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Interleaved @btime sweep (min over rounds, one process to control for load):

chunks  primes(1e8)  primes(1e9)
1024    163ms        2928ms
2048    172ms        2610ms
4096    166ms        1998ms  <- chosen
8192    282ms        1888ms
16384   190ms        1904ms
32768   194ms        1811ms

2048->4096 is the big jump (~24% at 1e9); above 4096 is a flat, noisy plateau. 4096 keeps the window L1d-resident.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
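To relate the chunk counts in the sweep to cache sizes, the window footprint is simply the chunk count times 8 bytes. The snippet below is a small worked calculation, with `window_bytes` as a hypothetical helper; whether a given size fits in L1d depends on the CPU, so treat the cache comparison as an assumption about the benchmark machine.

```julia
# Bytes occupied by a window of `chunks` 64-bit words.
window_bytes(chunks::Integer) = chunks * sizeof(UInt64)

# Print the footprint of each window size from the sweep.
for chunks in (1024, 2048, 4096, 8192, 16384, 32768)
    kib = div(window_bytes(chunks), 1024)
    println("$chunks chunks = $kib KiB")
end
```

With 4096 chunks the window is 32 KiB, and the earlier setting of 2048 chunks gives the 16 KiB quoted in the previous commit.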
eachprime([lo,]hi) lazily streams primes in increasing order, backed by a SegmentedSieve it owns (single-pass, mutating cursor). The iteration state counts yielded primes. SegmentedSieve now yields an empty sieve for hi<7 (no throw), so EachPrime needs no Union. primes() drives eachprime with a presized result vector. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
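A lazy, single-pass iterator of this shape can be sketched with Julia's `iterate` protocol. This is an illustration under simplifying assumptions, not the PR's `EachPrime` or `SegmentedSieve`: `WindowedPrimes`, `fill_window!`, and the `window` keyword are hypothetical names, and the sieve uses no wheel or presieve.

```julia
# Hypothetical lazy prime iterator that owns its sieve state (single pass).
mutable struct WindowedPrimes
    hi::Int
    base::Vector{Int}   # primes <= isqrt(hi)
    next::Vector{Int}   # next multiple of each base prime
    buf::BitVector      # reusable window buffer
    start::Int          # integer represented by buf[1]
    len::Int            # number of valid entries in buf
    pos::Int            # last index of buf already examined
end

function WindowedPrimes(lo::Int, hi::Int; window::Int = 1024)
    lo = max(lo, 2)
    r = isqrt(max(hi, 0))
    small = trues(r)
    base = Int[]
    for p in 2:r
        small[p] || continue
        push!(base, p)
        for m in p*p:p:r
            small[m] = false
        end
    end
    next = [max(p * p, cld(lo, p) * p) for p in base]
    # len = 0 makes the first iterate call sieve the first window.
    return WindowedPrimes(hi, base, next, falses(window), lo, 0, 0)
end

# Sieve the window beginning at it.start, resuming each prime's next multiple.
function fill_window!(it::WindowedPrimes)
    stop = min(it.start + length(it.buf) - 1, it.hi)
    it.len = stop - it.start + 1
    fill!(it.buf, true)
    for i in eachindex(it.base)
        p, m = it.base[i], it.next[i]
        while m <= stop
            it.buf[m - it.start + 1] = false
            m += p
        end
        it.next[i] = m
    end
    it.pos = 0
    return it
end

Base.IteratorSize(::Type{WindowedPrimes}) = Base.SizeUnknown()
Base.eltype(::Type{WindowedPrimes}) = Int

# The state counts primes yielded so far; the cursor itself lives in `it`.
function Base.iterate(it::WindowedPrimes, count::Int = 0)
    while true
        while it.pos < it.len
            it.pos += 1
            it.buf[it.pos] && return (it.start + it.pos - 1, count + 1)
        end
        it.start += it.len
        it.start > it.hi && return nothing
        fill_window!(it)
    end
end
```

Usage would look like `for p in WindowedPrimes(1, 100) ... end` or `Iterators.take(WindowedPrimes(1, 10^9), 5)`. Because the cursor mutates the iterator, a second pass over the same object yields nothing; construct a new one to iterate again.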
A clean back-to-back comparison showed primes=collect(eachprime) is ~28% slower (per-element iterate vs the batch _each_setbit callback), so primes keeps the windowed batch collection while eachprime remains the lazy iterator; both are thin layers on SegmentedSieve. Also restore the si increment in eachprime's 2,3,5 loop (without it, the loop spun forever on an out-of-range small prime when lo > 2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Before, `_primesmask(2^30)` took 2.726 seconds; after, it took `2.358s`. Although this is a relatively small improvement overall, it removes most of the time spent on the small primes (250ms vs 30ms, roughly 88%), which means that this will continue to show large gains once we optimize the larger primes.

I've also separated sieving into its own file since I expect the code will become more complex as we move to better sieves.
@haampie since this is essentially part 1 of #87.