
faster prime sieve part 1 #120

Open
oscardssmith wants to merge 18 commits into main from oscardssmith-faster-sieve-part-1

Conversation

@oscardssmith

Member

Before this change, _primesmask(2^30) took 2.726 seconds; after it, the same call takes 2.358 seconds. Although this is a relatively small improvement overall, it removes most of the time spent on the small primes (250ms vs 30ms), which means that this will continue to show large gains once we optimize the larger primes.

I've also separated sieving into its own file since I expect the code will become more complex as we move to better sieves.
@haampie since this is essentially part 1 of #87.

@oscardssmith oscardssmith changed the title from "Oscardssmith faster sieve part 1" to "faster prime sieve part 1" Jun 13, 2022
@oscardssmith oscardssmith reopened this Jul 3, 2023
@oscardssmith oscardssmith force-pushed the oscardssmith-faster-sieve-part-1 branch from 10613fb to 58ad969 June 16, 2026 19:45
@codecov

codecov Bot commented Jun 16, 2026


Codecov Report

❌ Patch coverage is 98.78788% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.67%. Comparing base (c5b95b9) to head (60a3197).

Files with missing lines Patch % Lines
src/sieve.jl 98.73% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #120      +/-   ##
==========================================
+ Coverage   94.38%   94.67%   +0.28%     
==========================================
  Files           2        3       +1     
  Lines         463      563     +100     
==========================================
+ Hits          437      533      +96     
- Misses         26       30       +4     


oscardssmith and others added 13 commits June 16, 2026 16:41
Split sieve tests into test/sieve_tests.jl for a fast dev loop; runtests.jl
includes it. Tests cross-check primes/primesmask against an isprime-based
reference across small, window-crossing, and boundary ranges.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
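
The cross-check described in the test commit above can be sketched roughly as follows. This is only an illustration: `ref_isprime` and `sieve_primes` are hypothetical stand-ins (the PR's tests compare primes/primesmask against isprime), and the ranges are examples rather than the ones in test/sieve_tests.jl.

```julia
using Test

# Hypothetical trial-division reference; the PR's tests use isprime instead.
function ref_isprime(n::Integer)
    n < 2 && return false
    d = 2
    while d * d <= n
        n % d == 0 && return false
        d += 1
    end
    return true
end

# Hypothetical stand-in for the function under test: a plain sieve of
# Eratosthenes returning the primes in lo:hi.
function sieve_primes(lo::Int, hi::Int)
    hi < 2 && return Int[]
    mask = trues(hi)
    mask[1] = false
    for p in 2:isqrt(hi)
        mask[p] || continue
        for m in p*p:p:hi
            mask[m] = false
        end
    end
    return [n for n in max(lo, 1):hi if mask[n]]
end

@testset "sieve vs reference" begin
    # Small, empty, single-element, and offset ranges.
    for (lo, hi) in [(1, 1), (1, 2), (1, 100), (90, 130), (7, 7), (1000, 1100)]
        @test sieve_primes(lo, hi) == filter(ref_isprime, lo:hi)
    end
end
```

Comparing against an independent, obviously correct reference keeps the test meaningful even when the sieve internals change between commits.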
_sieve_range crosses off primes in cache-sized 64-bit-chunk windows, resuming
each base prime's (next-multiple, phase) state across windows. _segment_primes
collects base primes recursively. Added alongside the old _primesmask; callers
are rewired in following commits.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
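
To illustrate the idea of resuming each base prime's next multiple across windows, here is a minimal sketch. It is not the PR's _sieve_range: it stores one bit per integer (no wheel, no phase), and `windowed_primes`, `window`, and `next_multiple` are names assumed for this example.

```julia
# Sketch only: one bit per integer, no wheel. All names are hypothetical.
function windowed_primes(hi::Int; window::Int = 1 << 15)
    hi < 2 && return Int[]
    root = isqrt(hi)

    # Base primes up to isqrt(hi) from a plain sieve.
    small = trues(root)
    small[1] = false
    for p in 2:isqrt(root)
        small[p] || continue
        for m in p*p:p:root
            small[m] = false
        end
    end
    base = findall(small)

    # next_multiple[i] is the next multiple of base[i] still to be crossed off.
    # It persists across windows, so no division is needed per window.
    next_multiple = [p * p for p in base]

    result = Int[]
    buf = trues(window)
    lo = 2
    while lo <= hi
        top = min(lo + window - 1, hi)
        fill!(buf, true)
        for i in eachindex(base)
            m = next_multiple[i]
            while m <= top
                buf[m - lo + 1] = false
                m += base[i]
            end
            next_multiple[i] = m   # resume from here in the next window
        end
        for n in lo:top
            buf[n - lo + 1] && push!(result, n)
        end
        lo = top + 1
    end
    return result
end
```

Because every window reuses the same `buf`, the working set stays at `window` bits regardless of `hi`, which is the property the cache-sized windows are meant to exploit.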
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Both _primesmask methods are fully replaced by _sieve_range; no callers remain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pack each sieving prime's (prime, next-multiple, phase) into one tuple array so
the crossing loop streams a single contiguous region; drop presieve primes and
out-of-range primes at construction (_sieving_primes) to remove the per-window
skip branch. Extract _presieve_fill!/_cross_window!. Restore 7,11,13,17,19 once
after the loop instead of per-window. SEGMENT_CHUNKS 4096 -> 2048 (16 KiB).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
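
The packed-state layout described in the commit above might look roughly like this. The sketch assumes a simplified record of (prime, next-multiple) without the phase field and without a wheel; `cross_window!` here is an illustrative function, not the PR's _cross_window!.

```julia
# Cross off multiples in the window covering lo:(lo + length(buf) - 1).
# Each record is (prime, next multiple to cross); the phase field of the
# PR's records is omitted in this simplified sketch.
function cross_window!(buf::BitVector, state::Vector{Tuple{Int,Int}}, lo::Int)
    top = lo + length(buf) - 1
    for i in eachindex(state)
        p, m = state[i]           # one contiguous read per sieving prime
        while m <= top
            buf[m - lo + 1] = false
            m += p
        end
        state[i] = (p, m)         # write back the resumed position
    end
    return buf
end

# Usage: two consecutive windows of 10 integers, starting at 2.
state = [(2, 4), (3, 9), (5, 25)]
first_window = cross_window!(trues(10), state, 2)    # covers 2:11
second_window = cross_window!(trues(10), state, 12)  # covers 12:21
```

Keeping the fields of one prime adjacent in memory means the inner loop touches a single array, instead of indexing several parallel arrays per prime.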
Replace _sieve_range (which allocated the full O(hi) wheel mask) with a
SegmentedSieve iterator holding only O(sqrt hi) base-prime state + one reusable
SEGMENT_CHUNKS buffer. primes/primesmask/_segment_primes stream segments from it,
scanning set bits with trailing_zeros (cost tracks prime count, not window width).
SegmentedSieve accepts an optional precomputed base-prime list.

Back-to-back @btime vs 58ad969:
  primes(10^7)   18.6ms -> 9.7ms
  primes(10^8)   214ms  -> 129ms
  primes(10^9)   3.58s  -> 1.39s   (2.6x)
  primesmask(10^9-10^7,10^9)  22ms -> 12.6ms

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
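
Scanning set bits with trailing_zeros, as mentioned in the commit above, can be sketched like this. `each_setbit` is a name assumed for the example (the PR refers to a _each_setbit callback, whose exact signature is not shown here), and indices are plain bit positions rather than wheel positions.

```julia
# Call f(i) for each set bit i (1-based) of a BitVector, visiting only set bits.
function each_setbit(f, bits::BitVector)
    for (ci, chunk) in enumerate(bits.chunks)
        base = (ci - 1) * 64
        while chunk != 0
            f(base + trailing_zeros(chunk) + 1)
            chunk &= chunk - 1    # clear the lowest set bit
        end
    end
    return nothing
end

# Usage: collect the positions of the set bits.
bits = falses(130)
for i in (1, 64, 65, 130)
    bits[i] = true
end
found = Int[]
each_setbit(i -> push!(found, i), bits)
```

The inner loop runs once per set bit rather than once per bit position, which is why the cost follows the number of primes in a window instead of the window width.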
Split iterate into the no-state (first window) and stateful (subsequent) methods
so the 0x1f restore lives in the first-window path instead of being re-tested on
every window.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- _small_sieve_primes: single-buffer self-referential sieve for n that fits one
  window; _segment_primes uses it instead of recursing into SegmentedSieve, so
  base-prime generation no longer nests segmented sieves for small isqrt(hi).
- SegmentedSieve(lo,hi,base_primes) now extends/recomputes the list when its max
  is below isqrt(hi); the no-base-primes constructor skips that check.
- Extract _presieve_mask; _scan_segment now takes the buffer directly (reused by
  both the iterator consumers and the base case).
- Collapse iterate back to one method with a once-per-window first-window check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Readability pass over the segmented sieve:
- rename _sieving_primes -> _crossing_state (it returns crossing records, not
  primes), _segment_primes -> _base_primes, _small_sieve_primes -> _small_primes;
  drop the _segmented_sieve helper for a single SegmentedSieve constructor.
- the window buffer is now a BitVector, using ordinary buf[b] / buf[b]=false and
  buf.chunks instead of hand-rolled _getbit/_clearbit!.
- rename cryptic locals (m -> hi_val, whi -> hi_wi, jj/r/L -> widx/q/start, bps ->
  base_primes, seg_base -> base) and trim the verbose comments.

No behavior change; 25 sieve tests pass, perf unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Interleaved @btime sweep (min over rounds, one process to control for load):
  chunks   primes(1e8)  primes(1e9)
   1024       163ms        2928ms
   2048       172ms        2610ms
   4096       166ms        1998ms   <- chosen
   8192       282ms        1888ms
  16384       190ms        1904ms
  32768       194ms        1811ms
2048->4096 is the big jump (~23% at 1e9); above 4096 is a flat, noisy plateau.
4096 keeps the window L1d-resident.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
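
For reference, the window sizes behind the chunk counts in the sweep above work out as follows. The helper names are assumptions for this example; whether 32 KiB actually fits in L1d depends on the CPU.

```julia
# Each chunk is one UInt64, i.e. 8 bytes or 64 bits of window buffer.
window_bytes(chunks) = chunks * sizeof(UInt64)
window_bits(chunks) = 8 * window_bytes(chunks)

for chunks in (2048, 4096)
    println(chunks, " chunks = ", window_bytes(chunks) ÷ 1024, " KiB (",
            window_bits(chunks), " bits)")
end
```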
eachprime([lo,]hi) lazily streams primes in increasing order, backed by a
SegmentedSieve it owns (single-pass, mutating cursor). The iteration state counts
yielded primes. SegmentedSieve now yields an empty sieve for hi<7 (no throw), so
EachPrime needs no Union. primes() drives eachprime with a presized result vector.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
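
A lazy prime stream can be expressed through Julia's iteration protocol. The sketch below is a deliberately simplified stand-in: `LazyPrimes` uses trial division and carries the next candidate as its state, whereas the PR's eachprime is backed by a SegmentedSieve and its state counts yielded primes.

```julia
# Hypothetical helper: trial-division primality test.
function is_prime_td(n::Int)
    n < 2 && return false
    d = 2
    while d * d <= n
        n % d == 0 && return false
        d += 1
    end
    return true
end

# Hypothetical lazy iterator over the primes in lo:hi, in increasing order.
struct LazyPrimes
    lo::Int
    hi::Int
end

Base.IteratorSize(::Type{LazyPrimes}) = Base.SizeUnknown()
Base.eltype(::Type{LazyPrimes}) = Int

# The state is the next candidate to test; iteration stops past hi.
function Base.iterate(it::LazyPrimes, n::Int = max(it.lo, 2))
    while n <= it.hi
        is_prime_td(n) && return (n, n + 1)
        n += 1
    end
    return nothing
end
```

With this shape, `first(LazyPrimes(1, 10^9))` does only the work needed for the first prime, and `collect(LazyPrimes(10, 30))` materializes the whole range.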
A clean back-to-back showed primes=collect(eachprime) is ~28% slower (per-element
iterate vs the batch _each_setbit callback), so primes keeps the windowed batch
collection while eachprime remains the lazy iterator; both are thin layers on
SegmentedSieve. Also restore the si increment in eachprime's 2,3,5 loop (without it,
an out-of-range small spun forever for lo > 2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
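
The kind of infinite loop described in the last commit can be illustrated with a hypothetical loop over the small primes 2, 3, 5. This is not the PR's code, only a sketch of why the index must advance even when a value is skipped.

```julia
# Hypothetical sketch: return those of 2, 3, 5 that fall inside lo:hi.
function small_primes_in(lo::Int, hi::Int)
    out = Int[]
    smalls = (2, 3, 5)
    si = 1
    while si <= length(smalls)
        p = smalls[si]
        si += 1              # advance before any `continue`; without this,
                             # a skipped p < lo would be revisited forever
        p < lo && continue
        p > hi && break
        push!(out, p)
    end
    return out
end
```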