Python vs Rust Performance Benchmark

Questions This Project Answers

Question	Answer
Why is Python slow?	The Python interpreter performs dynamic type checks and bytecode dispatch on every loop iteration and function call. The same logic takes 24–84x longer than Rust.
Is it still slow with external libraries?	No. When heavy computation is delegated to native libraries like NumPy, Python matches or even exceeds Rust. Python's "slowness" is limited to interpreter-level loops and function calls — once the work is handed off to C/Fortran, the overhead disappears.

A project that quantitatively verifies why "Python is slow" by comparing execution times of identical algorithms implemented in Python and Rust.

Phases

Phase	Topic	Status
1	Fibonacci (Recursive)	✅ Done
2	Sieve of Eratosthenes	✅ Done
3	Matrix Multiplication (N×N)	✅ Done
4	Palindrome Check	✅ Done

Prerequisites

Python 3.10+
Rust (via rustup): https://rustup.rs/
Visual Studio Build Tools 2022 (MSVC linker)

Setup

# Install Python dependencies
pip install -r requirements.txt

How to Run

Run All (Recommended)

# From PowerShell
powershell -ExecutionPolicy Bypass -File run_benchmark.ps1

# Or double-click run.bat
run.bat

Run Individually

# Pure Python
python python/fibonacci.py
python python/sieve.py
python python/matrix.py
python python/palindrome.py

# Python + Optimized (fix/)
python python/fix/fibonacci_lru.py
python python/fix/sieve_numpy.py
python python/fix/matrix_numpy.py
python python/fix/palindrome_regex.py
python python/fix/comparison_all.py    # All-in-one comparison

# Rust (workspace build)
cd rust
cargo build --release
./target/release/fibonacci.exe
./target/release/sieve.exe
./target/release/matrix.exe
./target/release/palindrome.exe

Project Structure

python-vs-rust-benchmark/
├── docs/
│   ├── 要件定義書.md              # Requirements (Japanese)
│   └── environment/
│       └── DxDiag.txt             # Hardware environment info
├── python/
│   ├── fibonacci.py               # Phase 1: Fibonacci (recursive)
│   ├── sieve.py                   # Phase 2: Sieve of Eratosthenes
│   ├── matrix.py                  # Phase 3: Matrix multiplication
│   ├── palindrome.py              # Phase 4: Palindrome check
│   └── fix/                       # Optimized versions (acceleration proof)
│       ├── comparison_all.py      # All-in-one comparison benchmark
│       ├── fibonacci_lru.py       # lru_cache version
│       ├── sieve_numpy.py         # NumPy version
│       ├── matrix_numpy.py        # NumPy version
│       └── palindrome_regex.py    # Slice version
├── rust/
│   ├── Cargo.toml                 # Workspace definition
│   ├── fibonacci/                 # Phase 1
│   ├── sieve/                     # Phase 2
│   ├── matrix/                    # Phase 3
│   └── palindrome/                # Phase 4
├── results/                       # Benchmark result logs
├── requirements.txt               # Python dependencies
├── run_benchmark.ps1              # Benchmark runner (PowerShell)
├── run.bat                        # Double-click to run
└── README.md                      # Documentation (Japanese)

Benchmark Descriptions

Phase 1: Fibonacci (Recursive)

The Fibonacci sequence adds two previous numbers to produce the next (0, 1, 1, 2, 3, 5, 8, 13...). We use naive recursion (no memoization) to compute fib(40), which triggers ~330 million function calls.

fib(5) = fib(4) + fib(3)
       = (fib(3) + fib(2)) + (fib(2) + fib(1))
       = ... calls explode exponentially

What it measures: Per-function-call overhead (frame object creation, stack operations)

Phase 2: Sieve of Eratosthenes

A prime number is a natural number greater than 1 that is divisible only by 1 and itself (2, 3, 5, 7, 11, 13...).

The Sieve of Eratosthenes is an ancient algorithm devised by the Greek mathematician Eratosthenes for efficiently finding all primes up to a given limit. It works by repeatedly "sieving out" composite numbers (non-primes), leaving only primes behind.

How the Algorithm Works

Create a list of all numbers from 2 to N, initially marked as "possibly prime"
The smallest number (2) is confirmed prime → cross out all multiples of 2 (4, 6, 8...)
The next remaining number (3) is confirmed prime → cross out all multiples of 3 (9, 12, 15...)
The next remaining number (5) is confirmed prime → cross out all multiples of 5 (25, 30, 35...)
Repeat until √N — all remaining numbers are prime

Initial:          [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15...]

Cross out 2's:    [2, 3, ×, 5, ×, 7, ×, 9,  ×, 11,  ×, 13,  ×, 15...]
Cross out 3's:    [2, 3, ×, 5, ×, 7, ×, ×,  ×, 11,  ×, 13,  ×,  ×...]
Cross out 5's:    [2, 3, ×, 5, ×, 7, ×, ×,  ×, 11,  ×, 13,  ×,  ×...]
→ Remaining: 2, 3, 5, 7, 11, 13... are prime

In this benchmark, the limit is set to 10 million, finding 664,579 primes. The algorithm is dominated by array reads/writes and loops, making the per-iteration interpreter overhead and array access cost clearly visible.

What it measures: Per-loop-iteration cost, array read/write performance

Phase 3: Matrix Multiplication (N×N)

Multiplies two 200×200 matrices using a triple-nested loop (8,000,000 floating-point operations). The most CPU-intensive benchmark.

C[i][j] = A[i][0]*B[0][j] + A[i][1]*B[1][j] + ... + A[i][199]*B[199][j]
→ Computed for all 40,000 elements

What it measures: Raw numerical throughput, accumulated overhead in nested loops

Phase 4: Palindrome Check

Checks whether 1 million 100-character strings are palindromes by comparing characters from both ends.

"abcba" → a==a, b==b → palindrome
"abcde" → a!=e → not a palindrome

What it measures: String object creation/access cost, GC (garbage collection) impact

Results

Full Comparison (3-way)

Phase	Topic	Pure Python	Python + C	Rust	Pure vs Rust
1	Fibonacci (N=40)	12,000 ms	0.05 ms (lru_cache)	216 ms	55x
2	Sieve (limit 10M)	903 ms	20 ms (NumPy)	21 ms	43x
3	Matrix (200×200)	478 ms	0.77 ms (NumPy)	6 ms	84x
4	Palindrome (1M)	1,021 ms	171 ms (slice)	43 ms	24x

Analysis

Phase 1: Fibonacci (55x slower)

Root cause: Function call overhead
Python creates a frame object and performs dynamic type checks on every recursive call
Rust completes function calls with stack operations alone

Phase 2: Sieve (43x slower)

Root cause: Interpreter loop overhead
Python pays a bytecode dispatch cost per instruction
Array (list) access goes through object references in Python

Phase 3: Matrix (84x slower) — Largest gap

Root cause: Accumulated cost in CPU-dense numerical computation
8,000,000 floating-point operations, each requiring type checks
Rust compiler may apply auto-vectorization (SIMD)

Phase 4: Palindrome (24x slower) — Smallest gap

Root cause: String access and object manipulation
Python's string index access is relatively optimized at the C level
Still 24x slower due to reference counting and GC overhead

Conclusion

Pure Python is 24–84x slower than Rust for identical logic. The gap grows with more loops and numerical computation.

This is an inherent characteristic of Python's dynamic typing and interpreter execution model — unavoidable without native library delegation.

Supplementary: Pure Python vs Optimized Versions

Proof that "Python is slow" only applies to pure Python code. With optimization techniques (NumPy, lru_cache, slicing), performance improves dramatically.

Results

Topic	Pure Python	Optimized	Speedup	Method Used
Fibonacci	11,809 ms	0.03 ms	468,595x	functools.lru_cache
Sieve	894 ms	19 ms	47x	NumPy array slicing
Matrix	465 ms	12 ms	39x	numpy.matmul (BLAS)
Palindrome	1,003 ms	143 ms	7x	String slice `s[::-1]`

Rust vs Python+C Comparison

Topic	Rust	Python+C	Gap	Notes
Fibonacci	216 ms	0.05 ms	Python+C 4,000x faster	※Algorithm change (memoization), not a pure language comparison
Sieve	19 ms	19 ms	Nearly identical	NumPy array slicing delegates loops to C layer
Matrix	6 ms	0.77 ms	Python+C 8x faster	NumPy's internal BLAS (OpenBLAS) uses SIMD + multithreading
Palindrome	43 ms	143 ms	Rust 3x faster	Python's for-loop remains even with slice optimization

⚠️ Note on Fibonacci: The lru_cache version changes the time complexity from O(2^n) to O(n) through memoization. It is faster because of algorithmic improvement, not because of the optimization technique itself. Rust with the same memoization would achieve similar speed. The other 3 benchmarks are pure same-algorithm comparisons.

Key Takeaway

Pure Python is indeed slow (24–84x slower than Rust)
With proper optimization, Python can match or even exceed Rust's naive implementation
NumPy's matrix operations call BLAS (C/Fortran) internally, rivaling optimized native code
"Python is slow" really means the Python interpreter's loops and function calls are slow — when heavy computation is delegated to native libraries, performance is not an issue

Environment

OS: Windows 11 Pro
CPU: AMD Ryzen 7 7735HS with Radeon Graphics
RAM: 30 GB
Python: 3.13.12
Rust: 1.96.0 (ac68faa20 2026-05-25)
Build: cargo build --release (optimized)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python vs Rust Performance Benchmark

Questions This Project Answers

Phases

Prerequisites

Setup

How to Run

Run All (Recommended)

Run Individually

Project Structure

Benchmark Descriptions

Phase 1: Fibonacci (Recursive)

Phase 2: Sieve of Eratosthenes

How the Algorithm Works

Phase 3: Matrix Multiplication (N×N)

Phase 4: Palindrome Check

Results

Full Comparison (3-way)

Analysis

Phase 1: Fibonacci (55x slower)

Phase 2: Sieve (43x slower)

Phase 3: Matrix (84x slower) — Largest gap

Phase 4: Palindrome (24x slower) — Smallest gap

Conclusion

Supplementary: Pure Python vs Optimized Versions

Results

Rust vs Python+C Comparison

Key Takeaway

Environment

FilesExpand file tree

README_EN.md

Latest commit

History

README_EN.md

File metadata and controls

Python vs Rust Performance Benchmark

Questions This Project Answers

Phases

Prerequisites

Setup

How to Run

Run All (Recommended)

Run Individually

Project Structure

Benchmark Descriptions

Phase 1: Fibonacci (Recursive)

Phase 2: Sieve of Eratosthenes

How the Algorithm Works

Phase 3: Matrix Multiplication (N×N)

Phase 4: Palindrome Check

Results

Full Comparison (3-way)

Analysis

Phase 1: Fibonacci (55x slower)

Phase 2: Sieve (43x slower)

Phase 3: Matrix (84x slower) — Largest gap

Phase 4: Palindrome (24x slower) — Smallest gap

Conclusion

Supplementary: Pure Python vs Optimized Versions

Results

Rust vs Python+C Comparison

Key Takeaway

Environment