14 changes: 13 additions & 1 deletion .gitignore
@@ -21,4 +21,16 @@ build
*.log

# Temporary files
*.temp
*.temp

# Virtual environments
.venv/
venv/

# Editor/IDE
.claude/
.vscode/
.idea/

# UV lock file
uv.lock
64 changes: 64 additions & 0 deletions README.md
@@ -5,3 +5,67 @@ A Pure Python Raytracer by Arun Ravindran.
Watch the video tutorial: https://www.youtube.com/watch?v=KaCe63v4D_Q&list=PL8ENypDVcs3H-TxOXOzwDyCm5f2fGXlIS

Read the blog series: https://arunrocks.com/ray-tracer-in-python-1-points-in-3d-space-show-notes/

---

## AeyeOps Fork: GPU-Accelerated Edition

This fork extends Arun Ravindran's excellent raytracer tutorial with **NVIDIA GPU acceleration**, achieving up to **530x speedup** over single-core CPU rendering.

The original implementation is a masterclass in clean Python design—the well-structured separation between primitives, scenes, and the rendering engine made adding GPU support straightforward. We've preserved that clarity while adding a parallel path for high-performance rendering on modern NVIDIA hardware.

**What's new in this fork:**
- [`gpu_engine.py`](gpu_engine.py) — CUDA-accelerated rendering via Numba
- [`main_gpu.py`](main_gpu.py) — GPU entry point with benchmarking support
- Optimized for NVIDIA Blackwell (GB10) with Compute Capability 12.1
- Binary PPM output eliminates the I/O bottleneck
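
To illustrate the binary PPM point: a plain-text PPM (P3) stores every channel as decimal text, while a binary PPM (P6) stores one raw byte per channel after a short ASCII header. The sketch below uses only the standard library; `write_ppm_p6` and the `pixels` argument are hypothetical names chosen for this example, not necessarily what `main_gpu.py` does.

```python
import io


def write_ppm_p6(stream, width, height, pixels):
    """Write 8-bit RGB pixels to a binary stream as a P6 PPM.

    pixels: iterable of (r, g, b) tuples with values 0..255, in row-major order.
    """
    # Body: three raw bytes per pixel, no separators.
    body = bytearray()
    for r, g, b in pixels:
        body.extend((r, g, b))
    if len(body) != width * height * 3:
        raise ValueError("pixel count does not match width * height")
    # Header is ASCII: magic number, dimensions, maximum channel value.
    stream.write(f"P6\n{width} {height}\n255\n".encode("ascii"))
    stream.write(body)


# A 2x1 image: one red pixel followed by one blue pixel.
buf = io.BytesIO()
write_ppm_p6(buf, 2, 1, [(255, 0, 0), (0, 0, 255)])
ppm_bytes = buf.getvalue()
```

Writing the whole body in one call avoids formatting and writing each channel value separately, which is the kind of per-pixel overhead a text format incurs.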

> Upstream PR: [arocks/puray#6](https://github.com/arocks/puray/pull/6)

---

## GPU Acceleration

The clean separation of concerns in the original code made it straightforward to add GPU acceleration. The new `gpu_engine.py` uses Numba CUDA to run ray tracing on NVIDIA GPUs.

### Performance

Tested on NVIDIA GB10 (Blackwell, Compute Capability 12.1):

| Scene | Resolution | CPU (20 cores) | GPU | Speedup |
|-------|------------|----------------|-----|---------|
| twoballs | 960x540 | 0.82s | 0.25s | 3x |
| manyballs | 1920x1080 | 14.3s | 0.32s | **45x** |

The single-core CPU baseline for manyballs is ~170s, so the GPU's 0.32s corresponds to a speedup of roughly **530x**.
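
The speedup figures are simply CPU time divided by GPU time. A small check, with the timings copied from the numbers above (the dictionary keys are labels made up for this example):

```python
# (cpu_seconds, gpu_seconds) pairs taken from the table and the single-core note.
timings = {
    "twoballs (20 cores)": (0.82, 0.25),
    "manyballs (20 cores)": (14.3, 0.32),
    "manyballs (1 core)": (170.0, 0.32),
}

# Speedup = CPU time / GPU time.
speedups = {name: cpu / gpu for name, (cpu, gpu) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

Because the single-core time is given only approximately (~170s), the last ratio should be read as an order of magnitude rather than an exact figure.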

### Usage

```bash
# CPU (original)
python main.py examples.twoballs

# GPU
python main_gpu.py examples.twoballs
```

### Requirements

- NVIDIA GPU with CUDA support
- Python 3.12+
- numba >= 0.60.0
- numpy >= 1.26.0

Install with:
```bash
pip install "numba>=0.60.0" "numpy>=1.26.0"
```

### Key Changes

1. **Structure-of-arrays memory layout** - Scene data packed for coalesced GPU memory access
2. **Iterative ray tracing** - Replaced recursion with iteration (required for CUDA)
3. **Binary PPM output (P6)** - Eliminated the original I/O bottleneck (was 96% of runtime)
4. **fastmath compilation** - Enabled fast floating-point operations in CUDA kernels
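
As an illustration of the first item, here is a minimal sketch of a structure-of-arrays layout using NumPy only. The field names, the tuple format, and `pack_spheres` are assumptions made for this example; the actual packing in `gpu_engine.py` may differ.

```python
import numpy as np

# Hypothetical array-of-structures input: one tuple per sphere,
# (center_x, center_y, center_z, radius, reflection).
spheres = [
    (0.0, 10000.5, 1.0, 10000.0, 0.2),
    (0.75, -0.1, 1.0, 0.6, 0.5),
    (-0.75, -0.1, 2.25, 0.6, 0.5),
]


def pack_spheres(spheres):
    """Split per-sphere tuples into one contiguous float32 array per field."""
    table = np.asarray(spheres, dtype=np.float32)  # shape (n, 5), row-major
    names = ("center_x", "center_y", "center_z", "radius", "reflection")
    # A column slice of a row-major array is strided, so copy each column
    # into its own contiguous buffer.
    return {
        name: np.ascontiguousarray(table[:, i]) for i, name in enumerate(names)
    }


scene = pack_spheres(spheres)
```

Arrays like these could then be copied to the device, for example with `numba.cuda.to_device`, and passed to a kernel as separate arguments.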

The GPU implementation preserves the original's clarity while cutting render times for the benchmark scenes to well under a second on modern hardware.
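
The switch from recursion to iteration listed under Key Changes can be sketched with a toy model. Here a ray's path is reduced to a list of `(local_color, reflection)` pairs, one per surface it hits in order, with colors as single floats. This representation and both function names are hypothetical simplifications for illustration, not code from `gpu_engine.py`.

```python
def shade_recursive(hits, depth=0, max_depth=5):
    """Recursive form: local color plus reflection-weighted color of the next bounce."""
    if depth >= max_depth or depth >= len(hits):
        return 0.0
    local, reflection = hits[depth]
    return local + reflection * shade_recursive(hits, depth + 1, max_depth)


def shade_iterative(hits, max_depth=5):
    """Same result without recursion: carry a running weight along the path."""
    color = 0.0
    weight = 1.0  # product of the reflection coefficients seen so far
    for depth in range(min(max_depth, len(hits))):
        local, reflection = hits[depth]
        color += weight * local
        weight *= reflection
    return color


# Three bounces: (local color, reflection coefficient) per surface.
hits = [(0.5, 0.5), (0.25, 0.5), (1.0, 0.0)]
assert shade_recursive(hits) == shade_iterative(hits)
```

The loop needs only two scalars of state per ray (`color` and `weight`), which suits a GPU thread better than a call stack whose depth depends on the scene.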
58 changes: 58 additions & 0 deletions examples/manyballs.py
@@ -0,0 +1,58 @@
"""High-resolution scene with many spheres to demonstrate GPU acceleration."""
from color import Color
from light import Light
from material import ChequeredMaterial, Material
from point import Point
from sphere import Sphere
from vector import Vector
import random

# Higher resolution for GPU benchmark
WIDTH = 1920
HEIGHT = 1080
RENDERED_IMG = "manyballs.ppm"
CAMERA = Vector(0, -0.35, -1)

# Seed for reproducibility
random.seed(42)

# Generate many spheres
OBJECTS = [
    # Ground Plane
    Sphere(
        Point(0, 10000.5, 1),
        10000.0,
        ChequeredMaterial(
            color1=Color.from_hex("#420500"),
            color2=Color.from_hex("#e6b87d"),
            ambient=0.2,
            reflection=0.2,
        ),
    ),
]

# Add 50 random spheres
colors = ["#FF0000", "#00FF00", "#0000FF", "#FFFF00", "#FF00FF", "#00FFFF",
          "#FF8800", "#8800FF", "#00FF88", "#FF0088", "#88FF00", "#0088FF"]

for i in range(50):
    x = random.uniform(-3, 3)
    z = random.uniform(0.5, 8)
    y = random.uniform(-0.3, 0.1)
    radius = random.uniform(0.15, 0.4)
    color = random.choice(colors)
    reflection = random.uniform(0.2, 0.7)

    OBJECTS.append(
        Sphere(
            Point(x, y, z),
            radius,
            Material(Color.from_hex(color), reflection=reflection)
        )
    )

LIGHTS = [
    Light(Point(1.5, -0.5, -10), Color.from_hex("#FFFFFF")),
    Light(Point(-0.5, -10.5, 0), Color.from_hex("#E6E6E6")),
    Light(Point(-3, -2, 5), Color.from_hex("#AAAAFF")),
]