rust-vs-python-ml-bench/REVIEW.md at main · RantAI-dev/rust-vs-python-ml-bench

Project Review: Rust vs Python ML Benchmark System

Comprehensive cross-language benchmarking across four domains: Classical ML, Deep Learning (CNN/RNN), Reinforcement Learning, and LLMs.
Orchestrated with Nextflow, producing structured JSON metrics and analysis artifacts.
Dual implementations (Python/Rust) emphasizing performance, resource usage, and result quality.

Rust workspace compiles clean after dependency alignment:
- Upgraded tch to 0.14 in cnn_benchmark, rnn_benchmark, dqn_benchmark, and policy_gradient_benchmark to ensure C++17 compatibility on macOS 14.
- Classical ML crates (regression_benchmark, svm_benchmark, clustering_benchmark) compile without errors using linfa.
Smoke workflow:
- CNN, LLM, RL, RNN: green
- Python Classical ML: requires local venv and pip install -r requirements.txt, then nextflow run workflows/smoke.nf -resume.

Classical ML (Rust/Python):
- Uses linfa (Rust) and scikit-learn (Python) with standard algorithms and robust metrics (RMSE/MAE/R² for regression; Accuracy/F1/ROC where applicable for classification; clustering indices).
- Deterministic seeds and consistent data splits improve reproducibility.
Deep Learning (Rust/Python):
- CNN/RNN built on tch (Rust) and PyTorch (Python); supports common architectures and captures timing/resource metrics.
- Upgraded tch mitigates C++ standard mismatches on macOS.
Reinforcement Learning (Rust/Python):
- Rust RL crates build clean with tch; Python uses stable-baselines3.
- Consistent metrics surface (training time, mean reward, inference speed) across languages.
LLMs:
- Smoke validated on Python LLM tasks; Rust LLM crates exist in repo but are not part of the current workspace members.

Python setup (recommended):
- python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
Rust build (entire workspace):
- cargo clean && cargo build --workspace --release
Smoke workflow:
- nextflow run workflows/smoke.nf
- If Python Classical ML tasks fail due to deps: re-run with -resume after installing requirements.

Add pytest-based smoke tests that execute a minimal run per category and validate JSON schema outputs.
Tighten Rust metric calculations where simplified (e.g., some percentiles/aggregation) and add unit tests for metric helpers.
Provide a small Nextflow profile for local E2E runs and ensure container/local profiles are clearly documented.
Consider promoting Rust LLM crates into the workspace once they reach the same fidelity as other domains.

The project is in strong shape: Rust workspace builds clean, and the smoke workflow is green across CNN, LLM, RL, and RNN, with a clear remediation path for Python Classical ML via venv + requirements installation.
Documentation is aligned (USERGUIDE.md, README.md, SPECS.md) and contains actionable guidance for users to reproduce results.