The only tool that does not just translate your code. It tells you if it is safe to migrate, proves it compiles, and opens the Pull Request.
Migrating CUDA code to AMD ROCm/HIP is not a find-and-replace operation. It is an engineering decision that requires:
- Static risk analysis across thousands of lines of code
- Compile-time validation on the target hardware
- Runtime numerical proof that results match the original
- A structured audit trail for engineering sign-off
Most tools give you a diff. WarpShift gives you a verdict.
Git Repository URL --> [4-Stage Pipeline] --> Migration Decision + Proof
| Stage | What Happens | Output |
|---|---|---|
| 1. HIPIFY Conversion | Runs hipify-clang or hipify-perl on all .cu / .cuh files |
Converted HIP source + diff |
| 2. Static Analysis | Scans for 15+ known CUDA to ROCm incompatibility patterns | Risk report (HIGH / MED / LOW) |
| 3. Runtime Validation | Compiles with hipcc, runs the binary, validates numerical output |
Build status + ms/iter timing |
| 4. Agent Reasoning Layer | Gemini 2.5 Flash detects what HIPIFY missed, writes the fixes, and generates the PR reasoning | Actionable insights per issue |
Final output: PROCEED or DO NOT MIGRATE YET with full evidence.
- Docker >= 24.x
- Node.js >= 18.x
- Python >= 3.11
- GitHub CLI (
gh) for real PR creation (optional) - An OpenAI-compatible API key for live AI insights (optional)
git clone https://github.com/diegosantdev/warpshift.git
cd warpshiftCreate a .env file in the project root:
GOOGLE_AI_API_KEY=your_google_ai_key_here
# Get your key at: aistudio.google.com
# Execution Mode
MIGRATEAI_BACKEND_MODE=real # "mock" for safe demo, "real" for full pipeline
WARPSHIFT_EXECUTION_MODE=host # "host" or "docker" for sandboxed execution
# GitHub PR Automation (requires gh auth login)
GITHUB_REAL_PR=false # set to "true" to enable real PR creation
GITHUB_DEFAULT_BASE_BRANCH=maincd backend
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000cd frontend
npm install
npm run devOpen http://localhost:3000, paste any CUDA repository URL and click MIGRATE.
+----------------------------------------------------------------------+
| WarpShift Engine |
| |
| +-----------+ +------------+ +------------+ +------------+ |
| | HIPIFY |-->| Static |-->| Runtime |-->| AI | |
| | Stage | | Analysis | | Build | | Insights | |
| +-----------+ +------------+ +------------+ +------------+ |
| | | | | |
| +----------------+----------------+----------------+ |
| | |
| v |
| +------------------------------------------------+ |
| | Evidence Engine (evidence.json) | |
| | run_id commit stage_logs risk_items | |
| | build_status validation_result ai_insights | |
| +------------------------------------------------+ |
| | |
| +---------------+---------------+ |
| v v |
| +---------------------+ +----------------------+ |
| | Decision OS | | GitHub PR Flow | |
| | PROCEED / DO NOT | | branch + push + pr | |
| +---------------------+ +----------------------+ |
+----------------------------------------------------------------------+
warpshift/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI routes + SSE streaming
│ │ ├── pipeline.py # 4-stage orchestration engine
│ │ ├── stages.py # Stage logic + LLM call
│ │ ├── schemas.py # Pydantic models
│ │ ├── settings.py # Environment configuration
│ │ └── real_anchor.py # Reference artifact computation
│ ├── docker_executor.py # Isolated Docker sandbox runner
│ └── scripts/
│ └── docker_entrypoint.py # Container-side executor
├── frontend/
│ └── app/
│ ├── page.tsx # Decision OS UI
│ └── globals.css # Glassmorphism design system
├── data/
│ └── benchmark_sample/
│ ├── main.cu # SAXPY benchmark (CUDA/HIP dual-mode)
│ └── Makefile
└── Dockerfile # AMD ROCm-ready sandbox image
WarpShift bundles a real GPU benchmark (data/benchmark_sample/main.cu) that:
- Allocates 1M float elements on GPU
- Runs 100 iterations of SAXPY:
Y[i] = 2.0 * X[i] + Y[i] - Validates numerical correctness against CPU-computed expected values (
maxError < 1e-5) - Reports wall-clock time in ms/iter
[WARPSHIFT_BENCHMARK] time_ms=0.231
[WARPSHIFT_VALIDATION] status=SUCCESS
This is parsed automatically and surfaced in the SAXPY Benchmark (GPU Validated) tab in the UI.
The same kernel compiles with both nvcc (CUDA) and hipcc (ROCm) via a compile-time flag:
#ifndef __HIP_PLATFORM_AMD__
#include <cuda_runtime.h>
#else
#include <hip/hip_runtime.h>
#endifWarpShift was built and validated to run on AMD Instinct MI300X hardware.
# SSH into your AMD Developer Cloud instance, then:
git clone https://github.com/diegosantdev/warpshift.git
cd warpshift
export MIGRATEAI_BACKEND_MODE=real
export MIGRATEAI_LLM_API_KEY=your_key
export WARPSHIFT_EXECUTION_MODE=host
cd backend
pip install -r requirements.txt
nohup uvicorn app.main:app --host 0.0.0.0 --port 8000 &
# On your local machine, tunnel the ports:
ssh -L 3000:localhost:3000 -L 8000:localhost:8000 user@amd-cloud-ipThen open http://localhost:3000 on your local browser. The analysis runs on AMD silicon.
docker build -t warpshift-runner:latest .
docker run --rm \
-e WARPSHIFT_EXECUTION_MODE=docker \
-e MIGRATEAI_BACKEND_MODE=real \
-v /tmp/workspace:/workspace \
warpshift-runner:latest| Method | Endpoint | Description |
|---|---|---|
POST |
/analyze |
Run full migration analysis |
GET |
/analyze/stream?github_url=... |
Real-time SSE stream of stage progress |
GET |
/runs/{run_id} |
Retrieve full evidence JSON for a run |
POST |
/runs/{run_id}/create-pr |
Create real GitHub PR with converted code |
GET |
/history |
List of past analysis runs |
GET |
/demo-repos |
Curated demo repository candidates |
POST |
/export/risk-report |
Export risk report as JSON or Markdown |
GET |
/anchor/status |
Reference artifact validation status |
curl -X POST http://localhost:8000/analyze \
-H "Content-Type: application/json" \
-d '{"github_url": "https://github.com/NVIDIA/cuda-samples", "mode": "live"}'const source = new EventSource(
'http://localhost:8000/analyze/stream?github_url=https://github.com/NVIDIA/cuda-samples'
);
source.addEventListener('stage_update', (e) => console.log(JSON.parse(e.data)));
source.addEventListener('completed', (e) => console.log('Done!', JSON.parse(e.data)));Perfect for a live 2-minute demonstration.
- Open
http://localhost:3000 - Paste
https://github.com/NVIDIA/cuda-samples/tree/master/Samples/0_Introduction/matrixMulin the URL field - Click
MIGRATEand watch the 4 stages run in real time (20 to 35 seconds) - Open the Risk Report tab and point out HIGH risks with their detection sources
- Open the SAXPY Benchmark tab and show numerical validation status and ms/iter timing
- Show the Decision Banner:
PROCEED WITH CAUTIONorDO NOT MIGRATE YET
Total elapsed: under 90 seconds. Full end-to-end migration decision with proof.
WarpShift's static analysis engine scans for:
| Risk | Severity | Detection |
|---|---|---|
Hardcoded warpSize = 32 |
HIGH | Static scan |
cuBLAS argument ordering |
HIGH | Dependency scan |
| Dynamic kernel launches | HIGH | AST pattern |
cuDNN custom ops |
MEDIUM | Dependency scan |
| Texture memory usage | MEDIUM | Static scan |
__device__ function pointers |
MEDIUM | AST pattern |
| PTX inline assembly | HIGH | Static scan |
| Cooperative groups | MEDIUM | Dependency scan |
cudaGraph / CUDA Graphs |
MEDIUM | Static scan |
| Thrust algorithms | LOW | Dependency scan |
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
git checkout -b feature/my-improvement
git commit -m "feat: describe your change"
git push origin feature/my-improvementMIT License. Copyright 2026 Diego Sant.
Built for the AMD Developer Cloud Hackathon.
Built to make AMD ROCm adoption dead simple.
WarpShift - from CUDA to HIP in under 90 seconds.



