31 commits
619fb97
docs: add PRD for zero-boilerplate flash run experience
deanq Feb 18, 2026
db8908b
feat(client,scanner): LB route handler passthrough and path-aware dis…
deanq Feb 18, 2026
e583886
refactor(manifest): remove mothership dead code, flat resource structure
deanq Feb 18, 2026
35cfa6e
feat(run): file-system-as-namespace dev server generation
deanq Feb 18, 2026
5776af6
feat(build,lb_handler_generator): invoke LB handler generator, rglob …
deanq Feb 18, 2026
4fb94b5
fix(serverless): resolve flash run runtime bugs
deanq Feb 18, 2026
8a0e026
fix(run): hot-reload regenerates server.py on route changes
deanq Feb 18, 2026
e40930d
fix(run): suppress watchfiles debug logs from flash run output
deanq Feb 18, 2026
22ccee5
fix(run): omit body param from GET/HEAD route handlers
deanq Feb 18, 2026
adeb9c3
feat(run): proxy LB routes to deployed endpoints, restore --auto-prov…
deanq Feb 19, 2026
be2e65e
fix(run): add project root to sys.path during resource discovery
deanq Feb 19, 2026
417f9ec
feat(run): show resource count and elapsed time during cleanup
deanq Feb 19, 2026
e264ab1
fix(run): route LB calls through LoadBalancerSlsStub instead of HTTP …
deanq Feb 19, 2026
91dcc15
fix(run): revert LB to remote dispatch, remove QB /run route
deanq Feb 19, 2026
d15b51e
refactor(init): simplify skeleton to flat worker files for flash run
deanq Feb 19, 2026
93edabd
Merge branch 'main' into refactor/ae-2210-simplified-starter
deanq Feb 19, 2026
ef24356
fix(run): handle numeric-prefixed directories in server codegen
deanq Feb 19, 2026
ff7f0eb
fix(ci): update validate-wheel.sh for flat skeleton template
deanq Feb 20, 2026
89b45d7
fix: address PR 208 review feedback
deanq Feb 20, 2026
d168633
docs: remove coordinator/hub-spoke terminology, all endpoints are peers
deanq Feb 20, 2026
2b4bc55
Merge branch 'refactor/ae-2210-simplified-starter' of https://github.…
deanq Feb 20, 2026
dd22a72
docs: remove remaining mothership references from documentation
deanq Feb 20, 2026
80da273
refactor: replace flash run codegen with programmatic flash dev server
KAJdev Feb 20, 2026
c7cb8dd
refactor: rename flash run to flash dev, update all docs
KAJdev Feb 20, 2026
5013185
format
KAJdev Feb 20, 2026
6636bdf
fix: remove unused import flagged by ruff
KAJdev Feb 20, 2026
89c81b3
fix: clean up env var leaks in tests, remove em-dash
KAJdev Feb 20, 2026
b04c436
fix claudes generated doc changes lol
KAJdev Feb 20, 2026
9ecc6ab
merge: resolve conflicts with main, add flash run as legacy alias for…
KAJdev Feb 20, 2026
5004264
Merge branch 'main' into refactor/flash-dev-no-codegen
KAJdev Feb 20, 2026
ad0356d
feat: auto-select next available port when default is in use (#213)
KAJdev Feb 21, 2026
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -34,7 +34,7 @@

* **docs:** change idleTimeout from minutes to seconds ([#205](https://github.com/runpod/flash/issues/205)) ([51693c7](https://github.com/runpod/flash/commit/51693c7e2dd0c9d803f3c49de1d0009ded285d5d))
* prevent false deployment attempts in Flash environments ([#192](https://github.com/runpod/flash/issues/192)) ([f07c9fb](https://github.com/runpod/flash/commit/f07c9fb92003d4603fbf8cdc17b956c368009353))
-* **runtime:** restore on-demand provisioning for flash run ([#206](https://github.com/runpod/flash/issues/206)) ([5859f4b](https://github.com/runpod/flash/commit/5859f4b78476a070db2100b689dfd94caf5fc93f))
+* **runtime:** restore on-demand provisioning for flash dev ([#206](https://github.com/runpod/flash/issues/206)) ([5859f4b](https://github.com/runpod/flash/commit/5859f4b78476a070db2100b689dfd94caf5fc93f))


### Code Refactoring
@@ -189,7 +189,7 @@
### Features

* AE-1512: deploy() and undeploy() deployable resources directly ([#126](https://github.com/runpod/runpod-flash/issues/126)) ([3deac3a](https://github.com/runpod/runpod-flash/commit/3deac3a91b84fa4cf07cf553c46431907290a61c))
-* **cli:** Add --auto-provision flag to flash run command ([#125](https://github.com/runpod/runpod-flash/issues/125)) ([ee5793c](https://github.com/runpod/runpod-flash/commit/ee5793c33537acc15e26b680e3bac5aedb3c0735))
+* **cli:** Add --auto-provision flag to flash dev command ([#125](https://github.com/runpod/runpod-flash/issues/125)) ([ee5793c](https://github.com/runpod/runpod-flash/commit/ee5793c33537acc15e26b680e3bac5aedb3c0735))


### Code Refactoring
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -35,7 +35,7 @@ Get your API key from: https://docs.runpod.io/get-started/api-keys
- Integration tests that interact with Runpod API

**When is the API key NOT needed?**
-- Local development with `flash run` (local server only)
+- Local development with `flash dev` (local server only)
- `flash init` command (project scaffolding)
- Unit tests (mocked API calls)
- Code formatting, linting, type checking
302 changes: 302 additions & 0 deletions PRD.md
@@ -0,0 +1,302 @@
# Flash SDK: Zero-Boilerplate Experience — Product Requirements Document

## 1. Problem Statement

Flash currently forces every project into a FastAPI-first model:

- Users must create `main.py` with a `FastAPI()` instance
- HTTP routing boilerplate adds no semantic value — the routes simply call `@remote` functions
- No straightforward path for deploying a standalone QB function without wrapping it in a FastAPI app
- The "mothership" concept introduces an implicit coordinator with no clear ownership model
- `flash dev` fails unless `main.py` exists with a FastAPI app, blocking the simplest use cases

## 2. Goals

- **Zero boilerplate**: a `@remote`-decorated function in any `.py` file is sufficient for `flash dev` and `flash deploy`
- **File-system-as-namespace**: the project directory structure maps 1:1 to URL paths on the local dev server
- **Single command**: `flash dev` works for all project topologies (one QB function, many files, mixed QB+LB) without any configuration
- **`flash deploy` requires no additional configuration** beyond the `@remote` declarations themselves
- **Peer endpoints**: every resource config is a first-class endpoint; no implicit coordinator

## 3. Non-Goals

- No backward compatibility with `main.py`/FastAPI-first style
- No implicit "mothership" concept; all endpoints are peers
- No changes to the QB runtime (`generic_handler.py`) or QB stub behavior
- No changes to deployed endpoint behavior (RunPod QB/LB APIs are unchanged)

## 4. Developer Experience Specification

### 4.1 Minimum viable QB project

```python
# gpu_worker.py
from runpod_flash import LiveServerless, GpuGroup, remote

gpu_config = LiveServerless(name="gpu_worker", gpus=[GpuGroup.ANY])

@remote(gpu_config)
async def process(input_data: dict) -> dict:
    return {"result": "processed", "input": input_data}
```

`flash dev` → `POST /gpu_worker/run_sync`
`flash deploy` → standalone QB endpoint at `api.runpod.ai/v2/{id}/run`

### 4.2 LB endpoint

```python
# api/routes.py
from runpod_flash import CpuLiveLoadBalancer, remote

lb_config = CpuLiveLoadBalancer(name="api_routes")

@remote(lb_config, method="POST", path="/compute")
async def compute(input_data: dict) -> dict:
    return {"result": input_data}
```

`flash dev` → `POST /api/routes/compute`
`flash deploy` → LB endpoint at `{id}.api.runpod.ai/compute`

### 4.3 Mixed QB + LB (LB calling QB)

```python
# api/routes.py (LB)
from runpod_flash import CpuLiveLoadBalancer, remote
from workers.gpu import heavy_compute # QB stub

lb_config = CpuLiveLoadBalancer(name="api_routes")

@remote(lb_config, method="POST", path="/process")
async def process_route(data: dict):
    return await heavy_compute(data)  # dispatches to QB endpoint

# workers/gpu.py (QB)
from runpod_flash import LiveServerless, GpuGroup, remote

gpu_config = LiveServerless(name="gpu_worker", gpus=[GpuGroup.ANY])

@remote(gpu_config)
async def heavy_compute(data: dict) -> dict: ...
```

## 5. URL Path Specification

### 5.1 File prefix derivation

The local dev server uses the project directory structure as a URL namespace. Each file's URL prefix is its path relative to the project root with `.py` stripped:

```
File                              Local URL prefix
──────────────────────────────    ────────────────────────────
gpu_worker.py                  →  /gpu_worker
longruns/stage1.py             →  /longruns/stage1
preprocess/first_pass.py       →  /preprocess/first_pass
workers/gpu/inference.py       →  /workers/gpu/inference
```
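
The mapping above can be sketched as a small helper. This is a minimal illustration, not the scanner's actual code; the function name `file_url_prefix` is hypothetical, and the input is assumed to be a project-relative path with forward slashes.

```python
from pathlib import PurePosixPath


def file_url_prefix(relative_path: str) -> str:
    """Map a project-relative .py file path to its local URL prefix."""
    path = PurePosixPath(relative_path)
    if path.suffix != ".py":
        raise ValueError(f"not a Python file: {relative_path}")
    # Drop the extension and keep the directory components as URL segments.
    return "/" + path.with_suffix("").as_posix()


print(file_url_prefix("workers/gpu/inference.py"))
```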

### 5.2 QB route generation

| Condition | Routes |
|---|---|
| One `@remote` function in file | `POST {file_prefix}/run` and `POST {file_prefix}/run_sync` |
| Multiple `@remote` functions in file | `POST {file_prefix}/{fn_name}/run` and `POST {file_prefix}/{fn_name}/run_sync` |
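
As a sketch of the two rows in this table, the route list for one file could be computed as follows. The helper name `qb_routes` is hypothetical, and returning `(method, path)` tuples is an assumption made for illustration.

```python
def qb_routes(file_prefix: str, function_names: list[str]) -> list[tuple[str, str]]:
    """Return (method, path) pairs for the QB functions found in one file."""
    routes = []
    for name in function_names:
        # A lone function owns the file prefix; several functions are
        # sub-prefixed by function name.
        base = file_prefix if len(function_names) == 1 else f"{file_prefix}/{name}"
        routes.append(("POST", f"{base}/run"))
        routes.append(("POST", f"{base}/run_sync"))
    return routes


for method, path in qb_routes("/longruns/stage1", ["stage1_preprocess", "stage1_infer"]):
    print(method, path)
```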

### 5.3 LB route generation

| Condition | Route |
|---|---|
| `@remote(lb_config, method="POST", path="/compute")` | `POST {file_prefix}/compute` |

The declared `path=` is appended to the file prefix. The `method=` determines the HTTP verb.
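
A minimal sketch of that composition, assuming slashes are normalized at the join (the normalization and the name `lb_route` are illustrative assumptions, not documented behavior):

```python
def lb_route(file_prefix: str, method: str, path: str) -> tuple[str, str]:
    """Combine a file prefix with a declared LB path and HTTP method."""
    # Assumption: tolerate a missing or duplicated slash at the join.
    declared = "/" + path.strip("/")
    return method.upper(), f"{file_prefix.rstrip('/')}{declared}"


print(lb_route("/api/routes", "POST", "/compute"))
```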

### 5.4 QB request/response envelope

Mirrors RunPod's API for consistency:

```
POST /gpu_worker/run_sync
Body: {"input": {"key": "value"}}
Response: {"id": "uuid", "status": "COMPLETED", "output": {...}}
```
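
The envelope can be illustrated with a small wrapper around any async function. This is a sketch only: `run_sync_envelope` is a hypothetical name, generating the id with `uuid.uuid4()` is an assumption, and error statuses are out of scope because this document only specifies the `COMPLETED` case.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable


async def run_sync_envelope(fn: Callable[[dict], Awaitable[Any]], body: dict) -> dict:
    """Call fn with body["input"] and wrap the result in the QB envelope."""
    output = await fn(body["input"])
    return {"id": str(uuid.uuid4()), "status": "COMPLETED", "output": output}


async def process(input_data: dict) -> dict:
    # Stand-in for a @remote function; runs locally in this sketch.
    return {"result": "processed", "input": input_data}


response = asyncio.run(run_sync_envelope(process, {"input": {"key": "value"}}))
print(response["status"], response["output"])
```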

## 6. Deployed Topology Specification

Each unique resource config gets its own RunPod endpoint:

| Type | Deployed URL | Example |
|---|---|---|
| QB | `https://api.runpod.ai/v2/{endpoint_id}/run` | `https://api.runpod.ai/v2/uoy3n7hkyb052a/run` |
| QB sync | `https://api.runpod.ai/v2/{endpoint_id}/run_sync` | |
| LB | `https://{endpoint_id}.api.runpod.ai/{declared_path}` | `https://rzlk6lph6gw7dk.api.runpod.ai/compute` |
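
For reference, the URL shapes in this table can be built with two helpers. The function names are hypothetical; the URL templates are taken from the table.

```python
def qb_url(endpoint_id: str, sync: bool = False) -> str:
    """Deployed QB URL for an endpoint id (async /run or sync /run_sync)."""
    action = "run_sync" if sync else "run"
    return f"https://api.runpod.ai/v2/{endpoint_id}/{action}"


def lb_url(endpoint_id: str, declared_path: str) -> str:
    """Deployed LB URL: the declared path is served from the endpoint's own host."""
    return f"https://{endpoint_id}.api.runpod.ai/{declared_path.lstrip('/')}"


print(qb_url("uoy3n7hkyb052a"))
print(lb_url("rzlk6lph6gw7dk", "/compute"))
```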

## 7. `.flash/` Folder Specification

The `.flash/` directory is used by `flash build` for build artifacts (e.g. `manifest.json`). The `flash dev` command does not create or use `.flash/` at all.

```
my_project/
├── gpu_worker.py
├── longruns/
│   └── stage1.py
└── .flash/
    └── manifest.json          ← generated by flash build
```

### 7.1 Dev server launch

Uvicorn is invoked with the `--factory` flag pointing to the app factory function, and the project root is passed via the `FLASH_PROJECT_ROOT` environment variable:

```bash
uvicorn --factory runpod_flash.cli.commands._dev_server:create_app \
    --reload \
    --reload-dir .
```
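
From Python, the same launch could be assembled as a command list plus an environment, ready to hand to `subprocess`. This is a sketch under assumptions: `uvicorn_launch` is a hypothetical helper, running uvicorn through `sys.executable -m` and passing `--port` are choices made here, and port 8888 is taken from the startup table in §8.1.

```python
import os
import sys
from pathlib import Path


def uvicorn_launch(project_root: Path, port: int = 8888) -> tuple[list[str], dict[str, str]]:
    """Build the uvicorn command and environment for the dev server."""
    cmd = [
        sys.executable, "-m", "uvicorn",
        "--factory", "runpod_flash.cli.commands._dev_server:create_app",
        "--reload",
        "--reload-dir", str(project_root),
        "--port", str(port),
    ]
    # The app factory takes no arguments, so the project root travels via the environment.
    env = {**os.environ, "FLASH_PROJECT_ROOT": str(project_root)}
    return cmd, env


cmd, env = uvicorn_launch(Path("my_project"))
print(" ".join(cmd[1:]))
```

The pair can then be passed to `subprocess.Popen(cmd, env=env)`; the sketch stops short of starting the process.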

## 8. `flash dev` Behavior

1. Scan project for all `@remote` functions (QB and LB) in any `.py` file
- Skip: `.flash/`, `__pycache__`, `*.pyc`, `__init__.py`
2. If none found: print error with usage instructions, exit 1
3. Build FastAPI app programmatically via `_dev_server.create_app()`
4. Start uvicorn with `--factory` and `--reload` watching the project root
5. Print startup table: local paths, resource names, types
6. Swagger UI available at `http://localhost:{port}/docs`
7. On exit (Ctrl+C or SIGTERM): deprovision all Live Serverless endpoints provisioned during this session
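
Step 1 can be sketched with a static scan. This is an illustration only: it assumes detection by parsing each file with `ast` and looking for a decorator named `remote` on top-level functions, which may differ from how the real scanner works. The name `find_remote_functions` is hypothetical.

```python
import ast
from pathlib import Path

SKIP_DIRS = {".flash", "__pycache__"}


def _is_remote_decorator(node: ast.expr) -> bool:
    # Matches @remote, @remote(...), and @module.remote(...).
    target = node.func if isinstance(node, ast.Call) else node
    if isinstance(target, ast.Name):
        return target.id == "remote"
    if isinstance(target, ast.Attribute):
        return target.attr == "remote"
    return False


def find_remote_functions(project_root: Path) -> dict[str, list[str]]:
    """Map each project-relative file to its top-level @remote function names."""
    found: dict[str, list[str]] = {}
    for path in sorted(project_root.rglob("*.py")):
        rel = path.relative_to(project_root)
        if path.name == "__init__.py" or SKIP_DIRS.intersection(rel.parts):
            continue
        tree = ast.parse(path.read_text(encoding="utf-8"))
        names = [
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and any(_is_remote_decorator(d) for d in node.decorator_list)
        ]
        if names:
            found[rel.as_posix()] = names
    return found
```

An empty result corresponds to step 2: the command would print usage instructions and exit with status 1.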

### 8.1 Startup table format

```
Flash Dev Server http://localhost:8888

Local path                           Resource               Type
───────────────────────────────────  ─────────────────────  ────
POST /gpu_worker/run                 gpu_worker             QB
POST /gpu_worker/run_sync            gpu_worker             QB
POST /longruns/stage1/run            longruns_stage1        QB
POST /preprocess/first_pass/compute  preprocess_first_pass  LB

Visit http://localhost:8888/docs for Swagger UI
```
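
A table in this format could be rendered with plain string formatting. The sketch below is an assumption about layout only (column widths sized to the longest entry, two spaces between columns); `format_startup_table` is a hypothetical name.

```python
def format_startup_table(rows: list[tuple[str, str, str, str]], port: int = 8888) -> str:
    """Render (method, path, resource_name, type) rows as the startup table."""
    local = [f"{method} {path}" for method, path, _, _ in rows]
    path_w = max([len("Local path"), *map(len, local)])
    name_w = max([len("Resource"), *(len(row[2]) for row in rows)])
    lines = [
        f"Flash Dev Server http://localhost:{port}",
        "",
        f"{'Local path':<{path_w}}  {'Resource':<{name_w}}  Type",
        f"{'─' * path_w}  {'─' * name_w}  ────",
    ]
    for text, (_, _, name, kind) in zip(local, rows):
        lines.append(f"{text:<{path_w}}  {name:<{name_w}}  {kind}")
    lines += ["", f"Visit http://localhost:{port}/docs for Swagger UI"]
    return "\n".join(lines)


table = format_startup_table([
    ("POST", "/gpu_worker/run", "gpu_worker", "QB"),
    ("POST", "/preprocess/first_pass/compute", "preprocess_first_pass", "LB"),
])
print(table)
```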

## 9. `flash build` Behavior

1. Scan project for all `@remote` functions (QB and LB)
2. Build `.flash/manifest.json` with flat resource structure (see §10)
3. For LB resources: generate deployed handler files using `module_path`
4. Package build artifact

## 10. Manifest Structure

Resource names are derived from file paths (slashes → underscores):

```json
{
  "version": "1.0",
  "project_name": "my_project",
  "resources": {
    "gpu_worker": {
      "resource_type": "LiveServerless",
      "file_path": "gpu_worker.py",
      "local_path_prefix": "/gpu_worker",
      "module_path": "gpu_worker",
      "functions": ["gpu_hello"],
      "is_load_balanced": false,
      "makes_remote_calls": false
    },
    "longruns_stage1": {
      "resource_type": "LiveServerless",
      "file_path": "longruns/stage1.py",
      "local_path_prefix": "/longruns/stage1",
      "module_path": "longruns.stage1",
      "functions": ["stage1_process"],
      "is_load_balanced": false,
      "makes_remote_calls": false
    },
    "preprocess_first_pass": {
      "resource_type": "CpuLiveLoadBalancer",
      "file_path": "preprocess/first_pass.py",
      "local_path_prefix": "/preprocess/first_pass",
      "module_path": "preprocess.first_pass",
      "functions": [
        {"name": "first_pass_fn", "http_method": "POST", "http_path": "/compute"}
      ],
      "is_load_balanced": true,
      "makes_remote_calls": true
    }
  }
}
```
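
The three path-derived fields can all be computed from `file_path`. The sketch below assumes that every character that is not valid in an identifier becomes an underscore in the resource name (the document only states this for slashes and hyphens); `manifest_identity` is a hypothetical helper name.

```python
import re
from pathlib import PurePosixPath


def manifest_identity(file_path: str) -> tuple[str, dict]:
    """Derive the resource name and the path-based manifest fields for a file."""
    stem = PurePosixPath(file_path).with_suffix("")
    # Assumption: any character outside [0-9A-Za-z_] is replaced by "_".
    name = re.sub(r"[^0-9A-Za-z_]", "_", stem.as_posix())
    entry = {
        "file_path": file_path,
        "local_path_prefix": "/" + stem.as_posix(),
        "module_path": ".".join(stem.parts),
    }
    return name, entry


print(manifest_identity("preprocess/first_pass.py"))
```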

## 11. Dev Server App Structure

The dev server is built programmatically by `_dev_server.create_app()`. User modules are imported via `importlib.import_module()` and routes are registered with `app.add_api_route()`. Tracebacks point directly to the original source files.

Conceptual equivalent of the generated app:

```python
app = FastAPI(title="Flash Dev Server")

# QB: gpu_worker.py - imported via importlib, route added via add_api_route
# POST /gpu_worker/run_sync -> calls gpu_hello(body["input"])

# QB: longruns/stage1.py
# POST /longruns/stage1/run_sync -> calls stage1_process(body["input"])

# LB: preprocess/first_pass.py
# POST /preprocess/first_pass/compute -> calls lb_execute(config, first_pass_fn, body)

# Health
# GET / -> {"message": "Flash Dev Server", "docs": "/docs"}
# GET /ping -> {"status": "healthy"}
```

Subdirectory imports use dotted module paths: `longruns/stage1.py` -> `longruns.stage1`.

Multi-function QB files (2+ `@remote` functions) get sub-prefixed routes:
```
longruns/stage1.py has: stage1_preprocess, stage1_infer
→ POST /longruns/stage1/stage1_preprocess/run
→ POST /longruns/stage1/stage1_preprocess/run_sync
→ POST /longruns/stage1/stage1_infer/run
→ POST /longruns/stage1/stage1_infer/run_sync
```

## 12. Acceptance Criteria

- [ ] A file with one `@remote(QB_config)` function and nothing else is a valid Flash project
- [ ] `flash dev` produces a Swagger UI showing all routes grouped by source file
- [ ] QB routes accept `{"input": {...}}` and return `{"id": ..., "status": "COMPLETED", "output": {...}}`
- [ ] Subdirectory files produce URL prefixes matching their relative path
- [ ] Multiple `@remote` functions in one file each get their own sub-prefixed routes
- [ ] LB route handler body executes directly (not dispatched remotely)
- [ ] QB calls inside LB route handler body route to the remote QB endpoint
- [ ] `flash deploy` creates a RunPod endpoint for each resource config
- [ ] `flash build` produces `.flash/manifest.json` with `file_path`, `local_path_prefix`, `module_path` per resource
- [ ] When `flash dev` exits, all Live Serverless endpoints provisioned during that session are automatically undeployed

## 13. Edge Cases

- **No `@remote` functions found**: Error with clear message and usage instructions
- **Multiple `@remote` functions per file (QB)**: Sub-prefixed routes `/{file_prefix}/{fn_name}/run_sync`
- **`__init__.py` files**: Skipped -- not treated as worker files
- **File path with hyphens** (e.g., `my-worker.py`): Resource name sanitized to `my_worker`, URL prefix stays `/my-worker` (hyphens are valid in URLs but not in Python identifiers)
- **LB function calling another LB function**: Not supported via `@remote` -- emit a warning at build time
- **`flash deploy` with no LB endpoints**: QB-only deploy
- **Subdirectory imports that need `__init__.py`**: Generator checks for the file and warns if it is missing
- **Numeric-prefix directories** (e.g., `01_hello/`): Handled via `importlib.import_module()` with scoped `sys.path`
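
The numeric-prefix case works because `importlib.import_module()` takes the module path as a string, so a name like `01_hello.worker` never has to parse as an `import` statement. A minimal sketch, assuming the project root is added to `sys.path` only for the duration of the import (the helper names are hypothetical):

```python
import importlib
import sys
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def scoped_sys_path(root: Path):
    """Temporarily put root at the front of sys.path."""
    entry = str(root)
    sys.path.insert(0, entry)
    try:
        yield
    finally:
        if entry in sys.path:
            sys.path.remove(entry)


def import_worker(project_root: Path, module_path: str):
    """Import a worker module by dotted path, e.g. '01_hello.worker'."""
    with scoped_sys_path(project_root):
        # Pick up files created after the interpreter started.
        importlib.invalidate_caches()
        return importlib.import_module(module_path)
```

Without an `__init__.py`, the directory is resolved as a namespace package, which is enough for this lookup.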

## 14. Implementation Files

| File | Change |
|------|--------|
| `flash/main/PRD.md` | This document |
| `src/runpod_flash/client.py` | Passthrough for LB route handlers (`__is_lb_route_handler__`) |
| `cli/commands/run.py` | Worker scanning, startup table, uvicorn subprocess management |
| `cli/commands/_dev_server.py` | Programmatic FastAPI app factory (`create_app`), route registration |
| `cli/commands/build_utils/scanner.py` | Path utilities; `is_lb_route_handler` field; file-based resource identity |
| `cli/commands/build_utils/manifest.py` | Flat resource structure; `file_path`/`local_path_prefix`/`module_path` fields |
| `cli/commands/build_utils/lb_handler_generator.py` | Import module by `module_path`, walk `__is_lb_route_handler__`, register routes |
| `cli/commands/build.py` | Remove main.py requirement from `validate_project_structure` |
| `core/resources/serverless.py` | Inject `FLASH_MODULE_PATH` env var |