The shared expert is always resident, so generation continues while the missing expert finishes loading in the background. For exact lazy/offload comparison modes, Chronos synchronously materializes only the selected missing expert and evicts low-LRU experts to stay inside the resident budget; it does not silently full-load all experts. Quality degrades smoothly only when fallback mode is explicitly enabled.

---
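The materialize-then-evict behavior described above can be sketched as a toy LRU cache. All names here (`ExpertCache`, `load_fn`, `budget`) are illustrative assumptions, not the Chronos API:

```python
from collections import OrderedDict

class ExpertCache:
    """Toy LRU cache for MoE expert weights with a fixed resident budget.

    `load_fn` stands in for synchronously materializing one expert's
    weights from disk; this is a sketch, not the Chronos implementation.
    """

    def __init__(self, budget, load_fn):
        self.budget = budget           # max number of resident experts
        self.load_fn = load_fn         # e.g. reads one expert shard
        self.resident = OrderedDict()  # expert_id -> weights, LRU order

    def get(self, expert_id):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark most recently used
            return self.resident[expert_id]
        # Materialize only the selected missing expert...
        weights = self.load_fn(expert_id)
        # ...and evict least-recently-used experts to stay inside the budget.
        while len(self.resident) >= self.budget:
            self.resident.popitem(last=False)
        self.resident[expert_id] = weights
        return weights
```

With `budget=2`, touching experts 0, 1, 0, 2 in order evicts expert 1, the least recently used.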
| 5 GRPO | `train_chronos_grpo.py` | `PG * A - beta * KL` with `ToyReward` or pluggable `LMRewardModel` | 0.10 |
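The stage-5 objective `PG * A - beta * KL` can be sketched as a toy scalar function. The names `logp_new`/`logp_ref` and the per-sample KL estimate are illustrative assumptions; the actual reward plumbing (`ToyReward`, `LMRewardModel`) is not shown:

```python
def grpo_objective(logp_new, logp_ref, advantage, beta=0.10):
    """Toy scalar form of `PG * A - beta * KL` (illustrative, not Chronos source).

    logp_new / logp_ref: log-prob of the sampled token under the current
    and reference policies; advantage: group-relative advantage A.
    """
    pg = logp_new * advantage     # policy-gradient term, PG * A
    kl = logp_new - logp_ref      # simple per-sample KL estimate
    return pg - beta * kl         # quantity to maximize
```

For example, with `logp_new=-1.0`, `logp_ref=-1.5`, `advantage=2.0`, the objective is `-2.0 - 0.1 * 0.5 = -2.05`.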
- `--dtype auto` is the default. MPS/MLX resolve to BF16-first for training stability, CUDA/XPU resolve to FP16, and CPU resolves to FP32 unless `--dtype float16` or `--dtype bfloat16` is set explicitly.
- CPU training configures PyTorch to use physical cores by default. Override with `--cpu_threads` or `--cpu_budget_percent`.
- On macOS, MPS/MLX training forces DataLoader workers to `0` by default to avoid Metal command-buffer crashes from multiprocessing. CPU/CUDA still use worker processes; advanced users can override the guard with `CHRONOS_ALLOW_METAL_DATALOADER_WORKERS=1`.
- Native MLX training pushes UI logs, scalar readouts, and chart points every `log_interval` steps, and Web UI Stop is checked at each batch boundary.
- The Web UI writes a warning-only `<checkpoint>.verify.json` after each stage. It checks no-mask vs all-available MoE parity and, on Apple Silicon, MLX prefill logits against the PyTorch CPU baseline.
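The `--dtype auto` policy above can be summarized as a small resolver. This is a sketch mirroring the documented behavior, not the Chronos source:

```python
def resolve_dtype(device_type, requested="auto"):
    """Illustrative mirror of the documented `--dtype auto` policy:
    BF16-first on MPS/MLX, FP16 on CUDA/XPU, FP32 on CPU."""
    if requested != "auto":
        return requested       # explicit --dtype float16 / bfloat16 wins
    if device_type in ("mps", "mlx"):
        return "bfloat16"      # BF16-first for training stability
    if device_type in ("cuda", "xpu"):
        return "float16"
    return "float32"           # CPU default
```

So `resolve_dtype("mps")` yields `"bfloat16"`, while `resolve_dtype("cpu", "bfloat16")` honors the explicit override.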
The full six-stage comparison harness lives in `tools/compare_minimind_chronos_v3.py`.
- **First-class backends for training and inference**: `cpu`, `mps`, `cuda`, `mlx`
- **Inference-only / experimental**: `vulkan` when PyTorch was custom-built with `USE_VULKAN=ON`
- **Third-party extension hook**: `opencl`, via `chronos/backend/ext/opencl.py:PROBE()`
- **Apple Silicon policy**: inference auto still prefers MLX; training keeps MLX on the native `chronos.mlx.*` path instead of calling `torch.model.to("mlx")`.
Honest note: upstream PyTorch does not ship a real OpenCL backend, and Vulkan support is still niche. Chronos provides a dispatcher seam so external integrations can plug in cleanly without touching core code.
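A dispatcher seam of this kind can be sketched as a probe registry in the spirit of the `PROBE()` hook above. The registry shape and function names here are assumptions for illustration, not the Chronos internals:

```python
# Illustrative backend-probe registry: each backend registers a
# zero-argument probe that returns True only if it is usable here.
BACKENDS = {}

def register(name, probe):
    """A third-party extension plugs in without touching core code."""
    BACKENDS[name] = probe

def available_backends():
    """Return the names of backends whose probes succeed."""
    return [name for name, probe in BACKENDS.items() if probe()]

register("cpu", lambda: True)      # always available
register("opencl", lambda: False)  # e.g. probe finds no OpenCL runtime
```

Calling `available_backends()` then reports only the backends whose probes passed.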
## Installation
```bash
pip install Project_Chronos
```
Or from source:
```bash
pip install -e ".[dev]"
```
**MLX (Apple Silicon):**
```bash
pip install "Project_Chronos[mlx]"
```
**vLLM serving (optional, Linux + CUDA only):**
```bash
pip install vllm
```
## Quick start
### Web UI (M6: 8 tabs, 4 languages)
```bash
chronos-ui
```

Tabs included:
- `Train` with its own `data_path`
- `6-Stage Pipeline` with per-stage dataset paths
- `Inference`
- `Export` for FP16/Q8_0 safetensors and GGUF deployment artifacts
- `Benchmark` with Markdown table + bar plot
- `Auto-Tune` with persistent logs and one-click `Apply Best -> Config`