diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 6277eb1..dc1dafe 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -1,620 +1,9 @@ -# Squirrel Backend Architecture +# Architecture -## Overview +The architecture documentation has moved to the documentation site: -Squirrel Backend is a high-performance FastAPI application designed to manage and monitor EPICS (Experimental Physics and Industrial Control System) process variables (PVs). It handles 40-50K PVs with real-time monitoring, caching, and snapshot capabilities. - -The system uses a **distributed architecture** with separate processes for API serving, PV monitoring, and background task processing, enabling horizontal scaling and fault isolation. - -## Technology Stack - -| Category | Technology | Purpose | -|----------|------------|---------| -| **Framework** | FastAPI 0.109+ | REST API and WebSocket | -| **Language** | Python 3.11+ | Async/await support | -| **Database** | PostgreSQL 16+ | Primary data store | -| **ORM** | SQLAlchemy 2.0+ (async) | Database abstraction | -| **Cache** | Redis 7+ | PV value caching, pub/sub | -| **EPICS** | aioca 1.7+ | Async Channel Access | -| **Task Queue** | Arq | Redis-backed job queue | -| **Server** | Uvicorn | ASGI server | - -## System Architecture - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Load Balancer │ -└─────────────────────────────────────────────────────────────────────────────┘ - │ - ┌───────────────────────┼───────────────────────┐ - ▼ ▼ ▼ - ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ - │ API Instance │ │ API Instance │ │ API Instance │ - │ (squirrel-api) │ │ (squirrel-api) │ │ (squirrel-api) │ - │ REST + WebSocket│ │ REST + WebSocket│ │ REST + WebSocket│ - └────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘ - │ │ │ - └──────────────────────┼──────────────────────┘ - │ - ▼ - ┌─────────────────────────────────────────────────────────────────────────┐ - │ Redis │ - │ • PV Value 
Cache (Hash: pv:values) │ - │ • Pub/Sub (pv updates, WebSocket broadcasts) │ - │ • Subscription Registry (multi-instance WebSocket support) │ - │ • Arq Job Queue │ - │ • Monitor Leader Election Lock │ - └──────────────────────────────┬──────────────────────────────────────────┘ - │ - ┌────────────────────────┼────────────────────────┐ - ▼ ▼ ▼ -┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ -│ PV Monitor │ │ Arq Worker │ │ Arq Worker │ -│ (squirrel-monitor)│ │ (squirrel-worker)│ │ (squirrel-worker)│ -│ Single instance │ │ Scalable │ │ Scalable │ -│ Leader election │ │ │ │ │ -└────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘ - │ │ │ - └───────────────────────┼───────────────────────┘ - │ - ▼ - ┌─────────────────────────────────────────────────────────────────────────┐ - │ PostgreSQL │ - │ • PV metadata and configuration │ - │ • Snapshots and snapshot values │ - │ • Tags and tag groups │ - │ • Job tracking │ - └─────────────────────────────────────────────────────────────────────────┘ - │ - ▼ - ┌─────────────────────────────────────────────────────────────────────────┐ - │ EPICS IOCs │ - │ • 40-50K Process Variables │ - │ • Channel Access protocol │ - └─────────────────────────────────────────────────────────────────────────┘ -``` - -## Directory Structure - -``` -squirrel-backend/ -├── app/ # Main application package -│ ├── main.py # FastAPI entry point (API-only) -│ ├── monitor_main.py # Standalone PV monitor entry point -│ ├── worker.py # Arq worker configuration -│ ├── config.py # Pydantic settings management -│ ├── dependencies.py # FastAPI dependency injection -│ │ -│ ├── api/ # API layer -│ │ ├── responses.py # Response wrappers -│ │ └── v1/ # API v1 endpoints -│ │ ├── router.py # Main router aggregator -│ │ ├── pvs.py # PV CRUD endpoints -│ │ ├── snapshots.py # Snapshot operations -│ │ ├── tags.py # Tag management -│ │ ├── jobs.py # Job status tracking -│ │ ├── health.py # Health monitoring -│ │ └── websocket.py # Real-time PV 
updates -│ │ -│ ├── services/ # Business logic layer -│ │ ├── epics_service.py # EPICS read/write (aioca) -│ │ ├── redis_service.py # Redis cache management -│ │ ├── pv_monitor.py # Background PV monitoring -│ │ ├── pv_service.py # PV business logic -│ │ ├── snapshot_service.py # Snapshot creation/restore -│ │ ├── tag_service.py # Tag operations -│ │ ├── job_service.py # Job tracking -│ │ ├── watchdog.py # Health monitoring service -│ │ ├── circuit_breaker.py # EPICS circuit breaker -│ │ ├── subscription_registry.py # Multi-instance WebSocket support -│ │ ├── bulk_insert_service.py# PostgreSQL COPY inserts -│ │ └── background_tasks.py # Async background jobs -│ │ -│ ├── tasks/ # Arq task definitions -│ │ ├── __init__.py -│ │ └── snapshot_tasks.py # Snapshot create/restore tasks -│ │ -│ ├── shared/ # Shared utilities -│ │ ├── __init__.py -│ │ └── redis_channels.py # Redis channel constants -│ │ -│ ├── repositories/ # Data access layer -│ │ ├── base.py # Base repository class -│ │ ├── pv_repository.py # PV database operations -│ │ ├── snapshot_repository.py# Snapshot storage -│ │ ├── tag_repository.py # Tag queries -│ │ └── job_repository.py # Job tracking -│ │ -│ ├── models/ # SQLAlchemy ORM models -│ │ ├── base.py # Base model with UUID/timestamps -│ │ ├── pv.py # PV model -│ │ ├── snapshot.py # Snapshot models -│ │ ├── tag.py # Tag models -│ │ └── job.py # Job model -│ │ -│ ├── schemas/ # Pydantic DTOs -│ │ ├── common.py # Common response wrappers -│ │ ├── pv.py # PV DTOs -│ │ ├── snapshot.py # Snapshot DTOs -│ │ ├── tag.py # Tag DTOs -│ │ └── job.py # Job DTOs -│ │ -│ └── db/ # Database configuration -│ └── session.py # Async engine and session factory -│ -├── alembic/ # Database migrations -│ ├── alembic.ini -│ └── versions/ # Migration files -│ -├── docker/ # Docker configuration -│ ├── docker-compose.yml # Full stack deployment -│ └── Dockerfile.dev # Development image -│ -├── scripts/ # Utility scripts -│ ├── upload_csv.py # CSV data loader -│ ├── seed_pvs.py 
# Test data generator -│ └── benchmark.py # Performance testing -│ -└── tests/ # Test suite - ├── conftest.py # Pytest fixtures - ├── test_api/ # API integration tests - ├── test_services/ # Service unit tests - └── mocks/ # Mock EPICS service -``` - -## Core Components - -### 1. API Server (`app/main.py`) - -FastAPI application serving REST and WebSocket endpoints. **Decoupled from PV monitoring** for fast startup and fault isolation. - -**Startup Sequence:** -1. Connect to Redis -2. Start WebSocket DiffManager (subscribes to Redis pub/sub) -3. Initialize EPICS service (for direct reads during snapshot restore) - -**Key Features:** -- Sub-second startup time (no PV subscription blocking) -- Horizontally scalable behind load balancer -- Crash-isolated from EPICS/aioca issues - -### 2. PV Monitor (`app/monitor_main.py`) - -Dedicated process for EPICS PV monitoring. Runs as a **single instance** with leader election. - -**Responsibilities:** -- Subscribe to all PVs via aioca monitors -- Update Redis cache with PV values -- Publish updates to Redis pub/sub -- Run Watchdog for health checks - -**Leader Election:** -- Uses Redis lock (`squirrel:monitor:lock`) with TTL -- Prevents duplicate monitoring in multi-instance deployments -- Auto-recovers if leader dies - -### 3. Arq Worker (`app/worker.py`) - -Background task processor using Redis-backed Arq queue. - -**Features:** -- Job persistence across restarts -- Automatic retries with exponential backoff -- Scalable (can run multiple workers) -- 10-minute job timeout - -**Tasks:** -- `create_snapshot_task` - Create snapshots from cache or EPICS -- `restore_snapshot_task` - Restore snapshot values to EPICS - -### 4. 
Configuration (`app/config.py`) - -Pydantic-Settings with environment variable support (prefix: `SQUIRREL_`): - -| Category | Key Settings | -|----------|--------------| -| Database | `database_url`, `pool_size` (30), `max_overflow` (20) | -| EPICS | `ca_addr_list`, `ca_timeout` (10s), `chunk_size` (1000) | -| Redis | `redis_url`, `pv_cache_ttl` (60s) | -| PV Monitor | `batch_size` (500), `batch_delay_ms` (100) | -| Watchdog | `check_interval` (60s), `stale_threshold` (300s) | -| WebSocket | `batch_interval_ms` (100) | - -### 5. Database Models - -``` -┌──────────────────┐ ┌──────────────────┐ -│ PV │ │ TagGroup │ -├──────────────────┤ ├──────────────────┤ -│ setpoint_address │ │ name │ -│ readback_address │ │ description │ -│ config_address │ └────────┬─────────┘ -│ device │ │ -│ description │ │ 1:n -│ abs_tolerance │ ▼ -│ rel_tolerance │ ┌──────────────────┐ -└────────┬─────────┘ │ Tag │ - │ ├──────────────────┤ - │ n:m │ name │ - └───────────────│ tag_group_id │ - └──────────────────┘ - -┌──────────────────┐ ┌──────────────────┐ -│ Snapshot │ │ Job │ -├──────────────────┤ ├──────────────────┤ -│ title │ │ type (enum) │ -│ comment │ │ status (enum) │ -│ created_by │ │ progress (0-100) │ -└────────┬─────────┘ │ data (JSONB) │ - │ │ result_id │ - │ 1:n │ retry_count │ - ▼ └──────────────────┘ -┌──────────────────┐ -│ SnapshotValue │ -├──────────────────┤ -│ pv_name │ -│ setpoint_value │ -│ readback_value │ -│ status │ -│ severity │ -└──────────────────┘ -``` - -### 6. 
Services Layer - -| Service | Responsibility | -|---------|----------------| -| **EPICSService** | aioca wrapper for caget/caput with circuit breaker | -| **RedisService** | PV value cache, connection tracking, pub/sub, leader election | -| **PVMonitor** | Background subscription to all PVs, updates Redis | -| **SnapshotService** | Create/restore snapshots from cache or EPICS | -| **Watchdog** | Periodic health checks, reconnection attempts | -| **CircuitBreaker** | Fail-fast on unresponsive IOCs | -| **SubscriptionRegistry** | Multi-instance WebSocket subscription tracking | -| **BulkInsertService** | High-perf PostgreSQL COPY for bulk data | - -## Data Flow - -### Snapshot Creation (Async via Arq) - -``` -API Request (/v1/snapshots POST) - │ - ▼ -┌─────────────────────────────────┐ -│ JobService creates Job record │ -└─────────────────┬───────────────┘ - │ - ▼ (enqueue to Arq) -┌─────────────────────────────────┐ -│ Return Job ID immediately │ -└─────────────────┬───────────────┘ - │ - ▼ (Arq worker picks up) -┌─────────────────────────────────┐ -│ Read PV addresses from DB │ -└─────────────────┬───────────────┘ - │ - ▼ (use_cache?) 
- ┌─────┴─────┐ - │ │ - ▼ ▼ -┌───────┐ ┌───────────────┐ -│ Redis │ │ EPICS direct │ -│ <5s │ │ 30-60s │ -└───┬───┘ └───────┬───────┘ - │ │ - └───────┬───────┘ - │ - ▼ -┌─────────────────────────────────┐ -│ BulkInsertService (COPY) │ -│ Insert SnapshotValues to DB │ -└─────────────────┬───────────────┘ - │ - ▼ -┌─────────────────────────────────┐ -│ Mark Job as COMPLETED │ -└─────────────────────────────────┘ -``` - -### Real-Time PV Monitoring - -``` - PV Monitor Process Startup - │ - ▼ - ┌────────────────────────┐ - │ Acquire Leader Lock │ - │ (Redis SETNX) │ - └───────────┬────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ Load PV addresses │ - │ from PostgreSQL │ - └───────────┬────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ Batched PV init │ - │ (500/batch, 100ms) │ - └───────────┬────────────┘ - │ - ┌───────────────┼───────────────┐ - │ │ │ - ▼ ▼ ▼ -┌──────────┐ ┌──────────┐ ┌──────────┐ -│ aioca │ │ aioca │ │ aioca │ -│ monitor │ │ monitor │ │ monitor │ -│ (batch 1)│ │ (batch 2)│ │ (batch N)│ -└────┬─────┘ └────┬─────┘ └────┬─────┘ - │ │ │ - └──────────────┼──────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ Redis Cache │ - │ • Hash: pv:values │ - │ • Pub/Sub: updates │ - └───────────┬────────────┘ - │ - ▼ (Redis pub/sub) - ┌────────────────────────┐ - │ API Instances │ - │ DiffStreamManager │ - └───────────┬────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ Subscription Registry │ - │ (Redis-based) │ - └───────────┬────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ WebSocket Clients │ - │ (100ms batching) │ - └────────────────────────┘ -``` - -### Circuit Breaker Flow - -``` -EPICS Request (caget/caput) - │ - ▼ -┌─────────────────────────────────┐ -│ Circuit Breaker Check │ -│ (by IOC prefix) │ -└─────────────────┬───────────────┘ - │ - ┌────┴────┐ - │ │ - ▼ ▼ -┌───────┐ ┌───────────────────┐ -│ OPEN │ │ CLOSED/HALF-OPEN │ -│ │ │ │ -│ Fail │ │ Execute request │ -│ Fast │ │ │ -└───┬───┘ └─────────┬─────────┘ - │ │ - │ ┌────┴────┐ - │ 
│ │ - │ ▼ ▼ - │ ┌───────┐ ┌───────┐ - │ │Success│ │Failure│ - │ └───┬───┘ └───┬───┘ - │ │ │ - │ ▼ ▼ - │ Reset count Increment count - │ (HALF→CLOSED) (threshold→OPEN) - │ │ │ - └────────────┴───────────┘ - │ - ▼ - Response -``` - -## API Endpoints - -**Base URL:** `/v1/` - -**Standard Response Format:** -```json -{ - "errorCode": 0, - "errorMessage": null, - "payload": {...} -} -``` - -| Endpoint | Methods | Description | -|----------|---------|-------------| -| `/v1/pvs` | GET, POST | PV management with pagination | -| `/v1/pvs/paged` | GET | Paginated PV search | -| `/v1/snapshots` | GET, POST | Snapshot operations (async via Arq) | -| `/v1/snapshots/{id}/restore` | POST | Restore snapshot to EPICS | -| `/v1/tags` | GET, POST | Tag group management | -| `/v1/jobs/{id}` | GET | Job status polling | -| `/v1/health/*` | GET | Health monitoring endpoints | -| `/v1/health/monitor/status` | GET | Monitor process health | -| `/v1/health/circuits` | GET | Circuit breaker status | -| `/ws` | WebSocket | Real-time PV updates | - -## Design Patterns - -| Pattern | Usage | -|---------|-------| -| **Repository** | Abstracts database access in `repositories/` | -| **Service Layer** | Business logic separated from API handlers | -| **Dependency Injection** | FastAPI Depends() for resources | -| **Background Tasks** | Arq queue for long operations with Job tracking | -| **Singleton Services** | EPICS, Redis as module-level instances | -| **DTO Pattern** | Pydantic schemas separate from ORM models | -| **Cache-Aside** | Redis cache with Watchdog freshness checks | -| **Diff-Based Streaming** | WebSocket sends only changed PVs | -| **Circuit Breaker** | Fail-fast on unresponsive IOCs | -| **Leader Election** | Single PV monitor via Redis lock | -| **Continuation Token Pagination** | ID-based (not offset) for scalability | - -## Performance Optimizations - -1. 
**Process Isolation** - - API starts in <1s (no PV subscription blocking) - - Monitor crash doesn't affect API - - Workers can scale independently - -2. **Database** - - Connection pooling (30 + 20 overflow) - - PostgreSQL COPY for bulk inserts (10x faster) - - ID-based pagination (no OFFSET) - - Indexes on search fields - -3. **EPICS** - - Batched PV startup (500/batch, 100ms delay) prevents UDP flood - - Async operations via aioca (no blocking) - - Circuit breaker prevents cascading timeouts - - Connection pre-caching - -4. **Redis Caching** - - Instant snapshot reads (<5s for 40K PVs) - - PV Monitor maintains fresh cache - - Pub/Sub for efficient broadcasts - -5. **WebSocket** - - Diff-based streaming (only deltas) - - 100ms batching window - - Redis-based subscription registry for multi-instance - - Reduces bandwidth 10-100x - -6. **Task Queue** - - Jobs persist across restarts - - Automatic retries for transient failures - - Progress tracking in database - -## External Services - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ Squirrel Backend Services │ -│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ -│ │ API │ │ Monitor │ │ Worker │ │ Worker │ │ -│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ -└───────┼────────────┼────────────┼────────────┼──────────────────┘ - │ │ │ │ - └────────────┼────────────┼────────────┘ - │ │ - ┌────────────┼────────────┼────────────┐ - ▼ ▼ ▼ ▼ -┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ -│ PostgreSQL │ │ Redis │ │ EPICS │ -│ (asyncpg) │ │ (hiredis) │ │ (aioca) │ -├─────────────────┤ ├─────────────────┤ ├─────────────────┤ -│ • PV metadata │ │ • Value cache │ │ • Channel Access│ -│ • Snapshots │ │ • Pub/Sub │ │ • 40K+ PVs │ -│ • Tags │ │ • Job queue │ │ • Read/Write │ -│ • Jobs │ │ • Leader lock │ │ • Monitor │ -│ │ │ • Subscriptions │ │ │ -└─────────────────┘ └─────────────────┘ └─────────────────┘ -``` - -## Configuration - -Environment variables (prefix: `SQUIRREL_`): - 
-```bash -# Database -SQUIRREL_DATABASE_URL=postgresql+asyncpg://user:pass@host:5432/squirrel -SQUIRREL_DATABASE_POOL_SIZE=30 -SQUIRREL_DATABASE_MAX_OVERFLOW=20 - -# EPICS -SQUIRREL_EPICS_CA_ADDR_LIST="" -SQUIRREL_EPICS_CA_TIMEOUT=10.0 -SQUIRREL_EPICS_CHUNK_SIZE=1000 - -# Redis -SQUIRREL_REDIS_URL=redis://localhost:6379/0 -SQUIRREL_REDIS_PV_CACHE_TTL=60 - -# PV Monitor -SQUIRREL_PV_MONITOR_BATCH_SIZE=500 -SQUIRREL_PV_MONITOR_BATCH_DELAY_MS=100 - -# Watchdog -SQUIRREL_WATCHDOG_ENABLED=true -SQUIRREL_WATCHDOG_CHECK_INTERVAL=60.0 -SQUIRREL_WATCHDOG_STALE_THRESHOLD=300.0 - -# WebSocket -SQUIRREL_WEBSOCKET_BATCH_INTERVAL_MS=100 - -# Legacy Mode (embedded monitor in API process) -SQUIRREL_EMBEDDED_MONITOR=false -``` - -## Deployment - -### Docker Compose (Recommended) - -Full distributed deployment with all services: - -```bash -cd docker -docker-compose up --build -``` - -This starts: -- **PostgreSQL** (port 5432) -- **Redis** (port 6379) -- **API** (port 8000) - REST/WebSocket server -- **Monitor** (1 replica) - PV monitoring -- **Worker** (2 replicas) - Background task processing - -### Legacy Mode - -For simpler deployments or backward compatibility: - -```bash -cd docker -docker-compose --profile legacy up backend db redis -``` - -### Local Development - -```bash -# 1. Start infrastructure -cd docker -docker-compose up -d db redis - -# 2. Set up Python environment -cd .. -python -m venv venv -source venv/bin/activate -pip install -e ".[dev]" -cp .env.example .env - -# 3. Run migrations -alembic upgrade head - -# 4. 
Start services (in separate terminals) -uvicorn app.main:app --reload --port 8000 # API -python -m app.monitor_main # Monitor -arq app.worker.WorkerSettings # Worker -``` - -## API Documentation - -- Swagger UI: `http://localhost:8000/docs` -- OpenAPI Spec: `http://localhost:8000/openapi.json` - -## Health Monitoring - -| Endpoint | Description | -|----------|-------------| -| `/v1/health` | Overall API health | -| `/v1/health/db` | Database connectivity | -| `/v1/health/redis` | Redis connectivity | -| `/v1/health/monitor/status` | PV monitor process health (via heartbeat) | -| `/v1/health/circuits` | Circuit breaker status by IOC prefix | +- **Online:** https://slaclab.github.io/react-squirrel-backend/architecture/ +- **In-repo:** + - [docs/architecture/index.md](docs/architecture/index.md) — system overview, components, design patterns + - [docs/architecture/distributed-system.md](docs/architecture/distributed-system.md) — process model, circuit breaker, deployment + - [docs/architecture/data-flow.md](docs/architecture/data-flow.md) — snapshot creation, monitoring, restore, job tracking diff --git a/QUICKSTART.md b/QUICKSTART.md index 7826a82..10139ed 100644 --- a/QUICKSTART.md +++ b/QUICKSTART.md @@ -1,172 +1,6 @@ -# Squirrel Backend - Quick Start Guide +# Quick Start -Get the backend running in 2 minutes! +The quick start guide has moved to the documentation site: -## Prerequisites - -- Docker and Docker Compose installed -- (Optional) Python 3.11+ for local development - -## Option 1: Docker Compose (Fastest) - -### 1. Configure EPICS Networking - -If you need to connect to EPICS servers: - -```bash -# Copy the example environment file -cp docker/.env.example docker/.env - -# The example .env file has EPICS_CA_AUTO_ADDR_LIST set to YES by default. -# This should be sufficient to find any test PVs you are running locally without any modifications. 
-# -# If you need to point to specific IP addresses to locate PVs, uncomment EPICS_CA_ADDR_LIST or -# EPICS_PVA_ADDR_LIST as needed and specify the IP address needed. -# -# If pointing to a hostname, ensure that a DNS entry is created for it by specifying either EPICS_HOST_PROD -# or EPICS_HOST_DMZ -``` - -### 2. Start everything - -```bash -cd docker -docker compose up --build -``` - -**That's it!** The backend is now running: -- API: http://localhost:8080 -- Swagger docs: http://localhost:8080/docs -- Health: http://localhost:8080/v1/health/summary - -### What's Running? - -- `squirrel-api` - REST/WebSocket API server (port 8080) -- `squirrel-db` - PostgreSQL database (port 5432) -- `squirrel-redis` - Redis cache/queue (port 6379) -- `squirrel-monitor` - EPICS PV monitoring service -- `squirrel-worker-1` & `squirrel-worker-2` - Background job processors - -### Load Test Data - -```bash -# In a new terminal -docker compose exec api python -m scripts.seed_pvs --count 100 -``` - -This creates 100 test PVs with tags. Now you can test snapshots! - -### View Logs - -```bash -# All services -docker compose logs -f - -# Specific service -docker compose logs -f api -docker compose logs -f monitor -docker compose logs -f worker -``` - -### Stop Everything - -```bash -docker compose down - -# Or to also delete the database -docker compose down -v -``` - -## Option 2: Local Development - -Better for active development with hot-reload: - -```bash -# 1. Start infrastructure only -cd docker -docker compose up -d db redis - -# 2. Run setup script -cd .. -./setup.sh - -# 3. Run migrations -alembic upgrade head - -# 4. Load test data -python -m scripts.seed_pvs --count 100 - -# 5. 
Start services (each in a separate terminal) -uvicorn app.main:app --reload --port 8000 # Terminal 1: API -python -m app.monitor_main # Terminal 2: Monitor -arq app.worker.WorkerSettings # Terminal 3: Worker -``` - -Access at: http://localhost:8000 - -## Common Commands - -```bash -# Check service status -docker compose ps - -# Restart a service -docker compose restart api - -# Scale workers -docker compose up -d --scale worker=4 - -# Access database -docker exec -it squirrel-db psql -U squirrel - -# Access Redis CLI -docker exec -it squirrel-redis redis-cli - -# Run migrations in Docker -docker exec -it squirrel-api alembic upgrade head -``` - -## Troubleshooting - -### Snapshots are empty -This is normal! Test PVs don't exist on a real EPICS network. To test with real data: -1. Upload real PV addresses via CSV: `python -m scripts.upload_csv your_pvs.csv` -2. Make sure your EPICS network is accessible from Docker - -### Worker not running -Snapshots will hang if the worker isn't running: -```bash -docker compose ps worker # Check status -docker compose up -d worker # Start if stopped -docker compose logs -f worker # View logs -``` - -### Port 8080 already in use -Edit `docker/docker-compose.yml` and change the port: -```yaml -api: - ports: - - "8081:8000" # Change 8080 to 8081 -``` - -### Monitor not connecting -Check Redis connection: -```bash -docker compose logs monitor -docker exec -it squirrel-redis redis-cli PING -``` - -## Next Steps - -- Read the full [README.md](README.md) for detailed documentation -- Check out [ARCHITECTURE.md](ARCHITECTURE.md) for system design -- See [API documentation](http://localhost:8080/docs) after starting the backend - -## Need Help? - -- Check logs: `docker compose logs -f [service]` -- Verify all services are running: `docker compose ps` -- Restart everything: `docker compose restart` -- Reset database: `docker compose down -v && docker compose up -d` - -Happy snapshoting! 
🐿️ +- **Online:** https://slaclab.github.io/react-squirrel-backend/getting-started/ +- **In-repo:** [docs/getting-started/index.md](docs/getting-started/index.md) diff --git a/README.md b/README.md index 6ab2610..078666d 100644 --- a/README.md +++ b/README.md @@ -2,17 +2,15 @@ High-performance Python FastAPI backend for EPICS control system snapshot/restore operations, designed to handle 40-50K PVs efficiently. +**Full documentation:** https://slaclab.github.io/react-squirrel-backend/ + ## Features -- **Distributed Architecture**: Separate processes for API, PV monitoring, and background tasks -- **Fast Snapshot Creation**: Parallel EPICS reads or instant Redis cache reads (<5s for 40K PVs) -- **Efficient Restore Operations**: Parallel EPICS writes for quick machine state restoration -- **Real-Time Updates**: WebSocket streaming with diff-based updates and multi-instance support -- **Tag-based Organization**: Group and categorize PVs using hierarchical tags -- **Snapshot Comparison**: Compare two snapshots with tolerance-based diff -- **Persistent Job Queue**: Background tasks survive restarts with automatic retries -- **Circuit Breaker**: Fail-fast protection against unresponsive IOCs -- **PostgreSQL Storage**: Reliable relational database with async support +- **Distributed Architecture** — separate processes for API, PV monitoring, and background tasks +- **Fast Snapshot Creation** — instant Redis cache reads (<5s for 40K PVs), or direct EPICS reads for guaranteed-fresh values +- **Efficient Restore** — parallel EPICS writes for quick machine state restoration +- **Real-Time Updates** — WebSocket streaming with diff-based updates and multi-instance support +- **Tag-based Organization**, **Snapshot Comparison**, **Persistent Job Queue**, **Circuit Breaker**, **API Key Authentication** ## Technology Stack @@ -24,575 +22,43 @@ High-performance Python FastAPI backend for EPICS control system snapshot/restor | ORM | SQLAlchemy 2.0 (async) | | Cache/Queue | Redis 7+ | 
| Task Queue | Arq | -| EPICS | aioca (async Channel Access) | +| EPICS | aioca (Channel Access), p4p (PVAccess) | | Migrations | Alembic | | Validation | Pydantic v2 | ---- - -## Quick Start - -**New here?** See [QUICKSTART.md](QUICKSTART.md) for a 2-minute setup guide! - -### Option 1: Docker Compose (Recommended) - -The easiest way to get started with the full distributed architecture: - -```bash -# Clone the repository -git clone -cd react-squirrel-backend - -# Start the full stack -cd docker -cp .env.example .env -# Note: If needing to make EPICS connections outside of your machine's localhost, edit -# the .env file to add the IP addresses or host names to EPICS_CA_ADDR_LIST/EPICS_PVA_ADDR_LIST -# as necessary. For example: -# EPICS_CA_ADDR_LIST=lcls-prod01:5068 lcls-prod01:5063 -docker-compose up -d --build - -# Configure the database -docker exec squirrel-api alembic upgrade head -``` - -This starts: -- **PostgreSQL** on port `5432` -- **Redis** on port `6379` -- **API Server** on port `8080` (REST/WebSocket) -- **PV Monitor** (1 replica) - EPICS monitoring process -- **Workers** (2 replicas) - Background task processors - -The Docker Compose project is named **`squirrel`**, so containers are: -- `squirrel-api`, `squirrel-db`, `squirrel-redis`, `squirrel-monitor`, `squirrel-worker-1`, `squirrel-worker-2` - -The API will be available at: -- **API**: http://localhost:8080 -- **Swagger Docs**: http://localhost:8080/docs -- **Health Check**: http://localhost:8080/v1/health/summary - -To stop the services: -```bash -docker compose down -``` - -To reset the database (delete all data): -```bash -docker compose down -v -``` - -### Option 2: Legacy Mode (Single Process) - -For simpler deployments with embedded PV monitoring: - -```bash -cd docker -cp .env.example .env -docker compose --profile legacy up backend db redis -``` - -This runs the API with embedded PV monitor on port `8001`. - -**Note**: Workers are still required for snapshot creation. 
Start them separately: -```bash -docker compose up -d worker -``` - -### Option 3: Local Development - -Run infrastructure in Docker, services locally for faster development: - -```bash -# 1. Start PostgreSQL and Redis -cd docker -docker compose up -d db redis - -# 2. Set up Python environment (or run ./setup.sh) -cd .. -python -m venv venv -source venv/bin/activate # On Windows: venv\Scripts\activate -pip install -e ".[dev]" - -# 3. Configure environment -cp .env.example .env -# Edit .env if needed (defaults work with docker compose) - -# 4. Run database migrations -alembic upgrade head - -# 5. (Optional) Load test data -python -m scripts.seed_pvs --count 100 - -# 6. Start services (in separate terminals) -uvicorn app.main:app --reload --port 8000 # API Server -python -m app.monitor_main # PV Monitor -arq app.worker.WorkerSettings # Worker (REQUIRED for snapshots) -``` - -**Important**: All three services must be running for full functionality: -- **API**: Handles HTTP/WebSocket requests -- **Monitor**: Maintains Redis cache of live PV values -- **Worker**: Processes background jobs (snapshot creation/restore) - ---- - -## Architecture Overview - -``` -┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ -│ API Server │ │ PV Monitor │ │ Arq Worker │ -│ (squirrel-api) │ │(squirrel-monitor)│ │(squirrel-worker)│ -│ REST/WebSocket │ │ EPICS → Redis │ │ Snapshot jobs │ -└────────┬────────┘ └────────┬────────┘ └────────┬────────┘ - │ │ │ - └───────────────────────┼───────────────────────┘ - │ - ┌────────────┴────────────┐ - ▼ ▼ - ┌─────────────┐ ┌─────────────┐ - │ Redis │ │ PostgreSQL │ - │ Cache/Queue │ │ Storage │ - └─────────────┘ └─────────────┘ - │ - ▼ - ┌─────────────┐ - │ EPICS IOCs │ - │ 40-50K PVs │ - └─────────────┘ -``` - -For detailed architecture documentation, see [ARCHITECTURE.md](ARCHITECTURE.md). 
- ---- - -## Loading Data - -### Upload PVs from CSV - -The expected format: -```csv -Setpoint,Readback,Region,Area,Subsystem -FBCK:LNG6:1:BC2ELTOL,,"Feedback-All","LIMITS","FBCK" -QUAD:LI21:201:BDES,QUAD:LI21:201:BACT,"Cu Linac","LI21","Magnet" -... -``` - -#### Using the UI -* navigate to the "Browser PVs" page -* click the "Import PVs" button -* select the consolidated CSV - -#### Using a bash script -In addition to importing PVs, upload_csv.py also creates tag groups for the tags found in the CSV. However, it must be run from within the docker service. - -```bash -# Copying script and data into docker service -docker cp /path/to/local/upload_csv.py /path/to/local/consolidated.py squirrel-api:/tmp/ - -# Dry run (see what would be uploaded) -docker exec squirrel-api python /tmp/upload_csv.py /tmp/consolidated.csv --dry-run - -# Full upload (~36K PVs) -docker exec squirrel-api python /tmp/upload_csv.py /tmp/consolidated.csv - -# With custom batch size -docker exec squirrel-api python /tmp/upload_csv.py /tmp/consolidated.csv --batch-size 1000 -``` - -### Seed Test Data - -For development/testing with sample data: - -```bash -# Create 1000 test PVs with tags -python -m scripts.seed_pvs --count 1000 - -# Create 50K PVs for performance testing -python -m scripts.seed_pvs --count 50000 --batch-size 5000 - -# Clear existing data first -python -m scripts.seed_pvs --count 1000 --clear -``` - ---- - -## Development - -### Project Structure - -``` -squirrel-backend/ -├── app/ -│ ├── main.py # API entry point -│ ├── monitor_main.py # PV Monitor entry point -│ ├── worker.py # Arq worker configuration -│ ├── config.py # Configuration settings -│ ├── api/v1/ # API endpoints -│ ├── models/ # SQLAlchemy models -│ ├── schemas/ # Pydantic schemas (DTOs) -│ ├── services/ # Business logic layer -│ ├── repositories/ # Data access layer -│ ├── tasks/ # Arq task definitions -│ └── db/ # Database session management -├── alembic/ # Database migrations -├── tests/ # Test suite -├── docker/ 
# Docker configuration -└── scripts/ # Utility scripts -``` - -### Running Tests - -```bash -# Run all tests -pytest - -# Run with verbose output -pytest -v +## Quick Start (Docker) -# Run specific test file -pytest tests/test_api/test_pvs.py - -# Run with coverage report -pytest --cov=app --cov-report=html -``` - -**Note**: Tests use a separate test database (`squirrel_test`). Create it first: ```bash -createdb squirrel_test -# Or via Docker: -docker exec -it squirrel-db createdb -U squirrel squirrel_test +git clone https://github.com/slaclab/react-squirrel-backend.git +cd react-squirrel-backend/docker +cp .env.example .env # edit for your EPICS network if needed +docker compose up -d --build # migrations run automatically +docker exec squirrel-api python -m scripts.create_key my-app --read --write ``` -### Database Migrations - -```bash -# Apply all migrations -alembic upgrade head - -# Create new migration after model changes -alembic revision --autogenerate -m "description of changes" - -# Rollback one migration -alembic downgrade -1 - -# Show current migration status -alembic current -``` - -### Code Quality - -```bash -# Format code -ruff format . - -# Lint code -ruff check . - -# Fix auto-fixable lint issues -ruff check . 
--fix - -# Type checking -mypy app/ -``` - ---- - -## API Endpoints - -### PV Endpoints (`/v1/pvs`) - -| Method | Endpoint | Description | -|--------|----------|-------------| -| `GET` | `/v1/pvs` | Search PVs (simple) | -| `GET` | `/v1/pvs/paged` | Search PVs with pagination | -| `POST` | `/v1/pvs` | Create single PV | -| `POST` | `/v1/pvs/multi` | Bulk create PVs | -| `PUT` | `/v1/pvs/{id}` | Update PV | -| `DELETE` | `/v1/pvs/{id}` | Delete PV | - -### Snapshot Endpoints (`/v1/snapshots`) - -| Method | Endpoint | Description | -|--------|----------|-------------| -| `GET` | `/v1/snapshots` | List snapshots | -| `POST` | `/v1/snapshots` | Create snapshot (async, returns job ID) | -| `GET` | `/v1/snapshots/{id}` | Get snapshot with all values | -| `DELETE` | `/v1/snapshots/{id}` | Delete snapshot | -| `POST` | `/v1/snapshots/{id}/restore` | Restore values to EPICS | -| `GET` | `/v1/snapshots/{id}/compare/{id2}` | Compare two snapshots | - -### Tag Endpoints (`/v1/tags`) - -| Method | Endpoint | Description | -|--------|----------|-------------| -| `GET` | `/v1/tags` | List tag groups | -| `POST` | `/v1/tags` | Create tag group | -| `GET` | `/v1/tags/{id}` | Get tag group with tags | -| `PUT` | `/v1/tags/{id}` | Update tag group | -| `DELETE` | `/v1/tags/{id}` | Delete tag group | - -### Job Endpoints (`/v1/jobs`) +The API is now available at http://localhost:8080 (Swagger docs at `/docs`). -| Method | Endpoint | Description | -|--------|----------|-------------| -| `GET` | `/v1/jobs/{id}` | Get job status and progress | +!!! note + The token printed by `create_key` is only shown once — save it. All endpoints require an `X-API-Key` header. -### Health Endpoints (`/v1/health`) +For local development, detailed setup, configuration, API reference, and architecture, see the [documentation site](https://slaclab.github.io/react-squirrel-backend/). 
-| Method | Endpoint | Description | -|--------|----------|-------------| -| `GET` | `/v1/health` | Overall health | -| `GET` | `/v1/health/db` | Database connectivity | -| `GET` | `/v1/health/redis` | Redis connectivity | -| `GET` | `/v1/health/monitor/status` | PV monitor process health | -| `GET` | `/v1/health/circuits` | Circuit breaker status | +## Documentation -### WebSocket (`/ws`) - -Real-time PV value streaming with diff-based updates: - -```javascript -const ws = new WebSocket('ws://localhost:8000/ws'); - -// Subscribe to PVs -ws.send(JSON.stringify({ - action: 'subscribe', - pv_names: ['PV:NAME:1', 'PV:NAME:2'] -})); - -// Receive updates -ws.onmessage = (event) => { - const data = JSON.parse(event.data); - // { pv_name: 'PV:NAME:1', value: 42.0, timestamp: '...' } -}; -``` - ---- - -## Configuration - -All configuration is via environment variables (with `SQUIRREL_` prefix): - -| Variable | Default | Description | -|----------|---------|-------------| -| `SQUIRREL_DATABASE_URL` | `postgresql+asyncpg://...` | Database connection | -| `SQUIRREL_DATABASE_POOL_SIZE` | `30` | Connection pool size | -| `SQUIRREL_REDIS_URL` | `redis://localhost:6379/0` | Redis connection | -| `SQUIRREL_EPICS_CA_TIMEOUT` | `10.0` | Operation timeout (seconds) | -| `SQUIRREL_EPICS_CHUNK_SIZE` | `1000` | PVs per batch in parallel ops | -| `SQUIRREL_PV_MONITOR_BATCH_SIZE` | `500` | PVs per subscription batch | -| `SQUIRREL_WATCHDOG_ENABLED` | `true` | Enable health monitoring | -| `SQUIRREL_EMBEDDED_MONITOR` | `false` | Run monitor in API process | -| `SQUIRREL_DEBUG` | `false` | Enable debug logging | - -See `.env.example` for a complete template. 
- -### Docker-Specific Configuration - -If needing to create EPICS connections to specific host names, configure EPICS server DNS mappings: - -```bash -# Copy the example file -cp docker/.env.example docker/.env - -# Edit with your EPICS server hostnames and IPs -# Get IPs with: host -``` - -Example `docker/.env`: -```bash -COMPOSE_PROJECT_NAME=squirrel - -EPICS_HOST_PROD=your-epics-server:xxx.xxx.xxx.xxx -EPICS_HOST_DMZ=your-dmz-server:xxx.xxx.xxx.xxx -``` - -**Note**: `docker/.env` is gitignored and should contain your site-specific configuration. - ---- - -## Docker Commands Reference - -```bash -# Start all services -cd docker -docker compose up - -# Start in background (detached) -docker compose up -d - -# Rebuild images after code changes -docker compose up --build - -# View logs -docker compose logs -f api -docker compose logs -f monitor -docker compose logs -f worker - -# Stop services -docker compose down - -# Stop and remove volumes (reset database) -docker compose down -v - -# Scale workers (for high load) -docker compose up -d --scale worker=4 - -# Execute command in running container -docker exec -it squirrel-api bash -docker exec -it squirrel-db psql -U squirrel - -# Run migrations in Docker -docker exec -it squirrel-api alembic upgrade head - -# Load test data in Docker -docker compose exec api python -m scripts.seed_pvs --count 100 -``` - ---- - -## Troubleshooting - -### Database connection refused -```bash -# Check if PostgreSQL is running -docker compose ps db - -# Check database health -docker compose logs db - -# Test connection -docker exec -it squirrel-db pg_isready -U squirrel -# Or for local: pg_isready -h localhost -p 5432 -``` - -### Migrations fail -```bash -# Ensure database exists -docker exec -it squirrel-db createdb -U squirrel squirrel - -# Check migration status -alembic current -``` - -### EPICS connection issues -```bash -# Verify EPICS environment -echo $EPICS_CA_ADDR_LIST - -# Test PV connectivity -caget -``` - -### PV Monitor 
not updating -```bash -# Check monitor health via API -curl http://localhost:8000/v1/health/monitor/status - -# Check Redis for heartbeat -docker exec -it squirrel-redis redis-cli GET squirrel:monitor:heartbeat -``` - -### Snapshots hanging or have no data -```bash -# Check if worker is running -docker compose ps worker - -# If not running, start it -docker compose up -d worker - -# Check worker logs -docker compose logs -f worker - -# Verify worker is processing jobs -docker exec -it squirrel-redis redis-cli LLEN arq:queue -``` - -**Note**: Snapshots will be empty if: -- Test PVs don't exist on EPICS network (expected for development) -- Monitor can't connect to PVs (check EPICS_CA_ADDR_LIST) -- Redis cache is empty and direct EPICS reads fail - -### Port already in use -```bash -# Find process using port 8080 (Docker) or 8000 (local) -lsof -i :8080 - -# Change Docker port in docker-compose.yml: -# ports: -# - "8081:8000" # Change 8080 to 8081 - -# Or use different port locally -uvicorn app.main:app --reload --port 8001 -``` - ---- - -## API Key Management - -These scripts manage API keys for authenticating requests to the backend. Run them from the project root with `python -m scripts.`. - -### Create a key - -```bash -python -m scripts.create_key [--read] [--write] -``` - -At least one of `--read` / `--write` is required. - -```bash -# Read-only key for the frontend app -python -m scripts.create_key my-app --read - -# Read/write key -python -m scripts.create_key my-app --read --write -``` - -Output includes the app name, key ID, access level, creation timestamp, and the token (only shown at creation time). - -### List keys - -```bash -python -m scripts.list_keys [--active-only] -``` - -Prints a table of all stored API keys. Use `--active-only` (`-a`) to filter out deactivated keys. - -### Deactivate a key - -```bash -python -m scripts.deactivate_key -``` - -Deactivates the key with the given ID. 
The key is retained in the database but can no longer be used for authentication. - ---- - -## Performance Benchmarking - -```bash -# Start the backend first, then run: -python -m scripts.benchmark - -# With more iterations -python -m scripts.benchmark --iterations 10 - -# Skip restore benchmark (no EPICS writes) -python -m scripts.benchmark --skip-restore -``` - ---- +| Topic | Link | +|---|---| +| Getting Started | [docs/getting-started/](docs/getting-started/index.md) | +| Installation options | [docs/getting-started/installation.md](docs/getting-started/installation.md) | +| Configuration | [docs/getting-started/configuration.md](docs/getting-started/configuration.md) | +| API Keys | [docs/getting-started/api-keys.md](docs/getting-started/api-keys.md) | +| Architecture | [docs/architecture/](docs/architecture/index.md) | +| REST / WebSocket API | [docs/api-reference/](docs/api-reference/index.md) | +| Development | [docs/development/](docs/development/index.md) | ## Frontend -The Squirrel React frontend is available at: -- Repository: `squirrel` (separate repo) -- Default API URL: `http://localhost:8000` - -Configure the frontend to point to this backend by setting the API base URL. - ---- +The React frontend lives at [slaclab/react-squirrel](https://github.com/slaclab/react-squirrel). ## License -MIT License +MIT — see [LICENSE.md](LICENSE.md). 
diff --git a/app/main.py b/app/main.py index e92ec15..5d8672b 100644 --- a/app/main.py +++ b/app/main.py @@ -12,7 +12,6 @@ - Independent scaling and deployment """ -import os import logging from contextlib import asynccontextmanager @@ -24,10 +23,8 @@ from app.api.responses import APIException from app.api.v1.router import router as v1_router from app.api.v1.websocket import get_diff_manager -from app.services.pv_protocol import is_unprefixed, parse_pv_name from app.services.epics_service import get_epics_service from app.services.redis_service import get_redis_service -from app.services.pvaccess_monitor import get_pvaccess_monitor # Configure logging logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s") @@ -35,9 +32,6 @@ settings = get_settings() -# Environment variable to optionally enable embedded monitor (for backward compatibility) -EMBEDDED_MONITOR = os.environ.get("SQUIRREL_EMBEDDED_MONITOR", "false").lower() == "true" - @asynccontextmanager async def lifespan(app: FastAPI): @@ -78,15 +72,7 @@ async def lifespan(app: FastAPI): if monitor_alive: logger.info("PV Monitor process detected (via heartbeat)") else: - logger.warning( - "PV Monitor process not detected - start squirrel-monitor separately, " - "or set SQUIRREL_EMBEDDED_MONITOR=true to run embedded" - ) - - # Optionally start embedded monitor (for backward compatibility/development) - if EMBEDDED_MONITOR: - logger.info("Starting EMBEDDED PV Monitor (SQUIRREL_EMBEDDED_MONITOR=true)") - await _start_embedded_monitor(redis_service, epics) + logger.warning("PV Monitor process not detected - start squirrel-monitor separately") # Start WebSocket diff stream manager (subscribes to Redis pub/sub) diff_manager = get_diff_manager() @@ -104,10 +90,6 @@ async def lifespan(app: FastAPI): # Cleanup logger.info("Shutting down Squirrel API...") - # Stop embedded monitor if running - if EMBEDDED_MONITOR: - await _stop_embedded_monitor() - # Stop WebSocket manager try: 
diff_manager = get_diff_manager() @@ -132,100 +114,6 @@ async def lifespan(app: FastAPI): logger.info("Squirrel API shutdown complete") -async def _start_embedded_monitor(redis_service, epics): - """ - Start PV Monitor and Watchdog in embedded mode (backward compatibility). - - This is enabled by setting SQUIRREL_EMBEDDED_MONITOR=true. - """ - from app.db.session import async_session_maker - from app.services.watchdog import get_watchdog - from app.services.pv_monitor import get_pv_monitor - from app.repositories.pv_repository import PVRepository - - pv_monitor = get_pv_monitor(redis_service) - pva_monitor = None - - # Get all PV addresses from database - async with async_session_maker() as session: - pv_repo = PVRepository(session) - pv_addresses_data = await pv_repo.get_all_addresses() - - # Extract unique addresses (setpoint and readback) - pv_addresses = set() - for _, setpoint, readback, config in pv_addresses_data: - if setpoint: - pv_addresses.add(setpoint) - if readback: - pv_addresses.add(readback) - - # Start PV monitoring (with batched startup) - ca_pvs: list[str] = [] - pva_pvs: list[str] = [] - for pv_name in pv_addresses: - protocol, _ = parse_pv_name(pv_name) - if protocol == "pva": - pva_pvs.append(pv_name) - else: - ca_pvs.append(pv_name) - if settings.epics_unprefixed_pva_fallback and is_unprefixed(pv_name): - pva_pvs.append(pv_name) - - if pv_addresses: - logger.info(f"[EMBEDDED] Starting PV Monitor for {len(ca_pvs)} CA and {len(pva_pvs)} PVA addresses") - await pv_monitor.start(ca_pvs) - logger.info(f"[EMBEDDED] PV Monitor started for {len(ca_pvs)} CA addresses") - - if pva_pvs: - pva_monitor = get_pvaccess_monitor(redis_service) - await pva_monitor.start(pva_pvs) - logger.info(f"[EMBEDDED] PVAccess Monitor started for {len(pva_pvs)} PVA addresses") - else: - logger.info("[EMBEDDED] No PVA addresses found; PVAccess Monitor not started") - else: - logger.warning("[EMBEDDED] No PV addresses found in database") - - # Start Watchdog if enabled - if 
settings.watchdog_enabled: - watchdog = get_watchdog(redis_service, epics, pv_monitor, pva_monitor if pva_pvs else None) - await watchdog.start() - logger.info("[EMBEDDED] Watchdog started") - - -async def _stop_embedded_monitor(): - """Stop embedded PV Monitor and Watchdog.""" - from app.services.watchdog import get_watchdog - from app.services.pv_monitor import get_pv_monitor - - # Stop Watchdog - if settings.watchdog_enabled: - try: - watchdog = get_watchdog() - if watchdog.is_running(): - await watchdog.stop() - logger.info("[EMBEDDED] Watchdog stopped") - except Exception as e: - logger.error(f"Error stopping Watchdog: {e}") - - # Stop PV Monitor - try: - pv_monitor = get_pv_monitor() - if pv_monitor.is_running(): - await pv_monitor.stop() - logger.info("[EMBEDDED] PV Monitor stopped") - except Exception as e: - logger.error(f"Error stopping PV Monitor: {e}") - - # Stop PVAccess Monitor - try: - pva_monitor = get_pvaccess_monitor() - if pva_monitor.is_running(): - await pva_monitor.stop() - logger.info("[EMBEDDED] PVAccess Monitor stopped") - except Exception as e: - logger.error(f"Error stopping PVAccess Monitor: {e}") - - app = FastAPI( title="Squirrel Backend", description="High-performance EPICS snapshot/restore backend with 40k PV support", diff --git a/docker/docker-compose.yml b/docker/docker-compose.yml index 5f8dec6..b011fc7 100644 --- a/docker/docker-compose.yml +++ b/docker/docker-compose.yml @@ -139,41 +139,5 @@ services: deploy: replicas: 2 # Can scale workers independently - # Legacy backend alias (for backward compatibility) - backend: - build: - context: .. 
- dockerfile: docker/Dockerfile.dev - container_name: squirrel-backend - command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload - environment: - SQUIRREL_DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB} - SQUIRREL_REDIS_URL: redis://:${REDIS_PASSWORD}@redis:6379/0 - SQUIRREL_DEBUG: "${SQUIRREL_DEBUG}" - # Enable embedded monitor for backward compatibility - SQUIRREL_EMBEDDED_MONITOR: "true" - - EPICS_CA_AUTO_ADDR_LIST: "${EPICS_CA_AUTO_ADDR_LIST}" - EPICS_CA_ADDR_LIST: "${EPICS_CA_ADDR_LIST}" - EPICS_CA_REPEATER_PORT: "${EPICS_CA_REPEATER_PORT}" - EPICS_CA_SERVER_PORT: "${EPICS_CA_SERVER_PORT}" - - EPICS_PVA_AUTO_ADDR_LIST: "${EPICS_PVA_AUTO_ADDR_LIST}" - EPICS_PVA_ADDR_LIST: "${EPICS_PVA_ADDR_LIST}" - EPICS_PVA_SERVER_PORT: "${EPICS_PVA_SERVER_PORT}" - EPICS_PVA_BROADCAST_PORT: "${EPICS_PVA_BROADCAST_PORT}" - ports: - - "8001:8000" - volumes: - - ../app:/app/app - - ../tests:/app/tests - depends_on: - db: - condition: service_healthy - redis: - condition: service_healthy - profiles: - - legacy # Only start with --profile legacy - volumes: squirrel_db_data: diff --git a/docs/api-reference/endpoints.md b/docs/api-reference/endpoints.md index d708629..09b36dd 100644 --- a/docs/api-reference/endpoints.md +++ b/docs/api-reference/endpoints.md @@ -112,156 +112,161 @@ Requires `read_access`. ## PV Endpoints +All request/response fields use **camelCase**. `id` in paths is a UUID. + ### Search PVs ``` GET /v1/pvs ``` -Search for PVs with optional filters. +Non-paginated search by name (backward-compatibility helper; returns up to 1000 rows). 
**Query Parameters:** | Parameter | Type | Description | |-----------|------|-------------| -| `query` | string | Search term (matches address or description) | -| `tag_ids` | array | Filter by tag IDs | -| `limit` | integer | Max results (default: 100) | +| `pvName` | string | Search term | -**Response:** +### Search PVs (Paginated) + +``` +GET /v1/pvs/paged +``` + +Search PVs with cursor-based pagination and tag filtering. + +**Query Parameters:** + +| Parameter | Type | Description | +|-----------|------|-------------| +| `pvName` | string | Search term (matches address or description) | +| `pageSize` | integer | Page size (1-1000, default 100) | +| `continuationToken` | string | Opaque cursor returned by the previous response | +| `tagFilters` | string | JSON object: `{groupId: [tagId1, tagId2], ...}` — returns PVs matching (any tag in group A) AND (any tag in group B) | + +**Response payload:** ```json { - "errorCode": 0, - "errorMessage": null, - "payload": [ + "results": [ { "id": "550e8400-e29b-41d4-a716-446655440000", - "setpoint_address": "QUAD:LI21:201:BDES", - "readback_address": "QUAD:LI21:201:BACT", + "setpointAddress": "QUAD:LI21:201:BDES", + "readbackAddress": "QUAD:LI21:201:BACT", "description": "Quadrupole magnet", - "tags": [ - {"id": "...", "name": "Magnet"} - ] + "absTolerance": 0.01, + "relTolerance": 0.001, + "readOnly": false, + "tags": [{"id": "...", "name": "Magnet"}] } - ] + ], + "continuationToken": "eyJpZCI6IjU1MGU4...", + "hasMore": true } ``` -### Search PVs (Paginated) +### Filtered Search (with optional live values) ``` -GET /v1/pvs/paged +GET /v1/pvs/search ``` -Search PVs with cursor-based pagination. +Server-side filtered search that optionally returns live values from the Redis cache alongside metadata. 
**Query Parameters:** | Parameter | Type | Description | |-----------|------|-------------| -| `query` | string | Search term | -| `tag_ids` | array | Filter by tag IDs | -| `limit` | integer | Page size (default: 100) | -| `cursor` | string | Continuation token | +| `q` | string | Text search | +| `devices` | array | Filter by device name(s) | +| `tags` | array | Filter by tag IDs | +| `limit` | integer | Max results (≤1000, default 100) | +| `offset` | integer | Pagination offset | +| `include_live_values` | boolean | Include Redis cache values in `liveValues` | -**Response:** +### List Devices -```json -{ - "errorCode": 0, - "errorMessage": null, - "payload": { - "items": [...], - "next_cursor": "eyJpZCI6IjU1MGU4...", - "has_more": true - } -} +``` +GET /v1/pvs/devices ``` -### Create PV +Returns the distinct set of device names currently in use. + +### Live Values (GET / POST) ``` -POST /v1/pvs +GET /v1/pvs/live?pv_names=PV:1&pv_names=PV:2 +POST /v1/pvs/live ``` -Create a new PV. +Fetch cached live values from Redis. Use `POST` with body `{"pv_names": ["..."]}` when the list is too long for a query string. -**Request Body:** +### All Live Values -```json -{ - "setpoint_address": "TEST:PV:SETPOINT", - "readback_address": "TEST:PV:READBACK", - "config_address": null, - "device": "Test Device", - "description": "Test PV for documentation", - "abs_tolerance": 0.01, - "rel_tolerance": 0.001, - "tag_ids": ["550e8400-e29b-41d4-a716-446655440000"] -} +``` +GET /v1/pvs/live/all ``` -**Response:** +Every PV value currently in the cache (for initial table load). + +### Cache Status -```json -{ - "errorCode": 0, - "errorMessage": null, - "payload": { - "id": "660e8400-e29b-41d4-a716-446655440001", - "setpoint_address": "TEST:PV:SETPOINT", - "readback_address": "TEST:PV:READBACK", - ... - } -} +``` +GET /v1/pvs/cache/status ``` -### Bulk Create PVs +Returns cached PV count and Redis connectivity status. 
+ +### Create PV ``` -POST /v1/pvs/multi +POST /v1/pvs ``` -Create multiple PVs in a single request. +Create a new PV. At least one of `setpointAddress`, `readbackAddress`, or `configAddress` must be provided. **Request Body:** ```json { - "pvs": [ - { - "setpoint_address": "PV:1", - "description": "First PV" - }, - { - "setpoint_address": "PV:2", - "description": "Second PV" - } - ] + "setpointAddress": "TEST:PV:SETPOINT", + "readbackAddress": "TEST:PV:READBACK", + "configAddress": null, + "device": "Test Device", + "description": "Test PV for documentation", + "absTolerance": 0.01, + "relTolerance": 0.001, + "readOnly": false, + "tags": ["550e8400-e29b-41d4-a716-446655440000"] } ``` -### Update PV +### Bulk Create PVs ``` -PUT /v1/pvs/{id} +POST /v1/pvs/multi ``` -Update an existing PV. +Create multiple PVs in one request. Body is a JSON **array** of `NewPVElementDTO` objects (same shape as `POST /v1/pvs`). -**Path Parameters:** +### Update PV -| Parameter | Type | Description | -|-----------|------|-------------| -| `id` | UUID | PV ID | +``` +PUT /v1/pvs/{id} +``` + +Partially update a PV. Unspecified fields are left unchanged. **Request Body:** ```json { "description": "Updated description", - "abs_tolerance": 0.02 + "absTolerance": 0.02, + "relTolerance": null, + "readOnly": true, + "tags": ["tag-id-1", "tag-id-2"] } ``` @@ -271,14 +276,6 @@ Update an existing PV. DELETE /v1/pvs/{id} ``` -Delete a PV. - -**Path Parameters:** - -| Parameter | Type | Description | -|-----------|------|-------------| -| `id` | UUID | PV ID | - --- ## Snapshot Endpoints @@ -289,33 +286,16 @@ Delete a PV. GET /v1/snapshots ``` -List all snapshots. +List all snapshots, optionally filtered. 
**Query Parameters:** | Parameter | Type | Description | |-----------|------|-------------| -| `limit` | integer | Max results | -| `offset` | integer | Skip N results | - -**Response:** +| `title` | string | Filter by title substring | +| `tags` | array | Filter by tag IDs (returns snapshots containing PVs with any of these tags) | -```json -{ - "errorCode": 0, - "errorMessage": null, - "payload": [ - { - "id": "...", - "title": "Morning snapshot", - "comment": "Before tuning", - "created_by": "operator", - "created_at": "2024-01-15T10:30:00Z", - "pv_count": 1500 - } - ] -} -``` +Response is an array of `SnapshotSummaryDTO` (`id`, `title`, `description`, `createdDate`, `createdBy`, `pvCount`). ### Create Snapshot @@ -323,43 +303,44 @@ List all snapshots. POST /v1/snapshots ``` -Create a new snapshot (asynchronous operation). +Create a new snapshot. Captures the current state of **all** configured PVs. **Request Body:** ```json { "title": "Morning snapshot", - "comment": "Before beam tuning", - "created_by": "operator", - "pv_ids": ["...", "..."], - "tag_ids": ["..."], - "use_cache": true + "description": "Before beam tuning" } ``` | Field | Type | Description | |-------|------|-------------| -| `title` | string | Snapshot name | -| `comment` | string | Optional description | -| `created_by` | string | Creator identifier | -| `pv_ids` | array | Specific PVs to include | -| `tag_ids` | array | Include PVs with these tags | -| `use_cache` | boolean | Read from Redis cache (fast) or EPICS (fresh) | +| `title` | string | Snapshot name (required, 1-255 chars) | +| `description` | string | Optional description | -**Response:** +**Query Parameters:** + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `async` | boolean | `true` | Return a job ID immediately and run in the background | +| `use_cache` | boolean | `true` | `true` reads from Redis cache (<5s for 40K PVs); `false` reads directly from EPICS (30-60s) | +| `use_arq` | 
boolean | `true` | `true` uses the Arq persistent queue; `false` uses FastAPI `BackgroundTasks` (lost on restart) | + +**Response (async=true):** ```json { "errorCode": 0, "errorMessage": null, "payload": { - "job_id": "770e8400-e29b-41d4-a716-446655440002" + "jobId": "770e8400-e29b-41d4-a716-446655440002", + "message": "Snapshot creation queued for 'Morning snapshot' (from cache)" } } ``` -Poll `/v1/jobs/{job_id}` for progress and completion. +Poll `/v1/jobs/{jobId}` for progress and completion. With `async=false`, the endpoint blocks and returns the completed snapshot inline (may time out on large PV sets). ### Get Snapshot @@ -367,36 +348,36 @@ Poll `/v1/jobs/{job_id}` for progress and completion. GET /v1/snapshots/{id} ``` -Get a snapshot with all its values. +Get a snapshot with its PV values. -**Path Parameters:** +**Query Parameters:** | Parameter | Type | Description | |-----------|------|-------------| -| `id` | UUID | Snapshot ID | +| `limit` | integer | Limit number of PV values returned | +| `offset` | integer | Offset for pagination (default 0) | -**Response:** +Returns `SnapshotDTO` with `pvValues: PVValueDTO[]`. Each `PVValueDTO` has `pvId`, `pvName`, `setpointName`, `readbackName`, `setpointValue` (EpicsValueDTO), `readbackValue` (EpicsValueDTO), `tags`. + +### Update Snapshot + +``` +PUT /v1/snapshots/{id} +``` + +Update snapshot title and/or description. + +**Request Body:** ```json { - "errorCode": 0, - "errorMessage": null, - "payload": { - "id": "...", - "title": "Morning snapshot", - "values": [ - { - "pv_name": "QUAD:LI21:201:BDES", - "setpoint_value": 42.5, - "readback_value": 42.48, - "status": 0, - "severity": 0 - } - ] - } + "title": "Morning snapshot v2", + "description": "After tuning" } ``` +Both fields are optional (nullable). + ### Delete Snapshot ``` @@ -411,16 +392,31 @@ Delete a snapshot and all its values. POST /v1/snapshots/{id}/restore ``` -Restore snapshot values to EPICS (asynchronous operation). 
+Restore snapshot values to EPICS. -**Response:** +**Request Body (optional):** + +```json +{ "pvIds": ["pv-id-1", "pv-id-2"] } +``` + +If omitted (or `pvIds` is null), all PVs in the snapshot are restored. + +**Query Parameters:** + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `async` | boolean | `true` | Return a job ID and run in the background | +| `use_arq` | boolean | `true` | `true` uses Arq; `false` uses FastAPI `BackgroundTasks` | + +**Response (async=true):** ```json { "errorCode": 0, "errorMessage": null, "payload": { - "job_id": "880e8400-e29b-41d4-a716-446655440003" + "jobId": "880e8400-e29b-41d4-a716-446655440003" } } ``` @@ -428,16 +424,10 @@ Restore snapshot values to EPICS (asynchronous operation). ### Compare Snapshots ``` -GET /v1/snapshots/{id}/compare/{id2} +GET /v1/snapshots/{snapshot1_id}/compare/{snapshot2_id} ``` -Compare two snapshots and show differences. - -**Query Parameters:** - -| Parameter | Type | Description | -|-----------|------|-------------| -| `tolerance` | float | Difference threshold (default: uses PV tolerances) | +Compare two snapshots. PV-level tolerance (`absTolerance` / `relTolerance` on the PV record) decides whether each pair is within tolerance. **Response:** @@ -446,19 +436,19 @@ Compare two snapshots and show differences. "errorCode": 0, "errorMessage": null, "payload": { - "snapshot1_id": "...", - "snapshot2_id": "...", + "snapshot1Id": "...", + "snapshot2Id": "...", "differences": [ { - "pv_name": "QUAD:LI21:201:BDES", + "pvId": "...", + "pvName": "QUAD:LI21:201:BDES", "value1": 42.5, "value2": 43.2, - "diff": 0.7, - "diff_percent": 1.6 + "withinTolerance": false } ], - "total_pvs": 1500, - "different_count": 23 + "matchCount": 1477, + "differenceCount": 23 } } ``` @@ -539,7 +529,67 @@ Update a tag group. DELETE /v1/tags/{id} ``` -Delete a tag group and all its tags. +Delete a tag group and all its tags. 
Pass `?force=true` to delete even if the group still has tags referenced by PVs. + +### Add Tag to Group + +``` +POST /v1/tags/{group_id}/tags +``` + +Add a single tag to an existing group. + +**Query Parameters:** + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `skip_duplicates` | boolean | `false` | If `true`, adding an existing tag returns `wasCreated: false` instead of 409 | + +**Request Body:** `TagCreate` (`{ "name": "...", "description": null }`). + +### Update Tag + +``` +PUT /v1/tags/{group_id}/tags/{tag_id} +``` + +Rename or update a tag's description. Body: `TagUpdate` (`name` and/or `description`). + +### Remove Tag from Group + +``` +DELETE /v1/tags/{group_id}/tags/{tag_id} +``` + +### Bulk Import Tags + +``` +POST /v1/tags/bulk +``` + +Requires `write_access`. Import multiple tag groups and tags in one call, with duplicate handling. + +**Request Body:** + +```json +{ + "groups": { + "Area": ["LI21", "LI22", "LI23"], + "Subsystem": ["Magnet", "BPM", "Feedback"] + } +} +``` + +**Response payload:** + +```json +{ + "groupsCreated": 2, + "tagsCreated": 6, + "tagsSkipped": 0, + "warnings": [] +} +``` --- @@ -567,8 +617,8 @@ Get the status and progress of a background job. "errorMessage": null, "payload": { "id": "...", - "type": "CREATE_SNAPSHOT", - "status": "IN_PROGRESS", + "type": "snapshot_create", + "status": "running", "progress": 45, "data": { "processed": 675, @@ -581,43 +631,49 @@ Get the status and progress of a background job. 
} ``` +**Job Type Values:** + +| Type | Description | +|------|-------------| +| `snapshot_create` | Creating a snapshot | +| `snapshot_restore` | Restoring a snapshot to EPICS | + **Job Status Values:** | Status | Description | |--------|-------------| -| `PENDING` | Job created, waiting for worker | -| `IN_PROGRESS` | Worker is processing | -| `COMPLETED` | Successfully finished | -| `FAILED` | Error occurred | -| `RETRYING` | Automatic retry in progress | +| `pending` | Job created, waiting for worker | +| `running` | Worker is processing | +| `completed` | Successfully finished | +| `failed` | Error occurred | --- ## Health Endpoints -### Overall Health +### Heartbeat ``` -GET /v1/health +GET /v1/health/heartbeat ``` -Check overall API health. +Simple heartbeat check for frontend polling. No authentication required. -### Database Health +### Health Summary ``` -GET /v1/health/db +GET /v1/health/summary ``` -Check database connectivity. +Complete health summary for monitoring dashboards. Includes database, Redis, monitor, and watchdog status in a single response. -### Redis Health +### Monitor Health ``` -GET /v1/health/redis +GET /v1/health/monitor ``` -Check Redis connectivity. +Detailed PV monitor health information. ### Monitor Status @@ -625,7 +681,7 @@ Check Redis connectivity. GET /v1/health/monitor/status ``` -Check PV monitor process health. +Check PV monitor process health via Redis heartbeat. **Response:** @@ -642,6 +698,44 @@ Check PV monitor process health. } ``` +### Watchdog Statistics + +``` +GET /v1/health/watchdog +``` + +Get watchdog health monitoring statistics. + +### Force Watchdog Check + +``` +POST /v1/health/watchdog/check +``` + +Force an immediate watchdog health check. Requires `write_access`. + +### Disconnected PVs + +``` +GET /v1/health/disconnected +``` + +List all PVs currently disconnected from EPICS. + +### Stale PVs + +``` +GET /v1/health/stale +``` + +List PVs that haven't been updated recently. 
+ +**Query Parameters:** + +| Parameter | Type | Description | +|-----------|------|-------------| +| `max_age_seconds` | float | Consider PVs stale after this many seconds | + ### Circuit Breaker Status ``` @@ -664,3 +758,31 @@ Check circuit breaker status by IOC prefix. } } ``` + +### Force Close Circuit Breaker + +``` +POST /v1/health/circuits/{circuit_name}/close +``` + +Force close (reset) a circuit breaker. Requires `write_access`. + +**Path Parameters:** + +| Parameter | Type | Description | +|-----------|------|-------------| +| `circuit_name` | string | Circuit name (IOC identifier derived from PV name, e.g., `QUAD:LI21`) | + +### Force Open Circuit Breaker + +``` +POST /v1/health/circuits/{circuit_name}/open +``` + +Force open a circuit breaker (block all requests to this IOC). Requires `write_access`. + +**Path Parameters:** + +| Parameter | Type | Description | +|-----------|------|-------------| +| `circuit_name` | string | Circuit name (e.g., `BPM:LI22`) | diff --git a/docs/api-reference/index.md b/docs/api-reference/index.md index 751dd17..5289832 100644 --- a/docs/api-reference/index.md +++ b/docs/api-reference/index.md @@ -1,6 +1,6 @@ # API Reference -Squirrel Backend provides a REST API for managing EPICS PVs, snapshots, and tags. +Squirrel Backend provides a REST API and WebSocket streaming API for managing EPICS PVs, snapshots, and tags. 
## Base URL @@ -45,78 +45,6 @@ All responses follow a standard format: | `4` | EPICS error | | `5` | Internal error | -## Endpoint Summary - -### PV Endpoints (`/v1/pvs`) - -| Method | Endpoint | Description | -|--------|----------|-------------| -| `GET` | `/v1/pvs` | Search PVs (simple) | -| `GET` | `/v1/pvs/paged` | Search PVs with pagination | -| `POST` | `/v1/pvs` | Create single PV | -| `POST` | `/v1/pvs/multi` | Bulk create PVs | -| `PUT` | `/v1/pvs/{id}` | Update PV | -| `DELETE` | `/v1/pvs/{id}` | Delete PV | - -### Snapshot Endpoints (`/v1/snapshots`) - -| Method | Endpoint | Description | -|--------|----------|-------------| -| `GET` | `/v1/snapshots` | List snapshots | -| `POST` | `/v1/snapshots` | Create snapshot (async) | -| `GET` | `/v1/snapshots/{id}` | Get snapshot with values | -| `DELETE` | `/v1/snapshots/{id}` | Delete snapshot | -| `POST` | `/v1/snapshots/{id}/restore` | Restore to EPICS | -| `GET` | `/v1/snapshots/{id}/compare/{id2}` | Compare two snapshots | - -### Tag Endpoints (`/v1/tags`) - -| Method | Endpoint | Description | -|--------|----------|-------------| -| `GET` | `/v1/tags` | List tag groups | -| `POST` | `/v1/tags` | Create tag group | -| `GET` | `/v1/tags/{id}` | Get tag group with tags | -| `PUT` | `/v1/tags/{id}` | Update tag group | -| `DELETE` | `/v1/tags/{id}` | Delete tag group | - -### Job Endpoints (`/v1/jobs`) - -| Method | Endpoint | Description | -|--------|----------|-------------| -| `GET` | `/v1/jobs/{id}` | Get job status and progress | - -### Health Endpoints (`/v1/health`) - -| Method | Endpoint | Description | -|--------|----------|-------------| -| `GET` | `/v1/health` | Overall health | -| `GET` | `/v1/health/db` | Database connectivity | -| `GET` | `/v1/health/redis` | Redis connectivity | -| `GET` | `/v1/health/monitor/status` | PV monitor health | -| `GET` | `/v1/health/circuits` | Circuit breaker status | - -### WebSocket (`/ws`) - -Real-time PV value streaming. 
- -## Pagination - -Paginated endpoints use continuation tokens (not offset-based): - -```json -{ - "items": [...], - "next_cursor": "abc123", - "has_more": true -} -``` - -To get the next page: - -``` -GET /v1/pvs/paged?cursor=abc123 -``` - ## Authentication All endpoints require an API key passed via the `X-API-Key` header: @@ -127,34 +55,30 @@ X-API-Key: sq_your_token_here Requests without a valid key return `401 Unauthorized`. -### Permission Levels - | Permission | Required for | |------------|--------------| | `read_access` | GET requests, WebSocket connections | | `write_access` | POST, PUT, DELETE requests | -### Getting an API Key +See [API Key Management](../getting-started/api-keys.md) for details on creating and managing keys. -Use the management script to create your first key: - -```bash -# Docker -docker exec squirrel-api python scripts/create_key.py [--read] [--write] -``` +## Pagination -See [API Key Management](../getting-started/api-keys.md) for full details on creating and managing keys. +Paginated endpoints use continuation tokens (not offset-based): -### API Key Endpoints +```json +{ + "results": [...], + "continuationToken": "abc123", + "hasMore": true +} +``` -The `/v1/api-keys` endpoints allow managing keys via the REST API (requires `write_access`): +To get the next page: -| Method | Endpoint | Description | -|--------|----------|-------------| -| `GET` | `/v1/api-keys` | List all API keys | -| `POST` | `/v1/api-keys` | Create a new API key | -| `DELETE` | `/v1/api-keys/{id}` | Deactivate an API key | -| `GET` | `/v1/api-keys/count` | Count API keys | +``` +GET /v1/pvs/paged?continuationToken=abc123 +``` ## Rate Limiting @@ -162,5 +86,5 @@ No rate limiting is currently enforced. 
For production deployments, consider adding rate limiting at a reverse proxy or API gateway. ## Detailed Documentation -- [REST Endpoints](endpoints.md) - Complete endpoint documentation -- [WebSocket](websocket.md) - Real-time streaming API +- [REST Endpoints](endpoints.md) — complete per-endpoint request/response documentation +- [WebSocket](websocket.md) — real-time streaming API diff --git a/docs/api-reference/websocket.md b/docs/api-reference/websocket.md index 9a89b13..840f7e0 100644 --- a/docs/api-reference/websocket.md +++ b/docs/api-reference/websocket.md @@ -7,160 +7,187 @@ The WebSocket API provides real-time PV value streaming with diff-based updates. Connect to the WebSocket endpoint and include your API key in the `X-API-Key` header: ``` -ws://localhost:8080/ws +ws://localhost:8080/v1/ws/pvs ``` Or for local development: ``` -ws://localhost:8000/ws +ws://localhost:8000/v1/ws/pvs ``` +An alias endpoint is also available at `/v1/ws/live`. + !!! info "Authentication" WebSocket connections require an `X-API-Key` header with a key that has `read_access`. Connections without a valid key are rejected with close code `1008 (Policy Violation)`. See [API Key Management](../getting-started/api-keys.md). ## Message Format -All messages are JSON objects with an `action` field. +All messages are JSON objects. Both client and server messages use a `type` field as the discriminator. -### Client Messages +### Client → Server #### Subscribe to PVs ```json { - "action": "subscribe", - "pv_names": ["PV:NAME:1", "PV:NAME:2", "PV:NAME:3"] + "type": "subscribe", + "pvNames": ["PV:NAME:1", "PV:NAME:2", "PV:NAME:3"] } ``` +After subscribing, the server immediately sends an `initial` message with current cached values for any subscribed PVs that exist in Redis.
+ #### Unsubscribe from PVs ```json { - "action": "unsubscribe", - "pv_names": ["PV:NAME:1"] + "type": "unsubscribe", + "pvNames": ["PV:NAME:1"] } ``` -#### Get Current Subscriptions +#### Get all cached values + +Returns every PV value currently in the Redis cache (one-shot, not tied to your subscriptions). + +```json +{ "type": "get_all" } +``` + +#### Ping + +```json +{ "type": "ping" } +``` + +The server replies with a `pong`. + +### Server → Client + +#### Initial values + +Sent once after a successful `subscribe`, containing the current cached values for subscribed PVs that have data in Redis. ```json { - "action": "list_subscriptions" + "type": "initial", + "data": { + "PV:NAME:1": { + "value": 42.5, + "connected": true, + "updated_at": 1705312200.123, + "status": "NO_ALARM", + "severity": 0, + "timestamp": 1705312200.000, + "units": "mA" + } + }, + "count": 1 } ``` -### Server Messages +The per-PV shape matches the Redis cache entry (`app/services/redis_service.py::PVCacheEntry`). Null fields are omitted from the payload. -#### PV Update +#### Diff updates -When a subscribed PV value changes: +Emitted periodically (see batching window below) with only the PVs that changed since the last diff and are in your subscription set. ```json { - "type": "update", - "pv_name": "PV:NAME:1", - "value": 42.5, - "timestamp": "2024-01-15T10:30:00.123456Z", - "status": 0, - "severity": 0 + "type": "diff", + "data": { + "PV:NAME:1": { + "value": 43.0, + "connected": true, + "updated_at": 1705312201.456, + "status": "NO_ALARM", + "severity": 0 + } + }, + "count": 1, + "timestamp": 1705312201.500 } ``` -#### Batch Update +Timestamps in diff and heartbeat messages are Unix seconds (float), not ISO strings. -Multiple updates are batched together (100ms window): +#### Heartbeat + +Sent every ~5 seconds on every connection, regardless of subscriptions. Useful for keep-alive and for surfacing monitor health in the UI. 
```json { - "type": "batch_update", - "updates": [ - { - "pv_name": "PV:NAME:1", - "value": 42.5, - "timestamp": "2024-01-15T10:30:00.123456Z" - }, - { - "pv_name": "PV:NAME:2", - "value": 3.14, - "timestamp": "2024-01-15T10:30:00.123789Z" - } - ] + "type": "heartbeat", + "timestamp": 1705312205.000, + "monitor_heartbeat": 1705312204.950, + "monitor_alive": true } ``` -#### Subscription Confirmation +#### All values + +Response to `get_all`. ```json { - "type": "subscribed", - "pv_names": ["PV:NAME:1", "PV:NAME:2"], - "initial_values": { - "PV:NAME:1": {"value": 42.5, "timestamp": "..."}, - "PV:NAME:2": {"value": 3.14, "timestamp": "..."} - } + "type": "all_values", + "values": { "PV:NAME:1": { "value": 42.5, "connected": true, "updated_at": 1705312200.123 } }, + "count": 1 } ``` +#### Pong + +Response to `ping`. + +```json +{ "type": "pong", "timestamp": 1705312205.000 } +``` + #### Error ```json { "type": "error", - "message": "PV not found: INVALID:PV:NAME" + "message": "Error description" } ``` ## JavaScript Example ```javascript -const ws = new WebSocket('ws://localhost:8080/ws', [], { +const ws = new WebSocket('ws://localhost:8080/v1/ws/pvs', [], { headers: { 'X-API-Key': 'sq_your_token_here' } }); ws.onopen = () => { - console.log('Connected to WebSocket'); - - // Subscribe to PVs ws.send(JSON.stringify({ - action: 'subscribe', - pv_names: ['QUAD:LI21:201:BDES', 'QUAD:LI21:201:BACT'] + type: 'subscribe', + pvNames: ['QUAD:LI21:201:BDES', 'QUAD:LI21:201:BACT'] })); }; ws.onmessage = (event) => { - const data = JSON.parse(event.data); + const msg = JSON.parse(event.data); - switch (data.type) { - case 'subscribed': - console.log('Subscribed to:', data.pv_names); - console.log('Initial values:', data.initial_values); + switch (msg.type) { + case 'initial': + console.log('Initial values:', msg.data); break; - - case 'update': - console.log(`${data.pv_name} = ${data.value}`); + case 'diff': + for (const [pvName, entry] of Object.entries(msg.data)) { + 
console.log(`${pvName} = ${entry.value}`); + } break; - - data.updates.forEach(update => { - console.log(`${update.pv_name} = ${update.value}`); - }); + case 'heartbeat': + if (!msg.monitor_alive) console.warn('PV monitor is down'); break; case 'error': - console.error('WebSocket error:', data.message); + console.error(msg.message); break; } }; - -ws.onclose = () => { - console.log('Disconnected from WebSocket'); -}; - -ws.onerror = (error) => { - console.error('WebSocket error:', error); -}; ``` ## Python Example @@ -171,29 +198,24 @@ import json import websockets async def subscribe_to_pvs(): - uri = "ws://localhost:8080/ws" + uri = "ws://localhost:8080/v1/ws/pvs" headers = {"X-API-Key": "sq_your_token_here"} - async with websockets.connect(uri, additional_headers=headers) as websocket: - # Subscribe to PVs - await websocket.send(json.dumps({ - "action": "subscribe", - "pv_names": ["QUAD:LI21:201:BDES", "QUAD:LI21:201:BACT"] + async with websockets.connect(uri, additional_headers=headers) as ws: + await ws.send(json.dumps({ + "type": "subscribe", + "pvNames": ["QUAD:LI21:201:BDES", "QUAD:LI21:201:BACT"] })) - # Receive updates - async for message in websocket: - data = json.loads(message) - - if data["type"] == "subscribed": - print(f"Subscribed to: {data['pv_names']}") - - elif data["type"] == "update": - print(f"{data['pv_name']} = {data['value']}") - - elif data["type"] == "batch_update": - for update in data["updates"]: - print(f"{update['pv_name']} = {update['value']}") + async for message in ws: + msg = json.loads(message) + if msg["type"] == "initial": + print("Initial:", msg["data"]) + elif msg["type"] == "diff": + for pv_name, entry in msg["data"].items(): + print(f"{pv_name} = {entry['value']}") + elif msg["type"] == "heartbeat" and not msg["monitor_alive"]: + print("Monitor is down") asyncio.run(subscribe_to_pvs()) ``` @@ -201,106 +223,78 @@ asyncio.run(subscribe_to_pvs()) ## React Hook Example ```typescript -import {
useEffect, useState, useCallback } from 'react'; - -interface PVValue { - value: number; - timestamp: string; - status: number; - severity: number; +import { useEffect, useState } from 'react'; + +interface PVEntry { + value: unknown; + connected: boolean; + updated_at: number; + status?: string; + severity?: number; + timestamp?: number; + units?: string; } function usePVSubscription(pvNames: string[], apiKey: string) { - const [values, setValues] = useState<Record<string, PVValue>>({}); + const [values, setValues] = useState<Record<string, PVEntry>>({}); const [connected, setConnected] = useState(false); useEffect(() => { - const ws = new WebSocket('ws://localhost:8080/ws', [], { + const ws = new WebSocket('ws://localhost:8080/v1/ws/pvs', [], { headers: { 'X-API-Key': apiKey } }); ws.onopen = () => { setConnected(true); - ws.send(JSON.stringify({ - action: 'subscribe', - pv_names: pvNames - })); + ws.send(JSON.stringify({ type: 'subscribe', pvNames })); }; ws.onmessage = (event) => { - const data = JSON.parse(event.data); - - if (data.type === 'subscribed') { - setValues(data.initial_values); - } else if (data.type === 'update') { - setValues(prev => ({ - ...prev, - [data.pv_name]: { - value: data.value, - timestamp: data.timestamp, - status: data.status, - severity: data.severity - } - })); - } else if (data.type === 'batch_update') { - setValues(prev => { - const updated = { ...prev }; - data.updates.forEach((update: any) => { - updated[update.pv_name] = { - value: update.value, - timestamp: update.timestamp, - status: update.status || 0, - severity: update.severity || 0 - }; - }); - return updated; - }); + const msg = JSON.parse(event.data); + if (msg.type === 'initial' || msg.type === 'diff') { + setValues(prev => ({ ...prev, ...msg.data })); } }; ws.onclose = () => setConnected(false); - return () => ws.close(); }, [pvNames.join(',')]); return { values, connected }; } +``` + +## Connection Status Endpoint + +For diagnostics, `GET /v1/ws/status` returns subscription and connection statistics for the
current API instance: -// Usage -function PVDisplay() { - const { values, connected } = usePVSubscription( - ['QUAD:LI21:201:BDES', 'QUAD:LI21:201:BACT'], - 'sq_your_token_here' - ); - - return ( -
-    <div>
-      <div>
-        Status: {connected ? 'Connected' : 'Disconnected'}
-      </div>
-      {Object.entries(values).map(([name, pv]) => (
-        <div key={name}>
-          {name}: {pv.value}
-        </div>
-      ))}
-    </div>
- ); +```json +{ + "instanceId": "...", + "multiInstanceEnabled": true, + "activeConnections": 3, + "totalSubscriptions": 42, + "uniquePVsSubscribed": 37, + "bufferSize": 0, + "batchIntervalMs": 100 } ``` +Requires `read_access`. + ## Performance Considerations ### Batching -Updates are batched with a 100ms window to reduce message frequency: - -- Individual updates within 100ms are combined -- Reduces WebSocket message overhead -- Client receives fewer, larger messages +`diff` messages are flushed on a rolling window (default 100ms, `SQUIRREL_WEBSOCKET_BATCH_INTERVAL_MS`). Multiple PV changes arriving within the window are coalesced into one message. ### Diff-Based Updates -Only changed values are sent: +Only changed PVs are sent after the initial snapshot: -- Initial subscription sends all current values -- Subsequent messages only include changed PVs -- Reduces bandwidth by 10-100x compared to polling +- `initial` carries the current cache state for your subscription set +- `diff` messages only include PVs that changed since the last flush +- Reduces bandwidth 10-100x compared to polling ### Multi-Instance Support @@ -310,51 +304,9 @@ WebSocket connections work across multiple API instances: - PV updates broadcast via Redis pub/sub - Clients can connect to any API instance -## Status and Severity Codes +## Reconnection -EPICS alarm status and severity are included in updates: - -**Status Codes:** - -| Code | Meaning | -|------|---------| -| 0 | NO_ALARM | -| 1 | READ | -| 2 | WRITE | -| 3 | HIHI | -| 4 | HIGH | -| 5 | LOLO | -| 6 | LOW | -| 7 | STATE | -| 8 | COS | -| 9 | COMM | -| 10 | TIMEOUT | - -**Severity Codes:** - -| Code | Meaning | -|------|---------| -| 0 | NO_ALARM | -| 1 | MINOR | -| 2 | MAJOR | -| 3 | INVALID | - -## Connection Management - -### Heartbeat - -The server sends periodic heartbeat messages to keep connections alive: - -```json -{ - "type": "heartbeat", - "timestamp": "2024-01-15T10:30:00Z" -} -``` - -### Reconnection - -Implement 
reconnection logic in your client: +Clients should implement exponential backoff and re-subscribe on reconnect: ```javascript function createReconnectingWebSocket(url, apiKey, onMessage) { @@ -366,24 +318,18 @@ function createReconnectingWebSocket(url, apiKey, onMessage) { ws = new WebSocket(url, [], { headers: { 'X-API-Key': apiKey } }); ws.onopen = () => { - console.log('Connected'); - reconnectInterval = 1000; // Reset on successful connect + reconnectInterval = 1000; }; ws.onmessage = onMessage; ws.onclose = () => { - console.log(`Reconnecting in ${reconnectInterval}ms...`); setTimeout(connect, reconnectInterval); reconnectInterval = Math.min(reconnectInterval * 2, maxInterval); }; } connect(); - - return { - send: (data) => ws.send(data), - close: () => ws.close() - }; + return { send: (data) => ws.send(data), close: () => ws.close() }; } ``` diff --git a/docs/architecture/data-flow.md b/docs/architecture/data-flow.md index e831512..defdb49 100644 --- a/docs/architecture/data-flow.md +++ b/docs/architecture/data-flow.md @@ -6,46 +6,8 @@ This document describes how data flows through the Squirrel Backend system for k Snapshot creation is an asynchronous operation that can read PV values from Redis cache (fast) or directly from EPICS (slower but always current). -``` -API Request (/v1/snapshots POST) - │ - ▼ -┌─────────────────────────────────┐ -│ JobService creates Job record │ -└─────────────────┬───────────────┘ - │ - ▼ (enqueue to Arq) -┌─────────────────────────────────┐ -│ Return Job ID immediately │ -└─────────────────┬───────────────┘ - │ - ▼ (Arq worker picks up) -┌─────────────────────────────────┐ -│ Read PV addresses from DB │ -└─────────────────┬───────────────┘ - │ - ▼ (use_cache?) 
- ┌─────┴─────┐ - │ │ - ▼ ▼ -┌───────┐ ┌───────────────┐ -│ Redis │ │ EPICS direct │ -│ <5s │ │ 30-60s │ -└───┬───┘ └───────┬───────┘ - │ │ - └───────┬───────┘ - │ - ▼ -┌─────────────────────────────────┐ -│ BulkInsertService (COPY) │ -│ Insert SnapshotValues to DB │ -└─────────────────┬───────────────┘ - │ - ▼ -┌─────────────────────────────────┐ -│ Mark Job as COMPLETED │ -└─────────────────────────────────┘ -``` +![Snapshot creation flow](../assets/figure-2-snapshot-flow-light.png#only-light) +![Snapshot creation flow](../assets/figure-2-snapshot-flow-dark.png#only-dark) ### Performance Comparison @@ -58,63 +20,8 @@ API Request (/v1/snapshots POST) The PV Monitor process maintains a live cache of all PV values in Redis and broadcasts updates to connected WebSocket clients. -``` - PV Monitor Process Startup - │ - ▼ - ┌────────────────────────┐ - │ Acquire Leader Lock │ - │ (Redis SETNX) │ - └───────────┬────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ Load PV addresses │ - │ from PostgreSQL │ - └───────────┬────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ Batched PV init │ - │ (500/batch, 100ms) │ - └───────────┬────────────┘ - │ - ┌───────────────┼───────────────┐ - │ │ │ - ▼ ▼ ▼ -┌──────────┐ ┌──────────┐ ┌──────────┐ -│ aioca │ │ aioca │ │ aioca │ -│ monitor │ │ monitor │ │ monitor │ -│ (batch 1)│ │ (batch 2)│ │ (batch N)│ -└────┬─────┘ └────┬─────┘ └────┬─────┘ - │ │ │ - └──────────────┼──────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ Redis Cache │ - │ • Hash: pv:values │ - │ • Pub/Sub: updates │ - └───────────┬────────────┘ - │ - ▼ (Redis pub/sub) - ┌────────────────────────┐ - │ API Instances │ - │ DiffStreamManager │ - └───────────┬────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ Subscription Registry │ - │ (Redis-based) │ - └───────────┬────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ WebSocket Clients │ - │ (100ms batching) │ - └────────────────────────┘ -``` +![PV monitor startup and 
fan-out](../assets/figure-4-monitor-startup-light.png#only-light) +![PV monitor startup and fan-out](../assets/figure-4-monitor-startup-dark.png#only-dark) ### Batching Strategy @@ -130,42 +37,8 @@ PV subscriptions are created in batches to prevent overwhelming the EPICS networ WebSocket clients receive diff-based updates to minimize bandwidth: -``` -┌──────────────────┐ ┌──────────────────┐ -│ Client A │ │ Client B │ -│ Subscribed to: │ │ Subscribed to: │ -│ PV1, PV2, PV3 │ │ PV2, PV4 │ -└────────┬─────────┘ └────────┬─────────┘ - │ │ - └───────────┬────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ Subscription Registry │ - │ (Redis Set per PV) │ - └───────────┬────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ DiffStreamManager │ - │ (per API instance) │ - └───────────┬────────────┘ - │ - │ PV2 changes - ▼ - ┌────────────────────────┐ - │ Batch updates │ - │ (100ms window) │ - └───────────┬────────────┘ - │ - ┌───────────┴───────────┐ - │ │ - ▼ ▼ -┌──────────────────┐ ┌──────────────────┐ -│ Client A │ │ Client B │ -│ Receives: PV2 │ │ Receives: PV2 │ -└──────────────────┘ └──────────────────┘ -``` +![WebSocket subscription fan-out](../assets/figure-3-pv-fanout-light.png#only-light) +![WebSocket subscription fan-out](../assets/figure-3-pv-fanout-dark.png#only-dark) ### Bandwidth Savings @@ -178,85 +51,15 @@ WebSocket clients receive diff-based updates to minimize bandwidth: Restoring a snapshot writes values back to EPICS: -``` -API Request (/v1/snapshots/{id}/restore POST) - │ - ▼ -┌─────────────────────────────────┐ -│ Load snapshot from DB │ -└─────────────────┬───────────────┘ - │ - ▼ -┌─────────────────────────────────┐ -│ Create Job record │ -└─────────────────┬───────────────┘ - │ - ▼ (enqueue to Arq) -┌─────────────────────────────────┐ -│ Return Job ID immediately │ -└─────────────────┬───────────────┘ - │ - ▼ (Arq worker picks up) -┌─────────────────────────────────┐ -│ Parallel EPICS writes │ -│ (chunked, 1000/batch) │ 
-└─────────────────┬───────────────┘ - │ - ▼ -┌─────────────────────────────────┐ -│ Circuit breaker per IOC │ -│ (fail-fast on unresponsive) │ -└─────────────────┬───────────────┘ - │ - ▼ -┌─────────────────────────────────┐ -│ Mark Job as COMPLETED │ -│ (with success/failure counts) │ -└─────────────────────────────────┘ -``` +![Snapshot restore flow](../assets/figure-8-restore-light.png#only-light) +![Snapshot restore flow](../assets/figure-8-restore-dark.png#only-dark) ## Job Tracking All long-running operations use the job tracking system: -``` -┌─────────────────┐ -│ API Request │ -└────────┬────────┘ - │ - ▼ -┌─────────────────┐ -│ Create Job │──────────────┐ -│ status=PENDING │ │ -└────────┬────────┘ │ - │ │ - ▼ │ -┌─────────────────┐ │ -│ Enqueue Task │ │ -│ (Arq/Redis) │ │ -└────────┬────────┘ │ - │ │ - ▼ │ -┌─────────────────┐ │ -│ Return Job ID │◄─────────────┘ -└────────┬────────┘ - │ - │ (client polls) - ▼ -┌─────────────────┐ -│ GET /jobs/{id} │ -└────────┬────────┘ - │ - ▼ -┌─────────────────────────────────┐ -│ Job Status Response │ -│ { │ -│ "status": "IN_PROGRESS", │ -│ "progress": 45, │ -│ "data": {...} │ -│ } │ -└─────────────────────────────────┘ -``` +![Job tracking and client polling](../assets/figure-5-polling-light.png#only-light) +![Job tracking and client polling](../assets/figure-5-polling-dark.png#only-dark) ### Job States diff --git a/docs/architecture/distributed-system.md b/docs/architecture/distributed-system.md index ea0a77a..31848bd 100644 --- a/docs/architecture/distributed-system.md +++ b/docs/architecture/distributed-system.md @@ -10,9 +10,9 @@ FastAPI application serving REST and WebSocket endpoints. **Decoupled from PV mo **Startup Sequence:** -1. Connect to Redis -2. Start WebSocket DiffManager (subscribes to Redis pub/sub) -3. Initialize EPICS service (for direct reads during snapshot restore) +1. Initialize EPICS service (for direct reads during snapshot restore) +2. Connect to Redis (used for reading cached values) +3. 
Start WebSocket DiffManager (subscribes to Redis pub/sub) **Key Features:** @@ -59,12 +59,14 @@ Pydantic-Settings with environment variable support (prefix: `SQUIRREL_`): | Category | Key Settings | |----------|--------------| -| Database | `database_url`, `pool_size` (30), `max_overflow` (20) | -| EPICS | `ca_addr_list`, `ca_timeout` (10s), `chunk_size` (1000) | -| Redis | `redis_url`, `pv_cache_ttl` (60s) | -| PV Monitor | `batch_size` (500), `batch_delay_ms` (100) | -| Watchdog | `check_interval` (60s), `stale_threshold` (300s) | -| WebSocket | `batch_interval_ms` (100) | +| Database | `database_url`, `database_pool_size` (30), `database_max_overflow` (20) | +| EPICS | `epics_ca_timeout` (10s), `epics_ca_conn_timeout` (5s), `epics_chunk_size` (1000) | +| Redis | `redis_url`, `redis_pv_cache_ttl` (60s) | +| PV Monitor | `pv_monitor_batch_size` (500), `pv_monitor_batch_delay_ms` (100) | +| Watchdog | `watchdog_check_interval` (60s), `watchdog_stale_threshold` (300s) | +| WebSocket | `websocket_batch_interval_ms` (100) | + +EPICS network discovery (`EPICS_CA_ADDR_LIST`, `EPICS_PVA_ADDR_LIST`, etc.) is configured via the standard EPICS library env vars — not through Pydantic settings. ## Performance Optimizations @@ -111,41 +113,8 @@ Pydantic-Settings with environment variable support (prefix: `SQUIRREL_`): The circuit breaker prevents cascading failures when EPICS IOCs become unresponsive. 
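As an illustration, the per-IOC fail-fast pattern described above can be sketched as follows. This is a minimal reduction, not the project's actual `CircuitBreaker` implementation: the class shape, parameter names, and the two-segment prefix rule (matching circuit names like `QUAD:LI21` in the API reference) are assumptions.

```python
import time


def circuit_name(pv_name: str) -> str:
    # Assumed convention: the IOC identifier is the first two
    # colon-separated segments, e.g. "QUAD:LI21:201:BDES" -> "QUAD:LI21".
    return ":".join(pv_name.split(":")[:2])


class CircuitBreaker:
    """Minimal CLOSED -> OPEN -> HALF_OPEN breaker (illustrative only)."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                # After the recovery window, let one probe request through.
                self.state = "HALF_OPEN"
                return True
            return False  # fail fast while the IOC is considered down
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = time.monotonic()


# One breaker per IOC prefix, created lazily.
breakers: dict[str, CircuitBreaker] = {}


def breaker_for(pv_name: str) -> CircuitBreaker:
    return breakers.setdefault(circuit_name(pv_name), CircuitBreaker())
```

In the real service, `allow_request` would presumably gate each `caget`/`caput`, and the force open/close endpoints in the API reference would flip the breaker state directly.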
-``` -EPICS Request (caget/caput) - │ - ▼ -┌─────────────────────────────────┐ -│ Circuit Breaker Check │ -│ (by IOC prefix) │ -└─────────────────┬───────────────┘ - │ - ┌────┴────┐ - │ │ - ▼ ▼ -┌───────┐ ┌───────────────────┐ -│ OPEN │ │ CLOSED/HALF-OPEN │ -│ │ │ │ -│ Fail │ │ Execute request │ -│ Fast │ │ │ -└───┬───┘ └─────────┬─────────┘ - │ │ - │ ┌────┴────┐ - │ │ │ - │ ▼ ▼ - │ ┌───────┐ ┌───────┐ - │ │Success│ │Failure│ - │ └───┬───┘ └───┬───┘ - │ │ │ - │ ▼ ▼ - │ Reset count Increment count - │ (HALF→CLOSED) (threshold→OPEN) - │ │ │ - └────────────┴───────────┘ - │ - ▼ - Response -``` +![Circuit breaker state flow](../assets/figure-6-circuit-breaker-light.png#only-light) +![Circuit breaker state flow](../assets/figure-6-circuit-breaker-dark.png#only-dark) **States:** @@ -155,61 +124,15 @@ EPICS Request (caget/caput) ## Deployment Options -### Docker Compose (Recommended) - -Full distributed deployment with all services: - -```bash -cd docker -docker-compose up --build -``` - -This starts: - -- **PostgreSQL** (port 5432) -- **Redis** (port 6379) -- **API** (port 8000) - REST/WebSocket server -- **Monitor** (1 replica) - PV monitoring -- **Worker** (2 replicas) - Background task processing - -### Legacy Mode - -For simpler deployments or backward compatibility: - -```bash -cd docker -docker-compose --profile legacy up backend db redis -``` - -### Local Development - -```bash -# 1. Start infrastructure -cd docker -docker-compose up -d db redis - -# 2. Set up Python environment -cd .. -python -m venv venv -source venv/bin/activate -pip install -e ".[dev]" -cp .env.example .env - -# 3. Run migrations -alembic upgrade head - -# 4. Start services (in separate terminals) -uvicorn app.main:app --reload --port 8000 # API -python -m app.monitor_main # Monitor -arq app.worker.WorkerSettings # Worker -``` +For Docker Compose and local-development setup, see [Installation](../getting-started/installation.md). 
## Health Monitoring | Endpoint | Description | |----------|-------------| -| `/v1/health` | Overall API health | -| `/v1/health/db` | Database connectivity | -| `/v1/health/redis` | Redis connectivity | +| `/v1/health/heartbeat` | Lightweight liveness check (no auth) | +| `/v1/health/summary` | Consolidated health for dashboards (DB, Redis, monitor, watchdog) | | `/v1/health/monitor/status` | PV monitor process health (via heartbeat) | | `/v1/health/circuits` | Circuit breaker status by IOC prefix | + +See [REST Endpoints › Health](../api-reference/endpoints.md#health-endpoints) for the full list. diff --git a/docs/architecture/index.md b/docs/architecture/index.md index 704b8df..cad4cc9 100644 --- a/docs/architecture/index.md +++ b/docs/architecture/index.md @@ -6,71 +6,10 @@ The system uses a **distributed architecture** with separate processes for API s ## System Architecture -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Load Balancer │ -└─────────────────────────────────────────────────────────────────────────────┘ - │ - ┌───────────────────────┼───────────────────────┐ - ▼ ▼ ▼ - ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ - │ API Instance │ │ API Instance │ │ API Instance │ - │ (squirrel-api) │ │ (squirrel-api) │ │ (squirrel-api) │ - │ REST + WebSocket│ │ REST + WebSocket│ │ REST + WebSocket│ - └────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘ - │ │ │ - └──────────────────────┼──────────────────────┘ - │ - ▼ - ┌─────────────────────────────────────────────────────────────────────────┐ - │ Redis │ - │ • PV Value Cache (Hash: pv:values) │ - │ • Pub/Sub (pv updates, WebSocket broadcasts) │ - │ • Subscription Registry (multi-instance WebSocket support) │ - │ • Arq Job Queue │ - │ • Monitor Leader Election Lock │ - └──────────────────────────────┬──────────────────────────────────────────┘ - │ - ┌────────────────────────┼────────────────────────┐ - ▼ ▼ ▼ -┌──────────────────┐ 
┌──────────────────┐  ┌──────────────────┐
-│    PV Monitor    │  │   Arq Worker     │  │   Arq Worker     │
-│ (squirrel-monitor)│ │ (squirrel-worker)│  │ (squirrel-worker)│
-│ Single instance  │  │    Scalable      │  │    Scalable      │
-│ Leader election  │  │                  │  │                  │
-└────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘
-         │                     │                     │
-         └───────────────────────┼───────────────────────┘
-                                 │
-                                 ▼
-  ┌─────────────────────────────────────────────────────────────────────────┐
-  │                               PostgreSQL                                │
-  │  • PV metadata and configuration                                        │
-  │  • Snapshots and snapshot values                                        │
-  │  • Tags and tag groups                                                  │
-  │  • Job tracking                                                         │
-  └─────────────────────────────────────────────────────────────────────────┘
-                                 │
-                                 ▼
-  ┌─────────────────────────────────────────────────────────────────────────┐
-  │                               EPICS IOCs                                │
-  │  • 40-50K Process Variables                                             │
-  │  • Channel Access protocol                                              │
-  └─────────────────────────────────────────────────────────────────────────┘
-```
+![System architecture](../assets/figure-1-system-architecture-light.png#only-light)
+![System architecture](../assets/figure-1-system-architecture-dark.png#only-dark)
-## Technology Stack
-
-| Category | Technology | Purpose |
-|----------|------------|---------|
-| **Framework** | FastAPI 0.109+ | REST API and WebSocket |
-| **Language** | Python 3.11+ | Async/await support |
-| **Database** | PostgreSQL 16+ | Primary data store |
-| **ORM** | SQLAlchemy 2.0+ (async) | Database abstraction |
-| **Cache** | Redis 7+ | PV value caching, pub/sub |
-| **EPICS** | aioca 1.7+ | Async Channel Access |
-| **Task Queue** | Arq | Redis-backed job queue |
-| **Server** | Uvicorn | ASGI server |
+For the full technology stack, see the [Home](../index.md#technology-stack) page.
 
 ## Directory Structure
 
@@ -159,43 +98,8 @@ squirrel-backend/
 ## Database Models
 
-```
-┌──────────────────┐       ┌──────────────────┐
-│        PV        │       │     TagGroup     │
-├──────────────────┤       ├──────────────────┤
-│ setpoint_address │       │ name             │
-│ readback_address │       │ description      │
-│ config_address   │       └────────┬─────────┘
-│ device           │                │
-│ description      │                │ 1:n
-│ abs_tolerance    │                ▼
-│ rel_tolerance    │       ┌──────────────────┐
-└────────┬─────────┘       │       Tag        │
-         │                 ├──────────────────┤
-         │ n:m             │ name             │
-         └───────────────│ tag_group_id     │
-                           └──────────────────┘
-
-┌──────────────────┐       ┌──────────────────┐
-│     Snapshot     │       │       Job        │
-├──────────────────┤       ├──────────────────┤
-│ title            │       │ type (enum)      │
-│ comment          │       │ status (enum)    │
-│ created_by       │       │ progress (0-100) │
-└────────┬─────────┘       │ data (JSONB)     │
-         │                 │ result_id        │
-         │ 1:n             │ retry_count      │
-         ▼                 └──────────────────┘
-┌──────────────────┐
-│  SnapshotValue   │
-├──────────────────┤
-│ pv_name          │
-│ setpoint_value   │
-│ readback_value   │
-│ status           │
-│ severity         │
-└──────────────────┘
-```
+![Data model](../assets/figure-7-data-model-light.png#only-light)
+![Data model](../assets/figure-7-data-model-dark.png#only-dark)
 
 ## Services Layer
 
@@ -212,26 +116,8 @@
 ## External Services
 
-```
-┌─────────────────────────────────────────────────────────────────┐
-│                    Squirrel Backend Services                    │
-│   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐            │
-│   │   API   │  │ Monitor │  │ Worker  │  │ Worker  │            │
-│   └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘            │
-└───────┼────────────┼────────────┼────────────┼──────────────────┘
-        │            │            │            │
-        └────────────┼────────────┼────────────┘
-                     │            │
-        ┌────────────┼────────────┼────────────┐
-        ▼            ▼            ▼            ▼
-┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
-│   PostgreSQL    │ │      Redis      │ │      EPICS      │
-│   (asyncpg)     │ │    (hiredis)    │ │     (aioca)     │
-├─────────────────┤ ├─────────────────┤ ├─────────────────┤
-│ • PV metadata   │ │ • Value cache   │ │ • Channel Access│
-│ • Snapshots     │ │ • Pub/Sub       │ │ • 40K+ PVs      │
-│ • Tags          │ │ • Job queue     │ │ • Read/Write    │
-│ • Jobs          │ │ • Leader lock   │ │ • Monitor       │
-│                 │ │ • Subscriptions │ │                 │
-└─────────────────┘ └─────────────────┘ └─────────────────┘
-```
+| Service | Driver | Role |
+|---------|--------|------|
+| **PostgreSQL** | asyncpg | PV metadata, snapshots, tags, jobs |
+| **Redis** | hiredis | PV value cache, pub/sub, Arq job queue, monitor leader lock, WebSocket subscription registry |
+| **EPICS** | aioca (CA), p4p (PVA) | Read/write/monitor 40K+ process variables |
diff --git a/docs/assets/figure-1-system-architecture-dark.png b/docs/assets/figure-1-system-architecture-dark.png
new file mode 100644
index 0000000..a444057
Binary files /dev/null and b/docs/assets/figure-1-system-architecture-dark.png differ
diff --git a/docs/assets/figure-1-system-architecture-light.png b/docs/assets/figure-1-system-architecture-light.png
new file mode 100644
index 0000000..7844052
Binary files /dev/null and b/docs/assets/figure-1-system-architecture-light.png differ
diff --git a/docs/assets/figure-2-snapshot-flow-dark.png b/docs/assets/figure-2-snapshot-flow-dark.png
new file mode 100644
index 0000000..8bfc298
Binary files /dev/null and b/docs/assets/figure-2-snapshot-flow-dark.png differ
diff --git a/docs/assets/figure-2-snapshot-flow-light.png b/docs/assets/figure-2-snapshot-flow-light.png
new file mode 100644
index 0000000..4c9fcc1
Binary files /dev/null and b/docs/assets/figure-2-snapshot-flow-light.png differ
diff --git a/docs/assets/figure-3-pv-fanout-dark.png b/docs/assets/figure-3-pv-fanout-dark.png
new file mode 100644
index 0000000..8671c3a
Binary files /dev/null and b/docs/assets/figure-3-pv-fanout-dark.png differ
diff --git a/docs/assets/figure-3-pv-fanout-light.png b/docs/assets/figure-3-pv-fanout-light.png
new file mode 100644
index 0000000..535381d
Binary files /dev/null and b/docs/assets/figure-3-pv-fanout-light.png differ
diff --git a/docs/assets/figure-4-monitor-startup-dark.png b/docs/assets/figure-4-monitor-startup-dark.png
new file mode 100644
index 0000000..a16f298
Binary files /dev/null and b/docs/assets/figure-4-monitor-startup-dark.png differ
diff --git a/docs/assets/figure-4-monitor-startup-light.png b/docs/assets/figure-4-monitor-startup-light.png
new file mode 100644
index 0000000..1a6eb85
Binary files /dev/null and b/docs/assets/figure-4-monitor-startup-light.png differ
diff --git a/docs/assets/figure-5-polling-dark.png b/docs/assets/figure-5-polling-dark.png
new file mode 100644
index 0000000..a5d3c3c
Binary files /dev/null and b/docs/assets/figure-5-polling-dark.png differ
diff --git a/docs/assets/figure-5-polling-light.png b/docs/assets/figure-5-polling-light.png
new file mode 100644
index 0000000..2b6c812
Binary files /dev/null and b/docs/assets/figure-5-polling-light.png differ
diff --git a/docs/assets/figure-6-circuit-breaker-dark.png b/docs/assets/figure-6-circuit-breaker-dark.png
new file mode 100644
index 0000000..61f49b5
Binary files /dev/null and b/docs/assets/figure-6-circuit-breaker-dark.png differ
diff --git a/docs/assets/figure-6-circuit-breaker-light.png b/docs/assets/figure-6-circuit-breaker-light.png
new file mode 100644
index 0000000..e3c5851
Binary files /dev/null and b/docs/assets/figure-6-circuit-breaker-light.png differ
diff --git a/docs/assets/figure-7-data-model-dark.png b/docs/assets/figure-7-data-model-dark.png
new file mode 100644
index 0000000..0d16a1b
Binary files /dev/null and b/docs/assets/figure-7-data-model-dark.png differ
diff --git a/docs/assets/figure-7-data-model-light.png b/docs/assets/figure-7-data-model-light.png
new file mode 100644
index 0000000..980ddbc
Binary files /dev/null and b/docs/assets/figure-7-data-model-light.png differ
diff --git a/docs/assets/figure-8-restore-dark.png b/docs/assets/figure-8-restore-dark.png
new file mode 100644
index 0000000..c46962e
Binary files /dev/null and b/docs/assets/figure-8-restore-dark.png differ
diff --git a/docs/assets/figure-8-restore-light.png b/docs/assets/figure-8-restore-light.png
new file mode 100644
index 0000000..89ad902
Binary files /dev/null and b/docs/assets/figure-8-restore-light.png differ
diff --git a/docs/assets/figure-svg-files/figure-1-system-architecture-dark.svg b/docs/assets/figure-svg-files/figure-1-system-architecture-dark.svg
new file mode 100644
index 0000000..4e3ee73
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-1-system-architecture-dark.svg
@@ -0,0 +1,2 @@
+
+FIGURE 1System architectureEPICS Control SystemEPICS IOCs40–50K process variablesFrontend serverNode web serverOPI / User computerTauri appBackend ServerFastAPIsquirrel-api · ×3EPICS Monitorsquirrel-monitor · singletonPostgresdatabaseArq Workerssquirrel-workerRedis Cachepub/sub · queue · leader lock
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-1-system-architecture-light.svg b/docs/assets/figure-svg-files/figure-1-system-architecture-light.svg
new file mode 100644
index 0000000..00eaf82
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-1-system-architecture-light.svg
@@ -0,0 +1,2 @@
+
+FIGURE 1System architectureEPICS Control SystemEPICS IOCs40–50K process variablesFrontend serverNode web serverOPI / User computerTauri appBackend ServerFastAPIsquirrel-api · ×3EPICS Monitorsquirrel-monitor · singletonPostgresdatabaseArq Workerssquirrel-workerRedis Cachepub/sub · queue · leader lock
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-2-snapshot-flow-dark.svg b/docs/assets/figure-svg-files/figure-2-snapshot-flow-dark.svg
new file mode 100644
index 0000000..200dfd5
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-2-snapshot-flow-dark.svg
@@ -0,0 +1,2 @@
+
+FIGURE 2Snapshot job flowAPI RequestPOST /v1/snapshotsJobService creates Job recordenqueue to ArqReturn Job ID immediatelyArq worker picks upRead PV addresses from DBuse_cache?yesnoRedis< 5 sEPICS direct30 – 60 sBulkInsertService (COPY)insert SnapshotValues to DBMark Job as COMPLETED
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-2-snapshot-flow-light.svg b/docs/assets/figure-svg-files/figure-2-snapshot-flow-light.svg
new file mode 100644
index 0000000..6648e3c
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-2-snapshot-flow-light.svg
@@ -0,0 +1,2 @@
+
+FIGURE 2Snapshot job flowAPI RequestPOST /v1/snapshotsJobService creates Job recordenqueue to ArqReturn Job ID immediatelyArq worker picks upRead PV addresses from DBuse_cache?yesnoRedis< 5 sEPICS direct30 – 60 sBulkInsertService (COPY)insert SnapshotValues to DBMark Job as COMPLETED
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-3-pv-fanout-dark.svg b/docs/assets/figure-svg-files/figure-3-pv-fanout-dark.svg
new file mode 100644
index 0000000..6dc07de
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-3-pv-fanout-dark.svg
@@ -0,0 +1,2 @@
+
+FIGURE 3PV subscription fanoutClient ASubscribed toPV1PV2PV3Client BSubscribed toPV2PV4Subscription RegistryRedis Set per PVDiffStreamManagerper API instancePV2 changesBatch updates100 ms windowClient AreceivesPV2Client BreceivesPV2
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-3-pv-fanout-light.svg b/docs/assets/figure-svg-files/figure-3-pv-fanout-light.svg
new file mode 100644
index 0000000..325897c
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-3-pv-fanout-light.svg
@@ -0,0 +1,2 @@
+
+FIGURE 3PV subscription fanoutClient ASubscribed toPV1PV2PV3Client BSubscribed toPV2PV4Subscription RegistryRedis Set per PVDiffStreamManagerper API instancePV2 changesBatch updates100 ms windowClient AreceivesPV2Client BreceivesPV2
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-4-monitor-startup-dark.svg b/docs/assets/figure-svg-files/figure-4-monitor-startup-dark.svg
new file mode 100644
index 0000000..6708ffd
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-4-monitor-startup-dark.svg
@@ -0,0 +1,2 @@
+
+FIGURE 4PV Monitor · process startupPV Monitor process startupAcquire Leader LockRedis SETNXLoad PV addressesfrom PostgreSQLBatched PV init500 / batch · 100 msaioca monitorbatch 1aioca monitorbatch 2aioca monitorbatch NRedis Cache• Hash: pv:values• Pub/Sub: updatesRedis pub/subAPI InstancesDiffStreamManagerSubscription RegistryRedis-basedWebSocket Clients100 ms batching
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-4-monitor-startup-light.svg b/docs/assets/figure-svg-files/figure-4-monitor-startup-light.svg
new file mode 100644
index 0000000..a485590
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-4-monitor-startup-light.svg
@@ -0,0 +1,2 @@
+
+FIGURE 4PV Monitor · process startupPV Monitor process startupAcquire Leader LockRedis SETNXLoad PV addressesfrom PostgreSQLBatched PV init500 / batch · 100 msaioca monitorbatch 1aioca monitorbatch 2aioca monitorbatch NRedis Cache• Hash: pv:values• Pub/Sub: updatesRedis pub/subAPI InstancesDiffStreamManagerSubscription RegistryRedis-basedWebSocket Clients100 ms batching
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-5-polling-dark.svg b/docs/assets/figure-svg-files/figure-5-polling-dark.svg
new file mode 100644
index 0000000..4d897b6
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-5-polling-dark.svg
@@ -0,0 +1,2 @@
+
+FIGURE 5Async job status pollingAPI RequestCreate Jobstatus = PENDINGEnqueue TaskArq · RedisReturn Job IDclient pollsGET /jobs/{id}Job Status Response{"status":"IN_PROGRESS","progress":45,"data":{...}}
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-5-polling-light.svg b/docs/assets/figure-svg-files/figure-5-polling-light.svg
new file mode 100644
index 0000000..d974ccd
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-5-polling-light.svg
@@ -0,0 +1,2 @@
+
+FIGURE 5Async job status pollingAPI RequestCreate Jobstatus = PENDINGEnqueue TaskArq · RedisReturn Job IDclient pollsGET /jobs/{id}Job Status Response{"status":"IN_PROGRESS","progress":45,"data":{...}}
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-6-circuit-breaker-dark.svg b/docs/assets/figure-svg-files/figure-6-circuit-breaker-dark.svg
new file mode 100644
index 0000000..9d236bc
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-6-circuit-breaker-dark.svg
@@ -0,0 +1,2 @@
+
+FIGURE 6Circuit breaker request pathEPICS Requestcaget · caputCircuit Breaker Checkkeyed by IOC prefixOPENCLOSED · HALF-OPENFail fastreject immediatelyExecute requestforward to EPICS IOCSuccessFailureReset countHALF-OPEN → CLOSEDIncrement countthreshold → OPENResponse
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-6-circuit-breaker-light.svg b/docs/assets/figure-svg-files/figure-6-circuit-breaker-light.svg
new file mode 100644
index 0000000..6c51360
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-6-circuit-breaker-light.svg
@@ -0,0 +1,2 @@
+
+FIGURE 6Circuit breaker request pathEPICS Requestcaget · caputCircuit Breaker Checkkeyed by IOC prefixOPENCLOSED · HALF-OPENFail fastreject immediatelyExecute requestforward to EPICS IOCSuccessFailureReset countHALF-OPEN → CLOSEDIncrement countthreshold → OPENResponse
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-7-data-model-dark.svg b/docs/assets/figure-svg-files/figure-7-data-model-dark.svg
new file mode 100644
index 0000000..c1ff419
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-7-data-model-dark.svg
@@ -0,0 +1,2 @@
+
+FIGURE 7Data model1:nn:m1:nPVsetpoint_addressreadback_addressconfig_addressdevicedescriptionabs_tolerancerel_toleranceTagGroupnamedescriptionTagnametag_group_idSnapshottitlecommentcreated_bySnapshotValuepv_namesetpoint_valuereadback_valuestatusseverityJobtype (enum)status (enum)progress (0–100)data (JSONB)result_idretry_countRELATIONSHIPSone-to-manymany-to-many (junction implied)
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-7-data-model-light.svg b/docs/assets/figure-svg-files/figure-7-data-model-light.svg
new file mode 100644
index 0000000..dfaca50
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-7-data-model-light.svg
@@ -0,0 +1,2 @@
+
+FIGURE 7Data model1:nn:m1:nPVsetpoint_addressreadback_addressconfig_addressdevicedescriptionabs_tolerancerel_toleranceTagGroupnamedescriptionTagnametag_group_idSnapshottitlecommentcreated_bySnapshotValuepv_namesetpoint_valuereadback_valuestatusseverityJobtype (enum)status (enum)progress (0–100)data (JSONB)result_idretry_countRELATIONSHIPSone-to-manymany-to-many (junction implied)
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-8-restore-dark.svg b/docs/assets/figure-svg-files/figure-8-restore-dark.svg
new file mode 100644
index 0000000..ca05cb3
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-8-restore-dark.svg
@@ -0,0 +1,2 @@
+
+FIGURE 8Snapshot restore flowAPI RequestPOST /v1/snapshots/{id}/restoreLoad snapshot from DBfetch SnapshotValues by snapshot_idCreate Job recordtype = restoreenqueue to ArqReturn Job ID immediatelyclient polls GET /jobs/{id}Arq worker picks upParallel EPICS writeschunked · 1000 PVs per batchCircuit breaker per IOCfail-fast on unresponsive IOC · see Figure 6Mark Job as COMPLETEDwith success / failure counts
\ No newline at end of file
diff --git a/docs/assets/figure-svg-files/figure-8-restore-light.svg b/docs/assets/figure-svg-files/figure-8-restore-light.svg
new file mode 100644
index 0000000..8b82049
--- /dev/null
+++ b/docs/assets/figure-svg-files/figure-8-restore-light.svg
@@ -0,0 +1,2 @@
+
+FIGURE 8Snapshot restore flowAPI RequestPOST /v1/snapshots/{id}/restoreLoad snapshot from DBfetch SnapshotValues by snapshot_idCreate Job recordtype = restoreenqueue to ArqReturn Job ID immediatelyclient polls GET /jobs/{id}Arq worker picks upParallel EPICS writeschunked · 1000 PVs per batchCircuit breaker per IOCfail-fast on unresponsive IOC · see Figure 6Mark Job as COMPLETEDwith success / failure counts
\ No newline at end of file
diff --git a/docs/development/index.md b/docs/development/index.md
index d3b76ad..586b99d 100644
--- a/docs/development/index.md
+++ b/docs/development/index.md
@@ -2,89 +2,8 @@
 
 This guide covers setting up a development environment and contributing to Squirrel Backend.
 
-## Project Structure
-
-```
-squirrel-backend/
-├── app/
-│   ├── main.py              # API entry point
-│   ├── monitor_main.py      # PV Monitor entry point
-│   ├── worker.py            # Arq worker configuration
-│   ├── config.py            # Configuration settings
-│   ├── api/v1/              # API endpoints
-│   ├── models/              # SQLAlchemy models
-│   ├── schemas/             # Pydantic schemas (DTOs)
-│   ├── services/            # Business logic layer
-│   ├── repositories/        # Data access layer
-│   ├── tasks/               # Arq task definitions
-│   └── db/                  # Database session management
-├── alembic/                 # Database migrations
-├── tests/                   # Test suite
-├── docker/                  # Docker configuration
-└── scripts/                 # Utility scripts
-```
-
-## Setting Up Development Environment
-
-### 1. Start Infrastructure
-
-```bash
-cd docker
-docker compose up -d db redis
-```
-
-### 2. Set Up Python Environment
-
-```bash
-cd ..
-python -m venv venv
-source venv/bin/activate  # On Windows: venv\Scripts\activate
-pip install -e ".[dev]"
-```
-
-Or use the setup script:
-
-```bash
-./setup.sh
-```
-
-### 3. Configure Environment
-
-```bash
-cp .env.example .env
-# Edit .env if needed (defaults work with docker compose)
-```
-
-### 4. Run Migrations
-
-```bash
-alembic upgrade head
-```
-
-### 5. Load Test Data
-
-```bash
-python -m scripts.seed_pvs --count 100
-```
-
-### 6. Start Services
-
-Run each in a separate terminal:
-
-=== "API Server"
-    ```bash
-    uvicorn app.main:app --reload --port 8000
-    ```
-
-=== "PV Monitor"
-    ```bash
-    python -m app.monitor_main
-    ```
-
-=== "Worker"
-    ```bash
-    arq app.worker.WorkerSettings
-    ```
+For the project layout, see [Architecture › Directory Structure](../architecture/index.md#directory-structure).
+For detailed local-install steps, see [Installation › Local Development](../getting-started/installation.md#option-2-local-development).
 
 ## Development Workflow
 
@@ -116,86 +35,9 @@ export SQUIRREL_DEBUG=true
 uvicorn app.main:app --reload --port 8000
 ```
 
-## Performance Benchmarking
-
-```bash
-# Start the backend first, then run:
-python -m scripts.benchmark
-
-# With more iterations
-python -m scripts.benchmark --iterations 10
-
-# Skip restore benchmark (no EPICS writes)
-python -m scripts.benchmark --skip-restore
-```
-
 ## Utility Scripts
 
-### seed_pvs.py
-
-Generate test PV data:
-
-```bash
-# Create 1000 test PVs with tags
-python -m scripts.seed_pvs --count 1000
-
-# Create 50K PVs for performance testing
-python -m scripts.seed_pvs --count 50000 --batch-size 5000
-
-# Clear existing data first
-python -m scripts.seed_pvs --count 1000 --clear
-```
-
-### upload_csv.py
-
-Import PVs from CSV:
-
-```bash
-# Dry run
-python -m scripts.upload_csv your_pvs.csv --dry-run
-
-# Full upload
-python -m scripts.upload_csv your_pvs.csv
-```
-
-### benchmark.py
-
-Performance testing:
-
-```bash
-python -m scripts.benchmark --iterations 5
-```
-
-### create_key.py
-
-Create a new API key. At least one of `--read` / `--write` is required:
-
-```bash
-python -m scripts.create_key --read
-python -m scripts.create_key --read --write
-```
-
-Output includes the app name, key ID, access level, creation timestamp, and the token. The token is only shown at creation time — store it securely.
-
-### list_keys.py
-
-List all API keys as a formatted table:
-
-```bash
-# All keys
-python -m scripts.list_keys
-
-# Active keys only
-python -m scripts.list_keys --active-only
-```
-
-### deactivate_key.py
-
-Deactivate an existing key by its ID. The key is retained in the database but can no longer authenticate requests:
-
-```bash
-python -m scripts.deactivate_key
-```
+The `scripts/` directory contains management commands for API keys. See [API Key Management](../getting-started/api-keys.md) for full usage of `create_key.py`, `list_keys.py`, and `deactivate_key.py`.
 
 ## IDE Setup
 
@@ -244,6 +86,6 @@ pre-commit run --all-files
 
 ## Next Steps
 
-- [Testing](testing.md) - Running and writing tests
-- [Database Migrations](migrations.md) - Managing schema changes
-- [Code Quality](code-quality.md) - Linting and formatting
+- [Testing](testing.md) — running and writing tests
+- [Database Migrations](migrations.md) — managing schema changes
+- [Code Quality](code-quality.md) — linting and formatting
diff --git a/docs/development/migrations.md b/docs/development/migrations.md
index 3eff2cb..f0e257c 100644
--- a/docs/development/migrations.md
+++ b/docs/development/migrations.md
@@ -209,9 +209,6 @@ docker exec -it squirrel-db createdb -U squirrel squirrel
 
 # Re-run all migrations
 alembic upgrade head
-
-# Re-seed data
-python -m scripts.seed_pvs --count 100
 ```
 
 ## Alembic Configuration
diff --git a/docs/getting-started/configuration.md b/docs/getting-started/configuration.md
index 9ef73e2..51afe73 100644
--- a/docs/getting-started/configuration.md
+++ b/docs/getting-started/configuration.md
@@ -1,8 +1,8 @@
 # Configuration
 
-All configuration is via environment variables with the `SQUIRREL_` prefix.
+Application settings use the `SQUIRREL_` prefix and are defined in [`app/config.py`](https://github.com/slaclab/react-squirrel-backend/blob/main/app/config.py). EPICS networking is controlled by the standard EPICS environment variables (no `SQUIRREL_` prefix) and is consumed by `aioca` / `p4p` directly.
 
-## Environment Variables
+## Application Environment Variables
 
 ### Database
 
@@ -16,23 +16,40 @@
 | Variable | Default | Description |
 |----------|---------|-------------|
-| `SQUIRREL_REDIS_URL` | `redis://localhost:6379/0` | Redis connection string |
+| `SQUIRREL_REDIS_URL` | `redis://:squirrel@localhost:6379/0` | Redis connection string (includes `REDIS_PASSWORD` by default) |
+| `SQUIRREL_REDIS_USERNAME` | (empty) | Redis authentication username |
+| `SQUIRREL_REDIS_PASSWORD` | `squirrel` | Redis authentication password |
 | `SQUIRREL_REDIS_PV_CACHE_TTL` | `60` | PV cache TTL in seconds |
 
-### EPICS
+### EPICS (application-level)
 
 | Variable | Default | Description |
 |----------|---------|-------------|
-| `SQUIRREL_EPICS_CA_ADDR_LIST` | (empty) | EPICS CA address list |
-| `SQUIRREL_EPICS_CA_TIMEOUT` | `10.0` | Operation timeout in seconds |
+| `SQUIRREL_EPICS_CA_TIMEOUT` | `10.0` | Channel Access read timeout in seconds |
+| `SQUIRREL_EPICS_CA_CONN_TIMEOUT` | `5.0` | Channel Access connection timeout in seconds |
+| `SQUIRREL_EPICS_PVA_TIMEOUT` | `10.0` | PVAccess protocol timeout in seconds |
+| `SQUIRREL_EPICS_UNPREFIXED_PVA_FALLBACK` | `false` | If true, unprefixed PVs try CA then PVA on failure |
 | `SQUIRREL_EPICS_CHUNK_SIZE` | `1000` | PVs per batch in parallel operations |
 
+### EPICS networking (library-level, no `SQUIRREL_` prefix)
+
+These are standard EPICS environment variables, read by `aioca` and `p4p`. Docker Compose passes them through from `docker/.env` — see [Docker-Specific Configuration](#docker-specific-configuration) below.
+
+| Variable | Purpose |
+|----------|---------|
+| `EPICS_CA_ADDR_LIST` | Space-separated list of Channel Access server addresses |
+| `EPICS_CA_AUTO_ADDR_LIST` | `YES` to broadcast-discover servers on the local subnet |
+| `EPICS_CA_SERVER_PORT`, `EPICS_CA_REPEATER_PORT` | CA ports |
+| `EPICS_PVA_ADDR_LIST`, `EPICS_PVA_AUTO_ADDR_LIST` | PVAccess equivalents |
+| `EPICS_PVA_SERVER_PORT`, `EPICS_PVA_BROADCAST_PORT` | PVAccess ports |
+
 ### PV Monitor
 
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `SQUIRREL_PV_MONITOR_BATCH_SIZE` | `500` | PVs per subscription batch |
 | `SQUIRREL_PV_MONITOR_BATCH_DELAY_MS` | `100` | Delay between batches in ms |
+| `SQUIRREL_PV_MONITOR_HEARTBEAT_INTERVAL` | `1.0` | Heartbeat update interval in seconds |
 
 ### Watchdog
 
@@ -41,6 +58,7 @@ All configuration is via environment variables with the `SQUIRREL_` prefix.
 | `SQUIRREL_WATCHDOG_ENABLED` | `true` | Enable health monitoring |
 | `SQUIRREL_WATCHDOG_CHECK_INTERVAL` | `60.0` | Check interval in seconds |
 | `SQUIRREL_WATCHDOG_STALE_THRESHOLD` | `300.0` | Stale data threshold in seconds |
+| `SQUIRREL_WATCHDOG_RECONNECT_TIMEOUT` | `2.0` | Timeout for reconnection attempts in seconds |
 
 ### WebSocket
 
@@ -48,17 +66,17 @@
 |----------|---------|-------------|
 | `SQUIRREL_WEBSOCKET_BATCH_INTERVAL_MS` | `100` | Batch interval for updates in ms |
 
-### Legacy Mode
+### Debug
 
 | Variable | Default | Description |
 |----------|---------|-------------|
-| `SQUIRREL_EMBEDDED_MONITOR` | `false` | Run monitor in API process |
+| `SQUIRREL_DEBUG` | `false` | Enable debug logging |
 
-### Debug
+### Bulk Operations
 
 | Variable | Default | Description |
 |----------|---------|-------------|
-| `SQUIRREL_DEBUG` | `false` | Enable debug logging |
+| `SQUIRREL_BULK_INSERT_BATCH_SIZE` | `5000` | Batch size for PostgreSQL COPY bulk inserts |
 
 ## Example .env File
 
@@ -68,29 +86,38 @@ SQUIRREL_DATABASE_URL=postgresql+asyncpg://squirrel:squirrel@localhost:5432/squi
 SQUIRREL_DATABASE_POOL_SIZE=30
 SQUIRREL_DATABASE_MAX_OVERFLOW=20
 
-# EPICS
-SQUIRREL_EPICS_CA_ADDR_LIST=
+# EPICS (application-level)
 SQUIRREL_EPICS_CA_TIMEOUT=10.0
+SQUIRREL_EPICS_CA_CONN_TIMEOUT=5.0
+SQUIRREL_EPICS_PVA_TIMEOUT=10.0
 SQUIRREL_EPICS_CHUNK_SIZE=1000
 
+# EPICS networking (library-level — no SQUIRREL_ prefix)
+EPICS_CA_AUTO_ADDR_LIST=YES
+# EPICS_CA_ADDR_LIST="lcls-prod01:5068 lcls-prod01:5063"
+
 # Redis
 SQUIRREL_REDIS_URL=redis://localhost:6379/0
+SQUIRREL_REDIS_USERNAME=
+SQUIRREL_REDIS_PASSWORD=squirrel
 SQUIRREL_REDIS_PV_CACHE_TTL=60
 
 # PV Monitor
 SQUIRREL_PV_MONITOR_BATCH_SIZE=500
 SQUIRREL_PV_MONITOR_BATCH_DELAY_MS=100
+SQUIRREL_PV_MONITOR_HEARTBEAT_INTERVAL=1.0
 
 # Watchdog
 SQUIRREL_WATCHDOG_ENABLED=true
 SQUIRREL_WATCHDOG_CHECK_INTERVAL=60.0
 SQUIRREL_WATCHDOG_STALE_THRESHOLD=300.0
+SQUIRREL_WATCHDOG_RECONNECT_TIMEOUT=2.0
 
 # WebSocket
 SQUIRREL_WEBSOCKET_BATCH_INTERVAL_MS=100
 
-# Legacy Mode (embedded monitor in API process)
-SQUIRREL_EMBEDDED_MONITOR=false
+# Bulk Operations
+SQUIRREL_BULK_INSERT_BATCH_SIZE=5000
 ```
 
 ## Docker-Specific Configuration
diff --git a/docs/getting-started/index.md b/docs/getting-started/index.md
index 108b087..8efc748 100644
--- a/docs/getting-started/index.md
+++ b/docs/getting-started/index.md
@@ -36,7 +36,7 @@ docker compose up --build
 All endpoints require authentication. Create your first key using the management script:
 
 ```bash
-docker exec squirrel-api python scripts/create_key [--read] [--write]
+docker exec squirrel-api python -m scripts.create_key [--read] [--write]
 ```
 
 !!! warning "Save your token"
@@ -58,15 +58,6 @@ docker exec squirrel-api python scripts/create_key [--read] [--write]
 | `squirrel-monitor` | EPICS PV monitoring service |
 | `squirrel-worker-1` & `squirrel-worker-2` | Background job processors |
 
-### Load Test Data
-
-```bash
-# In a new terminal
-docker compose exec api python -m scripts.seed_pvs --count 100
-```
-
-This creates 100 test PVs with tags. Now you can test snapshots!
-
 ### View Logs
 
 ```bash
@@ -104,10 +95,7 @@ cd ..
 
 # 3. Run migrations
 alembic upgrade head
 
-# 4. Load test data
-python -m scripts.seed_pvs --count 100
-
-# 5. Start services (each in a separate terminal)
+# 4. Start services (each in a separate terminal)
 uvicorn app.main:app --reload --port 8000    # Terminal 1: API
 python -m app.monitor_main                   # Terminal 2: Monitor
 arq app.worker.WorkerSettings                # Terminal 3: Worker
@@ -141,10 +129,7 @@ docker exec -it squirrel-api alembic upgrade head
 
 ### Snapshots are empty
 
-This is normal! Test PVs don't exist on a real EPICS network. To test with real data:
-
-1. Upload real PV addresses via CSV: `python -m scripts.upload_csv your_pvs.csv`
-2. Make sure your EPICS network is accessible from Docker
+This is normal when no PVs exist on your EPICS network. Make sure your EPICS network is reachable from Docker and that PVs have been loaded via the UI's "Import PVs" flow.
 
 ### Worker not running
 
diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md
index 26bc089..0c82afb 100644
--- a/docs/getting-started/installation.md
+++ b/docs/getting-started/installation.md
@@ -9,17 +9,17 @@ The easiest way to get started with the full distributed architecture:
 
 ```bash
 # Clone the repository
 git clone https://github.com/slaclab/react-squirrel-backend.git
-cd react-squirrel-backend
+cd react-squirrel-backend/docker
 
-# Start the full stack
-cd docker
-docker-compose up -d --build
+# Configure environment (EPICS network, Redis password, etc.)
+cp .env.example .env
+# Edit .env if you need to reach EPICS servers outside localhost
 
-# Configure the database
-docker exec squirrel-api alembic upgrade head
+# Start the full stack (migrations run automatically via entrypoint.sh)
+docker compose up -d --build
 
 # Create an API key (required to use the API)
-docker exec squirrel-api python script/create_key.py [--read] [--write]
+docker exec squirrel-api python -m scripts.create_key [--read] [--write]
 ```
 
 !!! warning "Save your token"
@@ -60,24 +60,7 @@ docker compose down
 docker compose down -v
 ```
 
-## Option 2: Legacy Mode (Single Process)
-
-For simpler deployments with embedded PV monitoring:
-
-```bash
-cd docker
-docker compose --profile legacy up backend db redis
-```
-
-This runs the API with embedded PV monitor on port `8001`.
-
-!!! warning "Workers still required"
-    Workers are still required for snapshot creation. Start them separately:
-    ```bash
-    docker compose up -d worker
-    ```
-
-## Option 3: Local Development
+## Option 2: Local Development
 
 Run infrastructure in Docker, services locally for faster development:
 
@@ -116,13 +99,7 @@ cp .env.example .env
 alembic upgrade head
 ```
 
-### 5. (Optional) Load test data
-
-```bash
-python -m scripts.seed_pvs --count 100
-```
-
-### 6. Start services
+### 5. Start services
 
 In separate terminals:
 
@@ -148,11 +125,9 @@ In separate terminals:
 
 - **Monitor**: Maintains Redis cache of live PV values
 - **Worker**: Processes background jobs (snapshot creation/restore)
 
-## Loading Data
+## Loading PVs from CSV
 
-### Upload PVs from CSV
-
-The expected format:
+The expected CSV format:
 
 ```csv
 Setpoint,Readback,Region,Area,Subsystem
 FBCK:LNG6:1:BC2ELTOL,,"Feedback-All","LIMITS","FBCK"
 QUAD:LI21:201:BDES,QUAD:LI21:201:BACT,"Cu Linac","LI21","Magnet"
 ```
 
-#### Using the UI
+Upload through the UI:
 
 1. Navigate to the "Browse PVs" page
 2. Click the "Import PVs" button
 3. Select the consolidated CSV
 
-#### Using a bash script
-
-```bash
-# Copy script and data into docker service
-docker cp /path/to/local/upload_csv.py squirrel-api:/tmp/
-docker cp /path/to/local/consolidated.csv squirrel-api:/tmp/
-
-# Dry run (see what would be uploaded)
-docker exec squirrel-api python /tmp/upload_csv.py /tmp/consolidated.csv --dry-run
-
-# Full upload (~36K PVs)
-docker exec squirrel-api python /tmp/upload_csv.py /tmp/consolidated.csv
-
-# With custom batch size
-docker exec squirrel-api python /tmp/upload_csv.py /tmp/consolidated.csv --batch-size 1000
-```
-
-### Seed Test Data
-
-For development/testing with sample data:
-
-```bash
-# Create 1000 test PVs with tags
-python -m scripts.seed_pvs --count 1000
-
-# Create 50K PVs for performance testing
-python -m scripts.seed_pvs --count 50000 --batch-size 5000
-
-# Clear existing data first
-python -m scripts.seed_pvs --count 1000 --clear
-```
-
 ## Docker Commands Reference
 
 ```bash
@@ -232,6 +175,4 @@ docker exec -it squirrel-db psql -U squirrel
 
 # Run migrations in Docker
 docker exec -it squirrel-api alembic upgrade head
 
-# Load test data in Docker
-docker compose exec api python -m scripts.seed_pvs --count 100
 ```
diff --git a/docs/index.md b/docs/index.md
index cb82789..2c74c42 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -5,7 +5,7 @@ High-performance Python FastAPI backend for EPICS control system snapshot/restor
 
 ## Features
 
 - **Distributed Architecture** - Separate processes for API, PV monitoring, and background tasks
-- **Fast Snapshot Creation** - Parallel EPICS reads or instant Redis cache reads (<5s for 40K PVs)
+- **Fast Snapshot Creation** - Instant Redis cache reads (<5s for 40K PVs)
 - **Efficient Restore Operations** - Parallel EPICS writes for quick machine state restoration
 - **Real-Time Updates** - WebSocket streaming with diff-based updates and multi-instance support
 - **Tag-based Organization** - Group and categorize PVs using hierarchical tags
@@ -25,34 +25,14 @@ High-performance Python FastAPI backend for EPICS control system snapshot/restor
 | ORM | SQLAlchemy 2.0 (async) |
 | Cache/Queue | Redis 7+ |
 | Task Queue | Arq |
-| EPICS | aioca (async Channel Access) |
+| EPICS | aioca (async Channel Access), p4p (PVAccess) |
 | Migrations | Alembic |
 | Validation | Pydantic v2 |
 
 ## Architecture Overview
 
-```
-┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
-│   API Server    │  │   PV Monitor    │  │   Arq Worker    │
-│ (squirrel-api)  │  │(squirrel-monitor)│ │(squirrel-worker)│
-│ REST/WebSocket  │  │ EPICS → Redis   │  │ Snapshot jobs   │
-└────────┬────────┘  └────────┬────────┘  └────────┬────────┘
-         │                    │                    │
-         └───────────────────────┼───────────────────────┘
-                              │
-                 ┌────────────┴────────────┐
-                 ▼                         ▼
-          ┌─────────────┐          ┌─────────────┐
-          │    Redis    │          │ PostgreSQL  │
-          │ Cache/Queue │          │   Storage   │
-          └─────────────┘          └─────────────┘
-                 │
-                 ▼
-          ┌─────────────┐
-          │ EPICS IOCs  │
-          │  40-50K PVs │
-          └─────────────┘
-```
+![System architecture](assets/figure-1-system-architecture-light.png#only-light)
+![System architecture](assets/figure-1-system-architecture-dark.png#only-dark)
 
 ## Quick Links
diff --git a/setup.sh b/setup.sh
index 346d064..21763c4 100755
--- a/setup.sh
+++ b/setup.sh
@@ -67,10 +67,7 @@ echo ""
 echo "  2. Run database migrations:"
 echo "     alembic upgrade head"
 echo ""
-echo "  3. (Optional) Load sample data:"
-echo "     python -m scripts.seed_pvs --count 100"
-echo ""
-echo "  4. Start services (in separate terminals):"
+echo "  3. Start services (in separate terminals):"
 echo "     uvicorn app.main:app --reload --port 8000  # API"
 echo "     python -m app.monitor_main                 # Monitor"
 echo "     arq app.worker.WorkerSettings              # Worker"