From f3c2d63c8bfe2b267a2029b05556314e5110da71 Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 12:35:10 +0800 Subject: [PATCH 01/21] docs: rework README with value-first intro, grouped features and badges --- README.md | 321 +++++++++++------------------------------------- README.zh-CN.md | 320 +++++++++++------------------------------------ 2 files changed, 142 insertions(+), 499 deletions(-) diff --git a/README.md b/README.md index 3e9f8631..9cc76497 100644 --- a/README.md +++ b/README.md @@ -1,317 +1,138 @@ -# ServerBee - -Language: English | [简体中文](./README.zh-CN.md) - -A lightweight, self-hosted VPS monitoring system built with Rust and React. - -## Features - -- **Real-time Dashboard** -- Server status, CPU/memory/disk/network metrics with live WebSocket updates -- **Server Card Ring Grid** -- Four ring charts per server (CPU, Memory, Disk, Traffic quota) plus inline disk I/O throughput, load trend, and billing-cycle "days remaining" hint -- **Server Groups** -- Organize servers by group with country flag display -- **iOS Mobile Companion** -- Native iOS app with QR pairing, push notifications, and real-time metrics -- **Detailed Metrics** -- Real-time streaming charts + historical views (1h/6h/24h/7d/30d) for CPU, memory, disk, network, load, temperature, GPU, disk I/O -- **Alert System** -- 14+ metric types, threshold/offline/traffic/expiration rules, AND logic, 70% sampling -- **Notifications** -- Webhook, Telegram, Bark, Email (via Resend) channels with notification groups -- **Network Quality Monitoring** -- Multi-target network probing (96 preset China 3-ISP + international nodes), real-time/historical latency charts, anomaly detection, per-server target assignment -- **Ping Monitoring** -- ICMP, TCP, HTTP probes with latency charts and success rate -- **Safer Agent Registration** -- One-time, short-lived enrollment codes minted by an admin (single-use, default 10 min TTL), stable machine fingerprints reuse 
existing server records, `auth.max_servers` soft-caps enrollment growth, run tokens can be rotated/revoked, and unconnected placeholders can be cleaned up -- **Web Terminal** -- Browser-based PTY terminal via WebSocket proxy -- **GPU Monitoring** -- NVIDIA GPU usage/temperature/memory (via nvml-wrapper, feature-gated) -- **Disk I/O Monitoring** -- Per-disk read/write throughput charts with merged and per-disk views. Linux via `/proc/diskstats`, macOS/Windows via sysinfo -- **GeoIP** -- Automatic region/country detection from agent IP with in-app database download/update -- **Custom Dashboard** -- Drag-and-drop dashboard with 13 widget types, multiple dashboards, editor mode -- **Custom Themes** -- Preset themes plus admin-built custom themes with OKLCH light/dark variables, applied across the dashboard and public status pages -- **OAuth & 2FA** -- GitHub/Google/OIDC login, TOTP two-factor authentication -- **Multi-user** -- Admin/Member roles, audit logging, rate limiting -- **File Management** -- Remote file browser with Monaco Editor, upload/download with progress, path sandbox security (`root_paths` + `deny_patterns`) -- **Docker Monitoring** -- Real-time Docker container monitoring with stats (CPU/memory/network/block I/O), container log streaming (stdout/stderr color-coded), events timeline, networks and volumes overview -- **Security Events** -- Per-agent detection of SSH logins, SSH brute-force attempts, and port scans, with severity grading, an aggregated event timeline, and alert integration -- **Firewall Blocklist** -- Block inbound traffic from IPs and CIDRs across one or more agents via nftables, with one-click blocking, preset rules, and an activity log -- **IP Quality** -- Per-agent egress IP assessment: streaming/AI/social service unlock detection (Netflix, Disney+, YouTube Premium, Amazon Prime, HBO Max, ChatGPT, Gemini, Spotify, TikTok), GeoIP metadata, and optional third-party fraud risk scoring (Scamalytics, IPQualityScore, proxycheck.io, 
AbuseIPDB). Custom services, status-page exposure with IP masking for guests, and a dual opt-in capability gate -- **Capability Toggles** -- Per-server feature controls (terminal, exec, upgrade, ping, file manager, Docker, security events, firewall, ip quality) with defense-in-depth enforcement -- **Uptime Timeline** -- 90-day uptime visualization with per-day color-coded bars on server detail, public status pages, and customizable dashboard widgets -- **Public Status Page** -- Unauthenticated status page with server groups, live metrics, and 90-day uptime timelines with configurable thresholds -- **Monthly Traffic Statistics** -- Billing cycle-aware traffic tracking with daily/hourly breakdowns, usage progress bars, and end-of-cycle prediction -- **Service Monitors (SSL/WHOIS/HTTP/Ping/TCP)** -- Scheduled service checks with normalized WHOIS hostnames, unsupported-TLD hints (e.g. `.app` / `.dev` → use SSL monitor), and reliable edit-form prefill -- **Billing Tracking** -- Price, billing cycle, expiration alerts, traffic limits per server -- **Backup & Restore** -- SQLite database backup/restore via admin API -- **Agent Auto-update** -- Remote binary upgrade with SHA-256 verification -- **Guided Deployment Management** -- `serverbee` CLI installs, upgrades, inspects, reconfigures, and uninstalls server and agent deployments in interactive or unattended mode -- **OpenAPI Documentation** -- Swagger UI at `/swagger-ui` with 50+ documented endpoints - -## Tech Stack - -| Component | Technology | -|-----------|-----------| -| Server | Rust, Axum 0.8, sea-orm, SQLite (WAL) | -| Agent | Rust, sysinfo 0.33, tokio-tungstenite | -| Frontend | React 19, Vite 7, TanStack Router/Query, Recharts, shadcn/ui, Tailwind CSS v4 | -| Auth | argon2 password hashing, session cookies, API keys, OAuth2, TOTP | -| Docs | Fumadocs MDX, TanStack Start, CN+EN bilingual | +
-## Quick Start +ServerBee logo -### Prerequisites +# ServerBee -- Rust 1.85+ (with cargo) -- Bun 1.x (for frontend build) +**Lightweight, self-hosted VPS monitoring — one Rust binary, real-time everything.** -### Build from Source +[![CI](https://github.com/ZingerLittleBee/ServerBee/actions/workflows/ci.yml/badge.svg)](https://github.com/ZingerLittleBee/ServerBee/actions/workflows/ci.yml) +[![Release](https://img.shields.io/github/v/release/ZingerLittleBee/ServerBee?include_prereleases&sort=semver)](https://github.com/ZingerLittleBee/ServerBee/releases) +[![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](LICENSE) +[![GitHub stars](https://img.shields.io/github/stars/ZingerLittleBee/ServerBee?style=flat)](https://github.com/ZingerLittleBee/ServerBee/stargazers) +[![Rust](https://img.shields.io/badge/Rust-2024-000000?logo=rust&logoColor=white)](https://www.rust-lang.org) +[![React](https://img.shields.io/badge/React-19-61DAFB?logo=react&logoColor=black)](https://react.dev) -```bash -# Clone -git clone https://github.com/ZingerLittleBee/ServerBee.git -cd ServerBee +English | [简体中文](./README.zh-CN.md) -# Build frontend -cd apps/web && bun install && bun run build && cd ../.. +
-# Build server and agent -cargo build --release +--- -# Binaries are at: -# target/release/serverbee-server -# target/release/serverbee-agent -``` +ServerBee watches all your servers from one place. A central **server** receives metrics from lightweight **agents** over WebSocket, stores them in embedded SQLite, and serves a real-time React dashboard — no external database, no heavy runtime. -### Run the Server +- 🪶 **Tiny footprint** — agents use ~5–15 MB RAM; the server handles 1000 nodes in ~50–100 MB. +- ⚡ **Real-time** — live WebSocket dashboard for CPU, memory, disk, network, load, temperature, GPU, and disk I/O. +- 📦 **Single binary** — server + embedded web UI in one file. Deploy with Docker, a one-line script, or Railway. +- 🔋 **Batteries included** — alerts, notifications, web terminal, file manager, Docker, firewall, status pages, and more. +- 🔒 **Secure by default** — OAuth + 2FA, RBAC, audit logs, one-time agent enrollment, per-server capability gates. -```bash -./serverbee-server -# Default: http://localhost:9527 -# Admin password is auto-generated and printed to startup log -``` +> [!NOTE] +> ServerBee is in active development (`v1.0.0-alpha`). Expect rapid iteration. -### Run the Agent +## Quick Start -First, sign in to the server web UI as an admin, open **Settings**, and generate a -one-time enrollment code (single-use, expires after ~10 minutes by default). +### 1. Install the server ```bash -# Set server URL and the one-time enrollment code via environment variables -SERVERBEE_SERVER_URL=http://your-server:9527 \ -SERVERBEE_ENROLLMENT_CODE=YOUR_ONE_TIME_CODE \ -./serverbee-agent - -# Or create /etc/serverbee/agent.toml: -# server_url = "http://your-server:9527" -# enrollment_code = "YOUR_ONE_TIME_CODE" +curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- server ``` -The enrollment code is consumed on the agent's first successful registration. 
After -that the agent saves its per-server token to config and reconnects automatically on -restart -- the code is no longer needed. To onboard another agent (or re-enroll one -that lost its token), mint a fresh code in Settings. +Open `http://your-server:9527`. The admin password is auto-generated and printed to the startup log — change it on first login. -### Docker +> Prefer Docker? Run `docker compose up -d`. Prefer the cloud? Use the [Railway one-click deploy](#railway-one-click) below. -```bash -docker compose up -d -``` +### 2. Enroll an agent -### Development (Make) +Sign in as admin → **Settings** → generate a one-time **enrollment code** (single-use, expires in ~10 min). Then on each node: ```bash -# Start server (port 9527) + Vite dev server (port 5173) concurrently -make dev-full -# Visit http://localhost:5173, login with admin / admin123 - -# Or step by step: -make server-dev # Terminal 1: server on :9527 -SERVERBEE_ENROLLMENT_CODE="" make agent-dev # Terminal 2: agent - -# Testing & code quality: -make cargo-test # Run all Rust tests -make test # Run frontend tests -make cargo-clippy # Lint Rust code -make # Interactive menu (requires fzf) +curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- agent \ + --server-url http://YOUR_SERVER:9527 --enrollment-code YOUR_ONE_TIME_CODE ``` -Manual browser verification checklists are indexed in `tests/README.md`. +The agent saves a per-server token on first connect and reconnects automatically afterwards — the code is only needed once. That's it. 🎉 -Generate a one-time enrollment code from the server web UI **Settings** page and pass it to the agent. The code is single-use and expires after ~10 minutes; mint a fresh one whenever you need to enroll another agent. +## Features -> **Note**: `make dev-full` starts a Vite dev server with HMR at `http://localhost:5173` (proxies `/api/*` to the Rust server at `:9527`). 
For production builds, use `make build` then `make server-run`. +| | | +|---|---| +| **📊 Monitoring** | Real-time metrics (CPU/mem/disk/network/load/temp/GPU/disk I/O) · historical charts (1h–30d) · Docker container stats, logs & events · monthly traffic statistics with billing-cycle prediction | +| **🔔 Alerts** | 14+ metric types · threshold / offline / traffic / expiration rules · Webhook, Telegram, Bark & Email channels with notification groups | +| **🌐 Network** | Ping monitoring (ICMP/TCP/HTTP) · network-quality probing (96 China 3-ISP + international presets) · service monitors (SSL/WHOIS/HTTP/Ping/TCP) · IP-quality & streaming-unlock detection with fraud scoring | +| **🛠️ Remote management** | Browser web terminal (PTY over WS) · sandboxed file manager with Monaco editor · firewall blocklist via nftables · per-server capability toggles · agent auto-update | +| **🔐 Security & access** | SSH login / brute-force / port-scan detection · OAuth (GitHub/Google/OIDC) + TOTP 2FA · Admin/Member RBAC · audit logs · one-time agent enrollment codes | +| **🖥️ Dashboards & sharing** | Drag-and-drop custom dashboards (13 widget types) · public status pages with 90-day uptime timelines · custom OKLCH themes · server groups with country flags · native iOS companion app | +| **⚙️ Ops** | `serverbee` management CLI · backup & restore · GeoIP region detection · OpenAPI/Swagger docs (50+ endpoints) | ## Configuration -All config options can be set via TOML files or environment variables with `SERVERBEE_` prefix and `__` (double underscore) as nested separator. See [ENV.md](ENV.md) for the complete environment variable reference. - -### Server (`/etc/serverbee/server.toml`) +Configure via TOML files or `SERVERBEE_`-prefixed environment variables (`__` is the nested separator, e.g. `SERVERBEE_AUTH__MAX_SERVERS`). 
The minimum to get going: ```toml +# /etc/serverbee/server.toml [server] listen = "0.0.0.0:9527" data_dir = "/var/lib/serverbee" -trusted_proxies = [] # Defaults to private/loopback CIDRs; set to [] to disable - -[database] -path = "serverbee.db" -max_connections = 10 - -[auth] -session_ttl = 86400 # 24 hours -secure_cookie = true # Set false for HTTP-only dev -max_servers = 0 # Soft limit for newly enrolled servers [admin] -username = "admin" -password = "" # Leave empty to auto-generate - -[rate_limit] -login_max = 5 # Max login attempts per 15min window -register_max = 3 # Max agent registrations per 15min window - -[retention] -records_days = 7 # Raw metrics retention -records_hourly_days = 90 # Hourly aggregates retention -audit_logs_days = 180 # Audit log retention -network_probe_days = 7 # Network probe raw records retention -network_probe_hourly_days = 90 # Network probe hourly aggregates retention -traffic_hourly_days = 7 # Traffic hourly records retention -traffic_daily_days = 400 # Traffic daily records retention - -[scheduler] -timezone = "UTC" # Timezone for daily traffic aggregation (e.g. Asia/Shanghai) - -[geoip] -mmdb_path = "/var/lib/serverbee/GeoLite2-City.mmdb" # Non-empty path enables GeoIP - -[upgrade] -release_base_url = "https://github.com/ZingerLittleBee/ServerBee/releases" +password = "" # leave empty to auto-generate ``` -Environment variable examples: -```bash -export SERVERBEE_AUTH__MAX_SERVERS="50" -export SERVERBEE_GEOIP__MMDB_PATH="/path/to/GeoLite2-City.mmdb" -export SERVERBEE_OAUTH__GITHUB__CLIENT_ID="..." 
-``` - -### Agent (`/etc/serverbee/agent.toml`) - ```toml +# /etc/serverbee/agent.toml server_url = "http://your-server:9527" -token = "" # Auto-populated after registration -enrollment_code = "" # One-time code from Settings; used only for first registration +enrollment_code = "" # one-time code from Settings; only used for first registration [collector] -interval = 3 # Seconds between metric reports -enable_temperature = true -enable_gpu = false # Requires NVIDIA GPU + nvml - -[log] -level = "info" -``` - -Agent environment variables use the `SERVERBEE_` prefix without nesting (top-level keys): -```bash -export SERVERBEE_SERVER_URL="http://your-server:9527" -export SERVERBEE_ENROLLMENT_CODE="YOUR_ONE_TIME_CODE" +interval = 3 # seconds between reports ``` -### OAuth Setup - -```toml -[oauth] -base_url = "https://monitor.example.com" -allow_registration = false # Auto-create users on first OAuth login - -[oauth.github] -client_id = "..." -client_secret = "..." - -[oauth.google] -client_id = "..." -client_secret = "..." -``` - -Callback URL format: `https://your-domain/api/auth/oauth/{provider}/callback` +📖 Full reference: **[ENV.md](ENV.md)** · OAuth, retention, rate limiting, GeoIP, and more in the [documentation](apps/docs). ## Deployment -### Railway (One-Click) +### Railway (one-click) [![Deploy on Railway](https://railway.com/button.svg)](https://railway.com/deploy/serverbee-server) -1. Click the button above and deploy -2. Add a volume mounted at `/data` to persist data across deploys -3. Configure your agents to connect to the Railway URL -4. On first start the server auto-creates an admin account with a randomly generated password. Check the Railway deploy logs for the highlighted credentials banner; you must change this password on first login and may optionally pick a new username. +Add a volume mounted at `/data` to persist data. The server auto-creates an admin account on first start — check the deploy logs for the credentials banner. 
-### Install Script +### Management CLI -Install via curl (one-liner): +The installer drops a `serverbee` CLI at `/usr/local/bin/serverbee`: ```bash -# Server -curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- server - -# Agent (replace with your server URL and a one-time enrollment code from Settings) -curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- agent \ - --server-url http://YOUR_SERVER:9527 --enrollment-code YOUR_ONE_TIME_CODE +sudo serverbee status # status of all components +sudo serverbee upgrade -y # upgrade to latest +sudo serverbee restart # restart services +sudo serverbee config # view / set config +sudo serverbee uninstall agent -y ``` -The installer automatically places a `serverbee` management CLI at `/usr/local/bin/serverbee`. +### Reverse proxy -> **Note**: Re-running `install agent` adopts an existing `/usr/local/bin/serverbee-agent` instead of replacing it. Use `sudo serverbee upgrade agent -y` (or replace the binary manually) when you need to refresh an existing installation. +Behind Nginx/Caddy, proxy `/` to `127.0.0.1:9527` and make sure the WebSocket routes `/api/ws/` and `/api/agent/ws` forward the `Upgrade`/`Connection` headers with a long read timeout. See the [deployment docs](apps/docs) for a ready-to-use Nginx config. 
-### Management +## Development ```bash -sudo serverbee status # View status of all components -sudo serverbee upgrade -y # Upgrade all to latest version -sudo serverbee restart # Restart all services -sudo serverbee config # View current config -sudo serverbee config set # Update config -sudo serverbee uninstall agent -y # Uninstall agent -sudo serverbee uninstall server --purge # Uninstall server + remove data -``` +git clone https://github.com/ZingerLittleBee/ServerBee.git +cd ServerBee -### Reverse Proxy (Nginx) - -```nginx -server { - listen 443 ssl; - server_name monitor.example.com; - - location / { - proxy_pass http://127.0.0.1:9527; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; - proxy_set_header X-Forwarded-Proto $scheme; - } - - # WebSocket (browser + agent + terminal) - location /api/ws/ { - proxy_pass http://127.0.0.1:9527; - proxy_http_version 1.1; - proxy_set_header Upgrade $http_upgrade; - proxy_set_header Connection "upgrade"; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_read_timeout 86400s; - proxy_send_timeout 86400s; - } - - location /api/agent/ws { - proxy_pass http://127.0.0.1:9527; - proxy_http_version 1.1; - proxy_set_header Upgrade $http_upgrade; - proxy_set_header Connection "upgrade"; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_read_timeout 86400s; - proxy_send_timeout 86400s; - } -} +make dev-full # server (:9527) + Vite dev server (:5173) — login admin / admin123 +make cargo-test # Rust tests +make test # frontend tests +make cargo-clippy # Rust lint ``` +> `make dev-full` runs Vite with HMR at `http://localhost:5173` and proxies `/api/*` to the Rust server at `:9527`. Generate a one-time enrollment code in **Settings** to connect a dev agent. 
+ +**Stack:** Rust (Axum 0.8 · sea-orm · SQLite WAL) · React 19 (Vite 7 · TanStack Router/Query · Recharts · shadcn/ui · Tailwind CSS v4) · Rust agents (sysinfo · tokio-tungstenite). + ## API -Interactive API documentation is available at `/swagger-ui` when the server is running. +Interactive OpenAPI docs are served at `/swagger-ui` while the server runs. ## License diff --git a/README.zh-CN.md b/README.zh-CN.md index d85583ba..98ad79f3 100644 --- a/README.zh-CN.md +++ b/README.zh-CN.md @@ -1,316 +1,138 @@ -# ServerBee - -语言: [English](./README.md) | 简体中文 - -轻量级、自托管的 VPS 监控探针系统,基于 Rust 和 React 构建。 - -## 功能特性 - -- **实时仪表盘** -- 服务器状态、CPU/内存/磁盘/网络指标,WebSocket 实时推送 -- **服务器卡片环形网格** -- 每张卡片展示 CPU、内存、磁盘、流量配额四个环形图表,并行显示磁盘 I/O 吞吐、负载趋势和计费周期剩余天数 -- **服务器分组** -- 按组管理服务器,显示国旗标识 -- **iOS 移动端** -- 原生 iOS 应用,支持二维码配对、推送通知和实时指标查看 -- **详细指标** -- 实时流式图表 + 历史视图 (1h/6h/24h/7d/30d),涵盖 CPU、内存、磁盘、网络、负载、温度、GPU、磁盘 I/O -- **告警系统** -- 14+ 指标类型,阈值/离线/流量/到期规则,AND 逻辑,70% 采样 -- **通知渠道** -- Webhook、Telegram、Bark、邮件 (通过 Resend),支持通知组 -- **网络质量监控** -- 多目标网络探测 (96 个预设中国三网 + 国际节点),实时/历史延迟图表,异常检测,每服务器独立目标配置 -- **Ping 探测** -- ICMP、TCP、HTTP 探测,延迟图表和成功率统计 -- **安全的 Agent 注册** -- 由管理员铸造的一次性、短时效 enrollment code(单次使用,默认 10 分钟过期),基于稳定机器指纹复用既有服务器记录,`auth.max_servers` 软限制注册规模,运行 token 可轮换/吊销,并可清理未连接占位服务器 -- **Web 终端** -- 基于 WebSocket 代理的浏览器 PTY 终端 -- **GPU 监控** -- NVIDIA GPU 使用率/温度/显存 (nvml-wrapper,可选功能) -- **磁盘 I/O 监控** -- 每块磁盘读写吞吐量图表,支持合并和分盘视图。Linux 通过 `/proc/diskstats`,macOS/Windows 通过 sysinfo -- **GeoIP** -- 根据 Agent IP 自动检测地区/国家,支持应用内下载/更新 GeoIP 数据库 -- **自定义仪表盘** -- 拖拽式仪表盘布局,13 种 widget 类型,多仪表盘切换,编辑模式 -- **自定义主题** -- 内置预设主题,管理员还可创建完整自定义主题(OKLCH 明暗变量),仪表盘与公共状态页统一生效 -- **OAuth & 2FA** -- GitHub/Google/OIDC 登录,TOTP 两步验证 -- **多用户** -- Admin/Member 角色,审计日志,速率限制 -- **文件管理** -- 远程文件浏览器,Monaco 编辑器,上传/下载带进度显示,路径沙箱安全机制 (`root_paths` + `deny_patterns`) -- **Docker 监控** -- 实时 Docker 容器监控,CPU/内存/网络/块 I/O 统计,容器日志流(stdout/stderr 彩色区分),事件时间线,网络和卷概览 -- **安全事件** -- 逐 Agent 检测 SSH 登录、SSH 暴力破解尝试和端口扫描,按严重性分级,提供聚合事件时间线并可联动告警 -- 
**防火墙封禁** -- 通过 nftables 在一个或多个 Agent 上封禁来自指定 IP/CIDR 的入站流量,支持一键封禁、预设规则和操作日志 -- **IP 质量检测** -- 逐 Agent 出口 IP 评估:检测流媒体/AI/社交服务解锁状态(Netflix、Disney+、YouTube Premium、Amazon Prime、HBO Max、ChatGPT、Gemini、Spotify、TikTok),GeoIP 元数据,以及可选的第三方欺诈风险评分(Scamalytics、IPQualityScore、proxycheck.io、AbuseIPDB)。支持自定义服务、状态页展示(访客 IP 遮盖),双重 opt-in 能力门控 -- **能力开关** -- 每台服务器独立的功能控制(终端、执行、升级、探测、文件管理、Docker、安全事件、防火墙、IP 质量),服务端+Agent 双重校验 -- **可用性时间线** -- 90 天可用性可视化,按天展示彩色状态条,支持服务器详情页、公共状态页和自定义仪表盘组件 -- **公共状态页** -- 无需登录的服务器状态展示,含 90 天可用性时间线和可配置阈值 -- **月度流量统计** -- 按计费周期统计流量,日/小时维度图表,用量进度条,周期末预测 -- **服务监控 (SSL/WHOIS/HTTP/Ping/TCP)** -- 定时服务检查,自动规范化 WHOIS 主机名,对不支持的 TLD (如 `.app`/`.dev`) 给出友好提示,编辑表单可靠回填 -- **计费追踪** -- 价格、计费周期、到期提醒、流量限制 -- **备份恢复** -- SQLite 数据库备份/恢复 API -- **Agent 自动更新** -- 远程二进制升级,SHA-256 校验 -- **一体化部署管理** -- `serverbee` CLI 支持以交互式或无人值守方式安装、升级、查看状态、修改配置和卸载 Server/Agent -- **OpenAPI 文档** -- Swagger UI (`/swagger-ui`),50+ 完整注释端点 - -## 技术栈 - -| 组件 | 技术 | -|------|------| -| 服务端 | Rust, Axum 0.8, sea-orm, SQLite (WAL) | -| Agent | Rust, sysinfo 0.33, tokio-tungstenite | -| 前端 | React 19, Vite 7, TanStack Router/Query, Recharts, shadcn/ui, Tailwind CSS v4 | -| 认证 | argon2 密码哈希, Session Cookie, API Key, OAuth2, TOTP | -| 文档 | Fumadocs MDX, TanStack Start, 中英双语 | +
-## 快速开始 +ServerBee logo -### 前置条件 +# ServerBee -- Rust 1.85+ (含 cargo) -- Bun 1.x (用于前端构建) +**轻量、自托管的 VPS 监控系统 —— 一个 Rust 二进制,实时掌控一切。** -### 从源码构建 +[![CI](https://github.com/ZingerLittleBee/ServerBee/actions/workflows/ci.yml/badge.svg)](https://github.com/ZingerLittleBee/ServerBee/actions/workflows/ci.yml) +[![Release](https://img.shields.io/github/v/release/ZingerLittleBee/ServerBee?include_prereleases&sort=semver)](https://github.com/ZingerLittleBee/ServerBee/releases) +[![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](LICENSE) +[![GitHub stars](https://img.shields.io/github/stars/ZingerLittleBee/ServerBee?style=flat)](https://github.com/ZingerLittleBee/ServerBee/stargazers) +[![Rust](https://img.shields.io/badge/Rust-2024-000000?logo=rust&logoColor=white)](https://www.rust-lang.org) +[![React](https://img.shields.io/badge/React-19-61DAFB?logo=react&logoColor=black)](https://react.dev) -```bash -# 克隆 -git clone https://github.com/ZingerLittleBee/ServerBee.git -cd ServerBee +[English](./README.md) | 简体中文 -# 构建前端 -cd apps/web && bun install && bun run build && cd ../.. +
-# 构建服务端和 Agent -cargo build --release +--- -# 二进制文件位于: -# target/release/serverbee-server -# target/release/serverbee-agent -``` +ServerBee 让你在一处掌控所有服务器。中心 **Server** 通过 WebSocket 接收来自轻量 **Agent** 的指标,存入内嵌 SQLite,并提供实时 React 仪表盘 —— 无外部数据库,无沉重运行时。 -### 启动服务端 +- 🪶 **极致轻量** —— Agent 内存占用约 5–15 MB;Server 管理 1000 个节点仅需约 50–100 MB。 +- ⚡ **实时刷新** —— WebSocket 实时仪表盘,涵盖 CPU、内存、磁盘、网络、负载、温度、GPU、磁盘 I/O。 +- 📦 **单一二进制** —— Server 与内嵌 Web UI 打包成一个文件,支持 Docker、一行脚本、Railway 部署。 +- 🔋 **开箱即用** —— 告警、通知、Web 终端、文件管理、Docker、防火墙、状态页等一应俱全。 +- 🔒 **默认安全** —— OAuth + 2FA、RBAC、审计日志、一次性 Agent 注册、逐服务器能力门控。 -```bash -./serverbee-server -# 默认地址: http://localhost:9527 -# 管理员密码在启动日志中自动生成并打印 -``` +> [!NOTE] +> ServerBee 正在活跃开发中(`v1.0.0-alpha`),迭代频繁。 -### 启动 Agent +## 快速开始 -首先以管理员身份登录服务端 Web UI,打开 **设置** 页,生成一个一次性 -enrollment code(单次使用,默认约 10 分钟后过期)。 +### 1. 安装 Server ```bash -# 通过环境变量设置服务端地址和一次性 enrollment code -SERVERBEE_SERVER_URL=http://your-server:9527 \ -SERVERBEE_ENROLLMENT_CODE=YOUR_ONE_TIME_CODE \ -./serverbee-agent - -# 或创建配置文件 /etc/serverbee/agent.toml: -# server_url = "http://your-server:9527" -# enrollment_code = "YOUR_ONE_TIME_CODE" +curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- server ``` -enrollment code 在 Agent 首次注册成功时被消费。之后 Agent 会将每个服务器的 -token 保存到配置文件,重启后自动重连 —— 不再需要该 code。要接入另一个 Agent -(或为丢失 token 的 Agent 重新注册),在设置页铸造一个新的 code。 +打开 `http://your-server:9527`。管理员密码会自动生成并打印在启动日志中 —— 首次登录后请修改。 -### Docker +> 偏好 Docker?执行 `docker compose up -d`。偏好云端?使用下方的 [Railway 一键部署](#railway一键部署)。 -```bash -docker compose up -d -``` +### 2. 
接入 Agent -### 开发模式 (Make) +以管理员登录 → **设置** → 生成一个一次性 **enrollment code**(单次使用,约 10 分钟后过期)。然后在每个节点上: ```bash -# 同时启动服务端 (端口 9527) + Vite 开发服务器 (端口 5173) -make dev-full -# 访问 http://localhost:5173,使用 admin / admin123 登录 - -# 或分步启动: -make server-dev # 终端 1: 服务端 :9527 -SERVERBEE_ENROLLMENT_CODE="<一次性 code>" make agent-dev # 终端 2: Agent - -# 测试与代码质量: -make cargo-test # 运行全部 Rust 测试 -make test # 运行前端测试 -make cargo-clippy # Rust 代码检查 -make # 交互式菜单 (需要 fzf) +curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- agent \ + --server-url http://YOUR_SERVER:9527 --enrollment-code YOUR_ONE_TIME_CODE ``` -手动浏览器验证清单索引见 `tests/README.md`。 +Agent 首次连接时会保存每服务器 token 并自动重连 —— code 只需用一次。搞定。🎉 -在服务端 Web UI **设置** 页生成一个一次性 enrollment code,并将其传给 Agent。该 code 单次使用,约 10 分钟后过期;每次需要接入新的 Agent 时铸造一个新的 code。 +## 功能特性 -> **说明**: `make dev-full` 启动带 HMR 的 Vite 开发服务器 (`http://localhost:5173`),自动代理 `/api/*` 到 Rust 服务端 (`:9527`)。生产构建请使用 `make build` 然后 `make server-run`。 +| | | +|---|---| +| **📊 监控** | 实时指标(CPU/内存/磁盘/网络/负载/温度/GPU/磁盘 I/O)· 历史图表(1h–30d)· Docker 容器统计、日志与事件 · 按计费周期统计月度流量并预测 | +| **🔔 告警** | 14+ 指标类型 · 阈值 / 离线 / 流量 / 到期规则 · Webhook、Telegram、Bark、邮件渠道,支持通知组 | +| **🌐 网络** | Ping 探测(ICMP/TCP/HTTP)· 网络质量监控(96 个中国三网 + 国际预设)· 服务监控(SSL/WHOIS/HTTP/Ping/TCP)· IP 质量与流媒体解锁检测,含欺诈风险评分 | +| **🛠️ 远程管理** | 浏览器 Web 终端(WS 上的 PTY)· 沙箱化文件管理 + Monaco 编辑器 · 基于 nftables 的防火墙封禁 · 逐服务器能力开关 · Agent 自动更新 | +| **🔐 安全与访问** | SSH 登录 / 暴力破解 / 端口扫描检测 · OAuth(GitHub/Google/OIDC)+ TOTP 两步验证 · Admin/Member 角色 · 审计日志 · 一次性 Agent 注册码 | +| **🖥️ 仪表盘与分享** | 拖拽式自定义仪表盘(13 种 widget)· 含 90 天可用性时间线的公共状态页 · OKLCH 自定义主题 · 带国旗的服务器分组 · 原生 iOS 移动端 | +| **⚙️ 运维** | `serverbee` 管理 CLI · 备份与恢复 · GeoIP 地区检测 · OpenAPI/Swagger 文档(50+ 端点) | ## 配置 -所有配置项均可通过 TOML 文件或环境变量设置,环境变量使用 `SERVERBEE_` 前缀,`__` (双下划线) 作为嵌套分隔符。完整环境变量列表见 [ENV.md](ENV.md)。 - -### 服务端 (`/etc/serverbee/server.toml`) +通过 TOML 文件或 `SERVERBEE_` 前缀的环境变量配置(`__` 为嵌套分隔符,如 `SERVERBEE_AUTH__MAX_SERVERS`)。最小可运行配置: ```toml +# 
/etc/serverbee/server.toml [server] listen = "0.0.0.0:9527" data_dir = "/var/lib/serverbee" -trusted_proxies = [] # 默认信任私有/回环 CIDR;设为 [] 禁用 - -[database] -path = "serverbee.db" -max_connections = 10 - -[auth] -session_ttl = 86400 # 24 小时 -secure_cookie = true # 开发环境设为 false -max_servers = 0 # 新注册服务器的软上限 [admin] -username = "admin" -password = "" # 留空自动生成 - -[rate_limit] -login_max = 5 # 15 分钟内最大登录尝试次数 -register_max = 3 # 15 分钟内最大 Agent 注册次数 - -[retention] -records_days = 7 # 原始指标保留天数 -records_hourly_days = 90 # 小时聚合保留天数 -audit_logs_days = 180 # 审计日志保留天数 -network_probe_days = 7 # 网络探测原始记录保留天数 -network_probe_hourly_days = 90 # 网络探测小时聚合保留天数 -traffic_hourly_days = 7 # 流量小时记录保留天数 -traffic_daily_days = 400 # 流量日记录保留天数 - -[scheduler] -timezone = "UTC" # 流量日聚合时区(如 Asia/Shanghai) - -[geoip] -mmdb_path = "/var/lib/serverbee/GeoLite2-City.mmdb" # 路径非空即启用 GeoIP - -[upgrade] -release_base_url = "https://github.com/ZingerLittleBee/ServerBee/releases" +password = "" # 留空自动生成 ``` -环境变量示例: -```bash -export SERVERBEE_AUTH__MAX_SERVERS="50" -export SERVERBEE_GEOIP__MMDB_PATH="/path/to/GeoLite2-City.mmdb" -export SERVERBEE_OAUTH__GITHUB__CLIENT_ID="..." -``` - -### Agent (`/etc/serverbee/agent.toml`) - ```toml +# /etc/serverbee/agent.toml server_url = "http://your-server:9527" -token = "" # 注册后自动填充 -enrollment_code = "" # 来自设置页的一次性 code;仅用于首次注册 +enrollment_code = "" # 来自设置页的一次性 code,仅用于首次注册 [collector] -interval = 3 # 指标上报间隔 (秒) -enable_temperature = true -enable_gpu = false # 需要 NVIDIA GPU + nvml - -[log] -level = "info" -``` - -Agent 环境变量使用 `SERVERBEE_` 前缀,顶层键无需嵌套: -```bash -export SERVERBEE_SERVER_URL="http://your-server:9527" -export SERVERBEE_ENROLLMENT_CODE="YOUR_ONE_TIME_CODE" +interval = 3 # 上报间隔(秒) ``` -### OAuth 配置 - -```toml -[oauth] -base_url = "https://monitor.example.com" -allow_registration = false # 首次 OAuth 登录时自动创建用户 - -[oauth.github] -client_id = "..." -client_secret = "..." - -[oauth.google] -client_id = "..." -client_secret = "..." 
-``` - -回调 URL 格式: `https://your-domain/api/auth/oauth/{provider}/callback` +📖 完整参考:**[ENV.md](ENV.md)** · OAuth、数据保留、速率限制、GeoIP 等详见[文档](apps/docs)。 ## 部署 -### Railway(一键部署) +### Railway(一键部署) [![Deploy on Railway](https://railway.com/button.svg)](https://railway.com/deploy/serverbee-server) -1. 点击上方按钮,然后部署 -2. 添加 Volume 挂载到 `/data` 以持久化数据 -3. 将 Agent 配置连接到 Railway 提供的 URL -4. 首次启动时,Server 会自动创建管理员账号并随机生成密码。在 Railway 部署日志中查找醒目的凭据横幅获取该密码;首次登录时你必须修改此密码,并可选择一个新的用户名。 +添加挂载到 `/data` 的 Volume 以持久化数据。Server 首次启动会自动创建管理员账号 —— 在部署日志中查找凭据横幅。 -### 安装脚本 +### 管理 CLI -通过 curl 一键安装: +安装脚本会在 `/usr/local/bin/serverbee` 放置一个 `serverbee` CLI: ```bash -# 服务端 -curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- server - -# Agent(替换为你的服务端地址和来自设置页的一次性 enrollment code) -curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- agent \ - --server-url http://YOUR_SERVER:9527 --enrollment-code YOUR_ONE_TIME_CODE +sudo serverbee status # 查看所有组件状态 +sudo serverbee upgrade -y # 升级到最新版 +sudo serverbee restart # 重启服务 +sudo serverbee config # 查看 / 修改配置 +sudo serverbee uninstall agent -y ``` -安装脚本会自动将 `serverbee` 管理 CLI 安装到 `/usr/local/bin/serverbee`。 +### 反向代理 -> **说明**:重复执行 `install agent` 时,如果 `/usr/local/bin/serverbee-agent` 已存在,脚本会直接沿用现有二进制而不会覆盖。需要刷新已安装版本时,请使用 `sudo serverbee upgrade agent -y`,或手动替换该二进制文件。 +在 Nginx/Caddy 之后,将 `/` 代理到 `127.0.0.1:9527`,并确保 WebSocket 路由 `/api/ws/` 和 `/api/agent/ws` 透传 `Upgrade`/`Connection` 头且设置较长读超时。完整可用的 Nginx 配置见[部署文档](apps/docs)。 -### 管理命令 +## 开发 ```bash -sudo serverbee status # 查看所有组件状态 -sudo serverbee upgrade -y # 升级到最新版 -sudo serverbee restart # 重启所有服务 -sudo serverbee config # 查看当前配置 -sudo serverbee config set # 修改配置 -sudo serverbee uninstall agent -y # 卸载 Agent -sudo serverbee uninstall server --purge # 卸载服务端并清除数据 -``` +git clone https://github.com/ZingerLittleBee/ServerBee.git +cd ServerBee -### 反向代理 (Nginx) - -```nginx -server { - listen 443 
ssl; - server_name monitor.example.com; - - location / { - proxy_pass http://127.0.0.1:9527; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; - proxy_set_header X-Forwarded-Proto $scheme; - } - - # WebSocket (浏览器 + Agent + 终端) - location /api/ws/ { - proxy_pass http://127.0.0.1:9527; - proxy_http_version 1.1; - proxy_set_header Upgrade $http_upgrade; - proxy_set_header Connection "upgrade"; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_read_timeout 86400s; - proxy_send_timeout 86400s; - } - - location /api/agent/ws { - proxy_pass http://127.0.0.1:9527; - proxy_http_version 1.1; - proxy_set_header Upgrade $http_upgrade; - proxy_set_header Connection "upgrade"; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_read_timeout 86400s; - proxy_send_timeout 86400s; - } -} +make dev-full # Server(:9527)+ Vite 开发服务器(:5173)—— 使用 admin / admin123 登录 +make cargo-test # Rust 测试 +make test # 前端测试 +make cargo-clippy # Rust 代码检查 ``` +> `make dev-full` 启动带 HMR 的 Vite(`http://localhost:5173`),并代理 `/api/*` 到 `:9527` 的 Rust 服务端。在 **设置** 页生成一次性 enrollment code 即可接入开发用 Agent。 + +**技术栈:** Rust(Axum 0.8 · sea-orm · SQLite WAL)· React 19(Vite 7 · TanStack Router/Query · Recharts · shadcn/ui · Tailwind CSS v4)· Rust Agent(sysinfo · tokio-tungstenite)。 + ## API -服务端运行时可通过 `/swagger-ui` 访问交互式 API 文档。 +服务端运行时,可在 `/swagger-ui` 访问交互式 OpenAPI 文档。 ## 许可证 From 3125af3e265e878045a137f8c3dc43131adeeb5a Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 13:08:49 +0800 Subject: [PATCH 02/21] docs: soften footprint claim and document install methods (docker for server, binary for agent) --- README.md | 10 ++++++---- README.zh-CN.md | 10 ++++++---- 2 files changed, 12 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 9cc76497..acf41677 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,7 @@ 
English | [简体中文](./README.zh-CN.md) ServerBee watches all your servers from one place. A central **server** receives metrics from lightweight **agents** over WebSocket, stores them in embedded SQLite, and serves a real-time React dashboard — no external database, no heavy runtime. -- 🪶 **Tiny footprint** — agents use ~5–15 MB RAM; the server handles 1000 nodes in ~50–100 MB. +- 🪶 **Tiny footprint** — agents typically use only ~5–15 MB of RAM, and the server stays lightweight as your fleet grows. - ⚡ **Real-time** — live WebSocket dashboard for CPU, memory, disk, network, load, temperature, GPU, and disk I/O. - 📦 **Single binary** — server + embedded web UI in one file. Deploy with Docker, a one-line script, or Railway. - 🔋 **Batteries included** — alerts, notifications, web terminal, file manager, Docker, firewall, status pages, and more. @@ -35,22 +35,24 @@ ServerBee watches all your servers from one place. A central **server** receives ### 1. Install the server ```bash -curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- server +curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- server --method docker ``` Open `http://your-server:9527`. The admin password is auto-generated and printed to the startup log — change it on first login. -> Prefer Docker? Run `docker compose up -d`. Prefer the cloud? Use the [Railway one-click deploy](#railway-one-click) below. +> The install script supports both **Docker** and **native binary** installs via `--method docker|binary`. **Docker is recommended for the server**; omit the flag to choose interactively. Prefer the cloud? Use the [Railway one-click deploy](#railway-one-click) below. ### 2. Enroll an agent Sign in as admin → **Settings** → generate a one-time **enrollment code** (single-use, expires in ~10 min). 
Then on each node: ```bash -curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- agent \ +curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- agent --method binary \ --server-url http://YOUR_SERVER:9527 --enrollment-code YOUR_ONE_TIME_CODE ``` +> A **native binary is recommended for agents** — smallest footprint and full host-level metrics. Pass `--method docker` to run the agent in a container instead. + The agent saves a per-server token on first connect and reconnects automatically afterwards — the code is only needed once. That's it. 🎉 ## Features diff --git a/README.zh-CN.md b/README.zh-CN.md index 98ad79f3..832518c1 100644 --- a/README.zh-CN.md +++ b/README.zh-CN.md @@ -21,7 +21,7 @@ ServerBee 让你在一处掌控所有服务器。中心 **Server** 通过 WebSocket 接收来自轻量 **Agent** 的指标,存入内嵌 SQLite,并提供实时 React 仪表盘 —— 无外部数据库,无沉重运行时。 -- 🪶 **极致轻量** —— Agent 内存占用约 5–15 MB;Server 管理 1000 个节点仅需约 50–100 MB。 +- 🪶 **极致轻量** —— Agent 通常仅占用约 5–15 MB 内存,Server 即便管理大量节点也保持精简。 - ⚡ **实时刷新** —— WebSocket 实时仪表盘,涵盖 CPU、内存、磁盘、网络、负载、温度、GPU、磁盘 I/O。 - 📦 **单一二进制** —— Server 与内嵌 Web UI 打包成一个文件,支持 Docker、一行脚本、Railway 部署。 - 🔋 **开箱即用** —— 告警、通知、Web 终端、文件管理、Docker、防火墙、状态页等一应俱全。 @@ -35,22 +35,24 @@ ServerBee 让你在一处掌控所有服务器。中心 **Server** 通过 WebSoc ### 1. 安装 Server ```bash -curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- server +curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- server --method docker ``` 打开 `http://your-server:9527`。管理员密码会自动生成并打印在启动日志中 —— 首次登录后请修改。 -> 偏好 Docker?执行 `docker compose up -d`。偏好云端?使用下方的 [Railway 一键部署](#railway一键部署)。 +> 安装脚本通过 `--method docker|binary` 同时支持 **Docker** 与 **二进制** 两种安装方式。**Server 推荐 Docker 安装**;省略该参数则进入交互式选择。偏好云端?使用下方的 [Railway 一键部署](#railway一键部署)。 ### 2. 
接入 Agent 以管理员登录 → **设置** → 生成一个一次性 **enrollment code**(单次使用,约 10 分钟后过期)。然后在每个节点上: ```bash -curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- agent \ +curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- agent --method binary \ --server-url http://YOUR_SERVER:9527 --enrollment-code YOUR_ONE_TIME_CODE ``` +> **Agent 推荐二进制安装** —— 占用最小,且能采集完整的宿主机指标。如需在容器中运行 Agent,改用 `--method docker`。 + Agent 首次连接时会保存每服务器 token 并自动重连 —— code 只需用一次。搞定。🎉 ## 功能特性 From 3c75f51af99a2c07d0fec05eb63a0e4ff566382f Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 13:18:37 +0800 Subject: [PATCH 03/21] fix(server): restrict docker container logs websocket to admins The docker logs WS handler authenticated the user but never checked role, so any read-only member could stream logs from any container. Container logs routinely expose env vars, connection strings and tokens, making this closer to terminal-level access. Require role == "admin" (with an audit log on denial), matching the terminal WS handler. --- crates/server/src/router/ws/docker_logs.rs | 29 ++++++++++++++++++---- 1 file changed, 24 insertions(+), 5 deletions(-) diff --git a/crates/server/src/router/ws/docker_logs.rs b/crates/server/src/router/ws/docker_logs.rs index 6845c314..96d27d53 100644 --- a/crates/server/src/router/ws/docker_logs.rs +++ b/crates/server/src/router/ws/docker_logs.rs @@ -38,7 +38,26 @@ async fn docker_logs_ws_handler( // Auth: session cookie or API key let user = validate_auth(&state, &headers).await; match user { - Some(user_id) => { + Some((user_id, role)) => { + // Docker log streaming exposes sensitive container output + // (env vars, connection strings, tokens), so it is admin-only, + // consistent with terminal access. 
+ if role != "admin" { + let detail = serde_json::json!({ + "server_id": server_id, + "deny_reason": "role_forbidden", + }) + .to_string(); + let _ = AuditService::log( + &state.db, + &user_id, + "docker_logs_subscribe_denied", + Some(&detail), + &ip, + ) + .await; + return axum::http::StatusCode::FORBIDDEN.into_response(); + } // Check agent is online if !state.agent_manager.is_online(&server_id) { return (axum::http::StatusCode::BAD_REQUEST, "Agent is offline").into_response(); @@ -80,7 +99,7 @@ async fn docker_logs_ws_handler( } } -async fn validate_auth(state: &Arc, headers: &HeaderMap) -> Option { +async fn validate_auth(state: &Arc, headers: &HeaderMap) -> Option<(String, String)> { use crate::service::auth::AuthService; // Try session cookie @@ -89,7 +108,7 @@ async fn validate_auth(state: &Arc, headers: &HeaderMap) -> Option, headers: &HeaderMap) -> Option, headers: &HeaderMap) -> Option Date: Sun, 31 May 2026 13:18:46 +0800 Subject: [PATCH 04/21] fix(server): restrict file read/download endpoints to admins The file list/stat/read/download/transfers endpoints lived in read_router, so read-only members could pull arbitrary files off any managed host (e.g. /etc/passwd, application secrets). Their effective access is closer to terminal-level than to read-only monitoring, so move them into the admin-only file router and drop the read-router merge. --- crates/server/src/router/api/file.rs | 22 +++++++++++----------- crates/server/src/router/api/mod.rs | 4 +++- 2 files changed, 14 insertions(+), 12 deletions(-) diff --git a/crates/server/src/router/api/file.rs b/crates/server/src/router/api/file.rs index 11a213a7..3121e0fb 100644 --- a/crates/server/src/router/api/file.rs +++ b/crates/server/src/router/api/file.rs @@ -105,17 +105,12 @@ pub struct TransfersResponse { // Routers // --------------------------------------------------------------------------- -/// Read endpoints accessible to all authenticated users (admin + member). 
-pub fn read_router() -> Router> { - Router::new() - .route("/files/{server_id}/list", post(list_files)) - .route("/files/{server_id}/stat", post(stat_file)) - .route("/files/{server_id}/read", post(read_file)) - .route("/files/download/{transfer_id}", get(download_file)) - .route("/files/transfers", get(list_transfers)) -} - -/// Write endpoints (POST/DELETE) restricted to admin users only. +/// All file endpoints are admin-only. File read/list/stat/download can pull +/// arbitrary files off a managed host (e.g. `/etc/passwd`, application +/// secrets), so they are not exposed to read-only members alongside ordinary +/// monitoring data — the effective access is closer to terminal-level than +/// read-only. +/// /// `max_upload_size` sets the Axum `DefaultBodyLimit` on the upload route /// so that the framework accepts bodies up to this size before the handler's /// streaming check kicks in. @@ -123,6 +118,11 @@ pub fn write_router(max_upload_size: usize) -> Router> { // Add overhead for multipart metadata (boundaries, headers, path field) let body_limit = max_upload_size.saturating_add(5 * 1024 * 1024); // +5 MB overhead Router::new() + .route("/files/{server_id}/list", post(list_files)) + .route("/files/{server_id}/stat", post(stat_file)) + .route("/files/{server_id}/read", post(read_file)) + .route("/files/download/{transfer_id}", get(download_file)) + .route("/files/transfers", get(list_transfers)) .route("/files/{server_id}/write", post(write_file)) .route("/files/{server_id}/delete", post(delete_file)) .route("/files/{server_id}/mkdir", post(mkdir)) diff --git a/crates/server/src/router/api/mod.rs b/crates/server/src/router/api/mod.rs index e88969e3..9f26e7c1 100644 --- a/crates/server/src/router/api/mod.rs +++ b/crates/server/src/router/api/mod.rs @@ -63,7 +63,9 @@ pub fn router(state: Arc) -> Router> { .merge(ping::read_router()) .merge(network_probe::read_router()) .merge(ip_quality::read_router()) - .merge(file::read_router()) + // Note: file endpoints 
(incl. read/list/stat/download) are + // admin-only via file::write_router() in the admin block below, + // since they can read arbitrary files off a managed host. .merge(docker::read_router()) .merge(traffic::read_router()) .merge(cost::read_router()) From 2d3d8933ef0c43aa7ddc834f50374124658a783e Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 13:18:53 +0800 Subject: [PATCH 05/21] fix(server): unify password policy and revoke sessions on password change Two related auth hardening fixes: - UserService create_user/update_user validated only len >= 6, weaker and inconsistent with the >= 8 policy applied to self-chosen passwords. Both now call AuthService::validate_password_strength. - A password change or admin reset left existing sessions valid until natural expiry, so a stolen session could outlive the change. change_password now revokes the user's other sessions (keeping the caller's current cookie/bearer session; an API-key caller has none, so all are revoked), and admin update_user revokes the target's sessions after a password reset. Adds regression tests covering keep-token, no-token, and admin-reset paths. --- crates/server/src/router/api/auth.rs | 33 +++++++++++ crates/server/src/service/auth.rs | 85 +++++++++++++++++++++++++++- crates/server/src/service/user.rs | 69 ++++++++++++++++++---- 3 files changed, 174 insertions(+), 13 deletions(-) diff --git a/crates/server/src/router/api/auth.rs b/crates/server/src/router/api/auth.rs index cc780014..9153dda4 100644 --- a/crates/server/src/router/api/auth.rs +++ b/crates/server/src/router/api/auth.rs @@ -357,11 +357,19 @@ pub async fn change_password( return Err(AppError::Validation("New password is required".to_string())); } + // Preserve the caller's current session (cookie or bearer) while revoking + // any other sessions the user may have, so the change can't be undone by a + // previously stolen session. 
An API-key authenticated caller has no session + // token here, so `current_token` is None and all of the user's sessions are + // revoked — the API key itself is unaffected. + let current_token = extract_session_cookie_token(&req_headers) + .or_else(|| extract_bearer_token_value(&req_headers)); AuthService::change_password( &state.db, &current_user.user_id, &body.old_password, &body.new_password, + current_token.as_deref(), ) .await?; @@ -628,3 +636,28 @@ fn extract_user_agent(headers: &HeaderMap) -> String { .unwrap_or("unknown") .to_string() } + +/// Extract the `session_token` value from the Cookie header, if present. +fn extract_session_cookie_token(headers: &HeaderMap) -> Option<String> { + headers + .get("cookie")? + .to_str() + .ok()? + .split(';') + .find_map(|cookie| { + cookie + .trim() + .strip_prefix("session_token=") + .map(|v| v.to_string()) + }) +} + +/// Extract the bearer token from the Authorization header, if present. +fn extract_bearer_token_value(headers: &HeaderMap) -> Option<String> { + headers + .get("authorization")? + .to_str() + .ok()? + .strip_prefix("Bearer ") + .map(|s| s.to_string()) +} diff --git a/crates/server/src/service/auth.rs b/crates/server/src/service/auth.rs index 53af2758..136b241d 100644 --- a/crates/server/src/service/auth.rs +++ b/crates/server/src/service/auth.rs @@ -483,6 +483,7 @@ impl AuthService { user_id: &str, old_password: &str, new_password: &str, + keep_session_token: Option<&str>, ) -> Result<(), AppError> { let user = user::Entity::find_by_id(user_id) .one(db) @@ -504,6 +505,17 @@ impl AuthService { active.updated_at = Set(Utc::now()); active.update(db).await?; + // Revoke the user's other sessions so a previously issued (possibly + // stolen) session can't outlive the password change. Keep the caller's + // current session when its token is known (web cookie / bearer flow), + // otherwise revoke all of them.
+ let mut revoke = + session::Entity::delete_many().filter(session::Column::UserId.eq(user_id)); + if let Some(token) = keep_session_token { + revoke = revoke.filter(session::Column::Token.ne(token)); + } + revoke.exec(db).await?; + Ok(()) } @@ -819,7 +831,8 @@ mod tests { .await .expect("create_user should succeed"); let result = - AuthService::change_password(&db, &user.id, "wrong_old_pass", "new_pass123").await; + AuthService::change_password(&db, &user.id, "wrong_old_pass", "new_pass123", None) + .await; assert!(result.is_err(), "wrong old password should return an error"); } @@ -829,7 +842,7 @@ mod tests { let user = AuthService::create_user(&db, "grace", "old_pass1", "member") .await .expect("create_user should succeed"); - AuthService::change_password(&db, &user.id, "old_pass1", "new_pass99") + AuthService::change_password(&db, &user.id, "old_pass1", "new_pass99", None) .await .expect("change_password should succeed"); // Login with new password should succeed @@ -840,6 +853,72 @@ mod tests { assert!(result2.is_err(), "login with old password should fail"); } + #[tokio::test] + async fn test_change_password_revokes_other_sessions() { + let (db, _tmp) = setup_test_db().await; + AuthService::create_user(&db, "heidi", "old_pass1", "member") + .await + .expect("create_user should succeed"); + // Two active sessions (e.g. two browsers logged in). + let (keep, _u) = AuthService::login(&db, login_params("heidi", "old_pass1")) + .await + .expect("login should succeed"); + let (other, _u) = AuthService::login(&db, login_params("heidi", "old_pass1")) + .await + .expect("login should succeed"); + + AuthService::change_password( + &db, + &keep.user_id, + "old_pass1", + "new_pass123", + Some(&keep.token), + ) + .await + .expect("change_password should succeed"); + + // The caller's own session is preserved... 
+ let kept = AuthService::validate_session(&db, &keep.token, 3600) + .await + .expect("validate_session should not error"); + assert!( + kept.is_some(), + "the kept session must survive the password change" + ); + // ...while every other session is revoked. + let revoked = AuthService::validate_session(&db, &other.token, 3600) + .await + .expect("validate_session should not error"); + assert!( + revoked.is_none(), + "other sessions must be revoked after a password change" + ); + } + + #[tokio::test] + async fn test_change_password_without_keep_token_revokes_all_sessions() { + let (db, _tmp) = setup_test_db().await; + AuthService::create_user(&db, "ivan", "old_pass1", "member") + .await + .expect("create_user should succeed"); + let (sess, _u) = AuthService::login(&db, login_params("ivan", "old_pass1")) + .await + .expect("login should succeed"); + + // No keep token (e.g. API-key authenticated caller) -> revoke all. + AuthService::change_password(&db, &sess.user_id, "old_pass1", "new_pass123", None) + .await + .expect("change_password should succeed"); + + let validated = AuthService::validate_session(&db, &sess.token, 3600) + .await + .expect("validate_session should not error"); + assert!( + validated.is_none(), + "with no keep token, all sessions must be revoked" + ); + } + #[tokio::test] async fn test_init_admin_creates_random_admin_with_flag() { let (db, _tmp) = setup_test_db().await; @@ -949,7 +1028,7 @@ mod tests { let user = AuthService::create_user(&db, "weakp", "old_pass1", "member") .await .expect("create user"); - let r = AuthService::change_password(&db, &user.id, "old_pass1", "123").await; + let r = AuthService::change_password(&db, &user.id, "old_pass1", "123", None).await; assert!( matches!(r, Err(AppError::Validation(_))), "weak new password must be rejected, got {r:?}" diff --git a/crates/server/src/service/user.rs b/crates/server/src/service/user.rs index f7bda3b1..eb8d8da0 100644 --- a/crates/server/src/service/user.rs +++ 
b/crates/server/src/service/user.rs @@ -87,11 +87,7 @@ impl UserService { role: &str, ) -> Result { Self::validate_role(role)?; - if password.len() < 6 { - return Err(AppError::Validation( - "Password must be at least 6 characters".to_string(), - )); - } + AuthService::validate_password_strength(password)?; AuthService::create_user(db, username, password, role).await } @@ -125,18 +121,26 @@ impl UserService { active.role = Set(role.clone()); } + let password_reset = input.password.is_some(); if let Some(ref password) = input.password { - if password.len() < 6 { - return Err(AppError::Validation( - "Password must be at least 6 characters".to_string(), - )); - } + AuthService::validate_password_strength(password)?; let new_hash = AuthService::hash_password(password)?; active.password_hash = Set(new_hash); } active.updated_at = Set(Utc::now()); let updated = active.update(db).await?; + + // If an admin reset this user's password, revoke all their existing + // sessions so a previously issued (possibly stolen) session cannot + // outlive the reset. + if password_reset { + session::Entity::delete_many() + .filter(session::Column::UserId.eq(id)) + .exec(db) + .await?; + } + Ok(updated) } @@ -277,4 +281,49 @@ mod tests { assert_eq!(updated.role, "admin", "member should now have admin role"); } + + #[tokio::test] + async fn test_update_user_password_reset_revokes_sessions() { + use crate::service::auth::{AuthService, LoginParams}; + + let (db, _tmp) = setup_test_db().await; + let user = UserService::create_user(&db, "reset_target", "old_pass1", "member") + .await + .expect("create user should succeed"); + // An active session for the user. + let (sess, _u) = AuthService::login( + &db, + LoginParams { + username: "reset_target", + password: "old_pass1", + totp_code: None, + ip: "127.0.0.1", + user_agent: "test", + session_ttl: 3600, + }, + ) + .await + .expect("login should succeed"); + + // Admin resets the password. 
+ UserService::update_user( + &db, + &user.id, + UpdateUserInput { + role: None, + password: Some("new_pass123".to_string()), + }, + ) + .await + .expect("update_user should succeed"); + + // The pre-existing session must be revoked by the reset. + let validated = AuthService::validate_session(&db, &sess.token, 3600) + .await + .expect("validate_session should not error"); + assert!( + validated.is_none(), + "an admin password reset must revoke the user's existing sessions" + ); + } } From 8b5c9cef410e264297a6736a564058cd8ae08ccf Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 14:09:17 +0800 Subject: [PATCH 06/21] docs: expand cn/en guides for oauth, mobile, architecture and reverse proxy Broaden the bilingual documentation: add OAuth/OIDC and mobile sections, a hub-and-spoke architecture overview, Caddy reverse-proxy config alongside the existing nginx example, and assorted updates across the agent, alerts, server, status-page and configuration pages. 
--- apps/docs/content/docs/cn/agent.mdx | 77 +++--- apps/docs/content/docs/cn/alerts.mdx | 159 ++++++----- apps/docs/content/docs/cn/architecture.mdx | 12 +- apps/docs/content/docs/cn/configuration.mdx | 32 ++- apps/docs/content/docs/cn/cost-insights.mdx | 4 +- apps/docs/content/docs/cn/index.mdx | 14 + apps/docs/content/docs/cn/server.mdx | 248 ++++++++++-------- .../docs/content/docs/cn/service-monitors.mdx | 4 +- apps/docs/content/docs/cn/status-page.mdx | 150 +++++------ apps/docs/content/docs/en/agent.mdx | 82 ++++-- apps/docs/content/docs/en/alerts.mdx | 75 ++++-- apps/docs/content/docs/en/architecture.mdx | 10 +- apps/docs/content/docs/en/configuration.mdx | 14 +- apps/docs/content/docs/en/cost-insights.mdx | 4 +- apps/docs/content/docs/en/index.mdx | 75 ++++-- apps/docs/content/docs/en/server.mdx | 26 +- .../docs/content/docs/en/service-monitors.mdx | 4 +- apps/docs/content/docs/en/status-page.mdx | 157 ++++++----- 18 files changed, 671 insertions(+), 476 deletions(-) diff --git a/apps/docs/content/docs/cn/agent.mdx b/apps/docs/content/docs/cn/agent.mdx index af0d6b6e..812a10d6 100644 --- a/apps/docs/content/docs/cn/agent.mdx +++ b/apps/docs/content/docs/cn/agent.mdx @@ -179,6 +179,15 @@ serverbee install agent --method \ 如果 Agent **已注册成功**(`agent.toml` 中已有 `token`),注册码便不再使用,无需更正;要把该 Agent 接到另一台 Server,只能用新 Server 的新码重新注册。 +### 手动指定 Token + +如果不想使用注册码,也可以在管理面板手动创建一条服务器记录,然后把它的 token 直接填进配置: + +```toml +server_url = "http://your-server-ip:9527" +token = "从管理面板获取的 Agent Token" +``` + ## 配置文件 手动运行的 Agent 默认读取 `/etc/serverbee/agent.toml`。通过安装脚本部署时,配置文件位于 `/opt/serverbee/etc/agent.toml`(`/etc/serverbee` 为旧版布局,脚本会自动迁移)。 @@ -291,44 +300,13 @@ sudo journalctl -u serverbee-agent -f `AmbientCapabilities=CAP_NET_RAW` 是 ICMP Ping 探测所需的权限。如果不需要 ICMP 探测功能,可以移除此行。 -## 采集指标详情 - -Agent 使用 `sysinfo` 库采集以下系统指标: - -| 指标类别 | 具体指标 | 采集来源 | -|----------|----------|----------| -| CPU | 使用率、型号、核心数、架构 | `sysinfo::System` | -| 内存 | 已用/总量、Swap 已用/总量 | `sysinfo::System` | -| 磁盘 | 已用/总量 
| `sysinfo::Disks` | -| 网络 | 入站/出站速率、累计流量 | `sysinfo::Networks` + 差值计算 | -| 负载 | load1 / load5 / load15 | `sysinfo::System::load_average()` | -| 进程 | 进程数 | `sysinfo::System::processes()` | -| 连接数 | TCP / UDP 连接数 | `/proc/net/tcp` (Linux) | -| 温度 | 传感器温度 | `sysinfo::Components` | -| GPU | 利用率、显存、温度 | `nvml-wrapper` (可选) | -| 系统信息 | OS、内核版本、运行时间 | `sysinfo::System` | -| 虚拟化 | 虚拟化类型 | `systemd-detect-virt` / DMI | - -## 资源开销 - -Agent CPU 开销可忽略(<1%),内存稳态在数十 MB 级别。完整的 Agent 与 Server CPU/内存/磁盘/网络实测数据见 [资源开销](/cn/docs/resource-usage)。 - -## 断线重连 - -Agent 与 Server 之间的 WebSocket 连接断开后,会自动尝试重连: - -- **退避策略**:1s -> 2s -> 4s -> 8s -> 16s -> 30s(上限) -- **随机抖动**:每次退避时间增加 +/-20% 的随机偏移,避免大量 Agent 同时重连造成雷群效应 -- **重连恢复**:重连成功后自动重新上报 `SystemInfo` 静态信息 -- **心跳检测**:Server 每 30 秒发送 Ping,Agent 回复 Pong;超过 30 秒无上报判定为离线 - ## 平台支持 | 平台 | 支持级别 | 说明 | |------|----------|------| | Linux (amd64/arm64) | 完整支持 | 主要目标平台,所有功能可用 | | macOS (amd64/arm64) | 完整支持 | 适用于开发和测试 | -| Windows (amd64) | 基本支持 | TCP/UDP 连接数采集方式不同 | +| Windows (amd64) | 基本支持 | TCP/UDP 连接数采集走不同代码路径 | | FreeBSD | 基本支持 | `sysinfo` 对 FreeBSD 的支持有限 | ## 自动更新 @@ -342,7 +320,7 @@ Server 可以向在线 Agent 推送升级命令。触发升级时: 5. 替换为新二进制 6. 
自动重启进程 -管理员可以在 Dashboard 的服务器详情页面触发升级,也可以通过 API: +管理员可以在 Dashboard 的服务器详情页触发升级,也可以通过 API: ```bash curl -X POST https://your-server/api/servers/{id}/upgrade \ @@ -352,9 +330,40 @@ curl -X POST https://your-server/api/servers/{id}/upgrade \ ``` -自动更新需要 Agent 具有 `upgrade` 能力(CAP_UPGRADE)。新注册 Agent 默认会启用该能力,管理员仍然可以在 Settings → Capabilities 中手动关闭。 +自动更新需要 Agent 具有 `upgrade` 能力(`CAP_UPGRADE`)。新注册 Agent 默认会启用该能力,管理员可在 Settings → Capabilities 中手动关闭。 +## 断线重连 + +Agent 与 Server 之间维持一条持久 WebSocket 连接。连接断开后会自动重连: + +- **指数退避**:从 1 秒起步(1s → 2s → 4s → 8s → 16s → 30s 上限) +- **随机抖动**:每次退避增加 +/-20% 的随机偏移,避免大量 Agent 同时重连造成雷群效应 +- **重连恢复**:重连成功后退避重置为 1 秒,并自动重新上报 `SystemInfo` +- **心跳检测**:Server 每 30 秒发送一次 Ping,Agent 回复 Pong;超过 30 秒无上报即判定为离线 + +## 采集指标详情 + +Agent 使用 `sysinfo` 库采集以下系统指标: + +| 指标类别 | 具体指标 | 采集来源 | +|----------|----------|----------| +| CPU | 使用率、型号、核心数、架构 | `sysinfo::System` | +| 内存 | 已用/总量、Swap 已用/总量 | `sysinfo::System` | +| 磁盘 | 已用/总量 | `sysinfo::Disks` | +| 网络 | 入站/出站速率、累计流量 | `sysinfo::Networks` + 差值计算 | +| 负载 | load1 / load5 / load15 | `sysinfo::System::load_average()` | +| 进程 | 进程数 | `sysinfo::System::processes()` | +| 连接数 | TCP / UDP 连接数 | `/proc/net/tcp` (Linux) | +| 温度 | 传感器温度 | `sysinfo::Components` | +| GPU | 利用率、显存、温度 | `nvml-wrapper`(可选) | +| 系统信息 | OS、内核版本、运行时间 | `sysinfo::System` | +| 虚拟化 | 虚拟化类型 | `systemd-detect-virt` / DMI | + +## 资源开销 + +Agent CPU 开销可忽略(<1%),内存稳态在数十 MB 级别。完整的 Agent 与 Server CPU/内存/磁盘/网络实测数据见[资源开销](/cn/docs/resource-usage)。 + diff --git a/apps/docs/content/docs/cn/alerts.mdx b/apps/docs/content/docs/cn/alerts.mdx index 9ac4894c..dfb3722a 100644 --- a/apps/docs/content/docs/cn/alerts.mdx +++ b/apps/docs/content/docs/cn/alerts.mdx @@ -4,53 +4,51 @@ description: 配置告警规则和通知渠道,及时发现和响应服务器 icon: Bell --- -ServerBee 提供灵活的告警系统,支持多种指标类型的阈值监控、事件驱动告警、多种通知渠道以及精细的触发控制。 +ServerBee 提供灵活的告警系统,支持阈值监控、安全事件驱动告警、多种通知渠道以及精细的触发控制。 ## 告警概述 -告警系统的工作流程: +后台任务每 60 秒评估一次所有启用的告警规则: -``` -后台任务 (每 1 分钟) - --> 遍历所有启用的告警规则 - --> 对每条规则的覆盖范围内的服务器逐一评估 - --> 满足触发条件 --> 
通过通知组发送通知 - --> 从触发状态恢复 --> 发送恢复通知(如配置) -``` +1. 解析每条启用规则覆盖的服务器范围。 +2. 逐台检查规则条件是否满足。 +3. 触发时通过通知组发送通知(受去抖限制)。 +4. 之前触发的规则恢复后,清除告警状态。 + +告警状态会持久化到数据库,Server 重启后依然有效。事件驱动规则(IP 变化、SSH 登录、爆破、端口扫描)在 Agent 上报事件时评估,不参与 60 秒轮询。 ## 创建告警规则 -每条告警规则包含以下核心要素: +每条告警规则包含以下要素: -- **规则名称**:用于标识和展示 -- **告警条件**:一条或多条指标阈值条件(AND 关系) -- **覆盖范围**:适用的服务器范围 -- **触发模式**:always(持续通知)或 once(仅首次通知) -- **通知组**:告警触发后的通知方式 -- **关联任务**:触发/恢复时自动执行的远程命令(可选) +- **规则名称**:用于标识和展示。 +- **告警条件**:一条或多条指标条件,所有条件必须同时满足(AND 逻辑)。 +- **覆盖范围**:规则适用的服务器范围。 +- **触发模式**:`always`(持续通知,带去抖)或 `once`(仅首次触发通知)。 +- **通知组**:通知发送目标。 +- **关联任务**:触发/恢复时自动执行的远程命令(可选)。 +- **阻断源 IP**:可选。对安全事件类规则,触发时自动指示 Agent 防火墙阻断攻击源 IP。 ## 支持的指标类型 -ServerBee 支持 14 种以上的告警指标: - ### 资源阈值类 -| 指标类型 | 判定依据 | 示例 | -|----------|----------|------| -| `cpu` | CPU 使用率 (%) | CPU > 90% | -| `memory` | 内存使用率 (%) | 内存 > 85% | -| `swap` | Swap 使用率 (%) | Swap > 50% | -| `disk` | 磁盘使用率 (%) | 磁盘 > 90% | -| `load1` | 1 分钟负载 | load1 > 10 | -| `load5` | 5 分钟负载 | load5 > 8 | -| `load15` | 15 分钟负载 | load15 > 5 | -| `temperature` | 传感器温度 (C) | 温度 > 80C | -| `gpu` | GPU 平均使用率 (%) | GPU > 95% | -| `tcp_conn` | TCP 连接数 | TCP > 10000 | -| `udp_conn` | UDP 连接数 | UDP > 5000 | -| `process` | 进程数 | 进程 > 500 | -| `net_in_speed` | 入站网络速率 (bytes/s) | 入站 > 100MB/s | -| `net_out_speed` | 出站网络速率 (bytes/s) | 出站 > 100MB/s | +| 指标类型 | 判定依据 | `min` 阈值含义 | +|----------|----------|----------------| +| `cpu` | CPU 使用率 (%) | 使用率大于等于阈值时触发 | +| `memory` | 内存已用(字节) | 已用量大于等于阈值时触发 | +| `swap` | Swap 已用(字节) | 已用量大于等于阈值时触发 | +| `disk` | 磁盘已用(字节) | 已用量大于等于阈值时触发 | +| `load1` | 1 分钟负载 | 负载大于等于阈值时触发 | +| `load5` | 5 分钟负载 | 负载大于等于阈值时触发 | +| `load15` | 15 分钟负载 | 负载大于等于阈值时触发 | +| `temperature` | CPU 温度 (C) | 温度大于等于阈值时触发 | +| `gpu` | GPU 使用率 (%) | 使用率大于等于阈值时触发 | +| `tcp_conn` | TCP 连接数 | 连接数大于等于阈值时触发 | +| `udp_conn` | UDP 连接数 | 连接数大于等于阈值时触发 | +| `process` | 进程数 | 进程数大于等于阈值时触发 | +| `net_in_speed` | 入站网络速率(bytes/s) | 速率大于等于阈值时触发 | +| `net_out_speed` | 出站网络速率(bytes/s) | 速率大于等于阈值时触发 | ### 流量周期类 @@ -103,17 
+101,46 @@ ServerBee 支持 14 种以上的告警指标: - **`cycle_interval`**:流量周期类型:`hour`、`day`、`week`、`month`、`year`。 - **`cycle_limit`**:流量周期规则的字节阈值。 -一条告警规则可以包含多个条件,所有条件必须同时满足(AND 逻辑)才会触发告警。例如: +### 示例 + +**CPU 超过 90%:** +```json +{ "rule_type": "cpu", "min": 90.0 } +``` + +**内存超过 8 GB:** +```json +{ "rule_type": "memory", "min": 8589934592 } +``` + +**服务器离线超过 2 分钟:** +```json +{ "rule_type": "offline", "duration": 120 } +``` + +**每月出站流量超过 1 TB:** +```json +{ + "rule_type": "transfer_out_cycle", + "cycle_interval": "month", + "cycle_limit": 1099511627776 +} +``` + +**服务器 7 天内到期:** +```json +{ "rule_type": "expiration", "duration": 7 } +``` + +一条告警规则可以包含多个条件,所有条件必须同时满足(AND 逻辑)才会触发。例如,下面的规则在 CPU 已用 ≥ 90% 且内存已用 ≥ 8 GB 时才触发: ```json [ { "rule_type": "cpu", "min": 90.0 }, - { "rule_type": "memory", "min": 85.0 } + { "rule_type": "memory", "min": 8589934592 } ] ``` -上述规则表示:CPU 大于等于 90% 且内存大于等于 85% 时才触发。 - ## 覆盖类型 告警规则的覆盖范围有三种模式: @@ -162,15 +189,17 @@ ServerBee 支持 14 种以上的告警指标: 当一条之前触发的告警规则不再满足触发条件时: -1. 标记为已恢复 -2. 如果配置了恢复触发任务 (`recover_trigger_tasks`),自动执行对应的远程命令 -3. 清除告警状态,下次满足条件时可以重新触发 +1. 标记为已恢复,并清除内存缓存和数据库中的告警状态。 +2. 如果配置了恢复触发任务(`recover_trigger_tasks`),自动执行对应的远程命令。 +3. 
下次满足条件时按触发模式重新触发。 ### 维护窗口抑制 -当受影响服务器处于活动维护窗口时,ServerBee 会抑制该服务器的告警通知。规则评估仍会执行,但维护结束前不会发送通知。 +当受影响服务器处于活动维护窗口时,ServerBee 会抑制该服务器的告警通知。规则评估仍会执行,但维护结束前不会发送通知。事件驱动规则与轮询规则遵循同样的覆盖范围和维护窗口抑制逻辑。 -`ip_changed` 属于事件驱动规则,不参与每分钟轮询。它在 Agent 上报 IP 变化事件时评估,并遵循同样的覆盖范围和维护窗口抑制逻辑。 +### 阻断源 IP + +安全事件类规则(`ssh_brute_force_detected`、`port_scan_detected`)可以开启**阻断源 IP**。触发时,ServerBee 会指示受影响 Agent 的防火墙阻断攻击源 IP,把检测变为自动处置。该能力需要服务器具备 `CAP_FIREWALL_BLOCK` 权限。详见 [安全事件检测](/cn/docs/security-events) 和 [防火墙管理](/cn/docs/firewall)。 ## 通知渠道 @@ -251,8 +280,8 @@ APNs 需要 Apple Developer key、Team ID、Bundle ID 和私钥。只有开发 | `{{event}}` | 事件类型(触发/恢复) | | `{{message}}` | 告警详细信息 | | `{{time}}` | 事件发生时间 | -| `{{cpu}}` | 当前 CPU 使用率 | -| `{{memory}}` | 当前内存使用率 | +| `{{cpu}}` | 当前 CPU 使用率字符串 | +| `{{memory}}` | 当前内存使用字符串 | 默认通知模板: @@ -272,29 +301,37 @@ POST /api/notifications/:id/test ## 通知组 -通知组用于将多个通知渠道组合在一起。一条告警规则关联一个通知组,触发时会同时通过组内所有渠道发送通知。 +通知渠道按**通知组**组织。一条告警规则关联一个通知组,触发时会同时分发到组内所有启用的渠道。借此可以: -例如,你可以创建一个名为「紧急告警」的通知组,同时包含 Telegram 和 Email 两个渠道,确保重要告警不会遗漏。 +- 把同一条告警同时发到多个渠道(例如 Telegram + Email)。 +- 在多条告警规则间复用渠道配置。 +- 单独启用/禁用某个渠道,无需改动告警规则。 -## 离线检测告警 +## 离线检测 -离线检测是一种特殊的告警类型,用于监控服务器的在线状态: +离线状态由独立的后台任务判定,并配合 `offline` 类型规则发送通知: -- 通过后台任务每 10 秒扫描 Agent 连接状态 -- 当 Agent 最后一次上报时间超过 30 秒时判定为离线 -- 配合 `offline` 类型的告警规则,可以在服务器离线指定时长后发送通知 +- 每 10 秒扫描一次 Agent 连接状态。 +- Agent 最后一次上报超过 30 秒即判定为离线。 +- `offline` 规则的 `duration` 指定离线持续多少秒后触发告警。例如 `{ "rule_type": "offline", "duration": 60 }` 表示离线超过 60 秒触发。 -```json -{ - "rule_type": "offline", - "duration": 60 -} -``` +## 完整示例 + +用 Telegram 监控所有服务器 CPU 的典型配置: + +1. **创建 Telegram 通知渠道**,填入 bot token 和 chat ID。 +2. **创建通知组**,加入该 Telegram 渠道。 +3. 
**创建告警规则:** + - 名称:「High CPU Usage」 + - 条件:`[{"rule_type": "cpu", "min": 90.0}]` + - 触发模式:`always` + - 覆盖类型:`all` + - 通知组:上一步创建的组 -上述规则表示:服务器持续离线超过 60 秒后触发告警。 +此后,任意服务器在 10 分钟采样窗口内有 ≥ 70% 的采样点 CPU 超过 90% 时,你都会收到 Telegram 消息。条件持续期间,后续通知按 5 分钟去抖。 - + + - diff --git a/apps/docs/content/docs/cn/architecture.mdx b/apps/docs/content/docs/cn/architecture.mdx index 66559011..af6d7d4d 100644 --- a/apps/docs/content/docs/cn/architecture.mdx +++ b/apps/docs/content/docs/cn/architecture.mdx @@ -139,7 +139,7 @@ Agent Server |--- WebSocket 连接 + token ------>| | | 验证 token,查找 server |<-- Welcome { server_id, | - | protocol_version: 1, | + | protocol_version: 4, | | report_interval: 3 } -------| | | |--- SystemInfo { cpu_name, | 首次/重连后上报静态信息 @@ -181,7 +181,7 @@ Binary 帧格式:`[1 byte session_id 长度][session_id bytes][payload bytes]` ## 数据库设计 -ServerBee 使用 SQLite 数据库,通过 sea-orm ORM 管理。共有 25 个实体(表): +ServerBee 使用 SQLite 数据库(WAL 模式),通过 sea-orm 管理。下面按类别列出核心数据表: ### 用户与认证 @@ -313,10 +313,10 @@ Server 启动时通过 `tokio::spawn` 创建多个后台任务: | 接口 | 限制 | |------|------| -| `POST /api/auth/login` | 每 IP 每分钟 5 次 | -| `POST /api/agent/register` | 每 IP 每分钟 3 次 | +| `POST /api/auth/login` | 每 IP 15 分钟内 5 次 | +| `POST /api/agent/register` | 每 IP 15 分钟内 10 次 | -超限返回 `429 Too Many Requests`。 +超限返回 `429 Too Many Requests`。管理员可在「设置 → 速率限制」中清除活跃窗口。 ## Cargo Workspace 结构 @@ -337,7 +337,7 @@ ServerBee/ | | +-- state.rs # AppState | | +-- router/ # REST API + WebSocket handlers | | +-- service/ # 业务逻辑层 -| | +-- entity/ # sea-orm Entity (21 表) +| | +-- entity/ # sea-orm Entity (每表一个模块) | | +-- migration/ # 数据库迁移 | | +-- middleware/ # 认证、日志中间件 | +-- agent/ # Agent 采集端 diff --git a/apps/docs/content/docs/cn/configuration.mdx b/apps/docs/content/docs/cn/configuration.mdx index 473bc41d..95a6ea17 100644 --- a/apps/docs/content/docs/cn/configuration.mdx +++ b/apps/docs/content/docs/cn/configuration.mdx @@ -8,21 +8,19 @@ ServerBee 使用 [figment](https://github.com/SergioBenitez/Figment) 库加载 ## 配置加载优先级 -配置项的加载顺序如下(后者覆盖前者): 
+配置项从多个来源合并加载,后者覆盖前者,因此环境变量始终具有最高优先级: -1. **内置默认值** -- 程序中硬编码的默认配置 -2. **TOML 配置文件** -- `/etc/serverbee/server.toml` 或 `/etc/serverbee/agent.toml` -3. **运行时环境变量** -- 以 `SERVERBEE_` 为前缀的环境变量 +1. 内置默认值 +2. `/etc/serverbee/server.toml` 或 `/etc/serverbee/agent.toml` +3. `/opt/serverbee/etc/server.toml` 或 `/opt/serverbee/etc/agent.toml`(安装脚本使用的路径) +4. 工作目录下的 `server.toml` 或 `agent.toml` +5. 以 `SERVERBEE_` 为前缀的环境变量 -即运行时环境变量具有最高优先级,可以在不修改配置文件的情况下覆盖任意配置项。 +这样无需修改 TOML 文件即可在运行时覆盖任意单个配置项。 ## 环境变量映射规则 -运行时环境变量遵循以下命名规则: - -- 统一使用 `SERVERBEE_` 前缀 -- 配置层级用 `__`(双下划线)分隔 -- 全部大写 +每个 TOML 键都直接对应一个环境变量:加 `SERVERBEE_` 前缀、键名全大写、每层嵌套用 `__`(双下划线)分隔。例如 `auth.secure_cookie` 对应 `SERVERBEE_AUTH__SECURE_COOKIE`。 ## 开发工作流环境变量 @@ -279,11 +277,11 @@ demo_data = false # --- 速率限制 --- [rate_limit] -# 登录接口速率限制:每 IP 每分钟最大请求数 +# 登录接口速率限制:每 IP 在 15 分钟窗口内的最大尝试次数 # 默认: 5 login_max = 5 -# Agent 注册接口速率限制:每 IP 每分钟最大请求数 +# Agent 注册接口速率限制:每 IP 在 15 分钟窗口内的最大尝试次数 # 默认: 10 register_max = 10 @@ -594,13 +592,13 @@ release_cert_spki_sha256 = "" | 配置项 | 默认值 | 说明 | |--------|--------|------| | 监听端口 | `9527` | HTTP 和 WebSocket 共用端口 | -| 数据目录 | `/var/lib/serverbee` | 数据库和持久化数据 | -| 数据库文件 | `serverbee.db` | SQLite 数据库 | +| 数据目录 | `./data` | 数据库和持久化数据(安装脚本部署为 `/opt/serverbee/data`) | +| 数据库文件 | `serverbee.db` | SQLite 数据库(相对于数据目录) | | 连接池大小 | `10` | 最大并发数据库连接 | | Session 有效期 | `86400` 秒(24 小时) | 滑动过期 | | 管理员用户名 | `admin` | 仅首次初始化时使用 | -| 登录速率限制 | `5` 次/分钟/IP | 防暴力破解 | -| 注册速率限制 | `3` 次/分钟/IP | 防滥用注册 | +| 登录速率限制 | `5` 次/15 分钟/IP | 防暴力破解 | +| 注册速率限制 | `10` 次/15 分钟/IP | 防滥用注册 | | 分钟级指标保留 | `7` 天 | 自动清理过期数据 | | 小时级指标保留 | `90` 天 | 长期趋势分析 | | GPU 指标保留 | `7` 天 | 与分钟级指标一致 | @@ -626,7 +624,7 @@ release_cert_spki_sha256 = "" | 文件管理 | 关闭 | 需在 Agent 和 Server 同时启用 | | 文件大小限制 | 1GB | 读取/下载的最大文件体积 | | IP 变更检测 | 开启 | 默认检测网络接口 IP 变更 | -| 外部 IP 检测 | 关闭 | 需手动启用 | +| 外部 IP 检测 | 开启 | 默认通过公共 IP 服务发现公网地址;设为空数组可禁用 | | IP 检测间隔 | `300` 秒(5 分钟) | 定期检查间隔 | | 日志级别 | `info` | 推荐生产环境使用 | diff --git a/apps/docs/content/docs/cn/cost-insights.mdx 
b/apps/docs/content/docs/cn/cost-insights.mdx index 62416807..9ae5a456 100644 --- a/apps/docs/content/docs/cn/cost-insights.mdx +++ b/apps/docs/content/docs/cn/cost-insights.mdx @@ -4,7 +4,9 @@ description: ServerBee 如何把账单 + Agent 指标转成 burn rate、资源 icon: CircleDollarSign --- -ServerBee 会把管理员录入的 `price` / `billing_cycle` / `currency` / `expired_at` 等账单信息,结合 Agent 上报的资源容量、利用率和在线时长,自动算出一组成本相关的衍生指标。这里只描述算法本身;如何在 UI 中录入账单字段见[管理员手册的"计费信息"小节](/cn/docs/admin#计费信息)。 +ServerBee 会把管理员录入的 `price` / `billing_cycle` / `currency` / `expired_at` 等账单信息,结合 Agent 上报的资源容量、利用率和在线时长,自动算出一组成本信号:burn rate、各资源单价以及 0–100 的价值评分。这些信号会显示在服务器列表、仪表盘服务器卡片,以及服务器详情页的成本面板中。 + +本页描述评分算法本身;如何在 UI 中录入账单字段见[管理员手册的"计费信息"小节](/cn/docs/admin#计费信息)。 ## 衍生输出 diff --git a/apps/docs/content/docs/cn/index.mdx b/apps/docs/content/docs/cn/index.mdx index 79c22493..95d79759 100644 --- a/apps/docs/content/docs/cn/index.mdx +++ b/apps/docs/content/docs/cn/index.mdx @@ -45,6 +45,10 @@ Agent 通过 WebSocket 与 Server 保持长连接,实现实时数据推送。 录入服务器的价格和计费周期后,ServerBee 会基于资源、利用率和在线时长自动计算每台 VPS 的价值评分(excellent / good / okay / poor / waste),并把月度等价成本、burn 速率、剩余天数等信号同时呈现在服务器列表、仪表盘 server card 以及服务器详情页的成本洞察面板。 +### OAuth 与移动端 + +支持通过 GitHub、Google 或任意通用 OIDC 提供商登录。iOS 客户端支持移动端会话、设备配对和推送通知。 + ## 技术栈 | 组件 | 技术选型 | @@ -63,6 +67,16 @@ Agent 通过 WebSocket 与 Server 保持长连接,实现实时数据推送。 - **单二进制部署**:Server 将前端静态资源通过 `rust-embed` 嵌入二进制文件,部署时只需一个可执行文件 - **零依赖**:无需安装 MySQL、Redis 等外部服务,SQLite 内嵌于程序中 +## 工作原理 + +ServerBee 采用 hub-and-spoke(中心辐射)架构: + +1. **Server** 是中心节点:提供 Web 界面和 REST API,管理所有 WebSocket 连接,运行后台任务。 +2. **Agent** 跑在每台被监控的 VPS 上,每隔几秒采集并上报指标。 +3. 
**前端**是嵌入 Server 二进制中的 React SPA,无需额外部署。 + +Agent 与 Server 之间的通信全部走 WebSocket(JSON 消息)。终端会话和部分流式功能使用专用的 WebSocket 路由(二进制帧)。 + ## 快速链接 diff --git a/apps/docs/content/docs/cn/server.mdx b/apps/docs/content/docs/cn/server.mdx index 3b2c74e6..874f2dda 100644 --- a/apps/docs/content/docs/cn/server.mdx +++ b/apps/docs/content/docs/cn/server.mdx @@ -60,123 +60,113 @@ cargo build --release -p serverbee-server ## 配置文件 -手动运行的 Server 默认读取 `/etc/serverbee/server.toml`;文件不存在时使用内置默认值。 +Server 按以下顺序读取 TOML 配置,靠后的来源会覆盖靠前的: + +1. `/etc/serverbee/server.toml`(系统级) +2. `server.toml`(工作目录) +3. 以 `SERVERBEE_` 为前缀的环境变量 通过安装脚本部署时,配置文件位于 `/opt/serverbee/etc/server.toml`,数据目录为 `/opt/serverbee/data/`(脚本会显式写入这些路径,覆盖下方的内置默认值)。`/etc/serverbee`、`/var/lib/serverbee` 是旧版布局,仅在历史安装中出现,脚本会自动迁移。 -```toml title="/etc/serverbee/server.toml" +下面的 `server.toml` 列出最常用的配置项及其默认值。完整选项见[配置参考](/cn/docs/configuration)。 + +```toml [server] -listen = "0.0.0.0:9527" -data_dir = "/var/lib/serverbee" +listen = "0.0.0.0:9527" # 监听地址和端口 +data_dir = "./data" # 数据库及其他数据文件的存储目录 +trusted_proxies = [] # 默认为内网/回环 CIDR;设为 [] 表示禁用 [database] -path = "serverbee.db" -max_connections = 10 +path = "serverbee.db" # 数据库文件名(相对于 data_dir) +max_connections = 10 # SQLite 连接池最大连接数 [auth] -session_ttl = 86400 -max_servers = 0 - -[rate_limit] -login_max = 5 -register_max = 3 +session_ttl = 86400 # Session 过期时间,单位秒(默认 24 小时) +max_servers = 0 # 新接入服务器的软上限(0 表示不限制) +secure_cookie = true # 是否给 Session Cookie 设置 Secure 标记(纯 HTTP 本地调试需关闭) [retention] -records_days = 7 -records_hourly_days = 90 -gpu_records_days = 7 -ping_records_days = 7 -audit_logs_days = 180 - -[geoip] -mmdb_path = "" # 路径非空即启用 GeoIP - -[log] -level = "info" -file = "" -``` - -### 配置项说明 - -#### `[server]` 基础配置 - -| 配置项 | 类型 | 默认值 | 说明 | -|--------|------|--------|------| -| `listen` | string | `"0.0.0.0:9527"` | 监听地址和端口 | -| `data_dir` | string | `"/var/lib/serverbee"` | 数据存储目录(数据库文件、日志等) | - -#### `[database]` 数据库配置 - -| 配置项 | 类型 | 默认值 | 说明 | 
-|--------|------|--------|------| -| `path` | string | `"serverbee.db"` | 数据库文件名,相对于 `data_dir` | -| `max_connections` | int | `10` | 连接池最大连接数 | - -#### `[auth]` 认证配置 +records_days = 7 # 分钟级指标保留天数 +records_hourly_days = 90 # 小时级聚合指标保留天数 +gpu_records_days = 7 # GPU 指标保留天数 +ping_records_days = 7 # Ping 探测记录保留天数 +audit_logs_days = 180 # 审计日志保留天数 +network_probe_days = 7 # 网络探测记录保留天数 +network_probe_hourly_days = 90 # 小时级网络探测聚合保留天数 +traffic_hourly_days = 7 # 流量小时记录保留天数 +traffic_daily_days = 400 # 流量日记录保留天数 +task_results_days = 7 # 任务执行结果保留天数 +docker_events_days = 7 # Docker 事件记录保留天数 +service_monitor_days = 30 # 服务监控记录保留天数 -| 配置项 | 类型 | 默认值 | 说明 | -|--------|------|--------|------| -| `session_ttl` | int | `86400` | Session 过期时间,单位秒(默认 24 小时) | -| `max_servers` | int | `0` | 通过注册码接入允许创建的新服务器软上限(0 表示不限制) | +[rate_limit] +login_max = 5 # 每个时间窗口内最大登录尝试次数 +register_max = 3 # 每个时间窗口内最大 Agent 注册尝试次数 -#### `[rate_limit]` 速率限制 +[scheduler] +timezone = "UTC" # 每日流量聚合所用时区(如 Asia/Shanghai) -| 配置项 | 类型 | 默认值 | 说明 | -|--------|------|--------|------| -| `login_max` | int | `5` | 登录接口每 IP 每分钟最大请求数 | -| `register_max` | int | `3` | Agent 注册接口每 IP 每分钟最大请求数 | +[log] +level = "info" # 日志级别:trace / debug / info / warn / error +file = "" # 日志文件路径(留空仅输出到 stdout) -#### `[retention]` 数据保留策略 +[upgrade] +release_base_url = "https://github.com/ZingerLittleBee/ServerBee/releases" # Agent 升级的基础 URL -| 配置项 | 类型 | 默认值 | 说明 | -|--------|------|--------|------| -| `records_days` | int | `7` | 分钟级指标保留天数 | -| `records_hourly_days` | int | `90` | 小时级指标保留天数 | -| `gpu_records_days` | int | `7` | GPU 指标保留天数 | -| `ping_records_days` | int | `7` | Ping 记录保留天数 | -| `audit_logs_days` | int | `180` | 审计日志保留天数 | +[geoip] +mmdb_path = "" # MaxMind GeoLite2-City.mmdb 路径(非空即启用 GeoIP) -#### `[geoip]` 地理位置 +[oauth] +base_url = "" # ServerBee 实例的公网 URL +allow_registration = false # 是否允许首次 OAuth 登录时自动创建用户 -| 配置项 | 类型 | 默认值 | 说明 | -|--------|------|--------|------| -| `mmdb_path` | string | `""` | MaxMind MMDB 
数据库文件路径,路径非空即启用 GeoIP | +[oauth.github] +client_id = "" +client_secret = "" -#### `[log]` 日志配置 +[oauth.google] +client_id = "" +client_secret = "" -| 配置项 | 类型 | 默认值 | 说明 | -|--------|------|--------|------| -| `level` | string | `"info"` | 日志级别:`trace`/`debug`/`info`/`warn`/`error` | -| `file` | string | `""` | 日志文件路径,留空输出到 stdout | +[oauth.oidc] +issuer_url = "" +client_id = "" +client_secret = "" +scopes = ["openid", "email", "profile"] +``` ## 环境变量 -所有配置项都可以通过环境变量覆盖。环境变量使用 `SERVERBEE_` 前缀,层级用 `__`(双下划线)分隔。 +每个配置项都可以通过环境变量设置:前缀 `SERVERBEE_`,层级用 `__`(双下划线)分隔。环境变量的优先级高于配置文件。 ```bash -# 等同于 server.listen = "0.0.0.0:8080" -export SERVERBEE_SERVER__LISTEN="0.0.0.0:8080" +# server.listen +export SERVERBEE_SERVER__LISTEN="0.0.0.0:9527" -# 等同于 retention.records_days = 14 +# retention.records_days export SERVERBEE_RETENTION__RECORDS_DAYS=14 +# oauth.github.client_id +export SERVERBEE_OAUTH__GITHUB__CLIENT_ID="your-github-client-id" + # geoip.mmdb_path(路径非空即启用 GeoIP) export SERVERBEE_GEOIP__MMDB_PATH="/path/to/GeoLite2-City.mmdb" ``` -环境变量的优先级高于配置文件。 - ## 数据库 -ServerBee 使用 SQLite 作为数据存储,选择 SQLite 的原因是部署简单、无需额外服务、适合轻量级 VPS 场景。 +ServerBee 使用 SQLite 存储所有持久化数据。首次启动时会自动在 `data_dir` 下创建数据库文件。启动时自动运行数据库迁移,无需手动维护表结构。 + +以下 SQLite pragma 会自动设置: -- **自动创建**:首次启动时自动在 `data_dir` 下创建数据库文件 -- **自动迁移**:启动时自动运行数据库迁移,无需手动操作 -- **WAL 模式**:自动启用 WAL (Write-Ahead Logging) 模式,提高并发读性能 -- **Busy Timeout**:设为 5000ms,多个后台任务并发写入时自动等待锁释放 -- **同步模式**:使用 `PRAGMA synchronous=NORMAL`,在 WAL 模式下兼顾安全性和性能 +| Pragma | 取值 | 作用 | +|--------|------|------| +| `journal_mode` | WAL | 提高并发读性能 | +| `synchronous` | NORMAL | 兼顾安全性与速度 | +| `busy_timeout` | 5000ms | 数据库被锁时最多等待 5 秒 | +| `foreign_keys` | ON | 强制外键引用完整性 | ## 初始管理员账户 @@ -196,21 +186,6 @@ Server 首次启动时,如果 `users` 表为空,会自动创建管理员账 自动生成的密码只在日志中显示一次。请在日志轮转前记录下来,并在首次登录时完成强制改密,再将 Server 暴露到公网。 -## GeoIP 设置 - -ServerBee 支持通过 MaxMind MMDB 数据库查询 Agent 的地理位置信息。 - -1. 从 [MaxMind](https://www.maxmind.com/) 下载 GeoLite2-City.mmdb 文件 -2. 将文件放置到服务器上的某个路径 -3. 
在配置中启用: - -```toml -[geoip] -mmdb_path = "/var/lib/serverbee/GeoLite2-City.mmdb" -``` - -启用后,Agent 连接时 Server 会自动查询其 IP 对应的地理位置,并在服务器信息中显示所在地区和国家。 - ## 一次性注册码(Enrollment Code) Agent 首次注册通过一次性注册码完成。由管理员在 Web 管理面板的「设置」页面生成(或调用 `POST /api/agent/enrollments`,仅管理员可用,返回 `{ id, code, expires_at }`)。注册码具有以下特性: @@ -236,33 +211,71 @@ Agent 首次注册通过一次性注册码完成。由管理员在 Web 管理面 这个恢复流程用于“同一台逻辑机器重装后重新接回”。它不适用于任意两条服务器记录的合并,也不适用于迁移到完全不同的硬件主机。 +## GeoIP 设置 + +启用 Agent 地理位置识别: + +1. 下载免费的 [MaxMind GeoLite2-City](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data) MMDB 数据库(需要免费的 MaxMind 账号) +2. 将 `GeoLite2-City.mmdb` 文件放到服务器上的某个可访问路径 +3. 在配置中启用: + +```toml +[geoip] +mmdb_path = "/path/to/GeoLite2-City.mmdb" +``` + +启用后,Agent 连接时 Server 会自动把其 IP 解析为所在地区和国家代码。 + ## OAuth 设置 -ServerBee 支持通过 OAuth 第三方登录,目前支持以下提供商: +ServerBee 支持通过 GitHub、Google 以及任意 OpenID Connect 提供商进行第三方登录。 + +### 前置条件 + +把 `oauth.base_url` 设为 ServerBee 实例的公网 URL,用于拼接回调地址: -- **GitHub** -- **Google** -- **通用 OIDC**(OpenID Connect) +```toml +[oauth] +base_url = "https://serverbee.example.com" +allow_registration = true # 设为 true 时,首次 OAuth 登录会自动创建账户 +``` -OAuth 配置示例: +### GitHub + +1. 前往 [GitHub Developer Settings](https://github.com/settings/developers) 创建一个 OAuth App +2. 回调地址设为 `https://serverbee.example.com/api/auth/oauth/github/callback` +3. 把凭据填入配置: ```toml [oauth.github] -client_id = "your_github_client_id" -client_secret = "your_github_client_secret" +client_id = "your-client-id" +client_secret = "your-client-secret" +``` +### Google + +1. 在 [Google Cloud Console](https://console.cloud.google.com/apis/credentials) 创建 OAuth 凭据 +2. 回调地址设为 `https://serverbee.example.com/api/auth/oauth/google/callback` +3. 
把凭据填入配置: + +```toml [oauth.google] -client_id = "your_google_client_id" -client_secret = "your_google_client_secret" +client_id = "your-client-id" +client_secret = "your-client-secret" +``` +### 通用 OIDC + +适用于任意 OpenID Connect 提供商(如 Keycloak、Authentik、Authelia): + +```toml [oauth.oidc] -issuer_url = "https://your-oidc-provider.com" -client_id = "your_client_id" -client_secret = "your_client_secret" +issuer_url = "https://auth.example.com/realms/main" +client_id = "serverbee" +client_secret = "your-client-secret" +scopes = ["openid", "email", "profile"] ``` -OAuth 回调地址格式为 `https://your-domain/api/auth/oauth/{provider}/callback`,请在对应平台的 OAuth 应用设置中配置。 - ## 反向代理 ServerBee 本身不处理 TLS,生产环境建议使用反向代理来提供 HTTPS 支持。 @@ -343,6 +356,21 @@ server_url = "https://monitor.example.com" 更多反向代理配置(含 Traefik)请参阅[部署指南](/cn/docs/deployment)。 +## 后台任务 + +Server 会自动运行多个后台任务: + +| 任务 | 间隔 | 作用 | +|------|------|------| +| RecordWriter | 60s | 把缓存的 Agent 上报写入数据库 | +| OfflineChecker | 10s | 检测停止上报的 Agent(30s 阈值) | +| Aggregator | 每小时 | 将原始记录聚合为小时级汇总 | +| Cleanup | 每小时 | 按保留策略清理过期记录 | +| SessionCleaner | 定期 | 清理过期的用户 Session | +| AlertEvaluator | 60s | 评估所有启用的告警规则 | + +所有任务自动启动,无需手动配置。 + diff --git a/apps/docs/content/docs/cn/service-monitors.mdx b/apps/docs/content/docs/cn/service-monitors.mdx index c0cf869d..97bb7ac3 100644 --- a/apps/docs/content/docs/cn/service-monitors.mdx +++ b/apps/docs/content/docs/cn/service-monitors.mdx @@ -4,9 +4,9 @@ description: 从 ServerBee 服务端监控 SSL 证书、DNS 记录、HTTP 关键 icon: Radar --- -服务监控(Service Monitors)是一类由 ServerBee 服务端主动执行的合成检查。它适合监控公开服务,即使这些服务并不运行在安装了 ServerBee Agent 的主机上也可以使用。 +服务监控(Service Monitors)是一类由 ServerBee 服务端主动执行的合成检查。即使目标服务没有运行在安装了 ServerBee Agent 的主机上,也能用它来监控公开服务。 -与 Ping 监控不同,Ping 监控由 Agent 从各节点发起探测;服务监控由中心 Server 进程执行。检查结果会写入 SQLite,在仪表盘中展示,也可以通过通知组发送告警。 +与 Ping 监控不同:Ping 监控由各节点的 Agent 发起探测,而服务监控由中心 Server 进程执行。检查结果会写入 SQLite、在仪表盘中展示,并可通过通知组触发告警。 ## 支持的监控类型 diff --git a/apps/docs/content/docs/cn/status-page.mdx 
b/apps/docs/content/docs/cn/status-page.mdx index 133ed842..92729812 100644 --- a/apps/docs/content/docs/cn/status-page.mdx +++ b/apps/docs/content/docs/cn/status-page.mdx @@ -1,88 +1,101 @@ --- title: 公开状态页 -description: 发布包含事件公告、维护窗口和可用性历史的公开健康状态页。 +description: 发布包含实时指标、事件公告、维护窗口和可用性历史的公开健康状态页。 icon: Globe --- -ServerBee 提供两种公开状态页: +ServerBee 提供一个公开状态页,地址为 `https://your-server/status`。它无需登录即可访问,方便你向用户或相关方公示服务健康状态。 -- **默认状态页:** `https://your-server/status`,由 `GET /api/status` 提供数据,展示所有非隐藏服务器。 -- **可配置状态页:** `https://your-server/status/{slug}`,由 `GET /api/status/{slug}` 提供数据,展示管理员选择的服务器和配置项。 +页面展示管理员选定的服务器和模块,数据来自公开端点 `GET /api/status/config` 和 `GET /api/status`。 -两者均为公开访问,不需要登录认证。 +## 页面展示内容 -## 默认 `/status` 页面 +- 在线/总服务器数量。 +- 每台选中服务器的在线/离线状态和分组标签。 +- 在线服务器的实时指标:CPU、内存、Swap、磁盘、磁盘 I/O、网络速率/流量、负载、连接数、uptime。 +- 每台服务器的 90 天可用性时间线。 +- 已配置的公开备注。 +- 可选模块:服务器详情、网络质量、IP 质量、事件公告、维护窗口。 -默认状态页适合不创建自定义页面时快速公开服务器概览。 +IP 地址、主机名、网卡等敏感标识会在 API 层脱敏,不会出现在公开页面上。 -它展示: +## 配置状态页 -- 在线/总服务器数量 -- 所有 `hidden = false` 的服务器 -- 服务器分组标签 -- 在线/离线状态 -- 在线服务器的实时指标:CPU、内存、磁盘、网络速率/流量、uptime、负载 -- 已配置的公开备注 - -公开 API: - -```http -GET /api/status -``` - -## 可配置状态页 - -在 **Settings → Status Pages** 中创建和管理状态页。每个页面都有独立 slug,可以单独分享。 - -### 页面设置 +在 **Settings → Status Page** 中配置。状态页是单例:只有一个页面,原地编辑,而非按 slug 创建多个。 | 设置 | API 字段 | 说明 | |------|----------|------| +| 启用 | `enabled` | 禁用后公开页面不返回数据 | | 标题 | `title` | 公开页面标题 | -| Slug | `slug` | `/status/{slug}` 的 URL 片段 | | 描述 | `description` | 可选介绍文本 | -| 服务器 | `server_ids_json` | 此页面展示的服务器 | -| 按服务器分组 | `group_by_server_group` | 按 ServerBee 分组组织服务器 | -| 显示数值 | `show_values` | 在公开页显示可用性/状态数值 | -| 自定义 CSS | `custom_css` | 应用于该页面的额外 CSS | -| 启用 | `enabled` | 禁用后页面返回 404 | +| 服务器 | `server_ids` | 页面展示的服务器 | +| 默认布局 | `default_layout` | `list` 或 `grid` | +| 显示服务器详情 | `show_server_detail` | 允许下钻查看单台服务器详情 | +| 显示网络 | `show_network` | 显示网络质量模块 | +| 显示 IP 质量 | `show_ip_quality` | 显示 IP 质量模块 | +| 显示事件 | `show_incidents` | 显示事件公告模块 | +| 显示维护 | `show_maintenance` | 
显示维护窗口模块 | | 黄色可用性阈值 | `uptime_yellow_threshold` | 低于该百分比的日期显示为降级 | | 红色可用性阈值 | `uptime_red_threshold` | 低于该百分比的日期显示为严重故障 | - -当前 API 请求字段使用 `server_ids_json` 和 `status_page_ids_json` 表示选择的 ID。这些字段在请求体中接受 JSON 数组。 - +### 管理 API + +| 方法 | 路径 | 说明 | +|------|------|------| +| GET | `/api/status-page` | 读取状态页配置 | +| PUT | `/api/status-page` | 更新状态页配置 | -### 公开页面数据 +更新示例: -```http -GET /api/status/{slug} +```json +{ + "title": "Production Status", + "description": "Public health for production services", + "server_ids": ["server-id-1", "server-id-2"], + "default_layout": "grid", + "show_server_detail": true, + "show_network": true, + "show_incidents": true, + "show_maintenance": true, + "enabled": true, + "uptime_yellow_threshold": 99.9, + "uptime_red_threshold": 95 +} ``` -响应包含: +## 公开 API -- `page` -- 页面元数据和显示选项 -- `servers` -- 选中服务器状态、可用性百分比和 90 天每日可用性数据 -- `active_incidents` -- 关联到页面且尚未解决的事件 -- `planned_maintenances` -- 关联到页面的活动/计划维护窗口 -- `recent_incidents` -- 最近历史中的已解决事件 +以下端点无需认证,是公开页面的数据来源: -服务器条目包含 `server_id`、`server_name`、地区/国家、操作系统、分组、`online`、`uptime_percent`、`uptime_daily` 和 `in_maintenance`。 +| 方法 | 路径 | 说明 | +|------|------|------| +| GET | `/api/status/config` | 页面元数据和显示选项 | +| GET | `/api/status` | 选中服务器的状态、实时指标和可用性 | +| GET | `/api/status/servers/{id}` | 单台服务器详情 | +| GET | `/api/status/servers/{id}/metrics` | 单台服务器的时序指标 | +| GET | `/api/status/servers/{id}/uptime-daily` | 单台服务器的 90 天每日可用性 | +| GET | `/api/status/network` | 网络质量概览 | +| GET | `/api/status/network/{id}` | 单台服务器网络质量详情 | +| GET | `/api/status/ip-quality` | IP 质量概览 | +| GET | `/api/status/incidents` | 活动和近期事件 | +| GET | `/api/status/maintenances` | 活动和计划中的维护窗口 | + +每个服务器条目包含 `id`、`name`、`group_name`、地区/国家、`os`、`online`、`in_maintenance`、`uptime_percent` 和 `uptime_daily`。 ## 可用性时间线 -可配置状态页上的每台服务器都可以展示 90 天可用性时间线。每根条代表一天: +每台服务器展示 90 天可用性时间线,每根条代表一天: -- **绿色** -- 健康可用性 -- **黄色** -- 低于该页面黄色阈值 -- **红色** -- 低于该页面红色阈值 -- **灰色** -- 无数据 +- **绿色** -- 健康可用性。 +- **黄色** -- 低于黄色阈值。 +- **红色** -- 低于红色阈值。 
+- **灰色** -- 无数据。 可用性数据来自 `uptime_daily` 表,由服务端后台聚合任务生成。缺失日期会自动补齐,保证时间线连续。 ## 事件公告(Incidents) -事件公告用于公开说明故障或服务降级。事件可以关联到指定服务器、状态页或两者。 +事件公告用于公开说明故障或服务降级,可选关联到指定服务器。 ### 字段 @@ -92,9 +105,9 @@ GET /api/status/{slug} | `status` | `investigating`、`identified`、`monitoring` 或 `resolved` | | `severity` | `minor`、`major` 或 `critical` | | `server_ids_json` | 可选,受影响服务器 | -| `status_page_ids_json` | 可选,受影响状态页 | +| `is_public` | 是否在公开状态页展示 | -一个事件可以包含多条 update。添加 update 会记录消息,并把事件状态更新为该 update 的状态。状态变为 `resolved` 时会设置 `resolved_at`。 +一个事件可以包含多条 update,每条 update 有自己的 `status` 和 `message`。添加 update 会记录消息,并把事件状态更新为该 update 的状态。状态变为 `resolved` 时会设置 `resolved_at`。 ### API @@ -108,7 +121,7 @@ GET /api/status/{slug} ## 维护窗口 -维护窗口用于公告计划维护,并在活动期间抑制相关服务器的通知。 +维护窗口用于公告计划维护,并在活动期间抑制相关服务器的告警通知。 ### 字段 @@ -119,7 +132,7 @@ GET /api/status/{slug} | `start_at` | UTC 开始时间 | | `end_at` | UTC 结束时间,必须晚于 `start_at` | | `server_ids_json` | 可选,受影响服务器 | -| `status_page_ids_json` | 可选,受影响状态页 | +| `is_public` | 是否在公开状态页展示 | | `active` | 是否启用该维护窗口 | ### API @@ -131,33 +144,8 @@ GET /api/status/{slug} | PUT | `/api/maintenances/{id}` | 更新维护窗口 | | DELETE | `/api/maintenances/{id}` | 删除维护窗口 | -## 管理 API - -| 方法 | 路径 | 说明 | -|------|------|------| -| GET | `/api/status-pages` | 列出已配置状态页 | -| POST | `/api/status-pages` | 创建状态页 | -| PUT | `/api/status-pages/{id}` | 更新状态页 | -| DELETE | `/api/status-pages/{id}` | 删除状态页 | - -创建示例: - -```json -{ - "title": "Production Status", - "slug": "production", - "description": "Public health for production services", - "server_ids_json": ["server-id-1", "server-id-2"], - "group_by_server_group": true, - "show_values": true, - "enabled": true, - "uptime_yellow_threshold": 99.9, - "uptime_red_threshold": 95 -} -``` - - + diff --git a/apps/docs/content/docs/en/agent.mdx b/apps/docs/content/docs/en/agent.mdx index 9c9d2745..5ed81f19 100644 --- a/apps/docs/content/docs/en/agent.mdx +++ b/apps/docs/content/docs/en/agent.mdx @@ -58,6 +58,9 @@ chmod +x serverbee-agent ```bash cargo 
build --release -p serverbee-agent + +# With NVIDIA GPU monitoring (optional) +cargo build --release -p serverbee-agent --features gpu ``` The binary is located at `target/release/serverbee-agent`. @@ -213,24 +216,35 @@ This policy is local to the agent process. The server can only further restrict ## GPU Monitoring -GPU monitoring is disabled by default because it requires `nvidia-smi` to be available on the system. To enable it: - -1. Ensure NVIDIA drivers and `nvidia-smi` are installed -2. Set `enable_gpu = true` in the agent config - -```toml -[collector] -enable_gpu = true -``` - -When enabled, the agent collects per-GPU metrics: -- Device name -- Memory total and used -- GPU utilization percentage -- GPU temperature +NVIDIA GPU monitoring is disabled by default and requires all three of the following: + +1. **Build time** -- compile the agent with the `gpu` feature flag (the prebuilt release binaries do not include it): + ```bash + cargo build --release -p serverbee-agent --features gpu + ``` +2. **Runtime** -- NVIDIA drivers and the NVML library installed on the host +3. **Config** -- set `enable_gpu = true`: + ```toml + [collector] + enable_gpu = true + ``` + +When enabled, the agent collects per-GPU metrics for every device: + +| Metric | Description | +|--------|-------------| +| Device name | GPU model | +| Memory total | Total VRAM | +| Memory used | VRAM currently in use | +| GPU utilization | Compute utilization percentage | +| GPU temperature | Current temperature | These metrics appear in the server dashboard and can be used in alert rules. + +Only NVIDIA GPUs are supported (via the `nvml-wrapper` library). AMD and Intel GPU support is planned for a future release. + + ## Running as a Systemd Service For production deployments, run the agent as a systemd service so it starts automatically on boot. 
@@ -278,27 +292,47 @@ journalctl -u serverbee-agent -f The agent needs to run as root to access all system metrics (temperature sensors, process lists, etc.) and to open PTY sessions for the web terminal. If you do not need terminal access, you can run it as a non-root user, though some metrics may be unavailable. +## Platform Support + +| Platform | Support level | Notes | +|----------|---------------|-------| +| Linux (amd64/arm64) | Full | Primary target; all features available | +| macOS (amd64/arm64) | Full | Suitable for development and testing | +| Windows (amd64) | Basic | TCP/UDP connection counts use a different code path | +| FreeBSD | Basic | `sysinfo` support for FreeBSD is limited | + ## Auto-Update The server can push upgrade commands to connected agents. When an upgrade is triggered: 1. The server sends an `Upgrade` message with a download URL and version 2. The agent downloads the new binary -3. Verifies the SHA-256 checksum (if provided via `x-checksum-sha256` header) +3. Verifies the SHA-256 checksum (if provided via the `x-checksum-sha256` header) 4. Backs up the current binary (`.bak` extension) 5. Replaces itself with the new binary 6. Restarts automatically -This process is transparent and requires no manual intervention on the agent side. +Trigger an upgrade from the server detail page in the dashboard, or via the API: + +```bash +curl -X POST https://your-server/api/servers/{id}/upgrade \ + -H "Cookie: session=..." \ + -H "Content-Type: application/json" \ + -d '{"url": "https://github.com/.../releases/download/v1.2.0/serverbee-agent", "version": "1.2.0"}' +``` + + +Auto-update requires the agent to have the `upgrade` capability (`CAP_UPGRADE`), which is enabled by default for newly registered agents. An admin can disable it under Settings → Capabilities. + ## Connection Behavior The agent maintains a persistent WebSocket connection to the server. 
If the connection drops: -- It reconnects using **exponential backoff** starting at 1 second -- The maximum backoff interval is 30 seconds -- A random jitter of +/-20% is applied to avoid thundering herd reconnections -- Upon successful reconnection, the backoff resets to 1 second +- It reconnects using **exponential backoff** starting at 1 second (1s → 2s → 4s → 8s → 16s → 30s cap) +- A random jitter of +/-20% is applied to avoid thundering-herd reconnections +- Upon successful reconnection, the backoff resets to 1 second and the agent re-sends its `SystemInfo` +- **Heartbeat**: the server pings every 30 seconds and the agent replies with a pong; an agent that stops reporting for more than 30 seconds is marked offline During the initial connection: @@ -331,3 +365,9 @@ During the initial connection: ## Resource Footprint The Agent has negligible CPU cost (<1%) and a steady-state memory footprint in the tens of MB. For the full measured Agent and Server CPU/memory/disk/network data, see [Resource Usage](/en/docs/resource-usage). + + + + + + diff --git a/apps/docs/content/docs/en/alerts.mdx b/apps/docs/content/docs/en/alerts.mdx index a80c889d..e1f47b3b 100644 --- a/apps/docs/content/docs/en/alerts.mdx +++ b/apps/docs/content/docs/en/alerts.mdx @@ -4,28 +4,29 @@ description: Configure alert rules with threshold monitoring and multi-channel n icon: Bell --- -ServerBee includes a flexible alerting system that evaluates metric thresholds and sends notifications through multiple channels when conditions are met. +ServerBee includes a flexible alerting system that evaluates metric thresholds and security events, then sends notifications through multiple channels when conditions are met. ## How Alerts Work -The alert system runs as a background task that evaluates all enabled rules every 60 seconds: +A background task evaluates all enabled rules every 60 seconds: -1. For each enabled alert rule, the server resolves which servers are covered -2. 
For each covered server, it checks whether the rule's conditions are met -3. If a rule triggers, a notification is sent (subject to debounce logic) -4. If a previously triggered rule recovers, the alert state is cleared +1. Resolve which servers each enabled rule covers. +2. Check whether the rule's conditions are met for each covered server. +3. Send a notification when a rule triggers (subject to debounce). +4. Clear the alert state when a previously triggered rule recovers. -All alert state is persisted in the database and survives server restarts. +Alert state is persisted in the database and survives server restarts. Event-driven rules (IP change, SSH login, brute-force, port scan) are evaluated when an agent reports the event rather than on the 60-second cycle. ## Creating Alert Rules An alert rule consists of: -- **Name** -- A descriptive label for the rule -- **Rules** -- One or more metric conditions (all must be true simultaneously -- AND logic) -- **Trigger mode** -- `always` (repeat with debounce) or `once` (notify only on first trigger) -- **Cover type** -- Which servers this rule applies to -- **Notification group** -- Where to send notifications +- **Name** -- A descriptive label. +- **Rules** -- One or more metric conditions, all of which must be true simultaneously (AND logic). +- **Trigger mode** -- `always` (repeat with debounce) or `once` (notify only on the first trigger). +- **Cover type** -- Which servers the rule applies to. +- **Notification group** -- Where notifications are sent. +- **Block source IP** -- Optional. For security-event rules, automatically instructs the agent's firewall to block the offending source IP when the rule triggers. ### Supported Metric Types @@ -111,11 +112,11 @@ Each rule item supports these fields: ## Sampling and Trigger Logic -For resource threshold alerts (CPU, memory, disk, load, etc.), ServerBee does not trigger on a single spike. 
Instead: +For resource threshold alerts (CPU, memory, disk, load, etc.), ServerBee does not trigger on a single spike: -1. The evaluator looks at all raw metric records from the **last 10 minutes** -2. It counts how many records exceed the threshold -3. The alert triggers only if **70% or more** of the samples exceed the threshold +1. The evaluator reads all raw metric records from the **last 10 minutes**. +2. It counts how many records exceed the threshold. +3. The alert triggers only if **70% or more** of the samples exceed the threshold. This prevents false positives from brief, transient spikes. @@ -138,22 +139,26 @@ Each alert rule specifies which servers it applies to: ### `once` Mode -- Sends a notification only on the **first trigger** -- No further notifications are sent until the condition recovers and triggers again +- Sends a notification only on the **first trigger**. +- No further notifications are sent until the condition recovers and triggers again. + +Because alert state is persisted, a `once` rule that is already triggered will not re-fire after a server restart. ### Recovery -When a previously triggered alert recovers (the condition is no longer met), the alert state is cleared in both the in-memory cache and the database. If the condition triggers again later, notifications will fire according to the trigger mode. +When a previously triggered alert recovers (the condition is no longer met), the alert state is cleared in both the in-memory cache and the database. If a `recover_trigger_tasks` command is configured, it runs at this point. If the condition triggers again later, notifications fire according to the trigger mode. ## Maintenance Suppression -Alert notifications are suppressed while the affected server is in an active maintenance window. The rule evaluation still runs, but ServerBee skips notification delivery for that server until the maintenance window ends. 
+Alert notifications are suppressed while the affected server is in an active maintenance window. Rule evaluation still runs, but ServerBee skips notification delivery for that server until the window ends. Event-driven rules follow the same cover-type and maintenance suppression rules as polling-based rules. + +## Blocking Source IPs -`ip_changed` is event-driven rather than polling-based. It is evaluated when an agent reports an IP change event, and it follows the same cover-type and maintenance suppression rules as other alerts. +Security-event rules (`ssh_brute_force_detected`, `port_scan_detected`) can enable **Block source IP**. When the rule triggers, ServerBee tells the affected agent's firewall to block the offending IP, turning detection into automatic mitigation. This requires the `CAP_FIREWALL_BLOCK` capability on the server. See [Security Events](/en/docs/security-events) and [Firewall](/en/docs/firewall). ## Notification Channels -ServerBee supports five notification channel types. Each channel is configured as a separate entity that can be reused across multiple notification groups. +ServerBee supports five notification channel types. Each channel is a separate entity that can be reused across multiple notification groups. ### Webhook @@ -236,9 +241,15 @@ Notification channels are organized into **groups**. An alert rule is linked to This allows you to: -- Send the same alert to multiple channels (e.g., Telegram + Email) -- Reuse channel configurations across different alert rules -- Enable/disable individual channels without modifying alert rules +- Send the same alert to multiple channels (e.g., Telegram + Email). +- Reuse channel configurations across different alert rules. +- Enable or disable individual channels without modifying alert rules. 
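The group behaviour described above can be illustrated with a minimal sketch. It is not ServerBee's implementation: `Channel`, `NotificationGroup`, and `dispatch` are hypothetical names, and the only behaviour taken from the text is that a rule targets one group and that individual channels can be enabled or disabled without touching the rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Channel:
    # Hypothetical stand-in for a configured channel (Telegram, Email, ...).
    name: str
    send: Callable[[str], None]
    enabled: bool = True


@dataclass
class NotificationGroup:
    # Hypothetical stand-in for a notification group linked to an alert rule.
    name: str
    channels: List[Channel] = field(default_factory=list)

    def dispatch(self, message: str) -> List[str]:
        """Send the message through every enabled channel and return their names."""
        delivered = []
        for channel in self.channels:
            if not channel.enabled:
                continue  # a disabled channel is skipped; the rule itself is unchanged
            channel.send(message)
            delivered.append(channel.name)
        return delivered


# Record deliveries in a list instead of calling real services.
sent = []
group = NotificationGroup(
    "critical",
    [
        Channel("telegram", lambda m: sent.append(("telegram", m))),
        Channel("email", lambda m: sent.append(("email", m)), enabled=False),
    ],
)
delivered = group.dispatch("[ServerBee] web-1 triggered")
print(delivered)
```

Because the rule only references the group, re-enabling the email channel changes where alerts go without editing any alert rule.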
+ +After creating a channel, verify its configuration with the test endpoint: + +``` +POST /api/notifications/:id/test +``` ## Template Variables @@ -263,6 +274,14 @@ The default notification template is: Time: {{time}} ``` +## Offline Detection + +Offline status is determined by a dedicated background task and works together with `offline` rules: + +- The task scans agent connection state every 10 seconds. +- A server is marked offline when its last report is more than 30 seconds old. +- The `offline` rule's `duration` sets how many seconds a server must stay offline before the alert fires. For example, `{ "rule_type": "offline", "duration": 60 }` triggers after 60 seconds offline. + ## Example: Complete Alert Setup Here is a typical setup for monitoring CPU usage across all servers with Telegram notifications: @@ -277,3 +296,9 @@ Here is a typical setup for monitoring CPU usage across all servers with Telegra - Notification group: (the group you created) Now, when any server's CPU stays above 90% for at least 70% of a 10-minute sampling window, you will receive a Telegram message. Subsequent notifications are debounced to once every 5 minutes while the condition persists. + + + + + + diff --git a/apps/docs/content/docs/en/architecture.mdx b/apps/docs/content/docs/en/architecture.mdx index 301519df..0fb89f0c 100644 --- a/apps/docs/content/docs/en/architecture.mdx +++ b/apps/docs/content/docs/en/architecture.mdx @@ -90,7 +90,7 @@ The built frontend is embedded into the server binary at compile time, so there ## Communication Protocol -All communication uses WebSocket with JSON-encoded messages. Terminal data uses base64 encoding within the JSON frames. +All communication uses WebSocket. Control and metric messages are JSON text frames; terminal data uses dedicated binary frames to minimize latency. The current protocol version is `4`, sent to the agent in the `Welcome` message on connect. 
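As a rough illustration of the version handshake, the sketch below shows how an agent-side client might validate a `Welcome` message. The field names `server_id`, `protocol_version`, and `report_interval` come from the handshake diagram in this patch. The flat JSON layout, the strict equality check, and the `handle_welcome` helper are assumptions made for the example, not the actual wire format or compatibility policy.

```python
import json

# Protocol version quoted in the text above.
SUPPORTED_PROTOCOL_VERSION = 4


def handle_welcome(raw_frame: str) -> dict:
    """Parse a Welcome text frame and reject an unexpected protocol version.

    Assumes a flat JSON object; the real message may be wrapped or tagged differently.
    """
    message = json.loads(raw_frame)
    version = message["protocol_version"]
    if version != SUPPORTED_PROTOCOL_VERSION:
        raise ValueError(
            f"unsupported protocol version {version}, "
            f"expected {SUPPORTED_PROTOCOL_VERSION}"
        )
    # Keep only what the agent needs for the rest of the session.
    return {
        "server_id": message["server_id"],
        "report_interval": message["report_interval"],
    }


frame = json.dumps(
    {"server_id": "srv-1", "protocol_version": 4, "report_interval": 3}
)
session = handle_welcome(frame)
print(session)
```

Checking the version before any metrics are sent means a mismatched agent fails early instead of producing reports the server cannot decode.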
### Agent to Server (`AgentMessage`) @@ -142,7 +142,7 @@ All communication uses WebSocket with JSON-encoded messages. Terminal data uses ## Database Design -ServerBee uses SQLite with 25 tables across 5 categories: +ServerBee uses SQLite (WAL mode), with tables grouped into the following categories. The representative tables below cover the core schema: ### Authentication (4 tables) @@ -220,7 +220,7 @@ Two roles exist: | Role | Capabilities | |------|-------------| | `admin` | Full access: manage users, servers, alerts, notifications, terminal, settings | -| `user` | Read-only dashboard access (no terminal, no administrative actions) | +| `member` | Read-only dashboard access (no terminal, no administrative actions) | ### OAuth @@ -228,7 +228,7 @@ OAuth login is supported for GitHub, Google, and generic OIDC providers. The `al ### Rate Limiting -Login and agent registration endpoints are rate-limited to prevent brute-force attacks (default: 5 login attempts, 3 registration attempts per window). +Login and agent registration endpoints are rate-limited per IP to prevent brute-force attacks. Defaults are 5 login attempts and 10 agent registrations within a 15-minute window. Admins can clear an active window from Settings → Rate limits. ### TOTP @@ -244,7 +244,7 @@ ServerBee/ agent/ # Agent binary (collector, reporter, terminal) apps/ web/ # React frontend (Vite, TanStack, shadcn/ui) - fumadocs/ # Documentation site + docs/ # Documentation site (TanStack Start + Fumadocs) docs/ # Design documents and plans Cargo.toml # Workspace root ``` diff --git a/apps/docs/content/docs/en/configuration.mdx b/apps/docs/content/docs/en/configuration.mdx index 7a4a7081..cdf2d3c1 100644 --- a/apps/docs/content/docs/en/configuration.mdx +++ b/apps/docs/content/docs/en/configuration.mdx @@ -8,15 +8,19 @@ ServerBee uses [Figment](https://github.com/SergioBenitez/Figment) for configura ## Configuration Loading Priority -Configuration values are merged in the following order. 
Later sources override earlier ones: +Configuration values are merged from multiple sources. Later sources override earlier ones, so environment variables always win: -1. **TOML file (system):** `/etc/serverbee/server.toml` or `/etc/serverbee/agent.toml` -2. **TOML file (local):** `server.toml` or `agent.toml` in the working directory -3. **Runtime environment variables:** Prefixed with `SERVERBEE_`, using `__` (double underscore) as the nested key separator +1. Built-in defaults +2. `/etc/serverbee/server.toml` or `/etc/serverbee/agent.toml` +3. `/opt/serverbee/etc/server.toml` or `/opt/serverbee/etc/agent.toml` (the path used by the install script) +4. `server.toml` or `agent.toml` in the working directory +5. Environment variables prefixed with `SERVERBEE_` + +This lets you override any single value at runtime without editing the TOML file. ## Environment Variable Mapping -Every TOML runtime configuration key maps directly to an environment variable. Replace dots with `__` and prefix with `SERVERBEE_`: +Every TOML key maps directly to an environment variable: prefix with `SERVERBEE_`, uppercase the key, and replace each level of nesting with `__` (double underscore). For example, `auth.secure_cookie` becomes `SERVERBEE_AUTH__SECURE_COOKIE`. ## Developer Workflow Env Vars diff --git a/apps/docs/content/docs/en/cost-insights.mdx b/apps/docs/content/docs/en/cost-insights.mdx index 4feac86e..17664f83 100644 --- a/apps/docs/content/docs/en/cost-insights.mdx +++ b/apps/docs/content/docs/en/cost-insights.mdx @@ -4,7 +4,9 @@ description: How ServerBee turns billing fields plus agent metrics into burn rat icon: CircleDollarSign --- -ServerBee combines admin-entered billing fields (`price`, `billing_cycle`, `currency`, `expired_at`, ...) with the resource capacity, utilization, and uptime that each agent reports, then derives a set of cost-related signals. This page documents the algorithm itself. 
To learn how to enter billing data in the UI, see the [Billing Information section in the admin guide](/en/docs/admin#billing-information). +ServerBee combines admin-entered billing fields (`price`, `billing_cycle`, `currency`, `expired_at`, ...) with the resource capacity, utilization, and uptime each agent reports, then derives a set of cost signals: burn rate, per-resource unit cost, and a 0–100 value score. These appear in the server list, the dashboard server card, and the cost panel on the server detail page. + +This page documents the scoring algorithm. To learn how to enter billing data in the UI, see the [Billing Information section in the admin guide](/en/docs/admin#billing-information). ## Derived outputs diff --git a/apps/docs/content/docs/en/index.mdx b/apps/docs/content/docs/en/index.mdx index aff15379..91486541 100644 --- a/apps/docs/content/docs/en/index.mdx +++ b/apps/docs/content/docs/en/index.mdx @@ -4,23 +4,50 @@ description: ServerBee is a lightweight, modern VPS monitoring system built with icon: Rocket --- -ServerBee is a lightweight, self-hosted VPS monitoring probe system built from the ground up with a modern Rust backend and React frontend. It is designed to be fast, resource-efficient, and easy to deploy on small VPS instances. +ServerBee is a lightweight, self-hosted VPS monitoring probe system designed for individuals and small teams. Built with a full Rust stack and a React frontend, it pairs high performance with a very small resource footprint. + +## What Is ServerBee + +ServerBee has two core components: + +- **Server** -- the control center you run on a management host. It serves the web dashboard, stores data, evaluates alerts, runs background jobs, and exposes the REST API. +- **Agent** -- a lightweight collector you install on each monitored host. It gathers system metrics and reports them in real time, and (when authorized) runs probes, terminal, file, and Docker operations. 
+ +Agents keep a persistent WebSocket connection to the server for real-time push. All data is stored in SQLite, so there is no external database to install. ## Key Features -- **Real-time monitoring** -- CPU, memory, disk, network, load average, temperature, GPU, disk I/O, and traffic metrics streamed over WebSocket -- **Custom dashboards** -- Build multiple dashboard layouts with server cards, charts, maps, uptime timelines, Markdown notes, and service status widgets -- **Alert rules** -- Flexible threshold and event-based alerts with debounce logic, maintenance suppression, and multiple notification channels -- **Service monitors** -- SSL certificate, DNS, HTTP keyword, TCP, and WHOIS checks with history and notifications -- **Web terminal and remote tasks** -- Browser-based PTY sessions, one-shot commands, and scheduled cron tasks with retries -- **File manager** -- Controlled remote browse/read/write/upload/download operations with path sandboxing and audit logs -- **Ping and network quality monitoring** -- ICMP, TCP, HTTP probes, preset network targets, packet loss/latency charts, CSV export, and traceroute -- **Docker management** -- Container list, stats, events, logs, networks, volumes, and container actions when enabled -- **Public status pages** -- Share service health with custom slugs, incidents, maintenance windows, uptime timelines, themes, and custom CSS -- **Branding and themes** -- Preset/custom OKLCH themes plus white-label title, logo, favicon, and footer text -- **Lightweight footprint** -- Single binary server and agent, SQLite database, no external database required -- **OAuth and mobile support** -- GitHub, Google, generic OIDC login, mobile sessions, pairing, and push notification support -- **VPS cost insights** -- Score price-to-resource value with normalized cost metrics, surfaced in the servers list, the dashboard server card, and a per-server insights panel +### Real-time monitoring and custom dashboards + +View the live status of 
every server through a WebSocket-driven dashboard. Metrics include CPU, memory, disk, network, load average, temperature, GPU, disk I/O, and traffic, arranged across multiple dashboards and widget layouts. + +### Alerts and service monitors + +A flexible alert engine supports resource thresholds, traffic cycles, network quality, offline, expiry, and IP-change events, with debounce logic and maintenance suppression. Service monitors cover SSL certificate, DNS, HTTP keyword, TCP, and WHOIS checks, and deliver notifications through Webhook, Telegram, Bark, Email, APNs, and other channels. + +### Web terminal, remote tasks, and file manager + +Open a browser-based PTY shell on any server, run one-shot commands, or schedule cron tasks with retries. The file manager provides controlled remote browse, read, edit, upload, and download operations with path sandboxing and audit logs. + +### Ping, network quality, and traceroute + +Run ICMP, TCP, and HTTP probes from multiple agents to measure target reachability, latency, and packet loss, with charts, CSV export, and preset network targets. The network detail page also offers traceroute for troubleshooting. + +### Docker management + +When the Docker capability is enabled, view the container list, live stats, lifecycle events, log streams, networks, and volumes, and perform container actions. + +### Public status pages, themes, and branding + +Create multiple public status pages with their own slug, server scope, incidents, maintenance windows, uptime timelines, availability thresholds, themes, and custom CSS. Appearance settings support preset and custom OKLCH themes plus white-label title, logo, favicon, and footer text. 
+ +### VPS cost insights + +After you record each server's price and billing cycle, ServerBee computes a value score per VPS (excellent / good / okay / poor / waste) from resources, utilization, and uptime, and surfaces monthly-equivalent cost, burn rate, and remaining days in the servers list, the dashboard server card, and a per-server insights panel. + +### OAuth and mobile support + +Sign in via GitHub, Google, or any generic OIDC provider. Mobile sessions, device pairing, and push notifications are supported for the iOS client. ## Tech Stack @@ -30,22 +57,30 @@ ServerBee is a lightweight, self-hosted VPS monitoring probe system built from t | Agent | Rust (sysinfo, tokio-tungstenite) | | Frontend | React 19, Vite 7, TanStack Router, TanStack Query, Recharts, shadcn/ui, Tailwind CSS v4 | | Protocol | WebSocket (JSON frames, binary for terminal data) | -| Database | SQLite with WAL mode | +| Database | SQLite (WAL mode) | +| Deployment | Single binary / Docker / install script | + +### Why Rust + +- **High performance** -- the agent uses only 5-15 MB of memory; the server uses roughly 50-100 MB when managing 1000 nodes. +- **Reliability** -- Rust's memory safety eliminates a whole class of common runtime errors. +- **Single-binary deployment** -- the server embeds the frontend assets via `rust-embed`, so you ship one executable. +- **Zero dependencies** -- no MySQL or Redis to install; SQLite is embedded in the binary. ## How It Works ServerBee follows a hub-and-spoke architecture: -1. The **Server** is the central dashboard that runs the web UI, REST API, background jobs, and manages all WebSocket connections -2. **Agents** are installed on each VPS you want to monitor -- they collect system metrics and report back to the server every few seconds -3. The **Frontend** is a React SPA embedded into the server binary, so there is nothing extra to deploy +1. 
The **Server** is the central node: it serves the web UI and REST API, manages all WebSocket connections, and runs background jobs. +2. **Agents** run on each monitored VPS, collecting metrics and reporting back every few seconds. +3. The **Frontend** is a React SPA embedded into the server binary, so there is nothing extra to deploy. -All communication between agents and the server happens over WebSocket with JSON-encoded messages. Terminal sessions and some streaming features use dedicated WebSocket routes with binary or structured frames. +All agent-to-server communication uses WebSocket with JSON-encoded messages. Terminal sessions and some streaming features use dedicated WebSocket routes with binary frames. ## Next Steps - + diff --git a/apps/docs/content/docs/en/server.mdx b/apps/docs/content/docs/en/server.mdx index d207b252..d2d82fe3 100644 --- a/apps/docs/content/docs/en/server.mdx +++ b/apps/docs/content/docs/en/server.mdx @@ -280,7 +280,9 @@ scopes = ["openid", "email", "profile"] If you don't want to configure a reverse proxy by hand, the install script can do it for you: add `--domain monitor.example.com --email admin@example.com` at install time, or run `sudo serverbee domain setup --domain monitor.example.com --email admin@example.com` on an already-installed server. The script verifies DNS, installs Caddy, writes the Caddyfile, issues an HTTPS certificate, and sets `auth.secure_cookie` to `true`. The manual configuration below is only needed if you want to manage the reverse proxy yourself. -When running behind a reverse proxy, you must forward WebSocket connections correctly. Here is an nginx example: +When running behind a reverse proxy, you must forward WebSocket connections correctly. + +### Nginx ```nginx server { @@ -311,6 +313,22 @@ server { The long `proxy_read_timeout` and `proxy_send_timeout` values are important for WebSocket connections. 
Without them, nginx may close idle connections prematurely, causing agents and terminal sessions to disconnect. +### Caddy + +Caddy handles HTTPS certificates and WebSocket proxying automatically, so the config is minimal: + +```txt title="Caddyfile" +monitor.example.com { + reverse_proxy 127.0.0.1:9527 +} +``` + +Behind a reverse proxy, point agents at the `https://` address: + +```toml title="agent.toml" +server_url = "https://monitor.example.com" +``` + For more reverse proxy configurations including Traefik, see the [Deployment Guide](/en/docs/deployment). ## Background Tasks @@ -327,3 +345,9 @@ The server runs several background tasks automatically: | AlertEvaluator | 60s | Evaluates all enabled alert rules | All tasks start automatically and require no manual configuration. + + + + + + diff --git a/apps/docs/content/docs/en/service-monitors.mdx b/apps/docs/content/docs/en/service-monitors.mdx index da8f7540..c393f8ba 100644 --- a/apps/docs/content/docs/en/service-monitors.mdx +++ b/apps/docs/content/docs/en/service-monitors.mdx @@ -4,9 +4,9 @@ description: Monitor SSL certificates, DNS records, HTTP keywords, TCP ports, an icon: Radar --- -Service Monitors are synthetic checks that run from the ServerBee server. They are useful for monitoring public-facing services even when those services are not running on a ServerBee agent host. +Service Monitors are synthetic checks that run from the central ServerBee server. They let you monitor public-facing services even when those services are not running on a ServerBee agent host. -Unlike Ping Monitoring, which asks agents to probe network targets, Service Monitors are evaluated by the central server process. Results are stored in SQLite, displayed in the dashboard, and can send notifications through notification groups. +Unlike Ping Monitoring, which asks agents to probe network targets, Service Monitors are evaluated by the server process. 
Results are stored in SQLite, shown in the dashboard, and can trigger notifications through notification groups. ## Supported Monitor Types diff --git a/apps/docs/content/docs/en/status-page.mdx b/apps/docs/content/docs/en/status-page.mdx index 589f811d..68189b9d 100644 --- a/apps/docs/content/docs/en/status-page.mdx +++ b/apps/docs/content/docs/en/status-page.mdx @@ -1,88 +1,102 @@ --- -title: Status Pages -description: Publish public server health pages with incidents, maintenance windows, and uptime history. +title: Status Page +description: Publish a public server health page with live metrics, incidents, maintenance windows, and uptime history. icon: Globe --- -ServerBee provides two public status experiences: +ServerBee serves a single public status page at `https://your-server/status`. It is publicly accessible and requires no authentication, so you can share it with users or stakeholders to communicate service health. -- **Default status page:** `https://your-server/status`, backed by `GET /api/status`, showing all non-hidden servers. -- **Configurable status pages:** `https://your-server/status/{slug}`, backed by `GET /api/status/{slug}`, showing the specific servers and options selected by an administrator. +The page shows the servers and sections an administrator selects, backed by the public `GET /api/status/config` and `GET /api/status` endpoints. -Both are public and do not require authentication. +## What the Page Shows -## Default `/status` Page +- Online/total server count. +- Each selected server's online/offline status and group label. +- Live metrics for online servers: CPU, memory, swap, disk, disk I/O, network speed/transfer, load, connections, and uptime. +- A 90-day uptime timeline per server. +- Public server remarks where configured. +- Optional sections: server detail, network quality, IP quality, incidents, and maintenance windows. -The default status page is useful when you want a quick public overview without creating a custom page. 
+Sensitive identifiers such as IP addresses, hostnames, and interface details are stripped at the API boundary and never appear on the public page. -It shows: +## Configuring the Page -- Online/total server count -- All servers where `hidden = false` -- Server group labels -- Online/offline status -- Live metrics for online servers: CPU, memory, disk, network speed/transfer, uptime, and load -- Public server remarks where configured - -Public API: - -```http -GET /api/status -``` - -## Configurable Status Pages - -Create and manage pages in **Settings → Status Pages**. Each page has its own slug and can be shared independently. - -### Page Settings +Configure the page in **Settings → Status Page**. The status page is a singleton: there is one page, edited in place rather than created per slug. | Setting | API field | Description | |---------|-----------|-------------| +| Enabled | `enabled` | When disabled, the public page returns no data | | Title | `title` | Public page title | -| Slug | `slug` | URL segment for `/status/{slug}` | | Description | `description` | Optional introductory text | -| Servers | `server_ids_json` | Servers shown on this page | -| Group by server group | `group_by_server_group` | Organize servers by their ServerBee group | -| Show values | `show_values` | Show numeric uptime/status values on the public page | -| Custom CSS | `custom_css` | Extra CSS applied to this page | -| Enabled | `enabled` | Disabled pages return 404 | +| Servers | `server_ids` | Servers shown on the page | +| Default layout | `default_layout` | `list` or `grid` | +| Show server detail | `show_server_detail` | Allow drilling into a server's detail view | +| Show network | `show_network` | Show the network quality section | +| Show IP quality | `show_ip_quality` | Show the IP quality section | +| Show incidents | `show_incidents` | Show the incidents section | +| Show maintenance | `show_maintenance` | Show the maintenance section | | Yellow uptime threshold | 
`uptime_yellow_threshold` | Days below this percentage show as degraded | -| Red uptime threshold | `uptime_red_threshold` | Days below this percentage show as major outage | +| Red uptime threshold | `uptime_red_threshold` | Days below this percentage show as a major outage | + +### Admin API - -The current API request fields use `server_ids_json` and `status_page_ids_json` for selected IDs. These fields accept JSON arrays in request bodies. - +| Method | Path | Description | +|--------|------|-------------| +| GET | `/api/status-page` | Read the status page configuration | +| PUT | `/api/status-page` | Update the status page configuration | -### Public Page Data +Update example: -```http -GET /api/status/{slug} +```json +{ + "title": "Production Status", + "description": "Public health for production services", + "server_ids": ["server-id-1", "server-id-2"], + "show_ip_quality": false, + "default_layout": "grid", + "show_server_detail": true, + "show_network": true, + "show_incidents": true, + "show_maintenance": true, + "enabled": true, + "uptime_yellow_threshold": 99.9, + "uptime_red_threshold": 95 +} ``` -The response includes: +## Public API -- `page` -- page metadata and display options -- `servers` -- selected server statuses, uptime percentages, and 90-day daily uptime data -- `active_incidents` -- unresolved incidents linked to the page -- `planned_maintenances` -- active/upcoming maintenance windows linked to the page -- `recent_incidents` -- resolved incidents from the recent history window +These endpoints are unauthenticated and power the public page: -Server entries include `server_id`, `server_name`, region/country, OS, group, `online`, `uptime_percent`, `uptime_daily`, and `in_maintenance`. 
+| Method | Path | Description | +|--------|------|-------------| +| GET | `/api/status/config` | Page metadata and display options | +| GET | `/api/status` | Selected server statuses with live metrics and uptime | +| GET | `/api/status/servers/{id}` | Detail view for one server | +| GET | `/api/status/servers/{id}/metrics` | Time-series metrics for one server | +| GET | `/api/status/servers/{id}/uptime-daily` | 90-day daily uptime for one server | +| GET | `/api/status/network` | Network quality overview | +| GET | `/api/status/network/{id}` | Network quality detail for one server | +| GET | `/api/status/ip-quality` | IP quality overview | +| GET | `/api/status/incidents` | Active and recent incidents | +| GET | `/api/status/maintenances` | Active and upcoming maintenance windows | + +Each server entry includes `id`, `name`, `group_name`, region/country, `os`, `online`, `in_maintenance`, `uptime_percent`, and `uptime_daily`. ## Uptime Timeline -Each server on a configurable status page can show a 90-day uptime timeline. Each bar represents one day: +Each server shows a 90-day uptime timeline where every bar represents one day: -- **Green** -- healthy uptime -- **Yellow** -- uptime below the page's yellow threshold -- **Red** -- uptime below the page's red threshold -- **Gray** -- no data +- **Green** -- healthy uptime. +- **Yellow** -- uptime below the yellow threshold. +- **Red** -- uptime below the red threshold. +- **Gray** -- no data. -Uptime data comes from the `uptime_daily` table, populated by the server's background aggregation tasks. Missing dates are gap-filled so the timeline remains continuous. +Uptime data comes from the `uptime_daily` table, populated by the server's background aggregation tasks. Missing dates are gap-filled so the timeline stays continuous. ## Incidents -Incidents are public announcements for outages or degraded service. They can be linked to specific servers, status pages, or both. 
+Incidents are public announcements for outages or degraded service. They can optionally be linked to specific servers. ### Fields @@ -92,9 +106,9 @@ Incidents are public announcements for outages or degraded service. They can be | `status` | `investigating`, `identified`, `monitoring`, or `resolved` | | `severity` | `minor`, `major`, or `critical` | | `server_ids_json` | Optional affected servers | -| `status_page_ids_json` | Optional affected status pages | +| `is_public` | Whether the incident is shown on the public status page | -An incident can have multiple updates. Adding an update records a message and moves the incident to the update's status. Setting status to `resolved` also sets `resolved_at`. +An incident can carry multiple updates. Each update has its own `status` and `message`. Adding an update records the message and moves the incident to the update's status. Setting status to `resolved` also sets `resolved_at`. ### API @@ -108,7 +122,7 @@ An incident can have multiple updates. Adding an update records a message and mo ## Maintenance Windows -Maintenance windows announce planned work and also suppress notifications for associated servers while active. +Maintenance windows announce planned work and also suppress alert notifications for the associated servers while active. 
### Fields @@ -119,7 +133,7 @@ Maintenance windows announce planned work and also suppress notifications for as | `start_at` | UTC start time | | `end_at` | UTC end time; must be after `start_at` | | `server_ids_json` | Optional affected servers | -| `status_page_ids_json` | Optional affected status pages | +| `is_public` | Whether the window is shown on the public status page | | `active` | Whether the window is enabled | ### API @@ -131,33 +145,8 @@ Maintenance windows announce planned work and also suppress notifications for as | PUT | `/api/maintenances/{id}` | Update a maintenance window | | DELETE | `/api/maintenances/{id}` | Delete a maintenance window | -## Admin API - -| Method | Path | Description | -|--------|------|-------------| -| GET | `/api/status-pages` | List configured status pages | -| POST | `/api/status-pages` | Create a status page | -| PUT | `/api/status-pages/{id}` | Update a status page | -| DELETE | `/api/status-pages/{id}` | Delete a status page | - -Create example: - -```json -{ - "title": "Production Status", - "slug": "production", - "description": "Public health for production services", - "server_ids_json": ["server-id-1", "server-id-2"], - "group_by_server_group": true, - "show_values": true, - "enabled": true, - "uptime_yellow_threshold": 99.9, - "uptime_red_threshold": 95 -} -``` - - + From 67b43443ab54f15561dd020596ceeab0e36a60f6 Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 14:09:17 +0800 Subject: [PATCH 07/21] chore: sync bun.lock to web 1.0.0-alpha.5 --- bun.lock | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/bun.lock b/bun.lock index 0216451d..a7825c11 100644 --- a/bun.lock +++ b/bun.lock @@ -50,7 +50,7 @@ }, "apps/web": { "name": "@serverbee/web", - "version": "1.0.0-alpha.4", + "version": "1.0.0-alpha.5", "dependencies": { "@base-ui/react": "^1.2.0", "@fontsource-variable/inter": "^5.2.8", From e021adb4dc3e1b46eb934700734bbd5b02b32534 Mon Sep 17 00:00:00 2001 
From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 14:10:51 +0800 Subject: [PATCH 08/21] refactor(docs): rename Chinese locale from cn to zh and point README links to docs site Use the ISO 639-1 'zh' code instead of the 'cn' country code for the docs i18n locale, renaming content/docs/cn to content/docs/zh and updating all internal links, route handlers, and landing translations. README doc links now point to https://docs.serverbee.app (/en and /zh). --- README.md | 4 +- README.zh-CN.md | 4 +- apps/docs/content/docs/{cn => zh}/admin.mdx | 12 +++--- apps/docs/content/docs/{cn => zh}/agent.mdx | 8 ++-- apps/docs/content/docs/{cn => zh}/alerts.mdx | 12 +++--- .../content/docs/{cn => zh}/api-reference.mdx | 6 +-- .../content/docs/{cn => zh}/architecture.mdx | 6 +-- .../content/docs/{cn => zh}/capabilities.mdx | 14 +++---- .../content/docs/{cn => zh}/configuration.mdx | 8 ++-- .../content/docs/{cn => zh}/cost-insights.mdx | 8 ++-- .../docs/{cn => zh}/custom-widgets.mdx | 8 ++-- .../content/docs/{cn => zh}/dashboards.mdx | 6 +-- .../content/docs/{cn => zh}/deployment.mdx | 6 +-- .../content/docs/{cn => zh}/file-manager.mdx | 6 +-- .../docs/content/docs/{cn => zh}/firewall.mdx | 10 ++--- apps/docs/content/docs/{cn => zh}/index.mdx | 42 +++++++++---------- .../content/docs/{cn => zh}/ip-quality.mdx | 12 +++--- apps/docs/content/docs/{cn => zh}/meta.json | 0 apps/docs/content/docs/{cn => zh}/mobile.mdx | 0 .../content/docs/{cn => zh}/monitoring.mdx | 8 ++-- apps/docs/content/docs/{cn => zh}/ping.mdx | 6 +-- .../content/docs/{cn => zh}/quick-start.mdx | 8 ++-- .../docs/{cn => zh}/resource-usage.mdx | 10 ++--- .../docs/{cn => zh}/security-events.mdx | 12 +++--- .../docs/content/docs/{cn => zh}/security.mdx | 8 ++-- apps/docs/content/docs/{cn => zh}/server.mdx | 12 +++--- .../docs/{cn => zh}/service-monitors.mdx | 6 +-- .../content/docs/{cn => zh}/status-page.mdx | 6 +-- .../docs/{cn => zh}/storage-sizing.mdx | 6 +-- .../docs/content/docs/{cn => zh}/terminal.mdx 
| 6 +-- .../src/components/landing/sections/bento.tsx | 2 +- .../src/components/landing/translations.ts | 4 +- apps/docs/src/lib/i18n.ts | 2 +- apps/docs/src/routes/$lang/index.tsx | 2 +- apps/docs/src/routes/__root.tsx | 2 +- apps/docs/src/routes/api/search.ts | 2 +- 36 files changed, 137 insertions(+), 137 deletions(-) rename apps/docs/content/docs/{cn => zh}/admin.mdx (96%) rename apps/docs/content/docs/{cn => zh}/agent.mdx (98%) rename apps/docs/content/docs/{cn => zh}/alerts.mdx (96%) rename apps/docs/content/docs/{cn => zh}/api-reference.mdx (97%) rename apps/docs/content/docs/{cn => zh}/architecture.mdx (98%) rename apps/docs/content/docs/{cn => zh}/capabilities.mdx (94%) rename apps/docs/content/docs/{cn => zh}/configuration.mdx (99%) rename apps/docs/content/docs/{cn => zh}/cost-insights.mdx (97%) rename apps/docs/content/docs/{cn => zh}/custom-widgets.mdx (97%) rename apps/docs/content/docs/{cn => zh}/dashboards.mdx (96%) rename apps/docs/content/docs/{cn => zh}/deployment.mdx (99%) rename apps/docs/content/docs/{cn => zh}/file-manager.mdx (96%) rename apps/docs/content/docs/{cn => zh}/firewall.mdx (92%) rename apps/docs/content/docs/{cn => zh}/index.mdx (77%) rename apps/docs/content/docs/{cn => zh}/ip-quality.mdx (95%) rename apps/docs/content/docs/{cn => zh}/meta.json (100%) rename apps/docs/content/docs/{cn => zh}/mobile.mdx (100%) rename apps/docs/content/docs/{cn => zh}/monitoring.mdx (98%) rename apps/docs/content/docs/{cn => zh}/ping.mdx (96%) rename apps/docs/content/docs/{cn => zh}/quick-start.mdx (96%) rename apps/docs/content/docs/{cn => zh}/resource-usage.mdx (93%) rename apps/docs/content/docs/{cn => zh}/security-events.mdx (95%) rename apps/docs/content/docs/{cn => zh}/security.mdx (95%) rename apps/docs/content/docs/{cn => zh}/server.mdx (97%) rename apps/docs/content/docs/{cn => zh}/service-monitors.mdx (96%) rename apps/docs/content/docs/{cn => zh}/status-page.mdx (96%) rename apps/docs/content/docs/{cn => zh}/storage-sizing.mdx (97%) 
rename apps/docs/content/docs/{cn => zh}/terminal.mdx (95%) diff --git a/README.md b/README.md index acf41677..17894921 100644 --- a/README.md +++ b/README.md @@ -90,7 +90,7 @@ enrollment_code = "" # one-time code from Settings; only used for first regist interval = 3 # seconds between reports ``` -📖 Full reference: **[ENV.md](ENV.md)** · OAuth, retention, rate limiting, GeoIP, and more in the [documentation](apps/docs). +📖 Full reference: **[ENV.md](ENV.md)** · OAuth, retention, rate limiting, GeoIP, and more in the [documentation](https://docs.serverbee.app/en/docs/configuration). ## Deployment @@ -114,7 +114,7 @@ sudo serverbee uninstall agent -y ### Reverse proxy -Behind Nginx/Caddy, proxy `/` to `127.0.0.1:9527` and make sure the WebSocket routes `/api/ws/` and `/api/agent/ws` forward the `Upgrade`/`Connection` headers with a long read timeout. See the [deployment docs](apps/docs) for a ready-to-use Nginx config. +Behind Nginx/Caddy, proxy `/` to `127.0.0.1:9527` and make sure the WebSocket routes `/api/ws/` and `/api/agent/ws` forward the `Upgrade`/`Connection` headers with a long read timeout. See the [deployment docs](https://docs.serverbee.app/en/docs/deployment) for a ready-to-use Nginx config. 
## Development diff --git a/README.zh-CN.md b/README.zh-CN.md index 832518c1..a70c698d 100644 --- a/README.zh-CN.md +++ b/README.zh-CN.md @@ -90,7 +90,7 @@ enrollment_code = "" # 来自设置页的一次性 code,仅用于首次注册 interval = 3 # 上报间隔(秒) ``` -📖 完整参考:**[ENV.md](ENV.md)** · OAuth、数据保留、速率限制、GeoIP 等详见[文档](apps/docs)。 +📖 完整参考:**[ENV.md](ENV.md)** · OAuth、数据保留、速率限制、GeoIP 等详见[文档](https://docs.serverbee.app/zh/docs/configuration)。 ## 部署 @@ -114,7 +114,7 @@ sudo serverbee uninstall agent -y ### 反向代理 -在 Nginx/Caddy 之后,将 `/` 代理到 `127.0.0.1:9527`,并确保 WebSocket 路由 `/api/ws/` 和 `/api/agent/ws` 透传 `Upgrade`/`Connection` 头且设置较长读超时。完整可用的 Nginx 配置见[部署文档](apps/docs)。 +在 Nginx/Caddy 之后,将 `/` 代理到 `127.0.0.1:9527`,并确保 WebSocket 路由 `/api/ws/` 和 `/api/agent/ws` 透传 `Upgrade`/`Connection` 头且设置较长读超时。完整可用的 Nginx 配置见[部署文档](https://docs.serverbee.app/zh/docs/deployment)。 ## 开发 diff --git a/apps/docs/content/docs/cn/admin.mdx b/apps/docs/content/docs/zh/admin.mdx similarity index 96% rename from apps/docs/content/docs/cn/admin.mdx rename to apps/docs/content/docs/zh/admin.mdx index e69c9b19..2c64fba0 100644 --- a/apps/docs/content/docs/cn/admin.mdx +++ b/apps/docs/content/docs/zh/admin.mdx @@ -213,7 +213,7 @@ GET /api/audit-logs?limit=50&offset=0 ### 流量告警 -使用 `transfer_in_cycle` / `transfer_out_cycle` / `transfer_all_cycle` 告警类型,可以监控当前计费周期内的累计流量是否超过设定阈值。详见 [告警与通知](/cn/docs/alerts)。 +使用 `transfer_in_cycle` / `transfer_out_cycle` / `transfer_all_cycle` 告警类型,可以监控当前计费周期内的累计流量是否超过设定阈值。详见 [告警与通知](/zh/docs/alerts)。 ### 成本洞察 @@ -225,11 +225,11 @@ GET /api/audit-logs?limit=50&offset=0 #### 价值评分 -每台已配置成本的服务器都会拿到 0-100 的 `score` 和对应的 `grade`(`excellent` / `good` / `okay` / `poor` / `waste`),由资源单价分位、利用率、可用性三部分加权而成。完整算法、各子项权重、reasons 含义、confidence 计算与输入校验规则见[成本洞察与价值评分](/cn/docs/cost-insights)。 +每台已配置成本的服务器都会拿到 0-100 的 `score` 和对应的 `grade`(`excellent` / `good` / `okay` / `poor` / `waste`),由资源单价分位、利用率、可用性三部分加权而成。完整算法、各子项权重、reasons 含义、confidence 计算与输入校验规则见[成本洞察与价值评分](/zh/docs/cost-insights)。 #### API -同样的数据通过只读 API 暴露:`GET 
/api/cost/overview`(按币种汇总的舰队总览 + 每台服务器摘要)和 `GET /api/servers/{id}/cost-insights`(单台服务器的完整明细)。认证细节见 [API 参考](/cn/docs/api-reference#已认证读取端点)。 +同样的数据通过只读 API 暴露:`GET /api/cost/overview`(按币种汇总的舰队总览 + 每台服务器摘要)和 `GET /api/servers/{id}/cost-insights`(单台服务器的完整明细)。认证细节见 [API 参考](/zh/docs/api-reference#已认证读取端点)。 ## Agent 注册管理 @@ -244,7 +244,7 @@ GET /api/audit-logs?limit=50&offset=0 如果失败的接入流程留下了离线的 `New Server` 占位条目,`/servers` 页面会显示 **Clean up unconnected** 操作。它只删除从未完成初始化的离线占位服务器,已经在线但尚未上报 `SystemInfo` 的节点会被保留。 - - - + + + diff --git a/apps/docs/content/docs/cn/agent.mdx b/apps/docs/content/docs/zh/agent.mdx similarity index 98% rename from apps/docs/content/docs/cn/agent.mdx rename to apps/docs/content/docs/zh/agent.mdx index 812a10d6..0df53f86 100644 --- a/apps/docs/content/docs/cn/agent.mdx +++ b/apps/docs/content/docs/zh/agent.mdx @@ -362,10 +362,10 @@ Agent 使用 `sysinfo` 库采集以下系统指标: ## 资源开销 -Agent CPU 开销可忽略(<1%),内存稳态在数十 MB 级别。完整的 Agent 与 Server CPU/内存/磁盘/网络实测数据见[资源开销](/cn/docs/resource-usage)。 +Agent CPU 开销可忽略(<1%),内存稳态在数十 MB 级别。完整的 Agent 与 Server CPU/内存/磁盘/网络实测数据见[资源开销](/zh/docs/resource-usage)。 - - - + + + diff --git a/apps/docs/content/docs/cn/alerts.mdx b/apps/docs/content/docs/zh/alerts.mdx similarity index 96% rename from apps/docs/content/docs/cn/alerts.mdx rename to apps/docs/content/docs/zh/alerts.mdx index dfb3722a..35a5b934 100644 --- a/apps/docs/content/docs/cn/alerts.mdx +++ b/apps/docs/content/docs/zh/alerts.mdx @@ -76,7 +76,7 @@ ServerBee 提供灵活的告警系统,支持阈值监控、安全事件驱动 | `offline` | 服务器持续离线超过指定时长后触发 | | `expiration` | 服务器 `expired_at` 距今小于等于指定天数时触发 | | `ip_changed` | Agent 上报 IP 变化事件时触发(事件驱动,不参与每分钟轮询) | -| `ssh_login_detected` | 成功 SSH 登录事件触发;支持 `first_seen_only` 过滤。详见 [安全事件检测](/cn/docs/security-events) | +| `ssh_login_detected` | 成功 SSH 登录事件触发;支持 `first_seen_only` 过滤。详见 [安全事件检测](/zh/docs/security-events) | | `ssh_brute_force_detected` | SSH 爆破事件触发;支持 `severity_min`(medium/high/critical)和 `exclude_cidrs` 过滤 | | `port_scan_detected` | 单一源 IP 的端口扫描事件触发;支持 
`severity_min` 和 `exclude_cidrs` 过滤 | @@ -199,7 +199,7 @@ ServerBee 提供灵活的告警系统,支持阈值监控、安全事件驱动 ### 阻断源 IP -安全事件类规则(`ssh_brute_force_detected`、`port_scan_detected`)可以开启**阻断源 IP**。触发时,ServerBee 会指示受影响 Agent 的防火墙阻断攻击源 IP,把检测变为自动处置。该能力需要服务器具备 `CAP_FIREWALL_BLOCK` 权限。详见 [安全事件检测](/cn/docs/security-events) 和 [防火墙管理](/cn/docs/firewall)。 +安全事件类规则(`ssh_brute_force_detected`、`port_scan_detected`)可以开启**阻断源 IP**。触发时,ServerBee 会指示受影响 Agent 的防火墙阻断攻击源 IP,把检测变为自动处置。该能力需要服务器具备 `CAP_FIREWALL_BLOCK` 权限。详见 [安全事件检测](/zh/docs/security-events) 和 [防火墙管理](/zh/docs/firewall)。 ## 通知渠道 @@ -238,7 +238,7 @@ ServerBee 支持以下通知渠道: 邮件通知通过 [Resend](https://resend.com/) 发送。使用前两步准备: -1. 在服务器设置 `SERVERBEE_RESEND__API_KEY`(参考[配置](/cn/docs/configuration)页面)。 +1. 在服务器设置 `SERVERBEE_RESEND__API_KEY`(参考[配置](/zh/docs/configuration)页面)。 2. 在 [resend.com/domains](https://resend.com/domains) 添加并验证发件域名。各通道的 `from` 必须属于已验证的域名。 通道配置: @@ -331,7 +331,7 @@ POST /api/notifications/:id/test 此后,任意服务器在 10 分钟采样窗口内有 ≥ 70% 的采样点 CPU 超过 90% 时,你都会收到 Telegram 消息。条件持续期间,后续通知按 5 分钟去抖。 - - - + + + diff --git a/apps/docs/content/docs/cn/api-reference.mdx b/apps/docs/content/docs/zh/api-reference.mdx similarity index 97% rename from apps/docs/content/docs/cn/api-reference.mdx rename to apps/docs/content/docs/zh/api-reference.mdx index ea08af0d..986c8e70 100644 --- a/apps/docs/content/docs/cn/api-reference.mdx +++ b/apps/docs/content/docs/zh/api-reference.mdx @@ -213,7 +213,7 @@ curl https://your-server/api/auth/me \ | 500 | 服务器内部错误 | - - - + + + diff --git a/apps/docs/content/docs/cn/architecture.mdx b/apps/docs/content/docs/zh/architecture.mdx similarity index 98% rename from apps/docs/content/docs/cn/architecture.mdx rename to apps/docs/content/docs/zh/architecture.mdx index af6d7d4d..e1b7b933 100644 --- a/apps/docs/content/docs/cn/architecture.mdx +++ b/apps/docs/content/docs/zh/architecture.mdx @@ -354,7 +354,7 @@ ServerBee/ ``` - - - + + + diff --git a/apps/docs/content/docs/cn/capabilities.mdx 
b/apps/docs/content/docs/zh/capabilities.mdx similarity index 94% rename from apps/docs/content/docs/cn/capabilities.mdx rename to apps/docs/content/docs/zh/capabilities.mdx index 409539bd..59de7130 100644 --- a/apps/docs/content/docs/cn/capabilities.mdx +++ b/apps/docs/content/docs/zh/capabilities.mdx @@ -24,7 +24,7 @@ ServerBee 定义了 11 个功能位,分为两个风险等级。有效掩码为 -文件管理功能需要在 Agent 端额外配置 `root_paths` 和 `deny_patterns` 以实现路径沙箱安全。详见 [Agent 配置](/cn/docs/agent) 和 [配置参考](/cn/docs/configuration) 页面。 +文件管理功能需要在 Agent 端额外配置 `root_paths` 和 `deny_patterns` 以实现路径沙箱安全。详见 [Agent 配置](/zh/docs/agent) 和 [配置参考](/zh/docs/configuration) 页面。 ### 低风险功能(默认启用) @@ -35,8 +35,8 @@ ServerBee 定义了 11 个功能位,分为两个风险等级。有效掩码为 | **ICMP Ping** | `CAP_PING_ICMP` (8) | 允许执行 ICMP 探测任务 | | **TCP Probe** | `CAP_PING_TCP` (16) | 允许执行 TCP 端口探测任务 | | **HTTP Probe** | `CAP_PING_HTTP` (32) | 允许执行 HTTP 探测任务 | -| **Security Events** | `CAP_SECURITY_EVENTS` (256) | 允许 Agent 上报 SSH 登录 / 爆破 / 端口扫描事件(详见 [安全事件检测](/cn/docs/security-events))| -| **Firewall Blocklist** | `CAP_FIREWALL_BLOCK` (512) | 允许 Agent 应用 Server 下发的 nftables 黑名单。需要 root 或 `CAP_NET_ADMIN`,并在主机安装 `nft` CLI。详见 [防火墙黑名单](/cn/docs/firewall) | +| **Security Events** | `CAP_SECURITY_EVENTS` (256) | 允许 Agent 上报 SSH 登录 / 爆破 / 端口扫描事件(详见 [安全事件检测](/zh/docs/security-events))| +| **Firewall Blocklist** | `CAP_FIREWALL_BLOCK` (512) | 允许 Agent 应用 Server 下发的 nftables 黑名单。需要 root 或 `CAP_NET_ADMIN`,并在主机安装 `nft` CLI。详见 [防火墙黑名单](/zh/docs/firewall) | | **IP Quality** | `CAP_IP_QUALITY` (1024) | 允许 Agent 调用第三方 IP 质量 API 给出口 IP 评分 | 新注册的 Agent 默认 capabilities 值为 `1852`(自动升级 + 三个 Ping 功能 + 安全事件检测 + 防火墙黑名单 + IP 质量)。 @@ -136,8 +136,8 @@ ServerBee 采用纵深防御(defense in depth)策略,在 Server 端和 Age 当 Agent 本地策略关闭某个能力时,UI 会把对应开关直接禁用,并显示 tooltip `客户端关闭`。这表示当前运行中的 Agent 已经在本地把它锁死,Server 端不能强行重新打开,除非 Agent 以新的本地策略重新启动。 - - - - + + + + diff --git a/apps/docs/content/docs/cn/configuration.mdx b/apps/docs/content/docs/zh/configuration.mdx similarity index 99% rename from 
apps/docs/content/docs/cn/configuration.mdx rename to apps/docs/content/docs/zh/configuration.mdx index 95a6ea17..2145a781 100644 --- a/apps/docs/content/docs/cn/configuration.mdx +++ b/apps/docs/content/docs/zh/configuration.mdx @@ -120,7 +120,7 @@ ServerBee 使用 [figment](https://github.com/SergioBenitez/Figment) 库加载 #### 防火墙(Firewall,可选) -[防火墙黑名单](/cn/docs/firewall) 的第二道护栏。即使管理员主动尝试,`POST /api/firewall/blocks` 也会拒绝插入此列表中的 CIDR/IP。第一道护栏(硬编码的保留段:回环、RFC 1918、链路本地、组播、未指定地址)始终生效。 +[防火墙黑名单](/zh/docs/firewall) 的第二道护栏。即使管理员主动尝试,`POST /api/firewall/blocks` 也会拒绝插入此列表中的 CIDR/IP。第一道护栏(硬编码的保留段:回环、RFC 1918、链路本地、组播、未指定地址)始终生效。 | 环境变量 | 默认值 | 说明 | |----------|--------|------| @@ -128,7 +128,7 @@ ServerBee 使用 [figment](https://github.com/SergioBenitez/Figment) 库加载 #### IP 质量检测(IP Quality) -默认开箱即用,通过 [ipapi.is](https://ipapi.is)(无需 API Key,按源 IP 限 1000 次/天)获取风险评分。主 Provider 失败时自动回退到 [ip-api.com](https://ip-api.com)(提供地理 + 代理/托管标志,无风险评分)。详见 [IP 质量检测](/cn/docs/ip-quality)。 +默认开箱即用,通过 [ipapi.is](https://ipapi.is)(无需 API Key,按源 IP 限 1000 次/天)获取风险评分。主 Provider 失败时自动回退到 [ip-api.com](https://ip-api.com)(提供地理 + 代理/托管标志,无风险评分)。详见 [IP 质量检测](/zh/docs/ip-quality)。 | 环境变量 | 默认值 | 说明 | |----------|--------|------| @@ -655,6 +655,6 @@ release_cert_spki_sha256 = "" | 重连抖动 | +/-20% | 避免雷群效应 | - - + + diff --git a/apps/docs/content/docs/cn/cost-insights.mdx b/apps/docs/content/docs/zh/cost-insights.mdx similarity index 97% rename from apps/docs/content/docs/cn/cost-insights.mdx rename to apps/docs/content/docs/zh/cost-insights.mdx index 9ae5a456..0f4bee1e 100644 --- a/apps/docs/content/docs/cn/cost-insights.mdx +++ b/apps/docs/content/docs/zh/cost-insights.mdx @@ -6,7 +6,7 @@ icon: CircleDollarSign ServerBee 会把管理员录入的 `price` / `billing_cycle` / `currency` / `expired_at` 等账单信息,结合 Agent 上报的资源容量、利用率和在线时长,自动算出一组成本信号:burn rate、各资源单价以及 0–100 的价值评分。这些信号会显示在服务器列表、仪表盘服务器卡片,以及服务器详情页的成本面板中。 -本页描述评分算法本身;如何在 UI 中录入账单字段见[管理员手册的"计费信息"小节](/cn/docs/admin#计费信息)。 +本页描述评分算法本身;如何在 UI 
中录入账单字段见[管理员手册的"计费信息"小节](/zh/docs/admin#计费信息)。 ## 衍生输出 @@ -146,7 +146,7 @@ sleeping_money → idle_burn → expensive_cpu → low_uptime → insufficient_d - 月费换算固定按 30 天,季付/年付会折算成等价月费再对比 - - - + + + diff --git a/apps/docs/content/docs/cn/custom-widgets.mdx b/apps/docs/content/docs/zh/custom-widgets.mdx similarity index 97% rename from apps/docs/content/docs/cn/custom-widgets.mdx rename to apps/docs/content/docs/zh/custom-widgets.mdx index d8929392..9ce80113 100644 --- a/apps/docs/content/docs/cn/custom-widgets.mdx +++ b/apps/docs/content/docs/zh/custom-widgets.mdx @@ -247,7 +247,7 @@ curl -X POST "https://your-host/api/widget-modules" \ - `aspect-square` — 始终保持 1:1,会就近吸附到最接近的层级; - `content-height` — 宽度可调,高度由内容决定。 -更多布局与编辑细节见[仪表盘与组件](/cn/docs/dashboards)。 +更多布局与编辑细节见[仪表盘与组件](/zh/docs/dashboards)。 ## SDK 速览 @@ -288,7 +288,7 @@ curl -X DELETE "https://your-host/api/widget-modules/com.example.hello" \ - **Action 按钮**:通过 `defineWidget({ actions })` 声明的按钮自带确认对话框(`confirm` 配置生效时)、加载状态和成功/失败 toast 提示。 - - - + + + diff --git a/apps/docs/content/docs/cn/dashboards.mdx b/apps/docs/content/docs/zh/dashboards.mdx similarity index 96% rename from apps/docs/content/docs/cn/dashboards.mdx rename to apps/docs/content/docs/zh/dashboards.mdx index 454911f3..dd4844b8 100644 --- a/apps/docs/content/docs/cn/dashboards.mdx +++ b/apps/docs/content/docs/zh/dashboards.mdx @@ -128,7 +128,7 @@ Server Map 组件需要 GeoIP 数据。你可以: ``` - - - + + + diff --git a/apps/docs/content/docs/cn/deployment.mdx b/apps/docs/content/docs/zh/deployment.mdx similarity index 99% rename from apps/docs/content/docs/cn/deployment.mdx rename to apps/docs/content/docs/zh/deployment.mdx index 1205f047..89bba2a6 100644 --- a/apps/docs/content/docs/cn/deployment.mdx +++ b/apps/docs/content/docs/zh/deployment.mdx @@ -611,7 +611,7 @@ ServerBee 在启动时会自动运行数据库迁移,升级后无需手动执 - - - + + + diff --git a/apps/docs/content/docs/cn/file-manager.mdx b/apps/docs/content/docs/zh/file-manager.mdx similarity index 96% rename from 
apps/docs/content/docs/cn/file-manager.mdx rename to apps/docs/content/docs/zh/file-manager.mdx index f7ab2585..7029e26e 100644 --- a/apps/docs/content/docs/cn/file-manager.mdx +++ b/apps/docs/content/docs/zh/file-manager.mdx @@ -126,7 +126,7 @@ curl -X POST https://your-server/api/files/server-id/upload \ ``` - - - + + + diff --git a/apps/docs/content/docs/cn/firewall.mdx b/apps/docs/content/docs/zh/firewall.mdx similarity index 92% rename from apps/docs/content/docs/cn/firewall.mdx rename to apps/docs/content/docs/zh/firewall.mdx index edbec6ab..7f5ef32a 100644 --- a/apps/docs/content/docs/cn/firewall.mdx +++ b/apps/docs/content/docs/zh/firewall.mdx @@ -45,7 +45,7 @@ allow_list = ["198.51.100.42", "203.0.113.0/29"] ## 告警触发自动封禁 -爆破和端口扫描类告警规则可以追加 `block_source_ip` 动作,把事件来源 IP 自动写入黑名单。详见 [安全事件检测 → 自动封禁来源 IP](/cn/docs/security-events)。 +爆破和端口扫描类告警规则可以追加 `block_source_ip` 动作,把事件来源 IP 自动写入黑名单。详见 [安全事件检测 → 自动封禁来源 IP](/zh/docs/security-events)。 自动封禁按规范化目标做去重。如果已有记录覆盖了触发该事件的服务器,则静默跳过;如果存在但 **未** 覆盖该服务器,冲突会写入审计日志(`firewall_auto_block_skipped_conflict`),不会新建记录 —— 操作员可手工扩大已有记录的覆盖范围。 @@ -92,8 +92,8 @@ nft delete table inet serverbee - 仅 `input` 链,不过滤 `forward` / `output` - - - - + + + + diff --git a/apps/docs/content/docs/cn/index.mdx b/apps/docs/content/docs/zh/index.mdx similarity index 77% rename from apps/docs/content/docs/cn/index.mdx rename to apps/docs/content/docs/zh/index.mdx index 95d79759..3f900a65 100644 --- a/apps/docs/content/docs/cn/index.mdx +++ b/apps/docs/content/docs/zh/index.mdx @@ -80,25 +80,25 @@ Agent 与 Server 之间的通信全部走 WebSocket(JSON 消息)。终端会 ## 快速链接 - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + diff --git a/apps/docs/content/docs/cn/ip-quality.mdx b/apps/docs/content/docs/zh/ip-quality.mdx similarity index 95% rename from apps/docs/content/docs/cn/ip-quality.mdx rename to apps/docs/content/docs/zh/ip-quality.mdx index 6a1260ce..272c3269 100644 --- a/apps/docs/content/docs/cn/ip-quality.mdx +++ 
b/apps/docs/content/docs/zh/ip-quality.mdx @@ -15,7 +15,7 @@ IP 质量检测功能让每台 Agent 评估其 VPS 出口 IP 的质量,并将 - 每台服务器必须启用 **`CAP_IP_QUALITY`**(位值 `1024`)。**默认关闭**,不在新 Agent 的默认能力集中。 - Agent 启动时还需加上 `--allow-cap ip_quality`。双方都需要主动开启,详见下方[双重 opt-in](#双重-opt-in)。 -- 建议配置 GeoIP MMDB 数据库以获得完整的 IP 元数据。未配置时只有基本地理位置信息。详见[配置](/cn/docs/configuration)页中的 `geoip.mmdb_path`。 +- 建议配置 GeoIP MMDB 数据库以获得完整的 IP 元数据。未配置时只有基本地理位置信息。详见[配置](/zh/docs/configuration)页中的 `geoip.mmdb_path`。 ## 双重 opt-in @@ -126,7 +126,7 @@ risk_provider = "none" **从旧版配置迁移?** 旧有提供商名称(`scamalytics`、`ipqs`、`proxycheck`、`abuseipdb`)已不再支持。如果您的 `server.toml` 或环境变量中仍引用这些名称,Server 启动时会输出警告日志并静默跳过风险评分。请将 `risk_provider` 更新为 `ipapi_is`(或设为 `none` 关闭评分)。 -完整配置参考:[配置 → IP 质量检测](/cn/docs/configuration)。 +完整配置参考:[配置 → IP 质量检测](/zh/docs/configuration)。 ## 查看结果 @@ -183,8 +183,8 @@ Server ``` - - - - + + + + diff --git a/apps/docs/content/docs/cn/meta.json b/apps/docs/content/docs/zh/meta.json similarity index 100% rename from apps/docs/content/docs/cn/meta.json rename to apps/docs/content/docs/zh/meta.json diff --git a/apps/docs/content/docs/cn/mobile.mdx b/apps/docs/content/docs/zh/mobile.mdx similarity index 100% rename from apps/docs/content/docs/cn/mobile.mdx rename to apps/docs/content/docs/zh/mobile.mdx diff --git a/apps/docs/content/docs/cn/monitoring.mdx b/apps/docs/content/docs/zh/monitoring.mdx similarity index 98% rename from apps/docs/content/docs/cn/monitoring.mdx rename to apps/docs/content/docs/zh/monitoring.mdx index 1f7a5f29..8d4471db 100644 --- a/apps/docs/content/docs/cn/monitoring.mdx +++ b/apps/docs/content/docs/zh/monitoring.mdx @@ -18,7 +18,7 @@ ServerBee 提供全面的服务器监控能力,通过 WebSocket 实时推送 - 服务器名称、地区、国旗、操作系统及网络质量迷你图 - **实时刷新**:所有数据通过 WebSocket 驱动,无需手动刷新 -如需创建面向不同场景的运维视图,可以使用 [仪表盘与组件](/cn/docs/dashboards) 创建额外仪表盘布局,包含图表、地图、服务状态、Markdown 说明和可用性时间线等组件。 +如需创建面向不同场景的运维视图,可以使用 [仪表盘与组件](/zh/docs/dashboards) 创建额外仪表盘布局,包含图表、地图、服务状态、Markdown 说明和可用性时间线等组件。 ### GeoIP 显示 @@ -459,7 +459,7 @@ WebSocket 推送 
`BrowserMessage::TracerouteUpdate` 携带每轮增量结果。 - `network_packet_loss` -- 当丢包率超过阈值时触发 - - - + + + diff --git a/apps/docs/content/docs/cn/ping.mdx b/apps/docs/content/docs/zh/ping.mdx similarity index 96% rename from apps/docs/content/docs/cn/ping.mdx rename to apps/docs/content/docs/zh/ping.mdx index 00521569..89fef8b6 100644 --- a/apps/docs/content/docs/cn/ping.mdx +++ b/apps/docs/content/docs/zh/ping.mdx @@ -174,7 +174,7 @@ Server 在以下时机会向 Agent 同步探测任务: 在 Web 控制台中,创建、删除、启用、禁用 Ping 任务都会根据当前语言显示对应的成功或失败提示。启用/禁用请求发送期间,切换按钮会暂时禁用,以减少误触发的重复提交。 - - - + + + diff --git a/apps/docs/content/docs/cn/quick-start.mdx b/apps/docs/content/docs/zh/quick-start.mdx similarity index 96% rename from apps/docs/content/docs/cn/quick-start.mdx rename to apps/docs/content/docs/zh/quick-start.mdx index 3188e46d..8a8ae900 100644 --- a/apps/docs/content/docs/cn/quick-start.mdx +++ b/apps/docs/content/docs/zh/quick-start.mdx @@ -160,7 +160,7 @@ sudo serverbee restart # 重启 Agent sudo serverbee uninstall agent # 卸载 Agent ``` -更多采集、日志等可调项见 [Agent 配置](/cn/docs/agent)和[完整配置参考](/cn/docs/configuration)。 +更多采集、日志等可调项见 [Agent 配置](/zh/docs/agent)和[完整配置参考](/zh/docs/configuration)。 --- @@ -208,7 +208,7 @@ docker compose logs serverbee-server - - - + + + diff --git a/apps/docs/content/docs/cn/resource-usage.mdx b/apps/docs/content/docs/zh/resource-usage.mdx similarity index 93% rename from apps/docs/content/docs/cn/resource-usage.mdx rename to apps/docs/content/docs/zh/resource-usage.mdx index 5a1e64e6..d4eb48da 100644 --- a/apps/docs/content/docs/cn/resource-usage.mdx +++ b/apps/docs/content/docs/zh/resource-usage.mdx @@ -4,7 +4,7 @@ description: ServerBee Agent 与 Server 的 CPU、内存、磁盘、网络运行 icon: Gauge --- -本页记录 ServerBee **Agent** 与 **Server** 在运行时的资源开销实测数据,涵盖 CPU、内存、磁盘和网络。磁盘(数据库)增长的详细容量规划见 [存储与容量规划](/cn/docs/storage-sizing)。 +本页记录 ServerBee **Agent** 与 **Server** 在运行时的资源开销实测数据,涵盖 CPU、内存、磁盘和网络。磁盘(数据库)增长的详细容量规划见 [存储与容量规划](/zh/docs/storage-sizing)。 以下数值为实测,非估算。除非另有说明,均测于 2026-05-19,使用 v0.9.3。资源占用会随采集间隔、连接的 
Agent 数量、是否启用 GPU/温度采集以及面板查询负载浮动,这里给出的是稳定的数量级参考。 @@ -60,7 +60,7 @@ Server 内存开销主要受连接的 Agent 数量、面板/API 查询负载和 ### 磁盘 -Server 磁盘占用来自 SQLite 数据库,随服务器数量、启用功能和保留策略增长。完整的 30 天容量公式与场景目录见 [存储与容量规划](/cn/docs/storage-sizing)。生产环境请在数据库基础大小上额外预留 10%–20% 作为 WAL 与突发写入缓冲。 +Server 磁盘占用来自 SQLite 数据库,随服务器数量、启用功能和保留策略增长。完整的 30 天容量公式与场景目录见 [存储与容量规划](/zh/docs/storage-sizing)。生产环境请在数据库基础大小上额外预留 10%–20% 作为 WAL 与突发写入缓冲。 ## 注意事项 @@ -72,7 +72,7 @@ Server 磁盘占用来自 SQLite 数据库,随服务器数量、启用功能和 ## 相关文档 - - - + + + diff --git a/apps/docs/content/docs/cn/security-events.mdx b/apps/docs/content/docs/zh/security-events.mdx similarity index 95% rename from apps/docs/content/docs/cn/security-events.mdx rename to apps/docs/content/docs/zh/security-events.mdx index 8294f315..1dcde8a7 100644 --- a/apps/docs/content/docs/cn/security-events.mdx +++ b/apps/docs/content/docs/zh/security-events.mdx @@ -19,7 +19,7 @@ Agent 在每台主机上检测三类主机级入侵信号,并将结构化事 ## 前置条件 - **仅支持 Linux。** SSH 检测读取 systemd journal 或 `/var/log/auth.log`;端口扫描检测依赖 `conntrack`。 -- 每台服务器需启用 **`CAP_SECURITY_EVENTS`**(位值 `256`,新 Agent 默认启用)。详见 [Capabilities](/cn/docs/capabilities)。 +- 每台服务器需启用 **`CAP_SECURITY_EVENTS`**(位值 `256`,新 Agent 默认启用)。详见 [Capabilities](/zh/docs/capabilities)。 - 端口扫描检测需安装 `conntrack` CLI,并将 `security.port_scan.enabled` 设为 `true`: ```bash # Debian / Ubuntu @@ -52,7 +52,7 @@ Agent 在每台主机上检测三类主机级入侵信号,并将结构化事 | `security.port_scan.distinct_port_threshold` | `SERVERBEE_SECURITY__PORT_SCAN__DISTINCT_PORT_THRESHOLD` | `20` | 同一源 IP 在窗口内命中的不同端口数达到该值即触发一个 `port_scan` 事件 | | `security.data_dir` | `SERVERBEE_SECURITY__DATA_DIR` | `/var/lib/serverbee/security` | 持久化 `first_seen` 存储目录,记录已知 `(user, IP)` 组合 | -完整环境变量参考:[配置 → 安全事件检测(Security,Agent)](/cn/docs/configuration)。 +完整环境变量参考:[配置 → 安全事件检测(Security,Agent)](/zh/docs/configuration)。 sshd 对一次失败尝试通常会写两行日志(`Invalid user …` 后跟 `Failed password …`)。在默认 `failed_threshold=10` 下,同一 IP 每发起约 **5** 次真实失败尝试就会产生一个 `ssh_brute_force` 事件。 @@ -109,7 +109,7 @@ Alerts 页面提供三张 **预设卡片**,一键创建规则。预设已填 自动写入的记录使用 `origin = 
"auto"` 并保留触发的 `origin_event_id`。系统按规范化目标做去重:若已有手工或更早的自动封禁覆盖了触发事件所在的服务器,则静默跳过;若已存在但 **未** 覆盖该服务器,则把冲突记入审计日志 `firewall_auto_block_skipped_conflict`,并且不创建新的记录。 -完整功能、护栏与审计日志,详见 [防火墙黑名单](/cn/docs/firewall)。 +完整功能、护栏与审计日志,详见 [防火墙黑名单](/zh/docs/firewall)。 ## 数据保留 @@ -133,7 +133,7 @@ Server ─► security_event 表 ``` - - - + + + diff --git a/apps/docs/content/docs/cn/security.mdx b/apps/docs/content/docs/zh/security.mdx similarity index 95% rename from apps/docs/content/docs/cn/security.mdx rename to apps/docs/content/docs/zh/security.mdx index 5e4cd897..8566d88a 100644 --- a/apps/docs/content/docs/cn/security.mdx +++ b/apps/docs/content/docs/zh/security.mdx @@ -50,7 +50,7 @@ ServerBee 支持三种 OAuth 提供商: ### 配置 OAuth -在 `server.toml` 中添加 OAuth 配置(详见 [Server 配置](/cn/docs/server)): +在 `server.toml` 中添加 OAuth 配置(详见 [Server 配置](/zh/docs/server)): ```toml [oauth] @@ -114,7 +114,7 @@ Agent 注册端点同样受限: - 开发环境可通过 `auth.secure_cookie = false` 关闭 Secure 标志 - - - + + + diff --git a/apps/docs/content/docs/cn/server.mdx b/apps/docs/content/docs/zh/server.mdx similarity index 97% rename from apps/docs/content/docs/cn/server.mdx rename to apps/docs/content/docs/zh/server.mdx index 874f2dda..d981a62a 100644 --- a/apps/docs/content/docs/cn/server.mdx +++ b/apps/docs/content/docs/zh/server.mdx @@ -21,7 +21,7 @@ curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/depl --domain monitor.example.com --email admin@example.com -y ``` -脚本安装后的目录布局:二进制在 `/opt/serverbee/bin/`,配置在 `/opt/serverbee/etc/server.toml`,数据在 `/opt/serverbee/data/`,管理 CLI 软链为 `/usr/local/bin/serverbee`。完整步骤见[快速安装](/cn/docs/quick-start)。 +脚本安装后的目录布局:二进制在 `/opt/serverbee/bin/`,配置在 `/opt/serverbee/etc/server.toml`,数据在 `/opt/serverbee/data/`,管理 CLI 软链为 `/usr/local/bin/serverbee`。完整步骤见[快速安装](/zh/docs/quick-start)。 ### 二进制安装(手动) @@ -70,7 +70,7 @@ Server 按以下顺序读取 TOML 配置,靠后的来源会覆盖靠前的: 通过安装脚本部署时,配置文件位于 `/opt/serverbee/etc/server.toml`,数据目录为 
`/opt/serverbee/data/`(脚本会显式写入这些路径,覆盖下方的内置默认值)。`/etc/serverbee`、`/var/lib/serverbee` 是旧版布局,仅在历史安装中出现,脚本会自动迁移。 -下面的 `server.toml` 列出最常用的配置项及其默认值。完整选项见[配置参考](/cn/docs/configuration)。 +下面的 `server.toml` 列出最常用的配置项及其默认值。完整选项见[配置参考](/zh/docs/configuration)。 ```toml [server] @@ -354,7 +354,7 @@ monitor.example.com { server_url = "https://monitor.example.com" ``` -更多反向代理配置(含 Traefik)请参阅[部署指南](/cn/docs/deployment)。 +更多反向代理配置(含 Traefik)请参阅[部署指南](/zh/docs/deployment)。 ## 后台任务 @@ -372,7 +372,7 @@ Server 会自动运行多个后台任务: 所有任务自动启动,无需手动配置。 - - - + + + diff --git a/apps/docs/content/docs/cn/service-monitors.mdx b/apps/docs/content/docs/zh/service-monitors.mdx similarity index 96% rename from apps/docs/content/docs/cn/service-monitors.mdx rename to apps/docs/content/docs/zh/service-monitors.mdx index 97bb7ac3..81378c3f 100644 --- a/apps/docs/content/docs/cn/service-monitors.mdx +++ b/apps/docs/content/docs/zh/service-monitors.mdx @@ -165,7 +165,7 @@ icon: Radar - - - + + + diff --git a/apps/docs/content/docs/cn/status-page.mdx b/apps/docs/content/docs/zh/status-page.mdx similarity index 96% rename from apps/docs/content/docs/cn/status-page.mdx rename to apps/docs/content/docs/zh/status-page.mdx index 92729812..d066ca21 100644 --- a/apps/docs/content/docs/cn/status-page.mdx +++ b/apps/docs/content/docs/zh/status-page.mdx @@ -145,7 +145,7 @@ IP 地址、主机名、网卡等敏感标识会在 API 层脱敏,不会出现 | DELETE | `/api/maintenances/{id}` | 删除维护窗口 | - - - + + + diff --git a/apps/docs/content/docs/cn/storage-sizing.mdx b/apps/docs/content/docs/zh/storage-sizing.mdx similarity index 97% rename from apps/docs/content/docs/cn/storage-sizing.mdx rename to apps/docs/content/docs/zh/storage-sizing.mdx index 640c2076..f36de8ab 100644 --- a/apps/docs/content/docs/cn/storage-sizing.mdx +++ b/apps/docs/content/docs/zh/storage-sizing.mdx @@ -147,7 +147,7 @@ S_30d = ## 相关文档 - - - + + + diff --git a/apps/docs/content/docs/cn/terminal.mdx b/apps/docs/content/docs/zh/terminal.mdx similarity index 95% rename from 
apps/docs/content/docs/cn/terminal.mdx rename to apps/docs/content/docs/zh/terminal.mdx index 26e7a5d2..caef6a1d 100644 --- a/apps/docs/content/docs/cn/terminal.mdx +++ b/apps/docs/content/docs/zh/terminal.mdx @@ -110,7 +110,7 @@ ExecStart=/usr/local/bin/serverbee-agent 在生产环境中,务必通过 HTTPS 反向代理来保护终端数据的传输安全。未加密的 WebSocket 连接可能导致终端输入输出被窃听。 - - - + + + diff --git a/apps/docs/src/components/landing/sections/bento.tsx b/apps/docs/src/components/landing/sections/bento.tsx index 8615dfe4..498adb73 100644 --- a/apps/docs/src/components/landing/sections/bento.tsx +++ b/apps/docs/src/components/landing/sections/bento.tsx @@ -61,5 +61,5 @@ function Card({ title, body, span, children }: { title: string; body: string; sp } function bentoTitle(lang: LandingLang): string { - return lang === 'cn' ? '一个探针,覆盖运维的方方面面。' : 'One probe. Every job your VPS needs.' + return lang === 'zh' ? '一个探针,覆盖运维的方方面面。' : 'One probe. Every job your VPS needs.' } diff --git a/apps/docs/src/components/landing/translations.ts b/apps/docs/src/components/landing/translations.ts index 716618dc..aff851fa 100644 --- a/apps/docs/src/components/landing/translations.ts +++ b/apps/docs/src/components/landing/translations.ts @@ -1,7 +1,7 @@ export const INSTALL_COMMAND = 'curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- server' -export type LandingLang = 'en' | 'cn' +export type LandingLang = 'en' | 'zh' export const translations = { en: { @@ -86,7 +86,7 @@ export const translations = { star: 'Star on GitHub' } }, - cn: { + zh: { hero: { eyebrow: '开源 · MIT · Rust 构建', headline1: '自托管的 VPS 监控,', diff --git a/apps/docs/src/lib/i18n.ts b/apps/docs/src/lib/i18n.ts index 2b07336e..f5b23c1e 100644 --- a/apps/docs/src/lib/i18n.ts +++ b/apps/docs/src/lib/i18n.ts @@ -1,7 +1,7 @@ import { defineI18n } from 'fumadocs-core/i18n' export const i18n = defineI18n({ - languages: ['en', 'cn'], + languages: ['en', 'zh'], defaultLanguage: 'en', parser: 'dir' }) diff --git 
a/apps/docs/src/routes/$lang/index.tsx b/apps/docs/src/routes/$lang/index.tsx index 9571c382..5924ce69 100644 --- a/apps/docs/src/routes/$lang/index.tsx +++ b/apps/docs/src/routes/$lang/index.tsx @@ -11,7 +11,7 @@ export const Route = createFileRoute('/$lang/')({ function Home() { const { lang } = useParams({ from: '/$lang/' }) - const landingLang: LandingLang = lang === 'cn' ? 'cn' : 'en' + const landingLang: LandingLang = lang === 'zh' ? 'zh' : 'en' return ( diff --git a/apps/docs/src/routes/__root.tsx b/apps/docs/src/routes/__root.tsx index 6c066aaf..70b6c56e 100644 --- a/apps/docs/src/routes/__root.tsx +++ b/apps/docs/src/routes/__root.tsx @@ -10,7 +10,7 @@ const { provider } = defineI18nUI(i18n, { en: { displayName: 'English' }, - cn: { + zh: { displayName: '中文', search: '搜索文档' } diff --git a/apps/docs/src/routes/api/search.ts b/apps/docs/src/routes/api/search.ts index 681ec340..7fd08ad5 100644 --- a/apps/docs/src/routes/api/search.ts +++ b/apps/docs/src/routes/api/search.ts @@ -6,7 +6,7 @@ import { source } from '@/lib/source' const server = createFromSource(source, { localeMap: { en: { language: 'english' }, - cn: { language: 'english' } + zh: { language: 'english' } } }) From 4674483b37b36ebb1254fb95ca63db9ffb8052c6 Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 18:34:18 +0800 Subject: [PATCH 09/21] docs: correct terminal transport to JSON text with base64 data field --- AGENTS.md | 2 +- apps/docs/content/docs/en/api-reference.mdx | 6 +++--- apps/docs/content/docs/zh/api-reference.mdx | 6 +++--- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index f41dee6c..f9653ecb 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -63,7 +63,7 @@ Agent → WebSocket (JSON) → Server → SQLite (sea-orm) - **Agent→Server**: `AgentMessage` variants (SystemInfo, Report, PingResult, TaskResult, SecurityEvent, CapabilityDenied, file/terminal/network results) - **Server→Agent**: `ServerMessage` variants (Welcome, 
Ack, Exec, TerminalOpen, PingTasksSync, NetworkProbeSync, file ops)
 - **Server→Browser**: `BrowserMessage` variants (FullSync, Update, ServerOnline/Offline, CapabilitiesChanged, SecurityEvent)
-- Terminal data uses Binary WebSocket frames (session_id prefix + payload)
+- Terminal data is carried in JSON text messages (`Message::Text`); the raw PTY byte stream rides in a base64-encoded `data` field. The protocol uses no binary WebSocket frames.

 ### AppState

diff --git a/apps/docs/content/docs/en/api-reference.mdx b/apps/docs/content/docs/en/api-reference.mdx
index e97d04d5..d7d56646 100644
--- a/apps/docs/content/docs/en/api-reference.mdx
+++ b/apps/docs/content/docs/en/api-reference.mdx
@@ -4,7 +4,7 @@ description: ServerBee REST API overview, authentication, WebSocket endpoints, a
 icon: FileCode
 ---

-ServerBee exposes the same capabilities used by the web dashboard through REST and WebSocket APIs. The authoritative, schema-level reference is generated from OpenAPI annotations in the server binary.
+ServerBee exposes the same capabilities the web dashboard uses through REST and WebSocket APIs. The authoritative, schema-level reference is generated from OpenAPI annotations in the server binary.

 ## Swagger UI

@@ -14,7 +14,7 @@ Open the built-in interactive documentation at:
 https://your-server/swagger-ui/
 ```

-Swagger UI lets you inspect request/response schemas, authentication requirements, and test requests against your own deployment. The raw OpenAPI document is available at:
+Swagger UI lets you inspect request/response schemas and authentication requirements, and send test requests against your own deployment. The raw OpenAPI document is available at:

 ```text
 https://your-server/api-docs/openapi.json
@@ -196,7 +196,7 @@ Rotates (revokes) the agent run token for the given server.
The old token is inv
 |------|------|-------------|
 | `/api/agent/ws?token=` | Agent token query parameter | Agent metrics, commands, pings, files, Docker, traceroute |
 | `/api/ws/servers` | Session cookie, API key, or Bearer token | Browser/mobile real-time server updates |
-| `/api/ws/terminal/{server_id}` | Authenticated Admin + `CAP_TERMINAL` | Web terminal proxy; terminal payloads use binary frames |
+| `/api/ws/terminal/{server_id}` | Authenticated Admin + `CAP_TERMINAL` | Web terminal proxy; JSON text messages with terminal data base64-encoded |
 | `/api/ws/docker/logs/{server_id}` | Authenticated + `CAP_DOCKER` | Per-container Docker log streaming |

 ## Common Status Codes
diff --git a/apps/docs/content/docs/zh/api-reference.mdx b/apps/docs/content/docs/zh/api-reference.mdx
index 986c8e70..ff5ad6a1 100644
--- a/apps/docs/content/docs/zh/api-reference.mdx
+++ b/apps/docs/content/docs/zh/api-reference.mdx
@@ -4,7 +4,7 @@ description: ServerBee REST API 概览、认证方式、WebSocket 端点和 Swag
 icon: FileCode
 ---

-ServerBee 将 Web 管理面板使用的能力同时通过 REST 和 WebSocket API 暴露。最权威的 schema 级参考由服务端二进制中的 OpenAPI 注解自动生成。
+ServerBee 把 Web 管理面板使用的能力同时通过 REST 和 WebSocket API 暴露。最权威的 schema 级参考由服务端二进制中的 OpenAPI 注解自动生成。

 ## Swagger UI

@@ -14,7 +14,7 @@ ServerBee 将 Web 管理面板使用的能力同时通过 REST 和 WebSocket API
 https://your-server/swagger-ui/
 ```

-你可以在 Swagger UI 中查看请求/响应模型、认证要求,并直接向自己的部署发送测试请求。原始 OpenAPI 文档地址:
+你可以在 Swagger UI 中查看请求/响应模型和认证要求,并直接向自己的部署发送测试请求。原始 OpenAPI 文档地址:

 ```text
 https://your-server/api-docs/openapi.json
@@ -196,7 +196,7 @@ curl https://your-server/api/auth/me \
 |------|------|------|
 | `/api/agent/ws?token=` | Agent token 查询参数 | Agent 指标、命令、Ping、文件、Docker、Traceroute |
 | `/api/ws/servers` | Session cookie、API Key 或 Bearer token | 浏览器/移动端实时服务器更新 |
-| `/api/ws/terminal/{server_id}` | 已认证 Admin + `CAP_TERMINAL` | Web 终端代理;终端数据使用二进制帧 |
+| `/api/ws/terminal/{server_id}` | 已认证 Admin + `CAP_TERMINAL` | Web 终端代理;JSON 文本消息,终端数据 base64 编码 |
 | `/api/ws/docker/logs/{server_id}` | 已认证 +
`CAP_DOCKER` | 按容器流式传输 Docker 日志 |

 ## 常见状态码

From 9dcdedd59f978c9882aaba29b288b8449724034e Mon Sep 17 00:00:00 2001
From: ZingerLittleBee <6970999@gmail.com>
Date: Sun, 31 May 2026 18:34:18 +0800
Subject: [PATCH 10/21] docs: drop the removed SERVERBEE_FEATURE__CUSTOM_THEMES env var

---
 ENV.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/ENV.md b/ENV.md
index 2d3dc8b7..2afc71af 100644
--- a/ENV.md
+++ b/ENV.md
@@ -36,7 +36,6 @@ These variables are for local repo tooling and development workflows. They are n
 | `SERVERBEE_SERVER__DATA_DIR` | `server.data_dir` | string | `./data` | Data directory for SQLite and backups |
 | `SERVERBEE_AUTH__MAX_SERVERS` | `auth.max_servers` | u32 | `0` | Maximum servers allowed via enrollment (0 = no limit). Best-effort soft cap |
 | `SERVERBEE_SCHEDULER__TIMEZONE` | `scheduler.timezone` | string | `UTC` | Timezone for daily traffic aggregation and cron scheduling (e.g. `Asia/Shanghai`) |
-| `SERVERBEE_FEATURE__CUSTOM_THEMES` | `feature.custom_themes` | bool | `true` | Disable user-defined themes when false. Custom refs are read-coerced to `preset:default` |
 | `SERVERBEE_LOG__LEVEL` | `log.level` | string | `info` | Log level: `trace`, `debug`, `info`, `warn`, `error` |
 | `SERVERBEE_LOG__FILE` | `log.file` | string | `""` | Log file path.
Empty means stdout only |

From d8611020fa14613a963af8668fe7be5d669d33c9 Mon Sep 17 00:00:00 2001
From: ZingerLittleBee <6970999@gmail.com>
Date: Sun, 31 May 2026 18:34:18 +0800
Subject: [PATCH 11/21] docs(architecture): expand internals and update terminal transport

---
 apps/docs/content/docs/en/architecture.mdx | 284 ++++++++++----
 apps/docs/content/docs/zh/architecture.mdx | 406 ++++++++++++---------
 2 files changed, 449 insertions(+), 241 deletions(-)

diff --git a/apps/docs/content/docs/en/architecture.mdx b/apps/docs/content/docs/en/architecture.mdx
index 0fb89f0c..c1b61078 100644
--- a/apps/docs/content/docs/en/architecture.mdx
+++ b/apps/docs/content/docs/en/architecture.mdx
@@ -4,7 +4,7 @@ description: Technical overview of ServerBee's system design, components, and pr
 icon: Boxes
 ---

-This page describes the internal architecture of ServerBee for developers and operators who want to understand how the system works under the hood.
+This page describes ServerBee's internal architecture for developers and operators who want to understand how the system works under the hood. ServerBee follows a hub-and-spoke design: a central **Server** receives metrics from distributed **Agents** over WebSocket, stores them in SQLite, and serves a React SPA dashboard.

## System Overview @@ -32,7 +32,7 @@ This page describes the internal architecture of ServerBee for developers and op | | +-- SQLite | | | +---------------------------------------------+ | +--------------------------+-----------------------+ - | WebSocket (JSON + Binary) + | WebSocket (JSON) +--------------------------v-----------------------+ | ServerBee Agent (Rust) | | common crate (shared types/protocol) | @@ -56,9 +56,26 @@ The central hub that: - Runs background tasks (record writing, aggregation, cleanup, alerting, offline detection) - Dispatches notifications +Its main service modules are: + +| Module | Responsibility | +|--------|----------------| +| Axum Router | Routes and handles HTTP/WebSocket requests | +| AgentManager | Manages agent WebSocket connections, the live metric cache, and update broadcasting | +| AuthService | User login (argon2), session management, API key verification | +| ServerService | Server CRUD, grouping, tagging, ordering | +| RecordService | Metric writes, history queries, hourly aggregation, retention cleanup | +| AlertService | Alert rule CRUD, periodic evaluation, notification dispatch | +| NotificationService | Notification channel management and message dispatch (Webhook/Telegram/Email/Bark) | +| PingService | Ping task CRUD, task assignment, result storage | +| TaskService | Remote command dispatch, result queries | +| ConfigService | System configuration read/write (key-value store) | +| GeoIpService | IP geolocation lookup (MaxMind MMDB) | +| UserService | User CRUD and role management | + ### Agent (`crates/agent`) -A lightweight daemon that: +A lightweight daemon deployed on each monitored server that: - Collects system metrics using the `sysinfo` crate - Reports metrics to the server over WebSocket every N seconds @@ -67,48 +84,61 @@ A lightweight daemon that: - Executes remote commands dispatched by the server - Supports self-upgrade when instructed +Its main modules are: + +| Module | Responsibility | 
+|--------|----------------|
+| Collector | Periodically collects CPU, memory, disk, network, and other metrics via `sysinfo` |
+| Reporter | Manages the WebSocket connection, reports metrics, handles server commands, reconnects on disconnect |
+| ProbeManager | Runs ICMP/TCP/HTTP probe tasks dispatched by the server |
+| Executor | Executes remote shell commands dispatched by the server |
+| Terminal | Manages PTY terminal sessions and forwards terminal I/O |
+
 ### Common (`crates/common`)
-A shared library crate containing:
+A shared library crate containing the types and protocol shared between the agent and server:
 - **Protocol definitions** -- `AgentMessage`, `ServerMessage`, `BrowserMessage` enums
 - **Data types** -- `SystemInfo`, `SystemReport`, `GpuReport`, `PingResult`, `ServerStatus`, etc.
-- **Constants** -- Default ports, timeouts, retention periods, alert parameters
+- **Constants** -- Protocol version, default ports, timeouts, retention periods, alert parameters
 ### Frontend (`apps/web`)
-A React 19 single-page application:
+A React 19 single-page application, embedded into the server binary via `rust-embed` after build:
 - **Routing** -- TanStack Router (file-based)
-- **Data fetching** -- TanStack Query for REST, native WebSocket for real-time updates
+- **Data fetching** -- TanStack Query for REST, native WebSocket for real-time updates (sharing the same cache)
 - **UI components** -- shadcn/ui with the base-nova theme
 - **Styling** -- Tailwind CSS v4
 - **Charts** -- Recharts via shadcn Chart wrappers
+- **Terminal** -- xterm.js web terminal emulator
 - **Build** -- Vite 7
 The built frontend is embedded into the server binary at compile time, so there is nothing to deploy separately.
 ## Communication Protocol
-All communication uses WebSocket. Control and metric messages are JSON text frames; terminal data uses dedicated binary frames to minimize latency. The current protocol version is `4`, sent to the agent in the `Welcome` message on connect.
+All communication uses WebSocket. Every message is a JSON text frame; terminal I/O data is carried inside the message's `data` field, base64-encoded. The current protocol version is `4`, sent to the agent in the `Welcome` message on connect.
+
+The agent connects to `ws:///api/agent/ws?token=`.
 ### Agent to Server (`AgentMessage`)
-| Message Type | Purpose |
-|-------------|---------|
-| `SystemInfo` | Static system information, sent once on connect |
-| `Report` | Periodic metric report (every N seconds) |
-| `PingResult` | Result of a ping probe |
-| `TaskResult` | Output from a remote command execution |
-| `TerminalOutput` | PTY output data (base64-encoded) |
-| `TerminalStarted` | Confirmation that a PTY session was created |
-| `TerminalError` | Error from a terminal session |
-| `Pong` | Protocol-level heartbeat response |
-| `DockerInfo` | Docker system info and feature report |
-| `DockerContainers` | Current container list with status |
-| `DockerStats` | Container resource usage statistics |
-| `DockerLog` | Container log entries (batched) |
-| `DockerEvent` | Container lifecycle event |
+| Message Type | Purpose | Needs ACK |
+|-------------|---------|-----------|
+| `SystemInfo` | Static system information, sent once on connect/reconnect | Yes |
+| `Report` | Periodic metric report (every N seconds) | No |
+| `PingResult` | Result of a ping probe | No |
+| `TaskResult` | Output from a remote command execution | Yes |
+| `TerminalOutput` | PTY output data (base64-encoded) | No |
+| `TerminalStarted` | Confirmation that a PTY session was created | No |
+| `TerminalError` | Error from a terminal session | No |
+| `Pong` | Protocol-level heartbeat response | No |
+| `DockerInfo` | Docker system info and feature report | No |
+| `DockerContainers` | Current container list with status | No |
+| `DockerStats` | Container resource usage statistics | No |
+| `DockerLog` | Container log entries (batched) | No |
+| `DockerEvent` | Container lifecycle event | No
| ### Server to Agent (`ServerMessage`) @@ -122,7 +152,7 @@ All communication uses WebSocket. Control and metric messages are JSON text fram | `TerminalInput` | Forward keyboard input to a PTY session | | `TerminalResize` | Resize a PTY session | | `TerminalClose` | Close a PTY session | -| `Ping` | Protocol-level heartbeat | +| `Ping` | Protocol-level heartbeat (every 30s) | | `Upgrade` | Instruct the agent to self-upgrade | | `DockerLogsStart` | Start streaming logs for a container | | `DockerLogsStop` | Stop streaming logs | @@ -130,6 +160,8 @@ All communication uses WebSocket. Control and metric messages are JSON text fram ### Server to Browser (`BrowserMessage`) +Browsers obtain data two ways: the REST API for initial load and historical queries, and a WebSocket at `ws:///api/ws/servers` for real-time pushes. The WebSocket is one-way (Server -> Browser). Both share the TanStack Query cache key `['servers']` for seamless updates. + | Message Type | Purpose | |-------------|---------| | `FullSync` | Complete state of all servers (sent on browser connect) | @@ -140,48 +172,94 @@ All communication uses WebSocket. Control and metric messages are JSON text fram | `DockerEvent` | Docker container lifecycle event | | `DockerAvailabilityChanged` | Docker daemon availability changed | +### Handshake + +``` +Agent Server + | | + |--- WebSocket connect + token --->| + | | verify token, look up server + |<-- Welcome { server_id, | + | protocol_version: 4, | + | report_interval: 3 } -------| + | | + |--- SystemInfo { cpu_name, | static info on connect/reconnect + | os, mem_total, ... 
} ------>| + |<-- Ack { msg_id } -------------| + | | + |--- Report (every 3s) ---------->| periodic metric report +``` + +### Message Size Limits + +| Limit | Maximum Size | +|-------|--------------| +| WebSocket message size | 1 MB | +| Command output | 512 KB | +| Command length | 8 KB | + +### Terminal Data Transport + +Terminal traffic uses the same JSON text protocol as the rest of the system, relayed through the server: + +``` +Browser <--JSON WebSocket--> Server <--JSON WebSocket--> Agent (PTY) +``` + +The raw PTY byte stream is base64-encoded into the `data` field of a JSON message, because terminal output is not UTF-8 safe. + +Browser <-> Server messages (all JSON text): + +- Browser -> Server: `{ "type": "input", "data": }`, `{ "type": "resize", "rows": , "cols": }` +- Server -> Browser: `{ "type": "session", "session_id": }`, `{ "type": "started" }`, `{ "type": "output", "data": }`, `{ "type": "error", "error": }` + +Server <-> Agent messages reuse the agent protocol: + +- Server -> Agent: `ServerMessage::TerminalOpen`, `TerminalInput` (`data` base64), `TerminalResize`, `TerminalClose` +- Agent -> Server: `AgentMessage::TerminalStarted`, `TerminalOutput` (`data` base64), `TerminalError` + ## Database Design -ServerBee uses SQLite (WAL mode), with tables grouped into the following categories. The representative tables below cover the core schema: +ServerBee uses SQLite (WAL mode), managed through sea-orm. The representative tables below cover the core schema, grouped by category. 
-### Authentication (4 tables)
+### Authentication
 | Table | Purpose |
 |-------|---------|
-| `users` | User accounts (id, username, password_hash, role, TOTP secret) |
+| `users` | User accounts (id, username, argon2 password hash, role, TOTP secret) |
 | `sessions` | Active login sessions (token, IP, user agent, expiry) |
-| `api_keys` | API key credentials (hashed key, prefix, last used) |
+| `api_keys` | API key credentials (argon2-hashed key, prefix, last used) |
 | `oauth_accounts` | Linked OAuth provider accounts |
-### Server Management (4 tables)
+### Server Management
 | Table | Purpose |
 |-------|---------|
-| `servers` | Registered servers (system info, metadata, pricing, group assignment) |
+| `servers` | Registered servers (token hash, system info, metadata, pricing, group assignment) |
 | `server_groups` | Logical groups for organizing servers |
-| `server_tags` | Tag labels attached to servers |
+| `server_tags` | Tag labels attached to servers (many-to-many) |
 | `configs` | Key-value configuration store (runtime settings, etc.)
|
-### Monitoring Data (9 tables)
+### Monitoring Data
 | Table | Purpose |
 |-------|---------|
-| `records` | Raw metric records (one row per server per minute) |
-| `records_hourly` | Hourly aggregated metric averages |
+| `records` | Raw metric records, one row per server per minute (composite index server_id + time) |
+| `records_hourly` | Hourly aggregated metric averages (same shape as `records`) |
 | `gpu_records` | Per-GPU device metrics (device index, name, memory, utilization, temp) |
-| `ping_records` | Ping probe results (latency, success, error) |
+| `ping_records` | Ping probe results (latency, success, error; composite index task_id + server_id + time) |
 | `ping_tasks` | Ping task definitions (probe type, target, interval, assigned servers) |
 | `traffic_hourly` | Hourly traffic byte counters (in/out) per server |
 | `traffic_daily` | Daily traffic byte counters (in/out) per server |
 | `traffic_state` | Last-known cumulative traffic counters for delta calculation |
 | `docker_events` | Docker container lifecycle events (start/stop/die/create) |
-### Alerting & Operations (6 tables)
+### Alerting & Operations
 | Table | Purpose |
 |-------|---------|
 | `alert_rules` | Alert rule definitions (conditions, cover type, trigger mode) |
-| `alert_states` | Current alert state per rule/server pair (triggered, resolved) |
+| `alert_states` | Current alert state per rule/server pair (composite unique index rule_id + server_id) |
 | `notifications` | Notification channel configs (webhook, telegram, bark, email) |
 | `notification_groups` | Groups of notification channels |
 | `tasks` | Remote command tasks (command, target servers, created by) |
@@ -190,37 +268,82 @@ ServerBee uses SQLite (WAL mode), with tables grouped into the following categor
 All tables use UTC timestamps. IDs are UUIDs (string type) for most entities, and auto-incrementing integers for high-volume records.
+### SQLite Configuration
+
+- **WAL mode** -- `PRAGMA journal_mode=WAL` runs on startup to improve concurrent read performance.
+- **Busy timeout** -- 5000ms; writers wait automatically on write contention.
+- **Synchronous mode** -- `PRAGMA synchronous=NORMAL`.
+- **Connection pool** -- Up to 10 connections, primarily for concurrent reads.
+
 ## Background Tasks
-The server spawns six long-running background tasks:
+The server spawns several long-running background tasks via `tokio::spawn`:
 | Task | Interval | Description |
 |------|----------|-------------|
-| **RecordWriter** | 60s | Flushes all cached agent reports from memory to the database |
-| **OfflineChecker** | 10s | Scans for agents that have not sent a heartbeat in 30 seconds and marks them offline |
-| **Aggregator** | 1 hour | Computes hourly averages from raw records for long-term storage |
-| **Cleanup** | 1 hour | Deletes records, GPU data, ping records, and audit logs older than their retention periods |
-| **SessionCleaner** | Periodic | Removes expired user login sessions |
-| **AlertEvaluator** | 60s | Evaluates all enabled alert rules against current data and dispatches notifications |
+| **RecordWriter** | 60s | Flushes the latest cached agent reports from memory to the `records` table |
+| **AlertEvaluator** | 60s (after record write) | Evaluates all enabled alert rules against current data and dispatches notifications |
+| **OfflineChecker** | 10s | Scans agent connection state and triggers offline events/alerts for agents with no heartbeat in 30 seconds |
+| **Aggregator** | 1 hour | Computes hourly averages from raw `records` into `records_hourly` for long-term storage |
+| **Cleanup** | 1 hour (after aggregation) | Deletes records, GPU data, ping records, and audit logs older than their retention periods |
+| **PingTasksSync** | On config change | Pushes `PingTasksSync` to the relevant agents |
+| **SessionCleaner** | 1 hour | Removes expired user login sessions |
+
+**Ordering
guarantees:**
+
+- Hourly tasks: aggregation -> cleanup (aggregate before cleaning so no long-term data is lost).
+- Minute tasks: record write -> alert evaluation (so evaluation reads the freshly written data).
 ## Security Model
 ### Authentication Layers
-ServerBee supports three authentication mechanisms:
+ServerBee supports three authentication mechanisms, all checked in the auth middleware:
-1. **Session-based** (browser) -- User logs in with username/password, receives a session cookie. Sessions have a configurable TTL (default 24 hours).
-2. **API Key** (automation) -- Users can create named API keys for programmatic access. Keys are hashed with SHA-256 and stored with an 8-character prefix for identification.
-3. **Agent Token** (agents) -- Each registered server has a unique token hashed with argon2. Agents authenticate by providing the raw token as a query parameter on the WebSocket connection.
+```
+Request arrives
+ +-- Cookie: session_token=xxx --> Session auth (browser)
+ +-- Header: X-API-Key: serverbee_xxx --> API key auth (automation)
+ +-- Query: ?token=xxx --> Agent token auth (agent WebSocket)
+```
-### Role-Based Access
+#### Session-based (browser)
+
+- User logs in with username/password, verified with argon2.
+- A random 32-byte token (base64url-encoded) is generated.
+- The token is stored in the `sessions` table with an HttpOnly + SameSite=Strict cookie.
+- Sliding expiry: each valid request extends `expires_at` (default TTL 24 hours).
-Two roles exist:
+#### API Key (automation)
-| Role | Capabilities |
-|------|-------------|
-| `admin` | Full access: manage users, servers, alerts, notifications, terminal, settings |
-| `member` | Read-only dashboard access (no terminal, no administrative actions) |
+- Users can create named API keys for programmatic access.
+- Key format: `serverbee_` + 32 random bytes (base64url).
+- Keys are hashed with argon2; an 8-character plaintext prefix is stored for identification.
+- On verification, the prefix narrows the lookup, then argon2 verifies the key.
+
+#### Agent Token (agents)
+
+- During registration, an agent authenticates with a one-time enrollment code (single-use, short-lived).
+- The returned token is used for subsequent WebSocket connections.
+- The token is hashed with argon2 and stored in the `servers` table.
+- Agents authenticate by providing the raw token as a query parameter on the WebSocket connection.
+
+### Role-Based Access
+
+Two roles exist: `admin` (full access) and `member` (read-only dashboard access).
+
+| Resource | Admin | Member | API Key | Agent |
+|----------|-------|--------|---------|-------|
+| Server list/detail | Yes | Yes (limited) | Yes | - |
+| Server create/update/delete | Yes | No | Yes | - |
+| Alert rule management | Yes | No | Yes | - |
+| Notification config | Yes | No | No | - |
+| User management | Yes | No | No | - |
+| System settings | Yes | No | No | - |
+| Real-time WebSocket | Yes | Yes | Yes | - |
+| Web terminal | Yes | No | No | - |
+| Remote commands | Yes | No | Yes | - |
+| Metric reporting | - | - | - | Yes |
 ### OAuth
@@ -228,7 +351,14 @@ OAuth login is supported for GitHub, Google, and generic OIDC providers. The `al
 ### Rate Limiting
-Login and agent registration endpoints are rate-limited per IP to prevent brute-force attacks. Defaults are 5 login attempts and 10 agent registrations within a 15-minute window. Admins can clear an active window from Settings → Rate limits.
+Implemented as `tower` middleware with an in-memory map of IP to counter. Login and agent registration endpoints are rate-limited per IP to prevent brute-force attacks.
+
+| Endpoint | Limit |
+|----------|-------|
+| `POST /api/auth/login` | 5 attempts per IP within a 15-minute window |
+| `POST /api/agent/register` | 10 attempts per IP within a 15-minute window |
+
+Exceeding a limit returns `429 Too Many Requests`. Admins can clear an active window from Settings → Rate limits.
### TOTP @@ -236,17 +366,45 @@ Users can enable TOTP-based two-factor authentication for additional security. ## Workspace Structure +The project uses a Cargo workspace for the Rust crates. The frontend is a separate Bun/Node project under `apps/web/`. + ``` ServerBee/ + Cargo.toml # Workspace root crates/ - common/ # Shared types, protocol, constants - server/ # Server binary (Axum, sea-orm, background tasks) - agent/ # Agent binary (collector, reporter, terminal) + common/ # Shared: protocol, data types, constants + src/ + lib.rs + protocol.rs # Agent <-> Server message types + types.rs # SystemReport, GpuInfo, etc. + constants.rs # Protocol version, default values + server/ # Server binary (Axum, sea-orm, background tasks) + src/ + main.rs + config.rs + state.rs # AppState + router/ # REST API + WebSocket handlers + service/ # Business logic + entity/ # sea-orm entities (one module per table) + migration/ # Database migrations + middleware/ # Auth, logging middleware + agent/ # Agent binary (collector, reporter, terminal) + src/ + main.rs + config.rs + collector/ # Per-metric collectors + reporter.rs # WebSocket reporting + reconnect + probe/ # ICMP/TCP/HTTP probes + executor.rs # Remote command execution + terminal.rs # PTY terminal apps/ - web/ # React frontend (Vite, TanStack, shadcn/ui) - docs/ # Documentation site (TanStack Start + Fumadocs) - docs/ # Design documents and plans - Cargo.toml # Workspace root + web/ # React frontend (Vite, TanStack, shadcn/ui) + docs/ # Documentation site (TanStack Start + Fumadocs) + docs/ # Design documents and plans ``` -The project uses a Cargo workspace for the Rust crates. The frontend is a separate Bun/Node project under `apps/web/`. 
+ + + + + diff --git a/apps/docs/content/docs/zh/architecture.mdx b/apps/docs/content/docs/zh/architecture.mdx index e1b7b933..7ff053a5 100644 --- a/apps/docs/content/docs/zh/architecture.mdx +++ b/apps/docs/content/docs/zh/architecture.mdx @@ -4,69 +4,87 @@ description: ServerBee 的系统架构、组件设计、通信协议和安全模 icon: Boxes --- -本文详细介绍 ServerBee 的内部架构设计,帮助你理解系统的工作原理。 +本文介绍 ServerBee 的内部架构,帮助希望深入了解系统工作原理的开发者和运维人员。ServerBee 采用 hub-and-spoke(中心辐射)设计:中央 **Server** 通过 WebSocket 接收来自分布式 **Agent** 的指标,存入 SQLite,并对外提供 React SPA 仪表盘。 ## 系统概览 ``` -+---------------------------------------------+ -| ServerBee Dashboard (Server) | -| | -| +---------------------------------------+ | -| | 前端 (React SPA, rust-embed 嵌入) | | -| | React 19 / TanStack Router & Query | | -| | shadcn/ui (base-nova) / Tailwind v4 | | -| +---------------------------------------+ | -| +---------------------------------------+ | -| | 服务端 (Rust / Axum) | | -| | Axum Router | | -| | +-- REST API handlers | | -| | +-- WebSocket: Agent + Browser | | -| | +-- 静态文件 (rust-embed) | | -| | Service Layer | | -| | +-- AgentManager (连接/状态) | | -| | +-- AlertService (告警评估) | | -| | +-- RecordService (指标/聚合) | | -| | +-- NotificationService (通知) | | -| | Entity Layer (sea-orm + SQLite) | | -| +---------------------------------------+ | -+-------------------+-------------------------+ - | WebSocket -+-------------------v-------------------------+ -| ServerBee Agent (Rust) | -| common crate 共享类型/协议 | -| +-- Collector (系统指标采集) | -| +-- Reporter (WebSocket 上报 + 重连) | -| +-- Probe (ICMP/TCP/HTTP 探测) | -| +-- Executor (远程命令) | -| +-- Terminal (PTY 终端) | -+----------------------------------------------+ ++--------------------------------------------------+ +| ServerBee Dashboard | +| | +| +---------------------------------------------+ | +| | 前端 (React SPA, rust-embed 嵌入) | | +| | React 19, TanStack Router/Query | | +| | shadcn/ui, Tailwind CSS v4, Recharts | | +| +---------------------------------------------+ | +| 
+---------------------------------------------+ | +| | 服务端 (Rust) | | +| | Axum Router | | +| | +-- REST API handlers | | +| | +-- WebSocket: Agent + Browser + Terminal | | +| | +-- 静态文件 (rust-embed) | | +| | Service Layer | | +| | +-- AgentManager (连接/状态) | | +| | +-- AlertService (告警评估) | | +| | +-- RecordService (指标/聚合) | | +| | +-- NotificationService (通知分发) | | +| | Entity Layer (sea-orm) | | +| | +-- SQLite | | +| +---------------------------------------------+ | ++--------------------------+-----------------------+ + | WebSocket (JSON) ++--------------------------v-----------------------+ +| ServerBee Agent (Rust) | +| common crate (共享类型/协议) | +| +-- Collector (系统指标采集) | +| +-- Reporter (WebSocket + 重连) | +| +-- Pinger (ICMP/TCP/HTTP 探测) | +| +-- TerminalManager (PTY 终端会话) | ++---------------------------------------------------+ ``` ## 组件职责 -### Server(服务端) +### Server(服务端,`crates/server`) -Server 是整个系统的中枢,承担以下职责: +整个系统的中枢,承担以下职责: + +- 提供 Web 仪表盘(React SPA 通过 `rust-embed` 嵌入) +- 暴露 REST API 进行 CRUD 操作 +- 管理来自 Agent 和浏览器的 WebSocket 连接 +- 将全部数据存入 SQLite +- 运行后台任务(指标写入、聚合、清理、告警、离线检测) +- 分发通知 + +其主要服务模块: | 模块 | 职责 | |------|------| | Axum Router | HTTP/WebSocket 请求路由和处理 | | AgentManager | 管理 Agent WebSocket 连接、维护实时指标缓存、广播更新 | -| AuthService | 用户登录验证 (argon2)、Session 管理、API Key 校验 | +| AuthService | 用户登录验证(argon2)、Session 管理、API Key 校验 | | ServerService | 服务器 CRUD、分组、标签、排序管理 | | RecordService | 指标写入、历史查询、小时聚合、过期清理 | | AlertService | 告警规则 CRUD、周期性规则评估、触发通知 | -| NotificationService | 通知渠道管理、消息分发 (Webhook/Telegram/Email/Bark) | +| NotificationService | 通知渠道管理、消息分发(Webhook/Telegram/Email/Bark) | | PingService | Ping 探测任务 CRUD、任务下发、结果存储 | | TaskService | 远程命令下发、执行结果查询 | -| ConfigService | 系统配置读写 (K-V 存储) | -| GeoIpService | IP 地理位置查询 (MaxMind MMDB) | +| ConfigService | 系统配置读写(K-V 存储) | +| GeoIpService | IP 地理位置查询(MaxMind MMDB) | | UserService | 用户 CRUD、角色管理 | -### Agent(采集端) +### Agent(采集端,`crates/agent`) + +部署在被监控服务器上的轻量级守护进程,负责: -Agent 是部署在被监控服务器上的轻量级程序: +- 使用 
`sysinfo` crate 采集系统指标 +- 每 N 秒通过 WebSocket 向 Server 上报指标 +- 执行 Server 下发的 Ping 探测任务 +- 提供 PTY 终端会话以实现远程 Shell 访问 +- 执行 Server 下发的远程命令 +- 支持按指令自升级 + +其主要模块: | 模块 | 职责 | |------|------| @@ -76,62 +94,85 @@ Agent 是部署在被监控服务器上的轻量级程序: | Executor | 执行 Server 下发的远程 Shell 命令 | | Terminal | 管理 PTY 终端会话,转发终端输入输出 | -### Frontend(前端) +### Common(共享 crate,`crates/common`) -基于 React 19 的单页应用,构建后通过 `rust-embed` 嵌入到 Server 二进制中: +定义 Agent 和 Server 之间共享的类型和协议: -- **TanStack Router**:文件式路由 -- **TanStack Query**:服务器状态管理,与 WebSocket 共享 cache -- **Recharts**:通过 shadcn Chart 封装的指标折线图和图表 -- **shadcn/ui (base-nova)**:UI 组件库 -- **xterm.js**:Web 终端模拟器 +- **协议定义** —— `AgentMessage`、`ServerMessage`、`BrowserMessage` 枚举 +- **数据类型** —— `SystemInfo`、`SystemReport`、`GpuReport`、`PingResult`、`ServerStatus` 等 +- **常量定义** —— 协议版本号、默认端口、超时、保留周期、告警参数 -### Common(共享 crate) +### Frontend(前端,`apps/web`) -`common` crate 定义了 Agent 和 Server 之间共享的类型和协议: +基于 React 19 的单页应用,构建后通过 `rust-embed` 嵌入到 Server 二进制中: -- 通信消息类型(`AgentMessage`、`ServerMessage`、`BrowserMessage`) -- 数据结构(`SystemReport`、`SystemInfo`、`GpuInfo` 等) -- 常量定义(版本号、默认值) +- **路由** —— TanStack Router(文件式路由) +- **数据获取** —— TanStack Query 处理 REST,原生 WebSocket 处理实时更新(共享同一 cache) +- **UI 组件** —— shadcn/ui(base-nova 主题) +- **样式** —— Tailwind CSS v4 +- **图表** —— 通过 shadcn Chart 封装的 Recharts +- **终端** —— xterm.js Web 终端模拟器 +- **构建** —— Vite 7 -## 通信协议 +构建后的前端在编译期嵌入 Server 二进制,因此无需单独部署。 -### Agent 与 Server 之间 +## 通信协议 -Agent 和 Server 之间使用 WebSocket 通信,连接地址为 `ws:///api/agent/ws?token=`。 +所有通信都使用 WebSocket。所有消息均为 JSON 文本帧;终端 I/O 数据放在消息的 `data` 字段中,以 base64 编码。当前协议版本为 `4`,连接时通过 `Welcome` 消息下发给 Agent。 -**消息格式:** JSON (文本帧),终端数据使用 Binary 帧。 +Agent 的连接地址为 `ws:///api/agent/ws?token=`。 -**Agent -> Server 消息:** +### Agent -> Server(`AgentMessage`) | 消息类型 | 说明 | 需要 ACK | |----------|------|----------| -| `SystemInfo` | 静态系统信息(首次/重连后上报) | 是 | -| `Report` | 实时指标数据(每 3 秒) | 否 | +| `SystemInfo` | 静态系统信息,连接/重连后上报一次 | 是 | +| `Report` | 周期性指标上报(每 N 秒) | 否 | | `PingResult` | 
Ping 探测结果 | 否 | | `TaskResult` | 远程命令执行结果 | 是 | -| `Pong` | 心跳响应 | 否 | +| `TerminalOutput` | PTY 输出数据(base64 编码) | 否 | +| `TerminalStarted` | PTY 会话创建成功确认 | 否 | +| `TerminalError` | 终端会话错误 | 否 | +| `Pong` | 协议层心跳响应 | 否 | | `DockerInfo` | Docker 系统信息和功能上报 | 否 | | `DockerContainers` | 当前容器列表及状态 | 否 | | `DockerStats` | 容器资源使用统计 | 否 | | `DockerLog` | 容器日志条目(批量) | 否 | | `DockerEvent` | 容器生命周期事件 | 否 | -**Server -> Agent 消息:** +### Server -> Agent(`ServerMessage`) | 消息类型 | 说明 | |----------|------| -| `Welcome` | 连接确认,包含 server_id、协议版本、上报间隔 | -| `Ack` | 确认收到需要 ACK 的消息 | -| `PingTasksSync` | 下发探测任务列表 | -| `Exec` | 下发远程命令 | -| `TerminalClose` | 关闭终端会话 | -| `Ping` | 心跳探测(每 30 秒) | +| `Welcome` | 连接确认,包含 server_id、protocol_version、report_interval | +| `Ack` | 确认收到的消息(按 msg_id) | +| `PingTasksSync` | 下发全部分配的探测任务 | +| `Exec` | 在 Agent 上执行 Shell 命令 | +| `TerminalOpen` | 请求打开新的 PTY 会话 | +| `TerminalInput` | 向 PTY 会话转发键盘输入 | +| `TerminalResize` | 调整 PTY 会话尺寸 | +| `TerminalClose` | 关闭 PTY 会话 | +| `Ping` | 协议层心跳(每 30 秒) | +| `Upgrade` | 指示 Agent 自升级 | | `DockerLogsStart` | 开始为容器流式传输日志 | | `DockerLogsStop` | 停止流式传输日志 | | `DockerAction` | 执行容器操作(start/stop/restart/remove) | -**握手流程:** +### Server -> Browser(`BrowserMessage`) + +浏览器通过两种方式获取数据:REST API 用于初始加载和历史查询,WebSocket(地址 `ws:///api/ws/servers`)用于实时推送。该 WebSocket 为单向推送(Server -> 浏览器)。二者共享 TanStack Query 的 cache key `['servers']`,实现无缝更新。 + +| 消息类型 | 说明 | +|----------|------| +| `FullSync` | 所有服务器的完整状态(浏览器连接时下发) | +| `Update` | 变更服务器最新指标的增量更新 | +| `ServerOnline` | 服务器上线通知 | +| `ServerOffline` | 服务器离线通知 | +| `DockerUpdate` | Docker 容器和统计更新 | +| `DockerEvent` | Docker 容器生命周期事件 | +| `DockerAvailabilityChanged` | Docker daemon 可用性变化 | + +### 握手流程 ``` Agent Server @@ -142,72 +183,74 @@ Agent Server | protocol_version: 4, | | report_interval: 3 } -------| | | - |--- SystemInfo { cpu_name, | 首次/重连后上报静态信息 + |--- SystemInfo { cpu_name, | 连接/重连后上报静态信息 | os, mem_total, ... 
} ------>| |<-- Ack { msg_id } -------------| | | |--- Report (每 3 秒) ----------->| 周期性指标上报 ``` -**消息大小限制:** +### 消息大小限制 -| 帧类型 | 最大大小 | +| 限制项 | 最大大小 | |--------|----------| -| JSON 文本帧 | 1 MB | -| Binary 帧(终端数据) | 64 KB | +| WebSocket 消息大小 | 1 MB | | 命令输出 | 512 KB | | 命令长度 | 8 KB | -### Server 与浏览器之间 - -浏览器通过两种方式获取数据: +### 终端数据传输 -1. **REST API**:初始加载和历史数据查询 -2. **WebSocket**:实时数据推送,连接地址 `ws:///api/ws/servers` +终端流量与系统其余部分使用同一套 JSON 文本协议,经 Server 中转: -WebSocket 为单向推送(Server -> 浏览器),消息类型包括 `FullSync`、`Update`、`ServerOnline`、`ServerOffline`、`DockerUpdate`、`DockerEvent`、`DockerAvailabilityChanged`。 +``` +浏览器 <--JSON WebSocket--> Server <--JSON WebSocket--> Agent (PTY) +``` -REST API 和 WebSocket 共享 TanStack Query 的 cache key `['servers']`,实现无缝的数据更新。 +PTY 原始字节流以 base64 编码放入 JSON 消息的 `data` 字段,因为终端输出并非 UTF-8 安全。 -### 终端数据传输 +浏览器 <-> Server 消息(均为 JSON 文本): -终端数据使用 Binary 帧传输,不经过 JSON 编码,以最小化延迟: +- 浏览器 -> Server:`{ "type": "input", "data": }`、`{ "type": "resize", "rows": , "cols": }` +- Server -> 浏览器:`{ "type": "session", "session_id": }`、`{ "type": "started" }`、`{ "type": "output", "data": }`、`{ "type": "error", "error": }` -``` -浏览器 <--Binary WebSocket--> Server <--Binary WebSocket--> Agent (PTY) -``` +Server <-> Agent 消息复用 Agent 协议: -Binary 帧格式:`[1 byte session_id 长度][session_id bytes][payload bytes]` +- Server -> Agent:`ServerMessage::TerminalOpen`、`TerminalInput`(`data` base64)、`TerminalResize`、`TerminalClose` +- Agent -> Server:`AgentMessage::TerminalStarted`、`TerminalOutput`(`data` base64)、`TerminalError` ## 数据库设计 -ServerBee 使用 SQLite 数据库(WAL 模式),通过 sea-orm 管理。下面按类别列出核心数据表: +ServerBee 使用 SQLite 数据库(WAL 模式),通过 sea-orm 管理。下面按类别列出核心数据表。 ### 用户与认证 | 表名 | 说明 | |------|------| -| `users` | 用户账户(用户名、argon2 密码哈希、角色) | -| `sessions` | 登录会话(Token、过期时间、IP、User-Agent) | +| `users` | 用户账户(id、用户名、argon2 密码哈希、角色、TOTP 密钥) | +| `sessions` | 登录会话(Token、IP、User-Agent、过期时间) | | `api_keys` | API 密钥(argon2 哈希、前缀、最后使用时间) | +| `oauth_accounts` | 已关联的 OAuth 提供商账户 | ### 服务器管理 | 表名 | 说明 | 
|------|------| -| `servers` | 服务器信息(Token 哈希、静态信息、管理信息) | -| `server_groups` | 服务器分组 | -| `server_tags` | 服务器标签(多对多) | +| `servers` | 已注册服务器(Token 哈希、静态信息、元数据、计费、分组) | +| `server_groups` | 用于组织服务器的逻辑分组 | +| `server_tags` | 附加到服务器的标签(多对多) | +| `configs` | 键值配置存储(运行时设置等) | ### 指标记录 | 表名 | 说明 | |------|------| -| `records` | 分钟级指标记录(复合索引 server_id + time) | -| `records_hourly` | 小时级聚合记录(结构同 records,值为平均值) | -| `gpu_records` | GPU 每卡详细指标 | -| `traffic_hourly` | 每小时流量字节数(入站/出站) | -| `traffic_daily` | 每日流量字节数(入站/出站) | +| `records` | 分钟级原始指标记录,每服务器每分钟一行(复合索引 server_id + time) | +| `records_hourly` | 小时级聚合记录(结构同 `records`,值为平均值) | +| `gpu_records` | GPU 每卡详细指标(设备索引、名称、显存、利用率、温度) | +| `ping_records` | Ping 探测结果(延迟、成功、错误;复合索引 task_id + server_id + time) | +| `ping_tasks` | Ping 任务定义(探测类型、目标、间隔、分配的服务器) | +| `traffic_hourly` | 每服务器每小时流量字节数(入站/出站) | +| `traffic_daily` | 每服务器每日流量字节数(入站/出站) | | `traffic_state` | 最新累计流量计数器(用于增量计算) | | `docker_events` | Docker 容器生命周期事件(start/stop/die/create) | @@ -215,84 +258,79 @@ ServerBee 使用 SQLite 数据库(WAL 模式),通过 sea-orm 管理。下 | 表名 | 说明 | |------|------| -| `alert_rules` | 告警规则定义 | -| `alert_states` | 告警触发状态(持久化,复合唯一索引 rule_id + server_id) | -| `notifications` | 通知渠道配置 | +| `alert_rules` | 告警规则定义(条件、覆盖类型、触发模式) | +| `alert_states` | 各规则/服务器对的当前告警状态(复合唯一索引 rule_id + server_id) | +| `notifications` | 通知渠道配置(webhook、telegram、bark、email) | | `notification_groups` | 通知组 | +| `tasks` | 远程命令任务(命令、目标服务器、创建者) | +| `task_results` | 命令执行结果(输出、退出码) | +| `audit_logs` | 用户操作审计日志(操作、详情、IP) | -### 探测与任务 +所有表均使用 UTC 时间戳。多数实体的 ID 为 UUID(字符串类型),高写入量记录使用自增整型。 -| 表名 | 说明 | -|------|------| -| `ping_tasks` | Ping 探测任务 | -| `ping_records` | 探测结果记录(复合索引 task_id + server_id + time) | -| `tasks` | 远程命令任务 | -| `task_results` | 命令执行结果 | - -### 系统 - -| 表名 | 说明 | -|------|------| -| `configs` | 系统配置(K-V 存储) | -| `audit_logs` | 审计日志 | +### SQLite 配置 -**SQLite 配置:** - -- **WAL 模式**:启动时自动执行 `PRAGMA journal_mode=WAL`,提高并发读性能 -- **Busy Timeout**:5000ms,写冲突时自动等待 -- **同步模式**:`PRAGMA 
synchronous=NORMAL` -- **连接池**:最大 10 个连接,主要用于并发读 +- **WAL 模式** —— 启动时执行 `PRAGMA journal_mode=WAL`,提高并发读性能。 +- **Busy Timeout** —— 5000ms,写冲突时自动等待。 +- **同步模式** —— `PRAGMA synchronous=NORMAL`。 +- **连接池** —— 最大 10 个连接,主要用于并发读。 ## 后台任务 -Server 启动时通过 `tokio::spawn` 创建多个后台任务: +Server 启动时通过 `tokio::spawn` 创建多个长期运行的后台任务: | 任务 | 频率 | 职责 | |------|------|------| -| 指标写入 (RecordWriter) | 每 1 分钟 | 从 `latest_reports` 内存缓存取最新值写入 `records` 表 | -| 告警评估 (AlertEvaluator) | 每 1 分钟(指标写入后) | 遍历启用的告警规则,评估各服务器指标 | -| 离线检测 (OfflineChecker) | 每 10 秒 | 扫描 Agent 连接状态,触发离线事件和告警 | -| 小时聚合 (Aggregator) | 每 1 小时 | 将 `records` 表数据聚合为 `records_hourly` | -| 数据清理 (Cleaner) | 每 1 小时(聚合后) | 删除过期的 records / ping_records / gpu_records | -| Ping 任务同步 | 配置变更时 | 向相关 Agent 推送 `PingTasksSync` | -| Session 清理 | 每 1 小时 | 删除过期的 Session | +| **指标写入 (RecordWriter)** | 每 60 秒 | 从内存缓存把最新的 Agent 上报刷入 `records` 表 | +| **告警评估 (AlertEvaluator)** | 每 60 秒(指标写入后) | 评估所有启用的告警规则并分发通知 | +| **离线检测 (OfflineChecker)** | 每 10 秒 | 扫描 Agent 连接状态,对 30 秒无心跳的 Agent 触发离线事件/告警 | +| **小时聚合 (Aggregator)** | 每 1 小时 | 将 `records` 原始数据聚合为 `records_hourly`,用于长期存储 | +| **数据清理 (Cleanup)** | 每 1 小时(聚合后) | 删除超过保留周期的 records、GPU 数据、ping 记录和审计日志 | +| **Ping 任务同步 (PingTasksSync)** | 配置变更时 | 向相关 Agent 推送 `PingTasksSync` | +| **Session 清理 (SessionCleaner)** | 每 1 小时 | 删除过期的用户登录 Session | **执行顺序保证:** -- 每小时任务:小时聚合 -> 数据清理(确保聚合后再清理,不丢失长期数据) -- 每分钟任务:指标写入 -> 告警评估(确保评估读到最新写入的数据) +- 每小时任务:小时聚合 -> 数据清理(先聚合再清理,避免丢失长期数据)。 +- 每分钟任务:指标写入 -> 告警评估(确保评估读到刚写入的数据)。 ## 安全模型 ### 三条认证路径 +ServerBee 支持三种认证机制,均在认证中间件中校验: + ``` 请求进入 - +-- Cookie: session_token=xxx --> Session 认证(浏览器) + +-- Cookie: session_token=xxx --> Session 认证(浏览器) +-- Header: X-API-Key: serverbee_xxx --> API Key 认证(自动化) - +-- Query: ?token=xxx --> Agent Token 认证(Agent WebSocket) + +-- Query: ?token=xxx --> Agent Token 认证(Agent WebSocket) ``` -### Session 认证 +#### Session 认证(浏览器) -- 登录时通过 argon2 验证密码 -- 生成 32 字节随机 token(base64url 编码) -- 存入 `sessions` 表,设置 HttpOnly + SameSite=Strict Cookie -- 
滑动过期:每次有效请求自动延长 `expires_at` +- 登录时通过 argon2 验证用户名/密码。 +- 生成 32 字节随机 token(base64url 编码)。 +- 存入 `sessions` 表,设置 HttpOnly + SameSite=Strict Cookie。 +- 滑动过期:每次有效请求自动延长 `expires_at`(默认 TTL 24 小时)。 -### API Key 认证 +#### API Key 认证(自动化) -- Key 格式:`serverbee_` + 32 字节随机 base64url -- 存储 argon2 哈希 + 前 8 位明文前缀 -- 校验时用前缀缩小查询范围,再用 argon2 验证 +- 用户可创建具名 API 密钥用于程序化访问。 +- Key 格式:`serverbee_` + 32 字节随机 base64url。 +- 以 argon2 哈希存储,并保存 8 位明文前缀用于标识。 +- 校验时先用前缀缩小查询范围,再用 argon2 验证。 -### Agent Token 认证 +#### Agent Token 认证(Agent) -- 注册时通过一次性注册码(Enrollment Code)认证,注册码单次使用且短时有效 -- 返回的 Token 用于后续 WebSocket 连接 -- Token 以 argon2 哈希存储在 `servers` 表 +- 注册时通过一次性注册码(Enrollment Code)认证,注册码单次使用且短时有效。 +- 返回的 Token 用于后续 WebSocket 连接。 +- Token 以 argon2 哈希存储在 `servers` 表。 +- Agent 在 WebSocket 连接的查询参数中提供原始 Token 进行认证。 -### 基于角色的访问控制 (RBAC) +### 基于角色的访问控制(RBAC) + +存在两种角色:`admin`(完全访问)和 `member`(只读仪表盘访问)。 | 资源 | Admin | Member | API Key | Agent | |------|-------|--------|---------|-------| @@ -307,9 +345,13 @@ Server 启动时通过 `tokio::spawn` 创建多个后台任务: | 远程命令 | 允许 | 禁止 | 允许 | - | | 指标上报 | - | - | - | 允许 | +### OAuth + +支持 GitHub、Google 和通用 OIDC 提供商的 OAuth 登录。`allow_registration` 开关控制首次 OAuth 登录时是否自动创建新用户。 + ### 速率限制 -基于 `tower` middleware 实现,内存中维护 IP 到计数器的映射: +基于 `tower` middleware 实现,内存中维护 IP 到计数器的映射。登录和 Agent 注册接口按 IP 限流,以防止暴力破解。 | 接口 | 限制 | |------|------| @@ -318,39 +360,47 @@ Server 启动时通过 `tokio::spawn` 创建多个后台任务: 超限返回 `429 Too Many Requests`。管理员可在「设置 → 速率限制」中清除活跃窗口。 -## Cargo Workspace 结构 +### TOTP + +用户可启用基于 TOTP 的两步验证以增强安全性。 + +## Workspace 结构 + +Rust crate 采用 Cargo workspace 管理;前端是 `apps/web/` 下独立的 Bun/Node 项目。 ``` ServerBee/ -+-- Cargo.toml # workspace 定义 -+-- crates/ -| +-- common/ # 共享: 协议定义、数据类型、序列化 -| | +-- src/ -| | +-- lib.rs -| | +-- protocol.rs # Agent <-> Server 消息类型 -| | +-- types.rs # SystemReport, GpuInfo 等 -| | +-- constants.rs # 版本号、默认值 -| +-- server/ # 服务端 -| | +-- src/ -| | +-- main.rs -| | +-- config.rs -| | +-- state.rs # AppState -| | +-- router/ # REST API + WebSocket handlers -| 
| +-- service/ # 业务逻辑层 -| | +-- entity/ # sea-orm Entity (每表一个模块) -| | +-- migration/ # 数据库迁移 -| | +-- middleware/ # 认证、日志中间件 -| +-- agent/ # Agent 采集端 -| +-- src/ -| +-- main.rs -| +-- config.rs -| +-- collector/ # 各指标采集器 -| +-- reporter.rs # WebSocket 上报 + 重连 -| +-- probe/ # ICMP/TCP/HTTP 探测 -| +-- executor.rs # 远程命令执行 -| +-- terminal.rs # PTY 终端 -+-- apps/ - +-- web/ # React SPA 前端 + Cargo.toml # workspace 根 + crates/ + common/ # 共享:协议、数据类型、常量 + src/ + lib.rs + protocol.rs # Agent <-> Server 消息类型 + types.rs # SystemReport、GpuInfo 等 + constants.rs # 协议版本、默认值 + server/ # 服务端二进制(Axum、sea-orm、后台任务) + src/ + main.rs + config.rs + state.rs # AppState + router/ # REST API + WebSocket handlers + service/ # 业务逻辑层 + entity/ # sea-orm Entity(每表一个模块) + migration/ # 数据库迁移 + middleware/ # 认证、日志中间件 + agent/ # Agent 二进制(collector、reporter、terminal) + src/ + main.rs + config.rs + collector/ # 各指标采集器 + reporter.rs # WebSocket 上报 + 重连 + probe/ # ICMP/TCP/HTTP 探测 + executor.rs # 远程命令执行 + terminal.rs # PTY 终端 + apps/ + web/ # React SPA 前端(Vite、TanStack、shadcn/ui) + docs/ # 文档站点(TanStack Start + Fumadocs) + docs/ # 设计文档与计划 ``` From 785ae6045a7d30862c404d40bd91cebe1716c629 Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 18:34:18 +0800 Subject: [PATCH 12/21] docs(index): correct status page to single page and JSON terminal transport --- apps/docs/content/docs/en/index.mdx | 12 ++++++------ apps/docs/content/docs/zh/index.mdx | 16 ++++++++-------- 2 files changed, 14 insertions(+), 14 deletions(-) diff --git a/apps/docs/content/docs/en/index.mdx b/apps/docs/content/docs/en/index.mdx index 91486541..70de20e7 100644 --- a/apps/docs/content/docs/en/index.mdx +++ b/apps/docs/content/docs/en/index.mdx @@ -37,13 +37,13 @@ Run ICMP, TCP, and HTTP probes from multiple agents to measure target reachabili When the Docker capability is enabled, view the container list, live stats, lifecycle events, log streams, networks, and volumes, and perform container 
actions. -### Public status pages, themes, and branding +### Public status page, themes, and branding -Create multiple public status pages with their own slug, server scope, incidents, maintenance windows, uptime timelines, availability thresholds, themes, and custom CSS. Appearance settings support preset and custom OKLCH themes plus white-label title, logo, favicon, and footer text. +Publish a single public status page at `/status`—no authentication required—covering a selected server scope, live metrics, 90-day uptime timelines, availability thresholds, incidents, and maintenance windows. Appearance settings support preset and custom OKLCH themes plus white-label title, logo, favicon, and footer text. ### VPS cost insights -After you record each server's price and billing cycle, ServerBee computes a value score per VPS (excellent / good / okay / poor / waste) from resources, utilization, and uptime, and surfaces monthly-equivalent cost, burn rate, and remaining days in the servers list, the dashboard server card, and a per-server insights panel. +After you record each server's price and billing cycle, ServerBee computes a per-VPS value score (excellent / good / okay / poor / waste) from resources, utilization, and uptime. It surfaces the monthly-equivalent cost, burn rate, and remaining days in the servers list, the dashboard server card, and a per-server insights panel. ### OAuth and mobile support @@ -56,7 +56,7 @@ Sign in via GitHub, Google, or any generic OIDC provider. 
Mobile sessions, devic | Server | Rust (Axum 0.8, sea-orm, tokio, SQLite) | | Agent | Rust (sysinfo, tokio-tungstenite) | | Frontend | React 19, Vite 7, TanStack Router, TanStack Query, Recharts, shadcn/ui, Tailwind CSS v4 | -| Protocol | WebSocket (JSON frames, binary for terminal data) | +| Protocol | WebSocket (JSON text messages; terminal data base64-encoded) | | Database | SQLite (WAL mode) | | Deployment | Single binary / Docker / install script | @@ -75,7 +75,7 @@ ServerBee follows a hub-and-spoke architecture: 2. **Agents** run on each monitored VPS, collecting metrics and reporting back every few seconds. 3. The **Frontend** is a React SPA embedded into the server binary, so there is nothing extra to deploy. -All agent-to-server communication uses WebSocket with JSON-encoded messages. Terminal sessions and some streaming features use dedicated WebSocket routes with binary frames. +All agent-to-server communication uses WebSocket with JSON-encoded messages. Terminal sessions reuse the same JSON text protocol, with the raw PTY byte stream carried in a base64-encoded `data` field. ## Next Steps @@ -91,7 +91,7 @@ All agent-to-server communication uses WebSocket with JSON-encoded messages. 
Ter - + diff --git a/apps/docs/content/docs/zh/index.mdx b/apps/docs/content/docs/zh/index.mdx index 3f900a65..7b0f513b 100644 --- a/apps/docs/content/docs/zh/index.mdx +++ b/apps/docs/content/docs/zh/index.mdx @@ -21,17 +21,17 @@ Agent 通过 WebSocket 与 Server 保持长连接,实现实时数据推送。 通过 WebSocket 驱动的实时仪表盘查看所有服务器运行状态。支持 CPU、内存、磁盘、网络、负载、温度、GPU、磁盘 I/O、流量等指标,以及多仪表盘、多组件布局。 -### 告警通知与服务监控 +### 告警与服务监控 -灵活的告警规则引擎支持资源阈值、流量周期、网络质量、离线、到期和 IP 变化事件。服务监控支持 SSL、DNS、HTTP 关键字、TCP 和 WHOIS 检查,并可通过 Webhook、Telegram、Bark、Email、APNs 等通知组发送告警。 +灵活的告警规则引擎支持资源阈值、流量周期、网络质量、离线、到期和 IP 变化事件,内置防抖逻辑和维护期抑制。服务监控支持 SSL 证书、DNS、HTTP 关键字、TCP 和 WHOIS 检查,并可通过 Webhook、Telegram、Bark、Email、APNs 等通知组发送告警。 ### Web 终端、远程任务与文件管理 -通过浏览器访问服务器 Shell 终端,执行一次性命令或 cron 计划任务。文件管理器提供受控的远程浏览、读取、编辑、上传和下载能力,配合路径沙箱和审计日志使用。 +通过浏览器在任意服务器上打开 PTY Shell,执行一次性命令,或调度带重试的 cron 任务。文件管理器提供受控的远程浏览、读取、编辑、上传和下载能力,配合路径沙箱和审计日志使用。 ### Ping、网络质量与 Traceroute -支持 ICMP、TCP、HTTP 探测,可从多个 Agent 节点检测目标可用性、延迟和丢包。网络详情页还提供 Traceroute 排障能力。 +支持从多个 Agent 节点发起 ICMP、TCP、HTTP 探测,检测目标可用性、延迟和丢包,并提供图表、CSV 导出和预设网络目标。网络详情页还提供 Traceroute 排障能力。 ### Docker 管理 @@ -39,11 +39,11 @@ Agent 通过 WebSocket 与 Server 保持长连接,实现实时数据推送。 ### 公开状态页、主题和品牌 -可创建多个公开状态页,配置独立 slug、服务器范围、事件公告、维护窗口、可用性阈值、主题和自定义 CSS。外观设置支持预设/自定义主题,以及站点标题、Logo、Favicon 和页脚文本。 +对外发布单个公开状态页(路径 `/status`,无需认证),可配置服务器范围、实时指标、90 天可用性时间线、可用性阈值、事件公告和维护窗口。外观设置支持预设/自定义 OKLCH 主题,以及白标站点标题、Logo、Favicon 和页脚文本。 ### VPS 成本洞察 -录入服务器的价格和计费周期后,ServerBee 会基于资源、利用率和在线时长自动计算每台 VPS 的价值评分(excellent / good / okay / poor / waste),并把月度等价成本、burn 速率、剩余天数等信号同时呈现在服务器列表、仪表盘 server card 以及服务器详情页的成本洞察面板。 +录入每台服务器的价格和计费周期后,ServerBee 会基于资源、利用率和在线时长自动计算每台 VPS 的价值评分(excellent / good / okay / poor / waste),并把月度等价成本、消耗速率、剩余天数等信号同时呈现在服务器列表、仪表盘服务器卡片以及服务器详情页的成本洞察面板。 ### OAuth 与移动端 @@ -56,7 +56,7 @@ Agent 通过 WebSocket 与 Server 保持长连接,实现实时数据推送。 | Server | Rust (Axum 0.8 + sea-orm + SQLite + tokio) | | Agent | Rust (sysinfo + tokio-tungstenite) | | Frontend | React 19 + Vite + TanStack Router + TanStack Query + Recharts + shadcn/ui | -| 通信 | WebSocket (JSON 
+ Binary) | +| 通信 | WebSocket(JSON 文本消息,终端数据 base64 编码) | | 数据库 | SQLite (WAL 模式) | | 部署 | 单二进制 / Docker / 安装脚本 | @@ -75,7 +75,7 @@ ServerBee 采用 hub-and-spoke(中心辐射)架构: 2. **Agent** 跑在每台被监控的 VPS 上,每隔几秒采集并上报指标。 3. **前端**是嵌入 Server 二进制中的 React SPA,无需额外部署。 -Agent 与 Server 之间的通信全部走 WebSocket(JSON 消息)。终端会话和部分流式功能使用专用的 WebSocket 路由(二进制帧)。 +Agent 与 Server 之间的通信全部走 WebSocket(JSON 消息)。终端会话复用同一套 JSON 文本协议,PTY 原始字节流放在 base64 编码的 `data` 字段中传输。 ## 快速链接 From f07c7863a1711702d11f562cb28e2f26ed57e0f5 Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 18:34:18 +0800 Subject: [PATCH 13/21] docs(terminal): rewrite transport, limits and security sections --- apps/docs/content/docs/en/terminal.mdx | 117 ++++++++++++------ apps/docs/content/docs/zh/terminal.mdx | 160 +++++++++++++++++-------- 2 files changed, 195 insertions(+), 82 deletions(-) diff --git a/apps/docs/content/docs/en/terminal.mdx b/apps/docs/content/docs/en/terminal.mdx index d605e498..87f05e29 100644 --- a/apps/docs/content/docs/en/terminal.mdx +++ b/apps/docs/content/docs/en/terminal.mdx @@ -4,50 +4,66 @@ description: Access a full shell on your servers directly from the browser. icon: SquareTerminal --- -ServerBee includes a built-in web terminal that gives you direct shell access to any connected server through your browser. No SSH client needed -- just click and type. +ServerBee includes a built-in web terminal that gives administrators direct shell access to any connected server through the browser. No SSH client needed -- just click and type. ## How It Works -The web terminal uses a three-hop WebSocket relay: +The web terminal uses a three-hop WebSocket relay, with the server acting as a proxy between the browser and the agent: ``` -Browser <--WebSocket--> Server <--WebSocket--> Agent (PTY) +Browser (xterm.js) <--WebSocket--> Server (proxy) <--WebSocket--> Agent (PTY) ``` -1. The browser opens a WebSocket connection to the server at `/ws/terminal/{server_id}` -2. 
The server authenticates the user (session cookie or API key) and verifies admin role -3. The server sends a `TerminalOpen` command to the agent via its existing WebSocket connection -4. The agent spawns a PTY (pseudo-terminal) process with the system's default shell -5. Input from the browser is forwarded to the agent's PTY, and output flows back the same path +1. The browser opens a WebSocket connection to the server at `/api/ws/terminal/{server_id}` +2. The server authenticates the user (session cookie or API key) and verifies the admin role +3. The server sends a `TerminalOpen` command to the agent over its existing WebSocket connection +4. The agent spawns a PTY (pseudo-terminal) process via the `portable-pty` library, running the system's default shell +5. Input from the browser is forwarded through the server to the agent's PTY, and output flows back along the same path -All terminal data is base64-encoded within JSON messages. The actual shell process runs entirely on the agent machine. +The shell process runs entirely on the agent machine. Terminal payloads are relayed over WebSocket. ## Usage -1. Navigate to a server's detail page in the dashboard -2. Click the **Terminal** button (available only for online servers) -3. A terminal panel opens with a full interactive shell session - -You can: -- Run any command your shell supports -- Use tab completion, history, and keyboard shortcuts -- Resize the terminal window (the PTY automatically adjusts) -- Open terminals to multiple servers simultaneously in different tabs - -## Authentication and Access Control +1. Sign in to the ServerBee dashboard +2. Open the server list and select the target server +3. On the server detail page, click the **Terminal** button (available only for online servers) +4. Once the connection is established, you have a full interactive shell in the browser -Web terminal access is restricted to users with the **admin** role only. 
Non-admin users will receive a 403 Forbidden response when attempting to open a terminal session. +The web terminal is restricted to the **admin** role. Non-admin (member) users cannot access the terminal and receive a 403 Forbidden response when attempting to open a session. +## Features + +### Full Shell Experience + +- Supports standard shells (bash, zsh, etc., depending on the agent host's default shell) +- Tab completion, command history, and keyboard shortcuts +- Interactive programs such as `vim`, `top`, and `htop` +- Color output and ANSI escape sequences + +### Automatic Resizing + +The terminal window adapts to the browser window automatically. When you resize the browser, the terminal's rows and columns are synced to the remote PTY so the display always renders correctly. + +### Tokyo Night Theme + +The web terminal uses the Tokyo Night color scheme to match ServerBee's dark theme. + +### Multiple Sessions + +You can open multiple terminal sessions to the same server simultaneously, making it easy to run different tasks in different tabs. + +## Authentication and Access Control + -Web terminal access also requires the **terminal** capability (CAP_TERMINAL) to be enabled on the target server. If disabled, the WebSocket upgrade will be rejected with 403 Forbidden. Administrators can manage capabilities in Settings → Capabilities. +The web terminal also requires the **terminal** capability (`CAP_TERMINAL`) to be enabled on the target server. If it is disabled, the WebSocket upgrade is rejected with 403 Forbidden. Administrators manage capabilities in Settings → Capabilities. 
-Authentication works through the same mechanisms as the REST API: +Authentication uses the same mechanisms as the REST API: -- **Session cookie** -- Automatically sent by the browser after login -- **API key** -- Sent via the `X-API-Key` header (useful for programmatic access) +- **Session cookie** -- sent automatically by the browser after login +- **API key** -- sent via the `X-API-Key` header (useful for programmatic access) The server validates the user's identity and role before upgrading the connection to a WebSocket. @@ -59,18 +75,16 @@ To prevent resource exhaustion, the following limits are enforced: |-------|-------| | Maximum concurrent terminal sessions per server | 3 | | Idle timeout | 10 minutes | -| Maximum WebSocket message size | 1 MB | -| Maximum binary frame size | 64 KB | +| Maximum message size | 1 MB | +| Access | Admin only | ### Idle Timeout -If no input is received from the browser for 10 minutes, the session is automatically closed. The browser receives an error message indicating the timeout, and the agent-side PTY process is terminated. - -The idle timer resets every time the user sends any input (including resize events). +If no input is received from the browser for 10 minutes, the session is closed automatically: the browser receives a timeout error and the agent-side PTY process is terminated. Any input, including resize events, resets the idle timer. ## Terminal Protocol -The browser and server communicate using JSON messages: +The browser and server exchange JSON control messages. ### Browser to Server @@ -118,18 +132,51 @@ The `data` field contains base64-encoded input. - Agent disconnects - Server sends `TerminalClose` to the agent -On close, the server sends a `TerminalClose` message to the agent, which terminates the PTY process. The terminal session is unregistered from the agent manager. 
+On close, the server sends a `TerminalClose` message to the agent, which terminates the PTY process, and the session is unregistered from the agent manager. + +## Security + +### PTY Run-As User + +The PTY process runs as the system user the agent runs as. The install script configures the agent as a systemd service running as `root` by default, so terminal sessions also run with root privileges. + +If this is a concern, run the agent under a dedicated, less-privileged user via the systemd `User=` directive: + +```ini title="serverbee-agent.service" +[Service] +User=serverbee +Group=serverbee +ExecStart=/usr/local/bin/serverbee-agent +``` + + +Running the agent as a non-root user limits the user's permissions in the terminal and may restrict some system-metric collection (such as TCP/UDP connection counts). ICMP ping probes additionally require the `CAP_NET_RAW` capability. + + +### Audit Logging + +Terminal connect and disconnect events are written to the audit log. Administrators can review all session access under Settings → Audit Logs, including the operator's username, target server, connect/disconnect times, and source IP address. + +### Transport Encryption + +In production, always serve the terminal behind an HTTPS reverse proxy. Unencrypted WebSocket connections can expose terminal input and output to eavesdropping. ## Troubleshooting ### "Agent is offline" -The terminal requires the target server's agent to be connected. If the agent is offline, you will see an error message and the WebSocket upgrade will be rejected. +The terminal requires the target server's agent to be connected. If the agent is offline, the WebSocket upgrade is rejected and an error is shown. ### Connection drops frequently -If you are running behind a reverse proxy, ensure WebSocket connections are properly forwarded and that read/write timeouts are set to at least 86400 seconds (24 hours). See the [Server Setup](/en/docs/server) guide for an nginx configuration example. 
+Behind a reverse proxy, ensure WebSocket connections are forwarded correctly and that read/write timeouts are set to at least 86400 seconds (24 hours). See the [Server Setup](/en/docs/server) guide for an nginx example. ### Terminal is slow or laggy -Terminal data is relayed through the central server. If the server is geographically far from either the browser or the agent, you may experience latency. This is inherent to the relay architecture. For latency-sensitive work, consider using direct SSH access instead. +Terminal data is relayed through the central server. If the server is geographically distant from the browser or the agent, you may notice latency -- this is inherent to the relay architecture. For latency-sensitive work, use direct SSH access instead. + + + + + + diff --git a/apps/docs/content/docs/zh/terminal.mdx b/apps/docs/content/docs/zh/terminal.mdx index caef6a1d..7901c9a1 100644 --- a/apps/docs/content/docs/zh/terminal.mdx +++ b/apps/docs/content/docs/zh/terminal.mdx @@ -4,86 +4,143 @@ description: 通过浏览器直接访问服务器的 Shell 终端。 icon: SquareTerminal --- -ServerBee 提供基于 Web 的远程终端功能,允许管理员通过浏览器直接访问被监控服务器的 Shell,无需额外的 SSH 客户端。 +ServerBee 内置基于 Web 的远程终端,管理员可以通过浏览器直接访问被监控服务器的 Shell,无需额外的 SSH 客户端,点击即用。 ## 工作原理 -Web 终端采用三段式架构,Server 作为中间代理转发终端数据: +Web 终端采用三段式 WebSocket 中转架构,Server 作为中间代理在浏览器和 Agent 之间转发终端数据: ``` -浏览器 (xterm.js) <--WebSocket--> Server (代理转发) <--WebSocket--> Agent (PTY) +浏览器 (xterm.js) <--WebSocket--> Server (代理转发) <--WebSocket--> Agent (PTY) ``` -详细流程: +1. 浏览器通过 WebSocket 连接到 Server:`/api/ws/terminal/{server_id}` +2. Server 校验用户身份(Session Cookie 或 API Key)并验证管理员角色 +3. Server 通过已建立的 Agent WebSocket 连接,向 Agent 下发 `TerminalOpen` 指令 +4. Agent 使用 `portable-pty` 库创建伪终端(PTY)进程,并启动系统默认 Shell +5. 浏览器输入的字符经 Server 转发到 Agent 的 PTY,PTY 的输出再经 Server 原路转发回浏览器 -1. 管理员在浏览器中打开终端页面 -2. 浏览器通过 WebSocket 连接到 Server:`ws:///api/ws/terminal/` -3. Server 通过已建立的 Agent WebSocket 连接,指令 Agent 创建 PTY 会话 -4. Agent 使用 `portable-pty` 库创建伪终端(PTY),启动 Shell 进程 -5. 
浏览器输入的字符经 Server 转发到 Agent 的 PTY -6. Agent PTY 的输出经 Server 转发回浏览器显示 - -终端数据使用 WebSocket Binary 帧直接转发,不经过 JSON 封装。Binary 帧格式: - -``` -[1 byte session_id 长度][session_id bytes][payload bytes] -``` +Shell 进程完全运行在 Agent 所在机器上。终端数据通过 WebSocket 中转传输。 ## 使用方法 1. 登录 ServerBee 管理面板 2. 进入服务器列表,点击目标服务器 -3. 在服务器详情页面,点击「终端」按钮 -4. 等待连接建立,即可在浏览器中使用完整的 Shell +3. 在服务器详情页点击「终端」按钮(仅在线服务器可用) +4. 连接建立后,即可在浏览器中使用完整的交互式 Shell -Web 终端功能仅对管理员 (Admin) 角色可用。普通成员 (Member) 无法访问终端。 +Web 终端仅对**管理员**(Admin)角色可用。普通成员(Member)无法访问终端,尝试打开会收到 403 Forbidden 响应。 ## 功能特性 ### 完整 Shell 体验 -- 支持标准的 Shell 操作(bash、zsh 等,取决于 Agent 所在系统的默认 Shell) -- 支持 Tab 补全、历史命令、快捷键 -- 支持运行交互式程序(如 vim、top、htop 等) +- 支持标准 Shell(bash、zsh 等,取决于 Agent 所在系统的默认 Shell) +- 支持 Tab 补全、历史命令和快捷键 +- 支持运行交互式程序,如 `vim`、`top`、`htop` 等 - 支持彩色输出和 ANSI 转义序列 ### 自动调整大小 -终端窗口大小会自动适配浏览器窗口。当你调整浏览器窗口大小时,终端的行数和列数会自动同步到远端 PTY,确保显示效果始终正确。 +终端窗口大小会自动适配浏览器窗口。调整浏览器窗口时,终端的行数和列数会同步到远端 PTY,确保显示效果始终正确。 ### Tokyo Night 主题 -Web 终端使用 Tokyo Night 配色方案,提供舒适的视觉体验,与 ServerBee 的暗色主题风格一致。 +Web 终端使用 Tokyo Night 配色方案,与 ServerBee 的暗色主题风格一致。 ### 多会话支持 -同一台服务器支持同时打开多个终端会话,方便在不同窗口执行不同任务。 - -## 限制 +同一台服务器支持同时打开多个终端会话,方便在不同标签页中执行不同任务。 -| 限制项 | 值 | 说明 | -|--------|-----|------| -| 最大并发会话 | 3 | 每台服务器最多 3 个同时活跃的终端会话 | -| 空闲超时 | 10 分钟 | 无任何输入超过 10 分钟后自动断开 | -| 单帧数据上限 | 64 KB | 单条 WebSocket Binary 帧最大体积 | -| 访问权限 | 仅管理员 | Member 角色无法使用终端功能 | +## 认证与访问控制 -Web 终端需要目标服务器启用 `terminal` 能力(CAP_TERMINAL)。如果该能力被禁用,服务器将返回 403 Forbidden。管理员可以在 Settings → Capabilities 中管理此设置。 +Web 终端还要求目标服务器启用 **terminal** 能力(`CAP_TERMINAL`)。该能力被禁用时,WebSocket 升级请求会被 403 Forbidden 拒绝。管理员可以在 Settings → Capabilities 中管理此设置。 +终端认证与 REST API 使用相同的机制: + +- **Session Cookie**:登录后由浏览器自动携带 +- **API Key**:通过 `X-API-Key` 请求头传递(适用于程序化访问) + +Server 会在升级 WebSocket 连接前校验用户身份和角色。 + +## 会话限制 + +为防止资源耗尽,系统强制以下限制: + +| 限制项 | 值 | +|--------|-----| +| 每台服务器最大并发终端会话 | 3 | +| 空闲超时 | 10 分钟 | +| 最大消息大小 | 1 MB | +| 访问权限 | 仅管理员 | + +### 空闲超时 + +无任何输入超过 10 分钟后,会话会自动关闭:浏览器收到超时错误,Agent 端的 PTY 进程被终止。任何输入(包括调整大小事件)都会重置空闲计时器。 + +## 终端协议 + 
+浏览器与 Server 之间交换 JSON 控制消息。 + +### 浏览器 → Server + +**输入**——向 Shell 发送按键: +```json +{ "type": "input", "data": "bHM=" } +``` +`data` 字段为 base64 编码的输入。 + +**调整大小**——通知终端窗口尺寸变化: +```json +{ "type": "resize", "rows": 40, "cols": 120 } +``` + +### Server → 浏览器 + +**会话已建立:** +```json +{ "type": "session", "session_id": "uuid-here" } +``` + +**终端已启动:** +```json +{ "type": "started" } +``` + +**终端输出:** +```json +{ "type": "output", "data": "base64-encoded-output" } +``` + +**错误:** +```json +{ "type": "error", "error": "Session timed out due to inactivity" } +``` + +## 会话生命周期 + +1. **打开**——浏览器连接,Server 分配 `session_id` 并指令 Agent 创建 PTY +2. **活跃**——输入由浏览器流向 Agent,输出由 Agent 流回浏览器 +3. **关闭**——由以下任一情况触发: + - 用户关闭终端面板 + - 浏览器 WebSocket 断开 + - 空闲超时(10 分钟) + - Agent 断开连接 + - Server 向 Agent 发送 `TerminalClose` + +关闭时,Server 向 Agent 发送 `TerminalClose` 消息,Agent 终止 PTY 进程,会话从 Agent 管理器中注销。 + ## 安全说明 ### PTY 运行用户 -Agent 创建的 PTY 进程以 Agent 运行时的系统用户身份执行。安装脚本默认将 Agent 配置为以 root 身份运行的 systemd 服务,因此终端也将以 root 权限运行。 - -如果你有安全方面的顾虑,可以: +PTY 进程以 Agent 运行时的系统用户身份执行。安装脚本默认将 Agent 配置为以 root 身份运行的 systemd 服务,因此终端也以 root 权限运行。 -- 创建专用的系统用户运行 Agent,限制其权限 -- 使用 systemd 的 `User=` 指令指定运行用户 +如有安全顾虑,可以用 systemd 的 `User=` 指令让 Agent 以专用的低权限用户运行: ```ini title="serverbee-agent.service" [Service] @@ -93,21 +150,30 @@ ExecStart=/usr/local/bin/serverbee-agent ``` -使用非 root 用户运行 Agent 时,部分系统指标的采集可能受限(如 TCP/UDP 连接数),且终端操作也会受到对应用户权限的约束。同时 ICMP Ping 探测需要 `CAP_NET_RAW` 权限。 +以非 root 用户运行 Agent 时,终端操作会受到该用户权限的约束,部分系统指标采集也可能受限(如 TCP/UDP 连接数)。此外,ICMP Ping 探测需要 `CAP_NET_RAW` 权限。 ### 审计日志 -终端连接和断开事件会记录到审计日志中。管理员可以在 Settings → Audit Logs 页面查看所有终端会话的访问记录,包括: - -- 操作者用户名 -- 目标服务器 -- 连接时间和断开时间 -- 来源 IP 地址 +终端的连接和断开事件会写入审计日志。管理员可以在 Settings → Audit Logs 页面查看所有终端会话的访问记录,包括操作者用户名、目标服务器、连接与断开时间,以及来源 IP 地址。 ### 连接加密 -在生产环境中,务必通过 HTTPS 反向代理来保护终端数据的传输安全。未加密的 WebSocket 连接可能导致终端输入输出被窃听。 +生产环境中务必通过 HTTPS 反向代理保护终端数据传输。未加密的 WebSocket 连接可能导致终端输入输出被窃听。 + +## 故障排查 + +### "Agent is offline" + +终端要求目标服务器的 Agent 处于在线状态。Agent 离线时,WebSocket 
升级请求会被拒绝并提示错误。 + +### 连接频繁断开 + +在反向代理后部署时,请确保正确转发 WebSocket 连接,并将读写超时设为至少 86400 秒(24 小时)。nginx 配置示例见 [Server 配置](/zh/docs/server) 指南。 + +### 终端卡顿或延迟高 + +终端数据经由中心 Server 中转。若 Server 与浏览器或 Agent 地理距离较远,可能出现延迟,这是中转架构的固有特性。对延迟敏感的操作建议改用直连 SSH。 From 832a3e33591a48190d77a7b4253a9b99d6f8d456 Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 18:34:19 +0800 Subject: [PATCH 14/21] docs(monitoring): document retention tiers and server management --- apps/docs/content/docs/en/monitoring.mdx | 177 +++++++++++---- apps/docs/content/docs/zh/monitoring.mdx | 277 ++++++++++++++--------- 2 files changed, 310 insertions(+), 144 deletions(-) diff --git a/apps/docs/content/docs/en/monitoring.mdx b/apps/docs/content/docs/en/monitoring.mdx index dbe2b407..3e579f44 100644 --- a/apps/docs/content/docs/en/monitoring.mdx +++ b/apps/docs/content/docs/en/monitoring.mdx @@ -1,15 +1,16 @@ --- title: Monitoring -description: Real-time system monitoring, dashboards, and historical data. +description: Real-time system monitoring, dashboards, historical data, and data retention. icon: Activity --- -ServerBee provides real-time monitoring of all your connected servers through a unified web dashboard. Metrics are streamed over WebSocket for instant updates without polling. +ServerBee monitors all your connected servers in real time through a unified web dashboard. Metrics are streamed over WebSocket for instant updates without polling, with historical data available for querying and trend analysis. ## Dashboard Overview The main dashboard shows all registered servers with their current status at a glance: +- **Top summary cards** -- Online / Offline / Total counts, average CPU usage, average memory usage, and total bandwidth - **Online/Offline status** with color indicators - **Ring grid** of four donut charts per card: CPU, Memory, Disk, and monthly **Traffic quota** utilization. 
When no quota is configured the traffic ring falls back to cumulative bytes transferred, and when a billing cycle is active a "days remaining" hint is shown under the footer - **Disk I/O throughput** (read / write bytes per second, aggregated across devices) streamed live alongside network speed @@ -17,7 +18,7 @@ The main dashboard shows all registered servers with their current status at a g - **Load trend** (`load5 · load15` next to `load1`) and **uptime / swap / process / TCP / UDP** summary row - **Region and country** flags (when GeoIP is enabled) -Servers are organized by groups and sorted by weight. You can filter, search, and batch-operate on servers from this view. +All data is driven over WebSocket, so the view refreshes in real time without a manual reload. Servers are organized by groups and sorted by weight. You can filter, search, and batch-operate on servers from this view. For custom operations views, use [Dashboards & Widgets](/en/docs/dashboards) to create additional dashboard layouts with charts, maps, service status widgets, Markdown notes, and uptime timelines. @@ -27,17 +28,20 @@ Region/country labels and the Server Map widget require GeoIP data. You can conf ## Real-Time Updates -The browser connects to the server via WebSocket at `/api/ws/servers`. The communication flow works as follows: +The browser connects to the server via WebSocket at `/api/ws/servers`, then receives: -1. On initial connection, the server sends a `FullSync` message containing the current state of all servers -2. As agents report new metrics, the server broadcasts `Update` messages to all connected browsers -3. 
When an agent connects or disconnects, `ServerOnline` / `ServerOffline` events are sent +| Message | When | Description | +|---------|------|-------------| +| `FullSync` | On connection | Current state of all servers | +| `Update` | When an agent reports | Broadcast of changed server status | +| `ServerOnline` | When an agent connects | A server has come online | +| `ServerOffline` | When an agent disconnects | A server has gone offline | -This means the dashboard updates in real time -- there is no need to refresh the page or wait for polling intervals. +The dashboard therefore updates in real time -- no page refresh or polling interval required. ## Metric Types -The agent collects the following metrics at a configurable interval (default: every 3 seconds): +The agent collects the following metrics at a configurable interval (default: every 3 seconds, set via `collector.interval`; the server can dynamically adjust it through the `Welcome` message): ### System Resources @@ -143,30 +147,46 @@ The dashboard charts automatically switch between raw and hourly data depending ### GPU Records -GPU metrics are stored separately in a dedicated table with per-device granularity. Each record includes the device index, name, memory, utilization, and temperature. These are retained for **7 days** by default. +GPU metrics are stored separately in a dedicated table with per-device granularity. Each record includes the device index, name, memory, utilization, and temperature. These are retained for **7 days** by default (configurable via `retention.gpu_records_days`). 
-## Server Groups +### Query Strategy -Organize your servers into logical groups for easier management: +Historical metrics are queried via the REST API: -- Create groups with custom names and sort weights -- Assign servers to groups -- Groups appear as sections in the dashboard -- Sort weight controls the display order (lower weight = higher position) +``` +GET /api/servers/:id/records?from=2026-03-13T00:00:00Z&to=2026-03-14T00:00:00Z&interval=auto +``` -Groups can represent environments (production, staging), regions (US-East, EU-West), providers (AWS, Hetzner), or any other organizational structure that makes sense for your setup. +The `interval` parameter controls data granularity: + +| Value | Description | +|-------|-------------| +| `raw` | Returns raw minute-level records | +| `hourly` | Returns hourly aggregated records | +| `auto` | Chooses automatically: `raw` for ranges within 24 hours, `hourly` beyond 24 hours | + +### Storage Tiers + +``` +Agent report (every 3s) + --> In-memory cache: keeps only the latest sample per server (for real-time push) + --> Every 1 minute: written to the records table (minute-level records) + --> Every 1 hour: aggregated into the records_hourly table (averages) +``` ## Server Details Each server has a detail page showing: -- Real-time streaming charts (default mode) -- System information (hardware, OS, network) -- Historical trend charts with time range selection -- Disk I/O charts with merged and per-disk views (all platforms, historical mode) -- 90-day uptime timeline with daily availability breakdown -- Server metadata (group, tags, remarks, pricing) -- Actions (terminal access, edit, delete) +- **Basic info** -- OS, CPU model, total memory, IP addresses, region, agent version +- **Real-time streaming charts** (default mode) -- live data streamed from WebSocket updates +- **Historical trend charts** -- switch to 1h / 6h / 24h / 7d / 30d to query records from the database +- **Disk I/O charts** -- merged and per-disk 
views (all platforms, historical mode) +- **90-day uptime timeline** -- daily availability breakdown as a colored bar chart +- **GPU panel** -- per-GPU utilization and VRAM charts, if the server reports GPU data (historical mode only) +- **Traffic statistics** -- cumulative network traffic summary +- **Server metadata** -- group, tags, remarks, pricing +- **Actions** -- terminal access, edit, delete ### Real-Time Charts @@ -201,28 +221,74 @@ The server detail page includes an uptime card with a 90-day timeline. Each day Uptime data is queried via `GET /api/servers/{server_id}/uptime-daily?days=90`. The endpoint returns a `UptimeDailyEntry` per day with `date`, `online_minutes`, `total_minutes`, and `uptime_percent` fields. Missing dates are gap-filled with zero values. -## Network Quality Views +## Data Retention -The `/network` overview and `/network/{server_id}` detail pages summarize configured probe targets for each server. +ServerBee automatically cleans up expired data via a background task that runs once per hour: -- Newly assigned targets appear immediately, even before the first probe result is written -- Targets without probe data render a no-data state instead of disappearing from the summary -- The overview search box follows the active UI language +| Data Type | Default Retention | Config Key | +|-----------|------------------|------------| +| Minute-level metrics (records) | 7 days | `retention.records_days` | +| Hourly metrics (records_hourly) | 90 days | `retention.records_hourly_days` | +| GPU metrics (gpu_records) | 7 days | `retention.gpu_records_days` | +| Ping probe records | 7 days | `retention.ping_records_days` | +| Traffic hourly (traffic_hourly) | 7 days | `retention.traffic_hourly_days` | +| Traffic daily (traffic_daily) | 400 days | `retention.traffic_daily_days` | +| Task results (task_results) | 7 days | `retention.task_results_days` | +| Audit logs | 180 days | `retention.audit_logs_days` | + + +The hourly task runs in a fixed order: 
hourly aggregation first, then cleanup. This guarantees that minute-level data is rolled up into hourly summaries before it is deleted, so long-term trend data is never lost. + + +Example of overriding the retention policy: + +```toml title="server.toml" +[retention] +records_days = 14 # keep minute-level data for 14 days +records_hourly_days = 365 # keep hourly data for 1 year +``` -### Time Range Selector +### Disk Space Estimate -The time range bar offers these options: +Estimated disk usage for 1000 agents: -| Mode | Data Source | Description | -|------|-------------|-------------| -| **Real-time** | WebSocket ring buffer | Live streaming data (default) | -| **1h** | REST API (raw records) | Last 1 hour from database | -| **6h** | REST API (raw records) | Last 6 hours | -| **24h** | REST API (raw records) | Last 24 hours | -| **7d** | REST API (hourly records) | Last 7 days (aggregated) | -| **30d** | REST API (hourly records) | Last 30 days (aggregated) | +| Data Type | Estimate | +|-----------|----------| +| Minute-level records (7 days) | ~5 GB | +| Hourly records (90 days) | ~3 GB | +| Total (30 days) | ~8 GB | -When switching from Real-time to a historical view, the REST API queries are enabled and chart data loads from the database. When switching back to Real-time, the accumulated buffer data is displayed immediately (data continues to accumulate in the background even while viewing historical data). +Actual usage depends on the number of servers and the configured retention periods. 
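As a rough cross-check of the estimate above, the row counts behind it can be sketched from two figures this page gives: minute-level records are written once every 60 seconds per server, and hourly records are aggregated once per hour per server. The function and parameter names below are illustrative and not part of ServerBee. Per-row byte sizes are not documented here, so the sketch stops at row counts rather than gigabytes.

```python
def estimated_rows(agents: int, records_days: int = 7, records_hourly_days: int = 90) -> dict:
    """Rough row counts for the two metric tables (illustrative helper, not a ServerBee API)."""
    # Assumption: one minute-level row per agent per minute (60-second write interval).
    minute_rows = agents * 24 * 60 * records_days
    # Assumption: one aggregated row per agent per hour.
    hourly_rows = agents * 24 * records_hourly_days
    return {"records": minute_rows, "records_hourly": hourly_rows}


print(estimated_rows(1000))
```

Under these assumptions the minute-level table holds the most rows at the default retention, so `retention.records_days` is likely the first value to revisit when disk space is tight.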
+ +## Server Groups and Management + +### Server Groups + +Organize your servers into logical groups by purpose, region, or any other dimension: + +- Create groups with custom names and sort weights +- Assign servers to different groups +- The dashboard lists servers grouped by section +- Sort weight controls the display order (lower weight = higher position) +- Ungrouped servers appear in the default section + +Groups can represent environments (production, staging), regions (US-East, EU-West), providers (AWS, Hetzner), or any other organizational structure that makes sense for your setup. + +### Server Tags + +In addition to groups, you can attach multiple tags to a server for more flexible classification and filtering. Tags are a many-to-many relationship -- a single server can carry multiple tags. + +### Server Management Actions + +| Action | Description | +|--------|-------------| +| Edit name and remarks | Change the server's display name and remarks | +| Set group | Assign the server to a specific group | +| Manage tags | Add or remove server tags | +| Adjust sorting | Control display order via the weight value | +| Hide server | Hide a specific server on the dashboard | +| Delete server | Remove the server and all of its historical data | +| Batch operations | Delete multiple servers at once | ## Data Flow @@ -247,9 +313,32 @@ Agent Server Browser The agent reports every 3 seconds. The server caches the latest report in memory and immediately broadcasts it to connected browsers. Every 60 seconds, all cached reports are batch-written to SQLite. Every hour, raw records are aggregated into hourly summaries, and expired data is cleaned up based on retention settings. +## Network Quality Views + +The `/network` overview and `/network/{server_id}` detail pages summarize configured probe targets for each server. 
+ +- Newly assigned targets appear immediately, even before the first probe result is written +- Targets without probe data render a no-data state instead of disappearing from the summary +- The overview search box follows the active UI language + +### Time Range Selector + +The time range bar offers these options: + +| Mode | Data Source | Description | +|------|-------------|-------------| +| **Real-time** | WebSocket ring buffer | Live streaming data (default) | +| **1h** | REST API (raw records) | Last 1 hour from database | +| **6h** | REST API (raw records) | Last 6 hours | +| **24h** | REST API (raw records) | Last 24 hours | +| **7d** | REST API (hourly records) | Last 7 days (aggregated) | +| **30d** | REST API (hourly records) | Last 30 days (aggregated) | + +When switching from Real-time to a historical view, the REST API queries are enabled and chart data loads from the database. When switching back to Real-time, the accumulated buffer data is displayed immediately (data continues to accumulate in the background even while viewing historical data). + ## Traffic Statistics -ServerBee tracks network traffic at hourly and daily granularity, enabling billing cycle-aware usage monitoring with prediction capabilities. +ServerBee tracks network traffic at hourly and daily granularity, with billing cycle-aware usage and end-of-cycle prediction. 
### How It Works @@ -439,3 +528,9 @@ Two alert rule types are available for network quality: - `network_latency` -- Triggers when average latency exceeds a threshold - `network_packet_loss` -- Triggers when packet loss exceeds a threshold + + + + + + diff --git a/apps/docs/content/docs/zh/monitoring.mdx b/apps/docs/content/docs/zh/monitoring.mdx index 8d4471db..3acbd75f 100644 --- a/apps/docs/content/docs/zh/monitoring.mdx +++ b/apps/docs/content/docs/zh/monitoring.mdx @@ -4,19 +4,21 @@ description: 了解 ServerBee 的实时监控、指标类型、历史数据和 icon: Activity --- -ServerBee 提供全面的服务器监控能力,通过 WebSocket 实时推送指标数据,并支持历史数据查询和趋势分析。 +ServerBee 通过统一的 Web 面板实时监控所有已接入的服务器。指标经 WebSocket 流式推送,无需轮询即可即时更新,同时支持历史数据查询和趋势分析。 ## 仪表盘概览 -登录管理面板后,仪表盘页面会展示所有受监控服务器的实时状态: +登录管理面板后,仪表盘页面会一目了然地展示所有已注册服务器的实时状态: -- **顶部统计卡片**:在线/离线/总数、CPU 平均使用率、内存平均使用率、总带宽 -- **服务器卡片网格**:按分组排列,每张卡片包含: - - 四个环形图表(CPU、内存、磁盘、月度**流量配额**)。未配置配额时流量环退化为显示累计字节;若设置了计费周期,底部还会显示剩余天数提示 - - 磁盘 I/O 实时吞吐(按设备聚合后的读/写速率)与网络上下行速率并列显示 - - 负载趋势(`load5 · load15`)和 uptime / swap / 进程数 / TCP / UDP 汇总行 - - 服务器名称、地区、国旗、操作系统及网络质量迷你图 -- **实时刷新**:所有数据通过 WebSocket 驱动,无需手动刷新 +- **顶部统计卡片** -- 在线 / 离线 / 总数、CPU 平均使用率、内存平均使用率、总带宽 +- **在线/离线状态** -- 带颜色指示器 +- **环形图表网格** -- 每张卡片包含四个环形图(CPU、内存、磁盘、月度**流量配额**使用率)。未配置配额时流量环退化为显示累计传输字节;若设置了计费周期,底部还会显示剩余天数提示 +- **磁盘 I/O 吞吐** -- 按设备聚合后的读/写速率,与网络速率并列实时显示 +- **网络吞吐** -- 上行 / 下行速率 +- **负载趋势** -- `load5 · load15` 紧邻 `load1`,以及 uptime / swap / 进程数 / TCP / UDP 汇总行 +- **地区和国旗** -- 启用 GeoIP 时显示 + +所有数据通过 WebSocket 驱动,无需手动刷新即可实时更新。服务器按分组组织并依据权重排序。你可以在此视图中筛选、搜索和批量操作服务器。 如需创建面向不同场景的运维视图,可以使用 [仪表盘与组件](/zh/docs/dashboards) 创建额外仪表盘布局,包含图表、地图、服务状态、Markdown 说明和可用性时间线等组件。 @@ -26,19 +28,7 @@ ServerBee 提供全面的服务器监控能力,通过 WebSocket 实时推送 ## 实时指标推送 -ServerBee 通过 WebSocket 实现端到端的实时数据传输: - -``` -Agent (每3秒采集) --WebSocket--> Server (内存缓存) --WebSocket--> 浏览器 -``` - -### 数据流详解 - -1. **Agent 采集**:每 3 秒采集一次系统指标,通过 WebSocket 发送 `Report` 消息 -2. **Server 缓存**:AgentManager 将每台服务器的最新指标保存在内存中 -3. 
**浏览器推送**:Server 通过 `broadcast` 通道将更新推送给所有已连接的浏览器 - -浏览器端通过 `/api/ws/servers` 连接 WebSocket 后的消息流程: +浏览器端通过 `/api/ws/servers` 连接 WebSocket 后,会收到以下消息: | 消息类型 | 时机 | 说明 | |----------|------|------| @@ -47,35 +37,52 @@ Agent (每3秒采集) --WebSocket--> Server (内存缓存) --WebSocket--> 浏览 | `ServerOnline` | Agent 上线时 | 通知某台服务器上线 | | `ServerOffline` | Agent 离线时 | 通知某台服务器离线 | +因此仪表盘会实时更新,无需刷新页面或设置轮询间隔。 + ## 指标类型 -### 基础指标 +Agent 按可配置的间隔采集以下指标(默认每 3 秒,通过 `collector.interval` 设置;Server 可经 `Welcome` 消息动态调整): + +### 系统资源 | 指标 | 单位 | 说明 | |------|------|------| -| CPU 使用率 | % | 所有核心的平均使用率 | +| CPU 使用率 | % | 所有核心的整体使用率(0-100) | | 内存使用量 | bytes | 已使用的物理内存 | | Swap 使用量 | bytes | 已使用的交换空间 | | 磁盘使用量 | bytes | 所有磁盘的总使用量 | -| 负载 (load1/5/15) | - | 1/5/15 分钟系统负载平均值 | -| 进程数 | 个 | 当前运行的进程数量 | ### 网络指标 | 指标 | 单位 | 说明 | |------|------|------| -| 入站速率 | bytes/s | 当前网络接收速率 | -| 出站速率 | bytes/s | 当前网络发送速率 | -| 入站累计流量 | bytes | 累计接收的总字节数 | -| 出站累计流量 | bytes | 累计发送的总字节数 | -| TCP 连接数 | 个 | 当前 TCP 连接数量 | -| UDP 连接数 | 个 | 当前 UDP 连接数量 | +| 入站速率 | bytes/s | 当前网络下载速率 | +| 出站速率 | bytes/s | 当前网络上传速率 | +| 入站累计流量 | bytes | Agent 启动以来累计接收的总字节数 | +| 出站累计流量 | bytes | Agent 启动以来累计发送的总字节数 | -### 温度指标 +### 系统负载 -| 指标 | 单位 | 说明 | -|------|------|------| -| 传感器温度 | C | 来自系统传感器的温度读数 | +| 指标 | 说明 | +|------|------| +| 1 分钟负载 | 最近 1 分钟的系统负载平均值 | +| 5 分钟负载 | 最近 5 分钟的系统负载平均值 | +| 15 分钟负载 | 最近 15 分钟的系统负载平均值 | + +### 连接和进程 + +| 指标 | 说明 | +|------|------| +| TCP 连接数 | 当前活跃的 TCP 连接数量 | +| UDP 连接数 | 当前活跃的 UDP 连接数量 | +| 进程数 | 当前运行的进程总数 | + +### 环境指标 + +| 指标 | 说明 | +|------|------| +| 温度 | CPU 温度(摄氏度,可选) | +| 运行时长 | 系统运行时长(秒) | ### 磁盘 I/O 指标 @@ -94,85 +101,53 @@ Agent 启动后的首次采样建立基线并上报空列表,后续采样基 磁盘 I/O 数据以 JSON 列(`disk_io_json`)存储在 `records` 和 `records_hourly` 表中。小时聚合器按设备计算平均读写速率。 -### GPU 指标 - -启用 GPU 监控后,每块 GPU 独立记录以下指标: +### GPU 指标(可选) -| 指标 | 单位 | 说明 | -|------|------|------| -| GPU 利用率 | % | 计算核心使用率 | -| 显存总量 | bytes | GPU 显存总大小 | -| 显存使用量 | bytes | 已使用的显存 | -| GPU 温度 | C | GPU 核心温度 | -| 设备名称 | - | GPU 型号名称 | - -## 历史数据和图表 - -### 
服务器详情页 - -点击进入某台服务器的详情页面,可以查看以下内容: - -- **基本信息**:操作系统、CPU 型号、内存总量、IP 地址、地区、Agent 版本 -- **实时流式图表**:默认模式,展示 WebSocket 推送的实时指标 -- **历史图表**:切换到 1h / 6h / 24h / 7d / 30d 查看数据库中的历史记录 -- **磁盘 I/O 图表**:合并和分盘两种视图,展示每块磁盘的读写吞吐量(全平台、历史模式) -- **90 天可用性时间线**:按天展示可用性状况的彩色条形图 -- **GPU 面板**:如果该服务器上报了 GPU 数据,显示各 GPU 的利用率和显存图表(仅历史模式) -- **流量统计**:网络累计流量的统计信息 - -### 实时图表模式 - -服务器详情页默认为**实时模式**,图表展示 WebSocket 推送的实时数据流: +启用 GPU 监控(`enable_gpu = true`)后,每块 GPU 独立采集以下指标: -- **数据来源**:通过 TanStack Query 缓存订阅 `BrowserMessage::Update` 事件,自动累积数据点 -- **更新频率**:约 3 秒一次(与 Agent 上报间隔一致) -- **缓冲区大小**:10 分钟环形缓冲区(约 200 个数据点),超出自动裁剪 -- **去重机制**:基于服务端 `last_active` 时间戳过滤重复事件 -- **可用图表**:CPU、内存、磁盘、网络入/出、负载(1 分钟) -- **时间轴格式**:第一个刻度显示 `HH:mm:ss`,后续刻度显示 `mm:ss` - -温度、GPU 和磁盘 I/O 图表在实时模式下不可用,因为 WebSocket 推送的 `ServerStatus` 消息中不包含这些字段。切换到历史视图即可查看。 - -### 磁盘 I/O 图表 +| 指标 | 说明 | +|------|------| +| 设备名称 | GPU 型号名称 | +| GPU 利用率 | GPU 计算核心使用率百分比 | +| 显存使用量 | 已使用的显存(bytes) | +| 显存总量 | 显存总大小(bytes) | +| GPU 温度 | 设备温度(摄氏度) | -当存在历史磁盘 I/O 数据时,服务器详情页会显示磁盘 I/O 图表,支持两种视图: +## 服务器信息 -- **合并视图 (Merged)** -- 所有物理磁盘的读写吞吐量汇总 -- **分盘视图 (Per Disk)** -- 每块物理磁盘的独立图表(如 `sda`、`nvme0n1`) +除了周期性指标外,每个 Agent 在首次连接时还会上报静态系统信息: -两种视图均以面积图展示读取速率(蓝色)和写入速率(绿色)。缺失的数据点会自动补零,保持时间轴连续。 +- CPU 名称、核心数和架构 +- 操作系统和内核版本 +- 内存、Swap 和磁盘总容量 +- IPv4 和 IPv6 地址 +- 虚拟化类型(KVM、Xen、Docker 等) +- Agent 版本 -### 可用性时间线 +这些信息显示在服务器详情页并存储到数据库中。 -服务器详情页包含一个可用性卡片,展示 90 天的可用性时间线。每天显示为一根彩色条: +## 历史数据和图表 -- **绿色** -- 100% 可用 -- **黄色** -- 低于黄色阈值(可用性下降) -- **红色** -- 低于红色阈值(严重故障) -- **灰色** -- 无数据 +ServerBee 以两种粒度存储指标记录: -可用性数据通过 `GET /api/servers/{server_id}/uptime-daily?days=90` 查询。端点返回每天一条 `UptimeDailyEntry`,包含 `date`、`online_minutes`、`total_minutes` 和 `uptime_percent` 字段。缺失日期会自动补零。 +### 分钟级原始记录 -## 网络质量视图 +- 由 RecordWriter 后台任务每 60 秒写入一次 +- 默认保留 **7 天**(可通过 `retention.records_days` 配置) +- 每条记录捕获某一时间点的全部指标值 -`/network` 总览页和 `/network/{server_id}` 详情页会汇总每台服务器已配置的探测目标。 +### 小时级聚合记录 -- 新分配的目标会立即显示,即使首条探测结果尚未写入 -- 尚无探测数据的目标会显示为空状态,而不是从汇总中消失 -- 总览页搜索框会跟随当前界面语言显示占位文案 +- 由 
Aggregator 后台任务计算 +- 取每小时内所有原始记录的平均值 +- 默认保留 **90 天**(可通过 `retention.records_hourly_days` 配置) +- 用于长期趋势可视化 -### 时间范围选择器 +仪表盘图表会根据所选时间范围自动在原始记录和小时记录之间切换。 -| 模式 | 数据来源 | 说明 | -|------|----------|------| -| **Real-time** | WebSocket 环形缓冲区 | 实时流式数据(默认) | -| **1h** | REST API(分钟级记录) | 最近 1 小时的数据库记录 | -| **6h** | REST API(分钟级记录) | 最近 6 小时 | -| **24h** | REST API(分钟级记录) | 最近 24 小时 | -| **7d** | REST API(小时级记录) | 最近 7 天(聚合数据) | -| **30d** | REST API(小时级记录) | 最近 30 天(聚合数据) | +### GPU 记录 -从实时模式切换到历史视图时,系统自动启用 REST API 查询从数据库加载数据。切换回实时模式时,立即显示已累积的缓冲区数据(即使在查看历史数据期间,实时数据也会在后台持续累积)。 +GPU 指标按设备粒度单独存储在专用表中。每条记录包含设备索引、名称、显存、利用率和温度,默认保留 **7 天**(可通过 `retention.gpu_records_days` 配置)。 ### 数据查询策略 @@ -199,6 +174,53 @@ Agent 上报 (每 3 秒) --> 每 1 小时: 聚合写入 records_hourly 表(平均值) ``` +## 服务器详情 + +每台服务器都有一个详情页,展示以下内容: + +- **基本信息** -- 操作系统、CPU 型号、内存总量、IP 地址、地区、Agent 版本 +- **实时流式图表**(默认模式) -- 展示 WebSocket 推送的实时指标 +- **历史趋势图表** -- 切换到 1h / 6h / 24h / 7d / 30d 查询数据库中的历史记录 +- **磁盘 I/O 图表** -- 合并和分盘两种视图(全平台、历史模式) +- **90 天可用性时间线** -- 按天展示可用性状况的彩色条形图 +- **GPU 面板** -- 如果该服务器上报了 GPU 数据,显示各 GPU 的利用率和显存图表(仅历史模式) +- **流量统计** -- 网络累计流量的统计信息 +- **服务器元数据** -- 分组、标签、备注、定价 +- **操作** -- 终端访问、编辑、删除 + +### 实时图表模式 + +服务器详情页默认为**实时模式**,图表展示 WebSocket 推送的实时数据流: + +- **数据来源**:通过 `['servers']` TanStack Query 缓存订阅 `BrowserMessage::Update` 事件,自动累积数据点 +- **更新频率**:约 3 秒一次(与 Agent 上报间隔一致) +- **缓冲区大小**:10 分钟环形缓冲区(约 200 个数据点),超出自动裁剪 +- **去重机制**:基于服务端 `last_active` 时间戳过滤重复事件 +- **可用图表**:CPU、内存、磁盘、网络入/出、负载(1 分钟) +- **时间轴格式**:第一个刻度显示 `HH:mm:ss`,后续刻度显示 `mm:ss` + +温度、GPU 和磁盘 I/O 图表在实时模式下不可用,因为 WebSocket 推送的 `ServerStatus` 消息中不包含这些字段。切换到历史视图即可查看温度、GPU 和磁盘 I/O 数据。 + +### 磁盘 I/O 图表 + +当存在历史磁盘 I/O 数据时,服务器详情页会显示磁盘 I/O 图表,支持两种视图: + +- **合并视图 (Merged)** -- 所有物理磁盘的读写吞吐量汇总 +- **分盘视图 (Per Disk)** -- 每块物理磁盘的独立图表(如 `sda`、`nvme0n1`) + +两种视图均以面积图展示读取速率(蓝色)和写入速率(绿色)。缺失的数据点会自动补零,保持时间轴连续。 + +### 可用性时间线 + +服务器详情页包含一个可用性卡片,展示 90 天的可用性时间线。每天显示为一根彩色条: + +- **绿色** -- 100% 可用 +- **黄色** -- 低于黄色阈值(可用性下降) +- **红色** -- 低于红色阈值(严重故障) +- **灰色** -- 无数据 + 
+可用性数据通过 `GET /api/servers/{server_id}/uptime-daily?days=90` 查询。端点返回每天一条 `UptimeDailyEntry`,包含 `date`、`online_minutes`、`total_minutes` 和 `uptime_percent` 字段。缺失日期会自动补零。 + ## 数据保留策略 ServerBee 自动清理过期数据,由后台任务每小时执行一次: @@ -215,7 +237,7 @@ ServerBee 自动清理过期数据,由后台任务每小时执行一次: | 审计日志 | 180 天 | `retention.audit_logs_days` | -清理任务按固定顺序执行:先进行小时聚合,再执行数据清理。这保证了分钟级数据在被清理前已经完成了小时级聚合,不会丢失长期趋势数据。 +每小时任务按固定顺序执行:先进行小时聚合,再执行数据清理。这保证了分钟级数据在被清理前已经完成了小时级聚合,不会丢失长期趋势数据。 修改保留策略示例: @@ -247,8 +269,11 @@ records_hourly_days = 365 # 小时级数据保留 1 年 - 创建分组并设置排序权重 - 将服务器分配到不同分组 - 仪表盘按分组展示服务器列表 +- 排序权重控制显示顺序(权重越小位置越靠前) - 未分组的服务器显示在默认区域 +分组可以表示环境(生产、预发)、地区(US-East、EU-West)、服务商(AWS、Hetzner)或任何适合你需求的组织结构。 + ### 服务器标签 除了分组外,还可以为服务器添加多个标签(Tag),用于更灵活的分类和筛选。标签是多对多关系,一台服务器可以有多个标签。 @@ -265,9 +290,55 @@ records_hourly_days = 365 # 小时级数据保留 1 年 | 删除服务器 | 移除服务器及其所有历史数据 | | 批量操作 | 支持批量删除多台服务器 | +## 数据流 + +``` +Agent Server 浏览器 + | | | + |-- Report (3s) ------>| | + | |-- 缓存到 AgentManager | + | | | + | |-- RecordWriter (60s) -->| + | | 写入 SQLite | + | | | + | |-- Update (广播) ------->| + | | 实时 UI + | | | + | |-- Aggregator (每小时) ->| + | | 小时级平均值 | + | | | + | |-- Cleanup (每小时) ---->| + | | 删除过期记录 | +``` + +Agent 每 3 秒上报一次。Server 将最新一份上报缓存在内存中,并立即广播给已连接的浏览器。每 60 秒,所有缓存的上报会批量写入 SQLite。每小时,原始记录会聚合为小时级汇总,并根据保留策略清理过期数据。 + +## 网络质量视图 + +`/network` 总览页和 `/network/{server_id}` 详情页会汇总每台服务器已配置的探测目标。 + +- 新分配的目标会立即显示,即使首条探测结果尚未写入 +- 尚无探测数据的目标会显示为空状态,而不是从汇总中消失 +- 总览页搜索框会跟随当前界面语言显示占位文案 + +### 时间范围选择器 + +时间范围栏提供以下选项: + +| 模式 | 数据来源 | 说明 | +|------|----------|------| +| **Real-time** | WebSocket 环形缓冲区 | 实时流式数据(默认) | +| **1h** | REST API(原始记录) | 最近 1 小时的数据库记录 | +| **6h** | REST API(原始记录) | 最近 6 小时 | +| **24h** | REST API(原始记录) | 最近 24 小时 | +| **7d** | REST API(小时记录) | 最近 7 天(聚合数据) | +| **30d** | REST API(小时记录) | 最近 30 天(聚合数据) | + +从实时模式切换到历史视图时,系统自动启用 REST API 查询从数据库加载数据。切换回实时模式时,立即显示已累积的缓冲区数据(即使在查看历史数据期间,实时数据也会在后台持续累积)。 + ## 流量统计 -ServerBee 以小时和天为粒度跟踪网络流量,支持按计费周期查询用量并提供预测能力。 +ServerBee 以小时和天为粒度跟踪网络流量,支持按计费周期查询用量并提供周期末预测能力。 ### 工作原理 @@ 
-417,7 +488,7 @@ ServerBee 内置了网络质量监控系统,通过各 Agent 对网络目标发 - ECMP 多路径时同一 TTL 会有多个 IP,第一行显示主 IP,悬停 `+N` chip 查看其他 IP。 - 已完成的 Traceroute 自动保存到本地 SQLite,可以在 dialog 历史区点击切换查看; 管理员可以删除单条或一键清空。 -- POST 触发需要管理员权限;只读用户能看到历史,但不能发起新 trace 或删除。 +- 触发 trace 需要管理员权限;只读用户能浏览历史,但不能发起新 trace 或删除记录。 #### 权限 From 85d89c758729a0d72507b2eb8b3a8a2d7317aafb Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 18:34:24 +0800 Subject: [PATCH 15/21] docs(configuration): restructure config reference into tables --- apps/docs/content/docs/en/configuration.mdx | 22 +- apps/docs/content/docs/zh/configuration.mdx | 747 +++++++++----------- 2 files changed, 327 insertions(+), 442 deletions(-) diff --git a/apps/docs/content/docs/en/configuration.mdx b/apps/docs/content/docs/en/configuration.mdx index cdf2d3c1..74d669cc 100644 --- a/apps/docs/content/docs/en/configuration.mdx +++ b/apps/docs/content/docs/en/configuration.mdx @@ -4,11 +4,11 @@ description: Complete reference for all ServerBee server and agent configuration icon: Settings --- -ServerBee uses [Figment](https://github.com/SergioBenitez/Figment) for configuration loading, which supports layered configuration from multiple sources. This page provides a complete reference for every configuration option. +ServerBee loads configuration with [Figment](https://github.com/SergioBenitez/Figment), which layers values from multiple sources. This page is a complete reference for every configuration option. ## Configuration Loading Priority -Configuration values are merged from multiple sources. Later sources override earlier ones, so environment variables always win: +Values are merged from the sources below. Later sources override earlier ones, so environment variables always win: 1. Built-in defaults 2. 
`/etc/serverbee/server.toml` or `/etc/serverbee/agent.toml` @@ -20,11 +20,11 @@ This lets you override any single value at runtime without editing the TOML file ## Environment Variable Mapping -Every TOML key maps directly to an environment variable: prefix with `SERVERBEE_`, uppercase the key, and replace each level of nesting with `__` (double underscore). For example, `auth.secure_cookie` becomes `SERVERBEE_AUTH__SECURE_COOKIE`. +Every TOML key maps directly to an environment variable: add the `SERVERBEE_` prefix, uppercase the key, and replace each level of nesting with `__` (double underscore). For example, `auth.secure_cookie` becomes `SERVERBEE_AUTH__SECURE_COOKIE`. ## Developer Workflow Env Vars -These variables are for local repo tooling and developer workflows. They are not Figment-backed runtime config for the ServerBee server or agent binaries. +These variables drive local repo tooling and developer workflows. They are not Figment-backed runtime config for the server or agent binaries. | Environment Variable | Used By | Description | |---------------------|---------|-------------| @@ -33,7 +33,7 @@ These variables are for local repo tooling and developer workflows. They are not | `SERVERBEE_PROD_READONLY_API_KEY` | `make web-dev-prod` | Member-scoped API key injected by the frontend dev proxy for live production browsing | | `ALLOW_WRITES` | `make web-dev-prod` | Local opt-in override. Set to `1` to disable the proxy's read-method-only block. When set, the UI banner changes from the normal read-only warning to a stronger write-enabled warning | -These variables are intentionally scoped to local tooling. `ALLOW_WRITES` is not a server feature flag, it is an explicit local override for the frontend prod-proxy workflow only. +These variables are intentionally scoped to local tooling. `ALLOW_WRITES` is not a server feature flag; it is an explicit local override for the frontend prod-proxy workflow only. 
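The key-to-variable naming rule described earlier on this page can be sketched in a few lines of Python. This only illustrates the naming convention; `env_var_name` is a hypothetical helper, not part of ServerBee or Figment.

```python
def env_var_name(toml_key: str, prefix: str = "SERVERBEE_") -> str:
    """Derive the environment variable name for a dotted TOML key (hypothetical helper)."""
    # Uppercase the key and turn each nesting level (".") into a double underscore.
    return prefix + toml_key.upper().replace(".", "__")


print(env_var_name("auth.secure_cookie"))  # SERVERBEE_AUTH__SECURE_COOKIE
```

Top-level keys contain no dot, so they keep their single underscores: `server_url` becomes `SERVERBEE_SERVER_URL`.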
### Server Environment Variables @@ -152,7 +152,7 @@ Average-latency cutoffs used to classify network-probe records returned by `/api #### Internal -> The following variables have sensible defaults and rarely need modification. Only adjust when you have a specific requirement. +> These variables have sensible defaults and rarely need changing. Adjust them only when you have a specific requirement. | Environment Variable | Default | Description | |---------------------|---------|-------------| @@ -182,7 +182,7 @@ Agent top-level keys use single underscore. Nested keys use `__` (double undersc | Environment Variable | Default | Description | |---------------------|---------|-------------| | `SERVERBEE_COLLECTOR__INTERVAL` | `3` | Metric report interval in seconds | -| `SERVERBEE_COLLECTOR__ENABLE_GPU` | `false` | Enable NVIDIA GPU monitoring (requires nvml) | +| `SERVERBEE_COLLECTOR__ENABLE_GPU` | `false` | Enable NVIDIA GPU monitoring (requires the NVIDIA driver / NVML) | | `SERVERBEE_COLLECTOR__ENABLE_TEMPERATURE` | `true` | Enable CPU temperature monitoring | | `SERVERBEE_FILE__ENABLED` | `false` | Enable file management on this agent | | `SERVERBEE_FILE__ROOT_PATHS` | `[]` | Allowed root paths (comma-separated, e.g. `/home,/var/log`). Empty rejects all file operations | @@ -192,7 +192,7 @@ Agent top-level keys use single underscore. Nested keys use `__` (double undersc #### Internal -> The following variables have sensible defaults and rarely need modification. Only adjust when you have a specific requirement. +> These variables have sensible defaults and rarely need changing. Adjust them only when you have a specific requirement. 
| Environment Variable | Default | Description | |---------------------|---------|-------------| @@ -274,7 +274,7 @@ Tunes the agent-side security event detectors (SSH login / brute force, port sca | `ip_quality_event_days` | u32 | `90` | Days to keep IP quality status-change event records | -Raw metric records are collected every 60 seconds and retained for 7 days by default. The hourly aggregator computes averages so you can keep long-term trends for 90 days without excessive storage. Adjust these values based on your disk space and monitoring needs. +Raw metric records are written every 60 seconds and retained for 7 days by default. The hourly aggregator computes averages so you can keep 90 days of long-term trends without excessive storage. Tune these values to match your disk space and monitoring needs. ### `[scheduler]` -- Scheduler @@ -411,7 +411,7 @@ Default risk-scoring works out of the box via [ipapi.is](https://ipapi.is) (no A | Key | Type | Default | Description | |-----|------|---------|-------------| | `interval` | u32 | `3` | Collection interval in seconds | -| `enable_gpu` | bool | `false` | Enable NVIDIA GPU monitoring (requires `nvidia-smi`) | +| `enable_gpu` | bool | `false` | Enable NVIDIA GPU monitoring (requires the NVIDIA driver / NVML) | | `enable_temperature` | bool | `true` | Enable CPU temperature sensor monitoring | ### `[file]` -- File Management @@ -475,7 +475,7 @@ openssl x509 -in cert.pem -pubkey -noout \ data_dir = "./data" ``` -Everything else uses sensible defaults. On first startup, ServerBee creates the `admin` user with a random password and prints it once in the server logs. +Everything else uses sensible defaults. On first startup, ServerBee creates the `admin` user with a random password and prints it once to the server logs. 
## Example: Production Server Configuration diff --git a/apps/docs/content/docs/zh/configuration.mdx b/apps/docs/content/docs/zh/configuration.mdx index 2145a781..34b95755 100644 --- a/apps/docs/content/docs/zh/configuration.mdx +++ b/apps/docs/content/docs/zh/configuration.mdx @@ -4,11 +4,11 @@ description: ServerBee Server 和 Agent 的完整配置参考文档。 icon: Settings --- -ServerBee 使用 [figment](https://github.com/SergioBenitez/Figment) 库加载配置,支持 TOML 配置文件和环境变量两种方式。 +ServerBee 使用 [Figment](https://github.com/SergioBenitez/Figment) 加载配置,按层级从多个来源合并配置值。本文是所有配置项的完整参考。 ## 配置加载优先级 -配置项从多个来源合并加载,后者覆盖前者,因此环境变量始终具有最高优先级: +配置值从以下来源合并,后者覆盖前者,因此环境变量始终具有最高优先级: 1. 内置默认值 2. `/etc/serverbee/server.toml` 或 `/etc/serverbee/agent.toml` @@ -24,23 +24,23 @@ ServerBee 使用 [figment](https://github.com/SergioBenitez/Figment) 库加载 ## 开发工作流环境变量 -这些变量用于仓库本地工具和开发工作流,不属于 ServerBee Server / Agent 二进制通过 Figment 加载的运行时配置,也不会映射到 `server.toml` 或 `agent.toml`。 +这些变量用于仓库本地工具和开发工作流,不属于 ServerBee Server / Agent 二进制通过 Figment 加载的运行时配置。 | 环境变量 | 用途 | 说明 | |----------|------|------| | `SERVERBEE_PROD_URL` | `make db-pull`、`make web-dev-prod` | 生产环境基础 URL,供数据库拉取脚本和前端 prod-proxy 工作流共用 | | `SERVERBEE_PROD_API_KEY` | `make db-pull` | 生产备份 API 使用的管理员 API Key。不要把它复用到 `make web-dev-prod` | | `SERVERBEE_PROD_READONLY_API_KEY` | `make web-dev-prod` | 前端 dev proxy 注入的 member 角色 API Key,用于浏览生产实时数据 | -| `ALLOW_WRITES` | `make web-dev-prod` | 本地显式覆盖开关。设为 `1` 后,代理不再拦截非只读 HTTP 方法,UI 横幅也会从普通的只读提示切换为更强的可写警告 | +| `ALLOW_WRITES` | `make web-dev-prod` | 本地显式覆盖开关。设为 `1` 后,代理不再拦截非只读 HTTP 方法;设置后 UI 横幅也会从普通的只读提示切换为更强的可写警告 | -这些变量刻意只服务于本地工具链。`ALLOW_WRITES` 不是服务端功能开关,它只影响前端 `make web-dev-prod` 这条本地 prod-proxy 工作流。 +这些变量刻意只服务于本地工具链。`ALLOW_WRITES` 不是服务端功能开关,它只是前端 prod-proxy 工作流的一个显式本地覆盖。 ### Server 环境变量 #### 快速开始(Quick Start) -没有管理员用户名/密码环境变量。首次启动(数据库中没有任何用户)时,Server 会自动创建管理员账号,随机生成密码,并以醒目的凭据横幅在 Server/容器日志中打印一次。请从日志中获取该密码,首次登录时你将被要求修改它,并可在此时选择一个新的用户名。 +没有管理员用户名 / 密码环境变量。首次启动(数据库中没有任何用户)时,Server 会自动创建管理员账号,随机生成密码,并以醒目的凭据横幅在 Server / 
容器日志中打印一次。请从日志中获取该密码,首次登录时你将被要求修改它,并可在此时选择一个新的用户名。 | 环境变量 | 默认值 | 说明 | @@ -52,17 +52,17 @@ ServerBee 使用 [figment](https://github.com/SergioBenitez/Figment) 库加载 | 环境变量 | 默认值 | 说明 | |----------|--------|------| | `SERVERBEE_SERVER__DATA_DIR` | `./data` | 数据目录(存放数据库和备份) | -| `SERVERBEE_AUTH__MAX_SERVERS` | `0` | 通过注册码接入的最大服务器数(0 = 不限制),尽力软限制 | -| `SERVERBEE_SERVER__TRUSTED_PROXIES` | 私有/回环 CIDR | 受信任的反向代理 CIDR 列表,默认信任 RFC 1918 + 回环地址。设为 `[]` 禁用 | +| `SERVERBEE_AUTH__MAX_SERVERS` | `0` | 通过注册码接入的最大服务器数(0 = 不限制)。尽力软限制 | +| `SERVERBEE_SERVER__TRUSTED_PROXIES` | 私有 / 回环 CIDR | 受信任的反向代理 CIDR 列表,默认信任 RFC 1918 + 回环地址。设为 `[]` 禁用 | | `SERVERBEE_SCHEDULER__TIMEZONE` | `UTC` | 流量日聚合时区(如 `Asia/Shanghai`) | -| `SERVERBEE_LOG__LEVEL` | `info` | 日志级别:`trace`/`debug`/`info`/`warn`/`error` | -| `SERVERBEE_LOG__FILE` | `""` | 日志文件路径,留空输出到 stdout | +| `SERVERBEE_LOG__LEVEL` | `info` | 日志级别:`trace`、`debug`、`info`、`warn`、`error` | +| `SERVERBEE_LOG__FILE` | `""` | 日志文件路径,留空仅输出到 stdout | #### 仅本地开发 | 环境变量 | 默认值 | 说明 | |----------|--------|------| -| `SERVERBEE_DEV__DEMO_DATA` | `false` | 重置并写入本地合成 demo 数据集。仅允许与 `SERVERBEE_DATABASE__PATH=dev-demo.db` 一起使用;会创建 `admin` / `admin123` 并启动内存中的 demo agents | +| `SERVERBEE_DEV__DEMO_DATA` | `false` | 重置并写入本地合成 demo 数据集。仅允许与 `SERVERBEE_DATABASE__PATH=dev-demo.db` 一起使用;会创建 `admin` / `admin123` 并启动内存中的 demo agents,用于本地开发 | #### OAuth(按需配置) @@ -83,14 +83,14 @@ ServerBee 使用 [figment](https://github.com/SergioBenitez/Figment) 库加载 | 环境变量 | 默认值 | 说明 | |----------|--------|------| -| `SERVERBEE_GEOIP__MMDB_PATH` | `""` | MaxMind 兼容 MMDB 文件路径。路径非空时使用该自定义 GeoIP 数据库;为空时管理员可在 Settings → GeoIP Database 下载 DB-IP Lite 数据库 | -| `SERVERBEE_ASN__MMDB_PATH` | `""` | DB-IP Lite ASN / MaxMind GeoLite2-ASN MMDB 文件路径。路径非空时启用路由追踪 ASN 富化;为空时管理员可在 Settings → ASN Database 下载 DB-IP Lite ASN 数据库 | +| `SERVERBEE_GEOIP__MMDB_PATH` | `""` | MaxMind 兼容 MMDB 文件路径。路径非空时启用该自定义 GeoIP 数据库;否则管理员可在「设置 → GeoIP Database」下载 DB-IP Lite 数据库 | +| `SERVERBEE_ASN__MMDB_PATH` | `""` | DB-IP Lite 
ASN / MaxMind GeoLite2-ASN MMDB 文件路径。路径非空时启用路由追踪 ASN 富化;否则管理员可在「设置 → ASN Database」下载 DB-IP Lite ASN | #### Resend(邮件通知) | 环境变量 | 默认值 | 说明 | |----------|--------|------| -| `SERVERBEE_RESEND__API_KEY` | `""` | Resend API Key,使用邮件通知通道时必填。各邮件通道的 `from` 发件地址必须属于你在 resend.com/domains 已验证的域名 | +| `SERVERBEE_RESEND__API_KEY` | `""` | Resend API Key,使用邮件通知时必填。发件域名必须在 resend.com/domains 验证 | #### 数据保留(可选调优) @@ -108,10 +108,10 @@ ServerBee 使用 [figment](https://github.com/SergioBenitez/Figment) 库加载 | `SERVERBEE_RETENTION__TASK_RESULTS_DAYS` | `7` | 任务执行结果保留天数 | | `SERVERBEE_RETENTION__DOCKER_EVENTS_DAYS` | `7` | Docker 事件记录保留天数 | | `SERVERBEE_RETENTION__SERVICE_MONITOR_DAYS` | `30` | 服务监控记录保留天数 | -| `SERVERBEE_RETENTION__SECURITY_EVENT_DAYS` | `30` | 安全事件记录保留天数(SSH 登录/爆破、端口扫描) | +| `SERVERBEE_RETENTION__SECURITY_EVENT_DAYS` | `30` | 安全事件记录保留天数 | | `SERVERBEE_RETENTION__IP_QUALITY_EVENT_DAYS` | `90` | IP 质量状态变更事件记录保留天数 | -#### 移动端认证(Mobile,可选) +#### 移动端(Mobile,可选) | 环境变量 | 默认值 | 说明 | |----------|--------|------| @@ -120,30 +120,30 @@ ServerBee 使用 [figment](https://github.com/SergioBenitez/Figment) 库加载 #### 防火墙(Firewall,可选) -[防火墙黑名单](/zh/docs/firewall) 的第二道护栏。即使管理员主动尝试,`POST /api/firewall/blocks` 也会拒绝插入此列表中的 CIDR/IP。第一道护栏(硬编码的保留段:回环、RFC 1918、链路本地、组播、未指定地址)始终生效。 +[防火墙黑名单](/zh/docs/firewall) 功能的第二道护栏(Tier-2)。即使管理员主动尝试,`POST /api/firewall/blocks` 也会拒绝插入此列表中的 CIDR / IP。第一道护栏(硬编码的保留段:回环、RFC 1918、链路本地、组播、未指定地址)始终生效。 | 环境变量 | 默认值 | 说明 | |----------|--------|------| -| `SERVERBEE_FIREWALL__ALLOW_LIST` | `[]` | 服务端拒绝写入 `block_list` 的 CIDR/IP 列表。叠加在硬编码 Tier-1 保留段之上的 Tier-2 护栏 | +| `SERVERBEE_FIREWALL__ALLOW_LIST` | `[]` | 服务端拒绝写入 `block_list` 的 CIDR / IP 列表。叠加在硬编码 Tier-1 保留段之上的 Tier-2 护栏 | #### IP 质量检测(IP Quality) -默认开箱即用,通过 [ipapi.is](https://ipapi.is)(无需 API Key,按源 IP 限 1000 次/天)获取风险评分。主 Provider 失败时自动回退到 [ip-api.com](https://ip-api.com)(提供地理 + 代理/托管标志,无风险评分)。详见 [IP 质量检测](/zh/docs/ip-quality)。 +默认开箱即用,通过 [ipapi.is](https://ipapi.is)(无需 API Key,按源 IP 限约 1000 次 / 天)获取风险评分。主 Provider 
失败时自动回退到 [ip-api.com](https://ip-api.com)(提供地理 + 代理 / 托管标志,无风险评分)。功能详情见 [IP 质量检测](/zh/docs/ip-quality)。 | 环境变量 | 默认值 | 说明 | |----------|--------|------| -| `SERVERBEE_IP_QUALITY__RISK_PROVIDER` | `"ipapi_is"` | 主风险评分 Provider。可选:`none` / `ipapi_is` / `ip-api`。 | +| `SERVERBEE_IP_QUALITY__RISK_PROVIDER` | `"ipapi_is"` | 主风险评分 Provider。可选:`none`、`ipapi_is`、`ip-api`。 | | `SERVERBEE_IP_QUALITY__RISK_PROVIDER_FALLBACK` | `"ip-api"` | 主 Provider 失败时的兜底。设为 `none` 关闭。 | -| `SERVERBEE_IP_QUALITY__IPAPI_IS__API_KEY` | -- | 可选。配置后享受更高的账户级速率。 | -| `SERVERBEE_IP_QUALITY__IPAPI_IS__ENDPOINT` | `https://api.ipapi.is` | 自建镜像 / 测试时覆盖。 | +| `SERVERBEE_IP_QUALITY__IPAPI_IS__API_KEY` | -- | 可选。配置后享受更高的账户级速率限制。 | +| `SERVERBEE_IP_QUALITY__IPAPI_IS__ENDPOINT` | `https://api.ipapi.is` | 自建镜像或测试时覆盖。 | -**老版本升级说明**:早期版本支持 4 个付费 Provider(Scamalytics / IPQualityScore / ProxyCheck / AbuseIPDB),通过 `SERVERBEE_IP_QUALITY__{SCAMALYTICS,IPQS,PROXYCHECK,ABUSEIPDB}__*` 配置。这些环境变量会被静默忽略。如需恢复对应能力,请从 2026-05-25 之前的 tag 中 fork 或 vendor 对应实现。 +**老版本升级说明:** 早期版本支持 4 个付费 Provider(Scamalytics、IPQualityScore、ProxyCheck、AbuseIPDB),通过 `SERVERBEE_IP_QUALITY__{SCAMALYTICS,IPQS,PROXYCHECK,ABUSEIPDB}__*` 配置。这些环境变量会被静默忽略。如需恢复对应能力,请从 2026-05-25 之前的 tag 中 fork 或 vendor 对应实现。 #### 网络探测异常阈值(Network Probe Anomaly Thresholds) -`/api/servers/{id}/network-probes/anomalies` 与网络探测概览的 `anomaly_count` 字段用来给探测记录打 `high_latency` / `very_high_latency` 标签的平均延迟阈值。`avg_latency` 严格大于 `very_high_latency_ms` 标为 `very_high_latency`;严格大于 `high_latency_ms` 且不超过 very-high 阈值的标为 `high_latency`。 +用于将 `/api/servers/{id}/network-probes/anomalies` 与网络探测概览 `anomaly_count` 字段中的探测记录分类的平均延迟阈值。`avg_latency` 严格大于 `very_high_latency_ms` 的记录标为 `very_high_latency`;大于 `high_latency_ms` 且不超过 very-high 阈值的记录标为 `high_latency`。 | 环境变量 | 默认值 | 说明 | |----------|--------|------| @@ -158,10 +158,10 @@ ServerBee 使用 [figment](https://github.com/SergioBenitez/Figment) 库加载 |----------|--------|------| | `SERVERBEE_DATABASE__PATH` | `serverbee.db` | SQLite 数据库文件路径(相对于 
data_dir) | | `SERVERBEE_DATABASE__MAX_CONNECTIONS` | `10` | 数据库连接池最大连接数 | -| `SERVERBEE_AUTH__SESSION_TTL` | `86400` | Session 有效期(秒),默认 24 小时 | -| `SERVERBEE_AUTH__SECURE_COOKIE` | `true` | Cookie 的 Secure 标记。仅当浏览器通过普通 HTTP 访问 ServerBee 时设为 `false`,例如 IP 直连的快速开始安装 | -| `SERVERBEE_RATE_LIMIT__LOGIN_MAX` | `5` | 15 分钟内每 IP 最大登录尝试次数 | -| `SERVERBEE_RATE_LIMIT__REGISTER_MAX` | `10` | 15 分钟内每 IP 最大 Agent 注册次数。管理员可在「设置 → 速率限制」中清除活跃窗口 | +| `SERVERBEE_AUTH__SESSION_TTL` | `86400` | Session Token 有效期(秒),默认 24 小时 | +| `SERVERBEE_AUTH__SECURE_COOKIE` | `true` | 为 Session Cookie 设置 Secure 标记。仅当浏览器通过普通 HTTP 访问 ServerBee 时设为 `false`,例如 IP 直连的快速开始安装 | +| `SERVERBEE_RATE_LIMIT__LOGIN_MAX` | `5` | 15 分钟窗口内每 IP 最大登录尝试次数 | +| `SERVERBEE_RATE_LIMIT__REGISTER_MAX` | `10` | 15 分钟窗口内每 IP 最大 Agent 注册次数。管理员可在「设置 → 速率限制」中清除活跃窗口 | | `SERVERBEE_UPGRADE__RELEASE_BASE_URL` | `https://github.com/ZingerLittleBee/ServerBee/releases` | Agent 升级 Release 资产的基础 URL | | `SERVERBEE_UPGRADE__LATEST_VERSION_URL` | `""` | 可选的自定义最新版本 API URL,留空则使用 GitHub API | | `SERVERBEE_FILE__MAX_UPLOAD_SIZE` | `104857600` | 文件上传最大大小(字节),默认 100 MB | @@ -174,21 +174,21 @@ Agent 顶层键使用单下划线,嵌套键使用 `__`(双下划线)。 | 环境变量 | 默认值 | 说明 | |----------|--------|------| -| `SERVERBEE_SERVER_URL` | --(必填) | Server 的 HTTP 地址(如 `http://your-server:9527`),Agent 自动拼接 API 路径 | -| `SERVERBEE_ENROLLMENT_CODE` | `""` | 一次性注册码,由管理员在设置页生成;单次使用、短时有效(默认 10 分钟),仅在 Token 为空的首次注册时使用 | +| `SERVERBEE_SERVER_URL` | --(必填) | Server 的 HTTP 基础地址(如 `http://your-server:9527`),Agent 自动拼接 API 路径 | +| `SERVERBEE_ENROLLMENT_CODE` | `""` | 一次性注册码,由管理员在设置页生成;单次使用、短时有效(默认 10 分钟),仅在 Token 为空时使用 | #### 常用配置(Common) | 环境变量 | 默认值 | 说明 | |----------|--------|------| -| `SERVERBEE_COLLECTOR__INTERVAL` | `3` | 指标采集和上报间隔(秒) | -| `SERVERBEE_COLLECTOR__ENABLE_GPU` | `false` | 启用 NVIDIA GPU 监控(需要 nvml 驱动) | +| `SERVERBEE_COLLECTOR__INTERVAL` | `3` | 指标上报间隔(秒) | +| `SERVERBEE_COLLECTOR__ENABLE_GPU` | `false` | 启用 NVIDIA GPU 监控(需要 NVIDIA 驱动 / NVML) | | 
`SERVERBEE_COLLECTOR__ENABLE_TEMPERATURE` | `true` | 启用 CPU 温度监控 | -| `SERVERBEE_FILE__ENABLED` | `false` | 启用文件管理功能 | -| `SERVERBEE_FILE__ROOT_PATHS` | `[]` | 允许浏览的根路径(逗号分隔,如 `/home,/var/log`),留空则拒绝所有文件操作 | +| `SERVERBEE_FILE__ENABLED` | `false` | 在该 Agent 上启用文件管理 | +| `SERVERBEE_FILE__ROOT_PATHS` | `[]` | 允许的根路径(逗号分隔,如 `/home,/var/log`)。留空则拒绝所有文件操作 | | `SERVERBEE_IP_CHANGE__ENABLED` | `true` | 启用周期性 IP 变更检测 | -| `SERVERBEE_LOG__LEVEL` | `info` | 日志级别:`trace`/`debug`/`info`/`warn`/`error` | -| `SERVERBEE_LOG__FILE` | `""` | 日志文件路径,留空输出到 stdout | +| `SERVERBEE_LOG__LEVEL` | `info` | 日志级别:`trace`、`debug`、`info`、`warn`、`error` | +| `SERVERBEE_LOG__FILE` | `""` | 日志文件路径,留空仅输出到 stdout | #### 内部配置(Internal) @@ -196,18 +196,18 @@ Agent 顶层键使用单下划线,嵌套键使用 `__`(双下划线)。 | 环境变量 | 默认值 | 说明 | |----------|--------|------| -| `SERVERBEE_TOKEN` | 注册后自动填充 | Agent 认证 Token,无需手动设置 | -| `SERVERBEE_FILE__MAX_FILE_SIZE` | `1073741824` | 文件读取/下载的最大字节数(默认 1GB) | -| `SERVERBEE_FILE__DENY_PATTERNS` | `*.key,*.pem,...` | 拒绝访问的文件名 glob 模式 | -| `SERVERBEE_IP_CHANGE__EXTERNAL_IP_URLS` | `["https://api.ipify.org","https://ifconfig.me/ip","https://icanhazip.com","https://checkip.amazonaws.com"]` | 外部 IP 查询服务列表,按顺序逐个尝试直到首个成功。在容器、NAT 或网卡看不到公网地址的环境下是必需的。完全离线部署可设为空数组跳过外部查询 | +| `SERVERBEE_TOKEN` | 注册后自动填充 | Agent 认证 Token。注册后自动填充,无需手动设置 | +| `SERVERBEE_FILE__MAX_FILE_SIZE` | `1073741824` | 文件读取 / 下载的最大字节数(默认 1GB) | +| `SERVERBEE_FILE__DENY_PATTERNS` | `*.key,*.pem,...` | Agent 拒绝访问的文件名 glob 模式 | +| `SERVERBEE_IP_CHANGE__EXTERNAL_IP_URLS` | `["https://api.ipify.org","https://ifconfig.me/ip","https://icanhazip.com","https://checkip.amazonaws.com"]` | 公网 IP 查询服务的有序列表,Agent 启动和每次 IP 变更检测时按顺序逐个尝试,首个成功即采用。在 NAT、容器或网卡看不到可路由 IP 的环境下是必需的。完全离线部署可设为 `[]` 跳过外部查询 | | `SERVERBEE_IP_CHANGE__INTERVAL_SECS` | `300` | IP 检测间隔(秒),默认 5 分钟 | #### 升级配置(Upgrade,Agent) | 环境变量 | 默认值 | 说明 | |----------|--------|------| -| `SERVERBEE_UPGRADE__RELEASE_REPO_URL` | `https://github.com/ZingerLittleBee/ServerBee/releases` | Agent 
下载升级包时使用的固定发布源基础 URL。任何镜像需复现 GitHub Releases 的目录结构:`{base}/download/v{version}/{asset}` 用于下载二进制,`{base}/download/v{version}/checksums.txt` 用于校验。编译时默认值可通过构建环境变量 `SERVERBEE_RELEASE_REPO` 覆盖 | -| `SERVERBEE_UPGRADE__RELEASE_CERT_SPKI_SHA256` | `""` | 可选的发布源 TLS 证书 SPKI Pin。格式为 64 位小写十六进制字符(叶证书 SubjectPublicKeyInfo DER 的 SHA-256)。留空则禁用。设置后,Agent 在标准证书链验证通过后额外校验叶证书 SPKI。格式非法(非 64 位或含非十六进制字符)时启动即报错 | +| `SERVERBEE_UPGRADE__RELEASE_REPO_URL` | `https://github.com/ZingerLittleBee/ServerBee/releases` | Agent 下载升级包时使用的固定发布源基础 URL。任何镜像 HTTPS 主机只要复现 GitHub Releases 的路径布局 `{base}/download/v{version}/{asset}` 和 `{base}/download/v{version}/checksums.txt` 即可。编译时默认值可通过构建期环境变量 `SERVERBEE_RELEASE_REPO` 覆盖 | +| `SERVERBEE_UPGRADE__RELEASE_CERT_SPKI_SHA256` | `""` | 可选的发布源 TLS 证书 SPKI Pin。64 位小写十六进制字符 = 叶证书 SubjectPublicKeyInfo DER 的 SHA-256。留空则禁用。设置后,Agent 在标准证书链验证通过后额外校验叶证书 SPKI。格式非法(非 64 位或含非十六进制字符)时启动即报错 | #### 安全事件检测(Security,Agent) @@ -216,257 +216,249 @@ Agent 顶层键使用单下划线,嵌套键使用 `__`(双下划线)。 | 环境变量 | 默认值 | 说明 | |----------|--------|------| | `SERVERBEE_SECURITY__ENABLED` | `true` | 安全事件检测器总开关。设为 `false` 时 Agent 不再上报任何 `security_event` 消息 | -| `SERVERBEE_SECURITY__SSH__WINDOW_SECONDS` | `60` | SSH 爆破检测的滑动窗口长度(秒)| -| `SERVERBEE_SECURITY__SSH__FAILED_THRESHOLD` | `10` | 窗口内累计失败次数达到该值即触发 `ssh_brute_force` 事件。触发后队列清空,需要重新累积 | -| `SERVERBEE_SECURITY__PORT_SCAN__ENABLED` | `false` | 是否启用端口扫描检测。需要系统安装 `conntrack` CLI(仅 Linux)| -| `SERVERBEE_SECURITY__PORT_SCAN__WINDOW_SECONDS` | `30` | 端口扫描检测的滑动窗口长度(秒)| +| `SERVERBEE_SECURITY__SSH__WINDOW_SECONDS` | `60` | SSH 爆破检测的滑动窗口长度(秒) | +| `SERVERBEE_SECURITY__SSH__FAILED_THRESHOLD` | `10` | 窗口内累计失败次数达到该值即触发 `ssh_brute_force` 事件。触发后队列清空 | +| `SERVERBEE_SECURITY__PORT_SCAN__ENABLED` | `false` | 启用端口扫描检测。需要系统安装 `conntrack` CLI(仅 Linux) | +| `SERVERBEE_SECURITY__PORT_SCAN__WINDOW_SECONDS` | `30` | 端口扫描检测的滑动窗口长度(秒) | | `SERVERBEE_SECURITY__PORT_SCAN__DISTINCT_PORT_THRESHOLD` | `20` | 同一源 IP 在窗口内命中的不同目标端口数达到该值即触发 `port_scan` 事件 | -| 
`SERVERBEE_SECURITY__DATA_DIR` | `/var/lib/serverbee/security` | 用于持久化 `first_seen` 存储的目录,用来标记 `ssh_login` 事件是否为新 (user, IP) 组合 | +| `SERVERBEE_SECURITY__DATA_DIR` | `/var/lib/serverbee/security` | 用于持久化 `first_seen` 存储的目录,用来将 `ssh_login` 事件标记为新的 (user, IP) 组合 | -## 完整 server.toml 参考 +## Server 配置(server.toml) -```toml title="/etc/serverbee/server.toml" -# ============================================================ -# ServerBee Server 配置文件 -# ============================================================ +### `[server]` —— 核心服务器设置 -# --- 基础配置 --- -[server] -# 监听地址和端口 -# 默认: "0.0.0.0:9527" -listen = "0.0.0.0:9527" +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `listen` | string | `"0.0.0.0:9527"` | Server 监听的 IP 地址和端口 | +| `data_dir` | string | `"./data"` | 数据库文件和其他持久化数据的目录 | +| `trusted_proxies` | string[] | 私有 / 回环 CIDR | 受信任的反向代理 CIDR 段。默认信任 RFC 1918 + 回环段。设为 `[]` 禁用 X-Forwarded-For 解析 | -# 数据存储目录,数据库文件和其他持久化数据存放于此 -# 默认: "/var/lib/serverbee" -data_dir = "/var/lib/serverbee" +### `[database]` —— 数据库设置 -# 受信任的反向代理 CIDR 列表 -# 来自这些地址的请求将从 X-Forwarded-For / X-Real-IP 头读取真实客户端 IP -# 默认: 私有/回环 CIDR(RFC 1918 + 127.0.0.0/8 + ::1/128) -# 设为 [] 可禁用 X-Forwarded-For 解析 -trusted_proxies = [] +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `path` | string | `"serverbee.db"` | 数据库文件名(相对于 `data_dir`) | +| `max_connections` | u32 | `10` | SQLite 连接池最大连接数 | -# --- 数据库配置 --- -[database] -# SQLite 数据库文件名(相对于 data_dir) -# 默认: "serverbee.db" -path = "serverbee.db" +### `[auth]` —— 认证设置 -# 连接池最大连接数 -# SQLite 写锁是全局的,多连接主要用于并发读 -# 默认: 10 -max_connections = 10 +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `session_ttl` | i64 | `86400` | Session Cookie 有效期(秒),24 小时 | +| `max_servers` | u32 | `0` | 通过注册码接入的最大服务器数(0 = 不限制)。尽力软限制 | +| `secure_cookie` | bool | `true` | 为 Session Cookie 设置 `Secure` 标记。仅当浏览器通过普通 HTTP 访问 ServerBee 时设为 `false` | -# --- 认证配置 --- -[auth] -# Session 过期时间(秒),采用滑动过期策略 -# 每次有效请求自动延长过期时间 -# 默认: 86400 (24 小时) -session_ttl = 86400 - -# 
通过注册码接入的最大服务器数(0 = 不限制) -# 尽力软限制,不保证精确 -# 默认: 0 -max_servers = 0 - -# --- 本地开发 --- -[dev] -# 重置并写入合成 demo 数据。仅允许 database.path = "dev-demo.db"。 -# 默认: false -demo_data = false - -# --- 速率限制 --- -[rate_limit] -# 登录接口速率限制:每 IP 在 15 分钟窗口内的最大尝试次数 -# 默认: 5 -login_max = 5 - -# Agent 注册接口速率限制:每 IP 在 15 分钟窗口内的最大尝试次数 -# 默认: 10 -register_max = 10 - -# --- 数据保留策略 --- -[retention] -# 分钟级指标记录保留天数 -# 默认: 7 -records_days = 7 - -# 小时级聚合指标保留天数 -# 默认: 90 -records_hourly_days = 90 - -# GPU 指标记录保留天数 -# 默认: 7 -gpu_records_days = 7 - -# Ping 探测记录保留天数 -# 默认: 7 -ping_records_days = 7 - -# 网络质量探测原始记录保留天数 -# 默认: 7 -network_probe_days = 7 - -# 网络质量探测小时聚合记录保留天数 -# 默认: 90 -network_probe_hourly_days = 90 - -# 审计日志保留天数 -# 默认: 180 -audit_logs_days = 180 - -# 流量小时记录保留天数 -# 默认: 7 -traffic_hourly_days = 7 - -# 流量日记录保留天数 -# 默认: 400 -traffic_daily_days = 400 - -# 任务执行结果保留天数 -# 默认: 7 -task_results_days = 7 - -# Docker 事件记录保留天数 -# 默认: 7 -docker_events_days = 7 - -# 服务监控记录保留天数 -# 默认: 30 -service_monitor_days = 30 - -# 安全事件记录保留天数(SSH 登录/爆破、端口扫描) -# 默认: 30 -security_event_days = 30 - -# IP 质量状态变更事件记录保留天数 -# 默认: 90 -ip_quality_event_days = 90 - -# --- 调度器 --- -[scheduler] -# 流量日聚合和计费周期计算使用的时区 -# 使用 IANA 时区名称,如 Asia/Shanghai、US/Eastern -# 默认: "UTC" -timezone = "UTC" - -# --- GeoIP 地理位置 --- -[geoip] -# MaxMind 兼容 MMDB 数据库文件路径,路径非空时使用自定义 GeoIP;为空时可在设置页下载 DB-IP Lite -# 默认: "" -mmdb_path = "" +### `[dev]` —— 本地开发 -# --- ASN 路由追踪富化 --- -[asn] -# DB-IP Lite ASN / MaxMind GeoLite2-ASN MMDB 文件路径,路径非空时启用路由追踪 ASN 富化; -# 为空时可在设置页下载 DB-IP Lite ASN -# 默认: "" -mmdb_path = "" - -# --- Resend 邮件通知 --- -[resend] -# Resend API Key(https://resend.com/api-keys) -# 使用邮件通知通道时必填 -# 各邮件通道的 from 发件地址必须属于你在 https://resend.com/domains 已验证的域名 -# 默认: "" -api_key = "" - -# --- 日志配置 --- -[log] -# 日志级别: trace / debug / info / warn / error -# 默认: "info" -level = "info" +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `demo_data` | bool | `false` | 重置并写入本地合成开发数据集。被限制为 `database.path = "dev-demo.db"`,因此不会误跑在正常本地或生产副本数据库上 | + +### 
`[retention]` —— 数据保留 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `records_days` | u32 | `7` | 原始指标记录保留天数 | +| `records_hourly_days` | u32 | `90` | 小时聚合记录保留天数 | +| `gpu_records_days` | u32 | `7` | 单 GPU 指标记录保留天数 | +| `ping_records_days` | u32 | `7` | Ping 探测记录保留天数 | +| `network_probe_days` | u32 | `7` | 原始网络质量探测记录保留天数 | +| `network_probe_hourly_days` | u32 | `90` | 小时聚合网络质量探测记录保留天数 | +| `audit_logs_days` | u32 | `180` | 审计日志保留天数 | +| `traffic_hourly_days` | u32 | `7` | 流量小时记录保留天数 | +| `traffic_daily_days` | u32 | `400` | 流量日记录保留天数 | +| `task_results_days` | u32 | `7` | 任务执行结果保留天数 | +| `docker_events_days` | u32 | `7` | Docker 事件记录保留天数 | +| `service_monitor_days` | u32 | `30` | 服务监控检查记录保留天数 | +| `security_event_days` | u32 | `30` | 安全事件记录保留天数(SSH 登录 / 爆破、端口扫描) | +| `ip_quality_event_days` | u32 | `90` | IP 质量状态变更事件记录保留天数 | -# 日志文件路径,留空则输出到 stdout -# 默认: "" -file = "" + +原始指标记录每 60 秒写入一次,默认保留 7 天。小时聚合器计算平均值,让你无需占用过多存储即可保留 90 天的长期趋势。请根据磁盘空间和监控需求调整这些值。 + -# --- 升级配置 --- -[upgrade] -# Agent 自动升级时下载 Release 资产的基础 URL -# Server 会在此 URL 后拼接 /download/v{version}/ 构造完整下载地址 -# 默认: "https://github.com/ZingerLittleBee/ServerBee/releases" -release_base_url = "https://github.com/ZingerLittleBee/ServerBee/releases" +### `[scheduler]` —— 调度器 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `timezone` | string | `"UTC"` | 流量日聚合和计费周期计算使用的时区。使用 IANA 时区名称(如 `Asia/Shanghai`、`US/Eastern`) | -# 可选的自定义最新版本 API URL -# 留空则使用 GitHub API 查询最新版本 -# 用于自定义版本发布渠道或私有镜像源 -# 默认: "" -latest_version_url = "" +### `[rate_limit]` —— 速率限制 -# --- 文件上传配置 --- -[file] -# 文件上传最大大小(字节) -# 默认: 104857600 (100MB) -max_upload_size = 104857600 - -# --- 移动端认证配置(可选)--- -[mobile] -# 移动端 Access Token 有效期(秒) -# 默认: 900 (15 分钟) -access_ttl = 900 - -# 移动端 Refresh Token 有效期(秒) -# 默认: 2592000 (30 天) -refresh_ttl = 2592000 - -# --- 防火墙黑名单 Tier-2 护栏(可选)--- -[firewall] -# 服务端拒绝写入 block_list 的 CIDR/IP 列表 -# 默认: [] -allow_list = [] - -# --- IP 质量检测 --- -# 主风险评分 Provider。可选值: none / ipapi_is / ip-api -# 
默认: "ipapi_is"(开箱即用,无需 API Key,按源 IP 限 1000 次/天) -[ip_quality] -risk_provider = "ipapi_is" - -# 主 Provider 失败时的兜底,设为 "none" 关闭 -# 默认: "ip-api" -risk_provider_fallback = "ip-api" - -# ipapi.is(可选配置,不填也能使用免费额度) -# [ip_quality.ipapi_is] -# api_key = "" # 配置后享受更高的账户级速率 -# endpoint = "" # 自建镜像 / 测试时覆盖,留空使用默认值 - -# --- OAuth 配置(可选)--- -# [oauth.github] -# client_id = "" -# client_secret = "" - -# [oauth.google] -# client_id = "" -# client_secret = "" - -# [oauth.oidc] -# issuer_url = "" -# client_id = "" -# client_secret = "" -``` +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `login_max` | u32 | `5` | 每个速率限制窗口内的最大登录尝试次数 | +| `register_max` | u32 | `10` | 每个速率限制窗口内的最大 Agent 注册尝试次数。管理员可在「设置 → 速率限制」中清除活跃窗口 | + +### `[log]` —— 日志 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `level` | string | `"info"` | 日志级别:`trace`、`debug`、`info`、`warn`、`error` | +| `file` | string | `""` | 日志文件路径。留空则仅输出到 stdout | + +日志级别也可通过 `RUST_LOG` 环境变量设置,且该变量优先级更高。 + +### `[geoip]` —— GeoIP 地理位置查询 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `mmdb_path` | string | `""` | MaxMind 兼容 MMDB 文件路径。路径非空时启用该自定义 GeoIP 数据库;为空时可在 UI 中将 DB-IP Lite 下载到 Server 数据目录 | -## Agent 升级配置(agent.toml `[upgrade]`) +### `[asn]` —— ASN 查询(路由追踪) -Agent 从本地固定的发布源下载升级包,不再信任 Server 下发的下载地址。 +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `mmdb_path` | string | `""` | DB-IP Lite ASN / MaxMind GeoLite2-ASN MMDB 文件路径。路径非空时启用该自定义 ASN 数据库;为空时可在 UI 中将 DB-IP Lite ASN 下载到 Server 数据目录。用于为每个路由追踪跳点标注其自治系统 | + +### `[resend]` —— 邮件通知 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `api_key` | string | `""` | Resend API Key([resend.com/api-keys](https://resend.com/api-keys))。使用邮件通知通道时必填。各邮件通道的 `from` 发件地址必须属于你在 [resend.com/domains](https://resend.com/domains) 已验证的域名 | + +### `[oauth]` —— OAuth / SSO + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `base_url` | string | `""` | ServerBee 实例的公网 URL(用于回调 URL) | +| `allow_registration` | bool | `false` | 首次 OAuth 登录时创建新用户账号 | + 
+### `[oauth.github]` —— GitHub OAuth + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `client_id` | string | -- | GitHub OAuth App Client ID | +| `client_secret` | string | -- | GitHub OAuth App Client Secret | + +### `[oauth.google]` —— Google OAuth + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `client_id` | string | -- | Google OAuth Client ID | +| `client_secret` | string | -- | Google OAuth Client Secret | + +### `[oauth.oidc]` —— OpenID Connect | 键 | 类型 | 默认值 | 说明 | |----|------|--------|------| -| `release_repo_url` | string | `"https://github.com/ZingerLittleBee/ServerBee/releases"` | Agent 下载升级包的发布源基础 URL。镜像站需复现 GitHub Releases 目录结构:`{base}/download/v{version}/{asset}`(二进制)和 `{base}/download/v{version}/checksums.txt`(校验文件)。编译时默认值可通过构建环境变量 `SERVERBEE_RELEASE_REPO` 覆盖 | -| `release_cert_spki_sha256` | string | `""` | 可选的发布源 TLS 证书 SPKI Pin。设为 64 位小写十六进制字符(叶证书 SubjectPublicKeyInfo DER 的 SHA-256)。留空禁用。设置后,Agent 在标准证书链验证通过后额外校验叶证书 SPKI。格式非法(非 64 位或含非十六进制字符)时启动即报错 | +| `issuer_url` | string | -- | OIDC Issuer URL(如 `https://auth.example.com/realms/main`) | +| `client_id` | string | -- | OIDC Client ID | +| `client_secret` | string | -- | OIDC Client Secret | +| `scopes` | string[] | `["openid", "email", "profile"]` | OAuth 请求的 scope | -**配置优先级**(越后越高,高者覆盖低者): +### `[upgrade]` —— Agent 升级 -1. 编译时默认值(官方 GitHub Releases URL) -2. `/etc/serverbee/agent.toml` 或 `agent.toml` 中的 `[upgrade]` 节 -3. `SERVERBEE_UPGRADE__RELEASE_REPO_URL` 环境变量 -4. 
`--release-repo` CLI 参数(最高) +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `release_base_url` | string | `"https://github.com/ZingerLittleBee/ServerBee/releases"` | Agent 升级 Release 资产的基础 URL。Server 会在此 URL 后拼接 `/download/v{version}/` 构造资产下载地址 | +| `latest_version_url` | string | `""` | 可选的自定义最新版本 API URL。留空则由 Server 查询 GitHub API 确定最新版本。可用此项覆盖为自定义版本接口 | + +### `[file]` —— 文件上传(Server 端) + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `max_upload_size` | u64 | `104857600` | 文件上传最大大小(字节),默认 100 MB | + +### `[mobile]` —— 移动端认证 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `access_ttl` | i64 | `900` | 移动端 Access Token 有效期(秒),默认 15 分钟 | +| `refresh_ttl` | i64 | `2592000` | 移动端 Refresh Token 有效期(秒),默认 30 天 | + +### `[firewall]` —— 防火墙黑名单护栏 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `allow_list` | string[] | `[]` | 服务端拒绝写入 `block_list` 的 CIDR / IP 列表。叠加在硬编码 Tier-1 保留段(回环 + RFC 1918 + 链路本地 + 组播 + 未指定地址)之上的 Tier-2 护栏。见 [防火墙黑名单](/zh/docs/firewall) | + +### `[ip_quality]` —— IP 质量风险评分 + +默认开箱即用,通过 [ipapi.is](https://ipapi.is)(无需 API Key,按源 IP 限约 1000 次 / 天)获取风险评分。主 Provider 失败时自动回退到 [ip-api.com](https://ip-api.com)。基础 IP 元数据(国家、ASN、IP 类型)始终来自本地 GeoIP MMDB。功能详情见 [IP 质量检测](/zh/docs/ip-quality)。 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `risk_provider` | string | `"ipapi_is"` | 主风险评分 Provider。可选:`none`、`ipapi_is`、`ip-api`。 | +| `risk_provider_fallback` | string | `"ip-api"` | 主 Provider 失败时的兜底。设为 `none` 关闭。 | + +### `[ip_quality.ipapi_is]` —— ipapi.is + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `api_key` | string | -- | 可选。配置后享受更高的账户级速率限制。 | +| `endpoint` | string | `"https://api.ipapi.is"` | 自建镜像或测试时覆盖。 | + +--- + +## Agent 配置(agent.toml) + +> **Docker Agent:** 挂载宿主机的 machine-id 以确保指纹识别正确: +> ``` +> -v /etc/machine-id:/etc/machine-id:ro +> ``` + +### 顶层选项 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `server_url` | string | **必填** | ServerBee Server 的 URL(如 `http://10.0.0.1:9527`) | 
+| `token` | string | `""` | Agent 认证 Token(注册后自动填充) | +| `enrollment_code` | string | `""` | 一次性注册码,来自 Server 设置页(仅在 `token` 为空时使用;首次注册成功后即被消费) | + +### `[collector]` —— 指标采集 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `interval` | u32 | `3` | 采集间隔(秒) | +| `enable_gpu` | bool | `false` | 启用 NVIDIA GPU 监控(需要 NVIDIA 驱动 / NVML) | +| `enable_temperature` | bool | `true` | 启用 CPU 温度传感器监控 | + +### `[file]` —— 文件管理 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `enabled` | bool | `false` | 启用文件管理能力。Server 端还需为该 Agent 启用 `CAP_FILE` | +| `root_paths` | string[] | `[]` | 将浏览限制在这些目录内。空数组拒绝所有文件操作 | +| `max_file_size` | u64 | `1073741824` | 文件读取和下载操作的最大大小(字节),默认 1 GB | +| `deny_patterns` | string[] | `["*.key", "*.pem", "id_rsa*", ".env*", "shadow", "passwd"]` | Agent 拒绝访问的文件名 glob 模式 | + +### `[ip_change]` —— IP 变更检测 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `enabled` | bool | `true` | 启用周期性 IP 变更检测。Agent 枚举网卡地址并上报变更 | +| `external_ip_urls` | string[] | `["https://api.ipify.org", "https://ifconfig.me/ip", "https://icanhazip.com", "https://checkip.amazonaws.com"]` | 返回 Agent 公网 IP 的服务有序列表。在启动和每次检测时按顺序尝试,首个成功即采用。在 NAT 或容器中运行的 Agent 需要它。设为 `[]` 可禁用外部查询(离线部署) | +| `interval_secs` | u64 | `300` | IP 检测间隔(秒),默认 5 分钟 | + +### `[log]` —— 日志 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `level` | string | `"info"` | 日志级别:`trace`、`debug`、`info`、`warn`、`error` | +| `file` | string | `""` | 日志文件路径。留空则仅输出到 stdout | + +### `[upgrade]` —— 升级源(Agent) + +Agent 从本地固定的发布源下载升级包,而不信任 Server 下发的 URL。 + +| 键 | 类型 | 默认值 | 说明 | +|----|------|--------|------| +| `release_repo_url` | string | `"https://github.com/ZingerLittleBee/ServerBee/releases"` | Agent 升级 Release 资产的基础 URL。必须复现 GitHub Releases 目录布局:二进制为 `{base}/download/v{version}/{asset}`,校验文件为 `{base}/download/v{version}/checksums.txt`。编译时默认值可通过构建期环境变量 `SERVERBEE_RELEASE_REPO` 覆盖 | +| `release_cert_spki_sha256` | string | `""` | 可选的发布源 TLS SPKI Pin。设为 64 位小写十六进制字符(叶证书 SubjectPublicKeyInfo 
DER 编码的 SHA-256)。留空则禁用 Pin。设置后,Agent 在标准证书链验证通过后额外校验叶证书 SPKI。格式非法(非 64 位或含非十六进制字符)时启动即报错 | + +**配置优先级**(越靠前越高): + +1. `--release-repo` CLI 参数 +2. `SERVERBEE_UPGRADE__RELEASE_REPO_URL` 环境变量 +3. `/etc/serverbee/agent.toml` 或 `agent.toml` 的 `[upgrade]` 节 +4. 编译时默认值(官方 GitHub Releases URL) -控制台"最新版本"检测使用的是 Server 端配置的发布源(`upgrade.release_base_url` / `upgrade.latest_version_url`)。这两项是独立的 Server 端设置,仅用于控制台的"最新版本"查询,与决定 Agent 实际下载内容的 Agent 端 `[upgrade] release_repo_url` 是不同的配置。如需保持一致的升级行为,建议将 Server 和 Agent 指向同一个发布仓库,除非有意让两者追踪不同的发布渠道。 +控制台的「最新版本」检测使用 Server 端配置的发布源(`upgrade.release_base_url` / `upgrade.latest_version_url`)。这两项是独立的 Server 端设置,仅用于控制台的「最新版本」查询,与决定 Agent 实际下载内容的 Agent 端 `[upgrade] release_repo_url` 是不同的配置。为保持一致的升级行为,请将 Server 和 Agent 指向同一个发布仓库,除非有意让两者追踪不同的发布源。 -**获取发布源证书的 SPKI Pin**: +**获取发布源证书 SPKI Pin** 的方法: ```bash openssl x509 -in cert.pem -pubkey -noout \ @@ -474,187 +466,80 @@ openssl x509 -in cert.pem -pubkey -noout \ | openssl dgst -sha256 -r | awk '{print $1}' ``` -## 完整 agent.toml 参考 +--- -> **Docker Agent:** 挂载宿主机的 machine-id 以确保指纹识别正确: -> ``` -> -v /etc/machine-id:/etc/machine-id:ro -> ``` +## 示例:最小 Server 配置 + +```toml +[server] +data_dir = "./data" +``` + +其余一切使用合理默认值。首次启动时,ServerBee 会创建 `admin` 用户并随机生成密码,将其在 Server 日志中打印一次。 -```toml title="/etc/serverbee/agent.toml" -# ============================================================ -# ServerBee Agent 配置文件 -# ============================================================ +## 示例:生产 Server 配置 + +```toml +[server] +listen = "127.0.0.1:9527" +data_dir = "/var/lib/serverbee" + +[auth] +secure_cookie = true + +[retention] +records_days = 14 +records_hourly_days = 180 + +[geoip] +mmdb_path = "/var/lib/serverbee/GeoLite2-City.mmdb" -# Server 地址(必填) -# 使用 http:// 或 https://(反向代理 HTTPS 时) +[asn] +mmdb_path = "/var/lib/serverbee/GeoLite2-ASN.mmdb" + +[log] +level = "info" +file = "/var/log/serverbee/server.log" + +[oauth] +base_url = "https://monitor.example.com" +allow_registration = false + +[oauth.github] 
+client_id = "Iv1.abc123" +client_secret = "secret123" +``` + +## 示例:最小 Agent 配置 + +```toml server_url = "http://your-server-ip:9527" +enrollment_code = "<来自设置页的一次性注册码>" +``` -# Agent Token,注册成功后自动写入 -# 首次运行时留空,注册成功后自动填入 -token = "" +## 示例:生产 Agent 配置 -# 一次性注册码(enrollment code) -# 由管理员在 Server 设置页(或 POST /api/agent/enrollments)生成 -# 仅在 token 为空的首次注册时使用;单次有效、默认 10 分钟过期, -# 注册成功后即被消费 -enrollment_code = "" +```toml +server_url = "https://monitor.example.com" +token = "previously-obtained-token" -# --- 采集配置 --- [collector] -# 指标采集和上报间隔(秒) -# 此值可被 Server 端 Welcome 消息中的 report_interval 覆盖 -# 默认: 3 interval = 3 - -# 是否启用 GPU 指标采集 -# 需要编译时启用 gpu feature 且服务器安装 NVIDIA 驱动 -# 默认: false -enable_gpu = false - -# 是否启用温度采集 -# 默认: true +enable_gpu = true enable_temperature = true -# --- 文件管理 --- [file] -# 是否启用文件管理功能 -# 需要同时在 Server 端为该 Agent 启用 CAP_FILE 能力 -# 默认: false -enabled = false - -# 允许浏览的根路径列表,留空则拒绝所有文件操作 -# 默认: [] -root_paths = [] - -# 文件读取/下载的最大字节数 -# 默认: 1073741824 (1GB) +enabled = true +root_paths = ["/home", "/var/log", "/etc"] max_file_size = 1073741824 - -# 拒绝访问的文件名 glob 模式 -# 默认: ["*.key", "*.pem", "id_rsa*", ".env*", "shadow", "passwd"] deny_patterns = ["*.key", "*.pem", "id_rsa*", ".env*", "shadow", "passwd"] -# --- IP 变更检测 --- [ip_change] -# 是否启用周期性 IP 变更检测 -# Agent 定期枚举网络接口地址并上报变更 -# 默认: true enabled = true +# external_ip_urls 默认为一组精选的公网 IP 服务。 +# 仅在需要指向内部镜像或禁用(设为 [])时覆盖。 -# 外部 IP 查询服务列表 -# Agent 启动和每次 IP 检测时按顺序逐个尝试,首个成功即采用 -# 在容器、NAT 或网卡看不到公网地址的环境下是必需的 -# 完全离线部署可设为空数组 [] 跳过外部查询 -# 默认: 4 个独立运营商,规避单点失效 -external_ip_urls = [ - "https://api.ipify.org", - "https://ifconfig.me/ip", - "https://icanhazip.com", - "https://checkip.amazonaws.com", -] - -# IP 检测间隔(秒) -# 默认: 300 (5 分钟) -interval_secs = 300 - -# --- 日志配置 --- [log] -# 日志级别: trace / debug / info / warn / error -# 默认: "info" level = "info" - -# 日志文件路径,留空则输出到 stdout -# 默认: "" -file = "/var/log/serverbee-agent.log" - -# --- 升级配置 --- -[upgrade] -# Agent 自动升级时下载 Release 资产的发布源基础 URL -# 镜像站需复现 GitHub Releases 目录结构: 
-# {base}/download/v{version}/{asset} — 二进制 -# {base}/download/v{version}/checksums.txt — 校验文件 -# 默认: "https://github.com/ZingerLittleBee/ServerBee/releases" -release_repo_url = "https://github.com/ZingerLittleBee/ServerBee/releases" - -# 可选的发布源 TLS 证书 SPKI Pin(64 位小写十六进制,叶证书 SPKI DER 的 SHA-256) -# 留空则禁用,格式非法时启动即报错 -# 默认: "" -release_cert_spki_sha256 = "" +file = "/var/log/serverbee/agent.log" ``` - -## 重要默认值汇总 - -下表列出了所有关键的默认值,便于快速参考: - -### Server 默认值 - -| 配置项 | 默认值 | 说明 | -|--------|--------|------| -| 监听端口 | `9527` | HTTP 和 WebSocket 共用端口 | -| 数据目录 | `./data` | 数据库和持久化数据(安装脚本部署为 `/opt/serverbee/data`) | -| 数据库文件 | `serverbee.db` | SQLite 数据库(相对于数据目录) | -| 连接池大小 | `10` | 最大并发数据库连接 | -| Session 有效期 | `86400` 秒(24 小时) | 滑动过期 | -| 管理员用户名 | `admin` | 仅首次初始化时使用 | -| 登录速率限制 | `5` 次/15 分钟/IP | 防暴力破解 | -| 注册速率限制 | `10` 次/15 分钟/IP | 防滥用注册 | -| 分钟级指标保留 | `7` 天 | 自动清理过期数据 | -| 小时级指标保留 | `90` 天 | 长期趋势分析 | -| GPU 指标保留 | `7` 天 | 与分钟级指标一致 | -| Ping 记录保留 | `7` 天 | 与分钟级指标一致 | -| 网络探测记录保留 | `7` 天 | 原始网络质量探测数据 | -| 网络探测小时聚合保留 | `90` 天 | 长期网络质量趋势分析 | -| 流量小时记录保留 | `7` 天 | 小时级流量数据 | -| 流量日记录保留 | `400` 天 | 长期流量趋势分析 | -| 任务结果保留 | `7` 天 | 远程命令执行结果 | -| Docker 事件保留 | `7` 天 | Docker 容器生命周期事件 | -| 审计日志保留 | `180` 天 | 半年审计记录 | -| 调度时区 | `UTC` | 流量日聚合时区 | -| GeoIP | 关闭 | 可提供 MMDB 文件路径,或在设置页下载 DB-IP Lite 数据库 | -| 日志级别 | `info` | 推荐生产环境使用 | - -### Agent 默认值 - -| 配置项 | 默认值 | 说明 | -|--------|--------|------| -| 采集间隔 | `3` 秒 | 可被 Server 动态调整 | -| GPU 采集 | 关闭 | 需要 NVIDIA 驱动 | -| 温度采集 | 开启 | 默认采集传感器温度 | -| 文件管理 | 关闭 | 需在 Agent 和 Server 同时启用 | -| 文件大小限制 | 1GB | 读取/下载的最大文件体积 | -| IP 变更检测 | 开启 | 默认检测网络接口 IP 变更 | -| 外部 IP 检测 | 开启 | 默认通过公共 IP 服务发现公网地址;设为空数组可禁用 | -| IP 检测间隔 | `300` 秒(5 分钟) | 定期检查间隔 | -| 日志级别 | `info` | 推荐生产环境使用 | - -### 内部默认值(不可配置) - -以下是系统内部的固定参数,供了解系统行为参考: - -| 参数 | 值 | 说明 | -|------|-----|------| -| Agent 上报间隔 | 3 秒 | 由 Server Welcome 消息控制 | -| 心跳间隔 | 30 秒 | Server 向 Agent 发送 Ping | -| 离线判定 | 30 秒 | 无上报则判定离线 | -| 离线检测扫描 | 10 秒 | 后台扫描 Agent 连接状态 | -| 指标写入 | 每 1 分钟 | 从内存缓存批量写入数据库 | -| 
小时聚合 | 每 1 小时 | 分钟级数据聚合为小时级 | -| WebSocket JSON 帧上限 | 1 MB | 单条 JSON 消息最大体积 | -| WebSocket Binary 帧上限 | 64 KB | 终端数据帧最大体积 | -| 命令执行超时 | 300 秒 | 远程命令最大执行时间 | -| 命令输出上限 | 512 KB | 超出部分截断 | -| 命令长度上限 | 8 KB | 单条命令最大长度 | -| 并发命令上限 | 5 | 每台 Agent 最多并发执行 | -| 终端会话上限 | 3 | 每台服务器最多并发终端 | -| 终端空闲超时 | 10 分钟 | 无输入自动断开 | -| 告警采样窗口 | 10 分钟 | 最近 10 个采样点 | -| 告警触发比例 | 70% | 超过 70% 采样超阈值才触发 | -| 告警通知去抖 | 5 分钟 | 同一规则+服务器最短通知间隔 | -| 重连退避上限 | 30 秒 | 指数退避最大等待时间 | -| 重连抖动 | +/-20% | 避免雷群效应 | - - - - - From 7e8a6fd4524937c12dc2f554ae50ef7b6dbb3311 Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 18:34:24 +0800 Subject: [PATCH 16/21] docs(deployment): add upgrade guide, TLS note and backup strategy --- apps/docs/content/docs/en/deployment.mdx | 58 ++- apps/docs/content/docs/zh/deployment.mdx | 512 ++++++++++------------- 2 files changed, 278 insertions(+), 292 deletions(-) diff --git a/apps/docs/content/docs/en/deployment.mdx b/apps/docs/content/docs/en/deployment.mdx index 191709f0..86e09852 100644 --- a/apps/docs/content/docs/en/deployment.mdx +++ b/apps/docs/content/docs/en/deployment.mdx @@ -4,13 +4,13 @@ description: Production deployment strategies for the ServerBee server and agent icon: Cloud --- -This guide covers production deployment best practices for ServerBee, including Railway, Docker, systemd, reverse proxy configuration, TLS, and backup strategies. +This guide covers production deployment best practices for ServerBee: Railway, Docker, systemd, reverse proxy configuration, TLS, and backup strategies. ## Railway (One-Click Deploy) [![Deploy on Railway](https://railway.com/button.svg)](https://railway.com/deploy/serverbee-server) -The fastest way to get ServerBee running. Click the button above and configure the environment variables: +The fastest way to get ServerBee running. 
Click the button above, then configure the environment variables: ```bash SERVERBEE_LOG__LEVEL="info" # Log level (trace/debug/info/warn/error) @@ -30,7 +30,7 @@ After deploying: 1. Add a **Volume** mounted at `/data` to persist data across deploys 2. Configure your agents to connect using the Railway-provided URL -3. On first start the server auto-creates an admin account with a randomly generated password. Open the Railway deploy logs and look for the highlighted credentials banner to retrieve it. On first login you are required to change this password and may optionally choose a different username. +3. On first start, the server auto-creates an admin account with a randomly generated password. Open the Railway deploy logs and look for the highlighted credentials banner to retrieve it. On first login you must change this password and may optionally choose a different username. Railway automatically assigns a port and provides HTTPS. No need to configure `SERVERBEE_SERVER__LISTEN` or TLS certificates. @@ -54,7 +54,7 @@ The Dockerfile exposes this switch via `ARG SERVERBEE_IMAGE_TAG=latest`. Railway ## Bootstrap Installer -For Linux hosts, the quickest binary/systemd path is the bootstrap installer: +On Linux hosts, the bootstrap installer is the quickest binary/systemd path: ```bash # Install the server @@ -127,7 +127,7 @@ docker compose logs -f serverbee-server ## Build from Source -For developers or users with customization needs, you can build ServerBee from source. Building requires Rust 1.85+ and Bun 1.x (or Node.js 22+). +Developers and users with customization needs can build ServerBee from source. This requires Rust 1.85+ and Bun 1.x (or Node.js 22+). ```bash # Clone the repo @@ -158,11 +158,11 @@ Start the server: ./target/release/serverbee-server ``` -Once you have deployed the compiled binaries to the target host, use the systemd service configuration below to manage the processes. 
+After deploying the compiled binaries to the target host, use the systemd service configuration below to manage the processes. ## Systemd Services -For deployments without Docker, use systemd to manage both the server and agent processes. +For deployments without Docker, use systemd to manage the server and agent processes. If you do not need hand-written unit files, prefer the bootstrap installer above. It creates the config files and systemd units for you, then the `serverbee` CLI handles upgrades, restarts, and config edits. @@ -248,11 +248,11 @@ See the [Agent Setup](/en/docs/agent) guide for a complete systemd service unit ## Reverse Proxy -Running ServerBee behind a reverse proxy is strongly recommended for production. It provides TLS termination, HTTP/2, and additional security headers. +Running ServerBee behind a reverse proxy is strongly recommended in production. The proxy provides TLS termination, HTTP/2, and additional security headers. ### Access URL and Cookie Settings -Choose one public access URL and keep the browser URL, agent `server_url`, and cookie setting aligned: +Pick one public access URL and keep the browser URL, the agent's `server_url`, and the cookie setting aligned: | Scenario | Public URL | `auth.secure_cookie` | Docker Environment Variable | Agent URL | |----------|------------|----------------------|-----------------------------|-----------| @@ -404,6 +404,10 @@ server_url = "https://monitor.example.com" The agent automatically handles WebSocket connections using the provided URL. + +ServerBee does not terminate TLS itself. All HTTPS/WSS encryption is handled by the reverse proxy in front of it (Nginx/Caddy/Traefik). This keeps the server implementation simple and lets you manage certificates in one place. + + When using HTTPS, keep `auth.secure_cookie = true` in your server configuration. Leaving it `false` may still allow login, but it removes the browser's Secure cookie protection and is not appropriate for production. 
@@ -497,13 +501,13 @@ The Docker Compose example above includes a health check configuration. Docker w ### External Monitoring -If you are using an external uptime monitoring service, point it at: +If you use an external uptime monitoring service, point it at: ``` https://monitor.example.com/healthz ``` -This creates a "monitor the monitor" setup, so you know if ServerBee itself goes down. +This creates a "monitor the monitor" setup, so you are alerted if ServerBee itself goes down. ## Resource Requirements @@ -517,7 +521,7 @@ ServerBee is designed for lightweight VPS instances: | Agent CPU | Negligible | < 1% of 1 core | | Agent RAM | 10 MB | 20 MB | -Database size depends on the number of monitored servers and retention settings. As a rough guide, expect about 1 MB per server per day of raw records. With 50 servers and default 7-day retention, the database will be approximately 350 MB. +Database size depends on the number of monitored servers and your retention settings. As a rough guide, expect about 1 MB of raw records per server per day. With 50 servers and the default 7-day retention, the database is roughly 350 MB. ## Security Checklist @@ -533,3 +537,33 @@ Before exposing ServerBee to the internet: - [ ] Keep the GeoIP database updated (if used) - [ ] Set up automated backups - [ ] Monitor the server itself with an external health check + +## Upgrading + +ServerBee runs database migrations automatically on startup, so no manual migration step is needed after an upgrade. Back up the database before upgrading. 
+ +**Docker:** + +```bash +docker compose pull +docker compose up -d # restarts and runs migrations automatically +docker compose ps +``` + +**Binary:** + +```bash +wget https://github.com/ZingerLittleBee/ServerBee/releases/latest/download/serverbee-server-linux-amd64 +sudo systemctl stop serverbee-server +sudo mv serverbee-server-linux-amd64 /usr/local/bin/serverbee-server +sudo chmod +x /usr/local/bin/serverbee-server +sudo systemctl start serverbee-server +``` + +If you deployed with the install script, `sudo serverbee upgrade -y` handles the binary download, replacement, and restart for you. + + + + + + diff --git a/apps/docs/content/docs/zh/deployment.mdx b/apps/docs/content/docs/zh/deployment.mdx index 89bba2a6..ab7d299c 100644 --- a/apps/docs/content/docs/zh/deployment.mdx +++ b/apps/docs/content/docs/zh/deployment.mdx @@ -1,16 +1,16 @@ --- title: 部署指南 -description: 生产环境下 ServerBee 的部署、反向代理、TLS 配置和运维管理。 +description: 生产环境下 ServerBee 服务端和 Agent 的部署策略。 icon: Cloud --- -本文介绍在生产环境中部署 ServerBee 的最佳实践,包括 Railway 一键部署、Docker Compose、systemd 服务、反向代理和运维管理。 +本文介绍在生产环境部署 ServerBee 的最佳实践:Railway、Docker、systemd、反向代理配置、TLS 和备份策略。 ## Railway(一键部署) [![Deploy on Railway](https://railway.com/button.svg)](https://railway.com/deploy/serverbee-server) -最快的部署方式。点击上方按钮,配置以下环境变量: +最快的部署方式。点击上方按钮,然后配置以下环境变量: ```bash SERVERBEE_LOG__LEVEL="info" # 日志级别(trace/debug/info/warn/error) @@ -23,14 +23,14 @@ SERVERBEE_SCHEDULER__TIMEZONE="UTC" # 时区,影响流量按天聚 SERVERBEE_OAUTH__BASE_URL="" # OAuth 回调公网地址(如 https://xxx.up.railway.app) SERVERBEE_OAUTH__GITHUB__CLIENT_ID="" # GitHub OAuth Client ID SERVERBEE_OAUTH__GITHUB__CLIENT_SECRET="" # GitHub OAuth Client Secret -SERVERBEE_OAUTH__ALLOW_REGISTRATION="false" # 首次登录自动创建账号(true=开放注册,false=仅已绑定用户可登录) +SERVERBEE_OAUTH__ALLOW_REGISTRATION="false" # 首次 OAuth 登录自动创建账号(true=开放注册,false=仅已绑定用户可登录) ``` 部署后: -1. 添加 **Volume** 挂载到 `/data` 以持久化数据 -2. 将 Agent 配置连接到 Railway 提供的 URL -3. 
首次启动时,Server 会自动创建管理员账号并随机生成密码。打开 Railway 部署日志,查找醒目的凭据横幅获取该密码。首次登录时你将被要求修改此密码,并可选择一个新的用户名。 +1. 添加 **Volume** 挂载到 `/data`,以在多次部署间持久化数据 +2. 将 Agent 配置为使用 Railway 提供的 URL 进行连接 +3. 首次启动时,服务端会自动创建管理员账号并随机生成密码。打开 Railway 部署日志,查找醒目的凭据横幅即可获取该密码。首次登录时你必须修改此密码,并可选择一个新的用户名。 Railway 会自动分配端口并提供 HTTPS,无需配置 `SERVERBEE_SERVER__LISTEN` 或 TLS 证书。 @@ -38,7 +38,7 @@ Railway 会自动分配端口并提供 HTTPS,无需配置 `SERVERBEE_SERVER__L ### 部署预发版本 -模板默认拉取 `ghcr.io/zingerlittlebee/serverbee-server:latest`,该 tag 只跟随稳定版。预发版(例如 `1.0.0-alpha.1`)会推到 GHCR,但**不会**更新 `:latest`。 +模板默认拉取 `ghcr.io/zingerlittlebee/serverbee-server:latest`,该 tag 只跟随稳定版。预发版(例如 `1.0.0-alpha.1`)会推送到 GHCR,但**不会**更新 `:latest`。 要锁定到指定版本(稳定版或预发版),在 Railway 服务的 **Variables** 里新增一条变量: @@ -49,7 +49,7 @@ SERVERBEE_IMAGE_TAG=1.0.0-alpha.1 保存后触发 Redeploy 即可。回到稳定通道时删除该变量(或改回 `latest`)。 -Dockerfile 通过 `ARG SERVERBEE_IMAGE_TAG=latest` 暴露这个开关,Railway 会自动把 Service Variables 同时作为构建参数和运行时环境变量注入。该变量不在 `SERVERBEE_*` 配置体系中,因此对运行时无副作用。 +Dockerfile 通过 `ARG SERVERBEE_IMAGE_TAG=latest` 暴露这个开关。Railway 会自动把 Service Variables 同时作为构建参数和运行时环境变量注入;该变量不在 `SERVERBEE_*` 配置体系中,因此对运行时无副作用。 ## 引导安装 @@ -66,7 +66,7 @@ curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/depl --enrollment-code YOUR_ONE_TIME_CODE ``` -安装完成后,使用 `serverbee` CLI 管理你的部署(安装时自动部署到 `/usr/local/bin/serverbee`): +安装完成后,使用 `serverbee` CLI 管理你的部署(安装时会自动部署到 `/usr/local/bin/serverbee`): ```bash sudo serverbee status @@ -78,63 +78,26 @@ sudo serverbee uninstall agent -y ``` -`install server` / `install agent` 主要用于首次引导安装。如果 `/usr/local/bin` 中已经存在对应二进制,安装脚本会直接沿用现有文件而不会覆盖。向已部署主机下发新 release 时,请改用 `upgrade`,或手动替换二进制后再重启服务。 +`install server` / `install agent` 是引导命令。如果 `/usr/local/bin` 中已经存在对应二进制,安装脚本会直接沿用现有文件而不会覆盖。向已部署主机下发新 release 二进制时,请改用 `upgrade`(或手动替换二进制)。 ## Docker Compose(推荐) -Docker Compose 是推荐的生产部署方式,简单且易于维护。 +Docker Compose 是在生产环境部署 ServerBee 最简单的方式。创建一个 `docker-compose.yml`: -### 基础部署 - -```yaml title="docker-compose.yml" +```yaml services: serverbee-server: image: 
ghcr.io/zingerlittlebee/serverbee-server:latest container_name: serverbee-server - ports: - - "9527:9527" - volumes: - - serverbee-data:/data restart: unless-stopped - -volumes: - serverbee-data: -``` - -```bash -# 启动 -docker compose up -d - -# 查看日志 -docker compose logs -f serverbee-server - -# 停止 -docker compose down - -# 更新到最新版本 -docker compose pull && docker compose up -d -``` - -### 带完整配置的部署 - -```yaml title="docker-compose.yml" -services: - serverbee-server: - image: ghcr.io/zingerlittlebee/serverbee-server:latest - container_name: serverbee-server ports: - - "127.0.0.1:9527:9527" # 仅监听本地,通过反向代理访问 + - "127.0.0.1:9527:9527" volumes: - serverbee-data:/data - - ./GeoLite2-City.mmdb:/data/GeoLite2-City.mmdb:ro + - ./server.toml:/app/server.toml:ro environment: - - SERVERBEE_SERVER__DATA_DIR=/data - - SERVERBEE_GEOIP__MMDB_PATH=/data/GeoLite2-City.mmdb - - SERVERBEE_RETENTION__RECORDS_DAYS=14 - - SERVERBEE_RETENTION__RECORDS_HOURLY_DAYS=365 - - SERVERBEE_LOG__LEVEL=info - restart: unless-stopped + - SERVERBEE_AUTH__SECURE_COOKIE=true healthcheck: test: ["CMD", "wget", "--spider", "-q", "http://localhost:9527/healthz"] interval: 30s @@ -147,12 +110,24 @@ volumes: ``` -当使用反向代理时,建议将端口绑定到 `127.0.0.1:9527` 而非 `0.0.0.0:9527`,防止绕过反向代理直接访问。 +将端口绑定到 `127.0.0.1:9527` 而非 `0.0.0.0:9527`,可确保服务端只能通过反向代理访问,而无法从公网直接访问。 +启动服务: + +```bash +docker compose up -d +``` + +查看日志: + +```bash +docker compose logs -f serverbee-server +``` + ## 源码编译 -适合开发者或有定制需求的用户。源码编译需要 Rust 1.85+ 和 Bun 1.x(或 Node.js 22+)。 +开发者或有定制需求的用户可以从源码编译 ServerBee。源码编译需要 Rust 1.85+ 和 Bun 1.x(或 Node.js 22+)。 ```bash # 克隆仓库 @@ -165,7 +140,7 @@ bun install bun run build cd ../.. 
-# 构建 Server(通过 rust-embed 内嵌前端静态资源) +# 构建服务端(通过 rust-embed 内嵌前端静态资源) cargo build --release -p serverbee-server # 构建 Agent @@ -175,9 +150,9 @@ cargo build --release -p serverbee-agent 编译产物位于 `target/release/` 目录下: - `serverbee-server` — 服务端,内嵌前端静态资源 -- `serverbee-agent` — Agent 采集端 +- `serverbee-agent` — 指标采集 Agent -启动 Server: +启动服务端: ```bash ./target/release/serverbee-server @@ -185,217 +160,196 @@ cargo build --release -p serverbee-agent 将编译好的二进制部署到目标主机后,可以参考下文的 systemd 服务配置进行进程管理。 -## systemd 服务(生产 Linux 环境) +## systemd 服务 -适用于不使用 Docker 的 Linux 服务器。 +对于不使用 Docker 的部署,可以用 systemd 管理服务端和 Agent 进程。 -如果你不需要手写 unit 文件,更推荐上面的引导安装方式。它会自动生成配置和 systemd unit,后续通过 `serverbee` CLI 统一完成升级、重启和配置修改。 +如果你不需要手写 unit 文件,更推荐上面的引导安装方式。它会为你自动生成配置文件和 systemd unit,后续通过 `serverbee` CLI 统一完成升级、重启和配置修改。 注意:下面的 unit 示例采用自定义路径(`/usr/local/bin`、`/var/lib/serverbee`、`/etc/serverbee`)演示手动部署。引导脚本安装的实际布局不同——二进制在 `/opt/serverbee/bin/`、配置在 `/opt/serverbee/etc/`、数据在 `/opt/serverbee/data/`,且由 `serverbee` CLI 托管,无需手写 unit。 ### Server 服务 -```ini title="/etc/systemd/system/serverbee-server.service" +创建 `/etc/systemd/system/serverbee-server.service`: + +```ini [Unit] -Description=ServerBee Dashboard -After=network.target +Description=ServerBee Server +After=network-online.target +Wants=network-online.target [Service] Type=simple ExecStart=/usr/local/bin/serverbee-server -WorkingDirectory=/var/lib/serverbee Restart=always RestartSec=5 -LimitNOFILE=65536 +User=serverbee +Group=serverbee +WorkingDirectory=/var/lib/serverbee -# 安全加固(可选) +# 安全加固 NoNewPrivileges=true ProtectSystem=strict -ReadWritePaths=/var/lib/serverbee /etc/serverbee +ProtectHome=true +ReadWritePaths=/var/lib/serverbee /var/log/serverbee +PrivateTmp=true + +# 资源限制 +MemoryMax=512M [Install] WantedBy=multi-user.target ``` -### Agent 服务 +创建服务用户和目录: -```ini title="/etc/systemd/system/serverbee-agent.service" -[Unit] -Description=ServerBee Agent -After=network.target +```bash +# 创建系统用户 +sudo useradd -r -s /sbin/nologin -d /var/lib/serverbee 
serverbee -[Service] -Type=simple -ExecStart=/usr/local/bin/serverbee-agent -Restart=always -RestartSec=5 -AmbientCapabilities=CAP_NET_RAW +# 创建目录 +sudo mkdir -p /var/lib/serverbee /var/log/serverbee /etc/serverbee +sudo chown serverbee:serverbee /var/lib/serverbee /var/log/serverbee -[Install] -WantedBy=multi-user.target +# 放置配置文件 +sudo cp server.toml /etc/serverbee/server.toml + +# 放置二进制 +sudo cp serverbee-server /usr/local/bin/ +sudo chmod +x /usr/local/bin/serverbee-server +``` + +systemd 部署使用的生产 `server.toml`: + +```toml +[server] +listen = "127.0.0.1:9527" +data_dir = "/var/lib/serverbee" + +[log] +level = "info" +file = "/var/log/serverbee/server.log" ``` -### 常用操作 +启用并启动: ```bash -# 启用开机自启 sudo systemctl daemon-reload sudo systemctl enable serverbee-server -sudo systemctl enable serverbee-agent - -# 启动/停止/重启 sudo systemctl start serverbee-server -sudo systemctl stop serverbee-server -sudo systemctl restart serverbee-server - -# 查看状态和日志 sudo systemctl status serverbee-server -sudo journalctl -u serverbee-server -f -sudo journalctl -u serverbee-server --since "1 hour ago" ``` -## Nginx 反向代理 +### Agent 服务 + +完整的 Agent systemd 服务 unit 文件请参见 [Agent 配置](/zh/docs/agent) 指南。 + +## 反向代理 + +强烈建议在生产环境中将 ServerBee 部署在反向代理之后。反向代理可以提供 TLS 终止、HTTP/2 以及额外的安全响应头。 ### 访问地址和 Cookie 配置 -先确定一个对外访问地址,并让浏览器地址、Agent `server_url` 和 Cookie 配置保持一致: +先确定一个对外访问地址,并让浏览器地址、Agent 的 `server_url` 和 Cookie 配置保持一致: | 场景 | 对外地址 | `auth.secure_cookie` | Docker 环境变量 | Agent 地址 | |------|----------|----------------------|-----------------|------------| | IP 直连,普通 HTTP | `http://203.0.113.10:9527` | `false` | `SERVERBEE_AUTH__SECURE_COOKIE=false` | `http://203.0.113.10:9527` | | 域名 + HTTPS 反向代理 | `https://monitor.example.com` | `true` | `SERVERBEE_AUTH__SECURE_COOKIE=true` 或不设置该变量 | `https://monitor.example.com` | -使用域名访问时,需要先添加 DNS `A` 或 `AAAA` 记录指向服务器 IP,再用 Nginx、Caddy 或 Traefik 终止 HTTPS,并把请求反向代理到 ServerBee 的 `127.0.0.1:9527`。 +使用域名访问时,需要先添加 DNS `A` 或 `AAAA` 记录指向服务器,再用 Nginx、Caddy 或 Traefik 终止 
HTTPS,并把请求反向代理到 ServerBee 的 `127.0.0.1:9527`。 -如果你是通过快速开始脚本安装的 Server,脚本会为了 HTTP 直连写入 `auth.secure_cookie = false`。迁移到 HTTPS 前,请修改 `/opt/serverbee/etc/server.toml`: +如果你是通过快速开始脚本安装的,脚本会为了 HTTP 直连写入 `auth.secure_cookie = false`。在把同一套安装切换到 HTTPS 前,请修改 `/opt/serverbee/etc/server.toml`: ```toml [auth] secure_cookie = true ``` -然后重启 Server: +然后重启服务端: ```bash sudo systemctl restart serverbee-server ``` -如果 ServerBee 已经通过安装脚本部署,也可以使用内置命令自动完成 Caddy HTTPS 配置: +如果 ServerBee 已经通过安装脚本部署,也可以让 CLI 自动完成 Caddy HTTPS 配置: ```bash sudo serverbee domain setup --domain monitor.example.com --email admin@example.com -y ``` -该命令会校验域名 DNS 是否解析到当前服务器,安装并配置 Caddy,把 ServerBee 改为只监听 `127.0.0.1:9527`,并把 `auth.secure_cookie` 设置为 `true`。如果 DNS 还没生效,命令会停止并打印需要添加的 `A`/`AAAA` 记录。 +该命令会校验域名是否解析到当前服务器,安装并配置 Caddy,把 ServerBee 改为只监听 `127.0.0.1:9527`,并把 `auth.secure_cookie` 设置为 `true`。如果 DNS 还没生效,命令会停止并打印你需要添加的 `A`/`AAAA` 记录。 + +### Nginx + +```nginx +upstream serverbee { + server 127.0.0.1:9527; +} -### 基础配置 +server { + listen 80; + server_name monitor.example.com; + return 301 https://$host$request_uri; +} -```nginx title="/etc/nginx/sites-available/serverbee" server { listen 443 ssl http2; server_name monitor.example.com; - ssl_certificate /etc/letsencrypt/live/monitor.example.com/fullchain.pem; + ssl_certificate /etc/letsencrypt/live/monitor.example.com/fullchain.pem; ssl_certificate_key /etc/letsencrypt/live/monitor.example.com/privkey.pem; + ssl_protocols TLSv1.2 TLSv1.3; + ssl_ciphers HIGH:!aNULL:!MD5; - # 推荐的 SSL 配置 - ssl_protocols TLSv1.2 TLSv1.3; - ssl_ciphers HIGH:!aNULL:!MD5; - ssl_prefer_server_ciphers on; + # 安全响应头 + add_header X-Frame-Options "SAMEORIGIN" always; + add_header X-Content-Type-Options "nosniff" always; + add_header Referrer-Policy "strict-origin-when-cross-origin" always; - # 普通 HTTP 请求 location / { - proxy_pass http://127.0.0.1:9527; + proxy_pass http://serverbee; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For 
$proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Proto $scheme; - } - # 浏览器 WebSocket(实时数据推送) - location /api/ws/ { - proxy_pass http://127.0.0.1:9527; + # WebSocket 支持(Agent、浏览器实时更新和终端均需要) proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "upgrade"; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_read_timeout 86400s; - proxy_send_timeout 86400s; - } - # Agent WebSocket - location /api/agent/ws { - proxy_pass http://127.0.0.1:9527; - proxy_http_version 1.1; - proxy_set_header Upgrade $http_upgrade; - proxy_set_header Connection "upgrade"; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; + # 为长连接 WebSocket 设置较长的超时 proxy_read_timeout 86400s; proxy_send_timeout 86400s; - } -} -# HTTP 重定向到 HTTPS -server { - listen 80; - server_name monitor.example.com; - return 301 https://$host$request_uri; + # 关闭缓冲以支持实时数据 + proxy_buffering off; + } } ``` -```bash -# 启用站点 -sudo ln -s /etc/nginx/sites-available/serverbee /etc/nginx/sites-enabled/ -sudo nginx -t -sudo systemctl reload nginx -``` - - -WebSocket 路径必须正确配置 `Upgrade` 和 `Connection` 头部。`proxy_read_timeout` 建议设置为较大值(如 86400s),防止 Nginx 主动断开 Agent 的长连接。 - +### Caddy -## Caddy 反向代理 +Caddy 会自动使用 Let's Encrypt 处理 TLS: -Caddy 会自动申请和续期 HTTPS 证书,配置最为简洁。 - -```txt title="/etc/caddy/Caddyfile" -monitor.example.com { - reverse_proxy 127.0.0.1:9527 -} ``` - -Caddy 天然支持 WebSocket 代理,无需额外配置。 - -如果需要更精细的控制: - -```txt title="/etc/caddy/Caddyfile" monitor.example.com { - # API 和 WebSocket - handle /api/* { - reverse_proxy 127.0.0.1:9527 - } - - # 静态文件和 SPA - handle { - reverse_proxy 127.0.0.1:9527 - } - - # 访问日志 - log { - output file /var/log/caddy/serverbee.log + reverse_proxy 127.0.0.1:9527 { + # Caddy 默认支持 WebSocket } } ``` -## Traefik 反向代理 +这就是所需的全部 Caddy 配置。Caddy 默认处理 HTTPS 证书、HTTP/2、WebSocket 升级和安全响应头。 -Traefik 通过 Docker labels 自动发现服务,无需单独配置文件。Traefik 天然支持 WebSocket 自动检测,无需额外配置。 +### Traefik -```yaml title="docker-compose.yml" 
+Traefik 通过 Docker labels 集成,无需单独的配置文件。Traefik 会自动检测 WebSocket 连接,无需额外配置。 + +```yaml services: serverbee-server: image: ghcr.io/zingerlittlebee/serverbee-server:latest @@ -419,196 +373,194 @@ volumes: serverbee-data: ``` -## TLS/HTTPS 配置 +## TLS / HTTPS -### 使用 Let's Encrypt(推荐) +生产环境部署请始终使用 HTTPS。推荐的方式如下: -**Caddy**(自动): - -Caddy 默认自动申请和续期 Let's Encrypt 证书,无需手动配置。 - -**Nginx + Certbot**: +### Let's Encrypt + Certbot(Nginx) ```bash -# 安装 Certbot sudo apt install certbot python3-certbot-nginx - -# 申请证书 sudo certbot --nginx -d monitor.example.com - -# 自动续期(Certbot 默认已设置 systemd timer) -sudo certbot renew --dry-run ``` -### Agent 连接配置 +Certbot 会自动配置 nginx 并设置证书续期。 + +### Let's Encrypt + Caddy -配置 HTTPS 后,Agent 需要使用 `https://` 协议连接: +Caddy 会自动申请和续期证书,无需额外设置。 -```toml title="agent.toml" +### 手动证书 + +如果你有自己的证书,请把它们放在一个安全目录中,并在反向代理配置里引用。确保证书文件可被代理进程读取,并且你已经准备好续期机制。 + +## Agent HTTPS 连接配置 + +当服务端部署在 HTTPS 之后时,需要相应地更新 Agent 的 `server_url`: + +```toml server_url = "https://monitor.example.com" ``` +Agent 会自动根据提供的 URL 处理 WebSocket 连接。 + -ServerBee 自身不处理 TLS 终止。所有 HTTPS/WSS 加密由前置的反向代理(Nginx/Caddy)处理。这种架构简化了 Server 的实现,同时也便于统一管理证书。 +ServerBee 自身不处理 TLS 终止。所有 HTTPS/WSS 加密都由前置的反向代理(Nginx/Caddy/Traefik)处理。这种架构简化了服务端的实现,也便于统一管理证书。 -使用 HTTPS 时,请保持 `auth.secure_cookie = true`。设为 `false` 可能仍然可以登录,但会去掉浏览器的 Secure Cookie 保护,不适合生产环境。 +使用 HTTPS 时,请在服务端配置中保持 `auth.secure_cookie = true`。设为 `false` 可能仍然可以登录,但会去掉浏览器的 Secure Cookie 保护,不适合生产环境。 ## 备份与恢复 -### 备份 +### 备份哪些内容 -ServerBee 的所有数据都存储在 SQLite 数据库文件中,备份只需要复制数据目录。 +ServerBee 把所有数据都存储在单个 SQLite 数据库文件和配置文件中: -**Docker 环境:** +| 项目 | 位置 | 说明 | +|------|------|------| +| 数据库 | `{data_dir}/serverbee.db` | 所有监控数据、用户、告警和设置 | +| WAL 文件 | `{data_dir}/serverbee.db-wal` | 预写日志(运行时备份需一并包含) | +| SHM 文件 | `{data_dir}/serverbee.db-shm` | 共享内存文件(运行时备份需一并包含) | +| 服务端配置 | `/opt/serverbee/etc/server.toml` | 服务端配置(安装脚本默认路径) | +| Agent 配置 | `/opt/serverbee/etc/agent.toml` | Agent 配置(位于每台被监控的服务器上) | +| GeoIP 数据库 | 视情况而定 | MaxMind MMDB 文件(如启用) | -```bash -# 
方式一:直接复制 volume 数据 -docker compose stop -docker cp serverbee-server:/data /path/to/backup/serverbee-data-$(date +%Y%m%d) -docker compose start +### 备份策略 + +安装脚本默认的数据目录是 `/opt/serverbee/data`(数据库为 `/opt/serverbee/data/serverbee.db`)。若你是手写 systemd unit 的自定义部署,请替换为你配置的 `data_dir`。 -# 方式二:使用 SQLite 在线备份(无需停机) -docker compose exec serverbee-server sqlite3 /data/serverbee.db ".backup '/data/backup.db'" -docker cp serverbee-server:/data/backup.db /path/to/backup/ +**方式一:SQLite 备份命令(推荐,适用于运行中的服务端)** + +```bash +sqlite3 /opt/serverbee/data/serverbee.db ".backup '/backups/serverbee-$(date +%Y%m%d).db'" ``` -**systemd 环境:** +无需停机即可生成一致的时间点备份。 -引导脚本安装的数据目录是 `/opt/serverbee/data`(数据库为 `/opt/serverbee/data/serverbee.db`)。若你是手写 systemd unit 的自定义部署,请替换为你在配置中设置的 `data_dir`。停机操作也可用 `sudo serverbee restart` 代替下面的 `systemctl`。 +**方式二:文件复制(需要停止服务端)** ```bash -# 方式一:停机备份 sudo systemctl stop serverbee-server -cp -r /opt/serverbee/data /path/to/backup/serverbee-data-$(date +%Y%m%d) +cp /opt/serverbee/data/serverbee.db /backups/serverbee-$(date +%Y%m%d).db sudo systemctl start serverbee-server - -# 方式二:SQLite 在线备份(推荐,无需停机) -sqlite3 /opt/serverbee/data/serverbee.db ".backup '/path/to/backup/serverbee-$(date +%Y%m%d).db'" ``` -**定时备份脚本示例:** - -```bash title="/usr/local/bin/backup-serverbee.sh" -#!/bin/bash -BACKUP_DIR="/path/to/backups" -DB_PATH="/opt/serverbee/data/serverbee.db" -DATE=$(date +%Y%m%d_%H%M%S) - -# 创建备份 -sqlite3 "$DB_PATH" ".backup '${BACKUP_DIR}/serverbee-${DATE}.db'" - -# 保留最近 7 天的备份 -find "$BACKUP_DIR" -name "serverbee-*.db" -mtime +7 -delete - -echo "Backup completed: serverbee-${DATE}.db" -``` +**方式三:Docker volume 备份** ```bash -# 添加 crontab,每天凌晨 3 点备份 -echo "0 3 * * * /usr/local/bin/backup-serverbee.sh" | sudo tee -a /var/spool/cron/crontabs/root +docker compose stop +docker run --rm \ + -v serverbee-data:/data \ + -v $(pwd)/backups:/backups \ + alpine tar czf /backups/serverbee-$(date +%Y%m%d).tar.gz -C /data . +docker compose start ``` ### 恢复 +从备份恢复: + +1. 停止服务端 +2. 
用备份替换数据库文件 +3. 启动服务端(如有需要,迁移会自动运行) + ```bash -# 停止服务 sudo systemctl stop serverbee-server +cp /backups/serverbee-20260314.db /opt/serverbee/data/serverbee.db +# 以独立用户运行的自定义加固部署:还需重新设置属主,例如 +# sudo chown serverbee:serverbee /opt/serverbee/data/serverbee.db +sudo systemctl start serverbee-server +``` -# 恢复数据库 -cp /path/to/backup/serverbee-20260314.db /opt/serverbee/data/serverbee.db +### 自动备份 -# 启动服务 -sudo systemctl start serverbee-server +设置一个 cron 任务来执行每日备份: + +```bash +# /etc/cron.d/serverbee-backup +0 2 * * * root sqlite3 /opt/serverbee/data/serverbee.db ".backup '/backups/serverbee-$(date +\%Y\%m\%d).db'" && find /backups -name 'serverbee-*.db' -mtime +30 -delete ``` - -恢复备份会覆盖备份时间点之后的所有数据变更。建议在恢复前先备份当前数据库。 - +它会在每天凌晨 2:00 运行,并删除超过 30 天的备份。 ## 健康检查 -### HTTP 健康检查端点 - -ServerBee 提供健康检查接口: +ServerBee 提供一个健康检查端点: -```bash -curl -s http://localhost:9527/healthz ``` +GET /healthz +``` + +用它来确认服务端是否正在运行且响应正常。可以在你的监控系统、Docker 健康检查或负载均衡器中配置它。 ### Docker 健康检查 -在 `docker-compose.yml` 中配置: +上面的 Docker Compose 示例已包含健康检查配置。当健康检查连续 3 次失败时,Docker 会自动重启容器。 -```yaml -healthcheck: - test: ["CMD", "wget", "--spider", "-q", "http://localhost:9527/healthz"] - interval: 30s - timeout: 5s - retries: 3 - start_period: 10s +### 外部监控 + +如果你使用外部的可用性监控服务,请把它指向: + +``` +https://monitor.example.com/healthz ``` -### 外部监控 +这样就形成了「监控监控系统」的配置,当 ServerBee 自身宕机时你也会收到告警。 + +## 资源需求 -建议使用外部监控工具监控 ServerBee 自身的可用性: +ServerBee 面向轻量级 VPS 实例设计: -- 监控 HTTP 端口 `9527` 的可达性 -- 监控 `/healthz` 端点的响应 -- 监控 WebSocket 连接的可用性 +| 组件 | 最低配置 | 推荐配置 | +|------|---------|---------| +| 服务端 CPU | 1 vCPU | 2 vCPU | +| 服务端内存 | 128 MB | 256 MB | +| 服务端磁盘 | 100 MB + 数据 | 1 GB+(取决于保留策略) | +| Agent CPU | 可忽略 | < 单核的 1% | +| Agent 内存 | 10 MB | 20 MB | -## 性能预估 +数据库大小取决于被监控服务器的数量和你的保留设置。粗略估计,每台服务器每天约产生 1 MB 原始记录。以 50 台服务器、默认 7 天保留为例,数据库约为 350 MB。 -以 1000 台 Agent 为例的资源消耗预估: +## 安全检查清单 -| 资源 | 预估值 | -|------|--------| -| Server 内存 | 50-100 MB | -| Agent 内存 | 5-15 MB(每台) | -| SQLite 写入 | 约 1000 行/分钟 | -| WebSocket 带宽 | 约 160 KB/s | 
-| 磁盘占用(30 天) | 约 8 GB | +将 ServerBee 暴露到公网前,请逐项确认: -SQLite 的 WAL 模式可以轻松承载这个级别的写入量。对于绝大多数 VPS 监控场景,单实例 ServerBee 即可满足需求。 +- [ ] 修改默认管理员密码 +- [ ] 使用 HTTPS 和有效的 TLS 证书 +- [ ] 设置 `auth.secure_cookie = true`(默认值) +- [ ] 将服务端绑定到 localhost,并通过反向代理对外暴露 +- [ ] 为登录尝试设置严格的速率限制 +- [ ] 为管理员账号启用 TOTP 两步验证 +- [ ] 仅在接入 Agent 时签发注册码,并保持其短时有效 +- [ ] 保持 GeoIP 数据库更新(如启用) +- [ ] 配置自动备份 +- [ ] 用外部健康检查监控服务端自身 ## 升级指南 -### Docker 升级 +ServerBee 在启动时会自动运行数据库迁移,升级后无需手动执行任何迁移步骤。升级前请先备份数据库。 + +**Docker:** ```bash -# 拉取最新镜像 docker compose pull - -# 重启服务(自动运行数据库迁移) -docker compose up -d - -# 确认运行状态 +docker compose up -d # 重启并自动运行迁移 docker compose ps -docker compose logs --tail 50 serverbee-server ``` -### 二进制升级 +**二进制:** ```bash -# 下载新版本 wget https://github.com/ZingerLittleBee/ServerBee/releases/latest/download/serverbee-server-linux-amd64 - -# 停止服务 sudo systemctl stop serverbee-server - -# 替换二进制 sudo mv serverbee-server-linux-amd64 /usr/local/bin/serverbee-server sudo chmod +x /usr/local/bin/serverbee-server - -# 启动服务(自动运行数据库迁移) sudo systemctl start serverbee-server ``` - -ServerBee 在启动时会自动运行数据库迁移,升级后无需手动执行任何迁移操作。建议在升级前进行数据库备份。 - +如果你使用安装脚本部署,`sudo serverbee upgrade -y` 会自动完成二进制下载、替换和重启。 From 818cfce653214ee28e6f67ffefcee282b102f891 Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 18:34:24 +0800 Subject: [PATCH 17/21] docs(agent): expand install, registration and metrics reference --- apps/docs/content/docs/en/agent.mdx | 96 ++++++++---- apps/docs/content/docs/zh/agent.mdx | 222 ++++++++++++++++------------ 2 files changed, 194 insertions(+), 124 deletions(-) diff --git a/apps/docs/content/docs/en/agent.mdx b/apps/docs/content/docs/en/agent.mdx index 5ed81f19..19e81609 100644 --- a/apps/docs/content/docs/en/agent.mdx +++ b/apps/docs/content/docs/en/agent.mdx @@ -4,7 +4,7 @@ description: Install and configure the ServerBee agent on your monitored servers icon: Cpu --- -The ServerBee agent is a lightweight Rust binary that runs on each server you want to 
monitor. It collects system metrics (CPU, memory, disk, network, load, temperature, GPU, disk I/O) and reports them to the central ServerBee server over a persistent WebSocket connection. +The ServerBee agent is a lightweight Rust binary that runs on each server you want to monitor. It collects system metrics (CPU, memory, disk, network, load, temperature, GPU, disk I/O) and reports them to the central ServerBee server over a persistent WebSocket connection. It also executes probe tasks and remote commands dispatched by the server. ## What the Agent Does @@ -50,13 +50,25 @@ If the agent is already installed, re-running `install agent` errors out and tel Download the agent binary for your platform from the [releases page](https://github.com/ZingerLittleBee/ServerBee/releases): +| Platform | File name | +|----------|-----------| +| Linux amd64 | `serverbee-agent-linux-amd64` | +| Linux arm64 | `serverbee-agent-linux-arm64` | +| macOS amd64 | `serverbee-agent-darwin-amd64` | +| macOS arm64 | `serverbee-agent-darwin-arm64` | +| Windows amd64 | `serverbee-agent-windows-amd64.exe` | + ```bash -chmod +x serverbee-agent +wget https://github.com/ZingerLittleBee/ServerBee/releases/latest/download/serverbee-agent-linux-amd64 +chmod +x serverbee-agent-linux-amd64 +sudo mv serverbee-agent-linux-amd64 /usr/local/bin/serverbee-agent ``` ### Build from Source ```bash +git clone https://github.com/ZingerLittleBee/ServerBee.git +cd ServerBee cargo build --release -p serverbee-agent # With NVIDIA GPU monitoring (optional) @@ -103,24 +115,36 @@ Mount `/etc/machine-id` from the host to keep the agent fingerprint stable acros ## Registration Flow -Agents authenticate with the server using a **token**. There are two ways to obtain a token: +Agents authenticate with the server using a **token**. There are two ways to obtain a token: a one-time enrollment code (recommended) or a manually created token. ### Enrollment Code Registration (Recommended) -1. 
Sign in to the web UI as an admin, open **Settings**, and generate a one-time **enrollment code** (also available via `POST /api/agent/enrollments`, admin-only). The code is **single-use** and **short-lived** (default 10 minute expiry). -2. Configure the agent with the code: +1. Sign in to the web UI as an admin, open **Settings**, and generate a one-time **enrollment code** (also available via `POST /api/agent/enrollments`, admin-only). The code is **single-use** and **short-lived** (default 10 minute expiry), and a fresh code is needed for each new agent. +2. Configure the agent with the code, either via environment variables: -```toml +```bash +SERVERBEE_SERVER_URL=http://your-server-ip:9527 \ +SERVERBEE_ENROLLMENT_CODE=YOUR_ONE_TIME_CODE \ +serverbee-agent +``` + +or in the config file: + +```toml title="/etc/serverbee/agent.toml" server_url = "http://your-server-ip:9527" enrollment_code = "" +# Leave empty on first run; auto-populated after registration +token = "" ``` -3. Start the agent. It will: +3. Start the agent. On first run (no token), it will: - Send a registration request to `POST /api/agent/register` presenting the one-time enrollment code - Receive a `server_id` and per-server `token` from the server (the code is consumed on this first successful registration) - Save the token to the config file automatically - Connect via WebSocket using the token for all future sessions -- the enrollment code is no longer needed +On subsequent runs (token present), the agent connects directly over WebSocket, sends its static system info, and reports metrics on the configured interval. + When the agent can read a stable machine identifier, it also sends a fingerprint during registration. Repeated registration from the same machine reuses the existing server row instead of creating duplicate placeholders. If a code is lost, expired, or already used, the server responds with HTTP 401 and the agent logs `Registration failed: HTTP 401 ... enrollment code ... 
expired or already used`; mint a fresh code in Settings to retry. ### Correcting a Wrong Enrollment Code @@ -179,7 +203,7 @@ token = "" enrollment_code = "" [collector] -interval = 3 # Metric collection interval in seconds +interval = 3 # Metric collection interval in seconds (can be overridden by the server's Welcome message) enable_gpu = false # Enable NVIDIA GPU monitoring (requires nvidia-smi) enable_temperature = true # Enable temperature sensor monitoring @@ -188,9 +212,20 @@ level = "info" # Log level: trace, debug, info, warn, error file = "" # Log file path (empty = stdout only) ``` +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `server_url` | string | required | URL of your ServerBee server | +| `enrollment_code` | string | `""` | One-time enrollment code, needed only for first registration; consumed on success and unused once a token is present | +| `token` | string | auto-generated | Agent token, written automatically after registration | +| `collector.interval` | int | `3` | Metric collection interval in seconds; can be overridden by the server's Welcome message | +| `collector.enable_gpu` | bool | `false` | Enable GPU metric collection | +| `collector.enable_temperature` | bool | `true` | Enable temperature collection | +| `log.level` | string | `"info"` | Log level | +| `log.file` | string | `""` | Log file path (empty = stdout only) | + ### Environment Variables -Like the server, all options support `SERVERBEE_` prefixed environment variables: +Like the server, all options support `SERVERBEE_` prefixed environment variables. Use `__` (double underscore) as the nested-field separator: ```bash export SERVERBEE_SERVER_URL="http://your-server-ip:9527" @@ -247,11 +282,9 @@ Only NVIDIA GPUs are supported (via the `nvml-wrapper` library). AMD and Intel G ## Running as a Systemd Service -For production deployments, run the agent as a systemd service so it starts automatically on boot. 
+For production deployments, run the agent as a systemd service so it starts automatically on boot. The install script creates this service automatically; to configure it manually, create `/etc/systemd/system/serverbee-agent.service`: -Create `/etc/systemd/system/serverbee-agent.service`: - -```ini +```ini title="/etc/systemd/system/serverbee-agent.service" [Unit] Description=ServerBee Agent After=network-online.target @@ -264,6 +297,7 @@ Restart=always RestartSec=5 User=root WorkingDirectory=/etc/serverbee +AmbientCapabilities=CAP_NET_RAW # Optional: limit resource usage MemoryMax=128M @@ -273,6 +307,8 @@ CPUQuota=10% WantedBy=multi-user.target ``` +`AmbientCapabilities=CAP_NET_RAW` grants the privilege required for ICMP ping probes; remove the line if you do not need ICMP probing. + Then enable and start the service: ```bash @@ -281,7 +317,7 @@ sudo systemctl enable serverbee-agent sudo systemctl start serverbee-agent ``` -Check the status: +Check the status and logs: ```bash sudo systemctl status serverbee-agent @@ -344,23 +380,21 @@ During the initial connection: ## Collected Metrics -| Metric | Type | Description | -|--------|------|-------------| -| `cpu` | float | CPU usage percentage (0-100) | -| `mem_used` | int | Used memory in bytes | -| `swap_used` | int | Used swap in bytes | -| `disk_used` | int | Used disk space in bytes | -| `net_in_speed` | int | Network inbound speed (bytes/sec) | -| `net_out_speed` | int | Network outbound speed (bytes/sec) | -| `net_in_transfer` | int | Cumulative inbound transfer (bytes) | -| `net_out_transfer` | int | Cumulative outbound transfer (bytes) | -| `load1` / `load5` / `load15` | float | System load averages | -| `tcp_conn` | int | Active TCP connections | -| `udp_conn` | int | Active UDP connections | -| `process_count` | int | Running process count | -| `uptime` | int | System uptime in seconds | -| `temperature` | float | CPU temperature (optional) | -| `gpu` | object | GPU metrics per device (optional) | +The 
agent collects the following metrics (sourced from the `sysinfo` library, `/proc` on Linux, and `nvml-wrapper` for GPU) and reports them to the server: + +| Category | Reported fields | Source | +|----------|-----------------|--------| +| CPU | `cpu` (usage %), name, cores, arch | `sysinfo::System` | +| Memory | `mem_used`, `swap_used` (bytes) | `sysinfo::System` | +| Disk | `disk_used` (bytes) | `sysinfo::Disks` | +| Network | `net_in_speed` / `net_out_speed`, `net_in_transfer` / `net_out_transfer` (bytes) | `sysinfo::Networks` + delta calc | +| Load | `load1` / `load5` / `load15` | `sysinfo::System::load_average()` | +| Connections | `tcp_conn` / `udp_conn` | `/proc/net/tcp` (Linux) | +| Process | `process_count` | `sysinfo::System::processes()` | +| Uptime | `uptime` (seconds) | `sysinfo::System` | +| Temperature | `temperature` (°C, optional) | `sysinfo::Components` | +| GPU | `gpu` (utilization, VRAM, temperature, optional) | `nvml-wrapper` | +| Virtualization | virtualization type | `systemd-detect-virt` / DMI | ## Resource Footprint diff --git a/apps/docs/content/docs/zh/agent.mdx b/apps/docs/content/docs/zh/agent.mdx index 0df53f86..310aff10 100644 --- a/apps/docs/content/docs/zh/agent.mdx +++ b/apps/docs/content/docs/zh/agent.mdx @@ -4,30 +4,33 @@ description: ServerBee Agent 的安装、注册和配置指南。 icon: Cpu --- -Agent 是部署在被监控服务器上的轻量级数据采集程序。它负责采集系统指标(CPU、内存、磁盘、网络等),通过 WebSocket 实时上报至 Server,并执行 Server 下发的探测任务和远程命令。 +Agent 是部署在被监控服务器上的轻量级 Rust 二进制程序,运行在每一台你希望监控的服务器上。它采集系统指标(CPU、内存、磁盘、网络、负载、温度、GPU、磁盘 I/O),通过持久 WebSocket 连接实时上报至中心 Server,同时执行 Server 下发的探测任务和远程命令。 ## Agent 的职责 -- **指标采集**:每 3 秒采集一次系统指标并上报至 Server,全平台支持磁盘 I/O 吞吐量采集 -- **静态信息上报**:启动时上报 CPU 型号、操作系统、内存总量等静态信息 -- **Ping 探测**:执行 Server 下发的 ICMP/TCP/HTTP 探测任务 -- **远程命令**:执行 Server 下发的 Shell 命令并返回结果 -- **Web 终端**:提供 PTY 终端会话供管理员远程操作 -- **文件管理**:远程文件浏览、读写、上传/下载,支持路径沙箱安全机制 -- **Docker 监控**:当 Docker daemon 可用时,采集容器统计数据、日志流、事件、网络和卷信息 -- **自动重连**:与 Server 断开连接后自动重连(指数退避 + 随机抖动) +- 每 3 秒(可配置)采集一次系统指标并上报至 
Server,全平台支持磁盘 I/O 吞吐量采集 +- 通过 WebSocket 将指标上报至 Server +- 执行 Server 下发的 Ping 探测任务(ICMP、TCP、HTTP) +- 提供 PTY Shell 终端会话供 Web 终端远程操作 +- 执行 Server 下发的远程命令 +- 管理远程文件操作(浏览、读取、写入、上传、下载),内置路径沙箱安全机制 +- 当 Docker daemon 可用时监控 Docker 容器(统计、日志、事件、网络、卷) +- 支持在 Server 推送新版本时自动自升级 +- 连接断开后以指数退避自动重连 ## 安装方式 ### 安装脚本(推荐) +安装脚本会自动检测架构、下载二进制、生成配置并注册 systemd 服务: + ```bash curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/deploy/install.sh | sudo bash -s -- agent \ --server-url http://your-server-ip:9527 \ --enrollment-code YOUR_ONE_TIME_CODE ``` -脚本会自动检测架构、下载二进制、生成配置并注册 systemd 服务。安装布局:二进制在 `/opt/serverbee/bin/`,配置在 `/opt/serverbee/etc/agent.toml`,管理 CLI 软链为 `/usr/local/bin/serverbee`。 +安装布局:二进制在 `/opt/serverbee/bin/`,配置在 `/opt/serverbee/etc/agent.toml`,管理 CLI 软链为 `/usr/local/bin/serverbee`。 安装完成后,使用 `serverbee` CLI 管理 Agent(安装时自动部署): @@ -61,10 +64,23 @@ chmod +x serverbee-agent-linux-amd64 sudo mv serverbee-agent-linux-amd64 /usr/local/bin/serverbee-agent ``` +### 源码编译 + +```bash +git clone https://github.com/ZingerLittleBee/ServerBee.git +cd ServerBee +cargo build --release -p serverbee-agent + +# 启用 NVIDIA GPU 监控(可选) +cargo build --release -p serverbee-agent --features gpu +``` + +二进制文件位于 `target/release/serverbee-agent`。 + ### Docker(不推荐) -ServerBee Agent 是绿色软件——只有一个二进制文件,不会产生文件夹和其他文件残留。卸载只需删除二进制文件和配置文件。推荐直接下载二进制文件运行。 +ServerBee Agent 是绿色软件——只有一个二进制文件,不会产生文件夹和其他文件残留。卸载只需删除二进制文件和配置文件。推荐直接下载二进制文件运行,体验最佳。 如果仍要使用 Docker,Agent 需要特权权限才能采集宿主机指标: @@ -97,28 +113,14 @@ docker run -d \ - 温度和 GPU 监控在容器内可能无法工作 - Web 终端功能访问的是容器内环境,而非宿主机 -### 源码编译 - -```bash -git clone https://github.com/ZingerLittleBee/ServerBee.git -cd ServerBee -cargo build --release -p serverbee-agent - -# 启用 GPU 监控(可选) -cargo build --release -p serverbee-agent --features gpu -``` - ## 注册流程 -Agent 首次连接 Server 需要通过一次性注册码(Enrollment Code)进行注册。 - -### 第一步:生成一次性注册码 - -以管理员身份登录 Server 管理面板,进入「设置」页面生成一个一次性注册码并复制(也可通过 `POST /api/agent/enrollments` 生成,仅管理员可用)。注册码为**单次使用**且**短时有效**(默认 10 分钟过期),每接入一台新 Agent 
都需重新生成。 +Agent 通过 **Token** 向 Server 认证。获取 Token 有两种方式:一次性注册码(推荐)或手动创建的 Token。 -### 第二步:首次启动注册 +### 通过注册码注册(推荐) -**方式一:通过环境变量注册** +1. 以管理员身份登录 Server 管理面板,进入「设置」页面生成一个一次性**注册码**(也可通过 `POST /api/agent/enrollments` 生成,仅管理员可用)。注册码为**单次使用**且**短时有效**(默认 10 分钟过期),每接入一台新 Agent 都需重新生成。 +2. 用注册码配置 Agent,可通过环境变量: ```bash SERVERBEE_SERVER_URL=http://your-server-ip:9527 \ @@ -126,7 +128,7 @@ SERVERBEE_ENROLLMENT_CODE=YOUR_ONE_TIME_CODE \ serverbee-agent ``` -**方式二:编辑配置文件后启动** +也可写入配置文件: ```toml title="/etc/serverbee/agent.toml" server_url = "http://your-server-ip:9527" @@ -135,27 +137,15 @@ enrollment_code = "<设置页生成的一次性注册码>" token = "" ``` -```bash -SERVERBEE_ENROLLMENT_CODE=YOUR_ONE_TIME_CODE serverbee-agent -``` - -### 注册过程 +3. 启动 Agent。首次运行(无 token)时,它会: + - 向 Server 发送注册请求 `POST /api/agent/register`,携带一次性注册码 + - 从 Server 接收 `server_id` 和每服务器的 `token`(注册码在此次首次注册成功时被消费) + - 将 token 自动写回配置文件 + - 后续所有会话都使用 token 通过 WebSocket 连接——注册码不再需要 -``` -首次运行(无 token): - 1. 读取 server_url 和 enrollment_code - 2. 向 Server 发送注册请求 POST /api/agent/register(携带一次性注册码) - 3. Server 校验并消费注册码,返回 { server_id, token } - 4. Agent 将 token 写回配置文件 - 5. 开始正常的 WebSocket 连接和指标上报 - -后续运行(有 token): - 1. 直接使用 token 建立 WebSocket 连接 - 2. 发送静态系统信息 - 3. 按配置的间隔周期上报指标 -``` +后续运行(已有 token)时,Agent 直接通过 WebSocket 连接,发送静态系统信息,并按配置的间隔周期上报指标。 -当 Agent 能读取稳定的机器标识时,还会在注册请求中携带指纹。相同机器重复注册时会复用原有服务器记录并轮换 token,而不是继续创建新的占位条目。 +当 Agent 能读取稳定的机器标识时,还会在注册请求中携带指纹。相同机器重复注册时会复用原有服务器记录,而不是继续创建重复的占位条目。如果注册码丢失、过期或已被使用,Server 会返回 HTTP 401,Agent 日志中会出现 `Registration failed: HTTP 401 ... enrollment code ... expired or already used`;到「设置」页重新生成一个新码即可重试。 ### 更正错误的注册码 @@ -190,40 +180,63 @@ token = "从管理面板获取的 Agent Token" ## 配置文件 -手动运行的 Agent 默认读取 `/etc/serverbee/agent.toml`。通过安装脚本部署时,配置文件位于 `/opt/serverbee/etc/agent.toml`(`/etc/serverbee` 为旧版布局,脚本会自动迁移)。 +Agent 按以下顺序读取 TOML 配置文件: -```toml title="/etc/serverbee/agent.toml" +1. `/etc/serverbee/agent.toml`(系统级,优先) +2. `agent.toml`(工作目录) +3. 
带 `SERVERBEE_` 前缀的环境变量 + + +通过安装脚本部署时,配置文件位于 `/opt/serverbee/etc/agent.toml`(`/etc/serverbee` 为旧版布局,脚本会自动迁移)。 + + +以下是包含全部可用选项的完整 `agent.toml`: + +```toml +# 必填:ServerBee Server 的地址 server_url = "http://your-server-ip:9527" -token = "auto-generated-after-registration" + +# 认证 Token(注册成功后自动写入) +token = "" + +# 首次注册用的一次性注册码(仅在 token 为空时使用) +enrollment_code = "" [collector] -interval = 3 -enable_gpu = false -enable_temperature = true +interval = 3 # 指标采集间隔,单位秒(可被 Server 的 Welcome 消息覆盖) +enable_gpu = false # 启用 NVIDIA GPU 监控(需要 nvidia-smi) +enable_temperature = true # 启用温度传感器监控 [log] -level = "info" -file = "/var/log/serverbee-agent.log" +level = "info" # 日志级别:trace、debug、info、warn、error +file = "" # 日志文件路径(留空仅输出到 stdout) ``` -### 配置项说明 - | 配置项 | 类型 | 默认值 | 说明 | |--------|------|--------|------| -| `server_url` | string | 必填 | Server 的地址 | +| `server_url` | string | 必填 | ServerBee Server 的地址 | | `enrollment_code` | string | `""` | 一次性注册码,仅首次注册时需要;注册成功后即被消费,拥有 token 后无需再填 | | `token` | string | 自动生成 | Agent Token,注册成功后自动写入 | -| `collector.interval` | int | `3` | 指标采集间隔,单位秒。可被 Server 的 Welcome 消息覆盖 | +| `collector.interval` | int | `3` | 指标采集间隔,单位秒;可被 Server 的 Welcome 消息覆盖 | | `collector.enable_gpu` | bool | `false` | 是否启用 GPU 指标采集 | | `collector.enable_temperature` | bool | `true` | 是否启用温度采集 | | `log.level` | string | `"info"` | 日志级别 | -| `log.file` | string | `""` | 日志文件路径,留空输出到 stdout | +| `log.file` | string | `""` | 日志文件路径(留空仅输出到 stdout) | + +### 环境变量 -环境变量覆盖:使用 `SERVERBEE_` 前缀,嵌套字段用 `__` 分隔。如 `SERVERBEE_SERVER_URL`(顶层)、`SERVERBEE_COLLECTOR__INTERVAL`(嵌套)、`SERVERBEE_LOG__LEVEL`(嵌套)。 +与 Server 一样,所有选项都支持 `SERVERBEE_` 前缀的环境变量,嵌套字段用 `__`(双下划线)分隔: + +```bash +export SERVERBEE_SERVER_URL="http://your-server-ip:9527" +export SERVERBEE_TOKEN="your-agent-token" +export SERVERBEE_COLLECTOR__INTERVAL=5 +export SERVERBEE_COLLECTOR__ENABLE_GPU=true +``` ## Agent 本地功能锁定 -除了 Server 端配置外,Agent 还支持通过 CLI 参数设置本地能力上限: +除了 Server 端配置外,Agent 还支持通过 CLI 参数收紧本地能力上限,无需改动 Server 端配置: 
```bash serverbee-agent --allow-cap terminal --allow-cap exec @@ -238,16 +251,20 @@ serverbee-agent --deny-cap ping_http ## GPU 监控 -ServerBee 支持 NVIDIA GPU 指标采集,需要满足以下条件: +NVIDIA GPU 指标采集默认关闭,需同时满足以下三个条件: -1. **编译时**:启用 `gpu` feature flag +1. **编译时**:用 `gpu` feature flag 编译 Agent(预编译的 Release 二进制不含该特性) ```bash cargo build --release -p serverbee-agent --features gpu ``` -2. **运行时**:服务器安装了 NVIDIA 驱动和 NVML 库 -3. **配置中**:设置 `collector.enable_gpu = true` +2. **运行时**:宿主机安装了 NVIDIA 驱动和 NVML 库 +3. **配置中**:设置 `enable_gpu = true` + ```toml + [collector] + enable_gpu = true + ``` -采集的 GPU 指标包括: +启用后,Agent 会为每块设备采集 GPU 指标: | 指标 | 说明 | |------|------| @@ -257,47 +274,58 @@ ServerBee 支持 NVIDIA GPU 指标采集,需要满足以下条件: | GPU 利用率 | GPU 计算核心利用率百分比 | | GPU 温度 | 当前温度 | -每块 GPU 独立记录,支持多 GPU 服务器。 +这些指标会显示在 Server 管理面板中,并可用于告警规则。 -目前仅支持 NVIDIA GPU(通过 nvml-wrapper 库)。AMD 和 Intel GPU 的支持计划在后续版本中加入。 +目前仅支持 NVIDIA GPU(通过 `nvml-wrapper` 库)。AMD 和 Intel GPU 的支持计划在后续版本中加入。 ## 作为 systemd 服务运行 -安装脚本会自动创建 systemd 服务。如需手动配置: +生产环境建议将 Agent 作为 systemd 服务运行,以便开机自启。安装脚本会自动创建该服务;如需手动配置,创建 `/etc/systemd/system/serverbee-agent.service`: ```ini title="/etc/systemd/system/serverbee-agent.service" [Unit] Description=ServerBee Agent -After=network.target +After=network-online.target +Wants=network-online.target [Service] Type=simple ExecStart=/usr/local/bin/serverbee-agent Restart=always RestartSec=5 +User=root +WorkingDirectory=/etc/serverbee AmbientCapabilities=CAP_NET_RAW +# 可选:限制资源占用 +MemoryMax=128M +CPUQuota=10% + [Install] WantedBy=multi-user.target ``` +`AmbientCapabilities=CAP_NET_RAW` 是 ICMP Ping 探测所需的权限;不需要 ICMP 探测可移除此行。 + +然后启用并启动服务: + ```bash -# 启用并启动服务 sudo systemctl daemon-reload sudo systemctl enable serverbee-agent sudo systemctl start serverbee-agent +``` -# 查看运行状态 -sudo systemctl status serverbee-agent +查看运行状态和日志: -# 查看日志 -sudo journalctl -u serverbee-agent -f +```bash +sudo systemctl status serverbee-agent +journalctl -u serverbee-agent -f ``` -`AmbientCapabilities=CAP_NET_RAW` 是 ICMP Ping 
探测所需的权限。如果不需要 ICMP 探测功能,可以移除此行。 +Agent 需以 root 运行才能访问全部系统指标(温度传感器、进程列表等)并为 Web 终端打开 PTY 会话。如果不需要终端访问,也可以用非 root 用户运行,但部分指标可能采集不到。 ## 平台支持 @@ -342,22 +370,30 @@ Agent 与 Server 之间维持一条持久 WebSocket 连接。连接断开后会 - **重连恢复**:重连成功后退避重置为 1 秒,并自动重新上报 `SystemInfo` - **心跳检测**:Server 每 30 秒发送一次 Ping,Agent 回复 Pong;超过 30 秒无上报即判定为离线 +首次连接过程: + +1. Server 发送 `Welcome` 消息,包含分配的 `server_id` 和 `report_interval` +2. Agent 发送 `SystemInfo`(CPU 型号、核心数、架构、操作系统、内核、内存、磁盘、IP 地址、虚拟化类型、Agent 版本) +3. Server 以 `Ack` 确认 +4. Server 同步所有已分配的 Ping 任务 +5. Agent 开始周期性指标上报循环 + ## 采集指标详情 -Agent 使用 `sysinfo` 库采集以下系统指标: - -| 指标类别 | 具体指标 | 采集来源 | -|----------|----------|----------| -| CPU | 使用率、型号、核心数、架构 | `sysinfo::System` | -| 内存 | 已用/总量、Swap 已用/总量 | `sysinfo::System` | -| 磁盘 | 已用/总量 | `sysinfo::Disks` | -| 网络 | 入站/出站速率、累计流量 | `sysinfo::Networks` + 差值计算 | -| 负载 | load1 / load5 / load15 | `sysinfo::System::load_average()` | -| 进程 | 进程数 | `sysinfo::System::processes()` | -| 连接数 | TCP / UDP 连接数 | `/proc/net/tcp` (Linux) | -| 温度 | 传感器温度 | `sysinfo::Components` | -| GPU | 利用率、显存、温度 | `nvml-wrapper`(可选) | -| 系统信息 | OS、内核版本、运行时间 | `sysinfo::System` | +Agent 采集以下指标(来源于 `sysinfo` 库、Linux 下的 `/proc`,以及用于 GPU 的 `nvml-wrapper`)并上报至 Server: + +| 类别 | 上报字段 | 采集来源 | +|------|----------|----------| +| CPU | `cpu`(使用率 %)、型号、核心数、架构 | `sysinfo::System` | +| 内存 | `mem_used`、`swap_used`(字节) | `sysinfo::System` | +| 磁盘 | `disk_used`(字节) | `sysinfo::Disks` | +| 网络 | `net_in_speed` / `net_out_speed`、`net_in_transfer` / `net_out_transfer`(字节) | `sysinfo::Networks` + 差值计算 | +| 负载 | `load1` / `load5` / `load15` | `sysinfo::System::load_average()` | +| 连接数 | `tcp_conn` / `udp_conn` | `/proc/net/tcp`(Linux) | +| 进程 | `process_count` | `sysinfo::System::processes()` | +| 运行时间 | `uptime`(秒) | `sysinfo::System` | +| 温度 | `temperature`(°C,可选) | `sysinfo::Components` | +| GPU | `gpu`(利用率、显存、温度,可选) | `nvml-wrapper` | | 虚拟化 | 虚拟化类型 | `systemd-detect-virt` / DMI | ## 资源开销 From 74e466caa0a4b05b134aff569e2133f1d702733c Mon Sep 17 00:00:00 2001 
From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 18:34:24 +0800 Subject: [PATCH 18/21] docs(alerts): regroup metric types and document notification channels --- apps/docs/content/docs/en/alerts.mdx | 167 +++++++++++++++++---------- apps/docs/content/docs/zh/alerts.mdx | 145 ++++++++++++----------- 2 files changed, 178 insertions(+), 134 deletions(-) diff --git a/apps/docs/content/docs/en/alerts.mdx b/apps/docs/content/docs/en/alerts.mdx index e1f47b3b..3afd7d45 100644 --- a/apps/docs/content/docs/en/alerts.mdx +++ b/apps/docs/content/docs/en/alerts.mdx @@ -10,62 +10,84 @@ ServerBee includes a flexible alerting system that evaluates metric thresholds a A background task evaluates all enabled rules every 60 seconds: -1. Resolve which servers each enabled rule covers. +1. Resolve which servers each rule covers. 2. Check whether the rule's conditions are met for each covered server. 3. Send a notification when a rule triggers (subject to debounce). 4. Clear the alert state when a previously triggered rule recovers. -Alert state is persisted in the database and survives server restarts. Event-driven rules (IP change, SSH login, brute-force, port scan) are evaluated when an agent reports the event rather than on the 60-second cycle. +Alert state is persisted in the database and survives server restarts. Event-driven rules (IP change, SSH login, brute-force, port scan) are evaluated when an agent reports the event, not on the 60-second cycle. ## Creating Alert Rules An alert rule consists of: - **Name** -- A descriptive label. -- **Rules** -- One or more metric conditions, all of which must be true simultaneously (AND logic). +- **Rules** -- One or more metric conditions. All conditions must be true simultaneously (AND logic). - **Trigger mode** -- `always` (repeat with debounce) or `once` (notify only on the first trigger). - **Cover type** -- Which servers the rule applies to. - **Notification group** -- Where notifications are sent. 
+- **Trigger/recover tasks** -- Optional. Remote commands run automatically when the rule triggers or recovers. - **Block source IP** -- Optional. For security-event rules, automatically instructs the agent's firewall to block the offending source IP when the rule triggers. -### Supported Metric Types - -| Rule Type | Description | Threshold Meaning | -|-----------|-------------|-------------------| -| `cpu` | CPU usage percentage | `min`: triggers when CPU >= value | -| `memory` | Memory used (bytes) | `min`: triggers when usage >= value | -| `swap` | Swap used (bytes) | `min`: triggers when usage >= value | -| `disk` | Disk used (bytes) | `min`: triggers when usage >= value | -| `load1` | 1-minute load average | `min`: triggers when load >= value | -| `load5` | 5-minute load average | `min`: triggers when load >= value | -| `load15` | 15-minute load average | `min`: triggers when load >= value | -| `net_in_speed` | Download speed (bytes/s) | `min`: triggers when speed >= value | -| `net_out_speed` | Upload speed (bytes/s) | `min`: triggers when speed >= value | -| `tcp_conn` | TCP connections | `min`: triggers when count >= value | -| `udp_conn` | UDP connections | `min`: triggers when count >= value | -| `process` | Process count | `min`: triggers when count >= value | -| `temperature` | CPU temperature (C) | `min`: triggers when temp >= value | -| `gpu` | GPU utilization (%) | `min`: triggers when usage >= value | -| `network_latency` | Network latency (ms) | `min`: triggers when average probe latency >= value | -| `network_packet_loss` | Network packet loss (%) | `min`: triggers when packet loss percentage >= value | -| `offline` | Server offline | Triggers when server has been offline for `duration` seconds | -| `transfer_in_cycle` | Inbound traffic per cycle | Triggers when cumulative transfer >= `cycle_limit` bytes | -| `transfer_out_cycle` | Outbound traffic per cycle | Triggers when cumulative transfer >= `cycle_limit` bytes | -| `transfer_all_cycle` | 
Total traffic per cycle | Triggers when combined transfer >= `cycle_limit` bytes | -| `expiration` | Server expiration date | Triggers when expired_at is within `duration` days | -| `ip_changed` | Agent IP address changed | Event-driven rule; triggers when the agent reports an IP change event | -| `ssh_login_detected` | Successful SSH login | Event-driven; supports `first_seen_only` filter. See [Security Events](/en/docs/security-events) | -| `ssh_brute_force_detected` | SSH brute-force burst | Event-driven; supports `severity_min` (medium/high/critical) and `exclude_cidrs` | -| `port_scan_detected` | Port scan from a single IP | Event-driven; supports `severity_min` and `exclude_cidrs` | - -### Threshold Configuration +## Supported Metric Types + +### Resource Thresholds + +| Rule Type | Description | `min` Threshold Meaning | +|-----------|-------------|-------------------------| +| `cpu` | CPU usage percentage | Triggers when usage >= value | +| `memory` | Memory used (bytes) | Triggers when usage >= value | +| `swap` | Swap used (bytes) | Triggers when usage >= value | +| `disk` | Disk used (bytes) | Triggers when usage >= value | +| `load1` | 1-minute load average | Triggers when load >= value | +| `load5` | 5-minute load average | Triggers when load >= value | +| `load15` | 15-minute load average | Triggers when load >= value | +| `temperature` | CPU temperature (C) | Triggers when temp >= value | +| `gpu` | GPU utilization (%) | Triggers when usage >= value | +| `tcp_conn` | TCP connections | Triggers when count >= value | +| `udp_conn` | UDP connections | Triggers when count >= value | +| `process` | Process count | Triggers when count >= value | +| `net_in_speed` | Download speed (bytes/s) | Triggers when speed >= value | +| `net_out_speed` | Upload speed (bytes/s) | Triggers when speed >= value | + +### Traffic Cycle + +Used to monitor cumulative traffic over a time window: + +| Rule Type | Description | +|-----------|-------------| +| `transfer_in_cycle` 
| Inbound traffic accumulated per cycle | +| `transfer_out_cycle` | Outbound traffic accumulated per cycle | +| `transfer_all_cycle` | Combined inbound + outbound traffic per cycle | + +Cycle options: `hour` / `day` / `week` / `month` / `year` + +### Network Quality + +| Rule Type | Description | +|-----------|-------------| +| `network_latency` | Triggers when average probe latency (ms) exceeds the threshold | +| `network_packet_loss` | Triggers when the packet loss percentage exceeds the threshold | + +### Offline, Expiration & Events + +| Rule Type | Description | +|-----------|-------------| +| `offline` | Triggers when the server has been offline for `duration` seconds | +| `expiration` | Triggers when `expired_at` is within `duration` days | +| `ip_changed` | Event-driven; triggers when the agent reports an IP change event (not part of the 60-second cycle) | +| `ssh_login_detected` | Event-driven; successful SSH login. Supports `first_seen_only` filter. See [Security Events](/en/docs/security-events) | +| `ssh_brute_force_detected` | Event-driven; SSH brute-force burst. Supports `severity_min` (medium/high/critical) and `exclude_cidrs` | +| `port_scan_detected` | Event-driven; port scan from a single source IP. Supports `severity_min` and `exclude_cidrs` | + +## Threshold Configuration Each rule item supports these fields: ```json { "rule_type": "cpu", - "min": 80.0, + "min": 90.0, "max": null, "duration": null, "cycle_interval": null, @@ -110,13 +132,21 @@ Each rule item supports these fields: { "rule_type": "expiration", "duration": 7 } ``` +**Multiple conditions (AND logic):** a rule with several conditions triggers only when all of them hold. 
The following fires when CPU usage is at or above 90% *and* memory usage is at or above 8 GB: +```json +[ + { "rule_type": "cpu", "min": 90.0 }, + { "rule_type": "memory", "min": 8589934592 } +] +``` + ## Sampling and Trigger Logic For resource threshold alerts (CPU, memory, disk, load, etc.), ServerBee does not trigger on a single spike: -1. The evaluator reads all raw metric records from the **last 10 minutes**. -2. It counts how many records exceed the threshold. -3. The alert triggers only if **70% or more** of the samples exceed the threshold. +1. The evaluator reads all raw metric records from the **last 10 minutes** (one sample per minute, so up to 10 samples). +2. It counts how many samples exceed the threshold. +3. The alert triggers only if **70% or more** of the samples exceed the threshold (at least 7 of 10). This prevents false positives from brief, transient spikes. @@ -134,19 +164,28 @@ Each alert rule specifies which servers it applies to: ### `always` Mode (Default) -- Sends a notification every time the condition is evaluated as true -- **5-minute debounce** prevents notification spam: after a notification is sent, the next one is suppressed for 5 minutes even if the condition persists +- Sends a notification every time the condition is evaluated as true. +- **5-minute debounce** prevents notification spam: for the same (rule + server) combination, after a notification is sent, the next one is suppressed for 5 minutes even if the condition persists. ### `once` Mode - Sends a notification only on the **first trigger**. - No further notifications are sent until the condition recovers and triggers again. -Because alert state is persisted, a `once` rule that is already triggered will not re-fire after a server restart. +### State Persistence + +Trigger state is persisted to the `alert_states` table in SQLite and loaded into a hot cache on startup. As a result: + +- A `once` rule that is already triggered will not re-fire after a server restart. 
+- Triggered-but-not-yet-recovered alerts are restored automatically after a restart. ### Recovery -When a previously triggered alert recovers (the condition is no longer met), the alert state is cleared in both the in-memory cache and the database. If a `recover_trigger_tasks` command is configured, it runs at this point. If the condition triggers again later, notifications fire according to the trigger mode. +When a previously triggered alert recovers (the condition is no longer met): + +1. The alert is marked as recovered, and its state is cleared in both the in-memory cache and the database. +2. If recover tasks (`recover_trigger_tasks`) are configured, the corresponding remote commands run. +3. If the condition triggers again later, notifications fire according to the trigger mode. ## Maintenance Suppression @@ -179,7 +218,7 @@ If no `body_template` is provided, the default template is used. If no `Content- ### Telegram -Send messages to a Telegram chat via the Bot API. +Send messages to a Telegram chat via the Bot API. Messages are sent with HTML parse mode enabled. ```json { @@ -188,8 +227,6 @@ Send messages to a Telegram chat via the Bot API. } ``` -Messages are sent with HTML parse mode enabled. - ### Bark Send push notifications to iOS devices via [Bark](https://github.com/Finb/Bark). @@ -235,23 +272,7 @@ Send native Apple Push Notification service pushes to registered mobile devices. APNs requires an Apple developer key, team ID, bundle ID, and private key. Set `sandbox: true` only for development builds. -## Notification Groups - -Notification channels are organized into **groups**. An alert rule is linked to a notification group, and when the rule triggers, all enabled channels in the group are dispatched. - -This allows you to: - -- Send the same alert to multiple channels (e.g., Telegram + Email). -- Reuse channel configurations across different alert rules. -- Enable or disable individual channels without modifying alert rules. 
- -After creating a channel, verify its configuration with the test endpoint: - -``` -POST /api/notifications/:id/test -``` - -## Template Variables +### Template Variables Notification messages support the following template variables: @@ -274,6 +295,24 @@ The default notification template is: Time: {{time}} ``` +### Testing a Channel + +After creating a channel, verify its configuration with the test endpoint: + +``` +POST /api/notifications/:id/test +``` + +## Notification Groups + +Notification channels are organized into **groups**. An alert rule is linked to a notification group, and when the rule triggers, all enabled channels in the group are dispatched. + +This allows you to: + +- Send the same alert to multiple channels (e.g., Telegram + Email). +- Reuse channel configurations across different alert rules. +- Enable or disable individual channels without modifying alert rules. + ## Offline Detection Offline status is determined by a dedicated background task and works together with `offline` rules: @@ -286,8 +325,8 @@ Offline status is determined by a dedicated background task and works together w Here is a typical setup for monitoring CPU usage across all servers with Telegram notifications: -1. **Create a Telegram notification channel** with your bot token and chat ID -2. **Create a notification group** that includes the Telegram channel +1. **Create a Telegram notification channel** with your bot token and chat ID. +2. **Create a notification group** that includes the Telegram channel. 3. **Create an alert rule:** - Name: "High CPU Usage" - Rules: `[{"rule_type": "cpu", "min": 90.0}]` diff --git a/apps/docs/content/docs/zh/alerts.mdx b/apps/docs/content/docs/zh/alerts.mdx index 35a5b934..20b4ab5b 100644 --- a/apps/docs/content/docs/zh/alerts.mdx +++ b/apps/docs/content/docs/zh/alerts.mdx @@ -10,7 +10,7 @@ ServerBee 提供灵活的告警系统,支持阈值监控、安全事件驱动 后台任务每 60 秒评估一次所有启用的告警规则: -1. 解析每条启用规则覆盖的服务器范围。 +1. 解析每条规则覆盖的服务器范围。 2. 逐台检查规则条件是否满足。 3. 触发时通过通知组发送通知(受去抖限制)。 4. 
之前触发的规则恢复后,清除告警状态。 @@ -23,10 +23,10 @@ ServerBee 提供灵活的告警系统,支持阈值监控、安全事件驱动 - **规则名称**:用于标识和展示。 - **告警条件**:一条或多条指标条件,所有条件必须同时满足(AND 逻辑)。 -- **覆盖范围**:规则适用的服务器范围。 - **触发模式**:`always`(持续通知,带去抖)或 `once`(仅首次触发通知)。 +- **覆盖范围**:规则适用的服务器范围。 - **通知组**:通知发送目标。 -- **关联任务**:触发/恢复时自动执行的远程命令(可选)。 +- **触发 / 恢复任务**:可选。规则触发或恢复时自动执行的远程命令。 - **阻断源 IP**:可选。对安全事件类规则,触发时自动指示 Agent 防火墙阻断攻击源 IP。 ## 支持的指标类型 @@ -66,19 +66,19 @@ ServerBee 提供灵活的告警系统,支持阈值监控、安全事件驱动 | 指标类型 | 说明 | |----------|------| -| `network_latency` | 平均探测延迟超过阈值时触发 | +| `network_latency` | 平均探测延迟(ms)超过阈值时触发 | | `network_packet_loss` | 丢包率超过阈值时触发 | ### 离线、到期和事件 | 指标类型 | 说明 | |----------|------| -| `offline` | 服务器持续离线超过指定时长后触发 | -| `expiration` | 服务器 `expired_at` 距今小于等于指定天数时触发 | -| `ip_changed` | Agent 上报 IP 变化事件时触发(事件驱动,不参与每分钟轮询) | -| `ssh_login_detected` | 成功 SSH 登录事件触发;支持 `first_seen_only` 过滤。详见 [安全事件检测](/zh/docs/security-events) | -| `ssh_brute_force_detected` | SSH 爆破事件触发;支持 `severity_min`(medium/high/critical)和 `exclude_cidrs` 过滤 | -| `port_scan_detected` | 单一源 IP 的端口扫描事件触发;支持 `severity_min` 和 `exclude_cidrs` 过滤 | +| `offline` | 服务器持续离线超过 `duration` 秒后触发 | +| `expiration` | 服务器 `expired_at` 距今小于等于 `duration` 天时触发 | +| `ip_changed` | 事件驱动;Agent 上报 IP 变化事件时触发(不参与 60 秒轮询) | +| `ssh_login_detected` | 事件驱动;成功 SSH 登录事件触发,支持 `first_seen_only` 过滤。详见 [安全事件检测](/zh/docs/security-events) | +| `ssh_brute_force_detected` | 事件驱动;SSH 爆破事件触发,支持 `severity_min`(medium/high/critical)和 `exclude_cidrs` 过滤 | +| `port_scan_detected` | 事件驱动;单一源 IP 的端口扫描事件触发,支持 `severity_min` 和 `exclude_cidrs` 过滤 | ## 阈值配置 @@ -132,7 +132,7 @@ ServerBee 提供灵活的告警系统,支持阈值监控、安全事件驱动 { "rule_type": "expiration", "duration": 7 } ``` -一条告警规则可以包含多个条件,所有条件必须同时满足(AND 逻辑)才会触发。例如,下面的规则在 CPU 已用 ≥ 90% 且内存已用 ≥ 8 GB 时才触发: +**多条件(AND 逻辑):** 一条规则可以包含多个条件,所有条件同时满足才触发。下面的规则在 CPU 使用率 ≥ 90% **且**内存已用 ≥ 8 GB 时才触发: ```json [ @@ -141,98 +141,103 @@ ServerBee 提供灵活的告警系统,支持阈值监控、安全事件驱动 ] ``` -## 覆盖类型 - -告警规则的覆盖范围有三种模式: +## 采样与触发逻辑 -| 覆盖类型 | 说明 | -|----------|------| -| `all` | 适用于所有服务器 | -| 
`include` | 仅适用于指定的服务器列表 | -| `exclude` | 适用于除指定服务器外的所有服务器 | +对于资源阈值类告警(CPU、内存、磁盘、负载等),ServerBee 不会因单点瞬时波动就触发: -## 告警评估机制 +1. 评估器读取**最近 10 分钟**的全部原始指标记录(每分钟一个采样点,最多 10 个)。 +2. 统计其中超过阈值的采样点数量。 +3. 仅当 **70% 及以上**的采样点超过阈值(10 个中至少 7 个)时才触发。 -告警评估并非简单的单点阈值判断,而是采用了采样窗口和触发比例的机制,有效降低因瞬时波动导致的误报。 +这样可以避免短暂、瞬时的尖峰造成误报。 -### 采样窗口 +## 覆盖类型 -- 查询最近 **10 分钟** 的指标记录(即 10 个采样点,因为每分钟写入一次) -- 检查其中超过阈值的采样点比例 +每条告警规则指定其适用的服务器范围: -### 触发比例 +| 覆盖类型 | 说明 | +|----------|------| +| `all` | 适用于系统内所有服务器 | +| `include` | 仅适用于指定的服务器 ID 列表 | +| `exclude` | 适用于除指定服务器外的所有服务器 | -- **70%** 以上的采样点超过阈值才判定为触发 -- 即 10 个采样点中至少 7 个超过阈值 +## 触发模式与去抖 -### 通知去抖 +### `always` 模式(默认) -- `always` 模式下,同一(规则 + 服务器)组合的最短通知间隔为 **5 分钟** -- 避免在持续异常时频繁发送通知 +- 每次评估满足条件时都发送通知。 +- **5 分钟去抖**防止通知刷屏:对同一(规则 + 服务器)组合,发送一次通知后,即使条件持续满足,接下来 5 分钟内也不会再次发送。 -### 触发模式 +### `once` 模式 -| 模式 | 行为 | -|------|------| -| `always` | 每次评估满足条件都发送通知(受 5 分钟去抖限制) | -| `once` | 仅在首次触发时通知,恢复后再次触发才会再次通知 | +- 仅在**首次触发**时发送通知。 +- 在条件恢复并再次触发之前,不会发送后续通知。 ### 状态持久化 -告警触发状态会持久化到 SQLite 数据库的 `alert_states` 表中。这意味着: +告警触发状态会持久化到 SQLite 的 `alert_states` 表,并在启动时加载到热缓存。因此: -- Server 重启后,`once` 模式的规则不会重复触发 -- 已触发未恢复的告警状态在重启后自动恢复 -- 启动时从数据库加载热缓存,加速后续评估 +- 已触发的 `once` 规则在 Server 重启后不会重复触发。 +- 已触发未恢复的告警在重启后自动恢复。 ### 恢复机制 -当一条之前触发的告警规则不再满足触发条件时: +当一条之前触发的告警不再满足触发条件时: 1. 标记为已恢复,并清除内存缓存和数据库中的告警状态。 -2. 如果配置了恢复触发任务(`recover_trigger_tasks`),自动执行对应的远程命令。 -3. 下次满足条件时按触发模式重新触发。 +2. 如果配置了恢复任务(`recover_trigger_tasks`),自动执行对应的远程命令。 +3. 
之后若再次满足条件,则按触发模式重新发送通知。 -### 维护窗口抑制 +## 维护窗口抑制 当受影响服务器处于活动维护窗口时,ServerBee 会抑制该服务器的告警通知。规则评估仍会执行,但维护结束前不会发送通知。事件驱动规则与轮询规则遵循同样的覆盖范围和维护窗口抑制逻辑。 -### 阻断源 IP +## 阻断源 IP 安全事件类规则(`ssh_brute_force_detected`、`port_scan_detected`)可以开启**阻断源 IP**。触发时,ServerBee 会指示受影响 Agent 的防火墙阻断攻击源 IP,把检测变为自动处置。该能力需要服务器具备 `CAP_FIREWALL_BLOCK` 权限。详见 [安全事件检测](/zh/docs/security-events) 和 [防火墙管理](/zh/docs/firewall)。 ## 通知渠道 -ServerBee 支持以下通知渠道: +ServerBee 支持五种通知渠道类型。每个渠道是独立实体,可在多个通知组间复用。 ### Webhook -通过 HTTP 请求发送通知,适用于集成到各类自动化平台。 +向任意 URL 发送 HTTP 请求,支持自定义方法、请求头和请求体模板。 -| 配置项 | 说明 | -|--------|------| -| `url` | Webhook 目标 URL | -| `method` | HTTP 方法(GET / POST) | -| `headers` | 自定义请求头 | -| `body_template` | 请求体模板 | +```json +{ + "url": "https://hooks.slack.com/services/xxx", + "method": "POST", + "headers": { + "Content-Type": "application/json" + }, + "body_template": "{\"text\": \"{{server_name}} {{event}}: {{message}}\"}" +} +``` + +未提供 `body_template` 时使用默认模板;未设置 `Content-Type` 时默认为 `application/json`。 ### Telegram -通过 Telegram Bot 发送消息。 +通过 Telegram Bot API 向聊天发送消息,使用 HTML 解析模式。 -| 配置项 | 说明 | -|--------|------| -| `bot_token` | Telegram Bot Token | -| `chat_id` | 目标聊天 ID | +```json +{ + "bot_token": "123456:ABC-DEF", + "chat_id": "-1001234567890" +} +``` ### Bark -通过 Bark 推送 iOS 通知。 +通过 [Bark](https://github.com/Finb/Bark) 向 iOS 设备推送通知。 -| 配置项 | 说明 | -|--------|------| -| `server_url` | Bark 服务端地址 | -| `device_key` | 设备 Key | +```json +{ + "server_url": "https://api.day.app", + "device_key": "your-device-key" +} +``` ### 邮件(通过 Resend) @@ -274,12 +279,12 @@ APNs 需要 Apple Developer key、Team ID、Bundle ID 和私钥。只有开发 | 变量 | 说明 | |------|------| -| `{{server_name}}` | 服务器名称 | -| `{{server_id}}` | 服务器 ID | -| `{{rule_name}}` | 告警规则名称 | -| `{{event}}` | 事件类型(触发/恢复) | -| `{{message}}` | 告警详细信息 | -| `{{time}}` | 事件发生时间 | +| `{{server_name}}` | 受影响服务器名称 | +| `{{server_id}}` | 受影响服务器的唯一 ID | +| `{{rule_name}}` | 触发的告警规则名称 | +| `{{event}}` | 事件类型(如「triggered」) | +| `{{message}}` | 可读的告警详细信息 | +| `{{time}}` 
| 事件发生时间(UTC) | | `{{cpu}}` | 当前 CPU 使用率字符串 | | `{{memory}}` | 当前内存使用字符串 | @@ -288,7 +293,7 @@ APNs 需要 Apple Developer key、Team ID、Bundle ID 和私钥。只有开发 ``` [ServerBee] {{server_name}} {{event}} {{message}} -时间: {{time}} +Time: {{time}} ``` ### 发送测试 @@ -305,7 +310,7 @@ POST /api/notifications/:id/test - 把同一条告警同时发到多个渠道(例如 Telegram + Email)。 - 在多条告警规则间复用渠道配置。 -- 单独启用/禁用某个渠道,无需改动告警规则。 +- 单独启用 / 禁用某个渠道,无需改动告警规则。 ## 离线检测 From 2d2f775b5a3d4e70638d4866f73b4d10ec5d67ed Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 18:34:24 +0800 Subject: [PATCH 19/21] docs(ping): add probe type, interval and API query reference --- apps/docs/content/docs/en/ping.mdx | 99 +++++++++++----- apps/docs/content/docs/zh/ping.mdx | 174 ++++++++++++++--------------- 2 files changed, 156 insertions(+), 117 deletions(-) diff --git a/apps/docs/content/docs/en/ping.mdx b/apps/docs/content/docs/en/ping.mdx index 524a1001..59f60261 100644 --- a/apps/docs/content/docs/en/ping.mdx +++ b/apps/docs/content/docs/en/ping.mdx @@ -4,40 +4,40 @@ description: Monitor endpoint availability and latency with ICMP, TCP, and HTTP icon: Radar --- -ServerBee includes built-in ping monitoring that lets you track the availability and latency of external endpoints from the perspective of your monitored servers. This is useful for verifying network connectivity, measuring latency between regions, and detecting outages. +ServerBee includes built-in ping monitoring that tracks the availability and latency of external endpoints from the perspective of your monitored servers. Use it to verify network connectivity, measure latency between regions, and detect outages. ## Probe Types -Three types of probes are available: +Three probe types are available. Each measures round-trip latency and reports a success/failure status. 
-| Type | Description | Target Format | -|------|-------------|---------------| -| **ICMP** | Standard ping (ICMP echo request/reply) | IP address or hostname (e.g., `8.8.8.8`, `google.com`) | -| **TCP** | TCP connection test (SYN-ACK handshake) | Host and port (e.g., `google.com:443`) | -| **HTTP** | HTTP(S) request and response time | Full URL (e.g., `https://example.com/health`) | - -Each probe type measures round-trip latency and reports success/failure status. +| Type | What it does | Target format | Timeout | Privileges | +|------|--------------|---------------|---------|------------| +| **ICMP** | Standard ICMP echo request/reply | IP or hostname (e.g., `8.8.8.8`, `google.com`) | 5 s | `CAP_NET_RAW` on Linux | +| **TCP** | TCP connection test (handshake) | `host:port` (e.g., `google.com:443`) | 5 s | None | +| **HTTP** | HTTP(S) GET request and response time | Full URL (e.g., `https://example.com/health`) | 10 s | None | ## Creating Ping Tasks -Ping tasks are created through the dashboard or API. Each task requires: +Create probe tasks from the **Ping** page in the dashboard, or via the API. 
Each task requires: -- **Name** -- A descriptive label for the task -- **Probe type** -- `icmp`, `tcp`, or `http` -- **Target** -- The endpoint to probe (format depends on probe type) -- **Interval** -- How often to run the probe, in seconds -- **Server IDs** -- Which agents should run this probe (as a JSON array of server IDs) -- **Enabled** -- Whether the task is active +| Field | Description | +|-------|-------------| +| Name | A descriptive label, e.g., "Cloudflare DNS" | +| Probe type | `icmp`, `tcp`, or `http` | +| Target | The endpoint to probe (format depends on probe type) | +| Interval | How often to run the probe, in seconds | +| Server IDs | Which agents should run this probe | +| Enabled | Whether the task is active | ### Example: ICMP Ping -Monitor basic connectivity to Google DNS: +Monitor basic connectivity to Cloudflare DNS: | Field | Value | |-------|-------| -| Name | Google DNS | +| Name | Cloudflare DNS | | Probe Type | icmp | -| Target | 8.8.8.8 | +| Target | 1.1.1.1 | | Interval | 60 | ### Example: TCP Port Check @@ -62,6 +62,30 @@ Monitor a web service endpoint: | Target | https://api.example.com/health | | Interval | 60 | +## Target Format + +The target format depends on the probe type: + +| Probe type | Target format | Example | +|------------|---------------|---------| +| ICMP | IP address or hostname | `1.1.1.1`, `google.com` | +| TCP | host:port | `example.com:443`, `10.0.0.1:3306` | +| HTTP | Full URL | `https://example.com`, `http://10.0.0.1:8080/health` | + +## Choosing an Interval + +The interval is in seconds. Tune it to the target's importance and probe type: + +| Scenario | Suggested interval | +|----------|--------------------| +| Critical service | 15--30 s | +| Regular website | 60 s | +| Infrastructure check | 120--300 s | + + +Very short intervals increase agent load and network overhead, and may trip the target's firewall or rate limits. Set an interval that matches your actual needs. 
+ + ## How Tasks Are Distributed When you create or update a ping task, the server syncs it to the appropriate agents: @@ -71,6 +95,11 @@ When you create or update a ping task, the server syncs it to the appropriate ag 3. Each agent starts (or updates) its local probe scheduler 4. Probe results are sent back to the server as `PingResult` messages +The server syncs probe tasks to agents at these moments: + +- When an agent first connects or reconnects +- When an admin creates, updates, or deletes a probe task + If an agent disconnects and reconnects, the server automatically re-syncs all assigned ping tasks. ## UI Feedback @@ -84,10 +113,16 @@ The `server_ids_json` field controls which agents execute the probe: - **Specific servers** -- Provide an array of server IDs: `["srv-1", "srv-2"]` - **All servers** -- Use an empty array `[]` or the special value `["*"]` to assign to all connected agents -This lets you measure latency from multiple geographic locations simultaneously and compare results. +Running the same probe from multiple agents lets you: + +- Probe one target from different regions simultaneously and compare network quality +- Rule out single-point network issues for a more accurate availability verdict +- Localize a fault to the probing node when one agent fails while others succeed ## Viewing Results +### Probe Records + Ping results are stored in the database with the following data: | Field | Description | @@ -99,11 +134,19 @@ Ping results are stored in the database with the following data: | `error` | Error message if the probe failed | | `time` | Timestamp of the measurement | -Results can be viewed in the dashboard as: +### Latency Charts + +The ping task detail page shows latency trend charts per agent. Charts support filtering by time range to help you analyze how network quality changes over time. When multiple servers probe the same target, results are broken down per agent, and a success-rate percentage is shown. 
+ +### API Query -- **Latency charts** showing trends over time -- **Success rate** percentages -- **Per-agent breakdown** when multiple servers probe the same target +Query probe records through the REST API: + +``` +GET /api/ping-tasks/:id/records?from=2026-03-13T00:00:00Z&to=2026-03-14T00:00:00Z&server_id=xxx +``` + +Filtering by time range and agent is supported. ## Data Retention @@ -118,4 +161,10 @@ The cleanup task runs hourly and removes records older than the configured reten ## Integration with Alerts -You can combine ping monitoring with alert rules. For example, to alert when a critical endpoint becomes unreachable, you could create an alert rule that watches for failed ping probes. The ping data is stored alongside regular monitoring data, making it available for the same alerting infrastructure. +Ping monitoring works with alert rules. The `network_latency` and `network_packet_loss` rule types let you alert on probe latency or packet loss, so you're notified when a critical endpoint degrades or becomes unreachable. Ping data is stored alongside regular monitoring data and reuses the same alerting infrastructure. See [Alerts & Notifications](/en/docs/alerts). 
+ + + + + + diff --git a/apps/docs/content/docs/zh/ping.mdx b/apps/docs/content/docs/zh/ping.mdx index 89fef8b6..5cf95525 100644 --- a/apps/docs/content/docs/zh/ping.mdx +++ b/apps/docs/content/docs/zh/ping.mdx @@ -4,104 +4,72 @@ description: 通过 ICMP、TCP 和 HTTP 探测监控网络可达性和延迟。 icon: Radar --- -ServerBee 的 Ping 监控功能允许你通过分布在不同服务器上的 Agent 对目标地址进行周期性探测,监测网络可达性和响应延迟。 +ServerBee 内置 Ping 监控功能,通过分布在不同服务器上的 Agent 对目标地址进行周期性探测,监测网络可达性和响应延迟。它适用于验证网络连通性、对比不同地区的延迟以及发现服务中断。 ## 探测类型 -ServerBee 支持三种探测协议: +ServerBee 支持三种探测类型。每种都会测量往返延迟并上报成功 / 失败状态。 -### ICMP Ping - -标准的 ICMP Echo 探测,测量目标主机的网络延迟。 - -| 参数 | 值 | -|------|-----| -| 实现 | `surge-ping` 库 | -| 超时 | 5 秒 | -| 权限要求 | `CAP_NET_RAW`(Linux) | - -``` -目标格式: 1.1.1.1 或 example.com -``` - -### TCP Ping - -通过建立 TCP 连接来测试目标端口的可达性和响应时间。 - -| 参数 | 值 | -|------|-----| -| 实现 | `tokio::net::TcpStream::connect()` | -| 超时 | 5 秒 | -| 权限要求 | 无特殊权限 | - -``` -目标格式: example.com:443 或 1.1.1.1:80 -``` - -### HTTP Ping - -发送 HTTP GET 请求,测量 Web 服务的响应时间和可用性。 - -| 参数 | 值 | -|------|-----| -| 实现 | `reqwest` 库 | -| 超时 | 10 秒 | -| 权限要求 | 无特殊权限 | - -``` -目标格式: https://example.com 或 http://example.com/health -``` +| 类型 | 探测内容 | 目标格式 | 超时 | 权限要求 | +|------|----------|----------|------|----------| +| **ICMP** | 标准 ICMP Echo 请求 / 应答 | IP 或域名(如 `1.1.1.1`、`google.com`) | 5 秒 | Linux 下需 `CAP_NET_RAW` | +| **TCP** | TCP 连接测试(握手) | `host:port`(如 `google.com:443`) | 5 秒 | 无 | +| **HTTP** | HTTP(S) GET 请求与响应时间 | 完整 URL(如 `https://example.com/health`) | 10 秒 | 无 | ## 创建 Ping 任务 -在管理面板的「Ping 探测」页面创建新的探测任务。每个任务需要配置: +在管理面板的「Ping 探测」页面创建探测任务,也可通过 API 创建。每个任务需要配置: | 配置项 | 说明 | |--------|------| | 任务名称 | 用于标识的名称,如「Cloudflare DNS」 | -| 探测类型 | ICMP / TCP / HTTP | -| 目标地址 | 根据探测类型填写目标 | +| 探测类型 | `icmp`、`tcp` 或 `http` | +| 目标地址 | 待探测的目标(格式取决于探测类型) | | 探测间隔 | 探测频率,单位秒 | | 执行节点 | 选择哪些 Agent 执行此探测 | | 启用状态 | 是否启用此任务 | -### 配置示例 +### 示例:ICMP Ping -**监控 Cloudflare DNS (ICMP):** +监控到 Cloudflare DNS 的基本连通性: | 配置项 | 值 | |--------|-----| | 名称 | Cloudflare DNS | -| 类型 | ICMP | +| 探测类型 | 
icmp | | 目标 | 1.1.1.1 | -| 间隔 | 60 秒 | +| 间隔 | 60 | -**监控网站可用性 (HTTP):** +### 示例:TCP 端口检查 + +验证数据库端口是否可达: | 配置项 | 值 | |--------|-----| -| 名称 | 官网监控 | -| 类型 | HTTP | -| 目标 | https://example.com | -| 间隔 | 30 秒 | +| 名称 | PostgreSQL Primary | +| 探测类型 | tcp | +| 目标 | db.internal:5432 | +| 间隔 | 30 | + +### 示例:HTTP 健康检查 -**监控数据库端口 (TCP):** +监控一个 Web 服务端点: | 配置项 | 值 | |--------|-----| -| 名称 | MySQL 端口 | -| 类型 | TCP | -| 目标 | db.example.com:3306 | -| 间隔 | 30 秒 | +| 名称 | API Health | +| 探测类型 | http | +| 目标 | https://api.example.com/health | +| 间隔 | 60 | -## 目标配置 +## 目标格式 不同探测类型的目标格式要求: | 探测类型 | 目标格式 | 示例 | |----------|----------|------| | ICMP | IP 地址或域名 | `1.1.1.1`、`google.com` | -| TCP | 地址:端口 | `example.com:443`、`10.0.0.1:3306` | +| TCP | host:port | `example.com:443`、`10.0.0.1:3306` | | HTTP | 完整 URL | `https://example.com`、`http://10.0.0.1:8080/health` | ## 间隔设置 @@ -115,27 +83,60 @@ ServerBee 支持三种探测协议: | 基础设施检查 | 120-300 秒 | -过短的探测间隔会增加 Agent 的负载和网络开销,也可能被目标服务器的防火墙策略拦截。建议根据实际需求合理设置。 +过短的探测间隔会增加 Agent 的负载和网络开销,也可能被目标服务器的防火墙策略或限流拦截。建议根据实际需求合理设置。 -## 查看结果和延迟图表 +## 任务分发同步 + +创建或更新 Ping 任务时,Server 会将其同步到对应的 Agent: + +1. Server 将任务配置存入数据库 +2. 向每个被分配的 Agent 发送 `PingTasksSync` 消息,包含该 Agent 的全部活跃任务 +3. 各 Agent 启动(或更新)本地的探测调度器 +4. 
探测结果以 `PingResult` 消息回传给 Server + +Server 在以下时机会向 Agent 同步探测任务: + +- Agent 首次连接或重连时 +- 管理员创建、修改或删除探测任务时 + +当某个 Agent 断开并重连后,Server 会自动重新同步其全部已分配的 Ping 任务。 + +## 界面反馈 + +在 Web 控制台中,创建、删除、启用、禁用 Ping 任务都会根据当前语言显示对应的成功或失败提示。启用 / 禁用请求发送期间,切换按钮会暂时禁用,以减少误触的重复提交。 + +## 分配执行节点 + +`server_ids_json` 字段控制哪些 Agent 执行探测: + +- **指定节点**:填写服务器 ID 数组,如 `["srv-1", "srv-2"]` +- **全部节点**:使用空数组 `[]` 或特殊值 `["*"]`,分配给所有在线 Agent + +从多个 Agent 执行同一探测可以: + +- 从不同地区同时探测同一目标,对比各地的网络质量 +- 排除单点网络问题的干扰,更准确地判断目标可用性 +- 当某个节点探测失败而其他节点成功时,将故障定位到探测节点一侧 + +## 查看结果与延迟图表 ### 探测记录 -每次探测的结果包含以下信息: +每次探测的结果会存入数据库,包含以下信息: | 字段 | 说明 | |------|------| -| 任务 ID | 关联的 Ping 任务 | -| 执行节点 | 执行探测的 Agent | -| 延迟 | 响应延迟,单位毫秒 | -| 成功/失败 | 探测是否成功 | -| 错误信息 | 失败时的错误描述 | -| 时间戳 | 探测执行时间 | +| `task_id` | 关联的 Ping 任务 | +| `server_id` | 执行探测的 Agent | +| `latency` | 往返延迟,单位毫秒 | +| `success` | 探测是否成功 | +| `error` | 失败时的错误信息 | +| `time` | 探测执行时间 | ### 延迟图表 -在 Ping 任务详情页面,可以查看每个执行节点的延迟趋势图。图表支持按时间范围筛选,帮助你分析网络质量变化趋势。 +在 Ping 任务详情页面,可以查看每个执行节点的延迟趋势图。图表支持按时间范围筛选,帮助你分析网络质量变化趋势。当多个节点探测同一目标时,结果会按节点拆分展示,并显示成功率百分比。 ### API 查询 @@ -147,31 +148,20 @@ GET /api/ping-tasks/:id/records?from=2026-03-13T00:00:00Z&to=2026-03-14T00:00:00 支持按时间范围和执行节点筛选。 -## 分配执行节点 - -Ping 任务可以分配给特定的 Agent 节点执行,也可以分配给所有节点: - -- **指定节点**:在创建任务时选择一个或多个 Agent,探测从这些节点发起 -- **全部节点**:选择所有 Agent,获得来自不同地理位置的探测视角 - -多节点探测的优势: - -- 从不同地区同时探测同一目标,对比各地的网络质量 -- 排除单点网络问题的干扰,更准确地判断目标可用性 -- 当某个节点探测失败而其他节点成功时,说明问题可能在探测节点一侧 +## 数据保留 -### 任务同步 +Ping 记录默认保留 **7 天**,可通过 `server.toml` 的 `retention.ping_records_days` 配置: -Server 在以下时机会向 Agent 同步探测任务: - -- Agent 首次连接或重连时 -- 管理员创建、修改或删除探测任务时 +```toml +[retention] +ping_records_days = 14 +``` -同步通过 `PingTasksSync` WebSocket 消息完成,Agent 收到后会更新本地的探测任务列表。 +清理任务每小时执行一次,移除超过保留期的记录。 -## 界面反馈 +## 与告警集成 -在 Web 控制台中,创建、删除、启用、禁用 Ping 任务都会根据当前语言显示对应的成功或失败提示。启用/禁用请求发送期间,切换按钮会暂时禁用,以减少误触发的重复提交。 +Ping 监控可以与告警规则结合使用。通过 `network_latency` 和 `network_packet_loss` 规则类型,可针对探测延迟或丢包配置告警,在关键目标性能下降或不可达时及时收到通知。Ping 数据与常规监控数据一并存储,可复用同一套告警基础设施。详见 
[告警与通知](/zh/docs/alerts)。 From b86a86dbd3c18daa7c7dd07778f02b06cf38c57e Mon Sep 17 00:00:00 2001 From: ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 18:34:28 +0800 Subject: [PATCH 20/21] docs: proofread and polish wording across guides --- apps/docs/content/docs/en/admin.mdx | 18 ++--- apps/docs/content/docs/en/capabilities.mdx | 48 +++++++------- apps/docs/content/docs/en/cost-insights.mdx | 2 +- apps/docs/content/docs/en/custom-widgets.mdx | 2 +- apps/docs/content/docs/en/dashboards.mdx | 2 +- apps/docs/content/docs/en/file-manager.mdx | 24 +++---- apps/docs/content/docs/en/firewall.mdx | 6 +- apps/docs/content/docs/en/ip-quality.mdx | 4 +- apps/docs/content/docs/en/mobile.mdx | 2 +- apps/docs/content/docs/en/quick-start.mdx | 2 +- apps/docs/content/docs/en/resource-usage.mdx | 2 +- apps/docs/content/docs/en/security-events.mdx | 2 +- apps/docs/content/docs/en/security.mdx | 12 ++-- apps/docs/content/docs/en/server.mdx | 2 +- .../docs/content/docs/en/service-monitors.mdx | 2 +- apps/docs/content/docs/en/status-page.mdx | 4 +- apps/docs/content/docs/zh/admin.mdx | 18 ++--- apps/docs/content/docs/zh/capabilities.mdx | 26 ++++---- apps/docs/content/docs/zh/file-manager.mdx | 22 +++---- apps/docs/content/docs/zh/ip-quality.mdx | 2 +- apps/docs/content/docs/zh/mobile.mdx | 14 ++-- apps/docs/content/docs/zh/quick-start.mdx | 7 +- apps/docs/content/docs/zh/resource-usage.mdx | 46 ++++++------- apps/docs/content/docs/zh/security-events.mdx | 4 +- apps/docs/content/docs/zh/security.mdx | 12 ++-- apps/docs/content/docs/zh/server.mdx | 65 ++++++------------- apps/docs/content/docs/zh/status-page.mdx | 2 +- 27 files changed, 163 insertions(+), 189 deletions(-) diff --git a/apps/docs/content/docs/en/admin.mdx b/apps/docs/content/docs/en/admin.mdx index 47bc0e63..f31f2612 100644 --- a/apps/docs/content/docs/en/admin.mdx +++ b/apps/docs/content/docs/en/admin.mdx @@ -8,7 +8,7 @@ This page covers features available only to administrators (Admin role). 
## User Management -ServerBee supports multiple users with two roles: +ServerBee supports multiple users across two roles: | Role | Permissions | |------|-------------| @@ -35,7 +35,7 @@ Go to Settings → Users: ## Audit Logs -ServerBee automatically records audit logs for critical operations, helping administrators track security events. +ServerBee automatically records audit logs for critical operations so administrators can track security events. ### Recorded Events @@ -58,7 +58,7 @@ Go to Settings → Audit Logs: GET /api/audit-logs?limit=50&offset=0 ``` -Returns a paginated audit log list. Each entry contains: +Returns a paginated list of audit records. Each entry contains: ```json { @@ -217,10 +217,10 @@ Use `transfer_in_cycle` / `transfer_out_cycle` / `transfer_all_cycle` alert type ### Cost Insights -Once `price` and `billing_cycle` are filled in, ServerBee derives cost-aware signals from the recorded billing inputs and the agent's reported resources, utilization, and uptime. Insights are surfaced in three places: +Once `price` and `billing_cycle` are set, ServerBee derives cost-aware signals from the recorded billing inputs and the agent's reported resources, utilization, and uptime. Insights are surfaced in three places: -- **Servers list cost cell** -- The `/servers` table shows the monthly-equivalent cost and a value grade chip per server, so cost lives next to CPU/memory/disk/traffic instead of being hidden in the edit dialog -- **Dashboard server card** -- Each server card shows a compact cost signal with a footnote summarizing burn rate and value grade. The signal is hidden when cost config is missing or invalid +- **Servers list cost cell** -- The `/servers` table shows each server's monthly-equivalent cost and a value-grade chip, so cost sits next to CPU/memory/disk/traffic instead of being hidden in the edit dialog +- **Dashboard server card** -- Each card shows a compact cost signal with a footnote summarizing burn rate and value grade. 
The signal is hidden when cost config is missing or invalid - **Per-server cost insights panel** -- The server detail page renders the full breakdown: monthly-equivalent cost, billing-cycle elapsed/remaining cost and burn percent, days remaining, normalized resource unit costs (per CPU core, per GB memory, per GB disk, per TB traffic limit), and the value score with reasons and confidence #### Value Score @@ -233,15 +233,15 @@ For automation, the same data is exposed read-only at `GET /api/cost/overview` ( ## Agent Registration Management -Admins can manage first-time agent onboarding directly from the UI: +Admins can manage first-time agent onboarding directly from the UI. ### Enrollment Codes -Go to **Settings** to mint a one-time enrollment code for a new agent. Each code is single-use and short-lived (default 10 minute expiry); it is consumed on the agent's first successful registration. You can list previously issued codes (the plaintext is never shown again, only an 8-character prefix and metadata) and delete unused ones. Already connected agents are unaffected because they use their own stored per-server tokens; a specific agent's run token can be rotated/revoked from the server detail actions, which forces that agent to reconnect. +Go to **Settings** to mint a one-time enrollment code for a new agent. Each code is single-use and short-lived (default 10-minute expiry) and is consumed on the agent's first successful registration. You can list previously issued codes (the plaintext is never shown again -- only an 8-character prefix and metadata) and delete unused ones. Already-connected agents are unaffected because they use their own stored per-server tokens; to revoke a specific agent, rotate its run token from the server detail actions, which forces that agent to reconnect. ### Clean Up Unconnected Placeholders -When failed onboarding leaves behind offline `New Server` placeholders, the Servers page shows a **Clean up unconnected** action. 
It removes only never-initialized offline placeholders and deliberately keeps online-but-uninitialized agents. +Failed onboarding can leave behind offline `New Server` placeholders. The Servers page then shows a **Clean up unconnected** action that removes only never-initialized offline placeholders and deliberately keeps online-but-uninitialized agents. diff --git a/apps/docs/content/docs/en/capabilities.mdx b/apps/docs/content/docs/en/capabilities.mdx index 24069523..3bef3255 100644 --- a/apps/docs/content/docs/en/capabilities.mdx +++ b/apps/docs/content/docs/en/capabilities.mdx @@ -4,11 +4,11 @@ description: Control which features each agent is allowed to use with per-server icon: ToggleRight --- -ServerBee supports per-agent capability toggles that let administrators control exactly which operations each server is allowed to perform, enforcing the principle of least privilege. +ServerBee provides per-agent capability toggles, letting administrators control exactly which operations each server is allowed to perform and enforcing the principle of least privilege. ## Capability List -ServerBee defines 11 capability bits, divided into two risk levels. The valid mask is `2047` (bits 0..=10). +ServerBee defines 11 capability bits across two risk levels. The valid mask is `2047` (bits 0..=10). ### High Risk (Disabled by Default) @@ -20,7 +20,7 @@ ServerBee defines 11 capability bits, divided into two risk levels. The valid ma | **Docker Management** | `CAP_DOCKER` (128) | Allow Docker container monitoring, log streaming, and container actions | -These capabilities involve executing arbitrary code or accessing the filesystem on the target server. They are disabled by default. Only enable them on trusted servers. +These capabilities allow executing arbitrary code or accessing the filesystem on the target server, so they are disabled by default. Enable them only on trusted servers. 
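As a rough illustration of how the capability bitmask values quoted in this file fit together, here is a minimal Rust sketch. It uses only constants stated in the docs (`CAP_DOCKER` = 128, `CAP_FIREWALL_BLOCK` = 512, `CAP_IP_QUALITY` = 1024, valid mask `2047`, default `1852`). The `u32` representation and the helper names `has_cap` and `effective` are assumptions made for this example, not ServerBee's actual source.

```rust
// Illustrative sketch only. The constant values are the ones quoted in the
// docs; the helper names and the u32 layout are hypothetical.
#![allow(dead_code)]

const CAP_DOCKER: u32 = 128;
const CAP_FIREWALL_BLOCK: u32 = 512;
const CAP_IP_QUALITY: u32 = 1024;

// 11 capability bits (0..=10) give a valid mask of 2047.
const VALID_MASK: u32 = (1 << 11) - 1;

// Documented default for newly registered agents.
const DEFAULT_CAPS: u32 = 1852;

/// Returns true when every bit in `cap` is set in `mask`.
fn has_cap(mask: u32, cap: u32) -> bool {
    (mask & cap) == cap
}

/// A bit is effective only if both the server-configured bitmap and the
/// agent-local bitmap allow it (the intersection of the two).
fn effective(server_caps: u32, agent_local_caps: u32) -> u32 {
    server_caps & agent_local_caps & VALID_MASK
}
```

Under these assumptions, `DEFAULT_CAPS | CAP_DOCKER` gives `1980` (the documented "default + Docker" value), and `effective(DEFAULT_CAPS | CAP_DOCKER, VALID_MASK & !CAP_DOCKER)` falls back to `1852`, because an agent that locally denies Docker removes that bit from the intersection.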
@@ -39,15 +39,15 @@ File Manager requires additional agent-side configuration (`root_paths`, `deny_p | **Firewall Blocklist** | `CAP_FIREWALL_BLOCK` (512) | Allow agent to apply the server-pushed nftables blocklist. Requires root or `CAP_NET_ADMIN` plus the `nft` CLI on the host. See [Firewall Blocklist](/en/docs/firewall) | | **IP Quality** | `CAP_IP_QUALITY` (1024) | Allow agent to query third-party IP quality APIs for outbound IP scoring | -Newly registered agents default to a capabilities value of `1852` (auto upgrade + three ping capabilities + security events + firewall blocklist + IP quality). +Newly registered agents default to a capabilities value of `1852` (auto upgrade + the three ping probes + security events + firewall blocklist + IP quality), which keeps the high-risk terminal, exec, file, and Docker capabilities off. ## Configuration ### Single Server 1. Go to Dashboard → click a server → server detail page -2. In the **Capabilities** section, use toggle switches to enable or disable features -3. Changes take effect immediately — the server pushes a `CapabilitiesSync` message to the agent via WebSocket +2. In the **Capabilities** section, use the toggle switches to enable or disable features +3. Changes take effect immediately -- the server pushes a `CapabilitiesSync` message to the agent over WebSocket ### Batch Configuration @@ -76,45 +76,45 @@ curl -X PUT https://your-server/api/servers/batch-capabilities \ -d '{"server_ids": ["id1", "id2"], "capabilities": 63}' ``` -The capabilities value is a bitwise OR of individual capability bits. Examples: +The capabilities value is a bitwise OR of the individual capability bits. 
Examples: - `1852` = Auto Upgrade + ICMP + TCP + HTTP + Security Events + Firewall Blocklist + IP Quality (default) - `2047` = all capabilities enabled (full valid mask, bits 0..=10) - `1980` = default + Docker - `1916` = default + File Manager - `316` = previous default (no firewall blocklist or IP quality) -- `60` = legacy default (no Security Events / firewall / IP quality) +- `60` = legacy default (no security events, firewall, or IP quality) - `0` = all capabilities disabled ## Defense in Depth -ServerBee validates capabilities on both the server side and agent side: +ServerBee validates capabilities on both the server side and the agent side. The effective capabilities are the intersection of the two -- if either side disables a bit, the operation is rejected. ### Server-Side Enforcement -- **Terminal**: WebSocket upgrade rejected with 403 +- **Terminal**: the WebSocket upgrade is rejected with 403 - **Exec**: `POST /api/tasks` and scheduled task runs filter out disabled servers and write synthetic results (`exit_code = -2`, message: "Capability 'exec' is disabled") - **Auto Upgrade**: `POST /api/servers/{id}/upgrade` returns 403 when `CAP_UPGRADE` is disabled -- **Ping and Traceroute**: Probe tasks are filtered by capability; traceroute requires effective `CAP_PING_ICMP` +- **Ping and Traceroute**: probe tasks are filtered by capability; traceroute requires effective `CAP_PING_ICMP` - **File Manager**: file endpoints reject requests before dispatch when `CAP_FILE` is disabled -- **Docker**: Docker read/action endpoints and Docker log WebSocket routes require `CAP_DOCKER` and agent runtime Docker support +- **Docker**: Docker read/action endpoints and the Docker log WebSocket routes require `CAP_DOCKER` and agent runtime Docker support ### Agent-Side Enforcement -Even if a server-side message is bypassed, the agent checks capabilities locally: +Even if a server-side message is bypassed, the agent re-checks capabilities locally: -- Returns a `CapabilityDenied` 
message for unauthorized commands -- The server writes a synthetic result (`exit_code = -1`) upon receiving `CapabilityDenied` +- It returns a `CapabilityDenied` message for unauthorized commands +- On receiving `CapabilityDenied`, the server writes a synthetic result (`exit_code = -1`) - Denial events are recorded in the audit log ### Real-Time Sync When an administrator changes capabilities: -1. Server sends `CapabilitiesSync` to the target agent via WebSocket -2. Agent atomically updates its local capabilities value using `AtomicU32` -3. Server sends `CapabilitiesChanged` to all connected browsers via WebSocket -4. Frontend updates the UI state in real time -5. If ping-related capability bits change, the server automatically re-syncs ping tasks +1. The server sends `CapabilitiesSync` to the target agent over WebSocket +2. The agent atomically updates its local capabilities value via `AtomicU32` +3. The server sends `CapabilitiesChanged` to all connected browsers over WebSocket +4. The frontend updates its UI state in real time +5. If ping-related bits change, the server automatically re-syncs ping tasks ## Frontend Behavior @@ -125,15 +125,15 @@ When an administrator changes capabilities: - **Files button**: Hidden for servers without `CAP_FILE`; clicking opens the file manager at `/files/{serverId}` - **Docker link**: Hidden for servers without `CAP_DOCKER`; clicking navigates to `/servers/{serverId}/docker` -## Server Config vs Client Lock +## Server Config vs. 
Client Lock -Runtime capability state now has three layers: +Runtime capability state has three layers: - `capabilities`: the server-configured bitmap stored in the database - `agent_local_capabilities`: the bitmap allowed by the running agent process -- `effective_capabilities`: the runtime intersection actually enforced by the system +- `effective_capabilities`: the runtime intersection actually enforced (`capabilities & agent_local_capabilities`) -When an agent locally disables a capability, the UI shows the toggle as disabled with the tooltip `客户端关闭`. This means the running agent has locked that capability off locally, and the server cannot turn it back on until the agent is restarted with a different local policy. +When an agent disables a capability locally, the UI shows the toggle as disabled with the tooltip `客户端关闭` ("disabled by client"). The running agent has locked that capability off, and the server cannot turn it back on until the agent restarts with a different local policy. diff --git a/apps/docs/content/docs/en/cost-insights.mdx b/apps/docs/content/docs/en/cost-insights.mdx index 17664f83..848e1ead 100644 --- a/apps/docs/content/docs/en/cost-insights.mdx +++ b/apps/docs/content/docs/en/cost-insights.mdx @@ -28,7 +28,7 @@ Every server with a valid billing configuration produces the fields below (expos ## Input validation -The score is skipped when any of the following are true. The reason is still surfaced via `invalid_reason` so the UI can flag the entry instead of silently scoring it as 0: +The score is skipped when any of the following are true. 
The reason is still surfaced via `invalid_reason` so the UI can flag the entry rather than silently scoring it as 0: | `invalid_reason` | Trigger | |------------------|---------| diff --git a/apps/docs/content/docs/en/custom-widgets.mdx b/apps/docs/content/docs/en/custom-widgets.mdx index 700a2b00..61ceea54 100644 --- a/apps/docs/content/docs/en/custom-widgets.mdx +++ b/apps/docs/content/docs/en/custom-widgets.mdx @@ -4,7 +4,7 @@ description: Author custom dashboard widgets with React + Zod, and install them icon: Puzzle --- -ServerBee dashboards natively support custom widget modules. Each widget is a standalone ES module that an admin installs once and then drops onto any dashboard, exactly like a built-in widget. This guide covers the two supported install methods: +ServerBee dashboards natively support custom widget modules. Each widget is a standalone ES module that an admin installs once and can then drop onto any dashboard, exactly like a built-in widget. This guide covers the two supported install methods: - **Method B** — a single `.js` file (one widget per file) - **Method C** — a `.zip` collection bundle (multiple widgets in one upload, sharing the same build output) diff --git a/apps/docs/content/docs/en/dashboards.mdx b/apps/docs/content/docs/en/dashboards.mdx index 1ac2fd27..ae7a38c3 100644 --- a/apps/docs/content/docs/en/dashboards.mdx +++ b/apps/docs/content/docs/en/dashboards.mdx @@ -73,7 +73,7 @@ The widget shows an installation prompt when GeoIP data is missing. ## Performance & Widget Capacity -The dashboard uses viewport-gated lazy mounting (IntersectionObserver): widgets off-screen do not mount their charts at first paint, so the cold-load cost barely grows with widget count. **While scrolling**, however, charts (built on recharts) measure SVG text synchronously (`getBBox` / text size). When many widgets enter the viewport at once, those forced reflows accumulate and the scroll noticeably stutters. 
+The dashboard uses viewport-gated lazy mounting (IntersectionObserver): off-screen widgets do not mount their charts at first paint, so cold-load cost barely grows with widget count. **While scrolling**, however, charts (built on recharts) measure SVG text synchronously (`getBBox` / text size). When many widgets enter the viewport at once, those forced reflows accumulate and the scroll noticeably stutters. Measured (desktop Chrome, no CPU/network throttling, widgets mixed across line-chart / gauge / multi-line / disk-io): diff --git a/apps/docs/content/docs/en/file-manager.mdx b/apps/docs/content/docs/en/file-manager.mdx index 9bd0bb11..88681159 100644 --- a/apps/docs/content/docs/en/file-manager.mdx +++ b/apps/docs/content/docs/en/file-manager.mdx @@ -4,18 +4,18 @@ description: Browse, read, edit, upload, download, and manage remote files throu icon: FolderOpen --- -File Manager provides controlled remote filesystem access through the ServerBee agent. It is intended for operational tasks such as checking logs, editing small configuration files, and transferring files without opening a full terminal. +File Manager provides controlled remote filesystem access through the ServerBee agent. It is meant for operational tasks such as checking logs, editing small configuration files, and transferring files without opening a full terminal. -File Manager is a high-risk feature. Enable it only on trusted servers and restrict `root_paths` to the minimum directories needed. +File Manager is a high-risk feature. Enable it only on trusted servers, and restrict `root_paths` to the minimum directories needed. ## Requirements File Manager must be enabled at two layers: -1. **Server-side capability:** enable `CAP_FILE` for the server in **Settings → Capabilities** or the server detail page. -2. **Agent-side policy:** set `[file].enabled = true` and configure at least one allowed root path. +1. 
**Server-side capability** -- enable `CAP_FILE` for the server in **Settings → Capabilities** or on the server detail page. +2. **Agent-side policy** -- set `[file].enabled = true` and configure at least one allowed root path. Example `agent.toml`: @@ -44,7 +44,7 @@ Open a server's action menu and click **Files**, or navigate directly to: /files/{serverId} ``` -The button is hidden when the server does not have `CAP_FILE` in its effective capabilities. +The button is hidden when `CAP_FILE` is not in the server's effective capabilities. ## Permissions @@ -53,7 +53,7 @@ The button is hidden when the server does not have `CAP_FILE` in its effective c | Admin | Browse, stat, read, write, upload, download, delete, move, create directories, cancel transfers | | Member | Browse, stat, read, download, list own transfers | -All high-risk file operations are recorded in the audit log, including denied attempts when the capability is disabled. +All high-risk file operations are recorded in the audit log, including attempts denied because the capability is disabled. ## Supported Operations @@ -74,11 +74,11 @@ All high-risk file operations are recorded in the audit log, including denied at The agent enforces path safety before touching the filesystem: -- `root_paths` is an allow-list. Empty `root_paths` rejects all file operations. -- Paths must resolve inside one of the configured roots. +- `root_paths` is an allow-list; an empty list rejects all file operations. +- Every path must resolve inside one of the configured roots. - `deny_patterns` blocks sensitive names such as private keys, `.env*`, `shadow`, and `passwd`. -- The agent also checks local capabilities, so server-side capability changes cannot override an agent-local deny. -- The server checks `CAP_FILE` before dispatching file messages to the agent. +- The agent also checks its local capabilities, so a server-side change cannot override an agent-local deny. 
+- The server checks `CAP_FILE` before dispatching any file message to the agent. ## Limits @@ -88,11 +88,11 @@ The agent enforces path safety before touching the filesystem: | Agent read/download max file size | 1 GB | Agent `[file].max_file_size` / `SERVERBEE_FILE__MAX_FILE_SIZE` | | Inline read chunk | 384 KB | Protocol limit to keep WebSocket frames below the configured max size | -Uploads and downloads are chunked. Downloads create a temporary transfer on the server and can be cancelled while pending or in progress. +Uploads and downloads are chunked. A download creates a temporary transfer on the server that can be cancelled while pending or in progress. ## API -Read endpoints are available to Admin and Member users. Write endpoints require Admin. +Read endpoints are available to both Admin and Member users. Write endpoints require Admin. | Method | Path | Description | |--------|------|-------------| diff --git a/apps/docs/content/docs/en/firewall.mdx b/apps/docs/content/docs/en/firewall.mdx index e1611f19..c54ded55 100644 --- a/apps/docs/content/docs/en/firewall.mdx +++ b/apps/docs/content/docs/en/firewall.mdx @@ -4,7 +4,7 @@ description: Block inbound traffic from IPs and CIDRs across one or more agents icon: Shield --- -ServerBee can centrally manage an inbound-traffic blocklist. The server holds the canonical list; each opted-in agent applies it via `nftables`. +ServerBee can centrally manage an inbound-traffic blocklist. The server holds the canonical list, and each opted-in agent applies it via `nftables`. ## Requirements @@ -68,7 +68,7 @@ table inet serverbee { } ``` -The server pushes incremental adds/removes over WebSocket. On agent reconnect or capability transition, the server sends a `Reset` followed by a full `Sync`. The agent acks the result of each entry; failures keep the row eligible for retry on the next sync. +The server pushes incremental adds/removes over WebSocket. 
On agent reconnect or capability transition, it sends a `Reset` followed by a full `Sync`. The agent acks the result of each entry; failed entries stay eligible for retry on the next sync. ## Removing the cleanup @@ -85,7 +85,7 @@ nft delete table inet serverbee ## Audit log -Every action — `firewall_block_created`, `firewall_block_deleted`, `firewall_block_applied_agent`, `firewall_block_removed_agent`, `firewall_block_rejected_server`, `firewall_block_rejected_agent`, `firewall_auto_block_skipped_conflict`, `firewall_reset_acked` — is recorded in the audit log and visible in the Firewall page's Activity tab. +Every action is recorded in the audit log and shown on the Firewall page's Activity tab: `firewall_block_created`, `firewall_block_deleted`, `firewall_block_applied_agent`, `firewall_block_removed_agent`, `firewall_block_rejected_server`, `firewall_block_rejected_agent`, `firewall_auto_block_skipped_conflict`, `firewall_reset_acked`. ## Limitations diff --git a/apps/docs/content/docs/en/ip-quality.mdx b/apps/docs/content/docs/en/ip-quality.mdx index c36d0f70..1baf9277 100644 --- a/apps/docs/content/docs/en/ip-quality.mdx +++ b/apps/docs/content/docs/en/ip-quality.mdx @@ -4,7 +4,7 @@ description: Check each agent's egress IP against streaming and AI services, and icon: Globe --- -IP Quality lets each agent assess the quality of its VPS egress IP and report the results back to the server. It does two things: +IP Quality lets each agent assess its VPS egress IP and report the results back to the server. It does two things: 1. **Service unlock detection** — the agent issues HTTP requests from its egress IP to determine the unlock status of popular streaming, AI, and social services. 2. **IP metadata and risk scoring** — the server derives country, ASN, and IP type from its local GeoIP database and, when a third-party provider is configured, computes a fraud risk score. 
@@ -37,7 +37,7 @@ sudo serverbee restart agent ## Built-In Services -Nine services are seeded at startup. Each has a hardcoded detector that issues an HTTP request from the agent's egress IP and interprets the response to determine unlock status. +Nine services are seeded at startup. Each ships with a hardcoded detector that issues an HTTP request from the agent's egress IP and interprets the response to determine unlock status. | Service | Category | Notes | |---------|----------|-------| diff --git a/apps/docs/content/docs/en/mobile.mdx b/apps/docs/content/docs/en/mobile.mdx index 0ecf81a4..aeb051d6 100644 --- a/apps/docs/content/docs/en/mobile.mdx +++ b/apps/docs/content/docs/en/mobile.mdx @@ -4,7 +4,7 @@ description: Native iOS app for ServerBee with QR pairing, push notifications, a icon: Smartphone --- -ServerBee provides a native iOS companion app that brings server monitoring to your mobile device. Get real-time metrics, receive push notifications for alerts, and manage your servers on the go. +ServerBee provides a native iOS companion app that brings server monitoring to your phone: real-time metrics, push notifications for alerts, and server management on the go. ## Features diff --git a/apps/docs/content/docs/en/quick-start.mdx b/apps/docs/content/docs/en/quick-start.mdx index 0a023e35..4715d7ca 100644 --- a/apps/docs/content/docs/en/quick-start.mdx +++ b/apps/docs/content/docs/en/quick-start.mdx @@ -4,7 +4,7 @@ description: Install the ServerBee server and agents in minutes with the one-lin icon: Zap --- -The recommended way to deploy ServerBee is the `deploy/install.sh` one-line script. It handles the tedious parts for you — architecture detection, binary downloads, service registration — so it works out of the box: +The recommended way to deploy ServerBee is the `deploy/install.sh` one-line script. 
It handles the tedious parts for you — architecture detection, binary downloads, and service registration — so it works out of the box: - **Interactive wizard** — run it with no arguments to pick the language, component, and install method through prompts - **Architecture aware** — detects amd64/arm64 and pulls the matching Release binary diff --git a/apps/docs/content/docs/en/resource-usage.mdx b/apps/docs/content/docs/en/resource-usage.mdx index a07e6d26..4beee0f8 100644 --- a/apps/docs/content/docs/en/resource-usage.mdx +++ b/apps/docs/content/docs/en/resource-usage.mdx @@ -29,7 +29,7 @@ Measurement environment: 4-core AMD EPYC 7B13 / 8 GB / Ubuntu 24.04 (KVM), Agent Memory rose from ~5 MB at cold start to ~27 MB over 8 hours (at 8h the current value ≈ the peak, so it **appears to be near a plateau, pending 24h confirmation**). Same root cause as the Server: the Agent is also Rust + tokio, and glibc's default multi-arena allocator hoards freed memory inside the process instead of returning it to the OS (sysinfo produces many small allocations every 3s). This is not a code-level leak. If steady-state memory needs to be reduced further, set `MALLOC_ARENA_MAX=2` on the Agent's systemd service. -For comparison: among comparable resident monitoring agents, Netdata often reaches tens to hundreds of MB, Telegraf ~30–50 MB, node_exporter ~15–20 MB. The ServerBee Agent's steady-state ~27 MB memory + <1% CPU is still in the lightweight tier. +For comparison: among comparable resident monitoring agents, Netdata often reaches tens to hundreds of MB, Telegraf ~30–50 MB, and node_exporter ~15–20 MB. The ServerBee Agent's steady-state ~27 MB memory + <1% CPU keeps it firmly in the lightweight tier. 
## Server diff --git a/apps/docs/content/docs/en/security-events.mdx b/apps/docs/content/docs/en/security-events.mdx index 13a2eac4..5bb4563a 100644 --- a/apps/docs/content/docs/en/security-events.mdx +++ b/apps/docs/content/docs/en/security-events.mdx @@ -4,7 +4,7 @@ description: Detect SSH logins, SSH brute-force attempts, and port scans on each icon: ShieldAlert --- -Agents detect host-level intrusion signals and stream structured events to the server. Raw logs never leave the host — only the parsed event metadata is reported. Events are browsable in the UI and can drive notifications through the standard alert pipeline. +Agents detect host-level intrusion signals and stream structured events to the server. Raw logs never leave the host — only parsed event metadata is reported. Events are browsable in the UI and can drive notifications through the standard alert pipeline. ## Event Types diff --git a/apps/docs/content/docs/en/security.mdx b/apps/docs/content/docs/en/security.mdx index 6b1c17bb..3df1ca90 100644 --- a/apps/docs/content/docs/en/security.mdx +++ b/apps/docs/content/docs/en/security.mdx @@ -4,7 +4,7 @@ description: Configure two-factor authentication, OAuth login, password manageme icon: Shield --- -ServerBee provides multiple layers of security including two-factor authentication (2FA), OAuth social login, password policies, and login rate limiting. +ServerBee layers several security controls: two-factor authentication (2FA), OAuth social login, password policies, and login rate limiting. ## Two-Factor Authentication (2FA) @@ -13,10 +13,10 @@ ServerBee supports TOTP (Time-based One-Time Password) based two-factor authenti ### Enabling 2FA 1. Log in and go to Settings → Security -2. In the "Two-Factor Authentication" section, click **Setup** -3. Scan the QR code (or manually enter the Base32 secret) -4. Generate a 6-digit verification code in your authenticator app -5. Enter the code and click **Enable** to complete setup +2. 
In the **Two-Factor Authentication** section, click **Setup** +3. Scan the QR code (or enter the Base32 secret manually) +4. Enter the 6-digit code from your authenticator app +5. Click **Enable** to finish setup Once enabled, a 6-digit TOTP code is required for every login. Codes refresh every 30 seconds. @@ -89,7 +89,7 @@ Passwords are hashed with argon2, following OWASP recommendations. ### First-Run Admin Credentials -On first start (when no users exist) ServerBee auto-creates an admin account with a randomly generated password and prints it once to the server/container logs as a highlighted credentials banner. There is no way to preset the username or password via environment variables. On first login you are required to change this password, and may optionally choose a different username at that time. +On first start (when no users exist), ServerBee auto-creates an admin account with a randomly generated password and prints it once to the server/container logs as a highlighted credentials banner. The username and password cannot be preset via environment variables. On first login you must change this password, and may optionally choose a different username. 
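For reference, the 6-digit codes that refresh every 30 seconds in the 2FA section follow the standard TOTP scheme (RFC 6238). The sketch below is illustrative only and is not ServerBee's implementation: it assumes HMAC-SHA1, the usual authenticator-app default, and the function name `totp` is hypothetical.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Compute a TOTP code (RFC 6238) from a Base32 secret.

    Assumption: HMAC-SHA1 with a 30-second step, as most authenticator apps use.
    """
    # Secrets are often displayed without padding; restore it before decoding.
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)
    key = base64.b32decode(padded)
    # The moving factor is the number of whole time steps since the Unix epoch.
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)
```

Because the counter only changes when the time step rolls over, any two calls inside the same 30-second window return the same code, which is why a code stays valid until the next refresh.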
## Login Security diff --git a/apps/docs/content/docs/en/server.mdx b/apps/docs/content/docs/en/server.mdx index d2d82fe3..e7f6345f 100644 --- a/apps/docs/content/docs/en/server.mdx +++ b/apps/docs/content/docs/en/server.mdx @@ -78,7 +78,7 @@ max_connections = 10 # Maximum SQLite connection pool size [auth] session_ttl = 86400 # Session lifetime in seconds (default: 24 hours) -max_servers = 0 # Soft limit for newly enrolled servers +max_servers = 0 # Soft limit for newly enrolled servers (0 = unlimited) secure_cookie = true # Set Secure flag on session cookies (disable for HTTP-only dev) [retention] diff --git a/apps/docs/content/docs/en/service-monitors.mdx b/apps/docs/content/docs/en/service-monitors.mdx index c393f8ba..7b2c58ca 100644 --- a/apps/docs/content/docs/en/service-monitors.mdx +++ b/apps/docs/content/docs/en/service-monitors.mdx @@ -4,7 +4,7 @@ description: Monitor SSL certificates, DNS records, HTTP keywords, TCP ports, an icon: Radar --- -Service Monitors are synthetic checks that run from the central ServerBee server. They let you monitor public-facing services even when those services are not running on a ServerBee agent host. +Service Monitors are synthetic checks that run from the central ServerBee server. Use them to monitor public-facing services even when those services do not run on a ServerBee agent host. Unlike Ping Monitoring, which asks agents to probe network targets, Service Monitors are evaluated by the server process. Results are stored in SQLite, shown in the dashboard, and can trigger notifications through notification groups. 
diff --git a/apps/docs/content/docs/en/status-page.mdx b/apps/docs/content/docs/en/status-page.mdx index 68189b9d..bbe4f488 100644 --- a/apps/docs/content/docs/en/status-page.mdx +++ b/apps/docs/content/docs/en/status-page.mdx @@ -4,7 +4,7 @@ description: Publish a public server health page with live metrics, incidents, m icon: Globe --- -ServerBee serves a single public status page at `https://your-server/status`. It is publicly accessible and requires no authentication, so you can share it with users or stakeholders to communicate service health. +ServerBee serves a single public status page at `https://your-server/status`. It requires no authentication, so you can share it with users or stakeholders to communicate service health. The page shows the servers and sections an administrator selects, backed by the public `GET /api/status/config` and `GET /api/status` endpoints. @@ -108,7 +108,7 @@ Incidents are public announcements for outages or degraded service. They can opt | `server_ids_json` | Optional affected servers | | `is_public` | Whether the incident is shown on the public status page | -An incident can carry multiple updates. Each update has its own `status` and `message`. Adding an update records the message and moves the incident to the update's status. Setting status to `resolved` also sets `resolved_at`. +An incident can carry multiple updates, each with its own `status` and `message`. Adding an update records the message and moves the incident to that update's status. Setting the status to `resolved` also sets `resolved_at`. 
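The update semantics described above can be sketched as a small in-memory model. This is a hedged illustration, not the server's code: the class and method names are hypothetical, and every status value other than `resolved` (for example `investigating`) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class IncidentUpdate:
    status: str
    message: str


@dataclass
class Incident:
    status: str
    resolved_at: Optional[datetime] = None
    updates: List[IncidentUpdate] = field(default_factory=list)

    def add_update(self, status: str, message: str, now: Optional[datetime] = None) -> None:
        # Record the message, then move the incident to the update's status.
        self.updates.append(IncidentUpdate(status, message))
        self.status = status
        # Resolving the incident also stamps resolved_at.
        if status == "resolved":
            self.resolved_at = now or datetime.now(timezone.utc)
```

The sketch deliberately leaves open what happens to `resolved_at` if a resolved incident later receives a non-resolved update, since the text above does not specify it.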
### API diff --git a/apps/docs/content/docs/zh/admin.mdx b/apps/docs/content/docs/zh/admin.mdx index 2c64fba0..bc5f0d83 100644 --- a/apps/docs/content/docs/zh/admin.mdx +++ b/apps/docs/content/docs/zh/admin.mdx @@ -58,7 +58,7 @@ ServerBee 自动记录关键操作的审计日志,帮助管理员追踪安全 GET /api/audit-logs?limit=50&offset=0 ``` -返回审计日志列表,支持分页参数 `limit` 和 `offset`。每条记录包含: +返回分页的审计记录列表,支持 `limit` 和 `offset` 参数。每条记录包含: ```json { @@ -219,9 +219,9 @@ GET /api/audit-logs?limit=50&offset=0 填写 `price` 和 `billing_cycle` 之后,ServerBee 会基于已记录的计费信息以及 Agent 上报的资源、利用率和在线时长,自动衍生出一组成本相关的信号。这些洞察会在三个位置展示: -- **服务器列表的成本单元格** -- `/servers` 表格在 CPU/内存/磁盘/流量旁边直接展示月度等价成本和价值评级,不必再到编辑弹窗里翻账单字段 -- **仪表盘 server card** -- 每张服务器卡片会显示一行紧凑的 cost 信号 + 注脚,概括 burn 速率和价值评级;如果计费配置缺失或非法会自动隐藏 -- **服务器详情的 cost insights 面板** -- 服务器详情页展示完整明细:月度等价成本、当前计费周期已消耗 / 剩余成本与 burn 百分比、剩余天数、归一化的资源单位成本(每 CPU 核 / 每 GB 内存 / 每 GB 磁盘 / 每 TB 流量限额),以及带原因和置信度的价值评分 +- **服务器列表的成本单元格** —— `/servers` 表格在 CPU/内存/磁盘/流量旁边直接展示每台服务器的月度等价成本和价值评级,不必再到编辑弹窗里翻找账单字段 +- **仪表盘服务器卡片** —— 每张卡片会显示一行紧凑的成本信号和注脚,概括消耗速率和价值评级;计费配置缺失或非法时自动隐藏 +- **服务器详情的成本洞察面板** —— 服务器详情页展示完整明细:月度等价成本、当前计费周期已消耗 / 剩余成本与消耗百分比、剩余天数、归一化的资源单位成本(每 CPU 核 / 每 GB 内存 / 每 GB 磁盘 / 每 TB 流量限额),以及带原因和置信度的价值评分 #### 价值评分 @@ -229,19 +229,19 @@ GET /api/audit-logs?limit=50&offset=0 #### API -同样的数据通过只读 API 暴露:`GET /api/cost/overview`(按币种汇总的舰队总览 + 每台服务器摘要)和 `GET /api/servers/{id}/cost-insights`(单台服务器的完整明细)。认证细节见 [API 参考](/zh/docs/api-reference#已认证读取端点)。 +同样的数据也通过只读 API 暴露,便于自动化使用:`GET /api/cost/overview`(按币种汇总的机群总览 + 每台服务器摘要)和 `GET /api/servers/{id}/cost-insights`(单台服务器的完整明细)。认证细节见 [API 参考](/zh/docs/api-reference#已认证读取端点)。 ## Agent 注册管理 -管理员现在可以直接在界面里管理首次接入流程: +管理员可以直接在界面里管理 Agent 的首次接入流程。 ### 一次性注册码 -进入「设置」页面,为新 Agent 生成一个一次性注册码。每个注册码单次使用且短时有效(默认 10 分钟过期),在 Agent 首次成功注册时被消费。可以查看已生成的注册码列表(不会再次显示明文,仅显示 8 位前缀和元信息)并删除未使用的注册码。已连接的 Agent 不受影响,因为它们使用各自已保存的每台服务器专属 token;如需吊销某个 Agent,可在服务器详情操作中轮换/吊销其 run token,吊销后该 Agent 会被强制重新连接。 +进入 **设置** 页面,为新 Agent 生成一个一次性注册码。每个注册码单次使用且短时有效(默认 10 分钟过期),在 Agent 
首次成功注册时被消费。你可以查看已生成的注册码列表(不会再次显示明文,仅显示 8 位前缀和元信息)并删除未使用的注册码。已连接的 Agent 不受影响,因为它们使用各自已保存的每台服务器专属 token;如需吊销某个 Agent,可在服务器详情操作中轮换其 run token,该 Agent 会被强制重新连接。 -### 清理未连接占位服务器 +### 清理未连接的占位服务器 -如果失败的接入流程留下了离线的 `New Server` 占位条目,`/servers` 页面会显示 **Clean up unconnected** 操作。它只删除从未完成初始化的离线占位服务器,已经在线但尚未上报 `SystemInfo` 的节点会被保留。 +接入失败可能会留下离线的 `New Server` 占位条目。此时 **服务器** 页面会显示 **Clean up unconnected** 操作,它只删除从未完成初始化的离线占位服务器,并刻意保留在线但尚未初始化的 Agent。 diff --git a/apps/docs/content/docs/zh/capabilities.mdx b/apps/docs/content/docs/zh/capabilities.mdx index 59de7130..892de278 100644 --- a/apps/docs/content/docs/zh/capabilities.mdx +++ b/apps/docs/content/docs/zh/capabilities.mdx @@ -4,11 +4,11 @@ description: 为每台服务器独立控制 Agent 的功能权限,实现最小 icon: ToggleRight --- -ServerBee 支持为每台 Agent 独立控制可用功能。通过功能开关(Capability Toggles),管理员可以精确控制每台服务器允许执行的操作,实现最小权限原则。 +ServerBee 支持为每台 Agent 独立控制可用功能。通过功能开关(Capability Toggles),管理员可以精确控制每台服务器允许执行的操作,落实最小权限原则。 ## 功能列表 -ServerBee 定义了 11 个功能位,分为两个风险等级。有效掩码为 `2047`(bits 0..=10)。 +ServerBee 定义了 11 个功能位,分为两个风险等级,有效掩码为 `2047`(bits 0..=10)。 ### 高风险功能(默认关闭) @@ -20,7 +20,7 @@ ServerBee 定义了 11 个功能位,分为两个风险等级。有效掩码为 | **Docker Management** | `CAP_DOCKER` (128) | 允许 Docker 容器监控、日志流、容器操作 | -这些功能涉及在目标服务器上执行任意代码或访问文件系统,因此默认关闭。请仅在信任的服务器上启用。 +这些功能涉及在目标服务器上执行任意代码或访问文件系统,因此默认关闭。请仅在可信服务器上启用。 @@ -39,7 +39,7 @@ ServerBee 定义了 11 个功能位,分为两个风险等级。有效掩码为 | **Firewall Blocklist** | `CAP_FIREWALL_BLOCK` (512) | 允许 Agent 应用 Server 下发的 nftables 黑名单。需要 root 或 `CAP_NET_ADMIN`,并在主机安装 `nft` CLI。详见 [防火墙黑名单](/zh/docs/firewall) | | **IP Quality** | `CAP_IP_QUALITY` (1024) | 允许 Agent 调用第三方 IP 质量 API 给出口 IP 评分 | -新注册的 Agent 默认 capabilities 值为 `1852`(自动升级 + 三个 Ping 功能 + 安全事件检测 + 防火墙黑名单 + IP 质量)。 +新注册的 Agent 默认 capabilities 值为 `1852`(自动升级 + 三个 Ping 探测 + 安全事件检测 + 防火墙黑名单 + IP 质量),并保持高风险的终端、Exec、文件和 Docker 功能关闭。 ## 配置方式 @@ -47,7 +47,7 @@ ServerBee 定义了 11 个功能位,分为两个风险等级。有效掩码为 1. 进入 Dashboard → 点击目标服务器 → 服务器详情页 2. 在 **Capabilities** 区域,使用 toggle 开关启用或禁用各项功能 -3. 
更改立即生效,Server 会通过 WebSocket 实时推送 `CapabilitiesSync` 消息到 Agent +3. 更改立即生效,Server 通过 WebSocket 实时推送 `CapabilitiesSync` 消息到 Agent ### 批量配置 @@ -82,25 +82,25 @@ capabilities 值是各功能位的按位或(OR)结果。例如: - `1980` = 默认值 + Docker - `1916` = 默认值 + 文件管理 - `316` = 上一版默认值(不含防火墙黑名单和 IP 质量) -- `60` = 旧版默认值(无安全事件检测 / 防火墙 / IP 质量) +- `60` = 旧版默认值(无安全事件检测、防火墙、IP 质量) - `0` = 全部功能禁用 ## 双重验证机制 -ServerBee 采用纵深防御(defense in depth)策略,在 Server 端和 Agent 端同时验证功能权限: +ServerBee 采用纵深防御(defense in depth)策略,在 Server 端和 Agent 端同时验证功能权限。有效能力是两者的交集,任一端禁用某个功能位,对应操作即被拒绝。 ### Server 端拦截 - **Terminal**:WebSocket 升级请求被 403 拦截 - **Exec**:`POST /api/tasks` 和计划任务运行会过滤无权限服务器,写入合成结果(`exit_code = -2`,提示 "Capability 'exec' is disabled") -- **Auto Upgrade**:`POST /api/servers/{id}/upgrade` 在未启用 `CAP_UPGRADE` 时返回 403 +- **Auto Upgrade**:未启用 `CAP_UPGRADE` 时,`POST /api/servers/{id}/upgrade` 返回 403 - **Ping 和 Traceroute**:按 capability 过滤探测任务;Traceroute 需要 effective `CAP_PING_ICMP` - **File Manager**:文件端点在下发前检查 `CAP_FILE`,未启用时直接拒绝 - **Docker**:Docker 读取/操作端点和 Docker 日志 WebSocket 需要 `CAP_DOCKER`,并要求 Agent 运行时支持 Docker ### Agent 端拒绝 -即使 Server 端消息被绕过,Agent 本地也会检查 capabilities: +即使 Server 端消息被绕过,Agent 本地也会再次检查 capabilities: - 收到不允许的命令时返回 `CapabilityDenied` 消息 - Server 收到 `CapabilityDenied` 后写入合成结果(`exit_code = -1`) @@ -114,7 +114,7 @@ ServerBee 采用纵深防御(defense in depth)策略,在 Server 端和 Age 2. Agent 使用 `AtomicU32` 原子更新本地 capabilities 值 3. Server 通过 WebSocket 发送 `CapabilitiesChanged` 到所有连接的浏览器 4. 前端实时更新 UI 状态 -5. 如果 Ping 相关 capability 发生变化,Server 自动触发 Ping 任务重同步 +5. 
若 Ping 相关功能位发生变化,Server 自动触发 Ping 任务重同步 ## 前端表现 @@ -127,13 +127,13 @@ ServerBee 采用纵深防御(defense in depth)策略,在 Server 端和 Age ## Server 配置位与客户端锁定 -运行时的 capability 现在分成三层: +运行时的 capability 分为三层: - `capabilities`:Server 数据库中配置的功能位 - `agent_local_capabilities`:当前运行中的 Agent 本地允许位 -- `effective_capabilities`:系统实际执行时使用的交集结果 +- `effective_capabilities`:系统实际执行时使用的交集(`capabilities & agent_local_capabilities`) -当 Agent 本地策略关闭某个能力时,UI 会把对应开关直接禁用,并显示 tooltip `客户端关闭`。这表示当前运行中的 Agent 已经在本地把它锁死,Server 端不能强行重新打开,除非 Agent 以新的本地策略重新启动。 +当 Agent 本地策略关闭某个功能时,UI 会把对应开关直接禁用,并显示 tooltip `客户端关闭`。这表示当前运行中的 Agent 已在本地把它锁死,Server 端无法强行重新打开,除非 Agent 以新的本地策略重新启动。 diff --git a/apps/docs/content/docs/zh/file-manager.mdx b/apps/docs/content/docs/zh/file-manager.mdx index 7029e26e..c829cc72 100644 --- a/apps/docs/content/docs/zh/file-manager.mdx +++ b/apps/docs/content/docs/zh/file-manager.mdx @@ -4,18 +4,18 @@ description: 通过 ServerBee 浏览、读取、编辑、上传、下载和管 icon: FolderOpen --- -文件管理器通过 ServerBee Agent 提供受控的远程文件系统访问能力。它适合查看日志、编辑小型配置文件、传输文件等运维场景,不必打开完整终端。 +文件管理器通过 ServerBee Agent 提供受控的远程文件系统访问能力,适合查看日志、编辑小型配置文件、传输文件等运维场景,无需打开完整终端。 -文件管理器属于高风险功能。请只在可信服务器上启用,并把 `root_paths` 限制到最小必要目录。 +文件管理器属于高风险功能。请仅在可信服务器上启用,并将 `root_paths` 限制到最小必要目录。 ## 启用条件 文件管理必须在两层同时启用: -1. **Server 端 capability:** 在 **Settings → Capabilities** 或服务器详情页启用 `CAP_FILE`。 -2. **Agent 本地策略:** 设置 `[file].enabled = true`,并配置至少一个允许访问的根目录。 +1. **Server 端 capability**:在 **Settings → Capabilities** 或服务器详情页启用 `CAP_FILE`。 +2. 
**Agent 本地策略**:设置 `[file].enabled = true`,并配置至少一个允许访问的根目录。 示例 `agent.toml`: @@ -44,7 +44,7 @@ SERVERBEE_FILE__MAX_FILE_SIZE=1073741824 /files/{serverId} ``` -当服务器的 effective capabilities 中没有 `CAP_FILE` 时,前端会隐藏 Files 按钮。 +当服务器的 effective capabilities 中不含 `CAP_FILE` 时,前端会隐藏 Files 按钮。 ## 权限 @@ -53,7 +53,7 @@ SERVERBEE_FILE__MAX_FILE_SIZE=1073741824 | Admin | 浏览、stat、读取、写入、上传、下载、删除、移动、新建目录、取消传输 | | Member | 浏览、stat、读取、下载、查看自己的传输 | -所有高风险文件操作都会写入审计日志。因 capability 关闭而被拒绝的尝试也会记录。 +所有高风险文件操作都会写入审计日志,包括因 capability 关闭而被拒绝的尝试。 ## 支持的操作 @@ -74,11 +74,11 @@ SERVERBEE_FILE__MAX_FILE_SIZE=1073741824 Agent 在访问文件系统前会执行路径安全检查: -- `root_paths` 是允许列表。空列表会拒绝所有文件操作。 +- `root_paths` 是允许列表;空列表会拒绝所有文件操作。 - 路径解析后必须位于某个允许根目录内。 - `deny_patterns` 会拒绝敏感名称,例如私钥、`.env*`、`shadow`、`passwd`。 -- Agent 同样检查本地 capability,因此 Server 端不能覆盖 Agent 本地拒绝策略。 -- Server 在下发文件消息前也会检查 `CAP_FILE`。 +- Agent 同样会检查本地 capability,因此 Server 端的改动无法覆盖 Agent 本地的拒绝策略。 +- Server 在下发任何文件消息前也会检查 `CAP_FILE`。 ## 限制 @@ -88,11 +88,11 @@ Agent 在访问文件系统前会执行路径安全检查: | Agent 读取/下载文件大小 | 1 GB | Agent `[file].max_file_size` / `SERVERBEE_FILE__MAX_FILE_SIZE` | | 内联读取分块 | 384 KB | 协议限制,用于保证 WebSocket 帧小于最大消息大小 | -上传和下载均采用分块传输。下载会在 Server 端创建临时传输,pending 或 in progress 状态时可以取消。 +上传和下载均采用分块传输。下载会在 Server 端创建临时传输,处于 pending 或 in progress 状态时可以取消。 ## API -读取类端点对 Admin 和 Member 可用。写入类端点需要 Admin。 +读取类端点对 Admin 和 Member 均可用。写入类端点需要 Admin。 | 方法 | 路径 | 说明 | |------|------|------| diff --git a/apps/docs/content/docs/zh/ip-quality.mdx b/apps/docs/content/docs/zh/ip-quality.mdx index 272c3269..6326ccaa 100644 --- a/apps/docs/content/docs/zh/ip-quality.mdx +++ b/apps/docs/content/docs/zh/ip-quality.mdx @@ -4,7 +4,7 @@ description: 检测 Agent 出口 IP 的流媒体和 AI 服务解锁情况,并 icon: Globe --- -IP 质量检测功能让每台 Agent 评估其 VPS 出口 IP 的质量,并将结果上报到 Server。它做两件事: +IP 质量检测让每台 Agent 评估自身 VPS 出口 IP,并将结果上报到 Server。它做两件事: 1. **服务解锁检测** — Agent 从自身出口 IP 发送 HTTP 请求,判断流媒体、AI、社交等热门服务的解锁状态。 2. 
**IP 元数据与风险评分** — Server 通过本地 GeoIP 数据库获取国家、ASN 和 IP 类型,并在配置了第三方提供商时计算欺诈风险分。 diff --git a/apps/docs/content/docs/zh/mobile.mdx b/apps/docs/content/docs/zh/mobile.mdx index 4f676031..cf74d1ff 100644 --- a/apps/docs/content/docs/zh/mobile.mdx +++ b/apps/docs/content/docs/zh/mobile.mdx @@ -4,7 +4,7 @@ description: ServerBee 原生 iOS 配套应用,支持二维码配对、推送 icon: Smartphone --- -ServerBee 提供原生 iOS 配套应用,将服务器监控带到您的移动设备。获取实时指标、接收推送通知告警,并随时随地管理您的服务器。 +ServerBee 提供原生 iOS 配套应用,把服务器监控带到你的手机:实时指标、告警推送通知,以及随时随地管理服务器。 ## 功能特性 @@ -36,7 +36,7 @@ refresh_ttl = 2592000 # 刷新令牌有效期(秒),默认 30 天 ### 2. 从 Web 应用开始配对 -1. 登录您的 ServerBee Web 应用 +1. 登录你的 ServerBee Web 应用 2. 进入 **设置** → **移动设备** 3. 点击 **配对新设备** 4. 显示一个 5 分钟有效期的二维码 @@ -46,7 +46,7 @@ refresh_ttl = 2592000 # 刷新令牌有效期(秒),默认 30 天 1. 打开 ServerBee iOS 应用 2. 在欢迎屏幕点击 **扫描二维码** 3. 将相机对准 Web 应用中显示的二维码 -4. 应用将自动认证并显示您的服务器列表 +4. 应用将自动认证并显示你的服务器列表 ## 管理已配对设备 @@ -77,17 +77,17 @@ refresh_ttl = 2592000 # 刷新令牌有效期(秒),默认 30 天 1. 进入 **设置** → **通知** 2. 添加类型为 **APNs** 的新通知渠道 -3. 上传您的 APNs 认证密钥(来自 Apple Developer Portal 的 .p8 文件) -4. 输入您的 Team ID、Key ID 和 Bundle ID +3. 上传你的 APNs 认证密钥(来自 Apple Developer Portal 的 .p8 文件) +4. 输入你的 Team ID、Key ID 和 Bundle ID 5. 
使用 **测试** 按钮测试配置 ### APNs 必填字段 | 字段 | 说明 | |------|------| -| Team ID | 您的 Apple Developer Team 标识符(10 位字符) | +| Team ID | 你的 Apple Developer Team 标识符(10 位字符) | | Key ID | Apple Developer Portal 中 APNs 认证密钥的 Key ID | -| Bundle ID | 您的 iOS 应用 Bundle 标识符(如 `com.example.serverbee`) | +| Bundle ID | 你的 iOS 应用 Bundle 标识符(如 `com.example.serverbee`) | | Private Key | Apple Developer Portal 中 .p8 文件的内容 | ### 创建 APNs 密钥 diff --git a/apps/docs/content/docs/zh/quick-start.mdx b/apps/docs/content/docs/zh/quick-start.mdx index 8a8ae900..79a3dab7 100644 --- a/apps/docs/content/docs/zh/quick-start.mdx +++ b/apps/docs/content/docs/zh/quick-start.mdx @@ -208,7 +208,8 @@ docker compose logs serverbee-server - - - + + + + diff --git a/apps/docs/content/docs/zh/resource-usage.mdx b/apps/docs/content/docs/zh/resource-usage.mdx index d4eb48da..517a0ba4 100644 --- a/apps/docs/content/docs/zh/resource-usage.mdx +++ b/apps/docs/content/docs/zh/resource-usage.mdx @@ -4,70 +4,70 @@ description: ServerBee Agent 与 Server 的 CPU、内存、磁盘、网络运行 icon: Gauge --- -本页记录 ServerBee **Agent** 与 **Server** 在运行时的资源开销实测数据,涵盖 CPU、内存、磁盘和网络。磁盘(数据库)增长的详细容量规划见 [存储与容量规划](/zh/docs/storage-sizing)。 +本页记录 ServerBee **Agent** 与 **Server** 运行时的资源开销实测数据,涵盖 CPU、内存、磁盘和网络。磁盘(数据库)增长的详细容量规划见 [存储与容量规划](/zh/docs/storage-sizing)。 -以下数值为实测,非估算。除非另有说明,均测于 2026-05-19,使用 v0.9.3。资源占用会随采集间隔、连接的 Agent 数量、是否启用 GPU/温度采集以及面板查询负载浮动,这里给出的是稳定的数量级参考。 +以下数值为实测值,非估算。除非另有说明,均测于 2026-05-19,使用 v0.9.3。资源占用会随采集间隔、连接的 Agent 数量、是否启用 GPU/温度采集以及面板查询负载浮动,这里给出的是稳定的数量级参考。 ## Agent -Agent 是轻量探针,CPU 开销可忽略,内存稳态在数十 MB 级别。 +Agent 是轻量探针,CPU 开销可忽略,内存稳态在数十 MB 级别。 -| 资源 | 冷启动(~90s) | 稳态(运行 8h) | 说明 | +| 资源 | 冷启动(~90s) | 稳态(运行 8h) | 说明 | |------|--------------|----------------|------| -| 内存(systemd cgroup,真实私有内存) | ~4.3 MB,峰值 ~5.1 MB | **~27 MB,峰值 ~28 MB** | 最准确的内存口径 | -| 内存(`ps` RSS,含共享库映射) | ~10 MB | **~34 MB** | 偏高估,含共享库 | -| CPU | ~0.7%(单核) | ~0.5%(单核) | 稳定,含每 3s 采集瞬时尖峰 | +| 内存(systemd cgroup,真实私有内存) | ~4.3 MB,峰值 ~5.1 MB | **~27 MB,峰值 ~28 MB** | 最准确的内存口径 | +| 
内存(`ps` RSS,含共享库映射) | ~10 MB | **~34 MB** | 偏高估,含共享库 | +| CPU | ~0.7%(单核) | ~0.5%(单核) | 稳定,含每 3s 采集的瞬时尖峰 | | 线程数 | 10 | 9–10 | tokio runtime | | 二进制大小 | ~11 MB | — | linux-amd64 release | -| 网络 | 每个采集间隔(默认 3s)一条 `SystemReport` JSON + 周期性 ping/pong;数量级 ≤ 1–2 KB/s | — | 随采集间隔与启用指标项变化 | +| 网络 | 每个采集间隔(默认 3s)一条 `SystemReport` JSON + 周期性 ping/pong;数量级 ≤ 1–2 KB/s | — | 随采集间隔与启用指标项变化 | -测量环境:4 核 AMD EPYC 7B13 / 8 GB / Ubuntu 24.04(KVM),Agent v0.9.3,采集间隔 3s,启用温度采集,经 HTTPS WebSocket 上报至远端 Server。 +测量环境:4 核 AMD EPYC 7B13 / 8 GB / Ubuntu 24.04(KVM),Agent v0.9.3,采集间隔 3s,启用温度采集,经 HTTPS WebSocket 上报至远端 Server。 -内存从冷启动 ~5 MB 在 8 小时内升至 ~27 MB(8h 时当前值 ≈ 峰值,**疑似已接近平台,待 24h 数据确认**)。与 Server 同源:Agent 也是 Rust + tokio,默认 glibc 多 arena 分配器会把已释放内存攒在进程内不归还 OS(sysinfo 每 3s 采集产生大量小分配)。这不是代码层泄漏。若稳态内存需进一步压低,可给 Agent 的 systemd 服务设置 `MALLOC_ARENA_MAX=2`。 +内存从冷启动 ~5 MB 在 8 小时内升至 ~27 MB(8h 时当前值 ≈ 峰值,**疑似已接近平台,待 24h 数据确认**)。与 Server 同源:Agent 也是 Rust + tokio,glibc 默认的多 arena 分配器会把已释放内存攒在进程内不归还 OS(sysinfo 每 3s 采集产生大量小分配)。这不是代码层泄漏。若需进一步压低稳态内存,可给 Agent 的 systemd 服务设置 `MALLOC_ARENA_MAX=2`。 -对比参考:同类常驻监控 Agent 中,Netdata 常达数十~上百 MB,Telegraf ~30–50 MB,node_exporter ~15–20 MB。ServerBee Agent 稳态 ~27 MB 内存 + <1% CPU,仍属轻量一档。 +对比参考:同类常驻监控 Agent 中,Netdata 常达数十至上百 MB,Telegraf ~30–50 MB,node_exporter ~15–20 MB。ServerBee Agent 稳态 ~27 MB 内存 + <1% CPU,仍属轻量一档。 ## Server Server 内存开销主要受连接的 Agent 数量、面板/API 查询负载和数据保留策略影响。 -### 生产参考(Railway,真实负载) +### 生产参考(Railway,真实负载) -接入少量 Agent 的生产实例,在配置 `MALLOC_ARENA_MAX=2`(抑制 glibc 多 arena 内存驻留)后: +接入少量 Agent 的生产实例,在配置 `MALLOC_ARENA_MAX=2`(抑制 glibc 多 arena 内存驻留)后: | 资源 | 实测值 | 说明 | |------|--------|------| -| 内存 | 稳态 ~140–170 MB,长时间平稳无持续爬升 | 健康水位 | +| 内存 | 稳态 ~140–170 MB,长时间平稳无持续爬升 | 健康水位 | -未设置 `MALLOC_ARENA_MAX=2`(或未改用 jemalloc)时,glibc 默认多 arena 会把已释放内存攒在进程内不归还 OS,长期运行 RSS 会持续爬升(观测到 2–3 天涨至 ~800 MB–1 GB)。这并非代码层内存泄漏,但部署 Server 时建议设置该环境变量。 +未设置 `MALLOC_ARENA_MAX=2`(或未改用 jemalloc)时,glibc 默认的多 arena 分配器会把已释放内存攒在进程内不归还 OS,长期运行 RSS 会持续爬升(观测到 2–3 天涨至 ~800 MB–1 GB)。这并非代码层内存泄漏,但部署 Server 
时建议设置该环境变量。 ### 自托管空载参考 -同一 VPS 上近乎空载(无活跃 Agent 连接)的 Server 实例: +同一 VPS 上近乎空载(无活跃 Agent 连接)的 Server 实例: | 资源 | 实测值 | 说明 | |------|--------|------| -| 内存(systemd cgroup) | ~15 MB,峰值 ~23.8 MB | 空载下限参考 | +| 内存(systemd cgroup) | ~15 MB,峰值 ~23.8 MB | 空载下限参考 | | 线程数 | 7 | tokio runtime | | CPU | 空载近乎为 0 | 随 Agent 数与查询负载上升 | -| 二进制大小 | ~47 MB | linux-amd64 release,内嵌前端 SPA | +| 二进制大小 | ~47 MB | linux-amd64 release,内嵌前端 SPA | ### 磁盘 -Server 磁盘占用来自 SQLite 数据库,随服务器数量、启用功能和保留策略增长。完整的 30 天容量公式与场景目录见 [存储与容量规划](/zh/docs/storage-sizing)。生产环境请在数据库基础大小上额外预留 10%–20% 作为 WAL 与突发写入缓冲。 +Server 磁盘占用来自 SQLite 数据库,随服务器数量、启用功能和保留策略增长。完整的 30 天容量公式与场景目录见 [存储与容量规划](/zh/docs/storage-sizing)。生产环境请在数据库基础大小上额外预留 10%–20% 作为 WAL 与突发写入的缓冲。 ## 注意事项 -- Railway 内存数值依赖已设置 `MALLOC_ARENA_MAX=2`;未设置时不适用。 -- 自托管 Server 数值测于近乎空载状态,**不代表**有大量 Agent 或高查询负载下的占用,仅作下限参考。 -- Agent 网络用量为按协议行为估算的数量级,实际值随采集间隔与启用的指标项变化。 -- 所有数值为数量级参考,用于容量规划与异常判断(关注**趋势**而非绝对值:稳态平稳 = 健康,持续爬升不回落 = 需排查)。 +- Railway 内存数值依赖已设置 `MALLOC_ARENA_MAX=2`,未设置时不适用。 +- 自托管 Server 数值测于近乎空载状态,**不代表**大量 Agent 或高查询负载下的占用,仅作下限参考。 +- Agent 网络用量为按协议行为估算的数量级,实际值随采集间隔与启用的指标项变化。 +- 所有数值均为数量级参考,用于容量规划与异常判断(关注**趋势**而非绝对值:稳态平稳 = 健康,持续爬升不回落 = 需排查)。 ## 相关文档 diff --git a/apps/docs/content/docs/zh/security-events.mdx b/apps/docs/content/docs/zh/security-events.mdx index 1dcde8a7..9590a796 100644 --- a/apps/docs/content/docs/zh/security-events.mdx +++ b/apps/docs/content/docs/zh/security-events.mdx @@ -4,7 +4,7 @@ description: 在 Agent 上检测 SSH 登录、SSH 爆破和端口扫描,并将 icon: ShieldAlert --- -Agent 在每台主机上检测三类主机级入侵信号,并将结构化事件流式上报到 Server。原始日志不会离开主机,只有解析后的事件元数据被上报。事件既可在控制台浏览,也可接入标准告警通道触发通知。 +Agent 在每台主机上检测三类主机级入侵信号,并将结构化事件流式上报到 Server。原始日志不会离开主机,仅上报解析后的事件元数据。事件可在控制台浏览,也可接入标准告警通道触发通知。 ## 事件类型 @@ -94,7 +94,7 @@ Alerts 页面提供三张 **预设卡片**,一键创建规则。预设已填 ### 去重 -通知按 `(rule_id, server_id, event_key)` 维度去重。`event_key` 包含源 IP,所以: +通知按 `(rule_id, server_id, event_key)` 维度去重。`event_key` 包含源 IP,因此: - 两个不同攻击者命中同一台服务器 → **两次** 通知。 - 同一攻击者在去重窗口内反复触发 → **一次** 通知。 diff --git 
a/apps/docs/content/docs/zh/security.mdx b/apps/docs/content/docs/zh/security.mdx index 8566d88a..54ae1045 100644 --- a/apps/docs/content/docs/zh/security.mdx +++ b/apps/docs/content/docs/zh/security.mdx @@ -4,7 +4,7 @@ description: 配置双因素认证、OAuth 登录、密码管理和登录安全 icon: Shield --- -ServerBee 提供多层安全防护,包括双因素认证 (2FA)、OAuth 社交登录、密码策略和登录限流。 +ServerBee 提供多层安全防护:双因素认证 (2FA)、OAuth 社交登录、密码策略和登录限流。 ## 双因素认证 (2FA) @@ -13,10 +13,10 @@ ServerBee 支持基于 TOTP (Time-based One-Time Password) 的双因素认证, ### 启用 2FA 1. 登录后进入 Settings → Security -2. 在 "Two-Factor Authentication" 区域点击 **Setup** -3. 扫描显示的 QR 码(或手动输入 Base32 密钥) -4. 在认证器应用中生成 6 位数验证码 -5. 输入验证码点击 **Enable** 完成启用 +2. 在 **Two-Factor Authentication** 区域点击 **Setup** +3. 扫描 QR 码(或手动输入 Base32 密钥) +4. 输入认证器应用中显示的 6 位验证码 +5. 点击 **Enable** 完成启用 启用后每次登录都需要输入 6 位 TOTP 验证码。验证码每 30 秒更新一次。 @@ -89,7 +89,7 @@ client_secret = "your-github-client-secret" ### 首次启动管理员凭据 -首次启动(数据库中没有任何用户)时,ServerBee 会自动创建管理员账号,随机生成密码,并以醒目的凭据横幅在 Server/容器日志中打印一次。无法通过环境变量预设用户名或密码。首次登录时你将被要求修改此密码,并可在此时选择一个新的用户名。 +首次启动(数据库中没有任何用户)时,ServerBee 会自动创建管理员账号,随机生成密码,并以醒目的凭据横幅在 Server/容器日志中打印一次。用户名和密码无法通过环境变量预设。首次登录时你必须修改此密码,并可选择一个新的用户名。 ## 登录安全 diff --git a/apps/docs/content/docs/zh/server.mdx b/apps/docs/content/docs/zh/server.mdx index d981a62a..f0a5cd07 100644 --- a/apps/docs/content/docs/zh/server.mdx +++ b/apps/docs/content/docs/zh/server.mdx @@ -25,13 +25,11 @@ curl -fsSL https://raw.githubusercontent.com/ZingerLittleBee/ServerBee/main/depl ### 二进制安装(手动) -从 [GitHub Releases](https://github.com/ZingerLittleBee/ServerBee/releases) 下载对应平台的预编译二进制文件自行运行: +从 [GitHub Releases](https://github.com/ZingerLittleBee/ServerBee/releases) 下载对应平台的预编译二进制并自行运行: ```bash -# Linux amd64 -wget https://github.com/ZingerLittleBee/ServerBee/releases/latest/download/serverbee-server-linux-amd64 -chmod +x serverbee-server-linux-amd64 -sudo mv serverbee-server-linux-amd64 /usr/local/bin/serverbee-server +chmod +x serverbee-server +./serverbee-server ``` ### Docker @@ -41,16 +39,12 @@ docker run -d \ 
--name serverbee \ -p 9527:9527 \ -v serverbee-data:/data \ - --restart unless-stopped \ ghcr.io/zingerlittlebee/serverbee-server:latest ``` ### 源码编译 ```bash -git clone https://github.com/ZingerLittleBee/ServerBee.git -cd ServerBee - # 先构建前端(会嵌入到 Server 二进制中) cd apps/web && bun install && bun run build && cd ../.. @@ -157,7 +151,7 @@ export SERVERBEE_GEOIP__MMDB_PATH="/path/to/GeoLite2-City.mmdb" ## 数据库 -ServerBee 使用 SQLite 存储所有持久化数据。首次启动时会自动在 `data_dir` 下创建数据库文件。启动时自动运行数据库迁移,无需手动维护表结构。 +ServerBee 使用 SQLite 存储所有持久化数据。首次启动时会自动在 `data_dir` 下创建数据库文件。 以下 SQLite pragma 会自动设置: @@ -168,6 +162,8 @@ ServerBee 使用 SQLite 存储所有持久化数据。首次启动时会自动 | `busy_timeout` | 5000ms | 数据库被锁时最多等待 5 秒 | | `foreign_keys` | ON | 强制外键引用完整性 | +启动时自动运行数据库迁移,无需手动维护表结构。 + ## 初始管理员账户 Server 首次启动时,如果 `users` 表为空,会自动创建管理员账户。这里没有用户名/密码环境变量:密码始终随机生成,并以醒目的凭据横幅在 Server/容器日志中打印一次。 @@ -278,17 +274,21 @@ scopes = ["openid", "email", "profile"] ## 反向代理 -ServerBee 本身不处理 TLS,生产环境建议使用反向代理来提供 HTTPS 支持。 + +不想手动配反向代理的话,安装脚本可以全自动完成这件事:安装时加 `--domain monitor.example.com --email admin@example.com`,或对已装好的 Server 执行 `sudo serverbee domain setup --domain monitor.example.com --email admin@example.com`。脚本会校验 DNS、装好 Caddy、写 Caddyfile 并签发 HTTPS 证书,同时把 `auth.secure_cookie` 设为 `true`。下面的手动配置仅在你想自己掌控反向代理时才需要。 + + +在反向代理后运行时,必须正确转发 WebSocket 连接。 -### Nginx 示例 +### Nginx -```nginx title="/etc/nginx/sites-available/serverbee" +```nginx server { listen 443 ssl http2; - server_name monitor.example.com; + server_name serverbee.example.com; - ssl_certificate /etc/letsencrypt/live/monitor.example.com/fullchain.pem; - ssl_certificate_key /etc/letsencrypt/live/monitor.example.com/privkey.pem; + ssl_certificate /etc/ssl/certs/serverbee.pem; + ssl_certificate_key /etc/ssl/private/serverbee.key; location / { proxy_pass http://127.0.0.1:9527; @@ -296,49 +296,22 @@ server { proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Proto $scheme; - } - # WebSocket 支持(必须配置) - 
location /api/ws/ { - proxy_pass http://127.0.0.1:9527; + # WebSocket 支持 proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "upgrade"; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_read_timeout 86400s; - proxy_send_timeout 86400s; - } - - # Agent WebSocket - location /api/agent/ws { - proxy_pass http://127.0.0.1:9527; - proxy_http_version 1.1; - proxy_set_header Upgrade $http_upgrade; - proxy_set_header Connection "upgrade"; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; proxy_read_timeout 86400s; proxy_send_timeout 86400s; } } - -server { - listen 80; - server_name monitor.example.com; - return 301 https://$host$request_uri; -} ``` -WebSocket 路径(`/api/ws/` 和 `/api/agent/ws`)必须正确配置 `Upgrade` 和 `Connection` 头部,否则实时数据推送和 Agent 连接将无法工作。同时建议将 `proxy_read_timeout` 设置为较大值(如 86400s),防止长连接被反向代理主动断开。 +较大的 `proxy_read_timeout` 和 `proxy_send_timeout` 对 WebSocket 连接很重要。不配置的话,Nginx 可能过早关闭空闲连接,导致 Agent 和终端会话断开。 -### Caddy 示例 - - -不想手动配反向代理的话,安装脚本可以全自动完成这件事:安装时加 `--domain monitor.example.com --email admin@example.com`,或对已装好的 Server 执行 `sudo serverbee domain setup --domain monitor.example.com --email admin@example.com`。脚本会校验 DNS、装好 Caddy、写 Caddyfile 并签发 HTTPS 证书,同时把 `auth.secure_cookie` 设为 `true`。下面的手动配置仅在你想自己掌控反向代理时才需要。 - +### Caddy Caddy 会自动处理 HTTPS 证书和 WebSocket 代理,配置更加简洁: diff --git a/apps/docs/content/docs/zh/status-page.mdx b/apps/docs/content/docs/zh/status-page.mdx index d066ca21..41fda0ee 100644 --- a/apps/docs/content/docs/zh/status-page.mdx +++ b/apps/docs/content/docs/zh/status-page.mdx @@ -4,7 +4,7 @@ description: 发布包含实时指标、事件公告、维护窗口和可用性 icon: Globe --- -ServerBee 提供一个公开状态页,地址为 `https://your-server/status`。它无需登录即可访问,方便你向用户或相关方公示服务健康状态。 +ServerBee 提供一个公开状态页,地址为 `https://your-server/status`。它无需认证即可访问,方便你向用户或相关方公示服务健康状态。 页面展示管理员选定的服务器和模块,数据来自公开端点 `GET /api/status/config` 和 `GET /api/status`。 From 198e115e90978805e4c8c61dc173222c44651a95 Mon Sep 17 00:00:00 2001 From: 
ZingerLittleBee <6970999@gmail.com> Date: Sun, 31 May 2026 18:57:49 +0800 Subject: [PATCH 21/21] =?UTF-8?q?docs(env):=20sync=20ENV.md/docs=20with=20?= =?UTF-8?q?code,=20fix=20cn=E2=86=92zh=20paths=20and=20ipapi=5Fis=20endpoi?= =?UTF-8?q?nt=20default?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- AGENTS.md | 4 ++-- ENV.md | 4 ++-- apps/docs/content/docs/en/configuration.mdx | 4 ++-- apps/docs/content/docs/zh/configuration.mdx | 4 ++-- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index f9653ecb..b7850eaa 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -90,7 +90,7 @@ RBAC: Admin (full access) vs Member (read-only). `require_admin` middleware on w - **Errors**: `AppError` enum → automatic HTTP status code mapping via `IntoResponse` - **API responses**: All endpoints return `Json>` wrapping data in `{ data: T }` - **OpenAPI**: Every endpoint annotated with `#[utoipa::path]`, every DTO with `#[derive(ToSchema)]`. Swagger UI at `/swagger-ui/` -- **Config**: Figment loads TOML then env vars. Prefix `SERVERBEE_`, nested separator `__` (double underscore). Example: `SERVERBEE_ADMIN__PASSWORD` → `admin.password`. **When adding/changing env vars, update `ENV.md` and `apps/docs/content/docs/{en,cn}/configuration.mdx` simultaneously.** +- **Config**: Figment loads TOML then env vars. Prefix `SERVERBEE_`, nested separator `__` (double underscore). Example: `SERVERBEE_ADMIN__PASSWORD` → `admin.password`. **When adding/changing env vars, update `ENV.md` and `apps/docs/content/docs/{en,zh}/configuration.mdx` simultaneously.** - **Capabilities**: u32 bitmask per server, defined in `crates/common/src/constants.rs` — `CAP_TERMINAL=1, CAP_EXEC=2, CAP_UPGRADE=4, CAP_PING_ICMP=8, CAP_PING_TCP=16, CAP_PING_HTTP=32, CAP_FILE=64, CAP_DOCKER=128, CAP_SECURITY_EVENTS=256, CAP_FIREWALL_BLOCK=512, CAP_IP_QUALITY=1024`. 
Default `CAP_DEFAULT=1852` (upgrade + ICMP/TCP/HTTP ping + security events + firewall blocklist + IP quality). Effective caps = `server_caps & agent_local_caps`; defense-in-depth validated on both sides. - **Migrations**: sea-orm migrations in `crates/server/src/migration/`. Run automatically on startup. **Only implement `up()` — leave `down()` as a no-op (`Ok(())`).** Migrations are not reversible to avoid accidental data loss. @@ -119,7 +119,7 @@ E2E manual verification checklists are in `tests/` directory, organized by featu ## Documentation -- **Fumadocs site**: `apps/docs/content/docs/{cn,en}/` — 16 MDX pages per language +- **Fumadocs site**: `apps/docs/content/docs/{en,zh}/` — 16 MDX pages per language - **OpenAPI**: Auto-generated at `/swagger-ui/` and `/api-docs/openapi.json` - **Architecture spec**: `docs/superpowers/specs/2026-03-12-serverbee-architecture-design.md` - **Progress tracking**: `docs/superpowers/plans/PROGRESS.md` diff --git a/ENV.md b/ENV.md index 2afc71af..6a38a1dd 100644 --- a/ENV.md +++ b/ENV.md @@ -6,7 +6,7 @@ Example: TOML `server.listen` → env var `SERVERBEE_SERVER__LISTEN` > **First-run admin account**: There is no admin username/password env var. On first start (when no users exist) the server auto-creates an admin account with a randomly generated password and prints it once to the server/container logs as a highlighted credentials banner — capture it from the logs. You must change this password on first login, and may optionally choose a different username at that time. -> **Maintainer Note**: When adding or modifying environment variables, update both this file and `apps/docs/content/docs/{en,cn}/configuration.mdx`. +> **Maintainer Note**: When adding or modifying environment variables, update both this file and `apps/docs/content/docs/{en,zh}/configuration.mdx`. 
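As a quick illustration of the naming rule (prefix `SERVERBEE_`, nested separator `__`), the following sketch converts a server env var name into its TOML key path. It only demonstrates the convention and is not how Figment parses the environment; the helper name `env_to_key` is hypothetical.

```python
def env_to_key(name: str, prefix: str = "SERVERBEE_") -> str:
    """Map an env var name to its dotted TOML key path (illustrative only)."""
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    # A double underscore separates nesting levels; single underscores stay in the key.
    parts = name[len(prefix):].split("__")
    return ".".join(part.lower() for part in parts)


print(env_to_key("SERVERBEE_SERVER__LISTEN"))
print(env_to_key("SERVERBEE_IP_QUALITY__IPAPI_IS__ENDPOINT"))
```

Single underscores are preserved inside a key segment, which is why `IP_QUALITY` maps to `ip_quality` rather than to two nesting levels.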
## Developer Workflow Env Vars @@ -121,7 +121,7 @@ Default risk-scoring works out of the box via [ipapi.is](https://ipapi.is) (no A | `SERVERBEE_IP_QUALITY__RISK_PROVIDER` | `ip_quality.risk_provider` | string | `"ipapi_is"` | Primary risk provider. One of: `none`, `ipapi_is`, `ip-api`. | | `SERVERBEE_IP_QUALITY__RISK_PROVIDER_FALLBACK` | `ip_quality.risk_provider_fallback` | string | `"ip-api"` | Fallback provider triggered on primary failure. Set to `none` to disable. | | `SERVERBEE_IP_QUALITY__IPAPI_IS__API_KEY` | `ip_quality.ipapi_is.api_key` | string | - | Optional. Configure for higher per-account rate limits. | -| `SERVERBEE_IP_QUALITY__IPAPI_IS__ENDPOINT` | `ip_quality.ipapi_is.endpoint` | string | `https://api.ipapi.is` | Override for self-hosted mirrors or testing. | +| `SERVERBEE_IP_QUALITY__IPAPI_IS__ENDPOINT` | `ip_quality.ipapi_is.endpoint` | string | `""` | Override for self-hosted mirrors or testing. Empty falls back to the built-in default `https://api.ipapi.is`. | **Migration from older versions:** Earlier releases supported four paid providers (Scamalytics, IPQualityScore, ProxyCheck, AbuseIPDB) configured via `SERVERBEE_IP_QUALITY__{SCAMALYTICS,IPQS,PROXYCHECK,ABUSEIPDB}__*`. These env vars are silently ignored. To restore equivalent functionality, fork or vendor the provider implementation from a tag prior to 2026-05-25. diff --git a/apps/docs/content/docs/en/configuration.mdx b/apps/docs/content/docs/en/configuration.mdx index 74d669cc..1df8c34e 100644 --- a/apps/docs/content/docs/en/configuration.mdx +++ b/apps/docs/content/docs/en/configuration.mdx @@ -135,7 +135,7 @@ Default risk-scoring works out of the box via [ipapi.is](https://ipapi.is) (no A | `SERVERBEE_IP_QUALITY__RISK_PROVIDER` | `"ipapi_is"` | Primary risk provider. One of: `none`, `ipapi_is`, `ip-api`. | | `SERVERBEE_IP_QUALITY__RISK_PROVIDER_FALLBACK` | `"ip-api"` | Fallback provider triggered on primary failure. Set to `none` to disable. 
| | `SERVERBEE_IP_QUALITY__IPAPI_IS__API_KEY` | -- | Optional. Configure for higher per-account rate limits. | -| `SERVERBEE_IP_QUALITY__IPAPI_IS__ENDPOINT` | `https://api.ipapi.is` | Override for self-hosted mirrors or testing. | +| `SERVERBEE_IP_QUALITY__IPAPI_IS__ENDPOINT` | `""` | Override for self-hosted mirrors or testing. Empty falls back to the built-in default `https://api.ipapi.is`. | **Migration from older versions:** Earlier releases supported four paid providers (Scamalytics, IPQualityScore, ProxyCheck, AbuseIPDB) configured via `SERVERBEE_IP_QUALITY__{SCAMALYTICS,IPQS,PROXYCHECK,ABUSEIPDB}__*`. These env vars are silently ignored. To restore equivalent functionality, fork or vendor the provider implementation from a tag prior to 2026-05-25. @@ -387,7 +387,7 @@ Default risk-scoring works out of the box via [ipapi.is](https://ipapi.is) (no A | Key | Type | Default | Description | |-----|------|---------|-------------| | `api_key` | string | -- | Optional. Configure for higher per-account rate limits. | -| `endpoint` | string | `"https://api.ipapi.is"` | Override for self-hosted mirrors or testing. | +| `endpoint` | string | `""` | Override for self-hosted mirrors or testing. Empty falls back to the built-in default `https://api.ipapi.is`. 
| --- diff --git a/apps/docs/content/docs/zh/configuration.mdx b/apps/docs/content/docs/zh/configuration.mdx index 34b95755..9caa1a28 100644 --- a/apps/docs/content/docs/zh/configuration.mdx +++ b/apps/docs/content/docs/zh/configuration.mdx @@ -135,7 +135,7 @@ ServerBee 使用 [Figment](https://github.com/SergioBenitez/Figment) 加载配 | `SERVERBEE_IP_QUALITY__RISK_PROVIDER` | `"ipapi_is"` | 主风险评分 Provider。可选:`none`、`ipapi_is`、`ip-api`。 | | `SERVERBEE_IP_QUALITY__RISK_PROVIDER_FALLBACK` | `"ip-api"` | 主 Provider 失败时的兜底。设为 `none` 关闭。 | | `SERVERBEE_IP_QUALITY__IPAPI_IS__API_KEY` | -- | 可选。配置后享受更高的账户级速率限制。 | -| `SERVERBEE_IP_QUALITY__IPAPI_IS__ENDPOINT` | `https://api.ipapi.is` | 自建镜像或测试时覆盖。 | +| `SERVERBEE_IP_QUALITY__IPAPI_IS__ENDPOINT` | `""` | 自建镜像或测试时覆盖。留空则回退到内置默认 `https://api.ipapi.is`。 | **老版本升级说明:** 早期版本支持 4 个付费 Provider(Scamalytics、IPQualityScore、ProxyCheck、AbuseIPDB),通过 `SERVERBEE_IP_QUALITY__{SCAMALYTICS,IPQS,PROXYCHECK,ABUSEIPDB}__*` 配置。这些环境变量会被静默忽略。如需恢复对应能力,请从 2026-05-25 之前的 tag 中 fork 或 vendor 对应实现。 @@ -387,7 +387,7 @@ Agent 顶层键使用单下划线,嵌套键使用 `__`(双下划线)。 | 键 | 类型 | 默认值 | 说明 | |----|------|--------|------| | `api_key` | string | -- | 可选。配置后享受更高的账户级速率限制。 | -| `endpoint` | string | `"https://api.ipapi.is"` | 自建镜像或测试时覆盖。 | +| `endpoint` | string | `""` | 自建镜像或测试时覆盖。留空则回退到内置默认 `https://api.ipapi.is`。 | ---