From d5937c4a4ba7c0bfda1e1fa587a5f7e866fe55ff Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 15 Nov 2025 23:47:52 +0000 Subject: [PATCH 1/4] Initial plan From b3e66ddc67096a7a56fb6764752a3bf0a70ec30c Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 15 Nov 2025 23:53:12 +0000 Subject: [PATCH 2/4] Initial commit - format AGENTS.md Co-authored-by: Hexagon <419737+Hexagon@users.noreply.github.com> --- AGENTS.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/AGENTS.md b/AGENTS.md index 952354b..17750bf 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -32,6 +32,7 @@ pup/ Pup is built with Deno. Key commands are defined in `deno.json`: ### Formatting and Linting + ```bash deno fmt # Format code deno fmt --check # Check formatting @@ -39,11 +40,13 @@ deno lint # Lint code ``` ### Testing + ```bash deno test --allow-read --allow-write --allow-env --allow-net --allow-sys --allow-run --coverage=cov_profile ``` ### Build Tasks + ```bash deno task check # Run format, lint, and tests deno task build-schema # Generate JSON schema @@ -54,6 +57,7 @@ deno task build # Complete build process ## Pre-commit Checks The project uses GitHub Actions for CI (`.github/workflows/deno.yaml`): + - Format checking (`deno fmt --check`) - Linting (`deno lint`) - Full test suite with coverage @@ -66,6 +70,7 @@ Before submitting PRs, run `deno task check` locally to ensure all checks pass. 
Pup is part of an ecosystem of packages available on JSR: ### Core Dependencies + - **[@pup/api-definitions](https://github.com/hexagon/pup-api-definitions)** - API type definitions shared across the ecosystem - **[@pup/api-client](https://github.com/hexagon/pup-api-client)** - REST API client for CLI, plugins, and telemetry - **[@pup/telemetry](https://github.com/hexagon/pup-telemetry)** - Runtime-agnostic library for process telemetry and IPC @@ -73,6 +78,7 @@ Pup is part of an ecosystem of packages available on JSR: - **[@pup/plugin](https://github.com/hexagon/pup-plugin)** - Base library for creating Pup plugins ### Official Plugins + - **[pup-plugin-web-interface](https://github.com/hexagon/pup-plugin-web-interface)** - Web-based UI for managing Pup ## Key Concepts From df9f1fa8f6880fa792c99e36356e89b846372899 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 15 Nov 2025 23:58:10 +0000 Subject: [PATCH 3/4] Add exponential backoff for process restarts Co-authored-by: Hexagon <419737+Hexagon@users.noreply.github.com> --- docs/src/changelog.md | 4 + docs/src/examples/max-restarts/README.md | 76 ++++++++++++++ docs/src/examples/max-restarts/pup.jsonc | 9 ++ docs/src/usage/configuration.md | 4 +- lib/core/configuration.ts | 2 + lib/core/pup.ts | 10 +- test/core/restart-backoff.test.ts | 122 +++++++++++++++++++++++ 7 files changed, 225 insertions(+), 2 deletions(-) create mode 100644 docs/src/examples/max-restarts/README.md create mode 100644 test/core/restart-backoff.test.ts diff --git a/docs/src/changelog.md b/docs/src/changelog.md index c4a8ab9..15af353 100644 --- a/docs/src/changelog.md +++ b/docs/src/changelog.md @@ -9,6 +9,10 @@ nav_order: 13 All notable changes to this project will be documented in this section. 
+## [Unreleased] + +- feat(core): Add exponential backoff for process restarts via `restartBackoffMs` configuration option to prevent rapid restart loops + ## [1.0.4] - 2024-11-19 - fix(core): Fix service auto start after install on Windows by upgrading dependency @cross/service diff --git a/docs/src/examples/max-restarts/README.md b/docs/src/examples/max-restarts/README.md new file mode 100644 index 0000000..1d3ca90 --- /dev/null +++ b/docs/src/examples/max-restarts/README.md @@ -0,0 +1,76 @@ +# Max Restarts and Exponential Backoff Example + +This example demonstrates how to configure restart limits and exponential backoff for processes that may fail. + +## Overview + +The configuration shows two processes: + +1. **max-3-times**: Uses a fixed restart delay of 3 seconds and allows up to 3 restarts +2. **with-exponential-backoff**: Uses exponential backoff starting at 1 second and capping at 30 seconds, allowing up to 5 restarts + +## How Exponential Backoff Works + +When a process fails repeatedly, exponential backoff prevents rapid restart loops by increasing the delay between each restart attempt: + +- **1st restart**: 1 second delay (restartDelayMs) +- **2nd restart**: 2 seconds delay (1s × 2¹) +- **3rd restart**: 4 seconds delay (1s × 2²) +- **4th restart**: 8 seconds delay (1s × 2³) +- **5th restart**: 16 seconds delay (1s × 2⁴) +- **Further restarts**: 30 seconds delay (capped at restartBackoffMs) + +This approach: + +- Gives transient issues time to resolve +- Prevents resource exhaustion from rapid restart loops +- Allows more restart attempts while being system-friendly + +## Running the Example + +```bash +pup run +``` + +The `server.js` script exits immediately, so both processes will restart according to their configured policies until they reach their restart limits. 
+ +## Configuration + +```jsonc +{ + "processes": [ + { + "id": "max-3-times", + "cmd": "deno run server.js", + "autostart": true, + "restart": "always", + "restartLimit": 3, + "restartDelayMs": 3000 + }, + { + "id": "with-exponential-backoff", + "cmd": "deno run server.js", + "autostart": true, + "restart": "always", + "restartLimit": 5, + "restartDelayMs": 1000, + "restartBackoffMs": 30000 + } + ] +} +``` + +## When to Use Exponential Backoff + +Exponential backoff is particularly useful for: + +- Services that may experience temporary network issues +- Processes that depend on external resources (databases, APIs) +- Applications that might fail during deployment or updates +- Any process where rapid restart loops could cause problems + +## Notes + +- If `restartBackoffMs` is not set, the process will use a fixed `restartDelayMs` between all restarts +- The backoff resets when the process exits successfully (status: FINISHED) or is manually stopped +- Combined with `restartLimit`, exponential backoff provides robust process recovery while preventing runaway restart loops diff --git a/docs/src/examples/max-restarts/pup.jsonc b/docs/src/examples/max-restarts/pup.jsonc index ed970d2..e3e419e 100644 --- a/docs/src/examples/max-restarts/pup.jsonc +++ b/docs/src/examples/max-restarts/pup.jsonc @@ -7,6 +7,15 @@ "restart": "always", "restartLimit": 3, "restartDelayMs": 3000 + }, + { + "id": "with-exponential-backoff", + "cmd": "deno run server.js", + "autostart": true, + "restart": "always", + "restartLimit": 5, + "restartDelayMs": 1000, + "restartBackoffMs": 30000 } ] } diff --git a/docs/src/usage/configuration.md b/docs/src/usage/configuration.md index aeb6c3c..050fe81 100644 --- a/docs/src/usage/configuration.md +++ b/docs/src/usage/configuration.md @@ -62,7 +62,9 @@ You need to specify one of these for each process, else the process will never s ### Restart policy - `restart` (optional): A string specifying when the process should be restarted. 
Allowed values: "always" or "error". -- `restartDelayMs` (optional): A number specifying the delay (in milliseconds) before restarting the process. +- `restartDelayMs` (optional): A number specifying the initial delay (in milliseconds) before restarting the process. Default: 10000ms (10 seconds), or 500ms when watching files. +- `restartBackoffMs` (optional): A number specifying the maximum delay (in milliseconds) for exponential backoff when a process fails repeatedly. When set, the restart delay will double after each + consecutive failure, starting from `restartDelayMs` and capping at `restartBackoffMs`. This prevents rapid restart loops that can exhaust system resources. If not set, restarts use a fixed delay. - `restartLimit` (optional): A number specifying the maximum number of restarts allowed for the process. ### Stop/restart policy diff --git a/lib/core/configuration.ts b/lib/core/configuration.ts index 2d82e81..4e3e55d 100644 --- a/lib/core/configuration.ts +++ b/lib/core/configuration.ts @@ -100,6 +100,7 @@ interface ProcessConfiguration { logger?: ProcessLoggerConfiguration restart?: string restartDelayMs?: number + restartBackoffMs?: number restartLimit?: number } @@ -157,6 +158,7 @@ const ConfigurationSchema = z.object({ terminateGracePeriod: z.number().min(0).default(0), restart: z.optional(z.enum(["always", "error"])), restartDelayMs: z.number().min(0).max(24 * 60 * 60 * 1000 * 1).default(10000), // Max one day + restartBackoffMs: z.optional(z.number().min(0).max(24 * 60 * 60 * 1000 * 1)), // Max one day - exponential backoff cap overrun: z.optional(z.boolean()), restartLimit: z.optional(z.number().min(0)), timeout: z.optional(z.number().min(1)), diff --git a/lib/core/pup.ts b/lib/core/pup.ts index 6956f7a..49c1fb5 100644 --- a/lib/core/pup.ts +++ b/lib/core/pup.ts @@ -279,7 +279,15 @@ class Pup { const msSinceExited = status.exited ? 
(new Date().getTime() - status.exited?.getTime()) : Infinity // Default restart delay to 10000ms, except when watching - const restartDelay = config.restartDelayMs ?? config.watch ? 500 : 10000 + const baseRestartDelay = config.restartDelayMs ?? (config.watch ? 500 : 10000) + + // Calculate exponential backoff if restartBackoffMs is configured + let restartDelay = baseRestartDelay + if (config.restartBackoffMs !== undefined && status.restarts && status.restarts > 0) { + // Exponential backoff: delay = baseDelay * 2 ^ (restarts - 1), capped at restartBackoffMs + const exponentialDelay = baseRestartDelay * Math.pow(2, status.restarts - 1) + restartDelay = Math.min(exponentialDelay, config.restartBackoffMs) + } // Always restart if restartpolicy is undefined and autostart is true const restartPolicy = config.restart ?? ((config.autostart || config.watch) ? "always" : undefined) diff --git a/test/core/restart-backoff.test.ts b/test/core/restart-backoff.test.ts new file mode 100644 index 0000000..6977e33 --- /dev/null +++ b/test/core/restart-backoff.test.ts @@ -0,0 +1,122 @@ +/* + * Test exponential backoff for process restarts + * + * @file test/core/restart-backoff.test.ts + */ + +import type { Configuration } from "../../lib/core/configuration.ts" +import { ApiProcessState } from "@pup/api-definitions" +import { Pup } from "../../lib/core/pup.ts" +import { assertEquals, assertGreaterOrEqual, assertLessOrEqual } from "@std/assert" +import { test } from "@cross/test" + +test("Process restart with exponential backoff", async () => { + const TEST_PROCESS_ID = "restart-backoff-test" + // Command that exits immediately with error + const TEST_PROCESS_COMMAND = "deno eval 'Deno.exit(1)'" + + const config: Configuration = { + processes: [ + { + "id": TEST_PROCESS_ID, + "cmd": TEST_PROCESS_COMMAND, + "restart": "error", + "restartDelayMs": 100, // 100ms base delay + "restartBackoffMs": 1000, // Cap at 1 second + "restartLimit": 5, + }, + ], + } + const pup = new Pup(config) + await
pup.init() + + // Find process + const testProcess = pup.processes.findLast((p) => p.getConfig().id === TEST_PROCESS_ID) + assertEquals(testProcess !== undefined, true) + + // Start process + pup.start(TEST_PROCESS_ID, "test") + + // Wait for first failure + await new Promise((resolve) => setTimeout(resolve, 200)) + + let status = testProcess!.getStatus() + assertEquals(status.status, ApiProcessState.ERRORED) + assertEquals(status.restarts, 0) // First run, no restarts yet + + // Wait for first restart (should happen quickly with 100ms base delay) + await new Promise((resolve) => setTimeout(resolve, 200)) + + status = testProcess!.getStatus() + assertGreaterOrEqual(status.restarts || 0, 1) // At least one restart + + // Wait for potential second restart (should take ~200ms with exponential backoff) + await new Promise((resolve) => setTimeout(resolve, 350)) + + status = testProcess!.getStatus() + assertGreaterOrEqual(status.restarts || 0, 2) // At least two restarts + + // Wait for potential third restart (should take ~400ms with exponential backoff) + await new Promise((resolve) => setTimeout(resolve, 550)) + + status = testProcess!.getStatus() + assertGreaterOrEqual(status.restarts || 0, 3) // At least three restarts + + // Verify that restarts are limited by restartLimit + await new Promise((resolve) => setTimeout(resolve, 2000)) + + status = testProcess!.getStatus() + assertLessOrEqual(status.restarts || 0, 5) // Should not exceed restartLimit + + // If limit reached, status should be EXHAUSTED + if (status.restarts === 5) { + assertEquals(status.status, ApiProcessState.EXHAUSTED) + } + + // Terminate pup + await pup.terminate(500) +}) + +test("Process restart without backoff (default behavior)", async () => { + const TEST_PROCESS_ID = "restart-no-backoff-test" + // Command that exits immediately with error + const TEST_PROCESS_COMMAND = "deno eval 'Deno.exit(1)'" + + const config: Configuration = { + processes: [ + { + "id": TEST_PROCESS_ID, + "cmd": 
TEST_PROCESS_COMMAND, + "restart": "error", + "restartDelayMs": 100, // 100ms fixed delay + // No restartBackoffMs - should use fixed delay + "restartLimit": 3, + }, + ], + } + const pup = new Pup(config) + await pup.init() + + // Find process + const testProcess = pup.processes.findLast((p) => p.getConfig().id === TEST_PROCESS_ID) + assertEquals(testProcess !== undefined, true) + + // Start process + pup.start(TEST_PROCESS_ID, "test") + + // Wait for first failure and restart + await new Promise((resolve) => setTimeout(resolve, 250)) + + let status = testProcess!.getStatus() + assertGreaterOrEqual(status.restarts || 0, 1) + + // With fixed 100ms delay, we should get more restarts in the same time + // compared to exponential backoff + await new Promise((resolve) => setTimeout(resolve, 400)) + + status = testProcess!.getStatus() + assertGreaterOrEqual(status.restarts || 0, 2) + + // Terminate pup + await pup.terminate(500) +}) From ed816d4911bec4a4665d02423fc29e26670c1688 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sun, 16 Nov 2025 00:10:39 +0000 Subject: [PATCH 4/4] Fix test timing to account for watchdog interval Co-authored-by: Hexagon <419737+Hexagon@users.noreply.github.com> --- test/core/restart-backoff.test.ts | 31 +++++++++++++++++-------------- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/test/core/restart-backoff.test.ts b/test/core/restart-backoff.test.ts index 6977e33..86df93c 100644 --- a/test/core/restart-backoff.test.ts +++ b/test/core/restart-backoff.test.ts @@ -22,7 +22,7 @@ test("Process restart with exponential backoff", async () => { "cmd": TEST_PROCESS_COMMAND, "restart": "error", "restartDelayMs": 100, // 100ms base delay - "restartBackoffMs": 1000, // Cap at 1 second + "restartBackoffMs": 2000, // Cap at 2 seconds "restartLimit": 5, }, ], @@ -37,33 +37,36 @@ test("Process restart with exponential backoff", async () => { // Start process pup.start(TEST_PROCESS_ID, 
"test") - // Wait for first failure - await new Promise((resolve) => setTimeout(resolve, 200)) + // Wait for first failure (process exits immediately) + await new Promise((resolve) => setTimeout(resolve, 500)) let status = testProcess!.getStatus() assertEquals(status.status, ApiProcessState.ERRORED) assertEquals(status.restarts, 0) // First run, no restarts yet - // Wait for first restart (should happen quickly with 100ms base delay) - await new Promise((resolve) => setTimeout(resolve, 200)) + // Wait for first restart + // Watchdog runs every 1s, then waits for restartDelay (100ms) + await new Promise((resolve) => setTimeout(resolve, 1500)) status = testProcess!.getStatus() assertGreaterOrEqual(status.restarts || 0, 1) // At least one restart - // Wait for potential second restart (should take ~200ms with exponential backoff) - await new Promise((resolve) => setTimeout(resolve, 350)) + // Wait for potential second restart + // Watchdog 1s + exponential backoff delay (200ms for 2nd restart) + await new Promise((resolve) => setTimeout(resolve, 1500)) status = testProcess!.getStatus() assertGreaterOrEqual(status.restarts || 0, 2) // At least two restarts - // Wait for potential third restart (should take ~400ms with exponential backoff) - await new Promise((resolve) => setTimeout(resolve, 550)) + // Wait for potential third restart + // Watchdog 1s + exponential backoff delay (400ms for 3rd restart) + await new Promise((resolve) => setTimeout(resolve, 1500)) status = testProcess!.getStatus() assertGreaterOrEqual(status.restarts || 0, 3) // At least three restarts // Verify that restarts are limited by restartLimit - await new Promise((resolve) => setTimeout(resolve, 2000)) + await new Promise((resolve) => setTimeout(resolve, 3000)) status = testProcess!.getStatus() assertLessOrEqual(status.restarts || 0, 5) // Should not exceed restartLimit @@ -105,14 +108,14 @@ test("Process restart without backoff (default behavior)", async () => { pup.start(TEST_PROCESS_ID, 
"test") // Wait for first failure and restart - await new Promise((resolve) => setTimeout(resolve, 250)) + // Watchdog runs every 1s, then waits for restartDelay (100ms) + await new Promise((resolve) => setTimeout(resolve, 1500)) let status = testProcess!.getStatus() assertGreaterOrEqual(status.restarts || 0, 1) - // With fixed 100ms delay, we should get more restarts in the same time - // compared to exponential backoff - await new Promise((resolve) => setTimeout(resolve, 400)) + // With fixed 100ms delay, should get second restart after another 1.1s + await new Promise((resolve) => setTimeout(resolve, 1500)) status = testProcess!.getStatus() assertGreaterOrEqual(status.restarts || 0, 2)