
Feature: Implement CRIU (Checkpoint/Restore) to eliminate cold start latency #72

@BeSovereign

Description

To save resources, Freeshard stops inactive containers. The incoming network request is held until the container is running again, but the cold start of heavy applications (like Immich) takes too long. The boot time exceeds the hard-coded HTTP timeouts of many native mobile apps, so they drop the connection before the container can answer.

We need to implement CRIU (Checkpoint/Restore In Userspace) to cut this latency and bring wake-up times below the timeout thresholds of mobile apps.

Proposed Changes

  1. Checkpoint logic: Modify the inactivity shutdown logic. Instead of running docker stop, the controller should execute docker checkpoint create <container_name> <checkpoint_name>. This dumps the container's process state (memory pages, registers, open file descriptors) to disk and, by default, stops the container, so no separate docker stop is needed.
  2. Restore logic: Modify the wake-on-demand logic. Instead of a normal container boot, the controller should execute docker start --checkpoint <checkpoint_name> <container_name>, which resumes the processes from the saved state instead of re-running their startup.
  3. Host configuration: Ensure CRIU is installed on the host and the Docker daemon runs with experimental features enabled (checkpoint/restore is still an experimental Docker feature). Old checkpoints must also be removed (e.g. with docker checkpoint rm) so they do not accumulate on the SSD.
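A minimal Python sketch of the three steps above, assuming the controller can shell out to the docker CLI. The checkpoint name `idle`, the helper names `checkpoint_container`/`restore_container`, and the fallbacks to a plain docker stop / docker start are illustrative assumptions, not existing Freeshard code. Keeping a single, reused checkpoint per container is one simple way to avoid storage bloat.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical fixed checkpoint name: exactly one checkpoint is kept per container.
CHECKPOINT_NAME = "idle"

# A runner takes a docker argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_docker(argv: Sequence[str]) -> int:
    """Execute a docker CLI command and return its exit code (no exception on failure)."""
    return subprocess.run(list(argv), check=False).returncode


def checkpoint_container(name: str, run: Runner = run_docker) -> bool:
    """Replacement for `docker stop` in the inactivity shutdown path.

    `docker checkpoint create` stops the container by default (unless
    --leave-running is passed), so no separate stop is issued on success.
    """
    # Remove the previous checkpoint so stale images don't pile up on disk and
    # the name is free; a non-zero exit here just means none existed yet.
    run(["docker", "checkpoint", "rm", name, CHECKPOINT_NAME])
    if run(["docker", "checkpoint", "create", name, CHECKPOINT_NAME]) == 0:
        return True
    # Assumed fallback: if CRIU cannot dump the container, stop it the old way
    # so resources are still freed.
    run(["docker", "stop", name])
    return False


def restore_container(name: str, run: Runner = run_docker) -> bool:
    """Replacement for the normal boot in the wake-on-demand path."""
    if run(["docker", "start", "--checkpoint", CHECKPOINT_NAME, name]) == 0:
        return True
    # Assumed fallback: if the restore fails (e.g. no checkpoint exists, or the
    # container was recreated), do a regular cold start so the held request is
    # still served, just more slowly.
    return run(["docker", "start", name]) == 0
```

In this sketch `checkpoint_container` would be called wherever docker stop is called today and `restore_container` on the wake path. Passing a fake `run` function lets the command sequence be tested without a Docker daemon or CRIU installed.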

Motivation

CRIU restores the saved application state directly into RAM, which is typically much faster than a full boot: restore time depends mainly on the size of the memory image rather than on the application's initialization work. Bypassing the application's boot sequence (like JVM initialization or Node.js startup) lets the container process the held request quickly enough that native apps do not time out or drop the connection.