
add docker feature to the project #34

Open
wild-deer wants to merge 2 commits into Robbyant:main from wild-deer:main

Conversation

@wild-deer

1. High-Level Summary (TL;DR)

  • Impact Level: [Medium] - Introduces a standardized Docker container runtime environment for the project, simplifying environment setup (especially CUDA and PyTorch dependencies), with no effect on existing local runtime logic.
  • Key Changes:
    • Added Dockerfile: Based on PyTorch and CUDA 13.0 images, automatically configures Alibaba Cloud APT mirror and pre-installs the flashinfer library along with system dependencies.
    • Added docker-compose.yml: Provides one-click startup configuration, supporting full GPU mounting, 32GB shared memory, and persistent model caching.
    • Updated README.md: Added comprehensive Docker usage instructions, including build/run commands and guidance for adjusting CUDA versions to match local machines.
    • Added .dockerignore: Excludes .git, .venv, caches, and build artifacts to optimize image build speed.
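Based on the exclusions listed, a minimal `.dockerignore` along these lines would cover them (illustrative sketch; the actual file in this PR may list additional patterns):

```
# Version control and virtual environments
.git
.venv
# Python caches and build artifacts
__pycache__/
*.egg-info/
build/
dist/
```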

2. Visual Overview (Code & Logic Map)

```mermaid
graph TD
    %% Style definition (high contrast)
    classDef host fill:#bbdefb,color:#0d47a1,stroke:#0d47a1
    classDef docker fill:#c8e6c9,color:#1a5e20,stroke:#1a5e20
    classDef volume fill:#fff3e0,color:#e65100,stroke:#e65100

    subgraph "Host Machine"
        User["Developer"]:::host
        Source["Local code directory (./)"]:::host
    end

    subgraph "Docker Environment"
        Compose["docker-compose.yml"]:::docker
        Container["lingbot-map container (CUDA 13.0)"]:::docker

        subgraph "Volumes"
            HFCache["hf_cache"]:::volume
            TorchCache["torch_cache"]:::volume
        end
    end

    User -->|"docker compose run"| Compose
    Compose -->|"Build & Start"| Container
    Source -.->|"Mount workspace (/workspace)"| Container
    Container -.->|"Persistent read/write"| HFCache
    Container -.->|"Persistent read/write"| TorchCache
```

3. Detailed Change Analysis

🐳 Docker Containerization Configuration

Description: Added full Docker support configuration, making it easy to replicate complex GPU/CUDA environments across different machines (source files: Dockerfile, docker-compose.yml, .dockerignore).

Base Environment & Dependencies (Dockerfile):

| Item | Details | Notes |
| --- | --- | --- |
| Base Image | `spxiong/pytorch:2.11.0-py3.10.19-cuda13.0.2-ubuntu22.04` | Pre-installed PyTorch 2.11 and CUDA 13.0.2 |
| APT Mirror | `https://mirrors.aliyun.com/ubuntu` | Uses Alibaba Cloud mirror by default for faster builds in China (overridable via build arg `UBUNTU_MIRROR`) |
| System Dependencies | `ffmpeg`, `libgl1`, `libsm6`, etc. | Provides low-level libraries for vision processing (e.g., OpenCV) |
| Python Dependencies | `flashinfer-python`, `flashinfer-jit-cache` | Installs FlashInfer wheels for cu130 and the main project package `.[vis]` |
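Putting these items together, the Dockerfile is likely structured roughly as follows (an illustrative sketch only: the mirror-switching `sed` line and the exact install commands are assumptions, and the real pip step may point at a cu130 wheel index):

```dockerfile
ARG UBUNTU_MIRROR=https://mirrors.aliyun.com/ubuntu
FROM spxiong/pytorch:2.11.0-py3.10.19-cuda13.0.2-ubuntu22.04

# Re-declare the build arg inside the build stage so it is visible to RUN
ARG UBUNTU_MIRROR

# Point APT at the configured mirror (Alibaba Cloud by default),
# then install the system libraries needed for vision processing
RUN sed -i "s|http://archive.ubuntu.com/ubuntu|${UBUNTU_MIRROR}|g" /etc/apt/sources.list \
    && apt-get update \
    && apt-get install -y --no-install-recommends ffmpeg libgl1 libsm6 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace
COPY . .

# FlashInfer wheels, then the main project package with vision extras
RUN pip install flashinfer-python flashinfer-jit-cache \
    && pip install ".[vis]"
```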

Environment & Resource Mounting (docker-compose.yml):

| Key | Value / Target Path | Notes |
| --- | --- | --- |
| Volume: Code | `.:/workspace` | Mounts the local code directory as the container workspace, supporting hot reload |
| Volume: HF Cache | `hf_cache:/root/.cache/huggingface` | Persists HuggingFace models to avoid repeated downloads |
| Volume: Torch Cache | `torch_cache:/root/.cache/torch` | Persists PyTorch cache data |
| Ports | `8080:8080` | Maps the internal service port to the host |
| Resource: gpus | `all` | Exposes all available host NVIDIA GPUs to the container |
| Resource: shm_size | `32gb` | Increases shared memory to prevent out-of-memory issues in multi-process data loading (e.g., `DataLoader`) |
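A docker-compose.yml matching these settings might look like the sketch below (illustrative, not the PR's exact file; the `gpus: all` shorthand requires a recent Docker Compose, with the longer `deploy.resources` form as the fallback):

```yaml
services:
  lingbot-map:
    build: .
    gpus: all        # expose all host NVIDIA GPUs
    shm_size: 32gb   # large shared memory for multi-process DataLoader workers
    ports:
      - "8080:8080"
    volumes:
      - .:/workspace                        # hot-reloadable code mount
      - hf_cache:/root/.cache/huggingface   # persistent HuggingFace model cache
      - torch_cache:/root/.cache/torch      # persistent PyTorch cache

volumes:
  hf_cache:
  torch_cache:
```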

📝 Documentation Update

Description: Updated user guide to support the new Docker workflow (source file: README.md).

  • Added a new "Docker / Docker Compose" section.
  • Provided clear startup commands (`docker compose build` and `docker compose run --rm --service-ports lingbot-map`).
  • Added an example for running the demo script demo.py inside the container.
  • High-value guidance: Included concrete Diff examples for modifying the Dockerfile to support other CUDA versions (e.g., downgrading from CUDA 13.0 to the more common CUDA 12.8).

4. Impact & Risk Assessment

  • ⚠️ Breaking Changes: None. All changes are new files and documentation updates, with no impact on existing local development/runtime workflows using conda/pip.
  • 🧪 Testing Recommendations:
    • Build Test: Run `docker compose build` on a machine with NVIDIA drivers and Docker installed, verifying that the Alibaba Cloud mirror switch and dependency installation succeed (network conditions may cause APT/pip download failures).
    • GPU Passthrough Test: After starting the container, run `nvidia-smi` and `python -c "import torch; print(torch.cuda.is_available())"` in the terminal to confirm the GPU is detected.
    • Business Logic Test: Run demo.py inside the container to verify flashinfer and model inference work as expected without errors.
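The checks above boil down to a short session on a GPU host with Docker and the NVIDIA Container Toolkit installed (exact command forms assumed from the README description; these cannot run without that hardware/runtime):

```shell
docker compose build
docker compose run --rm --service-ports lingbot-map bash

# Inside the container:
nvidia-smi                                                  # GPU passthrough check
python -c "import torch; print(torch.cuda.is_available())"  # should print True
python demo.py                                              # business logic / flashinfer check
```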
