Low memory version demo #43

Open

YoshiRi wants to merge 24 commits into Robbyant:main from YoshiRi:claude/bf16-aggregator-lowmem

Conversation


YoshiRi commented Apr 23, 2026

No description provided.

claude added 24 commits April 18, 2026 22:51
Enables running the lingbot-map demo with user-provided images on Docker
without any local Python/CUDA setup:

- Dockerfile: CUDA 12.8 + PyTorch 2.9.1 + lingbot-map[vis] + FlashInfer
- docker/entrypoint.sh: auto-downloads model from HuggingFace on first run,
  falls back to --use_sdpa if FlashInfer is unavailable
- docker-compose.yml: mounts ./images and ./model, exposes port 8080
- README.md: Docker Quick Start section

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
…sues

nvidia/cuda tags can be unavailable without Docker Hub auth or may not exist
for CUDA 12.8 with the cudnn9 suffix. pytorch/pytorch official images ship
PyTorch + CUDA pre-installed and are publicly accessible without auth.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
- Add image design table (base image, backends, ports, volume layout)
- Add directory structure diagram showing /app, /model, /data
- Add dedicated section for built-in example scenes (church/oxford/university/loop)
- Clarify that examples are baked into the image — no extra data mount needed
- Expand tips section: pre-downloaded model, GPU memory, long sequences, fast inference

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
opencv-python requires libGL.so.1 which is not present in the
pytorch/pytorch base image. Adding libgl1 resolves the error.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
demo.py: load checkpoint to CPU before GPU transfer to avoid holding both
the state dict and the model weights in GPU memory simultaneously. Cast
the aggregator (DINOv2 trunk) to bfloat16 on CPU before model.to(device)
to halve its VRAM footprint (~2-3 GB saved). Heads remain in fp32.

Dockerfile: rename PYTORCH_CUDA_ALLOC_CONF to PYTORCH_ALLOC_CONF
(deprecated in newer PyTorch versions).

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Restores original load_model behaviour: checkpoint loaded directly to
the target device in fp32. The bfloat16 pre-cast degraded model accuracy
and is not appropriate outside of memory-constrained environments.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
map_location=device caused the state dict to be loaded onto GPU while the
model itself was still on CPU. load_state_dict then performed a D2H copy,
leaving the GPU state dict alive until the function returned — at which point
model.to(device) also needed GPU space, doubling peak VRAM with no benefit.

Changing to map_location="cpu" ensures a single H2D transfer via model.to(device).
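The fix can be sketched as follows, assuming a standard `nn.Module` and a plain state-dict checkpoint (function and argument names here are illustrative, not the project's actual API):

```python
import torch

def load_model(model, ckpt_path, device):
    # Previously, torch.load(ckpt_path, map_location=device) placed the
    # whole state dict in GPU memory while the model was still on CPU,
    # so checkpoint tensors and model weights briefly coexisted in VRAM.
    # Loading to CPU keeps peak GPU usage at one copy of the model:
    state_dict = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state_dict)
    del state_dict  # release the CPU copy before the transfer
    return model.to(device)  # single H2D transfer
```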

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
The DINOv2 aggregator trunk accounts for ~2-3 GB of the fp32 model.
Casting it on CPU before model.to(device) avoids the temporary fp32+bf16
coexistence on GPU that would OOM on cards with <=6 GB VRAM.

Per the original authors (demo.py:329-336): "no measurable quality change".
Heads remain in fp32; the matching cast in main() becomes a no-op.

To revert: git revert HEAD or switch back to claude/docker-image-stream-demo-Lgnfl.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Instead of hardcoding the aggregator bf16 cast, gate it behind LOW_VRAM_MODE.
Both image variants now build from a single Dockerfile on a single branch:

  # Standard (full precision)
  docker build -t lingbot-map-demo .

  # Low VRAM (~2-3 GB savings, aggregator in bf16)
  docker build --build-arg LOW_VRAM_MODE=1 -t lingbot-map-demo-light .
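On the Python side, the gate could look like this (a sketch: the `LOW_VRAM_MODE` env-var plumbing and the `aggregator` attribute name are assumptions about how the build arg reaches the script):

```python
import os
import torch

def maybe_cast_aggregator(model):
    # Gate the bf16 cast behind LOW_VRAM_MODE instead of hardcoding it,
    # so both image variants build from the same code.
    if os.environ.get("LOW_VRAM_MODE", "0") == "1":
        # Cast only the DINOv2 trunk; heads stay in fp32 for accuracy.
        model.aggregator.to(dtype=torch.bfloat16)
    return model
```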

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Passing --conf_threshold <1.0 caused a viser error because the slider
initial_value fell below the min constraint. Also, the 1.0 floor made it
impossible to view low-confidence reconstructions (e.g. with --num_scale_frames 2)
since all points were filtered out in the UI.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Prints world_points finite ratio and confidence score statistics
to help diagnose why nothing appears in the viser viewer.
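Diagnostics of that kind can be as simple as the following sketch (the `(S, H, W, ...)` array shapes are assumptions):

```python
import numpy as np

def print_point_diagnostics(world_points, conf):
    """world_points: (S, H, W, 3) float array; conf: (S, H, W) scores."""
    # Fraction of pixels whose 3D point has no NaN/Inf component.
    finite = np.isfinite(world_points).all(axis=-1)
    print(f"world_points finite ratio: {finite.mean():.3f}")
    # Confidence spread, to pick a sensible viewer threshold.
    pcts = np.percentile(conf, [5, 25, 50, 75, 95])
    print("conf percentiles (5/25/50/75/95):", np.round(pcts, 2))
```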

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
The collapsible layout hides the control panel by default, requiring users
to find and click a toggle button. Switch to fixed so sliders/buttons
are always visible on the side.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
demo.py:
- Add --output_dir (default /data/output) and --no_viewer flag
- export_results() writes three files after every inference run:
    predictions.npz   raw numpy arrays (world_points, depth, extrinsic, intrinsic, images)
    pointcloud.ply    confidence-filtered merged point cloud (binary PLY)
    cameras.json      per-frame c2w poses and intrinsic matrices
- Export runs unconditionally; viewer is launched unless --no_viewer is set

docker-compose.yml:
- Add ./output:/data/output volume mount
- Pass --output_dir /data/output in default command

docker-compose.lowvram.yml:
- New file for 8 GB VRAM machines (LOW_VRAM_MODE=1 build)
- Pre-configured windowed inference defaults
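The npz/JSON part of the export step could be sketched like this (field names and shapes are assumptions; the real `export_results` also writes the binary PLY):

```python
import json
import os
import numpy as np

def export_results(out_dir, world_points, depth, extrinsic, intrinsic, images):
    os.makedirs(out_dir, exist_ok=True)
    # Raw arrays for downstream tooling.
    np.savez(os.path.join(out_dir, "predictions.npz"),
             world_points=world_points, depth=depth,
             extrinsic=extrinsic, intrinsic=intrinsic, images=images)
    # Per-frame poses and intrinsic matrices as JSON.
    frames = [{"c2w": e.tolist(), "intrinsic": k.tolist()}
              for e, k in zip(extrinsic, intrinsic)]
    with open(os.path.join(out_dir, "cameras.json"), "w") as f:
        json.dump(frames, f)
```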

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
IMAGE_HOST_PATH: changes the host-side volume mount source (default ./images)
IMAGE_FOLDER: changes the container-side path passed to --image_folder (default /data/images)

Examples:
  IMAGE_HOST_PATH=~/photos docker compose up          # mount custom host dir
  IMAGE_FOLDER=/app/example/oxford docker compose up  # use built-in sample

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
…./example/oxford

Single variable IMAGE_HOST_PATH controls the host-side mount.
Default points to the bundled sample so bare `docker compose up` works out of the box.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
images_cpu has shape (1,S,C,H,W) but PLY export expected (S,C,H,W).
Drop leading batch dim when present before saving.
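The shape fix amounts to a conditional squeeze of the batch axis (a sketch):

```python
import numpy as np

def drop_batch_dim(images):
    # (1, S, C, H, W) -> (S, C, H, W); leave (S, C, H, W) untouched.
    if images.ndim == 5 and images.shape[0] == 1:
        images = images[0]
    return images
```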

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
conf mean was 2.9 on oxford; threshold 0.0 produced ~1GB PLY.
2.0 filters low-confidence points while retaining the majority.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Loads cameras.json, draws frustums colored by frame order (cool colormap),
trajectory line, start/end markers. Requires only numpy + matplotlib.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Reports: shape/dtype, NaN/Inf ratio, coordinate bounds, distance distribution,
confidence percentiles, point counts per threshold, image pixel range.
Plots: top-down/side/front 2D projections, confidence histogram,
distance histogram, valid-points-per-frame bar chart.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
…M points

Without downsampling, conf>2.0 still produces ~490MB/32M points which
overwhelms MeshLab and WebGL. Factor 4 brings it to ~2M points (~30MB).
Increase point_size to 0.005 to compensate for sparser sampling.
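Stride-based spatial downsampling of per-pixel points can be sketched as below, assuming a `(S, H, W, 3)` layout; a factor of 4 on both axes cuts the count by roughly 16x, matching the 32M-to-2M reduction above:

```python
import numpy as np

def downsample_points(world_points, conf, factor=4, conf_threshold=2.0):
    # Keep every `factor`-th pixel along both spatial axes, then drop
    # low-confidence points before writing the PLY.
    pts = world_points[:, ::factor, ::factor]
    c = conf[:, ::factor, ::factor]
    mask = c > conf_threshold
    return pts[mask]  # (N, 3)
```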

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
…rlap 8, mask_sky

- export_results: apply downsample_factor as spatial stride on PLY export
  (was only affecting viser viewer; now 32M pts → ~2M pts with factor=4)
- num_scale_frames 2→4: better global scale estimation
- overlap_size 4→8: smoother window-to-window stitching
- mask_sky: remove sky points (outdoor/driving sequences)

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Checks if world_points are in front of / behind each camera by transforming
to camera space. Reports front% per frame and flags coordinate convention bugs.
Also shows depth histogram, reprojection overlay, and camera forward vector.
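The front/behind test reduces to a camera-space depth sign check, sketched here assuming `extrinsic` is a world-to-camera matrix with +Z forward (which is exactly the convention such a diagnostic is meant to audit):

```python
import numpy as np

def front_ratio(world_points, extrinsic):
    """world_points: (N, 3); extrinsic: (4, 4) world-to-camera matrix."""
    homo = np.concatenate([world_points,
                           np.ones((len(world_points), 1))], axis=1)
    cam = (extrinsic @ homo.T).T   # transform into camera space
    return (cam[:, 2] > 0).mean()  # fraction with positive depth
```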

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
… scan, cam_fwd output

- Print chunk_scales per window — flags clamped (1e-3/1e3) alignment failures
- Scan every Nth frame for front% to reveal which windows are flipped
- Print camera forward vector alongside cam_pos in per-frame table

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx