Low memory version demo #43

Open

YoshiRi wants to merge 24 commits into Robbyant:main from YoshiRi:claude/bf16-aggregator-lowmem

Conversation


YoshiRi commented Apr 23, 2026

No description provided.

claude added 24 commits April 18, 2026 22:51
Enables running the lingbot-map demo with user-provided images on Docker
without any local Python/CUDA setup:

- Dockerfile: CUDA 12.8 + PyTorch 2.9.1 + lingbot-map[vis] + FlashInfer
- docker/entrypoint.sh: auto-downloads model from HuggingFace on first run,
  falls back to --use_sdpa if FlashInfer is unavailable
- docker-compose.yml: mounts ./images and ./model, exposes port 8080
- README.md: Docker Quick Start section

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
…sues

nvidia/cuda tags can be unavailable without Docker Hub auth or may not exist
for CUDA 12.8 with the cudnn9 suffix. pytorch/pytorch official images ship
PyTorch + CUDA pre-installed and are publicly accessible without auth.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
- Add image design table (base image, backends, ports, volume layout)
- Add directory structure diagram showing /app, /model, /data
- Add dedicated section for built-in example scenes (church/oxford/university/loop)
- Clarify that examples are baked into the image — no extra data mount needed
- Expand tips section: pre-downloaded model, GPU memory, long sequences, fast inference

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
opencv-python requires libGL.so.1 which is not present in the
pytorch/pytorch base image. Adding libgl1 resolves the error.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
demo.py: load checkpoint to CPU before GPU transfer to avoid holding both
the state dict and the model weights in GPU memory simultaneously. Cast
the aggregator (DINOv2 trunk) to bfloat16 on CPU before model.to(device)
to halve its VRAM footprint (~2-3 GB saved). Heads remain in fp32.

Dockerfile: rename PYTORCH_CUDA_ALLOC_CONF to PYTORCH_ALLOC_CONF
(deprecated in newer PyTorch versions).

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Restores original load_model behaviour: checkpoint loaded directly to
the target device in fp32. The bfloat16 pre-cast degraded model accuracy
and is not appropriate outside of memory-constrained environments.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
map_location=device caused the state dict to be loaded onto GPU while the
model itself was still on CPU. load_state_dict then performed a D2H copy,
leaving the GPU state dict alive until the function returned — at which point
model.to(device) also needed GPU space, doubling peak VRAM with no benefit.

Changing to map_location="cpu" ensures a single H2D transfer via model.to(device).
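The fix can be sketched as follows, assuming a standard `nn.Module` and a plain state-dict checkpoint (function and argument names here are illustrative, not the project's actual API):

```python
import torch

def load_model(model, ckpt_path, device):
    # Previously, torch.load(ckpt_path, map_location=device) placed the
    # whole state dict in GPU memory while the model was still on CPU,
    # so checkpoint tensors and model weights briefly coexisted in VRAM.
    # Loading to CPU keeps peak GPU usage at one copy of the model:
    state_dict = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state_dict)
    del state_dict  # release the CPU copy before the transfer
    return model.to(device)  # single H2D transfer
```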

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
The DINOv2 aggregator trunk accounts for ~2-3 GB of the fp32 model.
Casting it on CPU before model.to(device) avoids the temporary fp32+bf16
coexistence on GPU that would OOM on cards with <=6 GB VRAM.

Per the original authors (demo.py:329-336): "no measurable quality change".
Heads remain in fp32; the matching cast in main() becomes a no-op.

To revert: git revert HEAD or switch back to claude/docker-image-stream-demo-Lgnfl.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Instead of hardcoding the aggregator bf16 cast, gate it behind LOW_VRAM_MODE.
Both image variants now build from a single Dockerfile on a single branch:

  # Standard (full precision)
  docker build -t lingbot-map-demo .

  # Low VRAM (~2-3 GB savings, aggregator in bf16)
  docker build --build-arg LOW_VRAM_MODE=1 -t lingbot-map-demo-light .
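On the Python side, the gate could look like this (a sketch: the `LOW_VRAM_MODE` env-var plumbing and the `aggregator` attribute name are assumptions about how the build arg reaches the script):

```python
import os
import torch

def maybe_cast_aggregator(model):
    # Gate the bf16 cast behind LOW_VRAM_MODE instead of hardcoding it,
    # so both image variants build from the same code.
    if os.environ.get("LOW_VRAM_MODE", "0") == "1":
        # Cast only the DINOv2 trunk; heads stay in fp32 for accuracy.
        model.aggregator.to(dtype=torch.bfloat16)
    return model
```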

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Passing --conf_threshold <1.0 caused a viser error because the slider
initial_value fell below the min constraint. Also, the 1.0 floor made it
impossible to view low-confidence reconstructions (e.g. with --num_scale_frames 2)
since all points were filtered out in the UI.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Prints world_points finite ratio and confidence score statistics
to help diagnose why nothing appears in the viser viewer.
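Diagnostics of that kind can be as simple as the following sketch (the `(S, H, W, ...)` array shapes are assumptions):

```python
import numpy as np

def print_point_diagnostics(world_points, conf):
    """world_points: (S, H, W, 3) float array; conf: (S, H, W) scores."""
    # Fraction of pixels whose 3D point has no NaN/Inf component.
    finite = np.isfinite(world_points).all(axis=-1)
    print(f"world_points finite ratio: {finite.mean():.3f}")
    # Confidence spread, to pick a sensible viewer threshold.
    pcts = np.percentile(conf, [5, 25, 50, 75, 95])
    print("conf percentiles (5/25/50/75/95):", np.round(pcts, 2))
```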

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
The collapsible layout hides the control panel by default, requiring users
to find and click a toggle button. Switch to fixed so sliders/buttons
are always visible on the side.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
demo.py:
- Add --output_dir (default /data/output) and --no_viewer flag
- export_results() writes three files after every inference run:
    predictions.npz   raw numpy arrays (world_points, depth, extrinsic, intrinsic, images)
    pointcloud.ply    confidence-filtered merged point cloud (binary PLY)
    cameras.json      per-frame c2w poses and intrinsic matrices
- Export runs unconditionally; viewer is launched unless --no_viewer is set

docker-compose.yml:
- Add ./output:/data/output volume mount
- Pass --output_dir /data/output in default command

docker-compose.lowvram.yml:
- New file for 8 GB VRAM machines (LOW_VRAM_MODE=1 build)
- Pre-configured windowed inference defaults
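The npz/JSON part of the export step could be sketched like this (field names and shapes are assumptions; the real `export_results` also writes the binary PLY):

```python
import json
import os
import numpy as np

def export_results(out_dir, world_points, depth, extrinsic, intrinsic, images):
    os.makedirs(out_dir, exist_ok=True)
    # Raw arrays for downstream tooling.
    np.savez(os.path.join(out_dir, "predictions.npz"),
             world_points=world_points, depth=depth,
             extrinsic=extrinsic, intrinsic=intrinsic, images=images)
    # Per-frame poses and intrinsic matrices as JSON.
    frames = [{"c2w": e.tolist(), "intrinsic": k.tolist()}
              for e, k in zip(extrinsic, intrinsic)]
    with open(os.path.join(out_dir, "cameras.json"), "w") as f:
        json.dump(frames, f)
```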

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
IMAGE_HOST_PATH: changes the host-side volume mount source (default ./images)
IMAGE_FOLDER: changes the container-side path passed to --image_folder (default /data/images)

Examples:
  IMAGE_HOST_PATH=~/photos docker compose up          # mount custom host dir
  IMAGE_FOLDER=/app/example/oxford docker compose up  # use built-in sample

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
…./example/oxford

Single variable IMAGE_HOST_PATH controls the host-side mount.
Default points to the bundled sample so bare `docker compose up` works out of the box.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
images_cpu has shape (1,S,C,H,W) but PLY export expected (S,C,H,W).
Drop leading batch dim when present before saving.
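The shape fix amounts to a conditional squeeze of the batch axis (a sketch):

```python
import numpy as np

def drop_batch_dim(images):
    # (1, S, C, H, W) -> (S, C, H, W); leave (S, C, H, W) untouched.
    if images.ndim == 5 and images.shape[0] == 1:
        images = images[0]
    return images
```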

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
conf mean was 2.9 on oxford; threshold 0.0 produced ~1GB PLY.
2.0 filters low-confidence points while retaining the majority.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Loads cameras.json, draws frustums colored by frame order (cool colormap),
trajectory line, start/end markers. Requires only numpy + matplotlib.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Reports: shape/dtype, NaN/Inf ratio, coordinate bounds, distance distribution,
confidence percentiles, point counts per threshold, image pixel range.
Plots: top-down/side/front 2D projections, confidence histogram,
distance histogram, valid-points-per-frame bar chart.

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
…M points

Without downsampling, conf>2.0 still produces ~490MB/32M points which
overwhelms MeshLab and WebGL. Factor 4 brings it to ~2M points (~30MB).
Increase point_size to 0.005 to compensate for sparser sampling.
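Stride-based spatial downsampling of per-pixel points can be sketched as below, assuming a `(S, H, W, 3)` layout; a factor of 4 on both axes cuts the count by roughly 16x, matching the 32M-to-2M reduction above:

```python
import numpy as np

def downsample_points(world_points, conf, factor=4, conf_threshold=2.0):
    # Keep every `factor`-th pixel along both spatial axes, then drop
    # low-confidence points before writing the PLY.
    pts = world_points[:, ::factor, ::factor]
    c = conf[:, ::factor, ::factor]
    mask = c > conf_threshold
    return pts[mask]  # (N, 3)
```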

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
…rlap 8, mask_sky

- export_results: apply downsample_factor as spatial stride on PLY export
  (was only affecting viser viewer; now 32M pts → ~2M pts with factor=4)
- num_scale_frames 2→4: better global scale estimation
- overlap_size 4→8: smoother window-to-window stitching
- mask_sky: remove sky points (outdoor/driving sequences)

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Checks if world_points are in front of / behind each camera by transforming
to camera space. Reports front% per frame and flags coordinate convention bugs.
Also shows depth histogram, reprojection overlay, and camera forward vector.
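The front/behind test reduces to a camera-space depth sign check, sketched here assuming `extrinsic` is a world-to-camera matrix with +Z forward (which is exactly the convention such a diagnostic is meant to audit):

```python
import numpy as np

def front_ratio(world_points, extrinsic):
    """world_points: (N, 3); extrinsic: (4, 4) world-to-camera matrix."""
    homo = np.concatenate([world_points,
                           np.ones((len(world_points), 1))], axis=1)
    cam = (extrinsic @ homo.T).T   # transform into camera space
    return (cam[:, 2] > 0).mean()  # fraction with positive depth
```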

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
… scan, cam_fwd output

- Print chunk_scales per window — flags clamped (1e-3/1e3) alignment failures
- Scan every Nth frame for front% to reveal which windows are flipped
- Print camera forward vector alongside cam_pos in per-frame table

https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx