Enables running the lingbot-map demo with user-provided images on Docker without any local Python/CUDA setup:
- Dockerfile: CUDA 12.8 + PyTorch 2.9.1 + lingbot-map[vis] + FlashInfer
- docker/entrypoint.sh: auto-downloads the model from HuggingFace on first run, falls back to --use_sdpa if FlashInfer is unavailable
- docker-compose.yml: mounts ./images and ./model, exposes port 8080
- README.md: Docker Quick Start section
https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
…sues nvidia/cuda tags can be unavailable without Docker Hub auth or may not exist for CUDA 12.8 with the cudnn9 suffix. pytorch/pytorch official images ship PyTorch + CUDA pre-installed and are publicly accessible without auth. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
- Add image design table (base image, backends, ports, volume layout)
- Add directory structure diagram showing /app, /model, /data
- Add dedicated section for built-in example scenes (church/oxford/university/loop)
- Clarify that examples are baked into the image — no extra data mount needed
- Expand tips section: pre-downloaded model, GPU memory, long sequences, fast inference
https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
opencv-python requires libGL.so.1 which is not present in the pytorch/pytorch base image. Adding libgl1 resolves the error. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
demo.py:
- Load the checkpoint to CPU before the GPU transfer to avoid holding both the state dict and the model weights in GPU memory simultaneously.
- Cast the aggregator (DINOv2 trunk) to bfloat16 on CPU before model.to(device) to halve its VRAM footprint (~2-3 GB saved). Heads remain in fp32.
Dockerfile:
- Rename PYTORCH_CUDA_ALLOC_CONF to PYTORCH_ALLOC_CONF (the former is deprecated in newer PyTorch versions).
https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Restores original load_model behaviour: checkpoint loaded directly to the target device in fp32. The bfloat16 pre-cast degraded model accuracy and is not appropriate outside of memory-constrained environments. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
map_location=device caused the state dict to be loaded onto GPU while the model itself was still on CPU. load_state_dict then performed a D2H copy, leaving the GPU state dict alive until the function returned — at which point model.to(device) also needed GPU space, doubling peak VRAM with no benefit. Changing to map_location="cpu" ensures a single H2D transfer via model.to(device). https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
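The fix described above can be sketched as follows. This is a minimal, hypothetical version of the load_model change (the real demo.py surely does more); the point is that `map_location="cpu"` keeps the deserialized state dict off the GPU, so the only device transfer is the single `model.to(device)` at the end:

```python
import os
import tempfile

import torch

def load_model(model, ckpt_path, device):
    # State dict lands on CPU regardless of where it was saved,
    # so no GPU memory is consumed during deserialization.
    state = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state)   # CPU -> CPU copy into the module
    return model.to(device)        # single H2D transfer of the weights

# Tiny round-trip with a stand-in model on CPU.
model = torch.nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
torch.save(model.state_dict(), path)
loaded = load_model(torch.nn.Linear(4, 2), path, "cpu")
```

With `map_location=device`, the same code would briefly hold two full copies of the weights on the GPU (the state dict plus the moved model), which is exactly the doubled peak VRAM the commit removes.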
The DINOv2 aggregator trunk accounts for ~2-3 GB of the fp32 model. Casting it on CPU before model.to(device) avoids the temporary fp32+bf16 coexistence on GPU that would OOM on cards with <=6 GB VRAM. Per the original authors (demo.py:329-336): "no measurable quality change". Heads remain in fp32; the matching cast in main() becomes a no-op. To revert: git revert HEAD or switch back to claude/docker-image-stream-demo-Lgnfl. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Instead of hardcoding the aggregator bf16 cast, gate it behind LOW_VRAM_MODE. Both image variants now build from a single Dockerfile on a single branch:
# Standard (full precision)
docker build -t lingbot-map-demo .
# Low VRAM (~2-3 GB savings, aggregator in bf16)
docker build --build-arg LOW_VRAM_MODE=1 -t lingbot-map-demo-light .
https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
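On the Python side, the gate presumably reduces to an environment-variable check (the ARG is forwarded as an ENV inside the image). A minimal sketch, with the variable name taken from the commit and the function name invented here:

```python
import os

def low_vram_enabled(env=os.environ):
    """Return True when the image was built with LOW_VRAM_MODE=1.

    The Dockerfile build arg is assumed to be re-exported as an
    environment variable so demo.py can decide at runtime whether
    to cast the aggregator to bfloat16.
    """
    return env.get("LOW_VRAM_MODE", "0") == "1"
```

demo.py would then wrap the bf16 cast in `if low_vram_enabled(): ...`, so the standard image keeps full fp32 precision with no code divergence between branches.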
Passing --conf_threshold <1.0 caused a viser error because the slider initial_value fell below the min constraint. Also, the 1.0 floor made it impossible to view low-confidence reconstructions (e.g. with --num_scale_frames 2) since all points were filtered out in the UI. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
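One way to avoid the viser error is to clamp the CLI value into the slider's valid range before constructing it; viser rejects an `initial_value` outside `[min, max]`. The bounds and function name below are illustrative, not the demo's actual values:

```python
def slider_initial(user_value, lo=0.0, hi=20.0):
    """Clamp a user-supplied confidence threshold into the slider range.

    A value below `lo` (or above `hi`) would make viser raise on
    GuiSliderHandle creation; clamping keeps any --conf_threshold legal.
    """
    return min(max(user_value, lo), hi)
```

Lowering `lo` to 0.0 is what makes low-confidence reconstructions viewable at all: with a floor of 1.0, every point of a weak reconstruction could fall under the UI filter.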
Prints world_points finite ratio and confidence score statistics to help diagnose why nothing appears in the viser viewer. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
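A sketch of that diagnostic (array names follow the commit; the function itself is an assumption). An empty viewer is almost always either NaN/Inf world points or a confidence threshold above every predicted score, and these two numbers distinguish the cases:

```python
import numpy as np

def summarize(world_points, conf):
    """Return (finite ratio of world_points, [p5, p50, p95] of conf)."""
    finite = np.isfinite(world_points).mean()
    pcts = np.percentile(conf, [5, 50, 95])
    return finite, pcts

# Synthetic example: one NaN coordinate out of six.
wp = np.ones((2, 3))
wp[0, 0] = np.nan
finite, pcts = summarize(wp, np.arange(5.0))
```

If the finite ratio is near 0, the reconstruction itself failed; if it is near 1 but p95 of conf sits below the UI threshold, the points exist and are merely filtered out.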
The collapsible layout hides the control panel by default, requiring users to find and click a toggle button. Switch to fixed so the sliders and buttons are always visible at the side.
demo.py:
- Add --output_dir (default /data/output) and --no_viewer flag
- export_results() writes three files after every inference run:
  predictions.npz: raw numpy arrays (world_points, depth, extrinsic, intrinsic, images)
  pointcloud.ply: confidence-filtered merged point cloud (binary PLY)
  cameras.json: per-frame c2w poses and intrinsic matrices
- Export runs unconditionally; viewer is launched unless --no_viewer is set
docker-compose.yml:
- Add ./output:/data/output volume mount
- Pass --output_dir /data/output in default command
docker-compose.lowvram.yml:
- New file for 8 GB VRAM machines (LOW_VRAM_MODE=1 build)
- Pre-configured windowed inference defaults
https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
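A stripped-down sketch of the export step described above, covering the .npz and cameras.json outputs (the binary PLY writer is omitted, and the exact field names inside cameras.json are assumptions):

```python
import json
import os
import tempfile

import numpy as np

def export_results(output_dir, world_points, extrinsic, intrinsic):
    """Write predictions.npz and cameras.json into output_dir."""
    os.makedirs(output_dir, exist_ok=True)
    # Raw arrays, loadable later with np.load for offline analysis.
    np.savez(os.path.join(output_dir, "predictions.npz"),
             world_points=world_points,
             extrinsic=extrinsic,
             intrinsic=intrinsic)
    # One JSON record per frame: c2w pose and intrinsic matrix.
    cams = [{"frame": i, "c2w": e.tolist(), "K": k.tolist()}
            for i, (e, k) in enumerate(zip(extrinsic, intrinsic))]
    with open(os.path.join(output_dir, "cameras.json"), "w") as f:
        json.dump(cams, f)

# Two-frame dummy run into a temp dir standing in for /data/output.
out = tempfile.mkdtemp()
export_results(out,
               np.zeros((2, 4, 3)),
               np.tile(np.eye(4), (2, 1, 1)),
               np.tile(np.eye(3), (2, 1, 1)))
```

Running export unconditionally (viewer or not) is what makes the `--no_viewer` headless mode useful: the container can be run batch-style and the ./output mount inspected afterwards.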
IMAGE_HOST_PATH: changes the host-side volume mount source (default ./images)
IMAGE_FOLDER: changes the container-side path passed to --image_folder (default /data/images)
Examples:
IMAGE_HOST_PATH=~/photos docker compose up          # mount custom host dir
IMAGE_FOLDER=/app/example/oxford docker compose up  # use built-in sample
https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
…./example/oxford Single variable IMAGE_HOST_PATH controls the host-side mount. Default points to the bundled sample so bare `docker compose up` works out of the box. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
images_cpu has shape (1,S,C,H,W) but PLY export expected (S,C,H,W). Drop leading batch dim when present before saving. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
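The guard is a one-liner; a hedged sketch (the helper name is invented, the shapes are from the commit):

```python
import numpy as np

def squeeze_batch(images):
    """Drop a leading batch dim of 1: (1, S, C, H, W) -> (S, C, H, W).

    Arrays already shaped (S, C, H, W) pass through unchanged, so the
    PLY export works for both batched and unbatched model outputs.
    """
    if images.ndim == 5 and images.shape[0] == 1:
        images = images[0]
    return images
```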
conf mean was 2.9 on oxford; threshold 0.0 produced ~1GB PLY. 2.0 filters low-confidence points while retaining the majority. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
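The filtering itself is a boolean mask over the flattened points; a minimal sketch with the threshold from the commit (function name is an assumption):

```python
import numpy as np

def filter_points(points, conf, threshold=2.0):
    """Keep only points whose confidence exceeds the threshold.

    With conf mean ~2.9 on the oxford scene, threshold 0.0 keeps
    everything (~1 GB PLY); 2.0 drops the low-confidence tail while
    retaining the majority of points.
    """
    return points[conf > threshold]

pts = np.arange(12.0).reshape(4, 3)
conf = np.array([0.5, 2.5, 3.0, 1.0])
```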
Loads cameras.json, draws frustums colored by frame order (cool colormap), trajectory line, start/end markers. Requires only numpy + matplotlib. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Reports: shape/dtype, NaN/Inf ratio, coordinate bounds, distance distribution, confidence percentiles, point counts per threshold, image pixel range. Plots: top-down/side/front 2D projections, confidence histogram, distance histogram, valid-points-per-frame bar chart. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
…M points Without downsampling, conf>2.0 still produces ~490MB/32M points which overwhelms MeshLab and WebGL. Factor 4 brings it to ~2M points (~30MB). Increase point_size to 0.005 to compensate for sparser sampling. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
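Spatial striding with factor f keeps roughly 1/f² of the points, which is why factor 4 takes ~32M points down to ~2M. A sketch, assuming per-frame point maps shaped (S, H, W, 3):

```python
import numpy as np

def downsample(points_shw3, factor=4):
    """Stride every `factor`-th pixel in H and W before PLY export.

    Keeps ~1/factor**2 of the points; factor 4 turns ~32M points
    (~490 MB PLY) into ~2M (~30 MB), which MeshLab and WebGL handle.
    """
    return points_shw3[:, ::factor, ::factor, :]
```

Because the surviving points are sparser, a larger render point_size (0.005 per the commit) keeps the cloud looking dense.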
…rlap 8, mask_sky
- export_results: apply downsample_factor as spatial stride on PLY export (was only affecting the viser viewer; now 32M pts → ~2M pts with factor=4)
- num_scale_frames 2→4: better global scale estimation
- overlap_size 4→8: smoother window-to-window stitching
- mask_sky: remove sky points (outdoor/driving sequences)
https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
Checks if world_points are in front of / behind each camera by transforming to camera space. Reports front% per frame and flags coordinate convention bugs. Also shows depth histogram, reprojection overlay, and camera forward vector. https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx
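The in-front check transforms points into camera space and counts positive depth. A sketch, assuming OpenCV-style poses where the camera looks down +z and cameras.json stores c2w matrices (both assumptions, not confirmed by the commit):

```python
import numpy as np

def front_ratio(world_pts, c2w):
    """Fraction of world points in front of a camera given its c2w pose.

    Inverting c2w gives w2c; after transforming to camera space,
    z > 0 means "in front" under the +z-forward convention assumed here.
    A ratio near 0 for a frame flags a flipped pose / convention bug.
    """
    w2c = np.linalg.inv(c2w)
    homo = np.concatenate([world_pts, np.ones((len(world_pts), 1))], axis=1)
    cam = (w2c @ homo.T).T
    return (cam[:, 2] > 0).mean()

# Identity pose: one point ahead of the camera, one behind.
pts = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
ratio = front_ratio(pts, np.eye(4))
```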
… scan, cam_fwd output
- Print chunk_scales per window — flags clamped (1e-3/1e3) alignment failures
- Scan every Nth frame for front% to reveal which windows are flipped
- Print camera forward vector alongside cam_pos in per-frame table
https://claude.ai/code/session_012nvgo5ETaSxQ7AjxLpPMfx