Conversation
Pull request overview
This PR adds comprehensive documentation for deploying custom and private models to Cube AI Confidential VMs (CVMs). It expands the previously minimal "private-model-upload.md" documentation to cover three distinct deployment approaches: build-time embedding via Buildroot, cloud-init provisioning for Ubuntu, and runtime upload over SSH.
Changes:
- Comprehensive documentation for deploying custom models to CVMs using Ollama and vLLM backends
- Detailed instructions for three deployment methods: Buildroot build-time embedding, cloud-init provisioning, and runtime SSH upload
- Port reference table documenting the standard CVM network port mappings (6190→SSH, 6193→Cube Agent)
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| content/docs/developer-guide/private-model-upload.md | Complete rewrite expanding from 15 lines to 290 lines with detailed multi-platform deployment procedures, configuration examples, and verification steps |
| content/docs/developer-guide/index.md | Added descriptive subtitle to "Private Model Upload" menu item clarifying the three deployment methods |
> 3. Add a startup script in the overlay to register the model after Ollama starts:

```bash
mkdir -p cube/hal/buildroot/linux/board/cube/overlay/usr/libexec/ollama/
cat > cube/hal/buildroot/linux/board/cube/overlay/usr/libexec/ollama/register-custom-models.sh << 'SCRIPT'
#!/bin/sh
# Wait for Ollama to be ready
for i in $(seq 1 30); do
  if curl -s http://localhost:11434/api/version > /dev/null 2>&1; then
    break
  fi
  sleep 2
done

# Register custom models from Modelfiles
for mf in /etc/cube/modelfiles/*.Modelfile; do
  [ -f "$mf" ] || continue
  name=$(basename "$mf" .Modelfile)
  ollama create "$name" -f "$mf"
done
SCRIPT
chmod +x cube/hal/buildroot/linux/board/cube/overlay/usr/libexec/ollama/register-custom-models.sh
```
The documentation creates a startup script at `/usr/libexec/ollama/register-custom-models.sh` but does not explain how this script is executed at boot time. The script needs to be integrated into the init system (systemd or SysV init) or called by the Ollama service startup. Please add documentation on how to ensure this script runs automatically after Ollama starts, for example by adding it to a systemd service as an `ExecStartPost` command or by integrating it into the Buildroot package configuration.
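One hedged way to wire this up, assuming the Ollama Buildroot package installs a systemd unit named `ollama.service` (the unit name and drop-in path below are assumptions, not confirmed by this PR), is a drop-in in the overlay that runs the script after the daemon starts:

```ini
# Hypothetical drop-in, placed in the overlay at
# usr/lib/systemd/system/ollama.service.d/register-models.conf
# Runs the registration script each time ollama.service starts.
[Service]
ExecStartPost=/usr/libexec/ollama/register-custom-models.sh
```

On a SysV-init Buildroot image, an `S`-numbered script in `/etc/init.d/` ordered after the Ollama start script would serve the same purpose.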
```bash
# Package model files on the host
tar -czvf my-model.tar.gz /path/to/model/files
```
The `tar` command creates an archive from `/path/to/model/files`, but this path is just a placeholder. The command should clarify whether this refers to a directory or a file. If it's a directory of model files, the command should use `tar -czvf my-model.tar.gz -C /path/to/model files/` or `tar -czvf my-model.tar.gz /path/to/model/files/` (with trailing slash). If it's meant to tar multiple files, consider showing a more explicit example like `tar -czvf my-model.tar.gz weights.gguf config.json` to avoid confusion.
Suggested change:

```diff
-tar -czvf my-model.tar.gz /path/to/model/files
+tar -czvf my-model.tar.gz /path/to/model/files/
```
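To illustrate the difference, here is a minimal self-contained sketch (the directory and file names under `/tmp` are throwaway examples) showing how `-C` keeps the archive paths relative instead of embedding the absolute host path:

```shell
# Create a throwaway model directory (hypothetical file names)
mkdir -p /tmp/model-demo/files
printf 'dummy-weights' > /tmp/model-demo/files/weights.gguf
printf '{}' > /tmp/model-demo/files/config.json

# -C changes into the parent directory first, so the archive stores
# "files/..." rather than the absolute "/tmp/model-demo/files/..." path
tar -czf /tmp/model-demo/my-model.tar.gz -C /tmp/model-demo files

# List the archive contents to verify the relative layout
tar -tzf /tmp/model-demo/my-model.tar.gz
```

Relative paths matter here because the archive is later extracted inside the CVM, where the host's absolute path does not exist.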
```bash
BR2_PACKAGE_VLLM_MODEL="meta-llama/Llama-2-7b-hf"
BR2_PACKAGE_VLLM_GPU_MEMORY="0.90"
```
The Buildroot configuration variable name `BR2_PACKAGE_VLLM_GPU_MEMORY` seems inconsistent with the runtime environment variable `VLLM_GPU_MEMORY_UTILIZATION` used in line 199. If these represent the same configuration parameter, consider using consistent naming (e.g., `BR2_PACKAGE_VLLM_GPU_MEMORY_UTILIZATION`) to make the relationship clearer. Verify that the Buildroot package correctly maps this configuration to the runtime environment variable.
Suggested change:

```diff
-BR2_PACKAGE_VLLM_GPU_MEMORY="0.90"
+BR2_PACKAGE_VLLM_GPU_MEMORY_UTILIZATION="0.90"
```
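If the renamed option is adopted, the vLLM package's `.mk` could make the mapping to the runtime variable explicit. This is a hypothetical sketch only; the actual vLLM Buildroot package makefile and the env-file path are assumptions, not part of this PR:

```makefile
# Hypothetical excerpt from package/vllm/vllm.mk: write the build-time
# option into an environment file that the vLLM service sources at runtime.
define VLLM_INSTALL_RUNTIME_ENV
	mkdir -p $(TARGET_DIR)/etc/vllm
	echo "VLLM_GPU_MEMORY_UTILIZATION=$(call qstrip,$(BR2_PACKAGE_VLLM_GPU_MEMORY_UTILIZATION))" \
		>> $(TARGET_DIR)/etc/vllm/vllm.env
endef
VLLM_POST_INSTALL_TARGET_HOOKS += VLLM_INSTALL_RUNTIME_ENV
```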
```yaml
  - pip install vllm
  - mkdir -p /var/lib/vllm/models
  # Download from a private HuggingFace registry (requires token for gated models)
  - |
```
The hardcoded token `your-token-here` should include a warning or note that this is a placeholder and should not be committed to version control or exposed in production configurations. Consider adding a comment or note that recommends using environment variables or secure secret management for tokens in actual deployments.
Suggested change:

```diff
 - |
+  # NOTE: This is a placeholder token for documentation purposes only.
+  # Do NOT hardcode real tokens in scripts or configs; in production,
+  # pass HF_TOKEN via environment variables or a secure secret manager.
```
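A hedged cloud-init sketch of the recommended pattern, assuming the operator injects the real token out of band at deploy time (the file path `/etc/cube/hf-token.env` and the model name are illustrative):

```yaml
#cloud-config
write_files:
  # Token file is root-only; the real value is supplied at deploy time,
  # never committed to version control.
  - path: /etc/cube/hf-token.env
    permissions: "0600"
    content: |
      HF_TOKEN=REPLACE_AT_DEPLOY_TIME
runcmd:
  - >
    . /etc/cube/hf-token.env && export HF_TOKEN &&
    huggingface-cli download meta-llama/Llama-2-7b-hf
    --local-dir /var/lib/vllm/models
```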
> :::note
> For Ubuntu cloud-init CVMs, the default SSH user is `ultraviolet` (password: `password`). For Buildroot CVMs, the default user is `root`.
The default password "password" is documented here for development/testing purposes. While this is acceptable for documentation, consider adding a security warning that this default password should be changed in production environments. This is particularly important since the password is exposed in plaintext and the SSH port is forwarded to the host.
Suggested change:

```diff
-For Ubuntu cloud-init CVMs, the default SSH user is `ultraviolet` (password: `password`). For Buildroot CVMs, the default user is `root`.
+For Ubuntu cloud-init CVMs, the default SSH user is `ultraviolet` (password: `password`). For Buildroot CVMs, the default user is `root`. The `password` credential is intended **only** for local development and testing; on any production or network-accessible deployment, you **must** change this password immediately (or disable password login and use SSH keys), especially since the SSH port is forwarded from the host to the CVM.
```
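If the warning is adopted, the hardening step itself is easy to show as well. A minimal cloud-init sketch, assuming key-based login is acceptable for the deployment (the key string is a placeholder):

```yaml
#cloud-config
# Disable password authentication entirely and rely on SSH keys.
ssh_pwauth: false
users:
  - name: ultraviolet
    lock_passwd: true
    ssh_authorized_keys:
      - ssh-ed25519 AAAA...placeholder-public-key... operator@workstation
```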
> ### Port Reference
>
> CVM network access uses QEMU user-mode port forwarding. The following host-to-guest port mappings are configured in the QEMU launch scripts (`hal/buildroot/qemu.sh` and `hal/ubuntu/qemu.sh`):
The path references `hal/buildroot/qemu.sh` and `hal/ubuntu/qemu.sh` here, but the existing CVM Management documentation (`cvm-management.md`) consistently refers to `hal/cloud/qemu.sh`. Please verify that these paths are correct and consistent with the actual repository structure. If the directory structure has changed, the CVM Management documentation should also be updated for consistency.
Suggested change:

```diff
-CVM network access uses QEMU user-mode port forwarding. The following host-to-guest port mappings are configured in the QEMU launch scripts (`hal/buildroot/qemu.sh` and `hal/ubuntu/qemu.sh`):
+CVM network access uses QEMU user-mode port forwarding. The following host-to-guest port mappings are configured in the QEMU launch script (`hal/cloud/qemu.sh`):
```
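Whichever script path is correct, the mappings themselves come from QEMU's user-mode networking `hostfwd` rules. A hypothetical excerpt of such a launch command (the guest-side Cube Agent port is a placeholder, since this hunk only states the host-side ports 6190→SSH and 6193→Cube Agent):

```shell
# Hypothetical fragment of the QEMU launch script: forward host ports
# 6190 and 6193 into the guest over user-mode networking.
qemu-system-x86_64 \
  -netdev user,id=net0,hostfwd=tcp::6190-:22,hostfwd=tcp::6193-:<agent-port> \
  -device virtio-net-pci,netdev=net0 \
  ...
```

With that in place, `ssh -p 6190 ultraviolet@localhost` on the host reaches the guest's SSH daemon on port 22.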
Signed-off-by: WashingtonKK <washingtonkigan@gmail.com>
What type of PR is this?
This is a docs update
What does this do?
Updates docs on deploying custom models
Which issue(s) does this PR fix/relate to?
Have you included tests for your changes?
Did you document any new/modified features?
Notes