
CD-29 - Add deploying custom models docs #4

Open

WashingtonKK wants to merge 5 commits into `main` from `cd-29`

Conversation

@WashingtonKK (Contributor) commented Feb 26, 2026

What type of PR is this?

This is a docs update

What does this do?

Updates docs on deploying custom models

Which issue(s) does this PR fix/relate to?

Have you included tests for your changes?

Did you document any new/modified features?

Notes

Copilot AI left a comment

Pull request overview

This PR adds comprehensive documentation for deploying custom and private models to Cube AI Confidential VMs (CVMs). It expands the previously minimal "private-model-upload.md" documentation to cover three distinct deployment approaches: build-time embedding via Buildroot, cloud-init provisioning for Ubuntu, and runtime upload over SSH.

Changes:

  • Comprehensive documentation for deploying custom models to CVMs using Ollama and vLLM backends
  • Detailed instructions for three deployment methods: Buildroot build-time embedding, cloud-init provisioning, and runtime SSH upload
  • Port reference table documenting the standard CVM network port mappings (6190→SSH, 6193→Cube Agent)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `content/docs/developer-guide/private-model-upload.md` | Complete rewrite expanding from 15 lines to 290 lines with detailed multi-platform deployment procedures, configuration examples, and verification steps |
| `content/docs/developer-guide/index.md` | Added a descriptive subtitle to the "Private Model Upload" menu item clarifying the three deployment methods |


Comment on lines +85 to +107
3. Add a startup script in the overlay to register the model after Ollama starts:

```bash
mkdir -p cube/hal/buildroot/linux/board/cube/overlay/usr/libexec/ollama/
cat > cube/hal/buildroot/linux/board/cube/overlay/usr/libexec/ollama/register-custom-models.sh << 'SCRIPT'
#!/bin/sh
# Wait for Ollama to be ready
for i in $(seq 1 30); do
    if curl -s http://localhost:11434/api/version > /dev/null 2>&1; then
        break
    fi
    sleep 2
done

# Register custom models from Modelfiles
for mf in /etc/cube/modelfiles/*.Modelfile; do
    [ -f "$mf" ] || continue
    name=$(basename "$mf" .Modelfile)
    ollama create "$name" -f "$mf"
done
SCRIPT
chmod +x cube/hal/buildroot/linux/board/cube/overlay/usr/libexec/ollama/register-custom-models.sh
```

Copilot AI Feb 26, 2026


The documentation creates a startup script at /usr/libexec/ollama/register-custom-models.sh but does not explain how this script is executed at boot time. The script needs to be integrated into the init system (systemd or SysV init) or called by the Ollama service startup. Please add documentation on how to ensure this script runs automatically after Ollama starts, for example by adding it to a systemd service as an ExecStartPost command or by integrating it into the Buildroot package configuration.

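One way to wire this up, assuming the Buildroot image ships a systemd unit for Ollama (the unit name `ollama.service` and the paths below are illustrative, not confirmed by this PR), is a drop-in that runs the registration script once the service has started:

```
# Hypothetical drop-in: overlay/usr/lib/systemd/system/ollama.service.d/register-models.conf
# Unit name and script path are assumptions; adjust to the image's actual service.
[Service]
ExecStartPost=/usr/libexec/ollama/register-custom-models.sh
```

On SysV-style images, the equivalent would be appending the script call to the Ollama init script's start action.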

```bash
# Package model files on the host
tar -czvf my-model.tar.gz /path/to/model/files
```

Copilot AI Feb 26, 2026


The tar command creates an archive from /path/to/model/files, but this path is just a placeholder. The command should clarify whether this refers to a directory or a file. If it's a directory of model files, the command should use tar -czvf my-model.tar.gz -C /path/to/model files/ or tar -czvf my-model.tar.gz /path/to/model/files/ (with trailing slash). If it's meant to tar multiple files, consider showing a more explicit example like tar -czvf my-model.tar.gz weights.gguf config.json to avoid confusion.

Suggested change

```diff
-tar -czvf my-model.tar.gz /path/to/model/files
+tar -czvf my-model.tar.gz /path/to/model/files/
```

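To make the `-C` form concrete, here is a small self-contained rehearsal (the paths under `/tmp/model-demo` and the file names are illustrative) showing that the archive unpacks as a relative `files/` tree rather than a long absolute path:

```shell
# Hypothetical layout: a model directory containing weights and config.
mkdir -p /tmp/model-demo/files
echo "weights" > /tmp/model-demo/files/weights.gguf
echo "{}" > /tmp/model-demo/files/config.json

# Archive the directory's contents with -C, so entries in the tarball
# are stored relative to /tmp/model-demo (i.e. as files/...).
tar -czf /tmp/model-demo/my-model.tar.gz -C /tmp/model-demo files

# Inspect the archive to confirm the layout before uploading it to the CVM.
tar -tzf /tmp/model-demo/my-model.tar.gz
```

After uploading, extracting with `tar -xzf my-model.tar.gz` on the CVM then yields a predictable `files/` directory in the current working directory.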

```bash
BR2_PACKAGE_VLLM_MODEL="meta-llama/Llama-2-7b-hf"
BR2_PACKAGE_VLLM_GPU_MEMORY="0.90"
```

Copilot AI Feb 26, 2026


The Buildroot configuration variable name BR2_PACKAGE_VLLM_GPU_MEMORY seems inconsistent with the runtime environment variable VLLM_GPU_MEMORY_UTILIZATION used in line 199. If these represent the same configuration parameter, consider using consistent naming (e.g., BR2_PACKAGE_VLLM_GPU_MEMORY_UTILIZATION) to make the relationship clearer. Verify that the Buildroot package correctly maps this configuration to the runtime environment variable.

Suggested change

```diff
-BR2_PACKAGE_VLLM_GPU_MEMORY="0.90"
+BR2_PACKAGE_VLLM_GPU_MEMORY_UTILIZATION="0.90"
```

```yaml
- pip install vllm
- mkdir -p /var/lib/vllm/models
# Download from a private HuggingFace registry (requires token for gated models)
- |
```

Copilot AI Feb 26, 2026


The hardcoded token "your-token-here" should include a warning or note that this is a placeholder and should not be committed to version control or exposed in production configurations. Consider adding a comment or note that recommends using environment variables or secure secret management for tokens in actual deployments.

Suggested change

```diff
-- |
+- |
+# NOTE: This is a placeholder token for documentation purposes only.
+# Do NOT hardcode real tokens in scripts or configs; in production,
+# pass HF_TOKEN via environment variables or a secure secret manager.
```

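In that spirit, a minimal sketch (the `HF_TOKEN` variable name and the messages are illustrative, not part of the PR) of reading the token from the environment rather than hardcoding it:

```shell
#!/bin/sh
# Hypothetical sketch: take the HuggingFace token from the environment
# (injected by a secret manager or CI variable) instead of hardcoding it.
HF_TOKEN="${HF_TOKEN:-}"

if [ -z "$HF_TOKEN" ]; then
    # A real provisioning script would abort here (exit 1).
    echo "HF_TOKEN is not set; skipping gated-model download" >&2
else
    # Use the token only inside the request header, e.g.
    #   curl -H "Authorization: Bearer ${HF_TOKEN}" ...
    # and log only a redacted confirmation, never the token itself.
    echo "HF token loaded (${#HF_TOKEN} chars, redacted)"
fi
```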
:::note
For Ubuntu cloud-init CVMs, the default SSH user is `ultraviolet` (password: `password`). For Buildroot CVMs, the default user is `root`.
:::

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default password "password" is documented here for development/testing purposes. While this is acceptable for documentation, consider adding a security warning that this default password should be changed in production environments. This is particularly important since the password is exposed in plaintext and the SSH port is forwarded to the host.

Suggested change

```diff
-For Ubuntu cloud-init CVMs, the default SSH user is `ultraviolet` (password: `password`). For Buildroot CVMs, the default user is `root`.
+For Ubuntu cloud-init CVMs, the default SSH user is `ultraviolet` (password: `password`). For Buildroot CVMs, the default user is `root`. The `password` credential is intended **only** for local development and testing; on any production or network-accessible deployment, you **must** change this password immediately (or disable password login and use SSH keys), especially since the SSH port is forwarded from the host to the CVM.
```

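Following on from that warning, a hedged hardening sketch for disabling SSH password login. It is rehearsed here against a scratch copy of the config so it is safe to run anywhere; on the real CVM the target would be `/etc/ssh/sshd_config`, followed by a reload of sshd:

```shell
# Hypothetical sketch: disable SSH password authentication in favor of keys.
# We operate on a scratch copy; on the CVM, edit /etc/ssh/sshd_config instead.
printf '#PasswordAuthentication yes\n' > /tmp/sshd_config.demo

# Force the directive off, uncommenting it if it was commented out.
sed -i 's/^#\{0,1\}PasswordAuthentication.*/PasswordAuthentication no/' /tmp/sshd_config.demo

# Confirm the change took effect.
grep '^PasswordAuthentication no' /tmp/sshd_config.demo
# On the CVM itself, follow with: systemctl reload ssh
```

Key-based login also requires installing the user's public key into `~/.ssh/authorized_keys` before password login is switched off.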
## 1. Package Model Files
### Port Reference

CVM network access uses QEMU user-mode port forwarding. The following host-to-guest port mappings are configured in the QEMU launch scripts (`hal/buildroot/qemu.sh` and `hal/ubuntu/qemu.sh`):

Copilot AI Feb 26, 2026


The path references hal/buildroot/qemu.sh and hal/ubuntu/qemu.sh here, but the existing CVM Management documentation (cvm-management.md) consistently refers to hal/cloud/qemu.sh. Please verify that these paths are correct and consistent with the actual repository structure. If the directory structure has changed, the CVM Management documentation should also be updated for consistency.

Suggested change

```diff
-CVM network access uses QEMU user-mode port forwarding. The following host-to-guest port mappings are configured in the QEMU launch scripts (`hal/buildroot/qemu.sh` and `hal/ubuntu/qemu.sh`):
+CVM network access uses QEMU user-mode port forwarding. The following host-to-guest port mappings are configured in the QEMU launch script (`hal/cloud/qemu.sh`):
```

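For readers cross-checking the port table, QEMU user-mode forwarding is normally expressed with `hostfwd` rules. A sketch of what the launch script's networking flags likely look like (the Cube Agent's guest port and the device flags are assumptions; verify against the repository's actual qemu.sh):

```bash
# Hypothetical flags; confirm against the real QEMU launch script.
# Host 6190 -> guest 22 (SSH); host 6193 -> guest 6193 (Cube Agent, assumed).
-netdev user,id=net0,hostfwd=tcp::6190-:22,hostfwd=tcp::6193-:6193
-device virtio-net-pci,netdev=net0
```

With rules like these in place, `ssh -p 6190 ultraviolet@localhost` reaches the guest's SSH daemon from the host.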
Signed-off-by: WashingtonKK <washingtonkigan@gmail.com>
Signed-off-by: WashingtonKK <washingtonkigan@gmail.com>
Signed-off-by: WashingtonKK <washingtonkigan@gmail.com>
…uracy

Signed-off-by: WashingtonKK <washingtonkigan@gmail.com>
@SammyOina SammyOina requested a review from fbugarski March 25, 2026 14:37


Development

Successfully merging this pull request may close these issues.

Document deploying custom models with HAL and cloud init

3 participants