Cube in a Box

The Cube in a Box is a simple way to run the Open Data Cube. The current repository is a fork of https://github.com/opendatacube/cube-in-a-box with the following major changes:

Multi-user JupyterHub
Shared folders for collaboration
Planetary Computer as a STAC datastore
ODC Explorer
Traefik integration as a reverse proxy
Multi-architecture support (AMD64 & ARM64)
Dask integration for parallel processing
Admin and User documentation (Quarto)

These developments were made possible by the financial support of the European Union's Horizon Europe programme, which funded the LandShift (Grant Agreement no. 101182007), Nostradamus (Grant Agreement no. 101134888), and NEMESIS (Grant Agreement no. 101219087) projects.

Repository Structure

Makefile: Main entry point for all commands (start, stop, index, etc.).
docker-compose.yml: Docker services definition.
hub/: Configuration for JupyterHub.
- Dockerfile: Custom JupyterHub image definition.
- jupyterhub_config.py: Main configuration file.
- custom_authenticator.py: Custom logic to restrict signups to authorized users.
- spawner_hooks.py: Helper functions to configure user environments (volumes, permissions) before spawning.
- templates/: Custom UI templates (e.g., signup page).
data/: Configuration and data persistence.
- jupyterhub_data/: JupyterHub database and state.
- local_data/: Mapped to /local_data in containers.
- shared/: Read-only shared folder for all users.
docs/: Built documentation (Quarto).
quarto/: Source files for documentation.
products/: ODC product definitions.

How to use:

1. Local environment setup (Linux, macOS, Windows)

This project is run via docker compose and a Makefile. Before running docker compose commands through make, ensure you have:

Docker with Docker Compose support
GNU Make

Below are platform-specific setup instructions.

Linux

Install Docker Engine
- Install Docker Engine for your distribution (Ubuntu/Debian/Fedora, etc.) using the official Docker instructions.
- Add your user to the docker group so you can run Docker without sudo, then log out/in.
Install Docker Compose
- Recent Docker Engine installations include the Compose plugin and expose it as docker compose ....
- Verify:
  - docker --version
  - docker compose version
Install Make
- Install GNU Make using your package manager.
- Verify:
  - make --version

macOS

Install Docker Desktop
- Install Docker Desktop for Mac and ensure it is running.
- Verify:
  - docker --version
  - docker compose version
Install Make
- macOS typically has make available via Xcode Command Line Tools.
- Install if needed: xcode-select --install
- Verify:
  - make --version

Windows (recommended: WSL2 + Docker Desktop)

The simplest way to use make on Windows is to run the project inside WSL2 (Windows Subsystem for Linux) while using Docker Desktop as the Docker backend.

Install WSL2
- Install WSL2 and a Linux distribution (Ubuntu is a common choice).
- Open your WSL terminal (e.g., Ubuntu).
Install Docker Desktop
- Install Docker Desktop for Windows and enable:
  - Use WSL 2 based engine
  - WSL Integration for your chosen Linux distribution (Settings → Resources → WSL Integration)
Install Make inside WSL
- In your WSL terminal, install GNU Make:
  - Debian/Ubuntu: sudo apt update && sudo apt install -y make
- Verify:
  - make --version
Verify Docker access from WSL

In your WSL terminal, run:
- docker --version
- docker compose version
If these work, WSL is correctly talking to Docker Desktop.

Notes:

Run all make ... commands from the WSL terminal (not PowerShell) to ensure a consistent Linux-like environment.

Store the repository inside the WSL filesystem (e.g., ~/projects/...) for better performance than /mnt/c/....

Quick verification

Once installed, you should be able to run:

```
make --version
docker --version
docker compose version
```
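
The same checks can be wrapped in a small preflight script. The following is a minimal sketch and not part of the repository; the function names `missing_tools` and `check_prerequisites` are hypothetical.

```shell
#!/usr/bin/env sh
# Preflight sketch: report which required tools are missing.

# Print the names of the given commands that are not on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the prerequisites listed in this section. The Compose plugin is a
# docker subcommand rather than a separate binary, so it is probed separately.
check_prerequisites() {
    missing=$(missing_tools make docker)
    if [ -n "$missing" ]; then
        echo "Missing tools: $missing" >&2
        return 1
    fi
    docker compose version >/dev/null 2>&1 || {
        echo "Docker Compose plugin not available" >&2
        return 1
    }
    echo "All prerequisites found"
}
```

Calling `check_prerequisites` before the first `make setup` gives a single pass/fail answer instead of three separate commands.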

2. Usage

Environment variables

This repository uses environment variables to configure the local domain, image version, database credentials, and JupyterHub users.

Create a .env file (Docker Compose reads .env by default):
```
cp .env.default .env
```
Edit .env to match your setup:
- Set a strong password for POSTGRES_PASS
- Configure JUPYTERHUB_ADMINS with admin usernames
- Optionally add regular users to JUPYTERHUB_USERS
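
One way to produce a strong value for POSTGRES_PASS is to draw random characters from /dev/urandom. This is a minimal sketch, assuming a Unix-like shell with `tr` and `head`; `gen_password` is a hypothetical helper, not a command provided by this repository.

```shell
# Print a random alphanumeric password of the requested length (default 32).
gen_password() {
    length="${1:-32}"
    # Keep only letters and digits from the random stream, then cut to length.
    LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c "$length"
    echo
}
```

For example, `echo "POSTGRES_PASS=$(gen_password)"` prints a line that can be pasted into .env.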

Required variables

| Variable | Required | Default (from .env.default) | Example | Description |
| --- | --- | --- | --- | --- |
| `DOMAIN` | Yes | `localhost` | `localhost` | Hostname used to access the web endpoints (Jupyter/Explorer). |
| `IMAGE_VERSION` | Yes | `20260211` | `20260211` | Version of the images to use. |
| `POSTGRES_HOSTNAME` | Yes | `postgres` | `postgres` | Hostname used to access the PostgreSQL database. |
| `POSTGRES_PORT` | Yes | `5432` | `5432` | Port used to access the PostgreSQL database. |
| `POSTGRES_DBNAME` | Yes | `opendatacube` | `opendatacube` | PostgreSQL database name used by Open Data Cube. |
| `POSTGRES_USER` | Yes | `opendatacube` | `opendatacube` | PostgreSQL user for the Open Data Cube database. |
| `POSTGRES_PASS` | Yes | `opendatacubepassword` | `a-strong-password` | PostgreSQL password for the Open Data Cube database. |
| `JUPYTERHUB_ADMINS` | Yes | `admin` | `admin,bruno` | Comma-separated list of JupyterHub admin usernames. |
| `JUPYTERHUB_USERS` | No | `guest` | `guest,alice,bob` | Comma-separated list of authorized non-admin usernames. |

Advanced Variables (for Docker-out-of-Docker)

These variables are automatically set in the environment but can be overridden if needed. They are crucial for mapping host paths to user container volumes when spawning containers from within the JupyterHub container.

| Variable | Description |
| --- | --- |
| `HOST_PRODUCTS_DIR` | Host path to the `./products` directory. |
| `HOST_DATA_DIR` | Host path to the `./data/local_data` directory. |
| `HOST_DISTRIBUTED_CONFIG` | Host path to the `distributed.yaml` file. |
| `HOST_SHARED_STATIC` | Host path to the `./shared` directory. |
| `HOST_USER_FOLDERS` | Host path to the `./data/shared` directory. |

User Management

JupyterHub uses NativeAuthenticator with a custom signup handler that restricts access to pre-authorized users only.

Admin users can, however, add new users through the JupyterHub admin panel.

How User Authorization Works

Authorized Users: Only users listed in JUPYTERHUB_ADMINS or JUPYTERHUB_USERS in the .env file, or manually added by an admin user, can successfully sign up
Unauthorized Users: Anyone else who tries to sign up sees an error message directing them to contact the administrator
Self-Service Signup: Authorized users can create their own accounts via the signup page
Admin Creation: Administrators can also create user accounts through the JupyterHub admin panel
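
The allowlist rule for users defined in .env can be illustrated in a few lines of shell. This is only a sketch of the rule: the actual check lives in hub/custom_authenticator.py, it is not reproduced here, and users added through the admin panel are not covered. The function name `is_authorized` is hypothetical.

```shell
# Return 0 if the username appears in JUPYTERHUB_ADMINS or JUPYTERHUB_USERS.
is_authorized() {
    username="$1"
    # An empty name never matches, even if a list is empty.
    [ -n "$username" ] || return 1
    allowlist="${JUPYTERHUB_ADMINS:-},${JUPYTERHUB_USERS:-}"
    # Strip spaces and wrap both sides in commas so only whole names match.
    case ",$(printf '%s' "$allowlist" | tr -d ' ')," in
        *",$username,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With `JUPYTERHUB_ADMINS=admin,bob` and `JUPYTERHUB_USERS=guest,charlie`, `is_authorized charlie` succeeds while `is_authorized mallory` fails.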

Adding Authorized Users

Method 1: Via .env file (Recommended for initial setup)

Edit the .env file:

```
# Admin users (have full control over JupyterHub)
JUPYTERHUB_ADMINS=admin,bob

# Regular users (can only access their own notebooks)
JUPYTERHUB_USERS=guest,charlie
```

Recreate the JupyterHub container to apply the changes (a plain restart does not re-read .env):
```
docker compose up -d jupyterhub
```
Users can now visit http://<DOMAIN>/jupyter/hub/signup to create their accounts

Method 2: Via JupyterHub Admin Panel (For ad-hoc user additions)

Log in as an admin user
Open File > Hub Control Panel > Admin, or navigate to http://<DOMAIN>/jupyter/hub/admin
Click Add Users
Enter the username and click Add
The user is created immediately and can sign up

User Signup Flow

For Authorized Users:

Visit http://<DOMAIN>/jupyter/hub/signup
Fill in username (must match an authorized one), password, and optional email
Submit the form
See success message: "The signup was successful! You can now go to the home page and log in to the system."
Log in at http://<DOMAIN>/jupyter/hub/login

For Unauthorized Users:

Contact the administrator to be added

Managing Existing Users

View all users:

Log in as admin → Navigate to http://<DOMAIN>/jupyter/hub/admin
You'll see a list of all users with their status and last activity

Edit user:

Click "Edit User" next to any user
You can make them admin, delete them, or manage their servers

Delete user:

Click "Edit User" → "Delete User"
This removes the user account but doesn't delete their notebook files (stored in jupyterhub-user-<username> volume)

Password reset or recovery:

Using the JupyterHub admin interface, delete the user and re-create it without deleting the user's volume
Inform the user that they will have to sign up again

User Data and Notebooks

Each user's notebooks are stored in a Docker volume named jupyterhub-user-<username>. These volumes persist even if the user account is deleted.

Backup user notebooks:

```
docker run --rm -v jupyterhub-user-<username>:/source -v $(pwd)/backups:/backup alpine tar czf /backup/user-<username>-notebooks.tar.gz -C /source .
```

Restore user notebooks:

```
docker run --rm -v jupyterhub-user-<username>:/target -v $(pwd)/backups:/backup alpine tar xzf /backup/user-<username>-notebooks.tar.gz -C /target
```

Remove all user volumes:

```
make purge-users CONFIRM=1
```

Remove a specific user volume:

```
make purge-user HUB_USER=alice CONFIRM=1
```

Security Best Practices

Use strong passwords for admin accounts
Regularly review the user list in the admin panel
Remove unused accounts to minimize security risks
Back up user data regularly (see the Backup and Restore section)
Keep the number of admin users minimal: only trusted users should have admin access

Using the Open Data Cube via `make`

All interaction with the stack is wrapped behind make targets. To see the authoritative list on your machine:

```
make help
```

Command reference (from `make help`)

| Command | Description |
| --- | --- |
| **Runtime Control** | |
| `make up` | Start the environment in the background (then open Jupyter in your browser) |
| `make down` | Stop the running services (keeps your data and images) |
| `make status` | Show what is running (containers and their status) |
| `make logs` | Show live logs from all services (useful for troubleshooting) |
| `make docs` | Render Quarto documentation (Admin and User guides) |
| `make shell` | Open a terminal inside the Jupyter container (requires HUB_USER) |
| `make wait-for-db` | Wait for PostgreSQL to be ready to accept connections |
| **Setup & Init** | |
| `make setup` | First-time setup (mode-dependent: uses pull in prod, build in dev) |
| `make init` | Initialize the Open Data Cube database (run once after setup) |
| `make build` | Build the images locally |
| `make build-nocache` | Build the images locally from scratch |
| `make pull` | Download all service images (recommended before first run in prod mode) |
| **Data & Indexing** | |
| `make product` | Load product definitions into the database (describes available datasets) |
| `make index` | Index example data for the selected area/time (uses BBOX and DATETIME) |
| `make index-parallel` | Index data using the automated script (recommended) |
| `make index-serie` | Index data step-by-step (older method; slower) |
| `make update-explorer` | Rebuild the Explorer index so datasets appear in the web UI |
| **Maintenance** | |
| `make backup` | Create a backup of the PostgreSQL database |
| `make restore` | Restore PostgreSQL database from a backup file (requires BACKUP_FILE and CONFIRM=1) |
| `make clean` | Stop everything and remove containers, volumes, and built images |
| `make purge-data` | Delete local data in ./data (pg, local_data, shared). Irreversible; requires CONFIRM=1 |
| `make purge-user` | Remove a specific user container and volume. Irreversible; requires HUB_USER and CONFIRM=1 |
| `make purge-users` | Remove all spawned JupyterHub user containers and volumes. Irreversible; requires CONFIRM=1 |
| **Advanced** | |
| `make release-push` | Build and push multi-architecture production images to the configured container registry |
| `make help` | Show available commands |

Common usage patterns

First-time setup (default parameters):
```
make setup
```

Setup with a specific area/time (BBOX, DATETIME):

```
# Switzerland, 1 year
make setup BBOX=5.95,45.81,10.50,47.81 DATETIME=2024-01-01/2024-12-31

# Switzerland, all years (until the end of 2025; might take a while)
make setup BBOX=5.95,45.81,10.50,47.81 DATETIME=1984-01-01/2025-12-31
```
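
Because BBOX and DATETIME are passed to make as plain strings, it can help to check them before starting a long indexing run. This is a minimal sketch. It assumes, based only on the Switzerland example, that BBOX is min_lon,min_lat,max_lon,max_lat in decimal degrees and that DATETIME is a start/end pair of YYYY-MM-DD dates. `valid_bbox` and `valid_datetime_range` are hypothetical helpers, not part of the Makefile.

```shell
# Return 0 if the argument looks like "min_lon,min_lat,max_lon,max_lat".
# Note: this sketch rejects boxes that cross the antimeridian.
valid_bbox() {
    printf '%s\n' "$1" | awk -F, '
        NF != 4 { exit 1 }
        {
            # Every field must be a plain decimal number.
            for (i = 1; i <= 4; i++)
                if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/) exit 1
            # Longitudes within [-180, 180], latitudes within [-90, 90].
            if ($1 + 0 < -180 || $3 + 0 > 180) exit 1
            if ($2 + 0 < -90 || $4 + 0 > 90) exit 1
            # Minimum must be strictly below maximum on both axes.
            if ($1 + 0 >= $3 + 0 || $2 + 0 >= $4 + 0) exit 1
        }'
}

# Return 0 if the argument has the shape YYYY-MM-DD/YYYY-MM-DD.
# Only the shape is checked, not whether the dates exist.
valid_datetime_range() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}/[0-9]{4}-[0-9]{2}-[0-9]{2}$'
}
```

A guarded invocation could then read `valid_bbox "$BBOX" && valid_datetime_range "$DATETIME" && make setup BBOX="$BBOX" DATETIME="$DATETIME"`.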

Start/stop and troubleshoot:
```
make up
make status
make logs
make down
```

Reset options (use with care):

```
# Stop everything and remove containers/volumes/images
make clean

# Irreversible: delete local data in ./data (requires confirmation)
make purge-data CONFIRM=1

# ⚠️ TOTAL WIPE-OUT (USE WITH EXTRA CARE!)
make clean && make purge-data CONFIRM=1 && make purge-users CONFIRM=1
# then check for anything left behind
docker ps -a && docker images -a && docker volume ls && ls -la ./data
```

Dev mode (local builds):

```
# Set dev mode for the entire session
export MODE=dev
# then run any make target, for example:
make up

# Go back to prod mode
unset MODE

# One-off dev invocation (not recommended: MODE=dev must then be repeated on every command)
make up MODE=dev
```

Access to applications

JupyterHub is available at http://<DOMAIN>/jupyter/ (login uses NativeAuthenticator; admin users are defined in JUPYTERHUB_ADMINS)
Explorer is available at http://<DOMAIN>/explorer

Documentation

Detailed documentation is available in the docs/ directory (built using Quarto):

Admin Guide: docs/admin/index.html
User Guide: docs/user/index.html

Architecture and Integration

Reverse Proxy and Routing

This stack uses Traefik v3 as a reverse proxy. It handles routing based on the hostname (DOMAIN) and path prefixes (/jupyter, /explorer). Traefik also manages the internal docker network for service communication.
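
A quick way to see whether routing works is to request each path prefix and look at the HTTP status code. This is a minimal sketch, assuming plain HTTP on the default port as in the URLs given in this README. `service_url` and `probe` are hypothetical helpers, and `probe` needs curl and a running stack.

```shell
# Build the public URL of a service from DOMAIN and its path prefix.
service_url() {
    printf 'http://%s/%s\n' "${DOMAIN:-localhost}" "${1#/}"
}

# Print the HTTP status code returned for a prefix.
probe() {
    curl -s -o /dev/null -w '%{http_code}' "$(service_url "$1")"
}
```

For example, `probe /jupyter/` and `probe /explorer` each print a status code. Which code to expect depends on the service (a redirect to a login page is common), so the sketch deliberately does not assert one.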

User Spawning (Docker-out-of-Docker)

JupyterHub uses the DockerSpawner with a Docker-out-of-Docker (DooD) pattern. The JupyterHub container has access to the host's /var/run/docker.sock, allowing it to spawn user notebook containers directly on the host machine. This ensures that user environments are isolated and can be managed by standard Docker tools.

Dask Integration

The environment is pre-configured for Dask parallel processing. The distributed.yaml file ensures that the Dask dashboard is accessible through the Jupyter proxy at /jupyter/proxy/{port}/status.

Shared Directory

This directory is shared among users of the JupyterHub instance.

Purpose

The primary purpose of this folder is to facilitate file sharing and collaboration between users.

The /notebooks/shared directory contains:

Static Content: Any file or directory in the ./shared folder on the host is mounted here read-only, with the exception of the user's own folder under all_users, which is writable.
User Folders: The all_users directory contains individual user folders.

Editing and saving shared folders and files

Static Content: Files directly under /notebooks/shared/ (from the host ./shared folder) are read-only for everyone in the Jupyter interface. To modify them, an admin must edit them on the host machine.
User Folders: Under /notebooks/shared/all_users/, you can see other users' folders (read-only) and your own folder (read-write). This allows you to copy notebooks from others but not modify their work directly.

To work on a shared notebook, copy it to your own workspace, e.g.:

```
cp -r /notebooks/shared/notebooks_demo ~/my_notebooks_demo
cp /notebooks/shared/all_users/alice/analysis.ipynb ~/from_alice.ipynb
```

Important Notes

Visibility: Content in this folder is visible to all users.
Data Safety: Do not place sensitive credentials or private data in this directory.

Backup and Restore

Creating a backup

To create a backup of your PostgreSQL database:

```
make backup
```

This will create a timestamped SQL dump file in the ./backups directory (e.g., ./backups/opendatacube_20260121_141530.sql).

Restoring from a backup

To restore a database from a backup file:

```
make restore BACKUP_FILE=./backups/opendatacube_20260121_141530.sql CONFIRM=1
```

⚠️ WARNING: Restoring will overwrite your current database. Make sure you have a recent backup before proceeding.
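
When several dumps have accumulated, the newest one can be picked by name, since the timestamp in the file name sorts chronologically. This is a minimal sketch, assuming all dumps follow the opendatacube_YYYYMMDD_HHMMSS.sql pattern of the example above; `latest_backup` is a hypothetical helper, not a make target.

```shell
# Print the most recent dump in a directory (default: ./backups).
# Prints nothing if no matching file exists.
latest_backup() {
    dir="${1:-./backups}"
    # Names embed a sortable timestamp, so the last one in sort order is the newest.
    ls -1 "$dir"/opendatacube_*.sql 2>/dev/null | sort | tail -n 1
}
```

It can be combined with the restore target as `make restore BACKUP_FILE="$(latest_backup)" CONFIRM=1`, after checking that the printed path is the dump you intend to restore.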

Volume backup procedures

The following directories contain persistent data and should be backed up regularly:

./data/pg/ - PostgreSQL database files
./data/local_data/ - Local data cache
./data/jupyterhub_data/ - JupyterHub configuration and user data
User notebooks are stored in Docker volumes named jupyterhub-user-<username>

Manual volume backup:

```
# Back up user notebooks
docker run --rm -v jupyterhub-user-<username>:/source -v $(pwd)/backups:/backup alpine tar czf /backup/user-<username>-notebooks.tar.gz -C /source .

# Back up all data directories
tar czf backups/data-backup-$(date +%Y%m%d).tar.gz ./data/
```

Restore user notebooks:

```
# Restore user notebooks
docker run --rm -v jupyterhub-user-<username>:/target -v $(pwd)/backups:/backup alpine tar xzf /backup/user-<username>-notebooks.tar.gz -C /target
```
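
To back up every user volume in one pass, the per-user command can be run in a loop over the volumes Docker reports. This is a minimal sketch, assuming all user volumes share the jupyterhub-user- prefix described in this README. `archive_name` and `backup_all_user_volumes` are hypothetical helpers; the loop requires a running Docker daemon.

```shell
# Derive the archive file name from a volume name,
# e.g. jupyterhub-user-alice -> user-alice-notebooks.tar.gz
archive_name() {
    printf 'user-%s-notebooks.tar.gz\n' "${1#jupyterhub-user-}"
}

# Archive every jupyterhub-user-* volume into ./backups.
backup_all_user_volumes() {
    mkdir -p backups
    docker volume ls --filter name=jupyterhub-user- --format '{{.Name}}' |
    while IFS= read -r volume; do
        # Mount the volume read-only so the backup cannot modify user data.
        docker run --rm -v "$volume":/source:ro -v "$(pwd)/backups":/backup \
            alpine tar czf "/backup/$(archive_name "$volume")" -C /source .
    done
}
```

Running `backup_all_user_volumes` from the repository root writes one archive per user next to the database dumps.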

License

This project is licensed under the MIT License.

You are free to use, modify, and distribute this software under the terms of the MIT License. For more details, see the full license text in the LICENSE file.
