Cube in a Box

License: MIT

The Cube in a Box is a simple way to run the Open Data Cube. The current repository is a fork of https://github.com/opendatacube/cube-in-a-box with the following major changes:

  • Multi-user JupyterHub
  • Shared folders for collaboration
  • Planetary Computer as a STAC datastore
  • ODC Explorer
  • Traefik integration as a reverse proxy
  • Multi-architecture support (AMD64 & ARM64)
  • Dask integration for parallel processing
  • Admin and User documentation (Quarto)

These developments were made possible by the financial support of the European Union ‘Horizon Europe Program’, which funded the LandShift (Grant Agreement no. 101182007), Nostradamus (Grant Agreement no. 101134888), and NEMESIS (Grant Agreement no. 101219087) projects.

Repository Structure

  • Makefile: Main entry point for all commands (start, stop, index, etc.).
  • docker-compose.yml: Docker services definition.
  • hub/: Configuration for JupyterHub.
    • Dockerfile: Custom JupyterHub image definition.
    • jupyterhub_config.py: Main configuration file.
    • custom_authenticator.py: Custom logic to restrict signups to authorized users.
    • spawner_hooks.py: Helper functions to configure user environments (volumes, permissions) before spawning.
    • templates/: Custom UI templates (e.g., signup page).
  • data/: Configuration and data persistence.
    • jupyterhub_data/: JupyterHub database and state.
    • local_data/: Mapped to /local_data in containers.
    • shared/: Read-only shared folder for all users.
  • docs/: Built documentation (Quarto).
  • quarto/: Source files for documentation.
  • products/: ODC product definitions.

How to use:

1. Local environment setup (Linux, macOS, Windows)

This project is run via docker compose and a Makefile. Before running docker compose commands through make, ensure you have:

  • Docker with Docker Compose support
  • GNU Make

Below are platform-specific setup instructions.

Linux

  1. Install Docker Engine

    • Install Docker Engine for your distribution (Ubuntu/Debian/Fedora, etc.) using the official Docker instructions.
    • Add your user to the docker group so you can run Docker without sudo, then log out/in.
  2. Install Docker Compose

    • Recent Docker Engine installations include the Compose plugin and expose it as docker compose ....
    • Verify:
      • docker --version
      • docker compose version
  3. Install Make

    • Install GNU Make using your package manager.
    • Verify:
      • make --version

macOS

  1. Install Docker Desktop

  2. Install Make

    • macOS typically has make available via Xcode Command Line Tools.
    • Install if needed: xcode-select --install
    • Verify:
      • make --version

Windows (recommended: WSL2 + Docker Desktop)

The simplest way to use make on Windows is to run the project inside WSL2 (Windows Subsystem for Linux) while using Docker Desktop as the Docker backend.

  1. Install WSL2

    • Install WSL2 and a Linux distribution (Ubuntu is a common choice).
    • Open your WSL terminal (e.g., Ubuntu).
  2. Install Docker Desktop

    • Install Docker Desktop for Windows and enable:
      • Use WSL 2 based engine
      • WSL Integration for your chosen Linux distribution (Settings → Resources → WSL Integration)
  3. Install Make inside WSL

    • In your WSL terminal, install GNU Make:
      • Debian/Ubuntu: sudo apt update && sudo apt install -y make
    • Verify:
      • make --version
  4. Verify Docker access from WSL

    In your WSL terminal, run:

    • docker --version
    • docker compose version

    If these work, WSL is correctly talking to Docker Desktop.

Notes:

  • Run all make ... commands from the WSL terminal (not PowerShell) to ensure a consistent Linux-like environment.
  • Store the repository inside the WSL filesystem (e.g., ~/projects/...) for better performance than /mnt/c/....

Quick verification

Once installed, you should be able to run:

  • make --version
  • docker --version
  • docker compose version

2. Usage

Environment variables

This repository uses environment variables to configure the local domain, the database credentials, and the authorized JupyterHub users.

  1. Create a .env file (Docker Compose reads .env by default):

    cp .env.default .env
  2. Edit .env to match your setup:

    • Set a strong password for POSTGRES_PASS
    • Configure JUPYTERHUB_ADMINS with admin usernames
    • Optionally add regular users to JUPYTERHUB_USERS

Required variables

| Variable | Required | Default (as provided) | Example | Description |
| --- | --- | --- | --- | --- |
| DOMAIN | Yes | localhost | localhost | Hostname used to access the web endpoints (Jupyter/Explorer). |
| IMAGE_VERSION | Yes | 20260211 | 20260211 | Version of the images to use. |
| POSTGRES_HOSTNAME | Yes | postgres | postgres | Hostname used to access the PostgreSQL database. |
| POSTGRES_PORT | Yes | 5432 | 5432 | Port used to access the PostgreSQL database. |
| POSTGRES_DBNAME | Yes | opendatacube | opendatacube | PostgreSQL database name used by Open Data Cube. |
| POSTGRES_USER | Yes | opendatacube | opendatacube | PostgreSQL user for the Open Data Cube database. |
| POSTGRES_PASS | Yes | opendatacubepassword | a-strong-password | PostgreSQL password for the Open Data Cube database. |
| JUPYTERHUB_ADMINS | Yes | admin | admin,bruno | Comma-separated list of JupyterHub admin usernames. |
| JUPYTERHUB_USERS | No | guest | guest,alice,bob | Comma-separated list of authorized non-admin usernames. |

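As a rough illustration of the table above, the following Python sketch parses .env-style text and reports which required variables are missing or empty. The helper names (parse_env, missing_required) are hypothetical and not part of this repository, and the parsing is deliberately simplified (no quoting or variable interpolation), so it does not replace Docker Compose's own .env handling.

```python
# Hypothetical helpers, not part of the repository: a simplified .env check.
REQUIRED = [
    "DOMAIN", "IMAGE_VERSION", "POSTGRES_HOSTNAME", "POSTGRES_PORT",
    "POSTGRES_DBNAME", "POSTGRES_USER", "POSTGRES_PASS", "JUPYTERHUB_ADMINS",
]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_required(values):
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED if not values.get(name)]


sample = """
# minimal example, intentionally incomplete
DOMAIN=localhost
POSTGRES_PASS=a-strong-password
JUPYTERHUB_ADMINS=admin,bruno
"""
print(missing_required(parse_env(sample)))
```

To check a real file, pass its contents instead of the sample, for example parse_env(open(".env").read()).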
Advanced Variables (for Docker-out-of-Docker)

These variables are automatically set in the environment but can be overridden if needed. They are crucial for mapping host paths to user container volumes when spawning containers from within the JupyterHub container.

| Variable | Description |
| --- | --- |
| HOST_PRODUCTS_DIR | Host path to the ./products directory. |
| HOST_DATA_DIR | Host path to the ./data/local_data directory. |
| HOST_DISTRIBUTED_CONFIG | Host path to the distributed.yaml file. |
| HOST_SHARED_STATIC | Host path to the ./shared directory. |
| HOST_USER_FOLDERS | Host path to the ./data/shared directory. |
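
To make the host-path mapping more concrete, here is a minimal Python sketch that turns three of these variables into a dictionary in the shape accepted by DockerSpawner's volumes option ({host_path: {"bind": container_path, "mode": ...}}). The function name build_volumes is hypothetical, the read-write mode chosen for /local_data is an assumption, and the actual logic in hub/spawner_hooks.py may well differ.

```python
import os


def build_volumes(username, env=None):
    """Sketch of a per-user volume mapping (hypothetical helper).

    Container paths follow the ones named in this README; the modes are
    assumptions made for illustration only.
    """
    env = os.environ if env is None else env
    user_folders = env["HOST_USER_FOLDERS"].rstrip("/")
    return {
        # ./data/local_data on the host is mapped to /local_data (mode assumed)
        env["HOST_DATA_DIR"]: {"bind": "/local_data", "mode": "rw"},
        # static shared content is read-only for everyone
        env["HOST_SHARED_STATIC"]: {"bind": "/notebooks/shared", "mode": "ro"},
        # all user folders are visible read-only ...
        user_folders: {"bind": "/notebooks/shared/all_users", "mode": "ro"},
        # ... except the user's own folder, which is read-write
        f"{user_folders}/{username}": {
            "bind": f"/notebooks/shared/all_users/{username}",
            "mode": "rw",
        },
    }


example_env = {
    "HOST_DATA_DIR": "/srv/cube/data/local_data",
    "HOST_SHARED_STATIC": "/srv/cube/shared",
    "HOST_USER_FOLDERS": "/srv/cube/data/shared",
}
for host_path, target in build_volumes("alice", example_env).items():
    print(host_path, "->", target)
```

Host paths are needed here because the user containers are started by the host's Docker daemon, which cannot resolve paths that only exist inside the JupyterHub container.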

User Management

JupyterHub uses NativeAuthenticator with a custom signup handler that restricts signup to pre-authorized users.

Admin users can also authorize additional users through the JupyterHub admin panel.

How User Authorization Works
  1. Authorized Users: Only users listed in JUPYTERHUB_ADMINS or JUPYTERHUB_USERS in the .env file, or added manually by an admin user, can sign up
  2. Unauthorized Users: Unauthorized users will see an error message directing them to contact the administrator if they try to sign up
  3. Self-Service Signup: Authorized users can create their own accounts via the signup page
  4. Admin Creation: Administrators can also create user accounts through the JupyterHub admin panel
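
The authorization rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the real check lives in hub/custom_authenticator.py and may be implemented differently.

```python
def parse_user_list(value):
    """Split a comma-separated variable such as JUPYTERHUB_USERS into a set."""
    return {name.strip() for name in value.split(",") if name.strip()}


def may_sign_up(username, admins, users, added_by_admin=frozenset()):
    """Return True if the username is allowed to sign up.

    admins and users are the raw strings from .env; added_by_admin stands
    for usernames created through the admin panel (hypothetical parameter).
    """
    allowed = parse_user_list(admins) | parse_user_list(users) | set(added_by_admin)
    return username in allowed


print(may_sign_up("charlie", "admin,bob", "guest,charlie"))
print(may_sign_up("mallory", "admin,bob", "guest,charlie"))
```

With the example values from the .env snippet in this README, charlie is accepted and mallory is rejected.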
Adding Authorized Users

Method 1: Via .env file (Recommended for initial setup)

  1. Edit the .env file:

    # Admin users (have full control over JupyterHub)
    JUPYTERHUB_ADMINS=admin,bob
    
    # Regular users (can only access their own notebooks)
    JUPYTERHUB_USERS=guest,charlie
  2. Restart JupyterHub to apply changes:

    docker compose restart jupyterhub
  3. Users can now visit http://<DOMAIN>/jupyter/hub/signup to create their accounts

Method 2: Via JupyterHub Admin Panel (For ad-hoc user additions)

  1. Log in as an admin user
  2. Go to File > Hub Control Panel > Admin, or navigate to http://<DOMAIN>/jupyter/hub/admin
  3. Click Add Users
  4. Enter the username and click Add
  5. The user is created immediately and can sign up
User Signup Flow

For Authorized Users:

  1. Visit http://<DOMAIN>/jupyter/hub/signup
  2. Fill in username (must match an authorized one), password, and optional email
  3. Submit the form
  4. See success message: "The signup was successful! You can now go to the home page and log in to the system."
  5. Log in at http://<DOMAIN>/jupyter/hub/login

For Unauthorized Users:

  1. Contact the administrator to be added
Managing Existing Users

View all users:

  • Log in as admin → Navigate to http://<DOMAIN>/jupyter/hub/admin
  • You'll see a list of all users with their status and last activity

Edit user:

  • Click "Edit User" next to any user
  • You can make them admin, delete them, or manage their servers

Delete user:

  • Click "Edit User" → "Delete User"
  • This removes the user account but doesn't delete their notebook files (stored in jupyterhub-user-<username> volume)

Password reset or recovery:

  • Using the JupyterHub admin interface, delete the user and re-create it, without deleting the user's volume
  • Inform the user that they will have to sign up again
User Data and Notebooks

Each user's notebooks are stored in a Docker volume named jupyterhub-user-<username>. These volumes persist even if the user account is deleted.

Backup user notebooks:

docker run --rm -v jupyterhub-user-<username>:/source -v $(pwd)/backups:/backup alpine tar czf /backup/user-<username>-notebooks.tar.gz -C /source .

Restore user notebooks:

docker run --rm -v jupyterhub-user-<username>:/target -v $(pwd)/backups:/backup alpine tar xzf /backup/user-<username>-notebooks.tar.gz -C /target

Remove all user volumes:

make purge-users CONFIRM=1

Remove a specific user volume:

make purge-user HUB_USER=alice CONFIRM=1
Security Best Practices
  1. Use strong passwords for admin accounts
  2. Regularly review the user list in the admin panel
  3. Remove unused accounts to minimize security risks
  4. Back up user data regularly (see Backup and Restore section)
  5. Keep admin users minimal - only trusted users should have admin access

Using the Open Data Cube via make

All interaction with the stack is wrapped behind make targets. To see the authoritative list on your machine:

make help
Command reference (from make help)

| Group | Command | Description |
| --- | --- | --- |
| Runtime Control | make up | Start the environment in the background (then open Jupyter in your browser) |
| Runtime Control | make down | Stop the running services (keeps your data and images) |
| Runtime Control | make status | Show what is running (containers and their status) |
| Runtime Control | make logs | Show live logs from all services (useful for troubleshooting) |
| Runtime Control | make docs | Render Quarto documentation (Admin and User guides) |
| Runtime Control | make shell | Open a terminal inside the Jupyter container (requires HUB_USER) |
| Runtime Control | make wait-for-db | Wait for PostgreSQL to be ready to accept connections |
| Setup & Init | make setup | First-time setup (mode-dependent: uses pull in prod, build in dev) |
| Setup & Init | make init | Initialize the Open Data Cube database (run once after setup) |
| Setup & Init | make build | Build the images locally |
| Setup & Init | make build-nocache | Build the images locally from scratch |
| Setup & Init | make pull | Download all service images (recommended before first run in prod mode) |
| Data & Indexing | make product | Load product definitions into the database (describes available datasets) |
| Data & Indexing | make index | Index example data for the selected area/time (uses BBOX and DATETIME) |
| Data & Indexing | make index-parallel | Index data using the automated script (recommended) |
| Data & Indexing | make index-serie | Index data step-by-step (older method; slower) |
| Data & Indexing | make update-explorer | Rebuild the Explorer index so datasets appear in the web UI |
| Maintenance | make backup | Create a backup of the PostgreSQL database |
| Maintenance | make restore | Restore PostgreSQL database from a backup file (requires BACKUP_FILE and CONFIRM=1) |
| Maintenance | make clean | Stop everything and remove containers, volumes, and built images |
| Maintenance | make purge-data | Delete local data in ./data (pg, local_data, shared). Irreversible; requires CONFIRM=1 |
| Maintenance | make purge-user | Remove a specific user container and volume. Irreversible; requires HUB_USER and CONFIRM=1 |
| Maintenance | make purge-users | Remove all spawned JupyterHub user containers and volumes. Irreversible; requires CONFIRM=1 |
| Advanced | make release-push | Build and push multi-architecture production images to the configured container registry |
| Advanced | make help | Show available commands |

Common usage patterns
  • First-time setup (default parameters):

    make setup
  • Setup with a specific area/time (BBOX, DATETIME):

    # Switzerland 1 year
    make setup BBOX=5.95,45.81,10.50,47.81 DATETIME=2024-01-01/2024-12-31
    
    # Switzerland all years (till end 2025, might take a while)
    make setup BBOX=5.95,45.81,10.50,47.81 DATETIME=1984-01-01/2025-12-31
  • Start/stop and troubleshoot:

    make up
    make status
    make logs
    make down
  • Reset options (use with care):

    # Stop everything and remove containers/volumes/images
    make clean
    
    # Irreversible: delete local data in ./data (requires confirmation)
    make purge-data CONFIRM=1
    
    # ⚠️ TOTAL WIPE OUT (USE WITH EXTRA CARE !!!)
    make clean && make purge-data CONFIRM=1 && make purge-users CONFIRM=1
    # then check for any leftovers
    docker ps -a && docker images -a && docker volume ls && ls -la ./data
  • Dev mode (local builds):

    # Set dev mode for the entire session
    export MODE=dev
    make up      # or any other target; MODE=dev now applies to every make call
    
    # Go back to prod mode
    unset MODE
    
    # One-off dev invocation (not recommended, as MODE=dev may have to be repeated on each command)
    make up MODE=dev
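
The BBOX and DATETIME parameters used in the setup examples above can be sanity-checked before launching a long indexing run. The following Python sketch assumes that BBOX is ordered west,south,east,north in decimal degrees (consistent with the Switzerland example) and that DATETIME is a start/end pair of ISO dates; the function names are hypothetical and the Makefile itself may accept other forms.

```python
from datetime import date


def parse_bbox(text):
    """Parse 'west,south,east,north' and check basic geographic bounds.

    Simplification: boxes crossing the antimeridian are rejected.
    """
    west, south, east, north = (float(part) for part in text.split(","))
    if not (-180 <= west < east <= 180 and -90 <= south < north <= 90):
        raise ValueError(f"invalid BBOX: {text}")
    return west, south, east, north


def parse_datetime_range(text):
    """Parse 'YYYY-MM-DD/YYYY-MM-DD' and check that start is not after end."""
    start_text, end_text = text.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if start > end:
        raise ValueError(f"invalid DATETIME range: {text}")
    return start, end


print(parse_bbox("5.95,45.81,10.50,47.81"))
print(parse_datetime_range("2024-01-01/2024-12-31"))
```

A malformed value (too few coordinates, swapped corners, reversed dates) raises ValueError instead of silently indexing the wrong area or period.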

Access to applications

  • JupyterHub is available at http://<DOMAIN>/jupyter/ (log in with your NativeAuthenticator account; admin users are those listed in JUPYTERHUB_ADMINS)
  • Explorer is available at http://<DOMAIN>/explorer

Documentation

Detailed documentation is available in the docs/ directory (built using Quarto).

Architecture and Integration

Reverse Proxy and Routing

This stack uses Traefik v3 as a reverse proxy. It handles routing based on the hostname (DOMAIN) and path prefixes (/jupyter, /explorer). Traefik also manages the internal docker network for service communication.
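
The routing decision can be pictured with a small Python sketch. It is a conceptual model only, not Traefik configuration: the service names are assumptions, and the prefix test is a plain string comparison standing in for Traefik's Host and PathPrefix matchers.

```python
# Conceptual model of host + path-prefix routing (service names are assumptions).
ROUTES = {
    "/jupyter": "jupyterhub",
    "/explorer": "explorer",
}


def route(host, path, domain="localhost"):
    """Return the name of the service that would receive the request, or None."""
    if host != domain:
        return None
    for prefix, service in ROUTES.items():
        if path.startswith(prefix):
            return service
    return None


print(route("localhost", "/jupyter/hub/login"))
print(route("localhost", "/explorer"))
print(route("localhost", "/unknown"))
```

Requests for another hostname, or for a path outside the two prefixes, match no route in this model.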

User Spawning (Docker-out-of-Docker)

JupyterHub uses the DockerSpawner with a Docker-out-of-Docker (DooD) pattern. The JupyterHub container has access to the host's /var/run/docker.sock, allowing it to spawn user notebook containers directly on the host machine. This ensures that user environments are isolated and can be managed by standard Docker tools.

Dask Integration

The environment is pre-configured for Dask parallel processing. The distributed.yaml file ensures that the Dask dashboard is accessible through the Jupyter proxy at /jupyter/proxy/{port}/status.
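
As a small illustration of the link template, the Python sketch below expands /jupyter/proxy/{port}/status for a given port. The function name is hypothetical; 8787 is Dask's default dashboard port, and depending on how the hub routes single-user servers the final URL may contain additional path segments.

```python
# Template as quoted in this README; {port} is filled in per Dask cluster.
DASHBOARD_TEMPLATE = "/jupyter/proxy/{port}/status"


def dashboard_path(port=8787):
    """Expand the dashboard link template for a scheduler dashboard port."""
    return DASHBOARD_TEMPLATE.format(port=port)


print(dashboard_path())
print(dashboard_path(8790))
```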

Shared Directory

This directory is shared among users of the JupyterHub instance.

Purpose

The primary purpose of this folder is to facilitate file sharing and collaboration between users.

The /notebooks/shared directory contains:

  1. Static Content: Any file or directory in the ./shared folder on the host is mounted here read-only, with the exception of the user's own folder.
  2. User Folders: The all_users directory contains individual user folders.
Editing and saving shared folders and files
  • Static Content: Files directly under /notebooks/shared/ (from the host ./shared folder) are read-only for everyone in the Jupyter interface. To modify them, an admin must edit them on the host machine.
  • User Folders: Under /notebooks/shared/all_users/, you can see other users' folders (read-only) and your own folder (read-write). This allows you to copy notebooks from others but not modify their work directly.

To work on a shared notebook, copy it to your own workspace, e.g.:

  • cp -r /notebooks/shared/notebooks_demo ~/my_notebooks_demo
  • cp /notebooks/shared/all_users/alice/analysis.ipynb ~/from_alice.ipynb
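
The access rules above can be summarized in a short Python sketch: under /notebooks/shared, a path is writable only if it lies inside the current user's own folder. The function name is hypothetical, and the real enforcement happens through mount modes rather than through code like this.

```python
from pathlib import PurePosixPath

ALL_USERS = PurePosixPath("/notebooks/shared/all_users")


def is_writable(username, path):
    """Return True if a path under /notebooks/shared is inside the user's own folder."""
    path = PurePosixPath(path)
    own_folder = ALL_USERS / username
    return path == own_folder or own_folder in path.parents


print(is_writable("alice", "/notebooks/shared/all_users/alice/analysis.ipynb"))
print(is_writable("bob", "/notebooks/shared/all_users/alice/analysis.ipynb"))
print(is_writable("alice", "/notebooks/shared/notebooks_demo/intro.ipynb"))
```

Comparing whole path components, rather than string prefixes, keeps a user named alice from being treated as the owner of a folder such as alice2.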
Important Notes
  • Visibility: Content in this folder is visible to all users.
  • Data Safety: Do not place sensitive credentials or private data in this directory.

Backup and Restore

Creating a backup

To create a backup of your PostgreSQL database:

make backup

This will create a timestamped SQL dump file in the ./backups directory (e.g., ./backups/opendatacube_20260121_141530.sql).
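
The naming pattern of the example file can be reproduced with the Python sketch below. It is inferred from the example name only; the function is hypothetical and the Makefile may build the name differently.

```python
from datetime import datetime


def backup_filename(dbname="opendatacube", now=None, directory="./backups"):
    """Build a name of the form <directory>/<dbname>_YYYYMMDD_HHMMSS.sql."""
    now = datetime.now() if now is None else now
    return f"{directory}/{dbname}_{now:%Y%m%d_%H%M%S}.sql"


# Reproduces the example name used in this README.
print(backup_filename(now=datetime(2026, 1, 21, 14, 15, 30)))
# → ./backups/opendatacube_20260121_141530.sql
```

Because the timestamp sorts lexicographically, the most recent backup is simply the last name in an alphabetical listing.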

Restoring from a backup

To restore a database from a backup file:

make restore BACKUP_FILE=./backups/opendatacube_20260121_141530.sql CONFIRM=1

⚠️ WARNING: Restoring will overwrite your current database. Make sure you have a recent backup before proceeding.

Volume backup procedures

The following directories contain persistent data and should be backed up regularly:

  • ./data/pg/ - PostgreSQL database files
  • ./data/local_data/ - Local data cache
  • ./data/jupyterhub_data/ - JupyterHub configuration and user data
  • User notebooks are stored in Docker volumes named jupyterhub-user-<username>

Manual volume backup:

# Backup user notebooks
docker run --rm -v jupyterhub-user-<username>:/source -v $(pwd)/backups:/backup alpine tar czf /backup/user-<username>-notebooks.tar.gz -C /source .

# Backup all data directories
tar czf backups/data-backup-$(date +%Y%m%d).tar.gz ./data/

Restore user notebooks:

# Restore user notebooks
docker run --rm -v jupyterhub-user-<username>:/target -v $(pwd)/backups:/backup alpine tar xzf /backup/user-<username>-notebooks.tar.gz -C /target
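
When several users need to be backed up, the docker run command above can be generated per user. The Python sketch below only assembles the argument list and mirrors the command shown in this README; the function name and the backup_dir parameter are hypothetical.

```python
import os


def backup_command(username, backup_dir=None):
    """Return the docker argv that archives one user's notebook volume."""
    backup_dir = backup_dir or os.path.join(os.getcwd(), "backups")
    volume = f"jupyterhub-user-{username}"
    archive = f"/backup/user-{username}-notebooks.tar.gz"
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/source",      # user volume to archive
        "-v", f"{backup_dir}:/backup",  # host directory receiving the archive
        "alpine", "tar", "czf", archive, "-C", "/source", ".",
    ]


for user in ["alice", "bob"]:
    print(" ".join(backup_command(user, backup_dir="/srv/cube/backups")))
```

To execute a command instead of printing it, pass the list to subprocess.run(..., check=True); passing a list rather than a shell string avoids quoting problems with unusual usernames or paths.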

License

This project is licensed under the MIT License.

Copyright (c) 2018 Alex Leith
Copyright © 2025 UNIGE/GRID-Geneva

You are free to use, modify, and distribute this software under the terms of the MIT License. For more details, see the full license text: MIT.
