diff --git a/.claude/hooks/approve-commands.sh b/.claude/hooks/approve-commands.sh
new file mode 100755
index 00000000..3446a913
--- /dev/null
+++ b/.claude/hooks/approve-commands.sh
@@ -0,0 +1,37 @@
+#!/bin/bash
+
+# This hook automatically approves known-good commands
+# Hook receives tool call details via stdin as JSON
+#
+# Any command containing one of these patterns is approved.
+
+APPROVED_PATTERNS=(
+    'ls'
+    'find'
+    'playwright-cli'
+    'uv run make'
+)
+
+# Read the tool call data
+input=$(cat)
+
+# PreToolUse input uses tool_name and tool_input (see https://code.claude.com/docs/en/hooks)
+tool_name=$(echo "$input" | jq -r '.tool_name // empty')
+command=$(echo "$input" | jq -r '.tool_input.command // empty')
+
+if [[ "$tool_name" != "Bash" ]]; then
+    exit 10
+fi
+
+for pattern in "${APPROVED_PATTERNS[@]}"; do
+    if [[ "$command" == *"$pattern"* ]]; then
+        reason="Command matches approved pattern: $pattern"
+        jq -n \
+            --arg reason "$reason" \
+            '{ hookSpecificOutput: { hookEventName: "PreToolUse", permissionDecision: "allow", permissionDecisionReason: $reason } }'
+        exit 0
+    fi
+done
+
+# Exit 10 means "no decision" - defer to normal approval flow
+exit 10
diff --git a/.claude/settings.json b/.claude/settings.json
new file mode 100644
index 00000000..3f671526
--- /dev/null
+++ b/.claude/settings.json
@@ -0,0 +1,16 @@
+{
+  "$schema": "https://json.schemastore.org/claude-code-settings.json",
+  "hooks": {
+    "PreToolUse": [
+      {
+        "matcher": "Bash",
+        "hooks": [
+          {
+            "type": "command",
+            "command": ".claude/hooks/approve-commands.sh"
+          }
+        ]
+      }
+    ]
+  }
+}
diff --git a/.claude/skills/playwright-cli/SKILL.md b/.claude/skills/playwright-cli/SKILL.md
new file mode 100644
index 00000000..d937bc87
--- /dev/null
+++ b/.claude/skills/playwright-cli/SKILL.md
@@ -0,0 +1,182 @@
+---
+name: playwright-cli
+description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
+allowed-tools: Bash(playwright-cli:*)
+---
+
+# Browser Automation with playwright-cli
+
+## Quick start
+
+```bash
+playwright-cli open https://playwright.dev
+playwright-cli click e15
+playwright-cli type "page.click"
+playwright-cli press Enter
+```
+
+## Core workflow
+
+1. Navigate: `playwright-cli open https://example.com`
+2. Interact using refs from the snapshot
+3. Re-snapshot after significant changes
+
+## Commands
+
+### Core
+
+```bash
+playwright-cli open https://example.com/
+playwright-cli close
+playwright-cli type "search query"
+playwright-cli click e3
+playwright-cli dblclick e7
+playwright-cli fill e5 "user@example.com"
+playwright-cli drag e2 e8
+playwright-cli hover e4
+playwright-cli select e9 "option-value"
+playwright-cli upload ./document.pdf
+playwright-cli check e12
+playwright-cli uncheck e12
+playwright-cli snapshot
+playwright-cli eval "document.title"
+playwright-cli eval "el => el.textContent" e5
+playwright-cli dialog-accept
+playwright-cli dialog-accept "confirmation text"
+playwright-cli dialog-dismiss
+playwright-cli resize 1920 1080
+```
+
+### Navigation
+
+```bash
+playwright-cli go-back
+playwright-cli go-forward
+playwright-cli reload
+```
+
+### Keyboard
+
+```bash
+playwright-cli press Enter
+playwright-cli press ArrowDown
+playwright-cli keydown Shift
+playwright-cli keyup Shift
+```
+
+### Mouse
+
+```bash
+playwright-cli mousemove 150 300
+playwright-cli mousedown
+playwright-cli mousedown right
+playwright-cli mouseup
+playwright-cli mouseup right
+playwright-cli mousewheel 0 100
+```
+
+### Save as
+
+```bash
+playwright-cli screenshot
+playwright-cli screenshot e5
+playwright-cli pdf
+```
+
+### Tabs
+
+```bash
+playwright-cli tab-list
+playwright-cli tab-new
+playwright-cli tab-new https://example.com/page
+playwright-cli tab-close
+playwright-cli tab-close 2
+playwright-cli tab-select 0
+```
+
+### DevTools
+
+```bash
+playwright-cli console
+playwright-cli console warning
+playwright-cli network
+playwright-cli run-code "async page => await page.context().grantPermissions(['geolocation'])"
+playwright-cli tracing-start
+playwright-cli tracing-stop
+playwright-cli video-start
+playwright-cli video-stop video.webm
+```
+
+### Configuration
+
+```bash
+# Configure the session
+playwright-cli config --config my-config.json
+playwright-cli config --headed --isolated --browser=firefox
+# Configure named session
+playwright-cli --session=mysession config my-config.json
+# Start with configured session
+playwright-cli open --config=my-config.json
+```
+
+### Sessions
+
+```bash
+playwright-cli --session=mysession open example.com
+playwright-cli --session=mysession click e6
+playwright-cli session-list
+playwright-cli session-stop mysession
+playwright-cli session-stop-all
+playwright-cli session-delete
+playwright-cli session-delete mysession
+```
+
+## Example: Form submission
+
+```bash
+playwright-cli open https://example.com/form
+playwright-cli snapshot
+
+playwright-cli fill e1 "user@example.com"
+playwright-cli fill e2 "password123"
+playwright-cli click e3
+playwright-cli snapshot
+```
+
+## Example: Multi-tab workflow
+
+```bash
+playwright-cli open https://example.com
+playwright-cli tab-new https://example.com/other
+playwright-cli tab-list
+playwright-cli tab-select 0
+playwright-cli snapshot
+```
+
+## Example: Debugging with DevTools
+
+```bash
+playwright-cli open https://example.com
+playwright-cli click e4
+playwright-cli fill e7 "test"
+playwright-cli console
+playwright-cli network
+```
+
+```bash
+playwright-cli open https://example.com
+playwright-cli tracing-start
+playwright-cli click e4
+playwright-cli fill e7 "test"
+playwright-cli tracing-stop
+```
+
+## Specific tasks
+
+- **Request mocking** [references/request-mocking.md](references/request-mocking.md)
+- **Running Playwright code** [references/running-code.md](references/running-code.md)
+- **Session management** [references/session-management.md](references/session-management.md)
+- **Storage state (cookies, localStorage)** [references/storage-state.md](references/storage-state.md)
+- **Test generation** [references/test-generation.md](references/test-generation.md)
+- **Tracing** [references/tracing.md](references/tracing.md)
+- **Video recording** [references/video-recording.md](references/video-recording.md)
diff --git a/.claude/skills/release-testing/SKILL.md b/.claude/skills/release-testing/SKILL.md
new file mode 100644
index 00000000..a0656074
--- /dev/null
+++ b/.claude/skills/release-testing/SKILL.md
@@ -0,0 +1,40 @@
+---
+name: release-testing
+description: Test documentation instructions by working through the page step by step and performing actions using the playwright-cli. Use when the user asks you to test a specific page.
+allowed-tools: Bash(playwright-cli:*)
+---
+
+# Testing documentation pages
+
+When testing documentation pages you should:
+
+- Build a fresh version of the nightly documentation.
+- Locate the page you have been asked to test within the `build` directory (use the `*.html.md` version of the page).
+- Read the instructions on the page.
+- Check the parent directory for another project called `deployment-internal` as it contains additional instructions.
+  - Additional private instructions for using NVIDIA systems are stored in markdown files in this project.
+  - If you cannot find this project, then ask the user for its location.
+  - Before you start testing a page, review any related pages in `deployment-internal`.
+- Use the `playwright-cli` tool to drive a browser and follow the instructions (use a headed browser).
+- Provide a report to the user that includes:
+  - Full list of compute resources you created.
+  - Any problems you found.
+
+## Goals
+
+- Follow every step of the instructions in the documentation.
+- Track any differences between the documentation and the real-world experience.
+- Verify that RAPIDS is installed successfully.
+
+## Instructions
+
+Most of the instructions focus on creating cloud infrastructure by clicking through GUIs on third-party cloud platforms. Some things to focus on:
+
+- If you need to authenticate, ask the user to do this for you.
+- If documentation has both GUI and CLI instructions, focus on the GUI instructions.
+- Wait for compute resources to launch before continuing.
+- Buttons and menus may change and move around. If you cannot find the exact element, find another way to perform the same action and note the difference in your report.
+- If a deployment fails or you get an error message, ask the user for assistance.
+- If you repeat steps over and over without making progress, ask the user for help.
+- Every page should end with a step that verifies RAPIDS is installed and can be used. If this is missing, note it in your report.
+- Before performing destructive actions like deleting a resource, always ask the user for confirmation first.
diff --git a/.gitignore b/.gitignore
index 036b3150..07354b83 100644
--- a/.gitignore
+++ b/.gitignore
@@ -25,6 +25,8 @@ __pycache__
 cufile.log
 node_modules/
 jupyter_execute/
+.playwright-cli/
+.claude/*.local.*
 
 # files manually written by example code
 source/examples/rapids-azureml-hpo/Dockerfile
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 00000000..038aceef
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,191 @@
+# AGENTS.md
+
+This file provides guidance to coding agents when working with code in this repository.
+
+## Project Overview
+
+This is the **RAPIDS Deployment Documentation** repository - a Sphinx-based documentation site that provides deployment guides, platform-specific instructions, and workflow examples for running RAPIDS (GPU-accelerated data science libraries) across various cloud platforms, HPC environments, and containerized systems.
+
+## Build Commands
+
+### Development
+
+```bash
+# Install dependencies (creates .venv and installs from uv.lock)
+uv venv && uv sync --locked
+
+# Build and auto-reload docs with live server at http://127.0.0.1:8000
+uv run sphinx-autobuild -b dirhtml source build/html
+
+# Clear build cache
+uv run make clean
+
+# Build static site to build/html
+uv run make dirhtml
+
+# Build with warnings as errors (CI mode)
+uv run make dirhtml SPHINXOPTS="-W --keep-going -n"
+```
+
+### Dependency Management
+
+```bash
+# Upgrade dependencies (updates uv.lock)
+uv lock --upgrade
+```
+
+## Linting and Pre-commit
+
+This project uses **pre-commit** to enforce code quality automatically:
+
+```bash
+# Install pre-commit hooks
+pre-commit install
+
+# Run linters manually on all files
+pre-commit run --all-files
+```
+
+Linters used:
+
+- **black** and **black-jupyter** - Python code formatting
+- **prettier** - Markdown, JSON, YAML formatting
+- **markdownlint** - Markdown style consistency
+- **ruff** - Python linting (pycodestyle, pyflakes, isort, pyupgrade, flake8-bugbear)
+- **shellcheck** - Shell script validation
+- **codespell** - Spell checking
+
+## Repository Architecture
+
+### Documentation Structure
+
+```text
+source/
+├── cloud/          # Cloud provider deployment guides (AWS, Azure, GCP, IBM, NVIDIA)
+├── platforms/      # Platform integrations (Databricks, Kubeflow, Snowflake, etc.)
+├── examples/       # Jupyter notebook workflow examples (galleries with tags)
+├── guides/         # Technical guides (MIG, InfiniBand, scheduler optimizations)
+├── tools/          # Documentation for dask-cuda and Kubernetes tools
+├── developer/      # Developer resources (CI/CD with RAPIDS)
+├── _includes/      # Reusable markdown snippets
+├── _static/        # Static assets (CSS, JS, images)
+├── _templates/     # Jinja2 templates for Sphinx
+└── conf.py         # Sphinx configuration
+```
+
+### Custom Sphinx Extensions
+
+Located in `extensions/` directory. These are critical to the documentation system:
+
+1. **rapids_version_templating.py** - Jinja2 templating for version substitution
+   - Allows `{{ rapids_container }}` or `~~~rapids_api_docs_version~~~` in docs
+   - Pulls versions from `conf.py` `versions` dict
+   - Use `{{ ... }}` for text, `~~~...~~~` for URLs
+
+2. **rapids_related_examples.py** - Notebook gallery and cross-linking system
+   - Reads tags from first cell of Jupyter notebooks
+   - Tags are hierarchical (e.g., `cloud/aws/sagemaker`)
+   - Creates `relatedexamples` directive to show notebooks with matching tags
+   - Powers the example gallery with filtering by tag namespace
+
+3. **rapids_notebook_files.py** - Discovers supporting files for notebooks
+   - Auto-lists Dockerfiles, scripts, configs alongside notebooks
+
+4. **rapids_grid_toctree.py** - Grid layout for table of contents
+
+5. **rapids_admonitions.py** - Custom admonitions like `docref`
+
+### Notebook Examples System
+
+Example notebooks live in `source/examples/{example-name}/notebook.ipynb`:
+
+**Critical Requirements:**
+
+- First cell must be markdown with at least one `#` heading
+- Add hierarchical tags to first cell metadata (e.g., `cloud/aws/eks`, `tools/dask`)
+- Tags create bidirectional links: notebooks appear on tagged doc pages, doc pages appear on notebook pages
+- Add notebook path to `notebookgallerytoctree` in `source/examples/index.md`
+- Supporting files (Dockerfiles, scripts) go in same directory and are auto-discovered
+
+**Tag Organization:**
+
+- Root namespace becomes filter category (e.g., `cloud`, `platform`, `tools`)
+- Full tag path shows in UI (e.g., `cloud/aws/sagemaker`)
+- Keep root namespaces consistent to avoid UI clutter
+- Custom tag CSS can be added in `source/_static/css/custom.css` as `.tag-{name}`
+
+### Version Management
+
+The `versions` dict in `source/conf.py` manages RAPIDS versions:
+
+```python
+versions = {
+    "stable": {
+        "rapids_version": "25.12",
+        "rapids_container": "nvcr.io/nvidia/rapidsai/base:25.12-cuda12-py3.13",
+        ...
+    },
+    "nightly": {
+        "rapids_version": "26.02",
+        "rapids_container": "rapidsai/base:26.02a-cuda12-py3.13",
+        ...
+    }
+}
+```
+
+- Builds use `nightly` by default (local dev, PR previews, main branch)
+- Set `DEPLOYMENT_DOCS_BUILD_STABLE=true` to use `stable` (done automatically on tag builds)
+- Before release: update both `stable` (new release) and `nightly` (next version)
+
+## Releasing
+
+Docs are continuously deployed via `.github/workflows/build-and-deploy.yml`:
+
+- **main branch** → `deployment/nightly` at docs.rapids.ai
+- **tags** → `deployment/stable` at docs.rapids.ai
+
+To release:
+
+```bash
+export RELEASE=x.x.x  # e.g., 25.12.0 (see https://docs.rapids.ai/resources/versions/)
+
+# Update versions in source/conf.py first, then:
+git commit --allow-empty -m "Release $RELEASE"
+git tag -a $RELEASE -m "Version $RELEASE"
+git push upstream --tags
+```
+
+## Writing Guidelines
+
+### Markdown Style
+
+- Use **MyST Markdown** (not reStructuredText)
+- Follow [Kubernetes style guide](https://kubernetes.io/docs/contribute/style/style-guide/) for API objects (use `UpperCamelCase`: `Pod`, `Deployment`, not `pod`)
+- Use `console` blocks for commands with output (start lines with `$`)
+- Use `bash` blocks for scripts or commands without output
+- Add custom `docref` admonitions to link to related pages:
+
+````markdown
+```{docref} /cloud/gcp/gke
+For detailed GKE setup, see the documentation.
+```
+````
+
+### Code Formatting
+
+- Python notebooks: formatted with **black-jupyter** (line length 120)
+- Python extensions: must pass **ruff** checks
+- Never include sensitive info (API keys, tokens) in examples
+
+## CI/CD
+
+- **pre-commit.yml** - Runs linters on all PRs
+- **build-and-deploy.yml** - Builds docs and deploys to S3/CloudFront on push to main or tags
+
+Test your changes match CI expectations by running:
+
+```bash
+uv run make clean && uv run make dirhtml SPHINXOPTS="-W --keep-going -n"
+```
+
+This treats warnings as errors (`-W`), continues on errors (`--keep-going`), and enables nitpicky mode (`-n`).
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 00000000..47dc3e3d
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
diff --git a/Makefile b/Makefile
index d0c3cbf1..899a0054 100644
--- a/Makefile
+++ b/Makefile
@@ -12,6 +12,9 @@ BUILDDIR = build
 help:
 	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
 
+clean:
+	rm -rf build
+
 .PHONY: help Makefile
 
 # Catch-all target: route all unknown targets to Sphinx using the new
diff --git a/README.md b/README.md
index e8993e68..45abd209 100644
--- a/README.md
+++ b/README.md
@@ -146,6 +146,40 @@ prettier.................................................................Passed
 markdownlint.............................................................Passed
 ```
 
+## Agentic testing
+
+Testing documentation can be tricky. When it comes to these documentation pages there are a few things we often want to check:
+
+1. Do the docs pages build successfully?
+2. Can RAPIDS be installed on the cloud platforms we are documenting?
+3. Do the instructions in the documentation sufficiently guide a user through running RAPIDS on their preferred cloud platform?
+
+Items 1 and 2 are relatively easy to automate. We can check builds in CI and we can verify cloud platforms using scripts or Terraform configurations. But item 3 is much harder to automate.
+
+However, coding agents are well suited to this, given a couple of extra tools.
+
+In `.claude/skills` you will find a couple of skills which allow you to test individual pages. They work by using [playwright](https://github.com/microsoft/playwright) to drive a browser and carry out the steps in the documentation page.
+
+The goal here is human-in-the-loop testing. You may not want to automate every action, and you may want the agent to defer to the user in case of failure. The idea is to kick off an agent and keep an eye on it while it runs through the documentation, requiring minimal effort from the user.
+
+### Example
+
+You can kick off a testing job using the `/release-testing` skill; just tell it which page you want to test.
+
+```bash
+claude "/release-testing test the AWS EC2 page"
+```
+
+The agent will take the following actions:
+
+- Rebuild a local copy of the nightly documentation
+- Locate the page you asked it to test
+- Open a fresh Chrome web browser with `playwright-cli`
+- Follow the instructions in the docs page and prompt you if it gets stuck
+- Once complete, report what it found, along with any recommendations for updating the docs
+
+**Note: It will not perform destructive tasks like deleting resources so you will need to do this manually.**
+
 ## Releasing
 
 This repository is continuously deployed to the [nightly docs at docs.rapids.ai](https://docs.rapids.ai/deployment/nightly/) via the [build-and-deploy](https://github.com/rapidsai/deployment/blob/main/.github/workflows/build-and-deploy.yml) workflow. All commits to main are built to static HTML and pushed to the [`deployment/nightly` subdirectory in the rapidsai/docs repo](https://github.com/rapidsai/docs/tree/gh-pages/deployment) which in turn is published to GitHub Pages.