From 5290a6c8c3b8da6de4f96d0400853052adae3672 Mon Sep 17 00:00:00 2001
From: Vineeth N K <vineethkrishnan007@gmail.com>
Date: Sun, 14 Jun 2026 21:35:01 +0530
Subject: [PATCH] docs: capture deployment gotchas (TLS defaults, app-key
 match, local embeddings)

Fold the issues hit during a real self-hosted bring-up into the docs and troubleshooting.

- docker-deployment: correct the stale "migrations are manual" section (the entrypoint auto-runs them now) and add a TLS section covering DATABASE_SSL/REDIS_TLS with the bundled plain Postgres/Redis.
- configuration reference: add DATABASE_SSL, REDIS_TLS, EMBEDDING_CACHE_DIR, the agent LLM provider, local embedding defaults, and clarify that GITHUB_PRIVATE_KEY is key contents (real newlines) belonging to the same App that is installed.
- llm-providers: add an Embeddings (PR memory) section documenting voyage vs local, dimensions, and the glibc-image requirement.
- faq + setup-walkthrough: troubleshooting for the crash-loop from production TLS defaults, the 404 installation-token error from mismatched App credentials, the empty-installation case, and "Indexed 0 comments" being expected on repos without review history.
---
 docs-site/guide/docker-deployment.md | 30 +++++++++++++++-------
 docs-site/guide/faq.md               | 38 ++++++++++++++++++++++++++++
 docs-site/guide/llm-providers.md     | 22 ++++++++++++++++
 docs-site/guide/setup-walkthrough.md | 25 ++++++++++++++++--
 docs-site/reference/configuration.md | 29 ++++++++++++++-------
 5 files changed, 124 insertions(+), 20 deletions(-)
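
Reviewer note on the faq and setup-walkthrough troubleshooting entries that say to "mint a JWT and call `GET https://api.github.com/app`": a minimal sketch of that check, using only Node's built-in `crypto` and the global `fetch` (Node 18+ is assumed). The function names `buildAppJwt` and `fetchAppSlug` are hypothetical helpers for illustration, not part of ClearPR. The claims follow GitHub's documented App JWT shape (RS256, `iss` set to the App ID, expiry under ten minutes).

```typescript
import { createSign } from "node:crypto";

// JWT segments are unpadded base64url-encoded JSON.
function encodeSegment(value: object): string {
  return Buffer.from(JSON.stringify(value)).toString("base64url");
}

// Hypothetical helper: builds a short-lived GitHub App JWT signed with the App's PEM key.
export function buildAppJwt(
  appId: string,
  privateKeyPem: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): string {
  const header = { alg: "RS256", typ: "JWT" };
  // Backdate iat by 60s to tolerate clock drift; keep exp under GitHub's 10-minute cap.
  const payload = { iat: nowSeconds - 60, exp: nowSeconds + 540, iss: appId };
  const signingInput = `${encodeSegment(header)}.${encodeSegment(payload)}`;
  const signature = createSign("RSA-SHA256")
    .update(signingInput)
    .sign(privateKeyPem)
    .toString("base64url");
  return `${signingInput}.${signature}`;
}

// Hypothetical helper: asks GitHub which App the JWT authenticates as and returns its slug.
export async function fetchAppSlug(jwt: string): Promise<string> {
  const response = await fetch("https://api.github.com/app", {
    headers: {
      Authorization: `Bearer ${jwt}`,
      Accept: "application/vnd.github+json",
    },
  });
  if (!response.ok) {
    throw new Error(`GET /app failed with HTTP ${response.status}`);
  }
  const body = (await response.json()) as { slug: string };
  return body.slug;
}
```

Calling `fetchAppSlug(buildAppJwt(appId, pem))` with the values from `GITHUB_APP_ID` and `GITHUB_PRIVATE_KEY` should return the slug of the App that owns the key. If that slug is not the App installed on the repo, the credentials are mismatched as described in the troubleshooting text.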

diff --git a/docs-site/guide/docker-deployment.md b/docs-site/guide/docker-deployment.md
index a53fe41..86f1737 100644
--- a/docs-site/guide/docker-deployment.md
+++ b/docs-site/guide/docker-deployment.md
@@ -20,25 +20,37 @@ docker compose up -d
 
 ## Database Migrations
 
-The container does NOT auto-run migrations. You need to run them yourself the first time you bring the stack up, and again after every release that ships a new migration.
+Migrations run **automatically on container start**. The image entrypoint runs `typeorm migration:run` before the app boots, so a fresh database is schema-ready on the first `docker compose up -d`, and any new migration in a release is applied when the new image starts. You'll see it in the logs:
 
-On a fresh database, after `docker compose up -d`:
+```
+[entrypoint] Running TypeORM migrations...
+... No migrations are pending
+[entrypoint] Starting application...
+```
+
+For zero-downtime deploys with a **breaking** migration, apply it out-of-band before rolling the new image:
 
 ```bash
 docker compose run --rm app npm run migration:run
 ```
 
-After each release that includes a new migration, run the same command again before the new image starts serving traffic.
+Additive (non-breaking) migrations are safe to let the entrypoint apply during a normal rolling deploy.
 
-For production, the safest order is:
+## Connecting to Postgres and Redis (TLS)
+
+In production (`NODE_ENV=production`, which the shipped compose sets), ClearPR **defaults to requiring TLS** for both Postgres and Redis. The bundled `db` and `redis` services are plain (no TLS), so if you use them as-is you must turn TLS off explicitly or the app crash-loops on boot:
+
+```env
+DATABASE_SSL=false
+REDIS_TLS=false
+```
 
-1. Stop the app: `docker compose stop app`
-2. Run migrations: `docker compose run --rm app npm run migration:run`
-3. Start the app: `docker compose start app`
+Symptoms if you forget:
 
-If your migrations are non-breaking (additive only), a blue/green or rolling deploy works too: run the migration first, then roll the new image.
+- Postgres: `Error: The server does not support SSL connections` (app restarts in a loop)
+- Redis: repeated `ioredis ... connect ETIMEDOUT` on a TLS socket, and `/health/ready` hangs
 
-Future: a parallel PR (`feat(docker): ...`) is adding migration-on-startup; once that lands, this manual step goes away.
+Leave them at the production default (TLS on) only when your Postgres/Redis actually terminate TLS, e.g. a managed database or `rediss://` endpoint.
 
 ## Services
 
diff --git a/docs-site/guide/faq.md b/docs-site/guide/faq.md
index 101953d..0ef9a16 100644
--- a/docs-site/guide/faq.md
+++ b/docs-site/guide/faq.md
@@ -24,6 +24,22 @@ Only to the LLM provider you configure (e.g., Anthropic, OpenAI). Source code is
 
 ## Setup Issues
 
+### The app container keeps restarting on startup
+
+Check `docker compose logs app`. Two common causes, both from production TLS defaults clashing with the bundled (plain) database/redis:
+
+- `Error: The server does not support SSL connections` — Postgres. Set `DATABASE_SSL=false`.
+- `ioredis ... connect ETIMEDOUT` on a TLS socket / `/health/ready` hangs — Redis. Set `REDIS_TLS=false`.
+
+The shipped compose runs plain Postgres/Redis but sets `NODE_ENV=production`, which turns both TLS requirements on by default. Add to `.env`:
+
+```env
+DATABASE_SSL=false
+REDIS_TLS=false
+```
+
+Then run `docker compose up -d --force-recreate app`. Keep TLS on only when your database/Redis actually terminate TLS.
+
 ### ClearPR isn't receiving webhooks
 
 1. **Check the webhook URL** — it must be reachable from GitHub's servers. Use `curl https://your-domain/health/live` from an external machine.
@@ -49,6 +65,18 @@ The response shows the status of each subsystem:
 3. **Check the installation** — the repo must be included in the GitHub App installation. Go to Settings > GitHub Apps > Configure on the installed app.
 4. **Check the queue** — `curl http://localhost:3000/health` shows queue depths. If `reviews.failed` is high, jobs are failing.
 
+### Webhooks return 200 but reviews and indexing never run (404 on installation token)
+
+If logs show `Failed to index repository: Not Found - .../create-an-installation-access-token-for-an-app` (or reviews dispatch but do nothing), your `GITHUB_APP_ID` + `GITHUB_PRIVATE_KEY` belong to a **different GitHub App** than the one that's installed and sending webhooks. The webhook still verifies (the secret is configured per-webhook, independent of the App key), but ClearPR can't mint an installation token for an installation that isn't under that App, so GitHub returns 404.
+
+Fix: use the App ID **and** a private key from the *same* App whose webhook points at your ClearPR URL. To confirm which App a key belongs to, mint a JWT signed with that key and call `GET https://api.github.com/app`; the returned `slug` identifies the App. Regenerate a key on the correct App's settings page if needed, update `.env`, and recreate the app container.
+
+### Installed the App but ClearPR has no record of it (empty `installations`/`repositories`)
+
+`installations`, `repositories`, and the memory index are only populated when the `installation.created` event is processed. If you installed the App **before** the webhook secret and App credentials were correct, that event was rejected (401/404) and dropped.
+
+Fix: make the webhook secret and App credentials correct first, then **re-install** the App (or replay the `installation.created` delivery from the App's Advanced > Recent Deliveries tab). That registers the installation and enqueues indexing.
+
 ### Duplicate reviews on the same PR
 
 ClearPR has a 30-second debounce window. If you push multiple commits within 30 seconds, only the latest SHA is reviewed. If you're still seeing duplicates, check that your webhook isn't configured to send to multiple URLs.
@@ -117,3 +145,13 @@ The webhook is acknowledged in < 500ms. The review runs asynchronously in the ba
 ### Memory usage is growing
 
 The PR memory system stores one embedding (~2 KB) per review comment from merged PRs. At 10,000 entries per repo, this is roughly 20 MB. If storage is a concern, reduce `HISTORY_DEPTH` (default: 200 merged PRs indexed).
+
+### PR memory isn't flagging repeat issues / "Indexed 0 comments"
+
+PR memory learns from **past human review comments on merged PRs**. If indexing logs `Indexed 0 comments from <repo>` and `pr_memory` stays empty, the repo simply has no review-comment history to learn from (common for solo or new repos). This is expected, not a failure. The semantic diff and AI review still work fully; only the "similar to PR #X" hints are absent.
+
+Memory fills in two ways: the on-install backfill of repos that *do* have review history, and accumulation over time as feedback on new PRs gets accepted. To see the backfill populate immediately, install on a repo with real past PR review discussions.
+
+### Indexing failed for every repo
+
+Re-check the App credentials (see "Webhooks return 200 but reviews and indexing never run (404 ...)" above). Indexing runs once at install; if it failed (e.g. wrong App key at the time), the repos stay `failed` and don't auto-retry. Fix the credentials, then re-install (or re-scope the installation) to re-enqueue indexing.
diff --git a/docs-site/guide/llm-providers.md b/docs-site/guide/llm-providers.md
index 7ae81a5..0accfd8 100644
--- a/docs-site/guide/llm-providers.md
+++ b/docs-site/guide/llm-providers.md
@@ -99,6 +99,28 @@ LLM_PROVIDER=openai
 LLM_MODEL=gpt-4-turbo
 ```
 
+## Embeddings (PR memory)
+
+Separate from the LLM, the PR-memory feature embeds past review comments so it can flag repeat issues. Pick the embedding provider with `EMBEDDING_PROVIDER`:
+
+| Provider | `EMBEDDING_PROVIDER` | Default model | Dimensions | API key |
+|---|---|---|---|---|
+| Voyage AI | `voyage` | `voyage-3-lite` | 512 | Yes (`VOYAGE_API_KEY`) |
+| Local | `local` | `Xenova/all-MiniLM-L6-v2` | 384 | No |
+
+**Local** runs a sentence-transformers model in-process via transformers.js: no API key, fully on-box. It downloads the model once, so cache it on a volume with `EMBEDDING_CACHE_DIR`:
+
+```env
+EMBEDDING_PROVIDER=local
+EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2
+EMBEDDING_DIMENSIONS=384
+EMBEDDING_CACHE_DIR=/app/models
+```
+
+::: warning
+`EMBEDDING_DIMENSIONS` must match the model (512 for `voyage-3-lite`, 384 for `all-MiniLM-L6-v2`). Local embeddings require a glibc-based image (the shipped image is `node:slim`); they will not load on Alpine. If you leave `EMBEDDING_PROVIDER` unset or set to `voyage` without a `VOYAGE_API_KEY`, PR memory is silently skipped and the rest of the review still works.
+:::
+
 ## Architecture
 
 All providers extend the same `LlmProviderPort` abstract class. The `LlmProviderRegistry` selects the right adapter at startup based on `LLM_PROVIDER`. Adding a new provider means creating one adapter file - no changes to domain logic.
diff --git a/docs-site/guide/setup-walkthrough.md b/docs-site/guide/setup-walkthrough.md
index 169f308..79e5ec0 100644
--- a/docs-site/guide/setup-walkthrough.md
+++ b/docs-site/guide/setup-walkthrough.md
@@ -21,7 +21,7 @@ Estimated time: 20-30 minutes.
 ::: warning Pick a strong model for real reviews
 Small local models (under ~14B parameters) miss real bugs and produce confident false positives. They're fine for verifying the pipeline is wired up correctly, but **switch to Claude Sonnet 4 or GPT-4o before pointing the bot at PRs you actually care about**. See [Choosing an LLM](./choosing-an-llm) for the full breakdown.
 :::
-| **Voyage AI API key** | For PR memory (similarity search on past comments). [Get one from dash.voyageai.com](https://dash.voyageai.com). Optional: leave unset and the memory feature is silently skipped, the rest of the review still works. |
+| **Embeddings for PR memory** | Optional. Either a [Voyage AI key](https://dash.voyageai.com), or set `EMBEDDING_PROVIDER=local` to run embeddings on-box with no key. Leave both unset and the memory feature is silently skipped; the rest of the review still works. See [LLM Providers → Embeddings](./llm-providers#embeddings-pr-memory). |
 
 ## Step 1: Run ClearPR with Docker
 
@@ -34,15 +34,28 @@ curl -O https://raw.githubusercontent.com/vineethkrishnan/clearpr/main/docker-co
 curl -o .env https://raw.githubusercontent.com/vineethkrishnan/clearpr/main/.env.example
 ```
 
-Open `.env` in an editor. Don't fill in `GITHUB_*` yet - we get those from the GitHub App in Step 2. For now just set the LLM keys:
+Open `.env` in an editor. Don't fill in `GITHUB_*` yet - we get those from the GitHub App in Step 2. For now set the LLM keys, and (because the bundled `db`/`redis` are plain while `NODE_ENV=production`) turn off the production TLS requirement so the app can connect:
 
 ```env
 LLM_PROVIDER=anthropic
 LLM_API_KEY=sk-ant-...
 
+# Bundled Postgres/Redis have no TLS; these two lines are required, or the app crash-loops on boot
+DATABASE_SSL=false
+REDIS_TLS=false
+
+# PR memory embeddings: either a Voyage key, or run it locally with no key:
 VOYAGE_API_KEY=pa-...
+# EMBEDDING_PROVIDER=local
+# EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2
+# EMBEDDING_DIMENSIONS=384
+# EMBEDDING_CACHE_DIR=/app/models
 ```
 
+::: warning
+If you skip `DATABASE_SSL=false` / `REDIS_TLS=false` with the bundled services, Step 3 will fail with `The server does not support SSL connections` (or a Redis timeout) and the container will restart in a loop.
+:::
+
 Pin the image version (skip `:latest` for production):
 
 ```bash
@@ -298,6 +311,14 @@ Use the `https://smee.io/aBcDeFgHiJ123` URL as the webhook URL when creating the
 
 ## Troubleshooting
 
+### App container restarts in a loop on first start
+
+With the bundled (plain) `db`/`redis` and `NODE_ENV=production`, ClearPR defaults to requiring TLS and can't connect. Logs show `Error: The server does not support SSL connections` (Postgres) or `ioredis ... ETIMEDOUT` (Redis). Add `DATABASE_SSL=false` and `REDIS_TLS=false` to `.env`, then `docker compose up -d --force-recreate app`.
+
+### Reviews/indexing fail with "Not Found" on an installation token
+
+If logs show `Failed to index repository: Not Found - .../create-an-installation-access-token-for-an-app`, your `GITHUB_APP_ID` + `GITHUB_PRIVATE_KEY` are from a **different App** than the one installed. The webhook still passes (the secret is per-webhook), but ClearPR can't mint a token for that installation. Use the App ID and a private key from the *same* App whose webhook points at your ClearPR URL. To confirm which App a key belongs to, sign a JWT with it and call `GET https://api.github.com/app` (the `slug` tells you which App it is).
+
 ### `health/ready` returns 503
 
 Check which subsystem is down:
diff --git a/docs-site/reference/configuration.md b/docs-site/reference/configuration.md
index 58083f6..860e8e3 100644
--- a/docs-site/reference/configuration.md
+++ b/docs-site/reference/configuration.md
@@ -6,9 +6,9 @@ All configuration is via environment variables. Set them in `.env` or pass direc
 
 | Variable | Description |
 |---|---|
-| `GITHUB_APP_ID` | GitHub App ID |
-| `GITHUB_PRIVATE_KEY` | Path to `.pem` file or key content |
-| `GITHUB_WEBHOOK_SECRET` | HMAC secret for webhook verification |
+| `GITHUB_APP_ID` | GitHub App ID. Must belong to the **same** App that is installed and sending webhooks (see troubleshooting if reviews/indexing fail with a 404). |
+| `GITHUB_PRIVATE_KEY` | The private key **contents** (PEM) of that same App, with real newlines, not a file path. In `.env`, the simplest reliable form is a double-quoted value spanning multiple lines. |
+| `GITHUB_WEBHOOK_SECRET` | HMAC secret for webhook verification. Must match the secret configured on the GitHub App's webhook exactly. |
 | `LLM_API_KEY` | API key for the selected LLM provider (not required for Ollama) |
 | `DATABASE_URL` | PostgreSQL connection string |
 | `REDIS_URL` | Redis connection string |
@@ -17,18 +17,23 @@ All configuration is via environment variables. Set them in `.env` or pass direc
 
 | Variable | Default | Description |
 |---|---|---|
-| `LLM_PROVIDER` | `anthropic` | `anthropic`, `openai`, `ollama`, `mistral`, `gemini` |
+| `LLM_PROVIDER` | `anthropic` | `anthropic`, `openai`, `ollama`, `mistral`, `gemini`, `agent` |
 | `LLM_MODEL` | (per provider) | Model ID override |
-| `LLM_BASE_URL` | (per provider) | Custom API base URL |
+| `LLM_BASE_URL` | (per provider) | Custom API base URL. Required for `ollama` and for `agent` (the agent's `host:port`; ClearPR appends `/trigger`). |
 
 ## Embedding Configuration
 
 | Variable | Default | Description |
 |---|---|---|
-| `EMBEDDING_PROVIDER` | `voyage` | `voyage` or `local` |
-| `EMBEDDING_MODEL` | `voyage-3-lite` | Embedding model ID |
-| `EMBEDDING_DIMENSIONS` | `512` | Vector dimension. Must match the chosen model (`voyage-3-lite` = 512, `voyage-3` = 1024). Drives the `pr_memory.embedding` column type in the initial migration. |
-| `VOYAGE_API_KEY` | — | Voyage AI API key |
+| `EMBEDDING_PROVIDER` | `voyage` | `voyage` (API) or `local` (in-process via transformers.js, no API key). Used only by the PR-memory feature. |
+| `EMBEDDING_MODEL` | per provider | Model ID. Defaults: `voyage-3-lite` (voyage), `Xenova/all-MiniLM-L6-v2` (local). |
+| `EMBEDDING_DIMENSIONS` | `512` | Vector dimension. Must match the chosen model (`voyage-3-lite` = 512, `voyage-3` = 1024, `all-MiniLM-L6-v2` = 384). Drives the `pr_memory.embedding` column type; a migration re-aligns it if you change providers. |
+| `EMBEDDING_CACHE_DIR` | (package cache) | Where `local` embeddings cache the downloaded model. Set to a mounted volume (e.g. `/app/models`) so it persists across restarts. |
+| `VOYAGE_API_KEY` | — | Voyage AI API key (only for `EMBEDDING_PROVIDER=voyage`). |
+
+::: tip Local embeddings need a glibc image
+`EMBEDDING_PROVIDER=local` loads native `onnxruntime` bindings, which do not work on Alpine/musl. The shipped image is `node:slim` (glibc) for this reason. If you build a custom image, base it on a glibc distro, not Alpine.
+:::
 
 ## Application Settings
 
@@ -54,7 +59,13 @@ All configuration is via environment variables. Set them in `.env` or pass direc
 | Variable | Default | Description |
 |---|---|---|
 | `REDIS_PASSWORD` | — | Redis auth password (recommended in production) |
+| `DATABASE_SSL` | (on if `NODE_ENV=production`) | Require TLS to Postgres. **Set `false` when using the bundled (plain) `db` service**, otherwise the app crash-loops with `The server does not support SSL connections`. |
+| `REDIS_TLS` | (on if `NODE_ENV=production`) | Require TLS to Redis. **Set `false` when using the bundled (plain) `redis` service**, otherwise connections time out and `/health/ready` hangs. |
 
 ::: warning
 Never commit `.env` files. Use `.env.example` as a template.
 :::
+
+::: tip Self-hosting with the bundled database/redis?
+The shipped `docker-compose.yml` runs plain Postgres and Redis but sets `NODE_ENV=production`, which defaults both `DATABASE_SSL` and `REDIS_TLS` to on. Add `DATABASE_SSL=false` and `REDIS_TLS=false` to your `.env` for that setup. Only enable them when your database/redis actually terminate TLS (e.g. a managed service).
+:::