diff --git a/content/docs/architecture-section/overview.mdx b/content/docs/architecture-section/overview.mdx
index 2d42e0b..4e096f9 100644
--- a/content/docs/architecture-section/overview.mdx
+++ b/content/docs/architecture-section/overview.mdx
@@ -76,9 +76,3 @@ The architecture follows a modular, graph-like structure that ensures task relia
* **Human Oversight** – Critical decisions require human validation to prevent errors.
* **State Recovery** – System can resume from any point if interrupted.
* **Performance Monitoring** – Real-time metrics ensure optimal execution across web and API environments.
-
----
-
-👉 Next step could be to include an **inline Mermaid diagram** inside the README, so that the architecture is rendered directly on GitHub instead of just in the SVG.
-
-Want me to add that Mermaid diagram block so the README is fully self-contained?
diff --git a/content/docs/customization/authentication.mdx b/content/docs/customization/authentication.mdx
new file mode 100644
index 0000000..20c1edc
--- /dev/null
+++ b/content/docs/customization/authentication.mdx
@@ -0,0 +1,86 @@
+---
+title: Authentication & Authorization
+description: Optional OIDC/BFF authentication and role-based authorization for the CUGA server.
+---
+
+import { Callout } from 'fumadocs-ui/components/callout';
+
+CUGA's demo server is unauthenticated by default. For shared or multi-user deployments, you can enable OpenID Connect (OIDC) authentication using a Backend-for-Frontend (BFF) session cookie, optionally combined with role-based authorization.
+
+The full option list lives in the [Settings reference — Auth section](/docs/customization/settings-reference#auth).
+
+## Quick enable
+
+```toml
+[auth]
+enabled = true
+authorization_enabled = true
+manage_roles = ["ServiceOwner", "ServiceAdmin"]
+chat_roles = ["ServiceOwner", "ServiceAdmin", "ServiceUser"]
+session_cookie_name = "cuga_session"
+session_max_age = 3600
+require_https = true
+```
+
+Then provide the OIDC client details via environment variables (none of them belong in `settings.toml`):
+
+```bash
+export OIDC_ISSUER="https://issuer.example.com"
+export OIDC_CLIENT_ID="cuga"
+export OIDC_CLIENT_SECRET="..."
+export OIDC_REDIRECT_URI="https://cuga.example.com/auth/callback"
+```
+
+## Authentication vs authorization
+
+| Setting | Effect |
+|---------|--------|
+| `enabled = true` | Users must log in via the IdP. Anonymous traffic is rejected. |
+| `authorization_enabled = true` | Roles in `manage_roles` / `chat_roles` are enforced for protected endpoints. |
+| `enabled = true`, `authorization_enabled = false` | Authenticated users can use the agent regardless of role. |
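
A minimal example of the third row above (authentication on, role enforcement off) — a reasonable setup for small teams where every authenticated user should have full access:

```toml
# Any logged-in user may use the agent; roles are ignored
[auth]
enabled = true
authorization_enabled = false
```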
+
+### Where roles come from
+
+`role_token_source` controls which token CUGA inspects for the user's roles claim:
+
+| Value | Used when |
+|-------|-----------|
+| `"auto"` (default) | CUGA inspects the access token first, then falls back to the id_token, then the IAM proxy header. |
+| `"id_token"` | Force roles to come from the OIDC id_token. |
+| `"access_token"` | Force roles to come from the OIDC access token. |
+| `"iam_proxy"` | Trust an upstream IAM proxy header (for deployments fronted by IBM Cloud / OpenShift IAM). |
+
+## Behind an IAM proxy
+
+```toml
+[auth]
+enabled = true
+authorization_enabled = true
+iam_proxy_url = "https://iam-proxy.internal"
+iam_proxy_skip_verify = false
+iam_proxy_ca_bundle = "/etc/cuga/iam-proxy-ca.pem"
+role_token_source = "iam_proxy"
+```
+
+`iam_proxy_ca_bundle` and `OIDC_CA_BUNDLE` are independent — set both if your proxy and IdP use different internal CAs.
+
+## TLS termination
+
+When CUGA terminates TLS itself (i.e. there's no reverse proxy):
+
+```toml
+[auth]
+require_https = true
+ssl_keyfile = "/etc/cuga/tls/key.pem"
+ssl_certfile = "/etc/cuga/tls/cert.pem"
+```
+
+In Kubernetes / Ingress / OpenShift Route deployments, leave these empty and let the platform handle TLS.
+
+## Optional: profile-token authorization workflow
+
+Combined with the [authorization workflow](https://github.com/cuga-project/cuga-agent) (cuga-agent PRs #60 and #92), authenticated users can opt in to attaching their own profile token to outbound tool calls. This lets the agent act _as_ the user when calling APIs that require user-level credentials, while still gating which tools are reachable via `manage_roles` / `chat_roles`.
+
+<Callout type="warn">
+Always set `require_https = true` (or terminate TLS upstream) when authentication is on — the BFF session cookie must never travel over plaintext.
+</Callout>
+
diff --git a/content/docs/customization/context-summarization.mdx b/content/docs/customization/context-summarization.mdx
new file mode 100644
index 0000000..87abb96
--- /dev/null
+++ b/content/docs/customization/context-summarization.mdx
@@ -0,0 +1,62 @@
+---
+title: Context Summarization
+description: Automatically summarize older messages when the context window fills up — for both CugaAgent and CugaSupervisor.
+---
+
+import { Callout } from 'fumadocs-ui/components/callout';
+
+For long conversations, CUGA can roll older turns into a running summary so the LLM keeps the most useful context without overflowing the context window.
+
+The full option list lives in the [Settings reference — Context Summarization](/docs/customization/settings-reference#context-summarization).
+
+## Enable
+
+```toml
+[context_summarization]
+enabled = true
+keep_last_n_messages = 10
+trim_tokens_to_summarize = 500
+summarization_model = "gpt-4o-mini"
+trigger_fraction = 0.75
+```
+
+With this configuration:
+
+- Summarization fires when the prompt would exceed **75%** of the model's context window.
+- The **last 10 messages** are always preserved verbatim.
+- Older messages are condensed into ~**500 tokens** by `gpt-4o-mini`.
+
+## Trigger options
+
+You can enable any combination of the three trigger conditions.
+
+| Trigger | Use when |
+|---------|----------|
+| `trigger_fraction = 0.75` | You want the trigger to track the model's actual context window — recommended for production. |
+| `trigger_tokens = 2000` | You want a fixed token cap regardless of model. |
+| `trigger_messages = 20` | You want to summarize after a fixed number of turns (useful for testing). |
+
+If you set more than one, the **first** condition that becomes true triggers summarization.
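
For example, a window-fraction trigger combined with a fixed message cap (values here are illustrative):

```toml
[context_summarization]
enabled = true
trigger_fraction = 0.75   # fires when the prompt nears the model's context window
trigger_messages = 20     # or after 20 turns, whichever comes first
```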
+
+## Custom prompt
+
+By default, CUGA uses LangChain's built-in summarization prompt. To override it:
+
+```toml
+[context_summarization]
+custom_summary_prompt = "Provide a concise summary of the following conversation, preserving all numeric values and named entities: {messages}"
+```
+
+The `{messages}` placeholder is the only required variable.
+
+## Choice of summarization model
+
+`summarization_model` is independent of the agent's main model. Most users keep it on a small/cheap model (`gpt-4o-mini`, `claude-haiku`, etc.) — the goal is fast, lossy compression, not high reasoning.
+
+## Works with CugaSupervisor
+
+Context summarization applies to both `CugaAgent` and `CugaSupervisor` runs. Each delegated sub-agent invocation gets the summarized history just like a standalone agent.
+
+<Callout>
+Summarization is lossy by design. If your task depends on remembering every literal detail (e.g. exact figures from a document), prefer the [Knowledge Base](/docs/customization/knowledge) — it keeps the original document available for retrieval.
+</Callout>
+
diff --git a/content/docs/customization/evolve.mdx b/content/docs/customization/evolve.mdx
new file mode 100644
index 0000000..94a7d08
--- /dev/null
+++ b/content/docs/customization/evolve.mdx
@@ -0,0 +1,125 @@
+---
+title: Evolve Integration
+description: Bring task-specific guidelines into CugaLite from altk-evolve, and save trajectories back after every run.
+---
+
+import { Callout } from 'fumadocs-ui/components/callout';
+import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
+
+[altk-evolve](https://pypi.org/project/altk-evolve/) is an Anthropic-style "tip generation" service that learns guidelines from past trajectories and surfaces them at the start of similar future tasks. CUGA can use Evolve in **CugaLite mode** to:
+
+- Inject task-specific guidelines into the system prompt before execution.
+- Save the user/assistant trajectory after the run so future tasks benefit from what worked (or failed).
+
+The full settings list is in the [Settings reference — Evolve](/docs/customization/settings-reference#evolve).
+
+## How Evolve runs
+
+You have two options for how the Evolve MCP server starts:
+
+<Tabs items={['Registry-managed', 'Direct SSE']}>
+<Tab value="Registry-managed">
+
+Let the CUGA MCP registry launch Evolve for you. In the Manager UI, add an MCP tool with:
+
+- **Name**: `evolve`
+- **Connection type**: `Command (stdio)`
+- **Command**: `uvx`
+- **Args**: `--from altk-evolve --with setuptools<70 evolve-mcp`
+
+Add these env values in the same MCP tool UI:
+
+```bash
+EVOLVE_BACKEND=postgres
+EVOLVE_PG_HOST=localhost
+EVOLVE_PG_PORT=5432
+EVOLVE_PG_USER=postgres
+EVOLVE_PG_PASSWORD=postgres
+EVOLVE_PG_DBNAME=evolve
+EVOLVE_MODEL_NAME=Azure/gpt-4o
+OPENAI_API_KEY=env://OPENAI_API_KEY
+OPENAI_BASE_URL=env://OPENAI_BASE_URL
+```
+
+The `env://VAR` placeholders tell CUGA to read the actual values from its own environment at runtime.
+
+In `settings.toml`, leave `mode = "auto"` (or set `mode = "registry"`) and set `app_name = "evolve"`.
+
+</Tab>
+<Tab value="Direct SSE">
+
+Run Evolve yourself as an SSE server (useful for debugging):
+
+```bash
+# From a checkout of altk-evolve:
+uv sync --extra pgvector
+evolve-mcp --transport sse --port 8201
+```
+
+In `settings.toml`:
+
+```toml
+[evolve]
+enabled = true
+url = "http://127.0.0.1:8201/sse"
+mode = "direct"
+```
+
+`mode = "direct"` skips registry lookup entirely.
+
+</Tab>
+</Tabs>
+
+## Enable in `settings.toml`
+
+```toml
+[advanced_features]
+lite_mode = true # Evolve only runs for CugaLite
+
+[evolve]
+enabled = true
+url = "http://127.0.0.1:8201/sse"
+mode = "auto"
+app_name = "evolve"
+lite_mode_only = true
+save_on_success = true
+save_on_failure = true
+async_save = true
+timeout = 30.0
+```
+
+## Try it
+
+```bash
+cuga start demo_crm --sample-memory-data
+```
+
+Then run a CugaLite task, e.g.:
+
+```
+Identify the common cities between my cuga_workspace/cities.txt and cuga_workspace/company.txt
+```
+
+## What happens during a run
+
+1. CUGA derives a task description from the current sub-task (or the first user message).
+2. CugaLite asks Evolve for relevant guidelines.
+3. Returned guidelines are appended to the system prompt under an `Evolve Guidelines` section.
+4. The task executes normally.
+5. The user/assistant trajectory is saved back to Evolve after completion.
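
The round-trip above can be sketched as follows. This is an illustrative sketch only: `EvolveClient`, `get_guidelines`, and `save_trajectory` are assumed names, not CUGA's internal API.

```python
import asyncio


class EvolveClient:
    """Stand-in for the Evolve MCP client (illustrative only)."""

    def __init__(self):
        self.saved = []

    async def get_guidelines(self, task: str) -> list[str]:
        # Step 2: fetch guidelines relevant to this task
        return ["Prefer batch reads over per-row lookups."]

    async def save_trajectory(self, task: str, messages: list) -> None:
        # Step 5: persist the trajectory for future tasks
        self.saved.append((task, messages))


async def run_with_evolve(task: str, evolve: EvolveClient, base_prompt: str) -> str:
    guidelines = await evolve.get_guidelines(task)
    prompt = base_prompt
    if guidelines:
        # Step 3: append an "Evolve Guidelines" section to the system prompt
        prompt += "\n\nEvolve Guidelines\n" + "\n".join(f"- {g}" for g in guidelines)
    # Step 4: the task would execute here; we only record the messages
    messages = [("system", prompt), ("user", task)]
    await evolve.save_trajectory(task, messages)
    return prompt


evolve = EvolveClient()
final_prompt = asyncio.run(run_with_evolve("compare two files", evolve, "You are CUGA."))
```

Note that even a failed run is saved (when `save_on_failure = true`), so future tasks can learn from what went wrong.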
+
+## Tuning
+
+| Setting | Effect |
+|---------|--------|
+| `async_save = true` | Save in the background; doesn't block the response. |
+| `save_on_success` | Persist trajectories of successful runs. |
+| `save_on_failure` | Persist trajectories of failed runs. |
+| `mode = "auto"` | Try registry first, fall back to direct SSE. |
+| `mode = "registry"` | Force registry-managed Evolve. |
+| `mode = "direct"` | Skip registry lookup; use `url`. |
+| `lite_mode_only = true` | Disable Evolve for non-lite paths. |
+
+<Callout>
+If Evolve is unavailable, times out, or returns no guidance, CUGA continues normally — Evolve never blocks task execution.
+</Callout>
+
+
+<Callout type="warn">
+If you use Evolve's tip generation, make sure the Evolve MCP server's environment includes the required model settings (e.g. `EVOLVE_MODEL_NAME`, OpenAI/LiteLLM credentials). Otherwise `save_trajectory` may fail later with a model-access error even if the MCP connection itself works.
+</Callout>
+
diff --git a/content/docs/customization/knowledge.mdx b/content/docs/customization/knowledge.mdx
new file mode 100644
index 0000000..ad63cd0
--- /dev/null
+++ b/content/docs/customization/knowledge.mdx
@@ -0,0 +1,124 @@
+---
+title: Knowledge Base
+description: Self-contained document ingestion and retrieval for CUGA agents using Docling and local vector stores.
+---
+
+import { Callout } from 'fumadocs-ui/components/callout';
+import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
+
+CUGA includes a built-in knowledge base powered by LangChain and local vector stores. **Docling** is integrated for document ingestion: it parses and normalizes PDFs, Office files, HTML, Markdown, images, and other supported types before chunking and embedding, so the pipeline stays self-contained with no external document services.
+
+When enabled, the agent can search, ingest, and manage documents — and it automatically becomes aware of what documents are available.
+
+## Enabling Knowledge
+
+Knowledge is **enabled by default** via `settings.toml` (see [Storage](/docs/customization/settings-reference#storage) for the embedding provider). To opt out for a specific agent in the SDK:
+
+```python
+from cuga import CugaAgent
+
+agent = CugaAgent(tools=[...], enable_knowledge=False)
+```
+
+The SDK auto-injects knowledge tools and an awareness block into the agent prompt, so the agent knows what documents are available and how to search them.
+
+## Try the Demo
+
+```bash
+cuga start demo_knowledge
+```
+
+This is the same surface as `cuga start demo_crm` but with the knowledge engine on — you can upload documents through the UI and query them.
+
+## Programmatic Access
+
+```python
+from cuga import CugaAgent
+import asyncio
+
+agent = CugaAgent(enable_knowledge=True)
+
+async def main():
+ # Ingest a document
+ await agent.knowledge.ingest("/path/to/quarterly_report.pdf")
+
+ # The agent now automatically knows about this document
+ result = await agent.invoke("What does the report say about Q4 revenue?")
+ print(result.answer)
+
+ # Direct search (skip the agent loop)
+ results = await agent.knowledge.search("Q4 revenue figures")
+ for r in results:
+ print(f"{r['filename']} (page {r['page']}): {r['text'][:100]}")
+
+ # List documents
+ docs = await agent.knowledge.list_documents()
+
+ # Clean up
+ await agent.aclose()
+
+asyncio.run(main())
+```
+
+## Scopes
+
+Documents can be **agent-scoped** (the default — permanent and shared across conversations) or **session-scoped** (tied to a single thread).
+
+<Tabs items={['Agent scope', 'Session scope']}>
+<Tab value="Agent scope">
+
+```python
+# Permanent, shared across conversations
+await agent.knowledge.ingest("/path/to/file.pdf", scope="agent")
+
+results = await agent.knowledge.search("query", scope="agent")
+```
+
+</Tab>
+<Tab value="Session scope">
+
+```python
+thread_id = "user-session-123"
+
+# Temporary, per-conversation
+await agent.knowledge.ingest(
+    "/path/to/file.pdf",
+    scope="session",
+    thread_id=thread_id,
+)
+
+results = await agent.knowledge.search(
+    "query",
+    scope="session",
+    thread_id=thread_id,
+)
+```
+
+</Tab>
+</Tabs>
+
+## Supported Document Types
+
+PDF, DOCX, XLSX, PPTX, HTML, Markdown, images, and more — anything Docling can parse.
+
+## Storage and Embeddings
+
+The knowledge backend is selected by the global `[storage].mode` setting:
+
+| Data | `mode = "local"` | `mode = "prod"` |
+|------|------------------|-----------------|
+| Knowledge vectors | `{knowledge.persist_dir}/knowledge_vectors.db` (vec0 tables per collection) | `storage.postgres_url` (pgvector) |
+| Knowledge metadata | `{knowledge.persist_dir}/metadata.db` | Postgres tables `cuga_knowledge_meta_*`. Uploaded files still live under `persist_dir/files/`. |
+
+Embeddings are configured under `[storage.embedding]` and default to a local `BAAI/bge-small-en-v1.5` model (no OpenAI key required). See [Storage](/docs/customization/settings-reference#storage) for full options.
+
+The knowledge persistence directory defaults to `/.cuga/knowledge/` and can be overridden in `knowledge_settings.toml`.
+
+## Routing Knowledge Through CugaLite
+
+The `[advanced_features].force_lite_mode_apps` list defaults to `["knowledge"]`, so knowledge queries always run through CugaLite's faster execution path regardless of `lite_mode_tool_threshold`. To change this, edit `settings.toml`:
+
+```toml
+[advanced_features]
+force_lite_mode_apps = ["knowledge", "crm"] # add more apps as needed
+```
+
+<Callout>
+The agent's awareness block is rebuilt as documents are ingested or removed, so newly added documents are usable immediately on the next invocation.
+</Callout>
+
diff --git a/content/docs/customization/llm-config.mdx b/content/docs/customization/llm-config.mdx
index f002a08..19c8915 100644
--- a/content/docs/customization/llm-config.mdx
+++ b/content/docs/customization/llm-config.mdx
@@ -91,7 +91,7 @@ MODEL_NAME="gpt-4o"
1. Add to your `.env` file:
```bash
# For Groq
- # GENT_SETTING_CONFIG="settings.groq.toml"
+ # AGENT_SETTING_CONFIG="settings.groq.toml"
# GROQ_API_KEY="XXXX"
```
diff --git a/content/docs/customization/memory.mdx b/content/docs/customization/memory.mdx
index 5f0c193..f7bffc8 100644
--- a/content/docs/customization/memory.mdx
+++ b/content/docs/customization/memory.mdx
@@ -3,6 +3,16 @@ title: Memory & Learning
description: Enable CUGA's memory system to learn from past interactions and improve over time
---
+import { Callout } from 'fumadocs-ui/components/callout';
+
+<Callout type="warn">
+The `mem0`-based memory system documented on this page (`enable_memory`, `enable_fact`, `memory_provider`, `memory` server port) was **removed from CUGA classic** in cuga-agent PR #153 (2026-04-23, _"feat: remove memory support for cuga classic"_). The settings still appear in older `settings.toml` files but no longer have any effect.
+
+Trajectory-based learning is now provided by the **[Evolve integration](/docs/customization/evolve)** for CugaLite, and per-conversation context is managed by **[Context Summarization](/docs/customization/context-summarization)**. Document- and knowledge-aware behavior is provided by the new **[Knowledge Base](/docs/customization/knowledge)** (Docling-powered).
+
+This page is kept as-is for users still running older CUGA versions.
+</Callout>
+
+
CUGA's memory system allows the agent to learn from past interactions, remember patterns, and improve performance on similar tasks over time. This creates a personalized, adaptive agent experience.
## Overview
diff --git a/content/docs/customization/meta.json b/content/docs/customization/meta.json
index 1abae1a..bd968db 100644
--- a/content/docs/customization/meta.json
+++ b/content/docs/customization/meta.json
@@ -11,6 +11,14 @@
"special-instructions",
"tools",
"cli-sdk",
+ "knowledge",
+ "context-summarization",
+ "evolve",
+ "storage",
+ "secrets-vault",
+ "authentication",
+ "observability",
+ "ui-branding",
"memory",
"e2b-sandbox",
"settings-reference"
diff --git a/content/docs/customization/observability.mdx b/content/docs/customization/observability.mdx
new file mode 100644
index 0000000..5b71335
--- /dev/null
+++ b/content/docs/customization/observability.mdx
@@ -0,0 +1,53 @@
+---
+title: Observability (OpenLit)
+description: OpenTelemetry-based LLM tracing, metrics, and logs via OpenLit.
+---
+
+import { Callout } from 'fumadocs-ui/components/callout';
+
+CUGA can emit OpenTelemetry traces, metrics, and logs for every LLM call using [OpenLit](https://github.com/openlit/openlit), a drop-in OTel instrumentation layer for popular LLM SDKs.
+
+## Install
+
+OpenLit ships as an optional extra:
+
+```bash
+pip install "cuga[observability]"
+# or with uv:
+uv sync --group observability
+```
+
+## Configure
+
+```toml
+[observability]
+openlit = true
+```
+
+Point OpenLit at your OTLP collector via environment:
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
+```
+
+Common service-identifying env vars (`OTEL_SERVICE_NAME`, `OTEL_RESOURCE_ATTRIBUTES`) work as usual — CUGA does not override them.
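
For example, to tag CUGA's telemetry with a service name and deployment attributes (values here are placeholders, not defaults):

```shell
export OTEL_SERVICE_NAME="cuga-agent"
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=dev,service.version=0.1.0"
```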
+
+## Local testing stack
+
+The cuga-agent repo ships a docker-compose stack under `deployment/docker-compose/openlit/` containing an OTel Collector, Tempo (traces), Prometheus (metrics), and Grafana (UI). Start it, point CUGA at the collector, run a task, and you'll see per-call traces with prompt, model, token counts, latency, and cost.
+
+## What gets captured
+
+For each LLM invocation OpenLit records:
+
+- **Trace span** — start/end, duration, parent-child relationships across planner / shortlister / coder / reflection nodes.
+- **Attributes** — model name, provider, temperature, prompt/response content (off by default — configure per OpenLit's docs), token usage (input/output/total), and any raised exception.
+- **Metrics** — request counts, token counts, and latency histograms exported via OTLP.
+
+## Combining with Langfuse
+
+`langfuse_tracing = true` under `[advanced_features]` is independent of OpenLit and can be enabled in parallel — useful when you want both an OTel-native pipeline and a Langfuse dashboard.
+
+<Callout>
+OpenLit's instrumentation is opt-in per LLM SDK. CUGA enables instrumentation for the providers it ships with (OpenAI, LiteLLM, WatsonX). If you wire in a custom provider, follow [OpenLit's instrumentation docs](https://docs.openlit.io/) to enable it explicitly.
+</Callout>
+
diff --git a/content/docs/customization/secrets-vault.mdx b/content/docs/customization/secrets-vault.mdx
new file mode 100644
index 0000000..799fd5f
--- /dev/null
+++ b/content/docs/customization/secrets-vault.mdx
@@ -0,0 +1,100 @@
+---
+title: Secrets & Vault
+description: Resolve secrets from environment variables or HashiCorp Vault — with KV v1/v2 and Kubernetes auth.
+---
+
+import { Callout } from 'fumadocs-ui/components/callout';
+import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
+
+CUGA reads secrets at runtime from one of two backends:
+
+1. **Local** — environment variables (with optional UI overrides stored encrypted on disk).
+2. **Vault** — HashiCorp Vault KV v1 or v2, with token or Kubernetes auth.
+
+The backend is selected via `[secrets].mode` in `settings.toml`. See the full option list in the [Settings reference](/docs/customization/settings-reference#secrets).
+
+## Local mode (default)
+
+```toml
+[secrets]
+mode = "local"
+force_env = true
+db_encryption_key_env = "CUGA_SECRET_KEY"
+```
+
+When `force_env = true`, CUGA always resolves from `os.environ` and ignores any UI overrides. Set `CUGA_SECRET_KEY` in the environment to a stable encryption key — it is used to encrypt UI-provided overrides on disk when `force_env = false`.
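
One way to generate a suitable key (assumption: any sufficiently random string works — persist it across restarts, e.g. in your deployment's secret manager):

```shell
# 32 random bytes, hex-encoded -> a 64-character key
export CUGA_SECRET_KEY="$(openssl rand -hex 32)"
```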
+
+## Vault mode
+
+### Token auth
+
+```toml
+[secrets]
+mode = "vault"
+vault_addr = "https://vault.example.com:8200"
+vault_auth_method = "token"
+vault_token_env = "VAULT_TOKEN"
+vault_mount = "secret"
+vault_kv_version = "" # empty = KV v2
+vault_secret_path = "cuga/prod"
+```
+
+Then export the token:
+
+```bash
+export VAULT_TOKEN="hvs.CAESI..."
+```
+
+### Kubernetes auth
+
+When CUGA runs in a Kubernetes pod, use the projected service-account JWT:
+
+```toml
+[secrets]
+mode = "vault"
+vault_addr = "https://vault.example.com:8200"
+vault_auth_method = "kubernetes"
+vault_k8s_role = "cuga"
+vault_k8s_mount_path = "kubernetes"
+vault_k8s_jwt_path = "/var/run/secrets/kubernetes.io/serviceaccount/token"
+vault_mount = "secret"
+vault_secret_path = "cuga/prod"
+```
+
+The auth method, role, and secret path can also be set at runtime via `DYNACONF_SECRETS__VAULT_AUTH_METHOD` and `DYNACONF_SECRETS__VAULT_SECRET_PATH`.
+
+### TLS
+
+If your Vault server uses an internal CA:
+
+```toml
+vault_cacert = "/etc/cuga/vault-root-ca.pem"
+vault_skip_verify = false
+```
+
+`VAULT_CACERT` and `VAULT_SKIP_VERIFY` env vars also work. **Do not** disable verification in production.
+
+### Writing secrets back to Vault
+
+By default, CUGA reads secrets only:
+
+```toml
+vault_write_enabled = false
+```
+
+Set to `true` only if you intend to manage secrets through CUGA's UI — most deployments should leave this off.
+
+## Referencing env-resolved secrets
+
+When configuring tools (e.g. an Evolve MCP server), pass `env://VAR_NAME` placeholders so values are read from the process environment at runtime:
+
+```bash
+OPENAI_API_KEY=env://OPENAI_API_KEY
+OPENAI_BASE_URL=env://OPENAI_BASE_URL
+```
+
+This pattern works whether secrets ultimately come from `os.environ` or are injected by Vault.
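
The placeholder scheme amounts to a simple environment lookup; a minimal sketch of the idea (illustrative only, not CUGA's actual resolver):

```python
import os
import re

_ENV_REF = re.compile(r"^env://(\w+)$")


def resolve_secret(value: str) -> str:
    """Resolve an "env://VAR" reference from the process environment."""
    m = _ENV_REF.match(value)
    if not m:
        return value  # literal value, pass through unchanged
    name = m.group(1)
    try:
        return os.environ[name]
    except KeyError:
        raise KeyError(f"secret placeholder {value!r} has no matching env var")


os.environ["OPENAI_API_KEY"] = "sk-test"
resolved = resolve_secret("env://OPENAI_API_KEY")
```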
+
+<Callout type="warn">
+Never commit secrets to `settings.toml` or to git. Use environment variables, Vault, or your deployment's secret manager.
+</Callout>
+
diff --git a/content/docs/customization/settings-reference.mdx b/content/docs/customization/settings-reference.mdx
index b89640a..c5c09e4 100644
--- a/content/docs/customization/settings-reference.mdx
+++ b/content/docs/customization/settings-reference.mdx
@@ -75,7 +75,6 @@ Core feature configuration.
```toml
[features]
cuga_mode = "balanced"
-memory_provider = "mem0"
```
### Options
@@ -83,7 +82,8 @@ memory_provider = "mem0"
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `cuga_mode` | String | `"balanced"` | Execution reasoning mode. Options: `"fast"`, `"balanced"`, `"accurate"`, `"save_reuse_fast"`, `"custom"`. Fast is quicker but less accurate; Accurate is slower but more precise; Save & Reuse caches workflows. |
-| `memory_provider` | String | `"mem0"` | Memory system provider. Currently supports `"mem0"`. Used for learning from past errors and improving performance. |
+
+> **Note:** The legacy `memory_provider` key (mem0) was removed from CUGA classic in cuga-agent PR #153. See [Memory & Learning](/docs/customization/memory) for details and migration guidance.
---
@@ -96,6 +96,7 @@ Advanced configuration flags for specialized behavior.
# Benchmark and Evaluation
web_arena_eval = false
benchmark = "default"
+appworld_final_answer_plain = true
# Vision and Analysis
use_vision = true
@@ -116,13 +117,18 @@ use_extension = false
# Planning and Optimization
code_planner_enabled = true
api_planner_hitl = false
+reflection_enabled = false
lite_mode = true
lite_mode_tool_threshold = 70
+force_lite_mode_apps = ["knowledge"]
shortlisting_tool_threshold = 35
-
-# Memory and Learning
-enable_memory = false
-enable_fact = false
+cuga_lite_enable_few_shots = true
+cuga_lite_max_steps = 70
+cuga_lite_bind_tools_mode = "none"
+cuga_lite_bind_tools_apps = []
+cuga_lite_bind_tools_include_find_tools = false
+cuga_lite_nl_auto_continue = false
+enable_todos = false
# Workflows
save_reuse_generate_html = false
@@ -137,9 +143,19 @@ e2b_sandbox_ttl_buffer = 60
e2b_cleanup_on_create = true
e2b_cleanup_frequency = 0
-# Limits
+# Variable Lifecycle
+sub_task_keep_last_n = 5
+code_executor_keep_last_n = -1
+
+# Limits & Timeouts
message_window_limit = 100
max_input_length = 5000
+tool_call_timeout = 30
+execution_output_max_length = 70000
+
+# Misc
+path_segment_index = 1
+force_autonomous_mode = false
```
### Benchmark & Evaluation Options
@@ -148,6 +164,7 @@ max_input_length = 5000
|--------|------|---------|-------------|
| `web_arena_eval` | Boolean | `false` | Enable WebArena benchmark evaluation mode. For testing on WebArena benchmark suite. |
| `benchmark` | String | `"default"` | Benchmark mode. Options: `"default"`, `"appworld"`, `"webarena"`. Controls evaluation settings. |
+| `appworld_final_answer_plain` | Boolean | `true` | When `benchmark = "appworld"`, use plain `answer:` completion prompts (no JSON) for final formatting. |
### Vision & Analysis Options
@@ -183,16 +200,18 @@ max_input_length = 5000
|--------|------|---------|-------------|
| `code_planner_enabled` | Boolean | `true` | Enable code generation planning. Controls whether CUGA generates Python code for complex operations. |
| `api_planner_hitl` | Boolean | `false` | Enable Human-in-the-Loop for API planner. Pauses at decision points requiring human approval. See [Human-in-the-Loop](/docs/guides/human-in-the-loop). |
+| `reflection_enabled` | Boolean | `false` | Run an extra reflection pass after planning/execution to detect and recover from errors. |
| `lite_mode` | Boolean | `true` | Enable CugaLite mode for simple API tasks. Automatically routes simple tasks to faster execution path. |
| `lite_mode_tool_threshold` | Integer | `70` | Tool count threshold for CugaLite routing. If app has fewer than this many tools, use CugaLite. |
-| `shortlisting_tool_threshold` | Integer | `35` | Threshold for enabling tool shortlisting. If total tools exceed this, enable intelligent tool filtering. |
-
-### Memory & Learning Options
-
-| Option | Type | Default | Description |
-|--------|------|---------|-------------|
-| `enable_memory` | Boolean | `false` | Enable memory system. Learn from past errors and improve over time. Requires `uv sync --group memory`. See [Memory](/docs/customization/memory). |
-| `enable_fact` | Boolean | `false` | Enable fact checking. Verify agent outputs against known facts. |
+| `force_lite_mode_apps` | `Array<String>` | `["knowledge"]` | App names that always run in CugaLite regardless of `lite_mode_tool_threshold` (e.g. `["knowledge", "crm"]`). |
+| `shortlisting_tool_threshold` | Integer | `35` | Threshold for enabling tool shortlisting. If total tools exceed this, enable intelligent tool filtering (`find_tools`). |
+| `cuga_lite_enable_few_shots` | Boolean | `true` | MCP few-shots: prompt block + few-shot chat prefix in CugaLite. Set `false` to disable. |
+| `cuga_lite_max_steps` | Integer | `70` | Maximum number of steps (call_model + sandbox cycles) in CugaLite before returning an error. |
+| `cuga_lite_bind_tools_mode` | String | `"none"` | How CugaLite binds tools to the model. Options: `"none"`, `"all"`, `"apps"`. (Per-model overrides live in `model_runtime_profile.py`.) |
+| `cuga_lite_bind_tools_apps` | `Array<String>` | `[]` | When `cuga_lite_bind_tools_mode = "apps"`, list of app names to bind (e.g. `["crm", "slack"]`). |
+| `cuga_lite_bind_tools_include_find_tools` | Boolean | `false` | When binding tools, also bind the `find_tools` StructuredTool alongside `all`/`apps`. |
+| `cuga_lite_nl_auto_continue` | Boolean | `false` | When the model returns NL with no code, classify interim vs final; if interim, simulate a user `continue` and re-call the model. |
+| `enable_todos` | Boolean | `false` | Enable the todos feature for managing complex multi-step tasks. |
### Workflow Options
@@ -215,12 +234,28 @@ See [E2B Cloud Sandbox](/docs/customization/e2b-sandbox) for detailed E2B config
| `e2b_cleanup_on_create` | Boolean | `true` | Run cleanup when creating new sandboxes. Prevents sandbox accumulation. |
| `e2b_cleanup_frequency` | Integer | `0` | Check all sandboxes every N get_or_create calls. 0 = only on create. Higher values reduce cleanup overhead. |
-### Limit Options
+### Variable Lifecycle Options
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `sub_task_keep_last_n` | Integer | `5` | Number of most recent generated variables to keep when executing sub-tasks. |
+| `code_executor_keep_last_n` | Integer | `-1` | Variables retained after code execution. `-1` keeps all; positive integers keep the last N. |
+
+### Limit & Timeout Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `message_window_limit` | Integer | `100` | Maximum messages to keep in conversation history. Older messages discarded when exceeded. Reduces context size. |
| `max_input_length` | Integer | `5000` | Maximum character length for user input. Prevents abuse and excessive processing. |
+| `tool_call_timeout` | Integer (seconds) | `30` | Timeout for tool/API calls (sandbox operations). Raises `TimeoutError` when exceeded. |
+| `execution_output_max_length` | Integer | `70000` | Maximum characters returned in execution output. Prevents token overflow on very large tool responses. |
+
+### Misc Options
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `path_segment_index` | Integer | `1` | Which path segment to use for OpenAPI operation naming (1 = first, 2 = second, etc.). |
+| `force_autonomous_mode` | Boolean | `false` | Force fully autonomous execution (no HITL prompts) regardless of other settings. |
---
@@ -232,17 +267,20 @@ Controls service ports and URLs for all CUGA services.
[server_ports]
registry = 8001
demo = 7860
+demo_server_startup_max_retries = 420
apis_url = 9000
crm_api = 8007
saved_flows = 8003
environment_url = 8000
+filesystem_mcp = 8112
+docs_mcp = 8113
digital_sales_api = 8000
mcp_server = 8000
petstore_api = 8081
graph_visualization = 8080
orchestrate_url = 4321
trm_url = 8080
-memory = 8888
+oak_health_api = 8090
```
### Options
@@ -251,17 +289,22 @@ memory = 8888
|--------|------|---------|-------------|
| `registry` | Integer | `8001` | API registry service port. Where CUGA tools and APIs are exposed. |
| `demo` | Integer | `7860` | CUGA demo interface port. Open browser to http://localhost:7860 |
+| `demo_server_startup_max_retries` | Integer | `420` | CLI `cuga start` polls the demo Uvicorn process every ~0.5s up to this many times before timing out (default ≈ 3.5 minutes). |
| `apis_url` | Integer | `9000` | APIs service port. (Rarely used) |
-| `crm_api` | Integer | `8007` | CRM demo application port. Used in demo_crm mode. |
+| `crm_api` | Integer | `8007` | CRM demo application port. Used in `cuga start demo_crm`. |
| `saved_flows` | Integer | `8003` | Saved workflows service port. For Save & Reuse feature. |
| `environment_url` | Integer | `8000` | Environment service port. Base configuration service. |
+| `filesystem_mcp` | Integer | `8112` | Filesystem MCP server port. Used in the [Filesystem MCP demo](/docs/guides/filesystem-mcp-demo). |
+| `docs_mcp` | Integer | `8113` | Docs MCP server port. |
| `digital_sales_api` | Integer | `8000` | Digital Sales API port. Used in digital sales demo. |
| `mcp_server` | Integer | `8000` | MCP server port for tool integration. |
| `petstore_api` | Integer | `8081` | Petstore demo API port. Example API for testing. |
| `graph_visualization` | Integer | `8080` | Graph visualization service port. For execution flow visualization. |
| `orchestrate_url` | Integer | `4321` | Orchestration service port. (Enterprise only) |
| `trm_url` | Integer | `8080` | Task/Routing/Management URL port. (Advanced) |
-| `memory` | Integer | `8888` | Memory service port. Used when memory is enabled. |
+| `oak_health_api` | Integer | `8090` | `cuga-oak-health` OpenAPI port. Used by `cuga start demo_health`. |
+
+> **Note:** The `memory = 8888` port has been removed. Memory support for CUGA classic was deprecated in cuga-agent PR #153.
### Advanced Port Configuration
@@ -280,6 +323,324 @@ For E2B cloud sandbox, configure registry exposure:
---
+## Supervisor
+
+Configures the multi-agent supervisor when running CUGA as a server. SDK-only usage (building a `CugaSupervisor` in Python) does not require this section.
+
+```toml
+[supervisor]
+enabled = false
+config_path = "src/cuga/backend/tools_env/registry/config/supervisor_demo_crm.yaml"
+agent_approval = true
+pass_variables_a2a = false
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `enabled` | Boolean | `false` | Enable `CugaSupervisor` in the server. See [`CugaSupervisor` SDK doc](/docs/sdk/cuga_supervisor). |
+| `config_path` | String | `""` | Path to the supervisor YAML config. If empty, uses the default supervisor setup. |
+| `agent_approval` | Boolean | `true` | Require user approval before delegating to any sub-agent (human-in-the-loop). |
+| `pass_variables_a2a` | Boolean | `false` | When `true`, the A2A delegate tool accepts variables and sends them in request metadata (A2A protocol extension). |
+
+The bundled multi-agent demo can be launched with:
+
+```bash
+cuga start demo_supervisor
+```
+
+---
+
+## Storage
+
+Selects the backend for policy vectors, knowledge vectors, and knowledge metadata.
+
+```toml
+[storage]
+mode = "local"
+local_db_path = ""
+postgres_url = ""
+
+[storage.embedding]
+provider = "local"
+model = "BAAI/bge-small-en-v1.5"
+dim = 384
+base_url = ""
+api_key = ""
+```
+
+### Storage modes
+
+| Data | `local` | `prod` |
+|------|---------|--------|
+| Policy vectors | sqlite-vec at `[policy].policy_db_path` or `storage.local_db_path` (defaults to `DBS_DIR/cuga.db`) | `storage.postgres_url` (pgvector) |
+| Knowledge vectors | `{knowledge.persist_dir}/knowledge_vectors.db` (vec0 tables per collection) | `storage.postgres_url` |
+| Knowledge metadata | `{knowledge.persist_dir}/metadata.db` | Postgres tables `cuga_knowledge_meta_*` (uploaded files stay under `persist_dir/files/`) |
+
+`DBS_DIR` defaults to the package `dbs/` directory or the value of the `CUGA_DBS_DIR` env var.
+
+### `[storage]` options
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `mode` | String | `"local"` | `"local"` (sqlite/sqlite-vec) or `"prod"` (Postgres + pgvector). |
+| `local_db_path` | String | `""` | Override path for the local SQLite DB. Empty = `DBS_DIR/cuga.db`. |
+| `postgres_url` | String | `""` | Postgres connection URL. **Required** when `mode = "prod"`. |
+
+### `[storage.embedding]` options
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `provider` | String | `"local"` | `"openai"`, `"local"`, or `"auto"` (tries OpenAI, falls back to local). |
+| `model` | String | `"BAAI/bge-small-en-v1.5"` | Embedding model name. |
+| `dim` | Integer | `384` | Embedding dimension. `1536` for OpenAI, `384` for `BAAI/bge-small-en-v1.5`. |
+| `base_url` | String | `""` | Optional custom endpoint for an OpenAI-compatible embedding service. |
+| `api_key` | String | `""` | Optional API key for the embedding endpoint. Falls back to `OPENAI_API_KEY`. |
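+
+As a sketch, pointing embeddings at an OpenAI-compatible service instead of the bundled local model (the endpoint URL and model name are illustrative; per the table, OpenAI-style models use `dim = 1536`):
+
+```toml
+[storage.embedding]
+provider = "openai"
+model = "text-embedding-3-small"                 # illustrative OpenAI model
+dim = 1536
+base_url = "https://llm-gateway.example.com/v1"  # illustrative endpoint
+api_key = ""                                     # falls back to OPENAI_API_KEY
+```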
+
+---
+
+## Policy
+
+Configures the [policy system](/docs/sdk/policies).
+
+```toml
+[policy]
+enabled = true
+collection_name = "cuga_policies"
+policy_db_path = ""
+playbook_refine = false
+filesystem_sync = true
+cuga_folder = ".cuga"
+auto_load_policies = true
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `enabled` | Boolean | `true` | Enable the policy system (intent guards, playbooks, tool guides, tool approvals, output formatters). |
+| `collection_name` | String | `"cuga_policies"` | Vector store collection name for policies. |
+| `policy_db_path` | String | `""` | Optional explicit path for the policy DB. When empty, uses `storage.local_db_path`. |
+| `playbook_refine` | Boolean | `false` | Enable playbook refinement based on user progress (requires an LLM call). |
+| `filesystem_sync` | Boolean | `true` | Sync policies to/from the `.cuga` folder on disk. |
+| `cuga_folder` | String | `".cuga"` | Path to the `.cuga` folder used for policy files. |
+| `auto_load_policies` | Boolean | `true` | Automatically load policies from the `.cuga` folder on startup. |
+
+---
+
+## Service
+
+Identifies the running CUGA instance, useful for multi-tenant or Kubernetes deployments.
+
+```toml
+[service]
+instance_id = ""
+tenant_id = ""
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `instance_id` | String | `""` | Unique identifier for this CUGA instance. Override with `DYNACONF_SERVICE__INSTANCE_ID` (e.g. K8s pod name, deployment id). |
+| `tenant_id` | String | `""` | Tenant id for multi-tenant SaaS deployments. Override with `DYNACONF_SERVICE__TENANT_ID`. |
+
+---
+
+## Secrets
+
+Selects how CUGA resolves secrets at runtime — local environment variables or HashiCorp Vault.
+
+```toml
+[secrets]
+mode = "local"
+force_env = true
+db_encryption_key_env = "CUGA_SECRET_KEY"
+vault_addr = ""
+vault_token_env = "VAULT_TOKEN"
+vault_auth_method = ""
+vault_k8s_role = ""
+vault_k8s_mount_path = "kubernetes"
+vault_k8s_jwt_path = "/var/run/secrets/kubernetes.io/serviceaccount/token"
+vault_cacert = ""
+vault_skip_verify = false
+vault_mount = "secret"
+vault_kv_version = ""
+vault_secret_path = ""
+vault_write_enabled = false
+aws_region = ""
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `mode` | String | `"local"` | `"local"` (env vars / UI overrides) or `"vault"`. |
+| `force_env` | Boolean | `true` | If `true`, always resolve from `os.environ` (ignores UI overrides and Vault). |
+| `db_encryption_key_env` | String | `"CUGA_SECRET_KEY"` | Environment variable holding the encryption key for stored secrets. |
+| `vault_addr` | String | `""` | Vault server URL (e.g. `https://vault.example.com:8200`). |
+| `vault_token_env` | String | `"VAULT_TOKEN"` | Env var name that holds the Vault token (when `vault_auth_method = "token"`). |
+| `vault_auth_method` | String | `""` | `""`, `"token"`, or `"kubernetes"`. Override with `DYNACONF_SECRETS__VAULT_AUTH_METHOD`. |
+| `vault_k8s_role` | String | `""` | Vault role used by Kubernetes auth. |
+| `vault_k8s_mount_path` | String | `"kubernetes"` | Mount path of the Kubernetes auth backend. |
+| `vault_k8s_jwt_path` | String | `"/var/run/secrets/kubernetes.io/serviceaccount/token"` | Path to the service-account JWT inside the pod. |
+| `vault_cacert` | String | `""` | Path to a PEM bundle used to verify Vault TLS (env: `VAULT_CACERT`). |
+| `vault_skip_verify` | Boolean | `false` | Dev only — disable TLS verification (env: `VAULT_SKIP_VERIFY`). |
+| `vault_mount` | String | `"secret"` | KV mount path within Vault. |
+| `vault_kv_version` | String | `""` | `"1"` or `"2"`. Empty defaults to KV v2; use `"1"` only for KV v1 mounts. |
+| `vault_secret_path` | String | `""` | Base path for secrets. Override with `DYNACONF_SECRETS__VAULT_SECRET_PATH`. |
+| `vault_write_enabled` | Boolean | `false` | Allow CUGA to write secrets back to Vault (most setups should leave this off). |
+| `aws_region` | String | `""` | Reserved for AWS Secrets Manager integration. |
+
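+
+As a minimal sketch, a token-based Vault setup (the address and secret path are illustrative) looks like:
+
+```toml
+[secrets]
+mode = "vault"
+vault_addr = "https://vault.example.com:8200"  # illustrative address
+vault_auth_method = "token"                    # token read from $VAULT_TOKEN
+vault_mount = "secret"
+vault_secret_path = "cuga"                     # illustrative base path
+```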
+---
+
+## Auth
+
+Optional OIDC/BFF authentication for the CUGA server.
+
+```toml
+[auth]
+enabled = false
+authorization_enabled = false
+manage_roles = ["ServiceOwner", "ServiceAdmin"]
+chat_roles = ["ServiceOwner", "ServiceAdmin", "ServiceUser"]
+session_cookie_name = "cuga_session"
+session_max_age = 3600
+jwks_cache_ttl = 3600
+require_https = false
+ssl_keyfile = ""
+ssl_certfile = ""
+iam_proxy_url = ""
+iam_proxy_skip_verify = false
+iam_proxy_ca_bundle = ""
+role_token_source = "auto"
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `enabled` | Boolean | `false` | Enable OIDC/BFF authentication on the demo server. |
+| `authorization_enabled` | Boolean | `false` | Enforce role-based authorization in addition to authentication. |
+| `manage_roles` | String[] | `["ServiceOwner", "ServiceAdmin"]` | Roles allowed to manage policies, tools, and configuration. |
+| `chat_roles` | String[] | `["ServiceOwner", "ServiceAdmin", "ServiceUser"]` | Roles allowed to chat with the agent. |
+| `session_cookie_name` | String | `"cuga_session"` | Name of the BFF session cookie. |
+| `session_max_age` | Integer (seconds) | `3600` | Session lifetime. |
+| `jwks_cache_ttl` | Integer (seconds) | `3600` | How long the IdP's JWKS signing keys are cached. |
+| `require_https` | Boolean | `false` | Reject non-HTTPS traffic (recommended in production). |
+| `ssl_keyfile` | String | `""` | Path to TLS private key (when terminating TLS in CUGA). |
+| `ssl_certfile` | String | `""` | Path to TLS certificate. |
+| `iam_proxy_url` | String | `""` | URL of an upstream IAM proxy in front of CUGA. |
+| `iam_proxy_skip_verify` | Boolean | `false` | Skip TLS verification against the IAM proxy (dev only). |
+| `iam_proxy_ca_bundle` | String | `""` | PEM bundle for IAM-proxy TLS (independent of `oidc_ca_bundle`). |
+| `role_token_source` | String | `"auto"` | Where roles come from: `"auto"`, `"id_token"`, `"access_token"`, `"iam_proxy"`. |
+
+OIDC client/issuer/secret values are configured via environment variables — see [Environment Variable Overrides](#environment-variable-overrides).
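+
+If CUGA terminates TLS itself rather than sitting behind a proxy, a sketch with illustrative certificate paths:
+
+```toml
+[auth]
+enabled = true
+require_https = true
+ssl_keyfile = "/etc/cuga/tls/server.key"   # illustrative path
+ssl_certfile = "/etc/cuga/tls/server.crt"  # illustrative path
+```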
+
+---
+
+## UI
+
+Customize the demo UI branding.
+
+```toml
+[ui]
+hide_cuga_logo = false
+brand_name = "CUGA Agent"
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `hide_cuga_logo` | Boolean | `false` | Hide the CUGA logo in the header (e.g. when white-labelling). |
+| `brand_name` | String | `"CUGA Agent"` | App name shown in the header. |
+
+---
+
+## Context Summarization
+
+Automatically summarize older parts of the conversation when the context window starts to fill up.
+
+```toml
+[context_summarization]
+enabled = false
+keep_last_n_messages = 10
+trim_tokens_to_summarize = 500
+summarization_model = "gpt-4o-mini"
+trigger_fraction = 0.75
+# trigger_tokens = 2000
+# trigger_messages = 20
+# custom_summary_prompt = "Provide a concise summary of the conversation: {messages}"
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `enabled` | Boolean | `false` | Enable intelligent context summarization. |
+| `keep_last_n_messages` | Integer | `10` | Number of recent messages preserved unsummarized. |
+| `trim_tokens_to_summarize` | Integer | `500` | Target token count for generated summaries. |
+| `summarization_model` | String | `"gpt-4o-mini"` | Model used to generate summaries (kept fast and cheap by default). |
+| `trigger_fraction` | Float | `0.75` | Trigger summarization at this fraction of the model's context window. |
+| `trigger_tokens` | Integer | _(unset)_ | Alternative trigger: total tokens above this count. |
+| `trigger_messages` | Integer | _(unset)_ | Alternative trigger: number of messages since the last summary. |
+| `custom_summary_prompt` | String | _(unset)_ | Optional custom prompt template (uses LangChain default if not set). |
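+
+For example, to trigger on message count instead of context fraction (values illustrative; the `trigger_*` options are alternative triggers per the table above):
+
+```toml
+[context_summarization]
+enabled = true
+trigger_messages = 20        # summarize every 20 messages
+keep_last_n_messages = 10
+```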
+
+---
+
+## Connections
+
+Controls TLS for outbound LLM inference connections.
+
+```toml
+[connections]
+inference_ca_cert = ""
+inference_disable_ssl = false
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `inference_ca_cert` | String | `""` | Path to a CA certificate for the inference HTTP clients (OpenAI and LiteLLM). Env: `CUGA_INFERENCE_CA_CERT`. |
+| `inference_disable_ssl` | Boolean | `false` | Disable SSL verification for all inference connections (overrides `inference_ca_cert`). Env: `CUGA_DISABLE_SSL`. |
+
+---
+
+## Observability
+
+Optional OpenLit / OpenTelemetry observability for LLM calls.
+
+```toml
+[observability]
+openlit = false
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `openlit` | Boolean | `false` | Enable OpenLit LLM observability via OpenTelemetry (OTLP). Requires `pip install cuga[observability]`. Configure the OTLP endpoint via `OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318`. |
+
+A local testing stack (OTel Collector + Tempo + Prometheus + Grafana) is provided under `deployment/docker-compose/openlit/` in the cuga-agent repo.
+
+---
+
+## Evolve
+
+Optional integration with [altk-evolve](https://pypi.org/project/altk-evolve/) for trajectory-based learning in CugaLite.
+
+```toml
+[evolve]
+enabled = true
+url = "http://127.0.0.1:8201/sse"
+mode = "auto"
+app_name = "evolve"
+lite_mode_only = true
+save_on_success = true
+save_on_failure = true
+async_save = true
+timeout = 30.0
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `enabled` | Boolean | `true` | Master toggle for Evolve integration. |
+| `url` | String | `"http://127.0.0.1:8201/sse"` | SSE endpoint of a manually run Evolve MCP server (used when `mode = "direct"` or as a fallback in `"auto"`). |
+| `mode` | String | `"auto"` | `"auto"` = registry first then direct SSE fallback; `"registry"` = registry only; `"direct"` = direct SSE only. |
+| `app_name` | String | `"evolve"` | MCP app/server name when Evolve is managed by the CUGA registry. |
+| `lite_mode_only` | Boolean | `true` | Only activate Evolve for CugaLite mode. |
+| `save_on_success` | Boolean | `true` | Save trajectory on successful task completion. |
+| `save_on_failure` | Boolean | `true` | Save trajectory on failed task completion. |
+| `async_save` | Boolean | `true` | Save trajectories in the background (non-blocking). |
+| `timeout` | Float (seconds) | `30.0` | Timeout for Evolve MCP calls. |
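+
+For a manually run Evolve server, skipping registry discovery, a minimal sketch:
+
+```toml
+[evolve]
+enabled = true
+mode = "direct"                    # direct SSE only, no registry lookup
+url = "http://127.0.0.1:8201/sse"
+```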
+
+---
+
## Configuration Examples
### Fast Development Setup
@@ -287,14 +648,12 @@ For E2B cloud sandbox, configure registry exposure:
```toml
[features]
cuga_mode = "fast"
-memory_provider = "mem0"
[advanced_features]
use_vision = true
code_planner_enabled = true
api_planner_hitl = false
lite_mode = true
-enable_memory = false
mode = 'api'
[server_ports]
@@ -307,23 +666,38 @@ registry = 8001
```toml
[features]
cuga_mode = "accurate"
-memory_provider = "mem0"
[advanced_features]
use_vision = true
code_planner_enabled = true
api_planner_hitl = true # Require approval for critical actions
lite_mode = true
-enable_memory = true # Learn from experience
langfuse_tracing = true # Full observability
mode = 'api'
message_window_limit = 200
max_input_length = 10000
+[auth]
+enabled = true
+authorization_enabled = true
+require_https = true
+
+[secrets]
+mode = "vault"
+vault_addr = "https://vault.example.com:8200"
+vault_auth_method = "kubernetes"
+vault_k8s_role = "cuga"
+
+[storage]
+mode = "prod"
+postgres_url = "postgresql+psycopg://user:pass@db:5432/cuga"
+
+[observability]
+openlit = true # OTLP-based tracing
+
[server_ports]
demo = 7860
registry = 8001
-memory = 8888
```
### E2B Cloud Execution Setup
@@ -331,7 +705,6 @@ memory = 8888
```toml
[features]
cuga_mode = "balanced"
-memory_provider = "mem0"
[advanced_features]
e2b_sandbox = true
@@ -351,10 +724,8 @@ function_call_host = "https://your-ngrok-url.ngrok.io" # E2B tunnel URL
```toml
[features]
cuga_mode = "save_reuse_fast"
-memory_provider = "mem0"
[advanced_features]
-enable_memory = true
save_reuse_generate_html = false # Disable for performance
decomposition_strategy = "flexible"
lite_mode = true
@@ -364,7 +735,6 @@ code_planner_enabled = true
demo = 7860
registry = 8001
saved_flows = 8003
-memory = 8888
```
### Web/Hybrid Mode Setup
@@ -375,7 +745,6 @@ start_url = "https://example.com"
[features]
cuga_mode = "balanced"
-memory_provider = "mem0"
[advanced_features]
mode = 'hybrid' # or 'web' for web-only
@@ -434,6 +803,53 @@ All environment variables that can be used to configure CUGA:
| `MAC_USER_DATA_PATH` | Chrome profile path on macOS | `~/Library/Application Support/Google/Chrome/AgentS` |
| `WINDOWS_USER_DATA_PATH` | Chrome profile path on Windows | `C:/Users//AppData/Local/Google/Chrome/User Data/AgentS` |
+#### Secrets & Vault
+
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `CUGA_SECRET_KEY` | Encryption key for secrets stored by CUGA (matches `[secrets].db_encryption_key_env`). | |
+| `VAULT_TOKEN` | Vault token (when `vault_auth_method = "token"`). | |
+| `VAULT_CACERT` | Path to PEM bundle for Vault TLS. | |
+| `VAULT_SKIP_VERIFY` | Disable Vault TLS verification (dev only). | `true` |
+| `DYNACONF_SECRETS__VAULT_AUTH_METHOD` | Override `[secrets].vault_auth_method` at runtime. | `kubernetes` |
+| `DYNACONF_SECRETS__VAULT_SECRET_PATH` | Override `[secrets].vault_secret_path`. | |
+
+#### Authentication (OIDC / BFF)
+
+| Variable | Description |
+|----------|-------------|
+| `OIDC_ISSUER` | OIDC issuer URL. |
+| `OIDC_CLIENT_ID` | OIDC client id. |
+| `OIDC_CLIENT_SECRET` | OIDC client secret. |
+| `OIDC_REDIRECT_URI` | Callback URL registered with the IdP. |
+| `OIDC_CA_BUNDLE` | Optional CA bundle for OIDC TLS (independent of `iam_proxy_ca_bundle`). |
+
+#### Service Identity
+
+| Variable | Description |
+|----------|-------------|
+| `DYNACONF_SERVICE__INSTANCE_ID` | Override `[service].instance_id` (e.g. K8s pod name). |
+| `DYNACONF_SERVICE__TENANT_ID` | Override `[service].tenant_id` for multi-tenant deployments. |
+
+#### TLS for Inference
+
+| Variable | Description |
+|----------|-------------|
+| `CUGA_INFERENCE_CA_CERT` | CA cert for OpenAI/LiteLLM HTTP clients (overrides `[connections].inference_ca_cert`). |
+| `CUGA_DISABLE_SSL` | Disable TLS verification for all inference connections (overrides `inference_disable_ssl`). |
+
+#### Storage
+
+| Variable | Description |
+|----------|-------------|
+| `CUGA_DBS_DIR` | Override `DBS_DIR` (default location for the local SQLite policy DB). |
+
+#### Observability
+
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP collector endpoint when `[observability].openlit = true`. | `http://localhost:4318` |
+
#### Server Ports (Dynaconf)
Use `DYNACONF_SERVER_PORTS__` to override port settings:
@@ -446,7 +862,8 @@ Use `DYNACONF_SERVER_PORTS__` to override port settings:
| `DYNACONF_SERVER_PORTS__EMAIL_MCP` | Email MCP server port | `8000` |
| `DYNACONF_SERVER_PORTS__EMAIL_SINK` | Email SMTP sink port | `1025` |
| `DYNACONF_SERVER_PORTS__FILESYSTEM_MCP` | File System MCP port | `8112` |
-| `DYNACONF_SERVER_PORTS__MEMORY` | Memory service port | `8888` |
+| `DYNACONF_SERVER_PORTS__DOCS_MCP` | Docs MCP port | `8113` |
+| `DYNACONF_SERVER_PORTS__OAK_HEALTH_API` | Health-demo OpenAPI port | `8090` |
#### Example .env File
@@ -508,7 +925,7 @@ kill -9
**Optimization Strategy**:
1. Enable `lite_mode = true` for simple tasks
-2. Reduce `message_window_limit` to 50 if using memory
+2. Reduce `message_window_limit` to 50 to keep prompts small
3. Disable `langfuse_tracing` unless needed
4. Use `fast` mode if accuracy isn't critical
5. Enable `save_reuse_fast` mode for repetitive tasks
diff --git a/content/docs/customization/storage.mdx b/content/docs/customization/storage.mdx
new file mode 100644
index 0000000..61e9943
--- /dev/null
+++ b/content/docs/customization/storage.mdx
@@ -0,0 +1,73 @@
+---
+title: Storage Backends
+description: Choose between local SQLite and production Postgres for policy and knowledge data.
+---
+
+import { Callout } from 'fumadocs-ui/components/callout';
+
+CUGA persists three things: **policies** (vectors + metadata), **knowledge documents** (vectors + metadata + uploaded files), and **knowledge tasks/settings**. The `[storage].mode` setting selects a single backend stack used for all three.
+
+## Modes
+
+```toml
+[storage]
+mode = "local" # "local" | "prod"
+local_db_path = "" # default DBS_DIR/cuga.db when empty
+postgres_url = "" # required when mode = "prod"
+```
+
+| Data | `local` | `prod` |
+|------|---------|--------|
+| Policy vectors | sqlite-vec at `[policy].policy_db_path` or `storage.local_db_path` (default `DBS_DIR/cuga.db`); table named after `[policy].collection_name`. | `storage.postgres_url` (pgvector). |
+| Knowledge vectors | `{knowledge.persist_dir}/knowledge_vectors.db` (vec0 tables per collection). | `storage.postgres_url` (same DB). |
+| Knowledge metadata (tasks, documents, collection_config, settings) | `{knowledge.persist_dir}/metadata.db`. Default `persist_dir` is `/.cuga/knowledge/`. | Postgres tables `cuga_knowledge_meta_*` on `storage.postgres_url`. Uploaded **files** still live under `persist_dir/files/`. |
+
+`DBS_DIR` defaults to the package's `dbs/` directory, or to the value of `CUGA_DBS_DIR` if set. `persist_dir` can be overridden in `knowledge_settings.toml`.
+
+## Local mode (default)
+
+Best for development, single-user demos, and small deployments.
+
+```toml
+[storage]
+mode = "local"
+```
+
+No external services required. SQLite + sqlite-vec keeps everything in a single file, so you can ship a working agent with `git`-cloneable state.
+
+## Production mode
+
+Best for shared deployments, multi-replica services, or anywhere you need transactional guarantees and proper backups.
+
+```toml
+[storage]
+mode = "prod"
+postgres_url = "postgresql+psycopg://cuga:secret@db.internal:5432/cuga"
+```
+
+Postgres must have the **pgvector** extension enabled:
+
+```sql
+CREATE EXTENSION IF NOT EXISTS vector;
+```
+
+CUGA creates the policy and knowledge tables on first startup. Uploaded knowledge files (the originals, not the parsed chunks) continue to live under `persist_dir/files/` — mount that directory on persistent storage in container deployments.
+
+## Embeddings
+
+Embeddings live in `[storage.embedding]` and are independent of the backend mode:
+
+```toml
+[storage.embedding]
+provider = "local" # "openai" | "local" | "auto"
+model = "BAAI/bge-small-en-v1.5"
+dim = 384 # 1536 for OpenAI, 384 for the BAAI model
+base_url = "" # optional OpenAI-compatible endpoint
+api_key = "" # falls back to OPENAI_API_KEY
+```
+
+`provider = "auto"` tries OpenAI and falls back to the local model if no API key is configured — handy when the same `settings.toml` ships across dev and prod.
+
+<Callout type="warn">
+Switching from `local` to `prod` does **not** migrate existing data. If you've been running with policies or knowledge documents in SQLite, export and re-import them in the new backend.
+</Callout>
+
diff --git a/content/docs/customization/ui-branding.mdx b/content/docs/customization/ui-branding.mdx
new file mode 100644
index 0000000..0d1459d
--- /dev/null
+++ b/content/docs/customization/ui-branding.mdx
@@ -0,0 +1,33 @@
+---
+title: UI Branding
+description: White-label the CUGA demo UI — hide the logo and change the displayed app name.
+---
+
+The demo UI exposes two simple branding hooks under the `[ui]` section of `settings.toml`. Use them when CUGA is embedded in a customer-facing product, evaluation environment, or internal tool that should carry your own branding.
+
+## Settings
+
+```toml
+[ui]
+hide_cuga_logo = false
+brand_name = "CUGA Agent"
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `hide_cuga_logo` | Boolean | `false` | Hide the CUGA logo in the demo header. |
+| `brand_name` | String | `"CUGA Agent"` | App name shown in the header. |
+
+Both options take effect on the next demo restart (`cuga start demo`, `cuga start demo_crm`, etc.).
+
+## Example
+
+```toml
+[ui]
+hide_cuga_logo = true
+brand_name = "Acme Assistant"
+```
+
+This produces a header with no CUGA mark and "Acme Assistant" in place of the default product name.
+
+For deeper UI customization (custom logos, themes, or full white-label builds), build the demo frontend from source — see the cuga-agent repo for the build instructions.
diff --git a/content/docs/getting-started/index.mdx b/content/docs/getting-started/index.mdx
index f568db0..54cc917 100644
--- a/content/docs/getting-started/index.mdx
+++ b/content/docs/getting-started/index.mdx
@@ -117,7 +117,7 @@ CUGA is still early, but already provides useful building blocks:
- **Python 3.12**: Core runtime environment
- **UV**: Modern Python package management
- **FastAPI**: High-performance web framework
- - **Selenium/Playwright**: Browser automation capabilities
+ - **Playwright**: Browser automation (with a Chromium-based extension for web/hybrid modes)
- **OpenAI/LiteLLM**: LLM integration for intelligent decision making
- **Docker**: Containerized deployment and evaluation
diff --git a/content/docs/sdk/cuga_agent.mdx b/content/docs/sdk/cuga_agent.mdx
index 3575a4c..733900e 100644
--- a/content/docs/sdk/cuga_agent.mdx
+++ b/content/docs/sdk/cuga_agent.mdx
@@ -61,7 +61,7 @@ agent = CugaAgent(
@@ -269,6 +274,34 @@ await compiled_graph.ainvoke(...)
Access the `PoliciesManager`. See [Policies](../policies).
+### `knowledge`
+
+Access the `KnowledgeManager` when the agent is constructed with `enable_knowledge=True` (the default).
+
+```python
+await agent.knowledge.ingest("/path/to/quarterly_report.pdf")
+results = await agent.knowledge.search("Q4 revenue figures")
+docs = await agent.knowledge.list_documents()
+```
+
+Both `ingest` and `search` accept a `scope` argument (`"agent"` for permanent, shared documents — the default — or `"session"` for thread-scoped documents that require a `thread_id`). See the [Knowledge Base guide](/docs/customization/knowledge) for the full surface, supported document types, and storage details.
+
+## Resource Cleanup
+
+### `aclose`
+
+Release async resources held by the agent (DB connections, background tasks, etc.). Call this at the end of long-running scripts or before process exit:
+
+```python
+agent = CugaAgent(enable_knowledge=True)
+try:
+ result = await agent.invoke("...")
+finally:
+ await agent.aclose()
+```
+
+Short-lived scripts can rely on garbage collection, but `aclose` is recommended any time `enable_knowledge=True` is used.
+
## Tool Call Tracking
CUGA provides built-in tool call tracking to help with debugging, observability, and auditing. When enabled, every tool invocation is recorded with detailed metadata.
diff --git a/content/docs/sdk/cuga_supervisor.mdx b/content/docs/sdk/cuga_supervisor.mdx
index a5c696f..1a0bb67 100644
--- a/content/docs/sdk/cuga_supervisor.mdx
+++ b/content/docs/sdk/cuga_supervisor.mdx
@@ -10,6 +10,16 @@ import { TypeTable } from 'fumadocs-ui/components/type-table';
The `CugaSupervisor` class coordinates multiple agents: it receives a user task, delegates work to specialized sub-agents, and returns a final answer. You can mix local `CugaAgent` instances with remote agents via the **A2A** protocol.
+## Try the Demo
+
+The bundled CRM + email multi-agent demo can be launched with:
+
+```bash
+cuga start demo_supervisor
+```
+
+This brings up the same demo surface as `demo_crm` but with the supervisor wired to a CRM sub-agent and an email sub-agent. Use it to see delegation and variable-passing end to end before building your own configuration.
+
## Quick Start