Demo runbook — "Powering AI Apps and Agents at Scale with Azure App Platform"

Copy/paste-ready presenter script for Demos 1–7. Resource names assume AZURE_ENV_NAME=demo in the eastus2 region. Substitute the values printed by azd env get-values after the first azd up.

Looking for just Demos 2, 5, 7? Use DEMO-RUNBOOK-CORE.md.

Pre-show

All one-time setup (auth, azd up, GitHub issue, OIDC, Foundry extension, SRE Agent, AKS + KAITO + ingress) lives in PRE-SHOW.md. Each step there is annotated with which demo needs it; for the full runbook you need every step.


Demo 1 — GitHub Copilot (VS Code, agent mode) generates lint.yml (≈2 min)

  1. In VS Code, open the Copilot Chat side panel and switch the mode dropdown to Agent. Confirm the workspace is agentic-devops-demo so the agent can read/write files directly.
  2. Send this prompt:
    Create a GitHub Actions workflow at .github/workflows/lint.yml that runs ruff
    on apps/api and apps/tools on PRs to main. Use Python 3.12, matrix over the
    two services, and pip install -e .[dev] inside each service directory.
    
  3. Watch the agent create .github/workflows/lint.yml in the editor. Accept the change, then commit and push from the integrated terminal:
    git add .github/workflows/lint.yml
    git commit -m "ci: add lint workflow"
    git push
  4. Open the PR check tab on GitHub — green.

Why this is safe: purely additive; nothing else depends on this workflow.

Why agent mode (not chat): zero copy/paste. The agent writes the file, runs the commit, and you narrate. Faster on stage and shows off the newer Copilot UX.


Demo 2 — The agent is code, not clicks (≈3 min)

Framing: "I'm not going to build an agent in a portal — that's not the point of this talk. The point is: when an agent goes to production, it has to live in the same place every other piece of your system lives — in Git, behind a pipeline."

  1. Open the Foundry / AI Toolkit VS Code extension. Connect to the project provisioned by azd up. Show:

    • The gpt-4o-mini model deployment (from Bicep).
    • The clinical-trial-matcher agent already exists, with versions.

    (Beat: "I never opened a portal. This came from azd up.")

  2. Open three files side-by-side and walk them top-down:

    | File | What it owns |
    |---|---|
    | .foundry/agent-metadata.yaml | Declarative agent: name, model, instructions file, OpenAPI tool. |
    | infra/modules/foundry-connection.bicep | The clinical_trial_matcher project connection that injects x-api-key on every tool call. |
    | infra/scripts/sync_agent.py | Reconciles the YAML into Foundry via AIProjectClient.agents.create_version, binding the connection as the OpenAPI auth. Runs as the azd postdeploy hook. |
  3. (Optional 30s beat — only if it lands) In the extension, expand the agent's versions list. Point at the latest version's tool definition — read-only, but it proves the YAML actually shaped what's running.

  4. Live edit → PR → CI → new agent version (the punchline). Kick this off first, then narrate steps 1–3 while CI runs. Goal: same question to the running app produces visibly different behaviour by the end of the demo, triggered only by a Git diff.

    1. In a terminal, before going on stage, set the patient query you'll use twice. Pick one and say it out loud both times for the audience:

      "Find a stage 2 lung cancer trial."

      With today's prompt the agent asks for age/sex/location first. That's the "before."

    2. Open .foundry/prompts/system.md and make two paste-in-place edits.

      Edit A — search-first behaviour. Find the line that starts with 1. The user provides a free-text patient description... and replace that single numbered item with this block (keeps numbering intact):

      1. The user provides a free-text patient description plus structured fields (age, sex,
         primary condition, optional location).
         - If the user provides only a condition, call `search_trials` immediately with
         sensible defaults (limit=3) and present matches first.
         - Ask follow-up questions about age/sex/location *after* showing the initial
         results, to refine.

      Edit B — clinical disclaimer. In the ## Style and safety section, find the line - Be concise. Lists, not paragraphs. and add this new bullet directly under it:

      - When you mention a specific trial, end the response with a single-line italic disclaimer: *Informational only — eligibility must be confirmed by the trial site.*
    3. Reconcile the change into Foundry (~5s). This calls the same infra/scripts/sync_agent.py that the azd postdeploy hook runs in CI — just locally and without waiting on the pipeline. First run takes ~20s while it builds a venv; subsequent runs are ~5s.

      ./demo2-sync.sh

      Watch for the final line: Created agent version: name=clinical-trial-matcher version=N.

    4. Refresh the agent in the Foundry/AI Toolkit extension — the version list ticks up by one. Ask the running app the same question again. The answer now goes straight to trial cards and ends with the disclaimer. No portal clicks.

    (Stage line as the new behaviour appears: "Two product asks — 'don't gate answers behind demographics' and 'add a clinical disclaimer.' In a portal, that's a meeting and a ticket. Here, it's a PR.")

    5. Optional: close the GitOps loop after the live demo. Commit the same file change, push, merge. CI re-runs sync_agent.py, producing one more version. You don't need this on stage; it's the proof point for "the local script and the CI pipeline run identical code."
      git checkout -b demo2-prompt-update
      git add .foundry/prompts/system.md
      git commit -m "agent: search-first behaviour + clinical disclaimer"
      git push -u origin demo2-prompt-update
      gh pr create --fill --base main && gh pr merge --squash --auto
  5. Make the limitation explicit, then turn it into the punchline:

    "The extension can't author the OpenAPI tool wiring — there's no picker for it today. That's fine. I don't want my agents authored in a UI anyway. Demo 4 will show this YAML get reconciled by GitHub Actions on every PR merge — same pipeline as the rest of the app."

Narrative beat: "Low-code is great for exploration. Production demands a Git SHA, a pipeline, and a reproducible deploy. The agent gets the same treatment as the API container."
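
The reconcile step described in Demo 2 can be pictured roughly as below. This is a minimal sketch, not the repo's sync_agent.py: the metadata keys (`name`, `model`, `instructions_file`), the helpers `build_definition`, `reconcile`, and `demo`, and the injected `create_version` callable are all hypothetical. The real script calls AIProjectClient.agents.create_version, whose exact arguments are not reproduced here.

```python
import tempfile
from pathlib import Path
from typing import Callable


def build_definition(metadata: dict, repo_root: Path) -> dict:
    """Combine declarative metadata with the prompt file it points at."""
    prompt_path = repo_root / metadata["instructions_file"]  # hypothetical key
    return {
        "name": metadata["name"],
        "model": metadata["model"],
        "instructions": prompt_path.read_text(encoding="utf-8"),
    }


def reconcile(metadata: dict, repo_root: Path,
              create_version: Callable[[dict], int]) -> str:
    """Push the committed definition and report the version that was created.

    `create_version` stands in for the SDK call; it receives the definition
    and returns the new version number.
    """
    definition = build_definition(metadata, repo_root)
    version = create_version(definition)
    return f"Created agent version: name={definition['name']} version={version}"


def demo() -> str:
    """Run the sketch against a throwaway directory and a fake backend."""
    root = Path(tempfile.mkdtemp())
    (root / "system.md").write_text("Be concise.", encoding="utf-8")
    metadata = {
        "name": "clinical-trial-matcher",
        "model": "gpt-4o-mini",
        "instructions_file": "system.md",
    }
    return reconcile(metadata, root, create_version=lambda definition: 7)
```

Injecting the backend call keeps the local run and the pipeline run on the same code path, which is the property the demo relies on.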


Demo 3 — GitHub Coding Agent fixes the /version issue (≈3 min)

  1. Open the pre-staged issue. Assign to Copilot.
  2. Switch to a different slide while the agent works (~2–3 min).
  3. Return; review the PR; merge.

Demo 4 — GitHub Actions deploys to Azure Container Apps (≈3 min)

The merge in Demo 3 triggered deploy.yml. While it runs:

  1. Open the Actions tab. Walk through the steps:
    • OIDC login (no secrets, federated credential).
    • azd up (idempotent — Bicep is a no-op on already-provisioned resources).
    • Postdeploy hook runs infra/scripts/sync_agent.py which calls AIProjectClient.agents.create_version to upsert the agent from the committed .foundry/agent-metadata.yaml.
  2. When green, hit the frontend URL. Send a real chat — agent responds via Foundry.
  3. Show the new /version endpoint via the watch loop staged in pre-show step 7 (already running in a side terminal):
    ./demo4-version-watch.sh
    The git_sha it prints flips from (no response) to the merge commit from Demo 3 — proof, not promises.
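
For presenters who want to see what a watch loop like this does, here is a minimal sketch. It is not the contents of demo4-version-watch.sh. It assumes the /version endpoint returns a JSON object with a `git_sha` field; the function names and the polling interval are illustrative.

```python
import json
import time
import urllib.request
from typing import Callable, List, Optional


def fetch_git_sha(url: str, timeout: float = 3.0) -> Optional[str]:
    """Return git_sha from the endpoint, or None when nothing usable comes back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("git_sha")
    except (OSError, ValueError, AttributeError):
        # Connection errors, HTTP errors, invalid JSON, or a non-object body.
        return None


def watch(fetch: Callable[[], Optional[str]], polls: int,
          interval: float = 2.0) -> List[str]:
    """Poll `polls` times; record one display line each time the value changes."""
    lines: List[str] = []
    last: object = object()  # sentinel so the first observation is always recorded
    for i in range(polls):
        sha = fetch()
        if sha != last:
            lines.append(sha or "(no response)")
            last = sha
        if i < polls - 1:
            time.sleep(interval)
    return lines
```

A call such as `watch(lambda: fetch_git_sha(url), polls=60)` would record `(no response)` until the new revision serves traffic, then the deployed SHA.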

Demo 5 (optional) — ACA scales under load (≈4 min)

  1. Show current replicas:
    az containerapp replica list -g <rg> -n adgd-api-<token> -o table
  2. Run k6 from your laptop (wraps the FRONTEND_URL lookup + k6 run):
    ./load.sh
  3. Re-run replica list every 30s. Watch the count grow to 10, then settle back to 1.
  4. Show the Container App Metrics blade: Replica Count + Requests.
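
If re-running the command by hand is awkward on stage, the replica count can be read programmatically. The sketch below assumes `-o json` returns a JSON array with one object per replica; `count_replicas` and `current_replica_count` are hypothetical helper names, and the resource group and app name must come from your own environment.

```python
import json
import subprocess


def count_replicas(az_json: str) -> int:
    """Count replicas in the JSON array printed by `az containerapp replica list -o json`."""
    return len(json.loads(az_json))


def current_replica_count(resource_group: str, app_name: str) -> int:
    """Shell out to the Azure CLI (must be installed and logged in) and count replicas."""
    completed = subprocess.run(
        ["az", "containerapp", "replica", "list",
         "-g", resource_group, "-n", app_name, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return count_replicas(completed.stdout)
```

Keeping the parsing separate from the CLI call makes the counting logic checkable without an Azure subscription.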

Demo 6 — Azure SRE Agent diagnoses a memory leak (≈6 min)

Why pre-stage terminals? ACA auto-replaces OOM'd replicas, so the portal's Replicas blade almost always shows N/N healthy, and there is no RESTARTS counter like the one kubectl get pods prints. Pre-stage these so the failure is visible.

Pre-stage terminals (open before the demo, split-screen on stage)

In the raw az and KQL snippets below, substitute <rg>, <api>, and <rev> with the values from azd env get-values (e.g. rg-aullah-agentic-devops, adgd-api-3wnyg3nk2w76m, adgd-api-3wnyg3nk2w76m--0000003).

All shell scripts auto-resolve RG / API_NAME / FRONTEND_URL from azd env get-values via demo6-env.sh — no copy/pasting resource names mid-demo.

  1. Prep (run ~2 min before the recording, NOT during) — caps the API to maxReplicas=2 and waits for the new revision to become Active and the replica count to settle. Without this, demo6.sh's load briefly runs against the old config (maxReplicas=10) and the OOM signal hides in the per-replica average:
    ./demo6-prep.sh
  2. Replica churn — names rotate and created timestamps reset on every OOMKill, even though the count stays at N/N. In its own pane:
    ./demo6-watch.sh
  3. System log stream — the crispest "things are dying" signal; you'll see OOMKilled and Replica … has been provisioned lines scroll by. In its own pane:
    ./demo6-logs.sh
  4. Container App → Metrics blade — pin three on one chart, last 30 min:
    • Replica Restart Count (step-increments on every OOMKill — the money graph for the audience)
    • Memory Working Set Bytes, split by Replica (sawtooth = leak + restart)
    • Replica Count
  5. (Optional) Log Analytics → Logs blade, pre-loaded with this query so you can hit Run live:
    ContainerAppSystemLogs_CL
    | where ContainerAppName_s == "<api>"
    | where TimeGenerated > ago(15m)
    | where Reason_s in ("OOMKilled", "BackOff", "Unhealthy", "Killing")
    | project TimeGenerated, ReplicaName_s, Reason_s, Log_s
    | order by TimeGenerated desc
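
The replica-churn signal from step 2 comes down to comparing two snapshots of replica names: the count can stay flat while the membership changes. A minimal sketch of that comparison follows. It is not the contents of demo6-watch.sh; the function names and the replica names in the usage note are made up.

```python
from typing import Dict, List, Set


def diff_replicas(previous: Set[str], current: Set[str]) -> Dict[str, List[str]]:
    """Report which replica names disappeared and which appeared between snapshots."""
    return {
        "gone": sorted(previous - current),
        "new": sorted(current - previous),
    }


def churned(previous: Set[str], current: Set[str]) -> bool:
    """True when membership changed, even if the replica count did not."""
    return previous != current
```

For example, going from `{"api--r1-abc", "api--r1-def"}` to `{"api--r1-abc", "api--r1-xyz"}` keeps the count at 2 but reports one replica gone and one new, which is the pattern an OOMKill leaves behind.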

Demo flow

  1. Kick off the leak scenario — sets ENABLE_MEMORY_LEAK=true, then runs k6 run load/k6-leak.js for 10 min. Assumes ./demo6-prep.sh has already capped maxReplicas=2 (it sanity-checks this and warns if not):
    ./demo6.sh
    Within 1–2 min the pre-staged terminals (./demo6-watch.sh and ./demo6-logs.sh) will start showing OOMKills, replica churn, and the Replica Restart Count metric stepping up.
  2. After ~3 min, show the unhealthy revision:
    az containerapp revision show -g <rg> -n adgd-api-<token> --revision <rev> -o jsonc
  3. Open Azure SRE Agent scoped to the resource group. Ask it to investigate the API container app. It should surface:
    • Restart loop on the API revision.
    • Memory growth pattern (the [demo-leak] buffer holds N chunks warnings).
    • Correlated env-var change (ENABLE_MEMORY_LEAK=true).
    • Recommend rolling back the env var.
  4. Apply the fix — disables the leak and restores --max-replicas 5:
    ./demo6-fix.sh
  5. Watch healthy replicas come back up in the ./demo6-watch.sh pane.
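
To explain the leak to the audience, it can help to show how small such a fault is in code. The sketch below is an assumption about the general shape of an env-var-gated leak, not the API's actual implementation: `maybe_leak`, the module-level buffer, and the 1 MiB chunk size are hypothetical. Only the ENABLE_MEMORY_LEAK variable and the log message format come from this runbook.

```python
import os
from typing import List, Optional

# Module-level list: nothing ever removes entries, so memory only grows.
_LEAK_BUFFER: List[bytes] = []


def maybe_leak(chunk_bytes: int = 1024 * 1024) -> Optional[str]:
    """Retain one more chunk per call when the flag is on; return the warning text."""
    if os.environ.get("ENABLE_MEMORY_LEAK", "").lower() != "true":
        return None
    _LEAK_BUFFER.append(bytes(chunk_bytes))
    return f"[demo-leak] buffer holds {len(_LEAK_BUFFER)} chunks"
```

Called once per request under load, a function like this grows the working set until the container hits its memory limit, is OOMKilled, and restarts with an empty buffer. That cycle is what draws the sawtooth on the metrics chart.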

Demo 7 — Trial Watch on AKS: in-cluster model + stateful watcher (≈4 min)

One-line pitch: "ACA gave us request/response. AKS gives us a long-running, stateful inference loop with an in-cluster open-source model — that's the AKS-shaped workload."

What's actually running

| Component | Where | Why on AKS |
|---|---|---|
| frontend (nginx + React) | Deployment, 2 replicas | identical to ACA |
| api (FastAPI) | Deployment, HPA 2→20 | adds /api/watches/* and /api/trials |
| redis (single replica, no PVC) | Deployment | shared state for watches + results |
| watcher (FastAPI loop) | Deployment | ticks every 45s (the long-running workload) |
| tools (FastAPI search) | Deployment, 1 replica | in-memory trial dataset for the demo |
| workspace-llama-3-3b | KAITO Workspace, CPU node apps=llama-3-3b | Llama-3.2-3B Instruct (Q4_K_M GGUF via aikit/llama.cpp), OpenAI-compatible /v1 |

Flow: POST /api/watches → Redis → watcher tick reads watches → tools search → in-cluster Llama scores each trial → results back to Redis → GET /api/watches/stream (SSE) pushes updates to UI.
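
One watcher tick from the flow above can be sketched as follows. This is an illustration under stated assumptions, not the watcher's source: a plain dict stands in for Redis, and `search` and `score` are injected callables standing in for the tools service and the in-cluster Llama. The log line format mirrors the one shown in the on-stage script.

```python
from typing import Callable, Dict, List


def tick(store: Dict[str, dict],
         search: Callable[[str], List[dict]],
         score: Callable[[dict, dict], int]) -> List[str]:
    """Run one pass: read watches, search trials, score them, write results back."""
    log: List[str] = []
    for watch_id, watch in store["watches"].items():
        results = store["results"].setdefault(watch_id, {})
        for trial in search(watch["query"]):
            prev = results.get(trial["id"])  # None the first time a trial is seen
            new_score = score(watch, trial)
            results[trial["id"]] = new_score
            log.append(
                f"match watch={watch_id} trial={trial['id']} "
                f"score={new_score} prev={prev} new={prev is None}"
            )
    return log
```

Because results persist between ticks, the second pass over the same trial reports `new=False`, which is why an injected trial shows a NEW pill only once.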

On-stage script

  1. Open the AKS ingress (http://<lb-ip>.nip.io/) → click the 🔭 Trial Watch tab. You'll see three seeded watches (Aunt Helen NSCLC, Patient B melanoma, cohort screening) with score badges, tier labels, and a green "live" pill — that's the SSE stream.
  2. Show what makes this AKS-shaped — flip to a terminal:
    kubectl -n trial-matcher get pods -l app=watcher
    kubectl -n trial-matcher logs deploy/watcher --tail=20
    Point at the lines:
    watcher tick start watches=3
    HTTP Request: POST http://tools:8000/tools/search_trials "200 OK"
    HTTP Request: POST http://workspace-llama-3-3b/v1/chat/completions "200 OK"
    match watch=demo-w1 trial=TM-2025-001 score=60 prev=60 new=False
    
    "Every 45 seconds, in-cluster. No Foundry call here — that's our open-source Llama running on a CPU node next door."
  3. Inject a fresh trial live. Back in the UI, click + Import trial → fill with NSCLC example → Add trial. A toast appears: Added trial TM-DEMO-XXXXXX — next watcher tick will pick it up and emit a NEW pill.
  4. Wait one tick (~45s). The Aunt Helen card should:
    • flash an indigo pulse border (recent-update animation),
    • gain a 1 NEW aggregate badge in the header,
    • show the new trial result with a NEW pill and an emerald score badge (≈80–100, "Strong match" — the watcher pre-computes age/sex/location verdicts in Python and feeds them to the 3B model as ground truth, then floors the score at 80 when every hard check passes). If the score later changes on a subsequent tick, a ▲N / ▼N delta badge appears.
  5. Closing line: "Same containers, same pipeline, same agent — plus a stateful inference loop and an open-source model that lives inside the cluster. ACA for the fastest path. AKS when you need the ceiling."
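
The score floor and delta badge described in step 4 reduce to a few lines of arithmetic. The sketch below assumes the hard checks arrive as a name-to-boolean mapping and that the floor is exactly 80. The function names are hypothetical and the watcher's real code may differ.

```python
from typing import Dict, Optional


def final_score(model_score: int, hard_checks: Dict[str, bool], floor: int = 80) -> int:
    """Raise the model's score to the floor only when every hard check passes."""
    if hard_checks and all(hard_checks.values()):
        return max(model_score, floor)
    return model_score


def delta_badge(prev: Optional[int], current: int) -> str:
    """Return a badge such as ▲20 or ▼5; empty when there is no change to show."""
    if prev is None or prev == current:
        return ""
    arrow = "▲" if current > prev else "▼"
    return f"{arrow}{abs(current - prev)}"
```

With age, sex, and location all passing, a model score of 55 becomes 80, while 92 stays 92. If any check fails, the model's score is used as is.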

If something goes sideways

| Symptom | Fix |
|---|---|
| + Import trial returns 502 | kubectl -n trial-matcher rollout restart deploy/tools (single-worker; pod must be healthy) |
| Tick runs but no NEW pill | Check kubectl logs deploy/watcher and confirm trials_found increased; the new trial id only matches when the watch's search terms hit the title/condition |
| SSE pill says "stalled" | kubectl -n trial-matcher rollout restart deploy/api |
| Llama 5xx | kubectl get workspace workspace-llama-3-3b must report Ready; the GGUF runtime is single-replica on the labeled node |

Reset Trial Watch state between rehearsals

    # Drop watch results + injected trials; restart watcher and tools:
    kubectl -n trial-matcher exec deploy/redis -- redis-cli FLUSHDB
    kubectl -n trial-matcher rollout restart deploy/watcher deploy/tools

The three demo seeds (demo-w1/w2/w3) are re-created by the watcher on next start.


Reset between rehearsals

    # Disable the leak and restore max-replicas in one shot:
    ./demo6-fix.sh

    # Clean teardown:
    azd down --purge --force