Copy/paste-ready presenter script for Demos 1–7. Resource names assume
`AZURE_ENV_NAME=demo` in the `eastus2` region. Substitute the values printed by `azd env get-values` after the first `azd up`.

Looking for just Demos 2, 5, 7? Use DEMO-RUNBOOK-CORE.md.

All one-time setup (auth, azd up, GitHub issue, OIDC, Foundry extension,
SRE Agent, AKS + KAITO + ingress) lives in PRE-SHOW.md.
Each step there is annotated with which demo needs it; for the full runbook
you need every step.
- In VS Code, open the Copilot Chat side panel and switch the mode dropdown
  to Agent. Confirm the workspace is `agentic-devops-demo` so the agent can
  read/write files directly.
- Send this prompt:

      Create a GitHub Actions workflow at .github/workflows/lint.yml that runs ruff on apps/api and apps/tools on PRs to main. Use Python 3.12, matrix over the two services, and pip install -e .[dev] inside each service directory.

- Watch the agent create `.github/workflows/lint.yml` in the editor. Accept the
  change, then commit and push from the integrated terminal:

      git add .github/workflows/lint.yml
      git commit -m "ci: add lint workflow"
      git push

- Open the PR check tab on GitHub and confirm it is green.

Why this is safe: purely additive; nothing else depends on this workflow.

Why agent mode (not chat): zero copy/paste. The agent writes the file, runs the commit, and you narrate. Faster on stage and shows off the newer Copilot UX.

Framing: "I'm not going to build an agent in a portal — that's not the point of this talk. The point is: when an agent goes to production, it has to live in the same place every other piece of your system lives — in Git, behind a pipeline."
- Open the Foundry / AI Toolkit VS Code extension. Connect to the project provisioned by `azd up`. Show:
  - The `gpt-4o-mini` model deployment (from Bicep).
  - The `clinical-trial-matcher` agent already exists, with versions.
    (Beat: "I never opened a portal. This came from `azd up`.")
- Open three files side-by-side and walk them top-down:

  | File | What it owns |
  |---|---|
  | `.foundry/agent-metadata.yaml` | Declarative agent: name, model, instructions file, OpenAPI tool. |
  | `infra/modules/foundry-connection.bicep` | The `clinical_trial_matcher` project connection that injects `x-api-key` on every tool call. |
  | `infra/scripts/sync_agent.py` | Reconciles the YAML into Foundry via `AIProjectClient.agents.create_version`, binding the connection as the OpenAPI auth. Runs as the `azd` postdeploy hook. |

- (Optional 30s beat, only if it lands) In the extension, expand the agent's versions list. Point at the latest version's tool definition. It is read-only, but it proves the YAML actually shaped what's running.
- Live edit → PR → CI → new agent version (the punchline). Kick this off first, then narrate steps 1–3 while CI runs. Goal: the same question to the running app produces visibly different behaviour by the end of the demo, triggered only by a Git diff.
  - In a terminal, before going on stage, set the patient query you'll use twice. Pick one and say it out loud both times for the audience:

    "Find a stage 2 lung cancer trial."

    With today's prompt the agent asks for age/sex/location first. That's the "before."
  - Open `.foundry/prompts/system.md` and make two paste-in-place edits.

    Edit A (search-first behaviour): find the line that starts with
    `1. The user provides a free-text patient description...` and replace that
    single numbered item with this block (keeps numbering intact):

        1. The user provides a free-text patient description plus structured fields (age, sex, primary condition, optional location).
           - If the user provides only a condition, call `search_trials` immediately with sensible defaults (limit=3) and present matches first.
           - Ask follow-up questions about age/sex/location *after* showing the initial results, to refine.

    Edit B (clinical disclaimer): in the `## Style and safety` section, find the
    line `- Be concise. Lists, not paragraphs.` and add this new bullet directly
    under it:

        - When you mention a specific trial, end the response with a single-line italic disclaimer: *Informational only — eligibility must be confirmed by the trial site.*

  - Reconcile the change into Foundry (~5s). This calls the same `infra/scripts/sync_agent.py` that the `azd postdeploy` hook runs in CI, just locally and without waiting on the pipeline. The first run takes ~20s while it builds a venv; subsequent runs are ~5s.

        ./demo2-sync.sh

    Watch for the final line:

        Created agent version: name=clinical-trial-matcher version=N

  - Refresh the agent in the Foundry/AI Toolkit extension; the version list ticks up by one. Ask the running app the same question again. The answer now goes straight to trial cards and ends with the disclaimer. No portal clicks.

    (Stage line as the new behaviour appears: "Two product asks: 'don't gate answers behind demographics' and 'add a clinical disclaimer.' In a portal, that's a meeting and a ticket. Here, it's a PR.")
  - Optional: close the GitOps loop after the live demo. Commit the same file change, push, merge. CI re-runs `sync_agent.py`, producing one more version. You don't need this on stage; it's the proof point for "the local script and the CI pipeline run identical code."

        git checkout -b demo2-prompt-update
        git add .foundry/prompts/system.md
        git commit -m "agent: search-first behaviour + clinical disclaimer"
        git push -u origin demo2-prompt-update
        gh pr create --fill --base main && gh pr merge --squash --auto

- Make the limitation explicit, then turn it into the punchline:

  "The extension can't author the OpenAPI tool wiring; there's no picker for it today. That's fine. I don't want my agents authored in a UI anyway. Demo 4 will show this YAML get reconciled by GitHub Actions on every PR merge, the same pipeline as the rest of the app."

Narrative beat: "Low-code is great for exploration. Production demands a Git SHA, a pipeline, and a reproducible deploy. The agent gets the same treatment as the API container."
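
The reconcile step that `sync_agent.py` performs can be pictured roughly as below. This is a minimal sketch, not the repository's script: the metadata keys (`name`, `model`), the `build_definition` helper, the `tool_auth_connection` field, and the injected `create_version` callable are all assumptions for illustration. The real script calls `AIProjectClient.agents.create_version`, whose exact arguments are not reproduced here.

```python
from typing import Callable, Dict


def build_definition(metadata: Dict[str, str], instructions: str, connection_id: str) -> dict:
    """Turn declarative metadata into one agent definition (hypothetical shape)."""
    return {
        "name": metadata["name"],
        "model": metadata["model"],
        "instructions": instructions,
        # Assumed field: the project connection that supplies x-api-key on tool calls.
        "tool_auth_connection": connection_id,
    }


def sync_agent(
    metadata: Dict[str, str],
    instructions: str,
    connection_id: str,
    create_version: Callable[[dict], int],
) -> str:
    """Create a new agent version from the committed files and report it."""
    definition = build_definition(metadata, instructions, connection_id)
    # create_version is a stand-in for the Foundry SDK call; it returns the new version number.
    version = create_version(definition)
    return f"Created agent version: name={definition['name']} version={version}"
```

Because the SDK call is injected, the same function can run from a local script and from a pipeline hook, which is the property this demo leans on.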
- Open the pre-staged issue. Assign to Copilot.
- Switch to a different slide while the agent works (~2–3 min).
- Return; review the PR; merge.
The merge in Demo 3 triggered deploy.yml. While it runs:
- Open the Actions tab. Walk through the steps:
  - OIDC login (no secrets, federated credential).
  - `azd up` (idempotent: Bicep is a no-op on already-provisioned resources).
  - Postdeploy hook runs `infra/scripts/sync_agent.py`, which calls `AIProjectClient.agents.create_version` to upsert the agent from the committed `.foundry/agent-metadata.yaml`.
- When green, hit the frontend URL. Send a real chat — agent responds via Foundry.
- Show the new `/version` endpoint via the watch loop staged in pre-show step 7 (already running in a side terminal):

      ./demo4-version-watch.sh

  The `git_sha` it prints flips from `(no response)` to the merge commit from Demo 3: proof, not promises.
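
A watch loop of this kind could look roughly like the sketch below. It is not the contents of `demo4-version-watch.sh`. It assumes `/version` returns a JSON object with a `git_sha` field, and the `fetch_git_sha` and `watch` names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, List, Optional


def fetch_git_sha(url: str, timeout: float = 3.0) -> Optional[str]:
    """Return git_sha from the endpoint, or None when there is no usable response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8")).get("git_sha")
    except (OSError, ValueError, AttributeError):
        # Connection errors, HTTP errors, bad JSON, or a non-object payload.
        return None


def watch(
    url: str,
    fetch: Callable[[str], Optional[str]] = fetch_git_sha,
    interval: float = 2.0,
    max_polls: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll the endpoint and print a line each time the reported value changes."""
    seen: List[str] = []
    for _ in range(max_polls):
        sha = fetch(url) or "(no response)"
        if not seen or seen[-1] != sha:
            seen.append(sha)
            print(sha)
        sleep(interval)
    return seen
```

Printing only on change keeps the side terminal quiet until the deploy lands, so the flip is easy to spot from the back of the room.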
- Show current replicas:

      az containerapp replica list -g <rg> -n adgd-api-<token> -o table

- Run k6 from your laptop (wraps the `FRONTEND_URL` lookup + `k6 run`):

      ./load.sh

- Re-run `replica list` every 30s. Watch count grow → 10 → settle back to 1.
- Show the Container App Metrics blade: Replica Count + Requests.
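
If re-running the command by hand is awkward on stage, the polling can be scripted. The sketch below shells out to the same `az containerapp replica list` command with JSON output and records the replica count per poll. The function names and the injectable `fetch` and `sleep` parameters are assumptions added so the loop can be exercised without Azure access.

```python
import json
import subprocess
import time
from typing import Callable, List


def list_replicas(resource_group: str, app_name: str) -> List[dict]:
    """Return the parsed replica list from the Azure CLI."""
    out = subprocess.run(
        ["az", "containerapp", "replica", "list",
         "-g", resource_group, "-n", app_name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def replica_counts(
    resource_group: str,
    app_name: str,
    polls: int,
    interval: float = 30.0,
    fetch: Callable[[str, str], List[dict]] = list_replicas,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Poll the replica list and return the count seen on each poll."""
    counts: List[int] = []
    for i in range(polls):
        counts.append(len(fetch(resource_group, app_name)))
        print(f"poll {i + 1}: {counts[-1]} replicas")
        if i < polls - 1:
            sleep(interval)  # no wait after the final poll
    return counts
```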
Why pre-stage terminals? ACA auto-replaces OOM'd replicas, so the portal's Replicas blade almost always shows N/N healthy; there's no `RESTARTS` counter like `kubectl get pods`. Pre-stage these so the failure is visible.

Substitute <rg>, <api>, and <rev> with the values from azd env get-values
(e.g. rg-aullah-agentic-devops, adgd-api-3wnyg3nk2w76m,
adgd-api-3wnyg3nk2w76m--0000003).
All shell scripts auto-resolve `RG` / `API_NAME` / `FRONTEND_URL` from `azd env get-values` via demo6-env.sh, so there is no copy/pasting of resource names mid-demo.
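
The idea behind that auto-resolution can be sketched as follows. This is not the contents of demo6-env.sh. It assumes `azd env get-values` prints one `KEY="value"` pair per line, and the helper names are hypothetical. Which azd keys the real script maps to `RG` and `API_NAME` is not stated here, so the sketch only parses.

```python
import subprocess
from typing import Dict


def parse_env_values(text: str) -> Dict[str, str]:
    """Parse KEY="value" lines into a dict, skipping blanks and comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Simplification: strips surrounding quotes, does not handle escapes.
        values[key.strip()] = raw.strip().strip('"')
    return values


def load_azd_env() -> Dict[str, str]:
    """Run azd and return the current environment's values."""
    out = subprocess.run(
        ["azd", "env", "get-values"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_env_values(out)
```

A caller would then read `load_azd_env()["FRONTEND_URL"]` instead of pasting the URL by hand.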
- Prep (run ~2 min before the recording, NOT during): caps the API to `maxReplicas=2` and waits for the new revision to become Active and the replica count to settle. Without this, `demo6.sh`'s load briefly runs against the old config (`maxReplicas=10`) and the OOM signal hides in the per-replica average:

      ./demo6-prep.sh

- Replica churn: names rotate and `created` timestamps reset on every OOMKill, even though the count stays at N/N. In its own pane:

      ./demo6-watch.sh

- System log stream: the crispest "things are dying" signal; you'll see `OOMKilled` and `Replica … has been provisioned` lines scroll by. In its own pane:

      ./demo6-logs.sh

- Container App → Metrics blade — pin three on one chart, last 30 min:
- Replica Restart Count (step-increments on every OOMKill — the money graph for the audience)
- Memory Working Set Bytes, split by Replica (sawtooth = leak + restart)
- Replica Count
- (Optional) Log Analytics → Logs blade, pre-loaded with this query so you can hit Run live:

      ContainerAppSystemLogs_CL
      | where ContainerAppName_s == "<api>"
      | where TimeGenerated > ago(15m)
      | where Reason_s in ("OOMKilled", "BackOff", "Unhealthy", "Killing")
      | project TimeGenerated, ReplicaName_s, Reason_s, Log_s
      | order by TimeGenerated desc

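
The replica-churn signal described above (a steady count with rotating names) reduces to a set comparison between two snapshots of replica names. The sketch below illustrates that idea only; it is not what `demo6-watch.sh` does, and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def churn(before: List[str], after: List[str]) -> Tuple[Set[str], Set[str]]:
    """Return (replicas that disappeared, replicas that appeared)."""
    before_set, after_set = set(before), set(after)
    return before_set - after_set, after_set - before_set


def describe(before: List[str], after: List[str]) -> str:
    """Summarise count and replacements between two snapshots."""
    gone, _new = churn(before, after)
    return f"count {len(set(before))} -> {len(set(after))}, replaced {len(gone)}"
```

A summary such as `count 2 -> 2, replaced 1` shows why the count alone hides the failure: the total is unchanged while one replica was swapped out.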
- Kick off the leak scenario: sets `ENABLE_MEMORY_LEAK=true`, then runs `k6 run load/k6-leak.js` for 10 min. Assumes `./demo6-prep.sh` has already capped `maxReplicas=2` (it sanity-checks this and warns if not):

      ./demo6.sh

  Within 1–2 min the pre-staged terminals (`./demo6-watch.sh` and `./demo6-logs.sh`) will start showing OOMKills, replica churn, and the Replica Restart Count metric stepping up.
- After ~3 min, show the unhealthy revision:

      az containerapp revision show -g <rg> -n adgd-api-<token> --revision <rev> -o jsonc

- Open Azure SRE Agent scoped to the resource group. Ask it to investigate
  the API container app. It should surface:
  - Restart loop on the API revision.
  - Memory growth pattern (the `[demo-leak] buffer holds N chunks` warnings).
  - Correlated env-var change (`ENABLE_MEMORY_LEAK=true`).
  - A recommendation to roll back the env var.
- Apply the fix, which disables the leak and restores `--max-replicas 5`:

      ./demo6-fix.sh

- Watch healthy replicas come back up in the `./demo6-watch.sh` pane.

One-line pitch: "ACA gave us request/response. AKS gives us a long-running, stateful inference loop with an in-cluster open-source model — that's the AKS-shaped workload."

| Component | Where | Why on AKS |
|---|---|---|
| `frontend` (nginx + React) | Deployment, 2 replicas | identical to ACA |
| `api` (FastAPI) | Deployment, HPA 2→20 | adds `/api/watches/*` and `/api/trials` |
| `redis` (single replica, no PVC) | Deployment | shared state for watches + results |
| `watcher` (FastAPI loop) | Deployment | ticks every 45s; the long-running workload |
| `tools` (FastAPI search) | Deployment, 1 replica | in-memory trial dataset for the demo |
| `workspace-llama-3-3b` | KAITO Workspace, CPU node `apps=llama-3-3b` | Llama-3.2-3B Instruct (Q4_K_M GGUF via aikit/llama.cpp), OpenAI-compatible `/v1` |

Flow: POST /api/watches → Redis → watcher tick reads watches → tools search → in-cluster Llama scores each trial → results back to Redis → GET /api/watches/stream (SSE) pushes updates to UI.
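
One tick of that flow can be sketched in a few lines. This is an illustrative model only: plain dicts stand in for Redis, the returned event list stands in for what the SSE stream would push, and the `search` and `score` callables are hypothetical stand-ins for the tools service and the in-cluster Llama call.

```python
from typing import Callable, Dict, List


def run_tick(
    watches: Dict[str, dict],
    results: Dict[str, Dict[str, int]],
    search: Callable[[dict], List[dict]],
    score: Callable[[dict, dict], int],
) -> List[dict]:
    """Search per watch, score each trial, store the score, and report changes."""
    events: List[dict] = []
    for watch_id, watch in watches.items():
        seen = results.setdefault(watch_id, {})
        for trial in search(watch):
            new_score = score(watch, trial)
            prev = seen.get(trial["id"])
            seen[trial["id"]] = new_score
            # Emit an update only for first-time trials or changed scores.
            if prev is None or prev != new_score:
                events.append({
                    "watch": watch_id,
                    "trial": trial["id"],
                    "score": new_score,
                    "prev": prev,
                    "new": prev is None,
                })
    return events
```

Running the function twice over the same data yields events on the first call and none on the second, which mirrors how the UI stays quiet between ticks until a trial is added or a score moves.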
- Open the AKS ingress (`http://<lb-ip>.nip.io/`) → click the 🔭 Trial Watch tab. You'll see three seeded watches (Aunt Helen NSCLC, Patient B melanoma, cohort screening) with score badges, tier labels, and a green "live" pill, which is the SSE stream.
- Show what makes this AKS-shaped. Flip to a terminal:

      kubectl -n trial-matcher get pods -l app=watcher
      kubectl -n trial-matcher logs deploy/watcher --tail=20

  Point at the lines:

      watcher tick start watches=3
      HTTP Request: POST http://tools:8000/tools/search_trials "200 OK"
      HTTP Request: POST http://workspace-llama-3-3b/v1/chat/completions "200 OK"
      match watch=demo-w1 trial=TM-2025-001 score=60 prev=60 new=False

  "Every 45 seconds, in-cluster. No Foundry call here. That's our open-source Llama running on a CPU node next door."
- Inject a fresh trial live. Back in the UI, click `+ Import trial` → `fill with NSCLC example` → Add trial. A toast appears: Added trial TM-DEMO-XXXXXX; the next watcher tick will pick it up and emit a NEW pill.
- Wait one tick (~45s). The Aunt Helen card should:
  - flash an indigo pulse border (recent-update animation),
  - gain a `1 NEW` aggregate badge in the header,
  - show the new trial result with a `NEW` pill and an emerald score badge (≈80–100, "Strong match"; the watcher pre-computes age/sex/location verdicts in Python and feeds them to the 3B model as ground truth, then floors the score at 80 when every hard check passes). If the score later changes on a subsequent tick, a `▲N` / `▼N` delta badge appears.
- Closing line: "Same containers, same pipeline, same agent — plus a stateful inference loop and an open-source model that lives inside the cluster. ACA for the fastest path. AKS when you need the ceiling."
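
The score floor and the badge logic described in the card behaviour above can be sketched as follows. This is a hedged illustration, not the watcher's code: the function names, the clamp to 0..100, and the empty-string result for an unchanged score are assumptions. Only the floor of 80 when every hard check passes and the `NEW`, `▲N`, and `▼N` labels come from the description.

```python
from typing import Dict, Optional


def final_score(model_score: int, hard_checks: Dict[str, bool], floor: int = 80) -> int:
    """Clamp to 0..100, then apply the floor only when every hard check passes."""
    clamped = max(0, min(100, model_score))
    if hard_checks and all(hard_checks.values()):
        return max(clamped, floor)
    return clamped


def badge(prev: Optional[int], current: int) -> str:
    """Return the pill text for a result given its previous score, if any."""
    if prev is None:
        return "NEW"
    if current > prev:
        return f"▲{current - prev}"
    if current < prev:
        return f"▼{prev - current}"
    return ""
```

Computing the hard checks outside the model and applying the floor afterwards means a small model's low score cannot push a fully eligible trial below the "Strong match" range.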

| Symptom | Fix |
|---|---|
| `+ Import trial` returns 502 | `kubectl -n trial-matcher rollout restart deploy/tools` (single-worker; pod must be healthy) |
| Tick runs but no NEW pill | Check `kubectl -n trial-matcher logs deploy/watcher` and confirm `trials_found` increased; the new trial id only matches when the watch's search terms hit the title/condition |
| SSE pill says "stalled" | `kubectl -n trial-matcher rollout restart deploy/api` |
| Llama 5xx | `kubectl get workspace workspace-llama-3-3b` must be Ready; the GGUF runtime is single-replica on the labeled node |

    # Drop watch results + injected trials; restart watcher and tools:
    kubectl -n trial-matcher exec deploy/redis -- redis-cli FLUSHDB
    kubectl -n trial-matcher rollout restart deploy/watcher deploy/tools

The three demo seeds (demo-w1/w2/w3) are re-created by the watcher on next start.

    # Disable the leak and restore max-replicas in one shot:
    ./demo6-fix.sh

    # Clean teardown:
    azd down --purge --force