Progress log

Living record of what's been done and what's next. Updated at the end of every phase.

Phase Title Status Notes
0 Prereqs probe ✅ Done Git, gh, Node, MCS + Agents Toolkit extensions OK. Az CLI / pac / SWA + Bicep extensions still missing — install when we hit Phase 4.
1 GitHub repo + scaffold ✅ Done KarimaKT/MCSMCPapps (public) created and cloned. Monorepo skeleton + initial docs committed.
2 Copilot Studio agent IDs ✅ Done Bot ID, Environment ID, CDX tenant ID captured (see IDs.md). Tenant expires Aug 2026. Schema name + Direct Line endpoint still TBD — captured later when we wire auth.
3 WebChat UI build ✅ Done Vite + TS + Bot Framework Web Chat (CDN) + MSAL + Teams JS scaffolded. npm run build and npm run typecheck both pass. SSO chain Teams JS → MSAL silent → anonymous fallback.
4 Azure subscription + SWA host ✅ Done SWA swa-mcsmcpapps deployed in rg-mcsmcpapps (westus2). Hostname: icy-field-07d5bef1e.7.azurestaticapps.net. GitHub Actions deploy token set as repo secret AZURE_STATIC_WEB_APPS_API_TOKEN.
5 MCP server + DA manifest ✅ Code/infra done All Phase 5 code shipped: MCP server live at app-mcsmcpapps-mcp.azurewebsites.net (centralus, B1 Linux), SWA CSP allowlists widget-renderer host, Declarative Agent manifest + Agents Toolkit project scaffolded with placeholder icons. Next maker step: open declarative-agent/ in VS Code with the M365 Agents Toolkit and run Provision → Publish to sideload to the CDX tenant.
5g Stateless MCP attempt + revert ✅ Done Tried sessionless MCP server to dodge "Something went wrong"; broke SDK init handshake. Reverted to session-keyed transports with proper 404 + "Session not found" recovery so clients re-init transparently after container restarts. Kept tool-level userQuery arg + parallel first-message handoff.
5h OpenAI Apps SDK contract fix ✅ Done Empty card in M365 Copilot turned out to be wrong contract. Verified against Microsoft's mcp-interactiveUI-samples reference: MIME is text/html+skybridge (not text/html;profile=mcp-app); tool _meta needs openai/outputTemplate AND openai/widgetAccessible: true; same _meta re-emitted on tool RESPONSE. Resource needs _meta.ui.csp.frameDomains because we iframe the SWA. Widget rewritten to listen for JSON-RPC ui/notifications/tool-result + window.openai snapshot + openai:set_globals events. See MCP-APPS-CONTRACT.md. Manifest v1.0.5.
5i Doc set + modular code ✅ Done Full doc set authored: SPEC.md, ARCHITECTURE.md, TEST-PLAN.md, COMPARISON.md, BLOG.md, FEATURE-REQUESTS.md. README rewritten as fork-and-rebrand front door. MCP server refactored into modular layout: tools/, resources/, server.ts factory; index.ts is HTTP host only; full JSDoc on every config field; widget.ts marked with v0.6 migration plan. Server version bumped 0.2.0 → 0.3.0.
5j Single-file widget bundle ✅ Done Replaced iframe-of-SWA with single-file React bundle. New webchat-ui/src/widget/ (Widget.tsx + main.tsx + cs-connection.ts + host-bridge.ts + style-options.json). Uses botframework-webchat Composer + BasicWebChat (OOB) and CopilotStudioWebChat.createConnection() (OOB) — no hand-rolled transport. vite-plugin-singlefile produces dist-widget/index.widget.html (~5.5 MB / ~1.4 MB gzip). MCP server's widget.ts reads the bundle from disk at startup; CI workflow builds widget + copies to mcp-server/dist/assets/widget.html before deploy. Repo variables set for VITE_* brand + CS env params. CSP frameDomains removed (no sub-iframe). Customization paths documented in WIDGET-CUSTOMIZATION.md: env vars (60 sec) → styleOptions JSON (CS Kit Webchat Playground export) → React component (full flexibility). Manifest v1.0.6.
5j.1 Sandbox-friendly bundle (stripCrossorigin + production mode) ✅ Done Test in M365 Copilot showed: tool routed correctly on specific prompts; widget body downloaded (5.8 MB confirmed in resources/read); but React app never executed — blank card. Found Microsoft's stripCrossorigin post-transform Vite plugin in trey-research/.../widgets/build.mts: the skybridge sandbox iframe has a null origin, so the default <script type="module" crossorigin> triggers a CORS check on the inline script and silently fails. Added the same plugin + forced mode: 'production' + define NODE_ENV (eliminates HMR eval blocked by sandbox CSP). Documented in MCP-APPS-CONTRACT.md §6, BLOG.md "four contract details", WIDGET-CUSTOMIZATION.md "Critical: don't break the skybridge bundle". New feature requests filed: 2.5 (document silent failures + ship the strip plugin officially) and 2.6 (make tool routing reliable for tool-only DAs).
5j.2 Boot-marker + trace breadcrumbs in widget ✅ Done Test showed widget never produced any console output from inside the iframe — couldn't tell if the bundle even ran. Added an inline non-module <script> in index.widget.html that fires BEFORE the React module bundle and paints a yellow on-screen marker with phase timestamps. Exposes window.__mcsmcpappsTrace so React phases (module-bundle-evaluating, react-render-start/returned/threw, app-mounted, msal-initialized, token-acquire-start/acquired/failed/popup-failed, cs-connection-build/ready/failed) all append to the same trace stream. Hooks window.onerror + unhandledrejection so any sandbox CSP violation surfaces visibly. Trace events are also postMessage'd to the host parent.
5j.3 Widget contract in ai-plugin.json (x-mcp_tool_description) ✅ Done M365 Copilot host was rendering the tool's text response as a search result {"results":[{"reference_id":"turn2search1",...}]} instead of mounting the widget. Verified against four MS reference samples (oai-apps-sdk: trey-research, fieldops, zava-insurance, approvals-box): the host reads the widget _meta (openai/outputTemplate, widgetAccessible, toolInvocation/*) from the plugin manifest's runtimes[].spec.x-mcp_tool_description.tools[]._meta block, NOT from the live MCP descriptor. The basic functions[] list is enough for routing but not for widget mounting. Declared openCopilotStudioChat inside x-mcp_tool_description with full inputSchema (so model passes userQuery instead of {}), annotations, and _meta. Bumped manifest 1.0.6 → 1.0.7. Republished to CDX.
5j.4 Stateless MCP transport (matches MS reference samples) ✅ Done Live diagnostic showed init returned 200 with a session id, then notifications/initialized immediately returned 404 Session not found — every follow-up call from the host hit the same 404 → host gave up the tool call → "Something went wrong" message in chat. Root cause: with enableJsonResponse: true the SDK closes the response stream right after sending the init reply, which fires transport.onclose before our onsessioninitialized map insert is observable on the next request. MS samples (trey-research, fieldops, zava, approvals) all use stateless transport (sessionIdGenerator: undefined, fresh server+transport per request, closed on res.close). Rewrote mcp-server/src/index.ts to match. Added cors middleware that allows null / missing Origin (sandbox iframes, server-to-server callers). Verified live: init/list/call all 200, _meta.openai/outputTemplate and structuredContent.userQuery round-trip correctly.
5j.5 Tighten DA + tool descriptions for routing ✅ Done Test in M365 Copilot showed tool not called for verbose prompts ("compare inflation in greece and italy" → answered from host LLM); diagnostic shim confirmed zero [mcsmcpapps] traces for those turns. Function-calling routing is sensitive to imperative phrasing. Rewrote both descriptions (DA instructions + tool description in functions[] + x-mcp_tool_description) as terse imperative directives: "Always call this tool for every user message. Pass the user's text verbatim as userQuery." Bumped manifest 1.0.7 → 1.0.8. Republished.
5l Entra SSO server scaffolding (feature-flagged) ✅ Done MSAL-in-skybridge dead end (sandbox null origin can't reach login.microsoftonline.com). Switched to RemoteMCPServer Entra SSO: host attaches Bearer on every /mcp call, server validates JWT + OBO-exchanges for a PP token, used server-side to call CS Direct Engine. No tokens cross to the browser. auth.ts wired as feature-flagged middleware. See AUTH-ARCHITECTURE.md and ADR 0003.
5l.1–5l.9 Entra SSO live wiring + diagnostics ✅ Done Manifest 1.0.9 enabled OAuthPluginVault ref. Diagnostics shipped to surface OBO outcome via structuredContent.diag instead of _meta (the host strips _meta from the JSON path it passes to the widget). Persistent file logger at /home/LogFiles/Application/mcsmcpapps.log. Removed in-skybridge MSAL fallback entirely (impossible to succeed).
5k D365 Omnichannel handoff ⏳ Planned Configure CS agent Settings → Agent transfers → Omnichannel tile. Verify M4 escalation scenario from TEST-PLAN.md §4.4. No code on our side.
6 v0.6 data-widget pivot ✅ Done Pivoted from chat-in-chat to data-widget pattern per MS UX guidelines. Server calls CS via Direct Engine; widget renders structured payload only. Bundle 5.5MB → ~250KB. See ADR 0001 + spec 0001.
6.1 Smart CS drain ✅ Done v0.6.1: exit on first bot reply + idle (not on stream close). v0.6.2: mirror MS sample — exit on ActivityTypes.EndOfConversation. Confirms the sample-first rule (5 sec of reading the canonical signal saved hours of hand-rolled timeout logic).
6.3 Faster first turn + silent dispatcher ✅ Done Removed text summary from content[0] to discourage host LLM narration. Manifest 1.1.1.
6.4 Server-side caches + parity matrix ✅ Done Per-thread CS conversation cache keyed on (oid + x-microsoft-ai-conversationid). PP token cache. CS-PARITY.md authored as the source-of-truth fidelity matrix. Manifest 1.1.2.
7.0 Adaptive Cards + Markdown + fullscreen ✅ Done marked + DOMPurify for markdown. adaptivecards v3 renderer for static AC (text, columns, images URL/base64, OpenUrl, multi-card). Per-thread cache. Fullscreen analyst canvas with sticky header, conversation chip, Copy/Print toolbar. Manifest 1.1.3. See ADR 0004 + spec 0002.
7.1 AC Submit + form inputs ✅ Done New submitAdaptiveCardAction MCP tool. AC Input.Text/ChoiceSet/Date/Time/Number/Toggle round-trip back to CS as activity.value. Double-click guard via ref. Manifest 1.1.4. See spec 0003.
7.2 Suggested actions + tuning ✅ Done Quick-reply chips wired to callTool('openCopilotStudioChat', { userQuery: title }). v0.7.2a: raised CS backstop 30s→180s. v0.7.2b: filter interim "I'll get that for you" messages. v0.7.2c: handle BF cumulative streaming chunks + streaminfo entity. v0.7.2d: cap inline preview height + open-analyst hint. v0.7.2e: REGRESSION — flipped conversationId optional→required, broke routing.
7.3a Recovery from v0.7.2e ✅ Done Production failure 2026-05-04: host LLM emitted tool args as plaintext into chat instead of invoking. Reverted conversationId to .optional() + shortened tool description. Lesson written up in ADR 0005 — arg optionality IS part of the locked-contract surface.
7.3 Server-side escalation detection (additive, widget banner pending) 🔄 In progress cs.ts detects ActivityTypes.Handoff + hint-phrase fallback; emits escalation: 'none'|'waiting'|'connected' in structuredContent. Widget treats unset as today's behavior — additive only, safe to deploy. Banner UI + magic-ping check is the next commit. See spec 0004.
7 Distribution package + docs ✅ Done v0.7 ship: README rewritten as fork-front-door, QUICK-START.md, SMOKE-CHECKLIST.md, scripts/swap-brand.ps1, BLOG.md updated, CS-PARITY.md reflects shipped reality.
ops Manifest v1.1.5 ✅ Done Conversation starters refreshed (Switzerland GDP/inflation, Greece chart). DA approved at this version in CDX.
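Phase 5j.1 above says a post-transform that strips the crossorigin attribute was what let the bundle execute inside the null-origin skybridge sandbox. The following is a minimal sketch of what such a transform might look like, not the plugin from the Microsoft sample or from this repo. The names stripCrossorigin and stripCrossoriginPlugin are hypothetical, and the regex only targets opening script and link tags.

```typescript
// Hypothetical sketch: names are illustrative, not copied from the repo
// or from the Microsoft reference sample.

// Remove `crossorigin`, `crossorigin=""`, `crossorigin="anonymous"` etc.
// from opening <script> and <link> tags; everything else is left untouched.
function stripCrossorigin(html: string): string {
  return html.replace(/<(?:script|link)\b[^>]*>/gi, (tag: string) =>
    tag.replace(
      /\s+crossorigin(?=[\s=\/>])(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?/gi,
      "",
    ),
  );
}

// Object in the shape Vite accepts for a plugin with a post-ordered
// `transformIndexHtml` hook. Assumption: it is registered after any
// plugin that inlines the bundle, so it sees the final HTML.
const stripCrossoriginPlugin = {
  name: "strip-crossorigin",
  enforce: "post" as const,
  transformIndexHtml(html: string): string {
    return stripCrossorigin(html);
  },
};
```

Keeping the string transform separate from the plugin object makes it possible to unit-test the stripping logic without running a Vite build.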

Decisions made

  • Repo visibility: Public as of Phase 2.
  • Account: KarimaKT (default authed GitHub user).
  • CS agent: Existing agent in CDX tenant 301759bc-5be1-40f1-8a44-822e286f5a9d (Dynamics org orgea8005ed.crm.dynamics.com, expires Aug 2026). IDs in IDs.md.
  • No separate CEA. The user's "CEA" reference was the CS agent itself.
  • Cross-tenant by design: Azure hosting in personal MSA tenant 4420bedf-...; M365 / CS in CDX tenant 301759bc-.... Entra app reg goes in CDX tenant.
  • Auth strategy (final): Server-side Entra SSO + OBO. No browser-side MSAL. See ADR 0003.
  • Architecture (final): Data-widget pattern. Server calls CS Direct Engine; widget renders structured payload. See ADR 0001 + ADR 0002.
  • Hosting: App Service B1 Linux for the MCP server (centralus). SWA Free for the standalone WebChat (secondary surface).
  • Skipped: All Microsoft Foundry / AI Toolkit MCP tooling — not relevant to this pattern.

Source of truth for versions

| Surface | Bumped when | Current |
| --- | --- | --- |
| declarative-agent/appPackage/manifest.json version | Any locked-contract change (tool name/args/types/optionality/description, MIME, outputTemplate URI, tool count). Source of truth. | 1.1.5 |
| mcp-server/src/server.ts SERVER_VERSION | When the MCP server's behavior changes; advertised in initialize. | 0.3.0 |
| package.json version (mcp-server, webchat-ui) | Not used. Set to 0.1.0 and ignored. | 0.1.0 |
| Git tags / commit prefix (v0.7.3a, etc.) | Per-feature ship label. Not always 1:1 with manifest version. | v0.7.3a |

When in doubt: the manifest version is the customer-facing version. Bump it whenever a host re-approval is needed.
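The bump rule above can be made mechanical by comparing a canonical form of the locked-contract surface before and after a change. This is a sketch under assumptions: the ContractSurface shape, the function names, and the outputTemplate URI are hypothetical and simplified, and the repo's real gate is the smoke script rather than this helper.

```typescript
// Hypothetical, simplified model of the locked-contract surface.
interface ArgContract {
  type: string;
  required: boolean;
}
interface ToolContract {
  name: string;
  description: string;
  args: Record<string, ArgContract>;
}
interface ContractSurface {
  mimeType: string;
  outputTemplateUri: string;
  tools: ToolContract[];
}

// Serialize the surface with tools and args in sorted order, so that
// reordering alone never counts as a contract change.
function canonicalize(surface: ContractSurface): string {
  const tools = [...surface.tools]
    .sort((a, b) => a.name.localeCompare(b.name))
    .map((tool) => ({
      name: tool.name,
      description: tool.description,
      args: Object.entries(tool.args)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([argName, arg]) => [argName, arg.type, arg.required]),
    }));
  return JSON.stringify({
    mimeType: surface.mimeType,
    outputTemplateUri: surface.outputTemplateUri,
    tools,
  });
}

// True when any locked field differs: tool name, args, types,
// optionality, description, MIME, outputTemplate URI, or tool count.
function needsManifestBump(live: ContractSurface, next: ContractSurface): boolean {
  return canonicalize(live) !== canonicalize(next);
}

const live: ContractSurface = {
  mimeType: "text/html+skybridge",
  outputTemplateUri: "ui://widget/example.html", // hypothetical URI
  tools: [
    {
      name: "openCopilotStudioChat",
      description: "Always call this tool for every user message.",
      args: {
        userQuery: { type: "string", required: true },
        conversationId: { type: "string", required: false },
      },
    },
  ],
};
```

Under this model, flipping conversationId from optional to required (the v0.7.2e regression) changes the canonical string, so needsManifestBump would flag it before publish.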

Deferred work tracked here so it's not lost

  • C1 (cleanup, partially started 2026-05-06): Delete v1 widget tree (webchat-ui/src/widget/, index.widget.html, vite.widget.config.ts, dist-widget/).
    • Done today: dropped the v0.5 fallback paths from mcp-server/src/widget.ts CANDIDATES list (no doc dependency, safe surgical change).
    • Blocked on doc rewrite: docs/WIDGET-CUSTOMIZATION.md is substantively v0.5-oriented — it documents the BotFramework Web Chat styleOptions JSON path (Layer 2) and editing Widget.tsx with <Composer> + <BasicWebChat> (Layer 3). Both are dead in v0.6's data-widget pattern. The doc needs a v0.6 rewrite that says "edit webchat-ui/src/widget-v2/main.tsx directly" — not just path updates. Until that rewrite ships, deleting webchat-ui/src/widget/ would create broken doc links pointing at non-existent files.
    • Also blocked: local-dev scripts build:widget and build:all in webchat-ui/package.json reference vite.widget.config.ts. They're dead but not orphaned (someone could still try npm run build:all on a fork). Remove together with the source files when the doc rewrite is ready.
  • B3 follow-up (RESOLVED 2026-05-06): Verified against MS reference trey-research/ai-plugin.json — the MS pattern declares ALL tools (widget + data) in functions[], run_for_functions[], and x-mcp_tool_description.tools[]. Updated source ai-plugin.json to declare submitAdaptiveCardAction correctly. Smoke also caught two pre-existing drifts in openCopilotStudioChat: manifest's required[] was empty (server says userQuery required) and conversationId wasn't declared at all. Fixed both. Next publish must bump manifest.json from 1.1.5 to 1.2.0 (or 1.1.6) AND re-approve in CDX tenant admin because tool count + arg-schema both changed (locked-contract surface per ADR 0005).
  • Contracts JSON lock (originally B3): v0.7.3a shipped the smoke-test path — mcp-server/scripts/smoke-mcp.mjs runs as a pre-deploy gate in CI and asserts the locked-contract surface end-to-end. The --manifest <path> flag (added 2026-05-06) cross-checks server tools/list against the source ai-plugin.json (functions, run_for_functions, x-mcp_tool_description.tools, per-tool required/properties). A separate contracts.lock.json snapshot file would be redundant.
  • Smoke-test live-endpoint gate: The CI workflow runs the smoke against a local server; it does NOT yet call the deployed App Service after publish. Adding a post-deploy smoke against https://app-mcsmcpapps-mcp.azurewebsites.net/mcp (with a service-principal Bearer token) would close the loop on remote deploys. Token-minting in CI is the unsolved piece.
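The manifest cross-check described in the bullets above (server tools/list versus functions, run_for_functions, and x-mcp_tool_description.tools in ai-plugin.json) could be sketched roughly as follows. This is not the code in smoke-mcp.mjs: the findDrift name and the trimmed-down ServerTool and PluginManifest types are assumptions made for illustration, and only the required[] lists are compared.

```typescript
// Hypothetical, trimmed-down shapes; the real files carry many more fields.
interface ServerTool {
  name: string;
  inputSchema: { required?: string[] };
}
interface DescribedTool {
  name: string;
  inputSchema?: { required?: string[] };
}
interface PluginManifest {
  functions: { name: string }[];
  runtimes: {
    run_for_functions: string[];
    spec: { "x-mcp_tool_description"?: { tools: DescribedTool[] } };
  }[];
}

// Return one human-readable message per mismatch; an empty array means
// the manifest declares every server tool consistently.
function findDrift(serverTools: ServerTool[], manifest: PluginManifest): string[] {
  const problems: string[] = [];
  const functionNames = new Set(manifest.functions.map((f) => f.name));
  const runFor = new Set(manifest.runtimes.flatMap((r) => r.run_for_functions));
  const described = new Map(
    manifest.runtimes
      .flatMap((r) => r.spec["x-mcp_tool_description"]?.tools ?? [])
      .map((t) => [t.name, t] as const),
  );

  for (const tool of serverTools) {
    if (!functionNames.has(tool.name)) {
      problems.push(`${tool.name}: missing from functions[]`);
    }
    if (!runFor.has(tool.name)) {
      problems.push(`${tool.name}: missing from run_for_functions[]`);
    }
    const entry = described.get(tool.name);
    if (!entry) {
      problems.push(`${tool.name}: missing from x-mcp_tool_description.tools[]`);
      continue;
    }
    // Compare required[] as sorted, comma-joined lists.
    const onServer = [...(tool.inputSchema.required ?? [])].sort().join(",");
    const inManifest = [...(entry.inputSchema?.required ?? [])].sort().join(",");
    if (onServer !== inManifest) {
      problems.push(
        `${tool.name}: required[] is [${inManifest}] in manifest, [${onServer}] on server`,
      );
    }
  }
  return problems;
}
```

A CI gate built on this shape would fail the build whenever the returned array is non-empty, which is the kind of drift the B3 follow-up reports (an empty required[] and an undeclared tool).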

Next publish to CDX — required actions

The source manifest in declarative-agent/appPackage/ai-plugin.json is now ahead of what's live in CDX (v1.1.5). When you next publish:

  1. Bump declarative-agent/appPackage/manifest.json version from 1.1.5 to 1.2.0 (minor bump because we added a tool to the contract; treat as semver-ish).
  2. Run Agents Toolkit → Provision → Publish.
  3. Approve the new version in Microsoft 365 admin center → All agents → Requests (the surface from FEATURE-REQUESTS.md §1.1).
  4. Within 30 sec of admin approval going live, run node mcp-server/scripts/smoke-mcp.mjs https://app-mcsmcpapps-mcp.azurewebsites.net/mcp (with a Bearer token if SSO is enforced) and S01 from SMOKE-CHECKLIST.md. Revert the manifest commit if either fails.

The server-side change is already live in production; only the manifest is stale. The publish is therefore low-risk: in the worst case, the host LLM keeps using its cached v1.1.5 catalog until the new version is approved.