AI Foundry Prompt Agent — DevOps & Lifecycle Management

A complete CI/CD lifecycle for an Azure AI Foundry prompt agent, demonstrating versioned prompts, tool changes, model upgrades, evaluation gates, and rollback.

Agent: Technology Trend Research & Analysis

Phase 1: Web search only (web_search tool)
Phase 2: Web search + Code Interpreter (web_search + code_interpreter tools) for data analysis

Repository Structure

agents/
  tech-trends-agent.json             Active agent config (tools, model ref, eval pointers)
  tech-trends-agent.default.json     Default baseline (rollback target)
prompts/
  tech-trends-agent.md               Active system prompt (Markdown)
  tech-trends-agent.default.md       Default baseline prompt (rollback target)
evals/                               Golden dataset + evaluator config
scripts/
  deploy_agent.py                    Deploy agent to TEST or PROD
  rollback_agent.py                  Rollback to default or a saved artifact
  run_evaluation.py                  Run Foundry evaluation locally
  compare_models.py                  Side-by-side model comparison
  bootstrap.sh                       One-time Azure + GitHub setup
  teardown.sh                        Reverse everything bootstrap created
  lifecycle/
    01-phase1-web-search.sh          PR: agent with web search only
    02-phase2-code-interpreter.sh    PR: add code interpreter
    03-model-upgrade.sh              PR: upgrade model to gpt-4.1
infra/                               Bicep IaC for Foundry project
artifacts/                           Generated deployment snapshots (post-deploy)
.github/workflows/                   CI/CD pipelines
tests/                               Unit tests

Prerequisites

Python 3.12+
Azure CLI (az login)
GitHub CLI (gh auth login)
An Azure subscription with permissions to create resources and App Registrations
An Azure OpenAI deployment (e.g. gpt-4o-2024-11-20)

Quick Start — Automated Bootstrap

The bootstrap script provisions all Azure infrastructure and GitHub configuration in one shot.

# 1. Log in to Azure and GitHub
az login
gh auth login

# 2. Run bootstrap
./scripts/bootstrap.sh \
  --resource-group rg-agent-devops \
  --account-name agentdevops \
  --location eastus \
  --github-repo san360/agent-devops

This creates:

A resource group with TEST and PROD AI Foundry projects (via Bicep)
An App Registration with a Service Principal
3 federated credentials for GitHub OIDC (main branch, pull requests, tags)
RBAC role assignments (Azure AI User, Cognitive Services OpenAI User)
Model availability validation (checks current + upgrade target model)
6 GitHub repository variables (AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID, FOUNDRY_TEST_ENDPOINT, FOUNDRY_PROD_ENDPOINT, GPT_DEPLOYMENT)

State is saved to .bootstrap-state.json for use by the teardown script.

Bootstrap Parameters

| Flag | Required | Default | Description |
| --- | --- | --- | --- |
| `--resource-group` | Yes | — | Azure resource group name |
| `--account-name` | Yes | — | Base name for Foundry accounts (suffixed with `test`/`prod`) |
| `--location` | No | `eastus` | Azure region |
| `--github-repo` | No | `san360/agent-devops` | GitHub `owner/repo` for variables and federation |
| `--gpt-deployment` | No | `gpt-4o-2024-11-20` | GPT model deployment name |
| `--gpt-capacity` | No | `30` | GPT deployment capacity (tokens per minute in thousands) |

Lifecycle Demo — Phase 1 → Phase 2 → Model Upgrade

Three scripts simulate the full agent lifecycle by creating PRs that trigger the CI/CD pipeline. Run them sequentially — each builds on the previous phase.

Phase 1: Web Search Agent

./scripts/lifecycle/01-phase1-web-search.sh

Creates branch feature/phase1-web-search
Configures the agent with the web_search tool
Evaluation runs 5 Phase 1 test cases
Opens a PR — evaluate.yml triggers, deploys to TEST, runs eval

After the eval passes, merge the PR. The deploy-prod.yml workflow then deploys to PROD.

Phase 2: Add Code Interpreter

./scripts/lifecycle/02-phase2-code-interpreter.sh

Creates branch feature/phase2-code-interpreter from updated main
Adds code_interpreter tool alongside existing web_search
Extends the system prompt with a ## Data Analysis section
Evaluation now runs all 8 test cases (Phase 1 + Phase 2) — checks for regressions
Opens a PR
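
The regression check in Phase 2 amounts to running every test case introduced in or before the current phase. A minimal sketch, assuming each golden-dataset row carries a hypothetical `phase` field (the actual dataset schema under evals/ is not shown in this README):

```python
from typing import TypedDict


class EvalCase(TypedDict):
    query: str
    phase: int  # hypothetical field: lifecycle phase that introduced the case


def select_cases(cases: list[EvalCase], current_phase: int) -> list[EvalCase]:
    """Return all cases introduced in or before current_phase."""
    return [case for case in cases if case["phase"] <= current_phase]


# Illustrative data only: 5 Phase 1 cases plus 3 Phase 2 cases.
golden = [EvalCase(query=f"p1-q{i}", phase=1) for i in range(5)] + [
    EvalCase(query=f"p2-q{i}", phase=2) for i in range(3)
]
phase1_cases = select_cases(golden, 1)
phase2_cases = select_cases(golden, 2)
```

With this filter, a Phase 2 run re-scores the Phase 1 queries too, so a prompt or tool change that degrades web search answers would surface in the same gate.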

After the eval passes, merge the PR.

Phase 3: Model Upgrade

./scripts/lifecycle/03-model-upgrade.sh

Creates branch chore/model-upgrade-gpt41
Upgrades model from gpt-4o-2024-11-20 (default) to gpt-4.1
Updates the GPT_DEPLOYMENT GitHub variable to gpt-4.1
Adds a model history entry in the agent config
Opens a PR — the eval gate verifies the new model scores at or above thresholds
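
Recording the upgrade in the agent config could look roughly like the following. This is a sketch only: the `model` and `model_history` keys and the shape of each history entry are assumptions, not the repository's actual schema.

```python
import copy
from datetime import datetime, timezone


def upgrade_model(config: dict, new_model: str) -> dict:
    """Return a copy of config with the model swapped and the change logged."""
    updated = copy.deepcopy(config)
    # "model" and "model_history" are hypothetical key names.
    updated.setdefault("model_history", []).append(
        {
            "from": config["model"],
            "to": new_model,
            "changed_at": datetime.now(timezone.utc).isoformat(),
        }
    )
    updated["model"] = new_model
    return updated


config = {"model": "gpt-4o-2024-11-20", "tools": ["web_search", "code_interpreter"]}
upgraded = upgrade_model(config, "gpt-4.1")
```

Working on a deep copy keeps the original config untouched, which makes it easy to diff the two versions before opening the PR.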

The bootstrap script validates that both the current model and the upgrade target (gpt-4.1) are available in your chosen Azure region. If gpt-4.1 is not available, the script will list alternatives you can use instead.

After the eval passes, merge the PR. The full lifecycle demo is complete.

Lifecycle Flow Diagram

flowchart LR
    subgraph PR Pipeline
        A[Developer Push] --> B[Create PR]
        B --> C[Deploy to TEST]
        C --> D[Smoke Test]
        D --> E[Evaluation Run]
        E --> F{Scores Pass?}
        F -->|Yes| G[Post Results to PR]
        F -->|No| H[Block Merge]
    end

    subgraph Merge Pipeline
        G --> I[Merge to main]
        I --> J[Deploy to PROD]
        J --> K[Commit Artifact]
    end

Agent Evaluation Flow

flowchart TD
    A[Pipeline Triggered] --> B{Evaluation exists?}
    B -->|No| C[Create Evaluation\ntech-trends-agent-eval]
    B -->|Yes| D[Reuse Existing Evaluation]
    C --> E[Create Run]
    D --> E
    E --> F[Upload Golden Dataset]
    F --> G[Invoke Agent with Queries]
    G --> H[Run Evaluators\ntask_adherence, relevance,\ngroundedness, coherence]
    H --> I[Wait for Completion]
    I --> J[Output Scores & Report URL]
    J --> K[Post Summary to PR]

    style C fill:#4ade80,stroke:#16a34a
    style D fill:#60a5fa,stroke:#2563eb

Lifecycle Phases

flowchart LR
    P1["Phase 1\nweb_search"] -->|merge| P2["Phase 2\ncode_interpreter"]
    P2 -->|merge| P3["Phase 3\nModel Upgrade\ngpt-4o → gpt-4.1"]

    P1 -.->|eval gate| P1E[✅ 5 queries]
    P2 -.->|eval gate| P2E[✅ 8 queries]
    P3 -.->|eval gate| P3E[✅ 8 queries]

Teardown

Remove all Azure resources and GitHub configuration created by bootstrap:

./scripts/teardown.sh          # interactive confirmation prompt
./scripts/teardown.sh --yes    # skip confirmation

This deletes:

The resource group (and all resources within — TEST project, PROD project, model deployments)
Federated credentials and the App Registration
All 6 GitHub repository variables
The .bootstrap-state.json state file

Manual Quick Start

If you prefer to set up infrastructure manually instead of using bootstrap:

# 1. Install dependencies
pip install -r requirements.txt

# 2. Configure environment
cp .env.example .env
# Edit .env with your Foundry endpoints and deployment names
# Ensure each variable is prefixed with 'export' so they are
# visible to Python (os.environ) when sourced.

# 3. Log in to Azure
az login

# 4. Deploy to test
source .env
python scripts/deploy_agent.py --env test --semver 1.0.0 --tools web_search
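
A forgotten `export` in .env leaves the variable invisible to Python, so a quick pre-flight check can save a failed deploy. The sketch below is an illustration, not part of the repository; the variable names are borrowed from the GitHub variables that bootstrap sets, and .env.example may use different ones.

```python
import os
from collections.abc import Mapping

# Assumed names; adjust to match the keys in your .env file.
REQUIRED = ("FOUNDRY_TEST_ENDPOINT", "GPT_DEPLOYMENT")


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example = missing_vars({"FOUNDRY_TEST_ENDPOINT": "https://example.invalid"})
```

Calling `missing_vars()` with no argument inspects the live process environment after `source .env`.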

CI/CD Workflows

| Workflow | Trigger | Purpose |
| --- | --- | --- |
| `evaluate.yml` | PR touching `agents/`, `prompts/`, `evals/` | Deploy to test, run eval, post results to PR |
| `deploy-prod.yml` | Push to `main` touching `agents/`, `prompts/` | Deploy to prod, commit artifact |
| `monitor.yml` | Daily cron (06:00 UTC) | Eval prod agent, open issue on drift |

Evaluation

The eval gate uses a create-once, run-many pattern with four evaluators:

Task Adherence (threshold: 0.80)
Relevance (threshold: 0.75)
Groundedness (threshold: 0.75)
Coherence (threshold: 0.80)
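
The gate logic implied by these thresholds can be sketched as follows. It assumes scores arrive as a plain dictionary keyed by evaluator name on a 0 to 1 scale, and that a score equal to its threshold passes; how run_evaluation.py actually reads results from Foundry is not shown here.

```python
THRESHOLDS = {
    "task_adherence": 0.80,
    "relevance": 0.75,
    "groundedness": 0.75,
    "coherence": 0.80,
}


def failing_evaluators(scores: dict[str, float]) -> dict[str, float]:
    """Return evaluators whose score is missing or below its threshold."""
    return {
        name: scores.get(name, 0.0)  # a missing score counts as a failure
        for name, minimum in THRESHOLDS.items()
        if scores.get(name, 0.0) < minimum
    }


def gate_passes(scores: dict[str, float]) -> bool:
    """True only when every evaluator meets or exceeds its threshold."""
    return not failing_evaluators(scores)
```

Returning the failing evaluators rather than a bare boolean makes it straightforward to list them in the PR summary.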

A smoke test step runs before evaluation — it invokes the agent with a test query and verifies a valid response is returned.
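
One way to express such a smoke test, independent of any SDK, is to pass in the function that calls the agent. In this sketch `invoke` is a hypothetical callable, the default query is made up, and "valid" is assumed to mean a non-empty string reply.

```python
from collections.abc import Callable


def smoke_test(
    invoke: Callable[[str], str],
    query: str = "Name one current technology trend.",
) -> bool:
    """Return True if the agent answers the query with non-empty text."""
    try:
        reply = invoke(query)
    except Exception:
        # Any failure to get a reply counts as a failed smoke test.
        return False
    return isinstance(reply, str) and bool(reply.strip())
```

Injecting the callable keeps the check testable offline, for example with `smoke_test(lambda q: "stub reply")`.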

Evaluation naming: A single evaluation named tech-trends-agent-eval is created on the first pipeline run. Subsequent runs reuse the same evaluation and add new runs. Each run is named {branch}/{commit-sha} for full traceability back to the source change.
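
The create-once, run-many pattern and the run naming scheme can be illustrated with an in-memory stand-in for the Foundry project. The dictionary store and both helper functions are hypothetical; the real scripts talk to the Foundry service instead.

```python
EVAL_NAME = "tech-trends-agent-eval"


def get_or_create_evaluation(existing: dict[str, dict], name: str = EVAL_NAME) -> dict:
    """Reuse the evaluation called name, creating it on first use."""
    if name not in existing:
        existing[name] = {"name": name, "runs": []}
    return existing[name]


def run_name(branch: str, commit_sha: str) -> str:
    """Build the {branch}/{commit-sha} run name."""
    return f"{branch}/{commit_sha}"


store: dict[str, dict] = {}  # in-memory stand-in for the Foundry project
for sha in ("abc1234", "def5678"):  # placeholder commit SHAs
    evaluation = get_or_create_evaluation(store)
    evaluation["runs"].append(run_name("feature/phase1-web-search", sha))
```

After two pipeline runs the store still holds a single evaluation, with one run per commit.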

Rollback

The rollback script supports two modes:

Reset to default baseline

python scripts/rollback_agent.py --default prod

Copies agents/tech-trends-agent.default.json and prompts/tech-trends-agent.default.md over the active config and prompt, then re-deploys the clean baseline to Foundry.
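
The file-copy half of this mode can be sketched as below. Only the local overwrite is shown; the re-deploy to Foundry is omitted, and `reset_to_default` is an illustrative helper rather than the function in rollback_agent.py.

```python
import shutil
from pathlib import Path

# (baseline, active) pairs taken from the repository layout.
BASELINES = [
    ("agents/tech-trends-agent.default.json", "agents/tech-trends-agent.json"),
    ("prompts/tech-trends-agent.default.md", "prompts/tech-trends-agent.md"),
]


def reset_to_default(repo_root: Path) -> list[Path]:
    """Overwrite each active file with its baseline; return the paths written."""
    written = []
    for baseline, active in BASELINES:
        target = repo_root / active
        shutil.copyfile(repo_root / baseline, target)
        written.append(target)
    return written
```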

Rollback to a specific artifact

python scripts/rollback_agent.py artifacts/tech-trends-agent-v1.0.1.json prod

Reconstructs the agent config from the artifact's definition (model, tools, prompt) and re-deploys that exact state to Foundry. Also writes the artifact's definition back to agents/tech-trends-agent.json so local state matches production.

What a rollback does

Creates a new Foundry version with the restored prompt, tools, and model
Updates the local agents/tech-trends-agent.json to match the rolled-back state
Sets the description field to note that the version is a rollback and which source version it restores
Does not delete the bad version from Foundry (history is immutable)
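
Rebuilding a config from an artifact might look like the sketch below. The artifact layout (`version` and `definition` keys, with `model`, `tools`, and `prompt` inside the definition) and the description wording are assumptions made for illustration; check a real file under artifacts/ for the actual structure.

```python
def config_from_artifact(artifact: dict) -> dict:
    """Rebuild an agent config from an artifact's definition."""
    definition = artifact["definition"]  # hypothetical key
    return {
        "model": definition["model"],
        "tools": list(definition["tools"]),
        "prompt": definition["prompt"],
        # Mark the new version as a rollback and name its source.
        "description": f"Rollback to version {artifact['version']}",
    }


# Illustrative artifact, not real deployment data.
artifact = {
    "version": "1.0.1",
    "definition": {
        "model": "gpt-4o-2024-11-20",
        "tools": ["web_search"],
        "prompt": "You are a technology trend analyst.",
    },
}
restored = config_from_artifact(artifact)
```

The restored config would then be deployed as a new version, which is why the earlier, faulty version remains in the Foundry history.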

Artifacts

The artifacts/ folder is the deployment ledger. Every production deploy via deploy-prod.yml commits a versioned JSON snapshot here, recording:

What was deployed (model, tools, prompt hash)
When and where (timestamp, endpoint, environment)
Which code produced it (git commit SHA, branch, tag)

Artifacts enable rollback, auditability, and drift detection. They are committed with [skip ci] to avoid triggering re-deployment.
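
A snapshot with these fields, plus a prompt-hash comparison for drift detection, can be sketched as follows. The field names and the choice of SHA-256 are assumptions; the artifact format written by deploy-prod.yml may differ.

```python
import hashlib
from datetime import datetime, timezone


def prompt_hash(prompt: str) -> str:
    """Hex SHA-256 digest of the prompt text (hash algorithm is an assumption)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def build_artifact(model: str, tools: list[str], prompt: str,
                   endpoint: str, environment: str, commit_sha: str) -> dict:
    """Assemble a deployment snapshot; all key names are hypothetical."""
    return {
        "model": model,
        "tools": tools,
        "prompt_hash": prompt_hash(prompt),
        "deployed_at": datetime.now(timezone.utc).isoformat(),
        "endpoint": endpoint,
        "environment": environment,
        "commit_sha": commit_sha,
    }


def prompt_has_drifted(artifact: dict, live_prompt: str) -> bool:
    """True when the live prompt no longer matches the recorded hash."""
    return artifact["prompt_hash"] != prompt_hash(live_prompt)


snapshot = build_artifact(
    "gpt-4o-2024-11-20", ["web_search"], "You are a technology trend analyst.",
    "https://example.invalid", "prod", "abc1234",  # placeholder values
)
```

Storing a hash rather than the full prompt keeps the snapshot small while still letting a monitor detect that the deployed prompt has changed.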

Model Comparison

python scripts/compare_models.py --current gpt-4o-2024-11-20 --candidate gpt-4.1 --tools web_search

Deploys both models to the TEST environment for side-by-side evaluation.

Authentication

The workflows authenticate through GitHub OIDC federation, so no secrets are stored in the repository. The bootstrap.sh script configures this automatically by creating an App Registration with federated credentials for three GitHub Actions contexts:

| Credential | Subject | Used by |
| --- | --- | --- |
| `github-main` | `repo:owner/repo:ref:refs/heads/main` | `deploy-prod.yml` |
| `github-pr` | `repo:owner/repo:pull_request` | `evaluate.yml` |
| `github-release` | `repo:owner/repo:ref:refs/tags/*` | Future release workflows |

For manual setup, create these federated credentials on an App Registration and assign the Azure AI User and Cognitive Services OpenAI User roles scoped to your resource group.
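
For manual setup it can help to generate the three subject strings for your own repository before creating the credentials. The helper below is a hypothetical convenience that mirrors the subject patterns listed in this section; it does not call Azure.

```python
def federated_subjects(repo: str) -> dict[str, str]:
    """Map each credential name to its OIDC subject for an owner/repo string."""
    if repo.count("/") != 1:
        raise ValueError("repo must look like 'owner/repo'")
    return {
        "github-main": f"repo:{repo}:ref:refs/heads/main",
        "github-pr": f"repo:{repo}:pull_request",
        "github-release": f"repo:{repo}:ref:refs/tags/*",
    }


subjects = federated_subjects("san360/agent-devops")
```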

Running Tests

pytest tests/ -v
