san360/agent-devops

AI Foundry Prompt Agent — DevOps & Lifecycle Management

A complete CI/CD lifecycle for an Azure AI Foundry prompt agent, demonstrating versioned prompts, tool changes, model upgrades, evaluation gates, and rollback.

Agent: Technology Trend Research & Analysis

  • Phase 1: Web search only (web_search tool)
  • Phase 2: Web search + Code Interpreter (web_search + code_interpreter tools) for data analysis

Repository Structure

agents/
  tech-trends-agent.json             Active agent config (tools, model ref, eval pointers)
  tech-trends-agent.default.json     Default baseline (rollback target)
prompts/
  tech-trends-agent.md               Active system prompt (Markdown)
  tech-trends-agent.default.md       Default baseline prompt (rollback target)
evals/                               Golden dataset + evaluator config
scripts/
  deploy_agent.py                    Deploy agent to TEST or PROD
  rollback_agent.py                  Rollback to default or a saved artifact
  run_evaluation.py                  Run Foundry evaluation locally
  compare_models.py                  Side-by-side model comparison
  bootstrap.sh                       One-time Azure + GitHub setup
  teardown.sh                        Reverse everything bootstrap created
  lifecycle/
    01-phase1-web-search.sh          PR: agent with web search only
    02-phase2-code-interpreter.sh    PR: add code interpreter
    03-model-upgrade.sh              PR: upgrade model to gpt-4.1
infra/                               Bicep IaC for Foundry project
artifacts/                           Generated deployment snapshots (post-deploy)
.github/workflows/                   CI/CD pipelines
tests/                               Unit tests

Prerequisites

  • Python 3.12+
  • Azure CLI (az login)
  • GitHub CLI (gh auth login)
  • An Azure subscription with permissions to create resources and App Registrations
  • An Azure OpenAI deployment (e.g. gpt-4o-2024-11-20)

Quick Start — Automated Bootstrap

The bootstrap script provisions all Azure infrastructure and GitHub configuration in a single run.

# 1. Login to Azure and GitHub
az login
gh auth login

# 2. Run bootstrap
./scripts/bootstrap.sh \
  --resource-group rg-agent-devops \
  --account-name agentdevops \
  --location eastus \
  --github-repo san360/agent-devops

This creates:

  • A resource group with TEST and PROD AI Foundry projects (via Bicep)
  • An App Registration with a Service Principal
  • 3 federated credentials for GitHub OIDC (main branch, pull requests, tags)
  • RBAC role assignments (Azure AI User, Cognitive Services OpenAI User)
  • Model availability validation (checks current + upgrade target model)
  • 6 GitHub repository variables (AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID, FOUNDRY_TEST_ENDPOINT, FOUNDRY_PROD_ENDPOINT, GPT_DEPLOYMENT)

State is saved to .bootstrap-state.json for use by the teardown script.

Bootstrap Parameters

| Flag | Required | Default | Description |
|---|---|---|---|
| --resource-group | Yes | (none) | Azure resource group name |
| --account-name | Yes | (none) | Base name for Foundry accounts (suffixed with test/prod) |
| --location | No | eastus | Azure region |
| --github-repo | No | san360/agent-devops | GitHub owner/repo for variables and federation |
| --gpt-deployment | No | gpt-4o-2024-11-20 | GPT model deployment name |
| --gpt-capacity | No | 30 | GPT deployment capacity (thousands of tokens per minute) |
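bootstrap.sh is a shell script, so the Python sketch below is only an illustration: it mirrors the flag table with `argparse` to make the required/optional split and the defaults explicit. It is not the script's actual parser, and it does not reproduce how the test/prod suffixes are applied to `--account-name`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Illustrative mirror of the documented bootstrap.sh flags, not the real parser.
    p = argparse.ArgumentParser(prog="bootstrap")
    p.add_argument("--resource-group", required=True, help="Azure resource group name")
    p.add_argument("--account-name", required=True,
                   help="Base name for Foundry accounts (suffixed with test/prod)")
    p.add_argument("--location", default="eastus", help="Azure region")
    p.add_argument("--github-repo", default="san360/agent-devops",
                   help="GitHub owner/repo for variables and federation")
    p.add_argument("--gpt-deployment", default="gpt-4o-2024-11-20",
                   help="GPT model deployment name")
    p.add_argument("--gpt-capacity", type=int, default=30,
                   help="Deployment capacity in thousands of tokens per minute")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--resource-group", "rg-agent-devops", "--account-name", "agentdevops"])
    print(args.location, args.gpt_capacity)  # eastus 30
```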

Lifecycle Demo — Phase 1 → Phase 2 → Model Upgrade

Three scripts simulate the full agent lifecycle by creating PRs that trigger the CI/CD pipeline. Run them sequentially — each builds on the previous phase.

Phase 1: Web Search Agent

./scripts/lifecycle/01-phase1-web-search.sh
  • Creates branch feature/phase1-web-search
  • Configures the agent with the web_search tool
  • Evaluation runs 5 Phase 1 test cases
  • Opens a PR — evaluate.yml triggers, deploys to TEST, runs eval

After the eval passes, merge the PR. The merge to main triggers deploy-prod.yml, which deploys the agent to PROD.

Phase 2: Add Code Interpreter

./scripts/lifecycle/02-phase2-code-interpreter.sh
  • Creates branch feature/phase2-code-interpreter from updated main
  • Adds code_interpreter tool alongside existing web_search
  • Extends the system prompt with a ## Data Analysis section
  • Evaluation now runs all 8 test cases (Phase 1 + Phase 2), so any regression in Phase 1 behavior is caught
  • Opens a PR

After the eval passes, merge the PR.

Phase 3: Model Upgrade

./scripts/lifecycle/03-model-upgrade.sh
  • Creates branch chore/model-upgrade-gpt41
  • Upgrades model from gpt-4o-2024-11-20 (default) to gpt-4.1
  • Updates the GPT_DEPLOYMENT GitHub variable to gpt-4.1
  • Adds a model history entry in the agent config
  • Opens a PR — the eval gate verifies the new model scores at or above thresholds
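The README does not show how the model history entry is stored. Here is a minimal sketch, assuming hypothetical `model` and `model_history` keys in agents/tech-trends-agent.json, of swapping the model while appending an auditable entry instead of overwriting the old value:

```python
import json
from datetime import datetime, timezone


def upgrade_model(config: dict, new_model: str, reason: str) -> dict:
    """Return a copy of config with the model swapped and a history entry appended.

    The keys "model" and "model_history" are hypothetical; adapt them to the
    real schema of agents/tech-trends-agent.json.
    """
    updated = dict(config)
    history = list(config.get("model_history", []))  # copy so the input is untouched
    history.append({
        "from": config.get("model"),
        "to": new_model,
        "reason": reason,
        "changed_at": datetime.now(timezone.utc).isoformat(),
    })
    updated["model"] = new_model
    updated["model_history"] = history
    return updated


if __name__ == "__main__":
    cfg = {"model": "gpt-4o-2024-11-20", "tools": ["web_search", "code_interpreter"]}
    print(json.dumps(upgrade_model(cfg, "gpt-4.1", "Phase 3 model upgrade"), indent=2))
```

Returning a copy keeps the function easy to test and lets the caller decide when to write the file.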

The bootstrap script validates that both the current model and the upgrade target (gpt-4.1) are available in your chosen Azure region. If gpt-4.1 is not available, the script will list alternatives you can use instead.

After the eval passes, merge the PR. The full lifecycle demo is complete.

Lifecycle Flow Diagram

flowchart LR
    subgraph PR Pipeline
        A[Developer Push] --> B[Create PR]
        B --> C[Deploy to TEST]
        C --> D[Smoke Test]
        D --> E[Evaluation Run]
        E --> F{Scores Pass?}
        F -->|Yes| G[Post Results to PR]
        F -->|No| H[Block Merge]
    end

    subgraph Merge Pipeline
        G --> I[Merge to main]
        I --> J[Deploy to PROD]
        J --> K[Commit Artifact]
    end

Agent Evaluation Flow

flowchart TD
    A[Pipeline Triggered] --> B{Evaluation exists?}
    B -->|No| C[Create Evaluation\ntech-trends-agent-eval]
    B -->|Yes| D[Reuse Existing Evaluation]
    C --> E[Create Run]
    D --> E
    E --> F[Upload Golden Dataset]
    F --> G[Invoke Agent with Queries]
    G --> H[Run Evaluators\ntask_adherence, relevance,\ngroundedness, coherence]
    H --> I[Wait for Completion]
    I --> J[Output Scores & Report URL]
    J --> K[Post Summary to PR]

    style C fill:#4ade80,stroke:#16a34a
    style D fill:#60a5fa,stroke:#2563eb

Lifecycle Phases

flowchart LR
    P1["Phase 1\nweb_search"] -->|merge| P2["Phase 2\ncode_interpreter"]
    P2 -->|merge| P3["Phase 3\nModel Upgrade\ngpt-4o → gpt-4.1"]

    P1 -.->|eval gate| P1E[✅ 5 queries]
    P2 -.->|eval gate| P2E[✅ 8 queries]
    P3 -.->|eval gate| P3E[✅ 8 queries]

Teardown

Remove all Azure resources and GitHub configuration created by bootstrap:

./scripts/teardown.sh          # interactive confirmation prompt
./scripts/teardown.sh --yes    # skip confirmation

This deletes:

  • The resource group (and all resources within — TEST project, PROD project, model deployments)
  • Federated credentials and the App Registration
  • The GitHub repository variables created by bootstrap
  • The .bootstrap-state.json state file

Manual Quick Start

If you prefer to set up infrastructure manually instead of using bootstrap:

# 1. Install dependencies
pip install -r requirements.txt

# 2. Configure environment
cp .env.example .env
# Edit .env with your Foundry endpoints and deployment names.
# Prefix each variable with 'export' so that the values are
# visible to Python (os.environ) after the file is sourced.

# 3. Login to Azure
az login

# 4. Deploy to test
source .env
python scripts/deploy_agent.py --env test --semver 1.0.0 --tools web_search
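The `export` requirement matters because `source .env` without it only sets shell variables, which a child Python process never sees. Below is a minimal sketch of how a deploy script might resolve its settings, assuming (not confirmed by the README) that .env uses the same names as the GitHub variables FOUNDRY_TEST_ENDPOINT, FOUNDRY_PROD_ENDPOINT, and GPT_DEPLOYMENT:

```python
import os
from collections.abc import Mapping

# Assumption: .env uses the same variable names as the GitHub repository variables.
ENDPOINT_VARS = {"test": "FOUNDRY_TEST_ENDPOINT", "prod": "FOUNDRY_PROD_ENDPOINT"}


def resolve_settings(env: str, environ: Mapping[str, str] = os.environ) -> dict:
    """Look up the Foundry endpoint and model deployment for one environment."""
    if env not in ENDPOINT_VARS:
        raise ValueError(f"unknown environment {env!r}; expected 'test' or 'prod'")
    var = ENDPOINT_VARS[env]
    endpoint = environ.get(var)
    if not endpoint:
        # Typical cause: .env was sourced without 'export', so the value exists
        # in the shell but not in this process's environment.
        raise RuntimeError(f"{var} is not set in the process environment")
    return {"endpoint": endpoint, "deployment": environ.get("GPT_DEPLOYMENT")}
```

Failing early with the variable name is friendlier than an authentication or connection error later in the deploy.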

CI/CD Workflows

| Workflow | Trigger | Purpose |
|---|---|---|
| evaluate.yml | PR touching agents/, prompts/, or evals/ | Deploy to TEST, run eval, post results to the PR |
| deploy-prod.yml | Push to main touching agents/ or prompts/ | Deploy to PROD, commit artifact |
| monitor.yml | Daily cron (06:00 UTC) | Evaluate the PROD agent, open an issue on drift |
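The path filters in the table decide which pipeline a change reaches. For example, a change that touches only evals/ is evaluated on the PR but does not redeploy PROD after merge. The sketch below is a simplified model of those rules using plain prefix matching. It is not GitHub Actions' actual glob-based `paths` filter.

```python
from collections.abc import Iterable

# Path prefixes taken from the workflow table; prefix matching simplifies
# GitHub Actions' glob-based path filters.
EVALUATE_PATHS = ("agents/", "prompts/", "evals/")
DEPLOY_PROD_PATHS = ("agents/", "prompts/")


def touches(changed: Iterable[str], prefixes: tuple[str, ...]) -> bool:
    """True if any changed file lives under one of the prefixes."""
    return any(path.startswith(prefixes) for path in changed)


def workflows_for(event: str, branch: str, changed: list[str]) -> list[str]:
    """Return which of evaluate.yml / deploy-prod.yml would run for a change."""
    runs = []
    if event == "pull_request" and touches(changed, EVALUATE_PATHS):
        runs.append("evaluate.yml")
    if event == "push" and branch == "main" and touches(changed, DEPLOY_PROD_PATHS):
        runs.append("deploy-prod.yml")
    return runs
```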

Evaluation

The eval gate uses a create-once, run-many pattern with four evaluators:

  • Task Adherence (threshold: 0.80)
  • Relevance (threshold: 0.75)
  • Groundedness (threshold: 0.75)
  • Coherence (threshold: 0.80)
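Here is a minimal sketch of the gate logic, assuming that each evaluator reports a score on the same 0-to-1 scale as the thresholds (for example an aggregated pass rate). The evaluator keys are illustrative names, and the real scripts may read scores differently. A score passes when it is at or above its threshold, and a missing score counts as a failure so that a silently skipped evaluator cannot let a PR through.

```python
# Thresholds from the list above; the evaluator keys are illustrative names.
THRESHOLDS = {
    "task_adherence": 0.80,
    "relevance": 0.75,
    "groundedness": 0.75,
    "coherence": 0.80,
}


def gate(scores: dict[str, float],
         thresholds: dict[str, float] = THRESHOLDS) -> tuple[bool, list[str]]:
    """Pass only if every evaluator scores at or above its threshold."""
    failures = []
    for name, minimum in thresholds.items():
        score = scores.get(name)
        if score is None:
            failures.append(f"{name}: missing score")
        elif score < minimum:
            failures.append(f"{name}: {score:.2f} < {minimum:.2f}")
    return (not failures, failures)
```

The failure list is what a pipeline would post to the PR when it blocks the merge.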

A smoke test step runs before evaluation — it invokes the agent with a test query and verifies a valid response is returned.
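The smoke test exists to fail fast and cheaply before a full evaluation run. Here is a minimal sketch, where `invoke` is a hypothetical stand-in for whatever client call sends a query to the deployed TEST agent and returns its text. The default query is also illustrative.

```python
from collections.abc import Callable


def smoke_test(invoke: Callable[[str], str],
               query: str = "What are the top technology trends this year?") -> str:
    """Call the agent once and raise if the reply is empty or not text.

    `invoke` and the default query are hypothetical; plug in the real
    client call used by the pipeline.
    """
    reply = invoke(query)
    if not isinstance(reply, str) or not reply.strip():
        raise RuntimeError("smoke test failed: agent returned no text")
    return reply
```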

Evaluation naming: A single evaluation named tech-trends-agent-eval is created on the first pipeline run. Subsequent runs reuse the same evaluation and add new runs. Each run is named {branch}/{commit-sha} for full traceability back to the source change.
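The create-once, run-many pattern is a get-or-create lookup followed by appending a run. The sketch below uses a hypothetical in-memory store in place of the Foundry evaluation API, which is not shown in the README. Only the evaluation name and the `{branch}/{commit-sha}` run-name format come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluation:
    name: str
    runs: list[str] = field(default_factory=list)


class InMemoryEvalStore:
    """Hypothetical stand-in for the Foundry evaluation API."""

    def __init__(self) -> None:
        self._evals: dict[str, Evaluation] = {}

    def get_or_create(self, name: str) -> Evaluation:
        # Created on the first pipeline run, reused on every later run.
        if name not in self._evals:
            self._evals[name] = Evaluation(name)
        return self._evals[name]


def run_name(branch: str, commit_sha: str) -> str:
    """Name each run {branch}/{commit-sha} so it traces back to the source change."""
    return f"{branch}/{commit_sha}"


def start_run(store: InMemoryEvalStore, branch: str, commit_sha: str) -> str:
    evaluation = store.get_or_create("tech-trends-agent-eval")
    name = run_name(branch, commit_sha)
    evaluation.runs.append(name)
    return name
```

Keeping one evaluation means every run's scores sit side by side in a single history, which makes regressions across PRs easy to spot.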

Rollback

The rollback script supports two modes:

Reset to default baseline

python scripts/rollback_agent.py --default prod

Copies agents/tech-trends-agent.default.json and prompts/tech-trends-agent.default.md over the active config and prompt, then re-deploys the clean baseline to Foundry.
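The file-copy half of a default reset can be sketched as follows. The paths come from the repository layout. The re-deploy to Foundry that rollback_agent.py performs afterwards is deliberately left out, and the function name is hypothetical.

```python
import shutil
from pathlib import Path

# (default baseline, active file) pairs from the repository layout.
PAIRS = [
    ("agents/tech-trends-agent.default.json", "agents/tech-trends-agent.json"),
    ("prompts/tech-trends-agent.default.md", "prompts/tech-trends-agent.md"),
]


def restore_defaults(repo_root: Path) -> list[Path]:
    """Copy each default baseline file over its active counterpart.

    Every source is checked first, so a missing baseline leaves both active
    files untouched instead of producing a half-restored state.
    """
    for default, _ in PAIRS:
        src = repo_root / default
        if not src.is_file():
            raise FileNotFoundError(src)
    restored = []
    for default, active in PAIRS:
        dst = repo_root / active
        shutil.copyfile(repo_root / default, dst)
        restored.append(dst)
    return restored
```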

Rollback to a specific artifact

python scripts/rollback_agent.py artifacts/tech-trends-agent-v1.0.1.json prod

Reconstructs the agent config from the artifact's definition (model, tools, prompt) and re-deploys that exact state to Foundry. Also writes the artifact's definition back to agents/tech-trends-agent.json so local state matches production.
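The local half of an artifact rollback can be sketched as below. The artifact schema is not documented here, so this assumes a hypothetical top-level `definition` object holding `model`, `tools`, and `prompt`. Validating those fields before writing prevents a malformed artifact from clobbering the active config.

```python
import json
from pathlib import Path


def restore_from_artifact(artifact_path: Path, active_config: Path) -> dict:
    """Write an artifact's recorded definition back to the active agent config.

    Assumes a hypothetical artifact layout with a top-level "definition"
    object containing model, tools and prompt; the real schema may differ.
    """
    artifact = json.loads(artifact_path.read_text(encoding="utf-8"))
    definition = artifact["definition"]
    missing = {"model", "tools", "prompt"} - set(definition)
    if missing:
        raise ValueError(f"artifact definition lacks {sorted(missing)}")
    active_config.write_text(json.dumps(definition, indent=2) + "\n", encoding="utf-8")
    return definition
```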

What a rollback does

  • Creates a new Foundry version with the restored prompt, tools, and model
  • Updates the local agents/tech-trends-agent.json to match the rolled-back state
  • Sets the new version's description to note that it is a rollback and which version it was restored from
  • Does not delete the bad version from Foundry (history is immutable)
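"History is immutable" means a rollback moves forward: it appends a new version whose content copies an older one. The in-memory model below illustrates that idea. It is not the Foundry API, and the sequential version numbering and description text are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentVersion:
    number: int
    definition: dict
    description: str


def rollback(history: list[AgentVersion], target: int) -> list[AgentVersion]:
    """Return a new history with one extra version that copies `target`.

    Existing entries are never modified or removed; sequential numbering
    is an assumption of this sketch.
    """
    source = None
    for version in history:
        if version.number == target:
            source = version
            break
    if source is None:
        raise ValueError(f"version {target} not found")
    restored = AgentVersion(
        number=history[-1].number + 1,
        definition=dict(source.definition),
        description=f"Rollback: restores version {target}",
    )
    return [*history, restored]
```

Because the bad version stays in the list, an audit can still see exactly what was live before the rollback.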

Artifacts

The artifacts/ folder is the deployment ledger. Every production deploy via deploy-prod.yml commits a versioned JSON snapshot here, recording:

  • What was deployed (model, tools, prompt hash)
  • When and where (timestamp, endpoint, environment)
  • Which code produced it (git commit SHA, branch, tag)

Artifacts enable rollback, auditability, and drift detection. They are committed with [skip ci] to avoid triggering re-deployment.

Model Comparison

python scripts/compare_models.py --current gpt-4o-2024-11-20 --candidate gpt-4.1 --tools web_search

Deploys both model versions to test for side-by-side evaluation.
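Once both deployments have been evaluated, the comparison comes down to pairing per-evaluator scores and looking at the deltas. This is a minimal sketch, assuming the scores are already available as plain dictionaries keyed by evaluator name. The output format of compare_models.py itself is not documented here.

```python
def compare(current: dict[str, float],
            candidate: dict[str, float]) -> list[tuple[str, float, float, float]]:
    """Pair evaluator scores for two models and compute candidate minus current."""
    rows = []
    for name in sorted(current.keys() & candidate.keys()):
        delta = round(candidate[name] - current[name], 4)
        rows.append((name, current[name], candidate[name], delta))
    return rows


def print_table(rows: list[tuple[str, float, float, float]],
                current_label: str, candidate_label: str) -> None:
    print(f"{'evaluator':<16}{current_label:>20}{candidate_label:>12}{'delta':>8}")
    for name, cur, cand, delta in rows:
        print(f"{name:<16}{cur:>20.2f}{cand:>12.2f}{delta:>+8.2f}")
```

Only evaluators present for both models are compared, so a missing score shows up as a missing row rather than a misleading delta.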

Authentication

Uses GitHub OIDC federation — no secrets stored in the repository. The bootstrap.sh script configures this automatically by creating an App Registration with federated credentials for three GitHub Actions contexts:

| Credential | Subject | Used by |
|---|---|---|
| github-main | repo:owner/repo:ref:refs/heads/main | deploy-prod.yml |
| github-pr | repo:owner/repo:pull_request | evaluate.yml |
| github-release | repo:owner/repo:ref:refs/tags/* | Future release workflows |
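When setting up the credentials manually, the subject strings are easy to mistype. The helper below builds them from an owner/repo pair exactly as the table lists them. The function name is hypothetical. Microsoft Entra ID compares the subject claim against the credential, so verify that the `refs/tags/*` subject matches your tag pushes as intended before relying on it.

```python
def federated_subjects(repo: str) -> dict[str, str]:
    """Build the OIDC subject claims for the three credentials in the table."""
    owner, sep, name = repo.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/repo', got {repo!r}")
    return {
        "github-main": f"repo:{repo}:ref:refs/heads/main",
        "github-pr": f"repo:{repo}:pull_request",
        "github-release": f"repo:{repo}:ref:refs/tags/*",
    }
```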

For manual setup, create these federated credentials on an App Registration and assign the Azure AI User and Cognitive Services OpenAI User roles scoped to your resource group.

Running Tests

pytest tests/ -v

About

AI Foundry Prompt Agent — DevOps & Lifecycle Management with versioned prompts, eval gates, and CI/CD
