Model Upgrade: gpt-4o-2024-11-20 → gpt-4.1 by san360 · Pull Request #3 · san360/agent-devops

san360 · 2026-05-15T19:40:43Z

Summary

- Upgrades the agent model from gpt-4o-2024-11-20 to gpt-4.1
- GitHub variable GPT_DEPLOYMENT updated to gpt-4.1
- Model history annotation updated in agent config

Why this needs an eval gate

Swapping models is a behaviour change. The new model may:

- Format responses differently
- Handle tool calls differently
- Score higher or lower on evaluation dimensions

The eval workflow will deploy with the new model and verify all scores
meet thresholds before this can be merged.
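
The gate described above can be sketched as a simple comparison of evaluator scores against minimum thresholds. This is a minimal illustration, not the workflow's actual code: the function name `failing_evaluators` and the evaluator names in the usage note are hypothetical, and the real pipeline may compute and compare scores differently.

```python
from typing import Dict, List


def failing_evaluators(scores: Dict[str, float],
                       thresholds: Dict[str, float]) -> List[str]:
    """Return the evaluators whose score is missing or below its threshold."""
    failures = []
    for name, minimum in thresholds.items():
        score = scores.get(name)
        # A missing score counts as a failure instead of silently passing.
        if score is None or score < minimum:
            failures.append(name)
    return sorted(failures)
```

For example, `failing_evaluators({"relevance": 4.5}, {"relevance": 4.0, "coherence": 4.0})` reports `coherence` as failing because no score was produced for it. A CI step could exit non-zero whenever the returned list is non-empty, which is what blocks the merge.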

Changes

File	Change
`agents/tech-trends-agent.json`	Updated `_model_history` annotation
GitHub variable `GPT_DEPLOYMENT`	`gpt-4o-2024-11-20` → `gpt-4.1`

What to check

- All 4 evaluator scores meet or exceed current thresholds
- No regression on Phase 1 or Phase 2 queries
- Response format still follows the structured template
- Tool calls (web search + code interpreter) still work correctly

Phase

Phase 3 of 3 — model upgrade. Lifecycle demo complete after merge.

github-actions · 2026-05-15T19:41:39Z

✅ Agent Deployment & Evaluation Report

🤖 Agent Details

Property	Value
Agent	`tech-trends-agent`
Version	`18`
Semver	`0.0.0-pr.3`
Phase	2
Model	`gpt-4.1`
Commit	`04796b2`
Timestamp	2026-05-15 19:54:45 UTC

📊 Pipeline Results

Step	Status	Details
Deploy to TEST	✅ PASSED	Agent version `18` deployed
Smoke Test	✅ PASSED	Invoked agent via Responses API
Foundry Evaluation	✅ PASSED	Evaluated with golden dataset

🛠️ Tools Configuration

Tool	Enabled
`web_search`	—
`code_interpreter`	✅

🔗 Links

🤖 Updated automatically by the CI pipeline · 2026-05-15 19:54:45 UTC

Replace microsoft/ai-agent-evals action with custom run_evaluation.py that creates one evaluation per agent and adds runs on each pipeline execution. Run names encode branch/commit for traceability.

- First run: creates evaluation named '{agent-name}-eval'
- Subsequent runs: reuses existing evaluation, adds new run
- Run name format: '{branch}/{commit-sha}'
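
The naming and reuse scheme in this commit message can be sketched roughly as follows. This is an illustration only: the contents of run_evaluation.py are not shown in this PR, so the helper names and the in-memory `store` dict standing in for the evaluation service are assumptions.

```python
from typing import Dict, List, Tuple


def evaluation_name(agent_name: str) -> str:
    # One evaluation per agent: '{agent-name}-eval'
    return f"{agent_name}-eval"


def run_name(branch: str, commit_sha: str) -> str:
    # Run names encode branch and commit: '{branch}/{commit-sha}'
    return f"{branch}/{commit_sha}"


def record_run(store: Dict[str, List[str]], agent_name: str,
               branch: str, commit_sha: str) -> Tuple[str, bool]:
    """Add a run to the agent's evaluation, creating the evaluation on first use.

    `store` is a hypothetical stand-in for the evaluation service.
    Returns the evaluation name and whether it was newly created.
    """
    name = evaluation_name(agent_name)
    created = name not in store
    store.setdefault(name, []).append(run_name(branch, commit_sha))
    return name, created
```

Calling `record_run(store, "tech-trends-agent", "chore/model-upgrade-gpt41", "04796b2")` on an empty `store` would create `tech-trends-agent-eval` with one run named `chore/model-upgrade-gpt41/04796b2`. Later calls for the same agent append runs to that evaluation instead of creating a new one.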

chore: upgrade model from gpt-4o-2024-11-20 to gpt-4.1

1314b2b

san360 added 2 commits May 15, 2026 21:42

fix: sanitize response_preview for GITHUB_OUTPUT format

7265240

Strip newlines from output preview before writing to GITHUB_OUTPUT, as multiline values break the key=value format.
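
A minimal sketch of this kind of sanitisation, assuming the step writes a single `name=value` line to the file that `GITHUB_OUTPUT` points to. The helper names and the `max_len` cap are illustrative assumptions, not the code from this commit.

```python
import os


def sanitize_preview(text: str, max_len: int = 200) -> str:
    """Collapse every whitespace run, including newlines, into one space."""
    single_line = " ".join(text.split())
    # Truncation is an assumed extra safeguard, not something the commit states.
    return single_line[:max_len]


def write_output(name: str, value: str, path: str) -> None:
    """Append one name=value line to the step-output file at `path`."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={sanitize_preview(value)}\n")
```

In a workflow step this might be called as `write_output("response_preview", text, os.environ["GITHUB_OUTPUT"])`. GitHub Actions also accepts multiline values through its delimiter syntax (`name<<DELIMITER`), so flattening the preview is one option among several; it suits a short preview that only needs to fit on one line.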

san360 merged commit b602722 into main May 15, 2026
2 checks passed

san360 merged 3 commits into main from chore/model-upgrade-gpt41
