
Model Upgrade: gpt-4o-2024-11-20 → gpt-4.1 #3

Merged: san360 merged 3 commits into main from chore/model-upgrade-gpt41 on May 15, 2026

@san360 san360 commented May 15, 2026

Summary

  • Upgrades the agent model from gpt-4o-2024-11-20 to gpt-4.1
  • GitHub variable GPT_DEPLOYMENT updated to gpt-4.1
  • Model history annotation updated in agent config

Why this needs an eval gate

Swapping models is a behaviour change. The new model may:

  • Format responses differently
  • Handle tool calls differently
  • Score higher or lower on evaluation dimensions

The eval workflow will deploy with the new model and verify all scores
meet thresholds before this can be merged.
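
A minimal sketch of the kind of threshold check such a gate could apply. The function name, the evaluator names, and the score scale are assumptions for illustration; the actual workflow's evaluators and thresholds are not shown in this PR.

```python
from typing import Dict, List


def failing_evaluators(scores: Dict[str, float],
                       thresholds: Dict[str, float]) -> List[str]:
    """Return the evaluators whose score is missing or below its threshold.

    Both mappings are keyed by evaluator name (hypothetical names here).
    A score equal to the threshold counts as passing ("meet or exceed").
    """
    failures = []
    for name, minimum in thresholds.items():
        score = scores.get(name)
        # A missing score is treated as a failure rather than silently skipped.
        if score is None or score < minimum:
            failures.append(name)
    return sorted(failures)
```

A CI step could call `failing_evaluators(...)` and exit non-zero when the returned list is non-empty, which would block the merge.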

Changes

| File | Change |
|---|---|
| agents/tech-trends-agent.json | Updated _model_history annotation |
| GitHub variable GPT_DEPLOYMENT | gpt-4o-2024-11-20 → gpt-4.1 |

What to check

  • All 4 evaluator scores meet or exceed current thresholds
  • No regression on Phase 1 or Phase 2 queries
  • Response format still follows the structured template
  • Tool calls (web search + code interpreter) still work correctly

Phase

Phase 3 of 3 — model upgrade. Lifecycle demo complete after merge.

github-actions Bot commented May 15, 2026

✅ Agent Deployment & Evaluation Report

🤖 Agent Details

| Property | Value |
|---|---|
| Agent | tech-trends-agent |
| Version | 18 |
| Semver | 0.0.0-pr.3 |
| Phase | 2 |
| Model | gpt-4.1 |
| Commit | 04796b2 |
| Timestamp | 2026-05-15 19:54:45 UTC |

📊 Pipeline Results

| Step | Status | Details |
|---|---|---|
| Deploy to TEST | PASSED | Agent version 18 deployed |
| Smoke Test | PASSED | Invoked agent via Responses API |
| Foundry Evaluation | PASSED | Evaluated with golden dataset |

🛠️ Tools Configuration

Tool Enabled
web_search
code_interpreter

🤖 Updated automatically by the CI pipeline · 2026-05-15 19:54:45 UTC

san360 added 2 commits May 15, 2026 21:42
Strip newlines from output preview before writing to GITHUB_OUTPUT,
as multiline values break the key=value format.
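
A rough sketch of the fix described in this commit message, assuming the preview is written from Python. The helper names and the `output_preview` key are hypothetical; `GITHUB_OUTPUT` is the environment variable GitHub Actions sets to the path of the step's output file.

```python
import os


def to_single_line(text: str) -> str:
    """Collapse every run of whitespace, including newlines, into one space."""
    return " ".join(text.split())


def write_output(name: str, value: str, path: str) -> None:
    """Append one `name=value` line to the output file at `path`.

    The value is flattened first so it cannot spill onto a second line
    and break the key=value format.
    """
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={to_single_line(value)}\n")


# Inside a workflow step (hypothetical key name):
#   write_output("output_preview", preview, os.environ["GITHUB_OUTPUT"])
```

Flattening loses the original line breaks; if those matter, GitHub Actions also supports a delimiter-based multiline syntax for output values.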
Replace microsoft/ai-agent-evals action with custom run_evaluation.py
that creates one evaluation per agent and adds runs on each pipeline
execution. Run names encode branch/commit for traceability.

- First run: creates evaluation named '{agent-name}-eval'
- Subsequent runs: reuses existing evaluation, adds new run
- Run name format: '{branch}/{commit-sha}'
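
The naming scheme and the create-or-reuse behaviour above can be sketched as follows. This is an illustration only: an in-memory dict stands in for the evaluation service, and the function names are assumptions, not the contents of run_evaluation.py.

```python
from typing import Dict, List


def evaluation_name(agent_name: str) -> str:
    """Evaluation name in the '{agent-name}-eval' format."""
    return f"{agent_name}-eval"


def run_name(branch: str, commit_sha: str) -> str:
    """Run name in the '{branch}/{commit-sha}' format."""
    return f"{branch}/{commit_sha}"


def record_run(store: Dict[str, List[str]], agent_name: str,
               branch: str, commit_sha: str) -> str:
    """Add a run to the agent's evaluation, creating the evaluation if needed.

    `store` maps evaluation name -> list of run names (a stand-in for the
    real backend). The first call for an agent creates the entry; later
    calls reuse it and append a new run.
    """
    name = evaluation_name(agent_name)
    store.setdefault(name, []).append(run_name(branch, commit_sha))
    return name
```

With this shape, every pipeline execution for `tech-trends-agent` would land under a single `tech-trends-agent-eval` entry, and each run name points back to the branch and commit that produced it.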
@san360 san360 merged commit b602722 into main May 15, 2026
2 checks passed