[LLM] Build A/B testing framework for prompts and models #430

Merged
gelluisaac merged 2 commits into Traqora:main from Menjay7:men on Jun 29, 2026

Menjay7 (Contributor) commented on Jun 28, 2026

Description
Summary

Introduces an A/B testing framework for evaluating different LLM prompts and model configurations. This framework enables controlled experiments, measures performance across predefined metrics, and supports data-driven decisions when optimizing prompts and model selection.

Changes Made
Added A/B testing infrastructure for prompt and model experiments.
Implemented experiment configuration with support for multiple variants.
Added traffic splitting and variant assignment logic.
Integrated experiment metadata into request processing.
Recorded experiment IDs, variants, and model information in logs.
Added configurable success metrics (latency, cost, quality, user feedback, etc.).
Implemented experiment result collection and aggregation.
Added feature flags to enable or disable experiments.
Added safeguards for fallback to the default prompt/model when experiments are disabled or fail.
Added unit and integration tests for experiment assignment and metrics collection.
Updated documentation with setup and usage instructions.
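The assignment, traffic-splitting, and fallback items above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Variant`, `Experiment`, `assign_variant`, and the `DEFAULT` configuration are hypothetical names, and hashing the experiment ID together with a user ID is just one common way to get a stable split.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    weight: float  # relative share of traffic
    model: str
    prompt: str


@dataclass(frozen=True)
class Experiment:
    experiment_id: str
    variants: tuple
    enabled: bool = True  # stands in for a feature flag


# Hypothetical default used when an experiment is off or misconfigured.
DEFAULT = Variant("default", 1.0, "default-model", "default-prompt")


def assign_variant(experiment: Experiment, user_id: str,
                   default: Variant = DEFAULT) -> Variant:
    """Pick a variant deterministically, falling back to the default."""
    if not experiment.enabled or not experiment.variants:
        return default
    total = sum(v.weight for v in experiment.variants)
    if total <= 0:
        return default

    # Hash experiment ID + user ID so the same user always lands in the
    # same bucket for a given experiment.
    key = f"{experiment.experiment_id}:{user_id}".encode()
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1]

    # Walk the cumulative weight ranges until the bucket falls inside one.
    cumulative = 0.0
    for variant in experiment.variants:
        cumulative += variant.weight / total
        if bucket < cumulative:
            return variant
    return experiment.variants[-1]  # guard against float rounding
```

Because the bucket depends only on the experiment ID and the user ID, repeated requests from the same user see the same variant without any stored assignment state.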
Benefits
Enables safe rollout of prompt and model changes.
Supports objective comparison of prompt effectiveness.
Allows experiments to run without affecting production stability.
Provides measurable insights for optimizing response quality, latency, and cost.
Testing
Verified deterministic variant assignment.
Tested traffic allocation across experiment groups.
Validated logging and metrics collection.
Confirmed fallback behavior when experiments are disabled.
Executed unit and integration tests successfully.
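As an illustration of how the determinism and traffic-allocation checks listed above might look, here is a small self-contained sketch. The helper names `bucket_for` and `check_allocation`, the experiment ID `exp-demo`, and the 3% tolerance are assumptions made for this example; the actual tests in the PR may be structured differently.

```python
import hashlib
from collections import Counter


def bucket_for(experiment_id: str, user_id: str, weights: dict) -> str:
    """Map a user to a variant name using a stable hash (hypothetical helper)."""
    if not weights:
        raise ValueError("weights must not be empty")
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).digest()
    # Scale the hash to the total weight so weights need not sum to 1.
    point = int.from_bytes(digest[:8], "big") / 2**64 * sum(weights.values())
    cumulative = 0.0
    for name, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return name
    return list(weights)[-1]  # guard against float rounding


def check_allocation(weights: dict, n_users: int = 10_000,
                     tolerance: float = 0.03) -> bool:
    """Return True if observed shares are within tolerance of the configured ones."""
    counts = Counter(
        bucket_for("exp-demo", f"user-{i}", weights) for i in range(n_users)
    )
    total = sum(weights.values())
    return all(
        abs(counts[name] / n_users - weight / total) <= tolerance
        for name, weight in weights.items()
    )
```

A determinism check then reduces to calling `bucket_for` twice with the same arguments and comparing the results, while `check_allocation` covers the split across groups on synthetic user IDs.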
Checklist
A/B testing framework implemented
Variant assignment logic added
Metrics collection integrated
Feature flag support included
Fallback mechanism implemented
Tests added and passing
Documentation updated

Closes #400

drips-wave Bot commented Jun 28, 2026

@Menjay7 Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

Comment thread astroml/tracking/ab_testing.py Fixed
gelluisaac (Contributor) commented

@Menjay7 please fix the merge conflicts.

@gelluisaac gelluisaac merged commit 922a7f2 into Traqora:main Jun 29, 2026
11 of 23 checks passed

Linked issue: [LLM] Build A/B testing framework for prompts and models

3 participants