[LLM] Build A/B testing framework for prompts and models by Menjay7 · Pull Request #430 · Traqora/astroml

Menjay7 · 2026-06-28T18:39:35Z

Description
Summary

Introduces an A/B testing framework for evaluating different LLM prompts and model configurations. This framework enables controlled experiments, measures performance across predefined metrics, and supports data-driven decisions when optimizing prompts and model selection.

Changes Made
Added A/B testing infrastructure for prompt and model experiments.
Implemented experiment configuration with support for multiple variants.
Added traffic splitting and variant assignment logic.
Integrated experiment metadata into request processing.
Recorded experiment IDs, variants, and model information in logs.
Added configurable success metrics (latency, cost, quality, user feedback, etc.).
Implemented experiment result collection and aggregation.
Added feature flags to enable or disable experiments.
Added safeguards for fallback to the default prompt/model when experiments are disabled or fail.
Added unit and integration tests for experiment assignment and metrics collection.
Updated documentation with setup and usage instructions.
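
The traffic splitting, variant assignment, feature flag, and fallback items above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Variant`, `DEFAULT`, `assign_variant`, and the prompt and model strings are hypothetical names chosen for the example, and hashing the experiment ID together with a unit ID is just one common way to get deterministic assignment.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record: a prompt/model pair plus a traffic weight."""
    name: str
    prompt: str
    model: str
    weight: float  # relative share of traffic


# Hypothetical default used when experiments are disabled or misconfigured.
DEFAULT = Variant("default", "You are a helpful assistant.", "baseline-model", 1.0)


def assign_variant(experiment_id, unit_id, variants, enabled=True):
    """Return the variant for unit_id, falling back to DEFAULT when needed."""
    if not enabled or not variants:
        return DEFAULT
    total = sum(v.weight for v in variants)
    if total <= 0:
        return DEFAULT
    # Hash experiment and unit together so the same unit always gets the
    # same variant within an experiment, independent of other experiments.
    key = f"{experiment_id}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the first 8 bytes to a roughly uniform value in [0, 1].
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for variant in variants:
        cumulative += variant.weight / total
        if point < cumulative:
            return variant
    return variants[-1]  # guard against floating-point rounding


variants = [
    Variant("A", "Answer concisely.", "model-a", 0.5),
    Variant("B", "Answer step by step.", "model-b", 0.5),
]
chosen = assign_variant("exp-1", "user-42", variants)
print(chosen.name, chosen.model)
```

Because the assignment depends only on the experiment ID and the unit ID, repeated requests from the same unit land in the same variant without any stored state, and turning the flag off sends every request to the default prompt and model.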
Benefits
Enables safe rollout of prompt and model changes.
Supports objective comparison of prompt effectiveness.
Allows experiments to run without affecting production stability.
Provides measurable insights for optimizing response quality, latency, and cost.
Testing
Verified deterministic variant assignment.
Tested traffic allocation across experiment groups.
Validated logging and metrics collection.
Confirmed fallback behavior when experiments are disabled.
Executed unit and integration tests successfully.
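
For the logging and metrics checks listed above, a minimal sketch of per-variant aggregation might look like the following. The record layout (`experiment_id`, `variant`, `model`, `metrics`), the metric names, and the `aggregate` helper are assumptions made for illustration; the PR's actual log schema is not shown on this page.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records):
    """Group logged records by variant and average each numeric metric."""
    values = defaultdict(lambda: defaultdict(list))
    counts = defaultdict(int)
    for record in records:
        variant = record["variant"]
        counts[variant] += 1
        for metric, value in record["metrics"].items():
            values[variant][metric].append(value)
    summary = {}
    for variant, count in counts.items():
        summary[variant] = {"count": count}
        for metric, vals in values[variant].items():
            summary[variant][f"mean_{metric}"] = mean(vals)
    return summary


# Hypothetical log records, one per request.
records = [
    {"experiment_id": "exp-1", "variant": "A", "model": "model-a",
     "metrics": {"latency_ms": 100.0, "cost_usd": 0.002}},
    {"experiment_id": "exp-1", "variant": "A", "model": "model-a",
     "metrics": {"latency_ms": 200.0, "cost_usd": 0.004}},
    {"experiment_id": "exp-1", "variant": "B", "model": "model-b",
     "metrics": {"latency_ms": 150.0, "cost_usd": 0.001}},
]

summary = aggregate(records)
print(summary["A"]["count"], summary["A"]["mean_latency_ms"])  # prints: 2 150.0
```

A comparison between variants would normally also need sample sizes large enough for the difference to be meaningful; the sketch only shows the grouping step.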
Checklist
A/B testing framework implemented
Variant assignment logic added
Metrics collection integrated
Feature flag support included
Fallback mechanism implemented
Tests added and passing
Documentation updated

Closes #400

drips-wave · 2026-06-28T18:39:46Z

@Menjay7 Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀


gelluisaac · 2026-06-29T11:31:05Z

@Menjay7 please fix conflicts

Merge branch 'main' into men

d3101d5

github-advanced-security AI found potential problems Jun 28, 2026


Comment thread astroml/tracking/ab_testing.py Fixed

Merge branch 'main' into men

106f395

gelluisaac merged commit 922a7f2 into Traqora:main Jun 29, 2026
11 of 23 checks passed

gelluisaac merged 2 commits into Traqora:main from Menjay7:men