

LLM Experiments #6343

Draft
robertomonteromiguel wants to merge 2 commits into main from robertomonteromiguel/llm_experiments


@robertomonteromiguel
Collaborator

Motivation

Test LLM Experiments to evaluate the prompts used in the system-tests repository.

Changes

Workflow

  1. ⚠️ Create your PR as draft ⚠️
  2. Work on your PR until the CI passes
  3. Mark it as ready for review
    • Test logic is modified? -> Get a review from RFC owner.
    • Framework is modified, or used in a non-obvious way? -> Get a review from the R&P team.

🚀 Once your PR is reviewed and the CI green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

  • Anything other than tests/ or manifests/ is modified? I have approval from the R&P team
  • A docker base image is modified?
    • the relevant build-XXX-image label is present
  • A scenario is added, removed or renamed?

@github-actions
Contributor

CODEOWNERS have been resolved as:

questions.csv                                                           @DataDog/system-tests-core
tests/llm_experiments/__init__.py                                       @DataDog/system-tests-core
tests/llm_experiments/evaluate_system_tests_prompts.py                  @DataDog/system-tests-core
requirements.txt                                                        @DataDog/system-tests-core

# ============================================================================
# STEP 1: Define Custom Evaluator
# ============================================================================
class SemanticSimilarityEvaluator(BaseEvaluator):


Why not use from ddtrace.llmobs._evaluators import SemanticSimilarityEvaluator?
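The body of the custom evaluator is not shown in this hunk. As a rough illustration of what a semantic-similarity check usually computes, here is a minimal sketch that scores a model answer against the expected answer by the cosine similarity of their embeddings. This is not the ddtrace evaluator's implementation: the embedding function is an assumption, and `toy_embed` (a word-count vector over a hypothetical fixed vocabulary) only stands in for a real embedding model.

```python
import math
from typing import Callable, Sequence

# Assumed signature of an embedding function: text -> vector of floats.
Embedder = Callable[[str], Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_similarity_score(output: str, expected: str, embed: Embedder) -> float:
    """Hypothetical helper: compare a model answer to the expected answer in embedding space."""
    return cosine_similarity(embed(output), embed(expected))


# Toy stand-in for a real embedding model (hypothetical vocabulary, illustration only).
VOCAB = ["tracer", "agent", "scenario", "weblog"]


def toy_embed(text: str) -> list[int]:
    """Count how often each vocabulary word appears in the text."""
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]
```

With the toy embedder, `semantic_similarity_score("the agent", "agent", toy_embed)` is 1.0 and `semantic_similarity_score("tracer", "agent", toy_embed)` is 0.0. A real embedding model would also give partial credit to paraphrases, which is the point of using a semantic rather than exact-match evaluator.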


# Load test dataset from CSV
# Expected format: question,category,answer
dataset = LLMObs.create_dataset_from_csv(


You should probably try to pull the dataset before creating it, so the script can be re-run without failing or creating duplicates.

