Skip to content

fix: VES ratio always 1.0 due to gold_sql used for both operands in iterated_execution #42

Open
nimaboubanian wants to merge 1 commit into premAI-io:main from nimaboubanian:fix/ves-ratio-predicted-sql

nimaboubanian commented Apr 18, 2026

What's wrong

In premsql/executors/base.py, the iterated_execution method computes the efficiency ratio for VES (Valid Efficiency Score) like this:

diff_list = [
    self.execute_sql(sql=gold_sql, ...)['execution_time']
    / self.execute_sql(sql=gold_sql, ...)['execution_time']  # ← same query both sides
    for _ in range(num_iterations)
]

Both the numerator and the denominator call execute_sql with gold_sql, so every ratio is gold_time / gold_time, which is 1.0 up to run-to-run timing noise, no matter how fast or slow the predicted query actually is. The predicted query is never timed at all. This makes VES a flat constant (roughly 100.0 for every matching prediction) that carries no information about efficiency.
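To make the ratio argument concrete, here is a toy illustration that swaps real query execution for stubbed, deterministic timings. The `TIMINGS` table and the `execution_time`, `ratio_buggy` and `ratio_fixed` helpers are hypothetical and not part of premsql; real timings would also carry run-to-run noise.

```python
# Hypothetical, deterministic timings (seconds) standing in for
# execute_sql(...)['execution_time']; not real premsql measurements.
TIMINGS = {
    "gold": 0.5,
    "slow_pred": 2.0,    # predicted query 4x slower than gold
    "fast_pred": 0.125,  # predicted query 4x faster than gold
}


def execution_time(sql: str) -> float:
    """Stub for execute_sql(...)['execution_time']: look up a fixed timing."""
    return TIMINGS[sql]


def ratio_buggy(gold_sql: str, predicted_sql: str) -> float:
    # Original expression: gold_sql on both sides, so predicted_sql is never used.
    return execution_time(gold_sql) / execution_time(gold_sql)


def ratio_fixed(gold_sql: str, predicted_sql: str) -> float:
    # Corrected expression: gold time divided by predicted time.
    return execution_time(gold_sql) / execution_time(predicted_sql)


for pred in ("slow_pred", "fast_pred"):
    print(pred, ratio_buggy("gold", pred), ratio_fixed("gold", pred))
# slow_pred 1.0 0.25
# fast_pred 1.0 4.0
```

The buggy expression returns the same value for both predictions, while the fixed one separates the slow query from the fast one.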

Fix

Change the denominator to use predicted_sql instead of gold_sql:

diff_list = [
    self.execute_sql(sql=gold_sql, ...)['execution_time']
    / self.execute_sql(sql=predicted_sql, ...)['execution_time']  # ← correct
    for _ in range(num_iterations)
]
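For readers who want to try the corrected ratio end to end, below is a minimal standalone sketch against an in-memory SQLite database. It is not premsql's implementation: the `execute_sql` and `iterated_execution` functions here are simplified stand-ins, the order-insensitive set comparison used as the "match" check is an assumption, and any outlier filtering the real code may apply before averaging is omitted.

```python
import sqlite3
import time
from statistics import mean


def execute_sql(conn: sqlite3.Connection, sql: str) -> dict:
    """Run `sql` once; return its rows and wall-clock execution time in seconds."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return {"result": rows, "execution_time": time.perf_counter() - start}


def iterated_execution(conn: sqlite3.Connection, predicted_sql: str,
                       gold_sql: str, num_iterations: int = 10) -> float:
    """Return mean(gold_time / predicted_time), or 0.0 if the result sets differ."""
    # Assumed match check: only predictions with the same rows get a ratio.
    predicted_rows = set(execute_sql(conn, predicted_sql)["result"])
    gold_rows = set(execute_sql(conn, gold_sql)["result"])
    if predicted_rows != gold_rows:
        return 0.0
    ratios = [
        execute_sql(conn, gold_sql)["execution_time"]
        / execute_sql(conn, predicted_sql)["execution_time"]  # predicted_sql, not gold_sql
        for _ in range(num_iterations)
    ]
    return mean(ratios)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10_000)])
gold = "SELECT x FROM t WHERE x < 100"
print(iterated_execution(conn, "SELECT x FROM t WHERE x < 100 ORDER BY x DESC", gold))
print(iterated_execution(conn, "SELECT x FROM t WHERE x < 50", gold))  # 0.0: results differ
```

The first call prints a timing-dependent ratio (it varies between runs); the second prints 0.0 because the predicted query returns different rows.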

The ratio gold_time / predicted_time correctly reflects efficiency: a value > 1.0 means the predicted query is faster than the gold standard, < 1.0 means it's slower. The downstream mean(sqrt(ratio)) × 100 aggregation then produces a meaningful score.
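The downstream aggregation can be sketched as below, assuming each example contributes its ratio when the prediction matches and 0.0 when it does not. `ves` is a hypothetical helper for illustration, not a premsql function.

```python
import math


def ves(ratios: list[float]) -> float:
    """100 * mean(sqrt(ratio)); a ratio of 0.0 stands for a non-matching prediction."""
    return 100 * sum(math.sqrt(r) for r in ratios) / len(ratios)


print(ves([4.0, 1.0, 0.25, 0.0]))  # (2 + 1 + 0.5 + 0) / 4 * 100 = 87.5
print(ves([1.0, 1.0, 1.0]))        # what the buggy code yields for three matches: 100.0
```

The square root dampens large speedups: a query that runs 4x faster than the gold query contributes 200 for that example rather than 400.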

A short inline comment has been added to the fixed line to document the original bug for future readers.

Impact

Without this fix, iterated_execution silently returns a ratio of about 1.0 for every matching prediction. Any VES numbers in benchmarks or leaderboards produced with the existing code are therefore not measuring execution efficiency at all.

No API changes, no new dependencies.

The denominator mistakenly used gold_sql, the same query as the
numerator, so the ratio was effectively always 1.0. Changed it to
predicted_sql so the efficiency score actually reflects how fast
the predicted query runs relative to the gold standard.