fix(grpo): fix list/list division TypeError in partial parse reward calculation #13

Open
isaacbmiller wants to merge 1 commit into main from fix/bootstrap-trace-list-division

@isaacbmiller

When computing the format reward for partially parsed outputs during GRPO
bootstrapping, the code divides the two lists themselves (`present / expected`)
instead of their lengths. Python does not define `/` for lists, so this raises
`TypeError: unsupported operand type(s) for /: 'list' and 'list'`.
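A minimal reproduction of the failure. The field names `reasoning` and `answer` are made-up placeholders, not taken from the patched code:

```python
# Hypothetical field names: what was parsed vs. what the signature expects.
present = ["reasoning"]
expected = ["reasoning", "answer"]

error_message = None
try:
    ratio = present / expected  # buggy expression: divides list by list
except TypeError as exc:
    error_message = str(exc)

print(error_message)
# unsupported operand type(s) for /: 'list' and 'list'
```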

The intent is to compute the fraction of expected output fields that were
successfully parsed, so `len(present) / len(expected)` is the correct
expression.
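A sketch of the corrected calculation. `partial_parse_fraction` is a hypothetical helper name, and both arguments are assumed to be lists of output field names:

```python
from typing import Sequence


def partial_parse_fraction(present: Sequence[str], expected: Sequence[str]) -> float:
    """Return the fraction of expected output fields that were parsed.

    Hypothetical helper illustrating the fix: divide the lengths,
    not the lists themselves.
    """
    return len(present) / len(expected)


print(partial_parse_fraction(["reasoning"], ["reasoning", "answer"]))  # 0.5
```

Like the one-line fix, this assumes `expected` is non-empty. An empty list would raise `ZeroDivisionError`.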

