Add evaluation pre-registration subsection (Unit 1 hands-on) #696

Open
sk8ordie84 wants to merge 1 commit into huggingface:main from sk8ordie84:add-prml-preregistration-section

@sk8ordie84

What

Adds a ~250-word subsection to the Unit 1 hands-on, between the evaluate_policy block and the "Publish our trained model on the Hub" section, introducing the concept of pre-registering an evaluation claim by committing it to a cryptographic hash before the results are known.

Why

The course is excellent at teaching RL agents and decent at warning about benchmark pitfalls. It does not currently address the reporting-side problem: when self-learners post their scores in blog posts or notebooks, they often decide what counts as "success" after the training run — a falsifiability issue that has nothing to do with RL specifically but applies to every score reported in this course.

A short subsection that names the problem and points to one open spec (PRML, CC BY 4.0) for solving it gives readers a working mental model and a tool, without taking sides on whether PRML is the spec to use long-term.
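For reviewers who want the idea in concrete form, here is a minimal sketch of hash-based pre-registration using only the Python standard library. It is not the PRML format and does not use the falsify CLI; the function names, the claim fields, and the example values (environment id, threshold, observed score) are all assumptions made up for illustration.

```python
import hashlib
import json


def commit_claim(claim: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON encoding of the claim."""
    # Sorted keys and fixed separators make the encoding independent of dict order.
    canonical = json.dumps(claim, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_claim(claim: dict, committed_digest: str, observed: float) -> bool:
    """Confirm the claim is unchanged since commitment, then test it against the score."""
    if commit_claim(claim) != committed_digest:
        raise ValueError("claim was modified after it was committed")
    return observed >= claim["threshold"]


# Written down BEFORE training and evaluation (hypothetical values).
claim = {
    "env_id": "LunarLander-v2",
    "metric": "mean_reward",
    "threshold": 200.0,
    "n_eval_episodes": 10,
}
digest = commit_claim(claim)  # publish or timestamp this digest before the run

# After evaluation, e.g. with the mean reward reported by evaluate_policy
# (215.3 is a placeholder, not a measured result).
passed = verify_claim(claim, digest, observed=215.3)
```

The point of the design is that the digest can be shared before the run without committing to anything else, and any later edit to the success criterion changes the digest and is therefore detectable.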

What this PR is not

  • Not an endorsement of PRML over alternatives. Text says "one open spec", not "the spec".
  • Not a dependency change. pip install falsify is shown as optional in a code block, not added to course requirements.
  • Not a syllabus reordering. The addition is a single subsection appended to an existing evaluation section.

Disclosure

I am the author of PRML and a co-founder of Studio 11 (Turkey), which maintains the falsify reference implementations. PRML is released under CC BY 4.0 with all code under MIT and a published patent non-assertion grant. I have no commercial product upstream of the spec.

If the maintainers prefer to point readers at a different pre-registration spec, or to introduce the concept without naming any specific spec, I'm happy to revise. The goal is the concept landing in the curriculum, not the spec name landing.

Where the addition lives

units/en/unit1/hands-on.mdx — new subsection "Pre-registering your evaluation claim (optional)" inserted between the evaluate_policy mean reward note and "Publish our trained model on the Hub".

Spec & related work

Adds a ~250-word subsection to Unit 1 hands-on, between the
evaluate_policy block and the 'Publish to Hub' section,
introducing the concept of pre-registering an evaluation claim
to a cryptographic hash.

The subsection is marked optional, names PRML (CC BY 4.0) as one
open spec for the concept rather than the spec, and points
readers at the linked CLI without modifying any course
dependencies.

Disclosure: the author of PRML is the author of this PR.
PRML is published under CC BY 4.0; reference implementations are
MIT; a patent non-assertion grant is published with the spec.
There is no commercial product upstream of the spec.

If maintainers prefer to introduce pre-registration as a concept
without naming a specific spec, I am happy to revise.

Spec: https://spec.falsify.dev/v0.1
Repo: https://github.com/studio-11-co/falsify
@sk8ordie84

Thanks for the great course — Unit 1 is what got me thinking about this.

Would love your eyes on this when you have a moment, @ThomasSimonini. The addition is a single optional subsection in unit1/hands-on.mdx, with no dependency changes. Happy to revise framing or scope if you'd prefer the concept introduced without naming a specific spec.

If a different placement works better (e.g. unitbonus, or a course-end appendix), I can move it.
