Add evaluation pre-registration subsection (Unit 1 hands-on)#696
Open
sk8ordie84 wants to merge 1 commit into
Adds a ~250-word subsection to Unit 1 hands-on, between the evaluate_policy block and the 'Publish to Hub' section, introducing the concept of pre-registering an evaluation claim to a cryptographic hash. The subsection is marked optional, names PRML (CC BY 4.0) as one open spec for the concept rather than the spec, and points readers at the linked CLI without modifying any course dependencies. Disclosure: the author of PRML is the author of this PR. PRML is published under CC BY 4.0; reference implementations are MIT; a patent non-assertion grant is published with the spec. There is no commercial product upstream of the spec. If maintainers prefer to introduce pre-registration as a concept without naming a specific spec, I am happy to revise. Spec: https://spec.falsify.dev/v0.1 Repo: https://github.com/studio-11-co/falsify
Author
Thanks for the great course; Unit 1 is what got me thinking about this. Would love your eyes on this when you have a moment, @ThomasSimonini. The addition is one optional subsection in unit1/hands-on.mdx, marked optional, no dependency changes. Happy to revise framing or scope if you'd prefer the concept introduced without naming a specific spec. If a different placement works better (e.g. unitbonus, or a course-end appendix), I can move it.
What
Adds a ~250-word subsection to Unit 1 hands-on, between the evaluate_policy block and the "Publish to Hub" section, introducing the concept of pre-registering an evaluation claim to a cryptographic hash.
Why
The course is excellent at teaching RL agents and decent at warning about benchmark pitfalls. It does not currently address the reporting-side problem: when self-learners post their scores in blog posts or notebooks, they often decide what counts as "success" after the training run — a falsifiability issue that has nothing to do with RL specifically but applies to every score reported in this course.
A short subsection that names the problem and points to one open spec (PRML, CC BY 4.0) for solving it gives readers a working mental model and a tool, without taking sides on whether PRML is the spec to use long-term.
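As a rough illustration of the concept only (this is not the PRML format or the falsify CLI; the field names, environment id, and threshold below are hypothetical), committing a claim to a hash before evaluation can be sketched with the Python standard library. The claim is serialized canonically, hashed with SHA-256, and the digest is published before the run; afterwards the result is checked against the claim that matches that digest.

```python
import hashlib
import json


def commit_claim(claim: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON encoding of the claim."""
    # sort_keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(claim, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_claim(claim: dict, digest: str, mean_reward: float, std_reward: float) -> bool:
    """Verify the claim is the pre-registered one, then test the result against it."""
    if commit_claim(claim) != digest:
        raise ValueError("claim does not match the pre-registered digest")
    # Hypothetical success criterion: mean minus one standard deviation.
    return mean_reward - std_reward >= claim["threshold"]


# Written down BEFORE training or evaluation (all values are illustrative).
claim = {
    "env_id": "LunarLander-v2",
    "metric": "mean_reward - std_reward",
    "threshold": 200.0,
    "n_eval_episodes": 10,
    "seed": 0,
}
digest = commit_claim(claim)
print("publish this digest before evaluating:", digest)

# Later, mean_reward and std_reward would come from the evaluation step;
# the numbers here are placeholders, not real results.
print("claim met:", check_claim(claim, digest, mean_reward=250.0, std_reward=20.0))
```

Because the digest is published first, changing the threshold or the metric after seeing the score produces a different hash, which is what makes the claim falsifiable.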
What this PR is not
pip install falsify is shown as optional in a code block, not added to course requirements.
Disclosure
I am the author of PRML and a co-founder of Studio 11 (Turkey), which maintains the falsify reference implementations. PRML is released under CC BY 4.0 with all code under MIT and a published patent non-assertion grant. I have no commercial product upstream of the spec.
If the maintainers prefer to point readers at a different pre-registration spec, or to introduce the concept without naming any specific spec, I'm happy to revise. The goal is the concept landing in the curriculum, not the spec name landing.
Where the addition lives
units/en/unit1/hands-on.mdx: new subsection "Pre-registering your evaluation claim (optional)" inserted between the evaluate_policy mean reward note and "Publish our trained model on the Hub".
Spec & related work