Benchmarks #3
Open
Hey! I'm a student at Berkeley and I've been playing around with MathCode for a course project. Couple of questions:
- Do you have any benchmark numbers? Like pass rates on miniF2F or MATH or anything similar? Curious how formalization and proving compare.
- Does it do better on straightforward textbook proofs vs harder competition-style stuff?
- When proving fails, what's usually the bottleneck: missing lemmas, a bad strategy, or type errors?
I'm going to run it on some Erdős Problems from my class and can share what I find. Also put up a PR on the AUTOLEAN repo (T3S1AMAX/autolean#1) if you want to take a look.