[Benchmark] Add some smaller benchmarks

All end-to-end evaluations in the Quartz paper need to run for 24 hours.

To make it practical to measure the effect of future commits, we should add some smaller benchmarks that run in a few minutes.