Unsupervised Evaluation

Section 1

Aggregate evaluation is useful, but it can hide the fact that some instances are consistently harder than others.
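
As a minimal sketch of this point, the snippet below compares one aggregate number against a per-instance breakdown. The run names, the scores, and the helper `per_instance_mean` are all hypothetical and for illustration only. It assumes that each run yields one score per instance (higher is better) and that instances are aligned by index across runs.

```python
from statistics import mean


def per_instance_mean(scores_by_run):
    """Average each instance's score across runs (instances aligned by index)."""
    runs = list(scores_by_run.values())
    n_instances = len(runs[0])
    return [mean(run[i] for run in runs) for i in range(n_instances)]


# Hypothetical scores: 3 runs over the same 5 instances.
scores_by_run = {
    "run_a": [1.0, 1.0, 0.0, 1.0, 0.0],
    "run_b": [1.0, 1.0, 1.0, 1.0, 0.0],
    "run_c": [1.0, 0.0, 1.0, 1.0, 0.0],
}

# Aggregate view: one number per run, then averaged over runs.
aggregate = mean(mean(run) for run in scores_by_run.values())

# Per-instance view: which instances score low in every run?
instance_means = per_instance_mean(scores_by_run)
hardest = min(range(len(instance_means)), key=instance_means.__getitem__)

print(f"aggregate score: {aggregate:.3f}")
print(f"per-instance mean scores: {instance_means}")
print(f"hardest instance index: {hardest}")
```

In this toy data the aggregate score looks moderate, while the last instance scores zero in every run. A single averaged number does not show that pattern.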

Focus

Main Notebook