Guide

This page gives direct access to the workshop guide and summarizes the main repository within the site navigation.

Repository Overview

This project contains the full material for a hands-on workshop on latent-ability-aware evaluation in machine learning.

Main Message

Aggregate metrics are useful but incomplete.
They usually treat all instances as if they were equally difficult.
Latent-variable models help us separate model ability from item difficulty.
This leads naturally to richer analyses such as Beta4-IRT and CLAIRE.
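
To make the point about aggregate metrics concrete, here is a minimal sketch using a small, made-up response matrix. The data, the model names, and the helper functions are hypothetical and are not taken from the workshop material; the per-instance solve rate is only a crude stand-in for difficulty, not an IRT estimate.

```python
# Hypothetical response matrix: one row per model, one column per instance.
# 1 = correct prediction, 0 = incorrect prediction.
responses = {
    "model_a": [1, 1, 1, 1, 0, 0],
    "model_b": [1, 1, 0, 0, 1, 1],
    "model_c": [1, 1, 1, 1, 0, 0],
    "model_d": [1, 0, 1, 1, 0, 0],
}


def accuracy(row):
    """Plain accuracy: every instance counts the same."""
    return sum(row) / len(row)


def solve_rates(matrix):
    """Fraction of models that answer each instance correctly.

    A low solve rate is used here as a rough proxy for a hard instance.
    """
    rows = list(matrix.values())
    return [sum(column) / len(rows) for column in zip(*rows)]


acc = {name: accuracy(row) for name, row in responses.items()}
rates = solve_rates(responses)

# Instances that fewer than half of the models solve (illustrative threshold).
hard_items = [i for i, rate in enumerate(rates) if rate < 0.5]

# Number of hard instances each model gets right.
hard_hits = {
    name: sum(row[i] for i in hard_items) for name, row in responses.items()
}
```

In this toy data, `model_a` and `model_b` have identical accuracy, yet only `model_b` solves the two instances that most models miss. Accuracy alone cannot show that difference, which is the gap latent-variable models are meant to address.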

Main Sections

Unsupervised evaluation and the limitation of weighting all instances equally.
Binary IRT, with emphasis on 1PL intuition, 2PL-IRT, and ICC interpretation.
Beta4-IRT and CLAIRE as a latent-ability-aware framework for model evaluation.
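
As a quick orientation for the Binary IRT section, the sketch below writes out the standard 2PL item characteristic curve (ICC) in its textbook form. The symbols `theta` (ability), `a` (discrimination), and `b` (difficulty) follow common IRT notation and are assumptions here; the workshop material may use different names or a different parameterization.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model.

    theta: ability of the respondent (here, a model being evaluated)
    a:     discrimination of the item (slope of the curve at theta == b)
    b:     difficulty of the item (ability at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def icc_1pl(theta, b):
    """1PL special case: every item shares the same discrimination (a = 1)."""
    return icc_2pl(theta, a=1.0, b=b)


# Evaluate one illustrative item over a small grid of abilities.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
curve = [icc_2pl(theta, a=1.5, b=0.0) for theta in abilities]
```

Reading the curve: the probability of a correct response rises with ability, equals 0.5 where `theta` matches `b`, and rises more steeply around that point as `a` grows.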

Full Guide

Open full guide on GitHub