MMLU Benchmark
MMLU (Measuring Massive Multitask Language Understanding) is a multiple-choice benchmark spanning 57 subjects from STEM and international law to nutrition and religion. It contains 15,908 multiple-choice questions (four options each), of which 1,540 are reserved for hyperparameter tuning. Per its overview, "It was one of the most commonly used benchmarks for comparing the capabilities of large language models, with over 100 million downloads as of July 2024."