HLE Human Question Reasoning Benchmark Dataset
HLE stands for Humanity's Last Exam, a multimodal benchmark dataset of human-written questions, jointly released by the Center for AI Safety and Scale AI in 2025. The accompanying paper is "Humanity's Last Exam". The benchmark aims to provide the ultimate closed-ended evaluation covering the frontiers of human knowledge.
The dataset contains 2,500 questions covering dozens of subjects such as mathematics, humanities, and natural sciences, including multiple-choice questions and short-answer questions suitable for automatic grading.
Subject distribution:
- Mathematics (41%): Abstract problems such as advanced mathematics, probability theory, and algorithm design.
- Computer Science/Artificial Intelligence (10%): Machine learning theory, computational complexity, natural language processing.
- Natural Sciences (27%): Physics (9%), Chemistry (7%), Biology/Medicine (11%), involving quantum physics, organic synthesis, pathological mechanisms, etc.
- Humanities/Social Sciences (9%): Critical analysis questions in philosophy, history, economics, and sociology.
- Engineering (4%) and other disciplines (9%): Covers engineering design, art history, and interdisciplinary cutting-edge issues.
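
The percentages above can be turned into rough per-subject question counts. The following is a minimal sketch, assuming the shares are rounded and apply to the full set of 2,500 questions; the resulting counts are therefore approximations, not official figures, and the names `SUBJECT_SHARE_PERCENT` and `approximate_counts` are illustrative only.

```python
# Approximate per-subject question counts derived from the rounded
# percentages listed above (assumption: shares apply to all 2,500 questions).
TOTAL_QUESTIONS = 2500

SUBJECT_SHARE_PERCENT = {
    "Mathematics": 41,
    "Computer Science/Artificial Intelligence": 10,
    "Physics": 9,
    "Chemistry": 7,
    "Biology/Medicine": 11,
    "Humanities/Social Sciences": 9,
    "Engineering": 4,
    "Other": 9,
}


def approximate_counts(total, shares):
    """Convert percentage shares into approximate question counts."""
    return {subject: round(total * pct / 100) for subject, pct in shares.items()}


# Sanity check: the listed shares should account for the whole dataset.
assert sum(SUBJECT_SHARE_PERCENT.values()) == 100

counts = approximate_counts(TOTAL_QUESTIONS, SUBJECT_SHARE_PERCENT)

# Natural Sciences is the sum of its three listed sub-disciplines.
natural_sciences = (
    counts["Physics"] + counts["Chemistry"] + counts["Biology/Medicine"]
)

for subject, count in counts.items():
    print(f"{subject}: ~{count} questions")
print(f"Natural Sciences (combined): ~{natural_sciences} questions")
```

Under these assumptions, Mathematics alone accounts for roughly 1,025 questions, and the three natural science sub-disciplines together for about 675.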

Discipline Distribution