SSTQA Semi-structured Tabular Question Answering Dataset

SSTQA is a benchmark dataset for semi-structured table question answering, released in 2025 by Shanghai Jiao Tong University, Simon Fraser University, Tsinghua University, and other institutions. The accompanying paper is "ST-Raptor: LLM-Powered Semi-Structured Table Question Answering". The benchmark tests how well large language models and table question answering systems understand and answer questions about real-world tables with complex layouts, such as merged cells, hierarchical headers, and multi-level nesting.

The dataset contains 102 complex, real-world tables and 764 corresponding questions, covering 19 representative real-world application scenarios. The tables feature nested cells, multi-level headers, and irregular layouts, reflecting the structural complexity of tables encountered in practice. Question-answer pairs are constructed through a combination of automatic generation and manual review, and are categorized into three difficulty levels: easy, medium, and hard. The questions range from direct retrieval to complex reasoning.
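
Because the questions are split into easy, medium, and hard levels, results are naturally reported per level. The following is a minimal sketch of such a breakdown, not the benchmark's official evaluation: the record keys `difficulty`, `gold`, and `pred`, the toy records, and the normalized exact-match scoring are all assumptions made for illustration, and the actual SSTQA file schema and metric may differ.

```python
from collections import defaultdict


def normalize(answer):
    """Lowercase, trim, and collapse whitespace so trivial formatting
    differences do not count as errors (an assumed matching rule)."""
    return " ".join(str(answer).strip().lower().split())


def accuracy_by_difficulty(records):
    """Return exact-match accuracy per difficulty level.

    `records` is an iterable of dicts with the hypothetical keys
    'difficulty', 'gold', and 'pred'.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        level = record["difficulty"]
        totals[level] += 1
        if normalize(record["pred"]) == normalize(record["gold"]):
            correct[level] += 1
    return {level: correct[level] / totals[level] for level in totals}


# Toy records for illustration only; these are not taken from SSTQA.
records = [
    {"difficulty": "easy", "gold": "42", "pred": " 42 "},
    {"difficulty": "easy", "gold": "Sales", "pred": "sales"},
    {"difficulty": "medium", "gold": "3.5", "pred": "3.6"},
    {"difficulty": "hard", "gold": "Q3 2024", "pred": "q3  2024"},
    {"difficulty": "hard", "gold": "yes", "pred": "no"},
]

scores = accuracy_by_difficulty(records)
print(scores)
```

Exact match is the simplest choice here; free-form answers from a language model may call for a more lenient comparison.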

SSTQA addresses shortcomings of existing semi-structured table datasets, namely their small scale, simple structure, and disconnect from real applications. It offers structurally complex tables, a wide range of scenarios, clearly defined difficulty levels, and high-quality annotations. It is suitable for training and evaluating large multimodal models and table question answering systems, and serves as a benchmark for advancing table understanding and its applications.