Date: 5 months ago

Organization:

Paper URL: 2508.18190

Tags: LLM, Benchmarks, Multimodal Representation, Document Understanding, Visual Question Answering

SSTQA is a benchmark dataset for semi-structured table question answering, released in 2025 by Shanghai Jiao Tong University, Simon Fraser University, Tsinghua University, and other institutions. The accompanying paper is "ST-Raptor: LLM-Powered Semi-Structured Table Question Answering". The benchmark tests how well large language models and table question answering systems understand and answer questions about real tables with complex layouts (such as merged cells, hierarchical headers, and multi-level nesting).
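To make the layout features mentioned above concrete, the following is a minimal sketch of how a table with merged cells and a two-level header could be represented and flattened. The `Cell` class, the `expand_merged` and `header_paths` helpers, and the toy table are all hypothetical illustrations; they are not the SSTQA file format or the method used by ST-Raptor.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """A hypothetical cell record: position, text, and how far it spans."""
    row: int
    col: int
    text: str
    rowspan: int = 1
    colspan: int = 1


def expand_merged(cells, n_rows, n_cols):
    """Copy each cell's text into every grid position its span covers."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for c in cells:
        for r in range(c.row, c.row + c.rowspan):
            for k in range(c.col, c.col + c.colspan):
                grid[r][k] = c.text
    return grid


def header_paths(grid, header_rows):
    """Join the header labels above each column into one path string."""
    paths = []
    for col in range(len(grid[0])):
        parts = []
        for r in range(header_rows):
            label = grid[r][col]
            # Skip empty slots and labels repeated by a vertical merge.
            if label is not None and (not parts or parts[-1] != label):
                parts.append(label)
        paths.append(" / ".join(parts))
    return paths


# Toy table (invented for illustration): "Department" spans two header rows,
# "Headcount" spans two columns with sub-headers beneath it.
cells = [
    Cell(0, 0, "Department", rowspan=2),
    Cell(0, 1, "Headcount", colspan=2),
    Cell(1, 1, "Full-time"),
    Cell(1, 2, "Part-time"),
    Cell(2, 0, "Sales"),
    Cell(2, 1, "12"),
    Cell(2, 2, "3"),
]

grid = expand_merged(cells, n_rows=3, n_cols=3)
paths = header_paths(grid, header_rows=2)
print(paths)
```

Flattening spans this way is only one possible design choice. It makes column lookup simple but discards the information about which cells were originally merged, which some questions may depend on.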

The dataset contains 102 complex, real-world tables and 764 corresponding questions, covering 19 representative real-world application scenarios. Table features include nested cells, multi-level headers, and irregular layouts, reflecting the structural complexity of tables found in practice. Question-answer pairs are constructed through a combination of automatic generation and manual review and are categorized into three difficulty levels: easy, medium, and hard. The questions range from direct retrieval to complex reasoning.
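Because the questions are split into easy, medium, and hard levels, results are naturally reported per level. The sketch below shows one way this could be computed. The record fields (`difficulty`, `answer`, `prediction`), the case-insensitive exact-match rule, and the sample records are assumptions made for illustration; the dataset's actual schema and the paper's scoring metric may differ.

```python
from collections import defaultdict


def accuracy_by_difficulty(records):
    """Return exact-match accuracy for each difficulty level present."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        level = rec["difficulty"]
        totals[level] += 1
        # Assumed scoring rule: case-insensitive exact match after trimming.
        if rec["prediction"].strip().lower() == rec["answer"].strip().lower():
            correct[level] += 1
    return {level: correct[level] / totals[level] for level in totals}


# Invented toy records, not taken from SSTQA.
records = [
    {"difficulty": "easy", "answer": "12", "prediction": "12"},
    {"difficulty": "easy", "answer": "Sales", "prediction": " sales "},
    {"difficulty": "medium", "answer": "15", "prediction": "15"},
    {"difficulty": "medium", "answer": "2021", "prediction": "2020"},
    {"difficulty": "hard", "answer": "3.5%", "prediction": "4%"},
]

scores = accuracy_by_difficulty(records)
print(scores)
```

Reporting scores per level rather than as a single average helps show whether a system handles direct retrieval but struggles with multi-step reasoning.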

The dataset addresses shortcomings of existing semi-structured table datasets, such as small scale, simple structure, and a disconnect from real applications. It offers complex structures, a wide range of scenarios, clear difficulty levels, and high-quality annotations. It is suitable for training and evaluating large multimodal models and table question answering systems, and serves as a benchmark for advancing table understanding and its applications.

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at support@hyper.ai for prompt review and removal.