
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models

Mahir Labib Dihan, Md Tanvir Hassan, Md Tanvir Parvez, Md Hasebul Hasan, Md Almash Alam, Muhammad Aamir Cheema, Mohammed Eunus Ali, Md Rizwan Parvez


Abstract

Recent advancements in foundation models have enhanced AI systems' capabilities in autonomous tool usage and reasoning. However, their ability in location- or map-based reasoning - which improves daily life by optimizing navigation, facilitating resource discovery, and streamlining logistics - has not been systematically studied. To bridge this gap, we introduce MapEval, a benchmark designed to assess diverse and complex map-based user queries with geo-spatial reasoning. MapEval features three task types (textual, API-based, and visual) that require collecting world information via map tools, processing heterogeneous geo-spatial contexts (e.g., named entities, travel distances, user reviews or ratings, images), and compositional reasoning, all of which state-of-the-art foundation models find challenging. Comprising 700 unique multiple-choice questions about locations across 180 cities and 54 countries, MapEval evaluates foundation models' ability to handle spatial relationships, map infographics, travel planning, and navigation challenges. Using MapEval, we conducted a comprehensive evaluation of 28 prominent foundation models. While no single model excelled across all tasks, Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro achieved competitive performance overall. However, substantial performance gaps emerged, particularly in MapEval, where agents with Claude-3.5-Sonnet outperformed GPT-4o and Gemini-1.5-Pro by 16% and 21%, respectively, and the gaps became even more amplified when compared to open-source LLMs. Our detailed analyses provide insights into the strengths and weaknesses of current models, though all models still fall short of human performance by more than 20% on average, struggling with complex map images and rigorous geo-spatial reasoning. This gap highlights MapEval's critical role in advancing general-purpose foundation models with stronger geo-spatial understanding.

Benchmarks

Benchmark                                      Methodology                  Metrics
question-answering-on-mapeval-api-1            GPT-3.5-Turbo (Chameleon)    Accuracy (%): 49.33
question-answering-on-mapeval-api-1            Claude-3.5-Sonnet (ReAct)    Accuracy (%): 64.00
question-answering-on-mapeval-textual          Claude-3.5-Sonnet            Accuracy (%): 66.33
visual-question-answering-on-mapeval-visual    Claude-3.5-Sonnet            Accuracy (%): 61.65
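The Accuracy (%) metric above can be sketched as a simple exact-match score over MapEval's multiple-choice questions. This is a minimal illustration, not the paper's evaluation harness; the option-letter format and variable names here are assumptions for the example.

```python
# Hedged sketch: exact-match accuracy over multiple-choice predictions,
# reported as a percentage as in the benchmark table above.
def accuracy_percent(predictions, gold):
    """Percentage of questions where the predicted option matches the gold option."""
    assert len(predictions) == len(gold), "one prediction per question"
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)

# Toy example (illustrative answers, not real MapEval data): 3 of 4 correct.
preds = ["B", "C", "A", "D"]
golds = ["B", "C", "A", "A"]
print(f"Accuracy (%): {accuracy_percent(preds, golds):.2f}")  # Accuracy (%): 75.00
```

In practice the leaderboard rows above would each correspond to one model's predictions scored this way over the full question set of the respective MapEval split.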
