Are NLP Models really able to Solve Simple Math Word Problems?

Arkil Patel Satwik Bhattamishra Navin Goyal

Abstract

The problem of designing NLP solvers for math word problems (MWP) has seen sustained research activity and steady gains in the test accuracy. Since existing solvers achieve high performance on the benchmark datasets for elementary level MWPs containing one-unknown arithmetic word problems, such problems are often considered "solved" with the bulk of research attention moving to more complex MWPs. In this paper, we restrict our attention to English MWPs taught in grades four and lower. We provide strong evidence that the existing MWP solvers rely on shallow heuristics to achieve high performance on the benchmark datasets. To this end, we show that MWP solvers that do not have access to the question asked in the MWP can still solve a large fraction of MWPs. Similarly, models that treat MWPs as bag-of-words can also achieve surprisingly high accuracy. Further, we introduce a challenge dataset, SVAMP, created by applying carefully chosen variations over examples sampled from existing datasets. The best accuracy achieved by state-of-the-art models is substantially lower on SVAMP, thus showing that much remains to be done even for the simplest of the MWPs.
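The two input ablations described in the abstract (hiding the question, and treating the problem as a bag of words) can be illustrated with a short sketch. This is not the authors' code: the function names, the naive sentence splitter, the assumption that the question is always the final sentence, and the example problem are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break on whitespace that follows '.', '?' or '!'.
    # Decimals such as "3.5" are left intact because no whitespace follows the dot.
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s.strip()]


def remove_question(problem):
    # Drop the final sentence, which this sketch assumes is the question.
    sentences = split_sentences(problem)
    return " ".join(sentences[:-1])


def bag_of_words(problem):
    # Order-free view of the problem: lowercase word and number counts only.
    tokens = re.findall(r"[a-z]+|\d+(?:\.\d+)?", problem.lower())
    return Counter(tokens)


# Hypothetical example problem, not taken from any dataset.
problem = "Jack had 8 pens. Mary gave him 5 pens. How many pens does Jack have now?"

body_only = remove_question(problem)
print(body_only)

# Reordering the sentences leaves the bag-of-words representation unchanged.
shuffled = "Mary gave him 5 pens. Jack had 8 pens. How many pens does Jack have now?"
print(bag_of_words(problem) == bag_of_words(shuffled))
```

A solver that still predicts the right equation from `body_only`, or from the order-free counts, cannot be relying on what the question actually asks, which is the kind of shallow heuristic the abstract refers to.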

Code Repositories

arkilpatel/SVAMP (official; PyTorch; mentioned in GitHub)
debjitpaul/refiner (PyTorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
math-word-problem-solving-on-asdiv-a | LSTM Seq2Seq with RoBERTa | Execution Accuracy: 76.9
math-word-problem-solving-on-asdiv-a | Graph2Tree with RoBERTa | Execution Accuracy: 82.2
math-word-problem-solving-on-asdiv-a | GTS with RoBERTa | Execution Accuracy: 81.2
math-word-problem-solving-on-mawps | GTS with RoBERTa | Accuracy (%): 88.5
math-word-problem-solving-on-mawps | Graph2Tree with RoBERTa | Accuracy (%): 88.7
math-word-problem-solving-on-svamp | GTS with RoBERTa | Accuracy: 41.0; Execution Accuracy: 41.0
math-word-problem-solving-on-svamp | LSTM Seq2Seq with RoBERTa | Accuracy: 40.3; Execution Accuracy: 40.3
math-word-problem-solving-on-svamp | Graph2Tree with RoBERTa | Accuracy: 43.8; Execution Accuracy: 43.8
math-word-problem-solving-on-svamp | Transformer with RoBERTa | Accuracy: 38.9; Execution Accuracy: 38.9
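
This page does not define Execution Accuracy. One common reading, assumed here, is the percentage of problems for which the predicted equation evaluates to the gold answer. The sketch below follows that reading; the function names, the tolerance, and the example predictions are hypothetical, and it supports only the four basic binary operators on plain numbers.

```python
import ast
import operator

# Binary operators allowed in a predicted equation (an assumption of this sketch).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr):
    # Evaluate an arithmetic string without calling eval() on arbitrary code.
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return float(node.value)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval"))


def execution_accuracy(predictions, answers, tol=1e-4):
    # Percentage of predicted equations whose value matches the gold answer.
    correct = 0
    for expr, gold in zip(predictions, answers):
        try:
            value = evaluate(expr)
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue  # malformed or unsupported predictions count as wrong
        if abs(value - gold) <= tol:
            correct += 1
    return 100.0 * correct / len(answers)


# Hypothetical predictions: one right, one wrong, one malformed.
score = execution_accuracy(["8 + 5", "8 - 5", "8 /"], [13, 13, 2])
print(round(score, 1))
```

Scoring the evaluated value instead of the equation string means that equivalent equations such as `5 + 8` and `8 + 5` are both counted as correct.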
