HyperAI


3 months ago

LXMERT Model Compression for Visual Question Answering

Maryam Hashemi Ghazaleh Mahmoudi Sara Kodeiri Hadi Sheikhi Sauleh Eetemadi


Abstract

Large-scale pretrained models such as LXMERT are becoming popular for learning cross-modal representations on text-image pairs for vision-language tasks. According to the lottery ticket hypothesis, NLP and computer vision models contain smaller subnetworks that can be trained in isolation to full performance. In this paper, we combine these observations to evaluate whether such trainable subnetworks exist in LXMERT when fine-tuned on the VQA task. In addition, we perform a model-size cost-benefit analysis by investigating how much pruning can be done without a significant loss in accuracy. Our experimental results demonstrate that LXMERT can be effectively pruned by 40%-60% in size with a 3% loss in accuracy.
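The benchmark entries below attribute the results to low-magnitude (unstructured) pruning: the weights with the smallest absolute values are zeroed until a target sparsity is reached. A minimal sketch of that idea, on a plain list of weights rather than the paper's actual LXMERT implementation (the function name and example values are illustrative):

```python
# Hypothetical sketch of low-magnitude pruning: zero out the fraction of
# weights with the smallest absolute values. Not the paper's code.

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the `sparsity` fraction of
    smallest-magnitude entries set to zero."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Threshold = magnitude of the n_prune-th smallest |w|.
    # (Ties at the threshold may prune slightly more than n_prune.)
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = magnitude_prune([0.9, -0.05, 0.4, -0.7, 0.01, 0.3], 0.5)
print(pruned)  # → [0.9, 0.0, 0.4, -0.7, 0.0, 0.0]
```

In practice this criterion is applied per layer (or globally) to a trained model's weight tensors, after which the remaining weights are fine-tuned; under the lottery ticket hypothesis, the surviving subnetwork can be retrained to near-full performance.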

Code Repositories

Benchmarks

Benchmark                                        Methodology                      Metrics
visual-question-answering-on-vqa-v2-test-dev-1  LXMERT (low-magnitude pruning)   Accuracy: 70.72
visual-question-answering-on-vqa-v2-test-std-1  LXMERT (low-magnitude pruning)   Accuracy: 70.87

