8 months ago

Multi-Task Learning

Method/Architecture

Jiasen Lu†* Christopher Clark†* Rowan Zellers†○ Roozbeh Mottaghi†○ Aniruddha Kembhavi†○

Abstract

We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical computer vision tasks, including pose estimation, object detection, depth estimation and image generation, vision-and-language tasks such as region captioning and referring expression, to natural language processing tasks such as question answering and paraphrasing. Developing a single unified model for such a large variety of tasks poses unique challenges due to the heterogeneous inputs and outputs pertaining to each task, including RGB images, per-pixel maps, binary masks, bounding boxes, and language. We achieve this unification by homogenizing every supported input and output into a sequence of discrete vocabulary tokens. This common representation across all tasks allows us to train a single transformer-based architecture, jointly on over 90 diverse datasets in the vision and language fields. Unified-IO is the first model capable of performing all 7 tasks on the GRIT benchmark and produces strong results across 16 diverse benchmarks like NYUv2-Depth, ImageNet, VQA2.0, OK-VQA, Swig, VizWizGround, BoolQ, and SciTail, with no task-specific fine-tuning. Code and demos for Unified-IO are available at: https://unified-io.allenai.org.
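
To make the idea of a shared discrete vocabulary concrete, the sketch below shows one way a bounding box could be turned into tokens and placed in the same sequence as text. The abstract does not describe the actual encoding, so everything here is an assumption for illustration: the bin count `NUM_BINS`, the `<loc_i>` token naming, and the helper names `box_to_tokens`, `tokens_to_box`, and `encode_example` are hypothetical and are not taken from the Unified-IO code.

```python
from typing import List, Sequence, Tuple

# Assumed number of quantization bins per coordinate (illustrative only).
NUM_BINS = 1000


def box_to_tokens(box: Sequence[float], image_size: Tuple[int, int],
                  num_bins: int = NUM_BINS) -> List[str]:
    """Quantize a pixel-space box (x1, y1, x2, y2) into four discrete tokens."""
    width, height = image_size
    scales = (width, height, width, height)
    tokens = []
    for value, scale in zip(box, scales):
        # Normalize to [0, 1], then map to an integer bin index.
        frac = min(max(value / scale, 0.0), 1.0)
        index = min(int(frac * num_bins), num_bins - 1)
        tokens.append(f"<loc_{index}>")
    return tokens


def tokens_to_box(tokens: Sequence[str], image_size: Tuple[int, int],
                  num_bins: int = NUM_BINS) -> Tuple[float, ...]:
    """Map four location tokens back to approximate pixel coordinates."""
    width, height = image_size
    scales = (width, height, width, height)
    coords = []
    for token, scale in zip(tokens, scales):
        index = int(token[len("<loc_"):-1])
        # Use the bin center as the reconstructed coordinate.
        coords.append((index + 0.5) / num_bins * scale)
    return tuple(coords)


def encode_example(prompt: str, box: Sequence[float],
                   image_size: Tuple[int, int]) -> List[str]:
    """Build one token sequence mixing (whitespace-split) text and location tokens."""
    return prompt.lower().split() + box_to_tokens(box, image_size)


if __name__ == "__main__":
    seq = encode_example("Describe the region", (160, 120, 320, 240), (640, 480))
    print(seq)
```

Because text and coordinates end up in one token sequence, a single sequence-to-sequence model can in principle read and emit both; the quantization step costs at most one bin width of precision per coordinate under these assumptions.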

