OmniVec: Learning robust representations with cross modal sharing

Siddharth Srivastava Gaurav Sharma

Abstract

Majority of research in learning based methods has been towards designing and training networks for specific tasks. However, many of the learning based tasks, across modalities, share commonalities and could potentially be tackled in a joint framework. We present an approach in this direction, to learn multiple tasks, in multiple modalities, with a unified architecture. The proposed network is composed of task specific encoders, a common trunk in the middle, followed by task specific prediction heads. We first pre-train it by self-supervised masked training, followed by sequential training for the different tasks. We train the network on all major modalities, e.g. visual, audio, text and 3D, and report results on 22 diverse and challenging public benchmarks. We demonstrate empirically that using a joint network to train across modalities leads to meaningful information sharing, and this allows us to achieve state-of-the-art results on most of the benchmarks. We also show generalization of the trained network on cross-modal tasks as well as unseen datasets and tasks.
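The encoder–trunk–head layout described in the abstract can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: the class name, the toy linear layers, and the chosen dimensions are all assumptions made here to show the data flow (modality-specific encoder, shared trunk, task-specific head).

```python
import random

def make_linear(in_dim, out_dim, seed=0):
    """Toy linear layer: returns a function mapping an in_dim vector to out_dim floats."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return layer

class OmniVecSketch:
    """Hypothetical skeleton of the architecture: one encoder per modality,
    a single shared trunk, and one prediction head per task."""
    def __init__(self, encoder_dims, trunk_dim, head_dims):
        self.encoders = {m: make_linear(d, trunk_dim, seed=i)
                         for i, (m, d) in enumerate(encoder_dims.items())}
        self.trunk = make_linear(trunk_dim, trunk_dim, seed=100)
        self.heads = {t: make_linear(trunk_dim, d, seed=200 + i)
                      for i, (t, d) in enumerate(head_dims.items())}

    def forward(self, modality, task, x):
        z = self.encoders[modality](x)   # task/modality specific encoding
        z = self.trunk(z)                # common trunk: shared across all modalities
        return self.heads[task](z)       # task specific prediction head

# Example: route an "image" input through the shared trunk to a "classify" head.
model = OmniVecSketch(
    encoder_dims={"image": 8, "audio": 4},  # illustrative modalities and dims
    trunk_dim=6,
    head_dims={"classify": 3},
)
out = model.forward("image", "classify", [1.0] * 8)
print(len(out))  # output dimensionality is set by the task head
```

The point of the shared trunk is that its parameters are updated by every modality and task, which is the mechanism the abstract credits for cross-modal information sharing.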

