Abstract
A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain and task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs. Our model augments the Perceiver with a flexible querying mechanism that enables outputs of various sizes and semantics, doing away with the need for task-specific architecture engineering. The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. As highlights, Perceiver IO outperforms a Transformer-based BERT baseline on the GLUE language benchmark despite removing input tokenization, and achieves state-of-the-art performance on Sintel optical flow estimation with no explicit mechanisms for multiscale correspondence.
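The abstract's core pattern can be illustrated in code. The sketch below is a minimal illustration of the Perceiver IO idea as described above, not the authors' implementation: inputs are cross-attended into a small fixed-size latent array, the latents are processed with self-attention, and task-specific output queries are cross-attended against the latents to decode outputs of arbitrary size. All module sizes and names here are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn

class PerceiverIOSketch(nn.Module):
    """Minimal sketch: encode (cross-attn) -> process (self-attn) -> decode (cross-attn)."""

    def __init__(self, dim=64, num_latents=32, depth=2, heads=4):
        super().__init__()
        # Learned latent array; its size is fixed regardless of input length,
        # which is what gives linear scaling in input/output size.
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.encode = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.process = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(depth)
        )
        self.decode = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, inputs, output_queries):
        # inputs: (B, M, dim); output_queries: (B, O, dim).
        # O determines the output size and semantics, per the querying mechanism.
        b = inputs.size(0)
        z = self.latents.unsqueeze(0).expand(b, -1, -1)
        z, _ = self.encode(z, inputs, inputs)        # cross-attend inputs into latents
        for attn in self.process:
            z = z + attn(z, z, z)[0]                 # latent self-attention (residual)
        out, _ = self.decode(output_queries, z, z)   # decode with output queries
        return out

model = PerceiverIOSketch()
x = torch.randn(2, 100, 64)   # 100 input elements of width 64
q = torch.randn(2, 7, 64)     # request 7 output elements
y = model(x, q)
print(tuple(y.shape))  # (2, 7, 64): output size is set by the query, not the input
```

Because attention over the (possibly huge) inputs and outputs happens only in the two cross-attention steps, compute grows linearly with input and output size; the quadratic self-attention cost is paid only over the small latent array.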
Benchmarks
| Benchmark | Model | Metric |
|---|---|---|
| Optical Flow Estimation on KITTI 2015 | Perceiver IO | Average End-Point Error: 4.98 |
| Optical Flow Estimation on Sintel (clean) | Perceiver IO | Average End-Point Error: 1.81 |
| Optical Flow Estimation on Sintel (final) | Perceiver IO | Average End-Point Error: 2.42 |