Zhao Zelin ; Samel Karan ; Chen Binghong ; Song Le

Abstract
Programs, consisting of semantic and structural information, play an important role in the communication between humans and agents. Towards learning general program executors to unify perception, reasoning, and decision making, we formulate program-guided tasks which require learning to execute a given program on the observed task specification. Furthermore, we propose the Program-guided Transformer (ProTo), which integrates both semantic and structural guidance of a program by leveraging cross-attention and masked self-attention to pass messages between the specification and routines in the program. ProTo executes a program in a learned latent space and enjoys stronger representation ability than previous neural-symbolic approaches. We demonstrate that ProTo significantly outperforms the previous state-of-the-art methods on GQA visual reasoning and 2D Minecraft policy learning datasets. Additionally, ProTo demonstrates better generalization to unseen, complex, and human-written programs.
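The two attention mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it uses a single head with no learned projections, and the array names (`spec`, `routines`, `deps`), sizes, and the dependency mask are all hypothetical choices made for illustration. Cross-attention lets each routine read from the specification; masked self-attention then lets routines exchange information only along the links the structural mask allows.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values, mask=None):
    # Scaled dot-product attention; `mask` is a boolean array where
    # True means "this query may attend to this key".
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)
d = 8
spec = rng.normal(size=(5, d))      # hypothetical: 5 specification tokens
routines = rng.normal(size=(3, d))  # hypothetical: 3 routine embeddings

# Cross-attention: routines (queries) read from the specification (keys/values).
cross_out, cross_w = attention(routines, spec, spec)

# Masked self-attention: routine i may attend only to itself and to the
# routines it depends on (an assumed dependency structure for illustration).
deps = np.array([[1, 0, 0],
                 [1, 1, 0],
                 [0, 1, 1]], dtype=bool)
self_out, self_w = attention(cross_out, cross_out, cross_out, mask=deps)
```

In this sketch the structural guidance enters only through `deps`, while the semantic guidance would come from how the routine embeddings are produced, which is left out here.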
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| visual-question-answering-on-gqa-test-std | ProTo | Accuracy: 65.14 |