Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring
Samuel Humeau; Kurt Shuster; Marie-Anne Lachaux; Jason Weston

Abstract
The use of deep pre-trained bidirectional transformers has led to remarkable progress in a number of applications (Devlin et al., 2018). For tasks that make pairwise comparisons between sequences, matching a given input with a corresponding label, two approaches are common: Cross-encoders performing full self-attention over the pair and Bi-encoders encoding the pair separately. The former often performs better, but is too slow for practical use. In this work, we develop a new transformer architecture, the Poly-encoder, that learns global rather than token level self-attention features. We perform a detailed comparison of all three approaches, including what pre-training and fine-tuning strategies work best. We show our models achieve state-of-the-art results on three existing tasks; that Poly-encoders are faster than Cross-encoders and more accurate than Bi-encoders; and that the best results are obtained by pre-training on large datasets similar to the downstream tasks.
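As described in the paper, the Poly-encoder uses m learned "code" vectors that attend over the context's token embeddings to produce m global features; the candidate embedding then attends over those features, and the final score is a dot product, so candidate representations can be pre-computed and cached. The sketch below is a minimal illustration of that scoring step, not the authors' ParlAI implementation; it assumes pre-computed token embeddings from any encoder (e.g. BERT), and the names `PolyEncoderScorer` and `n_codes` as well as the tensor shapes are illustrative.

```python
# Minimal Poly-encoder scoring sketch (illustrative; not the authors' ParlAI code).
# Assumes token-level context embeddings and per-candidate vectors are precomputed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolyEncoderScorer(nn.Module):
    def __init__(self, dim: int, n_codes: int = 64):
        super().__init__()
        # m learned "codes"; each extracts one global feature from the context.
        self.codes = nn.Parameter(torch.randn(n_codes, dim) * 0.02)

    def forward(self, ctxt_tokens: torch.Tensor, cand_emb: torch.Tensor) -> torch.Tensor:
        # ctxt_tokens: (batch, seq_len, dim) token-level context embeddings
        # cand_emb:    (batch, n_cands, dim) one vector per candidate label
        # 1) Each code attends over the context tokens -> m global features.
        attn = F.softmax(self.codes @ ctxt_tokens.transpose(1, 2), dim=-1)  # (batch, m, seq)
        global_feats = attn @ ctxt_tokens                                   # (batch, m, dim)
        # 2) Each candidate attends over the m global features.
        w = F.softmax(cand_emb @ global_feats.transpose(1, 2), dim=-1)      # (batch, n_cands, m)
        ctxt_emb = w @ global_feats                                         # (batch, n_cands, dim)
        # 3) Final score is a dot product, so candidates can be encoded once and cached.
        return (ctxt_emb * cand_emb).sum(dim=-1)                            # (batch, n_cands)

# Example usage with random embeddings in place of real encoder outputs.
scores = PolyEncoderScorer(dim=768, n_codes=64)(
    torch.randn(2, 40, 768), torch.randn(2, 10, 768))
print(scores.shape)  # torch.Size([2, 10])
```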
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| conversational-response-selection-on-douban-1 | Poly-encoder | MAP: 0.608, MRR: 0.650, P@1: 0.475, R10@1: 0.299, R10@2: 0.494, R10@5: 0.822 |
| conversational-response-selection-on-dstc7 | Bi-encoder | 1-of-100 Accuracy: 66.3% |
| conversational-response-selection-on-dstc7 | Bi-encoder (v2) | 1-of-100 Accuracy: 70.9% |
| conversational-response-selection-on-rrs-1 | Poly-encoder | NDCG@3: 0.679, NDCG@5: 0.765 |
| conversational-response-selection-on-ubuntu-1 | Poly-encoder | R10@1: 0.882, R10@2: 0.949, R10@5: 0.990 |