Natural Language Inference on RTE
Metric: Accuracy, the percentage of RTE premise/hypothesis pairs for which the predicted entailment label matches the gold label (a minimal computation sketch follows the table below).

Results: performance of various models on this benchmark.
| Model Name | Accuracy | Paper Title |
| --- | --- | --- |
| LTG-BERT-small 24M | 53.7% | Not all layers are equally as important: Every Layer Counts BERT |
| DistilBERT 66M | 62.9% | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter |
| Flipped-3B | 71.05% | Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners |
| ERNIE | 68.8% | ERNIE: Enhanced Language Representation with Informative Entities |
| T5-XXL 11B | 92.5% | SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization |
| ALBERT | 89.2% | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
| GPT-NeoX 20B (1-shot) | 53.8% | BloombergGPT: A Large Language Model for Finance |
| PaLM 540B (1-shot) | 78.7% | PaLM: Scaling Language Modeling with Pathways |
| KiC-770M | 74.00% | Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models |
| data2vec | 69.9% | data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language |
| LaMini-F-T5 783M | 65% | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions |
| PaLM 540B (0-shot) | 72.9% | PaLM: Scaling Language Modeling with Pathways |
| Q-BERT (Shen et al., 2020) | 84.7% | Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT |
| PaLM 2-L (1-shot) | 79.3% | PaLM 2 Technical Report |
| RoE-3B | 64.01% | Exploring the Benefits of Training Expert Language Models over Instruction Tuning |
| SMART-BERT | 71.2% | SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization |
| RealFormer | 73.7% | RealFormer: Transformer Likes Residual Attention |
| PaLM 2-S (1-shot) | 78.7% | PaLM 2 Technical Report |
| XLNet (single model) | 85.9% | XLNet: Generalized Autoregressive Pretraining for Language Understanding |
| BigBird | 75.0% | Big Bird: Transformers for Longer Sequences |
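As a point of reference for how the Accuracy column is scored, here is a minimal Python sketch. RTE is a binary entailment task, so accuracy is the fraction of premise/hypothesis pairs whose predicted label matches the gold label; the example pairs, labels, and the `accuracy` helper below are illustrative assumptions, not taken from the leaderboard or any of the listed papers.

```python
def accuracy(predictions, gold_labels):
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)

# Illustrative RTE-style items: (premise, hypothesis, gold label).
examples = [
    ("A man is playing a guitar on stage.", "A man is performing music.", "entailment"),
    ("The committee rejected the proposal.", "The proposal was approved.", "not_entailment"),
]

# Hypothetical model outputs for the two pairs above.
predictions = ["entailment", "entailment"]
gold = [label for _, _, label in examples]

print(f"Accuracy: {accuracy(predictions, gold):.1%}")  # -> Accuracy: 50.0%
```

Leaderboard entries report this fraction as a percentage over the RTE evaluation set.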