Rethinking with retrieval (GPT-3) | 77.73 | Rethinking with Retrieval: Faithful Large Language Model Inference | |
Self-Evaluation Guided Decoding (Codex, CoT, single reasoning chain, 6-shot gen, 4-shot eval) | 77.2 | - | - |
PaLM 2 (few-shot, CoT, SC) | 90.4 | PaLM 2 Technical Report | |