EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models
Samuel J. Paech

Abstract
We introduce EQ-Bench, a novel benchmark designed to evaluate aspects of emotional intelligence in Large Language Models (LLMs). We assess the ability of LLMs to understand complex emotions and social interactions by asking them to predict the intensity of emotional states of characters in a dialogue. The benchmark is able to discriminate effectively between a wide range of models. We find that EQ-Bench correlates strongly with comprehensive multi-domain benchmarks like MMLU (Hendrycks et al., 2020) (r=0.97), indicating that we may be capturing similar aspects of broad intelligence. Our benchmark produces highly repeatable results using a set of 60 English-language questions. We also provide open-source code for an automated benchmarking pipeline at https://github.com/EQ-bench/EQ-Bench and a leaderboard at https://eqbench.com
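The benchmark derives its final score from per-question comparisons between the model's predicted emotion-intensity ratings and a set of reference ratings. The sketch below is a minimal illustration of that idea, not the exact scoring rule from the EQ-Bench repository: it assumes each question asks for 0–10 intensity ratings of four emotions, scores a question by its summed absolute deviation from the reference, and averages across the 60 questions onto a 0–100 scale. The function names (`score_question`, `eq_bench_score`), the deviation-based rule, and the normalisation constants are illustrative assumptions.

```python
from typing import Dict, List


def score_question(predicted: Dict[str, float],
                   reference: Dict[str, float]) -> float:
    """Score one dialogue question on a 0-10 scale (illustrative only).

    Both dicts map the four target emotions to 0-10 intensity ratings.
    The score shrinks as the summed absolute deviation from the reference
    ratings grows; the rule in the released pipeline may differ (e.g. it
    may rescale the subject's answers before differencing).
    """
    deviation = sum(abs(predicted[e] - reference[e]) for e in reference)
    # Four emotions, each off by at most 10 -> maximum deviation of 40.
    return max(0.0, 10.0 * (1.0 - deviation / 40.0))


def eq_bench_score(per_question_scores: List[float]) -> float:
    """Average the per-question scores and map them onto a 0-100 scale."""
    return 10.0 * sum(per_question_scores) / len(per_question_scores)


# Hypothetical example: a model's ratings vs. the reference for one question.
reference = {"surprise": 8, "resentment": 6, "relief": 1, "curiosity": 4}
predicted = {"surprise": 7, "resentment": 4, "relief": 2, "curiosity": 5}
print(score_question(predicted, reference))  # 8.75
```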
Code Repositories
https://github.com/EQ-bench/EQ-Bench
Benchmarks
| Benchmark | Model | EQ-Bench Score |
|---|---|---|
| emotional-intelligence-on-emotional | OpenAI gpt-3.5-0613 | 49.17 |
| emotional-intelligence-on-emotional | lmsys/vicuna-33b-v1.3 | 36.52 |
| emotional-intelligence-on-emotional | lmsys/vicuna-13b-v1.1 | 32.85 |
| emotional-intelligence-on-emotional | OpenAI text-davinci-002 | 39.44 |
| emotional-intelligence-on-emotional | OpenAI text-davinci-003 | 43.73 |
| emotional-intelligence-on-emotional | meta-llama/Llama-2-70b-chat-hf | 51.56 |
| emotional-intelligence-on-emotional | OpenAI ADA | 2.25 |
| emotional-intelligence-on-emotional | meta-llama/Llama-2-7b-chat-hf | 25.43 |
| emotional-intelligence-on-emotional | OpenAI gpt-3.5-turbo-0301 | 47.61 |
| emotional-intelligence-on-emotional | Intel/neural-chat-7b-v3-1 | 43.61 |
| emotional-intelligence-on-emotional | Qwen/Qwen-72B-Chat | 52.44 |
| emotional-intelligence-on-emotional | openchat/openchat 3.5 | 37.08 |
| emotional-intelligence-on-emotional | migtissera/SynthIA-70B-v1.5 | 54.83 |
| emotional-intelligence-on-emotional | Open-Orca/Mistral-7B-OpenOrca | 44.40 |
| emotional-intelligence-on-emotional | OpenAI gpt-4-0613 | 62.52 |
| emotional-intelligence-on-emotional | OpenAI gpt-4-0314 | 53.39 |
| emotional-intelligence-on-emotional | Qwen/Qwen-14B-Chat | 43.76 |
| emotional-intelligence-on-emotional | Koala 13B | 24.92 |
| emotional-intelligence-on-emotional | meta-llama/Llama-2-13b-chat-hf | 33.02 |
| emotional-intelligence-on-emotional | Anthropic Claude2 | 52.14 |
| emotional-intelligence-on-emotional | 01-ai/Yi-34B-Chat | 51.03 |
| emotional-intelligence-on-emotional | lmsys/vicuna-7b-v1.1 | 22.24 |
| emotional-intelligence-on-emotional | OpenAI text-davinci-001 | 15.19 |
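The abstract's r=0.97 figure refers to the Pearson correlation between per-model EQ-Bench and MMLU scores. The snippet below is a generic illustration of that computation with clearly placeholder score pairs; it does not use the paper's actual data pairing.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between paired per-model benchmark scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Placeholder (EQ-Bench, MMLU) score pairs -- not the paper's data.
eq_scores = [25.4, 43.7, 51.6, 62.5]
mmlu_scores = [45.3, 61.2, 68.9, 86.4]
print(round(pearson_r(eq_scores, mmlu_scores), 2))
```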