HelloBench
Evaluation metrics
average
chat-rescaled score
heuristic text generation-rescaled score
llm_model
model_url
open-ended qa-rescaled score
organization
parameters
release_date
summarization-rescaled score
text completion-rescaled score
updated_time
Evaluation results
Results of each model on this benchmark
Comparison table
Model name | average | chat-rescaled score | heuristic text generation-rescaled score | llm_model | model_url | open-ended qa-rescaled score | organization | parameters | release_date | summarization-rescaled score | text completion-rescaled score | updated_time |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Model 1 | 48.55 | 42.88 | 47.87 | GPT-4o-2024-08-06 | https://platform.openai.com/docs/guides | 54.82 | OpenAI | N/A | 2024/8/6 | 29.71 | 67.49 | 2024/9/24 |
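
The page does not say how the average column is derived. For the GPT-4o-2024-08-06 row, the reported value is consistent with an unweighted mean of the five category rescaled scores, rounded to two decimals. The sketch below checks that for this one row only. The equal weighting is an assumption inferred from the numbers, not a documented definition, and the helper name unweighted_average is hypothetical.

```python
from statistics import mean

# Category rescaled scores for GPT-4o-2024-08-06, copied from the table row.
scores = {
    "chat": 42.88,
    "heuristic text generation": 47.87,
    "open-ended qa": 54.82,
    "summarization": 29.71,
    "text completion": 67.49,
}

# Value of the "average" column for the same row.
reported_average = 48.55


def unweighted_average(category_scores):
    """Assumed aggregation: equal-weight mean, rounded to two decimals."""
    return round(mean(category_scores.values()), 2)


computed = unweighted_average(scores)
print(f"computed={computed}, reported={reported_average}")
```

If more rows are added to the table, running the same check on each would show whether the equal-weight assumption holds beyond this single model.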