Interactive Evaluation of Dialog on DSTC9
Evaluation Metrics
Coherent
Consistent
Diversity
Error Recovery
Flexible
Informative
Inquisitive
Likeable
Overall Human Rating
Topic Depth
Understanding
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Coherent | Consistent | Diversity | Error Recovery | Flexible | Informative | Inquisitive | Likeable | Overall Human Rating | Topic Depth | Understanding |
---|---|---|---|---|---|---|---|---|---|---|---|
a-unified-pre-training-framework-for | 2.8017 | 0.9390 | 2.7441 | 2.7518 | 2.8000 | 2.7881 | 2.7949 | 2.7878 | 4.15 | 2.7678 | 2.8285 |