Conversational Web Navigation on WebLINX
Metrics
- Element (IoU)
- Intent Match
- Overall score
- Text (F1)
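The two matching-style metrics can be sketched as follows. This is a minimal illustration, assuming Element (IoU) is the intersection-over-union of the predicted and reference element bounding boxes and Text (F1) is token-level overlap between predicted and reference text, as is typical for benchmarks of this kind; the exact WebLINX scoring code may differ in details such as tokenization.

```python
from collections import Counter

def element_iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def text_f1(pred, gold):
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_toks, gold_toks = pred.split(), gold.split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

For example, two 10x10 boxes offset by 5 pixels overlap in a 5x5 region, giving an IoU of 25/175 (about 0.14); a prediction of "click the blue button" against the reference "click the button" scores a token F1 of 6/7 (about 0.86).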
Results
Performance of various models on this benchmark.
Model Name | Element (IoU) | Intent Match | Overall score | Text (F1) |
---|---|---|---|---|
Llama-2-13B | 22.82 | 81.91 | 25.21 | 26.60 |
S-LLaMA-2.7B | 22.60 | 84.00 | 25.02 | 27.17 |
Llama-2-7B | 22.26 | 82.64 | 24.57 | 26.50 |
Flan-T5-3B | 20.31 | 81.14 | 23.77 | 25.75 |
S-LLaMA-1.3B | 20.54 | 83.32 | 23.73 | 25.85 |
GPT-3.5F | 18.64 | 77.56 | 21.22 | 22.39 |
MindAct-3B | 16.50 | 79.89 | 20.94 | 23.16 |
Fuyu-8B | 15.70 | 80.07 | 19.97 | 22.30 |
Flan-T5-780M | 15.36 | 80.02 | 17.27 | 14.05 |
Pix2Act-1.3B | 8.28 | 81.80 | 16.88 | 25.21 |
MindAct-780M | 13.39 | 75.87 | 15.13 | 13.58 |
Flan-T5-250M | 14.86 | 79.69 | 14.99 | 9.21 |
MindAct-250M | 12.05 | 74.25 | 12.63 | 7.67 |
Pix2Act-282M | 6.20 | 79.71 | 12.51 | 16.40 |
GPT-4T (Zero-Shot) | 10.85 | 41.66 | 10.72 | 6.75 |
GPT-4V (Zero-Shot) | 10.91 | 42.36 | 10.45 | 6.21 |
GPT-3.5T (Zero-Shot) | 8.62 | 42.77 | 8.51 | 3.45 |
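The ranking by Overall score can be reproduced from the raw rows with a short script. This is a standalone sketch: the rows are embedded inline rather than scraped from the page, and the column order follows the table above (model, Element IoU, Intent Match, Overall score, Text F1).

```python
# Raw leaderboard rows, pipe-delimited, in the columns used by the table above.
ROWS = """\
GPT-3.5T (Zero-Shot) | 8.62 | 42.77 | 8.51 | 3.45
S-LLaMA-1.3B | 20.54 | 83.32 | 23.73 | 25.85
Pix2Act-1.3B | 8.28 | 81.80 | 16.88 | 25.21
MindAct-3B | 16.50 | 79.89 | 20.94 | 23.16
Fuyu-8B | 15.70 | 80.07 | 19.97 | 22.30
Llama-2-13B | 22.82 | 81.91 | 25.21 | 26.60
GPT-3.5F | 18.64 | 77.56 | 21.22 | 22.39
MindAct-780M | 13.39 | 75.87 | 15.13 | 13.58
Flan-T5-780M | 15.36 | 80.02 | 17.27 | 14.05
MindAct-250M | 12.05 | 74.25 | 12.63 | 7.67
Pix2Act-282M | 6.20 | 79.71 | 12.51 | 16.40
S-LLaMA-2.7B | 22.60 | 84.00 | 25.02 | 27.17
GPT-4T (Zero-Shot) | 10.85 | 41.66 | 10.72 | 6.75
Flan-T5-250M | 14.86 | 79.69 | 14.99 | 9.21
Flan-T5-3B | 20.31 | 81.14 | 23.77 | 25.75
GPT-4V (Zero-Shot) | 10.91 | 42.36 | 10.45 | 6.21
Llama-2-7B | 22.26 | 82.64 | 24.57 | 26.50"""

def parse(row):
    """Split one pipe-delimited row into (name, iou, intent, overall, f1)."""
    name, iou, intent, overall, f1 = (c.strip() for c in row.split("|"))
    return name, float(iou), float(intent), float(overall), float(f1)

# Sort descending on the Overall score column (index 3 of each parsed tuple).
ranked = sorted((parse(r) for r in ROWS.splitlines()),
                key=lambda r: r[3], reverse=True)

for name, _, _, overall, _ in ranked[:3]:
    print(f"{name}: {overall}")  # top entry is Llama-2-13B at 25.21
```

Note that the ranking flips depending on the key: sorting on Intent Match instead would put S-LLaMA-2.7B (84.00) first, which is why the leaderboard reports a separate Overall score.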