Image Generation on TextAtlasEval
Metrics
StyledTextSynth CLIP Score
StyledTextSynth FID
StyledTextSynth OCR (Accuracy)
StyledTextSynth OCR (CER)
StyledTextSynth OCR (F1 Score)
TextScenesHQ CLIP Score
TextScenesHQ FID
TextScenesHQ OCR (Accuracy)
TextScenesHQ OCR (CER)
TextScenesHQ OCR (F1 Score)
TextVisionBlend CLIP Score
TextVisionBlend FID
TextVisionBlend OCR (Accuracy)
TextVisionBlend OCR (CER)
TextVisionBlend OCR (F1 Score)
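The OCR columns compare the text read back from each generated image against the target text in the prompt. Below is a minimal stdlib sketch of two such metrics. It assumes that CER is the character-level edit distance divided by the reference length and that F1 is computed over lowercased, whitespace-split words. TextAtlasEval's actual OCR engine, text normalization, and definition of OCR accuracy are not stated here, and the helper names are hypothetical.

```python
# Hypothetical sketch of OCR-based text-rendering metrics. Tokenization,
# lowercasing and the choice of word-level F1 are assumptions, not the
# benchmark's documented procedure.
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, prediction: str) -> float:
    """Character error rate: edits needed per reference character (lower is better)."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, prediction) / len(reference)


def word_prf(reference: str, prediction: str) -> tuple[float, float, float]:
    """Word-level precision, recall and F1 using multiset overlap (assumed definition)."""
    ref = Counter(reference.lower().split())
    pred = Counter(prediction.lower().split())
    overlap = sum((ref & pred).values())  # words matched, counting duplicates
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1
```

For example, `cer("hello world", "helo world")` is 1/11 (one missing character), and `word_prf("open now daily", "open daily")` gives precision 1.0, recall 2/3 and F1 0.8. Note that the table appears to report accuracy and F1 on a 0 to 100 scale while CER stays a fraction; that scaling is an observation from the values, not a documented convention.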
Results
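Performance of seven models on this benchmark. Lower is better for FID and OCR CER; higher is better for CLIP score, OCR accuracy, and OCR F1. A "-" marks a missing entry.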
Model Name | StyledTextSynth CLIP Score | StyledTextSynth FID | StyledTextSynth OCR (Accuracy) | StyledTextSynth OCR (CER) | StyledTextSynth OCR (F1 Score) | TextScenesHQ CLIP Score | TextScenesHQ FID | TextScenesHQ OCR (Accuracy) | TextScenesHQ OCR (CER) | TextScenesHQ OCR (F1 Score) | TextVisionBlend CLIP Score | TextVisionBlend FID | TextVisionBlend OCR (Accuracy) | TextVisionBlend OCR (CER) | TextVisionBlend OCR (F1 Score) | Paper Title | Repository |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Infinity-2B | 0.2727 | 84.95 | 0.80 | 0.93 | 1.42 | 0.2346 | 71.59 | 1.06 | 0.88 | 1.74 | 0.1979 | 95.69 | 2.98 | 0.83 | 3.44 | Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis | - |
Dalle3 | 0.2938 | 90.70 | 30.58 | 0.78 | 38.25 | 0.3367 | 86.73 | 69.26 | - | 51.63 | 0.1938 | 153.21 | 8.38 | 0.93 | 7.94 | - | - |
SD3.5 Large | 0.2849 | 71.09 | 27.21 | 0.73 | 33.86 | 0.2363 | 64.44 | 19.03 | 0.73 | 24.45 | 0.1846 | 118.85 | 14.55 | 0.88 | 16.25 | - | - |
Grok3 | 0.2938 | 80.33 | 15.82 | 0.73 | 21.40 | 0.3197 | - | 35.07 | 0.57 | 37.94 | 0.1697 | - | 41.54 | 0.57 | 44.22 | - | - |
PixArt-Sigma | 0.2764 | 82.83 | 0.42 | 0.90 | 0.62 | 0.2347 | 72.62 | 0.34 | 0.91 | 0.53 | 0.1891 | 81.29 | 2.40 | 0.83 | 1.57 | PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | - |
TextDiffuser2 | 0.2510 | 114.31 | 0.76 | 0.99 | 1.46 | 0.2252 | 84.10 | 0.66 | 0.96 | 1.25 | - | - | - | - | - | TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering | - |
Anytext | 0.2501 | 117.71 | 0.35 | 0.98 | 0.66 | 0.2174 | 101.32 | 0.42 | 0.95 | 0.80 | - | - | - | - | - | AnyText: Multilingual Visual Text Generation And Editing | - |
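Because the metrics point in different directions, finding the leader in each column requires knowing whether lower or higher is better. The sketch below uses the TextScenesHQ columns copied from the table, with `None` standing in for a "-" cell. The dictionary layout and the `best_model` helper are hypothetical illustrations and are not part of the benchmark.

```python
# TextScenesHQ values copied from the results table; None marks a "-" cell.
TEXTSCENESHQ = {
    "Infinity-2B":   {"clip": 0.2346, "fid": 71.59,  "cer": 0.88, "f1": 1.74},
    "Dalle3":        {"clip": 0.3367, "fid": 86.73,  "cer": None, "f1": 51.63},
    "SD3.5 Large":   {"clip": 0.2363, "fid": 64.44,  "cer": 0.73, "f1": 24.45},
    "Grok3":         {"clip": 0.3197, "fid": None,   "cer": 0.57, "f1": 37.94},
    "PixArt-Sigma":  {"clip": 0.2347, "fid": 72.62,  "cer": 0.91, "f1": 0.53},
    "TextDiffuser2": {"clip": 0.2252, "fid": 84.10,  "cer": 0.96, "f1": 1.25},
    "Anytext":       {"clip": 0.2174, "fid": 101.32, "cer": 0.95, "f1": 0.80},
}

# FID and CER are errors/distances, so smaller values win.
LOWER_IS_BETTER = {"fid", "cer"}


def best_model(table: dict, metric: str) -> tuple[str, float]:
    """Return (model, value) of the best reported entry, skipping missing cells."""
    scored = [(name, row[metric]) for name, row in table.items()
              if row[metric] is not None]
    pick = min if metric in LOWER_IS_BETTER else max
    return pick(scored, key=lambda item: item[1])


for metric in ("clip", "fid", "cer", "f1"):
    print(metric, best_model(TEXTSCENESHQ, metric))
```

On these values, SD3.5 Large has the lowest FID, Grok3 the lowest CER, and Dalle3 the highest CLIP score and OCR F1. No single model leads every column.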