Align before Fuse: Vision and Language Representation Learning with Momentum Distillation