Adaptive Split-Fusion Transformer

Zixuan Su, Hao Zhang, Jingjing Chen, Lei Pang, Chong-Wah Ngo, Yu-Gang Jiang

Abstract

Neural networks for visual content understanding have recently evolved from convolutional neural networks (CNNs) to transformers. The former (CNN) relies on small-windowed kernels to capture regional cues, demonstrating solid local expressiveness. In contrast, the latter (transformer) establishes long-range global connections between localities for holistic learning. Inspired by this complementary nature, there is growing interest in designing hybrid models that make the best use of each technique. Current hybrids merely use convolutions as simple replacements for linear projections or juxtapose a convolution branch with attention, without considering the relative importance of local and global modeling. To address this, we propose a new hybrid named Adaptive Split-Fusion Transformer (ASF-former), which treats the convolutional and attention branches differently with adaptive weights. Specifically, an ASF-former encoder splits the feature channels into two equal halves to fit dual-path inputs. The outputs of the two paths are then fused with weighting scalars calculated from visual cues. We also design the convolutional path compactly for efficiency. Extensive experiments on standard benchmarks, such as ImageNet-1K, CIFAR-10, and CIFAR-100, show that our ASF-former outperforms its CNN and transformer counterparts, as well as hybrid pilots, in terms of accuracy (83.9% on ImageNet-1K) under similar conditions (12.9G MACs/56.7M Params, without large-scale pre-training). The code is available at: https://github.com/szx503045266/ASF-former.
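
The split-then-fuse idea described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names, the neighbour-averaging stand-in for the convolutional path, the projection-free attention, the sigmoid gate over pooled features, and the complementary weights alpha and 1 - alpha are all assumptions made for illustration. The official PyTorch repository is the reference for the actual design.

```python
import math

def split_channels(tokens):
    """Split each token's channels into two equal halves (one per path)."""
    half = len(tokens[0]) // 2
    return [t[:half] for t in tokens], [t[half:] for t in tokens]

def local_path(tokens):
    """Stand-in for the convolutional path (assumption): average each
    token with its immediate neighbours, i.e. a window of up to 3."""
    n = len(tokens)
    out = []
    for i in range(n):
        window = tokens[max(0, i - 1):min(n, i + 2)]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out

def global_path(tokens):
    """Stand-in for the attention path (assumption): single-head
    self-attention with identity query/key/value projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out

def fusion_weight(tokens, gate_w, gate_b):
    """Hypothetical gate: average-pool over tokens, then linear + sigmoid,
    giving one scalar in (0, 1) that depends on the input features."""
    n = len(tokens)
    pooled = [sum(col) / n for col in zip(*tokens)]
    logit = sum(w * p for w, p in zip(gate_w, pooled)) + gate_b
    return 1.0 / (1.0 + math.exp(-logit))

def split_fusion_block(tokens, gate_w, gate_b):
    """Split channels, run both paths, fuse with input-dependent weights."""
    x_conv, x_attn = split_channels(tokens)
    y_conv = local_path(x_conv)
    y_attn = global_path(x_attn)
    alpha = fusion_weight(tokens, gate_w, gate_b)
    fused = [
        [alpha * a + (1.0 - alpha) * c for a, c in zip(row_a, row_c)]
        for row_a, row_c in zip(y_attn, y_conv)
    ]
    return fused, alpha

# Four identical tokens with four channels each; a zero gate gives alpha = 0.5.
tokens = [[1.0, 1.0, 3.0, 3.0] for _ in range(4)]
fused, alpha = split_fusion_block(tokens, gate_w=[0.0] * 4, gate_b=0.0)
```

In this toy setting each path sees half of the channels, so the fused output also has half the input width. How the real encoder restores or expands the channel dimension is not stated in the abstract and is left out here.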

Code Repositories

szx503045266/asf-former (official implementation, PyTorch)

Benchmarks

Benchmark                               Methodology   Metrics
image-classification-on-cifar-10        ASF-former-S  Percentage correct: 98.7%
image-classification-on-cifar-10        ASF-former-B  Percentage correct: 98.8%
image-classification-on-cifar-10-image  ASF-former-B  Params: 56.7M
image-classification-on-cifar-10-image  ASF-former-S  Params: 19.3M
image-classification-on-imagenet        ASF-former-B  Number of params: 56.7M; Top 1 Accuracy: 83.9%
image-classification-on-imagenet        ASF-former-S  Number of params: 19.3M; Top 1 Accuracy: 82.7%
