FlashMoBA
FlashMoBA was jointly proposed by research teams from MIT and NVIDIA in November 2025. The work was published in the paper "Optimizing Mixture of Block Attention".
FlashMoBA is a hardware-aware CUDA kernel that enables efficient execution of Mixture of Block Attention (MoBA) even at the small block sizes that the paper's theoretical analysis recommends. By borrowing techniques from FlashAttention and adding optimizations specific to block sparsity, the kernel is reported to achieve a speedup of up to 14.7x over FlashAttention-2, making it practical to deploy theoretically optimal configurations that were previously too slow to use.
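To make the idea of block-sparse attention concrete, here is a minimal pure-Python sketch of MoBA-style routing for a single query. It assumes the usual MoBA formulation, which the entry above does not spell out: keys are split into fixed-size blocks, each block is summarized by its mean key, and the query attends only to the `top_k` best-scoring blocks. The function names are hypothetical, causal masking is omitted for brevity, and the code illustrates only the selection logic, not the FlashMoBA CUDA kernel itself.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool_blocks(keys, block_size):
    """Return one centroid (mean key vector) per block of up to block_size keys."""
    centroids = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        dim = len(block[0])
        centroids.append([sum(k[d] for k in block) / len(block) for d in range(dim)])
    return centroids


def select_blocks(query, centroids, top_k):
    """Indices of the top_k blocks whose centroids score highest against the query."""
    scores = [dot(query, c) for c in centroids]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Softmax attention for one query, restricted to keys in its selected blocks.

    Simplified sketch: no causal mask, single head, single query.
    """
    centroids = mean_pool_blocks(keys, block_size)
    chosen = select_blocks(query, centroids, top_k)
    # Expand the chosen block indices into the key positions they cover.
    idx = [i for b in chosen
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in idx]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    dim_v = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) / total
           for d in range(dim_v)]
    return out, chosen


keys = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
values = [[1.0], [1.0], [5.0], [5.0]]
out, chosen = block_sparse_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
print(chosen, out)
```

With `top_k=1` the query attends only to the first block, so the second block's keys and values are never touched. Smaller blocks make this selection finer-grained, but they also create many more blocks to score and gather, which is the overhead a specialized kernel has to hide.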