FlashMoBA

Date: 3 days ago

Organization: MIT, NVIDIA

Paper URL: 2511.11571

FlashMoBA was jointly proposed by research teams from MIT and NVIDIA in November 2025, and the results were published in the paper "Optimizing Mixture of Block Attention".

FlashMoBA is a hardware-aware CUDA kernel that makes Mixture of Block Attention (MoBA) efficient to execute even at the small block sizes that the paper's theoretical analysis recommends. It borrows techniques from FlashAttention and adds novel optimizations for block sparsity, achieving up to a 14.7x speedup over FlashAttention-2 and making previously impractical, theoretically optimal configurations deployable.
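To make the role of block size concrete, the following is a minimal pure-Python sketch of the block-sparse routing idea behind MoBA, not the FlashMoBA kernel itself. It assumes the commonly described MoBA scheme, in which each query scores key blocks by their mean-pooled centroid and attends only to the top-k blocks. The function names `moba_select_blocks` and `moba_attention` are hypothetical, and the sketch handles a single query without causal masking.

```python
import math


def moba_select_blocks(query, keys, block_size, top_k):
    """Pick the top_k key blocks for one query (illustrative, non-causal)."""
    n_blocks = math.ceil(len(keys) / block_size)
    scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        # Assumption: a block is summarized by the mean of its keys.
        centroid = [sum(k[d] for k in block) / len(block)
                    for d in range(len(query))]
        scores.append(sum(q * c for q, c in zip(query, centroid)))
    ranked = sorted(range(n_blocks), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:top_k])


def moba_attention(query, keys, values, block_size, top_k):
    """Softmax attention restricted to the selected key blocks."""
    selected = moba_select_blocks(query, keys, block_size, top_k)
    idx = [i for b in selected
           for i in range(b * block_size,
                          min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, keys[i])) for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) / total
           for d in range(len(values[0]))]
    return out, selected


keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
out, selected = moba_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
```

Smaller blocks give finer-grained routing but produce many more blocks to score and gather, which is the overhead a dedicated kernel has to absorb. With `top_k` equal to the number of blocks, the sketch reduces to ordinary dense attention.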
