Date

2 months ago

FlashMoBA was jointly proposed by research teams from MIT and NVIDIA in November 2025. The work was published in the paper "Optimizing Mixture of Block Attention".

FlashMoBA is a hardware-aware CUDA kernel that runs Mixture of Block Attention (MoBA) efficiently even at the small block sizes that the paper's theoretical analysis recommends. It borrows techniques from FlashAttention and adds new optimizations for block sparsity. The paper reports up to a 14.7x speedup over FlashAttention-2, which makes previously impractical, theoretically optimal configurations deployable.
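To make the block-sparsity idea concrete, here is a minimal NumPy sketch of MoBA-style attention for a single head. It is a dense reference for illustration only, not the FlashMoBA CUDA kernel. The function name `moba_attention`, the parameters `block_size` and `top_k`, and the routing details (mean-pooled key centroids, an always-selected current block, causal masking) are assumptions based on the general description of MoBA, not taken from this page.

```python
import numpy as np

def moba_attention(q, k, v, block_size, top_k):
    """Dense reference sketch of block-sparse (MoBA-style) causal attention.

    q, k, v: arrays of shape (seq_len, head_dim).
    seq_len must be a multiple of block_size (an assumption of this sketch).
    """
    n, d = q.shape
    assert n % block_size == 0, "seq_len must be divisible by block_size"
    num_blocks = n // block_size

    # One centroid per key block: the mean of the keys in that block.
    centroids = k.reshape(num_blocks, block_size, d).mean(axis=1)

    # Routing score of every query against every block centroid.
    gate = q @ centroids.T  # (n, num_blocks)

    q_block = np.arange(n) // block_size  # block index of each query
    blk = np.arange(num_blocks)
    # Future blocks can never be selected (causality).
    gate = np.where(blk[None, :] > q_block[:, None], -np.inf, gate)
    # The query's own block is always selected.
    gate = np.where(blk[None, :] == q_block[:, None], np.inf, gate)

    # Keep the top_k highest-scoring blocks per query.
    top = np.argsort(-gate, axis=1)[:, :top_k]
    selected = np.zeros((n, num_blocks), dtype=bool)
    np.put_along_axis(selected, top, True, axis=1)
    # Early queries have fewer than top_k valid blocks; drop masked ones.
    selected &= gate > -np.inf

    # Expand the block mask to token level and apply the causal mask.
    mask = np.repeat(selected, block_size, axis=1)
    mask &= np.tril(np.ones((n, n), dtype=bool))

    # Standard scaled dot-product attention restricted to the mask.
    scores = np.where(mask, q @ k.T / np.sqrt(d), -np.inf)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out = moba_attention(q, k, v, block_size=4, top_k=2)
print(out.shape)
```

This reference still computes the full score matrix and only masks it, so it saves no work. A fused kernel such as FlashMoBA aims to skip the unselected blocks entirely, which matters more as `block_size` shrinks and the number of blocks grows.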

Related Wiki

Multi-agent Workflow CudaForge

Gated Attention

ReinFlow, an Online Reinforcement Learning Framework

TreeSynth, a Synthetic Data Method Based on Tree-Guided Subspaces

UserBench Benchmark

SERES Semantic Aware Sparse View Reconstruction Framework

Byzantine Robust Federated Learning (BRFL)

Fractal Forensics

Normalized Spatiotemporal Gradient (NSG)
