What You're Seeing
Green arrows = Mamba's selective state space:
information flows sequentially through a fixed-size hidden state, one token at a time, with selective (input-dependent) gating choosing what to remember.
Red matrix = Transformer attention:
every token attends to every other token (O(n²) complexity).
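To make the two patterns concrete, here is a minimal sketch in plain Python. It is not Mamba's actual parameterization or a real attention layer: `gated_scan`, `attention_scores`, the scalar tokens, and the hand-picked `gates` are all hypothetical stand-ins chosen only to show how the amount of work grows with sequence length.

```python
import math

def gated_scan(xs, gates):
    """Toy gated recurrence: one state update per token, so O(n) steps.

    `gates` stands in for the selective (input-dependent) gating;
    in a real model these values would be computed from the input.
    """
    h, steps, out = 0.0, 0, []
    for x, g in zip(xs, gates):
        # g near 1 keeps the old memory, g near 0 overwrites it with x
        h = g * h + (1.0 - g) * x
        steps += 1
        out.append(h)
    return out, steps

def attention_scores(xs):
    """Toy attention weights: each token is scored against every token, O(n^2) pairs."""
    n, pairs, weights = len(xs), 0, []
    for i in range(n):
        row = [xs[i] * xs[j] for j in range(n)]  # scalar stand-in for a dot product
        pairs += n
        m = max(row)                              # subtract max for a stable softmax
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights, pairs

xs = [0.5, -1.0, 2.0, 0.25]       # four scalar "tokens" (assumed toy data)
gates = [0.9, 0.1, 0.5, 0.9]      # assumed gate values, one per token
states, scan_steps = gated_scan(xs, gates)
weights, attn_pairs = attention_scores(xs)
print(scan_steps, attn_pairs)     # prints: 4 16
```

With four tokens the scan does 4 updates while attention scores 16 pairs. The scan carries a single running state forward (the green arrows), whereas attention fills an n-by-n grid (the red matrix).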
Mamba: O(n), 1000 tokens = 1K ops
⚠️ Transformer attention is O(n²): 1000 tokens = 1M ops, 1000x more compute, and the gap grows with sequence length
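The 1000x figure is just n² divided by n at n = 1000. A short sketch of that arithmetic follows. The `op_counts` helper is hypothetical and, like the numbers above, counts one "op" per token or per token pair, ignoring constant factors and hidden dimensions.

```python
def op_counts(n):
    """Return (linear ops, quadratic ops, ratio) for a sequence of n tokens."""
    linear = n             # one step per token, O(n)
    quadratic = n * n      # one score per token pair, O(n^2)
    return linear, quadratic, quadratic // linear

for n in (1_000, 10_000, 100_000):
    lin, quad, ratio = op_counts(n)
    print(f"{n:>7} tokens: O(n) = {lin:,} ops, O(n^2) = {quad:,} ops, ratio = {ratio:,}x")
```

Because the ratio equals n, the gap is 1000x at 1000 tokens and keeps widening as the sequence gets longer.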