This week brought some interesting research on the Mamba architecture, showing that it is gaining popularity. Mamba is a simplified model designed to improve how neural networks process sequences of data, such as text, audio, or vision. It replaces complex components like attention mechanisms and multilayer perceptrons (MLPs) with a streamlined approach based on selective state space models (SSMs). This allows it to handle long sequences more efficiently than traditional models like Transformers. You can find a detailed overview of the Mamba architecture in our AI 101 episode.
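To make the "selective SSM" idea above concrete, here is a minimal, illustrative sketch of the recurrence at Mamba's core. Random weights stand in for learned projections, and the naive Python loop replaces the optimized parallel scan used in practice; the key point is that the state-transition parameters depend on the current input (that is the "selective" part), and the scan runs in linear time over the sequence.

```python
import numpy as np

def selective_ssm(x, d_state=4, seed=0):
    """Toy selective state space scan over a 1-D input sequence.

    Unlike a classic SSM with fixed (A, B, C) matrices, the step size
    here depends on the current input x_t, so the model can choose how
    much of the past state to keep or overwrite at each step.
    """
    rng = np.random.default_rng(seed)
    # Stand-ins for learned projection weights (single input channel).
    W_delta = rng.normal(size=1)
    W_B = rng.normal(size=d_state)
    W_C = rng.normal(size=d_state)
    A = -np.exp(rng.normal(size=d_state))  # negative-real A keeps the state stable

    h = np.zeros(d_state)
    y = np.empty(len(x))
    for t in range(len(x)):  # one pass: linear in sequence length
        # Input-dependent step size makes the recurrence "selective".
        delta = np.log1p(np.exp(W_delta * x[t]))  # softplus, always > 0
        A_bar = np.exp(delta * A)                 # discretized state transition
        B_bar = delta * W_B                       # simple discretized input map
        h = A_bar * h + B_bar * x[t]              # state update
        y[t] = W_C @ h                            # readout
    return y

y = selective_ssm(np.sin(np.linspace(0.0, 6.0, 32)))
print(y.shape)  # (32,)
```

Contrast this with self-attention, which compares every token against every other token (quadratic cost): the SSM carries a fixed-size state forward, which is why Mamba scales well to long sequences.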
Here is a list of Mamba-related studies with open code published this summer that could be useful for your research:
“Jamba-1.5: Hybrid Transformer-Mamba Models at Scale” by AI21 Labs introduces a combination of Transformer and Mamba architectures to create efficient open-source LMs with high performance and low memory use, even for long texts. → Read more
“Scalable Autoregressive Image Generation with Mamba” proposes AiM, a new image generation model that directly applies next-token prediction to image generation instead of the usual approach of modifying Mamba for 2D signals. The result is better quality and faster speeds. → Read more
“MambaEVT: Event Stream based Visual Object Tracking using State Space Model” proposes a new Mamba-based approach to event camera-based visual tracking. It improves the accuracy and efficiency of visual tracking, especially on large datasets. → Read more
“MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval” presents a new approach that adopts the Mamba architecture for its ability to efficiently handle multi-scale representations, which are crucial for text-video retrieval (TVR). → Read more
“DeMansia: Mamba Never Forgets Any Tokens” explores the limitations of Transformers in handling long sequences and introduces the DeMansia architecture, which combines state space models such as Mamba and Vision Mamba (ViM) with LV-ViT-style token labeling, substantially improving image classification. → Read more
“BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba” shows how a specialized model, built on Mamba, can understand complex biomedical texts and be more effective than models like BioBERT. → Read more
“VSSD: Vision Mamba with Non-Causal State Space Duality” introduces VSSD, a model that uses non-causal methods to improve both performance and efficiency in vision tasks like classification and segmentation. → Read more
“MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection”: the MambaMixer model selectively mixes data across tokens and channels, improving performance in tasks like image classification and time series forecasting. It outperforms traditional models in both efficiency and accuracy. → Read more
“MambaVision: A Hybrid Mamba-Transformer Vision Backbone” by NVIDIA introduces a hybrid model that combines the Mamba architecture with Vision Transformers (ViT) for better performance in visual tasks. The redesigned Mamba part models visual features efficiently, while self-attention blocks enhance long-range spatial detail capture. → Read more
“Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis” tested three Mamba models for speech separation, recognition, and synthesis, and found that they generally match or outperform Transformers in performance, especially with longer speech, but are less efficient for shorter speech or joint text-speech processing. → Read more
“Audio Mamba: Bidirectional State Space Model for Audio Representation Learning” introduces Audio Mamba (AuM), a self-attention-free model based on state space models (SSMs) like Mamba. AuM avoids the high computational costs of self-attention and performs comparably or better than traditional Audio Spectrogram Transformers. → Read more
“Mamba-Spike: Enhancing the Mamba Architecture with a Spiking Front-End for Efficient Temporal Data Processing” offers a neuromorphic computing architecture that combines spiking neural networks (SNNs) with the Mamba backbone to efficiently process time-varying data. Mamba's linear-time sequence modeling is used for handling complex temporal dependencies. → Read more
“Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference”: by integrating the Mamba LM, Cobra achieves linear complexity, enhancing speed while maintaining strong performance. It excels in visual tasks and spatial judgments. → Read more
“Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models” introduces Meteor, a Mamba-based large language and vision model (LLVM) that enhances understanding and answering with efficient, detailed rationales in linear time. It boosts performance across benchmarks without larger models or extra vision encoders. → Read more
BONUS: a piece of original research on the Mamba architecture:
