This week brought some interesting research on the Mamba architecture, showing that it is gaining popularity. Mamba is a simplified model designed to improve how neural networks process sequences of data, such as text, audio, or vision. It replaces complex components like attention mechanisms and multilayer perceptrons (MLPs) with a streamlined approach based on selective state space models (SSMs). This allows it to handle long sequences more efficiently than traditional models like Transformers. You can find a detailed overview of the Mamba architecture in our AI 101 episode.
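To make the "selective SSM" idea above concrete, here is a minimal, illustrative sketch of the recurrence at Mamba's core. Random weights stand in for learned projections, and the naive Python loop replaces the optimized parallel scan used in practice; the key point is that the state-transition parameters depend on the current input (that is the "selective" part), and the scan runs in linear time over the sequence.

```python
import numpy as np

def selective_ssm(x, d_state=4, seed=0):
    """Toy selective state space scan over a 1-D input sequence.

    Unlike a classic SSM with fixed (A, B, C) matrices, the step size
    here depends on the current input x_t, so the model can choose how
    much of the past state to keep or overwrite at each step.
    """
    rng = np.random.default_rng(seed)
    # Stand-ins for learned projection weights (single input channel).
    W_delta = rng.normal(size=1)
    W_B = rng.normal(size=d_state)
    W_C = rng.normal(size=d_state)
    A = -np.exp(rng.normal(size=d_state))  # negative-real A keeps the state stable

    h = np.zeros(d_state)
    y = np.empty(len(x))
    for t in range(len(x)):  # one pass: linear in sequence length
        # Input-dependent step size makes the recurrence "selective".
        delta = np.log1p(np.exp(W_delta * x[t]))  # softplus, always > 0
        A_bar = np.exp(delta * A)                 # discretized state transition
        B_bar = delta * W_B                       # simple discretized input map
        h = A_bar * h + B_bar * x[t]              # state update
        y[t] = W_C @ h                            # readout
    return y

y = selective_ssm(np.sin(np.linspace(0.0, 6.0, 32)))
print(y.shape)  # (32,)
```

Contrast this with self-attention, which compares every token against every other token (quadratic cost): the SSM carries a fixed-size state forward, which is why Mamba scales well to long sequences.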
Here is a list of Mamba-related studies with open code published this summer that could be useful for your research:
“Jamba-1.5: Hybrid Transformer-Mamba Models at Scale” by AI21 Labs introduces a combination of Transformer and Mamba architectures to create efficient open-source LMs with high performance and low memory use, even for long texts. → Read more
“Scalable Autoregressive Image Generation with Mamba” proposes AiM, a new image generation model that directly applies next-token prediction to image generation instead of the usual approach of modifying Mamba for 2D signals. The result is better quality and faster speeds. → Read more
“MambaEVT: Event Stream based Visual Object Tracking using State Space Model” proposes a new Mamba-based approach to event camera-based visual tracking. It improves the accuracy and efficiency of visual tracking, especially on large datasets. → Read more
“MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval” presents a new approach that adopts the Mamba architecture for its ability to efficiently handle multi-scale representations, which are crucial for text-video retrieval (TVR). → Read more
“DeMansia: Mamba Never Forgets Any Tokens” explores the limitations of Transformers in handling long sequences and introduces the DeMansia architecture, which combines state space models such as Mamba and Vision Mamba (ViM) with LV-ViT-style token labeling, substantially improving image classification. → Read more
“BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba” shows how a specialized model, built on Mamba, can understand complex biomedical texts and be more effective than models like BioBERT. → Read more
“VSSD: Vision Mamba with Non-Causal State Space Duality” introduces VSSD, a model that uses non-causal methods to improve both performance and efficiency in vision tasks like classification and segmentation. → Read more
“MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection”: the MambaMixer model selectively mixes data across tokens and channels, improving performance in tasks like image classification and time series forecasting. It outperforms traditional models in both efficiency and accuracy. → Read more
“MambaVision: A Hybrid Mamba-Transformer Vision Backbone” by NVIDIA introduces a hybrid model that combines the Mamba architecture with Vision Transformers (ViT) for better performance in visual tasks. The redesigned Mamba part models visual features efficiently, while self-attention blocks enhance long-range spatial detail capture. → Read more
“Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis” tested three Mamba models for speech separation, recognition, and synthesis, and found that they generally match or outperform Transformers in performance, especially with longer speech, but are less efficient for shorter speech or joint text-speech processing. → Read more
“Audio Mamba: Bidirectional State Space Model for Audio Representation Learning” introduces Audio Mamba (AuM), a self-attention-free model based on state space models (SSMs) like Mamba. AuM avoids the high computational costs of self-attention and performs comparably or better than traditional Audio Spectrogram Transformers. → Read more
“Mamba-Spike: Enhancing the Mamba Architecture with a Spiking Front-End for Efficient Temporal Data Processing” offers a neuromorphic computing architecture that combines spiking neural networks (SNNs) with the Mamba backbone to efficiently process time-varying data. Mamba's linear-time sequence modeling is used for handling complex temporal dependencies. → Read more
“Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference”: by integrating the Mamba LM, Cobra achieves linear complexity, enhancing speed while maintaining strong performance. It excels in visual tasks and spatial judgments. → Read more
“Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models” introduces Meteor, a Mamba-based large language and vision model (LLVM) that enhances understanding and answering with efficient, detailed rationales in linear time. It boosts performance across benchmarks without larger models or extra vision encoders. → Read more
BONUS: a piece of original research on the Mamba architecture:
