This week, the spotlight was on Small Language Models (SLMs). With fewer parameters and more compact architectures, SLMs perform tasks faster than large-scale models while needing less processing power and memory, which means they can run on local devices like our smartphones.
Researchers are increasingly interested in SLMs for their potential to enable new applications, reduce inference costs, and enhance user privacy. When designed and trained carefully, small models can achieve results comparable to large-scale models.
Here is a list of new SLMs announced this week:
GPT-4o mini is the most cost-efficient small model developed by OpenAI. GPT-4o mini supports a wide range of tasks, such as chaining or parallelizing model calls, handling large context volumes, or providing fast, real-time customer interactions. It has a context window of 128K tokens, supports up to 16K output tokens per request, and handles text and vision in the API. → Read more
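As a quick illustration, here is a minimal sketch of calling GPT-4o mini through the OpenAI Python SDK; the prompt and parameter values are illustrative, not part of the announcement.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Illustrative request: a short, fast customer-support style reply
response = client.chat.completions.create(
    model="gpt-4o-mini",
    max_tokens=256,  # output is capped at 16K tokens per request
    messages=[
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "How do I reset my password?"},
    ],
)
print(response.choices[0].message.content)
```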
SmolLM by Hugging Face offers a series of state-of-the-art small language models in three sizes: 135M, 360M, and 1.7B parameters. It's built on a high-quality training corpus called SmolLM-Corpus. These models outperform others in their size categories across diverse benchmarks testing common-sense reasoning and world knowledge. → Read more
Code is available here.
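For a quick local test, a minimal sketch with Hugging Face transformers might look like the following; the checkpoint name assumes the IDs published on the Hub (the 360M and 1.7B variants follow the same naming scheme), and the prompt is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub checkpoint; swap in SmolLM-360M or SmolLM-1.7B for the larger sizes
checkpoint = "HuggingFaceTB/SmolLM-135M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Small language models can", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```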
Mistral NeMo, a 12B model developed in collaboration with NVIDIA, provides a 128K-token context window and excels in reasoning, world knowledge, and coding accuracy for its size. Because it uses a standard architecture, it is easy to adopt, as the sketch below suggests. → Read more
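Since the weights follow a standard transformer layout, loading them through the Hugging Face pipeline API should be a one-liner; note that the Hub ID below is an assumption about the published checkpoint name for the instruction-tuned weights.

```python
from transformers import pipeline

# Assumed Hub checkpoint name for the instruction-tuned NeMo weights
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-Nemo-Instruct-2407",
    device_map="auto",  # spread the 12B weights across available devices
)
print(generator("Explain KV caching in one sentence.", max_new_tokens=60)[0]["generated_text"])
```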
Mathstral 7B is specialized for math reasoning and scientific research, with a 32K-token context window. It achieves strong reasoning performance across various industry-standard benchmarks. → Read more
Codestral Mamba 7B uses the Mamba2 architecture for code generation. It's one of the first open-source models built on this architecture. Leveraging the advantages of Mamba models, it demonstrates advanced code and reasoning capabilities. → Read more
Here's a bonus list of the top 3 SLMs that were released earlier:
Microsoft's Phi-3 family of SLMs: state-of-the-art open models trained on the Phi-3 datasets, which include synthetic data and filtered, high-quality public web data. Each model in the Phi-3 family is available in two context-window lengths. Explore the model catalog. → Read more
Gemma 2B and 7B, developed by Google DeepMind, share technical components with Gemini. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety, and they are capable of running on a developer laptop or desktop computer. → Read more
Meta Llama 3 8B is the smaller counterpart of Llama 3 70B. It's a pretrained, instruction-tuned generative text model with an 8K context window. Llama 3 8B is optimized for dialogue and outperforms many open-source chat models on industry benchmarks. Explore the model performance parameters. → Read more
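Because the 8B model is dialogue-tuned, a minimal sketch would format the conversation with the tokenizer's chat template before generating. The Hub ID below is an assumption about the published checkpoint name, and the weights are gated behind Meta's license acceptance.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed (gated) Hub checkpoint; requires accepting Meta's license first
checkpoint = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

messages = [{"role": "user", "content": "Suggest a name for a tiny LLM newsletter."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=60)

# Decode only the newly generated continuation, not the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```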
