A Small Language Model (SLM) is a language model designed to provide useful capability with substantially fewer parameters and lower computational requirements than frontier Large Language Models (LLMs). There is no universal parameter threshold; the term is relative to the models and deployment hardware of a given period.
SLMs are useful when an application prioritizes:
- low inference latency;
- reduced serving cost;
- on-device or offline operation;
- data locality and privacy;
- predictable capacity; or
- high request volume for a narrow task.
Model size alone does not determine quality. Data curation, architecture, tokenization, training compute, and post-training can allow a smaller model to perform competitively on specific workloads. Model distillation, synthetic data, and quantization are frequently used in SLM development and deployment.
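To make the effect of quantization concrete, weight memory can be approximated as parameter count times bits per weight. The sketch below is a back-of-the-envelope estimate only: the function name `weight_memory_gib` and the 3.8-billion-parameter figure are illustrative assumptions, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GiB.

    Assumption: every weight is stored at the same precision. Real
    quantization formats often add per-group scales and keep some
    layers at higher precision, so actual files are somewhat larger.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical 3.8-billion-parameter model at three common precisions.
    for bits in (16, 8, 4):
        size = weight_memory_gib(3.8e9, bits)
        print(f"{bits:>2}-bit weights: about {size:.2f} GiB")
```

Halving the bits per weight halves the estimated footprint, which is often what moves a model from server-only to fitting on a laptop or phone.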
SLMs are often selected for classification, extraction, routing, summarization, embedded assistants, and constrained tool calling. In a cascade deployment, the SLM handles most requests and a larger reasoning model is invoked only when the smaller model is uncertain or the task exceeds its capability.
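The fallback to a larger model can be sketched as a simple confidence-gated cascade. This is a minimal illustration, not a production router: `cascade`, `CascadeResult`, the stub models, and the 0.8 threshold are all hypothetical, and how a confidence score is obtained (for example from token log-probabilities or a separate verifier) is left as an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumed interfaces: the small model returns (answer, confidence in [0, 1]);
# the large model returns only an answer.
SmallModel = Callable[[str], Tuple[str, float]]
LargeModel = Callable[[str], str]


@dataclass
class CascadeResult:
    answer: str
    served_by: str      # "small" or "large"
    confidence: float   # confidence reported by the small model


def cascade(prompt: str, small_model: SmallModel, large_model: LargeModel,
            threshold: float = 0.8) -> CascadeResult:
    """Answer with the small model unless its confidence is below threshold."""
    answer, confidence = small_model(prompt)
    if confidence >= threshold:
        return CascadeResult(answer, "small", confidence)
    # Escalate: the small model was not confident enough.
    return CascadeResult(large_model(prompt), "large", confidence)


def small_stub(prompt: str) -> Tuple[str, float]:
    """Stand-in for an SLM: confident only on short prompts."""
    if len(prompt.split()) <= 5:
        return "label: account", 0.93
    return "label: unknown", 0.41


def large_stub(prompt: str) -> str:
    """Stand-in for a larger model."""
    return "label: escalated"


if __name__ == "__main__":
    for text in ("reset my password",
                 "a much longer request that needs several steps of reasoning"):
        print(cascade(text, small_stub, large_stub))
```

The threshold trades cost against quality: raising it sends more traffic to the larger model, so it is usually tuned on held-out examples from the actual workload.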
The tradeoff is reduced breadth and robustness. An SLM may require more careful task definition, narrower context, domain-specific fine-tuning, and stronger fallback logic. Evaluation must use the actual deployment environment because memory, thermal limits, and hardware acceleration affect practical performance.
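One way to evaluate in the deployment environment is to time the model on the target device with representative prompts. The harness below is a sketch under stated assumptions: `measure_latency` and the `generate` callable are hypothetical names standing in for whatever inference call the application uses, and the percentile is a simple nearest-rank style estimate.

```python
import time
from typing import Callable, Dict, List


def measure_latency(generate: Callable[[str], str], prompts: List[str],
                    warmup: int = 2) -> Dict[str, float]:
    """Time generate() on each prompt and summarize latency in seconds."""
    if not prompts:
        raise ValueError("at least one prompt is required")

    # Warm-up calls are not timed; first calls often include one-off
    # costs such as loading weights or compiling kernels.
    for prompt in prompts[:warmup]:
        generate(prompt)

    samples: List[float] = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        samples.append(time.perf_counter() - start)
    samples.sort()

    def percentile(p: float) -> float:
        # Index into the sorted samples, clamped to the last element.
        index = min(len(samples) - 1, int(p * len(samples)))
        return samples[index]

    return {
        "p50_s": percentile(0.50),
        "p95_s": percentile(0.95),
        "max_s": samples[-1],
    }


if __name__ == "__main__":
    # Placeholder "model" so the sketch runs anywhere; replace with the
    # real on-device inference call.
    stats = measure_latency(lambda p: p.upper(), ["hello"] * 20)
    print(stats)
```

Running the same harness for several minutes can also surface thermal throttling, which short benchmarks on a cool device tend to miss.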
The Phi-3 technical report describes a compact language model designed to run locally on consumer hardware while retaining strong benchmark performance.