Reasoning Model

A reasoning model is a Large Language Model (LLM) optimized to spend additional inference-time computation before producing a final answer. It is designed for tasks that benefit from planning, decomposition, verification, or multi-step problem solving, such as mathematics, software engineering, scientific analysis, and complex tool calling.

The defining characteristic is not that the model emits a visible explanation. Rather, the system allocates computational work to intermediate reasoning before generating the user-facing output. Some APIs expose this through a reasoning-effort setting, or report the tokens spent on internal reasoning separately from the tokens in the visible answer.
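To make the token accounting concrete, here is a minimal sketch of how hidden reasoning might show up in a usage record. The `Usage` class, its field names, and the prices are hypothetical and do not mirror any specific provider's schema. The sketch also assumes reasoning tokens are billed at the output rate, which may not hold for every API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical usage record; field names are illustrative only."""
    input_tokens: int
    reasoning_tokens: int  # hidden intermediate reasoning, not shown to the user
    answer_tokens: int     # visible, user-facing output

    @property
    def billed_output_tokens(self) -> int:
        # Assumption: reasoning tokens are billed like ordinary output tokens.
        return self.reasoning_tokens + self.answer_tokens

    @property
    def reasoning_share(self) -> float:
        # Fraction of generated tokens the user never sees.
        total = self.billed_output_tokens
        return self.reasoning_tokens / total if total else 0.0


def request_cost(usage: Usage, input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost of one request, given made-up per-million-token prices."""
    return (usage.input_tokens * input_price_per_million
            + usage.billed_output_tokens * output_price_per_million) / 1_000_000


usage = Usage(input_tokens=1000, reasoning_tokens=3000, answer_tokens=500)
print(usage.billed_output_tokens)          # 3500
print(round(usage.reasoning_share, 3))     # 0.857
print(request_cost(usage, 2.0, 8.0))
```

In this example most of the generated tokens are reasoning tokens, which is why a short visible answer can still be comparatively expensive.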

This additional work is one form of inference-time scaling. Training methods such as Reinforcement Learning with Verifiable Rewards (RLVR) can encourage reasoning strategies whose outcomes are checked automatically.
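As a rough illustration of what "checked automatically" can mean, the sketch below scores a completion by comparing its final answer with a known reference. The `Answer:` marker convention and both function names are assumptions made for this example. Real RLVR pipelines use task-specific verifiers such as unit tests or math checkers.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, expected: str) -> float:
    """Binary reward: 1.0 only if the extracted final answer matches exactly.

    The intermediate reasoning is deliberately ignored; only the outcome
    is verified.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == expected.strip() else 0.0


print(verifiable_reward("6 times 7 is 42.\nAnswer: 42", "42"))  # 1.0
print(verifiable_reward("I believe it is 41.\nAnswer: 41", "42"))  # 0.0
print(verifiable_reward("It is probably 42.", "42"))  # 0.0 (no marker)
```

Because the reward depends only on the verified outcome, a fluent but wrong explanation earns nothing.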

Reasoning models are related to chain-of-thought prompting, but the terms are not interchangeable. Chain-of-thought prompting is an input technique intended to elicit intermediate steps. A reasoning model is trained or otherwise optimized to reason during inference, and its internal reasoning may not be exposed.
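The distinction can be sketched in code: chain-of-thought prompting changes the input text, whereas a reasoning model is typically steered through a request setting. Both functions and the `reasoning_effort` field below are hypothetical illustrations, not a real client library or request schema.

```python
def chain_of_thought_prompt(question: str) -> str:
    """Prompting technique: modify the input to elicit visible steps."""
    return f"{question}\n\nLet's think step by step."


def reasoning_request(question: str, effort: str = "medium") -> dict:
    """Model-side setting: the input is unchanged; only a knob is set.

    The 'reasoning_effort' key is an assumed name for illustration.
    """
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unsupported effort level: {effort!r}")
    return {"input": question, "reasoning_effort": effort}


question = "How many primes are there below 30?"
print(chain_of_thought_prompt(question))
print(reasoning_request(question, effort="high"))
```

In the first case any intermediate steps appear in the visible output; in the second, the reasoning may stay internal and only the final answer is returned.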

Compared with a low-latency general-purpose model, a reasoning model commonly trades:

lower latency and compute for higher accuracy on difficult tasks;
immediate answers for additional hidden deliberation;
predictable single-pass behavior for iterative planning and verification; and
lower per-request cost for improved performance on problems where naive generation fails.

Reasoning models are particularly useful inside an AI agent, where the model must decide which action to take, interpret tool results, recover from errors, and maintain progress over many steps. However, more reasoning does not guarantee correctness. Models can still make false assumptions, misuse tools, or produce confident hallucinations.
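The agent loop described above can be sketched as follows. This is a minimal illustration, not a production design: `run_agent`, the `(tool, argument)` action format, and the scripted policy standing in for the model are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str]  # (tool name, argument); ("finish", answer) ends the run


def run_agent(policy: Callable[[List[str]], Action],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> Optional[str]:
    """Alternate between choosing an action and observing its result."""
    history: List[str] = []
    for _ in range(max_steps):
        tool_name, argument = policy(history)
        if tool_name == "finish":
            return argument
        tool = tools.get(tool_name)
        if tool is None:
            history.append(f"error: unknown tool {tool_name!r}")
            continue
        try:
            history.append(f"{tool_name}({argument}) -> {tool(argument)}")
        except Exception as exc:
            # Feed the failure back so the policy can try to recover.
            history.append(f"error: {tool_name}({argument}) raised {exc}")
    return None  # step budget exhausted without a final answer


def divide(expression: str) -> str:
    numerator, denominator = expression.split("/")
    return str(int(numerator) / int(denominator))


def scripted_policy(history: List[str]) -> Action:
    """Stand-in for a reasoning model: fails once, then corrects itself."""
    if not history:
        return ("divide", "10/0")
    if history[-1].startswith("error"):
        return ("divide", "10/2")
    return ("finish", history[-1].split("-> ")[1])


print(run_agent(scripted_policy, {"divide": divide}))  # 5.0
```

The step budget matters: a policy that never recovers simply runs out of steps and returns `None` rather than looping forever.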

Evaluation should therefore focus on externally verifiable outcomes rather than the plausibility of an explanation. Suitable methods include deterministic checks, execution-based tests, held-out task suites, and agent evaluation for long-horizon behavior.
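A minimal sketch of an execution-based check, assuming candidate solutions are available as Python callables: each candidate is run against fixed test cases and scored only on whether its outputs match. The names `passes_all` and `pass_rate` are illustrative, not taken from any evaluation framework.

```python
from typing import Callable, List, Sequence, Tuple

TestCase = Tuple[tuple, object]  # (positional arguments, expected result)


def passes_all(candidate: Callable, cases: Sequence[TestCase]) -> bool:
    """Deterministic check: every case must return exactly the expected value."""
    for args, expected in cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False  # a crash counts as a failure
    return True


def pass_rate(candidates: List[Callable], cases: Sequence[TestCase]) -> float:
    """Fraction of candidates that pass the whole suite."""
    if not candidates:
        return 0.0
    return sum(passes_all(c, cases) for c in candidates) / len(candidates)


# Task: absolute value. The second candidate looks plausible on positive
# inputs but fails the negative case.
cases = [((3,), 3), ((-4,), 4), ((0,), 0)]
candidates = [abs, lambda x: x, lambda x: -x if x < 0 else x]
print(pass_rate(candidates, cases))
```

Here two of the three candidates pass, regardless of how convincing any accompanying explanation might have been.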

The OpenAI reasoning model guide documents practical differences between reasoning-oriented and conventional model usage.
