Hybrid Search

Hybrid search combines lexical retrieval with semantic vector search to produce one ranked set of results. It is commonly used in Retrieval-Augmented Generation (RAG) because the two retrieval methods have complementary strengths.

Lexical search, often based on BM25 or a related term-frequency method, matches exact words and phrases. It performs well for identifiers, product codes, error messages, names, and rare terminology. Vector search compares embeddings and can retrieve conceptually related documents even when they use different vocabulary.

A hybrid system runs both searches and combines their result lists. Fusion methods include:

weighted normalization of lexical and vector scores;
Reciprocal Rank Fusion, which combines result positions rather than raw scores;
learned ranking models; and
retrieving a broad union before applying a separate reranker.
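
As an illustration of the rank-based option, here is a minimal sketch of Reciprocal Rank Fusion. Each document receives 1 / (k + rank) from every list it appears in, and the contributions are summed. The function name, the sample document IDs, and the smoothing constant k = 60 (a commonly used default) are assumptions made for this example, not part of any particular product's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant (assumed default of 60) that dampens the
       advantage of the very top positions.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the position matters; raw retrieval scores are ignored.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
lexical_results = ["doc_a", "doc_b", "doc_c"]
vector_results = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([lexical_results, vector_results])
print(fused)
```

Because only positions are used, this approach needs no score calibration, which is one reason it is a common starting point before trying weighted or learned fusion.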

Raw lexical and vector scores usually have different ranges and distributions, so they should not be added without calibration. The optimal weighting also depends on the corpus and query type. A query containing an exact serial number may need stronger lexical weighting, while a conceptual question may benefit from stronger semantic weighting.
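
One simple form of calibration is to rescale each score list to the range [0, 1] before mixing them. The sketch below uses min-max normalization and a single weight, alpha, for the lexical side. The function names, the sample scores, and the choice of min-max scaling are assumptions for illustration; other normalizations (for example z-scores) are equally plausible, and a suitable alpha would need to be tuned on the actual corpus.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(lexical_scores, vector_scores, alpha=0.5):
    """Combine normalized scores; alpha is the weight on the lexical side."""
    lex = min_max_normalize(lexical_scores)
    vec = min_max_normalize(vector_scores)
    combined = {
        # A document missing from one list contributes 0 from that list.
        doc_id: alpha * lex.get(doc_id, 0.0) + (1 - alpha) * vec.get(doc_id, 0.0)
        for doc_id in set(lex) | set(vec)
    }
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical raw scores: BM25-style values and cosine similarities.
lexical_scores = {"doc_a": 12.0, "doc_b": 7.0, "doc_c": 2.0}
vector_scores = {"doc_c": 0.9, "doc_a": 0.7, "doc_d": 0.5}

balanced = weighted_fusion(lexical_scores, vector_scores, alpha=0.5)
lexical_heavy = weighted_fusion(lexical_scores, vector_scores, alpha=0.9)
print([doc_id for doc_id, _ in balanced])
print([doc_id for doc_id, _ in lexical_heavy])
```

Raising alpha moves documents that only matched lexically, such as doc_b here, up the list, which is the kind of adjustment an exact serial-number query might call for.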

Metadata filtering, access control, chunking, and index freshness remain important. Hybrid retrieval cannot recover a document that was never indexed or is excluded by an incorrect filter.
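
The effect of a filter can be sketched as a visibility check applied to the candidate set before either retriever ranks it. The Chunk fields (tenant, allowed_groups) and the helper names below are hypothetical and chosen only for this example; real systems usually push such filters down into the index rather than filtering in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    tenant: str                # hypothetical metadata field
    allowed_groups: frozenset  # hypothetical access-control field


def visible_to(chunk, tenant, user_groups):
    """True if the chunk belongs to the tenant and shares a group with the user."""
    return chunk.tenant == tenant and bool(chunk.allowed_groups & user_groups)


def filter_candidates(chunks, tenant, user_groups):
    """Keep only chunks the user may see; lexical and vector search both start here."""
    return [c for c in chunks if visible_to(c, tenant, user_groups)]


index = [
    Chunk("doc_a", "acme", frozenset({"eng"})),
    Chunk("doc_b", "acme", frozenset({"hr"})),
    Chunk("doc_c", "other", frozenset({"eng"})),
]

visible = filter_candidates(index, tenant="acme", user_groups={"eng"})
print([c.doc_id for c in visible])
```

If the tenant value or group set passed in is wrong, the relevant chunk never reaches either retriever, and no fusion method can bring it back.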

For agentic RAG, the agent may select hybrid search for broad discovery and then refine results with additional queries or tools.

Google Cloud's hybrid search documentation describes combining dense vector and sparse token embeddings for retrieval.
