Retrieval Pipeline

Information Retrieval

The end-to-end sequence of steps in a RAG system: query processing, document retrieval, reranking, context construction, and LLM generation.

A retrieval pipeline is the complete sequence of steps that transforms a user query into a grounded, cited answer in a RAG system. The standard pipeline consists of: (1) query processing (rewriting, expansion, decomposition), (2) retrieval (searching for relevant document chunks using vector similarity, keyword matching, or both), (3) reranking (rescoring candidates with a cross-encoder for better precision), (4) context construction (formatting retrieved documents for the LLM), and (5) generation (producing the final answer with citations).
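The five stages above can be sketched as a chain of small functions. This is only an illustrative skeleton under stated assumptions: the `Chunk` type, the toy `CORPUS`, and every function name are hypothetical, keyword overlap stands in for real vector or BM25 search, a length-normalized overlap stands in for a cross-encoder, and the `llm` argument is any callable that maps a prompt to text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    doc_id: str
    text: str
    score: float = 0.0


# Toy corpus standing in for an indexed document store (hypothetical data).
CORPUS = [
    Chunk("doc-a", "BM25 is a keyword ranking function used in search"),
    Chunk("doc-b", "Dense retrieval embeds queries and documents as vectors"),
    Chunk("doc-c", "Cross-encoders rescore query and document pairs jointly"),
]


def tokenize(text: str) -> set[str]:
    return set(text.lower().split())


def process_query(query: str) -> list[str]:
    # Stage 1: a real system might rewrite, expand, or decompose the query here.
    return [query.strip().lower()]


def retrieve(queries: list[str], corpus: list[Chunk], k: int = 3) -> list[Chunk]:
    # Stage 2: keyword overlap as a stand-in for vector and/or keyword search.
    scored = []
    for chunk in corpus:
        overlap = max(len(tokenize(q) & tokenize(chunk.text)) for q in queries)
        if overlap > 0:
            scored.append(Chunk(chunk.doc_id, chunk.text, float(overlap)))
    return sorted(scored, key=lambda c: c.score, reverse=True)[:k]


def rerank(query: str, candidates: list[Chunk], top_n: int = 2) -> list[Chunk]:
    # Stage 3: placeholder for a cross-encoder; overlap normalized by chunk length.
    q = tokenize(query)
    rescored = [
        Chunk(c.doc_id, c.text, len(q & tokenize(c.text)) / len(tokenize(c.text)))
        for c in candidates
    ]
    return sorted(rescored, key=lambda c: c.score, reverse=True)[:top_n]


def build_context(chunks: list[Chunk]) -> str:
    # Stage 4: number each chunk so the answer can cite it as [1], [2], ...
    return "\n".join(
        f"[{i}] ({c.doc_id}) {c.text}" for i, c in enumerate(chunks, start=1)
    )


def generate(query: str, context: str, llm: Callable[[str], str]) -> str:
    # Stage 5: build a grounded prompt and hand it to the model.
    prompt = (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return llm(prompt)


def run_pipeline(query: str, llm: Callable[[str], str]) -> str:
    queries = process_query(query)
    candidates = retrieve(queries, CORPUS)
    top = rerank(queries[0], candidates)
    return generate(query, build_context(top), llm)


# Stub "LLM" that echoes its prompt, so the sketch runs offline.
answer = run_pipeline("How does BM25 keyword search work", llm=lambda p: p)
print(answer)
```

Keeping each stage behind its own function makes it straightforward to swap one component, for example replacing the overlap scorer with a real reranker, without touching the rest of the chain.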

Each stage in the pipeline involves design decisions that significantly affect answer quality. Query rewriting can improve recall, in some cases by 15% or more. Hybrid retrieval combining BM25 and dense search typically outperforms either method alone. Reranking improves the precision of the top results that are passed on to the LLM. Context construction must balance providing enough information against overwhelming the LLM with noise.
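One common way to combine a BM25 ranking with a dense ranking is reciprocal rank fusion (RRF). The text above does not say which fusion method is intended, so treat RRF as an assumption here; the document IDs are made up, and `k = 60` is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the top result.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical outputs of a keyword retriever and a dense retriever.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])  # ['d1', 'd3', 'd2', 'd4']
```

Because RRF uses only rank positions, it sidesteps the problem that BM25 scores and embedding similarities live on different scales. Documents found by both retrievers (`d1`, `d3`) rise above those found by only one.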

In hybrid RAG+KG systems, the pipeline is extended with knowledge graph components: entity extraction from the query, graph traversal for structured facts, and context fusion that combines graph results with document results before generation. Production pipelines also include caching, cost optimization, monitoring, and evaluation metrics like faithfulness, relevancy, and groundedness.
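The knowledge graph extension can be sketched in the same style. Everything below is hypothetical: the toy `GRAPH`, substring matching as a stand-in for real entity extraction or linking, and a one-hop lookup as a stand-in for graph traversal. It is meant only to show where graph facts and document passages meet before generation.

```python
# Hypothetical toy knowledge graph: entity -> list of (relation, object) edges.
GRAPH = {
    "bm25": [("is_a", "ranking function"), ("used_in", "keyword search")],
    "cross-encoder": [("used_for", "reranking")],
}


def extract_entities(query: str, graph: dict) -> list[str]:
    # Naive entity extraction: match known entity names as substrings.
    q = query.lower()
    return [entity for entity in graph if entity in q]


def traverse(entities: list[str], graph: dict) -> list[str]:
    # One-hop traversal: turn each outgoing edge into a readable fact.
    return [
        f"{entity} {relation.replace('_', ' ')} {obj}"
        for entity in entities
        for relation, obj in graph[entity]
    ]


def fuse_context(facts: list[str], passages: list[str]) -> str:
    # Context fusion: structured facts first, then numbered passages.
    lines = ["Graph facts:"]
    lines += [f"- {fact}" for fact in facts]
    lines.append("Document passages:")
    lines += [f"[{i}] {p}" for i, p in enumerate(passages, start=1)]
    return "\n".join(lines)


query = "What is BM25 used for?"
facts = traverse(extract_entities(query, GRAPH), GRAPH)
context = fuse_context(facts, ["BM25 is a keyword ranking function used in search"])
print(context)
```

Placing graph facts in their own labeled section is one possible design choice; it lets the generation prompt treat structured facts and free-text passages differently when citing.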

Last updated: February 22, 2026
