
Chunking

Information Retrieval

The process of dividing large documents into smaller, semantically coherent pieces suitable for embedding and retrieval in RAG systems.

Chunking is the process of splitting documents into smaller segments (chunks) that can be individually embedded and retrieved. It is one of the most critical steps in a RAG pipeline: embedding models have token limits, LLM context windows work best with concise, relevant content, and large chunks spanning multiple topics produce diluted embeddings and poor similarity scores.

Common chunking strategies include fixed-size chunking (split every N tokens with overlap), sentence-based chunking, structural chunking (split by headers and sections), semantic chunking (split when consecutive sentence similarity drops), and hierarchical chunking (create chunks at multiple granularities). Advanced approaches include entity-aware chunking that never splits named entities across boundaries and code-aware chunking that keeps code blocks intact.
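The first strategy above, fixed-size chunking with overlap, can be sketched in a few lines. This is an illustrative implementation, not from any particular library; the function name, sizes, and the pre-tokenized input are assumptions, and a real pipeline would tokenize with the embedding model's own tokenizer.

```python
def chunk_fixed(tokens, size=500, overlap=50):
    """Split a token list into chunks of `size` tokens,
    where consecutive chunks share `overlap` tokens."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = size - overlap  # advance by size minus overlap each time
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last chunk reached the end of the document
    return chunks

# hypothetical 1200-token document
tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_fixed(tokens, size=500, overlap=50)
# produces chunks covering tokens 0-500, 450-950, and 900-1200
```

Each chunk's first 50 tokens repeat the previous chunk's last 50, so a sentence straddling a boundary appears whole in at least one chunk.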

The sweet spot for most use cases is 200-800 tokens per chunk with 10-20% overlap between consecutive chunks. Too-small chunks lack context and produce poor embeddings, while too-large chunks mix topics and dilute similarity scores. Chunk overlap prevents information loss at boundaries where important context might be split across adjacent chunks.

Last updated: February 22, 2026