Sparse Retrieval

Information Retrieval

A retrieval method using high-dimensional sparse vectors based on term frequencies (like BM25 or TF-IDF), where most vector elements are zero.

Sparse retrieval refers to information retrieval methods that represent queries and documents as high-dimensional vectors in which most elements are zero. The most common sparse retrieval methods are TF-IDF and BM25, which use one dimension per vocabulary term. A non-zero value marks a term that occurs in the text and weights it by frequency statistics, such as how often the term appears in the document and how rare it is across the collection.
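As a rough illustration of the paragraph above, the following is a minimal TF-IDF sketch that stores each vector as a dictionary holding only its non-zero entries. The whitespace tokenizer, the smoothed IDF formula, and the sample documents are assumptions made for this example, not part of any particular library.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real systems use richer analyzers.
    return text.lower().split()

def build_idf(docs):
    # Document frequency: the number of documents that contain each term.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    # Smoothed IDF (one common variant): rarer terms get larger weights.
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf_vector(text, idf):
    # Sparse vector: only terms present in the text get an entry.
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}

def dot(a, b):
    # Only dimensions present in both vectors contribute to the score.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[term] for term, weight in a.items() if term in b)

docs = [
    "error code E1234 disk failure",
    "how to replace a car battery",
    "automobile maintenance schedule",
]
idf = build_idf(docs)
doc_vectors = [tfidf_vector(d, idf) for d in docs]
query = tfidf_vector("E1234 error", idf)
scores = [dot(query, v) for v in doc_vectors]
```

Only the first document shares terms with the query, so it is the only one with a positive score; the other two score zero.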

The term "sparse" describes both the vector representation (a document containing 100 unique words out of a 100,000-word vocabulary produces a vector with 99.9% zeros) and the matching behavior (only documents that share at least one exact term with the query receive non-zero scores). This makes sparse retrieval well suited to exact-match scenarios like product IDs, error codes, and technical terminology.
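The arithmetic and the exact-match behavior described above can be sketched as follows. The inverted index is a common way to exploit this sparsity; the document IDs, texts, and the candidates helper are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Sparsity of the vector from the example: 100 non-zero entries out of 100,000.
vocab_size = 100_000
unique_terms = 100
zero_fraction = (vocab_size - unique_terms) / vocab_size  # 0.999, i.e. 99.9% zeros

# Hypothetical mini-collection for illustration.
docs = {
    "d1": "reset error E1234 on model SKU-889",
    "d2": "installation guide for model SKU-112",
    "d3": "warranty and returns policy",
}

# Inverted index: term -> set of documents containing that exact term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def candidates(query):
    # Only documents sharing at least one exact term with the query are returned;
    # every other document implicitly has a score of zero.
    hits = set()
    for term in query.lower().split():
        hits |= index.get(term, set())
    return hits
```

Here candidates("SKU-889") returns only d1, while a query such as "refund" returns nothing, even though d3 is about returns.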

Sparse retrieval requires no GPU and no neural model training, and its results are interpretable: you can see exactly which terms matched and how much each contributed to the score. However, it cannot capture semantic similarity -- "car" and "automobile" occupy unrelated dimensions in a sparse representation, so a query for one does not match documents that contain only the other. In modern RAG systems, sparse retrieval is often combined with dense retrieval in hybrid search to cover both exact-match and semantic-match needs.
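One way hybrid search can merge the two result lists is reciprocal rank fusion (RRF), sketched below. This is an assumption for illustration; the text does not say which fusion method a given system uses. The rankings are hard-coded and hypothetical: the sparse list stands for a "car" query that matches only the exact term, and the dense list stands for what an embedding model might return.

```python
def rrf(rankings, k=60):
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per document.
    # k=60 is a commonly used default, assumed here rather than tuned.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical rankings for the query "car".
sparse_ranking = ["doc_car"]                                   # exact-term match only
dense_ranking = ["doc_automobile", "doc_car", "doc_bicycle"]   # semantic neighbors

fused = rrf([sparse_ranking, dense_ranking])
```

The document found by both retrievers ranks first, and the semantically related document that sparse retrieval missed still appears in the fused list.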

Last updated: February 22, 2026
