
Bi-Encoder

Information Retrieval

A model architecture that independently encodes queries and documents into separate embeddings for fast similarity comparison, used for initial retrieval at scale.

A bi-encoder is a model architecture that uses two separate encoders (or a shared encoder) to independently convert queries and documents into fixed-size vector embeddings. Relevance is then determined by computing similarity (typically cosine similarity or dot product) between the query embedding and each document embedding.
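The similarity step can be sketched in a few lines. This is a minimal illustration with toy 4-dimensional vectors standing in for real embeddings, which a trained encoder would produce with hundreds of dimensions; the document IDs and values are invented for the example.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for encoder output.
query_embedding = [0.9, 0.1, 0.0, 0.4]
doc_embeddings = {
    "doc_a": [0.8, 0.2, 0.1, 0.5],   # points in roughly the same direction
    "doc_b": [0.0, 0.9, 0.8, 0.1],   # points elsewhere
}

# Rank documents by similarity to the query embedding.
ranked = sorted(
    doc_embeddings.items(),
    key=lambda item: cosine_similarity(query_embedding, item[1]),
    reverse=True,
)
print([doc_id for doc_id, _ in ranked])  # doc_a ranks above doc_b
```

Note that the query and each document are scored purely from their separate vectors; nothing about the document influences how the query is encoded, or vice versa.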

The key advantage of bi-encoders is efficiency: document embeddings can be pre-computed and indexed offline, so at query time only the query needs to be encoded. This enables sub-millisecond retrieval over millions of documents using approximate nearest neighbor search in vector databases. Common bi-encoder models include Dense Passage Retrieval (DPR), sentence-transformers models, and commercial embedding APIs like OpenAI's text-embedding-3.
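The offline/online split can be sketched as a tiny in-memory stand-in for a vector database. For simplicity this uses brute-force exact dot-product search rather than a real ANN structure such as an HNSW graph, and the class and method names are illustrative, not any particular library's API.

```python
class VectorIndex:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self._vectors = {}  # doc_id -> pre-computed embedding

    def add(self, doc_id, embedding):
        # Offline step: document embeddings are computed once and indexed.
        self._vectors[doc_id] = embedding

    def search(self, query_embedding, k=2):
        # Online step: only the query was encoded at request time.
        # Brute-force exact search here; production systems use
        # approximate nearest neighbor search to stay fast at scale.
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))

        scored = sorted(
            self._vectors.items(),
            key=lambda item: dot(query_embedding, item[1]),
            reverse=True,
        )
        return scored[:k]

index = VectorIndex()
index.add("doc_a", [0.8, 0.2])
index.add("doc_b", [0.1, 0.9])
print(index.search([0.9, 0.1], k=1))  # doc_a is the nearest document
```

The design point is that `add` runs ahead of time over the whole corpus, while `search` does no document encoding at all.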

The tradeoff compared to cross-encoders is lower accuracy, because the query and document cannot directly attend to each other during encoding. Each vector must independently capture all relevant meaning. This limitation is why production RAG systems typically use bi-encoders for fast initial retrieval of candidates, followed by cross-encoder reranking for precision on the final results.
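The retrieve-then-rerank pipeline looks roughly like the sketch below. To keep it self-contained, toy word-overlap and term-count scorers stand in for the bi-encoder and cross-encoder respectively; the function names, corpus, and scoring logic are invented for illustration, and a real system would plug in trained models at both stages.

```python
def retrieve_candidates(query, corpus, k=100):
    # Stage 1 (bi-encoder stand-in): a cheap word-overlap score plays
    # the role of embedding similarity over the whole corpus.
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def rerank(query, candidates, k=3):
    # Stage 2 (cross-encoder stand-in): a real reranker jointly encodes
    # each (query, document) pair; a toy term-count scorer marks the idea.
    def score(doc):
        return sum(doc.lower().count(w) for w in query.lower().split())
    return sorted(candidates, key=score, reverse=True)[:k]

corpus = [
    "the cat sat on the mat",
    "neural retrieval with dense embeddings",
    "dense passage retrieval uses a bi-encoder",
]
query = "dense passage retrieval"

candidates = retrieve_candidates(query, corpus, k=2)  # fast, broad
results = rerank(query, candidates, k=1)              # slow, precise
print(results)
```

The shape is what matters: the cheap first stage narrows millions of documents to a small candidate set, and the expensive second stage only ever scores that small set.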

Last updated: February 22, 2026