## Overview
Reranking improves search quality by scoring query-document pairs with cross-attention. Cross-encoders see both query and document together, enabling deeper semantic matching than embedding similarity alone.
## Quick Example

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

query = Item(text="What is machine learning?")
items = [
    Item(text="Machine learning is a subset of AI that learns from data."),
    Item(text="The weather forecast predicts rain tomorrow."),
    Item(text="Deep neural networks power modern ML systems."),
]

result = client.score("BAAI/bge-reranker-v2-m3", query, items)

for entry in result["scores"]:
    print(f"Rank {entry['rank']}: {entry['score']:.3f}")
```

```typescript
import { SIEClient } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

const query = { text: "What is machine learning?" };
const items = [
  { text: "Machine learning is a subset of AI that learns from data." },
  { text: "The weather forecast predicts rain tomorrow." },
  { text: "Deep neural networks power modern ML systems." },
];

const result = await client.score("BAAI/bge-reranker-v2-m3", query, items);

for (const entry of result.scores) {
  console.log(`Rank ${entry.rank}: ${entry.score.toFixed(3)}`);
}

await client.close();
```
## When to Rerank

Use reranking when:
- First-stage retrieval returns good candidates but imperfect ordering
- You retrieve 50-100 candidates and want the top 10
- Query-document relevance requires deep understanding
Skip reranking when:
- You need sub-10ms latency (reranking adds 20-100ms)
- Your retrieval is already high quality
- You’re processing millions of documents (rerank a subset instead)
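When the candidate pool is large, the usual move is to cap what you send to the reranker: keep only the best first-stage hits, since cross-encoder cost grows linearly with the number of query-document pairs. A minimal sketch (`select_rerank_candidates` is an illustrative helper, not part of the SDK):

```python
def select_rerank_candidates(hits, k=100):
    """Keep only the top-k first-stage hits for reranking.

    `hits` is a list of (doc_id, first_stage_score) pairs; each pair sent to
    a cross-encoder costs one forward pass, so cap the candidate set.
    """
    return sorted(hits, key=lambda h: -h[1])[:k]

hits = [("a", 0.2), ("b", 0.9), ("c", 0.5)]
print(select_rerank_candidates(hits, k=2))  # [('b', 0.9), ('c', 0.5)]
```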
## Two-Stage Retrieval

The standard pattern: retrieve many candidates with embeddings, rerank the top-k with a cross-encoder.
```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Stage 1: Retrieve candidates with embeddings
query_text = "What is machine learning?"
query_embedding = client.encode(
    "BAAI/bge-m3",
    Item(text=query_text),
    is_query=True,
)
# ... search your vector database, get top 100 candidates ...

# Stage 2: Rerank top candidates with IDs for tracking
query = Item(text=query_text)
candidates = [
    Item(id=f"doc-{i}", text=doc["text"]) for i, doc in enumerate(top_100_docs)
]

result = client.score("BAAI/bge-reranker-v2-m3", query, candidates)

# Get top 10 by item_id after reranking
top_10_ids = [entry["item_id"] for entry in result["scores"][:10]]
```

```typescript
import { SIEClient } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

// Stage 1: Retrieve candidates with embeddings
const queryText = "What is machine learning?";
const queryEmbedding = await client.encode(
  "BAAI/bge-m3",
  { text: queryText },
  { isQuery: true }
);
// ... search your vector database, get top 100 candidates ...

// Stage 2: Rerank top candidates with IDs for tracking
const query = { text: queryText };
const candidates = top100Docs.map((doc, i) => ({
  id: `doc-${i}`,
  text: doc.text,
}));

const result = await client.score("BAAI/bge-reranker-v2-m3", query, candidates);

// Get top 10 by itemId after reranking
const top10Ids = result.scores.slice(0, 10).map((entry) => entry.itemId);

await client.close();
```
## Response Format

The `ScoreResult` contains:

| Field | Type | Description |
|---|---|---|
| `model` | `str` | Model used for scoring |
| `query_id` | `str \| None` | Query ID if provided |
| `scores` | `list[ScoreEntry]` | Scored and ranked results |
Each `ScoreEntry` contains:

| Field | Type | Description |
|---|---|---|
| `item_id` | `str \| None` | Document ID (from input or auto-generated as `item-N`) |
| `score` | `float` | Relevance score (higher = more relevant) |
| `rank` | `int` | Rank position (0 = most relevant) |
```python
result = client.score("BAAI/bge-reranker-v2-m3", query, items)

# Scores are pre-sorted by relevance (highest first)
for entry in result["scores"]:
    print(f"Rank {entry['rank']}: {entry['item_id']} score={entry['score']:.3f}")
```

```typescript
const result = await client.score("BAAI/bge-reranker-v2-m3", query, items);

// Scores are pre-sorted by relevance (highest first)
for (const entry of result.scores) {
  console.log(`Rank ${entry.rank}: ${entry.itemId} score=${entry.score.toFixed(3)}`);
}
```
## Using Item IDs

Track items through reranking with IDs:
```python
query = Item(id="q1", text="What is Python?")
items = [
    Item(id="doc-1", text="Python is a programming language."),
    Item(id="doc-2", text="Snakes are reptiles."),
    Item(id="doc-3", text="Python was created by Guido van Rossum."),
]

result = client.score("BAAI/bge-reranker-v2-m3", query, items)

for entry in result["scores"]:
    print(f"{entry['item_id']}: rank={entry['rank']}, score={entry['score']:.3f}")
# doc-1: rank=0, score=0.891
# doc-3: rank=1, score=0.756
# doc-2: rank=2, score=0.012
```

```typescript
const query = { id: "q1", text: "What is Python?" };
const items = [
  { id: "doc-1", text: "Python is a programming language." },
  { id: "doc-2", text: "Snakes are reptiles." },
  { id: "doc-3", text: "Python was created by Guido van Rossum." },
];

const result = await client.score("BAAI/bge-reranker-v2-m3", query, items);

for (const entry of result.scores) {
  console.log(`${entry.itemId}: rank=${entry.rank}, score=${entry.score.toFixed(3)}`);
}
// doc-1: rank=0, score=0.891
// doc-3: rank=1, score=0.756
// doc-2: rank=2, score=0.012
```
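A common follow-up is mapping the ranked IDs back to your source documents. A pure-Python sketch, using score entries shaped like the example output above:

```python
docs = {
    "doc-1": "Python is a programming language.",
    "doc-2": "Snakes are reptiles.",
    "doc-3": "Python was created by Guido van Rossum.",
}

# Entries as returned by the reranker, already sorted by rank
scores = [
    {"item_id": "doc-1", "rank": 0, "score": 0.891},
    {"item_id": "doc-3", "rank": 1, "score": 0.756},
    {"item_id": "doc-2", "rank": 2, "score": 0.012},
]

# Look up the original text for each ranked entry
reranked_texts = [docs[entry["item_id"]] for entry in scores]
print(reranked_texts[0])  # Python is a programming language.
```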
## Multi-Vector Reranking

ColBERT-style models can also rerank via MaxSim scoring. This uses pre-computed multi-vector embeddings:
```python
from sie_sdk.scoring import maxsim

# Encode query and documents with multivector output
query_result = client.encode(
    "jinaai/jina-colbert-v2",
    Item(text="What is ColBERT?"),
    output_types=["multivector"],
    is_query=True,
)

doc_results = client.encode(
    "jinaai/jina-colbert-v2", documents, output_types=["multivector"]
)

# Score with MaxSim
query_mv = query_result["multivector"]
doc_mvs = [r["multivector"] for r in doc_results]
scores = maxsim(query_mv, doc_mvs)

# Rank by score
ranked = sorted(enumerate(scores), key=lambda x: -x[1])
```

```typescript
// Encode query and documents with multivector output
const queryResult = await client.encode(
  "jinaai/jina-colbert-v2",
  { text: "What is ColBERT?" },
  { outputTypes: ["multivector"], isQuery: true }
);

const docResults = await client.encode(
  "jinaai/jina-colbert-v2",
  documents,
  { outputTypes: ["multivector"] }
);

// Score with MaxSim (client-side computation)
const queryMv = queryResult.multivector!;
const docMvs = docResults.map((r) => r.multivector!);

// Dot product between two token vectors
const dotProduct = (a: number[], b: number[]) =>
  a.reduce((sum, v, i) => sum + v * b[i], 0);

// Compute MaxSim scores: for each query token, take the best-matching
// document token, then sum over query tokens
const scores = docMvs.map((docMv) => {
  let score = 0;
  for (const qToken of queryMv) {
    let maxSim = -Infinity;
    for (const dToken of docMv) {
      const sim = dotProduct(qToken, dToken);
      if (sim > maxSim) maxSim = sim;
    }
    score += maxSim;
  }
  return score;
});

// Rank by score
const ranked = scores
  .map((score, idx) => ({ idx, score }))
  .sort((a, b) => b.score - a.score);
```

See Multi-vector embeddings for details.
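The same per-token scoring can be written compactly with NumPy. This is a standalone sketch of the MaxSim computation, independent of the SDK's `maxsim` helper, assuming each multi-vector is a 2-D array of token embeddings:

```python
import numpy as np

def maxsim_score(query_mv: np.ndarray, doc_mv: np.ndarray) -> float:
    """MaxSim: for each query token, take the max dot product over
    document tokens, then sum over query tokens."""
    sims = query_mv @ doc_mv.T          # (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())

query_mv = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_mv = np.array([[0.8, 0.1], [0.2, 0.9]])
print(maxsim_score(query_mv, doc_mv))  # 1.7
```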
## Reranker Models
| Model | Max Length | Notes |
|---|---|---|
| `BAAI/bge-reranker-v2-m3` | 8192 | Multilingual |
| `jinaai/jina-reranker-v2-base-multilingual` | 8192 | Multilingual |
| `Alibaba-NLP/gte-reranker-modernbert-base` | 8192 | ModernBERT architecture |
| `cross-encoder/ms-marco-MiniLM-L-12-v2` | 512 | Smaller, faster |
See Reranker Models for the complete catalog.
## HTTP API

The server defaults to msgpack. For JSON responses:

```shell
curl -X POST http://localhost:8080/v1/score/BAAI/bge-reranker-v2-m3 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "query": {"text": "What is machine learning?"},
    "items": [
      {"text": "Machine learning uses algorithms to learn from data."},
      {"text": "The weather is sunny today."}
    ]
  }'
```

Response:

```json
{
  "model": "BAAI/bge-reranker-v2-m3",
  "scores": [
    {"item_id": "item-0", "score": 0.891, "rank": 0},
    {"item_id": "item-1", "score": 0.023, "rank": 1}
  ]
}
```
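Assuming a JSON response shaped like the one above, the top hit can be extracted with the standard-library `json` module (a hedged sketch, no SDK required):

```python
import json

response_body = """{
  "model": "BAAI/bge-reranker-v2-m3",
  "scores": [
    {"item_id": "item-0", "score": 0.891, "rank": 0},
    {"item_id": "item-1", "score": 0.023, "rank": 1}
  ]
}"""

result = json.loads(response_body)
# Scores arrive pre-sorted by rank, so the first entry is the best match
best = result["scores"][0]
print(best["item_id"], best["score"])  # item-0 0.891
```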
## Performance Considerations

**Batch size matters.** Cross-encoders score each query-document pair with its own forward pass, so 100 documents means 100 forward passes. Keep candidate sets reasonable (50-200).

**Latency vs. quality.** Smaller models (MiniLM) are faster but less accurate. Larger models (BGE-reranker-v2-m3) give better quality at higher latency.

**GPU utilization.** Reranking benefits from batching. The server batches concurrent requests automatically.
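If you do need to score a large candidate set, you can split it into moderate chunks client-side. `chunked` below is an illustrative helper, not part of the SDK:

```python
def chunked(items, size):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# e.g. split 250 candidates into reranker-sized batches:
batches = list(chunked(list(range(250)), 100))
print([len(b) for b in batches])  # [100, 100, 50]
```

Because cross-encoder scores are computed independently per query-document pair, scores from separate chunks are directly comparable and can be merged with a single sort.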
## What’s Next

- Reranker models - complete model catalog
- Multi-vector reranking - ColBERT MaxSim scoring