
Overview

Reranking improves search quality by scoring query-document pairs with cross-attention. Cross-encoders see both query and document together, enabling deeper semantic matching than embedding similarity alone.

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

query = Item(text="What is machine learning?")
items = [
    Item(text="Machine learning is a subset of AI that learns from data."),
    Item(text="The weather forecast predicts rain tomorrow."),
    Item(text="Deep neural networks power modern ML systems."),
]

result = client.score("BAAI/bge-reranker-v2-m3", query, items)
for entry in result["scores"]:
    print(f"Rank {entry['rank']}: {entry['score']:.3f}")

Use reranking when:

  • First-stage retrieval returns good candidates but imperfect ordering
  • You retrieve 50-100 candidates and want the top 10
  • Query-document relevance requires deep understanding

Skip reranking when:

  • You need sub-10ms latency (reranking adds 20-100ms)
  • Your retrieval is already high quality
  • You’re processing millions of documents (rerank a subset instead)

The standard pattern: retrieve many candidates with embeddings, rerank the top-k with a cross-encoder.

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Stage 1: Retrieve candidates with embeddings
query_text = "What is machine learning?"
query_embedding = client.encode(
    "BAAI/bge-m3",
    Item(text=query_text),
    is_query=True,
)
# ... search your vector database, get top 100 candidates ...

# Stage 2: Rerank top candidates with IDs for tracking
query = Item(text=query_text)
candidates = [
    Item(id=f"doc-{i}", text=doc["text"])
    for i, doc in enumerate(top_100_docs)
]
result = client.score("BAAI/bge-reranker-v2-m3", query, candidates)

# Get the top 10 IDs after reranking
top_10_ids = [entry["item_id"] for entry in result["scores"][:10]]
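
To hand the reranked results back to your application, map the winning IDs back to the original documents. A minimal sketch, assuming top_100_docs is the candidate list from stage 1:

# Look up original documents by the IDs assigned in stage 2
doc_by_id = {f"doc-{i}": doc for i, doc in enumerate(top_100_docs)}
top_10_docs = [doc_by_id[item_id] for item_id in top_10_ids]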

The ScoreResult contains:

Field      Type                Description
model      str                 Model used for scoring
query_id   str | None          Query ID if provided
scores     list[ScoreEntry]    Scored and ranked results

Each ScoreEntry contains:

Field      Type          Description
item_id    str | None    Document ID (from input or auto-generated as item-N)
score      float         Relevance score (higher = more relevant)
rank       int           Rank position (0 = most relevant)

result = client.score("BAAI/bge-reranker-v2-m3", query, items)

# Scores are pre-sorted by relevance (highest first)
for entry in result["scores"]:
    print(f"Rank {entry['rank']}: {entry['item_id']} score={entry['score']:.3f}")

Track items through reranking with IDs:

query = Item(id="q1", text="What is Python?")
items = [
    Item(id="doc-1", text="Python is a programming language."),
    Item(id="doc-2", text="Snakes are reptiles."),
    Item(id="doc-3", text="Python was created by Guido van Rossum."),
]

result = client.score("BAAI/bge-reranker-v2-m3", query, items)
for entry in result["scores"]:
    print(f"{entry['item_id']}: rank={entry['rank']}, score={entry['score']:.3f}")

# doc-1: rank=0, score=0.891
# doc-3: rank=1, score=0.756
# doc-2: rank=2, score=0.012

ColBERT-style models can also rerank via MaxSim scoring. This uses pre-computed multi-vector embeddings:

from sie_sdk.scoring import maxsim

# Documents to rerank (sketch)
documents = [
    Item(text="ColBERT is a late-interaction retrieval model."),
    Item(text="The stock market closed higher today."),
]

# Encode query and documents with multivector output
query_result = client.encode(
    "jinaai/jina-colbert-v2",
    Item(text="What is ColBERT?"),
    output_types=["multivector"],
    is_query=True,
)
doc_results = client.encode(
    "jinaai/jina-colbert-v2",
    documents,
    output_types=["multivector"],
)

# Score with MaxSim
query_mv = query_result["multivector"]
doc_mvs = [r["multivector"] for r in doc_results]
scores = maxsim(query_mv, doc_mvs)

# Rank document indices by descending score
ranked = sorted(enumerate(scores), key=lambda x: -x[1])
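
Under the hood, MaxSim takes, for each query token vector, the maximum similarity against any document token vector, then sums over query tokens. A minimal numpy sketch of that computation (illustrative, not the SDK's implementation):

import numpy as np

def maxsim_score(query_mv: np.ndarray, doc_mv: np.ndarray) -> float:
    """MaxSim: match each query token to its best document token, then sum."""
    # (num_query_tokens, num_doc_tokens) similarity matrix
    sims = query_mv @ doc_mv.T
    return float(sims.max(axis=1).sum())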

See Multi-vector embeddings for details.

Supported reranker models include:

Model                                        Max Length   Notes
BAAI/bge-reranker-v2-m3                      8192         Multilingual
jinaai/jina-reranker-v2-base-multilingual    8192         Multilingual
Alibaba-NLP/gte-reranker-modernbert-base     8192         ModernBERT architecture
cross-encoder/ms-marco-MiniLM-L-12-v2        512          Smaller, faster

See Reranker Models for the complete catalog.

The server defaults to msgpack. For JSON responses:

curl -X POST http://localhost:8080/v1/score/BAAI/bge-reranker-v2-m3 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "query": {"text": "What is machine learning?"},
    "items": [
      {"text": "Machine learning uses algorithms to learn from data."},
      {"text": "The weather is sunny today."}
    ]
  }'

Response:

{
  "model": "BAAI/bge-reranker-v2-m3",
  "scores": [
    {"item_id": "item-0", "score": 0.891, "rank": 0},
    {"item_id": "item-1", "score": 0.023, "rank": 1}
  ]
}

Batch size matters. A cross-encoder runs one forward pass per query-document pair, so scoring 100 documents means 100 forward passes. Keep candidate sets reasonable (50-200).
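
If a candidate set is larger than you want to send in one request, you can score it in chunks and merge, since scores are computed per pair and are comparable across requests. A sketch (the chunk size is illustrative; give candidates explicit IDs, because auto-generated item-N IDs restart in each request):

def rerank_in_chunks(client, model, query, candidates, chunk_size=100, top_k=10):
    """Score candidates in chunks, then merge and re-rank globally."""
    all_entries = []
    for start in range(0, len(candidates), chunk_size):
        chunk = candidates[start:start + chunk_size]
        result = client.score(model, query, chunk)
        all_entries.extend(result["scores"])
    # Per-pair scores sort globally; the per-chunk rank fields are stale here
    all_entries.sort(key=lambda e: e["score"], reverse=True)
    return all_entries[:top_k]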

Latency vs quality. Smaller models (MiniLM) are faster but less accurate. Larger models (BGE-reranker-v2-m3) give better quality at higher latency.

GPU utilization. Reranking benefits from batching. The server batches concurrent requests automatically.
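
To take advantage of server-side batching, issue requests concurrently rather than serially. A sketch using a thread pool, reusing the client and items from the earlier examples (this assumes SIEClient can be shared across threads; if not, create one client per worker):

from concurrent.futures import ThreadPoolExecutor

queries = [Item(text=q) for q in ["What is ML?", "What is NLP?", "What is RL?"]]

def rerank(query):
    return client.score("BAAI/bge-reranker-v2-m3", query, items)

# Concurrent requests arrive together, so the server can batch them
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(rerank, queries))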