
Overview

Reranking improves search quality by scoring query-document pairs with cross-attention. Cross-encoders see both query and document together, enabling deeper semantic matching than embedding similarity alone.

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

query = Item(text="What is machine learning?")
items = [
    Item(text="Machine learning is a subset of AI that learns from data."),
    Item(text="The weather forecast predicts rain tomorrow."),
    Item(text="Deep neural networks power modern ML systems."),
]

result = client.score("BAAI/bge-reranker-v2-m3", query, items)
for entry in result["scores"]:
    print(f"Rank {entry['rank']}: {entry['score']:.3f}")

Use reranking when:

  • First-stage retrieval returns good candidates but imperfect ordering
  • You retrieve 50-100 candidates and want the top 10
  • Query-document relevance requires deep understanding

Skip reranking when:

  • You need sub-10ms latency (reranking adds 20-100ms)
  • Your retrieval is already high quality
  • You’re processing millions of documents (rerank a subset instead)

The standard pattern: retrieve many candidates with embeddings, rerank the top-k with a cross-encoder.

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Stage 1: Retrieve candidates with embeddings
query_text = "What is machine learning?"
query_embedding = client.encode(
    "BAAI/bge-m3",
    Item(text=query_text),
    is_query=True,
)
# ... search your vector database, get top 100 candidates ...

# Stage 2: Rerank top candidates with IDs for tracking
query = Item(text=query_text)
candidates = [
    Item(id=f"doc-{i}", text=doc["text"])
    for i, doc in enumerate(top_100_docs)
]
result = client.score("BAAI/bge-reranker-v2-m3", query, candidates)

# Get the top 10 IDs after reranking
top_10_ids = [entry["item_id"] for entry in result["scores"][:10]]
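
To hand the reranked results back to your application, map the winning IDs back to the original documents. A minimal sketch, assuming top_100_docs is the candidate list from stage 1:

# Look up original documents by the IDs assigned in stage 2
doc_by_id = {f"doc-{i}": doc for i, doc in enumerate(top_100_docs)}
top_10_docs = [doc_by_id[item_id] for item_id in top_10_ids]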

The ScoreResult contains:

Field      Type                Description
model      str                 Model used for scoring
query_id   str | None          Query ID if provided
scores     list[ScoreEntry]    Scored and ranked results

Each ScoreEntry contains:

Field      Type          Description
item_id    str | None    Document ID (from input or auto-generated as item-N)
score      float         Relevance score (higher = more relevant)
rank       int           Rank position (0 = most relevant)

result = client.score("BAAI/bge-reranker-v2-m3", query, items)

# Scores are pre-sorted by relevance (highest first)
for entry in result["scores"]:
    print(f"Rank {entry['rank']}: {entry['item_id']} score={entry['score']:.3f}")

Track items through reranking with IDs:

query = Item(id="q1", text="What is Python?")
items = [
    Item(id="doc-1", text="Python is a programming language."),
    Item(id="doc-2", text="Snakes are reptiles."),
    Item(id="doc-3", text="Python was created by Guido van Rossum."),
]

result = client.score("BAAI/bge-reranker-v2-m3", query, items)
for entry in result["scores"]:
    print(f"{entry['item_id']}: rank={entry['rank']}, score={entry['score']:.3f}")

# doc-1: rank=0, score=0.891
# doc-3: rank=1, score=0.756
# doc-2: rank=2, score=0.012

ColBERT-style models can also rerank via MaxSim scoring. This uses pre-computed multi-vector embeddings:

from sie_sdk.scoring import maxsim

# Documents to rerank (sketch)
documents = [
    Item(text="ColBERT is a late-interaction retrieval model."),
    Item(text="The stock market closed higher today."),
]

# Encode query and documents with multivector output
query_result = client.encode(
    "jinaai/jina-colbert-v2",
    Item(text="What is ColBERT?"),
    output_types=["multivector"],
    is_query=True,
)
doc_results = client.encode(
    "jinaai/jina-colbert-v2",
    documents,
    output_types=["multivector"],
)

# Score with MaxSim
query_mv = query_result["multivector"]
doc_mvs = [r["multivector"] for r in doc_results]
scores = maxsim(query_mv, doc_mvs)

# Rank document indices by descending score
ranked = sorted(enumerate(scores), key=lambda x: -x[1])
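
Under the hood, MaxSim takes, for each query token vector, the maximum similarity against any document token vector, then sums over query tokens. A minimal numpy sketch of that computation (illustrative, not the SDK's implementation):

import numpy as np

def maxsim_score(query_mv: np.ndarray, doc_mv: np.ndarray) -> float:
    """MaxSim: match each query token to its best document token, then sum."""
    # (num_query_tokens, num_doc_tokens) similarity matrix
    sims = query_mv @ doc_mv.T
    return float(sims.max(axis=1).sum())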

See Multi-vector embeddings for details.

Supported reranker models include:

Model                                        Max Length   Notes
BAAI/bge-reranker-v2-m3                      8192         Multilingual
jinaai/jina-reranker-v2-base-multilingual    8192         Multilingual
Alibaba-NLP/gte-reranker-modernbert-base     8192         ModernBERT architecture
cross-encoder/ms-marco-MiniLM-L-12-v2        512          Smaller, faster

See Reranker Models for the complete catalog.

The server defaults to msgpack. For JSON responses:

curl -X POST http://localhost:8080/v1/score/BAAI/bge-reranker-v2-m3 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "query": {"text": "What is machine learning?"},
    "items": [
      {"text": "Machine learning uses algorithms to learn from data."},
      {"text": "The weather is sunny today."}
    ]
  }'

Response:

{
  "model": "BAAI/bge-reranker-v2-m3",
  "scores": [
    {"item_id": "item-0", "score": 0.891, "rank": 0},
    {"item_id": "item-1", "score": 0.023, "rank": 1}
  ]
}

Batch size matters. A cross-encoder runs one forward pass per query-document pair, so scoring 100 documents means 100 forward passes. Keep candidate sets reasonable (50-200).
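
If a candidate set is larger than you want to send in one request, you can score it in chunks and merge, since scores are computed per pair and are comparable across requests. A sketch (the chunk size is illustrative; give candidates explicit IDs, because auto-generated item-N IDs restart in each request):

def rerank_in_chunks(client, model, query, candidates, chunk_size=100, top_k=10):
    """Score candidates in chunks, then merge and re-rank globally."""
    all_entries = []
    for start in range(0, len(candidates), chunk_size):
        chunk = candidates[start:start + chunk_size]
        result = client.score(model, query, chunk)
        all_entries.extend(result["scores"])
    # Per-pair scores sort globally; the per-chunk rank fields are stale here
    all_entries.sort(key=lambda e: e["score"], reverse=True)
    return all_entries[:top_k]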

Latency vs quality. Smaller models (MiniLM) are faster but less accurate. Larger models (BGE-reranker-v2-m3) give better quality at higher latency.

GPU utilization. Reranking benefits from batching. The server batches concurrent requests automatically.
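
To take advantage of server-side batching, issue requests concurrently rather than serially. A sketch using a thread pool, reusing the client and items from the earlier examples (this assumes SIEClient can be shared across threads; if not, create one client per worker):

from concurrent.futures import ThreadPoolExecutor

queries = [Item(text=q) for q in ["What is ML?", "What is NLP?", "What is RL?"]]

def rerank(query):
    return client.score("BAAI/bge-reranker-v2-m3", query, items)

# Concurrent requests arrive together, so the server can batch them
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(rerank, queries))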