# Models

SIE supports cross-encoder rerankers and ColBERT-style multivector models for reranking. Model performance varies by task. Run `mise run eval <model> -t <task>` to benchmark on your data.
## Cross-Encoder Models

Cross-encoders score query-document pairs with full cross-attention. They see both inputs together, enabling deeper semantic matching.
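The reranking pattern is the same for any cross-encoder: score each (query, document) pair jointly, then sort by score. A minimal sketch of that flow, with a toy lexical scorer standing in for the model (`rerank` and `toy_score` are illustrative helpers, not part of the SIE SDK):

```python
def rerank(query: str, docs: list[str], score_pair) -> list[tuple[str, float]]:
    """Score every (query, doc) pair with score_pair and sort best-first."""
    scored = [(doc, score_pair(query, doc)) for doc in docs]
    return sorted(scored, key=lambda x: -x[1])

def toy_score(query: str, doc: str) -> float:
    """Toy stand-in: fraction of query words present in the document.
    A real cross-encoder runs both texts through the model together."""
    q_words = set(query.lower().split())
    return len(q_words & set(doc.lower().split())) / len(q_words)

ranked = rerank(
    "what is colbert",
    ["ColBERT uses late interaction", "nice weather"],
    toy_score,
)
```

With a real model, only `score_pair` changes; the pair construction and sorting stay the same.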
### BGE Rerankers

| Model | Max Length | Notes |
|---|---|---|
| `BAAI/bge-reranker-base` | 512 | Smaller, English-focused |
| `BAAI/bge-reranker-large` | 512 | Larger, English-focused |
| `BAAI/bge-reranker-v2-m3` | 8192 | Long context, 100+ languages |
### Mixedbread Rerankers

| Model | Max Length | Notes |
|---|---|---|
| `mixedbread-ai/mxbai-rerank-base-v2` | 8192 | Base size |
| `mixedbread-ai/mxbai-rerank-large-v2` | 8192 | Larger model |
### Jina Rerankers

| Model | Max Length | Notes |
|---|---|---|
| `jinaai/jina-reranker-v2-base-multilingual` | 8192 | 100+ languages |
### GTE Rerankers

| Model | Max Length | Notes |
|---|---|---|
| `Alibaba-NLP/gte-reranker-modernbert-base` | 8192 | ModernBERT architecture |
### MS-MARCO Cross-Encoders

| Model | Max Length | Notes |
|---|---|---|
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | 512 | Smallest, fastest |
| `cross-encoder/ms-marco-MiniLM-L-12-v2` | 512 | Balanced |
## Multi-Vector Rerankers (ColBERT)

ColBERT models produce per-token embeddings and score with MaxSim. They are useful when you have pre-computed multivector embeddings.
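MaxSim itself is simple: for each query token, take its highest similarity against all document tokens, then sum across query tokens. A minimal NumPy sketch for reference (this is an illustrative reimplementation, not the SDK's `maxsim` helper):

```python
import numpy as np

def maxsim(query_mv: np.ndarray, doc_mv: np.ndarray) -> float:
    """MaxSim score for one document.

    query_mv: (num_query_tokens, dim) per-token query embeddings
    doc_mv:   (num_doc_tokens, dim) per-token document embeddings
    Rows are assumed L2-normalized, so dot products are cosine similarities.
    """
    sim = query_mv @ doc_mv.T            # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # best doc token per query token, summed
```

Because each query token matches its best document token independently, MaxSim rewards documents that cover all parts of the query, which is the "late interaction" behavior the table below's models are trained for.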
| Model | Token Dim | Max Length | Notes |
|---|---|---|---|
| `colbert-ir/colbertv2.0` | 128 | 512 | Original ColBERT |
| `jinaai/jina-colbert-v2` | 128 | 8192 | Long context, multilingual |
| `mixedbread-ai/mxbai-colbert-large-v1` | 128 | 512 | Larger model |
| `mixedbread-ai/mxbai-edge-colbert-v0-32m` | 64 | 8192 | Compact, edge-friendly |
| `answerdotai/answerai-colbert-small-v1` | 96 | 512 | Compact |
| `lightonai/GTE-ModernColBERT-v1` | 128 | 8192 | ModernBERT architecture |
| `lightonai/Reason-ModernColBERT` | 128 | 8192 | Reasoning-focused |
| `nvidia/llama-nemoretriever-colembed-3b-v1` | 128 | 8192 | Text + image, LLM-based |
### MaxSim Scoring

ColBERT reranking uses MaxSim over pre-encoded multivector embeddings:
```python
from sie_sdk import SIEClient
from sie_sdk.types import Item
from sie_sdk.scoring import maxsim

client = SIEClient("http://localhost:8080")

# Encode query and documents
query_result = client.encode(
    "jinaai/jina-colbert-v2",
    Item(text="What is ColBERT?"),
    output_types=["multivector"],
    is_query=True,
)

doc_results = client.encode(
    "jinaai/jina-colbert-v2",
    [Item(text="ColBERT uses late interaction..."), Item(text="The weather is nice.")],
    output_types=["multivector"],
)

# Score with MaxSim
query_mv = query_result["multivector"]
doc_mvs = [r["multivector"] for r in doc_results]
scores = maxsim(query_mv, doc_mvs)

# Rank by score
ranked = sorted(enumerate(scores), key=lambda x: -x[1])
```

```typescript
import { SIEClient, maxsim } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

// Encode query and documents
const queryResult = await client.encode(
  "jinaai/jina-colbert-v2",
  { text: "What is ColBERT?" },
  { outputTypes: ["multivector"], isQuery: true }
);

const docResults = await client.encode(
  "jinaai/jina-colbert-v2",
  [
    { text: "ColBERT uses late interaction..." },
    { text: "The weather is nice." },
  ],
  { outputTypes: ["multivector"] }
);

// Score with MaxSim using SDK helper
const queryMv = queryResult.multivector!;
const scores = docResults.map((r) => maxsim(queryMv, r.multivector!));

// Rank by score
const ranked = scores
  .map((score, idx) => ({ idx, score }))
  .sort((a, b) => b.score - a.score);

await client.close();
```

## Model Selection
### By Language Support
English only:

- `BAAI/bge-reranker-base`, `BAAI/bge-reranker-large`
- `cross-encoder/ms-marco-MiniLM-L-6-v2`, `cross-encoder/ms-marco-MiniLM-L-12-v2`
Multilingual (100+ languages):

- `BAAI/bge-reranker-v2-m3`
- `jinaai/jina-reranker-v2-base-multilingual`
- `jinaai/jina-colbert-v2`
### By Context Length

Short context (512 tokens):

- `BAAI/bge-reranker-base`, `BAAI/bge-reranker-large`
- `cross-encoder/ms-marco-MiniLM-L-*`
- `colbert-ir/colbertv2.0`, `mixedbread-ai/mxbai-colbert-large-v1`
Long context (8192 tokens):

- `BAAI/bge-reranker-v2-m3`
- `jinaai/jina-reranker-v2-base-multilingual`
- `Alibaba-NLP/gte-reranker-modernbert-base`
- `mixedbread-ai/mxbai-rerank-base-v2`, `mixedbread-ai/mxbai-rerank-large-v2`
- `jinaai/jina-colbert-v2`, `lightonai/GTE-ModernColBERT-v1`
### By Size

Compact (fast inference):

- `cross-encoder/ms-marco-MiniLM-L-6-v2` - smallest cross-encoder
- `mixedbread-ai/mxbai-edge-colbert-v0-32m` - 32M parameters
- `answerdotai/answerai-colbert-small-v1` - compact ColBERT
Large (higher capacity):

- `BAAI/bge-reranker-large`
- `mixedbread-ai/mxbai-rerank-large-v2`
- `mixedbread-ai/mxbai-colbert-large-v1`
- `nvidia/llama-nemoretriever-colembed-3b-v1` - 3B parameters
## Benchmarking

Use the eval harness to benchmark rerankers on your data:
```sh
# Quality evaluation
mise run eval BAAI/bge-reranker-v2-m3 -t mteb/AskUbuntuDupQuestions --type quality

# Performance evaluation
mise run eval BAAI/bge-reranker-v2-m3 -t mteb/AskUbuntuDupQuestions --type perf

# Compare multiple models
mise run eval BAAI/bge-reranker-base -t mteb/AskUbuntuDupQuestions --type quality
mise run eval cross-encoder/ms-marco-MiniLM-L-12-v2 -t mteb/AskUbuntuDupQuestions --type quality
```

## What's Next
- Rerank guide - how to use reranking in your pipeline
- Multi-vector embeddings - ColBERT encoding details