
Models

SIE supports two families of reranking models: cross-encoder rerankers and ColBERT-style multivector models. Model performance varies by task, so run `mise run eval <model> -t <task>` to benchmark on your data.


Cross-encoders score query-document pairs with full cross-attention. They see both inputs together for deeper semantic matching.
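The joint-encoding idea can be illustrated with a toy NumPy sketch. Random embeddings and a random linear scoring head stand in for a trained model here; this shows the data flow (one joint forward pass per query-document pair), not SIE's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_encode_score(query_emb, doc_emb, w):
    """Score one (query, doc) pair by attending over the
    concatenated token sequence, then pooling to a scalar."""
    tokens = np.concatenate([query_emb, doc_emb], axis=0)        # (Lq+Ld, d)
    attn = softmax(tokens @ tokens.T / np.sqrt(tokens.shape[1]))  # full attention
    mixed = attn @ tokens        # every token attends to both inputs
    pooled = mixed.mean(axis=0)  # pool the joint representation
    return float(pooled @ w)     # toy linear scoring head

d = 16
w = rng.normal(size=d)                       # stand-in scoring head
query = rng.normal(size=(4, d))              # 4 query "tokens"
docs = [rng.normal(size=(n, d)) for n in (7, 5)]

# Unlike bi-encoders, a cross-encoder cannot pre-compute document
# representations: each pair needs its own joint pass.
scores = [cross_encode_score(query, doc, w) for doc in docs]
```

This per-pair cost is why cross-encoders are typically used to rerank a small candidate set rather than search a whole corpus.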

| Model | Max Length | Notes |
| --- | --- | --- |
| BAAI/bge-reranker-base | 512 | Smaller, English-focused |
| BAAI/bge-reranker-large | 512 | Larger, English-focused |
| BAAI/bge-reranker-v2-m3 | 8192 | Long context, 100+ languages |

| Model | Max Length | Notes |
| --- | --- | --- |
| mixedbread-ai/mxbai-rerank-base-v2 | 8192 | Base size |
| mixedbread-ai/mxbai-rerank-large-v2 | 8192 | Larger model |

| Model | Max Length | Notes |
| --- | --- | --- |
| jinaai/jina-reranker-v2-base-multilingual | 8192 | 100+ languages |

| Model | Max Length | Notes |
| --- | --- | --- |
| Alibaba-NLP/gte-reranker-modernbert-base | 8192 | ModernBERT architecture |

| Model | Max Length | Notes |
| --- | --- | --- |
| cross-encoder/ms-marco-MiniLM-L-6-v2 | 512 | Smallest, fastest |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | 512 | Balanced |

ColBERT models produce per-token embeddings and score with MaxSim. Useful when you have pre-computed multivector embeddings.

| Model | Token Dim | Max Length | Notes |
| --- | --- | --- | --- |
| colbert-ir/colbertv2.0 | 128 | 512 | Original ColBERT |
| jinaai/jina-colbert-v2 | 128 | 8192 | Long context, multilingual |
| mixedbread-ai/mxbai-colbert-large-v1 | 128 | 512 | Larger model |
| mixedbread-ai/mxbai-edge-colbert-v0-32m | 64 | 8192 | Compact, edge-friendly |
| answerdotai/answerai-colbert-small-v1 | 96 | 512 | Compact |
| lightonai/GTE-ModernColBERT-v1 | 128 | 8192 | ModernBERT architecture |
| lightonai/Reason-ModernColBERT | 128 | 8192 | Reasoning-focused |
| nvidia/llama-nemoretriever-colembed-3b-v1 | 128 | 8192 | Text + image, LLM-based |

ColBERT reranking uses MaxSim over pre-encoded multivector embeddings:

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item
from sie_sdk.scoring import maxsim

client = SIEClient("http://localhost:8080")

# Encode query and documents
query_result = client.encode(
    "jinaai/jina-colbert-v2",
    Item(text="What is ColBERT?"),
    output_types=["multivector"],
    is_query=True,
)
doc_results = client.encode(
    "jinaai/jina-colbert-v2",
    [Item(text="ColBERT uses late interaction..."), Item(text="The weather is nice.")],
    output_types=["multivector"],
)

# Score with MaxSim
query_mv = query_result["multivector"]
doc_mvs = [r["multivector"] for r in doc_results]
scores = maxsim(query_mv, doc_mvs)

# Rank documents by descending score
ranked = sorted(enumerate(scores), key=lambda x: -x[1])
```
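The scoring step can be made concrete. Below is a minimal NumPy sketch of MaxSim, assuming raw dot-product similarity (whether `sie_sdk.scoring.maxsim` normalizes vectors first is not specified here):

```python
import numpy as np

def maxsim(query_mv, doc_mvs):
    """MaxSim: for each query token, take the similarity of its
    best-matching document token, then sum over query tokens."""
    q = np.asarray(query_mv)         # (num_query_tokens, dim)
    scores = []
    for doc_mv in doc_mvs:
        d = np.asarray(doc_mv)       # (num_doc_tokens, dim)
        sim = q @ d.T                # all token-pair similarities
        scores.append(float(sim.max(axis=1).sum()))
    return scores

# Toy vectors: the first "document" matches the query exactly,
# the second shares nothing with it.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0]]]
print(maxsim(query, docs))  # → [2.0, 0.0]
```

Because documents are encoded independently, their multivectors can be computed once and stored; only this cheap max-then-sum runs at query time.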

English only:

  • BAAI/bge-reranker-base, BAAI/bge-reranker-large
  • cross-encoder/ms-marco-MiniLM-L-6-v2, cross-encoder/ms-marco-MiniLM-L-12-v2

Multilingual (100+ languages):

  • BAAI/bge-reranker-v2-m3
  • jinaai/jina-reranker-v2-base-multilingual
  • jinaai/jina-colbert-v2

Short context (512 tokens):

  • BAAI/bge-reranker-base, BAAI/bge-reranker-large
  • cross-encoder/ms-marco-MiniLM-L-*
  • colbert-ir/colbertv2.0, mixedbread-ai/mxbai-colbert-large-v1

Long context (8192 tokens):

  • BAAI/bge-reranker-v2-m3
  • jinaai/jina-reranker-v2-base-multilingual
  • Alibaba-NLP/gte-reranker-modernbert-base
  • mixedbread-ai/mxbai-rerank-base-v2, mixedbread-ai/mxbai-rerank-large-v2
  • jinaai/jina-colbert-v2, lightonai/GTE-ModernColBERT-v1

Compact (fast inference):

  • cross-encoder/ms-marco-MiniLM-L-6-v2 - smallest cross-encoder
  • mixedbread-ai/mxbai-edge-colbert-v0-32m - 32M parameters
  • answerdotai/answerai-colbert-small-v1 - compact ColBERT

Large (higher capacity):

  • BAAI/bge-reranker-large
  • mixedbread-ai/mxbai-rerank-large-v2
  • mixedbread-ai/mxbai-colbert-large-v1
  • nvidia/llama-nemoretriever-colembed-3b-v1 - 3B parameters

Use the eval harness to benchmark rerankers on your data:

```sh
# Quality evaluation
mise run eval BAAI/bge-reranker-v2-m3 -t mteb/AskUbuntuDupQuestions --type quality

# Performance evaluation
mise run eval BAAI/bge-reranker-v2-m3 -t mteb/AskUbuntuDupQuestions --type perf

# Compare multiple models
mise run eval BAAI/bge-reranker-base -t mteb/AskUbuntuDupQuestions --type quality
mise run eval cross-encoder/ms-marco-MiniLM-L-12-v2 -t mteb/AskUbuntuDupQuestions --type quality
```