Sparse & Hybrid Search
Sparse vectors capture lexical (keyword) signals. Unlike dense embeddings that compress meaning into fixed-size vectors (384 to 4096+ dimensions depending on the model), sparse vectors assign weights directly to vocabulary tokens. This enables exact term matching alongside semantic search.
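To make the contrast concrete, here is a rough illustration in plain Python; the token IDs and weights are made up for the example, not real model output.

```python
# Dense: a fixed-size list of floats; meaning is spread across every dimension.
dense = [0.012, -0.431, 0.088, 0.205]  # real models use 384 to 4096+ dimensions

# Sparse: only the vocabulary tokens that matter carry a weight; the rest are implicitly zero.
# Token IDs and weights below are hypothetical.
sparse = {
    9834: 0.71,   # e.g. the token for "machine"
    2312: 0.58,   # e.g. the token for "learning"
    17490: 0.33,  # e.g. the token for "algorithms"
}

# Exact term matching falls out naturally: two texts only overlap where they share token IDs.
```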
Quick Example
```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

result = client.encode(
    "BAAI/bge-m3",
    Item(text="machine learning algorithms"),
    output_types=["sparse"],
)

# Sparse vector: token IDs -> weights
sparse = result["sparse"]
print(f"Non-zero tokens: {len(sparse['indices'])}")
```

```typescript
import { SIEClient } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

const result = await client.encode(
  "BAAI/bge-m3",
  { text: "machine learning algorithms" },
  { outputTypes: ["sparse"] },
);

// Sparse vector: token IDs -> weights
const sparse = result.sparse;
console.log(`Non-zero tokens: ${sparse?.indices.length}`);

await client.close();
```
When to Use Sparse Embeddings

Use sparse when:
- Exact term matching matters (product names, proper nouns, acronyms)
- You want hybrid search (combining dense + sparse)
- Your domain has specialized vocabulary
Stick to dense when:
- Pure semantic search is sufficient
- Storage is constrained (sparse vectors are larger)
- You're not using a vector database that supports sparse vectors
Sparse Vector Format
Sparse vectors contain:

- indices: Token IDs from the model's vocabulary
- values: Weights for each token (higher = more important)
```python
result = client.encode("BAAI/bge-m3", Item(text="hello world"), output_types=["sparse"])

sparse = result["sparse"]
# {"indices": array([101, 2023, ...]), "values": array([0.45, 0.32, ...])}

# Reconstruct as dict
sparse_dict = dict(zip(sparse["indices"], sparse["values"]))
```

```typescript
const result = await client.encode(
  "BAAI/bge-m3",
  { text: "hello world" },
  { outputTypes: ["sparse"] },
);

const sparse = result.sparse;
// { indices: Int32Array([101, 2023, ...]), values: Float32Array([0.45, 0.32, ...]) }

// Reconstruct as Map
const sparseMap = new Map<number, number>();
if (sparse) {
  for (let i = 0; i < sparse.indices.length; i++) {
    sparseMap.set(sparse.indices[i], sparse.values[i]);
  }
}
```
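If you want to see which vocabulary terms those IDs correspond to, you can map them back through a tokenizer. A hedged sketch, assuming the Hugging Face tokenizer for BAAI/bge-m3 shares the served model's vocabulary:

```python
from transformers import AutoTokenizer

# Assumption: the Hugging Face tokenizer matches the vocabulary used by the server.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")

sparse = result["sparse"]
tokens = tokenizer.convert_ids_to_tokens([int(i) for i in sparse["indices"]])

# Pair each readable token with its weight, strongest first
ranked = sorted(zip(tokens, sparse["values"]), key=lambda tw: tw[1], reverse=True)
for token, weight in ranked[:10]:
    print(f"{token}\t{weight:.3f}")
```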
BGE-M3: Dense + Sparse in One Call

BGE-M3 produces dense, sparse, and multi-vector outputs simultaneously:
```python
result = client.encode(
    "BAAI/bge-m3",
    Item(text="What is machine learning?"),
    output_types=["dense", "sparse"],
)

# Dense: 1024-dimensional semantic embedding
print(f"Dense: {len(result['dense'])} dims")

# Sparse: lexical signal
print(f"Sparse: {len(result['sparse']['indices'])} non-zero terms")
```

```typescript
const result = await client.encode(
  "BAAI/bge-m3",
  { text: "What is machine learning?" },
  { outputTypes: ["dense", "sparse"] },
);

// Dense: 1024-dimensional semantic embedding
console.log(`Dense: ${result.dense?.length} dims`);

// Sparse: lexical signal
console.log(`Sparse: ${result.sparse?.indices.length} non-zero terms`);
```

This is more efficient than calling separate dense and sparse models.
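On the indexing side, the same single call covers both outputs per document. A minimal sketch (the document list and record layout are illustrative; only the encode call comes from the SDK):

```python
documents = [
    "Machine learning is a subset of artificial intelligence.",
    "Gradient descent iteratively minimizes a loss function.",
]

records = []
for doc_id, text in enumerate(documents):
    result = client.encode(
        "BAAI/bge-m3",
        Item(text=text),
        output_types=["dense", "sparse"],
    )
    # Keep both representations together so they can be stored in a hybrid index
    records.append({
        "id": doc_id,
        "text": text,
        "dense": result["dense"],
        "sparse": result["sparse"],  # {"indices": ..., "values": ...}
    })
```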
Hybrid Search Pattern
Combine dense and sparse scores for retrieval:
```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

query = Item(text="Python programming tutorial")

# Get both embeddings
result = client.encode(
    "BAAI/bge-m3",
    query,
    output_types=["dense", "sparse"],
    is_query=True,
)

# Store both in your vector database
# Most databases support hybrid search with weighted combination:
# final_score = alpha * dense_score + (1 - alpha) * sparse_score
```

```typescript
import { SIEClient } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

const query = { text: "Python programming tutorial" };

// Get both embeddings
const result = await client.encode(
  "BAAI/bge-m3",
  query,
  { outputTypes: ["dense", "sparse"], isQuery: true },
);

// Store both in your vector database
// Most databases support hybrid search with weighted combination:
// final_score = alpha * dense_score + (1 - alpha) * sparse_score

await client.close();
```
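The comments above name the weighted combination; the sketch below spells it out in plain Python, assuming dict-style sparse vectors and dot-product scoring for the dense side (in practice the blending usually happens inside the vector database):

```python
def sparse_dot(q: dict[int, float], d: dict[int, float]) -> float:
    """Dot product over the token IDs that query and document share."""
    if len(q) > len(d):
        q, d = d, q  # iterate over the smaller vector
    return sum(w * d[t] for t, w in q.items() if t in d)


def dense_dot(q: list[float], d: list[float]) -> float:
    """Dot product of two dense embeddings (cosine similarity if both are normalized)."""
    return sum(a * b for a, b in zip(q, d))


def hybrid_score(dense_score: float, sparse_score: float, alpha: float = 0.5) -> float:
    """final_score = alpha * dense_score + (1 - alpha) * sparse_score"""
    return alpha * dense_score + (1 - alpha) * sparse_score
```

Raw dense and sparse scores live on different scales, so many databases normalize them or use rank fusion rather than a straight linear blend.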
SPLADE Models

SPLADE models are purpose-built for sparse retrieval:
```python
# SPLADE-v3
result = client.encode(
    "naver/splade-v3",
    Item(text="neural information retrieval"),
    output_types=["sparse"],
)

# OpenSearch neural sparse
result = client.encode(
    "opensearch-project/opensearch-neural-sparse-encoding-v2-distill",
    Item(text="search query"),
    output_types=["sparse"],
)
```

```typescript
// SPLADE-v3
const result = await client.encode(
  "naver/splade-v3",
  { text: "neural information retrieval" },
  { outputTypes: ["sparse"] },
);

// OpenSearch neural sparse
const osResult = await client.encode(
  "opensearch-project/opensearch-neural-sparse-encoding-v2-distill",
  { text: "search query" },
  { outputTypes: ["sparse"] },
);
```

SPLADE uses an MLM (masked language model) head to predict term importance.
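To make the MLM connection concrete, here is a hedged sketch of the standard SPLADE weighting (log-saturated ReLU of the MLM logits, max-pooled over token positions) computed directly with Hugging Face transformers; it illustrates the idea rather than this server's implementation:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "naver/splade-v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer("neural information retrieval", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Term importance: log(1 + ReLU(logit)), max-pooled across token positions
weights = torch.log1p(torch.relu(logits)).amax(dim=1).squeeze(0)  # (vocab_size,)

# Show the highest-weighted vocabulary terms
top = weights.topk(10)
for idx, w in zip(top.indices.tolist(), top.values.tolist()):
    print(tokenizer.convert_ids_to_tokens([idx])[0], round(w, 3))
```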
Sparse Models
| Model | Vocabulary | Notes |
|---|---|---|
| BAAI/bge-m3 | 250,002 | Also supports dense + multi-vector |
| naver/splade-v3 | 30,522 | Sparse-focused, BERT vocabulary |
| naver/splade-cocondenser-selfdistil | 30,522 | Balanced |
| opensearch-project/opensearch-neural-sparse-* | 30,522 | OpenSearch integration |
Vector Database Support
Sparse vectors require database support. Compatible options:
| Database | Sparse Support |
|---|---|
| Elasticsearch | Yes (native) |
| OpenSearch | Yes (neural sparse) |
| Qdrant | Yes (sparse vectors) |
| Weaviate | Yes (hybrid) |
| Milvus | Yes (sparse index) |
| Pinecone | Yes (hybrid) |
HTTP API
The server defaults to msgpack. For JSON, set the Accept header:
```bash
curl -X POST http://localhost:8080/v1/encode/BAAI/bge-m3 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{"items": [{"text": "sparse query"}], "params": {"output_types": ["sparse"]}}'
```
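The same call from Python with requests (a sketch; the endpoint and payload mirror the curl example above, and the JSON response shape is whatever the server returns):

```python
import requests

response = requests.post(
    "http://localhost:8080/v1/encode/BAAI/bge-m3",
    headers={"Accept": "application/json"},  # ask for JSON instead of the default msgpack
    json={
        "items": [{"text": "sparse query"}],
        "params": {"output_types": ["sparse"]},
    },
)
response.raise_for_status()
print(response.json())
```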
What's Next

- Dense embeddings - when sparse isn't needed
- Multi-vector embeddings - ColBERT for maximum quality