# Haystack

The `sie-haystack` package provides native Haystack components. Use `SIETextEmbedder` and `SIEDocumentEmbedder` for dense embeddings. Use the sparse variants for hybrid search.
## Installation

```bash
pip install sie-haystack
```

This installs `sie-sdk` and `haystack-ai` as dependencies.
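To confirm the install worked, check that both packages import cleanly (a quick sanity check, not part of the official setup):

```python
# Both imports should succeed after installation.
import haystack
import sie_haystack
```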
## Start the Server

```bash
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie:latest

# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:latest
```
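Once the container is up, a single embed call makes an easy smoke test. This sketch simply reuses the `SIETextEmbedder` component documented below:

```python
from sie_haystack import SIETextEmbedder

# If the server is reachable, this prints the vector size (1024 for bge-m3).
embedder = SIETextEmbedder(base_url="http://localhost:8080", model="BAAI/bge-m3")
print(len(embedder.run(text="ping")["embedding"]))
```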
## Embedders

SIE provides four embedder components following Haystack conventions:
| Component | Use Case |
|---|---|
| `SIETextEmbedder` | Embed queries (dense) |
| `SIEDocumentEmbedder` | Embed documents (dense) |
| `SIESparseTextEmbedder` | Embed queries (sparse) |
| `SIESparseDocumentEmbedder` | Embed documents (sparse) |
### Text Embedder

Use `SIETextEmbedder` for embedding queries in retrieval pipelines:

```python
from sie_haystack import SIETextEmbedder

embedder = SIETextEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)

result = embedder.run(text="What is vector search?")
embedding = result["embedding"]  # list[float]
print(len(embedding))  # 1024
```
### Document Embedder

Use `SIEDocumentEmbedder` for embedding documents before indexing:

```python
from haystack import Document
from sie_haystack import SIEDocumentEmbedder

embedder = SIEDocumentEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)

docs = [
    Document(content="Machine learning uses algorithms to learn from data."),
    Document(content="Neural networks are inspired by biological neurons."),
]

result = embedder.run(documents=docs)
embedded_docs = result["documents"]

for doc in embedded_docs:
    print(f"{len(doc.embedding)} dimensions")
```
#### Metadata Fields

Include metadata fields in the embedding by specifying `meta_fields_to_embed`:

```python
embedder = SIEDocumentEmbedder(
    model="BAAI/bge-m3",
    meta_fields_to_embed=["title", "author"],
)

doc = Document(
    content="Deep learning uses multiple layers.",
    meta={"title": "Neural Networks", "author": "Jane Doe"},
)

# Embeds: "Neural Networks Jane Doe Deep learning uses multiple layers."
result = embedder.run(documents=[doc])
```
## Sparse Embeddings

For hybrid search, use the sparse embedder components. These work with stores like Qdrant that support sparse vectors.
### Sparse Text Embedder

```python
from sie_haystack import SIESparseTextEmbedder

embedder = SIESparseTextEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)

result = embedder.run(text="What is vector search?")
sparse_embedding = result["sparse_embedding"]
print(sparse_embedding.keys())  # dict_keys(['indices', 'values'])
```
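Retrievers that accept sparse queries (such as Qdrant's hybrid retriever) expect Haystack's `SparseEmbedding` dataclass rather than a plain dict. Assuming the `indices`/`values` layout shown above, the conversion is a one-liner:

```python
from haystack.dataclasses import SparseEmbedding

# Wrap the raw indices/values dict in Haystack's SparseEmbedding dataclass.
query_sparse = SparseEmbedding(
    indices=sparse_embedding["indices"],
    values=sparse_embedding["values"],
)
```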
### Sparse Document Embedder

```python
from haystack import Document
from sie_haystack import SIESparseDocumentEmbedder

embedder = SIESparseDocumentEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)

docs = [Document(content="Python is a programming language.")]
result = embedder.run(documents=docs)

# Sparse embedding stored in document metadata
sparse = result["documents"][0].meta["_sparse_embedding"]
print(sparse.keys())  # dict_keys(['indices', 'values'])
```
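To index these documents into a sparse-capable store, you can move the metadata entry onto each document's `sparse_embedding` field before writing. A minimal sketch, assuming the `qdrant-haystack` integration and the metadata layout shown above; dense vectors come from `SIEDocumentEmbedder` so each document carries both:

```python
from haystack.dataclasses import SparseEmbedding
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from sie_haystack import SIEDocumentEmbedder

# Add dense embeddings to the same documents for hybrid retrieval.
dense_embedder = SIEDocumentEmbedder(base_url="http://localhost:8080", model="BAAI/bge-m3")
docs = dense_embedder.run(documents=result["documents"])["documents"]

# Move the SIE sparse payload onto the field the Qdrant store reads.
for doc in docs:
    s = doc.meta.pop("_sparse_embedding")
    doc.sparse_embedding = SparseEmbedding(indices=s["indices"], values=s["values"])

# In-memory Qdrant with sparse vectors enabled; embedding_dim matches bge-m3.
store = QdrantDocumentStore(location=":memory:", embedding_dim=1024, use_sparse_embeddings=True)
store.write_documents(docs)
```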
## Full Example

Complete retrieval pipeline using SIE embeddings with an in-memory document store:

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from sie_haystack import SIEDocumentEmbedder, SIETextEmbedder

# 1. Create document store and embedder
document_store = InMemoryDocumentStore()
doc_embedder = SIEDocumentEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)

# 2. Prepare and embed documents
documents = [
    Document(content="Machine learning is a branch of artificial intelligence."),
    Document(content="Neural networks are inspired by biological neurons."),
    Document(content="Deep learning uses multiple layers of neural networks."),
    Document(content="Python is popular for machine learning development."),
]

embedded_docs = doc_embedder.run(documents=documents)["documents"]
document_store.write_documents(embedded_docs)

# 3. Build retrieval pipeline
query_embedder = SIETextEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)

retrieval_pipeline = Pipeline()
retrieval_pipeline.add_component("query_embedder", query_embedder)
retrieval_pipeline.add_component(
    "retriever", InMemoryEmbeddingRetriever(document_store=document_store, top_k=2)
)
retrieval_pipeline.connect("query_embedder.embedding", "retriever.query_embedding")

# 4. Query
result = retrieval_pipeline.run({"query_embedder": {"text": "What is deep learning?"}})

for doc in result["retriever"]["documents"]:
    print(f"Score: {doc.score:.3f} - {doc.content[:50]}")
```
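Steps 1-2 call the document embedder directly; the same indexing can also be expressed as a Haystack `Pipeline` using the stock `DocumentWriter`, which makes it reusable and serializable. A sketch under that assumption:

```python
from haystack.components.writers import DocumentWriter

# Wrap embedding + writing in a pipeline (a fresh embedder instance is needed,
# since a component can belong to only one pipeline).
indexing_pipeline = Pipeline()
indexing_pipeline.add_component(
    "doc_embedder",
    SIEDocumentEmbedder(base_url="http://localhost:8080", model="BAAI/bge-m3"),
)
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("doc_embedder.documents", "writer.documents")

indexing_pipeline.run({"doc_embedder": {"documents": documents}})
```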
## Configuration Options

### All Embedders

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `BAAI/bge-m3` | Model to use |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
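For instance, a non-default configuration might look like the following. The `gpu` label and `options` key are illustrative placeholders, not documented values; check your deployment and model docs for what actually applies:

```python
embedder = SIETextEmbedder(
    base_url="http://sie.internal:8080",  # hypothetical internal deployment URL
    model="BAAI/bge-m3",
    gpu="a100",                    # hypothetical GPU label for routing
    options={"normalize": True},   # hypothetical model-specific option
    timeout_s=60.0,                # fail faster than the 180 s default
)
```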
### Document Embedders Only

| Parameter | Type | Default | Description |
|---|---|---|---|
| `meta_fields_to_embed` | `list[str]` | `None` | Metadata fields to include |
## What’s Next

- Model Catalog - all supported embedding models
- LangChain Integration - alternative framework option