
Adding Models

Add any HuggingFace model by creating a config file. No code changes required.


Model configs are flat YAML files in the `models/` directory, named `{org}-{name}.yaml`. The filename uses dashes to separate the org from the model name.

```
models/
  baai-bge-m3.yaml
  my-org-my-custom-model.yaml
```

For Docker deployments, mount your custom models directory:

```sh
docker run --gpus all -p 8080:8080 \
  -v /path/to/custom-models:/app/models:ro \
  ghcr.io/superlinked/sie:latest
```

Each model needs a config YAML file. Here is a minimal example:

```yaml
name: my-org/my-model
hf_id: my-org/my-model
adapter: sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter
inputs:
  - text
outputs:
  - dense
dims:
  dense: 768
max_sequence_length: 512
```
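Following the naming convention above, this file would live at `models/my-org-my-model.yaml`.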

| Field | Type | Description |
| --- | --- | --- |
| `name` | string | Model name used in API requests |
| `hf_id` | string | HuggingFace model ID for weight download |
| `adapter` | string | Adapter class path (see adapters below) |
| `inputs` | list | Input modalities: `text`, `image`, `audio`, `video` |
| `outputs` | list | Output types: `dense`, `sparse`, `multivector`, `score`, `extract` |
| `dims` | object | Embedding dimensions per output type |

At least one weight source is required (unless using `base_model`):

| Field | Description |
| --- | --- |
| `hf_id` | HuggingFace model ID (e.g., `BAAI/bge-m3`) |
| `weights_path` | Local path to weights (takes precedence over `hf_id`) |
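For example, a sketch of a config that loads weights from local disk instead of downloading them (the path is a placeholder):

```yaml
name: my-org/my-model
# Used instead of downloading from HuggingFace; path is illustrative
weights_path: /data/weights/my-model
adapter: sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter
```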

Specify how the model should be loaded:

| Field | Description |
| --- | --- |
| `adapter` | Adapter path: `module:Class` or `file.py:Class` |
| `base_model` | Inherit adapter from another model |
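As a sketch, assuming `base_model` references another config's `name`, a fine-tune could inherit its adapter while loading its own weights:

```yaml
name: my-org/bge-m3-finetune
base_model: BAAI/bge-m3   # hypothetical: inherit the adapter from the baai-bge-m3.yaml config
weights_path: /data/weights/bge-m3-finetune
```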

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_sequence_length` | int | 512 | Maximum input tokens |
| `pooling` | string | null | Pooling strategy: `cls`, `mean`, `last_token`, `splade`, `none` |
| `normalize` | bool | true | L2-normalize output embeddings |
| `max_batch_tokens` | int | 16384 | Maximum tokens per batch |
| `compute_precision` | string | null | Override precision: `float16`, `bfloat16`, `float32` |
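These fields sit at the top level of the config, and anything omitted falls back to the defaults above. A fragment overriding several of them (values are illustrative):

```yaml
max_sequence_length: 1024
pooling: cls
normalize: false
compute_precision: float16
```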

Profiles define named combinations of runtime options. One profile must have `is_default: true`.

```yaml
profiles:
  default:
    is_default: true
  sparse:
    output_types:
      - sparse
  banking:
    lora: saivamshiatukuri/bge-m3-banking77-lora
    instruction: "Classify banking intent"
```
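The `banking` profile above layers a LoRA adapter and a task instruction on top of the base model, so a single config can serve several task-tuned variants.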

Options are split into loadtime options (changing them requires a reload) and runtime options (can be overridden per request):

```yaml
adapter_options_loadtime:
  attn_implementation: sdpa
  compute_precision: bfloat16
adapter_options_runtime:
  query_template: 'Instruct: {instruction}\nQuery:{text}'
  default_instruction: "Retrieve relevant passages"
```

| Adapter | Use Case |
| --- | --- |
| `sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter` | Standard embedding models |
| `sie_server.adapters.bge_m3_flash:BGEM3FlashAdapter` | BGE-M3 with flash attention |
| `sie_server.adapters.cross_encoder:CrossEncoderAdapter` | Reranking models |
| `sie_server.adapters.gliner:GLiNERAdapter` | Entity extraction models |
| `sie_server.adapters.clip:CLIPAdapter` | CLIP vision-text models |
| `sie_server.adapters.colbert:ColBERTAdapter` | Multi-vector (ColBERT) models |
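For instance, a reranking model would pair the cross-encoder adapter with the `score` output type. A minimal sketch, assuming the same field layout as above (model ID and dims are illustrative):

```yaml
name: BAAI/bge-reranker-v2-m3
hf_id: BAAI/bge-reranker-v2-m3
adapter: sie_server.adapters.cross_encoder:CrossEncoderAdapter
inputs:
  - text
outputs:
  - score
dims:
  score: 1   # illustrative: rerankers emit a scalar relevance score
max_sequence_length: 512
```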

A full config with profiles and runtime options:

```yaml
name: sentence-transformers/all-MiniLM-L6-v2
hf_id: sentence-transformers/all-MiniLM-L6-v2
adapter: sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter
inputs:
  - text
outputs:
  - dense
dims:
  dense: 384
max_sequence_length: 256
pooling: mean
normalize: true
max_batch_tokens: 16384
profiles:
  default:
    is_default: true
    adapter_options_runtime:
      pooling: mean
      normalize: true
```

After creating the config, verify the model loads and produces correct outputs.

Start the server with your custom models mounted:

```sh
docker run --gpus all -p 8080:8080 \
  -v /path/to/custom-models:/app/models:ro \
  ghcr.io/superlinked/sie:latest
```

Check that the model is listed:

```sh
curl http://localhost:8080/v1/models | jq '.data[].id'
```

Encode a test input and check the output shape:

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")
result = client.encode("my-org/my-model", Item(text="test input"))
print(result["dense"].shape)  # Should match dims.dense
```

Run a quality evaluation:

```sh
mise run eval my-org/my-model -t mteb/NanoFiQA2018Retrieval --type quality -s sie
```

The server monitors the `models/` directory for changes. Add new configs without restarting:

  1. Create a new `models/{org}-{name}.yaml` file
  2. The server detects the new config automatically
  3. Model weights load on first request

For Docker deployments, updates to the mounted volume are detected automatically. Changes to existing configs, however, require a server restart.
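For example, using the mount from the Docker command above:

```sh
# Drop a new config into the mounted directory...
cp my-org-my-model.yaml /path/to/custom-models/
# ...then the model should appear in the listing
curl http://localhost:8080/v1/models | jq '.data[].id'
```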