
Adding Models

Add any HuggingFace model by creating a config file. No code changes required.


Model configs are flat YAML files in the `models/` directory, named `{org}-{name}.yaml`. The filename uses dashes to separate the org from the model name.

```
models/
  baai-bge-m3.yaml
  my-org-my-custom-model.yaml
```

For Docker deployments, mount your custom models directory:

```sh
docker run --gpus all -p 8080:8080 \
  -v /path/to/custom-models:/app/models:ro \
  ghcr.io/superlinked/sie:latest
```

Each model needs a config YAML file. Here is a minimal example:

```yaml
name: my-org/my-model
hf_id: my-org/my-model
adapter: sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter
inputs:
  - text
outputs:
  - dense
dims:
  dense: 768
max_sequence_length: 512
```
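Following the naming convention above, this file would live at `models/my-org-my-model.yaml`.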

| Field | Type | Description |
| --- | --- | --- |
| `name` | string | Model name used in API requests |
| `hf_id` | string | HuggingFace model ID for weight download |
| `adapter` | string | Adapter class path (see adapters below) |
| `inputs` | list | Input modalities: `text`, `image`, `audio`, `video` |
| `outputs` | list | Output types: `dense`, `sparse`, `multivector`, `score`, `extract` |
| `dims` | object | Embedding dimensions per output type |

At least one weight source is required (unless using `base_model`):

| Field | Description |
| --- | --- |
| `hf_id` | HuggingFace model ID (e.g., `BAAI/bge-m3`) |
| `weights_path` | Local path to weights (takes precedence over `hf_id`) |
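For example, a sketch of a config that loads weights from local disk instead of downloading them (the path is a placeholder):

```yaml
name: my-org/my-model
# Used instead of downloading from HuggingFace; path is illustrative
weights_path: /data/weights/my-model
adapter: sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter
```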

Specify how the model should be loaded:

| Field | Description |
| --- | --- |
| `adapter` | Adapter path: `module:Class` or `file.py:Class` |
| `base_model` | Inherit adapter from another model |
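As a sketch, assuming `base_model` references another config's `name`, a fine-tune could inherit its adapter while loading its own weights:

```yaml
name: my-org/bge-m3-finetune
base_model: BAAI/bge-m3   # hypothetical: inherit the adapter from the baai-bge-m3.yaml config
weights_path: /data/weights/bge-m3-finetune
```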

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_sequence_length` | int | 512 | Maximum input tokens |
| `pooling` | string | null | Pooling strategy: `cls`, `mean`, `last_token`, `splade`, `none` |
| `normalize` | bool | true | L2-normalize output embeddings |
| `max_batch_tokens` | int | 16384 | Maximum tokens per batch |
| `compute_precision` | string | null | Override precision: `float16`, `bfloat16`, `float32` |
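These fields sit at the top level of the config, and anything omitted falls back to the defaults above. A fragment overriding several of them (values are illustrative):

```yaml
max_sequence_length: 1024
pooling: cls
normalize: false
compute_precision: float16
```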

Profiles define named combinations of runtime options. One profile must have `is_default: true`.

```yaml
profiles:
  default:
    is_default: true
  sparse:
    output_types:
      - sparse
  banking:
    lora: saivamshiatukuri/bge-m3-banking77-lora
    instruction: "Classify banking intent"
```
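The `banking` profile above layers a LoRA adapter and a task instruction on top of the base model, so a single config can serve several task-tuned variants.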

Options are split into loadtime options (changing them requires a reload) and runtime options (can be overridden per request):

```yaml
adapter_options_loadtime:
  attn_implementation: sdpa
  compute_precision: bfloat16
adapter_options_runtime:
  query_template: 'Instruct: {instruction}\nQuery:{text}'
  default_instruction: "Retrieve relevant passages"
```

| Adapter | Use Case |
| --- | --- |
| `sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter` | Standard embedding models |
| `sie_server.adapters.bge_m3_flash:BGEM3FlashAdapter` | BGE-M3 with flash attention |
| `sie_server.adapters.cross_encoder:CrossEncoderAdapter` | Reranking models |
| `sie_server.adapters.gliner:GLiNERAdapter` | Entity extraction models |
| `sie_server.adapters.clip:CLIPAdapter` | CLIP vision-text models |
| `sie_server.adapters.colbert:ColBERTAdapter` | Multi-vector (ColBERT) models |
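For instance, a reranking model would pair the cross-encoder adapter with the `score` output type. A minimal sketch, assuming the same field layout as above (model ID and dims are illustrative):

```yaml
name: BAAI/bge-reranker-v2-m3
hf_id: BAAI/bge-reranker-v2-m3
adapter: sie_server.adapters.cross_encoder:CrossEncoderAdapter
inputs:
  - text
outputs:
  - score
dims:
  score: 1   # illustrative: rerankers emit a scalar relevance score
max_sequence_length: 512
```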

A full config with profiles and runtime options:

```yaml
name: sentence-transformers/all-MiniLM-L6-v2
hf_id: sentence-transformers/all-MiniLM-L6-v2
adapter: sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter
inputs:
  - text
outputs:
  - dense
dims:
  dense: 384
max_sequence_length: 256
pooling: mean
normalize: true
max_batch_tokens: 16384
profiles:
  default:
    is_default: true
    adapter_options_runtime:
      pooling: mean
      normalize: true
```

After creating the config, verify the model loads and produces correct outputs.

Start the server with your custom models mounted:

```sh
docker run --gpus all -p 8080:8080 \
  -v /path/to/custom-models:/app/models:ro \
  ghcr.io/superlinked/sie:latest
```

Check that the model is listed:

```sh
curl http://localhost:8080/v1/models | jq '.data[].id'
```

Encode a test input and check the output shape:

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")
result = client.encode("my-org/my-model", Item(text="test input"))
print(result["dense"].shape)  # Should match dims.dense
```

Run a quality evaluation:

```sh
mise run eval my-org/my-model -t mteb/NanoFiQA2018Retrieval --type quality -s sie
```

The server monitors the `models/` directory for changes. Add new configs without restarting:

  1. Create a new `models/{org}-{name}.yaml` file
  2. The server detects the new config automatically
  3. Model weights load on first request

For Docker deployments, updates to the mounted volume are detected automatically. Changes to existing configs, however, require a server restart.
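For example, using the mount from the Docker command above:

```sh
# Drop a new config into the mounted directory...
cp my-org-my-model.yaml /path/to/custom-models/
# ...then the model should appear in the listing
curl http://localhost:8080/v1/models | jq '.data[].id'
```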