
# Docker

Run the server with Docker:

```bash
# CPU only
docker run -p 8080:8080 ghcr.io/superlinked/sie:latest

# With GPU (recommended for production)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:latest
```

Verify the server is running:

```bash
curl http://localhost:8080/healthz
# {"status":"ok"}
```

SIE publishes images for different compute platforms and dependency bundles.

| Tag | Base | Use Case |
| --- | --- | --- |
| `cuda12-default` | CUDA 12.4 | Production with modern NVIDIA GPUs |
| `cuda11-default` | CUDA 11.8 | Older NVIDIA GPUs |
| `cpu-default` | Ubuntu 22.04 | Development, ARM64, no GPU |
| `latest` | CUDA 12.4 | Alias for `cuda12-default` |
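
To pin a specific platform build instead of `latest`, use the tag directly; for example, the CPU-only image for local development:

```bash
docker run -p 8080:8080 ghcr.io/superlinked/sie:cpu-default
```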

Each platform supports multiple bundles for models with conflicting dependencies.

| Tag | Models |
| --- | --- |
| `cuda12-default` | BGE-M3, E5, Qwen3, GLiNER, ColBERT |
| `cuda12-legacy` | Stella, GritLM-7B |
| `cuda12-gte-qwen2` | GTE-Qwen2-1.5B, GTE-Qwen2-7B |
| `cuda12-sglang` | Large LLM embeddings (4B+ params) |
| `cuda12-florence2` | Florence-2, Donut vision models |

CPU images follow the same pattern: `cpu-default`, `cpu-legacy`, and so on.
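
Non-default bundles are run the same way, just with a different tag; a sketch serving the legacy bundle on a separate host port:

```bash
docker run --gpus all -p 8081:8080 ghcr.io/superlinked/sie:cuda12-legacy
```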


To expose all GPUs to the container:

```bash
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:latest
```

To restrict the container to specific GPUs, pass device IDs:

```bash
# Use GPU 0 only
docker run --gpus '"device=0"' -p 8080:8080 ghcr.io/superlinked/sie:latest

# Use GPUs 0 and 1
docker run --gpus '"device=0,1"' -p 8080:8080 ghcr.io/superlinked/sie:latest
```
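
Device selection composes with the `SIE_DEVICE` setting described below; a sketch, assuming a two-GPU host where SIE should run on the second card:

```bash
# Only GPU 1 is exposed; inside the container it is enumerated as cuda:0
docker run --gpus '"device=1"' -p 8080:8080 \
  -e SIE_DEVICE=cuda \
  ghcr.io/superlinked/sie:latest
```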

The `--gpus` flag requires the NVIDIA Container Toolkit. Install it first:

```bash
# Ubuntu/Debian
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
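
After installing, a quick way to confirm that containers can see the GPU is to run `nvidia-smi` in a throwaway container (the toolkit injects the binary):

```bash
# Should print the same GPU table as nvidia-smi on the host
docker run --rm --gpus all ubuntu nvidia-smi
```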

Configure the server with environment variables. All variables use the `SIE_` prefix.

**Model loading**

| Variable | Default | Description |
| --- | --- | --- |
| `SIE_DEVICE` | `cpu` | Compute device: `cpu`, `cuda`, `cuda:0` |
| `SIE_MODELS_DIR` | `/app/models` | Path to model configs |
| `SIE_MODEL_FILTER` | (all) | Comma-separated list of models to load |

**Batching**

| Variable | Default | Description |
| --- | --- | --- |
| `SIE_MAX_BATCH_REQUESTS` | `64` | Maximum requests per batch |
| `SIE_MAX_BATCH_WAIT_MS` | `10` | Maximum wait time for a batch to fill |
| `SIE_MAX_CONCURRENT_REQUESTS` | `512` | Queue size limit |

**Memory management**

| Variable | Default | Description |
| --- | --- | --- |
| `SIE_MEMORY_PRESSURE_THRESHOLD_PCT` | `90` | VRAM percent that triggers LRU eviction |
| `SIE_MEMORY_CHECK_INTERVAL_S` | `1.0` | Background memory monitor interval |

**Observability**

| Variable | Default | Description |
| --- | --- | --- |
| `SIE_LOG_JSON` | `false` | Use JSON log format |
| `SIE_TRACING_ENABLED` | `false` | Enable OpenTelemetry tracing |
| `SIE_GPU_TYPE` | (auto) | Override GPU type for metrics |
For example:

```bash
docker run --gpus all -p 8080:8080 \
  -e SIE_DEVICE=cuda \
  -e SIE_MAX_BATCH_REQUESTS=128 \
  -e SIE_MEMORY_PRESSURE_THRESHOLD_PCT=85 \
  -e SIE_LOG_JSON=true \
  ghcr.io/superlinked/sie:latest
```
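
`SIE_MODEL_FILTER` restricts which configured models are loaded at startup; a sketch, assuming hypothetical model names that match entries in your model configs:

```bash
docker run --gpus all -p 8080:8080 \
  -e SIE_MODEL_FILTER=bge-m3,gliner \
  ghcr.io/superlinked/sie:latest
```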

Mount a persistent volume for model weights. This avoids re-downloading on restarts.

```bash
docker run --gpus all -p 8080:8080 \
  -v ~/.cache/huggingface:/app/.cache/huggingface \
  ghcr.io/superlinked/sie:latest
```

The container uses `HF_HOME=/app/.cache/huggingface` by default.
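
A Docker-managed named volume works just as well as a host path (the Compose examples below use one):

```bash
docker volume create hf-cache
docker run --gpus all -p 8080:8080 \
  -v hf-cache:/app/.cache/huggingface \
  ghcr.io/superlinked/sie:latest
```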

Add your own model configs by mounting a directory:

```bash
docker run --gpus all -p 8080:8080 \
  -v /path/to/my-models:/app/models \
  ghcr.io/superlinked/sie:latest
```
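
If you mount the configs somewhere other than `/app/models`, point `SIE_MODELS_DIR` at that path; a sketch with an assumed mount point:

```bash
docker run --gpus all -p 8080:8080 \
  -v /path/to/my-models:/opt/models:ro \
  -e SIE_MODELS_DIR=/opt/models \
  ghcr.io/superlinked/sie:latest
```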

For security-hardened deployments, run with a read-only root filesystem and explicit writable mounts:

```bash
docker run --gpus all -p 8080:8080 \
  --read-only \
  -v hf-cache:/app/.cache/huggingface \
  --tmpfs /tmp:size=1G \
  ghcr.io/superlinked/sie:latest
```

An equivalent Docker Compose setup:
```yaml
# docker-compose.yml
services:
  sie:
    image: ghcr.io/superlinked/sie:latest
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - hf-cache:/app/.cache/huggingface
    environment:
      - SIE_DEVICE=cuda
      - SIE_MAX_BATCH_REQUESTS=128
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/healthz')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

volumes:
  hf-cache:
```
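
Start the stack and follow the logs while model weights download on the first run:

```bash
docker compose up -d
docker compose logs -f sie
```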

Run multiple bundles for models with conflicting dependencies:

```yaml
# docker-compose.yml
services:
  sie-default:
    image: ghcr.io/superlinked/sie:cuda12-default
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
    volumes:
      - hf-cache:/app/.cache/huggingface
    environment:
      - SIE_DEVICE=cuda

  sie-legacy:
    image: ghcr.io/superlinked/sie:cuda12-legacy
    ports:
      - "8081:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]
              capabilities: [gpu]
    volumes:
      - hf-cache:/app/.cache/huggingface
    environment:
      - SIE_DEVICE=cuda

volumes:
  hf-cache:
```

Start with:

```bash
docker compose up -d
```
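
Each bundle then exposes its own health endpoint, so both services can be checked independently:

```bash
curl http://localhost:8080/healthz   # sie-default
curl http://localhost:8081/healthz   # sie-legacy
```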