# Docker

## Quick Start
```bash
# CPU only
docker run -p 8080:8080 ghcr.io/superlinked/sie:latest

# With GPU (recommended for production)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:latest
```

Verify the server is running:

```bash
curl http://localhost:8080/healthz
# {"status":"ok"}
```
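Once the health check passes, you can send an embedding request. The route, payload shape, and model identifier below are assumptions for illustration only, not the documented SIE API; check the API documentation for your image version for the exact request format.

```bash
# Illustrative request; endpoint path, payload shape, and model name are assumptions
curl -s http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "bge-m3", "input": ["hello world"]}'
```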
## Image Tags

SIE publishes images for different compute platforms and dependency bundles.
### By Platform

| Tag | Base | Use Case |
|---|---|---|
| `cuda12-default` | CUDA 12.4 | Production with modern NVIDIA GPUs |
| `cuda11-default` | CUDA 11.8 | Older NVIDIA GPUs |
| `cpu-default` | Ubuntu 22.04 | Development, ARM64, no GPU |
| `latest` | CUDA 12.4 | Alias for `cuda12-default` |
### By Bundle

Each platform supports multiple bundles for models with conflicting dependencies.

| Tag | Models |
|---|---|
| `cuda12-default` | BGE-M3, E5, Qwen3, GLiNER, ColBERT |
| `cuda12-legacy` | Stella, GritLM-7B |
| `cuda12-gte-qwen2` | GTE-Qwen2-1.5B, GTE-Qwen2-7B |
| `cuda12-sglang` | Large LLM embeddings (4B+ params) |
| `cuda12-florence2` | Florence-2, Donut vision models |

CPU images follow the same pattern: `cpu-default`, `cpu-legacy`, etc.
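For example, to serve the legacy bundle alongside the default one, run a second container from the `cuda12-legacy` tag on another host port (the port and GPU choice below are illustrative):

```bash
docker run --gpus '"device=1"' -p 8081:8080 ghcr.io/superlinked/sie:cuda12-legacy
```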
## GPU Configuration

### Single GPU

```bash
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:latest
```

### Specific GPU

```bash
# Use GPU 0 only
docker run --gpus '"device=0"' -p 8080:8080 ghcr.io/superlinked/sie:latest

# Use GPUs 0 and 1
docker run --gpus '"device=0,1"' -p 8080:8080 ghcr.io/superlinked/sie:latest
```

### NVIDIA Container Toolkit
The `--gpus` flag requires the NVIDIA Container Toolkit. Install it first:

```bash
# Ubuntu/Debian
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
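To confirm the toolkit works before starting SIE, run `nvidia-smi` inside a disposable CUDA container (the CUDA image tag below is only an example; any CUDA base image compatible with your driver will do):

```bash
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```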
## Environment Variables

Configure the server with environment variables. All variables use the `SIE_` prefix.
### Core Settings

| Variable | Default | Description |
|---|---|---|
| `SIE_DEVICE` | `cpu` | Compute device: `cpu`, `cuda`, `cuda:0` |
| `SIE_MODELS_DIR` | `/app/models` | Path to model configs |
| `SIE_MODEL_FILTER` | (all) | Comma-separated list of models to load |
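For example, to pin the server to the first GPU and load only a subset of models (the model names below are placeholders; use the identifiers from your model config directory):

```bash
docker run --gpus all -p 8080:8080 \
  -e SIE_DEVICE=cuda:0 \
  -e SIE_MODEL_FILTER=bge-m3,e5 \
  ghcr.io/superlinked/sie:latest
```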
### Batching

| Variable | Default | Description |
|---|---|---|
| `SIE_MAX_BATCH_REQUESTS` | 64 | Maximum requests per batch |
| `SIE_MAX_BATCH_WAIT_MS` | 10 | Maximum wait time (ms) for a batch to fill |
| `SIE_MAX_CONCURRENT_REQUESTS` | 512 | Queue size limit |
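Larger batches generally trade per-request latency for GPU throughput, since requests may wait longer for a batch to fill. A throughput-oriented setup might raise both limits (the values below are illustrative, not recommendations):

```bash
docker run --gpus all -p 8080:8080 \
  -e SIE_MAX_BATCH_REQUESTS=128 \
  -e SIE_MAX_BATCH_WAIT_MS=25 \
  ghcr.io/superlinked/sie:latest
```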
### Memory

| Variable | Default | Description |
|---|---|---|
| `SIE_MEMORY_PRESSURE_THRESHOLD_PCT` | 90 | VRAM usage (%) that triggers LRU eviction |
| `SIE_MEMORY_CHECK_INTERVAL_S` | 1.0 | Background memory monitor interval (seconds) |
### Observability

| Variable | Default | Description |
|---|---|---|
| `SIE_LOG_JSON` | `false` | Use JSON log format |
| `SIE_TRACING_ENABLED` | `false` | Enable OpenTelemetry tracing |
| `SIE_GPU_TYPE` | (auto) | Override GPU type for metrics |
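For example, to enable structured logs and tracing together (pointing the tracer at a collector via the standard `OTEL_EXPORTER_OTLP_ENDPOINT` variable is an assumption about the embedded OpenTelemetry SDK, not something this page documents):

```bash
docker run --gpus all -p 8080:8080 \
  -e SIE_LOG_JSON=true \
  -e SIE_TRACING_ENABLED=true \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 \
  ghcr.io/superlinked/sie:latest
```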
### Example
```bash
docker run --gpus all -p 8080:8080 \
  -e SIE_DEVICE=cuda \
  -e SIE_MAX_BATCH_REQUESTS=128 \
  -e SIE_MEMORY_PRESSURE_THRESHOLD_PCT=85 \
  -e SIE_LOG_JSON=true \
  ghcr.io/superlinked/sie:latest
```

## Volume Mounts
### HuggingFace Cache

Mount a persistent volume for model weights. This avoids re-downloading on restarts.

```bash
docker run --gpus all -p 8080:8080 \
  -v ~/.cache/huggingface:/app/.cache/huggingface \
  ghcr.io/superlinked/sie:latest
```

The container uses `HF_HOME=/app/.cache/huggingface` by default.
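You can also warm the cache from the host before starting the container, for example with `huggingface-cli download` from `huggingface_hub` (the repository id below is illustrative; download whichever models your bundle serves):

```bash
# Populate the host cache that the container will mount
HF_HOME=~/.cache/huggingface huggingface-cli download BAAI/bge-m3
```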
### Custom Model Configs

Add your own model configs by mounting a directory:

```bash
docker run --gpus all -p 8080:8080 \
  -v /path/to/my-models:/app/models \
  ghcr.io/superlinked/sie:latest
```

### Read-Only Root Filesystem
For security-hardened deployments, use a read-only root filesystem with explicit writable mounts:

```bash
docker run --gpus all -p 8080:8080 \
  --read-only \
  -v hf-cache:/app/.cache/huggingface \
  --tmpfs /tmp:size=1G \
  ghcr.io/superlinked/sie:latest
```

## Docker Compose
### Single Service

```yaml
services:
  sie:
    image: ghcr.io/superlinked/sie:latest
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - hf-cache:/app/.cache/huggingface
    environment:
      - SIE_DEVICE=cuda
      - SIE_MAX_BATCH_REQUESTS=128
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/healthz')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

volumes:
  hf-cache:
```

### Multi-Bundle Setup
Run multiple bundles for models with conflicting dependencies:

```yaml
services:
  sie-default:
    image: ghcr.io/superlinked/sie:cuda12-default
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
    volumes:
      - hf-cache:/app/.cache/huggingface
    environment:
      - SIE_DEVICE=cuda

  sie-legacy:
    image: ghcr.io/superlinked/sie:cuda12-legacy
    ports:
      - "8081:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]
              capabilities: [gpu]
    volumes:
      - hf-cache:/app/.cache/huggingface
    environment:
      - SIE_DEVICE=cuda

volumes:
  hf-cache:
```

Start with:

```bash
docker compose up -d
```
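Each bundle is then reachable on its own host port:

```bash
curl http://localhost:8080/healthz   # default bundle (cuda12-default)
curl http://localhost:8081/healthz   # legacy bundle (cuda12-legacy)
```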
## What’s Next

- Bundles - dependency isolation for conflicting models
- Kubernetes in GCP - production deployment with Helm