Overview
SIE deploys as a single container with no external dependencies. Start with Docker for simplicity. Graduate to Kubernetes when you need scaling or high availability.
Decision Tree
Choose your deployment path based on your requirements:

    Do you need horizontal scaling or HA?
    ├─ No → Docker (single instance)
    │  └─ Need GPU? → Add --gpus all
    │
    └─ Yes → Kubernetes
       ├─ Single GPU type? → Basic Helm deployment
       └─ Multi-GPU pools? → Elastic cloud with router

Start with Docker. Most teams run SIE as a single container for months before needing Kubernetes. The multi-model architecture means one container serves all your embedding workloads.
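The decision tree above can also be written as a small shell function, which is handy in provisioning scripts. This is only an illustrative sketch: `choose_deployment` and its yes/no arguments are hypothetical names, not part of SIE.

```bash
# Hypothetical helper that encodes the decision tree above.
# Each argument is "yes" or "no":
#   $1  need horizontal scaling or HA
#   $2  need a GPU (only consulted on the Docker path)
#   $3  multiple GPU pools (only consulted on the Kubernetes path)
choose_deployment() (
  need_ha=$1; need_gpu=$2; multi_gpu=$3
  if [ "$need_ha" = "yes" ]; then
    if [ "$multi_gpu" = "yes" ]; then
      echo "Kubernetes: elastic cloud with router"
    else
      echo "Kubernetes: basic Helm deployment"
    fi
  elif [ "$need_gpu" = "yes" ]; then
    echo "Docker: single instance with --gpus all"
  else
    echo "Docker: single instance"
  fi
)

# No HA requirement, GPU wanted
choose_deployment no yes no
```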
Quick Comparison
| Deployment | Best For | Scaling | Effort |
|---|---|---|---|
| Docker | Development, small production | Vertical (bigger GPU) | Minutes |
| Docker Compose | Multi-container setups | Vertical | Minutes |
| Kubernetes (Helm) | Production, HA required | Horizontal replicas | Hours |
Hardware Requirements
SIE runs on CPU, but GPU acceleration is strongly recommended for production.
Minimum Specs
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8+ cores |
| RAM | 16 GB | 32+ GB |
| GPU | None (CPU mode) | NVIDIA T4 or better |
| VRAM | N/A | 16+ GB |
| Disk | 50 GB | 100+ GB (model cache) |
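A preflight script can compare a host against the minimum column before pulling the image. The sketch below is one way to do that. `check_minimum` is a hypothetical helper, and the thresholds are copied from the table above. Gathering the actual values is platform-specific (for example `getconf _NPROCESSORS_ONLN` for the core count), so the function takes them as arguments.

```bash
# Minimum values copied from the table above.
MIN_CORES=4
MIN_RAM_GB=16
MIN_DISK_GB=50

# check_minimum CORES RAM_GB DISK_GB  (hypothetical helper)
# Prints one line per shortfall and returns non-zero if any value is too low.
check_minimum() (
  status=0
  [ "$1" -ge "$MIN_CORES" ]   || { echo "CPU: $1 cores, need $MIN_CORES"; status=1; }
  [ "$2" -ge "$MIN_RAM_GB" ]  || { echo "RAM: $2 GB, need $MIN_RAM_GB GB"; status=1; }
  [ "$3" -ge "$MIN_DISK_GB" ] || { echo "Disk: $3 GB, need $MIN_DISK_GB GB"; status=1; }
  exit "$status"
)

# Example with literal values: 8 cores, 32 GB RAM, 100 GB free disk
check_minimum 8 32 100 && echo "Host meets the minimum specs"
```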
GPU Recommendations by Workload
| Workload | GPU | Notes |
|---|---|---|
| Development | None or T4 | CPU works for testing |
| Small production | T4 / L4 | 1-10 models, low traffic |
| Medium production | A10G / L40S | 10-50 models, moderate traffic |
| High throughput | A100 / H100 | Maximum performance |
Supported Hardware
SIE supports multiple GPU vendors and CPU inference:
| Device | Flag | Notes |
|---|---|---|
| NVIDIA CUDA | --device cuda | Recommended for production |
| Apple Silicon | --device mps | M1/M2/M3 for local development |
| CPU | --device cpu | Fallback, significantly slower |
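When the same launch script runs on different machines, the `--device` value from the table can be chosen automatically. The following is a rough sketch under two assumptions that do not come from SIE itself: that `nvidia-smi` on the `PATH` is a good enough proxy for a usable NVIDIA GPU, and that Apple Silicon reports `Darwin` / `arm64` from `uname`. `pick_device` is a hypothetical helper name.

```bash
# pick_device HAS_NVIDIA OS ARCH  (hypothetical helper)
# Prints cuda, mps, or cpu, following the order of preference in the table above.
pick_device() (
  if [ "$1" = "yes" ]; then
    echo cuda
  elif [ "$2" = "Darwin" ] && [ "$3" = "arm64" ]; then
    echo mps
  else
    echo cpu
  fi
)

# Probe the current host. The nvidia-smi check is a heuristic, not a guarantee
# that the driver and container runtime are set up correctly.
if command -v nvidia-smi >/dev/null 2>&1; then has_nvidia=yes; else has_nvidia=no; fi
device=$(pick_device "$has_nvidia" "$(uname -s)" "$(uname -m)")
echo "--device $device"
```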
Docker Quick Start
Pull and run the official image:

    # CPU only
    docker run -p 8080:8080 ghcr.io/superlinked/sie:latest

    # With GPU (recommended)
    docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:latest

The server starts on port 8080. Models load on first request.
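Scripts that start the container usually need to wait until the server answers before sending traffic. A minimal sketch of such a wait loop follows. `wait_until` is a hypothetical helper, and it assumes the `/healthz` endpoint listed under Health Checks returns a successful HTTP status once the server is up. The probe command is passed in as arguments, so the loop itself does not depend on curl.

```bash
# wait_until ATTEMPTS DELAY_SECONDS COMMAND...  (hypothetical helper)
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
wait_until() (
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      exit 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not ready after $attempts attempt(s)" >&2
  exit 1
)

# Usage against a local SIE container (not executed here):
#   wait_until 30 2 curl -fsS http://localhost:8080/healthz
```

The `-f` flag makes curl exit non-zero on HTTP error responses, which is what lets the loop tell "not ready yet" from "ready".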
Common Options
    # Custom port
    docker run --gpus all -p 3000:8080 ghcr.io/superlinked/sie:latest

    # Specific models only (faster startup)
    docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:latest \
      sie-server serve -m BAAI/bge-m3,BAAI/bge-reranker-v2-m3

    # Persistent model cache (skip re-downloads)
    docker run --gpus all -p 8080:8080 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      ghcr.io/superlinked/sie:latest

    # Custom bundle for specific dependencies
    docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:latest \
      sie-server serve -b gte-qwen2

Health Checks
Verify the server is running:

    # Health endpoint
    curl http://localhost:8080/healthz

    # List available models
    curl http://localhost:8080/v1/models

    # Test encoding
    curl -X POST http://localhost:8080/v1/encode/BAAI/bge-m3 \
      -H "Content-Type: application/json" \
      -d '{"items": [{"text": "Hello world"}]}'

When to Upgrade
Stay on Docker When
- A single GPU is sufficient for your throughput
- You do not require high availability
- You are in development or early production
- Vertical scaling (bigger GPU) meets your needs
Move to Kubernetes When
- You need horizontal scaling (multiple replicas)
- High availability is required (pod failover)
- You have multiple GPU types to utilize
- You need automated scaling based on load
What’s Next
- Docker Deployment - detailed Docker configuration
- Kubernetes in GCP - Helm charts and GKE deployment
- Hardware & Capacity - GPU selection and memory planning
- Monitoring & Observability - metrics and health checks