# TypeScript SDK Reference
The TypeScript SDK provides an async client for interacting with the SIE server from Node.js and browser environments.
## Installation

```sh
pnpm add @sie/sdk
```

Or with npm:

```sh
npm install @sie/sdk
```
## SIEClient

Async client for the SIE server. All methods return Promises.

### Constructor

```typescript
import { SIEClient } from "@sie/sdk";

const client = new SIEClient(
  baseUrl: string,              // Server URL (e.g., "http://localhost:8080")
  options?: {
    timeout?: number,           // Request timeout in milliseconds (default: 30000)
    apiKey?: string,            // API key for authentication
    gpu?: string,               // Default GPU type for routing
    pool?: PoolSpec,            // Resource pool configuration
    waitForCapacity?: boolean,  // Auto-retry on 202 (default: false)
    provisionTimeout?: number,  // Max wait for provisioning in ms (default: 300000)
  }
);
```
## Methods

### encode()

Generate embeddings.

```typescript
async encode(
  model: string,                 // Model name
  items: Item | Item[],          // Items to encode
  options?: {
    outputTypes?: OutputType[],  // ["dense", "sparse", "multivector"]
    instruction?: string,        // Task instruction for instruction-tuned models
    outputDtype?: DType,         // "float32", "float16", "int8", "binary"
    isQuery?: boolean,           // Query vs document encoding
    gpu?: string,                // GPU routing
    waitForCapacity?: boolean,   // Wait for scale-up
  }
): Promise<EncodeResult | EncodeResult[]>
```

Returns: A single `EncodeResult` if a single item is passed, otherwise an array.
Example:
```typescript
// Single item
const result = await client.encode("BAAI/bge-m3", { text: "Hello" });
console.log(result.dense?.slice(0, 5)); // Float32Array

// Batch
const results = await client.encode("BAAI/bge-m3", [
  { text: "First" },
  { text: "Second" },
]);
```
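Dense embeddings come back as `Float32Array`s and can be compared client-side, for example with cosine similarity. A minimal sketch (the `cosine` helper is illustrative, not part of the SDK):

```typescript
// Illustrative helper, not part of the SDK: cosine similarity between
// two dense embeddings of equal length.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```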
### score()

Rerank items against a query using a cross-encoder or late interaction model. Returns items sorted by relevance score (highest first).

```typescript
async score(
  model: string,              // Model name (e.g., "BAAI/bge-reranker-v2-m3")
  query: Item,                // Query item with text or multivector
  items: Item[],              // Items to score against query
  options?: {
    topK?: number,            // Return only top K results
    gpu?: string,
    waitForCapacity?: boolean,
  }
): Promise<ScoreResult>
```

Example:
```typescript
const result = await client.score(
  "BAAI/bge-reranker-v2-m3",
  { text: "What is Python?" },
  [{ text: "Python is..." }, { text: "Java is..." }]
);

// Scores are sorted by relevance (rank 0 = most relevant)
for (const entry of result.scores) {
  console.log(`Rank ${entry.rank}: ${entry.score.toFixed(3)}`);
}
```

Note: For ColBERT-style models, you can pass pre-computed multivectors to score client-side without a server round-trip. See the Scoring Utilities section.
### extract()

Extract entities or structured data from text. Supports Named Entity Recognition (NER) models like GLiNER.

```typescript
async extract(
  model: string,              // Model name (e.g., "urchade/gliner_multi-v2.1")
  items: Item | Item[],       // Items to extract from
  options: {
    labels: string[],         // Entity types to extract (e.g., ["person", "org"])
    threshold?: number,       // Minimum confidence (0-1)
    gpu?: string,
    waitForCapacity?: boolean,
  }
): Promise<ExtractResult | ExtractResult[]>
```

Returns: A single `ExtractResult` if a single item is passed, otherwise an array.
Example:
```typescript
const result = await client.extract(
  "urchade/gliner_multi-v2.1",
  { text: "Tim Cook leads Apple." },
  { labels: ["person", "organization"] }
);

for (const entity of result.entities) {
  console.log(`${entity.label}: ${entity.text} (score: ${entity.score.toFixed(2)})`);
}
// Output:
// person: Tim Cook (score: 0.95)
// organization: Apple (score: 0.92)
```
### listModels()

Get available models.

```typescript
async listModels(): Promise<ModelInfo[]>
```

Example:
```typescript
const models = await client.listModels();
for (const model of models) {
  console.log(`${model.name}: ${model.outputs.join(", ")}`);
}
```
### getCapacity()

Get cluster capacity information.

```typescript
async getCapacity(gpu?: string): Promise<CapacityInfo>
```

Example:
```typescript
const capacity = await client.getCapacity();
console.log(`Workers: ${capacity.workerCount}, GPUs: ${capacity.liveGpuTypes}`);

// Check if L4 GPUs are available
const l4Capacity = await client.getCapacity("l4");
if (l4Capacity.workerCount > 0) {
  console.log("L4 workers available");
}
```
### waitForCapacity()

Wait for GPU capacity to become available. This is useful for pre-warming the cluster before running benchmarks.

```typescript
async waitForCapacity(
  gpu: string,
  options?: {
    model?: string,          // If provided, sends a warmup encode request
    timeout?: number,        // Default: 300000ms
    pollInterval?: number,   // Default: 5000ms
  }
): Promise<CapacityInfo>
```

Example:
```typescript
// Wait for L4 capacity before running benchmarks
const capacity = await client.waitForCapacity("l4", { timeout: 300000 });
console.log(`Ready with ${capacity.workerCount} L4 workers`);

// Wait and pre-load a model
const capacityWithModel = await client.waitForCapacity("l4", {
  model: "BAAI/bge-m3",
});
```
### close()

Close the client and clean up resources.

```typescript
async close(): Promise<void>
```

## Types

### Item

Input item for encode, score, and extract operations.
```typescript
interface Item {
  id?: string;                         // Client-provided ID (echoed in response)
  text?: string;                       // Text content
  images?: Uint8Array[];               // Image data as byte arrays (for multimodal models)
  multivector?: Float32Array[];        // Pre-computed vectors (for client-side MaxSim)
  metadata?: Record<string, unknown>;  // Custom metadata
}
```

Common patterns:
```typescript
// Simple text
{ text: "Hello world" }

// With ID for tracking
{ id: "doc-1", text: "Document text" }

// Multimodal (for CLIP, ColPali, etc.)
{ text: "Description", images: [imageBytes] }
```
### EncodeResult

```typescript
interface EncodeResult {
  id?: string;                   // Echoed item ID
  dense?: Float32Array;          // Dense embedding
  sparse?: SparseResult;         // Sparse embedding
  multivector?: Float32Array[];  // Per-token embeddings
  timing?: TimingInfo;           // Timing breakdown
}
```
### SparseResult

```typescript
interface SparseResult {
  indices: Int32Array;   // Token IDs
  values: Float32Array;  // Token weights
}
```
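The `indices` and `values` arrays are parallel, so a sparse result can be paired up into a token-id-to-weight map before writing to a sparse index. A small sketch (`sparseToMap` is a hypothetical helper, not part of the SDK):

```typescript
// Hypothetical helper, not part of the SDK: pair up the parallel
// indices/values arrays of a SparseResult-shaped object into a Map.
function sparseToMap(sparse: { indices: Int32Array; values: Float32Array }): Map<number, number> {
  const weights = new Map<number, number>();
  for (let i = 0; i < sparse.indices.length; i++) {
    weights.set(sparse.indices[i], sparse.values[i]);
  }
  return weights;
}
```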
### ScoreResult

```typescript
interface ScoreResult {
  model?: string;        // Model used for scoring
  queryId?: string;      // Query ID (if provided in request)
  scores: ScoreEntry[];  // Sorted by score descending
}
```
### ScoreEntry

```typescript
interface ScoreEntry {
  itemId: string;  // ID of the item
  score: number;   // Relevance score
  rank: number;    // Position (0 = most relevant)
}
```
### ExtractResult

```typescript
interface ExtractResult {
  id?: string;         // Echoed item ID
  entities: Entity[];  // Extracted entities
}
```
### Entity

```typescript
interface Entity {
  text: string;     // Extracted span
  label: string;    // Entity type
  score: number;    // Confidence (0-1)
  start?: number;   // Start character offset
  end?: number;     // End character offset
  bbox?: number[];  // Bounding box [x, y, width, height] for vision models
}
```
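The optional character offsets let you recover the span from the original text, for example for highlighting. A sketch (`entitySpan` is a hypothetical helper, not part of the SDK):

```typescript
// Hypothetical helper, not part of the SDK: slice the source text using the
// optional start/end offsets, falling back to the extracted text.
function entitySpan(text: string, entity: { text: string; start?: number; end?: number }): string {
  return entity.start !== undefined && entity.end !== undefined
    ? text.slice(entity.start, entity.end)
    : entity.text;
}
```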
### ModelInfo

```typescript
interface ModelInfo {
  name: string;                // Model name/identifier
  loaded: boolean;             // Whether model weights are in memory
  inputs: string[];            // Input types: ["text"], ["text", "image"], etc.
  outputs: string[];           // Output types: ["dense"], ["dense", "sparse"], etc.
  dims?: ModelDims;            // Dimension info for each output type
  maxSequenceLength?: number;  // Maximum input sequence length
}
```
### CapacityInfo

```typescript
interface CapacityInfo {
  status: string;                // "healthy", "degraded", "no_workers"
  workerCount: number;           // Number of healthy workers
  gpuCount: number;              // Number of GPUs available
  modelsLoaded: number;          // Unique models loaded across workers
  configuredGpuTypes: string[];  // GPU types configured in cluster
  liveGpuTypes: string[];        // GPU types currently running
  workers: WorkerInfo[];         // Worker details
}
```
### TimingInfo

```typescript
interface TimingInfo {
  totalMs?: number;         // Total request time
  queueMs?: number;         // Time waiting in queue
  tokenizationMs?: number;  // Tokenization time
  inferenceMs?: number;     // Model inference time
}
```
### OutputType

```typescript
type OutputType = "dense" | "sparse" | "multivector";

type DType = "float32" | "float16" | "bfloat16" | "int8" | "uint8" | "binary" | "ubinary";
```
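With the binary dtypes, a common pattern is Hamming distance as a fast similarity proxy. A sketch, assuming binary embeddings arrive as packed bytes (that packing is an assumption here, check your server's output format):

```typescript
// Illustrative, not part of the SDK. Assumes binary embeddings are packed
// 8 bits per byte; Hamming distance counts the differing bits.
function hamming(a: Uint8Array, b: Uint8Array): number {
  let dist = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i]; // differing bits in this byte
    while (x !== 0) {
      dist += x & 1;
      x >>= 1;
    }
  }
  return dist;
}
```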
## Utility Functions

```typescript
// Convert typed arrays to regular number arrays (for JSON serialization)
function toNumberArray(arr: Float32Array | Int32Array): number[];

// Convert number array to Float32Array
function toFloat32Array(arr: number[]): Float32Array;
```
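Typed arrays don't round-trip through `JSON.stringify` as plain arrays, which is what these helpers are for. An illustrative sketch of the conversion (not the SDK's actual source):

```typescript
// Illustrative sketch of the conversion these helpers perform (not the SDK's
// actual source): a Float32Array serializes to JSON as an object of index
// keys, so convert to a plain number[] first.
function toNumberArraySketch(arr: Float32Array | Int32Array): number[] {
  return Array.from(arr);
}

const payload = JSON.stringify({ embedding: toNumberArraySketch(new Float32Array([0.5, 1.5])) });
```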
## Scoring Utilities

Client-side scoring for multi-vector embeddings.
### maxsim()

Compute MaxSim scores for ColBERT-style retrieval. MaxSim finds the maximum similarity between each query token and any document token, then sums these maximums.

```typescript
function maxsim(
  query: Float32Array[],    // [numQueryTokens][dim]
  document: Float32Array[]  // [numDocTokens][dim]
): number
```

Example:
```typescript
import { SIEClient, maxsim } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

// Encode query with isQuery=true for ColBERT models
const queryResult = await client.encode(
  "jinaai/jina-colbert-v2",
  { text: "What is ColBERT?" },
  { outputTypes: ["multivector"], isQuery: true }
);

// Encode documents (no isQuery needed for documents)
const docResults = await client.encode(
  "jinaai/jina-colbert-v2",
  documents,
  { outputTypes: ["multivector"] }
);

// Compute MaxSim scores client-side
const queryMv = queryResult.multivector!;
const scores = docResults.map((r) => maxsim(queryMv, r.multivector!));

// Rank by score (higher is more relevant)
const ranked = scores
  .map((score, idx) => ({ score, idx }))
  .sort((a, b) => b.score - a.score);
```
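For intuition, the MaxSim computation itself can be sketched in a few lines. This is an illustrative reference, not the SDK's implementation (which may be vectorized); it assumes token embeddings are L2-normalized so the dot product acts as cosine similarity:

```typescript
// Illustrative reference for MaxSim, not the SDK's implementation:
// for each query token, take the max dot product over all document tokens,
// then sum those maxima.
function maxsimReference(query: Float32Array[], document: Float32Array[]): number {
  let total = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const d of document) {
      let dot = 0;
      for (let i = 0; i < q.length; i++) {
        dot += q[i] * d[i];
      }
      if (dot > best) best = dot;
    }
    total += best;
  }
  return total;
}
```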
### maxsimDocuments()

Score a query against multiple documents.

```typescript
function maxsimDocuments(
  query: Float32Array[],
  documents: Float32Array[][]
): number[]
```
### maxsimBatch()

Batch version for multiple queries against multiple documents.

```typescript
function maxsimBatch(
  queries: Float32Array[][],
  documents: Float32Array[][]
): Float32Array  // Flattened [numQueries * numDocuments]
```
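Assuming the flattening is row-major with queries as the outer dimension (consistent with the `[numQueries * numDocuments]` shape above, but worth verifying), the score for query `q` against document `d` lives at index `q * numDocuments + d`. A small sketch (`scoreAt` is a hypothetical helper, not part of the SDK):

```typescript
// Hypothetical helper, not part of the SDK: read one (query, document) score
// out of a flattened row-major score array like the one maxsimBatch returns.
function scoreAt(flat: Float32Array, numDocuments: number, q: number, d: number): number {
  return flat[q * numDocuments + d];
}
```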
## Errors

Exception hierarchy for SDK errors.
### SIEError

Base class for all SDK errors.

```typescript
class SIEError extends Error {
  name: "SIEError";
}
```
### SIEConnectionError

Cannot connect to the server.

```typescript
class SIEConnectionError extends SIEError {
  name: "SIEConnectionError";
}
```
### RequestError

Invalid request (4xx responses).

```typescript
class RequestError extends SIEError {
  name: "RequestError";
  code?: string;
  statusCode?: number;
}
```
### ServerError

Server error (5xx responses).

```typescript
class ServerError extends SIEError {
  name: "ServerError";
  code?: string;
  statusCode?: number;
}
```
### ProvisioningError

No capacity available, or timeout waiting for scale-up.

```typescript
class ProvisioningError extends SIEError {
  name: "ProvisioningError";
  gpu?: string;
  retryAfter?: number;
}
```
### PoolError

Resource pool operation failed.

```typescript
class PoolError extends SIEError {
  name: "PoolError";
  poolName?: string;
  state?: string;
}
```
### LoraLoadingError

LoRA adapter loading timed out.

```typescript
class LoraLoadingError extends SIEError {
  name: "LoraLoadingError";
  lora?: string;
  model?: string;
}
```
### Handling Errors

```typescript
import { SIEClient, RequestError, ProvisioningError } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

try {
  const result = await client.encode("unknown-model", { text: "test" });
} catch (error) {
  if (error instanceof RequestError) {
    console.log(`Invalid request: ${error.code} (${error.statusCode})`);
  } else if (error instanceof ProvisioningError) {
    console.log(`No capacity for GPU ${error.gpu}, retry after ${error.retryAfter}ms`);
  }
}
```
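Since `ProvisioningError` carries a `retryAfter` hint, a retry wrapper is a natural pattern. A sketch (`withRetry` is a hypothetical helper, not part of the SDK):

```typescript
// Hypothetical helper, not part of the SDK: retry an async operation when it
// fails with an error carrying a numeric retryAfter hint (in milliseconds),
// as ProvisioningError does. Rethrows on other errors or when attempts run out.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const retryAfter = (error as { retryAfter?: number }).retryAfter;
      if (attempt >= maxAttempts || typeof retryAfter !== "number") throw error;
      await new Promise((resolve) => setTimeout(resolve, retryAfter));
    }
  }
}
```

For example, `withRetry(() => client.encode("BAAI/bge-m3", items))` would retry up to three times on provisioning backpressure.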
## GPU Routing

For cluster deployments with multiple GPU types, specify the target GPU:
```typescript
// Per-request GPU selection
const result = await client.encode(
  "BAAI/bge-m3",
  items,
  { gpu: "a100-80gb" }
);
```

```typescript
// Default GPU for all requests
const client = new SIEClient("http://router.example.com", {
  gpu: "l4",
});
```

Available GPU types depend on your cluster configuration.
## Resource Pools

Create isolated worker sets for testing or tenant isolation:
```typescript
import { SIEClient } from "@sie/sdk";

const client = new SIEClient("http://router.example.com");
await client.createPool("my-test-pool", { l4: 2, "a100-40gb": 1 });

// Route requests to the pool
const result = await client.encode(
  "BAAI/bge-m3",
  items,
  { gpu: "my-test-pool/l4" }
);

// Check pool status
const pool = await client.getPool("my-test-pool");
console.log(`Pool state: ${pool?.status.state}`);
console.log(`Workers: ${pool?.status.assignedWorkers.length}`);

// Clean up
await client.deletePool("my-test-pool");
await client.close();
```
## Complete Example

```typescript
import { SIEClient } from "@sie/sdk";

// Initialize client
const client = new SIEClient("http://localhost:8080", { timeout: 60000 });

// Dense embeddings
const documents = [
  "Machine learning is a subset of artificial intelligence.",
  "Python is a popular programming language.",
  "Neural networks are inspired by the human brain.",
];

const embeddings = await client.encode(
  "BAAI/bge-m3",
  documents.map((text, i) => ({ id: `doc-${i}`, text }))
);

// Store in vector database
for (const result of embeddings) {
  if (result.dense) {
    // vectorDb.insert(result.id, result.dense);
    console.log(`Stored ${result.id}: ${result.dense.length} dimensions`);
  }
}

// Query with reranking
const query = { text: "What is machine learning?" };

// Stage 1: Vector search
const queryEmb = await client.encode("BAAI/bge-m3", query, { isQuery: true });
// const candidates = await vectorDb.search(queryEmb.dense, { topK: 100 });

// Stage 2: Rerank (using documents directly for this example)
const rerankResult = await client.score(
  "BAAI/bge-reranker-v2-m3",
  query,
  documents.map((text, i) => ({ id: `doc-${i}`, text }))
);

// Top results
console.log("\nTop results:");
for (const entry of rerankResult.scores.slice(0, 3)) {
  console.log(`  ${entry.rank + 1}. ${entry.itemId} (score: ${entry.score.toFixed(3)})`);
}

// Entity extraction
const extractResult = await client.extract(
  "urchade/gliner_multi-v2.1",
  { text: "Elon Musk founded SpaceX and leads Tesla." },
  { labels: ["person", "organization"] }
);

console.log("\nExtracted entities:");
for (const entity of extractResult.entities) {
  console.log(`  ${entity.label}: ${entity.text}`);
}

// Clean up
await client.close();
```