TypeScript SDK Reference

The TypeScript SDK provides an async client for interacting with the SIE server from Node.js and browser environments.

Install with pnpm:
pnpm add @sie/sdk

Or with npm:

npm install @sie/sdk

The SIEClient class is an async client for the SIE server. All methods return Promises.

import { SIEClient } from "@sie/sdk";

const client = new SIEClient(
  baseUrl: string,              // Server URL (e.g., "http://localhost:8080")
  options?: {
    timeout?: number,           // Request timeout in milliseconds (default: 30000)
    apiKey?: string,            // API key for authentication
    gpu?: string,               // Default GPU type for routing
    pool?: PoolSpec,            // Resource pool configuration
    waitForCapacity?: boolean,  // Auto-retry on 202 (default: false)
    provisionTimeout?: number,  // Max wait for provisioning in ms (default: 300000)
  }
);
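
For example, a client with authentication, a longer timeout, and automatic retry while capacity scales up (the URL and key here are placeholders):

import { SIEClient } from "@sie/sdk";

// Illustrative values; adjust for your deployment.
const client = new SIEClient("http://localhost:8080", {
  timeout: 60000,         // allow up to 60s per request
  apiKey: process.env.SIE_API_KEY,
  waitForCapacity: true,  // auto-retry on 202 while workers provision
});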

Generate embeddings.

async encode(
  model: string,                 // Model name
  items: Item | Item[],          // Items to encode
  options?: {
    outputTypes?: OutputType[],  // ["dense", "sparse", "multivector"]
    instruction?: string,        // Task instruction for instruction-tuned models
    outputDtype?: DType,         // "float32", "float16", "int8", "binary"
    isQuery?: boolean,           // Query vs. document encoding
    gpu?: string,                // GPU routing
    waitForCapacity?: boolean,   // Wait for scale-up
  }
): Promise<EncodeResult | EncodeResult[]>

Returns: a single EncodeResult if a single item was passed, otherwise an array.

Example:

// Single item
const result = await client.encode("BAAI/bge-m3", { text: "Hello" });
console.log(result.dense?.slice(0, 5)); // Float32Array

// Batch
const results = await client.encode("BAAI/bge-m3", [
  { text: "First" },
  { text: "Second" },
]);
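
BAAI/bge-m3 can emit dense, sparse, and multivector representations; the outputTypes option selects which ones to return. A sketch (check listModels() for what your deployment supports):

const r = await client.encode(
  "BAAI/bge-m3",
  { text: "Hello" },
  { outputTypes: ["dense", "sparse"] }
);
console.log(r.sparse?.indices.length, "sparse tokens");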

Rerank items against a query using a cross-encoder or late interaction model. Returns items sorted by relevance score (highest first).

async score(
  model: string,   // Model name (e.g., "BAAI/bge-reranker-v2-m3")
  query: Item,     // Query item with text or multivector
  items: Item[],   // Items to score against the query
  options?: {
    topK?: number, // Return only the top K results
    gpu?: string,
    waitForCapacity?: boolean,
  }
): Promise<ScoreResult>

Example:

const result = await client.score(
  "BAAI/bge-reranker-v2-m3",
  { text: "What is Python?" },
  [{ text: "Python is..." }, { text: "Java is..." }]
);

// Scores are sorted by relevance (rank 0 = most relevant)
for (const entry of result.scores) {
  console.log(`Rank ${entry.rank}: ${entry.score.toFixed(3)}`);
}

Note: For ColBERT-style models, you can pass pre-computed multivectors to score client-side without a server round-trip. See the Scoring Utilities section.

Extract entities or structured data from text. Supports Named Entity Recognition (NER) models like GLiNER.

async extract(
  model: string,           // Model name (e.g., "urchade/gliner_multi-v2.1")
  items: Item | Item[],    // Items to extract from
  options: {
    labels: string[],      // Entity types to extract (e.g., ["person", "org"])
    threshold?: number,    // Minimum confidence (0-1)
    gpu?: string,
    waitForCapacity?: boolean,
  }
): Promise<ExtractResult | ExtractResult[]>

Returns: a single ExtractResult if a single item was passed, otherwise an array.

Example:

const result = await client.extract(
  "urchade/gliner_multi-v2.1",
  { text: "Tim Cook leads Apple." },
  { labels: ["person", "organization"] }
);

for (const entity of result.entities) {
  console.log(`${entity.label}: ${entity.text} (score: ${entity.score.toFixed(2)})`);
}
// Output:
// person: Tim Cook (score: 0.95)
// organization: Apple (score: 0.92)
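
To drop low-confidence spans, pass the threshold option documented above; for example (0.5 is an arbitrary illustrative cutoff):

const strict = await client.extract(
  "urchade/gliner_multi-v2.1",
  { text: "Tim Cook leads Apple." },
  { labels: ["person", "organization"], threshold: 0.5 }
);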

Get available models.

async listModels(): Promise<ModelInfo[]>

Example:

const models = await client.listModels();
for (const model of models) {
  console.log(`${model.name}: ${model.outputs.join(", ")}`);
}
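
Continuing the example, the outputs field makes it easy to find models with a particular capability, e.g. multivector support:

// Models that can produce per-token (multivector) embeddings
const colbertLike = models.filter((m) => m.outputs.includes("multivector"));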

Get cluster capacity information.

async getCapacity(gpu?: string): Promise<CapacityInfo>

Example:

const capacity = await client.getCapacity();
console.log(`Workers: ${capacity.workerCount}, GPUs: ${capacity.liveGpuTypes}`);

// Check if L4 GPUs are available
const l4Capacity = await client.getCapacity("l4");
if (l4Capacity.workerCount > 0) {
  console.log("L4 workers available");
}

Wait for GPU capacity to become available. This is useful for pre-warming the cluster before running benchmarks.

async waitForCapacity(
  gpu: string,
  options?: {
    model?: string,         // If provided, sends a warmup encode request
    timeout?: number,       // Default: 300000ms
    pollInterval?: number,  // Default: 5000ms
  }
): Promise<CapacityInfo>

Example:

// Wait for L4 capacity before running benchmarks
const capacity = await client.waitForCapacity("l4", { timeout: 300000 });
console.log(`Ready with ${capacity.workerCount} L4 workers`);

// Wait and pre-load a model
const capacityWithModel = await client.waitForCapacity("l4", { model: "BAAI/bge-m3" });
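
The Errors section documents ProvisioningError as covering timeouts while waiting for scale-up, so a guarded variant might look like this (a sketch):

import { SIEClient, ProvisioningError } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");
try {
  await client.waitForCapacity("l4", { timeout: 60000 });
} catch (error) {
  if (error instanceof ProvisioningError) {
    console.log(`No l4 capacity within the timeout (gpu: ${error.gpu})`);
  }
}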

Close the client and clean up resources.

async close(): Promise<void>
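
A common pattern is to guarantee cleanup with try/finally; a minimal sketch:

import { SIEClient } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");
try {
  await client.encode("BAAI/bge-m3", { text: "Hello" });
} finally {
  await client.close(); // release resources even if the request failed
}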

Input item for encode, score, and extract operations.

interface Item {
  id?: string;                         // Client-provided ID (echoed in the response)
  text?: string;                       // Text content
  images?: Uint8Array[];               // Image data as byte arrays (for multimodal models)
  multivector?: Float32Array[];        // Pre-computed vectors (for client-side MaxSim)
  metadata?: Record<string, unknown>;  // Custom metadata
}

Common patterns:

// Simple text
{ text: "Hello world" }

// With ID for tracking
{ id: "doc-1", text: "Document text" }

// Multimodal (for CLIP, ColPali, etc.)
{ text: "Description", images: [imageBytes] }

interface EncodeResult {
  id?: string;                   // Echoed item ID
  dense?: Float32Array;          // Dense embedding
  sparse?: SparseResult;         // Sparse embedding
  multivector?: Float32Array[];  // Per-token embeddings
  timing?: TimingInfo;           // Timing breakdown
}

interface SparseResult {
  indices: Int32Array;   // Token IDs
  values: Float32Array;  // Token weights
}
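
The parallel indices/values arrays pair each token ID with its weight. If a downstream store expects a plain object, a small helper like this hypothetical one converts it (assuming the SparseResult type is exported):

import type { SparseResult } from "@sie/sdk";

// token ID -> weight map (sketch; not part of the SDK)
function sparseToRecord(s: SparseResult): Record<number, number> {
  const out: Record<number, number> = {};
  for (let i = 0; i < s.indices.length; i++) {
    out[s.indices[i]] = s.values[i];
  }
  return out;
}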

interface ScoreResult {
  model?: string;        // Model used for scoring
  queryId?: string;      // Query ID (if provided in the request)
  scores: ScoreEntry[];  // Sorted by score, descending
}

interface ScoreEntry {
  itemId: string;  // ID of the item
  score: number;   // Relevance score
  rank: number;    // Position (0 = most relevant)
}

interface ExtractResult {
  id?: string;         // Echoed item ID
  entities: Entity[];  // Extracted entities
}

interface Entity {
  text: string;     // Extracted span
  label: string;    // Entity type
  score: number;    // Confidence (0-1)
  start?: number;   // Start character offset
  end?: number;     // End character offset
  bbox?: number[];  // Bounding box [x, y, width, height] for vision models
}

interface ModelInfo {
  name: string;                // Model name/identifier
  loaded: boolean;             // Whether model weights are in memory
  inputs: string[];            // Input types: ["text"], ["text", "image"], etc.
  outputs: string[];           // Output types: ["dense"], ["dense", "sparse"], etc.
  dims?: ModelDims;            // Dimension info for each output type
  maxSequenceLength?: number;  // Maximum input sequence length
}

interface CapacityInfo {
  status: string;                // "healthy", "degraded", "no_workers"
  workerCount: number;           // Number of healthy workers
  gpuCount: number;              // Number of GPUs available
  modelsLoaded: number;          // Unique models loaded across workers
  configuredGpuTypes: string[];  // GPU types configured in the cluster
  liveGpuTypes: string[];        // GPU types currently running
  workers: WorkerInfo[];         // Worker details
}

interface TimingInfo {
  totalMs?: number;         // Total request time
  queueMs?: number;         // Time waiting in queue
  tokenizationMs?: number;  // Tokenization time
  inferenceMs?: number;     // Model inference time
}

type OutputType = "dense" | "sparse" | "multivector";
type DType = "float32" | "float16" | "bfloat16" | "int8" | "uint8" | "binary" | "ubinary";

// Convert typed arrays to regular number arrays (for JSON serialization)
function toNumberArray(arr: Float32Array | Int32Array): number[];

// Convert a number array to a Float32Array
function toFloat32Array(arr: number[]): Float32Array;
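
For example, round-tripping an embedding through JSON (typed arrays do not serialize as plain arrays):

import { toNumberArray, toFloat32Array } from "@sie/sdk";

// Convert before stringifying...
const dense = new Float32Array([0.1, 0.2, 0.3]);
const json = JSON.stringify(toNumberArray(dense));

// ...and back after parsing.
const restored = toFloat32Array(JSON.parse(json));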

Client-side scoring for multi-vector embeddings.

Compute MaxSim scores for ColBERT-style retrieval. MaxSim finds the maximum similarity between each query token and any document token, then sums these maximums.

function maxsim(
  query: Float32Array[],    // [numQueryTokens][dim]
  document: Float32Array[]  // [numDocTokens][dim]
): number

Example:

import { SIEClient, maxsim } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

// Encode the query with isQuery=true for ColBERT models
const queryResult = await client.encode(
  "jinaai/jina-colbert-v2",
  { text: "What is ColBERT?" },
  { outputTypes: ["multivector"], isQuery: true }
);

// Encode documents (no isQuery needed for documents)
const docResults = await client.encode(
  "jinaai/jina-colbert-v2",
  documents,
  { outputTypes: ["multivector"] }
);

// Compute MaxSim scores client-side
const queryMv = queryResult.multivector!;
const scores = docResults.map((r) => maxsim(queryMv, r.multivector!));

// Rank by score (higher is more relevant)
const ranked = scores
  .map((score, idx) => ({ score, idx }))
  .sort((a, b) => b.score - a.score);

Score a query against multiple documents.

function maxsimDocuments(
  query: Float32Array[],
  documents: Float32Array[][]
): number[]
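
A sketch of its use, where queryMv and docMvs are assumed to hold multivectors obtained from encode() as in the example above:

import { maxsimDocuments } from "@sie/sdk";

// One score per document, in the same order as the input array
const docScores = maxsimDocuments(queryMv, docMvs);
const bestIdx = docScores.indexOf(Math.max(...docScores));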

Batch version for multiple queries against multiple documents.

function maxsimBatch(
  queries: Float32Array[][],
  documents: Float32Array[][]
): Float32Array  // Flattened [numQueries * numDocuments]
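
The shape annotation suggests a query-major layout; assuming that (an assumption, not confirmed above), the score of query q against document d sits at index q * documents.length + d. Here queries, documents, q, and d are placeholders:

import { maxsimBatch } from "@sie/sdk";

const scores = maxsimBatch(queries, documents);
// Score of query q vs. document d, assuming query-major flattening
const s = scores[q * documents.length + d];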

Exception hierarchy for SDK errors.

Base class for all SDK errors.

class SIEError extends Error {
  name: "SIEError";
}

Cannot connect to the server.

class SIEConnectionError extends SIEError {
  name: "SIEConnectionError";
}

Invalid request (4xx responses).

class RequestError extends SIEError {
  name: "RequestError";
  code?: string;
  statusCode?: number;
}

Server error (5xx responses).

class ServerError extends SIEError {
  name: "ServerError";
  code?: string;
  statusCode?: number;
}

No capacity available, or timeout while waiting for scale-up.

class ProvisioningError extends SIEError {
  name: "ProvisioningError";
  gpu?: string;
  retryAfter?: number;
}

Resource pool operation failed.

class PoolError extends SIEError {
  name: "PoolError";
  poolName?: string;
  state?: string;
}

LoRA adapter loading timed out.

class LoraLoadingError extends SIEError {
  name: "LoraLoadingError";
  lora?: string;
  model?: string;
}

Example:

import { SIEClient, RequestError, ProvisioningError } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

try {
  const result = await client.encode("unknown-model", { text: "test" });
} catch (error) {
  if (error instanceof RequestError) {
    console.log(`Invalid request: ${error.code} (${error.statusCode})`);
  } else if (error instanceof ProvisioningError) {
    console.log(`No capacity for GPU ${error.gpu}, retry after ${error.retryAfter}ms`);
  }
}

For cluster deployments with multiple GPU types, specify the target GPU:

// Per-request GPU selection
const result = await client.encode(
  "BAAI/bge-m3",
  items,
  { gpu: "a100-80gb" }
);

// Default GPU for all requests
const client = new SIEClient("http://router.example.com", {
  gpu: "l4",
});

Available GPU types depend on your cluster configuration.


Create isolated worker sets for testing or tenant isolation:

import { SIEClient } from "@sie/sdk";

const client = new SIEClient("http://router.example.com");
await client.createPool("my-test-pool", { l4: 2, "a100-40gb": 1 });

// Route requests to the pool
const result = await client.encode(
  "BAAI/bge-m3",
  items,
  { gpu: "my-test-pool/l4" }
);

// Check pool status
const pool = await client.getPool("my-test-pool");
console.log(`Pool state: ${pool?.status.state}`);
console.log(`Workers: ${pool?.status.assignedWorkers.length}`);

// Clean up
await client.deletePool("my-test-pool");
await client.close();

Complete example:

import { SIEClient } from "@sie/sdk";

// Initialize the client
const client = new SIEClient("http://localhost:8080", { timeout: 60000 });

// Dense embeddings
const documents = [
  "Machine learning is a subset of artificial intelligence.",
  "Python is a popular programming language.",
  "Neural networks are inspired by the human brain.",
];
const embeddings = await client.encode(
  "BAAI/bge-m3",
  documents.map((text, i) => ({ id: `doc-${i}`, text }))
);

// Store in a vector database
for (const result of embeddings) {
  if (result.dense) {
    // vectorDb.insert(result.id, result.dense);
    console.log(`Stored ${result.id}: ${result.dense.length} dimensions`);
  }
}

// Query with reranking
const query = { text: "What is machine learning?" };

// Stage 1: Vector search
const queryEmb = await client.encode("BAAI/bge-m3", query, { isQuery: true });
// const candidates = await vectorDb.search(queryEmb.dense, { topK: 100 });

// Stage 2: Rerank (using the documents directly for this example)
const rerankResult = await client.score(
  "BAAI/bge-reranker-v2-m3",
  query,
  documents.map((text, i) => ({ id: `doc-${i}`, text }))
);

// Top results
console.log("\nTop results:");
for (const entry of rerankResult.scores.slice(0, 3)) {
  console.log(`  ${entry.rank + 1}. ${entry.itemId} (score: ${entry.score.toFixed(3)})`);
}

// Entity extraction
const extractResult = await client.extract(
  "urchade/gliner_multi-v2.1",
  { text: "Elon Musk founded SpaceX and leads Tesla." },
  { labels: ["person", "organization"] }
);

console.log("\nExtracted entities:");
for (const entity of extractResult.entities) {
  console.log(`  ${entity.label}: ${entity.text}`);
}

// Clean up
await client.close();