From c357e0b5a5ecfe93cc7d5e838da99c118f982ad6 Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Tue, 9 Jun 2026 17:42:45 -0700 Subject: [PATCH 1/7] Document the 5.1 AI & Models feature set New reference section (models/): overview + configuration, embed/generate/ generateStream API, tool calling and the toolMode 'auto' agent loop, the four bundled backends (ollama, openai, anthropic, bedrock), and model-call analytics. Adds the @embed directive and 5.1 vector-indexing additions (int8 quantization, per-query ef, auto-scaled search ef, dotProduct distance) to the schema reference, corrects the HNSW search parameter name (efConstructionSearch, previously documented as efSearchConstruction), and starts the 5.1 release notes. Co-Authored-By: Claude Fable 5 --- reference/database/schema.md | 84 +++++++++++++++++-- reference/models/analytics.md | 50 ++++++++++++ reference/models/api.md | 109 +++++++++++++++++++++++++ reference/models/backends.md | 136 +++++++++++++++++++++++++++++++ reference/models/overview.md | 64 +++++++++++++++ reference/models/tool-calling.md | 104 +++++++++++++++++++++++ release-notes/v5-lincoln/5.1.md | 47 +++++++++++ sidebarsReference.ts | 33 ++++++++ 8 files changed, 618 insertions(+), 9 deletions(-) create mode 100644 reference/models/analytics.md create mode 100644 reference/models/api.md create mode 100644 reference/models/backends.md create mode 100644 reference/models/overview.md create mode 100644 reference/models/tool-calling.md create mode 100644 release-notes/v5-lincoln/5.1.md diff --git a/reference/database/schema.md b/reference/database/schema.md index cb1792d8..20c20568 100644 --- a/reference/database/schema.md +++ b/reference/database/schema.md @@ -227,6 +227,34 @@ If the field value is an array, each element in the array is individually indexe Null values are indexed by default (added in v4.3.0), enabling queries like `GET /Product/?category=null`. 
+### `@embed` + + + +Automatically computes an embedding vector for the attribute whenever the source field is written, using a configured [embedding model](../models/overview): + +```graphql +type Document @table { + id: Long @primaryKey + text: String + embedding: [Float] @embed(source: "text", model: "default") +} +``` + +- `source` — the name of the field to embed. Must be a declared field on the same type, passed as a string literal. +- `model` — the logical name of a configured embedding model, passed as a string literal. + +The attribute type must be `[Float]`. The attribute is automatically indexed with an [HNSW vector index](#vector-indexing), so it is immediately searchable by similarity; an explicit `@indexed` on the same attribute is allowed only if it is also HNSW. + +Write semantics: + +- Creating a record with the source field, or updating the source field, computes the vector before the write commits (with `inputType: 'document'`). A failure to compute the embedding fails the write. +- An update that does not touch the source field leaves the vector unchanged. +- Setting the source field to `null` sets the vector to `null`. +- Replicated writes and audit-log replays do not re-embed — the vector travels with the record, and only the node that accepted the original write calls the model. + +Multiple `@embed` attributes on one type are computed concurrently. + ### `@createdTime` Automatically assigns a creation timestamp (Unix epoch milliseconds) to the attribute when a record is created. @@ -393,6 +421,8 @@ type Document @table { } ``` +Embedding vectors can also be computed automatically at write time from a text field with the [`@embed` directive](#embed), which creates the HNSW index implicitly. + Query by nearest neighbors using the `sort` parameter: ```javascript @@ -443,26 +473,62 @@ let results = Document.search({ `$distance` is available in both `sort`-based ranking and `conditions`-based threshold queries. 
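To make the `$distance` values easier to interpret, here is a minimal sketch of the textbook definitions behind the three distance names. It is an illustration only, not Harper's internal code: the cosine form follows the "negative cosine similarity" wording in the HNSW parameter table, while the negated dot product and the unsquared Euclidean form are assumptions, and Harper's exact scaling may differ.

```javascript
// Illustrative distance functions; smaller values mean "closer".
// These are textbook definitions, not Harper's internal implementation.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Straight-line distance between two points.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Negative cosine similarity: -1 for identical directions, 1 for opposite ones.
function cosineDistance(a, b) {
  return -dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Assumption: negated so that a larger dot product sorts as "nearer".
function dotProductDistance(a, b) {
  return -dot(a, b);
}
```

Cosine distance ignores vector length, so it suits embeddings that are not normalized; the dot product rewards both alignment and magnitude.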
+### Per-Query Search Options + +The `sort` descriptor (and threshold condition) accepts options that tune an individual query: + +```javascript +let results = Document.search({ + sort: { attribute: 'textEmbeddings', target: searchVector, distance: 'dotProduct', ef: 200 }, + limit: 5, +}); +``` + +- `distance` — overrides the index's distance function for this query: `"cosine"`, `"euclidean"`, or `"dotProduct"` (`dotProduct` added in v5.1.0). +- `ef` — overrides the search exploration budget for this query. Higher values improve recall at the cost of latency. + +When a query passes no `ef` and the index does not explicitly configure `efConstructionSearch` (or `efConstruction`), the search budget auto-scales with the size of the index, so recall holds as the table grows instead of decaying with a fixed budget. + ### HNSW Parameters -| Parameter | Default | Description | -| ---------------------- | ----------------- | --------------------------------------------------------------------------------------------------- | -| `distance` | `"cosine"` | Distance function: `"euclidean"` or `"cosine"` (negative cosine similarity) | -| `efConstruction` | `100` | Max nodes explored during index construction. Higher = better recall, lower = better performance | -| `M` | `16` | Preferred connections per graph layer. 
Higher = more space, better recall for high-dimensional data | -| `optimizeRouting` | `0.5` | Heuristic aggressiveness for omitting redundant connections (0 = off, 1 = most aggressive) | -| `mL` | computed from `M` | Normalization factor for level generation | -| `efSearchConstruction` | `50` | Max nodes explored during search | +| Parameter | Default | Description | +| ---------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `distance` | `"cosine"` | Distance function: `"cosine"` (negative cosine similarity), `"euclidean"`, or `"dotProduct"` (added in v5.1.0) | +| `efConstruction` | `100` | Max nodes explored during index construction. Higher = better recall, lower = better performance | +| `M` | `16` | Preferred connections per graph layer. Higher = more space, better recall for high-dimensional data | +| `optimizeRouting` | `0.5` | Heuristic aggressiveness for omitting redundant connections (0 = off, 1 = most aggressive) | +| `mL` | computed from `M` | Normalization factor for level generation | +| `efConstructionSearch` | auto-scaled | Max nodes explored during search. When unset, auto-scales with index size (see above); setting it (or `efConstruction`, which seeds it) fixes the budget | +| `quantization` | — | `"int8"` stores vectors quantized to int8 (added in v5.1.0, see below) | Example with custom parameters: ```graphql type Document @table { id: Long @primaryKey - textEmbeddings: [Float] @indexed(type: "HNSW", distance: "euclidean", optimizeRouting: 0, efSearchConstruction: 100) + textEmbeddings: [Float] @indexed(type: "HNSW", distance: "euclidean", optimizeRouting: 0, efConstructionSearch: 100) +} +``` + +Note: this parameter was previously documented as `efSearchConstruction`; the option name Harper reads is `efConstructionSearch`. 
+ +Changing `efConstructionSearch` on an existing index no longer triggers a rebuild; it only affects searches. Structural parameters (`distance`, `M`, `efConstruction`, `quantization`) still rebuild the index when changed. + +### Vector Quantization + + + +`quantization: "int8"` stores the index's vectors quantized to 8-bit integers, substantially reducing index size and memory traffic: + +```graphql +type Document @table { + id: Long @primaryKey + textEmbeddings: [Float] @indexed(type: "HNSW", quantization: "int8") } ``` +Graph navigation runs on the quantized (approximate) distances. For nearest-neighbor `sort` queries, Harper re-ranks the results against the full-precision vectors stored on the records, restoring exact ordering and exact `$distance` values. Distance-threshold (`lt`/`le`) queries currently filter on the approximate distance. + ## Field Types Harper supports the following field types: diff --git a/reference/models/analytics.md b/reference/models/analytics.md new file mode 100644 index 00000000..8c94f34a --- /dev/null +++ b/reference/models/analytics.md @@ -0,0 +1,50 @@ +--- +id: analytics +title: Analytics +--- + + + + + +Every model call is recorded for observability and usage accounting, at two levels of granularity: a per-call log table for forensics, and aggregate counters in Harper's [general analytics](../analytics/overview) for dashboards and trends. + +## Per-call log: `hdb_model_calls` + +Each `embed()`, `generate()`, and `generateStream()` call writes one row to the `hdb_model_calls` system table — on success and on failure. With `toolMode: 'auto'`, each backend round inside the loop records its own row (the outer loop itself does not add one). 
+ +| Field | Description | +| ------------------- | ----------------------------------------------------------------------------------------- | +| `tenant` | Tenant identifier, when the call carried one | +| `app` | Resource path of the calling resource, when called from one | +| `model` | Logical model name the caller used | +| `backend` | Backend that served the call (`ollama`, `openai`, …); `unknown` for pre-dispatch failures | +| `method` | `embed`, `generate`, or `generateStream` | +| `prompt_tokens` | Prompt token count, when the backend reported usage | +| `completion_tokens` | Completion token count, when the backend reported usage | +| `embedding_tokens` | Embedding token count, when the backend reported usage | +| `latency_ms` | Wall-clock call duration | +| `success` | Whether the call completed | +| `error_code` | On failure: `backend_error`, `aborted`, `capability_unsupported`, or `backend_not_found` | + +Rows are buffered in memory and flushed every 10 seconds, or immediately once 1,000 rows accumulate; rows older than 90 days are purged. Buffered rows may be lost on abrupt shutdown — treat the table as operational telemetry, not an audit log. + +Query it like any table, for example through the operations API: + +```json +{ + "operation": "search_by_conditions", + "database": "system", + "table": "hdb_model_calls", + "conditions": [{ "search_attribute": "success", "search_type": "equals", "search_value": false }] +} +``` + +## Aggregate metrics + +Each call also increments Harper's aggregate analytics (visible in `hdb_raw_analytics` alongside the other [analytics metrics](../analytics/overview)): + +- `model-embed`, `model-generate`, `model-generateStream` — call counts +- `model-embed-tokens`, `model-generate-tokens`, `model-generateStream-tokens` — token totals + +Metrics are broken down by backend name, so usage can be charted per provider. 
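Per-call rows can also be rolled up in application code when a custom breakdown is needed. The sketch below assumes the rows have already been fetched from `hdb_model_calls` (for instance with the query shown earlier) and uses only the field names from the per-call table; `summarizeByBackend` and the sample rows are hypothetical.

```javascript
// Summarize hdb_model_calls-style rows per backend.
function summarizeByBackend(rows) {
  const summary = {};
  for (const row of rows) {
    const entry = (summary[row.backend] ??= { calls: 0, failures: 0, tokens: 0 });
    entry.calls += 1;
    if (!row.success) entry.failures += 1;
    // Token fields are only present when the backend reported usage.
    entry.tokens += (row.prompt_tokens ?? 0) + (row.completion_tokens ?? 0) + (row.embedding_tokens ?? 0);
  }
  return summary;
}

// Hypothetical sample rows, shaped like the documented fields.
const sampleRows = [
  { backend: 'openai', method: 'generate', prompt_tokens: 12, completion_tokens: 30, success: true },
  { backend: 'openai', method: 'generate', success: false, error_code: 'backend_error' },
  { backend: 'ollama', method: 'embed', embedding_tokens: 8, success: true },
];

console.log(summarizeByBackend(sampleRows));
```

Because buffered rows may be lost on abrupt shutdown, totals computed this way are best treated as approximate.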
diff --git a/reference/models/api.md b/reference/models/api.md new file mode 100644 index 00000000..d9355d6e --- /dev/null +++ b/reference/models/api.md @@ -0,0 +1,109 @@ +--- +id: api +title: API +--- + + + + + +The `models` object exposes three methods. All of them accept an optional `model` option naming the configured logical model to use; when omitted, the logical name `default` is used. Calling a logical name with no configured backend, or asking a backend for a capability it does not support (for example, embeddings from a generation-only backend), throws an error — capability checks run up front, before any request is made. + +## embed() + +```typescript +models.embed(input: string | string[], options?: EmbedOpts): Promise<Float32Array[]> +``` + +Converts one or more strings into embedding vectors. The result is always an array of `Float32Array`, one per input string, in input order — including when a single string is passed. + +```javascript +import { models } from 'harper'; + +const [single] = await models.embed('What is Harper?', { inputType: 'query' }); +const batch = await models.embed(['first document', 'second document']); +``` + +| Option | Type | Default | Description | +| ----------- | ------------------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------- | +| `model` | `string` | `'default'` | Logical name of a configured embedding model | +| `inputType` | `'document'` \| `'query'` | — | Hint for models that distinguish document embeddings from query embeddings (e.g. `nomic-embed-text`); ignored by models that do not | +| `signal` | `AbortSignal` | — | Cancels the call; composed with the backend's configured `requestTimeoutMs` | + +## generate() + +```typescript +models.generate(input: GenerateInput, options?: GenerateOpts): Promise<GenerateResult> +``` + +Generates a completion. 
The input may be: + +- a `string` — shorthand for a single user message, +- an array of messages: `{ role: 'system' | 'user' | 'assistant' | 'tool', content: string }`, +- an object `{ messages, tools?, system? }` — the form required to declare [tools](./tool-calling) or pass a system prompt alongside the messages. + +```javascript +const result = await models.generate( + [ + { role: 'system', content: 'You are a terse assistant.' }, + { role: 'user', content: 'What is an HNSW index?' }, + ], + { temperature: 0.2, maxTokens: 300 } +); +console.log(result.content); +``` + +| Option | Type | Default | Description | +| ---------------- | -------------------------------------------- | ----------- | ------------------------------------------------------------------------------------------------------ | +| `model` | `string` | `'default'` | Logical name of a configured generative model | +| `temperature` | `number` | backend | Sampling temperature, passed through to the backend | +| `maxTokens` | `number` | backend | Completion token limit, passed through to the backend | +| `responseFormat` | `'text'` \| `'json'` \| `{ schema: object }` | `'text'` | Structured output. `{ schema }` requests output conforming to a JSON Schema; support varies by backend | +| `toolMode` | `'return'` \| `'auto'` | `'return'` | How tool calls are handled — see [Tool Calling](./tool-calling) | +| `signal` | `AbortSignal` | — | Cancels the call; composed with the backend's configured `requestTimeoutMs` | + +Additional options apply only when `toolMode: 'auto'`; they are documented in [Tool Calling](./tool-calling). 
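The three input forms accepted by `generate()` all describe the same thing: a list of messages, optionally with tools and a system prompt. As a rough illustration of how they relate, the hypothetical helper below reduces each form to the object shape. `normalizeGenerateInput` is not part of the `models` API; it is only a sketch of the equivalence described above.

```javascript
// Hypothetical helper: reduce the three accepted input forms to one object shape.
function normalizeGenerateInput(input) {
  if (typeof input === 'string') {
    // A bare string is shorthand for a single user message.
    return { messages: [{ role: 'user', content: input }] };
  }
  if (Array.isArray(input)) {
    // An array is already a list of { role, content } messages.
    return { messages: input };
  }
  if (input && Array.isArray(input.messages)) {
    // Already the object form: { messages, tools?, system? }.
    return input;
  }
  throw new TypeError('Unsupported generate() input');
}

console.log(normalizeGenerateInput('What is an HNSW index?'));
```

Code that wraps `generate()` (for logging or prompt templating, say) can accept the same three forms this way without special-casing its callers.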
+ +### GenerateResult + +| Field | Type | Description | +| -------------- | -------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | +| `content` | `string` | The generated text | +| `finishReason` | `'stop'` \| `'length'` \| `'tool_calls'` \| `'content_filter'` | Why generation stopped, normalized across backends | +| `toolCalls` | `ToolCall[]` | Tool calls the model requested, when `finishReason` is `'tool_calls'` (each `{ id, name, arguments }`, with `arguments` parsed to an object) | +| `usage` | `TokenUsage` | Token usage reported by the backend (`promptTokens`, `completionTokens`, …), when available | +| `trace` | `ToolTraceEntry[]` | Per-tool-invocation trace; only populated by the `toolMode: 'auto'` loop — see [Tool Calling](./tool-calling) | + +## generateStream() + +```typescript +models.generateStream(input: GenerateInput, options?: GenerateOpts): AsyncIterable +``` + +Identical to `generate()` but yields the completion incrementally: + +```javascript +let text = ''; +for await (const chunk of models.generateStream('Write a haiku about databases.')) { + if (chunk.deltaContent) text += chunk.deltaContent; +} +``` + +Each chunk may carry: + +| Field | Type | Description | +| ---------------- | ------------------------------- | ---------------------------------------------------------------------------------------------------- | +| `deltaContent` | `string` | Text appended since the previous chunk | +| `deltaToolCalls` | `Partial<ToolCall>[]` | Tool-call deltas; a backend may deliver the same tool call across several chunks with partial fields | +| `finishReason` | same values as `GenerateResult` | Set on the final chunk only | + +Errors detected before the call starts (unknown model name, missing capability) throw synchronously; errors during generation propagate through the iterable. 
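Because mid-generation errors surface through the iterable, a consumer that wants to keep partial output needs its `try`/`catch` around the `for await` loop itself. The sketch below uses a stand-in async generator (`fakeStream`, hypothetical) in place of `models.generateStream()` so that it runs on its own; the chunk fields are the ones listed in the table above.

```javascript
// Stand-in for models.generateStream(): yields two chunks, then fails mid-stream.
async function* fakeStream() {
  yield { deltaContent: 'Rows and ' };
  yield { deltaContent: 'columns' };
  throw new Error('backend connection lost');
}

// Collect text from a chunk stream, keeping whatever arrived before a failure.
async function collect(stream) {
  let text = '';
  let finishReason;
  try {
    for await (const chunk of stream) {
      if (chunk.deltaContent) text += chunk.deltaContent;
      if (chunk.finishReason) finishReason = chunk.finishReason;
    }
    return { text, finishReason, error: null };
  } catch (error) {
    // The error was raised by the iterable; text holds the partial completion.
    return { text, finishReason, error };
  }
}

collect(fakeStream()).then((result) => {
  console.log(result.text, '|', result.error ? result.error.message : 'no error');
});
```

A missing `finishReason` alongside a non-null `error` indicates the stream ended abnormally rather than completing.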
+ +## Errors and timeouts + +- An unconfigured logical model name throws a not-found error. The error names the missing logical name only — it does not enumerate configured names. +- A capability mismatch (embedding call to a generation-only backend, tool declarations against a backend without tool support) throws before any request is made. +- Each backend supports a `requestTimeoutMs` configuration field; when set, it is composed with any caller-provided `signal` so whichever fires first cancels the request. +- Backend/network failures throw backend-specific errors with sanitized messages. + +Every call — successful or failed — is recorded in the [model-call analytics](./analytics). diff --git a/reference/models/backends.md b/reference/models/backends.md new file mode 100644 index 00000000..262f5393 --- /dev/null +++ b/reference/models/backends.md @@ -0,0 +1,136 @@ +--- +id: backends +title: Backends +--- + + + + + +Four model backends ship with Harper. Each model entry in the [`models` configuration](./overview#configuration) selects one with its `backend` field. + +| Backend | Embeddings | Generation | Streaming | Tools | +| ----------- | ---------- | ---------- | --------- | --------------- | +| `ollama` | ✓ | ✓ | ✓ | — | +| `openai` | ✓ | ✓ | ✓ | ✓ | +| `anthropic` | — | ✓ | ✓ | ✓ | +| `bedrock` | ✓ | ✓ | ✓ | varies by model | + +All backends support these common fields: + +| Field | Description | +| ------------------ | ---------------------------------------------------------------------------------------------------- | +| `backend` | Which backend to use (required) | +| `model` | Provider-side model identifier (e.g. `gpt-4o`) used when a call does not pass its own `model` option | +| `requestTimeoutMs` | Per-request timeout in milliseconds; composed with any caller-provided `AbortSignal` | + +## Ollama + +Calls a local or remote [Ollama](https://ollama.com) server. No credentials. 
+ +```yaml +models: + embedding: + default: + backend: ollama + host: localhost:11434 + model: nomic-embed-text:latest + generative: + local: + backend: ollama + host: ollama.internal:11434 + model: mistral:7b +``` + +| Field | Default | Description | +| ------- | ----------------- | --------------------------------------------------------------------------------------------------------------- | +| `host` | `localhost:11434` | Ollama server origin. A scheme-less value is treated as `http://`; a full origin (`https://…`) is used as given | +| `model` | — | Ollama model name, e.g. `nomic-embed-text:latest`, `mistral:7b` | + +When embedding with `nomic-embed-text`, the `inputType` option (`'document'` / `'query'`) is applied using the model's task prefixes; other models ignore it. + +The Ollama backend does not advertise tool support — declaring tools against it fails up front. + +## OpenAI + +Calls the OpenAI API — or any service exposing an OpenAI-compatible API with bearer-token authentication, by pointing `baseUrl` at it. This includes vLLM's OpenAI-compatible server, Google's Gemini OpenAI-compatible endpoint, Azure OpenAI's `/openai/v1` endpoint, and hosted gateways such as OpenRouter or Together AI. + +```yaml +models: + embedding: + default: + backend: openai + apiKey: ${OPENAI_API_KEY} + model: text-embedding-3-large + generative: + default: + backend: openai + apiKey: ${OPENAI_API_KEY} + model: gpt-4o + vllm: + backend: openai + apiKey: ${VLLM_API_KEY} + baseUrl: http://vllm.internal:8000/v1 + model: meta-llama/Llama-3.1-8B-Instruct +``` + +| Field | Default | Description | +| -------------- | --------------------------- | ---------------------------------------------------------------------------------- | +| `apiKey` | — (required) | API key, sent as a bearer token. Use `${VAR}` indirection | +| `baseUrl` | `https://api.openai.com/v1` | API root; point at any OpenAI-compatible endpoint | +| `model` | — | Model name, e.g. 
`gpt-4o`, `text-embedding-3-large` | +| `organization` | — | Sent as the `OpenAI-Organization` header, for keys spanning multiple organizations | + +`responseFormat: 'json'` maps to OpenAI's JSON mode and `responseFormat: { schema }` to strict structured outputs (`json_schema`); OpenAI-compatible servers vary in their support for these. + +## Anthropic + +Calls the Anthropic Messages API. Generation only — Anthropic does not offer an embeddings API. + +```yaml +models: + generative: + claude: + backend: anthropic + apiKey: ${ANTHROPIC_API_KEY} + model: claude-sonnet-4-6 +``` + +| Field | Default | Description | +| --------- | --------------------------- | --------------------------------------- | +| `apiKey` | — (required) | API key, sent as the `x-api-key` header | +| `baseUrl` | `https://api.anthropic.com` | API root | +| `model` | — | Model name, e.g. `claude-sonnet-4-6` | + +The Anthropic API requires a completion token limit on every request; when a call does not pass `maxTokens`, Harper sends `4096`. + +## Amazon Bedrock + +Calls AWS Bedrock. Credentials come from the standard AWS SDK chain (environment variables, shared credentials file, IAM instance/task roles) — there is no `apiKey` field. + +The AWS SDK is not bundled with Harper. 
Install it in your project to use this backend: + +```bash +npm install @aws-sdk/client-bedrock-runtime +``` + +```yaml +models: + embedding: + titan: + backend: bedrock + region: us-east-1 + model: amazon.titan-embed-text-v2:0 + generative: + claude: + backend: bedrock + region: us-east-1 + model: anthropic.claude-sonnet-4-5-20250929-v1:0 +``` + +| Field | Default | Description | +| -------- | ------------ | ---------------------------------------------------------------------- | +| `region` | — (required) | AWS region hosting the Bedrock models | +| `model` | — | Bedrock model identifier; the vendor prefix selects the request format | + +The model identifier's vendor prefix (`anthropic.`, `meta.`, `amazon.titan-`, `cohere.`) determines the request/response format Harper uses. Tool support depends on the underlying model family. Bedrock embedding APIs accept one text per request, so batch `embed()` calls are issued sequentially. diff --git a/reference/models/overview.md b/reference/models/overview.md new file mode 100644 index 00000000..b89c58d6 --- /dev/null +++ b/reference/models/overview.md @@ -0,0 +1,64 @@ +--- +id: overview +title: Models +--- + + + + + +Harper provides a unified API for calling AI models — text embeddings and text generation — from application code. Models are configured by an operator under logical names; application code requests a model by its logical name and Harper routes the call to the configured backend (Ollama, OpenAI, Anthropic, or Amazon Bedrock). Swapping providers is a configuration change, not a code change. + +The API is exposed as a single process-wide `models` object: + +```javascript +import { models } from 'harper'; + +const [vector] = await models.embed('What is Harper?'); +const reply = await models.generate('Describe the Harper resource API in one sentence.'); +``` + +The same object is available as `scope.models` in component scopes and as the `models` global. All three refer to the same instance. 
+ +The API surface is three methods: + +| Method | Purpose | +| ---------------------------------------------------------------- | ------------------------------------------ | +| [`models.embed(input, options?)`](./api#embed) | Convert text to embedding vectors | +| [`models.generate(input, options?)`](./api#generate) | Generate a completion for a prompt or chat | +| [`models.generateStream(input, options?)`](./api#generatestream) | Stream a completion as it is produced | + +Generation supports [tool calling](./tool-calling), including a built-in agent loop (`toolMode: 'auto'`) that resolves tool calls in-process. Tables can compute embedding vectors automatically at write time with the [`@embed` schema directive](../database/schema#embed), and vectors can be searched with [HNSW vector indexes](../database/schema#vector-indexing). Every model call is recorded for [observability and usage accounting](./analytics). + +## Configuration + +Models are configured in the `models` section of `harper-config.yaml`, split by capability into `embedding` and `generative` maps. Each key is a logical model name; each entry names a `backend` plus backend-specific settings: + +```yaml +models: + embedding: + default: + backend: ollama + host: localhost:11434 + model: nomic-embed-text:latest + generative: + default: + backend: openai + apiKey: ${OPENAI_API_KEY} + model: gpt-4o + fast: + backend: ollama + model: mistral:7b +``` + +The logical name `default` is used when application code does not pass an explicit `model` option. Calling a logical name that is not configured throws an error. + +See [Backends](./backends) for the full set of configuration fields supported by each backend. + +### Credentials + +String values in model entries support environment-variable indirection with `${VAR_NAME}` syntax, resolved at startup. 
Use this for API keys rather than placing the literal key in the configuration file — Harper logs a warning at startup when a credential field contains a literal value. If the referenced environment variable is unset, the placeholder is left as-is and the backend's required-field validation reports it. + +### Startup behavior + +Model entries are registered when Harper boots, before components load, so `models` is usable from component initialization onward. A misconfigured entry is logged and skipped — it does not prevent Harper from starting or block other model entries. diff --git a/reference/models/tool-calling.md b/reference/models/tool-calling.md new file mode 100644 index 00000000..0c0f1174 --- /dev/null +++ b/reference/models/tool-calling.md @@ -0,0 +1,104 @@ +--- +id: tool-calling +title: Tool Calling +--- + + + + + +Generative models can be given tools — functions the model may request calls to while producing a response. Tools are declared on the input object (they are model-facing content, like messages), and the `toolMode` option selects who resolves them: + +- **`toolMode: 'return'`** (default) — `generate()` returns as soon as the model requests tool calls; your code dispatches them and continues the conversation. +- **`toolMode: 'auto'`** — Harper runs an in-process loop: it dispatches each requested tool to a handler you supply, feeds results back to the model, and repeats until the model produces a final answer or a budget is exhausted. + +Tool calling requires a backend that supports tools (see the [backend capability table](./backends)). Declaring tools against a backend without tool support fails up front rather than silently dropping the tools. + +## Declaring tools + +Use the object form of the generation input. Each tool has a name, a description, and a JSON Schema for its arguments: + +```javascript +const input = { + system: 'You are a helpful assistant.', + messages: [{ role: 'user', content: 'What is the weather in Denver?' 
}], + tools: [ + { + name: 'get_weather', + description: 'Get the current weather for a city.', + parameters: { + type: 'object', + properties: { city: { type: 'string' } }, + required: ['city'], + }, + }, + ], +}; +``` + +## toolMode: 'return' + +The model's requested calls come back on the result, and `finishReason` is `'tool_calls'`. Your code runs the tools, appends the results as `tool`-role messages, and calls `generate()` again: + +```javascript +const result = await models.generate(input); +if (result.finishReason === 'tool_calls') { + const followUp = [...input.messages, { role: 'assistant', content: result.content, toolCalls: result.toolCalls }]; + for (const call of result.toolCalls) { + const output = await getWeather(call.arguments); // your dispatch + followUp.push({ role: 'tool', toolCallId: call.id, content: JSON.stringify(output) }); + } + const final = await models.generate({ ...input, messages: followUp }); +} +``` + +`ToolCall.arguments` is always a parsed object — backends that deliver stringified JSON normalize it before returning. + +## toolMode: 'auto' + +Supply handlers keyed by tool name and Harper resolves the calls in-process: + +```javascript +const result = await models.generate(input, { + toolMode: 'auto', + toolHandlers: { + get_weather: async ({ city }, ctx) => fetchWeather(city, { signal: ctx.signal }), + }, + maxToolIterations: 5, + includeToolTrace: true, +}); +console.log(result.content); // final answer, tool round-trips already resolved +``` + +A handler receives the parsed arguments and a context object `{ signal, accounting }`. The `signal` is the composed cancellation signal for the iteration — it fires if the caller aborts or a budget trips, so long-running handlers should honor it. The handler's return value is JSON-serialized and fed back to the model; a thrown error is routed by `toolErrorMode` (below). 
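A long-running handler can honor `ctx.signal` by rejecting as soon as the signal aborts. The following is a minimal sketch under stated assumptions: `slowLookup` is a hypothetical stand-in for real I/O, the returned weather values are made up, and only the `(args, ctx)` handler shape with `ctx.signal` comes from the description above.

```javascript
// Hypothetical slow tool: resolves after delayMs unless the signal aborts first.
function slowLookup(city, signal, delayMs = 50) {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(signal.reason);
      return;
    }
    const onAbort = () => {
      clearTimeout(timer); // stop the pending work
      reject(signal.reason);
    };
    const timer = setTimeout(() => {
      signal.removeEventListener('abort', onAbort);
      resolve({ city, tempC: 21 }); // made-up result for illustration
    }, delayMs);
    signal.addEventListener('abort', onAbort, { once: true });
  });
}

// Handler table in the shape expected by the toolHandlers option.
const toolHandlers = {
  get_weather: async ({ city }, ctx) => slowLookup(city, ctx.signal),
};
```

When the handler wraps `fetch` or another API that accepts an `AbortSignal`, passing `ctx.signal` straight through is usually enough and no manual listener is needed.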
+ +`generateStream()` supports `toolMode: 'auto'` as well: content deltas stream out as each round produces them, and `finishReason` is emitted exactly once, on the final chunk of the final round. + +### Options + +All options below apply only with `toolMode: 'auto'`. + +| Option | Type | Default | Description | +| -------------------- | ----------------------------- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `toolHandlers` | `Record` | — | Dispatch table keyed by tool name. A tool declared in `tools` with no handler here is a configuration error (400) | +| `maxToolIterations` | `number` | `10` | Hard cap on model → tools → model rounds | +| `maxToolTokens` | `number` | — | Cumulative prompt+completion token cap across rounds. Best-effort: requires the backend to report usage; if it does not, Harper warns once and `maxToolIterations` remains the bound. Not supported on `generateStream()` (throws) | +| `toolParallelism` | `'parallel'` \| `'serial'` | `'parallel'` | When one round requests multiple tool calls, run handlers concurrently or in order | +| `toolResultMaxBytes` | `number` | `65536` | Per-result byte cap (JSON-stringified). 
Larger results are truncated with a marker; the model sees the truncated form, the trace records the original size | +| `toolErrorMode` | `'recover'` \| `'abort'` | `'recover'` | `'recover'` feeds a handler error back to the model as the tool result so it can react; `'abort'` stops the loop and throws with the trace attached | +| `includeToolTrace` | `boolean` | `false` | Populate `result.trace` with one entry per tool invocation (iteration, name, arguments, result size, duration, error) | +| `conversation` | `ConversationAppender` | — | Optional persistence hook — see below | + +### Budgets and errors + +When `maxToolIterations` or `maxToolTokens` is exhausted, the loop throws a budget-exceeded error (HTTP status 429) carrying a `partialTrace` of everything that ran — the trace is attached on error paths regardless of `includeToolTrace`. With `toolErrorMode: 'abort'`, a handler failure throws an error carrying the same trace. + +If the model requests a tool name that was never declared in `tools` (a hallucinated tool), the call is treated as a tool error and routed by `toolErrorMode` — with `'recover'`, the model is told the tool is unknown and can correct itself. + +### Conversation persistence + +The `conversation` option accepts any object with an `append(turn)` method returning a promise. The loop awaits `append` for each new turn it produces — assistant turns (with their tool calls) and tool-result turns — in order, giving the appender back-pressure over the loop. The caller's own input messages are not echoed back through the hook. Appenders should catch their own recoverable failures; a throw from `append` becomes the loop's terminal error. 
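As an illustration of the appender contract, the sketch below keeps turns in memory and swallows its own recoverable failures so they do not end the loop. `createMemoryAppender` is hypothetical, and the shape of each `turn` is treated as opaque because it is not specified here; a real appender would more likely write to a table.

```javascript
// In-memory appender; a real one might persist turns to a table instead.
function createMemoryAppender(store = []) {
  return {
    turns: store,
    // append() returns a promise, so the loop waits for each turn to be recorded.
    async append(turn) {
      try {
        store.push(structuredClone(turn)); // copy so later mutation cannot alter history
      } catch (error) {
        // Treated here as recoverable: log and continue rather than ending the loop.
        console.warn('could not record turn:', error.message);
      }
    },
  };
}

const appender = createMemoryAppender();
appender
  .append({ role: 'assistant', content: 'Checking the weather.' })
  .then(() => console.log(appender.turns.length));
```

An appender like this would be passed as the `conversation` option alongside `toolMode: 'auto'`.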
+ +### Reserved options + +`toolArgValidation` (`'strict'` / `'lenient'` JSON Schema validation of tool arguments), `maxCostUsd`, and `conversationId` exist on the type surface but are not functional in 5.1 — the validation modes and streaming token budgets throw a `501` error, and the cost cap has no rate card behind it yet. Don't rely on them. diff --git a/release-notes/v5-lincoln/5.1.md b/release-notes/v5-lincoln/5.1.md new file mode 100644 index 00000000..e93edfd0 --- /dev/null +++ b/release-notes/v5-lincoln/5.1.md @@ -0,0 +1,47 @@ +--- +title: '5.1' +--- + +# 5.1 Release Notes + + + +### Patch Releases + +All patch release notes for 5.1.x are available on the [releases page](https://github.com/HarperFast/harper/releases?q=v5.1&expanded=true). + +## AI & Models + +Harper 5.1 introduces a unified [model-access API](/reference/models/overview) for calling AI models from application code. Models are configured under logical names and served by pluggable backends, so switching providers is a configuration change rather than a code change. + +### Models API + +The new `models` object (importable from `harper`, also available as `scope.models`) exposes [`embed()`, `generate()`, and `generateStream()`](/reference/models/api) for text embeddings, completions, and streamed completions. Calls accept per-call options (temperature, token limits, structured output via JSON Schema, abort signals) and normalize results — finish reasons, token usage, tool calls — across backends. 
+ +### Backends + +Four [backends](/reference/models/backends) ship with Harper 5.1: + +- **Ollama** — local or self-hosted models, embeddings and generation +- **OpenAI** — including any OpenAI-compatible endpoint (vLLM, Azure OpenAI, OpenRouter, Together AI) via `baseUrl` +- **Anthropic** — Claude models for generation and tool calling +- **Amazon Bedrock** — AWS-hosted model families (Anthropic, Meta, Titan, Cohere) via the AWS credential chain + +### Tool calling and the agent loop + +Generation supports [tool calling](/reference/models/tool-calling). With `toolMode: 'return'` the caller dispatches tool calls; with `toolMode: 'auto'` Harper runs the model → tools → model loop in-process with caller-supplied handlers, budget caps (iterations, tokens), configurable parallelism and error recovery, result-size truncation, an optional per-invocation trace, and a hook for persisting conversation turns. + +### Automatic embeddings with `@embed` + +The new [`@embed` schema directive](/reference/database/schema#embed) computes a record's embedding vector at write time from a source field, using a configured embedding model, and implicitly maintains an HNSW vector index on the attribute. Replicated writes carry the vector rather than re-embedding on every node. + +### Vector indexing improvements + +- [Int8 quantization](/reference/database/schema#vector-quantization) (`quantization: "int8"`) shrinks HNSW indexes, with nearest-neighbor results re-ranked against full-precision vectors to preserve exact ordering. +- [Per-query `ef`](/reference/database/schema#per-query-search-options) overrides the search exploration budget on individual queries, and the default search budget now auto-scales with index size so recall holds as tables grow. +- A `dotProduct` distance function joins `cosine` and `euclidean`. +- Search-only parameter changes (`efConstructionSearch`) no longer trigger an index rebuild. 
+ +### Model-call analytics + +Every model call is recorded in the [`hdb_model_calls` system table](/reference/models/analytics) (per-call latency, tokens, success/error), with aggregate call and token counters emitted to Harper's analytics, broken down by backend. diff --git a/sidebarsReference.ts b/sidebarsReference.ts index db02b8fb..9ef3ad81 100644 --- a/sidebarsReference.ts +++ b/sidebarsReference.ts @@ -89,6 +89,39 @@ const sidebars: SidebarsConfig = { }, ], }, + { + type: 'category', + label: 'AI & Models', + collapsible: false, + className: 'reference-category-header', + items: [ + { + type: 'doc', + id: 'models/overview', + label: 'Overview', + }, + { + type: 'doc', + id: 'models/api', + label: 'API', + }, + { + type: 'doc', + id: 'models/tool-calling', + label: 'Tool Calling', + }, + { + type: 'doc', + id: 'models/backends', + label: 'Backends', + }, + { + type: 'doc', + id: 'models/analytics', + label: 'Analytics', + }, + ], + }, { type: 'category', label: 'Components', From 802254c22d9c9692dc9f7bdd2ab5b1643adb778c Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Tue, 9 Jun 2026 17:48:43 -0700 Subject: [PATCH 2/7] Docs-accuracy fixes from implementation audit error_code can also be pending_unsupported (Models.ts pending-status path is reachable); Bedrock family dispatch also handles mistral., and unknown prefixes are rejected. Co-Authored-By: Claude Fable 5 --- reference/models/analytics.md | 26 +++++++++++++------------- reference/models/backends.md | 2 +- 2 files changed, 14 insertions(+), 14 deletions(-) diff --git a/reference/models/analytics.md b/reference/models/analytics.md index 8c94f34a..67a7ad89 100644 --- a/reference/models/analytics.md +++ b/reference/models/analytics.md @@ -13,19 +13,19 @@ Every model call is recorded for observability and usage accounting, at two leve Each `embed()`, `generate()`, and `generateStream()` call writes one row to the `hdb_model_calls` system table — on success and on failure. 
With `toolMode: 'auto'`, each backend round inside the loop records its own row (the outer loop itself does not add one). -| Field | Description | -| ------------------- | ----------------------------------------------------------------------------------------- | -| `tenant` | Tenant identifier, when the call carried one | -| `app` | Resource path of the calling resource, when called from one | -| `model` | Logical model name the caller used | -| `backend` | Backend that served the call (`ollama`, `openai`, …); `unknown` for pre-dispatch failures | -| `method` | `embed`, `generate`, or `generateStream` | -| `prompt_tokens` | Prompt token count, when the backend reported usage | -| `completion_tokens` | Completion token count, when the backend reported usage | -| `embedding_tokens` | Embedding token count, when the backend reported usage | -| `latency_ms` | Wall-clock call duration | -| `success` | Whether the call completed | -| `error_code` | On failure: `backend_error`, `aborted`, `capability_unsupported`, or `backend_not_found` | +| Field | Description | +| ------------------- | --------------------------------------------------------------------------------------------------------------- | +| `tenant` | Tenant identifier, when the call carried one | +| `app` | Resource path of the calling resource, when called from one | +| `model` | Logical model name the caller used | +| `backend` | Backend that served the call (`ollama`, `openai`, …); `unknown` for pre-dispatch failures | +| `method` | `embed`, `generate`, or `generateStream` | +| `prompt_tokens` | Prompt token count, when the backend reported usage | +| `completion_tokens` | Completion token count, when the backend reported usage | +| `embedding_tokens` | Embedding token count, when the backend reported usage | +| `latency_ms` | Wall-clock call duration | +| `success` | Whether the call completed | +| `error_code` | On failure: `backend_error`, `aborted`, `capability_unsupported`, `backend_not_found`, or 
`pending_unsupported` | Rows are buffered in memory and flushed every 10 seconds, or immediately once 1,000 rows accumulate; rows older than 90 days are purged. Buffered rows may be lost on abrupt shutdown — treat the table as operational telemetry, not an audit log. diff --git a/reference/models/backends.md b/reference/models/backends.md index 262f5393..4fe4b7d3 100644 --- a/reference/models/backends.md +++ b/reference/models/backends.md @@ -133,4 +133,4 @@ models: | `region` | — (required) | AWS region hosting the Bedrock models | | `model` | — | Bedrock model identifier; the vendor prefix selects the request format | -The model identifier's vendor prefix (`anthropic.`, `meta.`, `amazon.titan-`, `cohere.`) determines the request/response format Harper uses. Tool support depends on the underlying model family. Bedrock embedding APIs accept one text per request, so batch `embed()` calls are issued sequentially. +The model identifier's vendor prefix (`anthropic.`, `meta.`, `amazon.titan-`, `cohere.`, `mistral.`) determines the request/response format Harper uses; an unrecognized prefix is rejected with an error. Tool support depends on the underlying model family. Bedrock embedding APIs accept one text per request, so batch `embed()` calls are issued sequentially. From bc180b10a2aedf2c5159e9918df8166b69a403d1 Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Tue, 9 Jun 2026 17:55:27 -0700 Subject: [PATCH 3/7] Correct startup-validation semantics for model entries Config-file (Joi) validation of the models block is boot-blocking for structurally invalid entries; only registration-time errors are warn-and-skip. Also note ${VAR} indirection is string-fields-only. Surfaced by the models-subsystem deep review. 
Co-Authored-By: Claude Fable 5 --- reference/models/overview.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/reference/models/overview.md b/reference/models/overview.md index b89c58d6..cfa74068 100644 --- a/reference/models/overview.md +++ b/reference/models/overview.md @@ -57,8 +57,10 @@ See [Backends](./backends) for the full set of configuration fields supported by ### Credentials -String values in model entries support environment-variable indirection with `${VAR_NAME}` syntax, resolved at startup. Use this for API keys rather than placing the literal key in the configuration file — Harper logs a warning at startup when a credential field contains a literal value. If the referenced environment variable is unset, the placeholder is left as-is and the backend's required-field validation reports it. +String values in model entries support environment-variable indirection with `${VAR_NAME}` syntax, resolved at startup. Use this for API keys rather than placing the literal key in the configuration file — Harper logs a warning at startup when a credential field contains a literal value. If the referenced environment variable is unset, the placeholder is left as-is and the backend's required-field validation reports it. Indirection applies to string-typed fields only; numeric fields such as `requestTimeoutMs` must be literal values. ### Startup behavior -Model entries are registered when Harper boots, before components load, so `models` is usable from component initialization onward. A misconfigured entry is logged and skipped — it does not prevent Harper from starting or block other model entries. +Model entries are registered when Harper boots, before components load, so `models` is usable from component initialization onward. 
+ +Model entries are validated with the rest of the configuration file at startup: a structurally invalid entry — a missing required field such as `apiKey`, an unrecognized field name, or a wrong value type — fails configuration validation and prevents Harper from starting, like any other configuration error. Errors at registration time (for example, an unrecognized `backend` name) are logged and skipped without blocking startup or other model entries. From caba90ab0ca449c69bf2ed049dff10027527fed7 Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Tue, 9 Jun 2026 18:05:33 -0700 Subject: [PATCH 4/7] Clarify unresolved env-var placeholder behavior per field type Only credential fields reject unresolved ${VAR} placeholders at startup; host/model/region carry them into requests literally. Surfaced by the models-subsystem deep review. Co-Authored-By: Claude Fable 5 --- reference/models/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reference/models/overview.md b/reference/models/overview.md index cfa74068..d47cdc89 100644 --- a/reference/models/overview.md +++ b/reference/models/overview.md @@ -57,7 +57,7 @@ See [Backends](./backends) for the full set of configuration fields supported by ### Credentials -String values in model entries support environment-variable indirection with `${VAR_NAME}` syntax, resolved at startup. Use this for API keys rather than placing the literal key in the configuration file — Harper logs a warning at startup when a credential field contains a literal value. If the referenced environment variable is unset, the placeholder is left as-is and the backend's required-field validation reports it. Indirection applies to string-typed fields only; numeric fields such as `requestTimeoutMs` must be literal values. +String values in model entries support environment-variable indirection with `${VAR_NAME}` syntax, resolved at startup. 
Use this for API keys rather than placing the literal key in the configuration file — Harper logs a warning at startup when a credential field contains a literal value. If the referenced environment variable is unset, the placeholder is left as-is; for credential fields the backend rejects the unresolved placeholder at startup, while other fields (such as `host` or `model`) carry the literal placeholder into requests — surfacing as per-request failures rather than a startup error. Indirection applies to string-typed fields only; numeric fields such as `requestTimeoutMs` must be literal values. ### Startup behavior From 803530d4a7d940643af61aefbf44b110bd9ffffe Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Tue, 9 Jun 2026 20:31:27 -0700 Subject: [PATCH 5/7] Fix release-notes links to versioned reference path Current reference docs are served at /reference/v5/ (versions config maps current -> path 'v5'); the PR preview's broken-link check caught the unversioned /reference/ links. Co-Authored-By: Claude Fable 5 --- release-notes/v5-lincoln/5.1.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/release-notes/v5-lincoln/5.1.md b/release-notes/v5-lincoln/5.1.md index e93edfd0..c5d4edc3 100644 --- a/release-notes/v5-lincoln/5.1.md +++ b/release-notes/v5-lincoln/5.1.md @@ -12,15 +12,15 @@ All patch release notes for 5.1.x are available on the [releases page](https://g ## AI & Models -Harper 5.1 introduces a unified [model-access API](/reference/models/overview) for calling AI models from application code. Models are configured under logical names and served by pluggable backends, so switching providers is a configuration change rather than a code change. +Harper 5.1 introduces a unified [model-access API](/reference/v5/models/overview) for calling AI models from application code. Models are configured under logical names and served by pluggable backends, so switching providers is a configuration change rather than a code change. 
### Models API -The new `models` object (importable from `harper`, also available as `scope.models`) exposes [`embed()`, `generate()`, and `generateStream()`](/reference/models/api) for text embeddings, completions, and streamed completions. Calls accept per-call options (temperature, token limits, structured output via JSON Schema, abort signals) and normalize results — finish reasons, token usage, tool calls — across backends. +The new `models` object (importable from `harper`, also available as `scope.models`) exposes [`embed()`, `generate()`, and `generateStream()`](/reference/v5/models/api) for text embeddings, completions, and streamed completions. Calls accept per-call options (temperature, token limits, structured output via JSON Schema, abort signals) and normalize results — finish reasons, token usage, tool calls — across backends. ### Backends -Four [backends](/reference/models/backends) ship with Harper 5.1: +Four [backends](/reference/v5/models/backends) ship with Harper 5.1: - **Ollama** — local or self-hosted models, embeddings and generation - **OpenAI** — including any OpenAI-compatible endpoint (vLLM, Azure OpenAI, OpenRouter, Together AI) via `baseUrl` @@ -29,19 +29,19 @@ Four [backends](/reference/models/backends) ship with Harper 5.1: ### Tool calling and the agent loop -Generation supports [tool calling](/reference/models/tool-calling). With `toolMode: 'return'` the caller dispatches tool calls; with `toolMode: 'auto'` Harper runs the model → tools → model loop in-process with caller-supplied handlers, budget caps (iterations, tokens), configurable parallelism and error recovery, result-size truncation, an optional per-invocation trace, and a hook for persisting conversation turns. +Generation supports [tool calling](/reference/v5/models/tool-calling). 
With `toolMode: 'return'` the caller dispatches tool calls; with `toolMode: 'auto'` Harper runs the model → tools → model loop in-process with caller-supplied handlers, budget caps (iterations, tokens), configurable parallelism and error recovery, result-size truncation, an optional per-invocation trace, and a hook for persisting conversation turns. ### Automatic embeddings with `@embed` -The new [`@embed` schema directive](/reference/database/schema#embed) computes a record's embedding vector at write time from a source field, using a configured embedding model, and implicitly maintains an HNSW vector index on the attribute. Replicated writes carry the vector rather than re-embedding on every node. +The new [`@embed` schema directive](/reference/v5/database/schema#embed) computes a record's embedding vector at write time from a source field, using a configured embedding model, and implicitly maintains an HNSW vector index on the attribute. Replicated writes carry the vector rather than re-embedding on every node. ### Vector indexing improvements -- [Int8 quantization](/reference/database/schema#vector-quantization) (`quantization: "int8"`) shrinks HNSW indexes, with nearest-neighbor results re-ranked against full-precision vectors to preserve exact ordering. -- [Per-query `ef`](/reference/database/schema#per-query-search-options) overrides the search exploration budget on individual queries, and the default search budget now auto-scales with index size so recall holds as tables grow. +- [Int8 quantization](/reference/v5/database/schema#vector-quantization) (`quantization: "int8"`) shrinks HNSW indexes, with nearest-neighbor results re-ranked against full-precision vectors to preserve exact ordering. +- [Per-query `ef`](/reference/v5/database/schema#per-query-search-options) overrides the search exploration budget on individual queries, and the default search budget now auto-scales with index size so recall holds as tables grow. 
- A `dotProduct` distance function joins `cosine` and `euclidean`. - Search-only parameter changes (`efConstructionSearch`) no longer trigger an index rebuild. ### Model-call analytics -Every model call is recorded in the [`hdb_model_calls` system table](/reference/models/analytics) (per-call latency, tokens, success/error), with aggregate call and token counters emitted to Harper's analytics, broken down by backend. +Every model call is recorded in the [`hdb_model_calls` system table](/reference/v5/models/analytics) (per-call latency, tokens, success/error), with aggregate call and token counters emitted to Harper's analytics, broken down by backend. From fe0cc20d9635c41f58b2228e0fc16a6a30dc7845 Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Tue, 9 Jun 2026 20:50:08 -0700 Subject: [PATCH 6/7] AGENTS.md: add Versioning Content conventions VersionBadge tagging for minor-version availability (new vs changed), version derivation from the core release, release-notes-per-minor placement, versioned /reference/v5/ link paths from other content trees, and feature/docs PR cross-linking. Gives the engineering guidelines a single place to point at for docs mechanics. Co-Authored-By: Claude Fable 5 --- AGENTS.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/AGENTS.md b/AGENTS.md index a50a847c..5b9c9d7f 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -48,6 +48,14 @@ Prefer plain ASCII characters in Markdown unless a typographic character is genu - The `` component is globally registered — no import needed in `.md`/`.mdx` files. - See the complete repository organization in `CONTRIBUTING.md` +## Versioning Content + +- Tag minor-version availability inline: `` for new surface, `` for behavior changes to existing surface. +- Derive the version from the core release the change ships in, stripping prerelease suffixes (`5.1.0-beta.1` → `v5.1.0`). +- Each minor release gets a file under `release-notes//` (e.g. 
`release-notes/v5-lincoln/5.1.md`); the sidebar picks it up automatically. +- Absolute links from `release-notes/` (or `learn/`) into current reference docs use the versioned path `/reference/v5/...` — the reference plugin maps the current version to the `v5` URL path. +- When documenting a change from a core/pro PR, cross-link the feature PR and the docs PR in both descriptions. + ## Testing - There is no automated test suite. Verification is done by running the dev server or build. From 44bfe49f3fb1f8b323d0939d5929736b593ecb7a Mon Sep 17 00:00:00 2001 From: Nathan Heskew Date: Thu, 11 Jun 2026 14:36:39 -0700 Subject: [PATCH 7/7] Drop the 5.1 release-notes draft from this PR Release notes will be authored separately (Kris owns them); this PR stays scoped to the reference documentation. Co-Authored-By: Claude Fable 5 --- release-notes/v5-lincoln/5.1.md | 47 --------------------------------- 1 file changed, 47 deletions(-) delete mode 100644 release-notes/v5-lincoln/5.1.md diff --git a/release-notes/v5-lincoln/5.1.md b/release-notes/v5-lincoln/5.1.md deleted file mode 100644 index c5d4edc3..00000000 --- a/release-notes/v5-lincoln/5.1.md +++ /dev/null @@ -1,47 +0,0 @@ ---- -title: '5.1' ---- - -# 5.1 Release Notes - - - -### Patch Releases - -All patch release notes for 5.1.x are available on the [releases page](https://github.com/HarperFast/harper/releases?q=v5.1&expanded=true). - -## AI & Models - -Harper 5.1 introduces a unified [model-access API](/reference/v5/models/overview) for calling AI models from application code. Models are configured under logical names and served by pluggable backends, so switching providers is a configuration change rather than a code change. - -### Models API - -The new `models` object (importable from `harper`, also available as `scope.models`) exposes [`embed()`, `generate()`, and `generateStream()`](/reference/v5/models/api) for text embeddings, completions, and streamed completions. 
Calls accept per-call options (temperature, token limits, structured output via JSON Schema, abort signals) and normalize results — finish reasons, token usage, tool calls — across backends. - -### Backends - -Four [backends](/reference/v5/models/backends) ship with Harper 5.1: - -- **Ollama** — local or self-hosted models, embeddings and generation -- **OpenAI** — including any OpenAI-compatible endpoint (vLLM, Azure OpenAI, OpenRouter, Together AI) via `baseUrl` -- **Anthropic** — Claude models for generation and tool calling -- **Amazon Bedrock** — AWS-hosted model families (Anthropic, Meta, Titan, Cohere) via the AWS credential chain - -### Tool calling and the agent loop - -Generation supports [tool calling](/reference/v5/models/tool-calling). With `toolMode: 'return'` the caller dispatches tool calls; with `toolMode: 'auto'` Harper runs the model → tools → model loop in-process with caller-supplied handlers, budget caps (iterations, tokens), configurable parallelism and error recovery, result-size truncation, an optional per-invocation trace, and a hook for persisting conversation turns. - -### Automatic embeddings with `@embed` - -The new [`@embed` schema directive](/reference/v5/database/schema#embed) computes a record's embedding vector at write time from a source field, using a configured embedding model, and implicitly maintains an HNSW vector index on the attribute. Replicated writes carry the vector rather than re-embedding on every node. - -### Vector indexing improvements - -- [Int8 quantization](/reference/v5/database/schema#vector-quantization) (`quantization: "int8"`) shrinks HNSW indexes, with nearest-neighbor results re-ranked against full-precision vectors to preserve exact ordering. -- [Per-query `ef`](/reference/v5/database/schema#per-query-search-options) overrides the search exploration budget on individual queries, and the default search budget now auto-scales with index size so recall holds as tables grow. 
-- A `dotProduct` distance function joins `cosine` and `euclidean`. -- Search-only parameter changes (`efConstructionSearch`) no longer trigger an index rebuild. - -### Model-call analytics - -Every model call is recorded in the [`hdb_model_calls` system table](/reference/v5/models/analytics) (per-call latency, tokens, success/error), with aggregate call and token counters emitted to Harper's analytics, broken down by backend.