From c357e0b5a5ecfe93cc7d5e838da99c118f982ad6 Mon Sep 17 00:00:00 2001
From: Nathan Heskew <nathan@harperdb.io>
Date: Tue, 9 Jun 2026 17:42:45 -0700
Subject: [PATCH 1/7] Document the 5.1 AI & Models feature set

New reference section (models/): overview + configuration, embed/generate/
generateStream API, tool calling and the toolMode 'auto' agent loop,
the four bundled backends (ollama, openai, anthropic, bedrock), and
model-call analytics. Adds the @embed directive and 5.1 vector-indexing
additions (int8 quantization, per-query ef, auto-scaled search ef,
dotProduct distance) to the schema reference, corrects the HNSW search
parameter name (efConstructionSearch, previously documented as
efSearchConstruction), and starts the 5.1 release notes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 reference/database/schema.md     |  84 +++++++++++++++++--
 reference/models/analytics.md    |  50 ++++++++++++
 reference/models/api.md          | 109 +++++++++++++++++++++++++
 reference/models/backends.md     | 136 +++++++++++++++++++++++++++++++
 reference/models/overview.md     |  64 +++++++++++++++
 reference/models/tool-calling.md | 104 +++++++++++++++++++++++
 release-notes/v5-lincoln/5.1.md  |  47 +++++++++++
 sidebarsReference.ts             |  33 ++++++++
 8 files changed, 618 insertions(+), 9 deletions(-)
 create mode 100644 reference/models/analytics.md
 create mode 100644 reference/models/api.md
 create mode 100644 reference/models/backends.md
 create mode 100644 reference/models/overview.md
 create mode 100644 reference/models/tool-calling.md
 create mode 100644 release-notes/v5-lincoln/5.1.md
diff --git a/reference/database/schema.md b/reference/database/schema.md
index cb1792d8..20c20568 100644
--- a/reference/database/schema.md
+++ b/reference/database/schema.md
@@ -227,6 +227,34 @@ If the field value is an array, each element in the array is individually indexe
 
 Null values are indexed by default (added in v4.3.0), enabling queries like `GET /Product/?category=null`.
 
+### `@embed`
+
+<VersionBadge version="v5.1.0" />
+
+Automatically computes an embedding vector for the attribute whenever the source field is written, using a configured [embedding model](../models/overview):
+
+```graphql
+type Document @table {
+	id: Long @primaryKey
+	text: String
+	embedding: [Float] @embed(source: "text", model: "default")
+}
+```
+
+- `source` — the name of the field to embed. Must be a declared field on the same type, passed as a string literal.
+- `model` — the logical name of a configured embedding model, passed as a string literal.
+
+The attribute type must be `[Float]`. The attribute is automatically indexed with an [HNSW vector index](#vector-indexing), so it is immediately searchable by similarity. An explicit `@indexed` on the same attribute is allowed only if it also declares `type: "HNSW"`.
+
+Write semantics:
+
+- Creating a record with the source field, or updating the source field, computes the vector before the write commits. The embedding call is made with `inputType: 'document'`, and a failure to compute the embedding fails the write.
+- An update that does not touch the source field leaves the vector unchanged.
+- Setting the source field to `null` sets the vector to `null`.
+- Replicated writes and audit-log replays do not re-embed — the vector travels with the record, and only the node that accepted the original write calls the model.
+
+Multiple `@embed` attributes on one type are computed concurrently.
+
 ### `@createdTime`
 
 Automatically assigns a creation timestamp (Unix epoch milliseconds) to the attribute when a record is created.
@@ -393,6 +421,8 @@ type Document @table {
 }
 ```
 
+Embedding vectors can also be computed automatically at write time from a text field with the [`@embed` directive](#embed), which creates the HNSW index implicitly.
+
 Query by nearest neighbors using the `sort` parameter:
 
 ```javascript
@@ -443,26 +473,62 @@ let results = Document.search({
 
 `$distance` is available in both `sort`-based ranking and `conditions`-based threshold queries.
 
+### Per-Query Search Options
+
+The `sort` descriptor (and threshold condition) accepts options that tune an individual query:
+
+```javascript
+let results = Document.search({
+	sort: { attribute: 'textEmbeddings', target: searchVector, distance: 'dotProduct', ef: 200 },
+	limit: 5,
+});
+```
+
+- `distance` — overrides the index's distance function for this query: `"cosine"`, `"euclidean"`, or `"dotProduct"` (`dotProduct` <VersionBadge version="v5.1.0" />).
+- `ef` <VersionBadge version="v5.1.0" /> — overrides the search exploration budget for this query. Higher values improve recall at the cost of latency.
+
+<VersionBadge type="changed" version="v5.1.0" /> — When a query passes no `ef` and the index does not explicitly configure `efConstructionSearch` (or `efConstruction`), the search budget auto-scales with the size of the index, so recall holds as the table grows instead of decaying with a fixed budget.
+
 ### HNSW Parameters
 
-| Parameter              | Default           | Description                                                                                         |
-| ---------------------- | ----------------- | --------------------------------------------------------------------------------------------------- |
-| `distance`             | `"cosine"`        | Distance function: `"euclidean"` or `"cosine"` (negative cosine similarity)                         |
-| `efConstruction`       | `100`             | Max nodes explored during index construction. Higher = better recall, lower = better performance    |
-| `M`                    | `16`              | Preferred connections per graph layer. Higher = more space, better recall for high-dimensional data |
-| `optimizeRouting`      | `0.5`             | Heuristic aggressiveness for omitting redundant connections (0 = off, 1 = most aggressive)          |
-| `mL`                   | computed from `M` | Normalization factor for level generation                                                           |
-| `efSearchConstruction` | `50`              | Max nodes explored during search                                                                    |
+| Parameter              | Default           | Description                                                                                                                                              |
+| ---------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `distance`             | `"cosine"`        | Distance function: `"cosine"` (negative cosine similarity), `"euclidean"`, or `"dotProduct"` (added in v5.1.0)                                           |
+| `efConstruction`       | `100`             | Max nodes explored during index construction. Higher = better recall, lower = better performance                                                         |
+| `M`                    | `16`              | Preferred connections per graph layer. Higher = more space, better recall for high-dimensional data                                                      |
+| `optimizeRouting`      | `0.5`             | Heuristic aggressiveness for omitting redundant connections (0 = off, 1 = most aggressive)                                                               |
+| `mL`                   | computed from `M` | Normalization factor for level generation                                                                                                                |
+| `efConstructionSearch` | auto-scaled       | Max nodes explored during search. When unset, auto-scales with index size (see above); setting it (or `efConstruction`, which seeds it) fixes the budget |
+| `quantization`         | —                 | `"int8"` stores vectors quantized to int8 (added in v5.1.0, see below)                                                                                   |
 
 Example with custom parameters:
 
 ```graphql
 type Document @table {
 	id: Long @primaryKey
-	textEmbeddings: [Float] @indexed(type: "HNSW", distance: "euclidean", optimizeRouting: 0, efSearchConstruction: 100)
+	textEmbeddings: [Float] @indexed(type: "HNSW", distance: "euclidean", optimizeRouting: 0, efConstructionSearch: 100)
+}
+```
+
+Note: this parameter was previously documented as `efSearchConstruction`; the option name Harper reads is `efConstructionSearch`.
+
+<VersionBadge type="changed" version="v5.1.0" /> — Changing `efConstructionSearch` on an existing index no longer triggers a rebuild; it only affects searches. Structural parameters (`distance`, `M`, `efConstruction`, `quantization`) still rebuild the index when changed.
+
+### Vector Quantization
+
+<VersionBadge version="v5.1.0" />
+
+`quantization: "int8"` stores the index's vectors quantized to 8-bit integers, substantially reducing index size and memory traffic:
+
+```graphql
+type Document @table {
+	id: Long @primaryKey
+	textEmbeddings: [Float] @indexed(type: "HNSW", quantization: "int8")
 }
 ```
 
+Graph navigation runs on the quantized (approximate) distances. For nearest-neighbor `sort` queries, Harper re-ranks the results against the full-precision vectors stored on the records, restoring exact ordering and exact `$distance` values. Distance-threshold (`lt`/`le`) queries currently filter on the approximate distance.
+
 ## Field Types
 
 Harper supports the following field types:
diff --git a/reference/models/analytics.md b/reference/models/analytics.md
new file mode 100644
index 00000000..8c94f34a
--- /dev/null
+++ b/reference/models/analytics.md
@@ -0,0 +1,50 @@
+---
+id: analytics
+title: Analytics
+---
+
+<!-- Source: harper resources/models/analyticsTable.ts, resources/models/Models.ts (v5.1) -->
+
+<VersionBadge version="v5.1.0" />
+
+Every model call is recorded for observability and usage accounting, at two levels of granularity: a per-call log table for forensics, and aggregate counters in Harper's [general analytics](../analytics/overview) for dashboards and trends.
+
+## Per-call log: `hdb_model_calls`
+
+Each `embed()`, `generate()`, and `generateStream()` call writes one row to the `hdb_model_calls` system table — on success and on failure. With `toolMode: 'auto'`, each backend round inside the loop records its own row (the outer loop itself does not add one).
+
+| Field               | Description                                                                               |
+| ------------------- | ----------------------------------------------------------------------------------------- |
+| `tenant`            | Tenant identifier, when the call carried one                                              |
+| `app`               | Resource path of the calling resource, when called from one                               |
+| `model`             | Logical model name the caller used                                                        |
+| `backend`           | Backend that served the call (`ollama`, `openai`, …); `unknown` for pre-dispatch failures |
+| `method`            | `embed`, `generate`, or `generateStream`                                                  |
+| `prompt_tokens`     | Prompt token count, when the backend reported usage                                       |
+| `completion_tokens` | Completion token count, when the backend reported usage                                   |
+| `embedding_tokens`  | Embedding token count, when the backend reported usage                                    |
+| `latency_ms`        | Wall-clock call duration                                                                  |
+| `success`           | Whether the call completed                                                                |
+| `error_code`        | On failure: `backend_error`, `aborted`, `capability_unsupported`, or `backend_not_found`  |
+
+Rows are buffered in memory and flushed every 10 seconds, or immediately once 1,000 rows accumulate; rows older than 90 days are purged. Buffered rows may be lost on abrupt shutdown — treat the table as operational telemetry, not an audit log.
+
+Query it like any table, for example through the operations API:
+
+```json
+{
+	"operation": "search_by_conditions",
+	"database": "system",
+	"table": "hdb_model_calls",
+	"conditions": [{ "search_attribute": "success", "search_type": "equals", "search_value": false }]
+}
+```
+
+## Aggregate metrics
+
+Each call also increments Harper's aggregate analytics (visible in `hdb_raw_analytics` alongside the other [analytics metrics](../analytics/overview)):
+
+- `model-embed`, `model-generate`, `model-generateStream` — call counts
+- `model-embed-tokens`, `model-generate-tokens`, `model-generateStream-tokens` — token totals
+
+Metrics are broken down by backend name, so usage can be charted per provider.
diff --git a/reference/models/api.md b/reference/models/api.md
new file mode 100644
index 00000000..d9355d6e
--- /dev/null
+++ b/reference/models/api.md
@@ -0,0 +1,109 @@
+---
+id: api
+title: API
+---
+
+<!-- Source: harper resources/models/Models.ts, resources/models/types.ts (v5.1) -->
+
+<VersionBadge version="v5.1.0" />
+
+The `models` object exposes three methods. Each accepts an optional `model` option naming the configured logical model to use; when omitted, the logical name `default` is used. Calling a logical name with no configured backend, or asking a backend for a capability it does not support (for example, embeddings from a generation-only backend), throws an error. These checks run up front, before any request is made.
+
+## embed()
+
+```typescript
+models.embed(input: string | string[], options?: EmbedOpts): Promise<Float32Array[]>
+```
+
+Converts one or more strings into embedding vectors. The result is always an array of `Float32Array`, one per input string, in input order — including when a single string is passed.
+
+```javascript
+import { models } from 'harper';
+
+const [single] = await models.embed('What is Harper?', { inputType: 'query' });
+const batch = await models.embed(['first document', 'second document']);
+```
+
+| Option      | Type                      | Default     | Description                                                                                                                         |
+| ----------- | ------------------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------- |
+| `model`     | `string`                  | `'default'` | Logical name of a configured embedding model                                                                                        |
+| `inputType` | `'document'` \| `'query'` | —           | Hint for models that distinguish document embeddings from query embeddings (e.g. `nomic-embed-text`); ignored by models that do not |
+| `signal`    | `AbortSignal`             | —           | Cancels the call; composed with the backend's configured `requestTimeoutMs`                                                         |
+
+## generate()
+
+```typescript
+models.generate(input: GenerateInput, options?: GenerateOpts): Promise<GenerateResult>
+```
+
+Generates a completion. The input may be:
+
+- a `string` — shorthand for a single user message,
+- an array of messages: `{ role: 'system' | 'user' | 'assistant' | 'tool', content: string }`,
+- an object `{ messages, tools?, system? }` — the form required to declare [tools](./tool-calling) or pass a system prompt alongside the messages.
+
+```javascript
+const result = await models.generate(
+	[
+		{ role: 'system', content: 'You are a terse assistant.' },
+		{ role: 'user', content: 'What is an HNSW index?' },
+	],
+	{ temperature: 0.2, maxTokens: 300 }
+);
+console.log(result.content);
+```
+
+| Option           | Type                                         | Default     | Description                                                                                            |
+| ---------------- | -------------------------------------------- | ----------- | ------------------------------------------------------------------------------------------------------ |
+| `model`          | `string`                                     | `'default'` | Logical name of a configured generative model                                                          |
+| `temperature`    | `number`                                     | backend     | Sampling temperature, passed through to the backend                                                    |
+| `maxTokens`      | `number`                                     | backend     | Completion token limit, passed through to the backend                                                  |
+| `responseFormat` | `'text'` \| `'json'` \| `{ schema: object }` | `'text'`    | Structured output. `{ schema }` requests output conforming to a JSON Schema; support varies by backend |
+| `toolMode`       | `'return'` \| `'auto'`                       | `'return'`  | How tool calls are handled — see [Tool Calling](./tool-calling)                                        |
+| `signal`         | `AbortSignal`                                | —           | Cancels the call; composed with the backend's configured `requestTimeoutMs`                            |
+
+Additional options apply only when `toolMode: 'auto'`; they are documented in [Tool Calling](./tool-calling).
+
+### GenerateResult
+
+| Field          | Type                                                           | Description                                                                                                                                  |
+| -------------- | -------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
+| `content`      | `string`                                                       | The generated text                                                                                                                           |
+| `finishReason` | `'stop'` \| `'length'` \| `'tool_calls'` \| `'content_filter'` | Why generation stopped, normalized across backends                                                                                           |
+| `toolCalls`    | `ToolCall[]`                                                   | Tool calls the model requested, when `finishReason` is `'tool_calls'` (each `{ id, name, arguments }`, with `arguments` parsed to an object) |
+| `usage`        | `TokenUsage`                                                   | Token usage reported by the backend (`promptTokens`, `completionTokens`, …), when available                                                  |
+| `trace`        | `ToolTraceEntry[]`                                             | Per-tool-invocation trace; only populated by the `toolMode: 'auto'` loop — see [Tool Calling](./tool-calling)                                |
+
+## generateStream()
+
+```typescript
+models.generateStream(input: GenerateInput, options?: GenerateOpts): AsyncIterable<GenerateChunk>
+```
+
+Takes the same input and options as `generate()`, but yields the completion incrementally:
+
+```javascript
+let text = '';
+for await (const chunk of models.generateStream('Write a haiku about databases.')) {
+	if (chunk.deltaContent) text += chunk.deltaContent;
+}
+```
+
+Each chunk may carry:
+
+| Field            | Type                            | Description                                                                                          |
+| ---------------- | ------------------------------- | ---------------------------------------------------------------------------------------------------- |
+| `deltaContent`   | `string`                        | Text appended since the previous chunk                                                               |
+| `deltaToolCalls` | `Partial<ToolCall>[]`           | Tool-call deltas; a backend may deliver the same tool call across several chunks with partial fields |
+| `finishReason`   | same values as `GenerateResult` | Set on the final chunk only                                                                          |
+
+Errors detected before the call starts (unknown model name, missing capability) throw synchronously; errors during generation propagate through the iterable.
+
+## Errors and timeouts
+
+- An unconfigured logical model name throws a not-found error. The error names the missing logical name only — it does not enumerate configured names.
+- A capability mismatch (embedding call to a generation-only backend, tool declarations against a backend without tool support) throws before any request is made.
+- Each backend supports a `requestTimeoutMs` configuration field; when set, it is composed with any caller-provided `signal` so whichever fires first cancels the request.
+- Backend and network failures throw backend-specific errors with sanitized messages.
+
+Every call — successful or failed — is recorded in the [model-call analytics](./analytics).
diff --git a/reference/models/backends.md b/reference/models/backends.md
new file mode 100644
index 00000000..262f5393
--- /dev/null
+++ b/reference/models/backends.md
@@ -0,0 +1,136 @@
+---
+id: backends
+title: Backends
+---
+
+<!-- Source: harper components/ollama, components/openai, components/anthropic, components/bedrock (v5.1) -->
+
+<VersionBadge version="v5.1.0" />
+
+Four model backends ship with Harper. Each model entry in the [`models` configuration](./overview#configuration) selects one with its `backend` field.
+
+| Backend     | Embeddings | Generation | Streaming | Tools           |
+| ----------- | ---------- | ---------- | --------- | --------------- |
+| `ollama`    | ✓          | ✓          | ✓         | —               |
+| `openai`    | ✓          | ✓          | ✓         | ✓               |
+| `anthropic` | —          | ✓          | ✓         | ✓               |
+| `bedrock`   | ✓          | ✓          | ✓         | varies by model |
+
+All backends support these common fields:
+
+| Field              | Description                                                                                          |
+| ------------------ | ---------------------------------------------------------------------------------------------------- |
+| `backend`          | Which backend to use (required)                                                                      |
+| `model`            | Provider-side model identifier (e.g. `gpt-4o`) used when a call does not pass its own `model` option |
+| `requestTimeoutMs` | Per-request timeout in milliseconds; composed with any caller-provided `AbortSignal`                 |
+
+## Ollama
+
+Calls a local or remote [Ollama](https://ollama.com) server. No credentials.
+
+```yaml
+models:
+  embedding:
+    default:
+      backend: ollama
+      host: localhost:11434
+      model: nomic-embed-text:latest
+  generative:
+    local:
+      backend: ollama
+      host: ollama.internal:11434
+      model: mistral:7b
+```
+
+| Field   | Default           | Description                                                                                                     |
+| ------- | ----------------- | --------------------------------------------------------------------------------------------------------------- |
+| `host`  | `localhost:11434` | Ollama server origin. A scheme-less value is treated as `http://`; a full origin (`https://…`) is used as given |
+| `model` | —                 | Ollama model name, e.g. `nomic-embed-text:latest`, `mistral:7b`                                                 |
+
+When embedding with `nomic-embed-text`, the `inputType` option (`'document'` / `'query'`) is applied using the model's task prefixes; other models ignore it.
+
+The Ollama backend does not advertise tool support — declaring tools against it fails up front.
+
+## OpenAI
+
+Calls the OpenAI API — or any service exposing an OpenAI-compatible API with bearer-token authentication, by pointing `baseUrl` at it. This includes vLLM's OpenAI-compatible server, Google's Gemini OpenAI-compatible endpoint, Azure OpenAI's `/openai/v1` endpoint, and hosted gateways such as OpenRouter or Together AI.
+
+```yaml
+models:
+  embedding:
+    default:
+      backend: openai
+      apiKey: ${OPENAI_API_KEY}
+      model: text-embedding-3-large
+  generative:
+    default:
+      backend: openai
+      apiKey: ${OPENAI_API_KEY}
+      model: gpt-4o
+    vllm:
+      backend: openai
+      apiKey: ${VLLM_API_KEY}
+      baseUrl: http://vllm.internal:8000/v1
+      model: meta-llama/Llama-3.1-8B-Instruct
+```
+
+| Field          | Default                     | Description                                                                        |
+| -------------- | --------------------------- | ---------------------------------------------------------------------------------- |
+| `apiKey`       | — (required)                | API key, sent as a bearer token. Use `${VAR}` indirection                          |
+| `baseUrl`      | `https://api.openai.com/v1` | API root; point at any OpenAI-compatible endpoint                                  |
+| `model`        | —                           | Model name, e.g. `gpt-4o`, `text-embedding-3-large`                                |
+| `organization` | —                           | Sent as the `OpenAI-Organization` header, for keys spanning multiple organizations |
+
+`responseFormat: 'json'` maps to OpenAI's JSON mode and `responseFormat: { schema }` to strict structured outputs (`json_schema`); OpenAI-compatible servers vary in their support for these.
+
+## Anthropic
+
+Calls the Anthropic Messages API. Generation only — Anthropic does not offer an embeddings API.
+
+```yaml
+models:
+  generative:
+    claude:
+      backend: anthropic
+      apiKey: ${ANTHROPIC_API_KEY}
+      model: claude-sonnet-4-6
+```
+
+| Field     | Default                     | Description                             |
+| --------- | --------------------------- | --------------------------------------- |
+| `apiKey`  | — (required)                | API key, sent as the `x-api-key` header |
+| `baseUrl` | `https://api.anthropic.com` | API root                                |
+| `model`   | —                           | Model name, e.g. `claude-sonnet-4-6`    |
+
+The Anthropic API requires a completion token limit on every request; when a call does not pass `maxTokens`, Harper sends `4096`.
+
+## Amazon Bedrock
+
+Calls AWS Bedrock. Credentials come from the standard AWS SDK chain (environment variables, shared credentials file, IAM instance/task roles) — there is no `apiKey` field.
+
+The AWS SDK is not bundled with Harper. Install it in your project to use this backend:
+
+```bash
+npm install @aws-sdk/client-bedrock-runtime
+```
+
+```yaml
+models:
+  embedding:
+    titan:
+      backend: bedrock
+      region: us-east-1
+      model: amazon.titan-embed-text-v2:0
+  generative:
+    claude:
+      backend: bedrock
+      region: us-east-1
+      model: anthropic.claude-sonnet-4-5-20250929-v1:0
+```
+
+| Field    | Default      | Description                                                            |
+| -------- | ------------ | ---------------------------------------------------------------------- |
+| `region` | — (required) | AWS region hosting the Bedrock models                                  |
+| `model`  | —            | Bedrock model identifier; the vendor prefix selects the request format |
+
+The model identifier's vendor prefix (`anthropic.`, `meta.`, `amazon.titan-`, `cohere.`) determines the request/response format Harper uses. Tool support depends on the underlying model family. Bedrock embedding APIs accept one text per request, so batch `embed()` calls are issued sequentially.
diff --git a/reference/models/overview.md b/reference/models/overview.md
new file mode 100644
index 00000000..b89c58d6
--- /dev/null
+++ b/reference/models/overview.md
@@ -0,0 +1,64 @@
+---
+id: overview
+title: Models
+---
+
+<!-- Source: harper resources/models/Models.ts, resources/models/types.ts, resources/models/bootstrap.ts (v5.1) -->
+
+<VersionBadge version="v5.1.0" />
+
+Harper provides a unified API for calling AI models — text embeddings and text generation — from application code. Models are configured by an operator under logical names; application code requests a model by its logical name and Harper routes the call to the configured backend (Ollama, OpenAI, Anthropic, or Amazon Bedrock). Swapping providers is a configuration change, not a code change.
+
+The API is exposed as a single process-wide `models` object:
+
+```javascript
+import { models } from 'harper';
+
+const [vector] = await models.embed('What is Harper?');
+const reply = await models.generate('Describe the Harper resource API in one sentence.');
+```
+
+The same object is available as `scope.models` in component scopes and as the `models` global. All three refer to the same instance.
+
+The API surface is three methods:
+
+| Method                                                           | Purpose                                    |
+| ---------------------------------------------------------------- | ------------------------------------------ |
+| [`models.embed(input, options?)`](./api#embed)                   | Convert text to embedding vectors          |
+| [`models.generate(input, options?)`](./api#generate)             | Generate a completion for a prompt or chat |
+| [`models.generateStream(input, options?)`](./api#generatestream) | Stream a completion as it is produced      |
+
+Generation supports [tool calling](./tool-calling), including a built-in agent loop (`toolMode: 'auto'`) that resolves tool calls in-process. Tables can compute embedding vectors automatically at write time with the [`@embed` schema directive](../database/schema#embed), and vectors can be searched with [HNSW vector indexes](../database/schema#vector-indexing). Every model call is recorded for [observability and usage accounting](./analytics).
+
+## Configuration
+
+Models are configured in the `models` section of `harper-config.yaml`, split by capability into `embedding` and `generative` maps. Each key is a logical model name; each entry names a `backend` plus backend-specific settings:
+
+```yaml
+models:
+  embedding:
+    default:
+      backend: ollama
+      host: localhost:11434
+      model: nomic-embed-text:latest
+  generative:
+    default:
+      backend: openai
+      apiKey: ${OPENAI_API_KEY}
+      model: gpt-4o
+    fast:
+      backend: ollama
+      model: mistral:7b
+```
+
+The logical name `default` is used when application code does not pass an explicit `model` option. Calling a logical name that is not configured throws an error.
+
+See [Backends](./backends) for the full set of configuration fields supported by each backend.
+
+### Credentials
+
+String values in model entries support environment-variable indirection with `${VAR_NAME}` syntax, resolved at startup. Use this for API keys rather than placing the literal key in the configuration file — Harper logs a warning at startup when a credential field contains a literal value. If the referenced environment variable is unset, the placeholder is left as-is and the backend's required-field validation reports it.
+
+### Startup behavior
+
+Model entries are registered when Harper boots, before components load, so `models` is usable from component initialization onward. A misconfigured entry is logged and skipped — it does not prevent Harper from starting or block other model entries.
diff --git a/reference/models/tool-calling.md b/reference/models/tool-calling.md
new file mode 100644
index 00000000..0c0f1174
--- /dev/null
+++ b/reference/models/tool-calling.md
@@ -0,0 +1,104 @@
+---
+id: tool-calling
+title: Tool Calling
+---
+
+<!-- Source: harper resources/models/agentLoop.ts, resources/models/types.ts (v5.1) -->
+
+<VersionBadge version="v5.1.0" />
+
+Generative models can be given tools: functions the model may ask to have called while it produces a response. Tools are declared on the input object (they are model-facing content, like messages), and the `toolMode` option selects who resolves them:
+
+- **`toolMode: 'return'`** (default) — `generate()` returns as soon as the model requests tool calls; your code dispatches them and continues the conversation.
+- **`toolMode: 'auto'`** — Harper runs an in-process loop: it dispatches each requested tool to a handler you supply, feeds results back to the model, and repeats until the model produces a final answer or a budget is exhausted.
+
+Tool calling requires a backend that supports tools (see the [backend capability table](./backends)). Declaring tools against a backend without tool support fails up front rather than silently dropping the tools.
+
+## Declaring tools
+
+Use the object form of the generation input. Each tool has a name, a description, and a JSON Schema for its arguments:
+
+```javascript
+const input = {
+	system: 'You are a helpful assistant.',
+	messages: [{ role: 'user', content: 'What is the weather in Denver?' }],
+	tools: [
+		{
+			name: 'get_weather',
+			description: 'Get the current weather for a city.',
+			parameters: {
+				type: 'object',
+				properties: { city: { type: 'string' } },
+				required: ['city'],
+			},
+		},
+	],
+};
+```
+
+## toolMode: 'return'
+
+The model's requested calls come back on the result, and `finishReason` is `'tool_calls'`. Your code runs the tools, appends the results as `tool`-role messages, and calls `generate()` again:
+
+```javascript
+const result = await models.generate(input);
+if (result.finishReason === 'tool_calls') {
+	const followUp = [...input.messages, { role: 'assistant', content: result.content, toolCalls: result.toolCalls }];
+	for (const call of result.toolCalls) {
+		const output = await getWeather(call.arguments); // your dispatch
+		followUp.push({ role: 'tool', toolCallId: call.id, content: JSON.stringify(output) });
+	}
+	const final = await models.generate({ ...input, messages: followUp });
+}
+```
+
+`ToolCall.arguments` is always a parsed object — backends that deliver stringified JSON normalize it before returning.
+
+## toolMode: 'auto'
+
+Supply handlers keyed by tool name and Harper resolves the calls in-process:
+
+```javascript
+const result = await models.generate(input, {
+	toolMode: 'auto',
+	toolHandlers: {
+		get_weather: async ({ city }, ctx) => fetchWeather(city, { signal: ctx.signal }),
+	},
+	maxToolIterations: 5,
+	includeToolTrace: true,
+});
+console.log(result.content); // final answer, tool round-trips already resolved
+```
+
+A handler receives the parsed arguments and a context object `{ signal, accounting }`. The `signal` is the composed cancellation signal for the iteration — it fires if the caller aborts or a budget trips, so long-running handlers should honor it. The handler's return value is JSON-serialized and fed back to the model; a thrown error is routed by `toolErrorMode` (below).
+
+`generateStream()` supports `toolMode: 'auto'` as well: content deltas stream out as each round produces them, and `finishReason` is emitted exactly once, on the final chunk of the final round.
+
+### Options
+
+All options below apply only with `toolMode: 'auto'`.
+
+| Option               | Type                          | Default      | Description                                                                                                                                                                                                                        |
+| -------------------- | ----------------------------- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `toolHandlers`       | `Record<string, ToolHandler>` | —            | Dispatch table keyed by tool name. A tool declared in `tools` with no handler here is a configuration error (400)                                                                                                                  |
+| `maxToolIterations`  | `number`                      | `10`         | Hard cap on model → tools → model rounds                                                                                                                                                                                           |
+| `maxToolTokens`      | `number`                      | —            | Cumulative prompt+completion token cap across rounds. Best-effort: requires the backend to report usage; if it does not, Harper warns once and `maxToolIterations` remains the bound. Not supported on `generateStream()` (throws) |
+| `toolParallelism`    | `'parallel'` \| `'serial'`    | `'parallel'` | When one round requests multiple tool calls, run handlers concurrently or in order                                                                                                                                                 |
+| `toolResultMaxBytes` | `number`                      | `65536`      | Per-result byte cap (JSON-stringified). Larger results are truncated with a marker; the model sees the truncated form, the trace records the original size                                                                         |
+| `toolErrorMode`      | `'recover'` \| `'abort'`      | `'recover'`  | `'recover'` feeds a handler error back to the model as the tool result so it can react; `'abort'` stops the loop and throws with the trace attached                                                                                |
+| `includeToolTrace`   | `boolean`                     | `false`      | Populate `result.trace` with one entry per tool invocation (iteration, name, arguments, result size, duration, error)                                                                                                              |
+| `conversation`       | `ConversationAppender`        | —            | Optional persistence hook — see below                                                                                                                                                                                              |
+
+### Budgets and errors
+
+When `maxToolIterations` or `maxToolTokens` is exhausted, the loop throws a budget-exceeded error (HTTP status 429) carrying a `partialTrace` of everything that ran — the trace is attached on error paths regardless of `includeToolTrace`. With `toolErrorMode: 'abort'`, a handler failure throws an error carrying the same trace.
+
+If the model requests a tool name that was never declared in `tools` (a hallucinated tool), the call is treated as a tool error and routed by `toolErrorMode` — with `'recover'`, the model is told the tool is unknown and can correct itself.
+
+### Conversation persistence
+
+The `conversation` option accepts any object with an `append(turn)` method returning a promise. The loop awaits `append` for each new turn it produces — assistant turns (with their tool calls) and tool-result turns — in order, giving the appender back-pressure over the loop. The caller's own input messages are not echoed back through the hook. Appenders should catch their own recoverable failures; a throw from `append` becomes the loop's terminal error.
+
+### Reserved options
+
+`toolArgValidation` (`'strict'` / `'lenient'` JSON Schema validation of tool arguments), `maxCostUsd`, and `conversationId` exist on the type surface but are not functional in 5.1. The validation modes throw a `501` error, as do token budgets (`maxToolTokens`) on `generateStream()`, and the cost cap has no rate card behind it yet. Don't rely on them.
diff --git a/release-notes/v5-lincoln/5.1.md b/release-notes/v5-lincoln/5.1.md
new file mode 100644
index 00000000..e93edfd0
--- /dev/null
+++ b/release-notes/v5-lincoln/5.1.md
@@ -0,0 +1,47 @@
+---
+title: '5.1'
+---
+
+# 5.1 Release Notes
+
+<!-- TODO: this file currently covers the AI/models feature set; remaining 5.1 changes to be added before release. -->
+
+### Patch Releases
+
+All patch release notes for 5.1.x are available on the [releases page](https://github.com/HarperFast/harper/releases?q=v5.1&expanded=true).
+
+## AI & Models
+
+Harper 5.1 introduces a unified [model-access API](/reference/models/overview) for calling AI models from application code. Models are configured under logical names and served by pluggable backends, so switching providers is a configuration change rather than a code change.
+
+### Models API
+
+The new `models` object (importable from `harper`, also available as `scope.models`) exposes [`embed()`, `generate()`, and `generateStream()`](/reference/models/api) for text embeddings, completions, and streamed completions. Calls accept per-call options (temperature, token limits, structured output via JSON Schema, abort signals) and normalize results — finish reasons, token usage, tool calls — across backends.
+
+### Backends
+
+Four [backends](/reference/models/backends) ship with Harper 5.1:
+
+- **Ollama** — local or self-hosted models, embeddings and generation
+- **OpenAI** — including any OpenAI-compatible endpoint (vLLM, Azure OpenAI, OpenRouter, Together AI) via `baseUrl`
+- **Anthropic** — Claude models for generation and tool calling
+- **Amazon Bedrock** — AWS-hosted model families (Anthropic, Meta, Titan, Cohere) via the AWS credential chain
+
+### Tool calling and the agent loop
+
+Generation supports [tool calling](/reference/models/tool-calling). With `toolMode: 'return'` the caller dispatches tool calls; with `toolMode: 'auto'` Harper runs the model → tools → model loop in-process with caller-supplied handlers, budget caps (iterations, tokens), configurable parallelism and error recovery, result-size truncation, an optional per-invocation trace, and a hook for persisting conversation turns.
+
+### Automatic embeddings with `@embed`
+
+The new [`@embed` schema directive](/reference/database/schema#embed) computes a record's embedding vector at write time from a source field, using a configured embedding model, and implicitly maintains an HNSW vector index on the attribute. Replicated writes carry the vector rather than re-embedding on every node.
+
+### Vector indexing improvements
+
+- [Int8 quantization](/reference/database/schema#vector-quantization) (`quantization: "int8"`) shrinks HNSW indexes, with nearest-neighbor results re-ranked against full-precision vectors to preserve exact ordering.
+- [Per-query `ef`](/reference/database/schema#per-query-search-options) overrides the search exploration budget on individual queries, and the default search budget now auto-scales with index size so recall holds as tables grow.
+- A `dotProduct` distance function joins `cosine` and `euclidean`.
+- Search-only parameter changes (`efConstructionSearch`) no longer trigger an index rebuild.
+
+### Model-call analytics
+
+Every model call is recorded in the [`hdb_model_calls` system table](/reference/models/analytics) (per-call latency, tokens, success/error), with aggregate call and token counters emitted to Harper's analytics, broken down by backend.
diff --git a/sidebarsReference.ts b/sidebarsReference.ts
index db02b8fb..9ef3ad81 100644
--- a/sidebarsReference.ts
+++ b/sidebarsReference.ts
@@ -89,6 +89,39 @@ const sidebars: SidebarsConfig = {
 				},
 			],
 		},
+		{
+			type: 'category',
+			label: 'AI & Models',
+			collapsible: false,
+			className: 'reference-category-header',
+			items: [
+				{
+					type: 'doc',
+					id: 'models/overview',
+					label: 'Overview',
+				},
+				{
+					type: 'doc',
+					id: 'models/api',
+					label: 'API',
+				},
+				{
+					type: 'doc',
+					id: 'models/tool-calling',
+					label: 'Tool Calling',
+				},
+				{
+					type: 'doc',
+					id: 'models/backends',
+					label: 'Backends',
+				},
+				{
+					type: 'doc',
+					id: 'models/analytics',
+					label: 'Analytics',
+				},
+			],
+		},
 		{
 			type: 'category',
 			label: 'Components',

From 802254c22d9c9692dc9f7bdd2ab5b1643adb778c Mon Sep 17 00:00:00 2001
From: Nathan Heskew <nathan@harperdb.io>
Date: Tue, 9 Jun 2026 17:48:43 -0700
Subject: [PATCH 2/7] Docs-accuracy fixes from implementation audit

error_code can also be pending_unsupported (Models.ts pending-status
path is reachable); Bedrock family dispatch also handles mistral.,
and unknown prefixes are rejected.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 reference/models/analytics.md | 26 +++++++++++++-------------
 reference/models/backends.md  |  2 +-
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/reference/models/analytics.md b/reference/models/analytics.md
index 8c94f34a..67a7ad89 100644
--- a/reference/models/analytics.md
+++ b/reference/models/analytics.md
@@ -13,19 +13,19 @@ Every model call is recorded for observability and usage accounting, at two leve
 
 Each `embed()`, `generate()`, and `generateStream()` call writes one row to the `hdb_model_calls` system table — on success and on failure. With `toolMode: 'auto'`, each backend round inside the loop records its own row (the outer loop itself does not add one).
 
-| Field               | Description                                                                               |
-| ------------------- | ----------------------------------------------------------------------------------------- |
-| `tenant`            | Tenant identifier, when the call carried one                                              |
-| `app`               | Resource path of the calling resource, when called from one                               |
-| `model`             | Logical model name the caller used                                                        |
-| `backend`           | Backend that served the call (`ollama`, `openai`, …); `unknown` for pre-dispatch failures |
-| `method`            | `embed`, `generate`, or `generateStream`                                                  |
-| `prompt_tokens`     | Prompt token count, when the backend reported usage                                       |
-| `completion_tokens` | Completion token count, when the backend reported usage                                   |
-| `embedding_tokens`  | Embedding token count, when the backend reported usage                                    |
-| `latency_ms`        | Wall-clock call duration                                                                  |
-| `success`           | Whether the call completed                                                                |
-| `error_code`        | On failure: `backend_error`, `aborted`, `capability_unsupported`, or `backend_not_found`  |
+| Field               | Description                                                                                                     |
+| ------------------- | --------------------------------------------------------------------------------------------------------------- |
+| `tenant`            | Tenant identifier, when the call carried one                                                                    |
+| `app`               | Resource path of the calling resource, when called from one                                                     |
+| `model`             | Logical model name the caller used                                                                              |
+| `backend`           | Backend that served the call (`ollama`, `openai`, …); `unknown` for pre-dispatch failures                       |
+| `method`            | `embed`, `generate`, or `generateStream`                                                                        |
+| `prompt_tokens`     | Prompt token count, when the backend reported usage                                                             |
+| `completion_tokens` | Completion token count, when the backend reported usage                                                         |
+| `embedding_tokens`  | Embedding token count, when the backend reported usage                                                          |
+| `latency_ms`        | Wall-clock call duration                                                                                        |
+| `success`           | Whether the call completed                                                                                      |
+| `error_code`        | On failure: `backend_error`, `aborted`, `capability_unsupported`, `backend_not_found`, or `pending_unsupported` |
 
 Rows are buffered in memory and flushed every 10 seconds, or immediately once 1,000 rows accumulate; rows older than 90 days are purged. Buffered rows may be lost on abrupt shutdown — treat the table as operational telemetry, not an audit log.
 
diff --git a/reference/models/backends.md b/reference/models/backends.md
index 262f5393..4fe4b7d3 100644
--- a/reference/models/backends.md
+++ b/reference/models/backends.md
@@ -133,4 +133,4 @@ models:
 | `region` | — (required) | AWS region hosting the Bedrock models                                  |
 | `model`  | —            | Bedrock model identifier; the vendor prefix selects the request format |
 
-The model identifier's vendor prefix (`anthropic.`, `meta.`, `amazon.titan-`, `cohere.`) determines the request/response format Harper uses. Tool support depends on the underlying model family. Bedrock embedding APIs accept one text per request, so batch `embed()` calls are issued sequentially.
+The model identifier's vendor prefix (`anthropic.`, `meta.`, `amazon.titan-`, `cohere.`, `mistral.`) determines the request/response format Harper uses; an unrecognized prefix is rejected with an error. Tool support depends on the underlying model family. Bedrock embedding APIs accept one text per request, so batch `embed()` calls are issued sequentially.

From bc180b10a2aedf2c5159e9918df8166b69a403d1 Mon Sep 17 00:00:00 2001
From: Nathan Heskew <nathan@harperdb.io>
Date: Tue, 9 Jun 2026 17:55:27 -0700
Subject: [PATCH 3/7] Correct startup-validation semantics for model entries

Config-file (Joi) validation of the models block is boot-blocking for
structurally invalid entries; only registration-time errors are
warn-and-skip. Also note ${VAR} indirection is string-fields-only.
Surfaced by the models-subsystem deep review.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 reference/models/overview.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/reference/models/overview.md b/reference/models/overview.md
index b89c58d6..cfa74068 100644
--- a/reference/models/overview.md
+++ b/reference/models/overview.md
@@ -57,8 +57,10 @@ See [Backends](./backends) for the full set of configuration fields supported by
 
 ### Credentials
 
-String values in model entries support environment-variable indirection with `${VAR_NAME}` syntax, resolved at startup. Use this for API keys rather than placing the literal key in the configuration file — Harper logs a warning at startup when a credential field contains a literal value. If the referenced environment variable is unset, the placeholder is left as-is and the backend's required-field validation reports it.
+String values in model entries support environment-variable indirection with `${VAR_NAME}` syntax, resolved at startup. Use this for API keys rather than placing the literal key in the configuration file — Harper logs a warning at startup when a credential field contains a literal value. If the referenced environment variable is unset, the placeholder is left as-is and the backend's required-field validation reports it. Indirection applies to string-typed fields only; numeric fields such as `requestTimeoutMs` must be literal values.
 
 ### Startup behavior
 
-Model entries are registered when Harper boots, before components load, so `models` is usable from component initialization onward. A misconfigured entry is logged and skipped — it does not prevent Harper from starting or block other model entries.
+Model entries are registered when Harper boots, before components load, so `models` is usable from component initialization onward.
+
+Model entries are validated with the rest of the configuration file at startup: a structurally invalid entry — a missing required field such as `apiKey`, an unrecognized field name, or a wrong value type — fails configuration validation and prevents Harper from starting, like any other configuration error. Errors at registration time (for example, an unrecognized `backend` name) are logged and skipped without blocking startup or other model entries.

From caba90ab0ca449c69bf2ed049dff10027527fed7 Mon Sep 17 00:00:00 2001
From: Nathan Heskew <nathan@harperdb.io>
Date: Tue, 9 Jun 2026 18:05:33 -0700
Subject: [PATCH 4/7] Clarify unresolved env-var placeholder behavior per field
 type

Only credential fields reject unresolved ${VAR} placeholders at
startup; host/model/region carry them into requests literally.
Surfaced by the models-subsystem deep review.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 reference/models/overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reference/models/overview.md b/reference/models/overview.md
index cfa74068..d47cdc89 100644
--- a/reference/models/overview.md
+++ b/reference/models/overview.md
@@ -57,7 +57,7 @@ See [Backends](./backends) for the full set of configuration fields supported by
 
 ### Credentials
 
-String values in model entries support environment-variable indirection with `${VAR_NAME}` syntax, resolved at startup. Use this for API keys rather than placing the literal key in the configuration file — Harper logs a warning at startup when a credential field contains a literal value. If the referenced environment variable is unset, the placeholder is left as-is and the backend's required-field validation reports it. Indirection applies to string-typed fields only; numeric fields such as `requestTimeoutMs` must be literal values.
+String values in model entries support environment-variable indirection with `${VAR_NAME}` syntax, resolved at startup. Use this for API keys rather than placing the literal key in the configuration file — Harper logs a warning at startup when a credential field contains a literal value. If the referenced environment variable is unset, the placeholder is left as-is; for credential fields the backend rejects the unresolved placeholder at startup, while other fields (such as `host` or `model`) carry the literal placeholder into requests — surfacing as per-request failures rather than a startup error. Indirection applies to string-typed fields only; numeric fields such as `requestTimeoutMs` must be literal values.
 
 ### Startup behavior
 

From 803530d4a7d940643af61aefbf44b110bd9ffffe Mon Sep 17 00:00:00 2001
From: Nathan Heskew <nathan@harperdb.io>
Date: Tue, 9 Jun 2026 20:31:27 -0700
Subject: [PATCH 5/7] Fix release-notes links to versioned reference path

Current reference docs are served at /reference/v5/ (versions config
maps current -> path 'v5'); the PR preview's broken-link check caught
the unversioned /reference/ links.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 release-notes/v5-lincoln/5.1.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/release-notes/v5-lincoln/5.1.md b/release-notes/v5-lincoln/5.1.md
index e93edfd0..c5d4edc3 100644
--- a/release-notes/v5-lincoln/5.1.md
+++ b/release-notes/v5-lincoln/5.1.md
@@ -12,15 +12,15 @@ All patch release notes for 5.1.x are available on the [releases page](https://g
 
 ## AI & Models
 
-Harper 5.1 introduces a unified [model-access API](/reference/models/overview) for calling AI models from application code. Models are configured under logical names and served by pluggable backends, so switching providers is a configuration change rather than a code change.
+Harper 5.1 introduces a unified [model-access API](/reference/v5/models/overview) for calling AI models from application code. Models are configured under logical names and served by pluggable backends, so switching providers is a configuration change rather than a code change.
 
 ### Models API
 
-The new `models` object (importable from `harper`, also available as `scope.models`) exposes [`embed()`, `generate()`, and `generateStream()`](/reference/models/api) for text embeddings, completions, and streamed completions. Calls accept per-call options (temperature, token limits, structured output via JSON Schema, abort signals) and normalize results — finish reasons, token usage, tool calls — across backends.
+The new `models` object (importable from `harper`, also available as `scope.models`) exposes [`embed()`, `generate()`, and `generateStream()`](/reference/v5/models/api) for text embeddings, completions, and streamed completions. Calls accept per-call options (temperature, token limits, structured output via JSON Schema, abort signals) and normalize results — finish reasons, token usage, tool calls — across backends.
 
 ### Backends
 
-Four [backends](/reference/models/backends) ship with Harper 5.1:
+Four [backends](/reference/v5/models/backends) ship with Harper 5.1:
 
 - **Ollama** — local or self-hosted models, embeddings and generation
 - **OpenAI** — including any OpenAI-compatible endpoint (vLLM, Azure OpenAI, OpenRouter, Together AI) via `baseUrl`
@@ -29,19 +29,19 @@ Four [backends](/reference/models/backends) ship with Harper 5.1:
 
 ### Tool calling and the agent loop
 
-Generation supports [tool calling](/reference/models/tool-calling). With `toolMode: 'return'` the caller dispatches tool calls; with `toolMode: 'auto'` Harper runs the model → tools → model loop in-process with caller-supplied handlers, budget caps (iterations, tokens), configurable parallelism and error recovery, result-size truncation, an optional per-invocation trace, and a hook for persisting conversation turns.
+Generation supports [tool calling](/reference/v5/models/tool-calling). With `toolMode: 'return'` the caller dispatches tool calls; with `toolMode: 'auto'` Harper runs the model → tools → model loop in-process with caller-supplied handlers, budget caps (iterations, tokens), configurable parallelism and error recovery, result-size truncation, an optional per-invocation trace, and a hook for persisting conversation turns.
 
 ### Automatic embeddings with `@embed`
 
-The new [`@embed` schema directive](/reference/database/schema#embed) computes a record's embedding vector at write time from a source field, using a configured embedding model, and implicitly maintains an HNSW vector index on the attribute. Replicated writes carry the vector rather than re-embedding on every node.
+The new [`@embed` schema directive](/reference/v5/database/schema#embed) computes a record's embedding vector at write time from a source field, using a configured embedding model, and implicitly maintains an HNSW vector index on the attribute. Replicated writes carry the vector rather than re-embedding on every node.
 
 ### Vector indexing improvements
 
-- [Int8 quantization](/reference/database/schema#vector-quantization) (`quantization: "int8"`) shrinks HNSW indexes, with nearest-neighbor results re-ranked against full-precision vectors to preserve exact ordering.
-- [Per-query `ef`](/reference/database/schema#per-query-search-options) overrides the search exploration budget on individual queries, and the default search budget now auto-scales with index size so recall holds as tables grow.
+- [Int8 quantization](/reference/v5/database/schema#vector-quantization) (`quantization: "int8"`) shrinks HNSW indexes, with nearest-neighbor results re-ranked against full-precision vectors to preserve exact ordering.
+- [Per-query `ef`](/reference/v5/database/schema#per-query-search-options) overrides the search exploration budget on individual queries, and the default search budget now auto-scales with index size so recall holds as tables grow.
 - A `dotProduct` distance function joins `cosine` and `euclidean`.
 - Search-only parameter changes (`efConstructionSearch`) no longer trigger an index rebuild.
 
 ### Model-call analytics
 
-Every model call is recorded in the [`hdb_model_calls` system table](/reference/models/analytics) (per-call latency, tokens, success/error), with aggregate call and token counters emitted to Harper's analytics, broken down by backend.
+Every model call is recorded in the [`hdb_model_calls` system table](/reference/v5/models/analytics) (per-call latency, tokens, success/error), with aggregate call and token counters emitted to Harper's analytics, broken down by backend.

From fe0cc20d9635c41f58b2228e0fc16a6a30dc7845 Mon Sep 17 00:00:00 2001
From: Nathan Heskew <nathan@harperdb.io>
Date: Tue, 9 Jun 2026 20:50:08 -0700
Subject: [PATCH 6/7] AGENTS.md: add Versioning Content conventions

VersionBadge tagging for minor-version availability (new vs changed),
version derivation from the core release, release-notes-per-minor
placement, versioned /reference/v5/ link paths from other content
trees, and feature/docs PR cross-linking. Gives the engineering
guidelines a single place to point at for docs mechanics.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 AGENTS.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index a50a847c..5b9c9d7f 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -48,6 +48,14 @@ Prefer plain ASCII characters in Markdown unless a typographic character is genu
 - The `<VersionBadge>` component is globally registered — no import needed in `.md`/`.mdx` files.
 - See the complete repository organization in `CONTRIBUTING.md`
 
+## Versioning Content
+
+- Tag minor-version availability inline: `<VersionBadge version="vX.Y.0" />` for new surface, `<VersionBadge type="changed" version="vX.Y.0" />` for behavior changes to existing surface.
+- Derive the version from the core release the change ships in, stripping prerelease suffixes (`5.1.0-beta.1` → `v5.1.0`).
+- Each minor release gets a file under `release-notes/<major-codename>/` (e.g. `release-notes/v5-lincoln/5.1.md`); the sidebar picks it up automatically.
+- Absolute links from `release-notes/` (or `learn/`) into current reference docs use the versioned path `/reference/v5/...` — the reference plugin maps the current version to the `v5` URL path.
+- When documenting a change from a core/pro PR, cross-link the feature PR and the docs PR in both descriptions.
+
 ## Testing
 
 - There is no automated test suite. Verification is done by running the dev server or build.

From 44bfe49f3fb1f8b323d0939d5929736b593ecb7a Mon Sep 17 00:00:00 2001
From: Nathan Heskew <nathan@harperdb.io>
Date: Thu, 11 Jun 2026 14:36:39 -0700
Subject: [PATCH 7/7] Drop the 5.1 release-notes draft from this PR

Release notes will be authored separately (Kris owns them); this PR
stays scoped to the reference documentation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 release-notes/v5-lincoln/5.1.md | 47 ---------------------------------
 1 file changed, 47 deletions(-)
 delete mode 100644 release-notes/v5-lincoln/5.1.md

diff --git a/release-notes/v5-lincoln/5.1.md b/release-notes/v5-lincoln/5.1.md
deleted file mode 100644
index c5d4edc3..00000000
--- a/release-notes/v5-lincoln/5.1.md
+++ /dev/null
@@ -1,47 +0,0 @@
----
-title: '5.1'
----
-
-# 5.1 Release Notes
-
-<!-- TODO: this file currently covers the AI/models feature set; remaining 5.1 changes to be added before release. -->
-
-### Patch Releases
-
-All patch release notes for 5.1.x are available on the [releases page](https://github.com/HarperFast/harper/releases?q=v5.1&expanded=true).
-
-## AI & Models
-
-Harper 5.1 introduces a unified [model-access API](/reference/v5/models/overview) for calling AI models from application code. Models are configured under logical names and served by pluggable backends, so switching providers is a configuration change rather than a code change.
-
-### Models API
-
-The new `models` object (importable from `harper`, also available as `scope.models`) exposes [`embed()`, `generate()`, and `generateStream()`](/reference/v5/models/api) for text embeddings, completions, and streamed completions. Calls accept per-call options (temperature, token limits, structured output via JSON Schema, abort signals) and normalize results — finish reasons, token usage, tool calls — across backends.
-
-### Backends
-
-Four [backends](/reference/v5/models/backends) ship with Harper 5.1:
-
-- **Ollama** — local or self-hosted models, embeddings and generation
-- **OpenAI** — including any OpenAI-compatible endpoint (vLLM, Azure OpenAI, OpenRouter, Together AI) via `baseUrl`
-- **Anthropic** — Claude models for generation and tool calling
-- **Amazon Bedrock** — AWS-hosted model families (Anthropic, Meta, Titan, Cohere) via the AWS credential chain
-
-### Tool calling and the agent loop
-
-Generation supports [tool calling](/reference/v5/models/tool-calling). With `toolMode: 'return'` the caller dispatches tool calls; with `toolMode: 'auto'` Harper runs the model → tools → model loop in-process with caller-supplied handlers, budget caps (iterations, tokens), configurable parallelism and error recovery, result-size truncation, an optional per-invocation trace, and a hook for persisting conversation turns.
-
-### Automatic embeddings with `@embed`
-
-The new [`@embed` schema directive](/reference/v5/database/schema#embed) computes a record's embedding vector at write time from a source field, using a configured embedding model, and implicitly maintains an HNSW vector index on the attribute. Replicated writes carry the vector rather than re-embedding on every node.
-
-### Vector indexing improvements
-
-- [Int8 quantization](/reference/v5/database/schema#vector-quantization) (`quantization: "int8"`) shrinks HNSW indexes, with nearest-neighbor results re-ranked against full-precision vectors to preserve exact ordering.
-- [Per-query `ef`](/reference/v5/database/schema#per-query-search-options) overrides the search exploration budget on individual queries, and the default search budget now auto-scales with index size so recall holds as tables grow.
-- A `dotProduct` distance function joins `cosine` and `euclidean`.
-- Search-only parameter changes (`efConstructionSearch`) no longer trigger an index rebuild.
-
-### Model-call analytics
-
-Every model call is recorded in the [`hdb_model_calls` system table](/reference/v5/models/analytics) (per-call latency, tokens, success/error), with aggregate call and token counters emitted to Harper's analytics, broken down by backend.