diff --git a/README.md b/README.md
index 386d77e..40a5866 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,16 @@
# api-test
+[](https://github.com/plexara/api-test/actions/workflows/ci.yml)
+[](https://github.com/plexara/api-test/actions/workflows/codeql.yml)
+[](https://pkg.go.dev/github.com/plexara/api-test)
+[](LICENSE)
+
A controllable HTTP REST fixture used to exercise API gateways (Plexara's
in particular). Sister project to [mcp-test](../mcp-test), which plays the
same role for the MCP gateway.
+📖 **Documentation: **
+
## Why
Plexara MCP exposes two gateway capabilities:
@@ -17,9 +24,29 @@ Plexara MCP exposes two gateway capabilities:
`api-test` is the upstream HTTP fixture the API gateway calls. Endpoints
are deliberately simple and deterministic; their job is not to compute
anything useful, it's to make the gateway's behavior observable. Every
-request will be recorded in a Postgres-backed audit log so you can
-compare what a client sent through Plexara, what reached this server, and
-what came back.
+request is recorded in a Postgres-backed audit log so you can compare
+what a client sent through Plexara, what reached this server, and what
+came back.
+
+### Why not httpbin / mockoon / Prism?
+
+Those are great mocks. api-test is a different shape:
+
+- **Audit log of every request** — sanitized headers, query params,
+ request and response bodies, identity, latency, status — queryable in
+ Postgres and browsable from the embedded portal. Mocks tell you they
+ served a request; api-test tells you *what the gateway sent*.
+- **Real inbound auth** — file API keys, bcrypt-hashed Postgres-backed
+ keys, static bearer tokens, and OIDC JWT validation. Mocks let
+ anything through; api-test rejects bad credentials the way a real
+ upstream does.
+- **Gateway-specific endpoint groups** — one endpoint per pagination
+ cursor style the gateway recognizes; one endpoint per security probe
+ the gateway should reject; failure modes with seeded determinism so
+ retry/timeout tests are reproducible.
+- **In-tree OpenAPI** — every route is published at `/openapi.json`,
+ generated from the same metadata the portal uses, so the gateway's
+ `api_list_endpoints` tool sees an exact contract.
## Endpoint groups
@@ -33,12 +60,39 @@ what came back.
for retry/timeout policy testing.
- **echo** — `ANY /v1/echo`. Generic catch-all that returns the request
verbatim (with auth headers redacted).
-
-Coming in later milestones: streaming (chunked, SSE, NDJSON), pagination
-(Link, OData, cursor variants), method matrix, security probes, export
-(large/long-running targets for `api_export`), the OpenAPI document,
-inbound auth (bearer/api_key/OAuth2), audit log, web portal, mkdocs
-site, and CI/release tooling.
+- **streaming** — chunked, SSE, and NDJSON variants for stream-proxy
+ testing.
+- **pagination** — RFC 5988 Link, OData v4, and opaque-cursor styles;
+ same synthetic dataset under all three so cross-style assertions are
+ bit-equal.
+- **methods** — HTTP method matrix (GET/POST/PUT/PATCH/DELETE/HEAD/OPTIONS)
+ for verifying method pass-through.
+- **security** — SSRF / path-traversal / admin-prefix probes the
+ gateway should reject upstream.
+- **export** — long-running / large-payload targets for the gateway's
+ `api_export` tool.
+
+## Web portal
+
+A React SPA embedded into the binary (`/portal/`) gives you a browsable
+audit log with filters, charts, request/response payload drawers, an
+endpoint catalog, and a Try-It form that proxies through the same auth
+chain as `/v1/*`. Sign in with OIDC or paste an API key.
+
+
+
+## Prerequisites
+
+- **Go 1.26+** for `go run` / `make build`.
+- **Docker** for the full stack (Postgres + Keycloak via
+ `docker-compose.dev.yml`) and for the integration test suite
+ (testcontainers).
+- **pnpm + Node 20+** only if you change the SPA (`make ui`). The
+ pre-built bundle ships embedded; you can run the binary without
+ Node installed.
+
+Postgres is **optional**: `make dev-anon` runs the binary with no DB,
+no audit, no portal — fastest loop for endpoint work.
## Quickstart
@@ -51,29 +105,41 @@ curl -s http://localhost:8080/v1/status/418
curl -s -X POST http://localhost:8080/v1/echo -H 'Content-Type: application/json' -d '{"hi":1}'
```
-`make dev-anon` does the same. `make build` produces `./bin/api-test`.
+- `make dev-anon` — same thing: anonymous mode, no Postgres, no Keycloak.
+- `make dev` — full stack: Postgres + Keycloak + portal. First run writes
+ `.env.dev` with random API-key + cookie secrets (gitignored, reused).
+- `make build` — produces `./bin/api-test`.
## Tests
```bash
-go test ./... # unit + in-memory tests; no Docker required
-make test # alias: go test -race -count=1 ./...
-make verify # CI-equivalent: fmt, vet, test, lint, security, coverage gate
+make test # unit + in-memory; go test -race -count=1 ./...
+make integration # //go:build integration; testcontainers Postgres (needs Docker)
+make verify # CI-equivalent gate: fmt + vet + lint + test + security + coverage + codeql
```
-Integration tests requiring testcontainers Postgres land in.
+`make verify` is the single source of truth for "is this tree
+shippable" — same commands CI runs. The pre-commit hook reads
+`.claude/.last-verify-passed` (written only after a full pass).
## Layout
```
cmd/api-test # binary entry
-internal/server # composition root (config + endpoints + httpsrv)
-pkg/build # version metadata stamped at link time
+internal/server # composition root (config + endpoints + httpsrv + portal)
+internal/ui # //go:embed all:dist — SPA bundle
+pkg/apikeys # Postgres-backed bcrypt API keys
+pkg/audit # Event/Payload model, AsyncLogger, in-memory + Postgres stores
+pkg/auth/inbound # APIKey / Bearer / OIDC authenticators + Chain
pkg/config # YAML loader + ${VAR:-default} env interpolation
-pkg/endpoints # Endpoints interface + registry
-pkg/endpoints/{...} # one package per group (identity, data, failure, echo)
-pkg/httpsrv # HTTP mux composition + health/readiness + CORS
-configs/ # *.dev.yaml, *.live.yaml, *.example.yaml
+pkg/database # pgxpool wrapper + golang-migrate runner
+pkg/endpoints/{...} # identity, data, failure, echo, streaming, pagination,
+ # methods, security, export
+pkg/httpmw # RequestID, AccessLog, Identity, Audit middleware
+pkg/httpsrv # mux composition + portal API + SPA serving + health
+pkg/oapi # in-tree OpenAPI 3.x generator (reflection-based)
+configs/ # *.dev.yaml (anon), *.live.yaml (full), *.example.yaml
+ui/ # React + Vite + Tailwind portal source
```
## License
diff --git a/docs/configuration/auth.md b/docs/configuration/auth.md
index 89b368c..687d406 100644
--- a/docs/configuration/auth.md
+++ b/docs/configuration/auth.md
@@ -24,6 +24,12 @@ chain returns 401 immediately. This prevents accidental cross-mode
matches (a typo'd JWT shouldn't accidentally pass the static-bearer
list).
+To verify the chain end-to-end, hit
+[`GET /v1/whoami`](../endpoints/identity.md#whoami) — it echoes the
+resolved `auth_type` and `subject`, so you can confirm the credential
+the gateway is actually sending. The auth pipeline diagram and the
+data-flow notes live in [Architecture › Auth chain](../reference/architecture.md#auth-chain).
+
## File API keys
Simplest, no DB required.
@@ -153,6 +159,15 @@ safe to run with anonymous + a few static keys: clients that send a
valid key get their identity, clients that send nothing get anonymous,
clients that send a bad key get 401.
+!!! warning "Don't expect bad-credential demotion"
+ `allow_anonymous: true` is **not** "let anything in." A typo'd API
+ key, an expired bearer token, or a JWT signed by the wrong key all
+ still return 401. The anonymous fallback only fires when there is
+ no credential header at all. If you want to allow truly
+ unauthenticated callers from a script while still allowing keyed
+ callers, make sure the script sends no `X-API-Key` or
+ `Authorization` header — not a placeholder.
+
## Portal browser login
The portal uses a standard OIDC PKCE flow: hit `/portal/`, redirect to
diff --git a/docs/configuration/reference.md b/docs/configuration/reference.md
index 19e089c..eab8e8e 100644
--- a/docs/configuration/reference.md
+++ b/docs/configuration/reference.md
@@ -32,9 +32,24 @@ This page is the human-friendly tour.
| Key | Default | Description |
| --- | --- | --- |
-| `allow_anonymous` | `false` | Falls back to anonymous identity when no inbound credential matches. |
-| `require_for_api` | `false` | **Reserved** for per-surface gating; the inbound chain is currently gated by `allow_anonymous` alone. The shipped `live.yaml` opts in to `true`. |
-| `require_for_portal` | `false` | **Reserved**, same shape. The shipped `live.yaml` opts in to `true`. |
+| `allow_anonymous` | `false` | Falls back to anonymous identity when no inbound credential matches. The single switch that today gates **all** unauthenticated access — see note below. |
+| `require_for_api` | `false` | **Loaded but not yet wired.** Intended for per-surface gating once the API and portal can require auth independently. The shipped `live.yaml` opts in to `true` so existing configs keep working when the gate lands. |
+| `require_for_portal` | `false` | **Loaded but not yet wired**, same shape as above. |
+
+!!! note "What actually gates each surface today"
+ - **`/v1/*`** — the [inbound auth chain](auth.md). When
+ `allow_anonymous: false`, every endpoint requires a credential
+ (API key, bearer, or OIDC); a missing credential returns 401.
+ When `allow_anonymous: true`, missing credentials get an
+ anonymous identity, but a *bad* credential still 401s.
+ - **`/portal/`** — `portal.enabled` mounts it; the SPA itself is
+ reachable without a session, but the portal API
+ (`/api/v1/portal/*`) requires a session cookie or an API key.
+ Sign in via OIDC (`oidc.enabled`) or paste an API key from the
+ file/DB store on the portal sign-in screen.
+ - **Health and well-known** (`/healthz`, `/readyz`,
+ `/.well-known/*`) — never gated; live outside both the auth
+ chain and the audit middleware.
## `api_keys`
diff --git a/docs/endpoints/methods.md b/docs/endpoints/methods.md
index fc26d5b..1562c7b 100644
--- a/docs/endpoints/methods.md
+++ b/docs/endpoints/methods.md
@@ -12,35 +12,127 @@ HTTP method survives the proxy hop unchanged.
| --- | --- | --- |
| `GET`, `POST`, `PUT`, `PATCH`, `DELETE`, `HEAD`, `OPTIONS` | `/v1/method/echo` | `{ "method": "POST", "path": "/v1/method/echo", "query": {...} }` |
-`HEAD` returns headers only (per RFC 7231). `OPTIONS` returns the body
-plus an `Allow` header listing every supported verb.
+The same handler serves every verb. The supported matrix is fixed at
+seven verbs, because the gateway behaviors this group probes are
+mostly about those common verbs surviving proxy traversal.
-`CONNECT` and `TRACE` are not registered; Go's `http.ServeMux` answers
-them with `405 Method Not Allowed` because other verbs are registered
-for the same path.
+## Response shape
+
+```json
+{
+ "method": "PATCH",
+ "path": "/v1/method/echo",
+ "query": { "foo": ["1", "2"] }
+}
+```
+
+| Field | Notes |
+| --- | --- |
+| `method` | The verb the server observed. Should equal the verb the client sent — that's the assertion. |
+| `path` | Always `/v1/method/echo`. Confirms the gateway didn't rewrite the path along with the method. |
+| `query` | The parsed query string. `omitempty` — absent on requests with no query. Use this to assert the gateway preserved query params under unusual verbs (e.g., a `DELETE` with a `?reason=...` parameter). |
+
+Two verbs have special-case behavior:
+
+- **`HEAD`** — returns headers only; the body is suppressed at the HTTP
+  layer (Go's `http.ResponseWriter` automatically discards the body on
+  `HEAD`). The headers are otherwise byte-equivalent to those of a
+  `GET`. Use `curl -I` or `curl -is | head -1` to inspect.
+- **`OPTIONS`** — returns the body plus an `Allow` header listing every
+ supported verb:
+
+ ```http
+ HTTP/1.1 200 OK
+ Allow: GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS
+ Content-Type: application/json
+ ```
+
+ Useful to confirm the gateway forwards `OPTIONS` rather than
+ intercepting it for CORS handling.
+
+## Unregistered verbs
+
+`CONNECT`, `TRACE`, `LINK`, `UNLINK`, `PROPFIND`, and other less-common
+verbs are not registered. Go's `http.ServeMux` answers them with
+`405 Method Not Allowed` because *other* verbs are registered for the
+same path. The 405 itself is informative — it tells you the gateway
+forwarded the request, just to a path that doesn't accept that verb.
+
+```bash
+curl -is -X CONNECT http://localhost:8080/v1/method/echo | head -1
+# HTTP/1.1 405 Method Not Allowed
+```
+
+If the gateway *blocks* `CONNECT`/`TRACE` upstream (most should), you
+won't see a 405 — you'll see whatever the gateway returns for a
+blocked verb. That's also a useful signal.
## Examples
```bash
+# Verb preservation
curl -s -X PATCH http://localhost:8080/v1/method/echo
# {"method":"PATCH","path":"/v1/method/echo"}
+# HEAD: headers only, no body
curl -is -X HEAD http://localhost:8080/v1/method/echo | head -1
# HTTP/1.1 200 OK
+# OPTIONS: body + Allow header
curl -is -X OPTIONS http://localhost:8080/v1/method/echo
# HTTP/1.1 200 OK
# Allow: GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS
+# Content-Type: application/json
+# ...
+# {"method":"OPTIONS","path":"/v1/method/echo"}
+
+# Query preservation under non-GET
+curl -s -X DELETE 'http://localhost:8080/v1/method/echo?reason=cleanup&id=42'
+# {"method":"DELETE","path":"/v1/method/echo","query":{"id":["42"],"reason":["cleanup"]}}
-curl -s -X CONNECT http://localhost:8080/v1/method/echo
-# (405 Method Not Allowed)
+# Unregistered verb
+curl -is -X CONNECT http://localhost:8080/v1/method/echo | head -1
+# HTTP/1.1 405 Method Not Allowed
```
+## What to assert
+
+For a gateway proxying api-test:
+
+| Assertion | Means |
+| --- | --- |
+| Response `method` equals the client's verb | Gateway preserved the verb verbatim. |
+| Response `query` matches the client's query string | Gateway didn't strip or reorder query params under this verb. |
+| `OPTIONS` returns 200 with `Allow` header | Gateway didn't swallow the response inside a CORS pre-flight handler. |
+| `HEAD` returns 200 with no body | Gateway didn't substitute a `GET` body on a `HEAD` response. |
+
+## Audit-log perspective
+
+Each verb registers as its own `EndpointMeta` (`method_get`,
+`method_post`, …). The shared handler means the same Go code services
+all seven, but the audit row's `route_name` carries the verb-specific
+name, so you can `GROUP BY route_name` to count calls per verb:
+
+```sql
+SELECT route_name, count(*)
+FROM audit_events
+WHERE endpoint_group = 'methods'
+ AND ts > now() - interval '1 hour'
+GROUP BY route_name
+ORDER BY 2 DESC;
+```
+
+If you expected the client to send 50 `PATCH`es through the gateway and
+the count comes back showing 50 `POST`es, the gateway is rewriting the
+verb — that's exactly the kind of finding this group is built to make
+visible.
+
## Why this exists
Gateway proxies sometimes break verbs in subtle ways: rewriting `PATCH`
to `POST` to fit a stricter client library, swallowing `OPTIONS`
-pre-flight responses inside a CORS layer, or refusing `HEAD` because
-the upstream handler doesn't register it explicitly. This endpoint
-exposes every verb at one path so a tester can spot any of those
-rewrites with a single curl loop.
+pre-flight responses inside a CORS layer, refusing `HEAD` because the
+upstream handler doesn't register it explicitly, or stripping query
+strings on verbs that "shouldn't have a body so probably shouldn't have
+query either." This endpoint exposes every verb at one path so a
+tester can spot any of those rewrites with a single curl loop.
diff --git a/docs/endpoints/overview.md b/docs/endpoints/overview.md
index ffafeb7..07a340c 100644
--- a/docs/endpoints/overview.md
+++ b/docs/endpoints/overview.md
@@ -33,11 +33,11 @@ call, the body it got back is bit-for-bit predictable.
| [Data](data.md) | Deterministic bodies for caching / dedup / size handling. |
| [Failure](failure.md) | Controlled error codes, latency, seeded flake. |
| [Echo](echo.md) | Generic catch-all that returns the request verbatim. |
-| Streaming | Chunked, SSE, NDJSON responses. |
-| Pagination | One endpoint per cursor style the gateway recognizes. |
-| Methods | Method matrix on `/v1/method/echo`. |
-| Security | Probe targets the gateway should refuse to forward. |
-| Export | Large/long-running targets exercising `api_export`. |
+| [Streaming](streaming.md) | Chunked, SSE, NDJSON responses. |
+| [Pagination](pagination.md) | One endpoint per cursor style the gateway recognizes. |
+| [Methods](methods.md) | Method matrix on `/v1/method/echo`. |
+| [Security](security.md) | Probe targets the gateway should refuse to forward. |
+| [Export](export.md) | Large/long-running targets exercising `api_export`. |
## Toggling groups
diff --git a/docs/endpoints/security.md b/docs/endpoints/security.md
index 8c49c56..2a19768 100644
Binary files a/docs/endpoints/security.md and b/docs/endpoints/security.md differ
diff --git a/docs/getting-started/overview.md b/docs/getting-started/overview.md
index cf8d2ef..ad8e9f2 100644
--- a/docs/getting-started/overview.md
+++ b/docs/getting-started/overview.md
@@ -81,8 +81,8 @@ use mcp-test to validate MCP-gateway behavior.
- [Installation](installation.md) — download the binary or pull the
container image.
-- [Quickstart](quickstart.md) — `make dev` runs the binary in
- anonymous mode today; the full Postgres + Keycloak + portal stack
- lands with.
+- [Quickstart](quickstart.md) — `make dev` brings up the full
+  Postgres + Keycloak + portal stack; `make dev-anon` skips all three
+  for the fastest iteration.
- [Register with Plexara](register-with-plexara.md) — wire api-test in
as a connection.
diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md
index 7519f3a..1141ee1 100644
--- a/docs/getting-started/quickstart.md
+++ b/docs/getting-started/quickstart.md
@@ -11,24 +11,21 @@ cd api-test
make dev
```
-Today, `make dev` is aliased to `make dev-anon` and runs the binary
-against `configs/api-test.dev.yaml` (anonymous, no Postgres, no
-Keycloak). That's the happy path; everything below the
-`/healthz` row in the table works.
+`make dev` spins up the full local stack: starts Postgres + Keycloak
+via `docker-compose.dev.yml`, waits for both to be ready, builds the
+SPA into `internal/ui/dist/` if missing, and runs the binary against
+`configs/api-test.live.yaml`. On the first run it writes `.env.dev`
+with random cookie / API-key / bearer secrets (gitignored, reused on
+subsequent runs).
+
+For the fastest iteration loop without standing up Postgres or
+Keycloak, use `make dev-anon` — anonymous mode, no audit, no portal:
```text
-make dev → go run ./cmd/api-test --config configs/api-test.dev.yaml
+make dev-anon → go run ./cmd/api-test --config configs/api-test.dev.yaml
```
-The full Postgres + Keycloak + portal stack lands with. Once shipped,
-`make dev` will spin up the compose stack
-(`docker-compose.dev.yml`), poll containers, build the SPA into
-`internal/ui/dist`, and run the binary against
-`configs/api-test.live.yaml`. The
-[milestone status](../reference/releases.md) tracks where each piece
-stands.
-
-When it's up (today, anonymous mode):
+When the stack is up:
@@ -80,10 +77,10 @@ EOF
go run ./cmd/api-test --config /tmp/api-test-auth.yaml
```
-`make dev-secrets` (already in the Makefile) writes a gitignored
-`.env.dev` with random `APITEST_DEV_KEY` / `APITEST_DEV_BEARER` /
-`APITEST_COOKIE_SECRET` values; M3's full `make dev` will source it
-automatically.
+`make dev-secrets` (idempotent — only writes if missing) creates a
+gitignored `.env.dev` with random `APITEST_DEV_KEY` /
+`APITEST_DEV_BEARER` / `APITEST_COOKIE_SECRET` values; `make dev`
+sources it automatically.
## Verify it works
@@ -123,8 +120,15 @@ query string), or `-H "Authorization: Bearer dev-bearer-1"`.
## Stop the stack
-In the foreground binary's terminal: `Ctrl-C`. Once M3 lands, `make
-dev-down` will also tear down the compose stack.
+In the foreground binary's terminal: `Ctrl-C`. To tear down the
+Postgres + Keycloak containers as well:
+
+```bash
+make dev-down # stops containers, keeps volumes (Postgres data persists)
+```
+
+Add `-v` to the underlying compose command if you want to wipe the
+audit history along with the containers.
## Next
diff --git a/docs/operations/audit.md b/docs/operations/audit.md
index 8586fb9..fe4ed58 100644
--- a/docs/operations/audit.md
+++ b/docs/operations/audit.md
@@ -15,7 +15,10 @@ The pipeline is async: the request handler enqueues into a buffered
channel; a background goroutine drains into Postgres. A stalled DB
can never inflate request latency. On a full buffer the event is
*dropped* and counted (logged every 1000th drop). For lossless audit,
-size the buffer for your peak rate.
+size the buffer for your peak rate. See
+[Architecture › Audit pipeline](../reference/architecture.md#audit-pipeline)
+for the data-flow diagram, and [Deployment › Logging](deployment.md#logging)
+for how to correlate audit rows with the access log via `request_id`.
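
The drop-on-full behavior described above amounts to a non-blocking channel send. The following is a minimal sketch of that pattern, not the project's `AsyncLogger`: `dropLogger`, `event`, and the `warn` callback are hypothetical names, and the background worker that drains into Postgres is left out.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// event is a placeholder for an audit record.
type event struct{ Method, Path string }

// dropLogger enqueues without ever blocking the caller.
type dropLogger struct {
	events  chan event
	dropped atomic.Uint64
	warn    func(total uint64)
}

func newDropLogger(depth int, warn func(uint64)) *dropLogger {
	return &dropLogger{events: make(chan event, depth), warn: warn}
}

// Enqueue reports whether the event was accepted. On a full buffer the
// event is dropped and counted; a warning fires on the 1st, 1001st,
// 2001st, ... drop.
func (l *dropLogger) Enqueue(e event) bool {
	select {
	case l.events <- e:
		return true
	default:
		total := l.dropped.Add(1)
		if total%1000 == 1 {
			l.warn(total)
		}
		return false
	}
}

func main() {
	l := newDropLogger(2, func(total uint64) {
		fmt.Println("audit buffer full; dropped_total =", total)
	})
	for i := 0; i < 5; i++ {
		l.Enqueue(event{"GET", "/v1/whoami"})
	}
	fmt.Println("queued:", len(l.events), "dropped:", l.dropped.Load())
}
```

The `select` with a `default` branch is what keeps a stalled consumer from adding latency to the request path: the send either succeeds immediately or is abandoned.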
## Schema
diff --git a/docs/operations/deployment.md b/docs/operations/deployment.md
index 85e077e..8fcc40d 100644
--- a/docs/operations/deployment.md
+++ b/docs/operations/deployment.md
@@ -109,19 +109,79 @@ limits at 1 vCPU / 512 MiB to absorb burst.
## Logging
-Structured JSON via slog, written to stderr. Override the level via
-`LOG_LEVEL=debug|info|warn|error`. Every line carries:
+Structured JSON via [slog](https://pkg.go.dev/log/slog), written to
+stderr. Override the level via
+`LOG_LEVEL=debug|info|warn|error`. Two line shapes you'll see most:
+
+**Access log** (one per inbound request, emitted by `AccessLog`
+middleware):
+
+```json
+{
+ "time": "2026-05-11T22:18:03.421Z",
+ "level": "INFO",
+ "msg": "request",
+ "method": "GET",
+ "path": "/v1/whoami",
+ "status": 200,
+ "bytes": 142,
+ "duration_ms": 3,
+ "request_id": "01HXYZ7Q8N5F0VTA9KM3B2P0WJ",
+ "auth_type": "apikey",
+ "subject": "demo-key"
+}
+```
+
+Field reference:
+
+| Field | When present | Source |
+| --- | --- | --- |
+| `time`, `level`, `msg` | Always | `slog` core. |
+| `method`, `path`, `status`, `bytes`, `duration_ms` | Always | `pkg/httpmw.AccessLog`. |
+| `request_id` | Always | Preserved from `X-Request-Id` if the caller set one, otherwise a fresh UUID; echoed back on the response. |
+| `auth_type`, `subject` | Only on routes that ran the per-route auth chain (`/v1/*` and the portal API) | Resolved identity holder seeded by `RequestID` and written by `Identity`. Health probes, well-known, and the SPA path are intentionally skipped. |
+
+**Audit-pipeline lines** are emitted by the `AsyncLogger` worker, not
+the request path:
+
+- `audit write failed` (WARN) — a DB write returned an error. Includes
+ `method`, `path`, `err`.
+- `audit buffer full; dropping events` (WARN) — emitted at the 1st,
+ 1001st, 2001st, … drop with the cumulative `dropped_total`. If you
+ see this regularly, raise the buffer size or scale Postgres.
+
+### Correlating one request across systems
-- `time` (RFC 3339 nano).
-- `level`, `msg`.
-- `method`, `path`, `status`, `bytes`, `duration_ms` for request lines.
-- `request_id` for traceability (generated or preserved from `X-Request-Id`).
-- `auth_type`, `subject` when the identity middleware ran.
+`request_id` is the join key:
+
+1. Caller sends `X-Request-Id: ` (or doesn't — api-test will mint
+ one and put it on the response).
+2. api-test echoes `X-Request-Id` on the response.
+3. The access-log line carries the same `request_id`.
+4. The `audit_events` row stores it in column `request_id`.
+
+In Plexara's own audit log, look up the same `request_id` to see what
+the gateway forwarded vs. what the upstream received.
+
+```sql
+-- Look up one request end-to-end
+SELECT ts, method, path, status, duration_ms, auth_type, subject
+FROM audit_events
+WHERE request_id = '01HXYZ7Q8N5F0VTA9KM3B2P0WJ';
+```
## Metrics
-Prometheus metrics endpoint lands in. Until then, derive metrics
-from the structured access log or query the audit table:
+api-test does not expose a `/metrics` endpoint today, and there are no
+current plans to add one — the audit table is the canonical
+observability surface, and the structured access log covers what
+Prometheus would. Derive request-rate / latency / error-rate metrics
+from either source:
+
+- **Access log** — pipe the JSON lines into your log pipeline and
+ aggregate on `path`, `status`, `duration_ms`, and `auth_type`.
+- **Audit table** — richer (full headers, payload sizes, identity),
+ cheap to query for ad-hoc analysis:
```sql
-- p50/p95 latency, last hour, by endpoint group
diff --git a/docs/operations/gateway-testing.md b/docs/operations/gateway-testing.md
index 7cde449..d91930a 100644
--- a/docs/operations/gateway-testing.md
+++ b/docs/operations/gateway-testing.md
@@ -152,25 +152,30 @@ the redaction policy isn't covering that key.
**Question**: did the gateway recognize the upstream's pagination
cursor?
-api-test exposes one endpoint per cursor style:
-
-- `/v1/page/link` — RFC 5988 `Link: <…>; rel="next"`.
-- `/v1/page/odata` — body field `@odata.nextLink`.
-- `/v1/page/cursor` — body field `next_cursor`.
-- `/v1/page/cursor-camel` — `nextCursor`.
-- `/v1/page/google` — `next_page_token`.
-- `/v1/page/google-camel` — `nextPageToken`.
-- `/v1/page/generic` — `next`.
-- `/v1/page/none` — single page, no cursor (negative test).
-- `/v1/page/mixed` — both Link header AND body cursor (precedence
- test; Link should win).
+api-test exposes one endpoint per cursor style the gateway's pagination
+detector recognizes:
+
+- `/v1/pagination/link` — RFC 5988 `Link: <…>; rel="next"` (also
+ `first`, `prev`, `last`). Paged with `?page=&per_page=`.
+- `/v1/pagination/odata` — OData v4 body field `@odata.nextLink` plus
+ `@odata.count`. Paged with `?$top=&$skip=`.
+- `/v1/pagination/cursor` — opaque base64 cursor in body field
+ `next_cursor`. Paged with `?cursor=&limit=`.
+
+All three slice the same deterministic synthetic dataset
+(`hex(sha256(id)[:8])`), so the items returned for page 2 of the Link
+endpoint should bit-match the items returned by walking the OData or
+cursor endpoint to the same offset. See the
+[Pagination endpoint reference](../endpoints/pagination.md) for the
+full parameter table.
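
Why the cross-style assertion holds can be seen in a small sketch: if every item is a pure function of its `id`, any two ways of reaching the same offset yield identical items. The code below is illustrative only. It assumes ids start at 1 and are hashed as decimal strings, which the source does not specify, and `itemFor`, `byPage`, and `bySkip` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

type item struct {
	ID    int
	Value string
}

// itemFor derives the value as hex(sha256(id)[:8]); hashing the decimal
// string form of the id is an assumption made for this sketch.
func itemFor(id int) item {
	sum := sha256.Sum256([]byte(strconv.Itoa(id)))
	return item{ID: id, Value: hex.EncodeToString(sum[:8])}
}

// window returns limit items starting after offset (ids assumed 1-based).
func window(offset, limit int) []item {
	out := make([]item, 0, limit)
	for id := offset + 1; id <= offset+limit; id++ {
		out = append(out, itemFor(id))
	}
	return out
}

// byPage models ?page=&per_page= addressing; bySkip models ?$top=&$skip=.
func byPage(page, perPage int) []item { return window((page-1)*perPage, perPage) }
func bySkip(skip, top int) []item     { return window(skip, top) }

func main() {
	a := byPage(2, 3)
	b := bySkip(3, 3)
	for i := range a {
		fmt.Println(a[i].ID, a[i].Value, a[i] == b[i])
	}
}
```

A gateway test can apply the same idea from the outside: fetch the same offset through two styles and compare the item bodies byte for byte.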
**Assertion**: gateway response envelope's `pagination` field:
-- For each style, the cursor value should match what api-test put on
- the wire.
-- For `/v1/page/none`, `pagination` should be absent or null.
-- For `/v1/page/mixed`, the gateway should prefer the Link header.
+- For each style, the cursor / next-page value should match what
+ api-test put on the wire (no host rewrite, no re-encoding).
+- Item bodies for the same `id` are byte-equal across all three styles.
+- Requesting past the last page returns 400 from api-test; the gateway
+ should surface that, not collapse it into a tool-level error.
## Snapshot fixtures
diff --git a/docs/operations/troubleshooting.md b/docs/operations/troubleshooting.md
new file mode 100644
index 0000000..6b69392
--- /dev/null
+++ b/docs/operations/troubleshooting.md
@@ -0,0 +1,169 @@
+---
+title: Troubleshooting
+description: Common api-test pitfalls and how to diagnose them — 401s, missing audit rows, empty portals, OIDC failures, integration-test flakes.
+---
+
+# Troubleshooting
+
+A list of failure modes that come up often enough to be worth writing
+down. Every entry: symptom, what it actually means, fix.
+
+## 401 Unauthorized everywhere
+
+**Symptom**: every `/v1/*` call returns 401, even with a key that
+worked yesterday.
+
+**Likely causes**:
+
+- `auth.allow_anonymous: false` and the key isn't in any store. Confirm
+ with `make dev-anon` (anonymous mode) — if that works, the issue is
+ the credential, not the wiring.
+- Bad credential, not missing. **A bad key does not fall back to
+ anonymous** even when `allow_anonymous: true`. Send no auth header
+ at all to take the anonymous path. See
+ [Authentication › Anonymous mode](../configuration/auth.md#anonymous-mode).
+- File-store key value is `${APITEST_DEV_KEY}` and the env var is
+  empty or unset. The `${VAR:-default}` interpolation lets you set a
+  fallback; without a `:-` fallback, `${VAR}` yields a usable key only
+  if `VAR` is set.
+
+**Diagnose**:
+
+```bash
+curl -i http://localhost:8080/v1/whoami # see WWW-Authenticate header
+curl -i -H "X-API-Key: $APITEST_DEV_KEY" http://localhost:8080/v1/whoami
+```
+
+The `WWW-Authenticate` response header tells you whether api-test saw
+"no credential" (`Bearer realm="api-test"`) or "bad credential"
+(`Bearer realm="api-test", error="invalid_token"`).
+
+## 401 on the portal API only
+
+**Symptom**: the SPA loads, but every `/api/v1/portal/*` request 401s.
+
+**Likely causes**:
+
+- No portal session cookie *and* no API key on the request. The portal
+ requires one or the other. Sign in via OIDC, or paste an API key on
+ the sign-in screen.
+- `portal.cookie_secure: true` over plain HTTP. The browser refuses to
+ send a `Secure` cookie back to a non-TLS endpoint. Either run behind
+ TLS or flip the flag off for local dev.
+- `portal.cookie_secret` is empty. The session store fails to start
+ cleanly with no secret; check the boot log for `session store:`.
+
+## /portal/ returns `{"status":"banner"...}` instead of the SPA
+
+**Symptom**: visiting `/portal/` in a browser shows raw JSON, not the
+React UI.
+
+**Cause**: `internal/ui/dist/` only contains `.gitkeep` — the
+`//go:embed` is empty, so the mux falls back to a stub JSON banner.
+
+**Fix**:
+
+```bash
+make ui # builds ui/dist/ → internal/ui/dist/
+make build # rebuild the binary so the embed picks up the bundle
+```
+
+`make build` (and `make verify`) refuse to build when the embed is
+empty. Bare `go build ./...` does not — that's the path that produces
+this surprise.
+
+## Audit log is empty in the portal
+
+**Symptom**: requests succeed, but `/portal/audit` is empty or
+out-of-date.
+
+**Likely causes**:
+
+- `audit.enabled: false` in config. The shipped `*.dev.yaml` profile
+ has it off; only `*.live.yaml` enables it.
+- `database.url` is empty. With `audit.enabled: true` and no database,
+ the binary fails to start; if it started, you're on the dev config.
+- Health, readiness, well-known, and the portal's own auth flow are
+ **intentionally** skipped — they don't generate audit rows. Only
+ `/v1/*` requests do.
+- The async buffer dropped the events. Check the binary's stderr for
+ `audit buffer full; dropping events`. Default depth is 4096; raise
+ it if you hit sustained drop warnings.
+
+## OIDC login redirects loop
+
+**Symptom**: the IdP redirects back to api-test, which redirects back
+to the IdP, repeatedly.
+
+**Likely causes**:
+
+- `oidc.issuer` mismatches the IdP's actual issuer claim. Visit
+ `${issuer}/.well-known/openid-configuration` and confirm the
+ `issuer` field in the response matches the config exactly (including
+ trailing slash).
+- `oidc.audience` doesn't match the IdP's token `aud` claim. Decode a
+ token at [jwt.io](https://jwt.io) and compare.
+- Clock skew. `oidc.clock_skew_seconds` defaults to 30; if the binary
+ and the IdP disagree by more, validation fails with `exp` or `nbf`
+ errors. Check the binary log for `oidc:` warnings.
+
+## `make integration` hangs or times out
+
+**Symptom**: integration suite stalls at "starting postgres container."
+
+**Likely causes**:
+
+- Docker isn't running. The `make integration` target gates on
+ `docker info`; if you see a hang, you started Docker after the
+ target gate ran.
+- Resource limits on the Docker VM. `testcontainers` pulls
+ `postgres:16-alpine` (~250 MiB) and needs ~512 MiB free.
+- Ryuk (the testcontainers reaper) is being blocked by a corporate
+ proxy. Set `TESTCONTAINERS_RYUK_DISABLED=true` if you trust your
+ own cleanup, or whitelist `quay.io/testcontainers/ryuk`.
+
+## `make verify` passes locally but CI fails
+
+**Symptom**: green `make verify`, red CI.
+
+**First check**: pinned tool versions in `Makefile`
+(`GOLANGCI_LINT_VERSION`, `GOSEC_VERSION`, `SEMGREP_VERSION`) must
+match the versions in `.github/workflows/ci.yml`. CI installs from
+those refs; the Makefile installs to `bin/tools/`. If the two drift
+apart, the same code can pass locally and fail in CI.
+
+**Second check**: `semgrep` is the most likely culprit. The Makefile
+warns on version drift but doesn't fail — if CI uses a newer rule
+set, it can flag code the local pinned version accepts. Run
+`pipx install --force semgrep==` to align.
+
+**Third check**: integration tests sometimes flake on `docker compose
+up` race conditions in CI. Re-run; if it persists, it's a real bug.
+
+## Plexara can't reach api-test
+
+**Symptom**: Plexara connection registration succeeds, but invoking
+the connection returns "upstream unreachable."
+
+**Likely causes**:
+
+- `server.base_url` doesn't match the actual reachable URL. Plexara
+ uses this for redirect and OpenAPI server URLs; if it points at
+ `localhost` while Plexara is in a different network namespace,
+ every redirect breaks.
+- TLS: api-test is plain HTTP behind a TLS-terminating LB and the
+ Plexara connection is configured `https://...`. Check that the LB
+ is actually forwarding to api-test.
+- Health probe disabled. Plexara may pre-flight `/healthz`; if you
+ blocked that path in front of api-test, the connection looks dead
+ even when `/v1/*` would work.
+
+## When in doubt
+
+- The [Architecture](../reference/architecture.md) diagrams document
+ the exact request flow.
+- The audit log (when enabled) is the source of truth for what
+ api-test actually saw — query it before assuming the gateway is at
+ fault.
+- File an issue with the binary's startup log (config, "listening",
+ any WARN/ERROR lines) at
+ .
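The issuer check from the "OIDC login redirects loop" entry above can be scripted instead of eyeballed. The following is a minimal sketch, not part of api-test: the names `fetchIssuer` and `describeMismatch` are hypothetical, and it assumes only that the IdP serves a standard OpenID Connect discovery document. It fetches the document and compares the advertised `issuer` with the configured value byte for byte, reporting separately when the two differ only by a trailing slash.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// discoveryDoc holds the one field of the discovery document this check needs.
type discoveryDoc struct {
	Issuer string `json:"issuer"`
}

// fetchIssuer (hypothetical helper) downloads
// <configured>/.well-known/openid-configuration and returns the "issuer"
// value the IdP advertises. A terminating slash is removed before the
// well-known path is appended.
func fetchIssuer(client *http.Client, configured string) (string, error) {
	url := strings.TrimSuffix(configured, "/") + "/.well-known/openid-configuration"
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("discovery returned %s", resp.Status)
	}
	var doc discoveryDoc
	if err := json.NewDecoder(resp.Body).Decode(&doc); err != nil {
		return "", err
	}
	return doc.Issuer, nil
}

// describeMismatch (hypothetical helper) returns "" when the two issuers
// are identical, and a short diagnosis otherwise.
func describeMismatch(configured, advertised string) string {
	switch {
	case configured == advertised:
		return ""
	case strings.TrimSuffix(configured, "/") == strings.TrimSuffix(advertised, "/"):
		return "issuers differ only by a trailing slash"
	default:
		return "issuers differ"
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: issuercheck <configured oidc.issuer>")
		return
	}
	configured := os.Args[1]
	client := &http.Client{Timeout: 10 * time.Second}
	advertised, err := fetchIssuer(client, configured)
	if err != nil {
		fmt.Println("discovery failed:", err)
		os.Exit(1)
	}
	if msg := describeMismatch(configured, advertised); msg != "" {
		fmt.Printf("%s: configured %q, IdP advertises %q\n", msg, configured, advertised)
		os.Exit(1)
	}
	fmt.Println("issuer matches:", advertised)
}
```

Run it with the exact string from `oidc.issuer` as the only argument. The comparison is deliberately strict, because the troubleshooting entry asks for an exact match including the trailing slash.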
diff --git a/docs/reference/architecture.md b/docs/reference/architecture.md
index ec91d23..ca079b2 100644
--- a/docs/reference/architecture.md
+++ b/docs/reference/architecture.md
@@ -119,6 +119,11 @@ matches.
## Audit pipeline
+For the schema, retention model, redaction rules, and query patterns,
+see [Audit log](../operations/audit.md). This section covers only the
+data flow.
+
+
```mermaid
flowchart LR
Handler["Request handler"] -->|Event{}| Async["AsyncLogger
(buffered channel)"]
diff --git a/docs/reference/releases.md b/docs/reference/releases.md
index bc65275..fd73c3b 100644
--- a/docs/reference/releases.md
+++ b/docs/reference/releases.md
@@ -39,9 +39,9 @@ The current milestone status:
| --- | --- | --- |
| M1 | HTTP fixture skeleton; identity / data / failure / echo groups | Released |
| M2 | DB + audit + non-OAuth inbound auth (file API key, bearer, DB key) | Released |
-| M3 | Portal + React SPA + Keycloak + OIDC JWT validator + browser OIDC PKCE | In progress |
-| M4 | In-tree OpenAPI generator + remaining groups (pagination, methods, security, export, streaming) | Planned |
-| M5 | Docs + CI + goreleaser + k8s examples + plexara-connection.yaml | Planned |
+| M3 | Portal + React SPA + Keycloak + OIDC JWT validator + browser OIDC PKCE | Released |
+| M4 | In-tree OpenAPI generator + remaining groups (pagination, methods, security, export, streaming) | Released |
+| M5 | Docs site + CI + goreleaser + Kubernetes examples + plexara-connection.yaml | In progress |
## Where to find releases
@@ -74,3 +74,28 @@ For a `vX` → `v(X+1)` major upgrade:
2. Test against staging first.
3. Plan a maintenance window if the audit schema migration needs to
rebuild indexes.
+
+## Breaking changes
+
+Each major and minor release records its breaking changes in the
+GitHub release notes — that's the canonical registry. For the pre-1.0
+era, individual milestone PRs (M1–M5) are the source of truth; the
+[git log on `main`](https://github.com/plexara/api-test/commits/main)
+records every config-key rename and schema migration. Migrations are
+forward-only, so a fresh deploy off `main` always boots; the friction
+is in-place upgrades against an existing audit history.
+
+Specifically watch for:
+
+- **Config-key renames**: `pkg/config` silently ignores unknown keys
+ (YAML deserialization is lenient). After an upgrade, grep your
+ config against [`configs/api-test.example.yaml`](https://github.com/plexara/api-test/blob/main/configs/api-test.example.yaml)
+ to confirm every key you set still exists.
+- **Audit schema**: backed by [golang-migrate](https://github.com/golang-migrate/migrate).
+ Migrations are forward-only and run automatically on boot; rolling
+ back to a prior binary is **not** safe without first restoring the
+ schema manually.
+- **Endpoint paths**: moving an endpoint between groups would
+ invalidate any audit-row analytics that filter on
+ `endpoint_group`. Endpoint paths are stable post-1.0; pre-1.0 path
+ changes are called out in the relevant PR.
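The config-key check described in the "Breaking changes" list above can be automated rather than done with grep. This is a minimal sketch under stated assumptions, not api-test code: `flattenKeys` and `staleKeys` are hypothetical names, and both configs are assumed to have already been decoded into nested `map[string]any` values by whatever YAML library you use. In the sample data, `oidc.clock_skew` stands in for a key that was renamed; it is made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// flattenKeys records every key in a decoded config as a dotted path,
// descending into nested maps.
func flattenKeys(prefix string, node map[string]any, out map[string]struct{}) {
	for k, v := range node {
		path := k
		if prefix != "" {
			path = prefix + "." + k
		}
		out[path] = struct{}{}
		if child, ok := v.(map[string]any); ok {
			flattenKeys(path, child, out)
		}
	}
}

// staleKeys returns, in sorted order, the dotted paths that are set in
// yours but do not appear anywhere in example.
func staleKeys(yours, example map[string]any) []string {
	have := map[string]struct{}{}
	known := map[string]struct{}{}
	flattenKeys("", yours, have)
	flattenKeys("", example, known)

	var stale []string
	for path := range have {
		if _, ok := known[path]; !ok {
			stale = append(stale, path)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Stand-in for the decoded example config shipped with the release.
	example := map[string]any{
		"audit": map[string]any{"enabled": true},
		"oidc":  map[string]any{"issuer": "", "clock_skew_seconds": 30},
	}
	// Stand-in for your deployed config; clock_skew is a hypothetical old name.
	yours := map[string]any{
		"audit": map[string]any{"enabled": true},
		"oidc":  map[string]any{"issuer": "https://idp.example.com/", "clock_skew": 30},
	}
	fmt.Println(staleKeys(yours, example)) // prints [oidc.clock_skew]
}
```

Any path this reports is a key the binary would silently ignore, so it is worth checking against the release notes before the upgrade rather than after.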
diff --git a/mkdocs.yml b/mkdocs.yml
index 4921f1a..fab89ad 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -86,6 +86,7 @@ nav:
- Portal: operations/portal.md
- Deployment: operations/deployment.md
- Testing a Gateway: operations/gateway-testing.md
+ - Troubleshooting: operations/troubleshooting.md
- Reference:
- HTTP API: reference/http-api.md
- Architecture: reference/architecture.md