From d09c032a6212d21f36df4e4663a86a2c234bd0ce Mon Sep 17 00:00:00 2001
From: Jeff Masud
Date: Tue, 12 May 2026 00:30:00 -0700
Subject: [PATCH] docs: refresh README + docs, add troubleshooting page
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- README: badges, prereqs, portal screenshot; drop "coming in later
  milestones" roadmap (streaming/pagination/methods/security/export are
  all shipped)
- docs/operations/troubleshooting.md: new page covering 401s, portal
  JSON-stub fallback, empty audit log, OIDC redirect loops,
  testcontainers hangs
- fix stale /v1/page/* paths in gateway-testing.md (real routes are
  /v1/pagination/{link,odata,cursor})
- releases.md: mark M3/M4 released, M5 in progress; add Breaking Changes
  pointer
- deployment.md: complete dangling "Prometheus metrics endpoint lands
  in." sentence (no /metrics planned); add sample access-log line,
  field-source table, request_id correlation guidance
- configuration/reference.md: clarify auth.require_for_api /
  auth.require_for_portal as loaded-but-unused; add note on what
  actually gates /v1/* / portal / health surfaces
- expand methods.md and security.md with response schemas, "what to
  assert" tables, and audit-log perspective
- cross-link architecture↔audit, configuration/auth→identity,
  operations/audit↔deployment/architecture
- quickstart.md, overview.md: drop stale "M3 hasn't landed" framing
- mkdocs.yml: wire troubleshooting into Operations nav
---
 README.md                          | 106 ++++++++++++++----
 docs/configuration/auth.md         |  15 +++
 docs/configuration/reference.md    |  21 +++-
 docs/endpoints/methods.md          | 114 +++++++++++++++++--
 docs/endpoints/overview.md         |  10 +-
 docs/endpoints/security.md         | Bin 2968 -> 7778 bytes
 docs/getting-started/overview.md   |   6 +-
 docs/getting-started/quickstart.md |  44 ++++----
 docs/operations/audit.md           |   5 +-
 docs/operations/deployment.md      |  78 +++++++++++--
 docs/operations/gateway-testing.md |  37 ++++---
 docs/operations/troubleshooting.md | 169 +++++++++++++++++++++++++++++
 docs/reference/architecture.md     |   5 +
 docs/reference/releases.md         |  31 +++++-
 mkdocs.yml                         |   1 +
 15 files changed, 551 insertions(+), 91 deletions(-)
 create mode 100644 docs/operations/troubleshooting.md

diff --git a/README.md b/README.md
index 386d77e..40a5866 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,16 @@
 # api-test

+[![CI](https://github.com/plexara/api-test/actions/workflows/ci.yml/badge.svg)](https://github.com/plexara/api-test/actions/workflows/ci.yml)
+[![CodeQL](https://github.com/plexara/api-test/actions/workflows/codeql.yml/badge.svg)](https://github.com/plexara/api-test/actions/workflows/codeql.yml)
+[![Go Reference](https://pkg.go.dev/badge/github.com/plexara/api-test.svg)](https://pkg.go.dev/github.com/plexara/api-test)
+[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)
+
 A controllable HTTP REST fixture used to exercise API gateways (Plexara's in
 particular). Sister project to [mcp-test](../mcp-test), which plays the same
 role for the MCP gateway.

+📖 **Documentation: **
+
 ## Why

 Plexara MCP exposes two gateway capabilities:
@@ -17,9 +24,29 @@ Plexara MCP exposes two gateway capabilities:
 `api-test` is the upstream HTTP fixture the API gateway calls. Endpoints are
 deliberately simple and deterministic; their job is not to compute
 anything useful, it's to make the gateway's behavior observable. Every
-request will be recorded in a Postgres-backed audit log so you can
-compare what a client sent through Plexara, what reached this server, and
-what came back.
+request is recorded in a Postgres-backed audit log so you can compare
+what a client sent through Plexara, what reached this server, and what
+came back.
+
+### Why not httpbin / mockoon / Prism?
+
+Those are great mocks. api-test is a different shape:
+
+- **Audit log of every request** — sanitized headers, query params,
+  request and response bodies, identity, latency, status — queryable in
+  Postgres and browsable from the embedded portal. Mocks tell you they
+  served a request; api-test tells you *what the gateway sent*.
+- **Real inbound auth** — file API keys, bcrypt-hashed Postgres-backed
+  keys, static bearer tokens, and OIDC JWT validation. Mocks let
+  anything through; api-test rejects bad credentials the way a real
+  upstream does.
+- **Gateway-specific endpoint groups** — one endpoint per pagination
+  cursor style the gateway recognizes; one endpoint per security probe
+  the gateway should reject; failure modes with seeded determinism so
+  retry/timeout tests are reproducible.
+- **In-tree OpenAPI** — every route is published at `/openapi.json`,
+  generated from the same metadata the portal uses, so the gateway's
+  `api_list_endpoints` tool sees an exact contract.

 ## Endpoint groups

@@ -33,12 +60,39 @@ what came back.
   for retry/timeout policy testing.
 - **echo** — `ANY /v1/echo`. Generic catch-all that returns the request
   verbatim (with auth headers redacted).
-
-Coming in later milestones: streaming (chunked, SSE, NDJSON), pagination
-(Link, OData, cursor variants), method matrix, security probes, export
-(large/long-running targets for `api_export`), the OpenAPI document,
-inbound auth (bearer/api_key/OAuth2), audit log, web portal, mkdocs
-site, and CI/release tooling.
+- **streaming** — chunked, SSE, and NDJSON variants for stream-proxy
+  testing.
+- **pagination** — RFC 5988 Link, OData v4, and opaque-cursor styles;
+  same synthetic dataset under all three so cross-style assertions are
+  bit-equal.
+- **methods** — HTTP method matrix (GET/POST/PUT/PATCH/DELETE/HEAD/OPTIONS)
+  for verifying method pass-through.
+- **security** — SSRF / path-traversal / admin-prefix probes the
+  gateway should reject upstream.
+- **export** — long-running / large-payload targets for the gateway's
+  `api_export` tool.
+
+## Web portal
+
+A React SPA embedded into the binary (`/portal/`) gives you a browsable
+audit log with filters, charts, request/response payload drawers, an
+endpoint catalog, and a Try-It form that proxies through the same auth
+chain as `/v1/*`. Sign in with OIDC or paste an API key.
+
+![Portal audit list](docs/images/portal/audit-light.png)
+
+## Prerequisites
+
+- **Go 1.26+** for `go run` / `make build`.
+- **Docker** for the full stack (Postgres + Keycloak via
+  `docker-compose.dev.yml`) and for the integration test suite
+  (testcontainers).
+- **pnpm + Node 20+** only if you change the SPA (`make ui`). The
+  pre-built bundle ships embedded; you can run the binary without
+  Node installed.
+
+Postgres is **optional**: `make dev-anon` runs the binary with no DB,
+no audit, no portal — fastest loop for endpoint work.

 ## Quickstart

@@ -51,29 +105,41 @@ curl -s http://localhost:8080/v1/status/418
 curl -s -X POST http://localhost:8080/v1/echo -H 'Content-Type: application/json' -d '{"hi":1}'
 ```

-`make dev-anon` does the same. `make build` produces `./bin/api-test`.
+- `make dev-anon` — same thing, anonymous mode, Postgres-only (no Keycloak).
+- `make dev` — full stack: Postgres + Keycloak + portal. First run writes
+  `.env.dev` with random API-key + cookie secrets (gitignored, reused).
+- `make build` — produces `./bin/api-test`.

 ## Tests

 ```bash
-go test ./...       # unit + in-memory tests; no Docker required
-make test           # alias: go test -race -count=1 ./...
-make verify         # CI-equivalent: fmt, vet, test, lint, security, coverage gate
+make test           # unit + in-memory; go test -race -count=1 ./...
+make integration    # //go:build integration; testcontainers Postgres (needs Docker)
+make verify         # CI-equivalent gate: fmt + vet + lint + test + security + coverage + codeql
 ```

-Integration tests requiring testcontainers Postgres land in.
+`make verify` is the single source of truth for "is this tree
+shippable" — same commands CI runs. The pre-commit hook reads
+`.claude/.last-verify-passed` (written only after a full pass).

 ## Layout

 ```
 cmd/api-test        # binary entry
-internal/server     # composition root (config + endpoints + httpsrv)
-pkg/build           # version metadata stamped at link time
+internal/server     # composition root (config + endpoints + httpsrv + portal)
+internal/ui         # //go:embed all:dist — SPA bundle
+pkg/apikeys         # Postgres-backed bcrypt API keys
+pkg/audit           # Event/Payload model, AsyncLogger, in-memory + Postgres stores
+pkg/auth/inbound    # APIKey / Bearer / OIDC authenticators + Chain
 pkg/config          # YAML loader + ${VAR:-default} env interpolation
-pkg/endpoints       # Endpoints interface + registry
-pkg/endpoints/{...} # one package per group (identity, data, failure, echo)
-pkg/httpsrv         # HTTP mux composition + health/readiness + CORS
-configs/            # *.dev.yaml, *.live.yaml, *.example.yaml
+pkg/database        # pgxpool wrapper + golang-migrate runner
+pkg/endpoints/{...} # identity, data, failure, echo, streaming, pagination,
+                    # methods, security, export
+pkg/httpmw          # RequestID, AccessLog, Identity, Audit middleware
+pkg/httpsrv         # mux composition + portal API + SPA serving + health
+pkg/oapi            # in-tree OpenAPI 3.x generator (reflection-based)
+configs/            # *.dev.yaml (anon), *.live.yaml (full), *.example.yaml
+ui/                 # React + Vite + Tailwind portal source
 ```

 ## License
diff --git a/docs/configuration/auth.md b/docs/configuration/auth.md
index 89b368c..687d406 100644
--- a/docs/configuration/auth.md
+++ b/docs/configuration/auth.md
@@ -24,6 +24,12 @@ chain returns 401 immediately. This prevents accidental cross-mode
 matches (a typo'd JWT shouldn't accidentally pass the static-bearer
 list).

+To verify the chain end-to-end, hit
+[`GET /v1/whoami`](../endpoints/identity.md#whoami) — it echoes the
+resolved `auth_type` and `subject`, so you can confirm the credential
+the gateway is actually sending. The auth pipeline diagram and the
+data-flow notes live in [Architecture › Auth chain](../reference/architecture.md#auth-chain).
+
 ## File API keys

 Simplest, no DB required.
@@ -153,6 +159,15 @@ safe to run with anonymous + a few static keys: clients that send a valid
 key get their identity, clients that send nothing get anonymous, clients
 that send a bad key get 401.

+!!! warning "Don't expect bad-credential demotion"
+    `allow_anonymous: true` is **not** "let anything in." A typo'd API
+    key, an expired bearer token, or a JWT signed by the wrong key all
+    still return 401. The anonymous fallback only fires when there is
+    no credential header at all. If you want to allow truly
+    unauthenticated callers from a script while still allowing keyed
+    callers, make sure the script sends no `X-API-Key` or
+    `Authorization` header — not a placeholder.
+
 ## Portal browser login

 The portal uses a standard OIDC PKCE flow: hit `/portal/`, redirect to
diff --git a/docs/configuration/reference.md b/docs/configuration/reference.md
index 19e089c..eab8e8e 100644
--- a/docs/configuration/reference.md
+++ b/docs/configuration/reference.md
@@ -32,9 +32,24 @@ This page is the human-friendly tour.

 | Key | Default | Description |
 | --- | --- | --- |
-| `allow_anonymous` | `false` | Falls back to anonymous identity when no inbound credential matches. |
-| `require_for_api` | `false` | **Reserved** for per-surface gating; the inbound chain is currently gated by `allow_anonymous` alone. The shipped `live.yaml` opts in to `true`. |
-| `require_for_portal` | `false` | **Reserved**, same shape. The shipped `live.yaml` opts in to `true`. |
+| `allow_anonymous` | `false` | Falls back to anonymous identity when no inbound credential matches. The single switch that today gates **all** unauthenticated access — see note below. |
+| `require_for_api` | `false` | **Loaded but not yet wired.** Intended for per-surface gating once the API and portal can require auth independently. The shipped `live.yaml` opts in to `true` so existing configs keep working when the gate lands. |
+| `require_for_portal` | `false` | **Loaded but not yet wired**, same shape as above. |
+
+!!! note "What actually gates each surface today"
+    - **`/v1/*`** — the [inbound auth chain](auth.md). When
+      `allow_anonymous: false`, every endpoint requires a credential
+      (API key, bearer, or OIDC); a missing credential returns 401.
+      When `allow_anonymous: true`, missing credentials get an
+      anonymous identity, but a *bad* credential still 401s.
+    - **`/portal/`** — `portal.enabled` mounts it; the SPA itself is
+      reachable without a session, but the portal API
+      (`/api/v1/portal/*`) requires a session cookie or an API key.
+      Sign in via OIDC (`oidc.enabled`) or paste an API key from the
+      file/DB store on the portal sign-in screen.
+    - **Health and well-known** (`/healthz`, `/readyz`,
+      `/.well-known/*`) — never gated; live outside both the auth
+      chain and the audit middleware.

 ## `api_keys`

diff --git a/docs/endpoints/methods.md b/docs/endpoints/methods.md
index fc26d5b..1562c7b 100644
--- a/docs/endpoints/methods.md
+++ b/docs/endpoints/methods.md
@@ -12,35 +12,127 @@ HTTP method survives the proxy hop unchanged.
 | --- | --- | --- |
 | `GET`, `POST`, `PUT`, `PATCH`, `DELETE`, `HEAD`, `OPTIONS` | `/v1/method/echo` | `{ "method": "POST", "path": "/v1/method/echo", "query": {...} }` |

-`HEAD` returns headers only (per RFC 7231). `OPTIONS` returns the body
-plus an `Allow` header listing every supported verb.
+The same handler serves every verb. The supported matrix is fixed at
+seven verbs — the gateway-testing patterns these probe are mostly
+about the seven common-case verbs surviving proxy traversal.

-`CONNECT` and `TRACE` are not registered; Go's `http.ServeMux` answers
-them with `405 Method Not Allowed` because other verbs are registered
-for the same path.
+## Response shape
+
+```json
+{
+  "method": "PATCH",
+  "path": "/v1/method/echo",
+  "query": { "foo": ["1", "2"] }
+}
+```
+
+| Field | Notes |
+| --- | --- |
+| `method` | The verb the server observed. Should equal the verb the client sent — that's the assertion. |
+| `path` | Always `/v1/method/echo`. Confirms the gateway didn't rewrite the path along with the method. |
+| `query` | The parsed query string. `omitempty` — absent on requests with no query. Use this to assert the gateway preserved query params under unusual verbs (e.g., a `DELETE` with a `?reason=...` parameter). |
+
+Two verbs have special-case behavior:
+
+- **`HEAD`** — returns headers only; the body is suppressed at the HTTP
+  layer (Go's `http.ResponseWriter` automatically discards the body on
+  `HEAD`). The response is byte-equivalent to the `GET` headers
+  otherwise. Use `curl -I` or `curl -is | head -1` to inspect.
+- **`OPTIONS`** — returns the body plus an `Allow` header listing every
+  supported verb:
+
+  ```http
+  HTTP/1.1 200 OK
+  Allow: GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS
+  Content-Type: application/json
+  ```
+
+  Useful to confirm the gateway forwards `OPTIONS` rather than
+  intercepting it for CORS handling.
+
+## Unregistered verbs
+
+`CONNECT`, `TRACE`, `LINK`, `UNLINK`, `PROPFIND`, and other less-common
+verbs are not registered. Go's `http.ServeMux` answers them with
+`405 Method Not Allowed` because *other* verbs are registered for the
+same path. The 405 itself is informative — it tells you the gateway
+forwarded the request, just to a path that doesn't accept that verb.
+
+```bash
+curl -is -X CONNECT http://localhost:8080/v1/method/echo | head -1
+# HTTP/1.1 405 Method Not Allowed
+```
+
+If the gateway *blocks* `CONNECT`/`TRACE` upstream (most should), you
+won't see a 405 — you'll see whatever the gateway returns for a
+blocked verb. That's also a useful signal.

 ## Examples

 ```bash
+# Verb preservation
 curl -s -X PATCH http://localhost:8080/v1/method/echo
 # {"method":"PATCH","path":"/v1/method/echo"}

+# HEAD: headers only, no body
 curl -is -X HEAD http://localhost:8080/v1/method/echo | head -1
 # HTTP/1.1 200 OK

+# OPTIONS: body + Allow header
 curl -is -X OPTIONS http://localhost:8080/v1/method/echo
 # HTTP/1.1 200 OK
 # Allow: GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS
+# Content-Type: application/json
+# ...
+# {"method":"OPTIONS","path":"/v1/method/echo"}
+
+# Query preservation under non-GET
+curl -s -X DELETE 'http://localhost:8080/v1/method/echo?reason=cleanup&id=42'
+# {"method":"DELETE","path":"/v1/method/echo","query":{"id":["42"],"reason":["cleanup"]}}

-curl -s -X CONNECT http://localhost:8080/v1/method/echo
-# (405 Method Not Allowed)
+# Unregistered verb
+curl -is -X CONNECT http://localhost:8080/v1/method/echo | head -1
+# HTTP/1.1 405 Method Not Allowed
 ```

+## What to assert
+
+For a gateway proxying api-test:
+
+| Assertion | Means |
+| --- | --- |
+| Response `method` equals the client's verb | Gateway preserved the verb verbatim. |
+| Response `query` matches the client's query string | Gateway didn't strip or reorder query params under this verb. |
+| `OPTIONS` returns 200 with `Allow` header | Gateway didn't swallow the response inside a CORS pre-flight handler. |
+| `HEAD` returns 200 with no body | Gateway didn't substitute a `GET` body on a `HEAD` response. |
+
+## Audit-log perspective
+
+Each verb registers as its own `EndpointMeta` (`method_get`,
+`method_post`, …). The shared handler means the same Go code services
+all seven, but the audit row's `route_name` carries the verb-specific
+name, so you can `GROUP BY route_name` to count calls per verb:
+
+```sql
+SELECT route_name, count(*)
+FROM audit_events
+WHERE endpoint_group = 'methods'
+  AND ts > now() - interval '1 hour'
+GROUP BY route_name
+ORDER BY 2 DESC;
+```
+
+If you expected the client to send 50 `PATCH`es through the gateway and
+the count comes back showing 50 `POST`es, the gateway is rewriting the
+verb — that's exactly the kind of finding this group is built to make
+visible.
+
 ## Why this exists

 Gateway proxies sometimes break verbs in subtle ways: rewriting
 `PATCH` to `POST` to fit a stricter client library, swallowing `OPTIONS`
-pre-flight responses inside a CORS layer, or refusing `HEAD` because
-the upstream handler doesn't register it explicitly. This endpoint
-exposes every verb at one path so a tester can spot any of those
-rewrites with a single curl loop.
+pre-flight responses inside a CORS layer, refusing `HEAD` because the
+upstream handler doesn't register it explicitly, or stripping query
+strings on verbs that "shouldn't have a body so probably shouldn't have
+query either." This endpoint exposes every verb at one path so a
+tester can spot any of those rewrites with a single curl loop.
diff --git a/docs/endpoints/overview.md b/docs/endpoints/overview.md
index ffafeb7..07a340c 100644
--- a/docs/endpoints/overview.md
+++ b/docs/endpoints/overview.md
@@ -33,11 +33,11 @@ call, the body it got back is bit-for-bit predictable.
 | [Data](data.md) | Deterministic bodies for caching / dedup / size handling. |
 | [Failure](failure.md) | Controlled error codes, latency, seeded flake. |
 | [Echo](echo.md) | Generic catch-all that returns the request verbatim. |
-| Streaming | Chunked, SSE, NDJSON responses. |
-| Pagination | One endpoint per cursor style the gateway recognizes. |
-| Methods | Method matrix on `/v1/method/echo`. |
-| Security | Probe targets the gateway should refuse to forward. |
-| Export | Large/long-running targets exercising `api_export`. |
+| [Streaming](streaming.md) | Chunked, SSE, NDJSON responses. |
+| [Pagination](pagination.md) | One endpoint per cursor style the gateway recognizes. |
+| [Methods](methods.md) | Method matrix on `/v1/method/echo`.
| -| Security | Probe targets the gateway should refuse to forward. | -| Export | Large/long-running targets exercising `api_export`. | +| [Streaming](streaming.md) | Chunked, SSE, NDJSON responses. | +| [Pagination](pagination.md) | One endpoint per cursor style the gateway recognizes. | +| [Methods](methods.md) | Method matrix on `/v1/method/echo`. | +| [Security](security.md) | Probe targets the gateway should refuse to forward. | +| [Export](export.md) | Large/long-running targets exercising `api_export`. | ## Toggling groups diff --git a/docs/endpoints/security.md b/docs/endpoints/security.md index 8c49c56c9ce008927afe5592c8f104dd720b43b7..2a197681e97423b94df410c13eb88b646e9126f1 100644 GIT binary patch literal 7778 zcmb_hTXGz?70owRfucV|cGxp~$a0w~m&;Zp$B}F*6jMsR)R=C}49sp|)If7Mww%f$ zvWM&=OUX)d?gh{@BT-7F%C^fS`vF|s$2sTTF3VB`6Iy*CU+H@2P1wj)?`GPIM*F%q zt6-eHkT2<))V5hUV}qA|saCp?!O0hwm(OKuZnSKaUFhBoUM^J7YqgO<^$Sf;I~kUG zyj&|QS1JVE+p<%kUeeu8D%;3j&j+t*-MQ=6s&6Kugo)y_56lzK%w5fIO zCfXF1#xA^6R!$@nXklOu3j9cOK_WN;Q;v2+?rM|v>$JECP?26i$x2w%H+ge`<1g^`-y-JrkA&> z9keuTg)I1~B5u0Ev)W<}w{s-yUCvDlcODD4b(#N=u7^M4+T~KK2KKfFCgA6XA=r9< zBE&bcc73nw&~D1w*|}K^JzSgPn$8|0#AYI9y?#4rAN*r7Rj8FI1OAj=_v9#N+YNq) zKXJE{W==oH1##BS_}#5sFL1@fO*m;kw0eRdTxnHPK;^i4a?)Ix78XiM0o6_rr|)?K zp@s%7Z`}gs;Ybaog5OK@<6T1@co{v3^TTSj&};>cZUWr$PX3_7(lzp(d53UOi z5&rrv<$X-D$z&qm;T1GiKe9e2?bk!!egzYGgLm@PtE*=*Atl1t0|ivCI!(DUZn0i! 
zD=WNL$piz6e2a8nIz)Pk?6Nm74;S!(;ID;gj~v%p&EdXKU!Ze?LEy&&{{e z&Bq*;F^4@H^b?XZCiohlg<^xDSI?fvFV4k0q_WIwo(dl&1m% z2rrh>&8YU&tliM2;wch3fje9phTeLVR7K~yd|shs*l4OsV^>2U{8{U^Edh`n3gD2c zI;>sXzig0?$cD%jX+2pzJUOXisUsZ~)ihUd%qWJnBLX=rjsF_mni+~LfP8ie+CBhi z46Bv%5zLF9hGwA|0I?5a456^t$B&8phC?*(5uS0iPAi;p_sInJx`{BA5Dj%-xV0^| zxpqi+2#?_1o)^el_k=;PY}`WyfiyqLc#Q~vDL|SyL_b%x25x-z*{3R8h>sNaWS5i1 zm?=Kz+MxIe%i4LVK$Kei>DA?ra^{-Ng%DL${p6h$zX&N0I_*7>?BGHkux2qV-#M$J|Pf|eQ(WL?jN780BklYK8A;8&XGmXngtAH!}&YK*FU&&Y@isJPW zXaZZJ1B>E<-M$6zv?JAsLKzg+AzW?^#75%NeoTH7LVoPjIHKK+8~XQ1CQ%9#We{ctLOqnkG{l>)j9VRbQ`o<90=a`ySdN%}b@{o%tD_^rxD1vNc5rc789X1$@9 zy)8s%So_*ey6)R0XGszP1>rBod*$j`V^s1{<+7qa4|stM#nRto)K zEpZj+1A_%ik6uF!phT!LQ6q>u1%E6l))xYhj#GAC>ILu#~#SaM$nL#PPGYUu8*@w*JQ%v&ri=vV(X;isbf$x2S(jWny6$1l_nrBK(a!{B@+n3!5v+v zda%3SEq~NI%lJ378*D)BK6-dOUSolPYv<=DXAkB7fYt2B1gaWr0-ozN%POj*>=AKz zpR&rTp4$n3v?Ii(EJmnyBq?Z*BQzolw?qV_QX(jz69D4V!*y#2WI;RGH1?)->E)7E zLf!)rYTZ!}v^H&Luf8+(9z2ca&OaWHm_}6Pq=6XEcto zb|gT8LmPPyCX3KHK~zxE-l* z|EdRD!1&Pty0|2vD(6EBeMD`wgfcO;_8=}w^;fYg0uC1iT8N^E6nK7j$0HJ?-HgN; zKt&C~p@|@Xre_d^*mxmS=ww%DC4mMr91*KtBiTv!U|EuQ#l3x2-m7|nPLmE^@s=rx zvFvsrLL`oq!bHYyqn|;7E~12i91w$mf}LR$7U7;gMsvGS$zu%Y|N4feL=h{f%z!5# zX1gGrM@q(TI*6}+gN5iUfpj{U8s$wMog3Q_07hCMUnk>{q}&h^zJ*#J3C&==I(2Ky7Ukn*~~J6dI?e+(0aq$0Z(8;GNJ9W8+Zm+^QH-aWjq- zh-=w2lyvo$Z_YH(KvpMrXHZo%S&&aovazzKIWlVsbYB=Hf!pxp7jr!Mho>)&Ch|v( znUdA=5SLF>1@&F4zBe?ji;_CtY3NQ+v-sDS^cQ)>4ybs3^#mgu5)QPBnQu5zKkPz@ z27!d^d@?^yXEO^T8ECFZB)Oop39L0(i}4#tb;N|gW;6tac*!%Go!Tep=ryE%!o!RS zkpJk|jO$4X64&Oz51=@YqsM1ieG!OojN;Jf@krUkmq-mpH)t57104w29XqjPD;Y@K zg@Z$A(b2%$vZE0n&E{(~%_wTJ{=4BS7faxG+o3vs& zX6d0D0tr|W?--Dm#**R$V#8*SNq$79*US7t2yc>_A>|Ue9>c!r!;^C%qRe3+#TS$w zT`{$=a(@)ZLk0A+4Z0i3`f*M+k*~E_D=vy%BFt$|^^f@!y)#ZUp;D>grfa$7f8d2% z84LRfxoN7I+A$uHLBqt4SqM~Yu)toJpM3uSE{tf*(;|@?qA1bRbSuuRIz2gHfr4|B z)EpboI1wj^U_OsHMnjD>K(ipiXm6hIgLxp@2yieNj3keT0W-nO48pR(7Z{Qy{2`7% zKt&CPQiL;D2dB5HykrSzD;dbbrV`B39y~)n_$V_oo?OF+dGe7yf29vu#@@Oc!V4=& 
zw&oTs$+Rl7AS7{>Kb)mEc_hClLYsNb63n5}{&95j9}7+ox~8hbfw; zzzIpO0mTXL@PQ92U`7xK;*l3|@t!^>(Kx-u)IG18F2xCZ#p5wrGo zp>PR8cvO&QMz{;4DewUiV}85|NGgP(W+37l!os5uR2qJz@3 z3w*s@h#hqXNTD{^Du4t50sI4Dhg#Hkgt3GM5On0?%x@`Q`CDb|b4t$B+$M1674!e?cGQb7fUl;?)ugDeKpTN`0>`Szg%qw z*UsN4oti0)6U-uQ?U^zYD5Wpp8Mx-JYsnSY^sj%6&6eu%v`)$di ztw*<0YrL}>lG-DIGVr8ASMxD7zi$nfySs2w)ZhyQhnaGidMKSu!lSR8A$aHT5ax=T zN=Pg6cDiaI#Bo~@pd`GzSZu1K*65;DV~rV%AJ|lSWz85Aaj{k&oz!5<0*zSr9RSYI zfHhpftR8Z;U<8ce1VPTM${?}67y)_dE1soe+cQ?k%G)TnTHEE>{mS@w@F1@B~v!dMfnWnDQ!10c)yaH|YSO;||HFU}Cl0FAA1DrJGBuAib4 zsYYp9+X8@ul5NX5Z$MR9IwWGtVjv|*T#c|vlH2-eDxjY}o;xlH<73!>uNt+;< z`Vph8gaKC>6B5@>*#(t~Km|4-bTbsKJ7Lvk_8-kdFO4zB*`?N+O=)V-D zIoiI}{JFjNYM&dvV0H7y_S?(eJj``kaDYx*A2>myoI{G|f$$?WiMr1hy#%p4iAX&~ x{(wxC-$3FwY{DAte^k1Ak @@ -80,10 +77,10 @@ EOF go run ./cmd/api-test --config /tmp/api-test-auth.yaml ``` -`make dev-secrets` (already in the Makefile) writes a gitignored -`.env.dev` with random `APITEST_DEV_KEY` / `APITEST_DEV_BEARER` / -`APITEST_COOKIE_SECRET` values; M3's full `make dev` will source it -automatically. +`make dev-secrets` (idempotent β€” only writes if missing) creates a +gitignored `.env.dev` with random `APITEST_DEV_KEY` / +`APITEST_DEV_BEARER` / `APITEST_COOKIE_SECRET` values; `make dev` +sources it automatically. ## Verify it works @@ -123,8 +120,15 @@ query string), or `-H "Authorization: Bearer dev-bearer-1"`. ## Stop the stack -In the foreground binary's terminal: `Ctrl-C`. Once M3 lands, `make -dev-down` will also tear down the compose stack. +In the foreground binary's terminal: `Ctrl-C`. To tear down the +Postgres + Keycloak containers as well: + +```bash +make dev-down # stops containers, keeps volumes (Postgres data persists) +``` + +Add `-v` to the underlying compose command if you want to wipe the +audit history along with the containers. 

 ## Next

diff --git a/docs/operations/audit.md b/docs/operations/audit.md
index 8586fb9..fe4ed58 100644
--- a/docs/operations/audit.md
+++ b/docs/operations/audit.md
@@ -15,7 +15,10 @@ The pipeline is async: the request handler enqueues into a buffered
 channel; a background goroutine drains into Postgres. A stalled DB can
 never inflate request latency. On a full buffer the event is *dropped*
 and counted (logged every 1000th drop). For lossless audit,
-size the buffer for your peak rate.
+size the buffer for your peak rate. See
+[Architecture › Audit pipeline](../reference/architecture.md#audit-pipeline)
+for the data-flow diagram, and [Deployment › Logging](deployment.md#logging)
+for how to correlate audit rows with the access log via `request_id`.

 ## Schema

diff --git a/docs/operations/deployment.md b/docs/operations/deployment.md
index 85e077e..8fcc40d 100644
--- a/docs/operations/deployment.md
+++ b/docs/operations/deployment.md
@@ -109,19 +109,79 @@ limits at 1 vCPU / 512 MiB to absorb burst.

 ## Logging

-Structured JSON via slog, written to stderr. Override the level via
-`LOG_LEVEL=debug|info|warn|error`. Every line carries:
+Structured JSON via [slog](https://pkg.go.dev/log/slog), written to
+stderr. Override the level via
+`LOG_LEVEL=debug|info|warn|error`. Two line shapes you'll see most:
+
+**Access log** (one per inbound request, emitted by `AccessLog`
+middleware):
+
+```json
+{
+  "time": "2026-05-11T22:18:03.421Z",
+  "level": "INFO",
+  "msg": "request",
+  "method": "GET",
+  "path": "/v1/whoami",
+  "status": 200,
+  "bytes": 142,
+  "duration_ms": 3,
+  "request_id": "01HXYZ7Q8N5F0VTA9KM3B2P0WJ",
+  "auth_type": "apikey",
+  "subject": "demo-key"
+}
+```
+
+Field reference:
+
+| Field | When present | Source |
+| --- | --- | --- |
+| `time`, `level`, `msg` | Always | `slog` core. |
+| `method`, `path`, `status`, `bytes`, `duration_ms` | Always | `pkg/httpmw.AccessLog`. |
+| `request_id` | Always | Preserved from `X-Request-Id` if the caller set one, otherwise a fresh UUID; echoed back on the response. |
+| `auth_type`, `subject` | Only on routes that ran the per-route auth chain (`/v1/*` and the portal API) | Resolved identity holder seeded by `RequestID` and written by `Identity`. Health probes, well-known, and the SPA path are intentionally skipped. |
+
+**Audit-pipeline lines** are emitted by the `AsyncLogger` worker, not
+the request path:
+
+- `audit write failed` (WARN) — a DB write returned an error. Includes
+  `method`, `path`, `err`.
+- `audit buffer full; dropping events` (WARN) — emitted at the 1st,
+  1001st, 2001st, … drop with the cumulative `dropped_total`. If you
+  see this regularly, raise the buffer size or scale Postgres.
+
+### Correlating one request across systems

-- `time` (RFC 3339 nano).
-- `level`, `msg`.
-- `method`, `path`, `status`, `bytes`, `duration_ms` for request lines.
-- `request_id` for traceability (generated or preserved from `X-Request-Id`).
-- `auth_type`, `subject` when the identity middleware ran.
+`request_id` is the join key:
+
+1. Caller sends `X-Request-Id: ` (or doesn't — api-test will mint
+   one and put it on the response).
+2. api-test echoes `X-Request-Id` on the response.
+3. The access-log line carries the same `request_id`.
+4. The `audit_events` row stores it in column `request_id`.
+
+In Plexara's own audit log, look up the same `request_id` to see what
+the gateway forwarded vs. what the upstream received.
+
+```sql
+-- Look up one request end-to-end
+SELECT ts, method, path, status, duration_ms, auth_type, subject
+FROM audit_events
+WHERE request_id = '01HXYZ7Q8N5F0VTA9KM3B2P0WJ';
+```

 ## Metrics

-Prometheus metrics endpoint lands in. Until then, derive metrics
-from the structured access log or query the audit table:
+api-test does not expose a `/metrics` endpoint today, and there are no
+current plans to add one — the audit table is the canonical
+observability surface, and the structured access log covers what
+Prometheus would. Derive request-rate / latency / error-rate metrics
+from either source:
+
+- **Access log** — pipe the JSON lines into your log pipeline and
+  aggregate on `path`, `status`, `duration_ms`, and `auth_type`.
+- **Audit table** — richer (full headers, payload sizes, identity),
+  cheap to query for ad-hoc analysis:

 ```sql
 -- p50/p95 latency, last hour, by endpoint group
diff --git a/docs/operations/gateway-testing.md b/docs/operations/gateway-testing.md
index 7cde449..d91930a 100644
--- a/docs/operations/gateway-testing.md
+++ b/docs/operations/gateway-testing.md
@@ -152,25 +152,30 @@ the redaction policy isn't covering that key.

 **Question**: did the gateway recognize the upstream's pagination cursor?

-api-test exposes one endpoint per cursor style:
-
-- `/v1/page/link` — RFC 5988 `Link: <…>; rel="next"`.
-- `/v1/page/odata` — body field `@odata.nextLink`.
-- `/v1/page/cursor` — body field `next_cursor`.
-- `/v1/page/cursor-camel` — `nextCursor`.
-- `/v1/page/google` — `next_page_token`.
-- `/v1/page/google-camel` — `nextPageToken`.
-- `/v1/page/generic` — `next`.
-- `/v1/page/none` — single page, no cursor (negative test).
-- `/v1/page/mixed` — both Link header AND body cursor (precedence
-  test; Link should win).
+api-test exposes one endpoint per cursor style the gateway's pagination
+detector recognizes:
+
+- `/v1/pagination/link` — RFC 5988 `Link: <…>; rel="next"` (also
+  `first`, `prev`, `last`). Paged with `?page=&per_page=`.
+- `/v1/pagination/odata` — OData v4 body field `@odata.nextLink` plus
+  `@odata.count`. Paged with `?$top=&$skip=`.
+- `/v1/pagination/cursor` — opaque base64 cursor in body field
+  `next_cursor`. Paged with `?cursor=&limit=`.
+
+All three slice the same deterministic synthetic dataset
+(`hex(sha256(id)[:8])`), so the items returned for page 2 of the Link
+endpoint should bit-match the items returned by walking the OData or
+cursor endpoint to the same offset. See the
+[Pagination endpoint reference](../endpoints/pagination.md) for the
+full parameter table.

 **Assertion**: gateway response envelope's `pagination` field:

-- For each style, the cursor value should match what api-test put on
-  the wire.
-- For `/v1/page/none`, `pagination` should be absent or null.
-- For `/v1/page/mixed`, the gateway should prefer the Link header.
+- For each style, the cursor / next-page value should match what
+  api-test put on the wire (no host rewrite, no re-encoding).
+- Item bodies for the same `id` are byte-equal across all three styles.
+- Requesting past the last page returns 400 from api-test; the gateway
+  should surface that, not collapse it into a tool-level error.

 ## Snapshot fixtures

diff --git a/docs/operations/troubleshooting.md b/docs/operations/troubleshooting.md
new file mode 100644
index 0000000..6b69392
--- /dev/null
+++ b/docs/operations/troubleshooting.md
@@ -0,0 +1,169 @@
+---
+title: Troubleshooting
+description: Common api-test pitfalls and how to diagnose them — 401s, missing audit rows, empty portals, OIDC failures, integration-test flakes.
+---
+
+# Troubleshooting
+
+A list of failure modes that come up often enough to be worth writing
+down. Every entry: symptom, what it actually means, fix.
+
+## 401 Unauthorized everywhere
+
+**Symptom**: every `/v1/*` call returns 401, even with a key that
+worked yesterday.
+
+**Likely causes**:
+
+- `auth.allow_anonymous: false` and the key isn't in any store. Confirm
+  with `make dev-anon` (anonymous mode) — if that works, the issue is
+  the credential, not the wiring.
+- Bad credential, not missing. **A bad key does not fall back to
+  anonymous** even when `allow_anonymous: true`. Send no auth header
+  at all to take the anonymous path. See
+  [Authentication › Anonymous mode](../configuration/auth.md#anonymous-mode).
+- File-store key value is `${APITEST_DEV_KEY}` and the env var is
+  empty. The `${VAR:-default}` interpolation lets you set a fallback;
+  without a `:-`, the literal `${VAR}` survives only if `VAR` is set.
+
+**Diagnose**:
+
+```bash
+curl -i http://localhost:8080/v1/whoami  # see WWW-Authenticate header
+curl -i -H "X-API-Key: $APITEST_DEV_KEY" http://localhost:8080/v1/whoami
+```
+
+The `WWW-Authenticate` response header tells you whether api-test saw
+"no credential" (`Bearer realm="api-test"`) or "bad credential"
+(`Bearer realm="api-test", error="invalid_token"`).
+
+## 401 on the portal API only
+
+**Symptom**: the SPA loads, but every `/api/v1/portal/*` request 401s.
+
+**Likely causes**:
+
+- No portal session cookie *and* no API key on the request. The portal
+  requires one or the other. Sign in via OIDC, or paste an API key on
+  the sign-in screen.
+- `portal.cookie_secure: true` over plain HTTP. The browser refuses to
+  send a `Secure` cookie back to a non-TLS endpoint. Either run behind
+  TLS or flip the flag off for local dev.
+- `portal.cookie_secret` is empty. The session store fails to start
+  cleanly with no secret; check the boot log for `session store:`.
+
+## /portal/ returns `{"status":"banner"...}` instead of the SPA
+
+**Symptom**: visiting `/portal/` in a browser shows raw JSON, not the
+React UI.
+
+**Cause**: `internal/ui/dist/` only contains `.gitkeep` — the
+`//go:embed` is empty, so the mux falls back to a stub JSON banner.
+
+**Fix**:
+
+```bash
+make ui     # builds ui/dist/ → internal/ui/dist/
+make build  # rebuild the binary so the embed picks up the bundle
+```
+
+`make build` (and `make verify`) refuse to build when the embed is
+empty. Bare `go build ./...` does not — that's the path that produces
+this surprise.
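Editor's sketch (not part of the patch): the stub-versus-SPA symptom above can be checked mechanically by looking at the body `/portal/` returns. This is a minimal sketch under stated assumptions: `looksLikeStub` is a hypothetical helper, both sample bodies are made up rather than captured from api-test, and the only thing it relies on from the text is that the fallback is a JSON object while the real portal is an HTML document.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// looksLikeStub (hypothetical helper) reports whether a /portal/ response body
// is a JSON object rather than the SPA's HTML document.
func looksLikeStub(body []byte) bool {
	trimmed := bytes.TrimSpace(body)
	if len(trimmed) == 0 || trimmed[0] != '{' {
		return false // HTML (or anything else) does not start with '{'
	}
	var obj map[string]json.RawMessage
	return json.Unmarshal(trimmed, &obj) == nil
}

func main() {
	// Both bodies are illustrative samples, not real api-test output.
	fmt.Println(looksLikeStub([]byte(`{"status":"banner"}`)))
	fmt.Println(looksLikeStub([]byte("<!doctype html><html></html>")))
}
```

Feeding it the body of a `GET /portal/` from a freshly built binary would be one way to catch an empty embed in a smoke test before anyone opens a browser.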
+ +## Audit log is empty in the portal + +**Symptom**: requests succeed, but `/portal/audit` is empty or +out of date. + +**Likely causes**: + +- `audit.enabled: false` in config. The shipped `*.dev.yaml` profile + has it off; only `*.live.yaml` enables it. +- `database.url` is empty. With `audit.enabled: true` and no database, + the binary fails to start; if it started, you're on the dev config. +- Health, readiness, well-known, and the portal's own auth flow are + **intentionally** skipped; they don't generate audit rows. Only + `/v1/*` requests do. +- The async buffer dropped the events. Check the binary's stderr for + `audit buffer full; dropping events`. Default depth is 4096; raise + it if you hit sustained drop warnings. + +## OIDC login redirects loop + +**Symptom**: the IdP redirects back to api-test, which redirects back +to the IdP, repeatedly. + +**Likely causes**: + +- `oidc.issuer` does not match the IdP's actual issuer claim. Visit + `${issuer}/.well-known/openid-configuration` and confirm the + `issuer` field in the response matches the config exactly (including + trailing slash). +- `oidc.audience` doesn't match the IdP's token `aud` claim. Decode a + token at [jwt.io](https://jwt.io) and compare. +- Clock skew. `oidc.clock_skew_seconds` defaults to 30; if the binary + and the IdP disagree by more, validation fails with `exp` or `nbf` + errors. Check the binary log for `oidc:` warnings. + +## `make integration` hangs or times out + +**Symptom**: integration suite stalls at "starting postgres container." + +**Likely causes**: + +- Docker isn't running. The `make integration` target gates on + `docker info`; if you see a hang, you started Docker after the + target gate ran. +- Resource limits on the Docker VM. `testcontainers` pulls + `postgres:16-alpine` (~250 MiB) and needs ~512 MiB free. +- Ryuk (the testcontainers reaper) is being blocked by a corporate + proxy.
Set `TESTCONTAINERS_RYUK_DISABLED=true` if you trust your + own cleanup, or whitelist `quay.io/testcontainers/ryuk`. + +## `make verify` passes locally but CI fails + +**Symptom**: green `make verify`, red CI. + +**First check**: pinned tool versions in `Makefile` +(`GOLANGCI_LINT_VERSION`, `GOSEC_VERSION`, `SEMGREP_VERSION`) must +match the versions in `.github/workflows/ci.yml`. CI installs from +those refs; the Makefile installs to `bin/tools/`. Drift = different +outcomes. + +**Second check**: `semgrep` is the most likely culprit. The Makefile +warns on version drift but doesn't fail; if CI uses a newer rule +set, it can flag code the local pinned version accepts. Run +`pipx install --force semgrep==` to align. + +**Third check**: integration tests sometimes flake on `docker compose +up` race conditions in CI. Re-run; if the failure persists, it's a real bug. + +## Plexara can't reach api-test + +**Symptom**: Plexara connection registration succeeds, but invoking +the connection returns "upstream unreachable." + +**Likely causes**: + +- `server.base_url` doesn't match the actual reachable URL. Plexara + uses this for redirect and OpenAPI server URLs; if it points at + `localhost` while Plexara is in a different network namespace, + every redirect breaks. +- TLS: api-test is plain HTTP behind a TLS-terminating LB and the + Plexara connection is configured `https://...`. Check that the LB + is actually forwarding to api-test. +- Health probe disabled. Plexara may pre-flight `/healthz`; if you + blocked that path in front of api-test, the connection looks dead + even when `/v1/*` would work. + +## When in doubt + +- The [Architecture](../reference/architecture.md) diagrams document + the exact request flow. +- The audit log (when enabled) is the source of truth for what + api-test actually saw; query it before assuming the gateway is at + fault. +- File an issue with the binary's startup log (config, "listening", + any WARN/ERROR lines) at + .
diff --git a/docs/reference/architecture.md b/docs/reference/architecture.md index ec91d23..ca079b2 100644 --- a/docs/reference/architecture.md +++ b/docs/reference/architecture.md @@ -119,6 +119,11 @@ matches. ## Audit pipeline +For the schema, retention model, redaction rules, and query patterns, +see [Audit log](../operations/audit.md). This section is just the +data-flow. + + ```mermaid flowchart LR Handler["Request handler"] -->|Event{}| Async["AsyncLogger
(buffered channel)"] diff --git a/docs/reference/releases.md b/docs/reference/releases.md index bc65275..fd73c3b 100644 --- a/docs/reference/releases.md +++ b/docs/reference/releases.md @@ -39,9 +39,9 @@ The current milestone status: | --- | --- | --- | | M1 | HTTP fixture skeleton; identity / data / failure / echo groups | Released | | M2 | DB + audit + non-OAuth inbound auth (file API key, bearer, DB key) | Released | -| M3 | Portal + React SPA + Keycloak + OIDC JWT validator + browser OIDC PKCE | In progress | -| M4 | In-tree OpenAPI generator + remaining groups (pagination, methods, security, export, streaming) | Planned | -| M5 | Docs + CI + goreleaser + k8s examples + plexara-connection.yaml | Planned | +| M3 | Portal + React SPA + Keycloak + OIDC JWT validator + browser OIDC PKCE | Released | +| M4 | In-tree OpenAPI generator + remaining groups (pagination, methods, security, export, streaming) | Released | +| M5 | Docs site + CI + goreleaser + Kubernetes examples + plexara-connection.yaml | In progress | ## Where to find releases @@ -74,3 +74,28 @@ For a `vX` → `v(X+1)` major upgrade: 2. Test against staging first. 3. Plan a maintenance window if the audit schema migration needs to rebuild indexes. + +## Breaking changes + +Each major and minor release records its breaking changes in the +GitHub release notes; that's the canonical registry. For the pre-1.0 +era, individual milestone PRs (M1–M5) are the source of truth; the +[git log on `main`](https://github.com/plexara/api-test/commits/main) +records every config-key rename and schema migration. Migrations are +forward-only, so a fresh deploy off `main` always boots; the friction +is in-place upgrades against an existing audit history. + +Specifically watch for: + +- **Config-key renames**: `pkg/config` silently ignores unknown keys + (YAML deserialization is lenient).
After an upgrade, grep your + config against [`configs/api-test.example.yaml`](https://github.com/plexara/api-test/blob/main/configs/api-test.example.yaml) + to confirm every key you set still exists. +- **Audit schema**: backed by [golang-migrate](https://github.com/golang-migrate/migrate). + Migrations are forward-only and run automatically on boot; rolling + back to a prior binary is **not** safe without first restoring the + schema manually. +- **Endpoint paths**: moving an endpoint between groups would + invalidate any audit-row analytics that filter on + `endpoint_group`. Endpoint paths are stable post-1.0; pre-1.0 path + changes are called out in the relevant PR. diff --git a/mkdocs.yml b/mkdocs.yml index 4921f1a..fab89ad 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -86,6 +86,7 @@ nav: - Portal: operations/portal.md - Deployment: operations/deployment.md - Testing a Gateway: operations/gateway-testing.md + - Troubleshooting: operations/troubleshooting.md - Reference: - HTTP API: reference/http-api.md - Architecture: reference/architecture.md