The `fedify bench` command with its scenario and report schemas #791
dahlia wants to merge 47 commits into
Add the skeleton for a new `fedify bench` subcommand in @fedify/cli that
will run ActivityPub-specific load benchmarks against a cooperative
Fedify target running in benchmark mode.
This first step wires the command into the CLI without the engine:
- Define the Optique `benchCommand` with the suite-file argument and the
--target, --format, --output, --dry-run, and --allow-unsafe-target
options, plus a stub `runBench` that is fleshed out in later steps.
- Register the command in the runner and dispatcher, and add a `bench`
section to the configuration schema.
- Add the `@cfworker/json-schema` (draft 2020-12 validator) and `yaml`
dependencies used by the scenario format, to both deno.json and
package.json.
- Cover argument parsing with tests.
fedify-dev#783
fedify-dev#744
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Add a lightweight HdrHistogram-style log-linear histogram used by the benchmark engine to record latency samples and compute percentiles with bounded relative error. Values are bucketed by octave and split into linear sub-buckets, so the relative error stays roughly constant across the whole range.
The structure is sparse, mergeable, and serializable, which lets percentiles from several runs be re-aggregated without coordinated-omission error and lets the report carry an optional serialized histogram. Sub-bucket indices are derived from the mantissa ratio to avoid denormal underflow, and non-positive samples (including -0) are normalized to the zero bucket.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
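The octave-plus-linear-sub-bucket layout described in this commit can be sketched as follows. This is a minimal illustration and not the code in the PR: the class name, the default of 32 sub-buckets, and the choice to report a bucket's upper bound as the percentile value are all assumptions, and inputs are assumed to be finite.

```typescript
// Hypothetical sketch of a sparse log-linear histogram (not the PR's code).
class LogLinearHistogram {
  // Sparse storage: bucket key -> sample count.
  private readonly counts = new Map<number, number>();
  private readonly subBuckets: number;
  private zeroCount = 0;
  private total = 0;

  constructor(subBuckets = 32) {
    this.subBuckets = subBuckets;
  }

  get count(): number {
    return this.total;
  }

  record(value: number): void {
    this.total++;
    // Non-positive samples (including -0) and NaN go to the zero bucket.
    if (!(value > 0)) {
      this.zeroCount++;
      return;
    }
    const key = this.keyOf(value);
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  // Merging adds counts bucket by bucket; both sides must share subBuckets.
  merge(other: LogLinearHistogram): void {
    this.zeroCount += other.zeroCount;
    this.total += other.total;
    for (const [key, n] of other.counts) {
      this.counts.set(key, (this.counts.get(key) ?? 0) + n);
    }
  }

  // Returns the upper bound of the bucket that holds the p-th percentile.
  percentile(p: number): number {
    if (this.total === 0) return NaN;
    const rank = Math.max(1, Math.ceil((p / 100) * this.total));
    let seen = this.zeroCount;
    if (seen >= rank) return 0;
    const keys = [...this.counts.keys()].sort((a, b) => a - b);
    for (const key of keys) {
      seen += this.counts.get(key) ?? 0;
      if (seen >= rank) return this.upperBound(key);
    }
    return this.upperBound(keys[keys.length - 1] ?? 0);
  }

  private keyOf(value: number): number {
    let octave = Math.floor(Math.log2(value));
    let ratio = value / 2 ** octave; // mantissa ratio, nominally in [1, 2)
    // Guard against log2 rounding at octave boundaries.
    if (ratio < 1) {
      octave--;
      ratio *= 2;
    } else if (ratio >= 2) {
      octave++;
      ratio /= 2;
    }
    const sub = Math.min(
      this.subBuckets - 1,
      Math.floor((ratio - 1) * this.subBuckets),
    );
    return octave * this.subBuckets + sub;
  }

  private upperBound(key: number): number {
    const octave = Math.floor(key / this.subBuckets);
    const sub = key - octave * this.subBuckets;
    return 2 ** octave * (1 + (sub + 1) / this.subBuckets);
  }
}
```

With 32 sub-buckets per octave, a reported percentile overshoots the true value by at most about 1/32 (roughly 3%), which is the "bounded relative error" property the commit refers to.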
Add the small, pure building blocks the scenario format is built on:
- `asList()`: scalar-or-list coercion, so fields such as `recipient`,
`seed`, `collection`, and `type` can accept either a single value or a
list while the common single-value case stays terse.
- `parseSize()` / `resolveGenerate()`: typed payload-generation
directives (e.g. `content: { generate: lorem, size: 2KB }`) that
produce deterministic output of an exact byte size, with the size
parser bounded to the safe-integer range.
- A logic-less GitHub-Actions-style `${{ ... }}` template engine
(dotted-path resolution plus whitelisted helper calls). Lookups go
through own properties only, with a denylist for prototype members,
and unclosed delimiters, trailing text, and unbalanced quotes are
rejected rather than silently mishandled, so the format cannot turn
into a programming language.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
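A rough sketch of the first two building blocks from the commit above. The function names come from the commit message, but the bodies are assumptions: in particular, treating `KB`/`MB`/`GB` as binary multiples (1 KB = 1024 bytes) is a guess, not something the commit states.

```typescript
// Hypothetical sketch: scalar-or-list coercion.
function asList<T>(value: T | readonly T[]): T[] {
  return Array.isArray(value) ? [...value] : [value as T];
}

// Assumption: binary multiples. The real parser may use decimal units.
const SIZE_UNITS: Record<string, number> = {
  B: 1,
  KB: 1024,
  MB: 1024 ** 2,
  GB: 1024 ** 3,
};

// Parses a size such as "2KB" into a byte count within the safe-integer range.
function parseSize(text: string): number {
  const match = /^(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)$/i.exec(text.trim());
  if (match == null) throw new RangeError(`Invalid size: ${text}`);
  const [, amount = "", unit = "B"] = match;
  const bytes = Number(amount) * (SIZE_UNITS[unit.toUpperCase()] ?? 1);
  if (!Number.isSafeInteger(bytes)) {
    throw new RangeError(`Size is not a safe whole number of bytes: ${text}`);
  }
  return bytes;
}
```

Under these assumptions, `asList("alice")` and `asList(["alice"])` both yield `["alice"]`, and `parseSize("2KB")` yields `2048`.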
Define the `fedify bench` scenario suite format and its published
JSON Schema (draft 2020-12). The format is a suite of `version`,
`target`, `defaults`, `actors`, and `scenarios`, with an `expect` block
per scenario, and it can express every scenario type discussed for the
tool (inbox, webfinger, actor, object, fanout, collection, failure,
mixed) even though only inbox and webfinger will have runners.
Rather than a schema-first single source, the published JSON Schema and
the TypeScript types are maintained as two artifacts kept identical by a
drift guard. Runtime validation uses `@cfworker/json-schema`, and a
validated value is narrowed with an `as unknown as` cast. Three
cross-field rules live in the schema where an editor can flag them:
- exactly one HTTP request signature scheme per actor group
(`contains` + `minContains`/`maxContains`);
- `rate` XOR `concurrency` in a load block (`oneOf`);
- the allowed `expect` metrics per scenario type (`if`/`then` +
`propertyNames`).
The embedded schema object is the editing source; *schema/bench/*
holds the hosted copy, regenerated by *scripts/generate-bench-schema.ts*.
Four guards run as tests: structural/meta validation, example-fixture
validation (valid and invalid fixtures covering every scenario type),
drift between the embedded object and the published file, and git-based
immutability of already-published version files.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Add the normalization step that turns a schema-validated suite into the
resolved form the engine runs:
- `parseDuration()` and `parseRate()` parse the human-friendly duration
(`30s`) and rate (`200/s`) units into milliseconds and requests per
second, rejecting non-positive and overflowing magnitudes.
- `normalizeSuite()` applies suite defaults, coerces the top-level
scalar-or-list fields to arrays, resolves the target (with a
`--target` override), and determines the open- or closed-loop load
model, inheriting compatible fields such as `arrival` and
`maxInFlight` from the defaults while a scenario's `rate`/
`concurrency` selects the model.
It also enforces the one cross-field rule the JSON Schema cannot express:
the buffered signing modes (`pipeline`, `presign`) pre-sign requests, so
they require the target's signature time window to be off; a
time-windowed target must use `signing: jit`.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
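The unit parsing described in this commit might look roughly like the sketch below. The function names match the commit message; the accepted unit suffixes (`ms`, `s`, `m`, `h` for durations and only `/s` for rates) are assumptions made for illustration.

```typescript
// Assumed unit table; the commit only shows "30s" as an example.
const DURATION_MS: Record<string, number> = {
  ms: 1,
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
};

// Rejects non-positive and overflowing (non-finite) magnitudes.
function positiveFinite(value: number, text: string): number {
  if (!Number.isFinite(value) || value <= 0) {
    throw new RangeError(`Expected a positive, finite magnitude: ${text}`);
  }
  return value;
}

// "30s" -> 30000 (milliseconds)
function parseDuration(text: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h)$/.exec(text.trim());
  if (match == null) throw new RangeError(`Invalid duration: ${text}`);
  const [, amount = "", unit = "ms"] = match;
  return positiveFinite(Number(amount) * (DURATION_MS[unit] ?? 1), text);
}

// "200/s" -> 200 (requests per second)
function parseRate(text: string): number {
  const match = /^(\d+(?:\.\d+)?)\/s$/.exec(text.trim());
  if (match == null) throw new RangeError(`Invalid rate: ${text}`);
  const [, amount = ""] = match;
  return positiveFinite(Number(amount), text);
}
```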
Define the canonical benchmark report: the single result model from which the terminal, JSON, and Markdown renderers all derive, so the outputs can never drift apart. JSON is the canonical machine form, pinned by a published draft-2020-12 schema (schema/bench/report-v1.json).
The model splits `client` and `server` numbers by nesting so it is clear which the load generator measured and which came from the target's stats endpoint, bakes the unit into numeric keys (latencyMs, drainMs), turns each expect assertion into an evaluated record, and carries first-class environment/target/configHash reproducibility metadata plus an optional serialized histogram.
The report schema is registered alongside the scenario schema, so the existing structural, fixture, drift, and immutability guards now cover it too; a valid report fixture is added.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Turn each scenario's `expect` block into evaluated records that gate a
run. `parseAssertion()` parses a human assertion (">= 99%", "< 100ms",
"< 2s", ">= 500/s", "== 0") into an operator and a machine-clean
threshold: percentages become ratios, durations milliseconds, rates per
second. `evaluateExpect()` looks each metric up by name (successRate,
throughputPerSec, errors.4xx/5xx/total, latency.*, signatureVerification.*,
queueDrain.*), checks the assertion's unit is compatible with the
metric's natural unit, and compares. Equality is tolerant for float
metrics but exact for counts. A `fail`-severity assertion gates the
build while `warn` only annotates, and a missing or unmeasured metric
fails cleanly.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
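As an illustration of the parsing step in this commit, here is a minimal sketch that turns the example assertions into an operator and a normalized threshold. The type names, the unit labels, and the exact operator set are assumptions; the tolerant float equality mentioned above is deliberately left out.

```typescript
type Operator = "<" | "<=" | ">" | ">=" | "==";
type Unit = "ratio" | "ms" | "perSec" | "none";

interface ParsedAssertion {
  operator: Operator;
  threshold: number; // normalized: ratio, milliseconds, or per second
  unit: Unit;
}

// Hypothetical parser for strings such as ">= 99%", "< 2s", ">= 500/s".
function parseAssertion(text: string): ParsedAssertion {
  const match = /^(<=|>=|==|<|>)\s*(\d+(?:\.\d+)?)\s*(%|ms|s|\/s)?$/
    .exec(text.trim());
  if (match == null) throw new SyntaxError(`Malformed assertion: ${text}`);
  const [, op = "", digits = "", suffix = ""] = match;
  const operator = op as Operator;
  const value = Number(digits);
  switch (suffix) {
    case "%":
      return { operator, threshold: value / 100, unit: "ratio" };
    case "ms":
      return { operator, threshold: value, unit: "ms" };
    case "s":
      return { operator, threshold: value * 1000, unit: "ms" };
    case "/s":
      return { operator, threshold: value, unit: "perSec" };
    default:
      return { operator, threshold: value, unit: "none" };
  }
}

// Exact comparison only; a real evaluator would also check unit
// compatibility against the metric and tolerate float rounding for "==".
function compare(actual: number, assertion: ParsedAssertion): boolean {
  switch (assertion.operator) {
    case "<": return actual < assertion.threshold;
    case "<=": return actual <= assertion.threshold;
    case ">": return actual > assertion.threshold;
    case ">=": return actual >= assertion.threshold;
    case "==": return actual === assertion.threshold;
  }
}
```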
Assemble the canonical report from measured scenario data and render it
in three forms from that single model:
- `buildScenarioResult()`/`buildReport()` turn resolved scenarios and
their measurements into the report, evaluating each `expect` block,
summarizing the load model, and computing the overall gate.
- `detectEnvironment()` and `configHash()` capture the reproducibility
metadata (runtime, OS, CPU count, and a stable sha256 over the
canonicalized configuration, honoring `toJSON()` so URLs hash by
value).
- The JSON renderer is the canonical machine form (pinned by the
report schema); the terminal-text and Markdown renderers derive from
the same model. A shared metric-unit registry keeps the evaluator
and the renderers in agreement, so measured values display in the
metric's own unit while an explicit assertion unit stays visible.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
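The "stable sha256 over the canonicalized configuration" idea can be sketched as below. This is an assumption-laden illustration: `canonicalize` is a hypothetical helper, and sorting object keys before `JSON.stringify` is one common way to get a stable form, not necessarily the one the PR uses.

```typescript
import { createHash } from "node:crypto";

// Recursively sorts object keys and honors toJSON(), so a URL hashes by its
// string value rather than as an (empty-looking) object.
function canonicalize(value: unknown): unknown {
  if (value === null || typeof value !== "object") return value;
  const maybeJson = value as { toJSON?: () => unknown };
  if (typeof maybeJson.toJSON === "function") {
    return canonicalize(maybeJson.toJSON());
  }
  if (Array.isArray(value)) return value.map(canonicalize);
  const source = value as Record<string, unknown>;
  const sorted: Record<string, unknown> = {};
  for (const key of Object.keys(source).sort()) {
    sorted[key] = canonicalize(source[key]);
  }
  return sorted;
}

// Hex-encoded sha256 of the canonical JSON form.
function configHash(config: unknown): string {
  const json = JSON.stringify(canonicalize(config)) ?? "null";
  return createHash("sha256").update(json).digest("hex");
}
```

Two configurations that differ only in key order, or in whether a target is a `URL` object or its string form, produce the same hash under this sketch.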
Add the client-side safety guard and the discovery that finds where to
deliver:
- `classifyTarget()` sorts a target into loopback/private/public from
its host (IP-literal aware, IPv4-mapped IPv6 decoded), conservatively
treating anything it cannot confirm as public.
- `assertTargetAllowed()` lets loopback/private targets and any target
advertising benchmark mode run without friction, and refuses only a
public target that does not advertise benchmark mode unless
--allow-unsafe-target is given (mandatory, with no interactive
prompt); --dry-run bypasses the gate since it only inspects.
- `probeBenchmarkMode()` reads the cooperative `stats` endpoint to
detect benchmark mode and the target's Fedify version, never throwing.
- `discoverInbox()` resolves a handle or actor URI to its personal and
shared inbox the way a remote peer would, building
private-address-allowing loaders for loopback targets, and
`selectInbox()` picks the inbox for the scenario's mode.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
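A simplified sketch of host-based classification, in the spirit of `classifyTarget()` above. The ranges covered (127/8, the RFC 1918 blocks, `::1`, and `fc00::/7`) and the treatment of the name `localhost` as loopback are assumptions for illustration; the PR's actual rules may differ.

```typescript
type TargetTier = "loopback" | "private" | "public";

function classifyIPv4(octets: number[]): TargetTier {
  const [a = 0, b = 0] = octets;
  if (a === 127) return "loopback";
  const isPrivate = a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168);
  return isPrivate ? "private" : "public";
}

// Anything that cannot be confirmed as loopback or private is public.
function classifyTarget(target: URL): TargetTier {
  const host = target.hostname.toLowerCase();
  if (host === "localhost") return "loopback"; // assumption
  const v4 = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (v4 != null) return classifyIPv4(v4.slice(1).map(Number));
  if (host.startsWith("[") && host.endsWith("]")) {
    const v6 = host.slice(1, -1);
    if (v6 === "::1") return "loopback";
    // The URL parser serializes ::ffff:127.0.0.1 as ::ffff:7f00:1,
    // so decode the two trailing hex groups back into four octets.
    const mapped = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(v6);
    if (mapped != null) {
      const hi = parseInt(mapped[1] ?? "0", 16);
      const lo = parseInt(mapped[2] ?? "0", 16);
      return classifyIPv4([hi >> 8, hi & 0xff, lo >> 8, lo & 0xff]);
    }
    if (/^f[cd][0-9a-f]{2}:/.test(v6)) return "private"; // fc00::/7
  }
  return "public";
}
```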
Stand up the benchmark's own synthetic remote peer. An author picks signature standards and the key set is derived: HTTP request signatures and LD Signatures share one RSA pair, FEP-8b32 uses an Ed25519 pair. `buildFleet()` expands the actor groups into members with generated keys, and `spawnSyntheticServer()` serves each member as a normal ActivityPub actor document with an embedded `publicKey` and `assertionMethod` over plain loopback HTTP.
The target dereferences a signature's keyId during verification, so serving exactly the document a real actor exposes lets verification resolve the key the same way; a fixed actor set keeps this on a cold path a warm-up window excludes. A test confirms the served document parses back into a verifiable actor whose keys resolve.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Sign inbox deliveries reusing the @fedify/fedify signers so the client pays realistic crypto cost. `signInboxDelivery()` applies the FEP-8b32 object proof and the LD Signature to the document, then the HTTP request signature (cavage or rfc9421) to the final body. `createActivityIdMinter()` mints a unique activity id per request, satisfying Fedify's always-on inbox idempotency automatically.
`createSigningPipeline()` keeps RSA signing off the send critical path with three lookahead modes: `jit`, `pipeline` (default; background signers keep a bounded buffer filled and buffer starvation surfaces the client as the bottleneck), and `presign`. The pipeline cannot hang on a stuck factory, drops transient sign failures, and fails fast on deterministic ones.
Tests verify the produced cavage and rfc9421 requests pass Fedify's own verifyRequest against synthetic-server keys.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Drive load and turn the raw samples into client-side metrics. `runLoad()` supports open-loop (a fixed arrival schedule, with latency measured from each request's scheduled time, which is the coordinated-omission correction, so a stalled target or maxInFlight backpressure shows up as latency rather than being omitted) and closed-loop (N virtual users).
A fair slot-transferring semaphore enforces `maxInFlight` in both models and reports backpressure as the saturation signal; arrivals are a lazy generator (constant or seeded Poisson) and only in-flight dispatches are retained, so memory stays flat on long runs.
`aggregateSamples()` excludes warm-up samples and produces request counts, success rate, throughput over the measured window, latency percentiles from the log-linear histogram, and errors grouped by kind, status, and reason.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
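To make the coordinated-omission correction concrete, here is a small deterministic simulation. It is only an illustration and does not use the PR's `runLoad()`: it assumes a constant arrival schedule, `maxInFlight` of 1, and hypothetical service times, and compares latency measured from the actual send time against latency measured from the scheduled time.

```typescript
interface SimulatedSample {
  naive: number; // completion minus actual send time
  corrected: number; // completion minus scheduled time
}

// Simulates an open-loop schedule against a target that serves one request
// at a time (maxInFlight = 1). All times are in milliseconds.
function simulate(serviceMs: number[], intervalMs: number): SimulatedSample[] {
  let slotFreeAt = 0; // when the single in-flight slot becomes available
  return serviceMs.map((service, i) => {
    const scheduledAt = i * intervalMs;
    // Backpressure: a request cannot be sent before the slot frees up.
    const sentAt = Math.max(scheduledAt, slotFreeAt);
    const completedAt = sentAt + service;
    slotFreeAt = completedAt;
    return {
      naive: completedAt - sentAt,
      corrected: completedAt - scheduledAt,
    };
  });
}

// One request stalls for 100 ms; the others take 5 ms each.
const samples = simulate([5, 100, 5, 5], 10);
console.log(samples.map((s) => s.naive)); // [5, 100, 5, 5]
console.log(samples.map((s) => s.corrected)); // [5, 100, 95, 90]
```

Measured naively, only the stalled request looks slow. Measured from the schedule, the two requests queued behind it also show the delay they actually experienced, which is what the correction is meant to surface.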
Wire the engine into runnable scenarios. The stats client reads the cooperative `stats` endpoint and projects the signature-verification histogram and queue depth into the report's server section, robust to malformed snapshots. The inbox runner discovers the recipient inbox, builds a signing factory over the synthetic fleet, drives the signing pipeline and load generator, aggregates the client metrics, and attaches the server metrics; the webfinger runner drives handle-resolution lookups.
A registry dispatches by type and reports a clear error for the scenario types that the format expresses but this version does not run. `presign` signing now requires an open-loop load (a closed-loop run has no fixed request count to pre-sign).
An end-to-end test stands up a real `benchmarkMode` Fedify federation and confirms signed inbox deliveries verify, the inbox listener runs, and server-side signature-verification metrics are read back.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Implement `runBench`: load, validate, and normalize the suite (any configuration error logs a friendly message and exits 2), preflight the scenario runners so an unsupported type fails fast, classify and probe the target, and apply the safety gate. A `--dry-run` prints the plan and sends nothing.
For a real run it builds the synthetic actor server once when a signed scenario needs it, runs each scenario, assembles the report, renders it to the chosen format (stdout or a file), and sets the exit code to 0 when the gate passes and 1 otherwise. The default exit sets `process.exitCode` so cleanup and output flushing finish first.
Signed scenarios are refused against a public target, since the synthetic actor server is only reachable on the client's loopback. Dependencies are injectable, and tests cover the passing and failing gates, dry run, the unsafe-target and public-signed refusals, and an invalid suite.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Wire the logic-less `${{ ... }}` template engine into the load pipeline:
`renderSuiteTemplates()` expands templates in a parsed suite with a
context exposing the target (host, hostname, port, origin, href,
protocol) plus the default helpers, and `runBench` runs it between
loading and validation. This is what makes `recipient:
"http://${{ target.host }}/users/alice"` resolve to a concrete URL.
The target comes from `--target` or the suite's own `target`, neither of
which is templated. Tests cover rendering and the end-to-end inbox run
now uses templating.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Extend the benchmarking manual with the client side: a getting-started
scenario suite, the actors and signature-standards model, `${{ }}`
templating, open- and closed-loop load with the signing modes, the
output formats and CI usage, the safety gate, and the http/loopback
caveats. Add the @fedify/cli changelog entry for the new command.
fedify-dev#783
fedify-dev#744
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Address four behavioral gaps where the bench engine silently accepted
options it did not actually apply:
- Reject `runs` greater than 1 during normalization. Repeated runs are
not implemented yet, so accepting the field gave a single run while
implying several.
- Fail a scenario that measured zero requests instead of letting every
`expect` assertion pass vacuously, and reject a `warmup` that is not
shorter than the `duration` (which would leave no measured window).
- Reject inbox `activity` options the runner cannot honor. The runner
always delivers a `Create` carrying an embedded `Note`, so a
non-`Create` activity type, a non-`Note` `object.type`, or
`embedObject: false` is now refused up front through a new optional
`validate()` on the runner, called during preflight. Scalar-or-list
type fields are checked in full, not just their first element.
- Implement multi-recipient delivery in the inbox runner: every
recipient's inbox is discovered, and deliveries (with the synthetic
actors that sign them) are rotated across the recipients, modeling a
server receiving from many peers into many local inboxes.
The scenario format and JSON Schema still express these options; only the
inbox/webfinger runners constrain what they execute in this version.
fedify-dev#783
fedify-dev#744
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
A malformed `expect` assertion was only parsed while evaluating results, which happens after the entire benchmark load has been sent. Worse, the run loop has no catch around result building, so the resulting AssertionParseError escaped uncaught and crashed the command instead of failing as a configuration error.
Add validateExpectBlock(), which parses every assertion in a scenario's `expect` block, and run it in the preflight step (alongside runner validation) before any probe or load. A typo in a CI gate now exits 2 without sending traffic, with a message naming the offending metric.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The cooperative `stats` endpoint is cumulative and has no reset, but the
inbox and webfinger runners read it once at the end, so the reported
server numbers (and any signatureVerification.* expectations) folded in
warm-up traffic and every earlier scenario in the suite. Client samples
were already windowed; the server side was not, so the two disagreed.
Take a server snapshot at the measured-window boundary and diff it
against the end snapshot:
- stats-client.ts gains a raw `ServerSnapshot` (signature histogram and
queue-depth gauge), `parseServerSnapshot`, `diffSnapshots` (subtracts
bucket counts; the gauge is not cumulative, so the end value is kept),
and `snapshotToMetrics`. `fetchServerSnapshot` returns `null` only on
transport or parse failure; an available-but-empty snapshot is
non-null, so an unavailable baseline is never mistaken for an empty
one. Histogram subtraction requires identical bucket boundaries, and
refuses (yields no signature metric) otherwise.
- runner.ts gains `withMeasuredWindowStart`, which gates every measured
send on a one-shot boundary callback so the baseline is captured
before any measured request reaches the target.
- The inbox and webfinger runners snapshot the baseline at the boundary
and report server metrics only when both ends of the window were
captured, instead of falling back to the cumulative snapshot.
A few warm-up requests still in flight at the boundary may be attributed
to the window; a hard drain would distort the coordinated-omission client
latency, so that bounded residue is accepted and documented.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
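The windowing step amounts to subtracting a baseline histogram from the end-of-run histogram. The sketch below illustrates that subtraction with the boundary check described above; the `HistogramSnapshot` shape and the function name `diffHistogram` are hypothetical, and rejecting a negative delta is an added assumption (for a counter that moved backwards, for example after a target restart).

```typescript
// Hypothetical snapshot shape: explicit bucket boundaries plus one count per
// bucket (boundaries.length + 1 counts, the last being the overflow bucket).
interface HistogramSnapshot {
  boundaries: number[];
  counts: number[];
}

// Returns the per-bucket delta between two cumulative snapshots, or null
// when the snapshots cannot be compared safely.
function diffHistogram(
  start: HistogramSnapshot,
  end: HistogramSnapshot,
): HistogramSnapshot | null {
  const sameShape = start.boundaries.length === end.boundaries.length &&
    start.counts.length === end.counts.length &&
    start.boundaries.every((b, i) => b === end.boundaries[i]);
  if (!sameShape) return null; // refuse rather than misalign buckets
  const counts = end.counts.map((c, i) => c - (start.counts[i] ?? 0));
  if (counts.some((c) => c < 0)) return null; // assumption: treat as invalid
  return { boundaries: [...end.boundaries], counts };
}
```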
The scenario schema's `load` object required exactly one of `rate` or
`concurrency`, so a block that set only `arrival` or `maxInFlight` and
inherited its load model was rejected before normalization, even though
`resolveLoad()` already supports such partial overrides (inheriting the
model, or falling back to the default open-loop rate).
Relax the constraint to forbid only `rate` and `concurrency` together,
allowing either or neither. This lets a suite write, for example,
`defaults: { load: { maxInFlight: 100 } }` or override just `arrival` on
one scenario. The embedded schema literal and the published
schema/bench/scenario-v1.json are regenerated together (the v1 file is
new on this branch, so it is not yet immutable).
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The synthetic actor/key server bound loopback and advertised `127.0.0.1` actor and key IDs, which the target dereferences to verify HTTP signatures. A same-machine (loopback) target reaches it, but a non-loopback target dereferences its own `127.0.0.1`, fails key lookup, and rejects every signed delivery. The command nonetheless allowed signed scenarios against private targets, so they failed silently.
Add a `--advertise-host` option. When set, the synthetic server binds every interface (`0.0.0.0`, or `::` for an IPv6 host) and advertises the given host in its actor, key, and base URLs, so a non-loopback target can dereference them. `resolveAdvertiseHost()` validates the value as a bare host name, IPv4 address, or IPv6 literal (bracketing IPv6 for the URL authority and binding the matching family), rejecting a scheme, port, path, or other URL syntax with a clear configuration error.
Signed scenarios are now refused (exit 2) when the target is non-loopback and no `--advertise-host` is given, instead of running and failing on the target. The documentation is updated accordingly.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The `--user-agent` value was passed only to the document loader, so the benchmark's main requests (the runners' inbox POSTs and WebFinger GETs, the benchmark-mode probe, and the server stats reads) went out with the runtime's default User-Agent. A target that inspects, logs, or rate-limits by User-Agent saw the wrong value, so the option was silently ineffective for the traffic that matters.
Wrap the fetch implementation once with withUserAgent(), so every benchmark request carries the configured User-Agent. A prebuilt request (the signed inbox delivery, a WebFinger GET) has the header set in place rather than recloned, leaving the already-signed body and digest untouched; the User-Agent is not part of the signed header set, so this does not affect verification. A User-Agent the caller already set is left as-is.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The text and Markdown renderers only surfaced server queue metrics when
a drain-latency histogram was present, with the depth shown merely as a
suffix to that line. The current stats reader supplies
`queue.depthMax` without `drainMs`, so queue depth never appeared in the
human-readable output even though it was in the JSON model; the Markdown
form rendered no queue metrics at all.
Render queue depth on its own:
- text: keep the combined drain line (now only when it has at least one
percentile), otherwise print a standalone `Server queue depth max`
line whenever a depth is reported.
- Markdown: add a queue drain p95 row when present and a queue depth max
row whenever a depth is reported.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
`new URL("localhost:3000")` parses as the `localhost:` scheme with an
empty host, a common typo for a missing `http://`. Normalization
accepted it, so `--dry-run` succeeded while a real run would misclassify
the target or build an unsupported fetch URL. Targets carrying
credentials (`http://user:pass@host`) were likewise accepted even though
`fetch` rejects them.
Reject, during normalization, any target whose protocol is not `http:`
or `https:`, whose host is empty, or that carries embedded credentials,
with a message pointing at the likely fix. The probe and runners only
make bare HTTP(S) requests, so these never produce a working run.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
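The parsing pitfall and the three rejection rules from this commit can be reproduced with the standard `URL` class. The helper name `validateTarget` and the error messages are hypothetical; only the checks themselves follow the commit message.

```typescript
// "localhost:3000" is a valid URL whose scheme is "localhost:".
const typo = new URL("localhost:3000");
console.log(typo.protocol); // "localhost:"
console.log(typo.host); // ""

// Hypothetical validator applying the three rules described above.
function validateTarget(raw: string): URL {
  const url = new URL(raw);
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new TypeError(
      `Unsupported target protocol ${url.protocol} (did you mean http://${raw}?)`,
    );
  }
  if (url.host === "") {
    throw new TypeError(`Target has no host: ${raw}`);
  }
  if (url.username !== "" || url.password !== "") {
    throw new TypeError("Target must not carry embedded credentials");
  }
  return url;
}
```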
The safety gate classified only the suite `target`, but an `inbox` scenario's actual signed-load destination is the discovered inbox (or an explicit `inbox:` URL), which can differ from the target. A loopback `target` with a public `recipient`, or `inbox: https://prod.example/inbox`, would send benchmark POST load to a public inbox with no gate at all, bypassing the guard against accidentally benchmarking production. The synthetic-reachability rule was likewise only checked against the target tier, not the destination that actually verifies signatures.
Gate each resolved inbox destination before any load reaches it:
- assertInboxDestinationAllowed() refuses a public destination unless it shares the gated target's origin while the target advertises benchmark mode (inheriting its gate), or --allow-unsafe-target is given; and refuses a non-loopback destination unless a reachable synthetic host was advertised (--advertise-host). Origins are compared (scheme, host, effective port), so an http inbox does not inherit an https target.
- The inbox runner calls an injected destination gate for each resolved inbox before sending; the orchestrator maps a refusal to exit 2. Discovery (a read) still runs, but no benchmark load is sent to an ungated destination.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The default fetch follows redirects, which let two safety checks be
bypassed. A public target whose `stats` endpoint redirected to a host
serving benchmark-mode JSON was marked as advertising benchmark mode, so
the gate allowed load against it. And a gated loopback, private, or
benchmark target that answered a WebFinger GET or a signed inbox POST
with a 307/308 could carry that load to an ungated public service,
slipping past the destination gate.
Make every benchmark request non-following:
- The benchmark-mode probe and the server stats read use
`redirect: "manual"`, so a redirect is treated as "not advertised"
and "unavailable" respectively rather than trusted.
- `sendRequest` re-wraps any non-manual request as `redirect: "manual"`
and records a redirect (opaque or 3xx) as a failed send, so no signed
load reaches the redirect target; the WebFinger and inbox requests are
built with `redirect: "manual"` so the common path needs no re-clone.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Code Review
This pull request introduces a comprehensive benchmarking toolchain for Fedify centered around the new fedify bench command. It allows driving ActivityPub-specific load, such as signed inbox deliveries and WebFinger lookups, against a cooperative target running in benchmark mode. The implementation includes scenario loading, validation against a published JSON Schema, safety gating to prevent unsafe benchmarking of public targets, a load generator supporting open-loop and closed-loop models, and report generation in text, JSON, and Markdown formats. Feedback on the changes highlights two key improvements: wrapping onMeasuredWindowStart in .then() to safely catch synchronous errors, and passing the Uint8Array body directly instead of body.buffer to avoid issues with pooled or sliced buffers.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Codex Review: Didn't find any major issues. Swish!
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
withMeasuredWindowStart wrapped the callback as Promise.resolve(onMeasuredWindowStart()), which runs it synchronously before Promise.resolve, so a synchronous throw in the callback would escape the promise chain instead of becoming a rejection. Invoke it through Promise.resolve().then(...), matching the signing pipeline's pattern, so a sync throw rejects the gated send.
fedify-dev#791 (comment)
Assisted-by: Claude Code:claude-opus-4-8
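The difference between the two call shapes is easy to reproduce in isolation. In this sketch `eager` and `deferred` are hypothetical stand-ins for the before and after versions of the wrapper, not functions from the PR.

```typescript
type Callback = () => void | Promise<void>;

// Before: the callback runs first, so a synchronous throw happens before
// Promise.resolve() is ever reached and escapes to the caller.
function eager(callback: Callback): Promise<void> {
  return Promise.resolve(callback());
}

// After: the callback runs inside the promise chain, so a synchronous throw
// becomes a rejection of the returned promise.
function deferred(callback: Callback): Promise<void> {
  return Promise.resolve().then(callback);
}

const boom: Callback = () => {
  throw new Error("sync failure");
};

let escaped = false;
try {
  eager(boom);
} catch {
  escaped = true; // the throw escaped the promise machinery
}
console.log(escaped); // true

deferred(boom).catch((error: Error) => {
  console.log(error.message); // "sync failure"
});
```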
signInboxDelivery passed body.buffer to signRequest. body comes from TextEncoder().encode() (an exact-fit view), so this was correct, but it would include trailing bytes were body ever a view into a larger buffer, breaking the digest. Slice the exact view bytes instead. signRequest's body option is an ArrayBuffer, so passing the Uint8Array directly would not type-check.
fedify-dev#791 (comment)
Assisted-by: Claude Code:claude-opus-4-8
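The hazard is visible with any `Uint8Array` that is a view into a larger buffer. The helper below, `exactBytes`, is a hypothetical name; it copies the view's bytes into a fresh `ArrayBuffer`, which has the same effect as slicing the underlying buffer at the view's offsets and is used here only because it type-checks the same way across TypeScript versions.

```typescript
// Returns an ArrayBuffer holding exactly the bytes the view covers.
function exactBytes(view: Uint8Array): ArrayBuffer {
  const copy = new ArrayBuffer(view.byteLength);
  new Uint8Array(copy).set(view);
  return copy;
}

const pool = new Uint8Array([1, 2, 3, 4, 5, 6]);
const body = pool.subarray(2, 4); // a 2-byte view into a 6-byte buffer

console.log(body.buffer.byteLength); // 6: the whole pool, not just the body
console.log(exactBytes(body).byteLength); // 2
```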
@codex review
/gemini review
Code Review
This pull request introduces the fedify bench command to @fedify/cli for benchmarking federation workloads, along with runners for inbox and webfinger scenarios, a synthetic actor/key server, and JSON schemas for scenario suites and reports. The review feedback highlights two important issues: a bug in the template argument parser where escaped quotes inside string arguments are not handled correctly, and a memory efficiency concern where draining response bodies using response.arrayBuffer() could lead to out-of-memory errors under load, which can be resolved by canceling the body stream directly.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 180abef664
splitTopLevel did not track the backslash escape, so an escaped quote inside a helper string argument was treated as a closing quote and split the arguments wrongly; parseArg's regex also forbade any embedded quote. Track the escape state when splitting and accept (then unescape) escaped quotes when parsing the argument.
fedify-dev#791 (comment)
Assisted-by: Claude Code:claude-opus-4-8
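An escape-aware splitter along the lines of this fix might look like the sketch below. The function name is taken from the commit message, but the body is an assumption: it splits on top-level commas only, accepts both single and double quotes, and leaves unescaping to a later parsing step.

```typescript
// Splits helper arguments on commas that are outside quoted strings.
function splitTopLevel(input: string): string[] {
  const parts: string[] = [];
  let current = "";
  let quote: string | null = null; // the quote character currently open
  let escaped = false; // true right after a backslash inside a string
  for (const ch of input) {
    if (escaped) {
      current += ch; // an escaped character never closes the string
      escaped = false;
    } else if (quote !== null) {
      if (ch === "\\") escaped = true;
      else if (ch === quote) quote = null;
      current += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      current += ch;
    } else if (ch === ",") {
      parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (quote !== null) throw new SyntaxError("Unbalanced quote in arguments");
  parts.push(current.trim());
  return parts;
}
```

For the input `"a\"b, c", 2`, the escaped quote and the comma inside the string no longer end the first argument, so the result has two parts.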
resolveAdvertiseHost bound an advertised hostname to the IPv4 wildcard (0.0.0.0). If the hostname resolves to an AAAA record (or the target prefers IPv6), the target dereferences the actor URLs over IPv6 with nothing listening, so signed deliveries fail key lookup. A hostname can resolve to either family, so bind dual-stack (::); an IPv4 literal still binds 0.0.0.0 and an IPv6 literal still binds ::.
fedify-dev#791 (comment)
Assisted-by: Claude Code:claude-opus-4-8
The --dry-run help promised to "resolve discovery", but the command returns right after printing the normalized plan: it never contacts the target, performs recipient discovery, or gates the resolved inbox, so a bad recipient or off-target inbox can look valid in a dry run and only fail in the real run. Match the help (and the gate's comment) to what dry-run actually does, consistent with the manual: print the plan without contacting the target or sending load.
fedify-dev#791 (comment)
Assisted-by: Claude Code:claude-opus-4-8
@codex review
/gemini review
Code Review
This pull request introduces the fedify bench command to @fedify/cli for benchmarking federation workloads against a cooperative target. It adds comprehensive support for parsing, validating, and executing benchmarking scenarios—specifically inbox and webfinger runners—and generating detailed reports. Key features include open-loop and closed-loop load generation, a synthetic actor server for signature verification, and log-linear histogram aggregation. The reviewer feedback highlights two robustness improvements in stats-client.ts: defensively verifying matching bucket boundaries before merging histogram data points to avoid misaligned buckets, and filtering out null or undefined metrics in flattenMetrics to prevent a TypeError from silently failing the entire snapshot parsing.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf94455946
Two defensive gaps in parsing the target's stats JSON: a null or undefined entry in a scope's metrics array made the whole parse throw (caught, but silently dropping every server metric), and merging histogram data points summed counts whenever the lengths matched without checking that the bucket boundaries were identical, which would misalign buckets and skew percentiles. Filter out null metric entries so the rest still parse, and only sum histogram points that share the exact same boundaries.
fedify-dev#791 (comment)
fedify-dev#791 (comment)
Assisted-by: Claude Code:claude-opus-4-8
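A rough sketch of the two guards this commit describes. The `HistogramPoint` shape and both function names are assumptions made for illustration; the actual stats JSON and the code in stats-client.ts may look different.

```typescript
// Hypothetical shape of one histogram data point (an assumption for
// illustration, not the target's actual stats schema).
export interface HistogramPoint {
  boundaries: number[];
  counts: number[];
}

function sameNumbers(a: readonly number[], b: readonly number[]): boolean {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

// Sum two points only when their bucket boundaries are identical;
// otherwise return null so the caller can keep them separate.
export function mergePoints(
  a: HistogramPoint,
  b: HistogramPoint,
): HistogramPoint | null {
  if (!sameNumbers(a.boundaries, b.boundaries)) return null;
  if (a.counts.length !== b.counts.length) return null;
  return {
    boundaries: [...a.boundaries],
    counts: a.counts.map((count, i) => count + (b.counts[i] ?? 0)),
  };
}

// Drop null or undefined entries so one bad metric does not make the
// whole snapshot unparseable.
export function dropNullMetrics<T>(
  metrics: ReadonlyArray<T | null | undefined>,
): T[] {
  return metrics.filter((m): m is T => m != null);
}
```

Returning `null` on a boundary mismatch is one possible design; it leaves it to the caller to decide whether to skip the point or report it.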
--allow-unsafe-target was config-backed, so a system, user, or project .fedify.toml with bench.allowUnsafeTarget = true would make every run behave as if the flag were passed, silently disabling the safety gate and letting load reach a public, non-benchmark target. The override is meant to be an explicit per-run acknowledgment, not a persisted default. Make the flag a plain CLI flag (no config binding) and drop allowUnsafeTarget from the bench config schema, so it can only be given on the command line.
fedify-dev#791 (comment)
Assisted-by: Claude Code:claude-opus-4-8
@codex review
/gemini review
Code Review
This pull request introduces the fedify bench command for benchmarking Fedify federation workloads, including scenario validation, load generation, metrics aggregation, and a synthetic actor server. Feedback on the implementation highlights two potential runtime errors: a possible TypeError when attempting to mutate immutable headers on a Request object in withUserAgent, and another TypeError in the WebFinger runner when falling back to a schemeless host if recipients is empty.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1f6d30f268
withUserAgent set the User-Agent on a prebuilt Request in place. If such a request ever has immutable headers, set() throws a TypeError and the send crashes. Try the in-place set (the fast path for the requests this tool builds, which have mutable headers) and fall back to a cloned Request with merged headers if it throws.
fedify-dev#791 (comment)
Assisted-by: Claude Code:claude-opus-4-8
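The try-then-clone pattern described here might look roughly like the sketch below. The name `setUserAgent` is hypothetical and this is not the PR's `withUserAgent`; it relies only on the standard Fetch `Request` and `Headers` classes.

```typescript
// Hypothetical helper illustrating the fallback; not the PR's code.
export function setUserAgent(request: Request, userAgent: string): Request {
  try {
    // Fast path: requests built by the caller normally have mutable headers.
    request.headers.set("User-Agent", userAgent);
    return request;
  } catch (error) {
    // Headers.set() throws a TypeError when the header list is immutable.
    if (!(error instanceof TypeError)) throw error;
    // Slow path: copy the headers, then build a new Request around them.
    const headers = new Headers(request.headers);
    headers.set("User-Agent", userAgent);
    return new Request(request, { headers });
  }
}
```

Callers should use the returned request rather than the one they passed in, since the fallback returns a different object.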
When a webfinger scenario has no recipients, the runner fell back to the target's schemeless host (for example localhost:3000), which convertUrlIfHandle cannot parse as a URL and would throw. Fall back to the target's full href, which is always a valid URL.
fedify-dev#791 (comment)
Assisted-by: Claude Code:claude-opus-4-8
import.meta.dirname is only available on Node >= 20.11, but the package supports Node >= 20.0, so on Node 20.0 to 20.10 it is undefined and feeds undefined into join(), aborting the schema and render tests before they run. Derive the directory from dirname(fileURLToPath(import.meta.url)) instead, which works across all supported Node versions.
fedify-dev#791 (comment)
Assisted-by: Claude Code:claude-opus-4-8
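A small sketch of the portable derivation named in this commit. The wrapper name `moduleDir` is hypothetical; `dirname` and `fileURLToPath` are the Node built-ins the commit message refers to.

```typescript
import { dirname } from "node:path";
import { fileURLToPath } from "node:url";

// Hypothetical wrapper: derive a module's directory from its file: URL,
// which works on Node versions that lack import.meta.dirname.
export function moduleDir(moduleUrl: string): string {
  return dirname(fileURLToPath(moduleUrl));
}

// Typical use inside an ES module (shown as a comment so the sketch does
// not depend on how the file is loaded):
//   const here = moduleDir(import.meta.url);
//   const schemaPath = join(here, "scenario-v1.json");
```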
@codex review
/gemini review
Code Review
This pull request introduces the fedify bench command to @fedify/cli for benchmarking federation workloads. It adds a comprehensive benchmarking suite, including scenario runners for inbox and webfinger, load generation, metrics aggregation, safety gating, and a synthetic actor/key server. It also publishes JSON Schemas for scenarios and reports. Feedback on the changes includes a minor formatting correction in schema/README.md to remove an unnecessary backslash in a file path to ensure standard Markdown rendering.
Codex Review: Didn't find any major issues. Already looking forward to the next diff.
Resolves #783, the second of the five benchmarking steps tracked in #744. It adds the client half of `fedify bench` to @fedify/cli: a load generator that exercises a Fedify server the way the fediverse does.
Generic HTTP load tools (autocannon, wrk, k6) cannot sign an inbox delivery, build a realistic ActivityStreams payload, or read a target's queue depth, so against a federation server they measure the wrong thing. The server half landed earlier in #782, which added `benchmarkMode` to `FederationOptions` together with the cooperative `/.well-known/fedify/bench/stats` and `…/trigger` endpoints. This PR is what drives that target.
The command acts as a synthetic remote actor. It generates keys and serves its own actor and key documents over loopback, then discovers the recipient's inbox the way a real peer would. Every delivery is signed with the same `@fedify/fedify` signer a real sender uses, so the crypto cost lands in the measured latency. It drives the load, reads the target's server-side metrics from the stats endpoint, and renders one report model as text, JSON, or Markdown.
What it includes:
- A suite with a `target`, the `actors` to sign as, shared `defaults`, and a list of scenarios, each with an `expect` block of pass/fail thresholds that doubles as a CI gate.
- Runners for `inbox` (the signed end-to-end delivery benchmark) and `webfinger`. The format and the schema can express the other types from Performance benchmarking tools for Fedify federation workloads #744 (`actor`, `object`, `fanout`, `collection`, `failure`, `mixed`), but a scenario whose type has no runner yet is rejected with a clear message rather than silently skipped.
- Open-loop (`rate`) and closed-loop (`concurrency`) load, with coordinated-omission correction so a stalled target shows up as latency instead of disappearing, plus constant or Poisson arrivals and an optional `maxInFlight` cap.
- Signing modes: `pipeline` (background signers fill a bounded buffer), `jit`, and `presign`.
Scenario format and JSON schema
The schema is dual-maintained. A frozen TypeScript literal embedded in the CLI is what the runtime validates against, using `@cfworker/json-schema` (pure JavaScript, so it survives `deno compile`); the committed schema/bench/scenario-v1.json and schema/bench/report-v1.json are the published copies. A test guard keeps the embedded and published forms byte-identical and refuses any edit to an already-published version, so a `-v1` URL never changes meaning. The `# yaml-language-server:` line in a suite gives editors autocomplete and validation against the published URL.
Safety
A run proceeds without friction against a loopback or private target, or any target that advertises benchmark mode. A public target that does not advertise it is refused unless you pass `--allow-unsafe-target`, which is mandatory and never prompted in CI. The gate classifies the actual load destination, not only the declared `target`, so a loopback target paired with a public recipient (or an explicit public `inbox:`) cannot route load to production behind the gate's back. For the same reason, benchmark traffic does not follow redirects. Signed scenarios additionally need the synthetic actor server to be reachable from the target: a loopback target reaches it automatically, and a non-loopback target requires `--advertise-host`.
Schema hosting
The schemas live at https://json-schema.fedify.dev/. This PR adds the static assets under schema/: the two JSON files, an index.html landing page, a contributor README.md, and _headers with netlify.toml that set CORS, long-lived immutable caching, and the `application/schema+json` content type. The hosting itself is configured on Netlify out of band; CI does not upload anything.
Testing and documentation
The benchmark test suite runs under both Node and Deno (about 240 tests), including an end-to-end inbox benchmark against a real `benchmarkMode` server that verifies the signatures, so the signed delivery path is run rather than mocked. docs/manual/benchmarking.md gains a client section covering the suite format, the actor and signing model, the output formats, and safety; CHANGES.md has an entry under version 2.3.0.