Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion plugins/codex/commands/adversarial-review.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
description: Run a Codex review that challenges the implementation approach and design choices
argument-hint: '[--wait|--background] [--base <ref>] [--scope auto|working-tree|branch] [focus ...]'
argument-hint: '[--wait|--background] [--base <ref>] [--scope auto|working-tree|branch] [--resume|--fresh] [--within-hours <n>] [focus ...]'
disable-model-invocation: true
allowed-tools: Read, Glob, Grep, Bash(node:*), Bash(git:*), AskUserQuestion
---
Expand Down Expand Up @@ -44,6 +44,13 @@ Argument handling:
- It does not support `--scope staged` or `--scope unstaged`.
- Unlike `/codex:review`, it can still take extra focus text after the flags.

Context reuse (lower token burn):
- `--resume` reuses-or-creates: it reuses this git worktree's recent review thread if one exists, and otherwise starts a new resumable thread. It is safe to use on every review; you never need to seed it with `--fresh` first, and it never fails just because no prior thread exists.
- Reusing the prior thread keeps the exploration Codex already did instead of re-reading the codebase each pass, lowering token/quota burn on iterative review loops.
- Reuse is scoped to the current git worktree and only happens when the prior review thread was used within the last few hours (default 3, override with `--within-hours <n>`); otherwise a fresh thread is started automatically.
- `--fresh` forces a new review thread for this worktree even when a recent one exists — use it only to deliberately discard the previous review context.
- Pass `--resume`/`--fresh` through verbatim; do not treat them as focus text.

Foreground flow:
- Run:
```bash
Expand Down
10 changes: 9 additions & 1 deletion plugins/codex/commands/review.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
description: Run a Codex code review against local git state
argument-hint: '[--wait|--background] [--base <ref>] [--scope auto|working-tree|branch]'
argument-hint: '[--wait|--background] [--base <ref>] [--scope auto|working-tree|branch] [--resume|--fresh] [--within-hours <n>]'
disable-model-invocation: true
allowed-tools: Read, Glob, Grep, Bash(node:*), Bash(git:*), AskUserQuestion
---
Expand Down Expand Up @@ -39,6 +39,14 @@ Argument handling:
- `/codex:review` is native-review only. It does not support staged-only review, unstaged-only review, or extra focus text.
- If the user needs custom review instructions or more adversarial framing, they should use `/codex:adversarial-review`.

Context reuse (lower token burn):
- `--resume` reuses-or-creates: it reuses this git worktree's recent review thread if one exists, and otherwise starts a new resumable thread. It is safe to use on every review; you never need to seed it with `--fresh` first, and it never fails just because no prior thread exists.
- Reusing the prior thread lets Codex keep the exploration it already did instead of re-reading the codebase from scratch, which sharply cuts token/quota burn during a review → fix → re-review loop.
- Reuse is scoped to the current git worktree and only happens when the prior review thread was used within the last few hours (default 3, override with `--within-hours <n>`); otherwise a fresh thread is started automatically.
- `--fresh` forces a brand-new review thread for this worktree even when a recent one exists — use it only to deliberately discard the previous review context (a new task, or a stale thread).
- `--resume`/`--fresh` run a turn-based reviewer (so the thread can be resumed) instead of Codex's one-shot native review mode, which cannot carry context between runs. Plain `/codex:review` with no flag is still the best choice for a single cold pass.
- Pass `--resume`/`--fresh` through verbatim; the companion script handles them.

Foreground flow:
- Run:
```bash
Expand Down
75 changes: 75 additions & 0 deletions plugins/codex/prompts/review.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
<!-- Adapted from Codex's native reviewer system prompt
(openai/codex: codex-rs/core/review_prompt.md) so that the resumable
turn-based `/codex:review` reviews like native `/codex:review`. The rubric
is kept faithful; only the output section is remapped to this plugin's
review-output schema (see schemas/review-output.schema.json). -->

# Review guidelines

You are acting as a reviewer for a proposed code change made by another engineer.

Below are default guidelines for determining whether the original author would appreciate an issue being flagged. They are not the final word: more specific guidelines you encounter (in the repository context below, a developer message, or a user message) override these.

Flag something as a bug only when:
1. It meaningfully impacts the accuracy, performance, security, or maintainability of the code.
2. The bug is discrete and actionable (not a general issue with the codebase, and not a combination of multiple issues).
3. Fixing it does not demand a level of rigor absent from the rest of the codebase (e.g. one-off scripts do not need detailed comments and input validation).
4. The bug was introduced by the change under review — do not flag pre-existing bugs.
5. The original author would likely fix it if they were made aware of it.
6. It does not rely on unstated assumptions about the codebase or the author's intent.
7. You can identify the specific other code that is provably affected — it is not enough to speculate that a change may disrupt another part of the codebase.
8. It is clearly not an intentional change by the original author.

When you flag a bug, the accompanying explanation should:
1. Be clear about why the issue is a bug.
2. Communicate severity accurately — do not claim an issue is more severe than it is.
3. Be brief: at most one paragraph, with no line breaks in the prose unless needed for a code fragment.
4. Avoid code chunks longer than 3 lines; wrap any code in inline code tags or a short code block.
5. Explicitly state the scenarios, environments, or inputs required for the bug to arise, and make clear that severity depends on those factors.
6. Be matter-of-fact, not accusatory or flattering (avoid "Great job…", "Thanks for…").
7. Let the author grasp the issue without close reading.

HOW MANY FINDINGS TO RETURN:

Output every finding the original author would fix if they knew about it. If there is no finding a person would clearly want fixed, prefer returning none. Do not stop at the first qualifying finding; continue until you have listed every qualifying one.

GUIDELINES:

- Ignore trivial style unless it obscures meaning or violates a documented standard.
- Use one finding per distinct issue.
- Keep the line range as short as possible — avoid ranges longer than 5–10 lines; pick the subrange that pinpoints the problem. The location should overlap the change under review.
- Do not generate a full patch; describe the fix rather than writing the replacement code.

PRIORITY:

Assess each finding's priority and map it to `severity`:
- P0 → `critical`: drop everything; a universal issue that does not depend on assumptions about the inputs.
- P1 → `high`: urgent; should be addressed in the next cycle.
- P2 → `medium`: normal; to be fixed eventually.
- P3 → `low`: nice to have.

OVERALL CORRECTNESS:

Decide whether the change is correct — existing code and tests will not break and it is free of blocking bugs. Ignore non-blocking issues (style, formatting, typos, documentation, nits) for this verdict. Map a correct patch to `approve` and an incorrect one to `needs-attention`.

## What to review

Target: {{TARGET_LABEL}}

{{REVIEW_COLLECTION_GUIDANCE}}

## Output format

Return only valid JSON matching the provided schema — no markdown fences and no extra prose. For each finding provide:
- `title`: ≤ 80 chars, imperative.
- `body`: one paragraph of Markdown explaining why it is a bug, citing the affected file/lines/function.
- `severity`: mapped from the priority above.
- `confidence`: a float from 0 to 1.
- `file`, `line_start`, `line_end`: the affected location, kept as short as possible and overlapping the change.
- `recommendation`: describe the fix concisely; do not provide a code patch.

Set `verdict` from the overall-correctness decision, write `summary` as a terse 1–3 sentence justification of that verdict, and use `next_steps` for any follow-ups.

## Repository context

{{REVIEW_INPUT}}
132 changes: 117 additions & 15 deletions plugins/codex/scripts/codex-companion.mjs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ import { fileURLToPath } from "node:url";

import { parseArgs, splitRawArgumentString } from "./lib/args.mjs";
import {
buildPersistentReviewThreadName,
buildPersistentTaskThreadName,
DEFAULT_CONTINUE_PROMPT,
findLatestTaskThread,
Expand Down Expand Up @@ -69,14 +70,22 @@ const DEFAULT_STATUS_POLL_INTERVAL_MS = 2000;
const VALID_REASONING_EFFORTS = new Set(["none", "minimal", "low", "medium", "high", "xhigh"]);
const MODEL_ALIASES = new Map([["spark", "gpt-5.3-codex-spark"]]);
const STOP_REVIEW_TASK_MARKER = "Run a stop-gate review of the previous Claude turn.";
const DEFAULT_REVIEW_REUSE_WINDOW_HOURS = 3;
const REVIEW_RESUME_CONTINUATION_NOTE = [
"<continuation>",
"You already reviewed this worktree earlier in this same thread.",
"Reuse what you already learned about the codebase; do not re-read files you have already seen unless they changed.",
"The working tree may have moved on since then. Concentrate on what changed, and re-check whether your earlier findings are now resolved or still open.",
"</continuation>"
].join("\n");

function printUsage() {
console.log(
[
"Usage:",
" node scripts/codex-companion.mjs setup [--enable-review-gate|--disable-review-gate] [--json]",
" node scripts/codex-companion.mjs review [--wait|--background] [--base <ref>] [--scope <auto|working-tree|branch>]",
" node scripts/codex-companion.mjs adversarial-review [--wait|--background] [--base <ref>] [--scope <auto|working-tree|branch>] [focus text]",
" node scripts/codex-companion.mjs review [--wait|--background] [--base <ref>] [--scope <auto|working-tree|branch>] [--resume|--fresh] [--within-hours <n>]",
" node scripts/codex-companion.mjs adversarial-review [--wait|--background] [--base <ref>] [--scope <auto|working-tree|branch>] [--resume|--fresh] [--within-hours <n>] [focus text]",
" node scripts/codex-companion.mjs task [--background] [--write] [--resume-last|--resume|--fresh] [--model <model|spark>] [--effort <none|minimal|low|medium|high|xhigh>] [prompt]",
" node scripts/codex-companion.mjs status [job-id] [--all] [--json]",
" node scripts/codex-companion.mjs result [job-id] [--json]",
Expand Down Expand Up @@ -235,15 +244,19 @@ async function handleSetup(argv) {
outputResult(options.json ? finalReport : renderSetupReport(finalReport), options.json);
}

function buildAdversarialReviewPrompt(context, focusText) {
const template = loadPromptTemplate(ROOT_DIR, "adversarial-review");
return interpolateTemplate(template, {
REVIEW_KIND: "Adversarial Review",
function buildReviewTurnPrompt(context, { reviewName, focusText = "", resuming = false } = {}) {
const templateName = reviewName === "Adversarial Review" ? "adversarial-review" : "review";
const template = loadPromptTemplate(ROOT_DIR, templateName);
const body = interpolateTemplate(template, {
REVIEW_KIND: reviewName,
TARGET_LABEL: context.target.label,
USER_FOCUS: focusText || "No extra focus provided.",
REVIEW_COLLECTION_GUIDANCE: context.collectionGuidance,
REVIEW_INPUT: context.content
});
// Inject the reuse note only when actually resuming, so the no-flag path is
// byte-identical to the original one-shot prompt.
return resuming ? `${REVIEW_RESUME_CONTINUATION_NOTE}\n\n${body}` : body;
}

function ensureCodexAvailable(cwd) {
Expand Down Expand Up @@ -352,6 +365,54 @@ async function resolveLatestTrackedTaskThread(cwd, options = {}) {
return findLatestTaskThread(workspaceRoot);
}

function resolveReviewReuseWindowMs(hours) {
const parsed = Number(hours);
if (!Number.isFinite(parsed) || parsed <= 0) {
return DEFAULT_REVIEW_REUSE_WINDOW_HOURS * 60 * 60 * 1000;
}
return parsed * 60 * 60 * 1000;
}

// Reuse is keyed by the git worktree (the per-worktree state dir already scopes
// jobs) and bounded by recency, so stale exploration is not resumed.
// Only `resumable` jobs are eligible: plain native `/codex:review` runs persist
// a `threadId` too, but those threads are ephemeral and cannot be resumed.
function resolveLatestReviewThread(workspaceRoot, { withinMs = null, excludeJobId = null, kind = null } = {}) {
// Take the newest resumable review run and reuse its thread only if that run
// completed. The newest run is considered even before it records a thread id
// (a concurrent run that has not yet emitted "Thread ready"); otherwise we
// could fall back past it to an older completed run on the same thread and
// let two runs resume the same thread and interleave turns. A cancelled,
// failed, or still-running newest run therefore means: start fresh.
const candidate = sortJobsNewestFirst(listJobs(workspaceRoot)).find(
(job) =>
job.id !== excludeJobId &&
job.jobClass === "review" &&
// Scope to the same review kind. `/codex:review` and
// `/codex:adversarial-review` share jobClass "review" but differ by
// `kind`; reusing across kinds would inject the other reviewer's prompt
// (e.g. the adversarial framing) into a plain review thread.
(kind === null || job.kind === kind) &&
job.resumable === true
Comment thread
rrva marked this conversation as resolved.
);
if (!candidate || candidate.status !== "completed" || !candidate.threadId) {
return null;
}

if (Number.isFinite(withinMs)) {
const lastUsed = Date.parse(candidate.completedAt ?? candidate.createdAt ?? "");
if (!Number.isFinite(lastUsed) || Date.now() - lastUsed > withinMs) {
return null;
}
}

return {
id: candidate.threadId,
jobId: candidate.id,
lastUsedAt: candidate.completedAt ?? candidate.createdAt ?? null
};
}

async function executeReviewRun(request) {
ensureCodexAvailable(request.cwd);
ensureGitRepository(request.cwd);
Expand All @@ -362,7 +423,8 @@ async function executeReviewRun(request) {
});
const focusText = request.focusText?.trim() ?? "";
const reviewName = request.reviewName ?? "Review";
if (reviewName === "Review") {
const sessionMode = Boolean(request.resume || request.fresh);
if (reviewName === "Review" && !sessionMode) {
const reviewTarget = validateNativeReviewRequest(target, focusText);
const result = await runAppServerReview(request.cwd, {
target: reviewTarget,
Expand Down Expand Up @@ -404,14 +466,39 @@ async function executeReviewRun(request) {
}

const context = collectReviewContext(request.cwd, target);
const prompt = buildAdversarialReviewPrompt(context, focusText);

let resumeThreadId = null;
if (request.resume) {
const prior = resolveLatestReviewThread(resolveWorkspaceRoot(context.repoRoot), {
withinMs: request.reuseWindowMs ?? resolveReviewReuseWindowMs(),
excludeJobId: request.jobId ?? null,
kind: reviewName === "Adversarial Review" ? "adversarial-review" : "review"
});
if (prior) {
resumeThreadId = prior.id;
}
}

const persistThread = sessionMode;
const result = await runAppServerTurn(context.repoRoot, {
prompt,
// Build the prompt from the ACTUAL outcome: if a pruned/expired thread
// forces a fresh-thread fallback, the continuation note must not claim
// prior context that the fresh thread does not have.
buildPrompt: ({ resumed }) => buildReviewTurnPrompt(context, { reviewName, focusText, resuming: resumed }),
model: request.model,
sandbox: "read-only",
outputSchema: readOutputSchema(REVIEW_SCHEMA),
onProgress: request.onProgress
onProgress: request.onProgress,
resumeThreadId,
// If a persisted review thread was pruned/expired by Codex, don't fail the
// review — fall back to a fresh persisted thread.
resumeFallback: persistThread,
persistThread,
threadName: persistThread ? buildPersistentReviewThreadName(`${reviewName} ${target.label}`) : null
});
// The resume may have fallen back to a fresh thread (pruned/expired); only
// report reuse when the returned thread is actually the one we resumed.
const actuallyResumed = Boolean(resumeThreadId) && result.threadId === resumeThreadId;
const parsed = parseStructuredOutput(result.finalMessage, {
status: result.status,
failureMessage: result.error?.message ?? result.stderr
Expand All @@ -420,6 +507,9 @@ async function executeReviewRun(request) {
review: reviewName,
target,
threadId: result.threadId,
sessionMode,
reused: actuallyResumed,
resumedThreadId: actuallyResumed ? resumeThreadId : null,
context: {
repoRoot: context.repoRoot,
branch: context.branch,
Expand Down Expand Up @@ -561,7 +651,7 @@ function getJobKindLabel(kind, jobClass) {
return jobClass === "review" ? "review" : "rescue";
}

function createCompanionJob({ prefix, kind, title, workspaceRoot, jobClass, summary, write = false }) {
function createCompanionJob({ prefix, kind, title, workspaceRoot, jobClass, summary, write = false, resumable = false }) {
return createJobRecord({
id: generateJobId(prefix),
kind,
Expand All @@ -570,7 +660,8 @@ function createCompanionJob({ prefix, kind, title, workspaceRoot, jobClass, summ
workspaceRoot,
jobClass,
summary,
write
write,
resumable
});
}

Expand Down Expand Up @@ -681,13 +772,19 @@ function enqueueBackgroundTask(cwd, job, request) {

async function handleReviewCommand(argv, config) {
const { options, positionals } = parseCommandInput(argv, {
valueOptions: ["base", "scope", "model", "cwd"],
booleanOptions: ["json", "background", "wait"],
valueOptions: ["base", "scope", "model", "cwd", "within-hours"],
booleanOptions: ["json", "background", "wait", "resume", "fresh"],
aliasMap: {
m: "model"
}
});

const resume = Boolean(options.resume);
const fresh = Boolean(options.fresh);
if (resume && fresh) {
throw new Error("Choose either --resume or --fresh.");
}

const cwd = resolveCommandCwd(options);
const workspaceRoot = resolveCommandWorkspace(options);
const focusText = positionals.join(" ").trim();
Expand All @@ -704,7 +801,8 @@ async function handleReviewCommand(argv, config) {
title: metadata.title,
workspaceRoot,
jobClass: "review",
summary: metadata.summary
summary: metadata.summary,
resumable: resume || fresh
});
await runForegroundCommand(
job,
Expand All @@ -716,6 +814,10 @@ async function handleReviewCommand(argv, config) {
model: options.model,
focusText,
reviewName: config.reviewName,
resume,
fresh,
reuseWindowMs: resolveReviewReuseWindowMs(options["within-hours"]),
jobId: job.id,
onProgress: progress
}),
{ json: options.json }
Expand Down
Loading