fix: reconcile orphaned jobs whose launcher process has died by mroldrobot · Pull Request #379 · openai/codex-plugin-cc

mroldrobot · 2026-06-17T13:58:54Z

Problem

A tracked job's completion is written by the launcher process that owns it
(tracked-jobs.mjs::runTrackedJob): it marks the job running, awaits the
turn, then writes completed/failed. If that launcher dies before the turn
finishes — a cancelled background job, an ended Claude Code session, a crash, or
the machine sleeping — the completion write never runs and the job is frozen at
running/queued with a now-dead pid.
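
The lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `runTrackedJobSketch`, the Map-like `store`, `runTurn`, and the record shape are all hypothetical stand-ins for whatever `runTrackedJob` really uses.

```javascript
// Sketch only: `store`, `runTurn`, and the record shape are assumptions.
async function runTrackedJobSketch(store, job, runTurn) {
  // 1. Record the job as running under the launcher's own pid.
  store.set(job.id, { ...job, status: "running", pid: process.pid });
  try {
    // 2. Await the turn. If the launcher is killed here, nothing below runs.
    const result = await runTurn();
    // 3a. Terminal write on success.
    store.set(job.id, { ...job, status: "completed", pid: null, result });
  } catch (error) {
    // 3b. Terminal write when the turn itself rejects.
    store.set(job.id, {
      ...job,
      status: "failed",
      pid: null,
      errorMessage: String(error?.message ?? error),
    });
  }
  return store.get(job.id);
}
```

The `catch` branch only covers a turn that rejects. A launcher that is killed outright never reaches either terminal write, which is how a record can stay at running with a pid that no longer exists.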

Nothing reconciles this: buildStatusSnapshot / buildSingleJobSnapshot filter
purely on the stored status string, and the only ESRCH handling in the
codebase is in the cancel path. So /codex:status reports the job running
forever and /codex:result refuses with "still running". In practice these
orphans accumulate (I had several stuck for days).

Fix

listJobs() — which every status/result/cancel path flows through — now probes
the recorded launcher pid with process.kill(pid, 0):

- ESRCH (process gone) → reconcile the job to failed, clear the pid, set
  completedAt and an explanatory errorMessage. The correction is persisted to
  both the state index and the per-job file.
- EPERM / unknown → treat as alive (fail safe).
- Jobs without a pid, or already in a terminal state, are left untouched.
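
A minimal sketch of the probe-and-reconcile rule listed above, for illustration only. The helper names (`isLauncherGone`, `reconcileJob`), the `ACTIVE_STATUSES` set, the use of `null` for a cleared pid, and the error wording are assumptions; only `process.kill(pid, 0)`, the ESRCH/EPERM handling, and the `completedAt` / `errorMessage` field names come from the description. Persistence to the state index and per-job file is left out.

```javascript
// Hypothetical helpers; not the actual listJobs() implementation.
const ACTIVE_STATUSES = new Set(["running", "queued"]);

function isLauncherGone(pid) {
  try {
    // Signal 0 sends nothing; it only checks that the pid exists and is signalable.
    process.kill(pid, 0);
    return false;
  } catch (error) {
    // Only ESRCH proves the process is gone. EPERM or anything else counts as alive.
    return error?.code === "ESRCH";
  }
}

function reconcileJob(job, isGone = isLauncherGone, now = new Date()) {
  const hasPid = Number.isInteger(job.pid) && job.pid > 0;
  if (!ACTIVE_STATUSES.has(job.status) || !hasPid || !isGone(job.pid)) {
    return job; // terminal, pidless, or still alive: leave untouched
  }
  return {
    ...job,
    status: "failed",
    pid: null,
    completedAt: now.toISOString(),
    errorMessage: "Launcher process exited before the job finished.", // illustrative wording
  };
}
```

Accepting the probe as a parameter is a design choice made here so the rule can be exercised without real processes; the default still uses `process.kill`.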

A dead background turn now surfaces as failed (and is retryable) instead of
staying at running indefinitely.

Tests

Adds coverage in tests/state.test.mjs: dead-pid running/queued reconciliation,
persistence across reads, and live-pid / pidless / finished jobs left untouched.
node --test tests/state.test.mjs → 7/7 green.
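
One way a test could obtain a pid that is known to be dead is to spawn a short-lived child and wait for it to exit. The sketch below is an assumption about technique, not necessarily how tests/state.test.mjs does it, and `spawnAndReapPid` is a hypothetical name. The OS may in principle reuse the pid later, so the result should be used promptly.

```javascript
// Hypothetical test helper: returns the pid of a child that has already exited.
async function spawnAndReapPid() {
  // Dynamic import keeps the snippet usable from both ESM and CommonJS files.
  const { spawnSync } = await import("node:child_process");
  // spawnSync blocks until the child has exited, so the pid is dead on return.
  const child = spawnSync(process.execPath, ["-e", ""]);
  return child.pid;
}

// Returns the error code from a signal-0 probe, or null if the process exists.
function probeCode(pid) {
  try {
    process.kill(pid, 0);
    return null;
  } catch (error) {
    return error.code;
  }
}
```

Seeding a job record with such a pid and a status of running or queued gives the dead-pid case; using `process.pid` gives the live-pid case.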

Note: 3 unrelated tests (#63/#65/#67) fail under the aggregate npm test due to
pre-existing cross-file state leakage; they pass in isolation and fail
identically on main without this change.

A tracked job's completion is written by the launcher process that owns it (tracked-jobs.mjs::runTrackedJob). If that launcher dies before the turn finishes (cancelled background job, ended session, crash, sleep), the job is frozen at "running"/"queued" with a stale pid forever: /status reports it running indefinitely and /result refuses with "still running".

listJobs() now probes the recorded pid via process.kill(pid, 0). Jobs whose process is gone (ESRCH) are reconciled to "failed" and the correction is persisted to both the state index and the per-job file. EPERM/unknown errors are treated as alive; jobs without a pid or already in a terminal state are left untouched.

Adds tests for dead-pid running/queued reconciliation, persistence across reads, and live-pid / pidless / finished jobs left untouched.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc61122b1b


chatgpt-codex-connector · 2026-06-17T14:01:41Z

+    status: job.status,
+    phase: job.phase,


Preserve completed job files when reconciling

If a launcher dies after runTrackedJob has written the per-job completion file but before its following index upsert, the index still says running with a dead pid while the job file already says completed. This block unconditionally spreads stored and then overwrites its status/phase to failed, so the next /codex:status or /codex:result corrupts that completed job into an orphan failure. Check the stored job's terminal status, or avoid overwriting it, before persisting reconciliation.
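
The guard suggested here could look roughly like the sketch below. It is illustrative only: `resolveReconciledJob`, `TERMINAL_STATUSES`, and the argument shapes are hypothetical, and "cancelled" is assumed to be a terminal status alongside the completed and failed states named in the PR.

```javascript
// Hypothetical merge rule: a terminal per-job file wins over the stale index entry.
const TERMINAL_STATUSES = new Set(["completed", "failed", "cancelled"]);

function resolveReconciledJob(indexEntry, storedJob, failurePatch) {
  if (storedJob && TERMINAL_STATUSES.has(storedJob.status)) {
    // The launcher wrote its completion file before dying: heal the index from it.
    return { ...indexEntry, ...storedJob };
  }
  // No terminal record on disk: apply the orphan-failure fields last so they win.
  return { ...storedJob, ...indexEntry, ...failurePatch };
}
```

With this ordering, the orphan-failure fields are applied only when the per-job file does not already record a terminal outcome.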


mroldrobot requested a review from a team June 17, 2026 13:58

chatgpt-codex-connector Bot reviewed Jun 17, 2026

mroldrobot wants to merge 1 commit into openai:main from mroldrobot:fix/reconcile-orphaned-jobs
