Add optional whole-run wall-clock deadline to graph execution #29

Merged
senamakel merged 1 commit into main from feat/graph-run-deadline on Jul 5, 2026

Conversation

@senamakel

Member

Whole-run wall-clock deadline for graph execution

Adds an optional per-run deadline to CompiledGraph, checked at every super-step boundary.

Motivation

Today the only way to bound a whole graph run by wall clock is to wrap run()/run_with_thread() in an external tokio::time::timeout. That aborts the run mid-super-step and cannot leave a clean checkpoint, so a durable graph that times out loses the in-flight super-step's boundary and the caller can't cleanly resume or inspect it. (This bit a downstream consumer: a cron-driven reflection loop wrapped each run in an outer tokio::time::timeout, and a slow run left no resumable state.)

What this does

CompiledGraph::with_run_deadline(d) stops the run between super-steps: when the elapsed run time first reaches d, the executor fails with the existing TinyAgentsError::Timeout before scheduling the next super-step, so the last committed boundary checkpoint stays intact and resumable.

  • Reuses TinyAgentsError::Timeout (already documented as "the run exceeded its wall-clock deadline") — no new error variant, no breaking change.
  • The check sits next to the existing RecursionLimit boundary check in the superstep loop; None (default) is a no-op.
  • Bounds scheduling, not a single in-flight node — pair with with_node_timeout to bound individual handlers. (A future composition with the cancellation token could add best-effort in-flight cancellation; deliberately out of scope here to keep the primitive small and the checkpoint guarantee clean.)
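
The boundary check described above can be sketched roughly as follows. This is a minimal, std-only illustration and not the crate's executor: `run_supersteps`, `RunError`, `Failed`, and the `usize` checkpoint index are hypothetical stand-ins, and super-steps are simulated by a synchronous closure rather than async node handlers.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for `TinyAgentsError`; only the two conditions
/// named in this PR are modelled.
#[derive(Debug, PartialEq)]
enum RunError {
    Timeout,
    RecursionLimit,
}

/// Hypothetical failure record: the error plus the last committed boundary.
#[derive(Debug, PartialEq)]
struct Failed {
    error: RunError,
    last_checkpoint: Option<usize>,
}

/// Runs up to `steps` super-steps, each simulated by calling `step`.
/// Both limits are checked only at the boundary, before the next
/// super-step is scheduled, so a step that has started always finishes
/// and commits its checkpoint.
fn run_supersteps(
    steps: usize,
    recursion_limit: usize,
    run_deadline: Option<Duration>,
    mut step: impl FnMut(usize),
) -> Result<usize, Failed> {
    let started_at = Instant::now();
    let mut last_checkpoint = None;
    for i in 0..steps {
        if i >= recursion_limit {
            return Err(Failed { error: RunError::RecursionLimit, last_checkpoint });
        }
        // `None` (the default) skips the deadline check entirely.
        if let Some(deadline) = run_deadline {
            if started_at.elapsed() >= deadline {
                return Err(Failed { error: RunError::Timeout, last_checkpoint });
            }
        }
        step(i); // the in-flight super-step is never interrupted
        last_checkpoint = Some(i); // commit the boundary checkpoint
    }
    Ok(steps)
}
```

In this sketch, a 20 ms deadline with super-steps that each take 50 ms lets the first step finish and commit, then fails with `Timeout` before the second step is scheduled. A single slow step still overruns the deadline, which is why the description suggests pairing the deadline with with_node_timeout.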

Tests (graph/compiled/test.rs)

  • run_deadline_stops_between_supersteps — a deadline shorter than the run trips with Timeout.
  • run_deadline_allows_a_run_that_finishes_in_time — no false trip when the run completes in time.
  • run_deadline_leaves_last_checkpoint_resumable — on a checkpointed thread, a deadline trip leaves the last boundary checkpoint intact, and resuming (no deadline) runs the remaining super-steps to completion.

cargo fmt --check, cargo clippy --all-targets --features sqlite -- -D warnings, and the full suite are green. Public API doc updated in docs/modules/graph/builder.md.

@coderabbitai

coderabbitai Bot commented Jul 5, 2026


Warning

Review limit reached

Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5f4f5023-8110-429c-affa-beccf118f3b0

📥 Commits

Reviewing files that changed from the base of the PR and between 3e81e49 and 1982d9e.

📒 Files selected for processing (5)
  • docs/modules/graph/builder.md
  • src/graph/compiled/executor.rs
  • src/graph/compiled/mod.rs
  • src/graph/compiled/test.rs
  • src/graph/compiled/types.rs

CompiledGraph::with_run_deadline(d) bounds the whole run by a wall clock,
checked at every super-step boundary. When the elapsed run time first reaches
the deadline, the run stops *between* super-steps with TinyAgentsError::Timeout,
leaving the last committed boundary checkpoint intact and resumable.

This is the durable alternative to wrapping run() in an external
tokio::time::timeout, which aborts mid-super-step and cannot leave a clean
checkpoint. The deadline bounds scheduling, not a single in-flight node — pair
it with with_node_timeout to also bound individual handlers.

Tests: deadline trips between super-steps (Timeout); a run that finishes in time
is unaffected; a deadline trip on a checkpointed thread leaves the last boundary
checkpoint intact and the run resumes to completion. fmt + clippy -D warnings
clean; full suite green. Docs: builder.md.
@senamakel senamakel force-pushed the feat/graph-run-deadline branch from 1945739 to 1982d9e Compare July 5, 2026 08:01
@chatgpt-codex-connector


💡 Codex Review

.map_err(|err| raise(ctx, err))?;

P2: Complete ReplCall events on failed calls

When model.invoke returns an error, times out, or is cancelled after the Started event has been emitted above, this ? exits through raise before record emits the matching ReplCallPhase::Completed event. Live subscribers can then keep showing the call as in-flight forever; the same pattern appears in the tool and agent paths below, so failures from providers/tools/agents need to emit a completed/error record before returning.
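
One way to read this suggestion is to record the completion before the error is propagated. The sketch below assumes a simplified, synchronous shape: `invoke_with_events`, the `Vec` event sink, and the `ok` field on `Completed` are hypothetical, and only the phase names `Started` and `Completed` come from the comment above.

```rust
/// Hypothetical event phases; the `ok` flag is an assumption for illustration.
#[derive(Debug, PartialEq, Clone)]
enum ReplCallPhase {
    Started,
    Completed { ok: bool },
}

/// Emits `Started`, runs the call, and always emits `Completed` before
/// handing the result back, so a later `?` at the call site cannot skip it.
fn invoke_with_events<T, E>(
    events: &mut Vec<ReplCallPhase>,
    call: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    events.push(ReplCallPhase::Started);
    let result = call();
    // Recorded on both the success and the error path.
    events.push(ReplCallPhase::Completed { ok: result.is_ok() });
    result
}
```

This covers returned errors and timeouts surfaced as errors. A cancelled async call, where the future is dropped, would not reach the second `push`; a guard type that emits the event in its `Drop` implementation is one possible way to handle that case.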


self.fail_run(&run_id, &thread_id, started_at, steps, &err, None)

P2: Preserve checkpoint id on deadline failure

When a checkpointed run trips the new deadline after at least one super-step, last_checkpoint already points at the resumable boundary, but the failed status is saved with checkpoint_id: None. Hosts reading GraphStatusStore for the failed run cannot locate the checkpoint to inspect or resume from even though the feature promises the last committed boundary stays usable; pass the current last_checkpoint here when failing due to the run deadline.
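
The change being requested might look something like the sketch below. The types are hypothetical and heavily reduced: the real `fail_run` takes more arguments (as the quoted call shows), and the actual status record stored in GraphStatusStore is not visible on this page.

```rust
/// Hypothetical, simplified failed-run status record.
#[derive(Debug, PartialEq)]
struct FailedStatus {
    error: String,
    checkpoint_id: Option<String>,
}

/// Hypothetical stand-in for the executor's `fail_run`, reduced to the
/// two arguments relevant to this comment.
fn fail_run(error: &str, checkpoint_id: Option<String>) -> FailedStatus {
    FailedStatus { error: error.to_string(), checkpoint_id }
}

/// Deadline failure path: forward the last committed boundary checkpoint
/// instead of passing `None`, so hosts can find the resumable state.
fn fail_on_run_deadline(last_checkpoint: Option<&str>) -> FailedStatus {
    fail_run("timeout", last_checkpoint.map(str::to_string))
}
```

If the deadline trips before any super-step has committed, `last_checkpoint` is still `None` and the stored status is unchanged from the current behaviour.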

@senamakel senamakel merged commit 6bf67ac into main Jul 5, 2026
3 checks passed