
Improve terminal-bench eval execution defaults #1031

Draft
wgqqqqq wants to merge 3 commits into GCWing:evals-on-release from wgqqqqq:evals-on-release


wgqqqqq (Collaborator) commented Jun 2, 2026

Summary

  • Cherry-pick the terminal-bench eval execution default improvements.
  • Add eval deadline guidance and metadata propagation, and tighten Bash budget handling.
  • Keep eval exec behavior focused on concrete artifacts, verification, and non-interactive reliability.

Verification

  • cargo check -p bitfun-cli

wgqqqqq marked this pull request as draft on June 2, 2026 at 04:41.
wgqqqqq force-pushed the evals-on-release branch from eee4660 to 3fcec1c on June 2, 2026 at 07:06.


2 participants