
fix(openant-cli): bound Python subprocess in Invoke with an automatic timeout #98

Open
gadievron wants to merge 1 commit into master from fix/openant-cli-bound-python-subprocess-in-invoke-with

@gadievron
Collaborator

Invoke built the Python subprocess with exec.Command (no context, no
deadline), so a hung parser blocked the CLI forever on io.Copy/cmd.Wait.
The only recovery was a user-delivered SIGINT, leaving headless callers
(CI, scheduled scans, checkpoint.go quiet=true) with no recovery path.

Switch the single subprocess-build site to exec.CommandContext driven by
context.WithTimeout(defaultInvokeTimeout) — mirroring cmd/docker.go. The
Invoke signature is unchanged, so all 10 callers are unaffected and the
SIGINT fast-path is kept as a secondary mechanism.

Killing the process is not sufficient on its own: a descendant can hold the
stdout/stderr pipe write-ends open, leaving io.Copy blocked after the parent
dies. cmd.WaitDelay force-closes the inherited FDs, and a watchdog goroutine
closes the pipe read-ends on ctx.Done() so the in-flight reads return.

defaultInvokeTimeout defaults to 30m and is a package var so tests can shrink
it. New invoke_test.go::TestInvoke_HangingSubprocessIsBoundedByTimeout runs a
fake parser that sleeps past the deadline and asserts Invoke returns within a
bounded window (RED: blocked the full budget; GREEN: returns at the deadline).
go test ./internal/python/ ok (21 passed); go vet + gofmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

