fix(producer): retry probe navigation timeouts#1713
Conversation
|
Closing this for now because it is retry classification / mitigation, not the root-cause fix for the Chromium renderer crash. We will reopen or replace with a root fix once the crash trigger is proven. |
|
Reopening based on the deeper prod-path repro. We now have exact-image A/B evidence: the deployed prod sidecar image (0.7.2, Chrome 150) fails the exact failed streamed-preview source under /v1/render-stream with the observed browser_probe signatures (6/8 failures: Navigating frame was detached / Target closed), while the 0.7.6 candidate image passes the identical replay 8/8. The candidate classifier still returns false for |
Summary
Treat Chromium navigation timeouts during browser probe/init as transient browser failures so
runProbeStageretries once with a fresh browser session.This preserves the preferred Linux BeginFrame path. It does not force screenshot mode.
Why
Render-streaming probe/init failures can surface as browser lifecycle errors such as:
Navigating frame was detachedProtocol error (Runtime.evaluate): Target closedNavigation timeout of 60000 ms exceededMain already retries the first two classes through
isTransientBrowserError, but navigation timeout was explicitly excluded. This PR closes that gap so probe-stage navigation timeouts use the same fresh-browser retry path before the producer emits a terminal render error.Validation
Navigation timeout of 60000 ms exceeded.Verification
bun test packages/engine/src/services/frameCapture-transientErrors.test.ts packages/producer/src/services/render/stages/probeStage.test.ts— 30 pass, 0 failbunx oxfmt --check packages/engine/src/services/frameCapture.ts packages/engine/src/services/frameCapture-transientErrors.test.ts packages/producer/src/services/render/stages/probeStage.test.tsbunx oxlint packages/engine/src/services/frameCapture.ts packages/engine/src/services/frameCapture-transientErrors.test.ts packages/producer/src/services/render/stages/probeStage.test.tsgit diff --check HEAD~1..HEADNote: I also tried the broader render-stage test bundle including
captureStreamingStage.test.ts; that test file has unrelated mock drift in isolation (getCapturePerfSummaryexport/sessionoptionsmocks), so I did not treat it as verification for this PR.