feat(screencapture): ReplayKit broadcast extension as high-fps frame source #3
Merged
…source

Add a broadcast upload extension (WebDriverAgentBroadcast.appex, embedded into the generated Runner.app) that receives the system's screen frames via ReplayKit, encodes them per capture session (one hardware H264/H265 encoder each, VTPixelTransfer letterbox scaling, 420v end to end) and ships the elementary stream to WDA over loopback TCP :9300. Capture sessions switch to the broadcast source at the extension's first IDR and revert to the XCTest screenshot loop (forced keyframe, no client reconnect) when the broadcast stops, raising achievable stream rates from ~10-20fps to 30-60fps on real devices.

- New endpoints: POST /mobilerun/screencapture/broadcast/start (drives RPSystemBroadcastPickerView + system sheet via UI automation), GET .../broadcast (status), POST .../broadcast/stop. Plain /screencapture/start is unchanged; sessions attach automatically and report their active source as "replaykit" or "screenshot".
- Shared wire protocol (FBBroadcastProtocol) compiled into both the lib and the appex; the appex cannot link WebDriverAgentLib (XCTest private API is forbidden in extensions) and reuses FBVideoEncoder + GCDAsyncSocket as shared sources instead.
- Embedding happens via a scheme build post-action (Scripts/embed-broadcast-extension.sh) because nothing built into the .xctest reaches the auto-generated Runner.app; the script also rewrites the appex bundle id to <host>.broadcast and re-signs inner-first.
- Simulator/tvOS: broadcast endpoints return unsupportedOperation; the screenshot pipeline is untouched.

Audio capture is a follow-up.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
POST /mobilerun/screencapture/broadcast/start used to sit on a black screen for ~5s before the system sheet appeared: it foregrounded the runner via XCUIApplication.activate, whose self-quiescence wait can only ever time out because the waiting thread is the very main thread whose idleness is being awaited. Foreground via LSApplicationWorkspace (FBUnattachedAppLauncher) instead and poll the in-process application state, keeping the XCTest activation only as a fallback.

Measured on an iPhone XS (iOS 18.7): runner foreground after 485ms, picker triggered after 531ms, broadcast connected after 8.4s total.

Also skip the active-app lookup when the runner is already frontmost, re-fire the picker press every 2s until the confirmation sheet shows up (the system drops presses that arrive before the scene is fully active), and log per-stage timings for the whole dance.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nd repeatedly

Starting a second broadcast while one is live or still launching makes iOS kill both. Three gaps allowed exactly that:

- The start dance spins the main run loop, so a second /broadcast/start request could be dispatched re-entrantly mid-dance and drive the picker again. A startInProgress gate now serializes starts; followers await the leader's outcome and return its result.
- A live broadcast whose extension is momentarily disconnected (crash, TCP reconnect window) passed the isExtensionConnected idempotency check. The dance now checks UIScreen.isCaptured first and waits for the extension instead of stacking a second broadcast.
- A missed XCUIElement tap on the confirmation button (sheet dismissed in between) was recorded by XCTest as a test failure, tearing down the whole WDA session. The tap now goes through WDA's own event synthesis at the button's coordinates, so a miss surfaces as a plain connect timeout. Side effect: no springboard-idle wait, which cuts the confirmation tap from ~4.8s to ~1.9s and time-to-connected from ~8.4s to ~5.5s on an iPhone XS.

Verified on device: two concurrent starts produce one broadcast and two "connected" responses; a repeat start while connected returns in ~30ms.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
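The leader/follower behavior of the start gate can be illustrated with a small sketch. This is not the WDA code: the real gate guards against re-entrant dispatch on a spinning main run loop, while this sketch uses threads as a stand-in, and StartGate, _Outcome and the followers counter are hypothetical names chosen for the illustration.

```python
import threading


class _Outcome:
    """Result slot that the leader fills and followers wait on (hypothetical helper)."""

    def __init__(self):
        self._done = threading.Event()
        self._value = None
        self._error = None

    def resolve(self, value=None, error=None):
        self._value, self._error = value, error
        self._done.set()

    def wait(self):
        self._done.wait()
        if self._error is not None:
            raise self._error
        return self._value


class StartGate:
    """Single-flight gate: one leader runs the start, followers reuse its outcome."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = None  # the in-flight start, if any
        self.followers = 0    # observation aid only, not part of the technique

    def start(self, do_start):
        with self._lock:
            pending = self._pending
            is_leader = pending is None
            if is_leader:
                pending = self._pending = _Outcome()
            else:
                self.followers += 1
        if not is_leader:
            # A start is already running: wait for it instead of starting again.
            return pending.wait()
        try:
            pending.resolve(value=do_start())
        except Exception as exc:
            pending.resolve(error=exc)
        finally:
            with self._lock:
                self._pending = None
        return pending.wait()
```

With this shape, a second caller that arrives while the first start is still running never invokes do_start itself; it receives whatever the first call produced, including its error.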
…e fps gate

Each extension heartbeat now carries per-session counters (samplesIn, accepted, encoded, droppedFpsGate, droppedReplaced, droppedPool, encode submit-to-callback latency last/avg) plus derived per-second rates, the ReplayKit delivery rate and the loopback socket's outstanding bytes. They surface verbatim under 'heartbeat' in GET /mobilerun/screencapture/broadcast, so a low consumer-side fps can be attributed to delivery, a specific pipeline stage or backpressure without reproducing locally.

The first measurement immediately located the loss: ReplayKit delivered 42-44 fps while only ~22 fps passed the fps gate - the gate paced by minimum gap from the last accepted frame, which beats against jittery ~30-60Hz delivery (a frame arriving 1ms early is dropped and the next accepted gap doubles). Replace it with a due-time accumulator that admits exactly one frame per interval on average regardless of arrival jitter, with a re-anchor clamp so stalls do not admit bursts.

Measured on an iPhone XS: accepted rate went from a flat ~22/s to 30/s whenever delivery sustains it; encode latency ~10ms, all other drop counters zero.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
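The difference between the two gates can be reproduced with a short simulation. This is a sketch of the two pacing strategies as described above, not the extension's code: the function names, the microsecond timestamps and the exact re-anchor rule (jump to now + interval once the due time has fallen a full interval behind) are assumptions made for the illustration.

```python
def min_gap_gate(arrivals_us, interval_us):
    """Old behavior: admit a frame only if a full interval has passed
    since the last accepted frame."""
    accepted = []
    last = None
    for t in arrivals_us:
        if last is None or t - last >= interval_us:
            accepted.append(t)
            last = t
    return accepted


def due_time_gate(arrivals_us, interval_us):
    """Due-time accumulator: admit the first frame at or after the due time,
    then advance the due time by exactly one interval."""
    accepted = []
    due = None
    for t in arrivals_us:
        if due is None:
            due = t  # anchor on the first frame
        if t < due:
            continue  # too early for the next slot
        accepted.append(t)
        due += interval_us
        if due <= t:
            # Re-anchor clamp (assumed rule): after a stall the due time is
            # far in the past; skip ahead so the backlog is not admitted
            # as a burst.
            due = t + interval_us
    return accepted


# About 10 s of perfectly regular 43 fps delivery, paced to a 30 fps target.
arrivals = [i * 1_000_000 // 43 for i in range(430)]
interval = 33_333
print(len(min_gap_gate(arrivals, interval)))   # prints 215
print(len(due_time_gate(arrivals, interval)))  # prints 300
```

At 43 fps delivery the gap between consecutive frames (about 23 ms) is shorter than the 33 ms interval, so the minimum-gap gate ends up accepting every second frame, which is 21.5/s. The accumulator accepts 300 frames over the same span, which is 30/s.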
Three on-device crash reports share one signature: SIGSEGV (pointer authentication failure) in objc_msgSend inside -[RoutingConnection setHeadersForResponse:isError:] on an HTTPConnection thread. The vendored CocoaHTTPServer/RoutingHTTPServer never retain the server from the connection side (HTTPConfig.server and RoutingConnection's http ivar are both __unsafe_unretained), so GET /wda/shutdown - which tears the server down via stopServing while other keep-alive connections are still replying on their own GCD queues - leaves those replies dereferencing a freed RoutingHTTPServer. Continuous endpoint polling (e.g. the droidrun devicekit) makes hitting the race likely.

Make HTTPConfig.server strong (connections keep the server alive until their replies finish; the server->connections->config cycle breaks when connections die) and RoutingConnection's server reference weak, and snapshot the route response's mutable headers dictionary instead of aliasing it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ReplayKit only delivers frames while the screen changes and VideoToolbox has no Android-style KEY_REPEAT_PREVIOUS_FRAME_AFTER mode, so streams stalled on static content. Each pipeline now retains its most recent (pool-owned) scaled frame and a timer on the pipeline queue re-encodes it whenever no live frame arrived within the frame interval, keeping the output cadence at the session's requested fps regardless of screen activity. In the direct-encode path the repeat copy goes through the pixel transfer session, since retaining ReplayKit's own buffers would stall its capture pool.

Repeated frames of unchanged content encode to near-empty delta frames, and the constant flow also restores the periodic 2s IDR cadence for late-joining clients. The heartbeat gains a 'repeated' counter/rate.

Measured on an iPhone XS with a near-static screen: accepted ~24/s + repeated ~7/s = encoded 30.7-33.3/s, encode latency ~12ms.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
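The repeat behavior can be modeled as a small scheduling function. This is only a sketch of the cadence rule described above, assuming the timer is re-armed one interval after every live or repeated frame; output_cadence and its microsecond timestamps are hypothetical, and the real pipeline drives this from a timer on its queue rather than from a precomputed list.

```python
def output_cadence(live_us, interval_us, end_us):
    """Return (timestamp, kind) events: 'live' for each delivered frame and
    'repeat' whenever a full interval elapses without a live frame."""
    events = []
    deadline = None  # next repeat time; None until a frame has been retained
    for t in sorted(live_us):
        # Fire every repeat that comes due before this live frame arrives.
        while deadline is not None and deadline < t:
            events.append((deadline, "repeat"))
            deadline += interval_us
        events.append((t, "live"))
        deadline = t + interval_us  # a live frame re-arms the timer
    # Keep repeating the retained frame until the end of the observed window.
    while deadline is not None and deadline < end_us:
        events.append((deadline, "repeat"))
        deadline += interval_us
    return events


# Two live frames 100 ms apart on an otherwise static screen, 30 fps target.
events = output_cadence([0, 100_000], 33_333, 200_000)
print(sum(1 for _, kind in events if kind == "repeat"))  # prints 6
```

Nothing is emitted before the first live frame, because there is no retained frame to repeat yet.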
…pt rewrites the bundle id

When a device build overrides PRODUCT_BUNDLE_IDENTIFIER, the embed script rewrites the appex CFBundleIdentifier to <host id>.broadcast, but it re-signed with --preserve-metadata=entitlements, which kept an application-identifier minted for the pre-rewrite id. installd rejects an extension whose signed identity does not match its bundle id, so exactly the downstream-override case the rewrite exists for produced an uninstallable appex.

Extract the entitlements, point application-identifier at <team>.<new id> and re-sign with them; warn that the embedded provisioning profile must cover the new id. The no-rewrite path (and simulator ad-hoc signing without an application-identifier) keeps the previous behavior.

Verified with a synthetic rewrite harness (host id changed, ad-hoc identity): the re-signed appex carries the corrected application-identifier and a valid signature; a real device build (no rewrite) still produces a valid deep signature.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
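The entitlements fix-up amounts to a small plist edit. The sketch below shows that step in Python for illustration only; the embed script itself is a shell script and may do this differently. retarget_entitlements is a hypothetical helper, and it assumes the team prefix is the first dot-separated component of the existing application-identifier.

```python
import plistlib


def retarget_entitlements(entitlements_xml: bytes, new_bundle_id: str) -> bytes:
    """Point application-identifier at <team>.<new bundle id>.

    Entitlements without an application-identifier (e.g. simulator ad-hoc
    signing) are returned unchanged.
    """
    entitlements = plistlib.loads(entitlements_xml)
    app_id = entitlements.get("application-identifier")
    if app_id is None:
        return entitlements_xml
    # Assumption: the value has the form "<team prefix>.<bundle id>".
    team_prefix = app_id.split(".", 1)[0]
    entitlements["application-identifier"] = f"{team_prefix}.{new_bundle_id}"
    return plistlib.dumps(entitlements)
```

The rewritten plist would then be handed to the re-sign step in place of the preserved entitlements. As the commit notes, the embedded provisioning profile still has to cover the new id.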
Summary
Adds a ReplayKit broadcast upload extension (WebDriverAgentBroadcast.appex, embedded into the generated WebDriverAgentRunner-Runner.app) as a high-fps frame source for /mobilerun/screencapture. The extension receives the system's screen frames as pixel buffers, runs one hardware H264/H265 encoder per capture session (VTPixelTransfer letterbox scaling, 420v end to end, bounded buffer pool) and ships the elementary stream to WDA over loopback TCP :9300. WDA's existing per-session TCP fan-out serves the frames unchanged.

Device-verified on an iPhone XS (iOS 18.7): broadcast connected in ~5.5s end to end, h264/h265 streaming at a stable 30fps (held on static screens too), ~12ms encode latency, transparent screenshot fallback on broadcast loss. The previous screenshot-loop source tops out at ~10-20fps.
API
- POST /mobilerun/screencapture/broadcast/start: drives RPSystemBroadcastPickerView + the system confirmation sheet via UI automation, waits for the extension to connect. Idempotent while connected; concurrent calls are serialized; body opts timeout, confirmButtonLabels (localization), restoreForegroundApp.
- GET /mobilerun/screencapture/broadcast: status (includes the extension's latest heartbeat metrics).
- POST /mobilerun/screencapture/broadcast/stop

/mobilerun/screencapture/start is unchanged: sessions attach to a connected broadcast automatically (switch at the first extension IDR) and report "source": "replaykit" | "screenshot". Broadcasts started from Control Center attach identically. Fps/bitrate/codec/framing remain per-session.

Frame pipeline correctness (measured via the new metrics)
Robustness
- broadcast/start is safe to call concurrently and repeatedly: a startInProgress gate serializes re-entrant requests (the dance spins the main run loop), UIScreen.isCaptured detects a live-but-disconnected broadcast instead of stacking a second one (iOS kills both), and the confirmation tap uses WDA's own event synthesis, because a missed XCUIElement.tap recorded an XCTest failure that tore down the whole session.
- fix(webserver): use-after-free of the HTTP server during teardown. HTTPConfig.server / RoutingConnection.http were unretained, so /wda/shutdown under concurrent endpoint polling crashed in-flight replies on other connections (3 identical on-device crash reports: SIGSEGV in setHeadersForResponse:). Now strong/weak respectively, with the route headers snapshotted.

Build & signing
- A scheme build post-action (Scripts/embed-broadcast-extension.sh) copies the appex into Runner.app/PlugIns, rewrites its CFBundleIdentifier to <Runner.app id>.broadcast (the host id gets .xctrunner appended; downstream tooling may override the prefix) and re-signs inner-first. Works for npm run bundle:ios, plain xcodebuild build-for-testing -scheme WebDriverAgentRunner, CI and Fastlane.
- The appex reuses FBVideoEncoder.m, FBBroadcastProtocol.m and vendored GCDAsyncSocket.m as shared sources with APPLICATION_EXTENSION_API_ONLY = YES.
- Documentation: docs/broadcast-extension.md.
- Simulator/tvOS: broadcast endpoints return unsupported operation; the screenshot pipeline is untouched.

Follow-ups (out of scope)
- Audio capture (RPSampleBufferTypeAudioApp/Mic are currently ignored).
- VTPixelRotationSession (iOS 16+); v1 encodes native orientation and exposes it as metadata.
- /mobilerun/a11y (apps with pathological AX trees, e.g. TikTok, can hang WDA's main thread until XCTest kills the session; pre-existing, unrelated to this branch).

🤖 Generated with Claude Code