
feat(screencapture): ReplayKit broadcast extension as high-fps frame source #3

Merged
Timo972 merged 11 commits into master from timo/wda-replaykit-broadcast
Jun 11, 2026

Timo972 commented Jun 10, 2026

Summary

Adds a ReplayKit broadcast upload extension (WebDriverAgentBroadcast.appex, embedded into the generated WebDriverAgentRunner-Runner.app) as a high-fps frame source for /mobilerun/screencapture. The extension receives the system's screen frames as pixel buffers, runs one hardware H264/H265 encoder per capture session (VTPixelTransfer letterbox scaling, 420v end to end, bounded buffer pool) and ships the elementary stream to WDA over loopback TCP :9300. WDA's existing per-session TCP fan-out serves the frames unchanged.
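The letterbox step can be sketched as plain geometry: fit the source picture into the session's output size without distorting it, and center it. The C below is only an illustration under stated assumptions. `rect_t` and `letterbox_rect` are hypothetical names, and the even-alignment rounding is an assumption motivated by 4:2:0 (420v) chroma subsampling. VTPixelTransferSession can do the equivalent itself when its scaling mode is set to kVTScalingMode_Letterbox.

```c
#include <assert.h>

/* Hypothetical output rectangle, in pixels, inside the session frame. */
typedef struct { int x, y, width, height; } rect_t;

/* Fits a src_w x src_h picture into dst_w x dst_h, preserving the aspect
   ratio and centering it. Sizes and offsets are rounded down to even
   values (assumption: keeps 4:2:0 chroma samples aligned). */
rect_t letterbox_rect(int src_w, int src_h, int dst_w, int dst_h) {
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    rect_t r;
    /* Compare src_w/src_h with dst_w/dst_h using integer cross-products. */
    if ((long long)src_w * dst_h >= (long long)dst_w * src_h) {
        /* Source is relatively wider: full width, bars above and below. */
        r.width = dst_w;
        r.height = (int)((long long)src_h * dst_w / src_w);
    } else {
        /* Source is relatively taller: full height, bars left and right. */
        r.height = dst_h;
        r.width = (int)((long long)src_w * dst_h / src_h);
    }
    r.width &= ~1;
    r.height &= ~1;
    r.x = ((dst_w - r.width) / 2) & ~1;
    r.y = ((dst_h - r.height) / 2) & ~1;
    return r;
}
```

For a 1125x2436 source (the iPhone XS panel resolution) in a 720x1280 session, this gives a 590x1280 picture at x = 64. That leaves 64 pixels of bar on the left and 66 on the right.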

Device-verified on an iPhone XS (iOS 18.7): broadcast connected in ~5.5s end to end, h264/h265 streaming at a stable 30fps (held on static screens too), ~12ms encode latency, transparent screenshot fallback on broadcast loss. The previous screenshot-loop source tops out at ~10-20fps.

API

  • POST /mobilerun/screencapture/broadcast/start: starts a system broadcast. It foregrounds the runner (LaunchServices), drives RPSystemBroadcastPickerView and the system confirmation sheet via UI automation, and waits for the extension to connect. The call is idempotent while a broadcast is connected, and concurrent calls are serialized. Optional body fields: timeout, confirmButtonLabels (localized labels for the confirmation button), restoreForegroundApp.
  • GET /mobilerun/screencapture/broadcast: state, the extension's HELLO info, and the live heartbeat with per-session pipeline metrics (ReplayKit delivery rate, fps-gate accepts/drops, latch replacements, pool drops, repeated frames, encode latency last/avg, socket backpressure).
  • POST /mobilerun/screencapture/broadcast/stop: graceful finish. Sessions revert to the screenshot source with a forced keyframe (no client reconnect).

/mobilerun/screencapture/start is unchanged: sessions attach to a connected broadcast automatically (switch at the first extension IDR) and report "source": "replaykit" | "screenshot". Broadcasts started from Control Center attach identically. Fps/bitrate/codec/framing remain per-session.
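The switch-at-first-IDR rule can be modeled as a small per-session state machine. The C sketch below is a hypothetical model, not WDA's code: `session_source_t`, `accept_frame` and the callbacks are invented names, and the model only tracks which source's frames reach clients. The point it illustrates is that a ReplayKit delta frame is never forwarded before that source's IDR, so a decoder never receives a frame referencing a picture from the other source. The fallback path also requests a keyframe so clients can resync without reconnecting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-session source state; all names are illustrative. */
typedef enum { SRC_SCREENSHOT, SRC_REPLAYKIT } source_t;

typedef struct {
    source_t active;        /* source whose frames currently reach clients */
    bool switch_pending;    /* broadcast connected, waiting for its first IDR */
    bool force_keyframe;    /* screenshot encoder should emit an IDR next */
} session_source_t;

/* The extension connected (from /broadcast/start or Control Center). */
void on_broadcast_connected(session_source_t *s) {
    if (s->active == SRC_SCREENSHOT)
        s->switch_pending = true;
}

/* Decides whether an encoded frame from `from` is forwarded to clients. */
bool accept_frame(session_source_t *s, source_t from, bool is_idr) {
    if (from == SRC_REPLAYKIT && s->switch_pending && is_idr) {
        /* Switch only on an IDR: earlier ReplayKit delta frames would
           reference pictures the client's decoder never received. */
        s->active = SRC_REPLAYKIT;
        s->switch_pending = false;
    }
    return from == s->active;
}

/* TCP disconnect, heartbeat staleness or /broadcast/stop. */
void on_broadcast_lost(session_source_t *s) {
    s->switch_pending = false;
    if (s->active == SRC_REPLAYKIT) {
        s->active = SRC_SCREENSHOT;
        s->force_keyframe = true;   /* clients resync without reconnecting */
    }
}
```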

Frame pipeline correctness (measured via the new metrics)

  • fps gate: a due-time accumulator replaces gap-based pacing. Gap pacing beat against jittery delivery and shed about half the frames (~22fps passed out of ~42fps delivered). The accumulator reaches the configured fps whenever delivery sustains it.
  • Frame repeater: ReplayKit sends nothing on static screens and VideoToolbox has no Android-style repeat-frame mode, so each pipeline re-encodes its last frame to fill delivery gaps — output cadence holds the requested fps regardless of content (repeats of unchanged content encode to near-empty delta frames, and the constant flow keeps the periodic 2s IDR cadence for late joiners).
  • Newest-pending-sample latch instead of hard drops when the scaler is busy; direct-encode fast path when source dims match the session.
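The newest-pending-sample latch is a one-slot mailbox in front of the scaler. While the scaler is busy, a newer sample replaces the parked one instead of being hard-dropped, so the scaler always resumes with the freshest frame. Below is a minimal C sketch. It assumes frames are opaque pointers and that the caller handles retain/release (in the extension they would be pixel buffers). `latch_t`, `latch_offer` and `latch_complete` are hypothetical names, and the direct-encode fast path is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* One-slot latch in front of a busy stage (here: the scaler). */
typedef struct {
    void *pending;            /* newest frame waiting for the scaler */
    int busy;                 /* scaler is currently processing a frame */
    unsigned long replaced;   /* parked frames superseded by newer ones */
} latch_t;

/* Offers a new frame. Returns the frame to process now, or NULL if it was
   parked. *evicted receives a superseded frame the caller must release. */
void *latch_offer(latch_t *l, void *frame, void **evicted) {
    *evicted = NULL;
    if (!l->busy) {
        l->busy = 1;
        return frame;
    }
    if (l->pending) {
        *evicted = l->pending;   /* would count as a latch replacement */
        l->replaced++;
    }
    l->pending = frame;
    return NULL;
}

/* Scaler finished. Returns the next frame to process, or NULL if idle. */
void *latch_complete(latch_t *l) {
    assert(l->busy);
    void *next = l->pending;
    l->pending = NULL;
    if (!next)
        l->busy = 0;
    return next;
}
```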

Robustness

  • broadcast/start is safe to call concurrently and repeatedly: a startInProgress gate serializes re-entrant requests (the dance spins the main run loop), UIScreen.isCaptured detects a live-but-disconnected broadcast instead of stacking a second one (iOS kills both), and the confirmation tap uses WDA's own event synthesis — a missed XCUIElement.tap recorded an XCTest failure that tore down the whole session.
  • fix(webserver): use-after-free of the HTTP server during teardown. HTTPConfig.server and RoutingConnection.http were unretained, so /wda/shutdown under concurrent endpoint polling crashed in-flight replies on other connections (3 identical on-device crash reports: SIGSEGV in setHeadersForResponse:). They are now strong and weak respectively, and the route headers are snapshotted instead of aliased.
  • Broadcast loss (pill, lock, extension crash) is detected via TCP disconnect or 6s heartbeat staleness; live sessions fall back to the screenshot source with a forced keyframe.
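Loss detection comes down to two conditions checked together. A stalled or suspended extension may keep its socket open, which is presumably why staleness is checked alongside the TCP state. Below is a minimal C sketch. It assumes a monotonic nanosecond clock and a check that runs periodically on WDA's side. `broadcast_link_t` and `broadcast_lost` are hypothetical names, and the PR does not say whether the real threshold test is `>` or `>=`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HEARTBEAT_STALE_NS (6ULL * 1000000000ULL)   /* 6s, as in the PR */

typedef struct {
    bool connected;              /* TCP link to the extension is up */
    uint64_t last_heartbeat_ns;  /* monotonic time of the last heartbeat */
} broadcast_link_t;

/* True if live sessions should fall back to the screenshot source. */
bool broadcast_lost(const broadcast_link_t *link, uint64_t now_ns) {
    if (!link->connected)
        return true;
    assert(now_ns >= link->last_heartbeat_ns);
    return now_ns - link->last_heartbeat_ns >= HEARTBEAT_STALE_NS;
}
```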

Build & signing

  • The appex is a target dependency of WebDriverAgentRunner; a scheme build post-action (Scripts/embed-broadcast-extension.sh) copies it into Runner.app/PlugIns, rewrites its CFBundleIdentifier to <Runner.app id>.broadcast (the host id gets .xctrunner appended; downstream tooling may override the prefix) and re-signs inner-first. Works for npm run bundle:ios, plain xcodebuild build-for-testing -scheme WebDriverAgentRunner, CI and Fastlane.
  • The appex cannot link WebDriverAgentLib.framework (XCTest private API is forbidden in extensions); it compiles FBVideoEncoder.m, FBBroadcastProtocol.m and vendored GCDAsyncSocket.m as shared sources with APPLICATION_EXTENSION_API_ONLY = YES.
  • No app groups / entitlements — IPC is loopback TCP only. Re-signing guidance for device farms: docs/broadcast-extension.md.
  • Simulator/tvOS: broadcast endpoints return unsupported operation; the screenshot pipeline is untouched.

Follow-ups (out of scope)

  • Opus audio capture (RPSampleBufferTypeAudioApp/Mic are currently ignored).
  • Optional upright rotation via VTPixelRotationSession (iOS 16+); v1 encodes native orientation and exposes it as metadata.
  • Bounded a11y snapshots for /mobilerun/a11y (apps with pathological AX trees, e.g. TikTok, can hang WDA's main thread until XCTest kills the session — pre-existing, unrelated to this branch).

🤖 Generated with Claude Code

Timo972 and others added 11 commits June 10, 2026 19:42
…source

Add a broadcast upload extension (WebDriverAgentBroadcast.appex, embedded
into the generated Runner.app) that receives the system's screen frames
via ReplayKit, encodes them per capture session (one hardware H264/H265
encoder each, VTPixelTransfer letterbox scaling, 420v end to end) and
ships the elementary stream to WDA over loopback TCP :9300. Capture
sessions switch to the broadcast source at the extension's first IDR and
revert to the XCTest screenshot loop (forced keyframe, no client
reconnect) when the broadcast stops, raising achievable stream rates
from ~10-20fps to 30-60fps on real devices.

- New endpoints: POST /mobilerun/screencapture/broadcast/start (drives
  RPSystemBroadcastPickerView + system sheet via UI automation),
  GET .../broadcast (status), POST .../broadcast/stop. Plain
  /screencapture/start is unchanged; sessions attach automatically and
  report their active source as "replaykit" or "screenshot".
- Shared wire protocol (FBBroadcastProtocol) compiled into both the lib
  and the appex; the appex cannot link WebDriverAgentLib (XCTest private
  API is forbidden in extensions) and reuses FBVideoEncoder +
  GCDAsyncSocket as shared sources instead.
- Embedding happens via a scheme build post-action
  (Scripts/embed-broadcast-extension.sh) because nothing built into the
  .xctest reaches the auto-generated Runner.app; the script also rewrites
  the appex bundle id to <host>.broadcast and re-signs inner-first.
- Simulator/tvOS: broadcast endpoints return unsupportedOperation; the
  screenshot pipeline is untouched. Audio capture is a follow-up.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
POST /mobilerun/screencapture/broadcast/start used to sit on a black
screen for ~5s before the system sheet appeared: it foregrounded the
runner via XCUIApplication.activate, whose self-quiescence wait can only
ever time out because the waiting thread is the very main thread whose
idleness is being awaited. Foreground via LSApplicationWorkspace
(FBUnattachedAppLauncher) instead and poll the in-process application
state, keeping the XCTest activation only as a fallback. Measured on an
iPhone XS (iOS 18.7): runner foreground after 485ms, picker triggered
after 531ms, broadcast connected after 8.4s total.

Also skip the active-app lookup when the runner is already frontmost,
re-fire the picker press every 2s until the confirmation sheet shows up
(the system drops presses that arrive before the scene is fully active),
and log per-stage timings for the whole dance.
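The re-fire loop can be written as a pure step function polled from the run loop, which keeps the timing testable without a device. Below is a C sketch under stated assumptions. `picker_step` and its types are hypothetical, the caller supplies a millisecond clock and the sheet's visibility, and the overall timeout stands in for the endpoint's `timeout` option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the caller should do on this poll. */
typedef enum { PICKER_WAIT, PICKER_PRESS, PICKER_DONE, PICKER_TIMEOUT } picker_action_t;

typedef struct {
    uint64_t start_ms;       /* when the dance started */
    uint64_t last_press_ms;  /* when the picker was last pressed */
    bool pressed;            /* at least one press has been issued */
} picker_retry_t;

/* One poll of the retry loop: press immediately, re-press every
   refire_ms until the sheet is visible, give up after timeout_ms. */
picker_action_t picker_step(picker_retry_t *r, uint64_t now_ms, bool sheet_visible,
                            uint64_t refire_ms, uint64_t timeout_ms) {
    assert(now_ms >= r->start_ms && refire_ms > 0);
    if (sheet_visible)
        return PICKER_DONE;
    if (now_ms - r->start_ms >= timeout_ms)
        return PICKER_TIMEOUT;
    if (!r->pressed || now_ms - r->last_press_ms >= refire_ms) {
        /* Presses that land before the scene is fully active are dropped
           by the system, so a missing sheet means "press again". */
        r->pressed = true;
        r->last_press_ms = now_ms;
        return PICKER_PRESS;
    }
    return PICKER_WAIT;
}
```

A caller would start with `picker_retry_t r = { now, 0, false }`, press the picker whenever `PICKER_PRESS` comes back, and spin the run loop briefly between polls. With `refire_ms = 2000` this reproduces the 2s re-fire described above.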

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nd repeatedly

Starting a second broadcast while one is live or still launching makes
iOS kill both. Three gaps allowed exactly that:

- The start dance spins the main run loop, so a second /broadcast/start
  request could be dispatched re-entrantly mid-dance and drive the
  picker again. A startInProgress gate now serializes starts; followers
  await the leader's outcome and return its result.
- A live broadcast whose extension is momentarily disconnected (crash,
  TCP reconnect window) passed the isExtensionConnected idempotency
  check. The dance now checks UIScreen.isCaptured first and waits for
  the extension instead of stacking a second broadcast.
- A missed XCUIElement tap on the confirmation button (sheet dismissed
  in between) was recorded by XCTest as a test failure, tearing down the
  whole WDA session. The tap now goes through WDA's own event synthesis
  at the button's coordinates, so a miss surfaces as a plain connect
  timeout. Side effect: no springboard-idle wait, which cuts the
  confirmation tap from ~4.8s to ~1.9s and time-to-connected from ~8.4s
  to ~5.5s on an iPhone XS.
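The startInProgress gate follows a single-flight pattern. The first caller (the leader) runs the dance, and callers that arrive while it runs (followers) wait and return the leader's result. The C sketch below uses a pthread mutex and condition variable and assumes each request runs on its own thread. That is an assumption: the PR notes that the real dance spins the main run loop and that followers arrive re-entrantly, and this sketch does not show how WDA's followers wait without blocking that loop. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Single-flight gate around the start dance (one thread per request). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t finished;
    bool in_progress;          /* plays the role of startInProgress */
    unsigned long generation;  /* incremented when a start finishes */
    int result;                /* outcome of the most recent start */
} start_gate_t;

#define START_GATE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0, 0 }

/* Returns true if the caller is the leader and must run the dance, then
   call start_gate_finish(). Followers get the generation to wait on. */
bool start_gate_enter(start_gate_t *g, unsigned long *gen) {
    pthread_mutex_lock(&g->lock);
    bool leader = !g->in_progress;
    g->in_progress = true;
    *gen = g->generation;
    pthread_mutex_unlock(&g->lock);
    return leader;
}

void start_gate_finish(start_gate_t *g, int result) {
    pthread_mutex_lock(&g->lock);
    assert(g->in_progress);
    g->result = result;
    g->generation++;
    g->in_progress = false;
    pthread_cond_broadcast(&g->finished);
    pthread_mutex_unlock(&g->lock);
}

/* Blocks a follower until the start it joined has finished. */
int start_gate_wait(start_gate_t *g, unsigned long gen) {
    pthread_mutex_lock(&g->lock);
    while (g->generation == gen)
        pthread_cond_wait(&g->finished, &g->lock);
    int result = g->result;
    pthread_mutex_unlock(&g->lock);
    return result;
}

/* Whole request: leaders run `dance`, followers share its result. */
int broadcast_start(start_gate_t *g, int (*dance)(void *), void *ctx) {
    unsigned long gen;
    if (!start_gate_enter(g, &gen))
        return start_gate_wait(g, gen);
    int result = dance(ctx);
    start_gate_finish(g, result);
    return result;
}
```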

Verified on device: two concurrent starts produce one broadcast and two
"connected" responses; a repeat start while connected returns in ~30ms.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e fps gate

Each extension heartbeat now carries per-session counters (samplesIn,
accepted, encoded, droppedFpsGate, droppedReplaced, droppedPool, encode
submit-to-callback latency last/avg) plus derived per-second rates, the
ReplayKit delivery rate and the loopback socket's outstanding bytes.
They surface verbatim under 'heartbeat' in GET
/mobilerun/screencapture/broadcast, so a low consumer-side fps can be
attributed to delivery, a specific pipeline stage or backpressure
without reproducing locally.
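The derived per-second rates are counter deltas divided by the time between two heartbeats. Below is a C sketch assuming cumulative 64-bit counters and a monotonic timestamp per heartbeat. The struct layout and names are hypothetical, and only the counter meanings come from the paragraph above. The average encode latency here is computed over the interval (latency-sum delta divided by encoded delta), which may differ from how the extension computes its own avg.

```c
#include <assert.h>
#include <stdint.h>

/* Cumulative per-session counters as a heartbeat might carry them. */
typedef struct {
    uint64_t t_ns;                  /* monotonic time of this snapshot */
    uint64_t samples_in, accepted, encoded, repeated;
    uint64_t dropped_fps_gate, dropped_replaced, dropped_pool;
    uint64_t encode_latency_sum_us; /* sum of submit-to-callback latencies */
} session_counters_t;

typedef struct {
    double samples_in, accepted, encoded, repeated;
    double dropped_fps_gate, dropped_replaced, dropped_pool;
    double avg_encode_latency_us;   /* over this interval; 0 if idle */
} session_rates_t;

static double per_second(uint64_t prev, uint64_t cur, double dt_s) {
    return (double)(cur - prev) / dt_s;
}

/* Rates between two heartbeats: counter delta divided by elapsed seconds. */
session_rates_t session_rates(const session_counters_t *p, const session_counters_t *c) {
    assert(c->t_ns > p->t_ns);
    double dt = (double)(c->t_ns - p->t_ns) / 1e9;
    session_rates_t r;
    r.samples_in = per_second(p->samples_in, c->samples_in, dt);
    r.accepted = per_second(p->accepted, c->accepted, dt);
    r.encoded = per_second(p->encoded, c->encoded, dt);
    r.repeated = per_second(p->repeated, c->repeated, dt);
    r.dropped_fps_gate = per_second(p->dropped_fps_gate, c->dropped_fps_gate, dt);
    r.dropped_replaced = per_second(p->dropped_replaced, c->dropped_replaced, dt);
    r.dropped_pool = per_second(p->dropped_pool, c->dropped_pool, dt);
    uint64_t n = c->encoded - p->encoded;
    r.avg_encode_latency_us =
        n ? (double)(c->encode_latency_sum_us - p->encode_latency_sum_us) / (double)n : 0.0;
    return r;
}
```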

The first measurement immediately located the loss: ReplayKit delivered
42-44 fps while only ~22 fps passed the fps gate - the gate paced by
minimum gap from the last accepted frame, which beats against jittery
~30-60Hz delivery (a frame arriving 1ms early is dropped and the next
accepted gap doubles). Replace it with a due-time accumulator that
admits exactly one frame per interval on average regardless of arrival
jitter, with a re-anchor clamp so stalls do not admit bursts. Measured
on an iPhone XS: accepted rate went from a flat ~22/s to 30/s whenever
delivery sustains it; encode latency ~10ms, all other drop counters
zero.
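The due-time accumulator keeps a schedule of admission times instead of measuring gaps between accepted frames. Below is a minimal C sketch assuming a monotonic nanosecond clock. `fps_gate_t` and its functions are hypothetical. The exact re-anchor condition (restart the schedule once the next due time is already in the past) is an assumption about what the clamp does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Due-time fps gate: admits on average one frame per interval. */
typedef struct {
    uint64_t interval_ns;   /* 1e9 / target fps */
    uint64_t next_due_ns;   /* admission time of the next frame */
    bool started;
} fps_gate_t;

void fps_gate_init(fps_gate_t *g, unsigned fps) {
    assert(fps > 0);
    g->interval_ns = 1000000000ULL / fps;
    g->next_due_ns = 0;
    g->started = false;
}

/* Returns true if the frame arriving at now_ns should be encoded. */
bool fps_gate_accept(fps_gate_t *g, uint64_t now_ns) {
    if (!g->started) {
        g->started = true;
        g->next_due_ns = now_ns + g->interval_ns;
        return true;
    }
    if (now_ns < g->next_due_ns)
        return false;                    /* early: a gate drop */
    /* Advance from the due time, not from the arrival time. Gap-based
       pacing measures from the last accepted arrival, so a late accept
       makes the next slightly early frame miss and the gap doubles. */
    g->next_due_ns += g->interval_ns;
    /* Re-anchor clamp: after a stall the schedule lags behind now;
       restart it so the frames after the stall are not all admitted
       back to back. */
    if (g->next_due_ns <= now_ns)
        g->next_due_ns = now_ns + g->interval_ns;
    return true;
}
```

With 100 Hz delivery against a 25fps target and every fourth frame arriving 1ms early, this admits 25 frames in the first second. An early frame only shifts admission to the next arrival. It does not reset the schedule.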

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three on-device crash reports share one signature: SIGSEGV (pointer
authentication failure) in objc_msgSend inside -[RoutingConnection
setHeadersForResponse:isError:] on an HTTPConnection thread. The
vendored CocoaHTTPServer/RoutingHTTPServer never retain the server from
the connection side (HTTPConfig.server and RoutingConnection's http
ivar are both __unsafe_unretained), so GET /wda/shutdown - which tears
the server down via stopServing while other keep-alive connections are
still replying on their own GCD queues - leaves those replies
dereferencing a freed RoutingHTTPServer. Continuous endpoint polling
(e.g. the droidrun devicekit) makes hitting the race likely.

Make HTTPConfig.server strong (connections keep the server alive until
their replies finish; the server->connections->config cycle breaks when
connections die) and RoutingConnection's server reference weak, and
snapshot the route response's mutable headers dictionary instead of
aliasing it.
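The ownership change amounts to reference counting. Each connection keeps the server alive until its replies are done, and shutdown only drops the owner's reference. Below is a C sketch with C11 atomics, where `server_t`, `connection_t` and the helpers are hypothetical stand-ins for RoutingHTTPServer and its connections. The weak back-reference that breaks the server->connections->config cycle and the headers snapshot are not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
    atomic_int refcount;   /* owner reference + one per live connection */
    atomic_int serving;
} server_t;

static int servers_freed = 0;   /* demonstration counter only */

server_t *server_create(void) {
    server_t *s = malloc(sizeof *s);
    if (!s) return NULL;
    atomic_init(&s->refcount, 1);   /* the owner's reference */
    atomic_init(&s->serving, 1);
    return s;
}

static server_t *server_retain(server_t *s) {
    atomic_fetch_add(&s->refcount, 1);
    return s;
}

static void server_release(server_t *s) {
    int before = atomic_fetch_sub(&s->refcount, 1);
    assert(before > 0);
    if (before == 1) {   /* last reference: no reply can touch it now */
        free(s);
        servers_freed++;
    }
}

/* A connection holds a strong reference until its last reply is sent,
   like a strong HTTPConfig.server. */
typedef struct { server_t *server; } connection_t;

connection_t connection_open(server_t *s) {
    connection_t c = { server_retain(s) };
    return c;
}

void connection_close(connection_t *c) {
    server_release(c->server);
    c->server = NULL;
}

/* Shutdown stops serving and drops only the owner's reference, so the
   memory stays valid for in-flight replies on other connections. */
void server_stop(server_t *s) {
    atomic_store(&s->serving, 0);
    server_release(s);
}
```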

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ReplayKit only delivers frames while the screen changes and VideoToolbox
has no Android-style KEY_REPEAT_PREVIOUS_FRAME_AFTER mode, so streams
stalled on static content. Each pipeline now retains its most recent
(pool-owned) scaled frame and a timer on the pipeline queue re-encodes
it whenever no live frame arrived within the frame interval, keeping
the output cadence at the session's requested fps regardless of screen
activity. In the direct-encode path the repeat copy goes through the
pixel transfer session, since retaining ReplayKit's own buffers would
stall its capture pool.

Repeated frames of unchanged content encode to near-empty delta frames,
and the constant flow also restores the periodic 2s IDR cadence for
late-joining clients. The heartbeat gains a 'repeated' counter/rate.
Measured on an iPhone XS with a near-static screen: accepted ~24/s +
repeated ~7/s = encoded 30.7-33.3/s, encode latency ~12ms.
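The repeater's timer decision fits in one function. Below is a C sketch under stated assumptions. The timer runs on the pipeline queue, so no locking is shown, and times come from a monotonic nanosecond clock. The interval is measured from the last output frame, live or repeated, so a timer firing more often than the frame interval still repeats at most once per interval. `repeater_t` and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t interval_ns;     /* 1e9 / session fps */
    uint64_t last_output_ns;  /* time of the last live or repeated encode */
    bool have_frame;          /* a retained (pool-owned) copy exists */
    uint64_t repeated;        /* plays the role of the 'repeated' counter */
} repeater_t;

/* A live frame was encoded and a copy of it retained. */
void repeater_on_live_frame(repeater_t *r, uint64_t now_ns) {
    r->last_output_ns = now_ns;
    r->have_frame = true;
}

/* Timer tick: returns true if the retained frame should be re-encoded
   because nothing was output for a full frame interval. */
bool repeater_on_tick(repeater_t *r, uint64_t now_ns) {
    if (!r->have_frame)
        return false;
    assert(now_ns >= r->last_output_ns);
    if (now_ns - r->last_output_ns < r->interval_ns)
        return false;
    r->last_output_ns = now_ns;
    r->repeated++;
    return true;
}
```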

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…pt rewrites the bundle id

When a device build overrides PRODUCT_BUNDLE_IDENTIFIER, the embed
script rewrites the appex CFBundleIdentifier to <host id>.broadcast but
re-signed with --preserve-metadata=entitlements, keeping an
application-identifier minted for the pre-rewrite id. installd rejects
an extension whose signed identity does not match its bundle id, so
exactly the downstream-override case the rewrite exists for produced an
uninstallable appex. Extract the entitlements, point
application-identifier at <team>.<new id> and re-sign with them; warn
that the embedded provisioning profile must cover the new id. The
no-rewrite path (and simulator ad-hoc signing without an
application-identifier) keeps the previous behavior.
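The id rewrite itself is simple string composition: `<host id>.broadcast` for the appex and `<team>.<appex id>` for its application-identifier. Below is a C illustration in which `broadcast_ids` is a hypothetical helper and any example ids are made up. Rewriting CFBundleIdentifier, extracting the entitlements and re-signing with codesign are what the actual script does, and this sketch does not model them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Derives "<host id>.broadcast" and "<team>.<appex id>" into caller
   buffers. Returns 0 on success, -1 if either buffer is too small. */
int broadcast_ids(const char *host_id, const char *team_id,
                  char *appex_id, size_t appex_len,
                  char *app_identifier, size_t app_identifier_len) {
    assert(host_id && team_id && appex_id && app_identifier);
    int n = snprintf(appex_id, appex_len, "%s.broadcast", host_id);
    if (n < 0 || (size_t)n >= appex_len)
        return -1;
    n = snprintf(app_identifier, app_identifier_len, "%s.%s", team_id, appex_id);
    if (n < 0 || (size_t)n >= app_identifier_len)
        return -1;
    return 0;
}
```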

Verified with a synthetic rewrite harness (host id changed, ad-hoc
identity): the re-signed appex carries the corrected
application-identifier and a valid signature; a real device build
(no rewrite) still produces a valid deep signature.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Timo972 merged commit d0878b5 into master Jun 11, 2026
Timo972 deleted the timo/wda-replaykit-broadcast branch June 11, 2026 07:22