fix/preserve-stt-eos-timestamp-for-metrics#5755

Open
dhruvladia-sarvam wants to merge 3 commits into livekit:main from dhruvladia-sarvam:fix/preserve-stt-eos-timestamp-for-metrics

dhruvladia-sarvam commented May 18, 2026

Problem

When no external VAD is configured, some streaming STT plugins still emit speech-boundary events such as SpeechEventType.START_OF_SPEECH and SpeechEventType.END_OF_SPEECH. Those STT-provided boundaries should be used for EOU latency metrics.

Previously, AudioRecognition._on_stt_event could overwrite or fail to preserve the STT-provided speech-end timestamp. This caused EOUMetrics.transcription_delay and sometimes end_of_utterance_delay to be reported as 0.0 or based on transcript-arrival time instead of the actual end-of-speech time.

This was especially visible when:

  • turn_handling.turn_detection = "stt" is used with no external VAD
  • no external VAD is configured and STT is the only provider of speech-end timing
  • providers emit FINAL_TRANSCRIPT before END_OF_SPEECH
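To make the 0.0 symptom concrete, here is a minimal arithmetic sketch. It assumes the transcription delay is the gap between the last speaking time and the final-transcript arrival time, clamped at zero. The helper name `transcription_delay` and the sample timestamps are illustrative only and are not taken from the LiveKit code.

```python
def transcription_delay(last_final_transcript_time: float, last_speaking_time: float) -> float:
    """Hypothetical helper: gap between end of speech and final-transcript arrival."""
    return max(last_final_transcript_time - last_speaking_time, 0.0)


speech_end = 100.0          # when the user actually stopped speaking (made-up value)
transcript_arrival = 100.4  # when FINAL_TRANSCRIPT arrived (made-up value)

# Speech-end timestamp preserved: the delay reflects real STT latency.
delay_with_eos = transcription_delay(transcript_arrival, speech_end)

# Speech-end timestamp replaced by transcript-arrival time: the delay collapses.
delay_with_fallback = transcription_delay(transcript_arrival, transcript_arrival)
```

With the fallback timestamp, both arguments are the same instant, so the reported delay carries no information about STT latency.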

Root Cause

AudioRecognition used _last_speaking_time for multiple purposes but did not track whether that timestamp came from an actual speech-end signal.

Before this change, when FINAL_TRANSCRIPT or PREFLIGHT_TRANSCRIPT arrived and self._vad is None, the handler could fall back to:

self._last_speaking_time = time.time()

That fallback is only safe when no speech-end timestamp is available. If the STT plugin later emits END_OF_SPEECH, that event is the authoritative speech-end signal and should replace any transcript-arrival fallback.
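The overwrite can be reproduced with a deliberately simplified model. This is a sketch of the failure mode described above, not the actual pre-change handler: `OldRecognitionModel`, its method names, and the injected `clock` are hypothetical and exist only to make the example deterministic.

```python
from typing import Callable, Optional


class OldRecognitionModel:
    """Hypothetical stand-in for the pre-change behavior (not the real class)."""

    def __init__(self, clock: Callable[[], float], vad: Optional[object] = None) -> None:
        self._clock = clock  # injected clock instead of time.time(), for determinism
        self._vad = vad
        self._last_speaking_time: Optional[float] = None

    def on_end_of_speech(self) -> None:
        # STT reports that speech ended now.
        self._last_speaking_time = self._clock()

    def on_final_transcript(self) -> None:
        if self._vad is None:
            # Unconditional fallback: clobbers any earlier speech-end timestamp.
            self._last_speaking_time = self._clock()


ticks = iter([10.0, 10.6])
model = OldRecognitionModel(clock=lambda: next(ticks))
model.on_end_of_speech()     # speech ended at t=10.0
model.on_final_transcript()  # transcript arrived at t=10.6
overwritten = model._last_speaking_time
```

After both events, the stored timestamp is the transcript-arrival time (10.6), so the 0.6 s between end of speech and transcript arrival is lost.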

A review also identified an important ordering issue: many STT providers emit FINAL_TRANSCRIPT before END_OF_SPEECH. In that ordering, the old fallback could set _last_speaking_time to transcript-arrival time first, then the later END_OF_SPEECH timestamp could be ignored, leaving metrics based on the wrong timestamp.

Fix

Add an explicit _stt_end_of_speech_received flag and use it to preserve STT-provided EOS timing.

The updated behavior is:

  • When no external VAD is configured and STT emits START_OF_SPEECH, clear _stt_end_of_speech_received.
  • When no external VAD is configured and STT emits END_OF_SPEECH:
    • mark _stt_end_of_speech_received = True
    • unconditionally set _last_speaking_time = time.time(), because END_OF_SPEECH is the authoritative speech-end timestamp
    • if a final transcript already scheduled EOU in VAD-based turn detection, reschedule EOU using the corrected speech-end timestamp
  • When FINAL_TRANSCRIPT or PREFLIGHT_TRANSCRIPT arrives:
    • only fall back to transcript-arrival time if no VAD/STT speech-end timestamp is available
    • otherwise preserve the STT EOS timestamp
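The rules above can be sketched as a small state machine. This is an illustration under stated assumptions, not the code in this PR: `RecognitionSketch`, the injected `clock`, and the `eou_scheduled_from` field (standing in for "EOU was scheduled using this timestamp") are hypothetical. Only `_last_speaking_time` and `_stt_end_of_speech_received` come from the description. The external-VAD path is left out.

```python
from typing import Callable, Optional


class RecognitionSketch:
    """Hypothetical model of the updated STT event handling."""

    def __init__(self, clock: Callable[[], float], vad: Optional[object] = None) -> None:
        self._clock = clock  # injected clock instead of time.time()
        self._vad = vad
        self._last_speaking_time: Optional[float] = None
        self._stt_end_of_speech_received = False
        # Hypothetical bookkeeping: timestamp the pending EOU was scheduled from.
        self.eou_scheduled_from: Optional[float] = None

    def on_start_of_speech(self) -> None:
        self.eou_scheduled_from = None  # new utterance, no EOU pending (assumption)
        if self._vad is None:
            self._stt_end_of_speech_received = False

    def on_end_of_speech(self) -> None:
        if self._vad is not None:
            return  # an external VAD owns speech-end timing
        self._stt_end_of_speech_received = True
        self._last_speaking_time = self._clock()  # authoritative timestamp
        if self.eou_scheduled_from is not None:
            # A final transcript already scheduled EOU: reschedule it.
            self.eou_scheduled_from = self._last_speaking_time

    def on_final_transcript(self) -> None:
        if self._vad is None and not self._stt_end_of_speech_received:
            # No speech-end timestamp available: fall back to arrival time.
            self._last_speaking_time = self._clock()
        self.eou_scheduled_from = self._last_speaking_time


def run(events: list, ticks: list) -> RecognitionSketch:
    times = iter(ticks)
    rec = RecognitionSketch(clock=lambda: next(times))
    rec.on_start_of_speech()
    for name in events:
        getattr(rec, name)()
    return rec


eos_first = run(["on_end_of_speech", "on_final_transcript"], [5.0, 5.4])
final_first = run(["on_final_transcript", "on_end_of_speech"], [5.4, 5.6])
```

In the EOS-first run the transcript handler never reads the clock, so the EOS timestamp survives. In the final-first run the fallback is replaced once `END_OF_SPEECH` arrives, and the pending EOU is moved to the corrected timestamp.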

Behavior Matrix

| External VAD | STT emits EOS | Event ordering | Before | After |
| --- | --- | --- | --- | --- |
| Yes | any | any | VAD timestamp used | Same |
| No | Yes | END_OF_SPEECH before FINAL_TRANSCRIPT | Transcript handler could overwrite EOS timing in some modes | Preserves STT EOS timestamp |
| No | Yes | FINAL_TRANSCRIPT before END_OF_SPEECH | Locks in transcript-arrival time | Replaces fallback with authoritative STT EOS timestamp and reschedules EOU |
| No | No | final transcript only | Falls back to transcript-arrival time | Same fallback behavior |
| No | Yes | turn_detection="stt" | Metrics could be 0.0 | Metrics use STT EOS timing |
| No | Yes | model/omitted/manual turn detection | Metrics could be 0.0 or fallback-based | Metrics preserve STT EOS timing without changing turn-commit authority |
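Read as a decision rule, the "After" column of the matrix reduces to a priority order over timestamp sources. The function below is a hypothetical summary of that order. Its name and return labels are invented for illustration and do not appear in the codebase.

```python
def speech_end_source(external_vad: bool, stt_emits_eos: bool) -> str:
    """Hypothetical summary: which signal provides the speech-end timestamp."""
    if external_vad:
        return "vad"                # external VAD timing is left untouched
    if stt_emits_eos:
        return "stt_end_of_speech"  # authoritative once END_OF_SPEECH arrives
    return "transcript_arrival"     # last-resort fallback


sources = {
    (vad, eos): speech_end_source(vad, eos)
    for vad in (True, False)
    for eos in (True, False)
}
```

Event ordering does not appear as an input because, after the change, both orderings are intended to converge on the same source.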

Tests Added

Added focused tests in tests/test_speech_start_time_persistence.py:

  • test_stt_eos_timestamp_is_preserved_for_final_transcript
    • verifies EOS-before-final preserves STT EOS timestamp
  • test_stt_eos_replaces_fallback_final_transcript_time
    • verifies final-before-EOS replaces transcript-arrival fallback with authoritative EOS timestamp and reschedules EOU
  • test_final_transcript_falls_back_without_stt_eos
    • verifies providers without EOS still safely fall back to transcript-arrival time
  • test_external_vad_timestamp_is_not_overwritten
    • verifies external VAD behavior is unchanged
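Tests of this kind need a deterministic clock. The sketch below shows one way to pin `time.time()` using `unittest.mock.patch`. It is not the test code from this PR: `FallbackOnly` is a hypothetical stand-in for the real recognizer, and the function name only mirrors the fallback test listed above.

```python
import time
from typing import Optional
from unittest.mock import patch


class FallbackOnly:
    """Hypothetical stand-in that implements only the transcript fallback."""

    def __init__(self) -> None:
        self._last_speaking_time: Optional[float] = None
        self._stt_end_of_speech_received = False

    def on_final_transcript(self) -> None:
        if not self._stt_end_of_speech_received:
            self._last_speaking_time = time.time()


def test_final_transcript_falls_back_without_stt_eos_sketch() -> None:
    rec = FallbackOnly()
    # Pin the clock so the fallback timestamp is predictable.
    with patch("time.time", return_value=42.0):
        rec.on_final_transcript()
    assert rec._last_speaking_time == 42.0


def test_eos_flag_suppresses_fallback_sketch() -> None:
    rec = FallbackOnly()
    rec._last_speaking_time = 41.5  # pretend END_OF_SPEECH already set this
    rec._stt_end_of_speech_received = True
    with patch("time.time", return_value=42.0):
        rec.on_final_transcript()
    assert rec._last_speaking_time == 41.5
```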


devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Co-authored-by: Cursor <cursoragent@cursor.com>
devin-ai-integration[bot]

This comment was marked as resolved.

Co-authored-by: Cursor <cursoragent@cursor.com>