fix/preserve-stt-eos-timestamp-for-metrics by dhruvladia-sarvam · Pull Request #5755 · livekit/agents

dhruvladia-sarvam · 2026-05-18T02:07:44Z

Problem

When no external VAD is configured, some streaming STT plugins still emit speech-boundary events such as SpeechEventType.START_OF_SPEECH and SpeechEventType.END_OF_SPEECH. Those STT-provided boundaries should be used for EOU latency metrics.

Previously, AudioRecognition._on_stt_event could overwrite or fail to preserve the STT-provided speech-end timestamp. This caused EOUMetrics.transcription_delay and sometimes end_of_utterance_delay to be reported as 0.0 or based on transcript-arrival time instead of the actual end-of-speech time.

This was especially visible when:

- turn_handling.turn_detection = "stt" is used with no external VAD
- no external VAD is configured and STT is the only provider of speech-end timing
- providers emit FINAL_TRANSCRIPT before END_OF_SPEECH
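To see why a misplaced timestamp collapses the metric to 0.0, here is a minimal arithmetic sketch. It assumes the delay is the clamped difference between the final-transcript arrival time and the recorded speech-end time. The helper name `transcription_delay` and the sample timestamps are illustrative and are not taken from the PR diff.

```python
def transcription_delay(final_transcript_time: float, last_speaking_time: float) -> float:
    """Assumed formula: time from end of speech to final transcript, clamped at 0."""
    return max(final_transcript_time - last_speaking_time, 0.0)


# Hypothetical timeline (seconds): the user stops speaking at 10.0,
# and the final transcript arrives at 10.4.
speech_end = 10.0
transcript_arrival = 10.4

# Speech-end timestamp preserved: the delay reflects real STT latency.
preserved = transcription_delay(transcript_arrival, speech_end)

# Speech-end timestamp overwritten with transcript-arrival time: the delay is 0.0.
overwritten = transcription_delay(transcript_arrival, transcript_arrival)

print(f"preserved={preserved:.3f}s overwritten={overwritten:.3f}s")
```

The same reasoning applies to end_of_utterance_delay, which would be measured from the same speech-end timestamp.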

Root Cause

AudioRecognition used _last_speaking_time for multiple purposes but did not track whether that timestamp came from an actual speech-end signal.

Before this change, when FINAL_TRANSCRIPT or PREFLIGHT_TRANSCRIPT arrived and self._vad is None, the handler could fall back to:

self._last_speaking_time = time.time()

That fallback is only safe when no speech-end timestamp is available. If the STT plugin later emits END_OF_SPEECH, that event is the authoritative speech-end signal and should replace any transcript-arrival fallback.

A review also identified an important ordering issue: many STT providers emit FINAL_TRANSCRIPT before END_OF_SPEECH. In that ordering, the old fallback could set _last_speaking_time to transcript-arrival time first, then the later END_OF_SPEECH timestamp could be ignored, leaving metrics based on the wrong timestamp.
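The failure mode described above can be replayed with a toy model. This is not the pre-change handler. It assumes only what the description states: the transcript fallback always stamped arrival time, and a later END_OF_SPEECH did not replace an existing value. The function name and the string event names are hypothetical.

```python
from typing import Optional


def old_speech_end_time(events: list[tuple[str, float]]) -> Optional[float]:
    """Toy replay of the described pre-fix behavior with no external VAD."""
    last_speaking_time: Optional[float] = None
    for name, arrived_at in events:
        if name in ("FINAL_TRANSCRIPT", "PREFLIGHT_TRANSCRIPT"):
            # Old fallback: always stamp transcript-arrival time.
            last_speaking_time = arrived_at
        elif name == "END_OF_SPEECH" and last_speaking_time is None:
            # Assumption: END_OF_SPEECH only took effect if nothing was recorded yet.
            last_speaking_time = arrived_at
    return last_speaking_time


# FINAL_TRANSCRIPT first: the later END_OF_SPEECH is ignored.
final_first = old_speech_end_time([("FINAL_TRANSCRIPT", 10.4), ("END_OF_SPEECH", 10.6)])

# END_OF_SPEECH first: the transcript handler overwrites the EOS timestamp.
eos_first = old_speech_end_time([("END_OF_SPEECH", 10.0), ("FINAL_TRANSCRIPT", 10.4)])

print(final_first, eos_first)
```

In both orderings the toy model ends up with the transcript-arrival time, which is the symptom reported in the metrics.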

Fix

Add an explicit _stt_end_of_speech_received flag and use it to preserve STT-provided EOS timing.

The updated behavior is:

- When no external VAD is configured and STT emits START_OF_SPEECH, clear _stt_end_of_speech_received.
- When no external VAD is configured and STT emits END_OF_SPEECH:
  - mark _stt_end_of_speech_received = True
  - unconditionally set _last_speaking_time = time.time(), because END_OF_SPEECH is the authoritative speech-end timestamp
  - if a final transcript already scheduled EOU in VAD-based turn detection, reschedule EOU using the corrected speech-end timestamp
- When FINAL_TRANSCRIPT or PREFLIGHT_TRANSCRIPT arrives:
  - only fall back to transcript-arrival time if no VAD/STT speech-end timestamp is available
  - otherwise preserve the STT EOS timestamp
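The rules above can be sketched as a small state machine. This is a simplified illustration rather than the code in AudioRecognition._on_stt_event: the `Ev` enum, the `SpeechEndTracker` class, and its field names are hypothetical stand-ins. Timestamps are passed in explicitly instead of calling time.time() so the behavior is deterministic, and EOU scheduling is reduced to a counter.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Ev(Enum):
    """Hypothetical stand-in for the STT speech event types."""
    START_OF_SPEECH = auto()
    END_OF_SPEECH = auto()
    PREFLIGHT_TRANSCRIPT = auto()
    FINAL_TRANSCRIPT = auto()


@dataclass
class SpeechEndTracker:
    has_vad: bool = False
    last_speaking_time: Optional[float] = None
    stt_eos_received: bool = False
    eou_scheduled: bool = False
    eou_reschedules: int = 0

    def on_stt_event(self, ev: Ev, now: float) -> None:
        if self.has_vad:
            return  # an external VAD owns the speech-end timestamp
        if ev is Ev.START_OF_SPEECH:
            # New utterance: forget the previous EOS (toy model also resets EOU state).
            self.stt_eos_received = False
            self.eou_scheduled = False
        elif ev is Ev.END_OF_SPEECH:
            # Authoritative speech-end signal: always record it.
            self.stt_eos_received = True
            self.last_speaking_time = now
            if self.eou_scheduled:
                self.eou_reschedules += 1  # re-run EOU with the corrected timestamp
        elif ev in (Ev.FINAL_TRANSCRIPT, Ev.PREFLIGHT_TRANSCRIPT):
            if not self.stt_eos_received:
                self.last_speaking_time = now  # fallback: transcript-arrival time
            if ev is Ev.FINAL_TRANSCRIPT:
                self.eou_scheduled = True


def replay(events, **kwargs) -> SpeechEndTracker:
    tracker = SpeechEndTracker(**kwargs)
    for ev, now in events:
        tracker.on_stt_event(ev, now)
    return tracker


eos_first = replay([(Ev.END_OF_SPEECH, 10.0), (Ev.FINAL_TRANSCRIPT, 10.4)])
final_first = replay([(Ev.FINAL_TRANSCRIPT, 10.4), (Ev.END_OF_SPEECH, 10.6)])
no_eos = replay([(Ev.FINAL_TRANSCRIPT, 10.4)])
with_vad = replay([(Ev.FINAL_TRANSCRIPT, 10.4)], has_vad=True, last_speaking_time=9.9)
```

In this model, `eos_first` keeps 10.0, `final_first` ends at 10.6 with one reschedule, `no_eos` falls back to 10.4, and `with_vad` leaves the VAD-provided 9.9 untouched.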

Behavior Matrix

| External VAD | STT emits EOS | Event ordering / turn-detection mode | Before | After |
| --- | --- | --- | --- | --- |
| Yes | any | any | VAD timestamp used | Same |
| No | Yes | `END_OF_SPEECH` before `FINAL_TRANSCRIPT` | Transcript handler could overwrite EOS timing in some modes | Preserves STT EOS timestamp |
| No | Yes | `FINAL_TRANSCRIPT` before `END_OF_SPEECH` | Locks in transcript-arrival time | Replaces fallback with authoritative STT EOS timestamp and reschedules EOU |
| No | No | final transcript only | Falls back to transcript-arrival time | Same fallback behavior |
| No | Yes | `turn_detection="stt"` | Metrics could be `0.0` | Metrics use STT EOS timing |
| No | Yes | model/omitted/manual turn detection | Metrics could be `0.0` or fallback-based | Metrics preserve STT EOS timing without changing turn-commit authority |

Tests Added

Added focused tests in tests/test_speech_start_time_persistence.py:

- test_stt_eos_timestamp_is_preserved_for_final_transcript
  - verifies EOS-before-final preserves STT EOS timestamp
- test_stt_eos_replaces_fallback_final_transcript_time
  - verifies final-before-EOS replaces transcript-arrival fallback with authoritative EOS timestamp and reschedules EOU
- test_final_transcript_falls_back_without_stt_eos
  - verifies providers without EOS still safely fall back to transcript-arrival time
- test_external_vad_timestamp_is_not_overwritten
  - verifies external VAD behavior is unchanged

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

Co-authored-by: Cursor <cursoragent@cursor.com>

initial

77c064e

devin-ai-integration Bot reviewed May 18, 2026

fix(voice): preserve STT EOS timing for metrics

d57b872

Co-authored-by: Cursor <cursoragent@cursor.com>

fix(voice): handle final transcript before STT EOS

fd0b3ad

Co-authored-by: Cursor <cursoragent@cursor.com>

dhruvladia-sarvam wants to merge 3 commits into livekit:main from dhruvladia-sarvam:fix/preserve-stt-eos-timestamp-for-metrics