
fix(rate_limiter): re-check the backoff deadline after sleeping (close TOCTOU) #97

Open: gadievron wants to merge 1 commit into master from fix/rate-limiter-re-check-the-backoff-deadline-after


@gadievron (Collaborator)

wait_if_needed() read _backoff_until under the lock, released it, then slept for the entry-time
duration. If another worker extended _backoff_until (a fresh 429 via report_rate_limit) during that
sleep, this worker woke before the new deadline and issued a request into the still-active backoff
window -- re-triggering the 429 storm (thundering herd).

Wrap the read+sleep in a loop that re-checks _backoff_until after each sleep and keeps waiting until
the deadline has actually passed. Preserves the existing contract: returns 0.0 immediately when not
in backoff, and increments _total_waits once per waiting call. The loop waits as long as the backoff
is genuinely active (a persistently-overloaded API) -- the intended coordinated-backoff behavior, and
strictly safer than the old early-wake.
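
A minimal sketch of the re-check loop described above, not the code from this PR. The names `wait_if_needed`, `report_rate_limit`, `_backoff_until`, and `_total_waits` come from the description; the class name `RateLimiter`, the injectable `clock` and `sleep` parameters, the `retry_after` argument, and the choice to return the total seconds waited are assumptions made for illustration.

```python
import threading
import time


class RateLimiter:
    """Hypothetical stand-in for the limiter discussed in this PR."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable (an assumption) so the loop can be
        # exercised without real waiting.
        self._lock = threading.Lock()
        self._backoff_until = 0.0
        self._total_waits = 0
        self._clock = clock
        self._sleep = sleep

    def report_rate_limit(self, retry_after: float) -> None:
        """Record a 429: push the shared deadline out, never pull it in."""
        with self._lock:
            candidate = self._clock() + retry_after
            self._backoff_until = max(self._backoff_until, candidate)

    def wait_if_needed(self) -> float:
        """Block until the backoff deadline has passed; return seconds waited."""
        waited = 0.0
        counted = False
        while True:
            with self._lock:
                # Re-read the deadline on every pass, so an extension made
                # while this worker slept is seen before it returns.
                remaining = self._backoff_until - self._clock()
                if remaining <= 0:
                    return waited  # 0.0 when never in backoff
                if not counted:
                    # Count once per waiting call, not once per sleep.
                    self._total_waits += 1
                    counted = True
            # Sleep outside the lock so report_rate_limit() is never blocked.
            self._sleep(remaining)
            waited += remaining
```

Sleeping outside the lock is what allows another worker to extend the deadline mid-sleep; the loop is what makes that extension visible to the sleeper before it issues its next request.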

Tests: tests/test_rate_limiter_deadline_recheck.py (2 cases: a deadline extended mid-sleep is honored
via re-check; not-in-backoff returns 0.0 without sleeping). RED 1 failed -> GREEN 2 passed; full suite
178 passed / 63 skipped (py3.11).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
