fix(bot): self-exit on sustained getUpdates network outage by Time4Mind · Pull Request #135 · Time4Mind/ccbot

Time4Mind · 2026-06-17T05:16:08Z

Problem

The bot's _error_handler exits the process on a sustained Conflict storm (a second poller owns the token — unrecoverable by retry) so the supervisor converges on one clean instance. But the network-error branch had no such escape hatch: on NetworkError/TimedOut it logged a single debounced one-liner and returned, trusting PTB's polling to self-recover.

In production that trust broke. A night-time proxy/VPN outage left the long-poll getUpdates wedged on a half-open socket. The bot stayed running but deaf — it never recovered even after the upstream came back, and sat silent for ~6h until a manual restart. The container's restart: unless-stopped couldn't help because the process never exited.

Fix

Add the symmetric self-heal for sustained network outages: if NetworkError/TimedOut errors persist contiguously for longer than NETWORK_MAX_SECONDS (180s), log CRITICAL and exit non-zero via the existing _terminate_for_sustained_conflict() seam. The supervisor then respawns a clean instance that opens a fresh getUpdates connection.

Contiguity is judged by NETWORK_GAP_SECONDS (45s): a quiet gap longer than that proves a poll succeeded in between (recovery), so the outage clock resets. This means sporadic blips on a healthy idle bot never accumulate toward the threshold — only a genuinely stuck poll trips it.
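The gap-and-budget rule above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the code in app.py: `OutageClock`, `record_error`, and `FakeClock` are hypothetical names, and only the two constants and their values come from this PR. The clock is injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional

NETWORK_MAX_SECONDS = 180.0  # outage budget, value taken from this PR
NETWORK_GAP_SECONDS = 45.0   # contiguity window, value taken from this PR


class OutageClock:
    """Hypothetical helper: tracks how long network errors have been contiguous."""

    def __init__(self, now: Callable[[], float] = time.monotonic) -> None:
        self._now = now
        self._first: Optional[float] = None  # start of the current outage
        self._last: Optional[float] = None   # time of the most recent error

    def record_error(self) -> bool:
        """Record one NetworkError/TimedOut; return True once the budget is exceeded."""
        t = self._now()
        if self._last is None or t - self._last > NETWORK_GAP_SECONDS:
            # A quiet gap implies a poll succeeded in between: restart the clock.
            self._first = t
        self._last = t
        return t - self._first > NETWORK_MAX_SECONDS


class FakeClock:
    """Manually advanced clock, used only for the demonstration below."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t

    def advance(self, seconds: float) -> None:
        self.t += seconds


def simulate(interval: float, count: int) -> list:
    """Feed `count` errors spaced `interval` seconds apart; collect the verdicts."""
    clock = FakeClock()
    outage = OutageClock(clock)
    verdicts = [outage.record_error()]
    for _ in range(count - 1):
        clock.advance(interval)
        verdicts.append(outage.record_error())
    return verdicts
```

In this sketch, `simulate(30, 8)` models errors arriving every 30s (inside the 45s window), so the verdict turns True only once more than 180s have elapsed since the first error. `simulate(60, 10)` models blips 60s apart: each one resets the clock and the verdict stays False. In the real handler a True verdict would correspond to logging CRITICAL and calling the terminate seam.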

Composes with both supervisors:

Docker (restart: unless-stopped) → respawns the container.
ccbot-supervisor.sh → treats os._exit(1) as EXIT_CRASH, and its wait-for-net loop gates the restart, so a still-down network can't cause a restart storm.
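To illustrate why a network-gated restart avoids a restart storm, here is a minimal sketch of the idea in Python. It is not a transcription of ccbot-supervisor.sh, whose internals are not shown in this PR: `supervise`, `run_bot`, `network_up`, `probe_interval`, and `max_starts` are hypothetical, and the choices to probe the network before the first start and to stop on a zero exit code are assumptions made for the example.

```python
from typing import Callable

EXIT_CRASH = 1  # name from the PR text; the value matches os._exit(1)


def supervise(
    run_bot: Callable[[], int],
    network_up: Callable[[], bool],
    sleep: Callable[[float], None],
    max_starts: int,
    probe_interval: float = 5.0,
) -> int:
    """Start the bot, and respawn after a non-zero exit only once the network is up.

    Returns how many times the bot was started.
    """
    starts = 0
    while starts < max_starts:
        # Gate: while the network is down, wait instead of spawning.
        while not network_up():
            sleep(probe_interval)
        starts += 1
        exit_code = run_bot()
        if exit_code == 0:
            break  # assumption: a clean shutdown is not respawned
    return starts
```

With this shape, a bot that exits with EXIT_CRASH during an outage costs only cheap reachability probes until the network returns, rather than a loop of immediate respawns that each fail the same way.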

Tests

3 new cases in test_conflict_exit.py:

a single network blip is tolerated (no exit),
a contiguous outage past the budget exits exactly once,
a quiet gap resets the outage clock so spread-out blips never trip it.

Full suite green (732 passed locally and in an ephemeral Docker build of this branch). app.py has been untouched since #111, so this applies cleanly on current main.

🤖 Generated with Claude Code

A sustained network outage (long-poll getUpdates failing continuously) is the silent twin of the Conflict storm: the bot stays alive-but-deaf and may NOT recover even after the upstream returns (a wedged half-open poll socket). Observed in production: a night-time proxy/VPN outage left the bot running-but-deaf for ~6h, since the network-error branch only logged a debounced one-liner and never exited.

Mirror the existing sustained-Conflict exit: once NetworkError/TimedOut persist CONTIGUOUSLY longer than NETWORK_MAX_SECONDS (180s), log CRITICAL and exit non-zero so the supervisor (Docker `restart: unless-stopped` / ccbot-supervisor.sh's wait-for-net loop) respawns a clean instance with a fresh getUpdates connection.

Contiguity is judged by NETWORK_GAP_SECONDS (45s): a longer quiet gap proves a poll recovered, so the outage clock resets, and sporadic blips on a healthy idle bot never accumulate toward the threshold.

Reuses the _terminate_for_sustained_conflict() seam (stop_running + os._exit). Covered by 3 new tests in test_conflict_exit.py (single blip tolerated, sustained outage exits, gap resets the clock).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Time4Mind force-pushed the fix/network-self-exit branch from 422483a to a94cb3a on June 17, 2026 05:19

Time4Mind merged commit 57f92c0 into main Jun 17, 2026
4 checks passed

Time4Mind deleted the fix/network-self-exit branch June 17, 2026 06:50
