Handle network / connectivity errors around gh calls with clearer messaging #32

@oscarvalenzuelab

Summary

When the machine is offline, gh api user --jq .login exits non-zero and the poller logs:

Failed to get GitHub username: Command '['/opt/homebrew/bin/gh', 'api', 'user', '--jq', '.login']' returned non-zero exit status 1.

This is misleading — nothing is wrong with the user's GitHub auth, the network is just down. The message should distinguish between "you're offline", "gh is not authenticated", and "the API returned an error", so operators don't waste time debugging the wrong thing.

Same category of issue likely exists anywhere else dev-sync shells out to gh / git / claude during a poll cycle — under flaky connectivity the poller spams the log with opaque subprocess failures.

Evidence

  • src/dev_sync/cli.py:745-758 — the originating call site. Catches CalledProcessError and prints the raw exception string.
  • src/dev_sync/core/github.py and src/dev_sync/pipelines/secops.py also invoke gh; worth auditing for the same pattern.

Motivation

  • An offline laptop should degrade gracefully, not emit scary-looking errors that imply broken auth.
  • Enterprise operators running dev-sync as a launchd / systemd service (see docs/operations.md) will grep these logs when something's wrong — false leads waste time.
  • This pairs with #27 (Add structured logging to core modules): the correct behavior is an INFO/WARNING on transient network issues and an ERROR only on persistent failures.

Acceptance criteria

  • Introduce a small helper (e.g. core/network.py or extend core/github.py) that classifies gh / git subprocess failures into:
    • NetworkUnavailable: no route to host, DNS failure, TLS handshake failure
    • GitHubAuthError: gh exits with an auth-related message
    • RateLimited: gh returns 429 / "rate limit exceeded"
    • GitHubAPIError: any other non-zero exit from gh api
  • Parse stderr from gh (not just the exit code) to drive classification. gh writes clear messages like error connecting to api.github.com on network failures.
  • Call sites in src/dev_sync/cli.py:757, src/dev_sync/core/github.py, and src/dev_sync/pipelines/secops.py use this classifier and emit messages like:
    • "Network unavailable — skipping this poll cycle, will retry in <interval>s" (not an error)
    • "GitHub auth failed — run gh auth login and restart the poller" (clear action)
  • Poller does not exit on NetworkUnavailable; it logs at WARNING and continues to the next tick. Today it calls raise typer.Exit(1) (lines 758 and 762), which is particularly bad for a daemon.
  • Unit tests cover each classification branch, mocking subprocess.run with representative stderr strings.
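
A minimal sketch of what the classifier described above could look like, to make the criteria concrete. Only the four exception names and the "error connecting to api.github.com" message come from this issue. The base class GhError, the functions classify_gh_failure and run_gh, and every other stderr substring are assumptions that would need checking against real gh output before use.

```python
import subprocess


class GhError(Exception):
    """Base class for classified gh failures (hypothetical name)."""


class NetworkUnavailable(GhError):
    """No route to host, DNS failure, TLS handshake failure."""


class GitHubAuthError(GhError):
    """gh exited with an auth-related message."""


class RateLimited(GhError):
    """gh reported HTTP 429 or 'rate limit exceeded'."""


class GitHubAPIError(GhError):
    """Any other non-zero exit from gh."""


# Lower-cased substrings to look for in stderr. Apart from
# "error connecting to", these are illustrative assumptions.
_NETWORK_HINTS = (
    "error connecting to",
    "no route to host",
    "could not resolve host",
    "no such host",
    "tls handshake",
    "network is unreachable",
)
_RATE_HINTS = ("rate limit exceeded", "http 429")
_AUTH_HINTS = ("gh auth login", "not logged in", "http 401", "bad credentials")


def classify_gh_failure(stderr: str) -> GhError:
    """Map gh stderr text to one of the exception classes above."""
    text = stderr.lower()
    message = stderr.strip()
    if any(hint in text for hint in _NETWORK_HINTS):
        return NetworkUnavailable(message)
    # Rate limits are checked before auth so a rate-limit message that
    # also mentions credentials is not reported as an auth failure.
    if any(hint in text for hint in _RATE_HINTS):
        return RateLimited(message)
    if any(hint in text for hint in _AUTH_HINTS):
        return GitHubAuthError(message)
    return GitHubAPIError(message)


def run_gh(args: list[str]) -> str:
    """Run gh with the given arguments and raise a classified error on failure."""
    proc = subprocess.run(["gh", *args], capture_output=True, text=True)
    if proc.returncode != 0:
        raise classify_gh_failure(proc.stderr)
    return proc.stdout.strip()
```

A call site could then wrap run_gh(["api", "user", "--jq", ".login"]) in try/except, log NetworkUnavailable at WARNING and return to the poll loop, and keep the hard exit for GitHubAuthError. Because classify_gh_failure takes a plain string, each branch can be unit tested without mocking subprocess.run. A mock is only needed to cover run_gh itself.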

Out of scope

  • Retry / backoff logic (can be a follow-up once error classification exists).
  • Offline-mode queuing of issues to process when the network returns.
  • Changes to claude subprocess error handling (different error domain).
