Skip to content

tun/mac: make server bypass route idempotent to avoid orphaned route#436

Open
luancamara wants to merge 1 commit into
OpenVPN:masterfrom
luancamara:fix/mac-bypass-route-idempotent
Open

tun/mac: make server bypass route idempotent to avoid orphaned route#436
luancamara wants to merge 1 commit into
OpenVPN:masterfrom
luancamara:fix/mac-bypass-route-idempotent

Conversation

@luancamara

Copy link
Copy Markdown

tun/mac: make server bypass route idempotent to avoid orphaned route

Summary

On macOS, the server bypass host route (the remote /32 route via the physical
default gateway that is installed when redirect-gateway is active) can be left
orphaned in the routing table after an unclean teardown. On the next
connection this orphan makes every datagram to the server fail with
EADDRNOTAVAIL, so the client can only time out — even though the server and all
other clients are fine.

Observed in OpenVPN Connect 3.8.2 (macOS), whose core emits the exact log
signature below; reproduced against OpenVPN/openvpn3 master.

Symptom

Connecting to [vpn.example]:1197 (X.X.X.X) via UDPv4
UDP send exception: send: Can't assign requested address
UDP send exception: send: Can't assign requested address
... (repeats every second) ...
Server poll timeout, trying next remote entry...
EVENT: RECONNECTING

While disconnected, a stale host route to the server remains in the table:

$ netstat -rn -f inet | grep X.X.X.X
X.X.X.X/32    192.168.1.1    UGSc    en0      # left over, app is DISCONNECTED

Root cause

The bypass route is installed in openvpn/tun/mac/client/tunsetup.hpp with a plain
/sbin/route add. The create/destroy actions are built as a pair (shared mark),
and teardown runs the destroy (delete) action.

When a previous session ends uncleanly (laptop sleep, Wi‑Fi/network change, or a
crash), the teardown ActionList never runs, so the bypass route survives. On the
next connect:

  1. Command::execute() (openvpn/common/process.hpp) sees route add print
    File exists and throws "Route already exists, new route will be ignored".
  2. ActionList::execute() (openvpn/common/action.hpp) catches it and records the
    route's mark in failed_actions.
  3. tunsetup.hpp then calls remove_cmds->remove_marked(failed_actions, ...)
    "since we should not undo failed actions, remove them" — which drops the
    paired delete from the teardown list.

Net effect: the route is never cleaned up. Worse, the surviving entry still points at
the previous network's gateway, so once that gateway is no longer on-link, source
address selection for the server fails and send() returns EADDRNOTAVAIL
permanently. The client loops RECONNECTING/RESOLVE until timeout.

This is local, per-machine state: other users (and the server) are unaffected, which
makes it easy to misdiagnose as a server/network problem.

Fix

Make the bypass-route installation idempotent: issue a best-effort
route delete immediately before the route add.

  • If no stale route exists, the delete is a harmless no-op (not in table,
    non-fatal — Command::execute() only throws on File exists).
  • If a stale/orphaned route exists, it is removed first, so the add always
    installs a fresh route bound to the current gateway and the paired delete
    stays in the teardown list.

This mirrors the NLM_F_REPLACE semantics the Linux sitnl backend already uses
when it hits EEXIST (openvpn/tun/linux/client/sitnl.hpp), bringing macOS in line.

Action::Ptr c, d;
add_del_route(pull.remote_address.address, 32, gw4.gateway_addr_str(), gw4.iface(), 0, c, d);
create.add(d); // pre-delete any stale/orphaned copy, then add
create.add(c);
destroy.add(d);

The same Action::Ptr d (a stateless, ref-counted Command) is reused for the
pre-delete and the teardown delete; running it at setup and teardown is safe. If
maintainers prefer a distinct object, the pre-delete can be built as a copy instead —
happy to adjust.

Scope / alternatives

This patch is intentionally scoped to the server bypass route, which is the entry that
strands a reachable physical gateway and triggers the failure. The same idempotent
pattern could be applied to the shared add_del_route(..., ActionList&, ActionList&)
choke point so all macOS routes self-heal on EEXIST; that is a larger behavioral
change (and adds a benign route delete per route at connect), so I left it out of
this minimal fix. I can extend it if preferred.

Testing

  • Manual repro: connect, force an unclean teardown (sleep / kill the helper) to leave
    the orphan, switch networks, reconnect → without the patch the client loops on
    EADDRNOTAVAIL; with the patch the stale route is replaced and the connection comes
    up normally.
  • A clean connect/disconnect cycle is unchanged apart from one extra best-effort
    route delete (logged as not in table) before the bypass route add.

Based on OpenVPN/openvpn3 master @ 2c889af.

On macOS the server bypass host route (remote /32 via the physical default
gateway, installed when redirect-gateway is active) is added with a plain
"route add". If a previous session terminated uncleanly (sleep, network
change or process crash) the teardown ActionList never runs and a stale copy
of this route survives in the routing table.

On the next connection the "route add" fails because the route already
exists. Command::execute() detects "File exists" and throws; ActionList::
execute() records the route's mark as failed and remove_marked() then drops
the *paired delete* from the teardown list (tunsetup.hpp: "since we should
not undo failed actions, remove them"). The route is therefore never cleaned
up, and because the stale entry still points at the previous network's
gateway, once that gateway is no longer on-link every datagram to the server
fails with EADDRNOTAVAIL:

    UDP send exception: send: Can't assign requested address

The client then loops RECONNECTING/RESOLVE until it times out, while the
server and all other users are unaffected.

Make the bypass route installation idempotent by issuing a best-effort
"route delete" immediately before the "route add". Deleting a non-existent
route is harmless ("not in table", non-fatal), while a stale orphan is
removed so the add always installs a fresh route bound to the current
gateway and stays managed for teardown. This mirrors the REPLACE semantics
already used by the Linux sitnl backend on EEXIST.

Signed-off-by: Luan Camara <luancamara@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant