This repository was archived by the owner on Jul 13, 2025. It is now read-only.
Fork Sync: Update from parent repository#36
Open
github-actions[bot] wants to merge 1599 commits into
Consolidate go.mod.sri and go.toolchain.rev.sri into a single flakehashes.json file at the repo root, owned by a new Go program at tool/updateflakes. The JSON is consumed by flake.nix via builtins.fromJSON and by any future Go code via the FlakeHashes struct that defines its schema.
Each block records its input fingerprint alongside the SRI it produced: the goModSum (a sha256 over go.mod and go.sum) for the vendor block, and the literal rev string from go.toolchain.rev for the toolchain block. updateflakes regenerates a block only when its recorded fingerprint disagrees with the current input. Doing the gating by content rather than file mtimes avoids the usual mtime hazards across git checkouts, clones, and merges. It also means re-runs with no input changes are essentially free, and a re-run that touches only one input pays only for that one block.
The two blocks have no shared state -- vendor invokes go mod vendor into one tempdir, toolchain fetches and extracts a tarball into another -- so they run concurrently via errgroup. Cold time is bounded by the slower of the two rather than their sum. Also takes the opportunity to fold the toolchain fetch into a single curl|tar pipeline (no intermediate .tar.gz on disk).
Split cmd/nardump into a thin package main and a new package nardump library at cmd/nardump/nardump that holds the NAR encoder and SRI helper. tool/updateflakes imports the library directly rather than building and exec'ing the nardump binary at runtime. The library uses fs.ReadLink (Go 1.25+) instead of os.Readlink, so it no longer requires the caller to chdir into the FS root for symlink targets to resolve. WriteNAR now wraps its writer in a bufio.Writer internally (unless the caller already passed one) and flushes on return, so callers don't pay for tiny writes against slow underlying writers.
The cache-busting line in flake.nix and shell.nix is known to live at end of file, so updateCacheBust walks the lines in reverse.
make tidy timings on this machine:
before: ~14s every run
after:
warm (no input changes): 0.05s
vendor block stale only: 1.4s
toolchain block stale only: 5.0s
cold (no flakehashes.json): 5.0s
Updates #6845
Change-Id: I0340608798f1614abf147a491bf7c68a198a0db4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
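The content-based gating described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in tool/updateflakes: the `block` type, `fingerprint`, and `refresh` are hypothetical names, and the length prefix inside the hash is an assumption added here to keep input boundaries unambiguous.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// block is a hypothetical stand-in for one entry in flakehashes.json: the
// fingerprint of the inputs plus the SRI those inputs produced.
type block struct {
	Fingerprint string
	SRI         string
}

// fingerprint hashes the inputs in order. Each input is prefixed with its
// length (an assumption of this sketch) so that ("ab", "c") and ("a", "bc")
// hash differently.
func fingerprint(inputs ...[]byte) string {
	h := sha256.New()
	for _, in := range inputs {
		fmt.Fprintf(h, "%d:", len(in))
		h.Write(in)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// refresh returns the block to record and reports whether regen was called.
// regen runs only when the recorded fingerprint disagrees with current.
func refresh(old block, current string, regen func() string) (block, bool) {
	if old.Fingerprint == current {
		return old, false // inputs unchanged: skip the expensive work
	}
	return block{Fingerprint: current, SRI: regen()}, true
}

func main() {
	fp := fingerprint([]byte("module example\n"), []byte("example sum\n"))
	b, ran := refresh(block{}, fp, func() string { return "sha256-placeholder" })
	fmt.Println("first run regenerated:", ran, b.SRI)
	_, ran = refresh(b, fp, func() string { return "unused" })
	fmt.Println("second run regenerated:", ran)
}
```

Because the decision depends only on file contents, a fresh clone with identical go.mod and go.sum takes the cheap path even though every mtime differs.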
…peerAPI
The Online bit in PeerStatus comes from control's last-known state and
can lag reality, so gating "tailscale file cp" on it is both unreliable
and pushes correctness onto the server. Just try the push directly.
In runCp, when the target's PeerStatus says it's offline, no longer
fail upfront; getTargetStableID returns the StableID anyway. Replace
the static "is offline" warning with a 3-second timer armed for the
first file: if the timer fires before peerAPI bytes have flowed, we
print a warning to stderr. The wording depends on whether control
reported the peer offline ("is reportedly offline; trying anyway") or
online ("is not replying; trying anyway"). The warning is printed with
a leading vt100 clear-line and a trailing newline so it doesn't get
painted over by the progress redraw and so the next progress redraw
lands on a fresh line below it.
Both the timer disarm and the progress display now read from
tailscaled's OutgoingFile.Sent (subscribed via WatchIPNBus) instead of
the local-body counter. That's the difference between bytes-acked-by-
local-tailscaled (what countingReader.n was measuring; useless for
detecting an unreachable peer because for small files net/http buffers
the entire body into the unix-socket conn before the peerAPI dial has
even started) and bytes-pulled-toward-peerAPI (what tailscaled is
actually doing, reflected in OutgoingFile.Sent). The previous code
reported 100% within milliseconds for a 3 KiB file even when the peer
was unreachable.
Add --update-interval (default 250ms) to control the progress repaint
cadence; zero or negative disables the progress display entirely. The
printer now also stops repainting once it observes Sent at full size
with a near-zero rate for >2s, so a stuck transfer doesn't keep
clobbering whatever the rest of runCp is trying to print.
Updates #18740
Change-Id: I189bd1c2cd8e094d372c4fee23114b1d2f8024b4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If both ExitNode and AdvertiseRoutes flags are empty, then the request is invalid and should fail. Previously it would wipe out any existing values configured for these prefs because of the assumption in the handler that exactly one of them is set. Updates tailscale/corp#40851 Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Add a vmtest that brings up a Tailscale client, an Ubuntu VM acting as a Mullvad-style plain-WireGuard exit node, and a non-Tailscale webserver, each on its own NAT'd vnet network with a distinct WAN IP. The test exercises Tailscale's IsWireGuardOnly peer code path: the way the control plane wires Mullvad exit nodes into a client's netmap, including the per-client SelfNodeV4MasqAddrForThisPeer source-IP rewrite that lets a Tailscale CGNAT IP egress through a plain-WireGuard tunnel that has no idea what Tailscale is. The mullvad VM doesn't run wireguard-tools or kernel WireGuard; instead, a new TTA endpoint /wg-server-up creates a real Linux TUN named wg0, drives it with wireguard-go (already vendored), and configures the kernel side (ip addr/up, ip_forward, iptables NAT MASQUERADE) so decrypted traffic from the peer egresses with the mullvad VM's WAN IP. Userspace vs kernel WireGuard makes no difference on the wire — what's being tested is Tailscale's plain-WireGuard exit-node code path, not the kernel module — and this lets the test avoid downloading and installing .deb packages inside the VM. Adds Env.BringUpMullvadWGServer (calls /wg-server-up, returns the generated WG public key as a key.NodePublic), Env.SetExitNodeIP (EditPrefs ExitNodeIP directly, for exit nodes whose IPs aren't discoverable via TTA), Env.ControlServer (exposes the underlying testcontrol.Server so tests can UpdateNode / SetMasqueradeAddresses to inject custom peers), and Env.Status (fetches a node's tailscale status, used to read the client's pubkey so we can pin it as the WG server's only allowed peer). The test verifies that the webserver's echoed source IP is the client's WAN with no exit node selected, the mullvad VM's WAN with the WG-only peer selected as exit, and the client's WAN again after clearing. Updates #13038 Change-Id: I5bac4e0d832f05929f12cb77fa9946d7f5fb5ef1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a vmtest that brings up two Ubuntu nodes, each behind its own EasyNAT, joined to the tailnet. The sender pushes a small file via "tailscale file cp" and the receiver fetches it via "tailscale file get --wait", asserting that the filename and contents round-trip unchanged.
To make Taildrop work in vmtest, three small pieces were needed:
- The Linux/FreeBSD cloud-init now starts tailscaled with --statedir as well as --state=mem:, so the daemon has a VarRoot to host Taildrop's incoming-files directory. State itself remains in-memory (so nothing persists across reboots); only the var-root scratch space is on disk.
- vmtest.New grows a variadic EnvOption parameter and a SameTailnetUser helper. When the option is passed, Start sets AllNodesSameUser=true on the embedded testcontrol.Server. Cross-node Taildrop requires the sender and receiver to share a Tailnet user (or have an explicit PeerCapabilityFileSharingTarget granted between them, which we don't plumb here), so TestTaildrop opts in. Existing tests don't.
- cmd/tta gains /taildrop-send and /taildrop-recv handlers that wrap "tailscale file cp" and "tailscale file get --wait", plus Env.SendTaildropFile and Env.RecvTaildropFile helpers in vmtest that drive them.
Updates #13038
Change-Id: I8f5f70f88106e6e2ee07780dd46fe00f8efcfdf1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add macOS VM support to the vmtest framework using Tart's pre-built macOS images (ghcr.io/cirruslabs/macos-tahoe-base) instead of building from IPSW. The Tart image has SIP disabled and SSH enabled.
At test time, the Tart base image's disk, NVRAM, and hardware identity are APFS-cloned into a tailmac-compatible directory layout, and the VM is booted headlessly via tailmac's Host.app (Virtualization.framework) with its NIC connected to vnet's dgram socket.
New features:
- tailmac.go: ensureTartImage (auto-pull), cloneTartToTailmac (format conversion), startTailMacVM (launch + cleanup)
- NoAgent() node option for VMs without TTA installed
- LANPing() for ICMP reachability testing via TTA's /ping endpoint
- IsMacOS field on OSImage, with GOOS/GOARCH support
- Dgram socket listener in Start() for macOS VMs
- Fix ReadFromUnix error spam on dgram socket close in vnet
TestMacOSAndLinuxCanPing verifies a macOS Tart VM and a gokrazy Linux VM can ping each other on the same vnet LAN.
Updates #13038
Change-Id: I5e73a27878abf009f780fdf11a346fc857711cff
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fixes #19566 Signed-off-by: Noel O'Brien <noel@tailscale.com>
Seamless key renewal has been the default in all clients since 1.90. We retained the ability to disable it from the control plane as a precaution, but we haven't seen any issues that require us to disable it. We're now removing all the code for non-seamless key renewal, because we don't expect to turn it on again, and indeed it's been untested in the field for three releases so might contain latent bugs! Updates tailscale/corp#33042 Change-Id: I4b80bf07a3a50298d1c303743484169accc8844b Signed-off-by: Alex Chan <alexc@tailscale.com>
…laky (again) This test is still flaking on macOS, so mark it as such so we can track and investigate further. Updates #7707 Change-Id: I640da3c1068a90a9815caab2df9431bceb01f846 Signed-off-by: Alex Chan <alexc@tailscale.com>
…19491) With netmap caching, the home DERP of the self node was neither saved to the cache nor loaded from it, so nodes did not stick to a DERP when starting without a connection to control. Instead, when a cache is available, load it before looking for DERP servers. This is implemented by allowing ReSTUN to be skipped when setting the DERP map (we must have a DERP map before setting the home DERP), so the DERP from the cache is set and stays sticky until a connection to control is established. Making DERP only change when connected to control is handled by existing code from f072d01. Updates #19490 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
When --vmtest-web is set, Host.app is launched with --screenshot-port 0
to start a localhost HTTP server that captures the VZVirtualMachineView
display. The Go test harness parses the SCREENSHOT_PORT=<port> line from
stdout, then polls every 2 seconds for JPEG thumbnails and pushes them
over WebSocket to the web dashboard.
Clicking a screenshot thumbnail opens a full-resolution image proxied
through the web UI's /screenshot/{node} endpoint.
Screenshot events are excluded from the EventBus history (they're large
and only the latest matters, stored in NodeStatus.Screenshot).
Updates #13038
Change-Id: I9bc67ddd1cc72948b33c555d4be3d8db06a41f6d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit modifies the `DNSConfig` resource to allow customisation of the `spec.nodeSelector` field in the nameserver pods. Closes: #19419 Signed-off-by: David Bond <davidsbond93@gmail.com>
…19565) Upon deciding to update the LastSeen timestamp, we weren't checking that the field we are replacing into was non-nil. Rather than add an additional check, just allocate a fresh pointer for the updated time. Updates #19564 Change-Id: I589ebe65175fc7677c04a31dd6c4670e2531ee62 Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Cache a pre-booted macOS VM snapshot on disk so subsequent test runs restore from the snapshot instead of cold-booting. The snapshot is keyed by the Tart base image digest and a code version constant (macOSSnapshotCodeVersion); bumping either invalidates the cache.
Snapshot preparation (one-time):
- Boot the Tart base image with a NAT NIC (--nat-nic flag)
- Wait for SSH, compile and install cmd/tta as a LaunchDaemon
- TTA polls the host via AF_VSOCK for an IP assignment; during prep the host replies "wait"
- Disconnect NIC, save VM state via SIGINT
Test fast path (cached, ~7s to agent connected):
- APFS clone the snapshot, write test-specific config.json
- Launch Host.app with --disconnected-nic --attach-network --assign-ip
- VZ restores from SaveFile.vzvmsave (~5s with 4GB RAM)
- TTA's vsock poll gets the IP config, sets static IP via ifconfig (bypasses DHCP entirely), switches driver addr to the IP directly (bypasses DNS), and resets the dial context so the reverse-dial reconnects immediately
- TTA agent connects to test driver within ~2s of IP assignment
Key optimizations:
- 4GB RAM instead of 8GB: halves SaveFile.vzvmsave (1.4GB vs 2.4GB), halves restore time (5.5s vs 11s)
- AF_VSOCK IP assignment: bypasses macOS DHCP (~5-7s saved)
- Direct IP dial: bypasses DNS resolution for test-driver.tailscale
- Dial context reset: cancels stale in-flight dials from snapshot
- Kill instead of SIGINT for test VM cleanup (no state save needed)
- Parallel VM launches
Also:
- Add TestDriverIPv4/TestDriverPort constants to vnet
- Add --nat-nic and --assign-ip flags to Host.app
- Fix SIGINT handler: retain DispatchSource globally, use dispatchMain()
- Add vsock listener (port 51011) to Host.app for IP config protocol
- Add disconnectNetwork() to VMController for clean snapshot state
- Fix Makefile: set -o pipefail so xcodebuild failures aren't swallowed
Updates #13038
Change-Id: Icbab73b57af7df3ae96136fb49cda2536310f31b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Two cloud-platform nodes (e.g. sr-a and sr-b in TestSiteToSite) boot in
parallel via errgroup and both call ensureCompiled and the inline image
preparation block, racing to Begin() the same shared *Step (which is
deduped by name in Env.Step). The second goroutine panics:
panic: Step "Compile linux_amd64 binaries": Begin called in state running
panic: Step "Prepare ubuntu-24.04 image": Begin called in state done
ensureCompiled had a TOCTOU dedup attempt (released compileMu before
doing the work, only added to the compiled set at the end), and image
preparation had no dedup at all.
Replace the compiled set with a per-key map[string]*sync.Once for each
of compile and image preparation, so concurrent callers serialize on
the Once and only the first executes Begin/work/End.
Fixes commit 02ffe5b.
Updates #13038
Change-Id: If710bcc9e0aafebf0ad5b61553bae11458d976d7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
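A minimal sketch of the per-key map[string]*sync.Once pattern this commit describes. The `onceMap` type and its `Do` method are hypothetical names for illustration; the vmtest code may be organized differently.

```go
package main

import (
	"fmt"
	"sync"
)

// onceMap runs a function at most once per key, even with concurrent callers.
type onceMap struct {
	mu sync.Mutex
	m  map[string]*sync.Once
}

// Do runs f if no function has been run for key yet. Concurrent callers for
// the same key block until the first caller's f has returned.
func (o *onceMap) Do(key string, f func()) {
	o.mu.Lock()
	if o.m == nil {
		o.m = make(map[string]*sync.Once)
	}
	once, ok := o.m[key]
	if !ok {
		once = new(sync.Once)
		o.m[key] = once
	}
	o.mu.Unlock() // release before the slow work; the Once serializes callers
	once.Do(f)
}

func main() {
	var compile onceMap
	var runs int // written only inside the Once, read after wg.Wait
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			compile.Do("linux_amd64", func() { runs++ })
		}()
	}
	wg.Wait()
	fmt.Println("compile ran", runs, "time(s)")
}
```

Unlike a "check set, unlock, work, then add to set" sequence, there is no window in which a second caller can observe the key as absent while the first is still working: the mutex guards only the map lookup, and the Once guards the work.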
…ch (#19575) The mismatch behaviour of falling back to a previous key could end up breaking connections when the netmap update took longer than the 2 seconds allowed in controlClient.auto for netmap updates, or if the controlClient context was canceled. This could end up breaking legitimate updates to the netmap for disco keys coming from control. Instead, log the event, and let the connection be reset to that of the key as that is safer. Issue found by @bradfitz. Updates #19574 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Expvars track count of rate limiters exceeding their threshold. Covers (1) global rate limiter and (2) total of local rate limiters. Also publish optional rate-limit metrics during ExpVar() call if -rate-config is specified. Fixes current rate-limit metrics being published outside of "derp" in /debug/vars. Updates tailscale/corp#38509 Change-Id: Ic7f5a1e890d0d7d3d7b679daa4b5f8926a6a6964 Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
…log change When we switched to monogok in 371d636, we lost our gokrazy fork's change to let the syslog be configured from the Linux cmdline. That's sent upstream in gokrazy/gokrazy#275 but still in review. Meanwhile, revert to a fork, while still keeping monogok. Monogok was updated to support an alternate init package, which is now hosted temporarily at https://github.com/tailscale/ts-gokrazy This means we can rip the log polling loop out of pending PR #19568 and go back to using syslog. Updates #13038 Change-Id: I36931ee8eecc40d6165ad036c6181dfb07b86ba2 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Updates: tailscale/corp#40648 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
…utReSTUN Commit 78627c1 changed the signature of magicsock.Conn.SetDERPMap to take an additional bool doReStun parameter. Avoid both the boolean parameter and the API signature change by restoring SetDERPMap to its original single-argument form and adding a new SetDERPMapWithoutReSTUN method for the cache-loading caller that wants to skip the post-set ReSTUN. Updates #19490 Change-Id: I97d9e82156bfc546ccf59756d1ea52f039b5de06 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
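The API shape chosen here, two named methods instead of a boolean parameter, can be illustrated as below. The `conn` and `derpMap` types are simplified stand-ins invented for this sketch; they are not the real magicsock types, and the re-STUN is reduced to a counter.

```go
package main

import "fmt"

// derpMap and conn are hypothetical stand-ins, not the real magicsock types.
type derpMap struct{ Regions []int }

type conn struct {
	dm      *derpMap
	restuns int // counts how often a re-STUN was requested
}

// SetDERPMap keeps its single-argument form and always re-STUNs afterwards.
func (c *conn) SetDERPMap(dm *derpMap) {
	c.setDERPMap(dm)
	c.restuns++ // stands in for kicking off a re-STUN
}

// SetDERPMapWithoutReSTUN is for the cache-loading caller that wants to set
// the map but skip the post-set re-STUN.
func (c *conn) SetDERPMapWithoutReSTUN(dm *derpMap) {
	c.setDERPMap(dm)
}

// setDERPMap holds the logic shared by both exported methods.
func (c *conn) setDERPMap(dm *derpMap) { c.dm = dm }

func main() {
	var c conn
	c.SetDERPMapWithoutReSTUN(&derpMap{Regions: []int{1}}) // from cache
	c.SetDERPMap(&derpMap{Regions: []int{1, 2}})           // from control
	fmt.Println("regions:", len(c.dm.Regions), "restuns:", c.restuns)
}
```

Existing callers of the single-argument method compile unchanged, and the unusual case is visible at the call site by name rather than as a bare `false`.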
Add a vmtest that brings up two gokrazy nodes A and B behind two
One2OneNAT networks (so direct UDP works in both directions and any
slowness can't be blamed on NAT traversal), establishes a WireGuard
tunnel A → B with TSMP, then rotates B's disco key four times and
asserts that the data plane recovers in both directions after each
rotation. All pings are TSMP (the data-plane ping; disco pings would
not exercise the WireGuard tunnel itself).
The five pings:
1. A → B (initial; brings up the tunnel; 30s budget)
2. B → A after rotate (LocalAPI rotate-disco-key debug action)
3. A → B after rotate (LocalAPI)
4. B → A after restart (SIGKILL; gokrazy supervisor respawns)
5. A → B after restart (SIGKILL)
Each post-rotation ping gets a 15-second budget. Two unavoidable
multi-second waits dominate today:
- The rotate-then-a→b phase takes ~10s on main because of LazyWG.
After B's WantRunning bounce, B's wgengine resets its
sentActivityAt/recvActivityAt maps and trims A out of the
wireguard-go config as an "idle peer"; B only re-adds A on
inbound activity, by which point A's first few TSMP packets
have been silently dropped at B's tundev. The
bradfitz/rm_lazy_wg branch removes that trimming entirely
(verified locally: this phase drops to <100ms there).
- The restart phases take ~5s for wireguard-go's RekeyTimeout
handshake retry. After SIGKILL+respawn the first WG handshake
init from the restarted node sometimes goes into the void
(likely the brief peer-removed window in the receiver's
two-step maybeReconfigWireguardLocked reconfig during which
the peer is absent from wireguard-go), and wg-go's 5s+jitter
retransmit timer is the next opportunity to retry. That retry
succeeds and the staged TSMP packet flushes. Intrinsic to the
protocol's retransmit policy.
Once LazyWG is removed and the first-handshake-after-reconfig race
is fixed, the budget should drop to 5s.
Supporting changes:
ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and
back on after rotating the disco key. magicsock.Conn.RotateDiscoKey
only resets local disco state; without also dropping wireguard-go
session keys, peers keep encrypting with their stale per-peer
session against us until their rekey timer fires (WireGuard has no
data-plane signaling to invalidate sessions). Bouncing WantRunning
runs the engine through Reconfig(empty) → authReconfig, which
drops every peer's WG session so the next packet either way
triggers a fresh handshake.
ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys"
LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns
a map[NodePublic]DiscoPublic from the current netmap. Tests reach
it via [local.Client.DebugResultJSON]. We do not surface disco
keys via [ipnstate.PeerStatus] because adding a non-comparable
[key.DiscoPublic] field there breaks reflect-based test helpers
(e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and
general LocalAPI clients have no need for disco keys. Since the
debug LocalAPI is gated behind the ts_omit_debug build tag, this
endpoint is automatically stripped from small binaries.
cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk)
to drive the SIGKILL phase. On gokrazy the supervisor respawns
tailscaled within a second.
tstest/integration/testcontrol: add Server.AllOnline. When set,
every peer entry in MapResponses is marked Online=true. Several
disco-key handling fast paths in controlclient and wgengine
(removeUnwantedDiscoUpdates, removeUnwantedDiscoUpdatesFromFullNetmapUpdate,
the wgengine tsmpLearnedDisco fast path) only fire
for online peers; without this flag, tests exercising disco-key
rotation only hit the offline-peer code paths, which mask issues
and are several seconds slower in this scenario. Finer-grained
per-node online tracking can be added later.
tstest/natlab/vmtest: add Env.RotateDiscoKey,
Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an
[AllOnline] EnvOption that plumbs through to
testcontrol.Server.AllOnline, and an exported
Env.Ping(from, to, type, timeout). Ping replaces the unexported
helper so callers can specify both a ping type (PingDisco for
warming peer state, PingTSMP for asserting end-to-end
connectivity) and a deadline. PeerDiscoKey returns its LocalAPI
error so callers inside tstest.WaitFor can retry transient
failures rather than fataling the test.
Updates #12639
Updates #13038
Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit enables the operator to set a global rate limit without any per-client limit. Updates tailscale/corp#40962 Signed-off-by: Jordan Whited <jordan@tailscale.com>
As of 0e9f9e2 it is possible to have an infinite per-client limit with a finite global limit. Updates tailscale/corp#40962 Signed-off-by: Jordan Whited <jordan@tailscale.com>
78627c1 introduced starting up and preserving the DERP server from cache, but also changed it so the initial ReSTUN would not fire when setting the DERPMap. Change this so when not working from a cache, the ReSTUN will always fire during startup. Updates #19585 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
…stControl The test was flaky under stress with "AddRawMapResponse N: node not connected" failures. The root cause was in testcontrol's addDebugMessage: it conflated "no streaming poll registered" with "wake-up channel buffer momentarily full". The single-slot updatesCh is just a lossy wake-up signal, but the streaming serveMap loop has fast paths (takeRawMapMessage and the hasPendingRawMapMessage continue) that don't drain it. A stale notification could remain buffered, causing the next sendUpdate to fail even though msgToSend had been queued and the streaming poll would still pick it up. Detect the real failure case (no streaming poll) by checking s.updates[nodeID] directly, and treat sendUpdate's buffer-full result as benign — the message is in msgToSend, which is the source of truth. Also plumb an optional *health.Tracker through tsp.ClientOpts to the underlying ts2021.Client and supply one in the tests, eliminating the "## WARNING: (non-fatal) nil health.Tracker (being strict in CI)" stack dumps emitted by controlhttp.(*Dialer).forceNoise443 under CI. Fixes #19583 Change-Id: Ib2334376585e8d6562f000a0b71dea0117acb0ff Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
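The distinction at the heart of this fix, a queue that is the source of truth versus a single-slot channel that is only a lossy wake-up signal, can be sketched like this. The `mailbox` type is hypothetical and much simpler than testcontrol's bookkeeping.

```go
package main

import (
	"fmt"
	"sync"
)

// mailbox is a hypothetical reduction of the pattern: queue is the source of
// truth, and wake is a lossy one-slot signal that something may be queued.
type mailbox struct {
	mu    sync.Mutex
	queue []string
	wake  chan struct{} // capacity 1
}

func newMailbox() *mailbox { return &mailbox{wake: make(chan struct{}, 1)} }

// Send queues msg, then nudges the reader without blocking. It reports
// whether a new wake-up was buffered; false is benign, because a wake-up is
// already pending and the message is in the queue either way.
func (m *mailbox) Send(msg string) (signaled bool) {
	m.mu.Lock()
	m.queue = append(m.queue, msg)
	m.mu.Unlock()
	select {
	case m.wake <- struct{}{}:
		return true
	default:
		return false
	}
}

// Drain returns and clears everything queued so far.
func (m *mailbox) Drain() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := m.queue
	m.queue = nil
	return out
}

func main() {
	m := newMailbox()
	fmt.Println(m.Send("a")) // slot was empty, wake-up buffered
	fmt.Println(m.Send("b")) // slot already full, still queued
	<-m.wake
	fmt.Println(m.Drain())
}
```

A caller that treats a `false` return from Send as "receiver not connected" reproduces the bug class described above; whether a receiver exists has to be checked against separate state.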
…mand peers Replace the UAPI text protocol-based wireguard configuration with wireguard-go's new direct callback API (SetPeerLookupFunc, SetPeerByIPPacketFunc, RemoveMatchingPeers, SetPrivateKey). Instead of computing a trimmed wireguard config ahead of time upon control plane updates and pushing it via UAPI, install callbacks so wireguard-go creates peers on demand when packets arrive. This removes all the LazyWG trimming machinery: idle peer tracking, activity maps, noteRecvActivity callbacks, the KeepFullWGConfig control knob, and the ts_omit_lazywg build tag. For incoming packets, PeerLookupFunc answers wireguard-go's questions about unknown public keys by looking up the peer in the full config. For outgoing packets, PeerByIPPacketFunc (installed from LocalBackend.lookupPeerByIP) maps destination IPs to node public keys using the existing nodeByAddr index. Updates tailscale/corp#12345 Change-Id: I4cba80979ac49a1231d00a01fdba5f0c2af95dd8 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If a user passes --advertise-tags=foo,bar (with no colons in any
segment), automatically prepend "tag:" client-side so it goes on the
wire as "tag:foo,tag:bar". Segments that already contain a colon are
left untouched and must be fully-qualified ("tag:foo"), which keeps
the door open for future colon-bearing syntax.
This was originally added in cd07437 (2020-10-28) and then reverted
in 1be01dd (2020-11-10) over forward-compatibility concerns. But
on 2026-04-29 it was realized that this was always safe for
future extensibility (tags can't contain colons: tag:foo:bar is
invalid, per the 2020 CheckTag restrictions). So if we wanted
to add some hypothetical --advertise-tags=tagset:setfoo or "group:foo",
we'd still have the syntax to do so, since it can't conflict with tag:group:foo.
Avery signed off on this on Slack: "Ok, I withdraw my objection to
auto-qualifying tag names in advertise-tags and I hope I won't regret
it :)"
Updates #861
Change-Id: I06935b0d3ae909894c95c9c2e185b7d6a219ff32
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
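One reading of the auto-qualification rule, applied per segment, can be sketched as follows. `qualifyTags` is a hypothetical helper, not the function in the CLI, and the per-segment interpretation is an assumption drawn from the sentence about segments that already contain a colon.

```go
package main

import (
	"fmt"
	"strings"
)

// qualifyTags prepends "tag:" to each comma-separated segment that contains
// no colon. Segments that already contain a colon are passed through
// untouched; validating them is left to the caller.
func qualifyTags(s string) string {
	if s == "" {
		return ""
	}
	parts := strings.Split(s, ",")
	for i, p := range parts {
		if !strings.Contains(p, ":") {
			parts[i] = "tag:" + p
		}
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(qualifyTags("foo,bar"))
	fmt.Println(qualifyTags("tag:foo,bar"))
}
```

This sketch does no validation: an empty segment or a misspelled prefix such as "tags:foo" passes through and would need to be rejected by a later check.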
This commit modifies the usage of the `egressservices.Configs` type within containerboot and the k8s operator. Originally it was being thrown around as a pointer which is not required as maps are already pointers under the hood. Signed-off-by: David Bond <davidsbond93@gmail.com>
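The reasoning behind dropping the pointer can be shown with a small example. The `configs` type here is a hypothetical map type standing in for the one named in the commit; the field layout is invented.

```go
package main

import "fmt"

// configs is a hypothetical map-typed configuration, standing in for the
// type discussed in the commit message.
type configs map[string]int

// setPort mutates the caller's map even though c is passed by value: a map
// value is a small handle to shared underlying storage.
func setPort(c configs, name string, port int) {
	c[name] = port
}

func main() {
	c := configs{}
	setPort(c, "web", 8080)
	fmt.Println(c["web"])
}
```

Two caveats apply to passing a map by value: writing to a nil map panics, and assigning a whole new map to the parameter inside the function is not visible to the caller. Code that needs either behavior still needs a pointer or a return value.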
Move the template, request handler, and HTTP/HTTPS server wiring out of package main and into a new cmd/hello/helloserver package so the server can be embedded in other binaries. The main package now only constructs a helloserver.Server with the production addresses and calls Run. While here, drop the -http, -https, and -test-ip flags along with the dev-mode template and fake-data fallbacks they enabled; the binary is only run in production. Updates tailscale/corp#32398 Change-Id: Id1d38b981733334cafc596021130f36e1c1eed67 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
…MapWithPeers
Add two narrower accessors alongside the existing
[LocalBackend.NetMap], with docs that distinguish their semantics:
- NetMapNoPeers: cheap (returns the cached *netmap.NetworkMap with
a possibly-stale Peers slice). For callers that only read non-Peers
fields like SelfNode, DNS, PacketFilter, capabilities.
- NetMapWithPeers: documented as returning an up-to-date Peers slice.
For callers that genuinely need to iterate Peers or call
PeerByXxx.
Mark the existing NetMap deprecated and point readers at the two new
accessors. NetMap, NetMapNoPeers, and NetMapWithPeers all currently
return the same value (b.currentNode().NetMap()): this commit is a
no-op behaviorally, just a renaming and migration of in-tree callers.
A subsequent change in the same series will switch
NetMapWithPeers to actually rebuild the Peers slice from the live
per-node-backend peers map (O(N) per call), at which point the
distinction between the two new accessors becomes load-bearing.
Migrate in-tree callers to the appropriate accessor based on what
fields they read:
- NetMapNoPeers (most common): localapi handlers, peerapi accept,
GetCertPEMWithValidity, web client noise request, doctor DNS
resolver check, tsnet CertDomains/TailscaleIPs, ssh/tailssh
SSH-policy/cap reads, several LocalBackend internals
(isLocalIP, allowExitNodeDNSProxyToServeName, pauseForNetwork
nil-check, serve config).
- NetMapWithPeers: writeNetmapToDiskLocked (persist full netmap to
disk for fast restart), PeerByTailscaleIP lookup.
Tests still call the legacy NetMap; they'll see the deprecation
warning but otherwise behave identically.
Also add two pieces of plumbing the next change in this series will
need, but which are already useful on their own:
- [client/local.GetDebugResultJSON]: a generic [Client.DebugResultJSON]
that decodes directly into a target type T, avoiding the
marshal/unmarshal roundtrip callers otherwise need.
- localapi "current-netmap" debug action: returns the current
netmap (with peers) as JSON. Documented as debug-only — the
netmap.NetworkMap shape is internal and may change without notice.
This commit is part of a series breaking up a larger change for
review; on its own it is a no-op refactor.
Updates #12542
Change-Id: Idbb30707414f8da3149c44ca0273262708375b02
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
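The idea behind a generic helper that decodes directly into a target type T can be sketched without any Tailscale packages. `decodeJSON` and `debugResult` are hypothetical names; the real helper takes a client and a debug action rather than raw bytes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// debugResult is a hypothetical response shape used only for this example.
type debugResult struct {
	Peers int `json:"peers"`
}

// decodeJSON unmarshals raw directly into a value of type T, so callers do
// not have to decode into `any` and then marshal and unmarshal again to get
// a typed value.
func decodeJSON[T any](raw []byte) (T, error) {
	var v T
	if err := json.Unmarshal(raw, &v); err != nil {
		var zero T
		return zero, err
	}
	return v, nil
}

func main() {
	res, err := decodeJSON[debugResult]([]byte(`{"peers": 3}`))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(res.Peers)
}
```

The type argument is given explicitly at the call site because nothing in the arguments lets the compiler infer T.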
…#20029) This is a refinement of #19916. Previously, we would only emit a latency log when going from a cached netmap to an uncached one (i.e., from the control plane). We would like to know the latency in both conditions, though, so instead use the validity of the previous self state. Updates #12639 Updates tailscale/projects#27 Change-Id: I6bbeb5d3162f1f98cdb3dcd244f67ef31c170957 Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
tsnet depends on logpolicy, which in turn depended on util/syspolicy because of a single LogTarget policy setting it uses. In this commit, we replace that dependency with a feature.Hook, which only tailscaled or its platform-specific alternatives should set. Updates #20031 Signed-off-by: Nick Khyl <nickk@tailscale.com>
Fixes #20035 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
…hs (#20020) Assign the Kubernetes operator, kube libraries, container build commands, and related paths to @tailscale/k8s-devs. Updates #cleanup Change-Id: I9d8c7ebfd9a2b6401dd8cb0ff335151afe58357c Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
To avoid breaking downstream code, add deprecated aliases for all the old names. Updates tailscale/corp#37904 Change-Id: I86d0b0d7da371946440b181c665448f91c3ef8d2 Signed-off-by: Alex Chan <alexc@tailscale.com>
Track lastSeen on each cached flow and add a sweeper goroutine that periodically removes flows idle past the idle timeout. Introduce tunables for idle timeout, maximum flows removed per sweep (to limit mutex hold time), and the sweeper interval. Also cap the previously-unlimited tables: 10k client flows, 100k connector flows. Updates tailscale/corp#38630 Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
…20051) Add a vmtest that guards the fix in #20025: after an in-process control client swap (profile switch / interactive re-login), magicsock's NetInfo dedup cache (netInfoLast) must be cleared so the structurally-identical post-switch NetInfo (same PreferredDERP, same NAT shape) is re-reported to the new control session rather than suppressed as unchanged. The test brings a node up, pins its home DERP so the reported NetInfo is identical across the switch, records the home DERP the test control learned, switches to a fresh login profile on the same control/network/NAT/DERP, and asserts the control re-learns the same non-zero home DERP for the node's new identity. Without ResetNetInfoLast the assertion times out at HomeDERP=0. To support this, vnet now serves the test control on port 443 (TLS) in addition to port 80: an immediate re-login makes a fresh noise dial, and because the prior dial was recent the control client forces an HTTPS (443) dial (controlhttp.Dialer.forceNoise443), which the harness previously did not answer. The control endpoint gets its own self-signed cert (the existing selfSignedDERPCert helper, renamed to the generic selfSignedCert); the cert is not validated since control noise dials authenticate via the Noise handshake, so it only needs a TLS peer to complete the forced 443 dial. Add Env.ForcePreferredDERP and Env.Relogin helpers for the above. Updates #20024 Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Did you know that Gilbert Baker used the Pantone color scale when designing the rainbow flag? I suppose that's not too surprising. There are also other color scales like munsell and werner. I guess the rainbow itself is a color scale, with its seven "roygbiv" colors. (It's also a fish, with both a tail and scales.) We have so many ways to measure color on so many different scales. And it turns out "pride" itself is a scale. Updates #words Signed-off-by: Will Norris <will@tailscale.com>
macOS 26.4 emits RTM_MISS on the routing socket for every failed route lookup. skipRouteMessage never inspected the message type, so each miss woke the monitor as a link change and triggered a netcheck. On networks without an IPv6 default route the netcheck's IPv6 DERP probes fail and emit more RTM_MISS messages, sustaining the loop indefinitely: netchecks run at roughly 40x the intended rate, with sustained probe traffic and corresponding CPU and battery cost. RTM_MISS scales with traffic volume, not network state, and is never the leading signal for a topology change: route withdrawals emit RTM_DELETE synchronously before any subsequent lookup can miss, so ignoring it loses no signal. Other routing daemons (bird, dhcpcd, frr) ignore it as well. Same fix as coder/tailscale@e956a950741f. Fixes #19324 Signed-off-by: Doug Bryant <dougbryant@anthropic.com>
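As a rough illustration of the type check described above, a filter might look like the sketch below. The helper name `skipByType` is hypothetical and is not the real `skipRouteMessage`; the constants are redefined locally (mirroring the values in macOS `<net/route.h>`) so the sketch compiles on any platform.

```go
package main

// Routing-socket message types, redefined here with the values from
// macOS <net/route.h> so the sketch does not depend on the host OS.
const (
	rtmAdd    = 0x1
	rtmDelete = 0x2
	rtmMiss   = 0x7
)

// skipByType reports whether a routing message of the given type should be
// ignored by a link-change monitor. Hypothetical helper: it only encodes the
// rule that RTM_MISS tracks lookup failures rather than topology changes.
func skipByType(msgType int) bool {
	return msgType == rtmMiss
}
```

With a check like this in place, a failed lookup no longer wakes the monitor, while `RTM_ADD` and `RTM_DELETE` still pass through as link-change signals.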
Fixes #17188 Signed-off-by: Anthony SCHWARTZ <antho.schwartz@gmail.com> Signed-off-by: Anthony SCHWARTZ <anthony.schwartz@ext.ec.europa.eu>
This removes deprecated magic-dns formats for 4via6 subnet routers. These are superseded by the current format: Q-R-S-T-via-X. Fixes #20053 Change-Id: I0eed1f057f856f248c4dc8ce3b751f6c7edcfbfd Signed-off-by: Becky Pauley <becky@tailscale.com>
New-style IPN bus subscribers consume stateful delta streams. Reject NotifyRateLimit when it is combined with those subscription bits so tailscaled cannot merge or delay messages that clients need to apply in order. Also stop silently dropping notifications when a watcher falls behind. Remove the watcher, replace its stale queue with one terminal ErrMessage notification, and close the watch. Updates #20062 Change-Id: Id9d402ea76f4011cd23f122adf62f30dd4b6f90b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
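The fell-behind handling described above might be sketched like this. It is an illustration under assumptions, not the LocalBackend code: `watcher`, `notify`, `send`, and the error text are all hypothetical names, and a buffered channel stands in for the watcher's queue.

```go
package main

import "sync"

// notify is a hypothetical, pared-down bus notification.
type notify struct {
	State      string
	ErrMessage string // set only on the terminal notification
}

// watcher is a hypothetical bus subscriber with a bounded queue.
type watcher struct {
	mu     sync.Mutex
	ch     chan notify
	closed bool
}

func newWatcher(queueLen int) *watcher {
	if queueLen < 1 {
		queueLen = 1 // need room for the terminal notification
	}
	return &watcher{ch: make(chan notify, queueLen)}
}

// send enqueues n without blocking. If the queue is full, the watcher has
// fallen behind: the stale queue is discarded, a single terminal error
// notification is enqueued, and the watch is closed. It reports whether n
// was delivered to the queue.
func (w *watcher) send(n notify) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.closed {
		return false
	}
	select {
	case w.ch <- n:
		return true
	default:
	}
	// Drain whatever the reader has not consumed yet.
	for drained := false; !drained; {
		select {
		case <-w.ch:
		default:
			drained = true
		}
	}
	// All sends happen under w.mu, so the queue has room for this message.
	w.ch <- notify{ErrMessage: "watcher fell behind"}
	w.closed = true
	close(w.ch)
	return false
}
```

A reader ranging over `w.ch` sees either an unbroken sequence of updates or one final message with `ErrMessage` set, never a sequence with silent gaps.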
Updates #20035 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
Add NotifyInProcessNoDisconnect for in-process IPN bus subscribers that must apply every bus update. When such a subscriber falls behind, block Notify production instead of sending the terminal fell-behind message and closing the watch. This is intentionally not available over LocalAPI, where a slow or stuck out-of-process client should still be disconnected rather than allowed to stall tailscaled. In-process callers that use the bit must keep their callbacks fast and must not call back into LocalBackend from the callback. Updates #20062 Change-Id: I730ad61a07475243bb226fba2262c1a3ded211ae Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Updates tailscale/corp#43105 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
For real, we're supposed to use min, not max. Updates tailscale/corp#43105 Signed-off-by: Joe Tsai <joetsai@digital-static.net>
This commit modifies the reconciler for the `Tailnet` custom resource to allow referenced secrets to specify an `audience` field. If a referenced secret contains both an `audience` and `client_id` we assume the user's intention is to use workload identity. In that case, we configure the tailscale API client to authenticate using the Kubernetes token request API against the operator's service account. This requires the operator to be aware of its own service account name. A small change has also been made to the messages added to the `Tailnet` CRD's status field in the event that it is missing scopes, to make it clearer that certain scopes may not be applied. Closes: #19090 Updates: #19471 Signed-off-by: David Bond <davidsbond93@gmail.com>
…ndpoints (#20088) 9be2108 changed sending disco pings so a callMeMaybe would not be gated by endpoints existing if the node was running off of a cached netmap. This commit partly reverts that change, but keeps a few bug fixes from that commit, along with the tests it introduced, which are now skipped. The behaviour prior to 9be2108 is retained. Updates #20085 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
…es (#20084) The 1 minute timeout was hitting timers inside wireguard-go, leading to stale connections hanging forever. Increasing the timeout to 2 minutes makes a small subset of cached connections establish direct connections slightly slower. Updates to wireguard-go will allow a better hook for when to send these messages in the future. This change only fixes the error mode; if better triggers arrive in wireguard-go, we should use those. Updates #20081 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
We aren't supposed to be using CODEOWNERS as blocking reviews, blocking global cleanups. (This is why we want to move to go/policybot) Updates tailscale/corp#13972 Change-Id: I380258e2d4ffd0720d57d891adab06c8ca388617 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Updates tailscale/corp#43243 Updates #20067 Change-Id: I27e19f34e2216f3ac1a4e2a6b38c0ac473b8c7ad Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I previously (in #20096) had only considered the tailscaled deps and forgot about the CLI deps. This does the CLI ones too. containerboot and k8s-operator aren't applicable because they build from oss already. Updates tailscale/corp#43243 Updates #20067 Change-Id: I66790f822b5d040e7fcf90feabca24669f69cf61 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Updates #cleanup Change-Id: Ib6ff2e678670ecc001207a0b8be02b035958cb88 Signed-off-by: Alex Chan <alexc@tailscale.com>
Updates #cleanup Change-Id: I088aa91218354f6208190c8f6673f9c5a98e65fc Signed-off-by: Alex Chan <alexc@tailscale.com>
Prevent tailscale ssh from automatically adding a username when connecting to a server; only forward one if provided. The previous behaviour prevented username overrides in the ssh configuration, since the provided username takes precedence over the configured one. This also keeps tailscale ssh a thin wrapper around ssh by not adding any extra arguments unless required. Fixes #19357 Signed-off-by: Örjan Fors <o@42mm.org>
* cmd/k8s-operator: rework [unexpected] log lines This commit modifies several places in the operator logs where we prepend `[unexpected]` to instead use an appropriate logging level. The `[unexpected]` prefix is intended to be used when the program violates some internal invariant (or for example, a database has become corrupted). Many of these cases were simply log lines that then fell back to a default value/behaviour. These have been releveled to warnings. Some of these log lines also seemed extraneous, for example service reconcilers logging when there is no proxy group annotation. As far as I can tell we've never had any predicates for limiting the services reconciled to ones with that annotation, so they can just be removed to reduce log spam. Fixes: #cleanup Signed-off-by: David Bond <davidsbond93@gmail.com> * Update cmd/k8s-operator/egress-services-readiness.go Co-authored-by: BeckyPauley <64131207+BeckyPauley@users.noreply.github.com> Signed-off-by: David Bond <davidsbond@users.noreply.github.com> * Update cmd/k8s-operator/operator.go Co-authored-by: BeckyPauley <64131207+BeckyPauley@users.noreply.github.com> Signed-off-by: David Bond <davidsbond@users.noreply.github.com> --------- Signed-off-by: David Bond <davidsbond93@gmail.com> Signed-off-by: David Bond <davidsbond@users.noreply.github.com> Co-authored-by: BeckyPauley <64131207+BeckyPauley@users.noreply.github.com>
… extensions When a client's node key expires and the user clicks "Login" (or runs `tailscale up`), the Login() method was cancelling the map poll context. This caused key extension notifications from the server to be lost, leaving clients stuck in NeedsLogin state even after an admin extended their key. The fix has three parts: 1. Login(): Don't cancel mapCtx if we have valid credentials (loggedIn=true) or a valid node key. This allows the map poll to continue receiving server notifications while the auth flow proceeds in parallel. 2. mapRoutine(): Poll when we have a node key, even if !loggedIn. This handles the tsnet restart scenario where control returns an AuthURL (so loggedIn=false) but we still have a valid node key that can receive map updates. 3. sendStatus()/UpdateFullNetmap(): Forward netmaps when we have a node key, not just when loggedIn. This ensures the backend sees key expiry changes even when the auth flow hasn't completed. "First successful flow wins": if a key extension arrives via map poll, the client recovers automatically. If the auth flow completes first, that works too. Either way, the client is no longer stuck. This aligns with the SeamlessKeyRenewal philosophy: maintain connectivity paths while authentication proceeds, allowing server-initiated recovery. Fixes #19326 Change-Id: I26dbbc1fa7c1159ba075362e44d02814355d6b44 Signed-off-by: Avery Pennarun <apenwarr@tailscale.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
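The conditions in the three-part fix above can be read as two small predicates. The sketch below is an interpretation of the commit message, not the control client's code: `authState`, `cancelMapCtxOnLogin`, and `shouldPollMap` are hypothetical names, and treating "has a node key" as a single boolean is an assumption.

```go
package main

// authState is a hypothetical snapshot of the two facts the fix reasons about.
type authState struct {
	loggedIn     bool // control has accepted our credentials
	validNodeKey bool // we still hold a usable node key
}

// cancelMapCtxOnLogin reports whether Login() should cancel the map poll
// context. Per part 1, it should not when either fact holds, so that
// server-sent key extensions can still arrive during the auth flow.
func cancelMapCtxOnLogin(s authState) bool {
	return !(s.loggedIn || s.validNodeKey)
}

// shouldPollMap reports whether the map routine should poll, and likewise
// whether netmaps should be forwarded to the backend (parts 2 and 3):
// a node key is enough, even when loggedIn is false.
func shouldPollMap(s authState) bool {
	return s.loggedIn || s.validNodeKey
}
```

For the expired-key case (`loggedIn` false, node key still present), the first predicate is false and the second is true, which keeps the map poll alive while the user re-authenticates.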
authRoutine snapshots c.loginGoal, runs TryLogin without the lock, then writes back loggedIn/loginGoal under the lock. If a concurrent Login() or Logout() changes the goal during the in-flight request, the write-back overwrites the new intent: the more recent login goal is silently dropped, or a logout is reverted to logged-in. Gate both the URL-followup and success commits on c.loginGoal still matching the goal we were processing. Stale results are ignored and the next iteration runs with the current goal. Updates #19326 Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
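The gating described above follows a common snapshot, work, re-check pattern. The sketch below is a simplified illustration, not the real `authRoutine`: the `client` type, `setGoal`, `authOnce`, and the `tryLogin` callback are hypothetical, and goals are compared by pointer identity as an assumption.

```go
package main

import "sync"

// loginGoal is a hypothetical goal record; each Login() creates a new one,
// so pointer identity distinguishes goals.
type loginGoal struct {
	id int
}

type client struct {
	mu        sync.Mutex
	loginGoal *loginGoal // nil means "no login wanted" (e.g. after Logout)
	loggedIn  bool
}

// setGoal stands in for Login() (non-nil goal) and Logout() (nil goal).
func (c *client) setGoal(g *loginGoal) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.loginGoal = g
	if g == nil {
		c.loggedIn = false
	}
}

// authOnce runs one iteration: snapshot the goal, run tryLogin without the
// lock, then commit the result only if the goal is still the one that was
// snapshotted. It reports whether a successful login was committed.
func (c *client) authOnce(tryLogin func(*loginGoal) bool) bool {
	c.mu.Lock()
	goal := c.loginGoal
	c.mu.Unlock()
	if goal == nil {
		return false
	}

	ok := tryLogin(goal) // slow network call; lock not held

	c.mu.Lock()
	defer c.mu.Unlock()
	if c.loginGoal != goal {
		// A concurrent Login() or Logout() changed the goal: the result is
		// stale, so drop it and let the next iteration use the new goal.
		return false
	}
	if ok {
		c.loggedIn = true
		c.loginGoal = nil
	}
	return ok
}
```

If a logout lands while `tryLogin` is in flight, the re-check fails and the client stays logged out instead of being flipped back to logged-in.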
This adds testcontrol support for expiring individual node keys, in order to enable test scenarios involving key expiry and extension. Updates #19326 Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
Updates #19326 Signed-off-by: Gesa Stupperich <gesa@tailscale.com>