Tailscale + Proxmox: When Your Cluster Routes Against Itself

by Erik Parawell, June 4th, 2025

The Problem

I've been running Tailscale in a VM on one of my Proxmox hosts to provide subnet routing for my homelab. This worked fine until I needed to reboot that host - suddenly I'd lose all remote access to the network and have to fall back on UniFi Teleport or the UDM Pro's OpenVPN. Not ideal.

The obvious solution: run Tailscale on all the Proxmox hosts for redundancy. Just install it on each node, advertise the same subnet, and let Tailscale handle failover.

So I ran `tailscale up --advertise-routes=10.0.4.0/24 --accept-routes=true` on the first host. That's when things got weird. The host became unreachable at its LAN IP (10.0.4.30) from other machines on the network. When I tried deploying Tailscale to more hosts, the problem spread - each host I added would disappear from the local network.
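
Roughly what that looked like from another machine on the LAN (output abbreviated) - the host itself stayed up and still answered on its Tailscale address, it just vanished from its own LAN:

# From another machine on 10.0.4.0/24
$ ping -c 3 10.0.4.30
# 3 packets transmitted, 0 received, 100% packet loss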

Even worse, the Proxmox cluster started having issues. Corosync couldn't maintain quorum properly. The web UI would work from some nodes but not others. ZFS replication between nodes slowed to a crawl.

Turns out that when you tell a Linux box to both advertise AND accept routes for the same subnet it lives on, it starts routing its own local traffic through the VPN tunnel.

I backed off and just created a dedicated VM for Tailscale routing. The VM itself couldn't be reached locally either, but that was fine since its only job was forwarding tailnet traffic. Still, this felt like a hack.

Initial Attempts

First thing I did was check if Tailscale had a built-in solution. Maybe a flag like --ignore-local-routes or something.

Nope. GitHub issue #1227 confirms this is a known limitation. Several people have requested this exact feature but it hasn't been implemented.

The community suggested a few workarounds:

 1. Keep Tailscale in a dedicated VM/LXC
 2. Disable --accept-routes on hosts that advertise
 3. Use policy routing to override the behavior

Option 2 was a non-starter since I need these hosts to reach other Tailscale subnets. Option 1 defeats the purpose of getting rid of the VM. That left policy routing.

The Root Cause

When Tailscale accepts a route, it adds a policy rule that takes precedence over your normal routing table:

$ ip rule show
# ...
# 5270: from all lookup 52

Table 52 is Tailscale's routing table, and with --accept-routes it contains a route pointing 10.0.4.0/24 at tailscale0. So when host 10.0.4.30 tries to reach 10.0.4.31 on the same bridge, Linux hits that policy rule before it ever consults the main table, and the packet goes out over WireGuard instead of vmbr0.

You can verify this is happening:

$ ip route get 10.0.4.31
# If broken: 10.0.4.31 dev tailscale0 table 52 src 100.x.x.x
# If fixed: 10.0.4.31 dev vmbr0 src 10.0.4.30
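
You can also dump Tailscale's table directly to see the route pulling local traffic into the tunnel. On an affected host it looks something like this (trimmed, with 100.x.x.x standing in for real tailnet addresses):

$ ip route show table 52
# 10.0.4.0/24 dev tailscale0
# 100.x.x.x dev tailscale0
# ...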

This explains why each host disappeared from the LAN - its traffic to local addresses was being tunneled through whichever node Tailscale picked as the active subnet router, creating a bizarre loop where nodes tried to reach each other through themselves.

The Fix

Add a higher-priority rule that catches local traffic before Tailscale's rule. On Proxmox this goes in /etc/network/interfaces, attached to the bridge stanza:

auto vmbr0
iface vmbr0 inet static
    address 10.0.4.30/24
    gateway 10.0.4.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

    # Force 10.0.4.0/24 to use main table (local bridge) first
    post-up ip rule add to 10.0.4.0/24 table main priority 1000
    pre-down ip rule del to 10.0.4.0/24 table main priority 1000

    # Enable IP forwarding for subnet routing
    post-up sysctl -w net.ipv4.ip_forward=1
    post-up sysctl -w net.ipv6.conf.all.forwarding=1
    pre-down sysctl -w net.ipv4.ip_forward=0
    pre-down sysctl -w net.ipv6.conf.all.forwarding=0

The `priority 1000` rule evaluates before Tailscale's rules, keeping local traffic local. The lower the number, the higher the priority - and since Tailscale's rules start at 5210, our 1000 wins.
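
To sanity-check the rule before wiring it into the config, you can add it by hand first - it won't survive a reboot, which is exactly what the post-up/pre-down lines above are for:

# Add the override rule manually
$ ip rule add to 10.0.4.0/24 table main priority 1000

# Local traffic should pick the bridge again
$ ip route get 10.0.4.31
# 10.0.4.31 dev vmbr0 src 10.0.4.30

# Remove it if you want to roll back
$ ip rule del to 10.0.4.0/24 table main priority 1000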

Deployment

Rolled this out to all nodes with an Ansible playbook to keep the configs consistent. The playbook handles:

 - Installing Tailscale
 - Configuring /etc/network/interfaces with the policy rules
 - Bringing up Tailscale with the right flags
 - Enabling subnet routes in the admin console

Key part of the playbook:

- name: Configure network interfaces
  blockinfile:
    path: /etc/network/interfaces
    block: |
      # Force local subnet to use main table
      post-up  ip rule add to {{ local_subnet }} table main priority 1000
      pre-down ip rule del to {{ local_subnet }} table main priority 1000

      # Enable forwarding
      post-up  sysctl -w net.ipv4.ip_forward=1
      post-up  sysctl -w net.ipv6.conf.all.forwarding=1
      pre-down sysctl -w net.ipv4.ip_forward=0
      pre-down sysctl -w net.ipv6.conf.all.forwarding=0

      # UDP offloads for performance
      post-up  ethtool -K {{ ansible_default_ipv4.interface }} rx-udp-gro-forwarding on rx-gro-list off
      pre-down ethtool -K {{ ansible_default_ipv4.interface }} rx-udp-gro-forwarding off rx-gro-list on
    marker: "# {mark} ANSIBLE MANAGED TAILSCALE BLOCK"
    insertafter: "iface vmbr0 inet static"

- name: Bring up Tailscale
  command: |
    tailscale up --reset \
      --accept-routes=true \
      --advertise-routes={{ local_subnet }}
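
The only templated value is `local_subnet`, which just needs to be the same on every node - a group variable is the obvious home for it. Something like this (the file name is a placeholder, any vars location works):

# group_vars/proxmox.yml
local_subnet: 10.0.4.0/24

Then run the play against the Proxmox group, e.g. `ansible-playbook -i inventory.yml site.yml --limit proxmox` (those names are placeholders too).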

Results

Now I have:

 - True HA subnet routing - can reboot any host without losing remote access
 - Local cluster traffic stays on the physical bridge
 - Tailscale handles failover automatically in about 15 seconds

Testing failover is simple:

# From a remote machine, ping something on the subnet
$ ping 10.0.4.100

# On the Proxmox host currently routing, stop Tailscale
$ systemctl stop tailscaled

# Within 15-20 seconds, pings resume through another host

And most importantly, the Proxmox cluster is happy. Corosync maintains quorum, the web UI is accessible from any node, and ZFS replication runs at full speed.

The whole setup is simpler than maintaining a dedicated router VM, and I can bounce any host for maintenance without thinking twice about remote access. Just took some creative policy routing to make Linux do what I wanted.