Floating Service IP (Poor Man’s Failover)

I wanted a file server that survives a host failure without touching clients or DNS, and without rebuilding the network. This is the simplest pattern I know that works.


Context

In a small home lab, high availability doesn’t need Kubernetes, load balancers, or enterprise clustering.

Sometimes you just need:

  • a stable IP for a service
  • the ability to move it between hosts
  • and minimal moving parts

This pattern uses a floating service IP that can be manually (or later automatically) moved between nodes.


🧠 Concept

Separate host identity from service identity:

Type        Example IP       Purpose
Host IP     172.17.20.253    Physical node (araman)
Host IP     172.17.20.254    Physical node (eldamar)
Service IP  172.17.20.97     File server (movable)

Clients connect to:

172.17.20.97 → file-server

They never need to know which machine is serving it.


⚙️ Current State

On the active node (e.g. araman):

ip -br addr

Shows:

enp2s0 → 172.17.20.253 (host)
         172.17.20.97 (file-server)
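
The listing above is simplified; real `ip -br addr` output puts the interface name, link state, and all addresses on a single line. A minimal sketch, assuming that layout, of a check for whether a node currently holds the service IP. The helper name `holds_service_ip` is hypothetical, not part of any tool:

```shell
# Succeeds (exit 0) if the `ip -br addr` output on stdin lists the given IP.
# Assumes the brief format: <ifname> <state> <addr/prefix> [<addr/prefix> ...]
holds_service_ip() {
    awk -v ip="$1" '
        {
            for (i = 3; i <= NF; i++) {
                split($i, parts, "/")          # strip the /prefix length
                if (parts[1] == ip) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (run on a node):
#   ip -br addr show dev enp2s0 | holds_service_ip 172.17.20.97 && echo "active node"
```

Reading from stdin keeps the check easy to try against saved output before pointing it at a live interface.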

🔁 Manual Failover Procedure

Step 1: Remove service IP from current host

ip addr del 172.17.20.97/24 dev enp2s0

Step 2: Add service IP to target host

ip addr add 172.17.20.97/24 dev enp2s0

Step 3: Announce ownership (IMPORTANT)

arping -c 3 -A -I enp2s0 172.17.20.97

This prompts:

  • switches (MAC forwarding tables)
  • clients (ARP caches)
  • dv-gate (ARP cache)

…to update their entries right away instead of waiting for the old ones to expire.


⚠️ Gotchas

❌ Never have both hosts active with .97

If both nodes hold the IP, expect:

  • ARP flapping
  • intermittent connections
  • extremely confusing behaviour
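
One way to guard against this is to probe for the address before claiming it. The sketch below assumes the iputils version of `arping`, whose `-D` (duplicate address detection) mode exits 0 when no replies arrive; other `arping` implementations use different flags. The function names and the overridable `ARPING` variable are hypothetical conveniences:

```shell
SERVICE_IP="${SERVICE_IP:-172.17.20.97}"
IFACE="${IFACE:-enp2s0}"
ARPING="${ARPING:-arping}"   # overridable so the logic can be tried without a network

# Return 0 if no other host answers ARP for the service IP.
# Assumes iputils arping: -D exits 0 when no replies are received.
service_ip_is_free() {
    "$ARPING" -D -c 2 -I "$IFACE" "$SERVICE_IP" >/dev/null 2>&1
}

# Report whether it looks safe to add the service IP on this node.
claim_guard() {
    if service_ip_is_free; then
        echo "free: safe to add ${SERVICE_IP} on ${IFACE}"
    else
        echo "in use: another host still answers for ${SERVICE_IP}"
        return 1
    fi
}
```

Running `claim_guard` on the target host before Step 2 would catch the case where the old node was never cleaned up.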

⚠️ Without arping

Clients keep sending traffic to the old host's MAC address until their cached ARP entries expire, so failover will appear broken for a while.


🧩 Why This Works

This leverages simple Layer 2 behaviour:

  • IP → resolved to MAC via ARP
  • whichever host responds becomes the owner
  • updating ARP = redirecting traffic

No routing changes required
No DNS changes required
No client changes required
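
To see the mechanism from a client's point of view, compare the MAC address cached for the service IP before and after a failover. A small sketch, assuming the usual `ip neigh` line format (`<ip> dev <ifname> lladdr <mac> <state>`); `mac_for_ip` is a hypothetical helper name:

```shell
# Print the MAC address cached for an IP, reading `ip neigh` output on stdin.
mac_for_ip() {
    awk -v ip="$1" '
        $1 == ip {
            for (i = 2; i < NF; i++) {
                if ($i == "lladdr") { print $(i + 1); exit }
            }
        }
    '
}

# Example (run on a client):
#   ip neigh show | mac_for_ip 172.17.20.97
```

If the takeover worked, the printed MAC should change from one node's NIC to the other's while the IP stays the same.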


🚀 Evolution Path (Optional)

If this pattern proves useful, it can evolve into:

1. Scripted takeover

takeover-file-ip.sh
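
A rough sketch of what such a script might contain, wrapping the manual steps from the failover procedure. The variable names, the `run` helper, and the dry-run default are assumptions for illustration; the IP, prefix, and interface come from the examples in this post:

```shell
#!/usr/bin/env bash
# takeover-file-ip.sh (sketch): move the floating service IP by hand.
SERVICE_IP="${SERVICE_IP:-172.17.20.97}"
PREFIX="${PREFIX:-24}"
IFACE="${IFACE:-enp2s0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Add the service IP to this host, then announce it with gratuitous ARP.
takeover() {
    run ip addr add "${SERVICE_IP}/${PREFIX}" dev "$IFACE" || return 1
    run arping -c 3 -A -I "$IFACE" "$SERVICE_IP"
}

# Remove the service IP from this host (run on the old node first, if reachable).
release() {
    run ip addr del "${SERVICE_IP}/${PREFIX}" dev "$IFACE"
}

# Dispatch on the first argument; do nothing when called without one.
case "${1:-}" in
    takeover|release) "$1" ;;
esac
```

Defaulting to dry-run means `./takeover-file-ip.sh takeover` only prints what it would do; `DRY_RUN=0 ./takeover-file-ip.sh takeover` as root performs the change.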

2. Health-based failover

  • ping checks
  • service checks
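
A minimal sketch of how a standby node might combine both checks. The port (445, assuming SMB), the failure threshold, and all function names are assumptions; `nc -z` behaviour also varies slightly between netcat variants:

```shell
SERVICE_IP="${SERVICE_IP:-172.17.20.97}"
SERVICE_PORT="${SERVICE_PORT:-445}"    # assumption: SMB; e.g. 2049 for NFS
MAX_FAILURES="${MAX_FAILURES:-3}"
CHECK_INTERVAL="${CHECK_INTERVAL:-2}"  # seconds between attempts
CHECK_CMD="${CHECK_CMD:-service_up}"   # overridable for experiments

# Return 0 if the service IP answers ping and accepts a TCP connection.
service_up() {
    ping -c 1 -W 1 "$SERVICE_IP" >/dev/null 2>&1 &&
        nc -z -w 2 "$SERVICE_IP" "$SERVICE_PORT" >/dev/null 2>&1
}

# Return 0 (fail over) only if MAX_FAILURES consecutive checks fail.
should_failover() {
    failures=0
    while [ "$failures" -lt "$MAX_FAILURES" ]; do
        if "$CHECK_CMD"; then
            return 1    # service answered: no failover needed
        fi
        failures=$((failures + 1))
        sleep "$CHECK_INTERVAL"
    done
    return 0
}
```

A standby node could call `should_failover` from cron or a systemd timer and, when it returns 0, run the `ip addr add` and `arping` steps. Note that a failed service check does not prove the old host has released the IP, so the duplicate-IP gotcha still applies.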

3. Lightweight HA

  • keepalived (VRRP)
  • floating virtual IP
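
For a sense of scale, a keepalived setup for this pattern can be as small as the fragment below, placed in keepalived.conf on each node. This is a sketch, not a tested configuration: the instance name, `virtual_router_id`, and priority values are assumptions, and the second node would use a lower `priority`.

```
vrrp_instance FILE_SERVER {
    state BACKUP              # let priority decide which node becomes master
    interface enp2s0
    virtual_router_id 51      # must match on both nodes
    priority 100              # e.g. 100 on araman, 90 on eldamar
    advert_int 1              # VRRP advertisement interval in seconds
    virtual_ipaddress {
        172.17.20.97/24
    }
}
```

keepalived adds and removes the address itself and sends gratuitous ARP on takeover, which replaces the manual steps.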

🎯 Design Philosophy

Keep the system understandable, observable, and recoverable.

This approach is:

  • ✅ transparent
  • ✅ low dependency
  • ✅ easy to debug
  • ✅ easy to reverse

🧭 Relationship to dv-gate

  • dv-gate handles routing + NAT
  • the service IP floats within the LAN only
  • no impact on WAN or firewall rules

🧪 Final Thought

This isn’t enterprise HA.

It’s something better for a home lab:

A system you can reason about at 2am when it breaks.