Virtual Routing and Forwarding (VRF) ==================================== The VRF device combined with ip rules provides the ability to create virtual routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the Linux network stack. One use case is the multi-tenancy problem where each tenant has their own unique routing tables and in the very least need different default gateways. Processes can be "VRF aware" by binding a socket to the VRF device. Packets through the socket then use the routing table associated with the VRF device. An important feature of the VRF device implementation is that it impacts only Layer 3 and above so L2 tools (e.g., LLDP) are not affected (ie., they do not need to be run in each VRF). The design also allows the use of higher priority ip rules (Policy Based Routing, PBR) to take precedence over the VRF device rules directing specific traffic as desired. In addition, VRF devices allow VRFs to be nested within namespaces. For example network namespaces provide separation of network interfaces at L1 (Layer 1 separation), VLANs on the interfaces within a namespace provide L2 separation and then VRF devices provide L3 separation. Design ------ A VRF device is created with an associated route table. Network interfaces are then enslaved to a VRF device: +-----------------------------+ | vrf-blue | ===> route table 10 +-----------------------------+ | | | +------+ +------+ +-------------+ | eth1 | | eth2 | ... | bond1 | +------+ +------+ +-------------+ | | +------+ +------+ | eth8 | | eth9 | +------+ +------+ Packets received on an enslaved device and are switched to the VRF device using an rx_handler which gives the impression that packets flow through the VRF device. Similarly on egress routing rules are used to send packets to the VRF device driver before getting sent out the actual interface. This allows tcpdump on a VRF device to capture all packets into and out of the VRF as a whole.[1] Similiarly, netfilter [2] and tc rules can be applied using the VRF device to specify rules that apply to the VRF domain as a whole. [1] Packets in the forwarded state do not flow through the device, so those packets are not seen by tcpdump. Will revisit this limitation in a future release. [2] Iptables on ingress is limited to NF_INET_PRE_ROUTING only with skb->dev set to real ingress device and egress is limited to NF_INET_POST_ROUTING. Will revisit this limitation in a future release. Setup ----- 1. VRF device is created with an association to a FIB table. e.g, ip link add vrf-blue type vrf table 10 ip link set dev vrf-blue up 2. Rules are added that send lookups to the associated FIB table when the iif or oif is the VRF device. e.g., ip ru add oif vrf-blue table 10 ip ru add iif vrf-blue table 10 Set the default route for the table (and hence default route for the VRF). e.g, ip route add table 10 prohibit default 3. Enslave L3 interfaces to a VRF device. e.g, ip link set dev eth1 master vrf-blue Local and connected routes for enslaved devices are automatically moved to the table associated with VRF device. Any additional routes depending on the enslaved device will need to be reinserted following the enslavement. 4. Additional VRF routes are added to associated table. e.g., ip route add table 10 ... Applications ------------ Applications that are to work within a VRF need to bind their socket to the VRF device: setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1); or to specify the output device using cmsg and IP_PKTINFO. Limitations ----------- VRF device currently only works for IPv4. Support for IPv6 is under development. Index of original ingress interface is not available via cmsg. Will address soon.