Introduction to BGP EVPN with VXLAN
Why BGP EVPN over VXLAN is required.
VXLAN traditionally relies on a flood-and-learn (F&L) mechanism: multidestination traffic is flooded over VXLAN between VTEPs so that each VTEP learns the MAC addresses of the hosts located behind the other VTEPs, after which data traffic can be sent as unicast. Even after learning, this approach still leaves BUM (broadcast, unknown unicast, and multicast) traffic in a data center VXLAN environment.
VXLAN supports about 16 million VNIs, and mapping each VNI to its own multicast group is impractical to deploy.
Ingress replication is sometimes used in VXLAN instead. In that case every VTEP must be aware of every other VTEP that has membership in a given VNI. As a result, the source VTEP generates n copies of every multidestination frame, whether or not a given VTEP actually needs that frame.
With these and other issues in mind, BGP EVPN for VXLAN was introduced to address the flood-and-learn problems.
Some benefits of BGP EVPN over VXLAN are discussed here:
- Host placement anywhere, and mobility
- Optimal east-west traffic
- Segmentation of tenant Layer 2 and Layer 3 traffic
- Minimized flooding traffic
The BGP EVPN address family carries host MAC, IP, network, VRF, and VTEP information over MP-BGP. As soon as a VTEP learns of a host behind it, BGP EVPN distributes this information to all other BGP EVPN-speaking VTEPs. As long as the source VTEP continues to detect the host behind it, no further EVPN update message is sent out, so the other VTEPs do not need to “age out” any remote host reachability information.
Before going deeper, we should look at the BGP EVPN route types that are used to share the MAC, IP, and other information needed for host reachability.
- Type 1 – Ethernet Auto-Discovery (A-D) route
- Type 2 – MAC advertisement route for L2 VNI MAC/MAC-IP
- Type 3 – Inclusive Multicast Route for EVPN IR, Peer Discovery
- Type 4 – Ethernet Segment Route
- Type 5 – IP Prefix Route for L3 VNI Route
In VXLAN BGP EVPN, Route type 2 and Route type 5 are the ones mostly used.
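As a quick reference, the five route types listed above can be captured in a small Python enumeration. This is only an illustrative sketch: the class name, member names, and helper function are chosen here for readability and are not taken from any particular BGP implementation.

```python
from enum import IntEnum


class EvpnRouteType(IntEnum):
    """EVPN route types as listed in this article (names are illustrative)."""
    ETHERNET_AUTO_DISCOVERY = 1           # Type 1
    MAC_ADVERTISEMENT = 2                 # Type 2: MAC or MAC-IP for the L2 VNI
    INCLUSIVE_MULTICAST_ETHERNET_TAG = 3  # Type 3: ingress replication, peer discovery
    ETHERNET_SEGMENT = 4                  # Type 4
    IP_PREFIX = 5                         # Type 5: IP prefix for the L3 VNI


def carries_reachability(route_type):
    """True for the two types that carry host or prefix reachability."""
    return route_type in (EvpnRouteType.MAC_ADVERTISEMENT, EvpnRouteType.IP_PREFIX)


print(EvpnRouteType(2).name, carries_reachability(EvpnRouteType(2)))
print(EvpnRouteType(3).name, carries_reachability(EvpnRouteType(3)))
```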
Route type 2: Route type 2, the MAC Advertisement route, advertises MAC and ARP resolution information, either MAC only or MAC-IP.
Route type 2 has mandatory MAC Address and MAC Address Length fields and also defines the Layer 2 VNI for the VXLAN data plane. This NLRI also allows for the optional IP Address and IP Address Length fields. When Route type 2 carries bridging information, additional attributes such as the encapsulation type (Encap 8: VXLAN), the Route Target (RT), or the MAC mobility sequence are also sent.
If routing is to be performed as well, additional attributes or communities such as the Router MAC of the next hop, the Layer 3 VNI, and the RT are also sent.
Route type 3: This route is also called the “inclusive multicast Ethernet tag route” and is typically used to create the distribution list for ingress replication. A Route type 3 is generated and sent to all ingress replication-participating VTEPs as soon as a VNI is configured at the VTEP and is operational. In this way, every VTEP is aware of all the other remote VTEPs that need to be sent a copy of a BUM packet in a given VNI.
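The way Route type 3 advertisements turn into a per-VNI distribution list can be sketched in a few lines of Python. This is a simplified model, not a real BGP implementation: each received route is reduced to a hypothetical (VNI, originating VTEP) pair, and the VTEP addresses and VNI numbers are made up for illustration.

```python
from collections import defaultdict


def build_flood_lists(type3_routes, local_vtep):
    """Group originating VTEPs by VNI, skipping the local VTEP itself."""
    flood = defaultdict(set)
    for vni, originator in type3_routes:
        if originator != local_vtep:
            flood[vni].add(originator)
    return {vni: sorted(peers) for vni, peers in flood.items()}


# Hypothetical Route type 3 advertisements seen in the fabric.
routes = [
    (30001, "10.0.0.1"),
    (30001, "10.0.0.2"),
    (30001, "10.0.0.3"),
    (30002, "10.0.0.2"),
]

flood_lists = build_flood_lists(routes, local_vtep="10.0.0.1")

# One unicast copy of a BUM frame is sent per entry in the VNI's list.
for vni, peers in sorted(flood_lists.items()):
    print(vni, peers, "copies:", len(peers))
```

In this model, a BUM frame in VNI 30001 is replicated only to the VTEPs that announced that VNI, which is the distribution list the paragraph above describes.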
Route type 5: Route type 5, the IP Prefix route, is used to advertise IP prefixes only. It contains only the Layer 3 VNI needed for routing. The extended communities of Route type 5 carry the Route Target, the encapsulation type, and the router MAC of the next-hop VTEP in the overlay.
Host Detection & Subnet Route Distribution:
The underlay provides the reachability information from VTEP to VTEP, while the overlay control protocol, BGP EVPN, distributes end host information such as MAC addresses, IP addresses, or subnet prefixes, together with the associated location in the overlay. The VTEP is advertised as the next hop in all BGP EVPN prefix advertisements.
When an end host connected to a VTEP is detected, its MAC information is learned on the local switch, and the Layer 2 VLAN in which the MAC was learned is mapped to the Layer 2 VNI. At this stage, a BGP update containing an EVPN Route type 2 NLRI is created with the MAC address length (6 bytes), the MAC address itself, the Layer 2 VNI (label 1), and the respective Route Distinguisher and Route Target, derived from the configuration on the edge device itself. Encapsulation type 8 is also included so that all neighboring devices understand that the data plane encapsulation in use is VXLAN. A MAC-only Route type 2 is represented with a /216 prefix, while a MAC/IP Route type 2 is represented as a /272 prefix.
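The /216 and /272 figures can be reproduced with simple arithmetic. The sketch below is one possible accounting, not a statement of how any platform computes the value: it assumes the displayed length counts the route type and length octets plus every Route type 2 NLRI field except the Route Distinguisher, and that the MAC/IP case carries an IPv4 address and a second label for the Layer 3 VNI. The field names are chosen here for illustration.

```python
# Sizes in bytes of the fields assumed to be counted (assumption: the
# Route Distinguisher is displayed separately and is not included).
FIXED_FIELDS = {
    "route_type": 1,
    "length": 1,
    "ethernet_segment_id": 10,
    "ethernet_tag_id": 4,
    "mac_address_length": 1,
    "mac_address": 6,
    "ip_address_length": 1,
    "label1_l2vni": 3,
}


def type2_prefix_bits(ip_bytes=0, with_l3vni=False):
    """Return the prefix length in bits under the accounting above."""
    total_bytes = sum(FIXED_FIELDS.values()) + ip_bytes
    if with_l3vni:
        total_bytes += 3  # label 2 carrying the Layer 3 VNI
    return total_bytes * 8


mac_only_bits = type2_prefix_bits()
mac_ipv4_bits = type2_prefix_bits(ip_bytes=4, with_l3vni=True)
print(mac_only_bits, mac_ipv4_bits)
```

Under these assumptions the MAC-only case adds up to 27 bytes (216 bits), and adding a 4-byte IPv4 address plus a 3-byte second label gives 34 bytes (272 bits).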
Once the edge device receives an ARP request from a directly attached end host, the IP-MAC binding of the end host is learned at that edge device. The Layer 3 interface on which the ARP request is received provides the associated context information and implicitly maps to the tenant VRF associated with that Layer 3 interface.
At this stage, all the related Layer 3 end host information can be populated into the Route type 2 NLRI in the BGP EVPN message. This includes the IP address length, the IP address, the Layer 3 VNI (label 2), and the Route Distinguisher and Route Target derived via the configuration. In addition, the Router MAC (RMAC) associated with the sourcing VTEP or edge device is also added as an extended community. This provides the neighboring edge devices the mapping information about the sourcing VTEP’s IP address (Layer 3) and the associated MAC address (Layer 2).
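The progression from a MAC-only advertisement to a MAC-IP advertisement can be modeled with a small data structure. This is a minimal sketch, not a wire-format encoder: the class, field names, helper function, and all example values (RD, Route Targets, addresses, VNIs) are hypothetical and chosen only to mirror the fields named in the text.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

VXLAN_ENCAP = 8  # encapsulation type 8 signals VXLAN


@dataclass(frozen=True)
class Type2Route:
    rd: str                          # Route Distinguisher
    route_targets: Tuple[str, ...]   # Route Target extended communities
    next_hop: str                    # advertising VTEP
    mac: str
    l2vni: int                       # label 1
    encap: int = VXLAN_ENCAP
    ip: Optional[str] = None         # optional host IP
    l3vni: Optional[int] = None      # label 2
    router_mac: Optional[str] = None # RMAC extended community


def with_host_ip(route, ip, l3vni, l3_route_target, router_mac):
    """Return a MAC-IP route built from a MAC-only route after ARP learning."""
    return replace(
        route,
        ip=ip,
        l3vni=l3vni,
        router_mac=router_mac,
        route_targets=route.route_targets + (l3_route_target,),
    )


# Step 1: MAC learned on the local switch (MAC-only advertisement).
mac_only = Type2Route(
    rd="10.0.0.1:32777",
    route_targets=("65000:30001",),
    next_hop="10.0.0.1",
    mac="0050.7966.6801",
    l2vni=30001,
)

# Step 2: ARP request seen, so the Layer 3 information is added.
mac_ip = with_host_ip(
    mac_only,
    ip="192.168.10.11",
    l3vni=50001,
    l3_route_target="65000:50001",
    router_mac="5254.0012.3456",
)
print(mac_only)
print(mac_ip)
```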
The known information is sent via a BGP update to the route reflector (iBGP), which in turn forwards this update message to all its BGP peers. On receiving this update message, all the edge devices add the information about the newly learned remote end hosts to their respective local bridging/routing tables. At this time, MAC addresses matching the Layer 2 VNI and the corresponding import Route Targets are populated in the hardware tables for MAC learning purposes.
A similar procedure is used for populating the host IP prefixes into the hardware routing table, called the Forwarding Information Base (FIB). Specifically, the Layer 3 VNI, along with the associated import Route Targets, identifies the tenant VRF or VPN in which the /32 or /128 host IP prefix must be installed. Depending on the hardware capability of the edge device, the FIB may be divided into a host route table (HRT), which stores exact /32 (IPv4) or /128 (IPv6) host prefixes, and a Longest Prefix Match (LPM) table, which typically stores variable-length IP prefix information.
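The HRT/LPM split can be illustrated with a small software model. This is only a sketch of the idea using Python's standard `ipaddress` module; real hardware tables work very differently, and the function names, table layout, and addresses below are assumptions made for the example.

```python
import ipaddress


def table_for(prefix):
    """Exact host prefixes (/32 or /128) go to the HRT, everything else to LPM."""
    net = ipaddress.ip_network(prefix)
    return "HRT" if net.prefixlen == net.max_prefixlen else "LPM"


def install(fib, prefix, next_hop):
    """Install a prefix into the table chosen by table_for()."""
    fib[table_for(prefix)][ipaddress.ip_network(prefix)] = next_hop


def lookup(fib, destination):
    """Try an exact host match first, then fall back to the longest prefix."""
    host = ipaddress.ip_network(destination)  # becomes a /32 or /128 network
    if host in fib["HRT"]:
        return fib["HRT"][host]
    addr = ipaddress.ip_address(destination)
    candidates = [net for net in fib["LPM"] if addr in net]
    if not candidates:
        return None
    return fib["LPM"][max(candidates, key=lambda net: net.prefixlen)]


fib = {"HRT": {}, "LPM": {}}
install(fib, "192.168.10.11/32", "10.0.0.2")  # host route learned via Route type 2
install(fib, "192.168.10.0/24", "10.0.0.3")   # subnet route
install(fib, "192.168.0.0/16", "10.0.0.4")    # less specific route

print(lookup(fib, "192.168.10.11"))  # exact host entry
print(lookup(fib, "192.168.10.99"))  # falls back to the /24
print(lookup(fib, "192.168.20.1"))   # falls back to the /16
```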
In addition to host bridging and routing capability, BGP EVPN also provides the classic semantics of IP prefix-based routing. IP prefixes are learned and redistributed by the sourcing edge devices into the BGP EVPN control protocol. To perform this operation, a specific EVPN NLRI format has been introduced (Route type 5). With Route type 5 messages, an edge device populates the IP prefix length and IP prefix fields corresponding to the route being advertised, along with the Layer 3 VNI associated with the VRF context to which the route belongs.
Once a Route type 5 message is received by an edge device or a VTEP, the IP prefix route carried in the update is imported into the routing tables only if the Layer 3 VNI and Route Target configured locally at the receiving VTEP match the ones carried in the message. If this match exists, the IP route is installed into the FIB table, typically as an LPM entry.
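The import check described above can be sketched as a simple filter. This is a minimal model under stated assumptions: the `Type5Route` and `Vrf` classes, their fields, and the example values are hypothetical, and a real implementation evaluates import policy in far more detail.

```python
from dataclasses import dataclass, field
import ipaddress


@dataclass(frozen=True)
class Type5Route:
    prefix: str
    l3vni: int
    route_targets: frozenset
    next_hop: str  # advertising VTEP


@dataclass
class Vrf:
    name: str
    l3vni: int
    import_rts: frozenset
    routes: dict = field(default_factory=dict)  # network -> next hop


def import_type5(vrf, route):
    """Install the prefix only if both the L3 VNI and a Route Target match."""
    if route.l3vni != vrf.l3vni:
        return False
    if not (route.route_targets & vrf.import_rts):
        return False
    vrf.routes[ipaddress.ip_network(route.prefix)] = route.next_hop
    return True


tenant_a = Vrf(name="tenant-a", l3vni=50001, import_rts=frozenset({"65000:50001"}))

matching = Type5Route("172.16.0.0/16", 50001, frozenset({"65000:50001"}), "10.0.0.5")
wrong_rt = Type5Route("172.17.0.0/16", 50001, frozenset({"65000:50002"}), "10.0.0.5")
wrong_vni = Type5Route("172.18.0.0/16", 50002, frozenset({"65000:50001"}), "10.0.0.5")

for route in (matching, wrong_rt, wrong_vni):
    print(route.prefix, "imported" if import_type5(tenant_a, route) else "rejected")
```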
There are two major use cases in which Route type 5 is utilized for carrying the IP prefix routing information in a VXLAN-based BGP EVPN network.
The first involves advertising the IP subnet prefix information from every edge device performing the first-hop routing service. In this case, also referred to as Integrated Routing and Bridging (IRB), the edge device performs the default gateway service with the distributed IP anycast gateway.
As long as the gateway is configured, the IP subnet prefix is advertised via Route type 5. This helps in the discovery of undiscovered, or “silent,” end hosts because the subnet prefix attracts routed traffic to the edge device below which the undiscovered host resides. This edge device generates an ARP request to discover the silent host, which in turn sends out an ARP response. After discovery, the IP-MAC binding of the end host is learned at the directly attached edge device (via ARP snooping) and advertised into BGP EVPN using Route type 2.
The second and probably more common use case for injecting IP prefix routes into the BGP EVPN fabric involves external routing. Routes are learned via peering established using eBGP (or other unicast routing protocols) and redistributed into the BGP EVPN address family using appropriate route map policies.
As part of this redistribution process, this information is advertised in the same routing context (VRF) in which it was learned.