VXLAN Packet Forwarding

Posted on Jan 09, 2020

This topic explains how VXLAN forwards packets between hosts in the same VLAN, mapped to the same VXLAN segment (VXLAN bridging), and between hosts in different VLANs, mapped to different VXLAN segments (VXLAN routing, or inter-VLAN routing).

We break the packet forwarding down into three stages:

  • ARP request
  • ARP reply
  • Actual data traffic

For Layer 2 broadcast, unknown unicast, and multicast (BUM) traffic, VXLAN on Cisco Nexus 9000 Series switches does the following:

  • Transports broadcast, unknown unicast, and multicast traffic
  • Discovers the IP addresses of remote VTEPs
  • Learns remote host MAC addresses and captures the MAC-to-VTEP mappings for each VXLAN segment

For BUM traffic, IP multicast is used to limit flooding to the set of hosts that participate in the VXLAN segment. The VNID of each VXLAN segment is mapped to an IP multicast group in the IP transport network. Each VTEP device is configured independently and joins this multicast group as an IP host using IGMP. The IGMP join triggers PIM joins and signaling for that multicast group through the transport network, which builds the multicast distribution tree for the group.
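The VNI-to-group mapping and the group membership can be pictured with a small Python sketch. This is a toy model under stated assumptions, not a description of how a Nexus switch implements IGMP or PIM: the `Underlay` class and `configure_vtep` function are hypothetical names, and the VNI, group, and VTEP addresses are the ones used in the walkthrough in this article.

```python
from collections import defaultdict

# VNI-to-group mapping. 50140 -> 239.0.0.140 is the mapping used in this article.
VNI_TO_GROUP = {50140: "239.0.0.140"}


class Underlay:
    """Toy stand-in for the multicast-enabled transport network (no real IGMP/PIM)."""

    def __init__(self):
        self.members = defaultdict(set)  # multicast group -> set of VTEP IPs

    def igmp_join(self, vtep_ip, group):
        # In a real network this join would trigger PIM signaling upstream.
        self.members[group].add(vtep_ip)

    def receivers(self, sender_ip, group):
        # Every member of the group except the sender receives the packet.
        return self.members[group] - {sender_ip}


def configure_vtep(underlay, vtep_ip, vnis):
    """Each VTEP joins the group of every VNI that is configured on it."""
    for vni in vnis:
        underlay.igmp_join(vtep_ip, VNI_TO_GROUP[vni])


underlay = Underlay()
for vtep_ip in ("192.168.0.8", "192.168.0.9", "192.168.0.10", "192.168.0.11"):
    configure_vtep(underlay, vtep_ip, [50140])

# BUM traffic sent by the VTEP 192.168.0.8 into VNI 50140 reaches the other members.
bum_receivers = underlay.receivers("192.168.0.8", VNI_TO_GROUP[50140])
print(sorted(bum_receivers))
```

Only VTEPs that joined the group for a VNI receive its BUM traffic, which is what keeps flooding scoped to the segment.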

VLAN – VXLAN Bridging packet flow:

For the VLAN-to-VXLAN bridging packet flow, we use the following topology throughout this section to explain VXLAN forwarding and configuration.

Here Server-1 wants to talk to Server-2, which is in the same VLAN 140. VLAN 140 is mapped to VXLAN VNI 50140 and multicast group 239.0.0.140.

Broadcast: ARP Request

Following are the steps for the ARP request:

  • Server-1 wants to start communicating with Server-2. Because Server-2 is in the same subnet as Server-1, Server-1 sends out an ARP request for Server-2 with the DMAC set to the broadcast address (ff:ff:ff:ff:ff:ff) and the SMAC set to its own address, 00:00:00:00:00:0a.
  • Leaf-1 receives the frame on port Eth1/3 in VLAN 140, learns the MAC address of Server-1 on that port, and adds the entry to its MAC address table. It associates the frame with VNI 50140 and performs a Layer 2 lookup based on (VNI=50140, DMAC=ff:ff:ff:ff:ff:ff). Because this is a lookup miss in the Layer 2 table, the packet is handed off to the VTEP on Leaf-1. The VTEP encapsulates the packet with a VXLAN header, with the SIP set to 192.168.0.8, the DIP set to 239.0.0.140, and the VNI in the VXLAN header set to 50140. The UDP source port is generated from a hash of the original packet received from Server-1. The UDP destination port is set to the well-known VXLAN port. This encapsulated multicast IP packet is then forwarded toward the upstream switch.
  • The upstream spine switches forward the packet based on the outer IP header. Because the DIP is a multicast address (239.0.0.140), the packet is forwarded using Layer 3 multicast.
  • Regular multicast forwarding delivers the packet to the VTEPs on Leaf-2, Leaf-3, and Leaf-4. Recall that the Leaf-3 VTEP is an interested receiver for multicast group 239.0.0.140 because it sent an IGMP join when Server-2 powered on.
  • The VTEPs on Leaf-2, Leaf-3, and Leaf-4 receive the VXLAN packet and decapsulate it. The well-known UDP destination port identifies the packet as VXLAN. After decapsulation, the VTEPs know from the VXLAN header that the packet belongs to VNI 50140 and first perform Layer 2 MAC learning, that is, (50140, 00:00:00:00:00:0a) -> 192.168.0.8 (the SIP in the outer header). Each leaf then performs a regular Layer 2 lookup on the inner packet, based on the key (50140, ff:ff:ff:ff:ff:ff), and floods it into the segment as a regular Ethernet broadcast frame, which is how Leaf-3 delivers it to Server-2. A frame with a broadcast MAC is sent to all end hosts within the segment, and hosts that are not the ARP target discard it. Because they perform VLAN-to-VXLAN encapsulation and decapsulation, Leaf-1 and the receiving leaf switches also act as VXLAN gateways.
  • In this way, Server-2 receives the broadcast ARP request from Server-1. Because the ARP request is for 172.21.140.11, which is the IP address of Server-2, only Server-2 responds to it.
  • In this way, the VTEPs learn about the remote MAC address 00:00:00:00:00:0a with the help of IP multicast forwarding. The upstream switches that form the IP core network are completely unaware of the end host MAC addresses and forward the packet based only on the outer IP header. The next section describes the packet flow for the unicast ARP response.
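The encapsulation step on Leaf-1 can be sketched in Python as follows. This is a minimal illustration, assuming the header layout from RFC 7348 (an 8-byte VXLAN header with the I flag set and a 24-bit VNI, carried over UDP destination port 4789). The function names are hypothetical, and the CRC32-based source port is only one possible hash, not what any particular switch uses.

```python
import struct
import zlib

VXLAN_UDP_PORT = 4789  # IANA-assigned destination port for VXLAN (RFC 7348)


def vxlan_header(vni):
    """Build the 8-byte VXLAN header: flags, 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # I flag set: the VNI field is valid
    return struct.pack("!II", flags << 24, vni << 8)


def parse_vni(header):
    """Extract the VNI from an 8-byte VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", header)
    return vni_word >> 8


def udp_source_port(inner_frame):
    """Derive the UDP source port from a hash of the inner frame.

    CRC32 folded into the range 49152-65535 is an illustrative choice only.
    """
    return 49152 + zlib.crc32(inner_frame) % 16384


def encapsulate(inner_frame, vni, sip, dip):
    """Return the outer header fields plus the VXLAN payload (header + inner frame)."""
    return {
        "outer_sip": sip,
        "outer_dip": dip,
        "udp_sport": udp_source_port(inner_frame),
        "udp_dport": VXLAN_UDP_PORT,
        "payload": vxlan_header(vni) + inner_frame,
    }


# Ethernet header of the ARP request from Server-1: broadcast DMAC,
# SMAC 00:00:00:00:00:0a, EtherType 0x0806. The ARP body is omitted here.
arp_frame = bytes.fromhex("ffffffffffff" "00000000000a" "0806")
packet = encapsulate(arp_frame, 50140, "192.168.0.8", "239.0.0.140")
print(packet["payload"][:8].hex(), parse_vni(packet["payload"][:8]))
```

A receiving VTEP recognizes the packet by its UDP destination port, reads the VNI with something like `parse_vni`, and then works on the inner frame that follows the header.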

Unicast: ARP Reply

Following are the steps for the ARP response:

  • The ARP response from Server-2 is a unicast packet with SMAC=00:00:00:00:00:0b and DMAC=00:00:00:00:00:0a. Leaf-3 updates its MAC address table with 00:00:00:00:00:0b on port Eth1/3 in VLAN 140.
  • Leaf-3 performs a Layer 2 lookup based on (VNI=50140, DMAC=00:00:00:00:00:0a). Because this entry was learned on Leaf-3 from the earlier incoming ARP request, the lookup results in a hit. Consequently, the VTEP on Leaf-3 encapsulates the packet with a VXLAN header. The important fields set by the VTEP are as follows: the DIP is set to 192.168.0.8, the SIP is set to 192.168.0.10, and the VNI is set to 50140. Because the binding of 00:00:00:00:00:0a to its VTEP is known, the encapsulated packet is a unicast IP packet. Here Leaf-3 also acts as a VXLAN gateway, because it performs VLAN-to-VXLAN encapsulation and decapsulation.
  • This encapsulated unicast IP packet is now forwarded toward the upstream switch. Again, the upstream switch looks only at the outer IP header and forwards the packet toward the VTEP on Leaf-1.
  • Leaf-1 receives the VXLAN packet, whose DIP is its own VTEP address, and decapsulates it. After decapsulation, the VTEP performs MAC learning as usual based on the inner packet SMAC and the outer header SIP, namely (50140, 00:00:00:00:00:0b) -> 192.168.0.10. Leaf-1 then performs a regular Layer 2 lookup on the inner packet, based on the key (50140, 00:00:00:00:00:0a), and forwards the packet to Server-1.
  • Server-1 and Server-2 now have the IP-to-MAC binding for each other and can start sending data traffic to each other. The corresponding VTEPs also know the MAC-to-VTEP bindings, so packets between Server-1 and Server-2 can be forwarded with regular unicast IP forwarding.
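The flood-and-learn behavior of the ARP request and reply can be replayed with a small table model. This is a sketch under assumptions: `FloodAndLearnVtep` and its methods are hypothetical names, and the Layer 2 table is a plain dictionary keyed by (VNI, MAC) as in the steps above.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"
GROUPS = {50140: "239.0.0.140"}


class FloodAndLearnVtep:
    def __init__(self, vtep_ip, vni_to_group):
        self.vtep_ip = vtep_ip
        self.vni_to_group = vni_to_group
        self.l2_table = {}  # (vni, mac) -> ("local", port) or ("remote", vtep_ip)

    def learn_local(self, vni, smac, port):
        """Learn a host MAC from a frame received on a local port."""
        self.l2_table[(vni, smac)] = ("local", port)

    def learn_remote(self, vni, inner_smac, outer_sip):
        """Learn a host MAC from a decapsulated packet: inner SMAC -> outer SIP."""
        self.l2_table[(vni, inner_smac)] = ("remote", outer_sip)

    def lookup(self, vni, dmac):
        """Decide how to forward a frame that arrived from a local host."""
        entry = self.l2_table.get((vni, dmac))
        if dmac == BROADCAST or entry is None:
            # Broadcast or unknown unicast: send to the multicast group of the VNI.
            return ("multicast", self.vni_to_group[vni])
        kind, where = entry
        return ("unicast", where) if kind == "remote" else ("local", where)


leaf1 = FloodAndLearnVtep("192.168.0.8", GROUPS)
leaf3 = FloodAndLearnVtep("192.168.0.10", GROUPS)
MAC_A, MAC_B = "00:00:00:00:00:0a", "00:00:00:00:00:0b"

# ARP request: Leaf-1 learns Server-1 locally and floods to the group.
leaf1.learn_local(50140, MAC_A, "Eth1/3")
request_path = leaf1.lookup(50140, BROADCAST)
leaf3.learn_remote(50140, MAC_A, leaf1.vtep_ip)

# ARP reply: Leaf-3 learns Server-2 locally and now has a unicast hit.
leaf3.learn_local(50140, MAC_B, "Eth1/3")
reply_path = leaf3.lookup(50140, MAC_A)
leaf1.learn_remote(50140, MAC_B, leaf3.vtep_ip)

# Data traffic from Server-1 to Server-2 is unicast from the first packet.
data_path = leaf1.lookup(50140, MAC_B)
print(request_path, reply_path, data_path)
```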

Unicast: Data

Please refer to the following diagram.

Following are the steps for a unicast data packet flow from Server -1 to Server -2:

  • Server-1 now transmits an IP packet with SIP=172.21.140.10 and DIP=172.21.140.11 in a standard Ethernet frame with SMAC=00:00:00:00:00:0a and DMAC=00:00:00:00:00:0b.
  • The VTEP on Leaf-1 gets this packet and performs a Layer 2 lookup based on (VNI=50140, DMAC=00:00:00:00:00:0b). Because this entry was learned on Leaf-1 from the earlier incoming ARP response, the lookup results in a hit that points toward VTEP 192.168.0.10. Consequently, the VTEP on Leaf-1 encapsulates the packet with a VXLAN header. The important fields set by the VTEP are as follows: the DIP is set to 192.168.0.10, the SIP is set to 192.168.0.8, and the VNI is set to 50140. Because the binding of 00:00:00:00:00:0b to its VTEP is known, the encapsulated packet is a unicast IP packet.
  • The VXLAN-encapsulated frame is sent to the upstream switch toward Leaf-3. If there are multiple paths from 192.168.0.8 to 192.168.0.10, standard ECMP logic selects one of them. The leaf may select a path based on inner frame fields such as the SMAC. This applies to all unicast forwarding cases, including the ARP reply case in the previous section.
  • Leaf-3 receives the VXLAN packet with DIP=192.168.0.10 and decapsulates it. After decapsulation, the VTEP refreshes the Layer 2 table entry (50140, 00:00:00:00:00:0a) -> 192.168.0.8 and then, based on the inner payload lookup (50140, 00:00:00:00:00:0b), sends the packet toward Server-2.
  • Reverse traffic from Server-2 to Server-1 is unicast forwarded in the same way.
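The ECMP choice between the spines can be sketched as a hash over the flow fields. This is an illustration only: the hash inputs, the CRC32 function, and the `ecmp_next_hop` name are assumptions, and real switches use their own hardware hash. Inner-frame entropy enters through the UDP source port, which the ingress VTEP derives from a hash of the original packet.

```python
import zlib

UPLINKS = ["Spine-1", "Spine-2"]


def ecmp_next_hop(outer_sip, outer_dip, udp_sport, udp_dport=4789, uplinks=UPLINKS):
    """Pick one equal-cost uplink from a hash of the outer flow fields."""
    flow_key = f"{outer_sip}|{outer_dip}|{udp_sport}|{udp_dport}".encode()
    return uplinks[zlib.crc32(flow_key) % len(uplinks)]


# The same flow always hashes to the same spine, so its packets stay in order.
first = ecmp_next_hop("192.168.0.8", "192.168.0.10", 51234)
second = ecmp_next_hop("192.168.0.8", "192.168.0.10", 51234)
print(first, second)
```

Different inner flows between the same pair of VTEPs get different UDP source ports, so they can be spread across both spines.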

Inter-VLAN Routing over VXLAN packet flow:

We use the same topology, with all SVIs configured on the Nexus core switch. There are two SVIs: VLAN 140 (172.21.140.0/24) and VLAN 141 (172.21.141.0/24). The Nexus core switch is connected to Leaf-2 and Leaf-4 via trunk ports. OSPF runs between Leaf-1, Leaf-2, Leaf-3, Leaf-4, Spine-1, and Spine-2, and the infrastructure supports multicast. Server-1 is in VLAN 140 with IP address 172.21.140.10, and Server-2 is in VLAN 141 with IP address 172.21.141.11.

VLAN 140 is mapped to VNI 50140 with multicast group 239.0.0.140, and VLAN 141 is mapped to VNI 50141 with multicast group 239.0.0.141.

Broadcast: ARP Request

Following are the steps for the ARP request:

  • Server-1 wants to start communicating with Server-2. Because Server-2 is in a different subnet, Server-1 sends out an ARP request for its gateway, the VLAN 140 SVI at 172.21.140.1, with the DMAC set to the broadcast address (ff:ff:ff:ff:ff:ff) and the SMAC set to its own address, 00:00:00:00:00:0a.
  • Leaf-1 receives the frame on port Eth1/3 in VLAN 140, learns the MAC address of Server-1 on that port, and adds the entry to its MAC address table. It associates the frame with VNI 50140 and performs a Layer 2 lookup based on (VNI=50140, DMAC=ff:ff:ff:ff:ff:ff). Because this is a lookup miss in the Layer 2 table, the packet is handed off to the VTEP on Leaf-1. The VTEP encapsulates the packet with a VXLAN header, with the SIP set to 192.168.0.8, the DIP set to 239.0.0.140, and the VNI in the VXLAN header set to 50140. The UDP source port is generated from a hash of the original packet received from Server-1. The UDP destination port is set to the well-known VXLAN port. This encapsulated multicast IP packet is then forwarded toward the upstream switch.
  • The upstream spine switches forward the packet based on the outer IP header. Because the DIP is a multicast address (239.0.0.140), the packet is forwarded using Layer 3 multicast.
  • Regular multicast forwarding carries the packet through Spine-1 and Spine-2 to the VTEPs on Leaf-2, Leaf-3, and Leaf-4. Leaf-2 and Leaf-4 each connect to the core switch over a trunk port, so they decapsulate the VXLAN packet into a regular Ethernet broadcast frame and flood it out of the trunk port. While doing so, Leaf-2 and Leaf-4 learn (50140, 00:00:00:00:00:0a) -> 192.168.0.8. Likewise, Leaf-3 learns (50140, 00:00:00:00:00:0a) -> 192.168.0.8 and discards the packet, because no host with address 172.21.140.1 is connected to Leaf-3.
  • When the core switch receives the ARP request, it learns the MAC address of Server-1, 00:00:00:00:00:0a, on its trunk port and proceeds to answer with an ARP reply.
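Whether a host sends its ARP request for the destination itself or for its gateway depends only on a subnet check. A minimal sketch of that decision, using the addresses from this article (the `arp_target` function name is hypothetical):

```python
import ipaddress


def arp_target(source_interface, destination_ip, gateway_ip):
    """Return the IP address the host must resolve with ARP before sending."""
    network = ipaddress.ip_interface(source_interface).network
    if ipaddress.ip_address(destination_ip) in network:
        return destination_ip  # same subnet: resolve the destination directly
    return gateway_ip  # different subnet: resolve the default gateway


# Bridging case: Server-2 is at 172.21.140.11, in the same subnet as Server-1.
same_subnet = arp_target("172.21.140.10/24", "172.21.140.11", "172.21.140.1")

# Routing case: Server-2 is at 172.21.141.11, in a different subnet.
other_subnet = arp_target("172.21.140.10/24", "172.21.141.11", "172.21.140.1")
print(same_subnet, other_subnet)
```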

Unicast: ARP Reply

Following are the steps for the ARP response:

  • The ARP response from the gateway 172.21.140.1 is a unicast packet with SMAC=00:00:00:00:00:0C and DMAC=00:00:00:00:00:0a.
  • As soon as Leaf-2 receives this ARP reply, it performs a Layer 2 lookup based on (VNI=50140, DMAC=00:00:00:00:00:0a). Because this entry was learned on Leaf-2 from the earlier incoming ARP request, the lookup results in a hit. Consequently, the VTEP on Leaf-2 encapsulates the packet with a VXLAN header. The important fields set by the VTEP are as follows: the DIP is set to 192.168.0.8, the SIP is set to 192.168.0.9, and the VNI is set to 50140. Because the binding of 00:00:00:00:00:0a to its VTEP is known, the encapsulated packet is a unicast IP packet. Here Leaf-2 also acts as a VXLAN gateway, because it performs VLAN-to-VXLAN encapsulation and decapsulation.
  • This encapsulated unicast IP packet is now forwarded toward the upstream switch. Again, the upstream switch looks only at the outer IP header and forwards the packet toward the VTEP on Leaf-1.
  • Leaf-1 receives the VXLAN packet, whose DIP is its own VTEP address, and decapsulates it. After decapsulation, the VTEP performs MAC learning as usual based on the inner packet SMAC and the outer header SIP, namely (50140, 00:00:00:00:00:0C) -> 192.168.0.9. Leaf-1 then performs a regular Layer 2 lookup on the inner packet, based on the key (50140, 00:00:00:00:00:0a), and forwards the packet to Server-1.
  • Server-1 now has the IP-to-MAC binding for its gateway and can start sending data traffic.

Unicast: Data

Following are the steps for a unicast data packet flow from Server -1 to Server -2:

  • Server-1 now transmits an IP packet with SIP=172.21.140.10 and DIP=172.21.141.11 in a standard Ethernet frame with SMAC=00:00:00:00:00:0a and DMAC=00:00:00:00:00:0C, which is the MAC address of the gateway.
  • The VTEP on Leaf-1 gets this packet and performs a Layer 2 lookup based on (VNI=50140, DMAC=00:00:00:00:00:0C). Because this entry was learned on Leaf-1 from the earlier incoming ARP response, the lookup results in a hit that points toward VTEP 192.168.0.9. Consequently, the VTEP on Leaf-1 encapsulates the packet with a VXLAN header. The important fields set by the VTEP are as follows: the DIP is set to 192.168.0.9, the SIP is set to 192.168.0.8, and the VNI is set to 50140. Because the binding of 00:00:00:00:00:0C to its VTEP is known, the encapsulated packet is a unicast IP packet.
  • The VXLAN-encapsulated frame is sent to the upstream switch toward Leaf-2. If there are multiple paths from 192.168.0.8 to 192.168.0.9, standard ECMP logic selects one of them. The leaf may select a path based on inner frame fields such as the SMAC. This applies to all unicast forwarding cases, including the ARP reply case in the previous section.
  • Leaf-2 receives the VXLAN packet with DIP=192.168.0.9 and decapsulates it. After decapsulation, the VTEP refreshes the Layer 2 table entry (50140, 00:00:00:00:00:0a) -> 192.168.0.8 and then, based on the inner payload lookup, sends the frame out of the trunk port to the core switch, where SVI 140 resides. Here Leaf-2 acts as a VXLAN gateway.
  • As soon as the core switch receives the Ethernet frame, it examines the DIP of the IP packet, 172.21.141.11, and finds that the subnet is locally connected through SVI 141 (172.21.141.1/24), so the packet is handed from SVI 140 to SVI 141. SVI 141 looks up the MAC address of Server-2 (172.21.141.11), does not find it, and sends an ARP request for Server-2 over the VXLAN overlay in the same way as described in the Broadcast: ARP Request section, this time in VLAN 141 (VNI 50141, multicast group 239.0.0.141). When the VXLAN-encapsulated ARP request reaches Leaf-3, Leaf-3 learns the MAC address of the SVI 141 gateway, 00:00:00:00:00:0d, as reachable via the VTEP that encapsulated it, 192.168.0.11 (Leaf-4) in this example.
  • Leaf-3 decapsulates the VXLAN packet and forwards the ARP request to Server-2. Server-2 answers with a unicast ARP reply that carries SMAC 00:00:00:00:00:0b (its own MAC address) and DMAC 00:00:00:00:00:0d. On receiving the ARP reply, Leaf-3 learns the MAC address of Server-2 in its MAC address table, looks up (50141, 00:00:00:00:00:0d), and finds that it is reachable via the Leaf-4 VTEP, 192.168.0.11. Leaf-3 then encapsulates the reply in a VXLAN packet with SIP 192.168.0.10 and DIP 192.168.0.11.
  • When Leaf-4 receives the VXLAN packet from Leaf-3, it decapsulates it, learns (50141, 00:00:00:00:00:0b) -> 192.168.0.10, and forwards the plain Ethernet frame to the core switch, because the frame has DMAC 00:00:00:00:00:0d.
  • On receiving the ARP reply, the core switch learns that 00:00:00:00:00:0b is reachable via port Eth1/2. SVI 141 now knows the MAC address of Server-2 and sends the routed data packet over the trunk port with SMAC 00:00:00:00:00:0d (a routed frame carries the gateway MAC as its source), DMAC 00:00:00:00:00:0b, SIP 172.21.140.10, and DIP 172.21.141.11. Leaf-4 receives the Ethernet frame, looks up (50141, 00:00:00:00:00:0b), and gets a hit pointing to Leaf-3, 192.168.0.10. Leaf-4 encapsulates the frame in VXLAN with SIP 192.168.0.11 and DIP 192.168.0.10 and sends it as unicast. As soon as Leaf-3 receives this packet, it refreshes (50141, 00:00:00:00:00:0d) -> 192.168.0.11, decapsulates the packet, and sends it out of port Eth1/3, where Server-2 is connected.

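Server-1 and Server-2 can now exchange traffic. Each host knows the MAC address of its gateway, the core switch knows the MAC addresses of both servers, and every leaf knows the VTEP behind which each of these MAC addresses sits. Because the two servers are in different subnets, subsequent packets are unicast forwarded from Leaf-1 to the gateway, routed between SVI 140 and SVI 141 on the core switch, and unicast forwarded to Leaf-3 without any further ARP flooding.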
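The decision the core switch makes for each routed packet can be sketched as follows. This is a simplified model, not Nexus behavior in detail: the `CoreSwitch` class and its method names are hypothetical, and only the SVI selection and the ARP cache check are modeled.

```python
import ipaddress


class CoreSwitch:
    def __init__(self, svis):
        # SVI name -> interface address, for example "SVI 141" -> 172.21.141.1/24
        self.svis = {name: ipaddress.ip_interface(addr) for name, addr in svis.items()}
        self.arp_cache = {}  # host IP -> host MAC

    def egress_svi(self, dip):
        """Find the SVI whose connected subnet contains the destination IP."""
        address = ipaddress.ip_address(dip)
        for name, interface in self.svis.items():
            if address in interface.network:
                return name
        return None

    def next_step(self, dip):
        """Return (action, svi, mac) for a packet that has to be routed to dip."""
        svi = self.egress_svi(dip)
        if svi is None:
            return ("drop", None, None)  # no route in this toy model
        mac = self.arp_cache.get(dip)
        if mac is None:
            return ("arp_request", svi, None)  # resolve the host first
        return ("forward", svi, mac)


core = CoreSwitch({"SVI 140": "172.21.140.1/24", "SVI 141": "172.21.141.1/24"})

# First packet toward Server-2: the MAC is unknown, so SVI 141 must send an ARP request.
before_arp = core.next_step("172.21.141.11")

# After the ARP reply from Server-2 arrives, the packet can be forwarded.
core.arp_cache["172.21.141.11"] = "00:00:00:00:00:0b"
after_arp = core.next_step("172.21.141.11")
print(before_arp, after_arp)
```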

Inter-VXLAN Communication

With VXLAN, it is desirable to support inter-VXLAN communication, especially for scenarios in which a tenant may lease multiple virtual networks and require communication between end hosts in these networks. However, the end hosts of the tenant must still be shielded from the end hosts of other tenants to provide the necessary isolation and security. This can be achieved with tenant-specific virtual routing and forwarding instances (VRFs). Just as a router enables inter-VLAN communication, a similar VXLAN router device is required to facilitate inter-VXLAN communication. Sometimes, to distinguish the functionality explicitly, a VXLAN gateway is called a Layer 2 VXLAN gateway (described in the previous section) and a VXLAN router is called a Layer 3 VXLAN gateway.

Such a device would need to perform the following; recall that SIP and DIP refer to source and destination IP addresses, respectively:

  • Decapsulate an incoming VXLAN packet, and extract the source_VNI and the inner payload.
  • Perform Layer 2 learning based on (source_VNI, Inner_SMAC) -> Outer_SIP in the VXLAN header.
  • Perform a (source_VNI, Inner_DMAC) lookup on the inner payload that in turn can drive a Layer 3 lookup based on a router MAC (RMAC) match.
  • The source_VNI would yield the VRF-ID corresponding to the tenant that in turn would be used in the Layer 3 Forwarding Information Base (FIB) lookup based on (VRF-ID, Inner_DIP) where Inner_DIP is derived from the payload.
  • The FIB lookup hit results in providing the (destination_VNI, DMAC) corresponding to the destination end host that in turn would go through another Layer 2 lookup to determine where that end host resides.
  • This Layer 2 lookup in turn yields the VTEP behind which the destination end host resides.
  • The DMAC and SMAC of the inner payload are rewritten to the destination end host MAC and the RMAC, respectively, and the TTL is decremented to indicate that the packet was routed.
  • Subsequently, the packet is VXLAN encapsulated with the VNI set to destination_VNI, the DIP corresponding to the destination VTEP, and the SIP corresponding to the VTEP of the VXLAN router device, and is then dispatched toward the VTEP behind which the destination resides.

Again, such a device can be implemented in either a software or a hardware form factor, although given the extensive functionality and high throughput that it needs to support, the latter seems the more suitable choice.

The figure below shows a scenario in which a given tenant has leased two VXLAN segments: 11000, corresponding to tenant subnet 192.168.1.0/24, and 12000, corresponding to tenant subnet 192.168.2.0/24. End hosts VM-A and VM-B are deployed in segment 11000, and end hosts VM-D and VM-E are deployed in segment 12000.
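The steps a Layer 3 VXLAN gateway performs can be sketched as one `route` method. This is a minimal model under assumptions: the segments 11000 and 12000 and their subnets come from the scenario above, while the class names, the VRF name `tenant-1`, the host IPs, the MAC addresses, and the 10.0.0.x VTEP addresses are hypothetical. The FIB holds exact host entries for simplicity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Inner:
    smac: str
    dmac: str
    sip: str
    dip: str
    ttl: int


@dataclass(frozen=True)
class VxlanPacket:
    outer_sip: str
    outer_dip: str
    vni: int
    inner: Inner


class L3Gateway:
    def __init__(self, vtep_ip, rmac, vni_to_vrf, fib):
        self.vtep_ip = vtep_ip
        self.rmac = rmac
        self.vni_to_vrf = vni_to_vrf  # source VNI -> VRF-ID
        self.fib = fib                # (vrf, host IP) -> (destination VNI, host MAC)
        self.l2_table = {}            # (vni, mac) -> VTEP IP

    def route(self, packet):
        inner = packet.inner
        # Layer 2 learning: (source_VNI, Inner_SMAC) -> Outer_SIP
        self.l2_table[(packet.vni, inner.smac)] = packet.outer_sip
        # Only frames addressed to the router MAC are routed.
        if inner.dmac != self.rmac:
            raise ValueError("frame is not addressed to the RMAC")
        # The source VNI yields the VRF, which scopes the FIB lookup.
        vrf = self.vni_to_vrf[packet.vni]
        dest_vni, host_mac = self.fib[(vrf, inner.dip)]
        # A second Layer 2 lookup yields the VTEP behind which the host resides.
        dest_vtep = self.l2_table[(dest_vni, host_mac)]
        # Rewrite the MAC addresses, decrement the TTL, and re-encapsulate.
        routed = replace(inner, smac=self.rmac, dmac=host_mac, ttl=inner.ttl - 1)
        return VxlanPacket(self.vtep_ip, dest_vtep, dest_vni, routed)


RMAC, MAC_A, MAC_D = "00:00:00:00:00:fe", "00:00:00:00:00:a1", "00:00:00:00:00:d1"
gateway = L3Gateway(
    vtep_ip="10.0.0.100",
    rmac=RMAC,
    vni_to_vrf={11000: "tenant-1", 12000: "tenant-1"},
    fib={("tenant-1", "192.168.2.10"): (12000, MAC_D)},
)
gateway.l2_table[(12000, MAC_D)] = "10.0.0.2"  # VM-D was learned earlier

# VM-A (segment 11000) sends a packet for VM-D (segment 12000) to the router MAC.
incoming = VxlanPacket("10.0.0.1", "10.0.0.100", 11000,
                       Inner(MAC_A, RMAC, "192.168.1.10", "192.168.2.10", 64))
outgoing = gateway.route(incoming)
print(outgoing)
```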

Inter-VLAN Tag Handling and VLAN Translation

  • As per the VXLAN IETF draft (since published as RFC 7348), the ingress VTEP device removes the 802.1Q VLAN tag from the original Layer 2 Ethernet packet before encapsulating the packet into the VXLAN format and transmitting it through the underlay network.
  • The remote VTEP devices determine the VLAN in which to place the packet from their own VLAN-to-VXLAN VNI mapping configurations. With this mechanism, VTEP devices could map the same VXLAN VNI to different VLANs.
  • The figure shows an example in which Cisco Nexus 9300 VTEP-1 maps VLAN 600 to VXLAN VNI 6100, whereas Cisco Nexus 9300 VTEP-2 maps VLAN 700 to the same VXLAN VNI 6100. As a result, VLAN 600 behind VTEP-1 and VLAN 700 behind VTEP-2 are bridged into one Layer 2 domain in the overlay network, and hosts within these two VLANs gain direct Layer 2 adjacency.
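The VLAN translation effect can be shown with two per-VTEP mapping tables. This sketch assumes that both VTEPs map their local VLAN to the same VNI (6100 here), which is what bridging the two VLANs into one Layer 2 domain requires. The `TranslatingVtep` class is a hypothetical name, and the packet is modeled as a plain dictionary.

```python
class TranslatingVtep:
    def __init__(self, name, vlan_to_vni):
        self.name = name
        self.vlan_to_vni = dict(vlan_to_vni)
        # Reverse mapping used on decapsulation; it is local to this VTEP.
        self.vni_to_vlan = {vni: vlan for vlan, vni in self.vlan_to_vni.items()}

    def encapsulate(self, vlan, payload):
        """Drop the 802.1Q tag: only the VNI travels across the underlay."""
        return {"vni": self.vlan_to_vni[vlan], "payload": payload}

    def decapsulate(self, packet):
        """Place the frame in the VLAN that this VTEP maps to the VNI."""
        return self.vni_to_vlan[packet["vni"]], packet["payload"]


vtep1 = TranslatingVtep("VTEP-1", {600: 6100})
vtep2 = TranslatingVtep("VTEP-2", {700: 6100})

# A frame that enters in VLAN 600 behind VTEP-1 leaves in VLAN 700 behind VTEP-2.
egress_vlan, egress_payload = vtep2.decapsulate(vtep1.encapsulate(600, b"frame"))
print(egress_vlan)
```

Because the tag never crosses the underlay, the VLAN number is only locally significant on each VTEP.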

