Logical Router & Traffic Flow

Posted on Jan 17, 2020

Logical Router

An NSX logical router is a router whose data plane runs in the ESXi host kernel. The figure below shows a logical router connected to two logical switches (VNI 5555 and VNI 6666), with two powered-on VMs on each ESXi host. Both ESXi hosts run the same instance of the logical router.

A single ESXi host can run up to 100 different logical router instances, and each instance running on the same ESXi host is independent and separate from the others.

An NSX domain can run about 1,200 logical router instances, and each logical router can support up to 1,000 logical interfaces (LIFs).

The NSX Controller cluster's Layer 3 master assigns each logical router to an NSX Controller, which manages its control plane. The NSX Controller responsible for a logical router keeps the master routing table for that logical router and pushes a copy of it to each ESXi host where the logical router instance runs.

If the routing table changes, the NSX Controller responsible for that logical router pushes the updated routing table to all ESXi hosts running the logical router instance.
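
To make the push model concrete, here is a minimal Python sketch of that behavior: a controller owns the master routing table for a logical router and pushes a full copy to every registered ESXi host whenever the table changes. All class and method names here are illustrative, not NSX APIs.

    class NsxController:
        """Toy model of the controller that owns a logical router's control plane."""
        def __init__(self):
            self.master_routing_table = {}   # prefix -> next hop
            self.hosts = []                  # ESXi hosts running the LR instance

        def register_host(self, host):
            self.hosts.append(host)
            host.routing_table = dict(self.master_routing_table)  # initial full copy

        def update_route(self, prefix, next_hop):
            # Any change to the master table is pushed to every registered host.
            self.master_routing_table[prefix] = next_hop
            for host in self.hosts:
                host.routing_table = dict(self.master_routing_table)

    class EsxiHost:
        def __init__(self, name):
            self.name = name
            self.routing_table = {}

    ctrl = NsxController()
    h1, h2 = EsxiHost("COM-A1-ESXi01"), EsxiHost("COM-B1-ESXi02")
    ctrl.register_host(h1)
    ctrl.register_host(h2)
    ctrl.update_route("0.0.0.0/0", "192.168.1.1")  # both hosts now hold the new table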

Each Logical router has two types of interfaces:

Internal LIF: This interface provides connectivity to the logical switches where VMs are connected. No Layer 3 control plane traffic, such as OSPF or BGP hellos, is seen on this interface. A LIF connected to a logical switch is also termed a VXLAN LIF, while a LIF connected to a VLAN-backed dvPortgroup is called a VLAN LIF.

A VLAN LIF supports a single VLAN, with valid VLAN IDs in the range 0 to 4094.

External LIF: This interface (also called an Uplink LIF) provides connectivity to the NSX Edge Services Gateway, which in turn provides external connectivity to the logical switches and the VMs connected to them. Layer 3 control plane traffic, such as OSPF and BGP hellos, is seen on this interface.
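
The two LIF types can be summarized in a small data model. The Python sketch below is purely illustrative (the field names and sample uplink address are assumptions, not NSX object attributes); it captures the rule that only external (uplink) LIFs carry routing protocol traffic.

    from dataclasses import dataclass

    @dataclass
    class Lif:
        name: str
        ip: str
        kind: str      # "internal" or "external"
        backing: str   # "vxlan" (logical switch) or "vlan" (dvPortgroup)

        @property
        def carries_routing_protocols(self) -> bool:
            # Only external (uplink) LIFs see OSPF/BGP hellos.
            return self.kind == "external"

    web = Lif("WEB-LIF", "10.10.10.1/24", "internal", "vxlan")
    uplink = Lif("Uplink-LIF", "192.168.100.2/29", "external", "vlan")
    assert not web.carries_routing_protocols
    assert uplink.carries_routing_protocols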

Types of Logical Router

There are two types of logical routers:

  • Distributed logical router
  • Universal logical router

Distributed Logical Router: A logical router that connects to global logical switches is called a distributed logical router.

Universal Logical Router: A logical router that connects to universal logical switches is called a universal logical router. A ULR does not support VLAN LIFs; it supports only VXLAN LIFs.

vMAC & pMAC:

There are two types of MAC addresses used in a logical router. Each copy of the logical router in an ESXi host gets at least two MAC addresses. The first is called the vMAC; it is the same for all logical router copies, and its standard value is 02:50:56:56:44:52.

The second MAC address is called the pMAC. It is assigned to each logical router instance per dvUplink, based on the teaming policy selected during host configuration. The pMAC is generated by each ESXi host independently, using the VMware OUI of 00:50:56.

When the logical router sends an ARP request, or replies to one, it responds with the vMAC. For any egress traffic from its VXLAN LIFs, the logical router uses the vMAC; for all other traffic, including traffic over VLAN LIFs, it uses the pMAC.
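
As a quick illustration of that rule, here is a minimal Python sketch. Only the vMAC value is the documented constant; the pMAC shown is a made-up example, since pMACs are generated per host and per dvUplink.

    def source_mac(lif_backing: str, vmac: str, pmac: str) -> str:
        """Pick the source MAC for egress traffic, per the rule above:
        vMAC for VXLAN LIFs, pMAC for everything else (e.g., VLAN LIFs)."""
        return vmac if lif_backing == "vxlan" else pmac

    VMAC = "02:50:56:56:44:52"              # standard value, same on every host
    PMAC = "00:50:56:aa:bb:01"              # per-host, per-dvUplink (illustrative value)
    print(source_mac("vxlan", VMAC, PMAC))  # 02:50:56:56:44:52
    print(source_mac("vlan", VMAC, PMAC))   # 00:50:56:aa:bb:01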

Assume you have a VM connected to a logical switch, a logical router with an internal LIF in the same logical switch, and the VM has a default gateway of the LIF’s IP. When the VM sends an ARP request for its default gateway’s MAC, the logical router in the same ESXi host where the VM is running sends back an ARP reply with the vMAC.

In this case, when the virtual machine vMotions, the MAC address of the VM's default gateway will be the same at the destination host because it is the vMAC. The same is true if the VM is connected to a universal logical switch with a ULR for its default gateway.

Logical Router Control VM:

As soon as a logical router instance is created, a virtual appliance called the Logical Router Control VM is deployed.

The Control VM handles the dynamic components of the router's control plane: forming routing adjacencies, building the forwarding database and routing table, and so on.

If the environment has a ULR, multiple independent Control VMs are deployed, one per NSX Manager in the cross-vCenter domain. The Control VM maintains the routing table, and a copy of it is sent to each ESXi host on which the logical router instance is running.

Control VM forwards the dynamic routing table to the NSX Controller, which would merge it with its copy of the static routing table to create the master routing table. A copy of the master routing table is forwarded by the NSX Controller to the ESXi hosts that are running a copy of the logical router instance. Future dynamic routing table updates follow the same communication path.
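
The merge the NSX Controller performs can be sketched as follows. This is a simplification: real NSX applies administrative distance and other tie-breakers, which we reduce here to "static wins on a prefix collision" purely for illustration.

    def build_master_table(static_routes: dict, dynamic_routes: dict) -> dict:
        """Toy merge of the controller's static routes with the dynamic routes
        learned by the Control VM. Assumption: static entries win on a prefix
        collision, standing in for NSX's real admin-distance logic."""
        master = dict(dynamic_routes)
        master.update(static_routes)   # static overrides dynamic on the same prefix
        return master

    dynamic = {"0.0.0.0/0": "192.168.1.1"}         # learned via BGP/OSPF by the Control VM
    static = {"172.16.0.0/16": "192.168.1.9"}      # configured statically
    print(build_master_table(static, dynamic))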

The logical router instance itself is a data plane entity and therefore can't run any dynamic control plane protocols, such as BGP. To participate in the routing control plane process, the Control VM automatically has one of its interfaces connected to the uplink segment of the Uplink LIF.

The Control VM should never have one of its interfaces connected to an internal segment, with one optional exception: You may connect the Control VM’s High Availability (HA) interface to an internal segment. Prior to NSX 6.2, the High Availability interface was called the Management interface.

The Control VM, being a virtual machine, has 10 interfaces, one of which must be reserved for the HA interface. The HA interface is used to get SSH access to the Control VM as well as for syslog.

Below figure shows a logical view of the Control VM with an interface connected to an uplink segment, where an NSX Edge Services Gateway is connected, and the HA interface connected to a management segment.

Locale ID:

The universal logical router has one special feature not available to the distributed logical router. The figure below shows a ULR in two data centers: virtual machine Web-Ser-01 is in the Site-1 DC, and Web-Ser-02 is in the Site-2 DC.

Both the NSX Edge in Site-1 DC and the NSX Edge in Site-2 DC advertise the subnets of Web-Ser-01 and Web-Ser-02 to the physical world while advertising a default route to the ULR. A user in Site-1 DC sends a web page request to Web-Ser-01, which is routed via the Site-1 DC NSX Edge. The response from Web-Ser-01 is received by the local copy of the ULR in Site-1 DC, which sees two default routes, one to each NSX Edge. About half the time, the ULR forwards the traffic to the NSX Edge in Site-2 DC, which then sends the traffic over the physical network in Site-2 DC. If the user had requested the page from Web-Ser-02, the reverse would be true.

This is an example of network tromboning. Network tromboning is defined as asymmetrical network traffic that does not use the best path to the destination, causing traffic to flow over nonoptimal paths. Network tromboning typically occurs when subnet location information is obfuscated by the stretching of Layer 2, such as when we use universal logical switches.

With Locale ID we can provide locality information that the ULR uses for egress traffic decisions, thus allowing local egress. The Locale ID is a 128-bit hexadecimal number shared by the Control VM and all ULR copies in the same location, such as a data center. When the Control VM sends routing table information to the NSX Controller responsible for the ULR, the NSX Controller shares the route information only with those ESXi hosts running copies of the ULR with the same Locale ID as the Control VM.
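
A toy version of that distribution rule, with hypothetical host records, might look like this:

    def distribute_routes(route_update: dict, control_vm_locale: str, hosts: list) -> None:
        """Toy Locale ID rule: the controller only sends a Control VM's routes
        to hosts whose ULR copy shares that Control VM's Locale ID."""
        for host in hosts:
            if host["locale_id"] == control_vm_locale:
                host["routing_table"].update(route_update)

    hosts = [
        {"name": "site1-esxi", "locale_id": "locale-site1", "routing_table": {}},
        {"name": "site2-esxi", "locale_id": "locale-site2", "routing_table": {}},
    ]
    # The Site-1 Control VM learned a default route from the Site-1 NSX Edge:
    distribute_routes({"0.0.0.0/0": "edge-site1"}, "locale-site1", hosts)
    print(hosts[0]["routing_table"])   # has the route
    print(hosts[1]["routing_table"])   # empty: different Locale ID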

The figure below shows the same design, which now has two Control VMs and the ULR configured for local egress. The Control VM in the Site-1 DC has the same Locale ID as the ESXi hosts in the Site-1 DC. The Control VM in the Site-2 DC has the same Locale ID as the ESXi hosts in the Site-2 DC.

The Control VM in Site-1 DC only exchanges routing information with the NSX Edge in Site-1 DC. The Control VM in Site-2 DC only exchanges routing information with the NSX Edge in Site-2 DC. Now when Web-Ser-01 responds to the user, the ULR in Site-1 DC only knows of the routes advertised by the NSX Edge in Site-1 DC, thus it forwards all traffic to the NSX Edge in Site-1 DC.

All copies of the ULR, regardless of the Locale ID, have the same LIFs and directly connected subnets in the routing table.

Logical Router Traffic Flow:

In this section, we refer to the diagram below to understand the logical router traffic flow:

Below is the IP addressing scheme of VM, LIF details and ESXi host information:

There are two ESXi clusters in the same universal transport zone, all configured to support NSX.
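
The addressing table referenced above did not survive extraction, so the following summary is reconstructed from the packet walks that follow. VNIs other than the WEB logical switch's 5555 are not stated in the text and are left unspecified.

    # Reconstructed from the packet walks below; treat as a best-effort summary.
    TOPOLOGY = {
        "logical_router": "Monkey Island",
        "lifs": {
            "WEB": {"ip": "10.10.10.1/24", "vni": 5555},
            "APP": {"ip": "10.10.11.1/24", "vni": None},   # VNI not stated
            "DB":  {"ip": "10.10.12.1/24", "vni": None},   # VNI not stated
        },
        "vms": {
            "SERVERWEB01": {"ip": "10.10.10.11", "mac": "W01-MAC", "host": "COM-A1-ESXi01"},
            "SERVERAPP01": {"ip": "10.10.11.11", "mac": "A01-MAC", "host": "COM-B1-ESXi02"},
            "SERVERDB01":  {"ip": "10.10.12.11", "mac": "D01-MAC", "host": "COM-A1-ESXi01"},
        },
    }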

Logical Router Packet Walk Example 1

In this packet walk, virtual machine SERVERWEB01 sends a packet to virtual machine SERVERDB01, which then responds back to SERVERWEB01. Assume the following to be true:

  • SERVERWEB01 and SERVERDB01 have just powered on and have not sent any traffic.
  • SERVERWEB01 and SERVERDB01 are running on ESXi host COM-A1-ESXi01.
  • SERVERWEB01 and SERVERDB01 have a default gateway of .1 in their respective subnets.
  • The logical router, named Monkey Island, is the default gateway for SERVERWEB01 and SERVERDB01.
  • SERVERWEB01 knows the IP of SERVERDB01.

 Step 1. SERVERWEB01 notices the IP of SERVERDB01 is in a different subnet from its own and sends an ARP request for its default gateway’s MAC address.

That is Monkey Island’s LIF in the web logical switch.

Step 2. As it is a broadcast, the ARP request is received by Monkey Island’s WEB LIF in VNI 5555 in COM-A1-ESXi01.

The Switch Security module in SERVERWEB01's ESXi host knows this ARP request is for the logical router, so the ARP request is not forwarded to the other VTEPs in the VTEP table.

Step 3. Monkey Island in COM-A1-ESXi01 sends back a unicast to SERVERWEB01 with the ARP reply, with a source MAC of vMAC.

Step 4. SERVERWEB01 receives the ARP reply and uses the information to create the packet to send to SERVERDB01.

  • Source IP: 10.10.10.11
  • Destination IP: 10.10.12.11
  • Source MAC: W01-MAC
  • Destination MAC: vMAC
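
For illustration, the Step 4 frame could be built with scapy as below. The W01-MAC value is a stand-in, since the text only uses symbolic MAC names; only the vMAC value is the documented constant.

    from scapy.all import Ether, IP

    VMAC = "02:50:56:56:44:52"
    W01_MAC = "00:50:56:01:01:01"   # hypothetical stand-in for W01-MAC

    # The packet SERVERWEB01 sends toward SERVERDB01 via its default gateway.
    frame = Ether(src=W01_MAC, dst=VMAC) / IP(src="10.10.10.11", dst="10.10.12.11")
    frame.show()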

Step 5. Monkey Island receives the frame in the WEB LIF in COM-A1-ESXi01 and discards the Layer 2 header after confirming that the destination MAC address is the WEB LIF’s.

Step 6. Monkey Island, in COM-A1-ESXi01, then reads the destination IP and searches for the most specific match in the routing table.

The most specific route in the routing table matches the subnet in the DB LIF. This is commonly referred to as “directly connected” or “directly attached.”
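
The "most specific match" lookup in Step 6 is an ordinary longest-prefix match. Here is a minimal Python sketch (requires Python 3.10+ for the type hint); a directly connected LIF subnet is just another, very specific, route.

    import ipaddress

    def most_specific_match(dest_ip: str, routing_table: dict) -> str | None:
        """Return the next hop of the longest-prefix match for dest_ip."""
        dest = ipaddress.ip_address(dest_ip)
        matches = [
            (ipaddress.ip_network(prefix), next_hop)
            for prefix, next_hop in routing_table.items()
            if dest in ipaddress.ip_network(prefix)
        ]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]

    table = {"0.0.0.0/0": "uplink", "10.10.12.0/24": "connected: DB LIF"}
    print(most_specific_match("10.10.12.11", table))  # -> connected: DB LIF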

Step 7. Monkey Island then looks in its local ARP table for an entry for IP 10.10.12.11.

By "local" ARP table, we mean the logical router's ARP table in ESXi host COM-A1-ESXi01.

Step 8. Not finding an entry, Monkey Island sends out an ARP request for IP 10.10.12.11.

Remember that SERVERDB01 has sent no traffic and therefore Monkey Island couldn’t have an ARP entry for it yet. The ARP request is sent over the DB LIF in COM-A1-ESXi01.

Note: As of NSX 6.2, the source MAC for the ARP request is the vMAC (before it used to be the pMAC).

Step 9. SERVERDB01 receives the ARP request and sends back a unicast ARP reply to Monkey Island's DB LIF in COM-A1-ESXi01. The Switch Security module in SERVERDB01's ESXi host snoops the ARP reply and sends an IP report to the universal NSX Controller responsible for the DB logical switch.
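
The snooping behavior in Step 9 can be sketched like this; the structures and field names are illustrative only, not NSX internals:

    class Controller:
        def __init__(self):
            self.ip_reports = []   # IP/MAC bindings reported by hosts

    def snoop_arp_reply(arp_reply: dict, controller: Controller) -> None:
        """Toy Switch Security behavior: snoop the ARP reply and report the
        sender's IP/MAC binding to the controller owning the logical switch."""
        controller.ip_reports.append(
            {"ip": arp_reply["sender_ip"], "mac": arp_reply["sender_mac"]}
        )

    ctrl = Controller()
    snoop_arp_reply({"sender_ip": "10.10.12.11", "sender_mac": "D01-MAC"}, ctrl)
    print(ctrl.ip_reports)   # [{'ip': '10.10.12.11', 'mac': 'D01-MAC'}]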

Step 10. Monkey Island receives the ARP reply and uses the information to forward the packet to SERVERDB01.

The packet is forwarded over the DB LIF.

  • Source IP: 10.10.10.11
  • Destination IP: 10.10.12.11
  • Source MAC: vMAC
  • Destination MAC: D01-MAC

Logical routers use the vMAC as the source MAC of all packets sent over their VXLAN LIFs.

Step 11. SERVERDB01 receives the frame and processes it.

Steps 12 through 14 only take place if the OS in SERVERDB01 does not add an ARP entry when it receives, and replies to, an ARP request.

Step 12. SERVERDB01 wants to reply back to SERVERWEB01, notices that it is in a separate subnet, and sends an ARP request for its default gateway’s MAC address.

SERVERDB01 sends back an ARP reply to Monkey Island in Step 9, but it does not add DB LIF’s IP/MAC to the ARP table.

It is normal ARP operation for major operating systems to add a new entry to their ARP table only upon receiving an ARP reply to an ARP request they sent.

Step 13. Monkey Island receives the ARP request over the DB LIF in COM-A1-ESXi01.

Step 14. Monkey Island sends an ARP reply back to SERVERDB01.

Step 15. SERVERDB01 uses the ARP reply info to send the packet to SERVERWEB01.

  • Source IP: 10.10.12.11
  • Destination IP: 10.10.10.11
  • Source MAC: D01-MAC
  • Destination MAC: vMAC

Step 16. Monkey Island receives the frame in the DB LIF in COM-A1-ESXi01 and discards the Layer 2 header after confirming that the destination MAC address is the DB LIF’s.

Step 17. Monkey Island, in COM-A1-ESXi01, then reads the destination IP and searches for the most specific match in the routing table.

The most specific route in the routing table is directly connected to the WEB LIF.

Step 18. Monkey Island then looks in its local ARP table for an entry for IP 10.10.10.11.

Step 19. Not finding an entry, Monkey Island sends out an ARP request for IP 10.10.10.11 over the WEB LIF.

This is for the same reason as in Step 12.

Step 20. SERVERWEB01 receives the ARP request and sends back a unicast ARP reply to Monkey Island’s WEB LIF in COM-A1-ESXi01.

Step 21. Monkey Island receives the ARP reply and uses the information to forward the packet to SERVERWEB01.

The packet is forwarded over the WEB LIF.

  • Source IP: 10.10.12.11
  • Destination IP: 10.10.10.11
  • Source MAC: vMAC
  • Destination MAC: W01-MAC

Step 22. SERVERWEB01 receives the frame and processes it.

Step 23. Subsequent traffic from SERVERWEB01 toward SERVERDB01 does not require ARP requests.

The virtual machines, the copies of the logical router, and the Switch Security module will not age out the ARP entries as long as they continue to see traffic sourced from the corresponding IPs before the aging timer expires, which is set to 180 seconds (3 minutes).
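
That refresh-before-expiry behavior is easy to model. The sketch below assumes a simple monotonic-clock cache with the 180-second lifetime mentioned above:

    import time

    class ArpCache:
        """Toy ARP cache with the 180-second aging rule: seeing traffic
        sourced from an IP refreshes that entry's timestamp."""
        TTL = 180.0

        def __init__(self):
            self._entries = {}   # ip -> (mac, last_seen)

        def learn(self, ip: str, mac: str) -> None:
            self._entries[ip] = (mac, time.monotonic())

        def refresh(self, ip: str) -> None:
            # Called whenever traffic sourced from this IP is seen.
            if ip in self._entries:
                mac, _ = self._entries[ip]
                self._entries[ip] = (mac, time.monotonic())

        def lookup(self, ip: str):
            entry = self._entries.get(ip)
            if entry and time.monotonic() - entry[1] < self.TTL:
                return entry[0]
            self._entries.pop(ip, None)   # aged out (or never learned)
            return None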

Logical Router Packet Walk Example 2

In this packet walk, virtual machine SERVERWEB01 sends a packet to virtual machine SERVERAPP01, which then responds back to SERVERWEB01. Assume the following to be true:

  • SERVERAPP01 has just powered on and has not sent any traffic.
  • SERVERWEB01 is running on ESXi host COM-A1-ESXi01.
  • SERVERAPP01 is running on ESXi host COM-B1-ESXi02.
  • SERVERWEB01 and SERVERAPP01 have a default gateway of .1 in their respective subnets.
  • Monkey Island is the default gateway for SERVERWEB01 and SERVERAPP01.
  • SERVERWEB01 knows the IP of SERVERAPP01.

The figure below shows the logical view of the scenario for Logical Router Packet Walk Example 2.

Step 1. SERVERWEB01 notices the IP of SERVERAPP01 is in a different subnet from its own and sends a packet to SERVERAPP01 using the default gateway ARP entry in its ARP table.

The ARP table entry was created in Logical Router Packet Walk Example 1.

  • Source IP: 10.10.10.11
  • Destination IP: 10.10.11.11
  • Source MAC: W01-MAC
  • Destination MAC: vMAC

Step 2. Monkey Island receives the frame in the WEB LIF in COM-A1-ESXi01 and discards the Layer 2 header after confirming that the destination MAC address is the WEB LIF’s.

Step 3. Monkey Island, in COM-A1-ESXi01, then reads the destination IP, searches for the most specific match in the routing table, and concludes it is directly connected to the APP LIF.

Step 3 is critical to understanding the functionality of the distributed logical router. The copy of Monkey Island running in COM-A1-ESXi01 does not care in which host SERVERAPP01 is actually located. All that matters is that the IP for SERVERAPP01 is in the subnet directly attached to Monkey Island’s APP LIF.

Step 4. Monkey Island then looks in its local ARP table for an entry for IP 10.10.11.11.

Step 5. Not finding an entry, Monkey Island sends out an ARP request for IP 10.10.11.11.

The ARP request is sent over the APP LIF in COM-A1-ESXi01.

  • Source IP: 10.10.11.1
  • Destination IP: 10.10.11.11
  • Source MAC: vMAC
  • Destination MAC: FFFF.FFFF.FFFF

Step 6. APP logical switch in COM-A1-ESXi01 receives the ARP request and executes its configured Replication Mode to get the frame sent to all ESXi hosts that need it.

Step 7. APP logical switch in COM-B1-ESXi02 receives the ARP request and forwards it to SERVERAPP01.

The APP logical switch will not learn the vMAC as coming from VTEP 10.10.40.55.

Before the logical switch processes the ARP request, the logical router Monkey Island in COM-B1-ESXi02 will see the ARP request with a source MAC of the vMAC, make a note of the ESXi host that sent it, and await an ARP reply. The logical switch also coordinates with the logical router, so it knows the vMAC does not belong to a VM. That's one reason the logical switch never advertises the LIF MAC to the NSX Controller.

Step 8. SERVERAPP01 receives the ARP request and sends back a unicast ARP reply to Monkey Island’s APP LIF.

  • Source IP: 10.10.11.11
  • Destination IP: 10.10.11.1
  • Source MAC: A01-MAC
  • Destination MAC: vMAC

Step 9. The APP logical switch in COM-B1-ESXi02 sees the ARP reply being sent to the vMAC, and gives it to the local copy of Monkey Island. Using the cached information from step 7, the local copy of Monkey Island in COM-B1-ESXi02 shares the ARP reply, via OOB communications, with the copy of Monkey Island in COM-A1-ESXi01.
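
A toy model of Steps 7 through 9, where the local copy of the logical router caches the binding and relays it out of band to the copy that originally asked, might look like this (all structures are illustrative):

    def deliver_arp_reply(arp_reply: dict, local_lr: dict, pending: dict) -> None:
        """Toy version of Step 9: the LR copy on the replying VM's host caches
        the binding, then relays it (out of band) to the LR copy that asked."""
        ip, mac = arp_reply["sender_ip"], arp_reply["sender_mac"]
        local_lr["arp_table"][ip] = mac
        asker = pending.pop(ip, None)   # the host noted in Step 7
        if asker is not None:
            asker["arp_table"][ip] = mac

    lr_a1 = {"host": "COM-A1-ESXi01", "arp_table": {}}
    lr_b1 = {"host": "COM-B1-ESXi02", "arp_table": {}}
    pending = {"10.10.11.11": lr_a1}   # B1's copy noted who sent the ARP request
    deliver_arp_reply({"sender_ip": "10.10.11.11", "sender_mac": "A01-MAC"}, lr_b1, pending)
    print(lr_a1["arp_table"])   # {'10.10.11.11': 'A01-MAC'}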

Step 10. Monkey Island in COM-A1-ESXi01 receives the ARP update and uses the information to forward the packet to SERVERAPP01.

The packet is forwarded over the APP LIF.

  • Source IP: 10.10.10.11
  • Destination IP: 10.10.11.11
  • Source MAC: vMAC
  • Destination MAC: A01-MAC

Step 11. SERVERAPP01 receives the frame and processes it.

Step 12. SERVERAPP01 wants to reply back to SERVERWEB01, notices that it is in a separate subnet, and sends an ARP request for its default gateway’s MAC address.

Step 13. Monkey Island receives the ARP request over the APP LIF in COM-B1-ESXi02.

Step 14. Monkey Island sends back an ARP reply to SERVERAPP01.

Step 15. SERVERAPP01 uses the ARP reply info to send the packet to SERVERWEB01.

  • Source IP: 10.10.11.11
  • Destination IP: 10.10.10.11
  • Source MAC: A01-MAC
  • Destination MAC: vMAC

Step 16. Monkey Island in COM-B1-ESXi02 receives the frame in the APP LIF and discards the Layer 2 header after confirming that the destination MAC address is the APP LIF’s.

Step 17. Monkey Island, in COM-B1-ESXi02, then reads the destination IP and searches for the most specific match in the routing table. The most specific route in the routing table is directly connected to the WEB LIF.

Step 18. Monkey Island then looks in its local ARP table for an entry for IP 10.10.10.11.

Step 19. Not finding an entry, Monkey Island sends out an ARP request for IP 10.10.10.11.

The ARP request is sent over the WEB LIF in COM-B1-ESXi02.

  • Source IP: 10.10.11.1
  • Destination IP: 10.10.10.11
  • Source MAC: vMAC
  • Destination MAC: FFFF.FFFF.FFFF

Step 20. The WEB logical switch in COM-B1-ESXi02 receives the ARP request and follows its configured Replication Mode to get the frame sent to all ESXi hosts that need it.

Step 21. WEB logical switch in COM-A1-ESXi01 receives the ARP request and forwards it to SERVERWEB01.

Monkey Island in COM-A1-ESXi01 notices the ARP request was sent from Monkey Island’s copy in COM-B1-ESXi02.

Step 22. SERVERWEB01 receives the ARP request and sends back a unicast ARP reply to Monkey Island’s WEB LIF in COM-A1-ESXi01.

  • Source IP: 10.10.10.11
  • Destination IP: 10.10.11.1
  • Source MAC: W01-MAC
  • Destination MAC: vMAC

Step 23. The WEB logical switch in COM-A1-ESXi01 sees the ARP reply being sent to the vMAC and gives it to the local copy of Monkey Island. Via OOB communications, the ARP entry is provided to the copy of Monkey Island in COM-B1-ESXi02.

Step 24. Monkey Island in COM-B1-ESXi02 receives the ARP update over its WEB LIF, and uses the information to forward the packet to SERVERWEB01.

The packet is forwarded over the WEB LIF.

  • Source IP: 10.10.11.11
  • Destination IP: 10.10.10.11
  • Source MAC: vMAC
  • Destination MAC: W01-MAC

Step 25. SERVERWEB01 receives the frame and processes it.

Step 26. Subsequent traffic from SERVERWEB01 toward SERVERAPP01 does not require ARP requests.

