NSX Logical Switches

Posted on Jan 17, 2020

NSX has two types of logical switch: global logical switches and universal logical switches. An ESXi host that uses these switches creates a VTEP for VXLAN encapsulation and decapsulation. Both types support a single Ethernet broadcast domain, which means there is a one-to-one relationship between a logical switch and its assigned VNI. Because of this relationship, the logical switch and its VNI are treated as the same thing.

A global logical switch belongs to a global transport zone, and a universal logical switch belongs to a universal transport zone.

NSX Manager is responsible for the management plane of these logical switches, whereas each ESXi host owns its own data plane. The NSX Controller (or Universal NSX Controller) handles most of the control plane for logical switches.

Each ESXi host keeps a local copy of the MAC table for each logical switch. The table contains the following information:

  • MAC addresses of locally connected VMs
  • MAC addresses of remote VMs that have active flows

If a remote VM shows no activity for more than 5 minutes, its MAC address is flushed. The logical switch learns a VM's MAC address from the VM's vmx file, but this default behavior can be changed.
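
The aging behavior described above can be sketched in a few lines of Python. This is only an illustration: the class name, method names and MAC addresses are hypothetical, and the real table is maintained inside the hypervisor, not by code like this.

```python
from dataclasses import dataclass, field

REMOTE_IDLE_TIMEOUT_S = 5 * 60  # the 5-minute idle limit described above


@dataclass
class LocalMacTable:
    """Per-logical-switch MAC table kept by one ESXi host (illustrative only)."""
    local: set = field(default_factory=set)      # MACs of locally connected VMs
    remote: dict = field(default_factory=dict)   # remote MAC -> time last seen

    def add_local(self, mac: str) -> None:
        self.local.add(mac)

    def seen_remote(self, mac: str, now: float) -> None:
        # Any traffic from the remote VM refreshes its entry.
        self.remote[mac] = now

    def expire(self, now: float) -> None:
        # Flush remote entries that have been idle for more than the timeout.
        self.remote = {m: t for m, t in self.remote.items()
                       if now - t <= REMOTE_IDLE_TIMEOUT_S}

    def knows(self, mac: str) -> bool:
        return mac in self.local or mac in self.remote


table = LocalMacTable()
table.add_local("00:50:56:aa:00:01")
table.seen_remote("00:50:56:bb:00:02", now=0.0)
table.expire(now=200.0)
still_known = table.knows("00:50:56:bb:00:02")   # idle for 200 s, entry kept
table.expire(now=301.0)
flushed = not table.knows("00:50:56:bb:00:02")   # idle for 301 s, entry flushed
```

Local entries are never aged out in this sketch, since the text only describes a timeout for remote VMs.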

Creating Logical Switches:

NSX Manager creates logical switches and adds them to a transport zone that it owns. Logical switches can also be created through the vSphere Web Client.

A global logical switch must be assigned to a global transport zone when it is created, and a universal logical switch must be assigned to a universal transport zone.

On an ESXi host, each logical switch is represented by a dvPortgroup in the vDS that is assigned to each NSX cluster during logical network preparation. Each vCenter supports up to 10,000 dvPortgroups, so at most 10,000 logical switches (global and universal combined) can be deployed in an NSX domain.

Logical Switch Tables:

The following prerequisites must be met when a logical switch is created, before the first VM is migrated to it:

  • The NSX Controller L2 master must assign an NSX Controller to take care of this logical switch.
  • All NSX Controllers are informed which NSX Controller is responsible for the newly created logical switch.
  • All ESXi hosts in the logical switch's transport zone are informed which NSX Controller is responsible for the newly created logical switch.

As soon as an NSX Controller has been assigned to a particular VNI (logical switch), it holds the principal copy of three tables:

  • The VTEP table
  • The MAC table
  • The ARP table

The NSX Controller responsible for a logical switch also keeps a connection table with an entry for each ESXi host that has at least one powered-on VM. Each entry holds the following:

  • Management VMkernel port of the ESXi host
  • TCP port of the connection
  • Logical connection ID

VTEP table: 

The VTEP table lists every VTEP IP that has at least one powered-on VM. A VTEP IP is the IP address of the VXLAN VMkernel port that was assigned during host configuration. An ESXi host populates the VTEP table when either of the following happens to a VM:

  • The VM powers on on the ESXi host
  • The VM is moved to the ESXi host by vMotion

When either action occurs, the ESXi host running the VM asks the responsible NSX Controller to add its VTEP to the VTEP table.

A VTEP is removed from the NSX Controller's VTEP table when either of the following happens to the last VM on that ESXi host:

  • The VM powers off
  • The VM is moved off the ESXi host by vMotion

The NSX Controller sends a copy of the VTEP table to all ESXi hosts whenever the table is updated (a VTEP is added or removed).

Each VTEP table entry has the following five fields. The ESXi host provides the first four to the NSX Controller.

  • VNI
  • VTEP IP
  • VTEP subnet
  • VTEP MAC address
  • Connection ID
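
As a rough illustration of the five fields, the record below separates what the host reports from what the controller adds. The class and field names are hypothetical, the second field is modelled as the VTEP IP, and the example addresses are made up. Where the controller gets the connection ID from is not stated here, so the sketch simply takes it as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VtepReport:
    """The four values the ESXi host supplies (names are illustrative)."""
    vni: int
    vtep_ip: str
    vtep_subnet: str
    vtep_mac: str


@dataclass(frozen=True)
class VtepEntry:
    """One row of the controller's VTEP table: the report plus a connection ID."""
    vni: int
    vtep_ip: str
    vtep_subnet: str
    vtep_mac: str
    connection_id: int


def to_entry(report: VtepReport, connection_id: int) -> VtepEntry:
    # The fifth field is not sent by the host; the controller fills it in.
    return VtepEntry(report.vni, report.vtep_ip, report.vtep_subnet,
                     report.vtep_mac, connection_id)


report = VtepReport(5555, "10.0.1.11", "10.0.1.0/24", "00:50:56:aa:aa:aa")
entry = to_entry(report, connection_id=7)
```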

How VTEP table is populated:

This section assumes that the replication mode for logical switches is set to unicast or hybrid.

The diagram below shows two ESXi hosts with VNI 5555 configured and two VMs connected to this VNI (logical switch).

Now let's see how the VTEP table is populated.

The VM on ESXi-A is powered on. As soon as it powers on, ESXi-A sends the following information to the NSX Controller from its management VMkernel port over TCP port 1234. When the NSX Controller receives this information, it adds it to its VTEP table. The ESXi host sends the following information:

  • VNI
  • VTEP IP
  • VTEP subnet
  • VTEP MAC address

Once this information has been added to the VTEP table, the NSX Controller sends the table to the ESXi host, that is, ESXi-A.

Now suppose the VM on ESXi-B is also powered on and the same process takes place. The NSX Controller then sends the VTEP table, which now holds both VTEP IPs, to both ESXi hosts for VNI 5555.
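
The walkthrough above can be mimicked with a toy model. Everything in it is an assumption made for illustration: the `Controller` class, its method names and the IP and MAC values do not come from NSX, and the real exchange happens over the controller connection rather than through Python calls.

```python
from collections import defaultdict


class Controller:
    """Toy stand-in for the NSX Controller responsible for a VNI."""

    def __init__(self):
        self.vtep_table = defaultdict(dict)    # vni -> {vtep_ip: (subnet, mac)}
        self.host_copies = defaultdict(dict)   # vtep_ip -> {vni: [vtep_ip, ...]}

    def report_vtep(self, vni, vtep_ip, subnet, mac):
        # A host reports its VTEP after a VM powers on or arrives by vMotion.
        self.vtep_table[vni][vtep_ip] = (subnet, mac)
        self._push(vni)

    def _push(self, vni):
        # Every host with a VTEP on this VNI receives the updated table.
        snapshot = sorted(self.vtep_table[vni])
        for vtep_ip in self.vtep_table[vni]:
            self.host_copies[vtep_ip][vni] = list(snapshot)


ctrl = Controller()

# The VM on ESXi-A powers on: ESXi-A is the only VTEP for VNI 5555.
ctrl.report_vtep(5555, "10.0.1.11", "10.0.1.0/24", "00:50:56:aa:aa:aa")
copy_on_a_first = list(ctrl.host_copies["10.0.1.11"][5555])

# The VM on ESXi-B powers on: both hosts now hold both VTEP IPs.
ctrl.report_vtep(5555, "10.0.2.12", "10.0.2.0/24", "00:50:56:bb:bb:bb")
copy_on_a = ctrl.host_copies["10.0.1.11"][5555]
copy_on_b = ctrl.host_copies["10.0.2.12"][5555]
```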

How VTEP table is updated:

The figure below shows the current state of the NSX environment.

Now suppose the VM on ESXi-A powers off. ESXi-A asks the NSX Controller to remove its VTEP from the controller's VTEP table and, at the same time, flushes its own copy of the VTEP table.

As soon as the NSX Controller receives this request, it removes the ESXi-A VTEP from the VTEP table and sends a copy of the table to ESXi-B, which still has one VM powered on.
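
A minimal sketch of the "last VM" rule follows. It assumes the host counts powered-on VMs per VNI and contacts the controller only when that count moves between zero and one. The class names, method names and addresses are hypothetical, and pushing the table back to the remaining hosts is left out for brevity.

```python
class ControllerVtepTable:
    """Toy controller-side VTEP table."""

    def __init__(self):
        self.vteps = {}  # vni -> set of VTEP IPs

    def add(self, vni, vtep_ip):
        self.vteps.setdefault(vni, set()).add(vtep_ip)

    def remove(self, vni, vtep_ip):
        self.vteps.get(vni, set()).discard(vtep_ip)


class EsxiHost:
    """Toy host that tracks powered-on VMs per VNI (an assumption)."""

    def __init__(self, vtep_ip, controller):
        self.vtep_ip = vtep_ip
        self.controller = controller
        self.powered_on = {}  # vni -> number of powered-on VMs on this host

    def vm_on(self, vni):
        self.powered_on[vni] = self.powered_on.get(vni, 0) + 1
        if self.powered_on[vni] == 1:       # first VM: register the VTEP
            self.controller.add(vni, self.vtep_ip)

    def vm_off(self, vni):
        self.powered_on[vni] -= 1
        if self.powered_on[vni] == 0:       # last VM: withdraw the VTEP
            self.controller.remove(vni, self.vtep_ip)


ctrl = ControllerVtepTable()
host_a = EsxiHost("10.0.1.11", ctrl)
host_b = EsxiHost("10.0.2.12", ctrl)

host_a.vm_on(5555)
host_a.vm_on(5555)
host_b.vm_on(5555)

host_a.vm_off(5555)
after_first_off = set(ctrl.vteps[5555])   # ESXi-A still has one VM running
host_a.vm_off(5555)
after_last_off = set(ctrl.vteps[5555])    # ESXi-A's VTEP has been withdrawn
```

The same `vm_on` and `vm_off` hooks would also cover a VM arriving or leaving by vMotion, since the text treats those events like power-on and power-off.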

MAC Table:

There are three MAC tables to consider. The first is created and locally owned by the logical switch on each ESXi host. The second is built by learning from the vmx files of VMs. The third is created by the NSX Controller and maps MAC addresses to VTEP addresses.

The logical switch uses the vmx file of each powered-on VM to learn the VM's MAC address and adds it to its MAC address table. This table is locally significant to the ESXi host: two ESXi hosts with the same VNI do not synchronize their MAC tables with each other.

Each logical switch on an ESXi host reports the MAC addresses in the MAC table created by the vDS to the NSX Controller. The NSX Controller adds this information to its own MAC address table, which is maintained per VNI (logical switch). In this way, the NSX Controllers have a full picture of all MAC addresses in the NSX domain.

The NSX Controller does not push this information to the ESXi hosts. Instead, each ESXi host (per logical switch) pulls MAC table information from the NSX Controller.

When a VM is powered off on an ESXi host, the logical switch hosting that VM removes the VM's entry from its local MAC table and propagates the change to the NSX Controller. As soon as the NSX Controller receives this information, it removes the MAC address from its own MAC address table.

When a VM is moved by vMotion from one ESXi host to another, the following actions take place:

  • The source ESXi host informs the NSX Controller, which removes the MAC address from its MAC address table.
  • The destination ESXi host informs the NSX Controller that it has a new VM and a new MAC address. The NSX Controller adds this MAC address to its MAC table with the destination ESXi host's VTEP IP.
  • A Reverse ARP (RARP) is sent on the logical switch so that all other VTEPs learn about the MAC address.
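
The three vMotion steps above amount to a remove, an add and a notification. The sketch below shows that sequence on a toy controller MAC table. The class, the `vmotion` helper and the addresses are hypothetical, and the RARP is represented only as a returned dictionary rather than a real frame.

```python
class ControllerMacTable:
    """Toy controller MAC table mapping (VNI, MAC) to a VTEP IP."""

    def __init__(self):
        self.entries = {}  # (vni, mac) -> VTEP IP

    def add(self, vni, mac, vtep_ip):
        self.entries[(vni, mac)] = vtep_ip

    def remove(self, vni, mac):
        self.entries.pop((vni, mac), None)


def vmotion(table, vni, mac, dst_vtep_ip):
    # Step 1: the source host's report removes the old mapping.
    table.remove(vni, mac)
    # Step 2: the destination host's report adds the MAC with its VTEP IP.
    table.add(vni, mac, dst_vtep_ip)
    # Step 3: a RARP announces the MAC on the logical switch.
    return {"type": "RARP", "vni": vni, "mac": mac}


macs = ControllerMacTable()
macs.add(5555, "00:50:56:aa:00:01", "10.0.1.11")
rarp = vmotion(macs, 5555, "00:50:56:aa:00:01", "10.0.2.12")
new_location = macs.entries[(5555, "00:50:56:aa:00:01")]
```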

If the logical switch receives traffic for a destination MAC address that has no entry in its MAC address table, the ESXi host asks the NSX Controller responsible for that VNI for information about the destination MAC address. If the NSX Controller has that MAC address in its MAC address table, it sends a response to the ESXi host, which adds the entry to its MAC address table with a dead timer of 200 seconds.
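
The query-and-cache behavior can be sketched as a lookup with a fallback. The function name, the dictionary-based tables and the addresses are assumptions made for illustration. The sketch also assumes the 200-second timer starts when the entry is learned and is not refreshed by later traffic, which the text does not specify.

```python
QUERIED_ENTRY_TTL_S = 200  # dead timer for entries learned from the controller


def resolve_mac(local_table, controller_table, vni, mac, now):
    """Return the VTEP IP for a MAC, or None if nobody knows it."""
    key = (vni, mac)
    cached = local_table.get(key)
    if cached is not None:
        vtep_ip, expires_at = cached
        if now < expires_at:
            return vtep_ip
        del local_table[key]                  # dead timer expired
    vtep_ip = controller_table.get(key)       # query the responsible controller
    if vtep_ip is None:
        return None                           # controller has no entry either
    local_table[key] = (vtep_ip, now + QUERIED_ENTRY_TTL_S)
    return vtep_ip


controller = {(5555, "00:50:56:bb:00:02"): "10.0.2.12"}
local = {}

first = resolve_mac(local, controller, 5555, "00:50:56:bb:00:02", now=0)
controller.clear()  # pretend the controller later loses the entry
cached = resolve_mac(local, controller, 5555, "00:50:56:bb:00:02", now=199)
expired = resolve_mac(local, controller, 5555, "00:50:56:bb:00:02", now=200)
```

A `None` result corresponds to the case where the frame has to be replicated instead.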

ARP table:

The Switch Security Module (SSM) of the ESXi host maintains an ARP table for each logical switch by snooping ARP replies and DHCP acknowledgements. The NSX Controller also maintains another ARP table per logical switch.

If an ESXi host updates its ARP table for any connected VM, it informs the NSX Controller, which updates its ARP table accordingly. Two ESXi hosts with the same VNI do not synchronize their ARP tables with each other.

Each ARP table entry has the following fields:

  • VNI
  • MAC address
  • IP address

The figure below shows the local ARP table created by each ESXi host being shared with the NSX Controller. This update from the ESXi host to the NSX Controller is called an IP report.

When a VM sends an ARP request, it is processed by the SSM, which looks for a matching entry in its ARP table. If a match is found, the SSM sends the ARP reply to the requester and does not broadcast the ARP request. If there is no matching entry, the SSM requests the entry from the NSX Controller responsible for that VNI.

Unknown Unicast or ARP Request:

If the logical switch receives a unicast frame from a VM whose destination MAC address is not present in the local MAC address table (an unknown unicast), the logical switch queries the NSX Controller responsible for that VNI about the destination MAC address.

If the NSX Controller has the information, it replies to the logical switch on the requesting ESXi host, which adds the MAC address to its local MAC address table with a dead timer of 200 seconds.

If an ARP request is received from a local VM and no matching entry is present in the SSM's ARP table, the SSM queries the NSX Controller. If the NSX Controller responsible for that VNI has the requested entry, it replies to the SSM of the requesting ESXi host. On receiving the ARP entry, the SSM adds it to its ARP table and sends an ARP reply to the VM that made the request.
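
The ARP path described above can be summarized as a two-level lookup. The function below is a hedged sketch: its name, its return values and the plain dictionaries standing in for the SSM and controller ARP tables are all hypothetical.

```python
def handle_arp_request(ssm_arp, controller_arp, vni, target_ip):
    """Decide how the SSM answers an ARP request from a local VM."""
    key = (vni, target_ip)
    mac = ssm_arp.get(key)
    if mac is None:
        # No local entry: ask the controller responsible for this VNI.
        mac = controller_arp.get(key)
        if mac is None:
            return ("no_entry", None)      # nobody can answer from a table
        ssm_arp[key] = mac                 # cache the controller's answer
    # The SSM replies itself, so the request is not broadcast.
    return ("arp_reply", mac)


ssm = {(5555, "192.168.10.20"): "00:50:56:aa:00:01"}
ctrl_arp = {(5555, "192.168.10.30"): "00:50:56:bb:00:02"}

local_hit = handle_arp_request(ssm, ctrl_arp, 5555, "192.168.10.20")
controller_hit = handle_arp_request(ssm, ctrl_arp, 5555, "192.168.10.30")
miss = handle_arp_request(ssm, ctrl_arp, 5555, "192.168.10.99")
```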

Replication Mode:

When a logical switch receives a frame from a locally attached VM whose destination MAC address is not in its local MAC address table, or when the SSM receives an ARP request with no matching entry in its ARP table, a request is sent to the NSX Controller. But what happens when any of the following conditions applies to the logical switch?

  • The logical switch receives a broadcast from a locally connected VM that is not an ARP request.
  • The logical switch receives a multicast from a locally connected VM that is not covered by IGMP snooping.
  • The NSX Controller has no entry in its MAC table for the unknown unicast MAC address.
  • The NSX Controller has no entry in its ARP table for the requested ARP.
  • The NSX Controller is down or unavailable.

If any of these conditions is met, the logical switch falls back to replication. Three replication modes are available:

  • Multicast (this mode does not use the VTEP table)
  • Unicast
  • Hybrid

The replication mode is configured when the transport zone is created, and it can be overridden when a logical switch is created.
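
The five trigger conditions can be folded into a single predicate. The sketch below is one possible reading of them, not NSX code: the frame kinds are hypothetical string labels, and `igmp_snooped` is an assumed flag for multicast traffic that IGMP snooping already handles.

```python
def needs_replication(kind, controller_up=True, controller_has_entry=False,
                      igmp_snooped=False):
    """Return True if the frame must be replicated to other VTEPs."""
    if kind == "broadcast":
        # A broadcast that is not an ARP request is always replicated.
        return True
    if kind == "multicast":
        return not igmp_snooped
    if kind in ("unknown_unicast", "arp_request"):
        # The controller is asked first; replicate if it cannot answer.
        return (not controller_up) or (not controller_has_entry)
    raise ValueError(f"unexpected frame kind: {kind}")


answered = needs_replication("arp_request", controller_has_entry=True)
unanswered = needs_replication("unknown_unicast", controller_has_entry=False)
controller_down = needs_replication("arp_request", controller_up=False,
                                    controller_has_entry=True)
plain_broadcast = needs_replication("broadcast")
```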

Multicast Replication mode:

For multicast mode to work, when at least one VM is powered on behind a VTEP, the ESXi host sends an IGMP join request over the VXLAN VMkernel port for the multicast group provided by NSX Manager. This makes the VTEP both a source and a receiver for that multicast group. Likewise, when the last VM on the host is powered off or moved away by vMotion, the ESXi host sends an IGMP leave request for the multicast group.

In multicast replication mode, if the logical switch does not have the destination MAC address in its MAC address table, it VXLAN-encapsulates the frame and sends it with the multicast address for that VNI as the destination IP.

Every VTEP that receives the multicast VXLAN frame decapsulates it and sends a copy of the BUM frame to every powered-on VM connected to the logical switch.

For a logical switch configured with multicast replication mode, the NSX Controller does not keep a VTEP table, ARP table, or MAC table.

The multicast group for a VNI is configured during segment ID creation, where a pool of VNIs is provided along with the option to enable multicast addressing. If the VNI pool is larger than the multicast address pool, NSX Manager maps multiple VNIs to the same multicast address.
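
To make the many-to-one mapping concrete, the sketch below assigns groups round-robin. The round-robin rule is purely an assumption chosen for illustration, since the text does not say how NSX Manager picks the shared address. The function name and the example ranges are also hypothetical.

```python
import ipaddress


def map_vnis_to_groups(vni_start, vni_end, first_group, group_count):
    """Assign each VNI a multicast group, wrapping when groups run out."""
    base = ipaddress.IPv4Address(first_group)
    return {
        vni: str(base + (index % group_count))   # assumed round-robin rule
        for index, vni in enumerate(range(vni_start, vni_end + 1))
    }


# Six VNIs but only four multicast addresses: two addresses are shared.
mapping = map_vnis_to_groups(5000, 5005, "239.1.1.1", 4)
shared = [vni for vni, group in mapping.items() if group == "239.1.1.1"]
```

With a shared group, a VTEP that joined for one VNI also receives BUM traffic for the other VNIs mapped to the same address and has to discard frames for VNIs it does not host.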

For Multicast to work, PIM and IGMP must be configured in the underlay.

Disadvantage: whenever a VM is powered on, multicast BUM traffic is generated and has to be processed by every ESXi host on the same VNI.

Unicast Replication mode and Proxy VTEP:

When multicast replication mode is used, the NSX Controller does not hold VTEP, ARP, or MAC tables for that VNI. In unicast mode, a frame is replicated using unicast copies if any of the following conditions is true:

  • The logical switch receives a broadcast from a locally connected VM that is not an ARP request.
  • The logical switch receives a multicast from a locally connected VM that is not covered by IGMP snooping.
  • The NSX Controller has no entry in its MAC table for the unknown unicast MAC address.
  • The NSX Controller has no entry in its ARP table for the requested ARP.
  • The NSX Controller is down or unavailable.

The disadvantage of this mode is that if the VTEP table is very large, the source VTEP has to send a large number of unicast copies (one copy per VTEP in the VTEP table).

To reduce this impact, the proxy VTEP function is used. The source VTEP sends the BUM packet to a proxy VTEP, and the proxy VTEP then unicasts the BUM traffic to all VTEPs in its own VTEP subnet, provided the BUM packet has the replication bit set to 1. The bit is set by the source VTEP that generated the BUM packet. The source VTEP also sends unicast copies of the BUM traffic to the VTEPs in its local VTEP subnet. The source ESXi host selects the proxy VTEP at random from the VTEP table, one per VTEP subnet.
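
The saving from proxy VTEPs is easiest to see by listing the copies the source VTEP actually sends. The function below is a sketch under stated assumptions: the VTEP table is reduced to a hypothetical `{vtep_ip: subnet}` dictionary, the function name and addresses are made up, and the replication bit is shown as a plain 0 or 1.

```python
import random
from collections import defaultdict


def unicast_replication_plan(source_ip, vtep_table, rng=random):
    """Return (destination VTEP IP, replication bit) for each copy sent."""
    by_subnet = defaultdict(list)
    for ip, subnet in vtep_table.items():
        if ip != source_ip:
            by_subnet[subnet].append(ip)

    local_subnet = vtep_table[source_ip]
    plan = []
    for subnet, ips in by_subnet.items():
        if subnet == local_subnet:
            # Local subnet: one plain copy per VTEP, no further replication.
            plan.extend((ip, 0) for ip in ips)
        else:
            # Remote subnet: one copy to a randomly chosen proxy VTEP,
            # with the replication bit set so the proxy fans it out.
            plan.append((rng.choice(ips), 1))
    return plan


vteps = {
    "10.0.1.11": "10.0.1.0/24", "10.0.1.12": "10.0.1.0/24",
    "10.0.1.13": "10.0.1.0/24",
    "10.0.2.21": "10.0.2.0/24", "10.0.2.22": "10.0.2.0/24",
    "10.0.2.23": "10.0.2.0/24",
    "10.0.3.31": "10.0.3.0/24", "10.0.3.32": "10.0.3.0/24",
}
plan = unicast_replication_plan("10.0.1.11", vteps, random.Random(0))
local_copies = sorted(ip for ip, bit in plan if bit == 0)
proxy_subnets = {vteps[ip] for ip, bit in plan if bit == 1}
```

In this example the source sends four copies (two local, two to proxies) instead of the seven it would need without proxy VTEPs.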

Hybrid Replication Mode:

Hybrid replication mode combines unicast and multicast replication. This mode uses the principal VTEP, ARP, and MAC tables, and a frame is replicated using hybrid replication if any of the following conditions is true:

  • The logical switch receives a broadcast from a locally connected VM that is not an ARP request.
  • The logical switch receives a multicast from a locally connected VM that is not covered by IGMP snooping.
  • The NSX Controller has no entry in its MAC table for the unknown unicast MAC address.
  • The NSX Controller has no entry in its ARP table for the requested ARP.
  • The NSX Controller is down or unavailable.

How it works: the source VTEP sends a single unicast VXLAN frame, with the replication bit set to 1, to the proxy VTEP. On receipt, the proxy VTEP sends a single multicast VXLAN frame to its local VTEP subnet. In this mode the proxy VTEP is called a multicast proxy VTEP (MTEP).
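
The difference between a unicast-mode proxy and an MTEP lies in what the proxy sends after receiving a frame with the replication bit set. The sketch below contrasts the two. The function name, the mode strings, the multicast group and the addresses are hypothetical, and the frames are represented as simple tuples.

```python
def proxy_forward(mode, proxy_ip, subnet_vteps, replication_bit,
                  multicast_group=None):
    """List the frames a proxy VTEP emits into its own VTEP subnet."""
    if replication_bit != 1:
        return []                                  # nothing to fan out
    if mode == "unicast":
        # Unicast mode: one unicast copy per other VTEP in the subnet.
        return [("unicast", ip) for ip in subnet_vteps if ip != proxy_ip]
    if mode == "hybrid":
        # Hybrid mode: the MTEP sends a single multicast frame instead.
        return [("multicast", multicast_group)]
    raise ValueError(f"unexpected mode: {mode}")


subnet = ["10.0.2.21", "10.0.2.22", "10.0.2.23", "10.0.2.24"]
unicast_frames = proxy_forward("unicast", "10.0.2.21", subnet, 1)
hybrid_frames = proxy_forward("hybrid", "10.0.2.21", subnet, 1,
                              multicast_group="239.1.1.1")
```

In this example the unicast proxy emits three frames while the MTEP emits one, which is why hybrid mode scales better inside large VTEP subnets.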

