VXLAN Forwarding in ACI
ACI forwards both Layer 2 and Layer 3 traffic over a VXLAN overlay. In ACI, leaf nodes are called PTEPs (Physical Tunnel End Points); in general VXLAN terminology, leaf switches are called VTEPs (VXLAN Tunnel End Points). Layer 2 (switched) traffic carries a VXLAN Network Identifier (VNID) that identifies the bridge domain, and Layer 3 (routed) traffic carries a VNID that identifies the VRF. Encapsulation and decapsulation of the VXLAN header are done on the VTEP.
The figure below shows the spine and leaf switches; the leaf switches act as the VTEPs.
VXLAN also allows the location of an endpoint to be mapped to its identity. In Cisco ACI, the endpoint’s IP address is the identifier, and a VTEP address designates the location (the leaf) where the endpoint is connected. Cisco ACI uses a dedicated VRF and the uplink interfaces as the infrastructure that carries VXLAN traffic. This transport infrastructure is known as Overlay-1, and it exists as part of tenant Infra.
The Overlay-1 VRF in ACI contains /32 routes to each VTEP, each vPC virtual IP address, each APIC, and the spine proxy IP address.
TEP IP addresses:
PTEP IP address :- This is the IP address that the APIC assigns to the leaf's loopback interface from the infrastructure subnet, which was configured during the initial APIC setup. This address is used for communication with the APIC and with other leafs, for MP-BGP peering, and for traceroute or ping.
Proxy TEP IP address :- This is an anycast IP address that is present on all spines and is used for forwarding lookups into the mapping database.
FTEP IP address :- This address is used when a VMM domain (ESXi environment) is present. A fabric loopback TEP (FTEP) is used to encapsulate traffic in VXLAN toward a vSwitch VTEP. It is a single address that is identical on all leaf nodes, which allows mobility of downstream VTEP devices.
vPC loopback VTEP address :- This IP address is used when the two leaf nodes of a vPC pair forward traffic that enters through a vPC port. The leaf forwards the traffic using VXLAN encapsulation. This address is shared with the vPC peer.
The following control-plane protocols run inside the fabric:
- Intermediate System-to-Intermediate System (IS-IS) protocol runs on the interfaces between leaf and spine to maintain infrastructure reachability.
- Council of Oracles Protocol (COOP) runs on the PTEP loopback address to synchronize the endpoint database (mapping table) on the spine switches and keep it consistent. COOP assigns roles to spines and leafs: all spines are Oracles and all leafs are Citizens. Anything a Citizen learns is reported to the Oracles, and anything an Oracle learns is passed on to all other Oracles.
- MP-BGP also runs on the PTEP loopback and advertises all external WAN routes throughout the fabric.
- VXLAN tunnels are created from each leaf to the PTEPs of the other leafs and to the spine proxy TEPs.
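The Oracle and Citizen roles described for COOP can be illustrated with a toy model. This is only a sketch of the reporting and replication idea, not the actual protocol: the class names, the `report` and `learn_local` methods, and the endpoint MAC addresses are all hypothetical, and the two leaf addresses are simply the PTEPs used elsewhere in this article.

```python
class Oracle:
    """Spine role: holds a copy of the full endpoint mapping database."""

    def __init__(self, name):
        self.name = name
        self.mapping_db = {}  # endpoint identity -> PTEP of the owning leaf
        self.peers = []       # the other oracles in the fabric

    def report(self, endpoint, ptep):
        """Record a citizen's report, then pass it on to every other oracle."""
        self.mapping_db[endpoint] = ptep
        for peer in self.peers:
            peer.mapping_db[endpoint] = ptep


class Citizen:
    """Leaf role: reports locally learned endpoints to an oracle."""

    def __init__(self, ptep, oracle):
        self.ptep = ptep
        self.oracle = oracle

    def learn_local(self, endpoint):
        self.oracle.report(endpoint, self.ptep)


spine1, spine2 = Oracle("spine1"), Oracle("spine2")
spine1.peers, spine2.peers = [spine2], [spine1]

leaf1 = Citizen("10.0.16.21", spine1)
leaf2 = Citizen("10.0.16.22", spine2)

leaf1.learn_local("00:00:00:00:00:0a")  # hypothetical endpoint MAC
leaf2.learn_local("00:00:00:00:00:0b")  # hypothetical endpoint MAC

print(spine1.mapping_db == spine2.mapping_db)  # True
print(spine2.mapping_db["00:00:00:00:00:0a"])  # 10.0.16.21
```

Whichever spine a leaf reports to, both spines end up with the same mapping, which is what lets the proxy TEP be an anycast address.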
Each leaf maintains a database of its VXLAN tunnels to all other leaf nodes on Overlay-1. To check it, start from the inventory of the fabric.
In this example we will see how leaf 10.0.16.21 has an overlay tunnel to 10.0.16.22.
VXLAN Headers for ACI Fabric:
In the ACI fabric, some extensions have been added to the VXLAN header to support the following features:
- Security zone segmentation (Tenant)
- Management of filtering rules and policies (Contracts / Filters)
- Enhanced load-balancing techniques
The VXLAN header used in the Cisco ACI fabric is shown below:
When a packet is VXLAN-encapsulated in ACI, the minimum MTU that the fabric ports need to support is the original MTU (1500 bytes) plus 50 bytes of encapsulation overhead.
Original MTU (1500) + 14 bytes (outer Ethernet frame header) + 20 bytes (IP header) + 8 bytes (UDP) + 8 bytes (iVXLAN) = 1550 bytes
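The arithmetic above can be checked with a few lines of Python. The constant and function names are illustrative only; the byte counts are the ones listed in the text, and the 9000 and 9150 values are the access-port and uplink MTUs this article mentions.

```python
# Per-layer overhead added by VXLAN encapsulation, in bytes (values from the text).
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
IVXLAN_HEADER = 8


def vxlan_overhead() -> int:
    """Total number of bytes added in front of the original packet."""
    return OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + IVXLAN_HEADER


def min_fabric_mtu(original_mtu: int) -> int:
    """Smallest fabric-port MTU that carries an encapsulated packet unfragmented."""
    return original_mtu + vxlan_overhead()


print(vxlan_overhead())      # 50
print(min_fabric_mtu(1500))  # 1550
print(min_fabric_mtu(9000))  # 9050, which fits within a 9150-byte uplink MTU
```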
The Cisco ACI fabric uplinks are configured with an MTU of 9150 bytes, which is large enough to carry VXLAN-encapsulated jumbo frames from servers. The MTU of the fabric access ports is 9000 bytes, to accommodate servers sending jumbo frames.
Cisco uses some additional bits and fields in the VXLAN header for its ACI infrastructure. The additional fields are:
- Source Group: identifies the source EPG.
- P bit (Policy bit): when its value is 0, policy is not instantiated on the leaf; when its value is 1, policy is instantiated.
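As a baseline for these extensions, the sketch below packs and unpacks the standard 8-byte VXLAN header defined in RFC 7348: an 8-bit flags field, 24 reserved bits, a 24-bit VNID, and 8 more reserved bits. It deliberately does not model the ACI-specific Source Group or Policy bit, because their exact bit positions are not given in this article. The function names and the VNID value are hypothetical.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348


def pack_vxlan_header(vnid: int) -> bytes:
    """Build a standard 8-byte VXLAN header carrying the given 24-bit VNID."""
    if not 0 <= vnid < 2**24:
        raise ValueError("VNID must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: 24-bit VNID, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vnid << 8)


def unpack_vnid(header: bytes) -> int:
    """Extract the 24-bit VNID from the first 8 bytes of a VXLAN header."""
    _, word2 = struct.unpack("!II", header[:8])
    return word2 >> 8


header = pack_vxlan_header(15000)  # hypothetical bridge-domain VNID
print(len(header))                 # 8
print(unpack_vnid(header))         # 15000
```

In the standard layout the reserved bits are zero; ACI's extensions carry their extra information inside this same 8-byte header, which is why the MTU overhead stays at 8 bytes for iVXLAN.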
Underlay Multicast Trees:-
ACI implements and uses routed multicast trees in the underlay network to support multidestination traffic. If the bridge domain is not configured to use the mapping-database lookup, the location of a remote endpoint's MAC address (the VTEP IP address where the endpoint is connected) is learned as a result of flooding over the multicast tree.
Each bridge domain is assigned a group IP outer (GIPo) address, which is used for all multidestination traffic on that bridge domain inside the fabric.
In the figure below, each bridge domain has a multicast IP address, which in this example is 188.8.131.52.
The multicast tree in the underlay is set up automatically without any user configuration. The roots of the trees are always the spine switches, and traffic can be distributed along multiple trees according to a tag known as the Forwarding Tag ID (FTAG). Forwarding on the spine is independent of the bridge domain and uses only the GIPo address for the forwarding decision.
Endpoints in ACI:
ACI uses endpoints to forward traffic. An endpoint consists of one MAC address with zero or more IP addresses associated with it. The figure below shows an endpoint.
Local endpoints: Local endpoints are endpoints that are directly attached to the leaf switch. They are learned from the data plane and are the main source of endpoint information for the ACI fabric. When a leaf switch learns a directly connected endpoint, it puts the information in its LST (Local Station Table) and reports it to the COOP (Council of Oracles Protocol) database located on each spine switch. As a result, the spines are aware of all endpoints discovered by the leaf switches. Because this COOP database is available for lookups, each leaf does not have to know about all remote endpoints.
A leaf switch follows these steps to learn a local endpoint MAC address and IP address:
- The leaf receives a packet with a source MAC address (MAC A) and a source IP address (IP A).
- The Cisco ACI leaf learns MAC A as a local endpoint.
- The Cisco ACI leaf learns IP A tied to MAC A if the packet is an ARP packet.
- The Cisco ACI leaf learns IP A tied to MAC A if the packet is routed.
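The local learning steps above can be sketched as a small function. This is a simplified model under stated assumptions: the `LocalEndpoint` class, the `learn_local` function, its boolean parameters, and the sample addresses are hypothetical, and details such as bridge domain or VRF scoping are left out.

```python
from dataclasses import dataclass, field


@dataclass
class LocalEndpoint:
    mac: str
    ips: set = field(default_factory=set)


def learn_local(table: dict, src_mac: str, src_ip: str,
                is_arp: bool, is_routed: bool) -> LocalEndpoint:
    """Apply the local learning rules to one received packet."""
    # The source MAC is always learned as a local endpoint.
    endpoint = table.setdefault(src_mac, LocalEndpoint(src_mac))
    # The source IP is tied to the MAC only for ARP or routed packets.
    if is_arp or is_routed:
        endpoint.ips.add(src_ip)
    return endpoint


table = {}

# A plain switched (non-ARP) packet: only MAC A is learned.
learn_local(table, "00:00:00:00:00:0a", "192.168.1.10", is_arp=False, is_routed=False)
print(table["00:00:00:00:00:0a"].ips)  # set()

# An ARP packet from the same host: IP A is now tied to MAC A.
learn_local(table, "00:00:00:00:00:0a", "192.168.1.10", is_arp=True, is_routed=False)
print(table["00:00:00:00:00:0a"].ips)  # {'192.168.1.10'}
```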
Remote endpoints: Remote endpoints are endpoints that are not directly connected to the leaf; they are attached to another leaf and are learned as remote through the data plane. Only a leaf switch with actual traffic toward such an endpoint creates a cache entry for it (conversational learning), so that it can forward packets directly to the destination leaf. A remote endpoint entry holds either one MAC address or one IP address, whereas a local endpoint entry combines a MAC address with its IP addresses. The reason for this difference is that the IP-to-MAC next-hop resolution can be performed on the destination leaf, so the next-hop MAC address is not required just to reach the destination leaf.
An ACI leaf switch uses the following steps to learn a remote endpoint MAC or IP address:
1. The ACI leaf receives a packet with source MAC A and source IP A from a spine switch.
2. The ACI leaf learns MAC A as a remote endpoint if the VXLAN header contains bridge domain information.
3. The ACI leaf learns IP A as a remote endpoint if the VXLAN header contains VRF information.
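The remote learning rule can be sketched in the same spirit. The function name, the `vnid_scope` parameter, the cache key layout, and all VNID and address values are hypothetical; the sketch assumes the leaf can tell whether the VNID in the VXLAN header refers to a bridge domain or to a VRF, and that the outer source address identifies the leaf where the endpoint lives.

```python
def learn_remote(cache: dict, src_mac: str, src_ip: str,
                 source_tep: str, vnid_scope: str, vnid: int) -> tuple:
    """Cache a remote endpoint as MAC-only or IP-only, keyed by the VNID scope."""
    if vnid_scope == "bd":
        # Bridge domain VNID: the packet was switched, so learn the MAC.
        key = ("mac", vnid, src_mac)
    elif vnid_scope == "vrf":
        # VRF VNID: the packet was routed, so learn the IP.
        key = ("ip", vnid, src_ip)
    else:
        raise ValueError(f"unknown VNID scope: {vnid_scope!r}")
    cache[key] = source_tep  # location = TEP of the leaf the endpoint is behind
    return key


cache = {}
learn_remote(cache, "00:00:00:00:00:0b", "192.168.1.20",
             source_tep="10.0.16.22", vnid_scope="bd", vnid=15000)
learn_remote(cache, "00:00:00:00:00:0b", "192.168.1.20",
             source_tep="10.0.16.22", vnid_scope="vrf", vnid=2000000)

for key, tep in cache.items():
    print(key, "->", tep)
```

Each entry holds a MAC or an IP, never both, which matches the description of remote endpoints given earlier.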
ACI Tables and Databases:
ACI uses the following tables and databases for traffic forwarding.
Local Station Table : The LST contains the addresses of all hosts attached directly to the leaf. This table is populated when endpoints are discovered and is synchronized with the full GST on the spine proxy. When a BD is not configured for routing, this table learns only MAC addresses; when routing is enabled on the BD, this table learns both the IP address and the MAC address of endpoints.
Global Station Table : The GST contains the addresses of all hosts learned as remote endpoints through active conversations; these entries are cached locally.
The following information is present in these tables:
- Local MAC and IP entries of endpoints
- Remote MAC entries, if there is an active conversation: VRF, BD, MAC address
- Remote IP entries, if there is an active conversation: VRF, IP address
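How a leaf might consult these tables when forwarding can be sketched as a three-step lookup. This is a simplified model, assuming the bridge domain uses the mapping-database lookup so that unknown destinations go to the spine proxy rather than being flooded. The function name, table contents, port name, and proxy TEP address are all hypothetical.

```python
def forwarding_decision(dst: str, lst: dict, gst: dict, proxy_tep: str) -> tuple:
    """Return (action, target) for a destination address."""
    if dst in lst:
        # Directly attached endpoint: deliver on the local port.
        return ("local", lst[dst])
    if dst in gst:
        # Cached remote endpoint: tunnel straight to the owning leaf.
        return ("tunnel", gst[dst])
    # Unknown here: send to the spine proxy, which looks it up in the mapping database.
    return ("spine-proxy", proxy_tep)


lst = {"192.168.1.10": "eth1/1"}      # local endpoint -> front-panel port
gst = {"192.168.1.20": "10.0.16.22"}  # remote endpoint -> TEP of its leaf
PROXY_TEP = "10.0.0.65"               # hypothetical anycast proxy TEP

print(forwarding_decision("192.168.1.10", lst, gst, PROXY_TEP))  # ('local', 'eth1/1')
print(forwarding_decision("192.168.1.20", lst, gst, PROXY_TEP))  # ('tunnel', '10.0.16.22')
print(forwarding_decision("192.168.1.30", lst, gst, PROXY_TEP))  # ('spine-proxy', '10.0.0.65')
```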