Cisco vManage, vSmart, and vBond Redundancy

The best practice in any network setup is to install redundant hardware at all levels, including duplicate routers and other systems, redundant fans, power supplies, and other hardware components within these devices, and backup network connections. Providing high availability in the Cisco SD-WAN solution is no different. A network design that is resilient in the face of hardware failure should include redundant Cisco vBond orchestrators, Cisco vSmart controllers, WAN Edge routers, and any available redundant hardware components. Recovery from the total failure of a hardware component in the Cisco SD-WAN overlay network happens in basically the same way as in any other network: A backup component has been preconfigured, and it can perform and take over all necessary network functions by itself.

The Cisco SD-WAN control plane operates with redundant components to help ensure that the overlay network remains resilient if one of the components fails. The control plane is built on top of DTLS connections between Cisco devices, and it is monitored by the Cisco SD-WAN Overlay Management Protocol (OMP). OMP establishes peering sessions between Cisco vSmart controllers and WAN Edge routers, and between pairs of Cisco vSmart controllers. These peering sessions allow OMP to monitor the status of the Cisco devices and to share that information among them so that each device has a consistent view of the overlay network.

The exchange of control plane information over OMP peering sessions is a key piece in the Cisco SD-WAN high availability solution:

Cisco vSmart controllers quickly and automatically learn when a Cisco vBond orchestrator or a router joins or leaves the network. They can then rapidly make the necessary modifications to the route information that they send to the routers.

Cisco vBond orchestrators quickly and automatically learn when a device joins the network and when a Cisco vSmart controller leaves the network. They can then rapidly make the necessary changes to the list of Cisco vSmart controller IP addresses they send to routers joining the network.

Cisco vBond orchestrators learn when a domain has multiple Cisco vSmart controllers and can then provide multiple Cisco vSmart controller addresses to routers joining the network.

Cisco vSmart controllers learn about the presence of other Cisco vSmart controllers, and they all automatically synchronize their route tables. If one Cisco vSmart controller fails, the remaining systems take over management of the control plane automatically, and all routers in the network continue to receive current, consistent routing and TLOC updates from the remaining Cisco vSmart controllers.

The combination of hardware component redundancy with the architecture of the Cisco SD-WAN control plane results in a highly available network that continues to operate normally and without interruption when a failure occurs in one of the redundant control plane components. Recovery from the total failure of a Cisco vSmart controller, Cisco vBond orchestrator, or WAN Edge router in the Cisco SD-WAN overlay network happens in basically the same way as the recovery from the failure of a regular router or server on the network—a preconfigured backup component can perform all necessary functions by itself.

In the Cisco SD-WAN solution, network operation continues without interruption when a network device fails and a redundant device is present. This process applies to all Cisco devices—Cisco vBond orchestrators, Cisco vSmart controllers, and WAN Edge routers. No user configuration is required to implement this behavior; it happens automatically. The OMP peering sessions between Cisco devices help to ensure that all the devices have a current and accurate view of the network topology.

Cisco vManage Redundancy

Cisco vManage is a centralized network management system that enables the configuration and management of Cisco devices in the overlay network. It also provides a real-time dashboard of the status of the network and network devices. Cisco vManage maintains permanent communication channels with all Cisco WAN Edge routers in the network. Over these channels, Cisco vManage pushes out files that list the serial numbers of all valid devices, pushes out each device's configuration, and pushes out new software images as part of a software upgrade process. From each network device, Cisco vManage receives various status information that is displayed on the Cisco vManage Dashboard and other screens.

A highly available Cisco SD-WAN network contains three or more Cisco vManage devices in each domain. This scenario is referred to as a cluster of Cisco vManage devices, and each Cisco vManage device in a cluster is referred to as a Cisco vManage node. As of the latest release of code, each large Cisco vManage node in a cluster can manage approximately 1500 devices, and a cluster of three large Cisco vManage nodes can manage up to 5000 devices. Cisco vManage devices automatically load-balance the devices that they manage. With three devices, the Cisco vManage cluster remains operational if one of the devices in that cluster fails.

A Cisco vManage cluster consists of the following architectural components:

  • Application server: Serves as a web server for user sessions. Through these sessions, a logged-in user can view a high-level dashboard summary of network events and status and drill down to view details of these events. A user can also manage network serial number files, certificates, software upgrades, device reboots, and Cisco vManage cluster configuration from the Cisco vManage application server.
  • Configuration database: Stores the inventory, state, and the configurations for all Cisco WAN Edge routers.
  • Network configuration system: Stores all configuration information, policies, templates, certificates, and more.
  • Statistics database: Stores the statistical information collected from all Cisco devices in the overlay network.
  • Message bus: Communication bus among the different Cisco vManage nodes. This bus is used to share data and coordinate operations among Cisco vManage nodes in the cluster.
  • Load balancer: The Cisco vManage cluster requires a load balancer to distribute user login sessions among Cisco vManage nodes in the cluster. It is recommended that you use an external load balancer. However, if you choose to use a Cisco vManage node for this purpose, all HTTP and HTTPS traffic directed to its IP address is redirected to other nodes in the cluster.
  • Coordination server: It is used internally by the Messaging server.

The Statistics database and Configuration database services must run on an odd number of Cisco vManage nodes, with a minimum of three. For these databases to be writeable, a quorum of Cisco vManage nodes must be running and in sync. A quorum is a simple majority. For example, if you have a cluster of three Cisco vManage devices running these databases, then at least two of them must be running and in sync. Initially, all Cisco vManage devices run the same services. However, you can choose not to run some services on some devices. From the Cluster Management window in the Cisco vManage GUI, you can choose the services that run on each Cisco vManage device. You can add a fourth Cisco vManage device to load-balance more Cisco WAN Edge routers. In such a case, disable the Statistics database and Configuration database on one of the Cisco vManage devices because those services need to run on an odd number of devices. Optionally, you can run the Configuration database on a single Cisco vManage device to reduce the amount of information that is shared between the devices and reduce the load.
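The quorum rule above is simple-majority arithmetic, and it can be sketched in a few lines of Python. This is only an illustration of the rule as stated; the function names are hypothetical and are not part of any Cisco vManage API.

```python
def quorum_size(db_nodes: int) -> int:
    """Return the smallest simple majority of db_nodes."""
    if db_nodes < 1:
        raise ValueError("a cluster needs at least one database node")
    return db_nodes // 2 + 1


def is_writeable(db_nodes: int, nodes_in_sync: int) -> bool:
    """True if enough database nodes are running and in sync to form a quorum."""
    return nodes_in_sync >= quorum_size(db_nodes)


def tolerated_failures(db_nodes: int) -> int:
    """Number of database nodes that can fail while a quorum still exists."""
    return db_nodes - quorum_size(db_nodes)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "nodes: quorum", quorum_size(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

A four-node database cluster tolerates the same single failure as a three-node one, which is one way to see why the database services are run on an odd number of nodes.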

Because so much information is shared between Cisco vManage devices, a separate interface is used for communication within the cluster. Do not attempt to use the same interface for the cluster and for control connections to the other devices. This interface must be in VPN 0, and the latency between the devices must be very low. Changes to the cluster configuration require at least a reload of the NMS services, and often a full reboot. Make any Cisco vManage cluster configuration changes during a maintenance window.

The Cisco vManage cluster implements an active-active architecture in the following way:

  • Each Cisco vManage node in the cluster is an independent processing node.
  • All user sessions to the application server are load-balanced, using either an internal Cisco vManage load balancer or an external load balancer.
  • All control sessions between Cisco vManage application servers and the routers are load-balanced. A single large Cisco vManage node can manage a maximum of about 1500 Cisco WAN Edge routers. The controller sessions, by contrast, are arranged in a full-mesh topology: every Cisco vManage node maintains a session with each Cisco vSmart controller and each Cisco vBond orchestrator.
  • The configuration and statistics databases can be replicated across Cisco vManage nodes so that all Cisco vManage nodes can access and use them.
  • If one of the Cisco vManage nodes in the cluster fails or otherwise becomes unavailable, the network management services provided by Cisco vManage are still fully available across the network.

The message bus among Cisco vManage nodes in the cluster allows all the nodes to communicate using an out-of-band network. This design leverages a third virtual network interface card (vNIC) on the Cisco vManage virtual machine and avoids using WAN bandwidth for management traffic.

  • All servers in the cluster act as active-active nodes:
    • All cluster members must be in the same data center in the metro area.
  • For geo redundancy, vManage servers operate in active-standby mode:
    • Not clustered
    • Database replication between sites
  • Loss of all vManage servers has no impact on fabric operation:
    • No administrative changes
    • No statistics collection

From any other device, such as a WAN Edge, Cisco vManage appears as a single IP address. Load balancing between devices within the cluster is not visible to other devices.

Within a cluster, all Cisco vManage nodes are in active-active mode. For this mode to work, all cluster members must be in the same metro area to keep the latency between individual servers low. If failover between regions is required, only an active-standby model is supported. The Cisco vManage database is synchronized between sites, but only the active Cisco vManage cluster is visible to the network during normal operation.

A complete loss of Cisco vManage does not impact the data plane operation. Traffic flows as before. In the absence of an available Cisco vManage, no administrative changes can be made because they must be made on Cisco vManage. There is no statistics collection of data because WAN Edge routers report statistics directly to Cisco vManage.

Cisco vSmart Redundancy

Cisco vSmart controllers are the central orchestrators of the control plane. They have permanent communication channels with all Cisco devices in the network. Over the DTLS connections between Cisco vSmart controllers and Cisco vBond orchestrators and between pairs of Cisco vSmart controllers, the devices regularly exchange their network view, to help ensure that their route tables remain synchronized. Cisco vSmart controllers pass accurate and timely route information over DTLS connections to WAN Edge routers.

A highly available Cisco SD-WAN network contains two or more vSmart controllers in each domain. A Cisco SD-WAN domain can have up to 12 Cisco vSmart controllers, and each WAN Edge router, by default, connects to two of them over each transport. When the number of Cisco vSmart controllers in a domain is greater than the maximum number of controllers that a domain's routers can connect to, Cisco SD-WAN software load-balances the connections among the available Cisco vSmart controllers.

While the configurations on all Cisco vSmart controllers must be functionally similar, the control policies must be identical to guarantee that, at any time, all WAN Edge routers receive consistent views of the network. If the control policies are not absolutely identical, different Cisco vSmart controllers might give different information to a WAN Edge router, and the likely result will be network connectivity issues.

  • If all Cisco vSmart controllers fail or become unreachable, WAN Edge routers continue operating on the last-known good state for a configurable amount of time (12 hours, by default).
    • No changes can be made to the control plane during this time.
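The timer behavior in the list above can be expressed as a small check. This is a minimal sketch that assumes only what the text states: a configurable hold time that defaults to 12 hours. The constant and function names are hypothetical, and treating the exact expiry instant as "expired" is an assumption made here for illustration.

```python
# Default hold time from the text: 12 hours, expressed in seconds.
DEFAULT_HOLD_SECONDS = 12 * 60 * 60


def uses_last_known_state(seconds_since_loss: float,
                          hold_seconds: int = DEFAULT_HOLD_SECONDS) -> bool:
    """True while a router would still operate on its last-known good state.

    seconds_since_loss is the time elapsed since the router lost contact
    with every vSmart controller (hypothetical input for this sketch).
    """
    if seconds_since_loss < 0:
        raise ValueError("elapsed time cannot be negative")
    return seconds_since_loss < hold_seconds


if __name__ == "__main__":
    print(uses_last_known_state(3 * 3600))    # 3 hours after the loss
    print(uses_last_known_state(13 * 3600))   # 13 hours after the loss
```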

You can place Cisco vSmart controllers anywhere in the network. It is highly recommended that Cisco vSmart controllers be geographically dispersed for availability. Cisco vSmart controllers are the primary controllers of the network. To maintain this control, they maintain permanent DTLS connections to all Cisco vBond orchestrators and WAN Edge routers. These connections allow Cisco vSmart controllers to be constantly aware of any changes in the network topology. When a network has multiple Cisco vSmart controllers:

  • There is a full mesh of OMP sessions among the vSmart controllers. Over the OMP sessions, Cisco vSmart controllers advertise routes, TLOCs, services, policies, and encryption keys. This exchange of information allows Cisco vSmart controllers to remain synchronized.
  • Each vSmart controller has a permanent DTLS connection to each vBond orchestrator.
  • The vSmart controllers have permanent DTLS connections to the WAN Edge routers. Specifically, each router has a DTLS connection to at least one of the vSmart controllers.

If one of the Cisco vSmart controllers fails, the other Cisco vSmart controllers seamlessly take over handling network control. The remaining Cisco vSmart controllers can work with WAN Edge routers joining the network and continue sending route updates to the routers. As long as one Cisco vSmart controller is present and operating in the domain, the Cisco SD-WAN network can continue operating without interruption. The Cisco SD-WAN overlay network works properly only when the control policies on all Cisco vSmart controllers are identical. Even the slightest difference in the policies can cause problems with the functioning of the network.
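Because the policies must match exactly, one way to audit them is to compare a fingerprint of each controller's policy text. The sketch below assumes that you have already exported each controller's control policy as a string by some means outside this example. The function names and the controller names are hypothetical, and the comparison is byte-for-byte, so even a whitespace difference counts as a mismatch.

```python
import hashlib
from typing import Dict, List


def policy_fingerprint(policy_text: str) -> str:
    """Return a SHA-256 hex digest of the policy text."""
    return hashlib.sha256(policy_text.encode("utf-8")).hexdigest()


def group_by_policy(policies: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each distinct policy fingerprint to the controllers that carry it."""
    groups: Dict[str, List[str]] = {}
    for controller, text in sorted(policies.items()):
        groups.setdefault(policy_fingerprint(text), []).append(controller)
    return groups


def policies_identical(policies: Dict[str, str]) -> bool:
    """True if every controller carries exactly the same policy text."""
    return len(group_by_policy(policies)) <= 1


if __name__ == "__main__":
    exported = {
        "vsmart-1": "policy\n control-policy hub-spoke\n",
        "vsmart-2": "policy\n control-policy hub-spoke\n",
        "vsmart-3": "policy\n control-policy hub-spoke \n",  # stray space
    }
    print(policies_identical(exported))
    for digest, names in group_by_policy(exported).items():
        print(digest[:12], names)
```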

Cisco vBond Redundancy

Cisco vBond performs two key functions in the Cisco SD-WAN overlay network:

  • It authenticates and validates all Cisco vSmart controllers and WAN Edge routers that attempt to join the Cisco SD-WAN network.
  • It orchestrates the control plane connections between Cisco vSmart controllers and WAN Edge routers, thus enabling Cisco vSmart controllers and WAN Edge routers to connect to each other in the Cisco SD-WAN network.

Cisco vBond runs as a virtual machine on a network server. Cisco vBond can also run on a WAN Edge router configured to be a Cisco vBond, which is not recommended and limits the number of router control connections to 50. If running the Cisco vBond daemon on a router, note that only one Cisco vBond daemon can run at a time on a router. To provide redundancy and high availability, the network must have two or more routers that function as Cisco vBond orchestrators.

Multiple Cisco vBond orchestrators help to ensure that one of them is always available whenever a Cisco device such as a WAN Edge router or a Cisco vSmart controller is attempting to join the network.

Considerations for Cisco SD-WAN deployment:

  • Transient connection.
  • Fully qualified domain name (FQDN) must be configured.
  • DNS load sharing.
  • No impact if WAN Edge routers can connect to at least one Cisco vBond.
  • If all Cisco vBonds fail or become unreachable, no new WAN Edge routers can join the overlay.

Every time that a WAN Edge boots, the WAN Edge connects first to Cisco vBond. For this reason, the IP address or hostname of Cisco vBond must be configured on each WAN Edge or provided through plug-and-play or Zero-Touch Provisioning (ZTP). Cisco vBond provides the WAN Edge router with all information that it must have to operate, such as which other controllers to connect to.

A WAN Edge router learns that it acts as a Cisco vBond orchestrator from its configuration. You include the local option in the system vbond configuration command, which defines the IP address (or addresses) of Cisco vBond in the Cisco SD-WAN overlay network. In this command, you also include the local public IP address of Cisco vBond. Although you can specify the Cisco vBond address as a DNS name on Cisco WAN Edge routers and Cisco vSmart controllers, on Cisco vBond itself you must specify it as an IP address.

On Cisco vSmart controllers and Cisco WAN Edge devices, when the network has only a single Cisco vBond, you can configure the location of the Cisco vBond system either as an IP address or as a DNS name (such as vbond.cisco.com). When the network has two or more Cisco vBond orchestrators, use a DNS name so that all of them are reachable. The DNS server then resolves the name to a single IP address that the Cisco vBond returns to the Cisco WAN Edge router. If the DNS name resolves to multiple IP addresses, Cisco vBond returns them all to the Cisco WAN Edge router, and the router tries each address sequentially until it forms a successful connection.

Note that even if your Cisco SD-WAN network has only a single Cisco vBond, it is best practice to specify a DNS name rather than an IP address in the system vbond configuration command so that the configuration scales. Then, if you add more Cisco vBond orchestrators to your network, you do not need to change the configurations on any of the routers or Cisco vSmart controllers in your network.

In a network with multiple Cisco vBond orchestrators, if one fails, the other Cisco vBond orchestrators simply continue operating. They can handle all requests by Cisco SD-WAN devices to join the network. From a control plane point of view, each Cisco vBond maintains a permanent DTLS connection to each Cisco vSmart controller in the network. However, note that there are no connections between Cisco vBond orchestrators. As long as one Cisco vBond is present in the domain, the Cisco SD-WAN network continues operating without interruption because Cisco vSmart controllers and routers can still locate each other and join the network.

Because Cisco vBond orchestrators never participate in the data plane of the overlay network, the failure of any Cisco vBond orchestrator has no impact on data traffic. Cisco vBond orchestrators communicate with routers only when the routers are first joining the network. The joining router establishes a transient DTLS connection with a Cisco vBond to learn the IP address of a Cisco vSmart controller. When the Cisco WAN Edge router configuration lists the Cisco vBond address as a DNS name, the router tries each of the Cisco vBond orchestrators in the list, one by one, until it can establish a DTLS connection. This mechanism allows a router to always be able to join the network, even after one of a group of Cisco vBond orchestrators has failed.
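The "try each address in turn" behavior described above can be sketched as follows. This is an illustration of the sequential-retry idea only, not the router's actual implementation: resolve_vbond and first_reachable are hypothetical names, and the DTLS handshake is represented by a caller-supplied try_connect function rather than by real DTLS code.

```python
import socket
from typing import Callable, Iterable, List, Optional


def resolve_vbond(hostname: str) -> List[str]:
    """Resolve a DNS name to its unique IP addresses, preserving order."""
    addresses: List[str] = []
    for info in socket.getaddrinfo(hostname, None):
        address = info[4][0]  # sockaddr tuple starts with the IP address
        if address not in addresses:
            addresses.append(address)
    return addresses


def first_reachable(addresses: Iterable[str],
                    try_connect: Callable[[str], bool]) -> Optional[str]:
    """Try each address one by one and return the first that connects.

    try_connect stands in for the transient DTLS connection attempt and
    must return True on success. Returns None if every attempt fails.
    """
    for address in addresses:
        if try_connect(address):
            return address
    return None


if __name__ == "__main__":
    # Simulate three orchestrators where the first one has failed.
    candidates = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    failed = {"192.0.2.1"}
    print(first_reachable(candidates, lambda addr: addr not in failed))
```

Passing the connection attempt in as a function keeps the retry logic separate from the transport, so the loop can be exercised without a live network.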

