The NSX Edge cluster connects the logical and physical worlds and usually hosts the NSX Edge Services Gateways and the DLR Control VMs.
In some deployments the Edge cluster may contain the NSX Controllers as well.
In this section we discuss how to design an Edge cluster that survives the failure of an ESXi host or an entire physical chassis while minimizing the outage time.
In the figure below we deploy two NSX Edges, E1 and E2, in ECMP mode, where they run active/active from the perspective of both the control and data planes. The DLR Control VMs run active/passive, while both E1 and E2 run a dynamic routing protocol with the active DLR Control VM.
When the DLR learns a new route from E1 or E2, it pushes this information to the NSX Controller cluster. The NSX Controller then updates the routing tables in the kernel of each ESXi host running this DLR instance.
In the scenario where the ESXi host that contains Edge E1 fails:
- The active DLR updates the NSX Controller to remove E1 as a next hop; the NSX Controller updates the ESXi hosts, and as a result the “Web” VM traffic is routed to Edge E2.
The time it takes to re-route the traffic depends on the dynamic routing protocol’s convergence time.
In the specific scenario where the failed ESXi host or chassis contained both Edge E1 and the active DLR, we would instead face a longer outage in the forwarded traffic.
The reason for this is that the active DLR is down and cannot detect the failure of Edge E1 and update the Controller accordingly. The ESXi hosts will continue to forward traffic to Edge E1 until the passive DLR becomes active, learns that Edge E1 is down, and updates the NSX Controller.
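The difference between the two failure scenarios can be sketched in a short simulation. This is a toy model only, assuming illustrative host names and timings; the real outage is governed by the routing protocol and the DLR HA failover, not by this code:

```python
# Toy model of next-hop convergence for the two failure scenarios.
# Host/VM names and the timing constants are illustrative assumptions.

def converge(failed_host, placement, routing_protocol_secs=3, ha_failover_secs=60):
    """Return the approximate traffic outage (seconds) after `failed_host` dies.

    `placement` maps VM name -> ESXi host. The active DLR can only
    withdraw E1 as a next hop if the DLR itself survived the failure.
    """
    edge_down = placement["E1"] == failed_host
    active_dlr_down = placement["DLR-active"] == failed_host

    if not edge_down:
        return 0  # E1 is still forwarding; nothing to converge
    if not active_dlr_down:
        # The active DLR detects E1's loss and updates the Controller:
        # the outage is bounded by routing-protocol convergence.
        return routing_protocol_secs
    # Dual failure: traffic is blackholed toward E1 until the passive
    # DLR becomes active, learns E1 is down, and updates the Controller.
    return ha_failover_secs + routing_protocol_secs

good = {"E1": "esxcomp-01a", "DLR-active": "esxcomp-02a"}  # separated
bad  = {"E1": "esxcomp-01a", "DLR-active": "esxcomp-01a"}  # same host

print(converge("esxcomp-01a", good))  # short outage
print(converge("esxcomp-01a", bad))   # much longer outage
```

The two print statements make the point of this section concrete: the same host failure costs seconds when the Edge and the active DLR are separated, and roughly an HA-failover interval when they share a host.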
The golden rule is:
When the Edge Services Gateway and the DLR Control VM belong to the same tenant, we must ensure they do not reside on the same ESXi host. Distributing them across ESXi hosts reduces the functions affected by a single failure.
By default, when we deploy an NSX Edge or DLR in active/passive mode, the system creates a DRS anti-affinity rule that prevents the active and passive VMs from running on the same ESXi host.
However, we need to build additional DRS rules, since these default rules will not protect us from the dual-failure scenario described above.
The figure below describes the logical network view for our specific example. This topology is built from two different tenants, where each tenant is represented by a different color and has its own Edge and DLR.
Note that connectivity to the physical world is not shown in the figure, to keep the diagram simple.
My physical Edge cluster has four ESXi hosts distributed over two physical chassis:
Chassis A: esxcomp-01a, esxcomp-02a
Chassis B: esxcomp-01b, esxcomp-02b
Create DRS Host Group for each Chassis
We start by creating a container for all the ESXi hosts in Chassis A; this container is configured as a DRS Host Group.
Edge Cluster -> Manage -> Settings -> DRS Groups
Click the Add button and name this group “Chassis A”.
The container type needs to be “Host DRS Group”; add the ESXi hosts running in Chassis A (esxcomp-01a and esxcomp-02a).
Create another DRS group called Chassis B that contains esxcomp-01b and esxcomp-02b:
VM DRS Group for Chassis A:
We need to create a container for the VMs that will run in Chassis A. At this point we are just naming the group; we are not actually placing the VMs in Chassis A yet.
This Container type is “VM DRS Group”:
VM DRS Group for Chassis B:
At this point we have four DRS groups:
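The four groups can be thought of as simple containers of host names and VM names. A minimal sketch, assuming hypothetical VM names for the two tenants (the host names come from this example; this is a mental model, not the vSphere API):

```python
# Illustrative representation of the four DRS groups created above.
# The VM names (Edge-Green, DLR-Green-Control, ...) are assumptions.

host_groups = {
    "Chassis A": ["esxcomp-01a", "esxcomp-02a"],
    "Chassis B": ["esxcomp-01b", "esxcomp-02b"],
}

vm_groups = {
    "VM to Chassis A": ["Edge-Green", "DLR-Blue-Control"],
    "VM to Chassis B": ["Edge-Blue", "DLR-Green-Control"],
}

print(len(host_groups) + len(vm_groups))  # 4 groups in total
```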
Now we need to take the DRS groups we created earlier, “Chassis A” and “VM to Chassis A”, and tie them together. The next step is to do the same for “Chassis B” and “VM to Chassis B”.
* This configuration needs to be part of “DRS Rules”.
Edge Cluster -> Manage -> Settings -> DRS Rules
Click the Add button in DRS Rules and enter a name such as: “VMs Should Run on Chassis A”.
For the type, select “Virtual Machines to Hosts”, because we want to bind the VM group to the host group.
In the VM group name, choose the “VM to Chassis A” object.
Below the VM group selection we need to select the enforcement type for the group-to-hosts binding.
We have two options:
“Should run on hosts in group” or “Must run on hosts in group”
If we choose the “Must” option, then in the event of a failure of all the ESXi hosts in the group (for example, if Chassis A had a critical power outage), vSphere HA would not consider the other ESXi hosts in the cluster (Chassis B) as a viable recovery option for the VMs. With the “Should” option, HA can use the other ESXi hosts for recovery.
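The practical difference between the two options can be sketched as follows. This is a toy placement model, assuming a simplified restart policy; vSphere HA’s actual admission logic is more involved:

```python
# Toy model of HA restart placement under a VM-to-host rule.
# "should" is a soft preference; "must" is a hard constraint.

def ha_restart_host(vm_rule, preferred_hosts, surviving_hosts):
    """Pick a restart host for a VM bound by a VM-to-host rule.

    Returns a host name, or None if the VM cannot be restarted.
    """
    candidates = [h for h in preferred_hosts if h in surviving_hosts]
    if candidates:
        return candidates[0]
    if vm_rule == "should":
        # Soft rule: HA may violate it and use any surviving host.
        return surviving_hosts[0] if surviving_hosts else None
    return None  # "must": hosts outside the group are never used

# Chassis A fails entirely; only Chassis B hosts survive.
chassis_a = ["esxcomp-01a", "esxcomp-02a"]
surviving = ["esxcomp-01b", "esxcomp-02b"]

print(ha_restart_host("should", chassis_a, surviving))  # recovered on Chassis B
print(ha_restart_host("must", chassis_a, surviving))    # None: VM stays down
```

This is why “Should run on hosts in group” is the safer choice for this design: it keeps the VMs on their chassis during normal operation but still lets HA recover them on the other chassis after a total chassis failure.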
Same for Chassis B:
Now, the problem with the current DRS rules and the VM placement in this Edge cluster is that the Edge and the DLR Control VM are actually running on the same ESXi host. We need to create DRS anti-affinity rules.
Anti-Affinity Edge and DLR:
An Edge and a DLR that belong to the same tenant should not run on the same ESXi host.
For Green Tenant:
For Blue Tenant:
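A quick way to reason about whether a given placement honors these per-tenant rules is a small validation function. This is a hedged sketch with hypothetical VM names, not a query against vCenter:

```python
# Validate per-tenant Edge/DLR anti-affinity for a given placement.
# VM names and the placement below are illustrative assumptions.

def violates_anti_affinity(placement, pairs):
    """Return the (edge, dlr) pairs that ended up on the same host.

    `placement` maps VM -> ESXi host; `pairs` lists the per-tenant
    (Edge, DLR Control VM) couples that must stay apart.
    """
    return [(e, d) for e, d in pairs if placement[e] == placement[d]]

pairs = [("Edge-Green", "DLR-Green"), ("Edge-Blue", "DLR-Blue")]

bad_placement = {
    "Edge-Green": "esxcomp-01a", "DLR-Green": "esxcomp-01a",  # violation
    "Edge-Blue": "esxcomp-01b", "DLR-Blue": "esxcomp-02b",    # OK
}
print(violates_anti_affinity(bad_placement, pairs))
```

An empty result means every tenant’s Edge and DLR Control VM are on different hosts, which is exactly what the anti-affinity rules enforce.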
The Final Result:
In the case of a failure of one of the ESXi hosts, we no longer face the problem of the Edge and DLR residing on the same ESXi host, even in the catastrophic event of a Chassis A or Chassis B failure.
Note that the DLR Control VM can instead be placed in the compute cluster, which avoids this design consideration altogether.