Thanks to Dimitri Desmidt for reviewing this post.
Teaming policies allow the NSX vSwitch to load balance traffic between different physical NICs (pNICs).
The NSX Reference Design Guide (available at the link https://communities.vmware.com/docs/DOC-27683) contains a table with different teaming policy configuration options.
VTEP – a special VMkernel interface created on the ESXi host to encapsulate/de-encapsulate VXLAN traffic.
VXLAN traffic has a separate IP stack from the other VMkernel interfaces (Management, vMotion, FT, iSCSI).
A first glance at the table shows that only some of the supported teaming options imply the creation of multiple VTEPs (on the same ESXi host).
What is Multi-VTEP Support?
Multiple VTEPs – two or more VTEP kernel interfaces that can be created in an NSX vSwitch.
In a Multiple VTEPs deployment there is a 1:1 mapping between the VTEPs and the physical uplinks of the vSwitch: each VTEP sends/receives traffic on a specific pNIC interface.
In our example VTEP1 will map to pNIC1 and VTEP2 will map to pNIC2.
This is the point to stress: all VXLAN traffic originated from VTEP1 goes out on pNIC1, and all encapsulated traffic destined to VTEP1 is received on pNIC1 (the same holds for VTEP2 and pNIC2). In other words, each pNIC carries the inbound and outbound VXLAN traffic of exactly one VTEP.
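The 1:1 pinning can be pictured as a simple lookup table. Here is a minimal sketch (the names are illustrative assumptions, not NSX internals):

```python
# Illustrative sketch (not NSX code) of the 1:1 VTEP-to-pNIC pinning described
# above: every VXLAN frame sourced by (or destined to) a given VTEP uses that
# VTEP's pinned uplink, and nothing else.
VTEP_TO_PNIC = {"vtep1": "pnic1", "vtep2": "pnic2"}

def uplink_for(vtep: str) -> str:
    """Return the only pNIC a given VTEP ever uses."""
    return VTEP_TO_PNIC[vtep]
```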
Why Do We Need Multiple VTEPs ?
We need them when we have more than one physical link that we would like to use for VXLAN traffic and the upstream switches do not support LACP (or are not configured for it). In that case, the use of multiple VTEPs allows the traffic to be balanced across the physical links.
Where Do We configure Multiple VTEPs?
Configuration of the multiple VTEPs is done on the Networking & Security > Installation > Configure VXLAN tab.
Note: to create multiple VTEPs, SRCID or SRCMAC must be selected as the VMKNic teaming policy during the VXLAN configuration of an ESXi cluster.
In this example, four VTEPs will be created; this number comes from the number of physical uplinks configured in the vDS.
Source Port Teaming mode (SRCID)
The NSX vSwitch selects an uplink based on the virtual machine portID.
In our example below we have two VTEPs and two physical uplinks.
When VM1 connects to the NSX vSwitch and sends Red traffic, the NSX vSwitch picks one of the VTEPs (VTEP1), based on VM1's port ID (portID1), to handle this traffic.
VTEP1 will then send this traffic to pNIC1 (since VTEP1 is pinned to this uplink in our specific example).
When VM2, with portID2, connects and generates Green traffic, the NSX vSwitch picks a different VTEP to send it out: the vSwitch sees a different port ID as the source, and VTEP1 already carries traffic. VTEP2 therefore forwards this traffic to pNIC2.
At this point we are using both of the physical links.
Now VM3, with portID3, connects and sends Yellow traffic. The NSX vSwitch will randomly pick one of the VTEPs to handle it: VTEP1 and VTEP2 already have the same number of VM connections (one each), so neither is preferred in terms of port ID balancing. In this example, VTEP1 is chosen and forwards the traffic to pNIC1.
Positive aspects: very simple, and no LACP configuration is needed on the upstream switch.
Negative aspects: if VM1 generates little traffic while VM2 generates heavy traffic, the utilization of the physical links will not be balanced.
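The SRCID selection described above can be sketched as follows; the modulo hash is an assumption for illustration, not the actual ESXi algorithm:

```python
# Illustrative sketch of SRCID teaming: the VTEP (and hence the pNIC) is chosen
# deterministically from the VM's virtual port ID, so a given VM always uses
# the same uplink regardless of how much traffic it generates.
VTEPS = ["vtep1", "vtep2"]  # one VTEP per physical uplink

def vtep_for_port_id(port_id: int) -> str:
    # Assumed hash (illustration only): port ID modulo the number of VTEPs.
    return VTEPS[port_id % len(VTEPS)]
```

Because the mapping is deterministic, a single "heavy" VM can never spread its load across both uplinks, which is exactly the negative aspect noted above.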
Source MAC Teaming Policy (SRCMAC)
This method is identical to the previous one, except that the NSX vSwitch selects the uplink based on the virtual machine's MAC address.
Note: given the fact that the behavior is very similar, the recommendation is to use the previously described SRCID teaming option.
In our example we have two VTEPs and two physical uplinks.
When VM1, with MAC1, connects to the NSX vSwitch and sends Red traffic, the NSX vSwitch picks one of the VTEPs (VTEP1) to handle it, based on VM1's MAC address.
VTEP1 will send this traffic to pNIC1.
When VM2 with MAC2 connects and generates Green traffic, the NSX vSwitch will pick a different VTEP to send this traffic out.
We use the other VTEP since the NSX vSwitch sees a different source MAC address and VTEP1 already carries traffic; VTEP2 therefore forwards this traffic to pNIC2.
At this point we are using both of the physical uplinks.
When VM3, with MAC3, connects and sends Yellow traffic, the NSX vSwitch will randomly pick one of the VTEPs to handle it: VTEP1 and VTEP2 already have the same number of VM connections, so neither is preferred in terms of MAC address balancing. In our example, VTEP1 is chosen and forwards the traffic to pNIC1.
Positive points: very simple, no LACP configuration needed on the upstream switch.
Negative points: if VM1 generates little traffic while VM2 sources very heavy traffic, the utilization of the physical uplinks will not be balanced.
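The SRCMAC variant differs only in the key it hashes on. A minimal sketch (the CRC32 hash here is an illustrative stand-in, not VMware's actual implementation):

```python
import zlib

# Same idea as SRCID, but keyed on the VM's source MAC address instead of its
# virtual port ID. The mapping is still deterministic per VM.
VTEPS = ["vtep1", "vtep2"]

def vtep_for_mac(mac: str) -> str:
    # Assumed hash (illustration only): CRC32 of the MAC modulo the VTEP count.
    return VTEPS[zlib.crc32(mac.encode()) % len(VTEPS)]
```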
LACPv2 (Enhanced LACP)
Starting with the ESXi 5.5 release, VMware improved the hashing method for LACP to leverage up to 20 different hash algorithms. vSphere 5.5 supports these load-balancing types:
1. Destination IP address
2. Destination IP address and TCP/UDP port
3. Destination IP address and VLAN
4. Destination IP address, TCP/UDP port and VLAN
5. Destination MAC address
6. Destination TCP/UDP port
7. Source IP address
8. Source IP address and TCP/UDP port
9. Source IP address and VLAN
10. Source IP address, TCP/UDP port and VLAN
11. Source MAC address
12. Source TCP/UDP port
13. Source and destination IP address
14. Source and destination IP address and TCP/UDP port
15. Source and destination IP address and VLAN
16. Source and destination IP address, TCP/UDP port and VLAN
17. Source and destination MAC address
18. Source and destination TCP/UDP port
19. Source port ID
For the Source or Destination IP Hash methods (options 1 and 7), the hash is derived from the VTEP IP addresses located in the outer IP header of the VXLAN frame.
Selecting LACPv2 (also referred to as “Enhanced LACP”) as teaming policy between an ESXi host and the ToR switch leads to the creation of one VTEP only.
In this example we have 2 physical uplinks connected to one physical upstream switch.
Those uplinks are bundled together in a single “logical uplink”, which explains why a single VTEP is created.
LACPv2 Source or Destination IP Hash (Bad for NSX)
In this scenario we are selecting the IP Hash algorithm for LACPv2. We have two ESXi hosts, esx1 and esx2.
When VM1 connects to the NSX vSwitch on esx1 and generates Red traffic toward VM2, the traffic is sent to VTEP1 (the only VTEP on the source ESXi host).
The NSX vSwitch then calculates the hash value based on the source VTEP IP1 and/or the destination VTEP IP2, and as a result of this hash value it selects pNIC1.
When the physical switch connected to esx2 receives the frame, it performs a similar hash calculation (assuming the same IP Hash algorithm is also locally configured on the physical switch) and selects one of the physical links (in this example pNIC1).
Now VM3, connected to the NSX vSwitch on esx1, tries to send Green traffic to VM4, which is also connected to esx2; VTEP1 again handles this traffic.
The NSX vSwitch calculates the hash based on the source IP (VTEP1), the destination IP (VTEP2), or both.
In any case, the result elects the same pNIC1, since this is the same hash that was calculated when VM1 sent traffic to VM2!
In this scenario, both traffic flows originated from VM1 and VM3 end up on the same pNIC1 uplink.
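This polarization can be demonstrated with a short sketch (the CRC32 hash is an illustrative stand-in for the switch's real algorithm):

```python
import zlib

# Sketch of why a pure IP hash is a poor fit for VXLAN: the outer header of
# every VXLAN frame between two hosts carries the same source/destination VTEP
# IPs, so every inner flow hashes to the same uplink.
UPLINKS = ["pnic1", "pnic2"]

def ip_hash_uplink(src_vtep_ip: str, dst_vtep_ip: str) -> str:
    # Assumed hash (illustration only) over the outer IP pair.
    key = f"{src_vtep_ip}->{dst_vtep_ip}".encode()
    return UPLINKS[zlib.crc32(key) % len(UPLINKS)]

# Red (VM1 -> VM2) and Green (VM3 -> VM4) traffic both ride VTEP1 -> VTEP2,
# so they always land on the same pNIC, whatever the inner flows look like.
red = ip_hash_uplink("192.168.1.1", "192.168.1.2")
green = ip_hash_uplink("192.168.1.1", "192.168.1.2")
```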
LACPv2 Layer 4
When using L4 information, the hash is calculated based on the source and/or destination port (options 2, 4, 6 and 8). With VXLAN, this means the hash is derived from the values in the outer UDP header.
The VXLAN destination port is always udp/8472.
VMware derives a pseudo-random UDP source port value from the L2/L3/L4 headers of the original frame.
As a result, every distinct flow (identified by the original L2, L3 and L4 values) established between VMs produces a different UDP source port.
Different source ports mean different hash results, and therefore better load balancing.
Now, when VM1 and VM3 send traffic, the load-balancing algorithm may select different pNICs (the more flows are originated, the more even the utilization of the uplinks).
Note: both uplinks can also be utilized for flows originated from the same VM, as long as they are associated with different types of communication (for example, HTTP and FTP flows).
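The L4 behavior can be sketched as follows; the port-derivation function is an assumption for illustration, not VMware's exact implementation:

```python
import zlib

# Sketch of the L4 hashing case: the outer UDP source port is derived from the
# inner flow's headers, so distinct inner flows -- even from the same VM -- can
# hash to different uplinks.
UPLINKS = ["pnic1", "pnic2"]
VXLAN_DST_PORT = 8472  # the fixed VXLAN destination port mentioned above

def outer_src_port(inner_flow: tuple) -> int:
    # Pseudo-random source port derived from the inner L2/L3/L4 values
    # (assumed derivation, ephemeral port range 49152-65535).
    return 49152 + zlib.crc32(repr(inner_flow).encode()) % 16384

def l4_hash_uplink(inner_flow: tuple) -> str:
    # Assumed hash (illustration only) over the outer UDP port pair.
    return UPLINKS[(outer_src_port(inner_flow) + VXLAN_DST_PORT) % len(UPLINKS)]

# Two different flows from the same VM (e.g. HTTP and FTP) get different outer
# source ports, so they may be balanced across different uplinks.
http_flow = ("mac1", "10.0.0.1", "10.0.0.2", "tcp", 33000, 80)
ftp_flow = ("mac1", "10.0.0.1", "10.0.0.2", "tcp", 33001, 21)
```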
Determining the VM pinning:
To know exactly which ESXi uplink will be used for traffic sourced by a given VM, SSH to the ESXi host where that VM is located and do the following:
- Type esxtop and then press ‘n’ (shortcut for the network view).
Here is an example of the output for the esxtop command:
The VM named “web-sv-01a” is pinned to vmnic0. vmk3 is the VMkernel interface used for VXLAN traffic and is pinned to vmnic0.
Note: in vSphere, vmnicX represents a physical uplink of the ESXi host (also referred to earlier as a pNIC).
Whenever possible, use LACPv2 with the L4 hash algorithm.
The Source MAC teaming option is more CPU intensive than Source Port ID, so Source Port ID (SRCID) is recommended when LACP is not possible and it is desired to leverage more than one uplink for a specific type of traffic.