NSX-v Host Preparation

This post is based on my professional NSX experience in the field and on a lecture by Kevin Barrass, an NSX solution architect.

Thanks to Tiran Efrat for reviewing this post.

Host preparation overview

Host preparation is the process in which the NSX Manager installs the NSX kernel modules on the ESXi hosts of a vSphere cluster and builds the NSX control plane fabric.

Before host preparation, we need to:

  1. Register the NSX Manager with vCenter. This process was covered in NSX-V Troubleshooting registration to vCenter.
  2. Deploy the NSX Controllers, covered in deploying-nsx-v-controller-disappear-from-vsphere-client.

Three components are involved during the NSX host preparation:
vCenter, NSX Manager, and EAM (ESX Agent Manager).


vCenter Server:
Management of vSphere compute infrastructure.

NSX Manager:
Provides the single point of configuration and REST API entry-points in a vSphere environment for NSX.

EAM (ESX Agent Manager):
The middleware component between the NSX Manager and vCenter. EAM is part of vCenter and is responsible for installing the VIBs (vSphere Installation Bundles), which are software packages prepared for installation on an ESXi host.

Host Preparation process

Host preparation begins when we click “Install” in the vCenter GUI.


This process is done at the vSphere cluster level, not per ESXi host. The EAM creates an agent to track the VIB installation process on each host. The VIBs are copied from the NSX Manager and cached in the EAM.
If the VIBs are not present on the ESXi host, the EAM installs them (no ESXi host reboot is needed).
If an older version of the VIBs is installed, the EAM removes it, but in that case an ESXi host reboot is needed.

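VIBs installed during host preparation: esx-vxlan, esx-vsip, and dvfilter-switch-security (the names used by the manual uninstall commands later in this post).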

The ESXi host has a fully working control plane once host preparation has completed successfully.

Two control plane channels are created:

  • RabbitMQ message bus: provides communication between the vsfwd process on the ESXi hypervisor and the NSX Manager over TCP/5671.
  • User World Agent (UWA) process (netcpa on the ESXi hypervisor): establishes SSL-protected communication channels over TCP/1234 to the Controller Cluster nodes.


Troubleshooting Host Preparation


EAM fails to deploy the VIBs because DNS is misconfigured or not configured on the host.
In this case we may get a status of “Not Ready”.

This indicates “Agent VIB module not installed” on one or more hosts.

We can check the vSphere ESX Agent Manager for errors:

“vCenter home > vCenter Solutions Manager > vSphere ESX Agent Manager”

In “vSphere ESX Agent Manager”, check the status of the “Agencies” prefixed with “_VCNS_153”. If any of the agencies has a bad status, select the agency and view its issues.


For more details on host preparation issues, check the associated log, /var/log/esxupdate.log, on the ESXi host.
Log in to the host that has the issue and run “tail /var/log/esxupdate.log” to view the log.


Configure the DNS settings on the ESXi host so that NSX host preparation can succeed.


TCP/80 from ESXi to vCenter is blocked:

The ESXi host is unable to connect to the vCenter EAM on TCP/80.

This could be caused by a firewall blocking this port. From the /var/log/esxupdate.log file on the ESXi host:

esxupdate: esxupdate: ERROR: MetadataDownloadError: (‘http://VC_IP_Address:80/eam/vib?id=xxx-xxx-xxx-xxx), None, “( http://VC_IP_Address:80/eam/vib?id=xxx-xxx-xxx-xxx), ‘/tmp/tmp_TKl58’, ‘[Errno 4] IOError: <urlopen error [Errno 111] Connection refused>’)”)
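
When several hosts are affected, scanning the log for the EAM URL and the underlying socket error can be quicker than reading it by eye. The following is a minimal Python sketch, not a VMware tool: the function name is made up for illustration, and the sample line is a simplified copy of the error above with plain ASCII quotes.

```python
import re

# Matches the EAM VIB URL and the errno text in a MetadataDownloadError line.
EAM_ERROR = re.compile(
    r"MetadataDownloadError:.*?(?P<url>http://[^\s'\"),]+/eam/vib\?id=[^\s'\"),]+)"
    r".*?urlopen error \[Errno (?P<errno>\d+)\] (?P<reason>[^>]+)>"
)

def find_eam_download_errors(log_text):
    """Return (url, errno, reason) for each EAM download failure in the log text."""
    hits = []
    for line in log_text.splitlines():
        m = EAM_ERROR.search(line)
        if m:
            hits.append((m.group("url"), int(m.group("errno")), m.group("reason").strip()))
    return hits

# Simplified sample modeled on the esxupdate.log error shown above.
sample = (
    "esxupdate: ERROR: MetadataDownloadError: "
    "('http://VC_IP_Address:80/eam/vib?id=xxx-xxx-xxx-xxx', None, "
    "\"('http://VC_IP_Address:80/eam/vib?id=xxx-xxx-xxx-xxx', '/tmp/tmp_TKl58', "
    "'[Errno 4] IOError: <urlopen error [Errno 111] Connection refused>')\")"
)

for url, errno, reason in find_eam_download_errors(sample):
    print(url, errno, reason)
```

Errno 111 (connection refused) points at something actively rejecting the connection on TCP/80, which fits a blocked or occupied port rather than a DNS problem.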

NSX-v has a list of ports that need to be open for host preparation to succeed.
The complete list can be found in:


Older VIB versions:

If an old version of the VIBs exists on the ESXi host, the EAM will remove the old VIBs,
but host preparation will not continue automatically.

We need to reboot the ESXi host to complete the process.


ESXi bootbank space issue:

If you upgrade ESXi 5.1u1 to ESXi 5.5 and then start NSX host preparation, you may hit an issue, and the /var/log/esxupdate.log file will show a message like:
“Installationerror: the pending transaction required 240MB free space, however the maximum size is 239 MB”
I faced this issue with a custom IBM blade ISO, but it may appear with other vendors as well.

Install a fresh ESXi 5.5 custom ISO (this is the version I upgraded to).


vCenter on Windows, EAM TCP/80 taken by another application:

If vCenter runs on a Windows machine, other installed applications may use port 80, causing a conflict with the EAM port TCP/80.

For example, IIS uses TCP/80 by default.

Use a different port for EAM:

Change the port from 80 to an unused port in eam.properties, located in \ProgramFiles\VMware\Infrastructure\tomcat\webapps\eam\WEB-INF\


UWA agent issues:

In rare cases the installation of the VIBs succeeds, but one or both of the user world agents do not function correctly. This can manifest itself as:
The firewall showing a bad status, or the control plane between the hypervisor(s) and the controllers being down.

If the message bus service is active on the NSX Manager:

Check the message bus user world agent status by running the command /etc/init.d/vShield-Stateful-Firewall status on the ESXi hosts.


Check the message bus user world agent logs on the hosts at /var/log/vsfwd.log.

esxcfg-advcfg -l | grep Rmq

Run this command on the ESXi hosts to show all Rmq variables; there should be 16 variables in total.

esxcfg-advcfg -g /UserVars/RmqIpAddress

Run this command on the ESXi hosts; it should display the NSX Manager IP address.


Run this command on the ESXi hosts to check for an active message bus TCP connection:

esxcli network ip connection list | grep 5671




The NSX Manager has a direct link to download the VIBs as a zip file:



Reverting an NSX-prepared ESXi host:

Remove the host from the vSphere cluster:

Put the ESXi host in maintenance mode and remove it from the cluster. This automatically uninstalls the NSX VIBs.

Note: The ESXi host must be rebooted to complete the operation.


Manually uninstall the VIBs:

esxcli software vib remove -n esx-vxlan

esxcli software vib remove -n esx-vsip

esxcli software vib remove -n dvfilter-switch-security

Note: ESXi host must be rebooted to complete the operation

Asymmetric routing with ECMP and Edge Firewall Enabled

What is Asymmetric Routing?

In asymmetric routing, a packet travels from a source to a destination along one path and takes a different path when it returns to the source.

Starting with version 6.1, NSX Edge supports ECMP (Equal Cost Multipath). ECMP traffic involves asymmetric routing between the Edges and the DLR, or between the Edges and the physical routers.

ECMP Consideration with Asymmetric Routing

ECMP with asymmetric routing is not a problem by itself, but it causes problems when more than one NSX Edge is in place and stateful services are inserted in the traffic path.

Stateful services such as firewall, load balancing, and Network Address Translation (NAT) can’t work with asymmetric routing.

The problem explained:

A user from outside tries to access the Web VM inside the data center. The traffic passes through the E1 Edge.

From E1 the traffic goes to the DLR, traverses the NSX distributed firewall, and reaches the Web VM.

When the Web VM responds, the return traffic hits its default gateway, the DLR. The DLR has two options for routing the traffic: E1 or E2.

If the DLR chooses E2, the traffic reaches E2 and is dropped!

The reason is that E2 is not aware of the state of the session that started at E1, so reply packets from the Web VM arriving at E2 do not match any existing session on E2.
From E2's perspective this is a new session that needs to be validated. Any new TCP session should start with a SYN; since this is not the beginning of a session, E2 drops it.
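
The drop can be reproduced with a toy model. The sketch below is only an illustration in Python, not how the Edge firewall is implemented: the `StatefulEdge` class and its session key are simplifications made up for this example, and each Edge keeps its own private session table, which is the whole point.

```python
class StatefulEdge:
    """Toy stateful firewall: only admits sessions it has seen start with a SYN."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()  # private to this Edge, never shared with peers

    def handle(self, src, dst, flags):
        # Use an unordered pair so both directions map to the same session key.
        key = frozenset((src, dst))
        if key in self.sessions:
            return "ACCEPT"         # packet belongs to a session this Edge knows
        if flags == "SYN":
            self.sessions.add(key)  # a new session must start with a SYN
            return "ACCEPT"
        return "DROP"               # no state and not a SYN: invalid for this Edge


e1, e2 = StatefulEdge("E1"), StatefulEdge("E2")
user, web = "user", "web-vm"

print(e1.handle(user, web, "SYN"))      # inbound SYN enters through E1 -> ACCEPT
print(e2.handle(web, user, "SYN-ACK"))  # DLR sends the reply to E2     -> DROP
print(e1.handle(web, user, "SYN-ACK"))  # same reply sent back via E1   -> ACCEPT
```

The reply is identical in both cases; only the Edge that saw the original SYN has the state needed to accept it.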

[Figure: Asymmetric Routing with Edge Firewall Enabled]

Note: The NSX distributed firewall is not part of this problem. The NSX distributed firewall is implemented at the vNIC level, and all traffic goes in and out through the same vNIC.

There is no asymmetric routing at the vNIC level. This is also why, when we vMotion a VM, the firewall rules and connection state move with the VM itself.

ECMP and Edge Firewall NSX

Starting from version 6.1, when we enable ECMP on an NSX Edge we get this message:

[Screenshot: Enable ECMP in 6.1 version]

The firewall service is disabled by default:

[Screenshot: Enable ECMP in 6.1 version, firewall turned off]

Even if you try to enable it, you get a warning message:

[Screenshot: Firewall Service in 6.1 with ECMP]

In version 6.1.2, when we enable ECMP we get the same message:

[Screenshot: Enable ECMP in 6.1 version]

But the big difference is that the firewall service is NOT disabled by default (you need to turn it off yourself).

Even if you have an “Any, Any” rule with an “Accept” action, packets are still subject to being dropped because of the asymmetric routing problem!

[Screenshot: Firewall Service Enable in 6.1.2]

You will not see these dropped packets even in syslog or Log Insight!

From the end user's point of view, some sessions work just fine (the sessions that are not asymmetric) while other sessions are dropped (the asymmetric sessions).

The one place I found where we can see that packets are dropped because of session state is the output of the show tech-support command:

show tech-support
vShield Edge Firewall Packet Counters:
~~~~~~~~~~~~~~~ snip ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
rid    pkts bytes target     prot opt in     out     source               destination         
0        20  2388 ACCEPT     all  --  *      lo             
0        12   720 DROP       all  --  *      *              state INVALID
0        51  7108 block_out  all  --  *      *             
0         0     0 ACCEPT     all  --  *      *              PHYSDEV match --physdev-in tap0 --physdev-out vNic_+
0         0     0 ACCEPT     all  --  *      *              PHYSDEV match --physdev-in vNic_+ --physdev-out tap0
0         0     0 ACCEPT     all  --  *      *              PHYSDEV match --physdev-in na+ --physdev-out vNic_+
0         0     0 ACCEPT     all  --  *      *              PHYSDEV match --physdev-in vNic_+ --physdev-out na+
0         0     0 ACCEPT     all  --  *      *              state RELATED,ESTABLISHED
0        51  7108 usr_rules  all  --  *      *             
0         0     0 DROP       all  --  *      *  

The seventh line of this output, the DROP rule matching state INVALID, shows packets being dropped because of an INVALID state (12 packets in this capture).
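
If you collect show tech-support output from several Edges, the INVALID-state counter can be pulled out with a few lines of Python. This is a rough sketch that assumes the column layout shown above (rid, pkts, bytes, target, ...); the function name is made up, and counters printed in abbreviated form (for example with a K suffix) are simply skipped.

```python
def invalid_state_drops(counters_text):
    """Sum the packet counters of DROP rules that match 'state INVALID'."""
    total = 0
    for line in counters_text.splitlines():
        fields = line.split()
        # Assumed layout: rid pkts bytes target prot opt in out [match...]
        if len(fields) >= 4 and fields[3] == "DROP" and "state INVALID" in line:
            if fields[1].isdigit():
                total += int(fields[1])
    return total


# Three rows copied from the packet counters above.
sample = """\
0        20  2388 ACCEPT     all  --  *      lo
0        12   720 DROP       all  --  *      *              state INVALID
0         0     0 DROP       all  --  *      *
"""

print(invalid_state_drops(sample))  # 12
```

A counter that keeps growing on one Edge while ECMP is enabled is a strong hint that return traffic is arriving at an Edge that never saw the start of the session.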


When you enable ECMP and have more than one NSX Edge in your topology, go to the firewall service and disable it yourself; otherwise you will spend many hours troubleshooting 🙁

NSX-v Troubleshooting L2 Connectivity

In this blog post we describe a methodology for troubleshooting L2 connectivity within the same logical switch (L2 segment).

Some of the steps here can and should be done via the NSX GUI, vRealize Operations Manager 6.0, and vRealize Log Insight, so treat this as an educational post.

There are lots of CLI commands in this post :-). To view the full output of a CLI command you can scroll right.


High-level approach to solving L2 problems:

1. Understand the problem.

2. Know your network topology.

3. Figure out whether it is a configuration issue.

4. Check whether the problem is in the physical space or the logical space.

5. Verify the NSX control plane from the ESXi hosts and the NSX Controllers.

6. Move the VM to a different ESXi host.

7. Start capturing traffic in the right spots.


Understand the Problem

VMs on the same logical switch, VXLAN 5001, are unable to communicate.

Demonstrating the problem:

web-sv-01a:~ # ping
PING ( 56(84) bytes of data.
--- ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3023ms


Know your network topology:


The VMs web-sv-01a and web-sv-02a reside on different compute hosts, esxcomp-01a and esxcomp-01b respectively.

web-sv-01a: IP:,  MAC: 00:50:56:a6:7a:a2

web-sv-02a: IP:, MAC: 00:50:56:a6:a1:e3


Validate network topology

I know it sounds stupid, but let’s make sure that the VMs actually reside on the right ESXi hosts and are connected to the right VXLAN.

Verify that VM “web-sv-01a” actually resides on “esxcomp-01a”:

From esxcomp-01a run the command esxtop, then press “n” (Network):

esxcomp-01a # esxtop
   PORT-ID              USED-BY  TEAM-PNIC DNAME              PKTTX/s  MbTX/s    PKTRX/s  MbRX/s %DRPTX %DRPRX
  33554433           Management        n/a vSwitch0              0.00    0.00       0.00    0.00   0.00   0.00
  50331649           Management        n/a DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331650               vmnic0          - DvsPortset-0          8.41    0.02     437.81    3.17   0.00   0.00
  50331651     Shadow of vmnic0        n/a DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331652                 vmk0     vmnic0 DvsPortset-0          5.87    0.01       1.76    0.00   0.00   0.00
  50331653                 vmk1     vmnic0 DvsPortset-0          0.59    0.01       0.98    0.00   0.00   0.00
  50331654                 vmk2     vmnic0 DvsPortset-0          0.00    0.00       0.39    0.00   0.00   0.00
  50331655                 vmk3     vmnic0 DvsPortset-0          0.20    0.00       0.39    0.00   0.00   0.00
  50331656 35669:db-sv-01a.eth0     vmnic0 DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331657 35888:web-sv-01a.eth     vmnic0 DvsPortset-0          4.89    0.01       3.72    0.01   0.00   0.00
  50331658          vdr-vdrPort     vmnic0 DvsPortset-0          2.15    0.00       0.00    0.00   0.00   0.00

The row for “web-sv-01a.eth0” confirms that the VM runs on this host. Another important piece of information is its “Port-ID”.

The “Port-ID” is a unique identifier for each virtual switch port; in our example web-sv-01a.eth0 has Port-ID “50331657”.

Find the vDS name:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan list
VDS ID                                           VDS Name      MTU  Segment ID     Gateway IP     Gateway MAC        Network Count  Vmknic Count
-----------------------------------------------  -----------  ----  -------------  -------------  -----------------  -------------  ------------
3b bf 0e 50 73 dc 49 d8-2e b0 df 20 91 e4 0b bd  Compute_VDS  1600  00:50:56:09:46:07              4             1

The VDS Name column shows that the vDS name is “Compute_VDS”.

Verify that “web-sv-01a.eth0” is connected to VXLAN 5001:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5001
Switch Port ID  VDS Port ID  VMKNIC ID
--------------  -----------  ---------
      50331657  68                   0
      50331658  vdrPort              0

The first row shows a VM connected to VXLAN 5001 on switch port ID 50331657, which is the same port ID as VM web-sv-01a.eth0.
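
Matching the esxtop Port-ID against the VXLAN port list by eye works for two VMs, but not for fifty. Here is a small Python sketch of the same comparison; the function name is hypothetical, and the parsing assumes the three-column layout shown above.

```python
def vxlan_switch_ports(port_list_text):
    """Return the Switch Port IDs found in 'vxlan network port list' output."""
    ports = set()
    for line in port_list_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric Switch Port ID; header rows do not.
        if fields and fields[0].isdigit():
            ports.add(int(fields[0]))
    return ports


# Output copied from the esxcomp-01a listing above.
sample = """\
Switch Port ID  VDS Port ID  VMKNIC ID
--------------  -----------  ---------
      50331657  68                   0
      50331658  vdrPort              0
"""

web_sv_01a_port = 50331657  # Port-ID of web-sv-01a.eth0 taken from esxtop
print(web_sv_01a_port in vxlan_switch_ports(sample))  # True
```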

Verification on esxcomp-01b:

esxcomp-01b # esxtop
  PORT-ID              USED-BY  TEAM-PNIC DNAME              PKTTX/s  MbTX/s    PKTRX/s  MbRX/s %DRPTX %DRPRX
  33554433           Management        n/a vSwitch0              0.00    0.00       0.00    0.00   0.00   0.00
  50331649           Management        n/a DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331650               vmnic0          - DvsPortset-0          6.54    0.01     528.31    4.06   0.00   0.00
  50331651     Shadow of vmnic0        n/a DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331652                 vmk0     vmnic0 DvsPortset-0          2.77    0.00       1.19    0.00   0.00   0.00
  50331653                 vmk1     vmnic0 DvsPortset-0          0.59    0.00       0.40    0.00   0.00   0.00
  50331654                 vmk2     vmnic0 DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331655                 vmk3     vmnic0 DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331656 35663:web-sv-02a.eth     vmnic0 DvsPortset-0          3.96    0.01       3.57    0.01   0.00   0.00
  50331657          vdr-vdrPort     vmnic0 DvsPortset-0          2.18    0.00       0.00    0.00   0.00   0.00

The row for “web-sv-02a.eth0” shows that it has Port-ID “50331656”.

Verify that “web-sv-02a.eth0” is connected to VXLAN 5001:

esxcomp-01b # esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5001
Switch Port ID  VDS Port ID  VMKNIC ID
--------------  -----------  ---------
      50331656  69                   0
      50331657  vdrPort              0

The first row shows a VM connected to VXLAN 5001 on switch port ID 50331656.

At this point we have verified that the VMs are located as drawn in the topology. Now we can start with the actual troubleshooting steps.

Is the problem in the physical network?

Our first step is to find out whether the problem is in the physical space or the logical space.


The easy way to find out is to ping from the VTEP on esxcomp-01a to the VTEP on esxcomp-01b. Before we ping, let’s find out the VTEP IP addresses.

esxcomp-01a # esxcfg-vmknic -l
Interface  Port Group/DVPort   IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type         
vmk0       16                  IPv4                 00:50:56:09:08:3e 1500    65535     true    STATIC       
vmk1       26                  IPv4                       00:50:56:69:80:0f 1500    65535     true    STATIC       
vmk2       35                  IPv4                       00:50:56:64:70:9f 1500    65535     true    STATIC       
vmk3       44                  IPv4                 00:50:56:66:e2:ef 1600    65535     true    STATIC

The vmk3 row (the interface with MTU 1600) gives the VTEP IP address.

Another command to find the VTEP IP address is:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=Compute_VDS
Vmknic Name  Switch Port ID  VDS Port ID  Endpoint ID  VLAN ID  IP              Netmask        IP Acquire Timeout  Multicast Group Count  Segment ID
-----------  --------------  -----------  -----------  -------  --------------  -------------  ------------------  ---------------------  -------------
vmk3               50331655  44                     0        0                   0                      0

The same command on esxcomp-01b:

esxcomp-01b # esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=Compute_VDS
Vmknic Name  Switch Port ID  VDS Port ID  Endpoint ID  VLAN ID  IP              Netmask        IP Acquire Timeout  Multicast Group Count  Segment ID
-----------  --------------  -----------  -----------  -------  --------------  -------------  ------------------  ---------------------  -------------
vmk3               50331655  46                     0        0                   0                      0

This output gives the VTEP IP for esxcomp-01b. Now let’s add this information to our topology.



Checks for VXLAN Routing:

NSX uses a different IP stack for VXLAN traffic, so we need to verify that the default gateway is configured correctly for VXLAN traffic.

From esxcomp-01a:

esxcomp-01a # esxcli network ip route ipv4 list -N vxlan
Network        Netmask        Gateway        Interface  Source
-------------  -------------  -------------  ---------  ------
default  vmk3       MANUAL        vmk3       MANUAL

From esxcomp-01b:

esxcomp-01b # esxcli network ip route ipv4 list -N vxlan
Network        Netmask        Gateway        Interface  Source
-------------  -------------  -------------  ---------  ------
default  vmk3       MANUAL        vmk3       MANUAL

In this lab, the VTEPs of my two ESXi hosts are on the same L2 segment, so both VTEPs have the same default gateway.

Ping from the VTEP on esxcomp-01a to the VTEP on esxcomp-01b.

The ping is sourced from the VXLAN IP stack with a packet size of 1570 and the don’t-fragment bit set.

esxcomp-01a #  ping ++netstack=vxlan -s 1570 -d
PING ( 1570 data bytes
1578 bytes from icmp_seq=0 ttl=64 time=0.585 ms
1578 bytes from icmp_seq=1 ttl=64 time=0.936 ms
1578 bytes from icmp_seq=2 ttl=64 time=0.831 ms

--- ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.585/0.784/0.936 ms

The ping is successful.

If the ping fails with “-d” but works without “-d”, it is an MTU problem. Check the MTU on the physical switches.
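
The arithmetic behind the 1570-byte test is worth spelling out. The ping reply lines above report 1578 bytes, which is the 1570-byte payload plus the 8-byte ICMP header; adding a 20-byte IPv4 header gives the size that must fit in the path MTU when the don’t-fragment bit is set. The Python sketch below just encodes that sum; the function names are made up for illustration.

```python
ICMP_HEADER = 8   # bytes
IPV4_HEADER = 20  # bytes, assuming no IP options

def ping_ip_packet_size(payload_bytes):
    """Size of the IPv4 packet produced by a ping with the given payload size."""
    return payload_bytes + ICMP_HEADER + IPV4_HEADER

def fits_without_fragmentation(payload_bytes, path_mtu):
    """True if the ping packet can cross the path with the DF bit set."""
    return ping_ip_packet_size(payload_bytes) <= path_mtu


print(ping_ip_packet_size(1570))               # 1598
print(fits_without_fragmentation(1570, 1600))  # True
print(fits_without_fragmentation(1570, 1500))  # False
```

So a 1570-byte ping with “-d” succeeds only if every hop between the two VTEPs carries at least 1598-byte IP packets, which is why a physical switch left at MTU 1500 makes it fail.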

Because the VTEPs in this example are in the same L2 segment, we can view the ARP entries for the other VTEPs:

From esxcomp-01a:

esxcomp-01a # esxcli network ip neighbor list -N vxlan
Neighbor        Mac Address        Vmknic    Expiry  State  Type
--------------  -----------------  ------  --------  -----  -----------
                00:50:56:64:f4:25  vmk3    1173 sec         Unknown
                00:50:56:67:d9:91  vmk3    1171 sec         Unknown
                00:50:56:09:46:07  vmk3    1187 sec         Autorefresh

It looks like our physical layer is not the issue.


Verify NSX control plane

During NSX host preparation the NSX Manager installs VIB agents, called User World Agents (UWA), on the ESXi hosts.

The process responsible for communicating with the NSX Controllers is called netcpad.

The ESXi host uses its VMkernel management interface to create this secure channel over TCP/1234; the traffic is encrypted with SSL.

Part of the information netcpad sends to the NSX Controller is:

VMs: MAC, IP.



Routing: routes learned from the DLR Control VM (explained in the next post).


Based on this information the Controller learns the network state and builds its directory services.

To learn how the Controller Cluster works and how to fix problems in the cluster itself, see NSX Controller Cluster Troubleshooting.

For two VMs to be able to talk to each other we need a working control plane. In this lab we have 3 NSX Controllers.

The verification commands need to be run from both the ESXi side and the Controller side.

NSX controllers IP address:,,

Control plane verification from the ESXi point of view:

Verify that esxcomp-01a has ESTABLISHED connections to the NSX Controllers (grep 1234 shows only TCP port 1234).

esxcomp-01a # esxcli network ip  connection list | grep 1234
tcp         0       0  ESTABLISHED     35185  newreno  netcpa-worker
tcp         0       0  ESTABLISHED     34519  newreno  netcpa-worker
tcp         0       0  ESTABLISHED     34519  newreno  netcpa-worker

Verify that esxcomp-01b has ESTABLISHED connections to the NSX Controllers:

esxcomp-01b # esxcli network ip  connection list | grep 1234
tcp         0       0  ESTABLISHED     34517  newreno  netcpa-worker
tcp         0       0  ESTABLISHED     34678  newreno  netcpa-worker
tcp         0       0  ESTABLISHED     34516  newreno  netcpa-worker

Example of a communication problem between an ESXi host and the NSX Controllers:

esxcli network ip  connection list | grep 1234
tcp         0       0  TIME_WAIT           0
tcp         0       0  FIN_WAIT_2      34519  newreno
tcp         0       0  TIME_WAIT           0
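
Checking three controllers on every host gets tedious, so the same test can be scripted. The Python sketch below classifies the connections by state; it assumes the usual column order of the connection list (protocol, receive queue, send queue, local address, foreign address, state), the function names are made up, and the addresses in the sample are invented documentation-range values.

```python
def controller_connection_states(conn_list_text, port=1234):
    """Map each remote endpoint on `port` to its TCP state."""
    states = {}
    for line in conn_list_text.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local foreign state ...
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[4].endswith(f":{port}"):
            states[fields[4]] = fields[5]
    return states

def unhealthy(states):
    """Return the endpoints whose connection is not ESTABLISHED."""
    return sorted(ep for ep, state in states.items() if state != "ESTABLISHED")


# Hypothetical sample; the 192.0.2.x addresses are placeholders.
sample = """\
tcp  0  0  192.0.2.51:16594  192.0.2.31:1234  ESTABLISHED  35185  newreno  netcpa-worker
tcp  0  0  192.0.2.51:46917  192.0.2.32:1234  FIN_WAIT_2   34519  newreno
tcp  0  0  192.0.2.51:16580  192.0.2.33:1234  TIME_WAIT        0
"""

print(unhealthy(controller_connection_states(sample)))
# ['192.0.2.32:1234', '192.0.2.33:1234']
```

Passing port=5671 applies the same check to the message bus connection towards the NSX Manager.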

If we can’t see ESTABLISHED connections, check:

1. IP connectivity from the ESXi host to all NSX Controllers.

2. If there is a firewall between the ESXi host and the NSX Controllers, TCP/1234 needs to be open.

3. Whether netcpad is running on the ESXi host:

/etc/init.d/netcpad status
netCP agent service is not running

When netcpad is running, the status looks like this:

esxcomp-01a # /etc/init.d/netcpad status
netCP agent service is running

If netcpad is not running, start it with the command:

esxcomp-01a #/etc/init.d/netcpad start
Memory reservation set for netcpa
netCP agent service starts

Verify again:

esxcomp-01a # /etc/init.d/netcpad status
netCP agent service is running


Verify on esxcomp-01a that the control plane is enabled and the controller connection is up for VXLAN 5001:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------
    5003  N/A (headend replication)  Enabled (multicast proxy,ARP proxy) (up)            2                0                0
    5001  N/A (headend replication)  Enabled (multicast proxy,ARP proxy) (up)            2                3                0
    5000  N/A (headend replication)  Enabled (multicast proxy,ARP proxy) (up)            1                3                0
    5002  N/A (headend replication)  Enabled (multicast proxy,ARP proxy) (up)            1                2                0

Verify on esxcomp-01b that the control plane is enabled and the controller connection is up for VXLAN 5001:

esxcomp-01b # esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------
    5001  N/A (headend replication)  Enabled (multicast proxy,ARP proxy) (up)            2                3                0
    5000  N/A (headend replication)  Enabled (multicast proxy,ARP proxy) (up)            1                0                0
    5002  N/A (headend replication)  Enabled (multicast proxy,ARP proxy) (up)            1                2                0
    5003  N/A (headend replication)  Enabled (multicast proxy,ARP proxy) (up)            1                0                0
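
To check many VNIs at once, the two columns of interest can be extracted with a regular expression. This Python sketch is only an illustration: the function name is made up, and the alternative values "Disabled" and "down" are my assumption about what an unhealthy row would show, since the listings above only contain healthy rows.

```python
import re

# VNI at the start of the row, then the control plane state, then the
# controller connection state in parentheses.
ROW = re.compile(r"^\s*(?P<vni>\d+)\s+.*?(?P<cp>Enabled|Disabled).*\((?P<conn>up|down)\)")

def control_plane_status(network_list_text):
    """Map each VNI to (control plane state, controller connection state)."""
    status = {}
    for line in network_list_text.splitlines():
        m = ROW.match(line)
        if m:
            status[int(m.group("vni"))] = (m.group("cp"), m.group("conn"))
    return status


# Rows copied from the esxcomp-01b listing above.
sample = """\
VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------
    5001  N/A (headend replication)  Enabled (multicast proxy,ARP proxy) (up)            2                3                0
    5000  N/A (headend replication)  Enabled (multicast proxy,ARP proxy) (up)            1                0                0
"""

print(control_plane_status(sample)[5001])  # ('Enabled', 'up')
```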

Check that esxcomp-01a has learned the ARP entries of remote VMs on VXLAN 5001:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5001
IP            MAC                Flags
------------  -----------------  --------
              00:50:56:a6:a1:e3  00001101

From this output we can see that esxcomp-01a has learned the ARP information of web-sv-02a.

Check that esxcomp-01b has learned the ARP entries of remote VMs on VXLAN 5001:

esxcomp-01b # esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5001
IP            MAC                Flags
------------  -----------------  --------
              00:50:56:a6:7a:a2  00010001

From this output we can see that esxcomp-01b has learned the ARP information of web-sv-01a.

What can we tell at this point?


We know web-sv-01a is a VM running on VXLAN 5001, along with its IP address and its MAC address: 00:50:56:a6:7a:a2.

The communication to the Controller cluster is up for VXLAN 5001.


We know web-sv-02a is a VM running on VXLAN 5001, along with its IP address and its MAC address: 00:50:56:a6:a1:e3.

The communication to the Controller cluster is up for VXLAN 5001.

So why can’t web-sv-01a talk to web-sv-02a?

The answer to this question is another question: what does the NSX Controller know?

Control plane verification from the NSX Controller point of view:

We have 3 active controllers, and one of them is elected to manage VXLAN 5001. Remember slicing?

To find out which controller manages VXLAN 5001, SSH to any one of the NSX Controllers and run:

nsx-controller # show control-cluster logical-switches vni 5001
VNI      Controller      BUM-Replication ARP-Proxy Connections VTEPs
5001 Enabled         Enabled   0           0

The Controller column says which controller manages VXLAN 5001, so we run the next command from that controller:

nsx-controller # show control-cluster logical-switches vni 5001
VNI      Controller      BUM-Replication ARP-Proxy Connections VTEPs
5001 Enabled         Enabled   6           4

From this output we learn that VXLAN 5001 has 4 VTEPs connected to it and a total of 6 active connections.

At this point I would like to point you to an excellent blogger with lots of information about what happens under the hood in NSX.

His name is Dmitri Kalintsev. Link to his blog post: NSX for vSphere: Controller “Connections” and “VTEPs”

From Dmitri's post:

“ESXi host joins a VNI in two cases:

  1. When a VM running on that host connects to VNI’s dvPg and its vNIC transitions into “Link Up” state; and
  2. When DLR kernel module on that host needs to route traffic to a VM on that VNI that’s running on a different host.”

We are not routing traffic between the VMs, so the DLR is not part of the game here.

Find the VTEP IP addresses connected to VXLAN 5001:

nsx-controller # show control-cluster logical-switches vtep-table 5001
VNI      IP              Segment         MAC               Connection-ID
5001   00:50:56:67:d9:91 5
5001   00:50:56:64:f4:25 3
5001   00:50:56:66:e2:ef 4
5001   00:50:56:60:bc:e9 6

From this output we can see that both VTEPs, esxcomp-01a (MAC 00:50:56:66:e2:ef) and esxcomp-01b (MAC 00:50:56:67:d9:91), are seen by the NSX Controller on VXLAN 5001.

The MAC addresses in this output are the VTEPs’ MAC addresses.

Verify that the NSX Controller has learned the MAC addresses of the VMs:

nsx-controller # show control-cluster logical-switches mac-table 5001
VNI      MAC               VTEP-IP         Connection-ID
5001     00:50:56:a6:7a:a2  4
5001     00:50:56:a6:a1:e3  5
5001     00:50:56:8e:45:33  6

The first entry is the MAC of web-sv-01a and the second is the MAC of web-sv-02a.

Verify that the NSX Controller has learned the ARP entries of the VMs:


nsx-controller # show control-cluster logical-switches arp-table 5001
VNI      IP              MAC               Connection-ID
5001    00:50:56:a6:7a:a2 4
5001    00:50:56:a6:a1:e3 5
5001    00:50:56:8e:45:33 6

The first two entries show the exact IP/MAC of web-sv-01a and web-sv-02a.
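
The three controller tables can be tied together through the Connection-ID column. The Python sketch below joins the mac-table to the vtep-table on that column to show which VTEP the controller believes each VM sits behind. It rests on my assumption that a given Connection-ID refers to the same host connection in both tables; the function name is made up, and the MAC addresses and IDs are copied from the outputs above.

```python
# (VTEP MAC, Connection-ID) pairs from the vtep-table output
vtep_table = [
    ("00:50:56:67:d9:91", 5),
    ("00:50:56:64:f4:25", 3),
    ("00:50:56:66:e2:ef", 4),
    ("00:50:56:60:bc:e9", 6),
]

# (VM MAC, Connection-ID) pairs from the mac-table output
mac_table = [
    ("00:50:56:a6:7a:a2", 4),  # web-sv-01a
    ("00:50:56:a6:a1:e3", 5),  # web-sv-02a
    ("00:50:56:8e:45:33", 6),
]

def vm_to_vtep(mac_entries, vtep_entries):
    """Map each VM MAC to the VTEP MAC that shares its Connection-ID."""
    by_conn = {conn: vtep_mac for vtep_mac, conn in vtep_entries}
    return {vm_mac: by_conn.get(conn) for vm_mac, conn in mac_entries}


for vm_mac, vtep_mac in vm_to_vtep(mac_table, vtep_table).items():
    print(vm_mac, "->", vtep_mac)
```

Here web-sv-01a maps to the VTEP with MAC 00:50:56:66:e2:ef, which is the vmk3 MAC address of esxcomp-01a seen earlier, so the controller's view matches the topology.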

To understand how the Controller learns this information, read my post NSX-V IP Discovery.

Sometimes restarting the netcpad process can fix problems between the ESXi host and the NSX Controllers.

esxcomp-01a # /etc/init.d/netcpad restart
watchdog-netcpa: Terminating watchdog process with PID 4273913
Memory reservation released for netcpa
netCP agent service is stopped
Memory reservation set for netcpa
netCP agent service starts

Summary of controller verification:

The NSX Controller knows where the VMs are located, along with their IP and MAC addresses. It seems like the control plane works just fine.


Move VM to different ESXi host

In NSX-v each ESXi host has its own UWA service daemons, which are part of the management and control plane. Sometimes, when a UWA is not working as expected, the VMs on that ESXi host have connectivity issues.

The fast way to check this is to vMotion the non-working VMs from one ESXi host to another. If the VMs start to work, we need to focus on the control plane of the non-working ESXi host.

In this scenario, even after I vMotioned my VM to a different ESXi host, the problem didn’t go away.


Capture in the right spots:

The pktcap-uw command allows us to capture traffic at many points in an NSX environment.

Before we start capturing all over the place, let’s think about where the problem might be.

When a VM is connected to a logical switch, a packet traverses a few security services; each service is represented by a different slot ID.


SLOT 0: implements the vDS access list.

SLOT 1: the switch security module (swsec) captures DHCP Ack and ARP messages; this information is then forwarded to the NSX Controller.

SLOT 2: NSX Distributed Firewall.

We need to check whether the VM traffic successfully passes the NSX Distributed Firewall, that is, slot 2.

The capture command needs the SLOT 2 filter name for web-sv-01a.

From esxcomp-01a:

esxcomp-01a # summarize-dvfilter
world 35888 vmm0:web-sv-01a vcUuid:'50 26 c7 cd b6 f3 f4 bc-e5 33 3d 4b 25 5c 62 77'
 port 50331657 web-sv-01a.eth0
  vNic slot 2
   name: nic-35888-eth0-vmware-sfw.2
   agentName: vmware-sfw
   state: IOChain Attached
   vmState: Detached
   failurePolicy: failClosed
   slowPathID: none
   filter source: Dynamic Filter Creation
  vNic slot 1
   name: nic-35888-eth0-dvfilter-generic-vmware-swsec.1
   agentName: dvfilter-generic-vmware-swsec
   state: IOChain Attached
   vmState: Detached
   failurePolicy: failClosed
   slowPathID: none
   filter source: Alternate Opaque Channel

The world line shows that the VM name is web-sv-01a, the “vNic slot 2” line shows that a filter is applied at slot 2, and the name line below it gives the filter name: nic-35888-eth0-vmware-sfw.2
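
On a host with many VMs the summarize-dvfilter output gets long, so picking out the right filter name can be scripted. The Python sketch below walks the output and returns the filter name for a given vNIC and slot; the function name is made up, and the parsing relies on the port / vNic slot / name line order shown above.

```python
def dvfilter_name(summary_text, vnic, slot):
    """Return the filter name attached to `vnic` at `slot`, or None if absent."""
    current_vnic = None
    current_slot = None
    for raw in summary_text.splitlines():
        line = raw.strip()
        if line.startswith("port "):
            parts = line.split()
            current_vnic = parts[2] if len(parts) >= 3 else None
            current_slot = None
        elif line.startswith("vNic slot "):
            current_slot = int(line.split()[2])
        elif line.startswith("name:") and current_vnic == vnic and current_slot == slot:
            return line.split(":", 1)[1].strip()
    return None


# Abridged copy of the summarize-dvfilter output above.
sample = """\
world 35888 vmm0:web-sv-01a vcUuid:'50 26 c7 cd b6 f3 f4 bc-e5 33 3d 4b 25 5c 62 77'
 port 50331657 web-sv-01a.eth0
  vNic slot 2
   name: nic-35888-eth0-vmware-sfw.2
   agentName: vmware-sfw
  vNic slot 1
   name: nic-35888-eth0-dvfilter-generic-vmware-swsec.1
   agentName: dvfilter-generic-vmware-swsec
"""

print(dvfilter_name(sample, "web-sv-01a.eth0", 2))  # nic-35888-eth0-vmware-sfw.2
```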

The pktcap-uw -A option lists the supported capture points:

esxcomp-01a # pktcap-uw -A
Supported capture points:
        1: Dynamic -- The dynamic inserted runtime capture point.
        2: UplinkRcv -- The function that receives packets from uplink dev
        3: UplinkSnd -- Function to Tx packets on uplink
        4: Vmxnet3Tx -- Function in vnic backend to Tx packets from guest
        5: Vmxnet3Rx -- Function in vnic backend to Rx packets to guest
        6: PortInput -- Port_Input function of any given port
        7: IOChain -- The virtual switch port iochain capture point.
        8: EtherswitchDispath -- Function that receives packets for switch
        9: EtherswitchOutput -- Function that sends out packets, from switch
        10: PortOutput -- Port_Output function of any given port
        11: TcpipDispatch -- Tcpip Dispatch function
        12: PreDVFilter -- The DVFIlter capture point
        13: PostDVFilter -- The DVFilter capture point
        14: Drop -- Dropped Packets capture point
        15: VdrRxLeaf -- The Leaf Rx IOChain for VDR
        16: VdrTxLeaf -- The Leaf Tx IOChain for VDR
        17: VdrRxTerminal -- Terminal Rx IOChain for VDR
        18: VdrTxTerminal -- Terminal Tx IOChain for VDR
        19: PktFree -- Packets freeing point

The capture command can sniff traffic at many interesting points; with PreDVFilter and PostDVFilter (capture points 12 and 13 in the list above) we can sniff traffic before or after the filtering action.

Capture after SLOT 2 filter:

pktcap-uw --capture PostDVFilter --dvfilter nic-35888-eth0-vmware-sfw.2 --proto=0x1 -o web-sv-01a_after.pcap
The session capture point is PostDVFilter
The name of the dvfilter is nic-35888-eth0-vmware-sfw.2
The session filter IP protocol is 0x1
The output file is web-sv-01a_after.pcap
No server port specifed, select 784 as the port
Local CID 2
Listen on port 784
Accept...Vsock connection from port 1049 cid 2
Destroying session 25

Dumped 0 packet to file web-sv-01a_after.pcap, dropped 0 packets.

PostDVFilter = capture after the named filter.

--proto=0x1 = capture only ICMP packets.

--dvfilter = the filter name as shown by the summarize-dvfilter command.

-o = the file to write the captured traffic to.

From the last line of this output we can tell that ICMP packets are not passing this filter, because 0 packets were dumped.

We found our smoking gun 🙂

Now capture before SLOT 2 filter.

pktcap-uw --capture PreDVFilter --dvfilter nic-35888-eth0-vmware-sfw.2 --proto=0x1 -o web-sv-01a_before.pcap
The session capture point is PreDVFilter
The name of the dvfilter is nic-35888-eth0-vmware-sfw.2
The session filter IP protocol is 0x1
The output file is web-sv-01a_before.pcap
No server port specifed, select 5782 as the port
Local CID 2
Listen on port 5782
Accept...Vsock connection from port 1050 cid 2
Dump: 6, broken : 0, drop: 0, file err: 0
Destroying session 26

Dumped 6 packet to file web-sv-01a_before.pcap, dropped 0 packets.

Now we can see in the last line that packets were dumped. We can open the captured file web-sv-01a_before.pcap:

esxcomp-01a # tcpdump-uw -r web-sv-01a_before.pcap
reading from file web-sv-01a_before.pcap, link-type EN10MB (Ethernet)
20:15:31.389158 IP > ICMP echo request, id 3144, seq 18628, length 64
20:15:32.397225 IP > ICMP echo request, id 3144, seq 18629, length 64
20:15:33.405253 IP > ICMP echo request, id 3144, seq 18630, length 64
20:15:34.413356 IP > ICMP echo request, id 3144, seq 18631, length 64
20:15:35.421284 IP > ICMP echo request, id 3144, seq 18632, length 64
20:15:36.429219 IP > ICMP echo request, id 3144, seq 18633, length 64

Voilà, the NSX DFW is blocking the traffic.
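The reasoning used here (packets seen before the slot 2 filter, none seen after it) can be written down as a small check. This is only a sketch in Python: the helper names dumped_packets and dropped_by_filter are made up for this post, and the only thing they rely on is the "Dumped N packet" summary line that pktcap-uw printed in the two captures above.

```python
import re


def dumped_packets(pktcap_output):
    """Extract N from the 'Dumped N packet to file ...' summary line."""
    match = re.search(r"Dumped (\d+) packet", pktcap_output)
    if match is None:
        raise ValueError("no 'Dumped N packet' summary line found")
    return int(match.group(1))


def dropped_by_filter(before_output, after_output):
    """True when traffic reaches the filter but nothing comes out of it."""
    before = dumped_packets(before_output)
    after = dumped_packets(after_output)
    return before > 0 and after == 0


# Summary lines copied from the PreDVFilter and PostDVFilter captures.
before = "Dumped 6 packet to file web-sv-01a_before.pcap, dropped 0 packets."
after = "Dumped 0 packet to file web-sv-01a_after.pcap, dropped 0 packets."
print(dropped_by_filter(before, after))  # True
```

The same comparison works for any slot: capture with PreDVFilter and PostDVFilter on the same filter name and compare the two counters.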

And now from NSX GUI:


Looking back, in this article I intentionally skipped step 3, “Configuration issue”.

Had we checked the configuration settings, we would have noticed this problem immediately.



Summary of all CLI Commands for this post:

ESXI Commands:

esxcfg-vmknic -l
esxcli network vswitch dvs vmware vxlan list
esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5001
esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=Compute_VDS
esxcli network ip route ipv4 list -N vxlan
esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5001
esxcli network ip connection list | grep 1234
ping ++netstack=vxlan -s 1570 -d
/etc/init.d/netcpad (status|start|)
pktcap-uw --capture PostDVFilter --dvfilter nic-35888-eth0-vmware-sfw.2 --proto=0x1 -o web-sv-01a_after.pcap


NSX Controller Commands:

show control-cluster logical-switches vni 5001
show control-cluster logical-switches vtep-table 5001
show control-cluster logical-switches mac-table 5001
show control-cluster logical-switches arp-table 5001


NSX-V IP Discovery

Thanks to Dimitri Desmidt for his feedback.

IP Discovery allows NSX to suppress ARP messages over the logical switch.

To understand why we need IP Discovery and how it works, we need some background on ARP.

IP Discovery

What is ARP (Address Resolution Protocol)?

When VM1 needs to communicate with VM2, it needs to know VM2’s MAC address. The way to learn MAC2 is to send a broadcast message (ARP request) to all VMs in the same L2 segment (the same VLAN or, in the example above, the same VXLAN 5001).

All VMs on this logical switch will see this message, including VM3, since it is a broadcast, but only VM2 will respond. The response is a unicast message from VM2 directly to VM1’s MAC address, with VM2’s MAC address (MAC2) in the response body.



VM1 caches the MAC address for VM2’s IP address in its ARP table. The entry is kept for between a few seconds and a few minutes, depending on the operating system.

Windows 7 OS, for example:


If VM1 and VM2 do not talk again within this cache time window, VM1 clears its ARP table entry for MAC2. The next time VM1 needs to talk to VM2, the VM1 OS sends an ARP message again to relearn the MAC2 of VM2.

Note: In the unlikely event that the NSX Controller doesn’t know the MAC address for VM2’s IP address, the ARP request message is flooded, but only to the ESXi hosts that have VMs in logical switch 5001.


How IP Discovery works:

VMware NSX leverages the NSX Controller to achieve IP Discovery.

Inside an ESXi host running the NSX software there is a process called the User World Agent (UWA). This process communicates with the NSX Controller and updates the Controller’s MAC, IP and VTEP directory tables for the VMs residing on this ESXi host.

NSX Arch

When a VM connects to a logical switch, its packets traverse a few security services; each service is represented by a different slot ID.

Filter slots

SLOT 0: implements the vDS access list.

SLOT 1: the Switch Security module (swsec) captures DHCP Ack and ARP messages; this information is then forwarded to the NSX Controller.

SLOT 2: the NSX Distributed Firewall.

From the figure above we now understand that slot 1 is the service responsible for implementing IP Discovery.

When VM1 powers up, even if its IP address is static, the VM sends out an ARP message to discover the MAC address of the default gateway. When the swsec module sees this ARP message, it forwards it to the NSX Controller. That way the NSX Controller learns VM1’s MAC1 address; in the same way the Controller learns VM2’s MAC2 address.

Now when VM1 wants to talk to VM2 and MAC2 is not known to VM1, an ARP message is sent out to VXLAN 5001.

The UWA sends a query to the NSX Controller and asks whether it knows MAC2. Since the Controller already knows it, a unicast message with MAC2 is sent back to VM1, and the ARP broadcast message is not sent out to all VMs in VXLAN 5001.

Note: There is a 3 minute timer in the NSX Controller for ARP queries. If a host sends the same query within this time frame, the Controller ignores the request and the broadcast message is sent out to all VMs in the logical switch.
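To make the decision logic easier to follow, here is a toy Python model of ARP suppression. It is a simplified sketch and not how the Controller is implemented: the class and function names, the host name, and the IP and MAC addresses are all made up for illustration, and I am assuming that a query ignored because of the 3 minute timer does not restart that timer.

```python
class ControllerArpTable:
    """Toy model of the Controller ARP table for one or more VNIs."""

    QUERY_TIMER = 180  # seconds, the 3 minute timer mentioned in the note

    def __init__(self):
        self.entries = {}     # (vni, ip) -> mac, as reported via swsec/UWA
        self.last_query = {}  # (host, vni, ip) -> time of last handled query

    def learn(self, vni, ip, mac):
        self.entries[(vni, ip)] = mac

    def query(self, host, vni, ip, now):
        """Return the MAC if the Controller answers, else None."""
        key = (host, vni, ip)
        last = self.last_query.get(key)
        if last is not None and now - last < self.QUERY_TIMER:
            return None  # repeated query inside the timer: ignored (assumption)
        self.last_query[key] = now
        return self.entries.get((vni, ip))


def handle_arp_request(controller, host, vni, target_ip, now):
    """Decide what happens to an ARP request seen on an ESXi host."""
    mac = controller.query(host, vni, target_ip, now)
    if mac is not None:
        return ("unicast-reply", mac)  # broadcast suppressed
    return ("flood", None)             # broadcast sent on the logical switch


controller = ControllerArpTable()
controller.learn(5001, "172.16.10.12", "00:50:56:00:00:02")  # hypothetical VM2
print(handle_arp_request(controller, "esx-01", 5001, "172.16.10.12", now=0))
print(handle_arp_request(controller, "esx-01", 5001, "172.16.10.12", now=60))
```

The first call is answered from the Controller table; the second one, sent 60 seconds later by the same host for the same IP, falls inside the timer and results in a flood.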


IP Discovery Verification:

The easiest way to check that IP discovery actually works is to run Wireshark in VM3 and clear the ARP table in VM1 with the command: arp -d.

Now ping from VM1 to VM2; the ARP broadcast message from VM1 should not be seen in VM3.

I would like to point out a great post by Dmitri Kalintsev that explains in depth how IP discovery works:

NSX-v under the hood: VXLAN ARP suppression



Thanks to Francis Guillier, Max Ardica and Tiran Efrat for the overview and feedback.

One of the most important NSX Edge features is NAT.
With NAT (Network Address Translation) we can change the Source or Destination IP addresses and TCP/UDP port. Combined NAT and Firewall rules can lead to confusion when we try to determine the correct IP address to which apply the firewall rule.
To create the correct rule we need to understand the packet flow inside the NSX Edge in detail. In NSX Edge we have two different types of NAT: Source NAT (SNAT) and Destination NAT (DNAT).



SNAT allows translating an internal IP address (for example a private IP described in RFC 1918) to a public external IP address.
In the figure below, the IP address for any VM in VXLAN 5001 that needs outside connectivity to the WAN can be translated to an external IP address (this mapping is configured on the Edge). For example, VM1 with IP address needs to communicate with the WAN/Internet, so the NSX Edge can translate it to an IP address configured on the Edge external interface.
Users in the external network are not aware of the internal Private IP address.




DNAT allows access to internal private IP addresses from the outside world.
In the example in figure below, users from the WAN need to communicate with the Server
NSX Edge DNAT mapping configuration is created so that the users from outside connect to and NSX Edge translates this IP address to


Below is the outline of the Packet flow process inside the Edge. The important parts are where the SNAT/DNAT Action and firewall decision action are being taken.

packet flow

We can see from this process that an ingress packet is evaluated against the FW rules before the SNAT/DNAT translation.

Note: the actual packet flow details are more complicated with more action/decisions in Edge flow, but the emphasis here is on the NAT and FW functionalities only.
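The consequence of "firewall first, NAT second" is easier to see in a few lines of code. The following Python sketch is a deliberately simplified model of the flow described above, not the Edge implementation: the names Packet, firewall_allows and edge_ingress are made up, the default-deny behaviour is an assumption of the model, and all IP addresses are hypothetical documentation addresses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def firewall_allows(packet, rules):
    """rules: list of (src, dst) pairs; the string "any" matches everything."""
    for src, dst in rules:
        if src in ("any", packet.src) and dst in ("any", packet.dst):
            return True
    return False  # simplified default deny (assumption of this model)


def edge_ingress(packet, rules, dnat=None, snat=None):
    """Evaluate the firewall on the packet as received, then apply NAT."""
    if not firewall_allows(packet, rules):
        return None  # dropped before any translation happens
    dnat = dnat or {}
    snat = snat or {}
    return replace(packet,
                   src=snat.get(packet.src, packet.src),
                   dst=dnat.get(packet.dst, packet.dst))


# Hypothetical addresses, for illustration only.
PUBLIC_IP, PRIVATE_IP, USER_IP = "192.0.2.10", "172.16.10.11", "198.51.100.7"
dnat = {PUBLIC_IP: PRIVATE_IP}
request = Packet(src=USER_IP, dst=PUBLIC_IP)

# Rule written against the public IP: allowed, then translated.
print(edge_ingress(request, [("any", PUBLIC_IP)], dnat=dnat))
# Rule written against the private IP: never matches, packet is dropped.
print(edge_ingress(request, [("any", PRIVATE_IP)], dnat=dnat))
```

In this model a DNAT firewall rule has to reference the public address and an SNAT firewall rule has to reference the internal source address, because that is what the packet carries when the firewall sees it.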

Note:  NAT function will work only if firewall service is enabled.

Enable Firewall Service



Firewall rules and SNAT

Because of this packet flow, the firewall rule for SNAT needs to be applied on the internal IP address object and not on the IP address translated by the SNAT function. For example, when VM1 needs to communicate with the WAN, the firewall rule needs to be:

fw and SNAT

 Firewall rules and DNAT

Because of this packet flow the firewall rules for DNAT need to be applied on the public IP address object and not on the Private IP address after the DNAT translation. When a user from the WAN sends traffic to, this packet will be checked against this FW rule and then the NAT will change the destination IP address to

fw and DNAT

DNAT Configuration

Users from outside need to access an internal web server connecting to its public IP address.
The server internal IP address is, the NAT IP address is



The first step is creating the External IP on the Edge, this IP is secondary because this edge already has a main IP address configured in the IP subnet.

Note: the main IP address is marked with a black dot.

For this example the DNAT IP address is


Create a DNAT Rule in the Edge:


Now pay attention to the firewall rules on the Edge: a user coming from the outside will try to access the internal server by connecting to the public IP address. This implies that the firewall rule needs to allow this access.



DNAT Verification:

There are several ways to verify NAT is functioning as originally planned. In our example, users from any source address access the public IP address, and after the NAT translation the packet destination IP address is changed to

The output of the command:

show nat

show nat

The output of the command:

show firewall flow

We can see that packet is received by the Edge and destined to the address, the return traffic is instead originated from the different IP address (the private IP address).
That means DNAT translation is happening here.

show flow

We can capture the traffic and see the actual packet:
Capture Edge traffic on its outside interface vNic_0, in this example user source IP address is and destination is

The command for capture is:
debug packet display interface vNic_0 port_80_and_src_192.168.110.10

Debug packet display interface vNic_0 port_80_and_src_192.168.110.10

debug packet 1

Capture edge on internal interface vNic_1 we can see destination IP address has changed to because of DNAT translation:

debug packet 2

SNAT configuration

All the servers part of VXLAN segment 5001 (associated to the IP subnet need to leverage SNAT translation (in this example to IP address on the outside interface of the Edge to be able to communicate with the external network.


SNAT config

SNAT Configuration:

snat config 2

Edge Firewall Rules:

Allow to to go out

SNAT config fw rule



The output of the command

show nat

show nat verfication

DNAT with L4 Address Translation (PAT)

DNAT with L4 Address Translation allows changing Layer4 TCP/UDP port.
For example we would like to mask our internal SSH server port for all users from outside.
The new port will be TCP/222 instead of regular SSH TCP/22 port.

The user originates a connection to the Web Server on destination port TCP/222 but the NSX Edge will change it to TCP/22.


From the command line the show nat command:

PAT show nat

NAT Order

In this specific scenario, we want to create the two following SNAT rules.

  • SNAT Rule 1:
    The IP addresses for the devices part of VXLAN 5001 (associated to the IP subnet need to be translated to the Edge outside interface address
  • SNAT Rule 2:
    Web-SRV-01a on VXLAN 5001 needs its IP address to be translated to the Edge outside address

nat order

In the configuration example above, traffic will never hit rule number 4 because is part of subnet, so its IP address will be translated to (and not the desired

Order for SNAT rules is important!
We need to re-order the SNAT rules and put the more specific one on top, so that rule 3 will be hit for traffic originated from the IP address, whereas rule 4 will apply to all the other devices part of IP subnet
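The effect of rule order can be reproduced with a first-match lookup. This is a minimal Python sketch using the standard ipaddress module; the function name translate_source is made up, and the subnet, host and translated addresses are hypothetical stand-ins for the ones in the screenshots.

```python
import ipaddress


def translate_source(src_ip, snat_rules):
    """Return the translated address of the first SNAT rule that matches."""
    addr = ipaddress.ip_address(src_ip)
    for original, translated in snat_rules:
        if addr in ipaddress.ip_network(original):
            return translated
    return src_ip  # no rule matched, source is left untouched


# Hypothetical addresses: a subnet-wide rule and a rule for one web server.
subnet_rule = ("172.16.10.0/24", "192.0.2.1")
host_rule = ("172.16.10.11/32", "192.0.2.2")

# General rule first: the host rule is shadowed and never hit.
print(translate_source("172.16.10.11", [subnet_rule, host_rule]))  # 192.0.2.1
# Specific rule first: the host gets its own translation.
print(translate_source("172.16.10.11", [host_rule, subnet_rule]))  # 192.0.2.2
```

Any other address in the subnet still falls through to the subnet-wide rule after the re-order.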

nat reorder

After re-order:

nat after reorer


Another useful command:

show configuration nat


NSX-V Troubleshooting registration to vCenter

In the current NSX software release, the NSX Manager is tightly connected to the vCenter server in a 1:1 relationship.

During the process of coupling the NSX Manager to vCenter we have two different initial steps: the configuration of “Lookup Service” and “vCenter Server”.


Lookup Service:

The Lookup Service allows binding an NSX role to an SSO user or group. In other words, it enables the “Role Based Access Control” authentication functionality in NSX, and its configuration is optional. Notice that without the Lookup Service configuration, the functionality of NSX is not affected at all.


vCenter Server:

This is a mandatory configuration. Registering the NSX Manager with vCenter injects a plugin into the vSphere Web Client for consumption of NSX functionalities within the Web management platform.

While trying to Register to vCenter or configuring the Lookup Service you might see this error:

“nested exception is java.net.UnknownHostException: vc-l-01a.corp.local( vc-l-01a.corp.local )”


Or when trying to setup the Lookup Service:

“nested exception is java.net.UnknownHostException: vc-l-01a.corp.local( vc-l-01a.corp.local )”


Or similar to this Error:

“NSX Management Service operation failed.( Initialization of Admin Registration Service Provider failed. Root Cause: Error occurred while registration of lookup service, com.vmware.vim.sso.admin.exception.InternalError: General failure. )”


Most problems with registering NSX Manager to vCenter or configuring the SSO Lookup Service are caused by:

  1. Connectivity problems between the NSX Manager and vCenter.
  2. Firewall blocking this connection.
  3. DNS not configured properly on NSX Manager or vCenter.
  4. Time is not synced between NSX Manager and vCenter.
  5. The user authenticated via SSO needs to have administrative rights.


TSHOT steps

Connectivity issue:

Verify connectivity from NSX Manager to vCenter. Ping from NSX Manager to vCenter using both the IP address and the Fully Qualified Domain Name (FQDN). Check for static routes or for the presence of a default route in NSX Manager:

nsxmgr-l-01a# show ip route

Codes: K – kernel route, C – connected, S – static,

> – selected route, * – FIB route

S>* [1/0] via, mgmt

C>* is directly connected, mgmt


DNS Issue:

Verify NSX Manager can successfully resolve the vCenter DNS name. Ping from NSX Manager to vCenter with FQDN:

nsxmgr-l-01a# ping vc-l-01a.corp.local

PING vc-l-01a.corp.local ( 56 data bytes

64 bytes from icmp_seq=0 ttl=64 time=0.576 ms

If this does not work verify the DNS configuration on the NSX Manager.

Go to Manage -> Network -> DNS Servers:


Firewall Issue:

If you have a firewall between NSX Manager and vCenter, verify that it allows SSL communication on TCP/443 (also allow ping for connectivity checks).
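The DNS and firewall checks can also be scripted from any machine in the management network that has Python. This is a minimal sketch using only the standard socket module; the function name check_endpoint and the shape of its result are made up for this post, and it only tests name resolution and a TCP connect, not the SSL handshake itself.

```python
import socket


def check_endpoint(host, port=443, timeout=3.0):
    """Resolve host and try a TCP connection to host:port."""
    result = {"resolved": None, "tcp_open": False}
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return result  # DNS problem: the name does not resolve
    result["resolved"] = infos[0][4][0]
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["tcp_open"] = True
    except OSError:
        pass  # resolved, but filtered, refused or unreachable
    return result
```

Calling check_endpoint("vc-l-01a.corp.local") returns resolved as None when DNS is the problem, and tcp_open as False when the name resolves but TCP/443 is blocked or vCenter is unreachable.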

A complete list of the communication ports and protocols used for VMware NSX for vSphere is available at the links below:





NTP issue:

Verify that actual time is synced between vCenter and NSX Manager.


From NSX Manager CLI:

nsxmgr-l-01a# show clock
Tue Nov 18 06:51:34 UTC 2014


From vCenter CLI:

vc-l-01a:~ # date
Tue Nov 18 06:51:31 UTC 2014
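If you prefer to compare the two clocks with a script instead of by eye, the two outputs above can be parsed and subtracted. This is a small Python sketch; the function name clock_skew_seconds is made up, and the format string assumes both commands print UTC time in exactly the layout shown above.

```python
from datetime import datetime

# Layout of "show clock" and "date" as shown above (UTC assumed).
CLOCK_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def clock_skew_seconds(first, second):
    """Absolute difference in seconds between two clock outputs."""
    a = datetime.strptime(first, CLOCK_FORMAT)
    b = datetime.strptime(second, CLOCK_FORMAT)
    return abs((a - b).total_seconds())


nsx_manager = "Tue Nov 18 06:51:34 UTC 2014"
vcenter = "Tue Nov 18 06:51:31 UTC 2014"
print(clock_skew_seconds(nsx_manager, vcenter))  # 3.0
```

Part of the difference measured this way comes from the delay between typing the two commands, so treat it as a rough indication only.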

Note: After configuration of Time settings, Appliance needs to be restarted.


User permission issue:

The user used to register with vCenter or the Lookup Service must have administrative rights.
Try working with the default administrator user: administrator@vsphere.local

The official KB, published on 21/1/15:


Troubleshooting NSX-V Controller


The Controller cluster in the NSX platform is the control plane component that is responsible for managing the switching and routing modules in the hypervisors.

The use of controller cluster in managing VXLAN based logical switches eliminates the need for multicast.


Each Controller Node is assigned a set of roles that define the type of tasks the node can implement. By default, each Controller Node is assigned all roles.

NSX controller roles:

API provider: Handles HTTP web service requests from external clients (NSX Manager) and initiates processing by other Controller Node tasks.

Persistence Server: Stores data from the NVP API and vDS devices that must be persisted across all Controller Nodes in case of node failures or shutdowns.

Logical manager: Monitors when end hosts arrive or leave vDS devices and configures the vDS forwarding states to implement logical connectivity and policies.

Switch manager: Maintains management connections for one or more vDS devices.

Directory server: Manages VXLAN and the distributed logical routing directory of information.

Any multi-node HA mechanism has the potential for a “split brain” scenario in which a cluster is partitioned into two or more groups, and those groups are not able to communicate. In this scenario, each group might assume control of all tasks under the assumption that the other nodes have failed. NSX uses leader election to solve this split-brain problem. One of the Controller Nodes is elected as a leader for each role, which requires a majority vote of all active and inactive nodes in the cluster.


The leader for each role is responsible for allocating tasks to individual Controller Nodes and determining when a node has failed. Since election requires a majority of all nodes, it is not possible for two leaders to exist simultaneously within a cluster, preventing a split-brain scenario. The leader election mechanism requires a majority of all cluster nodes to be functional at all times.

Note: Currently NSX-V 6.1 supports a maximum of 3 controllers.

Here is an example of 3 NSX Controllers and the role election per node member.


Node 1 master for roles:  API Provider and Logical Manager

Node 2 master for roles: Persistence Server and Directory Server

Node 3 master for roles: Switch Manager.

The majority number differs depending on the number of Controller Cluster nodes. It is evident how deploying 2 nodes (traditionally considered an example of a redundant system) would increase the scalability of the Controller Cluster (since at steady state two nodes would work in parallel) without providing any additional resiliency. This is because with 2 nodes, the majority number is 2 and that means that if one of the two nodes were to fail, or they lost communication with each other (dual-active scenario), neither of them would be able to keep functioning (accepting API calls, etc.). The same considerations apply to a deployment with 4 nodes that cannot provide more resiliency than a cluster with 3 elements (even if providing better performance).
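The majority arithmetic behind this paragraph is short enough to write out. A small Python sketch (the function names are made up for this post; "majority" here simply means strictly more than half of the nodes):

```python
def majority(nodes):
    """Smallest node count that is strictly greater than half the cluster."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can fail while a majority is still alive."""
    return nodes - majority(nodes)


for n in (1, 2, 3, 4):
    print(n, majority(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 4 3 1
```

Two nodes tolerate no failure at all, and four nodes tolerate one failure, exactly like three.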


TSHOT NSX controllers

The next part, TSHOT NSX Controller, is based on the VMware NSX MH 4.1 User Guide:


NSX Controller nodes ip address for the next screenshots are:

Node1, Node1, Node1

Verify NSX Controller installation

Ensure that the Controllers are installed on systems that meet the minimum requirements.
On each Controller:

The CLI command “request system compatibility-report” provides informational details that determine whether a Controller system is compatible with the Controller requirements.

# request system compatibility-report


Check controller status in NSX Manager

The NSX Manager continually checks whether all Controller Clusters are accessible. If a Controller Cluster is currently in disconnected status, your diagnostic efforts and log review should be focused on the time immediately after the Controller Cluster was last seen as connected.

Here is an example of a “Disconnected” controller as seen from NSX Manager:


The NSX “Controller nodes status” screenshot shows the status between the NSX Manager and each Controller, not the overall controller cluster status.

So even if all controllers are in the “Normal” state, as in the figure below, that doesn’t mean the overall controller cluster status is OK.

Checking the Controller Cluster Status from CLI

The current status of the Controller Cluster can be determined by running show control-cluster status:


# show control-cluster status


Join status: verify that this node has completed the process of joining the cluster.

Majority status: check whether this node is part of the cluster majority.

Cluster ID: all node members need to have the same cluster ID.

The current status of the Controller Node’s intra-cluster communication connections can be determined by running

show control-cluster connections


If a Controller node is a Controller Cluster majority leader, it will be listening on port 2878 (as indicated by the Y in the “listening” column).

The other Controller nodes will have a dash (-) in the “listening” column.

The next step is to check whether the Controller Cluster majority leader has any open connections as indicated by the number in the “open conns” column. On a properly functioning Controller, the open connections should be the same as the number of other Controller nodes in the Controller Cluster (e.g. In a three-node Controller Cluster, the Controller Cluster majority leader should show two open connections).

The command show control-cluster history will allow you to see a history of Controller Cluster-related events on this node including restarts, upgrades, Controller Cluster errors and loss of majority.

controller # show control-cluster history


Joining a Controller Node to Controller Cluster

This section covers issues that may be encountered when attempting to join a new Controller Node to an existing Controller Cluster. An explanation of why the issue occurs and instructions on how to resolve the issue are also provided.

Symptom: Joining a new Controller node to a Controller Cluster may fail when all of the existing Controllers are disconnected.

Example for this situation:

As we can see, controller-1 and controller-2 are disconnected from the NSX Manager:


When we try to add a new controller to the cluster we get this error message:




Explanation: If n nodes have joined the NSX Controller Cluster, then a majority (strictly greater than 50%) of those n nodes must be alive and connected to each other before any new data can be written to the system. This means that if you have a Controller Cluster of 3 nodes, 2 of them must be alive and connected in order for new data to be written in NSX.

In our case, to add a new controller node to the cluster we need at least one member of the cluster to be in the “Normal” state.


Resolution: Start the Disconnected Controller. If the Controller is disconnected due to a permanent failure, remove the Controller from the Controller Cluster.

Symptom: the join control-cluster CLI command hangs without ever completing the join operation.


Explanation: The IP address passed into the join control-cluster command was incorrect, and/or does not refer to a currently live Controller node.

For example, the user types the command:

join control-cluster

Make sure that is part of existing controller cluster.


Resolution: Use the IP address of a properly configured Controller that is reachable across the network.


Symptom: The join control-cluster CLI command fails.

Explanation: If you have a Controller configured as part of a Controller Cluster, that Controller has been disconnected from the Controller Cluster for a long period of time (perhaps it was taken offline or shut down), and during that time, the other Controllers in that Controller Cluster were removed from the Controller Cluster and formed into a new Controller Cluster, then the long-disconnected Controller will not be allowed to rejoin the Controller Cluster that it left, because that original Controller Cluster is gone.

The following event log message in the new Controller Cluster indicates that something like this has happened:

Node b567a47f-9a61-43b3-8d53-36b3d1fd0675 tried to join with incorrect cluster ID


Resolution: You must issue the join control-cluster command with the force option on the old Controller to force it to clear its state and join the new Controller Cluster with a fresh start.

Note: The forced join command deletes previously joined node with the same IP.

nvp-controller # join control-cluster force


Recovering node disconnect from cluster

When a controller cluster majority issue arises, it can be very difficult to spot from the NSX Manager GUI.

For example, the current state of the controllers from the NSX Manager point of view is that all the members are in the “Normal” state.


But in fact the current status in my cluster is:


Node 1 and Node 2 form a cluster and share the roles between them; for some reason Node 3 is disconnected from the majority of the cluster:

Output example from controller Node 3:



Node 3 thinks it is alone and owns all of the roles.

From Node 1’s perspective it is the leader (it has the Y) and has one open connection from Node 2, as shown:



To recover from this scenario, Node 3 needs to join the majority of the cluster. The IP address to join must be that of Node 1, because it is the leader of the majority.

join control-cluster force

Recovering from lost all Controller Nodes

In this scenario all NSX Controller nodes have failed or been deleted. Do we need to start from scratch? 🙁

The assumption is that our environment already has NSX Edge and DLR deployed, and that we have logical switches connected to VMs that we would like to preserve.

The recovery process:

Step 1:

Migrate the existing logical switches to Multicast mode.


Step 2:

Deploy 3 new NSX Controllers.

Step 3:

Sync the newly deployed NSX Controllers to unicast mode with the current state of our NSX environment.


Other useful commands:

Checking Controller Processes

Even if the “join-cluster” command on a node appears to have been successful, the node might not have come up completely for a variety of reasons. The way this error tends to manifest itself most visibly is that the controller process isn’t listening on all the ports it’s supposed to be, and no API requests or switch connections are happening.

# show network connections of-type tcp

Active Internet connections (servers and established)

Proto Recv-Q Send-Q Local Address      Foreign Address     State       PID/Program

tcp        0      0*           LISTEN      14038/domain

tcp        0      0*           LISTEN      14072/java

tcp        0      0*           LISTEN      14067/domain

tcp        0      0*           LISTEN      14038/domain

tcp        0      0*           LISTEN      14038/domain

tcp        0      0*           LISTEN      14072/java

tcp        0      0*           LISTEN      14072/java

tcp        0      0   ESTABLISHED 14072/java

tcp        0      0   ESTABLISHED 14072/java

tcp        0      0    ESTABLISHED 14038/domain

tcp        0      0   ESTABLISHED 14067/domain


The show network connection output shown in the preceding block is an example from a healthy Controller. If you find some of these missing, it’s likely that NSX didn’t get past its install phase.  Here are some misconfigurations that can cause this:

Bad management address or listen IP

You’ve set an incorrect IP as the management-address, or as the listen-ip for one of the roles (like switch_manager or api_provider).

NSX attempts to bind to the specified address, and fails early if it cannot do so.  You’ll see log messages in cloudnet_cpp.log.ERROR like:

E0506 01:20:17.099596  7188 dso-deployer.cc:516] Controller component installation of rpc-broker failed: Unable to bind a RPC port $tags:tracing:3ef7d1f519ffb7fb^

E0506 01:20:17.100162  7188 main.cc:271] RPC deployment subsystem not installed; exiting. $tags:tracing:3ef7d1f519ffb7fb^

Or in cloudnet_cpp.log.WARNING:

W0506 01:22:27.721777  7694 ssl-socket.cc:530] SSLSocket failed to bind to Cannot assign requested address

Note that if you are using DHCP for the IP addresses of your controller nodes (not recommended or supported), the IP address could have changed since the last time you configured it.

Verify that the IP addresses for switch_manager and api_provider are what they are supposed to be by performing the CLI command:

<switch_manager|api_provider>  listen-ip


Bad first node address

You’ve provided the wrong IP address for the first node in the Controller Cluster. Run show control-cluster startup-nodes to determine whether the IPs listed correspond to the IPs of the Controllers in the Controller Cluster.


Out of disk space

The Controller may be out of disk space. Use the show status command to see if any of the partitions have 0 bytes available.

The NSX CLI command show system statistics can be used to display resource utilization for disk space, disk I/O, memory, CPU and various other processes on the Controller Nodes. The command offers statistics with one-minute intervals for a window of one hour for various combinations. The show system statistics CLI command does auto-completion and can be used to view the list of metric data available.

show system statistics <datasource>       : for the tabular output
show system statistics graph <datasource> : for the graphical format output


As an example, the following output shows the RRD statistics for the datasource disk_ops:write associated with the disk sda1 on the Controller in a tabular form:

# show system statistics disk-sda1/disk_ops:write

Time  Write

12:29             0.74

12:28         0.731429

12:27         0.617143

12:26         0.665714  <snip>


More commands:

# show network interface
# show network default-gateway
# show network dns-servers
# show network ntp-servers
# show network ntp-status
# traceroute <ip_address or dns_name>
# ping <ip address>
# ping interface addr <alternate_src_ip> <ip_address>
# watch network interface breth0 traffic

NSX L2 Bridging


The following overview of L2 Bridging is taken from the great work of Max Ardica and Nimish Desai in the official NSX Design Guide:

There are several circumstances where it may be required to establish L2 communication between virtual and physical workloads. Some typical scenarios are (not exhaustive list):

  • Deployment of multi-tier applications: in some cases, the Web, Application and Database tiers can be deployed as part of the same IP subnet. Web and Application tiers are typically leveraging virtual workloads, but that is not the case for the Database tier where bare-metal servers are commonly deployed. As a consequence, it may then be required to establish intra-subnet (intra-L2 domain) communication between the Application and the Database tiers.
  • Physical to virtual (P-to-V) migration: many customers are virtualizing applications running on bare metal servers and during this P-to-V migration it is required to support a mix of virtual and physical nodes on the same IP subnet.
  • Leveraging external physical devices as default gateway: in such scenarios, a physical network device may be deployed to function as default gateway for the virtual workloads connected to a logical switch and a L2 gateway function is required to establish connectivity to that gateway.
  • Deployment of physical appliances (firewalls, load balancers, etc.).

To fulfill the specific requirements listed above, it is possible to deploy devices performing a “bridging” functionality that enables communication between the “virtual world” (logical switches) and the “physical world” (non virtualized workloads and network devices connected to traditional VLANs).

NSX offers this functionality in software through the deployment of NSX L2 Bridging allowing VMs to be connected at layer 2 to a physical network (VXLAN to VLAN ID mapping), even if the hypervisor running the VM is not physically connected to that L2 physical network.

L2 Bridge topology


The figure above shows an example of L2 bridging, where a VM connected in logical space to the VXLAN segment 5001 needs to communicate with a physical device deployed in the same IP subnet but connected to a physical network infrastructure (in VLAN 100). In the current NSX-v implementation, the VXLAN-VLAN bridging configuration is part of the distributed router configuration; the specific ESXi host performing the L2 bridging functionality is hence the one where the Control VM for that distributed router is running. If that ESXi host fails, the ESXi host running the standby Control VM (which is activated once it detects the failure of the active one) takes over the L2 bridging function.

Independently from the specific implementation details, below are some important deployment considerations for the NSX L2 bridging functionality:

  • The VXLAN-VLAN mapping is always performed in 1:1 fashion. This means traffic for a given VXLAN can only be bridged to a specific VLAN, and vice versa.
  • A given bridge instance (for a specific VXLAN-VLAN pair) is always active only on a specific ESXi host.
  • However, through configuration it is possible to create multiple bridge instances (for different VXLAN-VLAN pairs) and ensure they are spread across separate ESXi hosts. This improves the overall scalability of the L2 bridging function.
  • The NSX Layer 2 bridging data path is entirely performed in the ESXi kernel, and not in user space. Once again, the Control VM is only used to determine the ESXi host where a given bridging instance is active, and not to perform the bridging function.
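The 1:1 mapping rule in the first consideration can be checked before a set of bridges is configured. The sketch below is only an illustration: the function name and the second bridge entry are hypothetical, and the check runs on a plain Python list rather than on anything read from NSX.

```python
def find_mapping_conflicts(bridges):
    """Return VXLAN or VLAN IDs that are used by more than one bridge.

    `bridges` is a list of (name, vxlan_id, vlan_id) tuples.
    Each conflict is (kind, id, first_bridge_name, conflicting_bridge_name).
    """
    seen_vxlan, seen_vlan = {}, {}
    conflicts = []
    for name, vxlan_id, vlan_id in bridges:
        if vxlan_id in seen_vxlan:
            conflicts.append(("vxlan", vxlan_id, seen_vxlan[vxlan_id], name))
        else:
            seen_vxlan[vxlan_id] = name
        if vlan_id in seen_vlan:
            conflicts.append(("vlan", vlan_id, seen_vlan[vlan_id], name))
        else:
            seen_vlan[vlan_id] = name
    return conflicts


planned = [
    ("Bridge_App_VLAN100", 5002, 100),
    ("Bridge_Web_VLAN100", 5001, 100),  # hypothetical bridge that reuses VLAN 100
]
conflicts = find_mapping_conflicts(planned)
print(conflicts)
```

An empty result means every VXLAN is paired with exactly one VLAN and vice versa.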



Configure L2 Bridge

In this scenario we would like to bridge between an App VM connected to VXLAN 5002 and a virtual machine connected to VLAN 100.

Create Bridge 1

My current Logical Switch configuration:

Logical Switch table

We have pre-configured a VLAN-backed port group for VLAN 100:

Port group

Bridging configuration is done at the DLR level. In this specific example, the DLR name is Distributed-Router:

Double-click edge-1:



Click Bridging and then the green + button:


Enter the bridge name, the Logical Switch ID and the port group name:



Click OK and Publish:



Now a VM on Logical Switch App-Tier-01 can communicate with a physical or virtual machine on VLAN 100.


Design Consideration

Currently in NSX-V 6.1 we can’t enable routing on the VXLAN logical switch that is bridged to a VLAN.

In other words, the default gateway for devices connected to the VLAN can’t be configured on the distributed logical router:

Non-working L2 Bridge Topology

So how can a VM in VXLAN 5002 communicate with VXLAN 5001?

The big difference is that VXLAN 5002 is no longer connected to a DLR LIF; instead it is connected to the NSX Edge.

Working Bridge Topology


The DLR Control VM can work in high availability mode. If the active DLR Control VM fails, the standby Control VM takes over, which means the bridge instance moves to a new ESXi host.



Bridge Troubleshooting:

The most common issue I ran into was that the bridged VLAN was missing on the trunk interface configured on the physical switch.

In the figure below:

  • The physical server is connected to VLAN 100; the App VM is connected to VXLAN 5002 on esx-01b.
  • The active DLR Control VM is located on esx-02a, so the bridging function is active on this ESXi host.
  • Both ESXi hosts have two physical NICs: vmnic2 and vmnic3.
  • The transport VLAN carries all VNI (VXLAN) traffic and is forwarded on the physical switch in VLAN 20.
  • On physical switch-2 port E1/1 we must configure a trunk port and allow both VLAN 100 and VLAN 20.

Bridge and Trunk configuration

Note: Port E1/1 will carry both VXLAN and VLAN traffic. 
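Since a missing VLAN on the trunk is the usual culprit, a quick check like the one below can help when comparing a switch port's allowed VLAN list against what the bridge needs. This is a minimal sketch: the function name is hypothetical and the allowed list is typed in by hand, not read from a switch.

```python
def missing_vlans(allowed_on_trunk, required):
    """Return the required VLAN IDs that the trunk port does not allow."""
    return sorted(set(required) - set(allowed_on_trunk))


# From the example above: VLAN 100 (bridged VLAN) and VLAN 20 (VXLAN transport).
required = {100, 20}

print(missing_vlans({20}, required))       # trunk that forgot the bridged VLAN
print(missing_vlans({20, 100}, required))  # correctly configured trunk
```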




Find Where Bridge is Active:

We need to know where the active DLR Control VM is located (if we have HA). Inside this ESXi host the bridging happens in kernel space. The easy way to find it is to look at the “Configuration” section in the “Manage” tab.

Note: If we power off the DLR Control VM (and HA is not enabled), the bridging function on this ESXi host stops in order to prevent a loop.

We can see that the Control VM is located on esx-02a.corp.local.

SSH to this ESXi host and find the Vdr Name of the DLR Control VM:

xxx-xxx -I -l

VDR Instance Information :

Vdr Name: default+edge-1
Vdr Id: 1460487509
Number of Lifs: 4
Number of Routes: 5
State: Enabled
Controller IP:
Control Plane IP:
Control Plane Active: Yes
Num unique nexthops: 1
Generation Number: 0
Edge Active: Yes

Now we know that “default+edge-1” is the VDR name.
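If the instance information has been saved to a text file, the Vdr Name can also be pulled out with a few lines of Python. This is a sketch under the assumption that the output keeps the "key: value" layout shown above; parse_vdr_instance is a hypothetical helper name.

```python
def parse_vdr_instance(text):
    """Turn 'key: value' lines of captured VDR instance output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


# Sample lines copied from the output above.
captured = """VDR Instance Information :

Vdr Name: default+edge-1
Vdr Id: 1460487509
Number of Lifs: 4
Number of Routes: 5
State: Enabled
Control Plane Active: Yes
Edge Active: Yes
"""

info = parse_vdr_instance(captured)
print(info["Vdr Name"])
```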


xxx-xxx -b --mac default+edge-1


~ # xxx-xxx -b --mac default+edge-1

VDR 'default+edge-1' bridge 'Bridge_App_VLAN100' mac address tables :
Network 'vxlan-5002-type-bridging' MAC address table:
total number of MAC addresses: 0
number of MAC addresses returned: 0
Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
Network 'vlan-100-type-bridging' MAC address table:
total number of MAC addresses: 0
number of MAC addresses returned: 0
Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---


From this output we can see that no MAC addresses have been learned yet.
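When this check is repeated often, the per-network MAC counts can be extracted from a saved copy of the output. The sketch below assumes the "Network ... MAC address table" and "total number of MAC addresses" lines appear as in the output above; the function name and regular expressions are my own and are not part of any VMware tool.

```python
import re

# Matches the network name, with or without surrounding quote characters.
NETWORK_RE = re.compile(r"Network\s+\W?([\w.-]+)\W?\s+MAC address table")
TOTAL_RE = re.compile(r"total number of MAC addresses:\s*(\d+)")


def mac_counts(text):
    """Return {network_name: number_of_learned_MACs} from captured bridge output."""
    counts = {}
    current = None
    for line in text.splitlines():
        match = NETWORK_RE.search(line)
        if match:
            current = match.group(1)
            continue
        match = TOTAL_RE.search(line)
        if match and current is not None:
            counts[current] = int(match.group(1))
    return counts


captured = """
Network 'vxlan-5002-type-bridging' MAC address table:
total number of MAC addresses: 0
number of MAC addresses returned: 0
Network 'vlan-100-type-bridging' MAC address table:
total number of MAC addresses: 0
number of MAC addresses returned: 0
"""

counts = mac_counts(captured)
learning = any(count > 0 for count in counts.values())
print(counts, learning)
```

A bridge that is passing traffic should report a non-zero count on both the VXLAN side and the VLAN side.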

After we connect a VM to Logical Switch App-Tier-01 and ping a VM in VLAN 100, we can see MAC addresses from both VXLAN 5002 and VLAN 100:






NSX Role Based Access Control

One of the most challenging problems in managing large networks is the complexity of security administration.

“Role-based access control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within an enterprise. In this context, access is the ability of an individual user to perform a specific task, such as view, create, or modify a file. Roles are defined according to job competency, authority, and responsibility within the enterprise”

Within NSX we have four built-in roles, and we can map a user or a group to one of them. Rather than assigning roles to individual users, I think the preferred way is to assign roles to groups.

Organizations create user groups for proper user management. After integration with SSO, NSX Manager can get the details of the groups to which a user belongs.

NSX Roles

Within NSX Manager we have four pre-built RBAC roles that cover different permissions and areas of the NSX environment.

The four NSX built-in roles are: Auditor, Security Administrator, NSX Administrator and Enterprise Administrator:

NSX RBAC Diagram


Configure the Lookup Service in NSX Manager

Whenever we want to assign a role in NSX, we can assign it to an SSO user or group. If the Lookup Service is not configured, group-based role assignment will not work, i.e. users from that group will not be able to log in to NSX.

The reason is that NSX cannot fetch any group information from the SSO server. The group-based authentication provider is only available when the Lookup Service is configured. Logins by users who are explicitly assigned a role in NSX are not affected. This means that without the Lookup Service the customer has to assign roles to users individually and cannot take advantage of SSO groups.

For NSX, the vCenter SSO server is one of the identity providers for authentication. The prerequisite for authenticating to NSX is that the user or group has been assigned a role in NSX.

NSX Manager Lookup Service


Note: NTP and DNS must be configured on the NSX Manager for the Lookup Service to work.

Note: The domain account must have AD read permission for all objects in the domain tree. 

Configure Active Directory Groups

In this blog I will use Microsoft Active Directory as the user identity source. In “Active Directory Users and Computers” I created four different groups. The groups have the same names as the NSX roles to make life easier:

Auditor, Security Administrator, NSX Administrator, Enterprise Administrator.

AD Groups


We create four A/D users and add each user to a different A/D group. For example, the user nsxadmin:

The user nsxadmin is associated with the group NSX Administrator. The association is done with the Add button:

AD user


In the same way I associate the other users with their A/D groups:

username      group
auditor1  ->  Auditor
secadmin  ->  Security Administrator
nsxadmin  ->  NSX Administrator
entadmin  ->  Enterprise Administrator

Connect Directory Domain to NSX Manager.

Go to the “Network & Security” tab and double-click “NSX Manager”:

map ad to nsx manager role 1


Double click on “” icon:

map ad to nsx manager role 2

Note: Configuring a domain is not needed for RBAC; it is only needed if we want to use identity firewall rules based on users or groups.

Go to “Manage” -> “Domains” and click the green plus button:

map ad to nsx manager role 8

Fill in the Name and NetBIOS name fields with the domain name and NetBIOS name of your domain.

In my example the domain name is corp.local:

map ad to nsx manager role 9

Enter the LDAP (i.e. AD) server IP address or hostname and the domain account (username and password):

map ad to nsx manager role 10

(The LDAP options can also be configured via a direct API call, which bypasses the Event Log Access step described next.)

Click Next.

Event Log Access:

If we need to create NSX firewall rules with user identity based on AD groups, we must allow the NSX Manager to read the Active Directory “Security Event Log”. This log contains the Active Directory users' logon and logoff events for the domain. We use this information to bind an AD user to an IP address.

NSX needs access to the “Event Log” to provide the DFW with user identity in one of two cases:

  1. The user logs on to a VM that is not running VMtools.
  2. The user logs on to the domain from a PC located in the physical environment.

When users log in to a VM with VMtools up and running, we do not need the “Security Event Log” to bind the user to an IP.

Permissions for the user to read logon/logoff events:

Windows 2008 or later domain servers:

Add the account to the Event Log Readers group. If you are using the on-device User-ID agent, the account must also be a member of the Distributed COM Users Group.

Windows 2003 domain servers:

Assign Manage Auditing and Security Logs permissions through group policy.

In both cases NSX needs to access AD with read permission for the security event logs; the protocols used to read this information are CIFS or WMI.

During this process NSX collects the following Microsoft event IDs:

For Windows 2008/2012 – Event ID: 4624

For Windows 2003 – Event ID: 540

NSX copies these event log entries from A/D and parses the data inside the NSX Manager appliance.
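Conceptually, the parsing step boils down to filtering the logon event ID for the server version and keeping the latest user-to-IP binding. The sketch below only illustrates that idea; it is not how NSX Manager is implemented, and the event dictionary fields (event_id, user, ip) as well as the sample addresses are hypothetical.

```python
# Logon event IDs listed above, keyed by Windows Server version.
LOGON_EVENT_ID = {"2003": 540, "2008": 4624, "2012": 4624}


def bind_users_to_ips(events, windows_version):
    """Build a {user: ip} map from logon events; the latest logon wins."""
    wanted = LOGON_EVENT_ID[windows_version]
    bindings = {}
    for event in events:
        if event["event_id"] == wanted:
            bindings[event["user"]] = event["ip"]
    return bindings


# Hypothetical, already-decoded events in chronological order.
events = [
    {"event_id": 4624, "user": "nsxadmin", "ip": "192.0.2.10"},
    {"event_id": 9999, "user": "nsxadmin", "ip": "192.0.2.99"},  # unrelated event
    {"event_id": 4624, "user": "auditor1", "ip": "192.0.2.20"},
    {"event_id": 4624, "user": "nsxadmin", "ip": "192.0.2.30"},  # later logon
]

bindings = bind_users_to_ips(events, "2012")
print(bindings)
```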

map ad to nsx manager role 11

Click Next and Finish:

map ad to nsx manager role 12

Mapping Active Directory Groups to NSX Manager Roles

Note: This step is a must for NSX RBAC to work.

Now we can map Active Directory groups to pre-built NSX Manager roles.

Go to “Manage” -> “Users” and click the green plus button:

map ad to nsx manager role 3

Here we can select whether we want to map a specific A/D user or an A/D group to an NSX role.

map ad to nsx manager role 4

In this blog I will use an A/D group; we created an A/D group called Auditor. The format to enter here is:

“group_name”@domain.name. Let’s start with the Auditor group, which has “Read Only” permission:

map ad to nsx manager role 5

Select one of the NSX roles; for the Auditor A/D group we choose Auditor:

map ad to nsx manager role 6

We can limit the scope of NSX Manager objects this group can work on; in this example there is no limit:

map ad to nsx manager role 7

In the same way, map all the other A/D groups to NSX roles:

Auditor@corp.local                   -> Auditor
Security Administrator@corp.local    -> Security Administrator
NSX Administrator@corp.local         -> NSX Administrator
Enterprise Administrator@corp.local  -> Enterprise Administrator
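To keep this mapping documented alongside the lab notes, it can be written down as data. The sketch below simply builds the group_name@domain.name strings for the corp.local lab; the helper name principal is hypothetical, and nothing here talks to NSX Manager.

```python
DOMAIN = "corp.local"

# A/D group name -> NSX role, as configured above.
GROUP_TO_ROLE = {
    "Auditor": "Auditor",
    "Security Administrator": "Security Administrator",
    "NSX Administrator": "NSX Administrator",
    "Enterprise Administrator": "Enterprise Administrator",
}


def principal(group_name, domain=DOMAIN):
    """Build the group_name@domain.name string entered when adding a group."""
    return f"{group_name}@{domain}"


assignments = {principal(group): role for group, role in GROUP_TO_ROLE.items()}
for name, role in assignments.items():
    print(f"{name} -> {role}")
```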

Let's try our first login with the user auditor1:


The login is successful, but where has the “Network & Security” tab gone?


So far we have configured everything on the NSX Manager side, but we didn’t take care of the vCenter permissions for that group. Confused?

vCenter has its own role for each group. We need to configure a role for each A/D group we configured. These settings determine what the user can do in the vCenter environment.

Configure vCenter Roles:

Let’s start by configuring the Auditor role for the Auditor A/D group. We know this group is “Read Only” in the NSX Manager, so it makes sense to give this group “Read Only” access to the rest of the vCenter environment.

Go to vCenter -> Manage -> Permissions and click the green button:

vCenter Roles 1

We need to choose a role from the Assigned Role list. If we select No-Access we will not be able to log in to vCenter, so we need to choose something between “Read-Only” and “Administrator”.

For the Auditor role, “Read Only” is the minimum.

Select “Read Only” from the Assigned Role drop-down list and click the “Add” button under “User and Group”:

vCenter Roles 2

From the Domain drop-down, select your domain name (in our lab the domain is “CORP”), choose your Active Directory group from the list (Auditor in this example) and click the “Add” button:

vCenter Roles 3

Click OK, and OK again in the next step:

vCenter Roles 4

In the same way we need to configure roles for all the other groups:

vCenter Roles 5

Now we can try to log in with the auditor1 user:


As we can see, auditor1 is in the “Read Only” role:


We can verify that auditor1 can’t change any other vCenter configuration:


Next we test the secadmin user, mapped to the “NSX Security” role. This user cannot perform any NSX infrastructure related task, such as adding a new NSX Controller node:


But secadmin can create a new firewall rule:


When logging in with the nsxadmin user, mapped to the NSX Administrator role, we can see that the user can add a new Controller node:


But the nsxadmin user cannot change or see any configured firewall rules:


What if the user is a member of two A/D groups?

The user will gain the combined permissions of both groups.

For example, if the user is a member of both the “Auditor” and “NSX Security” groups, the user will have read-only permission on all NSX infrastructure and will also gain access to all security related areas in NSX.
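The combined-permission behavior can be pictured as a set union. The sketch below uses a deliberately simplified, hypothetical permission model (two areas, read and write) based only on what the tests in this post showed; the real NSX permission matrix is more detailed.

```python
# Simplified, hypothetical model: (area, access) pairs per role.
ROLE_PERMISSIONS = {
    "Auditor": {("infrastructure", "read"), ("security", "read")},
    "Security Administrator": {("security", "read"), ("security", "write")},
    "NSX Administrator": {("infrastructure", "read"), ("infrastructure", "write")},
    "Enterprise Administrator": {
        ("infrastructure", "read"), ("infrastructure", "write"),
        ("security", "read"), ("security", "write"),
    },
}


def effective_permissions(roles):
    """Union of the permissions of every role the user holds via group membership."""
    combined = set()
    for role in roles:
        combined |= ROLE_PERMISSIONS[role]
    return combined


# A user who is a member of both the Auditor and the security group.
combined = effective_permissions(["Auditor", "Security Administrator"])
print(sorted(combined))
```

In this model the user can read everything and write only in the security area, which matches the example above.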


In this post we demonstrated the different NSX Manager roles. We configured Microsoft Active Directory as an external source for user identity.

VMware NSX Edge Scale Out with Equal-Cost Multi-Path Routing

This post was written by Roie Ben Haim and Max Ardica, with special thanks to Jerome Catrouillet, Michael Haines, Tiran Efrat and Ofir Nissim for their valuable input.

The modern data center design is changing, following a shift in the habits of consumers using mobile devices, the number of new applications that appear every day and the rate of end-user browsing which has grown exponentially. Planning a new data center requires meeting certain fundamental design guidelines. The principal goals in data center design are: Scalability, Redundancy and High-bandwidth.

In this blog we will describe the Equal Cost Multi-Path functionality (ECMP) introduced in VMware NSX release 6.1 and discuss how it addresses the requirements of scalability, redundancy and high bandwidth. ECMP has the potential to offer substantial increases in bandwidth by load-balancing traffic over multiple paths as well as providing fault tolerance for failed paths. This is a feature which is available on physical networks but we are now introducing this capability for virtual networking as well. ECMP uses a dynamic routing protocol to learn the next-hop towards a final destination and to converge in case of failures. For a great demo of how this works, you can start by watching this video, which walks you through these capabilities in VMware NSX.




Scalability and Redundancy and ECMP

To keep pace with the growing demand for bandwidth, the data center must meet scale out requirements, which provide the capability for a business or technology to accept increased volume without redesign of the overall infrastructure. The ultimate goal is avoiding the “rip and replace” of the existing physical infrastructure in order to keep up with the growing demands of the applications. Data centers running business critical applications need to achieve near 100 percent uptime. In order to achieve this goal, we need the ability to quickly recover from failures affecting the main core components. Recovery from catastrophic events needs to be transparent to end user experiences.

ECMP with VMware NSX 6.1 allows you to use up to 8 ECMP paths simultaneously. In a specific VMware NSX deployment, those scalability and resilience improvements are applied to the “on-ramp/off-ramp” routing function offered by the Edge Services Gateway (ESG) functional component, which allows communication between the logical networks and the external physical infrastructure.

ECMP Topology



External users' traffic arriving from the physical core routers can use up to 8 different paths (E1-E8) to reach the virtual servers (Web, App, DB).

In the same way, traffic returning from the virtual servers hits the Distributed Logical Router (DLR), which can choose up to 8 different paths to get to the core network.

How the Path is Determined

NSX for vSphere Edge Services Gateway device:

When a traffic flow needs to be routed, a round-robin algorithm is used to pick one of the links as the path for all traffic of this flow. Sending all packets of a flow through the same path keeps them in order. Once the next hop is selected for a particular source IP and destination IP pair, it is stored in the route cache, and all subsequent packets of that flow follow the same path.

There is a default IPv4 route cache timeout, which is 300 seconds. If an entry is inactive for this period of time, it is then eligible to be removed from route cache. Note that these settings can be tuned for your environment.
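The interaction between round-robin selection and the route cache can be sketched in a few lines of Python. This is a conceptual model only, not the ESG implementation: the class name, the next-hop labels and the sample addresses are hypothetical, and an expired entry is simply treated as if it had already been removed.

```python
import itertools


class RouteCache:
    """Pin each (src, dst) flow to a next hop chosen round-robin, with an idle timeout."""

    def __init__(self, next_hops, timeout=300.0):
        self._cycle = itertools.cycle(next_hops)
        self._timeout = timeout
        self._entries = {}  # (src, dst) -> (next_hop, last_used_time)

    def lookup(self, src, dst, now):
        key = (src, dst)
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] <= self._timeout:
            next_hop = entry[0]            # active entry: keep the same path
        else:
            next_hop = next(self._cycle)   # new or expired flow: next link in turn
        self._entries[key] = (next_hop, now)
        return next_hop


cache = RouteCache(["uplink-1", "uplink-2", "uplink-3"])
first = cache.lookup("192.0.2.10", "172.16.10.10", now=0)
other = cache.lookup("192.0.2.11", "172.16.10.10", now=10)
again = cache.lookup("192.0.2.10", "172.16.10.10", now=200)
after_idle = cache.lookup("192.0.2.10", "172.16.10.10", now=600)
print(first, other, again, after_idle)
```

The third lookup reuses the cached path, while the last one comes after more than 300 seconds of inactivity and is therefore assigned a path again.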

Distributed Logical Router (DLR):

The DLR will choose a path based on a Hashing algorithm of Source IP and Destination IP.
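Hash-based selection can be illustrated as follows. The post does not say which hash function the DLR uses, so SHA-256 is only a stand-in here, and the function name, path labels and sample addresses are hypothetical. The point is that the same source and destination pair always maps to the same path.

```python
import hashlib


def pick_path(src_ip, dst_ip, paths):
    """Choose a path from a hash of the source and destination IP addresses."""
    digest = hashlib.sha256(f"{src_ip}>{dst_ip}".encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]


paths = [f"E{n}" for n in range(1, 9)]  # up to 8 ECMP next hops
chosen = pick_path("172.16.10.10", "192.0.2.10", paths)
print(chosen)
```

Because the choice depends only on the address pair, no per-flow state is needed to keep a flow on one path.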


What happens in case of a failure on one of Edge Devices?

In order to work with ECMP the requirement is to use a dynamic routing protocol: OSPF or BGP. If we take OSPF as an example, the main factor influencing the traffic outage experience is the tuning of the OSPF timers.

OSPF neighbors exchange Hello messages, and the Hello Interval determines how often an OSPF Hello is sent.

Another OSPF timer called “Dead” Interval is used, which is how long to wait before we consider an OSPF neighbor as “down”. The OSPF Dead Interval is the main factor that influences the convergence time. Dead Interval is usually 4 times the Hello Interval but the OSPF (and BGP) timers can be set as low as 1 second (for Hello interval) and 3 seconds (for Dead interval) to speed up the traffic recovery.
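As a quick worked example of the timer arithmetic: with the usual rule of thumb the Dead Interval is four times the Hello Interval, and since a neighbor is declared down when the Dead timer expires, the Dead Interval is roughly the upper bound on failure detection time. The 10-second Hello value below is the common OSPF default on broadcast networks and is used only for comparison; the helper name is hypothetical.

```python
def dead_interval(hello_interval, multiplier=4):
    """Rule-of-thumb Dead Interval: a multiple of the Hello Interval (seconds)."""
    return hello_interval * multiplier


default_dead = dead_interval(10)  # common default Hello of 10 seconds
tuned_hello, tuned_dead = 1, 3    # aggressive values mentioned above

print(f"default: hello=10s dead={default_dead}s")
print(f"tuned:   hello={tuned_hello}s dead={tuned_dead}s")
print(f"detection is up to {default_dead - tuned_dead}s faster with tuned timers")
```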


ECMP failed Edge



In the example above, the E1 NSX Edge has a failure; the physical routers and the DLR detect E1 as dead at the expiration of the Dead timer and remove their OSPF neighborship with it. As a consequence, the DLR and the physical router remove the routing table entries that originally pointed to the specific next-hop IP address of the failed ESG.

As a result, all corresponding flows on the affected path are re-hashed through the remaining active units. It’s important to emphasize that network traffic that was forwarded across the non-affected paths remains unaffected.
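The effect of a failed Edge on existing flows can be modelled as below: only flows that were using the failed next hop are moved, and everything else stays put. This is a conceptual sketch, not the NSX data path; the function name, the CRC32-based choice of replacement and the sample addresses are all assumptions made for illustration.

```python
import zlib


def reassign_flows(flow_table, failed_edge, remaining_edges):
    """Move flows off the failed Edge; leave all other flows on their current path."""
    updated = {}
    for flow, edge in flow_table.items():
        if edge != failed_edge:
            updated[flow] = edge  # unaffected path: nothing changes
        else:
            # Deterministic stand-in for re-hashing over the surviving Edges.
            index = zlib.crc32(repr(flow).encode()) % len(remaining_edges)
            updated[flow] = remaining_edges[index]
    return updated


flows = {
    ("192.0.2.10", "172.16.10.10"): "E1",
    ("192.0.2.11", "172.16.10.10"): "E2",
    ("192.0.2.12", "172.16.10.10"): "E3",
}
after_failure = reassign_flows(flows, "E1", ["E2", "E3"])
print(after_failure)
```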


Troubleshooting and visibility

With ECMP it’s important to have introspection and visibility tools in order to troubleshoot potential points of failure. Let’s look at the following topology.



A user outside our Data Center would like to access the Web Server service inside the Data Center. The user IP address is and the web server IP address is

This user traffic will hit the physical router (R1), which has established OSPF adjacencies with E1 and E2 (the Edge devices). As a result, R1 will learn how to get to the Web server from both E1 and E2 and will have two different active paths towards it. R1 will pick one of the paths to forward the traffic to the Web server, and will advertise the user network subnet to both E1 and E2 with OSPF.

E1 and E2 are NSX for vSphere Edge devices that also establish OSPF adjacencies with the DLR. E1 and E2 will learn how to get to the Web server via OSPF control plane communication with the DLR.

From the DLR perspective, it acts as a default gateway for the Web server. This DLR will form an OSPF adjacency with E1 and E2 and have 2 different OSPF routes to reach the user network.
From the DLR we can verify OSPF adjacency with E1, E2.

We can use the command: “show ip ospf neighbor”

show ip ospf neighbor


From this output we can see that the DLR has two Edge neighbors. The next step is to verify that ECMP is actually working.

We can use the command: “show ip route”

show ip route


The output from this command shows that the DLR learned the user network via two different paths, one via E1 = and the other via E2 =

Now we want to display all the packets which were captured by an NSX for vSphere Edge interface.

In the example below and in order to display the traffic passing through interface vNic_1, and which is not OSPF protocol control packets, we need to type this command:
“debug packet display interface vNic_1 not_ip_proto_ospf”

We can see an example with a ping running from host to host

Capture traffic


If we would like to display the captured traffic to a specific ip address, the command capture would look like: “debug packet display interface vNic_1 dst_172.16.10.10”

debug packet display interface vNic_1 dst


* Note: When using the command “debug packet display interface” we need to add an underscore between the expressions after the interface name.
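Because it is easy to forget the underscores, a tiny helper can build the command string from a normal, space-separated filter expression. This is just a convenience sketch; the function name is hypothetical and the script only produces text to paste into the Edge CLI.

```python
def debug_packet_command(interface, expression=""):
    """Build a 'debug packet display interface' command with underscore-joined filters."""
    command = f"debug packet display interface {interface}"
    if expression:
        command += " " + "_".join(expression.split())
    return command


print(debug_packet_command("vNic_1", "not ip proto ospf"))
print(debug_packet_command("vNic_1", "dst 172.16.10.10"))
```

The two calls reproduce the commands used in the examples above.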

Useful CLI for Debugging ECMP

To check which ECMP path is chosen for a flow

  • debug packet display interface IFNAME

To check the ECMP configuration

  • show configuration routing-global

To check the routing table

  • show ip route

To check the forwarding table

  • show ip forwarding


Useful CLI for Dynamic Routing

  • show ip ospf neighbor
  • show ip ospf database
  • show ip ospf interface
  • show ip bgp neighbors
  • show ip bgp

ECMP Deployment Consideration

ECMP currently implies stateless behavior. This means that there is no support for stateful services such as the Firewall, Load Balancing or NAT on the NSX Edge Services Gateway.

Starting from 6.1.2, the Edge Firewall is not disabled automatically on the ESG when ECMP is enabled, so turn off the Firewall when you enable ECMP.

In the current NSX 6.1 release, the Edge Firewall and ECMP cannot be turned on at the same time on NSX edge device. Note however, that the Distributed Firewall (DFW) is unaffected by this.


About the authors:

Roie Ben Haim

Roie works as a professional services consultant at VMware, focusing on design and implementation of VMware’s software-defined data center products.  Roie has more than 12 years in data center architecture, with a focus on network and security solutions for global enterprises. An enthusiastic M.Sc. graduate, Roie holds a wide range of industry leading certifications including Cisco CCIE x2 # 22755 (Data Center, CCIE Security), Juniper Networks JNCIE – Service Provider #849, and VMware vExpert 2014, VCP-NV, VCP-DCV.

Max Ardica

Max Ardica is a senior technical product manager in VMware’s networking and security business unit (NSBU). Certified as VCDX #171, his primary task is helping to drive the evolution of the VMware NSX platform, building the VMware NSX architecture and providing validated design guidance for the software-defined data center, specifically focusing on network virtualization. Prior to joining VMware, Max worked for almost 15 years at Cisco, covering different roles, from software development to product management. Max also holds a CCIE certification (#13808).