NSX-v Troubleshooting L2 Connectivity

This post describes a methodology for troubleshooting L2 connectivity between VMs on the same logical switch (L2 segment).

Some of the steps here can and should be done via the NSX GUI, vRealize Operations Manager 6.0, and vRealize Log Insight, so treat this post as an educational walkthrough.

There are lots of CLI commands in this post :-). To view the full output of a CLI command, scroll right.

 

High-level approach to solving L2 problems:

1. Understand the problem.

2. Know your network topology.

3. Figure out whether it is a configuration issue.

4. Check whether the problem is in the physical space or the logical space.

5. Verify the NSX control plane from the ESXi hosts and the NSX Controllers.

6. Move the VM to a different ESXi host.

7. Start capturing traffic in the right spots.

 

Understand the Problem

VMs on the same logical switch (VXLAN 5001) are unable to communicate.

The ping below shows the problem:

web-sv-01a:~ # ping 172.16.10.12
PING 172.16.10.12 (172.16.10.12) 56(84) bytes of data.
^C
--- 172.16.10.12 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3023ms

 

Know your network topology:

[Figure TSHOT1: network topology diagram]

The VMs web-sv-01a and web-sv-02a reside on different compute hosts, esxcomp-01a and esxcomp-01b respectively.

web-sv-01a: IP: 172.16.10.11, MAC: 00:50:56:a6:7a:a2

web-sv-02a: IP: 172.16.10.12, MAC: 00:50:56:a6:a1:e3

 

Validate network topology

I know it sounds stupid, but let’s make sure the VMs actually reside on the right ESXi hosts and are connected to the right VXLAN.

Verify that VM “web-sv-01a” actually resides on “esxcomp-01a”:

From esxcomp-01a, run the command esxtop, then press “n” (Network):

esxcomp-01a # esxtop
   PORT-ID              USED-BY  TEAM-PNIC DNAME              PKTTX/s  MbTX/s    PKTRX/s  MbRX/s %DRPTX %DRPRX
  33554433           Management        n/a vSwitch0              0.00    0.00       0.00    0.00   0.00   0.00
  50331649           Management        n/a DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331650               vmnic0          - DvsPortset-0          8.41    0.02     437.81    3.17   0.00   0.00
  50331651     Shadow of vmnic0        n/a DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331652                 vmk0     vmnic0 DvsPortset-0          5.87    0.01       1.76    0.00   0.00   0.00
  50331653                 vmk1     vmnic0 DvsPortset-0          0.59    0.01       0.98    0.00   0.00   0.00
  50331654                 vmk2     vmnic0 DvsPortset-0          0.00    0.00       0.39    0.00   0.00   0.00
  50331655                 vmk3     vmnic0 DvsPortset-0          0.20    0.00       0.39    0.00   0.00   0.00
  50331656 35669:db-sv-01a.eth0     vmnic0 DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331657 35888:web-sv-01a.eth     vmnic0 DvsPortset-0          4.89    0.01       3.72    0.01   0.00   0.00
  50331658          vdr-vdrPort     vmnic0 DvsPortset-0          2.15    0.00       0.00    0.00   0.00   0.00

In the USED-BY column we can see “web-sv-01a.eth0” (truncated to “web-sv-01a.eth”). Another important piece of information is its “Port-ID”.

The “Port-ID” is a unique identifier for each virtual switch port. In our example web-sv-01a.eth0 has Port-ID “50331657”.

Find the vDS name:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan list
VDS ID                                           VDS Name      MTU  Segment ID     Gateway IP     Gateway MAC        Network Count  Vmknic Count
-----------------------------------------------  -----------  ----  -------------  -------------  -----------------  -------------  ------------
3b bf 0e 50 73 dc 49 d8-2e b0 df 20 91 e4 0b bd  Compute_VDS  1600  192.168.250.0  192.168.250.2  00:50:56:09:46:07              4             1

The output shows that the vDS name is “Compute_VDS”.

Verify that “web-sv-01a.eth0” is connected to VXLAN 5001:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5001
Switch Port ID  VDS Port ID  VMKNIC ID
--------------  -----------  ---------
      50331657  68                   0
      50331658  vdrPort              0

The output shows a VM connected to VXLAN 5001 on switch port ID 50331657, which is the same port ID as web-sv-01a.eth0.
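The cross-check above (port ID from esxtop against the VXLAN port list) is easy to script once both outputs are saved as text. Here is a minimal Python sketch, assuming the rows were pasted into strings; the function names are my own and not part of any NSX tooling.

```python
# Rows copied from the esxtop network view and the VXLAN port list above
# (only the first four esxtop columns are kept).
ESXTOP_ROWS = """\
  50331656 35669:db-sv-01a.eth0     vmnic0 DvsPortset-0
  50331657 35888:web-sv-01a.eth     vmnic0 DvsPortset-0
  50331658          vdr-vdrPort     vmnic0 DvsPortset-0
"""

VXLAN_PORT_ROWS = """\
      50331657  68                   0
      50331658  vdrPort              0
"""


def port_id_for_vm(esxtop_text, vm_name):
    """Return the PORT-ID of the first row whose USED-BY field mentions vm_name."""
    for line in esxtop_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and vm_name in fields[1]:
            return int(fields[0])
    return None


def vxlan_port_ids(port_list_text):
    """Return the set of switch port IDs listed for the VXLAN."""
    ids = set()
    for line in port_list_text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            ids.add(int(fields[0]))
    return ids


port_id = port_id_for_vm(ESXTOP_ROWS, "web-sv-01a")
on_vxlan = port_id in vxlan_port_ids(VXLAN_PORT_ROWS)
print(port_id, on_vxlan)
```

If `on_vxlan` comes back false, the VM's vNIC is not attached to the logical switch you think it is.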

Verification on esxcomp-01b:

esxcomp-01b # esxtop
  PORT-ID              USED-BY  TEAM-PNIC DNAME              PKTTX/s  MbTX/s    PKTRX/s  MbRX/s %DRPTX %DRPRX
  33554433           Management        n/a vSwitch0              0.00    0.00       0.00    0.00   0.00   0.00
  50331649           Management        n/a DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331650               vmnic0          - DvsPortset-0          6.54    0.01     528.31    4.06   0.00   0.00
  50331651     Shadow of vmnic0        n/a DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331652                 vmk0     vmnic0 DvsPortset-0          2.77    0.00       1.19    0.00   0.00   0.00
  50331653                 vmk1     vmnic0 DvsPortset-0          0.59    0.00       0.40    0.00   0.00   0.00
  50331654                 vmk2     vmnic0 DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331655                 vmk3     vmnic0 DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331656 35663:web-sv-02a.eth     vmnic0 DvsPortset-0          3.96    0.01       3.57    0.01   0.00   0.00
  50331657          vdr-vdrPort     vmnic0 DvsPortset-0          2.18    0.00       0.00    0.00   0.00   0.00

In the esxtop output we can see that “web-sv-02a.eth0” has Port-ID “50331656”.

Verify that “web-sv-02a.eth0” is connected to VXLAN 5001:

esxcomp-01b # esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5001
Switch Port ID  VDS Port ID  VMKNIC ID
--------------  -----------  ---------
      50331656  69                   0
      50331657  vdrPort              0

The output shows a VM connected to VXLAN 5001 on switch port ID 50331656.

At this point we have verified that the VMs are located as drawn in the topology. Now let’s start the actual troubleshooting steps.

Is the problem in the physical network?

Our first step is to find out whether the problem is in the physical space or the logical space.

[Figure TSHOT2]

The easy way to find out is to ping from the VTEP on esxcomp-01a to the VTEP on esxcomp-01b. Before pinging, let’s find the VTEP IP addresses.

esxcomp-01a # esxcfg-vmknic -l
Interface  Port Group/DVPort   IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type         
vmk0       16                  IPv4      192.168.210.51                          255.255.255.0   192.168.210.255 00:50:56:09:08:3e 1500    65535     true    STATIC       
vmk1       26                  IPv4      10.20.20.51                             255.255.255.0   10.20.20.255    00:50:56:69:80:0f 1500    65535     true    STATIC       
vmk2       35                  IPv4      10.20.30.51                             255.255.255.0   10.20.30.255    00:50:56:64:70:9f 1500    65535     true    STATIC       
vmk3       44                  IPv4      192.168.250.51                          255.255.255.0   192.168.250.255 00:50:56:66:e2:ef 1600    65535     true    STATIC

From the vmk3 row we can tell that the VTEP is vmk3 (MTU 1600) and its IP address is 192.168.250.51.

Another command to find the VTEP IP address:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=Compute_VDS
Vmknic Name  Switch Port ID  VDS Port ID  Endpoint ID  VLAN ID  IP              Netmask        IP Acquire Timeout  Multicast Group Count  Segment ID
-----------  --------------  -----------  -----------  -------  --------------  -------------  ------------------  ---------------------  -------------
vmk3               50331655  44                     0        0  192.168.250.51  255.255.255.0                   0                      0  192.168.250.0

The same command on esxcomp-01b:

esxcomp-01b # esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=Compute_VDS
Vmknic Name  Switch Port ID  VDS Port ID  Endpoint ID  VLAN ID  IP              Netmask        IP Acquire Timeout  Multicast Group Count  Segment ID
-----------  --------------  -----------  -----------  -------  --------------  -------------  ------------------  ---------------------  -------------
vmk3               50331655  46                     0        0  192.168.250.53  255.255.255.0                   0                      0  192.168.250.0

The VTEP IP for esxcomp-01b is 192.168.250.53. Now let’s add this info to our topology.

 

[Figure TSHOT3: topology diagram with VTEP IP addresses]
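If you collect the "vxlan vmknic list" output from many hosts, pulling the VTEP IP out of each row by hand gets tedious. Below is a small Python sketch that does it for the two rows shown above. It assumes the column order stays exactly as in this output, and the helper name is hypothetical.

```python
# One data row per host, copied from the "vxlan vmknic list" output above.
VMKNIC_ROWS = {
    "esxcomp-01a": "vmk3  50331655  44  0  0  192.168.250.51  255.255.255.0  0  0  192.168.250.0",
    "esxcomp-01b": "vmk3  50331655  46  0  0  192.168.250.53  255.255.255.0  0  0  192.168.250.0",
}


def vtep_from_row(row):
    """Return (vmknic name, VTEP IP, netmask) from one data row.

    Assumed column order: name, switch port ID, VDS port ID, endpoint ID,
    VLAN ID, IP, netmask, then the remaining columns.
    """
    fields = row.split()
    return fields[0], fields[5], fields[6]


vteps = {host: vtep_from_row(row) for host, row in VMKNIC_ROWS.items()}
for host, (vmk, ip, mask) in vteps.items():
    print(f"{host}: {vmk} {ip}/{mask}")
```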

Check VXLAN routing:

NSX uses a separate IP stack for VXLAN traffic, so we need to verify that the default gateway is configured correctly for VXLAN traffic.

From esxcomp-01a:

esxcomp-01a # esxcli network ip route ipv4 list -N vxlan
Network        Netmask        Gateway        Interface  Source
-------------  -------------  -------------  ---------  ------
default        0.0.0.0        192.168.250.2  vmk3       MANUAL
192.168.250.0  255.255.255.0  0.0.0.0        vmk3       MANUAL

From esxcomp-01b:

esxcomp-01b # esxcli network ip route ipv4 list -N vxlan
Network        Netmask        Gateway        Interface  Source
-------------  -------------  -------------  ---------  ------
default        0.0.0.0        192.168.250.2  vmk3       MANUAL
192.168.250.0  255.255.255.0  0.0.0.0        vmk3       MANUAL

In this lab both ESXi hosts have their VTEPs on the same L2 segment, and both VTEPs use the same default gateway.
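Whether two VTEPs share a segment can be confirmed from their IP addresses and netmask alone. This is a short sketch using Python's standard ipaddress module, with the addresses taken from the outputs above; the function name is my own.

```python
import ipaddress


def same_segment(vtep_ips, netmask):
    """Return True when every VTEP IP falls in the same IP network."""
    networks = {
        ipaddress.ip_interface(f"{ip}/{netmask}").network for ip in vtep_ips
    }
    return len(networks) == 1


vteps = ["192.168.250.51", "192.168.250.53"]
segment = ipaddress.ip_network("192.168.250.0/255.255.255.0")
gateway_on_link = ipaddress.ip_address("192.168.250.2") in segment

print(same_segment(vteps, "255.255.255.0"), gateway_on_link)
```

When the VTEPs are in different subnets, the ping test between them also depends on the VXLAN stack's default gateway, which is why the route check above matters.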

Ping from the VTEP on esxcomp-01a to the VTEP on esxcomp-01b.

The ping is sourced from the VXLAN IP stack, with a payload size of 1570 bytes and the don’t-fragment bit set (-d).

esxcomp-01a #  ping ++netstack=vxlan 192.168.250.53 -s 1570 -d
PING 192.168.250.53 (192.168.250.53): 1570 data bytes
1578 bytes from 192.168.250.53: icmp_seq=0 ttl=64 time=0.585 ms
1578 bytes from 192.168.250.53: icmp_seq=1 ttl=64 time=0.936 ms
1578 bytes from 192.168.250.53: icmp_seq=2 ttl=64 time=0.831 ms

--- 192.168.250.53 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.585/0.784/0.936 ms

The ping is successful.

If the ping fails with “-d” but works without it, the problem is MTU. Check the MTU on the physical switches.
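To see why a 1570-byte payload is a reasonable test size for a 1600-byte MTU, here is the arithmetic as a Python sketch. It assumes a 20-byte IPv4 header without options and an 8-byte ICMP header; the function names are mine.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def diagnose(ok_with_df, ok_without_df):
    """Interpret the two ping results (with -d and without -d)."""
    if ok_with_df:
        return "path carries the tested size unfragmented"
    if ok_without_df:
        return "likely MTU problem on the path"
    return "no connectivity at all, not only an MTU problem"


print(max_ping_payload(1600))          # 1572, so -s 1570 fits
print(1570 <= max_ping_payload(1500))  # False: a 1500-byte path drops it
print(diagnose(ok_with_df=False, ok_without_df=True))
```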

Because the VTEPs in this example are on the same L2 segment, we can view the ARP entries for the other VTEPs:

From esxcomp-01a:

esxcomp-01a # esxcli network ip neighbor list -N vxlan
Neighbor        Mac Address        Vmknic    Expiry  State  Type
--------------  -----------------  ------  --------  -----  -----------
192.168.250.52  00:50:56:64:f4:25  vmk3    1173 sec         Unknown
192.168.250.53  00:50:56:67:d9:91  vmk3    1171 sec         Unknown
192.168.250.2   00:50:56:09:46:07  vmk3    1187 sec         Autorefresh

It looks like the physical layer is not the issue.

 

Verify NSX control plane

During NSX host preparation, NSX Manager installs VIB agents, called the User World Agent (UWA), on the ESXi hosts.

The process responsible for communicating with the NSX Controllers is called netcpad.

The ESXi host uses its management VMkernel interface to create this secure channel over TCP/1234; the traffic is encrypted with SSL.

Part of the information netcpad sends to the NSX Controller:

VMs: MAC, IP.

VTEPs: MAC, IP.

VXLAN: the VXLAN IDs.

Routing: routes learned from the DLR Control VM (explained in the next post).

[Figure TSHOT4]

Based on this information, the Controller learns the network state and builds its directory services.

To learn how the Controller cluster works and how to fix problems in the cluster itself, see NSX Controller Cluster Troubleshooting.

For two VMs to talk to each other we need a working control plane. In this lab we have three NSX Controllers.

The verification commands need to be run from both the ESXi hosts and the Controllers.

NSX Controller IP addresses: 192.168.110.201, 192.168.110.202, 192.168.110.203

Control plane verification from the ESXi point of view:

Verify that esxcomp-01a has ESTABLISHED connections to the NSX Controllers (grep 1234 shows only TCP port 1234):

esxcomp-01a # esxcli network ip  connection list | grep 1234
tcp         0       0  192.168.210.51:54153  192.168.110.202:1234  ESTABLISHED     35185  newreno  netcpa-worker
tcp         0       0  192.168.210.51:34656  192.168.110.203:1234  ESTABLISHED     34519  newreno  netcpa-worker
tcp         0       0  192.168.210.51:41342  192.168.110.201:1234  ESTABLISHED     34519  newreno  netcpa-worker

Verify that esxcomp-01b has ESTABLISHED connections to the NSX Controllers:

esxcomp-01b # esxcli network ip  connection list | grep 1234
tcp         0       0  192.168.210.56:16580  192.168.110.202:1234  ESTABLISHED     34517  newreno  netcpa-worker
tcp         0       0  192.168.210.56:49434  192.168.110.203:1234  ESTABLISHED     34678  newreno  netcpa-worker
tcp         0       0  192.168.210.56:12358  192.168.110.201:1234  ESTABLISHED     34516  newreno  netcpa-worker

Example of a communication problem between an ESXi host and the NSX Controllers:

esxcli network ip  connection list | grep 1234
tcp         0       0  192.168.210.51:54153  192.168.110.202:1234  TIME_WAIT           0
tcp         0       0  192.168.210.51:34656  192.168.110.203:1234  FIN_WAIT_2      34519  newreno
tcp         0       0  192.168.210.51:41342  192.168.110.201:1234  TIME_WAIT           0
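Spotting a non-ESTABLISHED session by eye is fine for one host, but the same check can be scripted for many. Here is a Python sketch that parses the "connection list" rows shown above and reports which controllers lack an ESTABLISHED session. The function names are hypothetical, and the field positions are assumed to match this output.

```python
CONTROLLERS = ["192.168.110.201", "192.168.110.202", "192.168.110.203"]

# The problem example from above.
BROKEN = """\
tcp  0  0  192.168.210.51:54153  192.168.110.202:1234  TIME_WAIT  0
tcp  0  0  192.168.210.51:34656  192.168.110.203:1234  FIN_WAIT_2  34519  newreno
tcp  0  0  192.168.210.51:41342  192.168.110.201:1234  TIME_WAIT  0
"""


def controller_states(text, port=1234):
    """Map each remote controller IP to the TCP state of its session."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        # Fields: proto, recv-q, send-q, local, remote, state, ...
        if len(fields) >= 6 and fields[4].endswith(f":{port}"):
            remote_ip = fields[4].rsplit(":", 1)[0]
            states[remote_ip] = fields[5]
    return states


def unhealthy_controllers(text, controllers):
    """Return the controllers without an ESTABLISHED session, in list order."""
    states = controller_states(text)
    return [c for c in controllers if states.get(c) != "ESTABLISHED"]


print(unhealthy_controllers(BROKEN, CONTROLLERS))
```

A controller that is missing from the output entirely is also reported, since `states.get` returns `None` for it.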

If you don’t see ESTABLISHED connections, check:

1. IP connectivity from the ESXi host to all NSX Controllers.

2. If there is a firewall between the ESXi host and the NSX Controllers, TCP/1234 needs to be open.

3. Whether netcpad is running on the ESXi host:

/etc/init.d/netcpad status
netCP agent service is not running

When netcpad is running, the status output is:

esxcomp-01a # /etc/init.d/netcpad status
netCP agent service is running

If netcpad is not running, start it with:

esxcomp-01a # /etc/init.d/netcpad start
Memory reservation set for netcpa
netCP agent service starts

Verify again:

esxcomp-01a # /etc/init.d/netcpad status
netCP agent service is running

 

Verify on esxcomp-01a that the control plane is enabled and the controller connection is up for VXLAN 5001:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------
    5003  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.202 (up)            2                0                0
    5001  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.201 (up)            2                3                0
    5000  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.202 (up)            1                3                0
    5002  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.203 (up)            1                2                0

Verify on esxcomp-01b that the control plane is enabled and the controller connection is up for VXLAN 5001:

esxcomp-01b # esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------
    5001  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.201 (up)            2                3                0
    5000  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.202 (up)            1                0                0
    5002  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.203 (up)            1                2                0
    5003  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.202 (up)            1                0                0
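The two things to read from the "vxlan network list" output are the Control Plane column and the "(up)" marker in the Controller Connection column. Below is a Python sketch that extracts both per VNI. It assumes columns are separated by at least two spaces, as in the output above, and the function name is my own.

```python
import re

# Rows copied from the esxcomp-01a output above (whitespace condensed).
SAMPLE = """\
5003  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.202 (up)  2  0  0
5001  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.201 (up)  2  3  0
"""


def parse_vxlan_networks(text):
    """Return {vni: details} for every data row in the output."""
    result = {}
    for line in text.splitlines():
        # Single spaces occur inside columns, so split on runs of 2+ spaces.
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) != 7 or not cols[0].isdigit():
            continue
        match = re.fullmatch(r"(\S+) \((\w+)\)", cols[3])
        if not match:
            continue
        result[int(cols[0])] = {
            "control_plane_enabled": cols[2].startswith("Enabled"),
            "controller": match.group(1),
            "connection": match.group(2),
            "mac_entries": int(cols[5]),
        }
    return result


networks = parse_vxlan_networks(SAMPLE)
print(networks[5001])
```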

Check that esxcomp-01a has learned the ARP entries of remote VMs on VXLAN 5001:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5001
IP            MAC                Flags
------------  -----------------  --------
172.16.10.12  00:50:56:a6:a1:e3  00001101

From this output we can see that esxcomp-01a has learned the ARP info of web-sv-02a.

Check that esxcomp-01b has learned the ARP entries of remote VMs on VXLAN 5001:

esxcomp-01b # esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5001
IP            MAC                Flags
------------  -----------------  --------
172.16.10.11  00:50:56:a6:7a:a2  00010001

From this output we can see that esxcomp-01b has learned the ARP info of web-sv-01a.

What we can tell at this point:

esxcomp-01a:

Knows web-sv-01a is a VM running on VXLAN 5001, with IP 172.16.10.11 and MAC address 00:50:56:a6:7a:a2.

Its communication with the Controller cluster is up for VXLAN 5001.

esxcomp-01b:

Knows web-sv-02a is a VM running on VXLAN 5001, with IP 172.16.10.12 and MAC address 00:50:56:a6:a1:e3.

Its communication with the Controller cluster is up for VXLAN 5001.

So why can’t web-sv-01a talk to web-sv-02a?

The answer to this question is another question: what does the NSX Controller know?

Control plane verification from the NSX Controller point of view:

We have three active Controllers, and one of them is elected to manage VXLAN 5001. Remember slicing?

To find out which one manages VXLAN 5001, SSH to one of the NSX Controllers, for example 192.168.110.202:

nsx-controller # show control-cluster logical-switches vni 5001
VNI      Controller      BUM-Replication ARP-Proxy Connections VTEPs
5001     192.168.110.201 Enabled         Enabled   0           0

The output says that 192.168.110.201 manages VXLAN 5001, so we run the next command from 192.168.110.201:

nsx-controller # show control-cluster logical-switches vni 5001
VNI      Controller      BUM-Replication ARP-Proxy Connections VTEPs
5001     192.168.110.201 Enabled         Enabled   6           4

From this output we learn that VXLAN 5001 has 4 VTEPs connected to it and a total of 6 active connections.
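The two outputs above differ only in the Connections and VTEPs counters: the controller that does not own the VNI reports zeros and points you at the owner. Here is a Python sketch of that decision, using the two rows from above; the function names are hypothetical.

```python
def parse_vni_row(row):
    """Split one data row of 'show control-cluster logical-switches vni'."""
    vni, owner, _bum, _arp_proxy, connections, vteps = row.split()
    return {
        "vni": int(vni),
        "owner": owner,
        "connections": int(connections),
        "vteps": int(vteps),
    }


def where_to_look(row, queried_controller):
    """Say whether the queried controller owns the VNI or names the owner."""
    info = parse_vni_row(row)
    if info["owner"] != queried_controller:
        return f"re-run on {info['owner']}"
    return f"{info['vteps']} VTEPs, {info['connections']} connections"


row_from_202 = "5001     192.168.110.201 Enabled         Enabled   0           0"
row_from_201 = "5001     192.168.110.201 Enabled         Enabled   6           4"

print(where_to_look(row_from_202, "192.168.110.202"))
print(where_to_look(row_from_201, "192.168.110.201"))
```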

At this point I would like to point you to an excellent blogger with lots of information about what happens under the hood in NSX.

His name is Dmitri Kalintsev. His post: NSX for vSphere: Controller “Connections” and “VTEPs”

From Dmitri’s post:

“ESXi host joins a VNI in two cases:

  1. When a VM running on that host connects to VNI’s dvPg and its vNIC transitions into “Link Up” state; and
  2. When DLR kernel module on that host needs to route traffic to a VM on that VNI that’s running on a different host.”

We are not routing traffic between VMs, so the DLR is not part of the game here.

Find the VTEP IP addresses connected to VXLAN 5001:

nsx-controller # show control-cluster logical-switches vtep-table 5001
VNI      IP              Segment         MAC               Connection-ID
5001     192.168.250.53  192.168.250.0   00:50:56:67:d9:91 5
5001     192.168.250.52  192.168.250.0   00:50:56:64:f4:25 3
5001     192.168.250.51  192.168.250.0   00:50:56:66:e2:ef 4
5001     192.168.150.51  192.168.150.0   00:50:56:60:bc:e9 6

From this output we can see that both VTEPs, esxcomp-01a (192.168.250.51) and esxcomp-01b (192.168.250.53), are seen by the NSX Controller on VXLAN 5001.

The MAC addresses in this output are the VTEPs’ MACs.

Check that the VMs’ MAC addresses have been learned by the NSX Controller:

nsx-controller # show control-cluster logical-switches mac-table 5001
VNI      MAC               VTEP-IP         Connection-ID
5001     00:50:56:a6:7a:a2 192.168.250.51  4
5001     00:50:56:a6:a1:e3 192.168.250.53  5
5001     00:50:56:8e:45:33 192.168.150.51  6

The first entry shows the MAC of web-sv-01a, the second the MAC of web-sv-02a.

Check that the VMs’ ARP entries have been learned by the NSX Controller:

 

nsx-controller # show control-cluster logical-switches arp-table 5001
VNI      IP              MAC               Connection-ID
5001     172.16.10.11    00:50:56:a6:7a:a2 4
5001     172.16.10.12    00:50:56:a6:a1:e3 5
5001     172.16.10.10    00:50:56:8e:45:33 6

The first two entries show the exact IP/MAC of web-sv-01a and web-sv-02a.
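The three controller tables chain together: the ARP table maps a VM IP to its MAC, the MAC table maps that MAC to a VTEP IP, and the VTEP table confirms the VTEP has joined the VNI. Here is a Python sketch of that lookup with the table contents copied from the outputs above; the dictionary layout and function name are my own.

```python
# Contents of the controller tables for VNI 5001, copied from above.
VTEP_TABLE = {  # VTEP IP -> VTEP MAC
    "192.168.250.53": "00:50:56:67:d9:91",
    "192.168.250.52": "00:50:56:64:f4:25",
    "192.168.250.51": "00:50:56:66:e2:ef",
    "192.168.150.51": "00:50:56:60:bc:e9",
}
MAC_TABLE = {  # VM MAC -> VTEP IP
    "00:50:56:a6:7a:a2": "192.168.250.51",
    "00:50:56:a6:a1:e3": "192.168.250.53",
    "00:50:56:8e:45:33": "192.168.150.51",
}
ARP_TABLE = {  # VM IP -> VM MAC
    "172.16.10.11": "00:50:56:a6:7a:a2",
    "172.16.10.12": "00:50:56:a6:a1:e3",
    "172.16.10.10": "00:50:56:8e:45:33",
}


def locate(vm_ip):
    """Resolve a VM IP to (VM MAC, VTEP IP), or None if any table lacks it."""
    mac = ARP_TABLE.get(vm_ip)
    vtep = MAC_TABLE.get(mac)
    if mac is None or vtep is None or vtep not in VTEP_TABLE:
        return None
    return mac, vtep


for ip in ("172.16.10.11", "172.16.10.12"):
    print(ip, locate(ip))
```

A `None` result for either VM would point at the control plane; here both resolve, which matches the conclusion of this section.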

To understand how the Controller learns this info, read my post NSX-V IP Discovery.

Sometimes restarting the netcpad process fixes problems between an ESXi host and the NSX Controllers:

esxcomp-01a # /etc/init.d/netcpad restart
watchdog-netcpa: Terminating watchdog process with PID 4273913
Memory reservation released for netcpa
netCP agent service is stopped
Memory reservation set for netcpa
netCP agent service starts

Summary of controller verification:

The NSX Controller knows where the VMs are located, along with their IP and MAC addresses. It seems the control plane works just fine.

 

Move VM to different ESXi host

In NSX-v each ESXi host has its own UWA service daemon as part of the management and control plane. Sometimes, when the UWA is not working as expected, VMs on that ESXi host will have connectivity issues.

A fast way to check this is to vMotion the non-working VMs from one ESXi host to another. If the VMs start to work, we need to focus on the control plane of the non-working ESXi host.

In this scenario, even after I vMotioned my VM to a different ESXi host, the problem didn’t go away.

 

Capture in the right spots:

The pktcap-uw command can capture traffic at many points in an NSX environment.

Before capturing all over the place, let’s think about where the problem is likely to be.

When a VM connects to a logical switch, a packet traverses several security services, each represented by a different slot ID.

[Figure TSHOT5]

SLOT 0: implements the vDS access list.

SLOT 1: the Switch Security module (swsec) captures DHCP Ack and ARP messages; this info is then forwarded to the NSX Controller.

SLOT 2: NSX Distributed Firewall.

We need to check whether VM traffic successfully passes the NSX Distributed Firewall, that is, slot 2.

The capture command needs the SLOT 2 filter name for web-sv-01a.

From esxcomp-01a:

esxcomp-01a # summarize-dvfilter
~~~snip~~~~
world 35888 vmm0:web-sv-01a vcUuid:'50 26 c7 cd b6 f3 f4 bc-e5 33 3d 4b 25 5c 62 77'
 port 50331657 web-sv-01a.eth0
  vNic slot 2
   name: nic-35888-eth0-vmware-sfw.2
   agentName: vmware-sfw
   state: IOChain Attached
   vmState: Detached
   failurePolicy: failClosed
   slowPathID: none
   filter source: Dynamic Filter Creation
  vNic slot 1
   name: nic-35888-eth0-dvfilter-generic-vmware-swsec.1
   agentName: dvfilter-generic-vmware-swsec
   state: IOChain Attached
   vmState: Detached
   failurePolicy: failClosed
   slowPathID: none
   filter source: Alternate Opaque Channel

In the output we can see the VM name web-sv-01a in the world line, the filter applied at slot 2 (vNic slot 2), and below it the filter name: nic-35888-eth0-vmware-sfw.2

The pktcap-uw -A option lists the supported capture points:
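On a host with many VMs the summarize-dvfilter output is long, so it helps to pull out the filter name for one vNIC and slot. Below is a Python sketch that walks the indented output shown above. The parsing relies on the "port", "vNic slot" and "name:" line prefixes seen in this output, and the function name is hypothetical.

```python
# Trimmed copy of the summarize-dvfilter output above.
SAMPLE = """\
world 35888 vmm0:web-sv-01a vcUuid:'50 26 c7 cd b6 f3 f4 bc-e5 33 3d 4b 25 5c 62 77'
 port 50331657 web-sv-01a.eth0
  vNic slot 2
   name: nic-35888-eth0-vmware-sfw.2
   agentName: vmware-sfw
  vNic slot 1
   name: nic-35888-eth0-dvfilter-generic-vmware-swsec.1
   agentName: dvfilter-generic-vmware-swsec
"""


def filter_name(text, vnic, slot):
    """Return the dvfilter name attached to the given vNIC at the given slot."""
    current_vnic = None
    current_slot = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("port "):
            parts = line.split()
            current_vnic = parts[2] if len(parts) >= 3 else None
            current_slot = None
        elif line.startswith("vNic slot "):
            current_slot = int(line.split()[2])
        elif (line.startswith("name: ")
              and current_vnic == vnic and current_slot == slot):
            return line[len("name: "):]
    return None


print(filter_name(SAMPLE, "web-sv-01a.eth0", 2))
```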

esxcomp-01a # pktcap-uw -A
Supported capture points:
        1: Dynamic -- The dynamic inserted runtime capture point.
        2: UplinkRcv -- The function that receives packets from uplink dev
        3: UplinkSnd -- Function to Tx packets on uplink
        4: Vmxnet3Tx -- Function in vnic backend to Tx packets from guest
        5: Vmxnet3Rx -- Function in vnic backend to Rx packets to guest
        6: PortInput -- Port_Input function of any given port
        7: IOChain -- The virtual switch port iochain capture point.
        8: EtherswitchDispath -- Function that receives packets for switch
        9: EtherswitchOutput -- Function that sends out packets, from switch
        10: PortOutput -- Port_Output function of any given port
        11: TcpipDispatch -- Tcpip Dispatch function
        12: PreDVFilter -- The DVFIlter capture point
        13: PostDVFilter -- The DVFilter capture point
        14: Drop -- Dropped Packets capture point
        15: VdrRxLeaf -- The Leaf Rx IOChain for VDR
        16: VdrTxLeaf -- The Leaf Tx IOChain for VDR
        17: VdrRxTerminal -- Terminal Rx IOChain for VDR
        18: VdrTxTerminal -- Terminal Tx IOChain for VDR
        19: PktFree -- Packets freeing point

pktcap-uw can sniff traffic at many interesting points. With PreDVFilter and PostDVFilter (capture points 12 and 13) we can capture traffic before or after the filtering action.

Capture after SLOT 2 filter:

pktcap-uw --capture PostDVFilter --dvfilter nic-35888-eth0-vmware-sfw.2 --proto=0x1 -o web-sv-01a_after.pcap
The session capture point is PostDVFilter
The name of the dvfilter is nic-35888-eth0-vmware-sfw.2
The session filter IP protocol is 0x1
The output file is web-sv-01a_after.pcap
No server port specifed, select 784 as the port
Local CID 2
Listen on port 784
Accept...Vsock connection from port 1049 cid 2
Destroying session 25

Dumped 0 packet to file web-sv-01a_after.pcap, dropped 0 packets.

PostDVFilter = capture after the named filter.

--proto=0x1 = capture only ICMP packets.

--dvfilter = the filter name as shown by the summarize-dvfilter command.

-o = the file to write the captured traffic to.

From the last line of the output we can tell that ICMP packets are not passing this filter, because 0 packets were dumped.

We found our smoking gun 🙂

Now capture before the SLOT 2 filter:

pktcap-uw --capture PreDVFilter --dvfilter nic-35888-eth0-vmware-sfw.2 --proto=0x1 -o web-sv-01a_before.pcap
The session capture point is PreDVFilter
The name of the dvfilter is nic-35888-eth0-vmware-sfw.2
The session filter IP protocol is 0x1
The output file is web-sv-01a_before.pcap
No server port specifed, select 5782 as the port
Local CID 2
Listen on port 5782
Accept...Vsock connection from port 1050 cid 2
Dump: 6, broken : 0, drop: 0, file err: 0
Destroying session 26

Dumped 6 packet to file web-sv-01a_before.pcap, dropped 0 packets.

Now the output shows that 6 packets were dumped. We can open the captured file web-sv-01a_before.pcap:

esxcomp-01a # tcpdump-uw -r web-sv-01a_before.pcap
reading from file web-sv-01a_before.pcap, link-type EN10MB (Ethernet)
20:15:31.389158 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18628, length 64
20:15:32.397225 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18629, length 64
20:15:33.405253 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18630, length 64
20:15:34.413356 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18631, length 64
20:15:35.421284 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18632, length 64
20:15:36.429219 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18633, length 64

Voilà, the NSX DFW is blocking the traffic.
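The before/after comparison is the whole trick here: packets seen at PreDVFilter but missing at PostDVFilter were dropped by that filter. Here is a Python sketch that builds the two pktcap-uw command lines, using only the options shown above, and interprets the dumped-packet counts. The function names are my own.

```python
def pktcap_command(capture_point, dvfilter, outfile, proto="0x1"):
    """Build the pktcap-uw command line used in this post."""
    return (f"pktcap-uw --capture {capture_point} --dvfilter {dvfilter} "
            f"--proto={proto} -o {outfile}")


def verdict(pre_count, post_count):
    """Interpret packet counts captured before and after a dvfilter."""
    if pre_count == 0:
        return "traffic never reached the filter"
    if post_count == 0:
        return "filter dropped everything"
    if post_count < pre_count:
        return "filter dropped some packets"
    return "filter passed the traffic"


dfw_filter = "nic-35888-eth0-vmware-sfw.2"
print(pktcap_command("PreDVFilter", dfw_filter, "web-sv-01a_before.pcap"))
print(pktcap_command("PostDVFilter", dfw_filter, "web-sv-01a_after.pcap"))
print(verdict(pre_count=6, post_count=0))
```

The counts are only comparable if both captures cover the same traffic, so keep the ping running while you take them.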

And now from the NSX GUI:

[Figure TSHOT6: NSX GUI screenshot]

Looking back, this article intentionally skipped step 3, “Configuration issue”.

If we had checked the configuration settings, we would have noticed this problem immediately.

 

 

Summary of all CLI Commands for this post:

ESXi Commands:

esxtop
esxcfg-vmknic -l
esxcli network vswitch dvs vmware vxlan list
esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5001
esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=Compute_VDS
esxcli network ip route ipv4 list -N vxlan
esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5001
esxcli network ip connection list | grep 1234
ping ++netstack=vxlan 192.168.250.53 -s 1570 -d
/etc/init.d/netcpad (status|start|restart)
pktcap-uw --capture PostDVFilter --dvfilter nic-35888-eth0-vmware-sfw.2 --proto=0x1 -o web-sv-01a_after.pcap

 

NSX Controller Commands:

show control-cluster logical-switches vni 5001
show control-cluster logical-switches vtep-table 5001
show control-cluster logical-switches mac-table 5001
show control-cluster logical-switches arp-table 5001