NSX Cross-VC Extensibility kit

Overview:

The NSX Cross-VC Extensibility kit was created to enhance NSX implementations running in Cross-vCenter mode.

An introduction and deep dive into NSX Cross-VC can be found in the amazing work of Humair Ahmed at this link.

The package covers three main use cases around Cross-VC NSX deployment:

  • Recovery: Automating the recovery of NSX components for disaster avoidance and unplanned disaster events.
  • Security: Syncing local security policies, groups and tags from the primary NSX Manager to the secondary.
  • Routing: Automating local egress/ingress traffic for disaster avoidance and unplanned failover.

Each use case is covered by separate workflows that can run independently or combined.

 

Note: This extensibility kit is released under "community support" mode and is provided as-is with no implied support or warranty.

This kit includes the "NSX Cross-VC Extensibility.package" file.

This package has been tested and validated with the following software versions:

  • NSX 6.2.1
  • vSphere 6

(However, it is expected to work with later versions of these software components.)

Prerequisites

For the recovery workflow to succeed, we must create two REST host objects (for the Primary and Secondary NSX Managers) in our vRO.

For each REST host we need to make sure that the "Connection timeout" is increased from its default value of 30 seconds to 300 seconds, and that the "Operation timeout" is raised from 60 to 600 seconds.

If we do not change these settings, the NSX Controller redeployment process will fail.
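To illustrate why the longer timeouts matter, here is a minimal Python sketch of the same controller-deployment call made outside vRO. The manager address, credentials and object IDs are hypothetical, and the controllerSpec payload is trimmed; check the NSX API guide for the full set of required fields.

    import requests

    NSX_MGR = "https://nsxmgr-a.example.local"  # hypothetical primary manager
    AUTH = ("admin", "password")

    # Trimmed controller spec; the IDs below are placeholders for your IP pool,
    # resource pool, datastore and management port group.
    controller_spec = """
    <controllerSpec>
      <ipPoolId>ipaddresspool-1</ipPoolId>
      <resourcePoolId>domain-c7</resourcePoolId>
      <datastoreId>datastore-23</datastoreId>
      <networkId>dvportgroup-100</networkId>
      <password>Secret123!Secret123!</password>
    </controllerSpec>
    """

    # Controller deployment is a long-running call; the tuple maps to
    # (connection timeout, read timeout) and mirrors the 300/600 second
    # values we set on the vRO REST host.
    resp = requests.post(
        NSX_MGR + "/api/2.0/vdn/controller",
        data=controller_spec,
        headers={"Content-Type": "application/xml"},
        auth=AUTH,
        verify=False,
        timeout=(300, 600),
    )
    resp.raise_for_status()
    print("Deployment job:", resp.text)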

The following screenshots contain the changed values:

HTTP ReST Host

vRO Configuration

We will need to initialize a few vRO attributes before running any vRO workflows.

In the Configuration tab, click on "NSX Cross-VC Extensibility" and configure the following attributes:

 

General Attributes (relevant to all use cases):

The following attributes are required by all use cases covered in this package.

We will need to define the user name and password of the admin user of the NSX Managers.

vRO needs two RESTHost attributes: the Primary and Secondary NSX Managers.

 

General Attributes

Recovering NSX Components use case

The workflows in this use case automate the recovery of NSX components for disaster avoidance and unplanned disaster events. The workflow takes care of changing the NSX Manager roles, deploying NSX Controllers, redeploying the UDLR Control VM if needed, and updating the controller state.

In the initial state, before running this workflow, we assume that we already have one NSX Manager in the primary role with running NSX Controllers, and another NSX Manager acting as secondary at a different site.

The following figure shows an example of the initial state of the environment before running the recovery process.

Disaster Avoidance Initial status:

Recovery NSX Components

Disaster Avoidance NSX Cross-VC failover:

In this scenario the user wants to switch the primary and secondary NSX Manager roles.

After this workflow finishes, the right-hand side will have the primary NSX Manager with three NSX Controllers and the UDLR Control VM deployed.

In a scenario where the UDLR Control VM is already deployed at the secondary site (as shown in the figure below), we do not need to redeploy it.

Cross-VC After

As part of the recovery process we will need to deploy a new NSX controller in the secondary site.

The following attributes need to have a value set by the administrator before running the workflow:

Recovery attributes

The workflow we need to run to achieve this goal is "Disaster Avoidance NSX Cross-VC – Main". The following figure shows the workflow building blocks:

picture7

Unplanned Recovery:

In this scenario the main site has completely failed and we need to recover the NSX components at the secondary site.

The workflow that covers this scenario is "Unplanned Recovery NSX Cross-VC – Main".

We will need to update the same attributes we've shown before in order to successfully deploy the NSX Controllers.

Unplanned Recovery NSX Cross-VC – Main

After running this workflow, the primary NSX Manager and NSX Controllers will run at the secondary site, as shown in the figure below:

Unplanned Recovery

Security use case:

In an NSX deployment using the Cross-VC feature, universal security groups are automatically synced between the primary and secondary NSX Managers.

In a DR scenario where we want to work with local security groups whose classification criterion is an NSX security tag, we need to sync the groups between the NSX Managers manually.

The main goal of this workflow is to automatically sync local security objects such as NSX security tags, security policies and security groups from the primary NSX Manager to the secondary.

This workflow only works for a DR scenario where all of the active workloads are located at the protected site and there are no active workloads at the recovery site. In other words, we cannot create NSX firewall rules between workloads at the protected site (whose security objects exist only in the protected site's NSX Manager) and workloads at the recovery site (whose security objects exist only in the recovery site's NSX Manager).

The input parameters for this workflow are the source vSphere folder where the source VMs are located and the destination vSphere folder where the target VMs are located. Normally in an SRM deployment we already have these folders as part of the resource mapping process.

sync local security objects

The workflow is built from two major workflows: Sync Security Tags and Sync Service Composer.

picture12

Sync Security Tags:

This workflow first syncs all security tag names from the primary NSX Manager to the secondary NSX Manager.

If a security tag already exists on the secondary manager, the workflow will skip the sync for that security tag.

After completing this step, the workflow attaches the security tags to the destination machines.
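The sketch below shows the same tag-sync logic in Python against the NSX security tag API (/api/2.0/services/securitytags/tag). Manager addresses and credentials are examples, and the tag-creation body is trimmed to the essential fields; consult the API guide for your version.

    import requests
    import xml.etree.ElementTree as ET

    PRIMARY = "https://nsxmgr-a.example.local"    # hypothetical
    SECONDARY = "https://nsxmgr-b.example.local"  # hypothetical
    AUTH = ("admin", "password")
    TAG_URL = "/api/2.0/services/securitytags/tag"

    def tag_names(manager):
        # Return the set of security tag names defined on a manager.
        resp = requests.get(manager + TAG_URL, auth=AUTH, verify=False)
        resp.raise_for_status()
        return {t.findtext("name")
                for t in ET.fromstring(resp.text).iter("securityTag")}

    missing = tag_names(PRIMARY) - tag_names(SECONDARY)

    # Create only the missing tags on the secondary, skipping existing
    # ones exactly as the workflow does.
    for name in sorted(missing):
        body = ("<securityTag>"
                "<objectTypeName>SecurityTag</objectTypeName>"
                "<name>" + name + "</name>"
                "</securityTag>")
        resp = requests.post(SECONDARY + TAG_URL, data=body,
                             headers={"Content-Type": "application/xml"},
                             auth=AUTH, verify=False)
        resp.raise_for_status()
        print("Created security tag:", name)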

Sync Service Composer:

This workflow syncs Service Composer objects from the primary NSX Manager to the secondary.

service composer

The sync exports the current Service Composer security policies and security groups from the primary NSX Manager and then imports them into the secondary NSX Manager.

Security groups that we sync must use security tags in their dynamic criteria only.

The workflow will sync security groups and security policies that have a specific name prefix. That prefix is determined by the attribute named "SecurityDRPrefixName".

In this example the workflow syncs security groups and security policies whose names start with "DR_" or "dr_".

DR

Note: Before the import workflow runs, we delete all security groups and security policies on the secondary NSX Manager. This delete step is necessary for the import workflow to succeed.
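As a sketch of the selection step, the snippet below lists the security groups on the primary manager and keeps only those matching the SecurityDRPrefixName value, case-insensitively; connection details are examples.

    import requests
    import xml.etree.ElementTree as ET

    PRIMARY = "https://nsxmgr-a.example.local"  # hypothetical
    AUTH = ("admin", "password")
    PREFIX = "DR_"  # value of the SecurityDRPrefixName attribute

    # List all security groups on the primary manager's global scope.
    resp = requests.get(
        PRIMARY + "/api/2.0/services/securitygroup/scope/globalroot-0",
        auth=AUTH, verify=False)
    resp.raise_for_status()

    to_sync = [sg.findtext("name")
               for sg in ET.fromstring(resp.text).iter("securitygroup")
               # Case-insensitive match, so both "DR_" and "dr_" qualify.
               if (sg.findtext("name") or "").lower().startswith(PREFIX.lower())]
    print("Security groups selected for sync:", to_sync)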

 

Routing use case

This workflow automates the north-south routing egress/ingress part of the recovery process.

The solution is based on the NSX Locale ID feature and complements the recovery process in VMware SRM.

In the initial state we have configured the Locale ID on the UDLR at both the protected and recovery clusters to be the same as the protected site's. The L2 segment spans from the protected site to the recovery site; as a consequence, we need to control the routes advertised in order to ensure single-site ingress/egress.

Routing

We can control the ingress traffic by using allow/deny redistribution prefix lists on the UDLR Control VM.

At the protected site's Control VM we advertise routes from the protected site by creating an allow prefix list.

At the recovery site's Control VM we do not advertise routes from the protected site; there we create a deny prefix list.

In this state only the Control VM at the protected site has the Locale ID value; at the recovery site we've cleared this value.
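A minimal sketch of the prefix-list step, assuming the UDLR's BGP configuration is edited through the NSX Edge routing API; the edge ID and prefix-list name come from the attributes described below, and the XML element names should be verified against the API guide for your NSX version.

    import requests
    import xml.etree.ElementTree as ET

    NSX_MGR = "https://nsxmgr-b.example.local"  # hypothetical recovery-site manager
    AUTH = ("admin", "password")
    UDLR_ID = "edge-1"          # value of the Routing_UDLRID attribute
    PREFIX_LIST = "Prefix-Web"  # one entry of Routing_prefixListName

    url = NSX_MGR + "/api/4.0/edges/" + UDLR_ID + "/routing/config/bgp"

    # Fetch the Control VM's current BGP configuration document.
    resp = requests.get(url, auth=AUTH, verify=False)
    resp.raise_for_status()
    bgp = ET.fromstring(resp.text)

    # Flip the redistribution rule for our prefix list to "deny" so this
    # Control VM stops advertising the protected site's routes.
    for rule in bgp.iter("rule"):
        if rule.findtext("prefixName") == PREFIX_LIST:
            rule.find("action").text = "deny"

    resp = requests.put(url, data=ET.tostring(bgp),
                        headers={"Content-Type": "application/xml"},
                        auth=AUTH, verify=False)
    resp.raise_for_status()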

Routing attributes:

Routing attributes

Routing_LocalID:

This attribute defines the NSX Locale ID used for routing. The ID can be any text in UUID format, for example XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX, where each X is a base-16 digit (0-F).

Routing_prefixListName:

Defines the array of routing prefixes that need to be redistributed. The names must match the names configured on the UDLR.

Routing_SiteA_Clusters:

Array of the NSX-prepared clusters in Site A.

Routing_SiteB_Clusters:

Array of the NSX-prepared clusters in Site B.

Routing_UDLRID:

This attribute defines the UDLR Control VM ID.

 

Routing Workflows:

“Disaster avoidance Egress via Site A – Main”

“Disaster avoidance Egress via Site B – Main”

“Unplanned Local Egress via Site A – Main”

“Unplanned Local Egress via Site B – Main”

The difference between planned and unplanned events:

Disaster avoidance event:

This workflow was created for disaster avoidance. In this scenario both sites are up and running, but we would like to move the north/south (ingress/egress) traffic to the other datacenter.

So instead of all traffic flowing in/out via Site A, we switch it to go via Site B.

Unplanned event:

In this scenario we face a complete site failure: we have lost our primary site and we would like all north/south traffic to go via the recovery site.

 

Disaster Avoidance Local Egress via Site A/B – Main use case

 

Unplanned Local Egress via Site A/B – Main use case

 

The following demos show the NSX Cross-VC Extensibility kit in action.

The first demo shows the recovery of the NSX components in a disaster avoidance scenario, with north/south routing switched between the datacenters:

The second demo shows the sync of the NSX local security objects between the NSX Managers:

 

Special thanks to Daniel Bakshi, who helped me a lot in reviewing this blog post.


NSX Dual Active/Active Datacenters BCDR

Overview

The modern data center design requires better redundancy and demands the ability to provide Business Continuity (BC) and Disaster Recovery (DR) in case of a catastrophic failure in our datacenter. Planning a new data center with BCDR requires meeting certain fundamental design guidelines.

In this blog post I will describe an Active/Active datacenter built with the full VMware SDDC product suite.

NSX runs in Cross-vCenter mode, an ability introduced in VMware NSX release 6.2.x. In this blog post we will focus on network and security.

An introduction and overview blog post can be found in this link:

http://blogs.vmware.com/consulting/2015/11/how-nsx-simplifies-and-enables-true-disaster-recovery-with-site-recovery-manager.html

The goals that we are trying to achieve in this post are:

  1. Having the ability to deploy workloads with vRA on both of the datacenters.
  2. Providing Business Continuity in case of a partial or a full site failure.
  3. Having the ability to perform planned or unplanned migration of workloads from one datacenter to another.

To demonstrate the functionality of this design I've created a demo 'vPOD' in VMware's internal cloud with the following products in each datacenter:

  • vCenter 6.0 with ESXi host 6.0
  • NSX 6.2.1
  • vRA 6.2.3
  • vSphere Replication 6.1
  • SRM 6.1
  • Cloud Client 3.4.1

In this blog post I will not cover the recovery of the vRA/vRO components, but this could be achieved with a separate SRM instance for the management infrastructure.

Environment overview

I'm adding a short video to introduce the environment.

NSX Manager

The NSX manager in Site A will have the IP address of 192.168.110.15 and will be configured as primary.

The NSX Manager in Site B will be configured with the IP 192.168.210.15 and will be set as secondary.

Each NSX Manager pairs with its own vCenter and learns its local inventory. Any configuration change related to the cross-site deployment runs at the primary NSX Manager and is replicated automatically to the remote site.

 

Universal Logical Switch (ULS)

Creating logical switches (L2) between sites with VXLAN is not new to NSX; however, starting from version 6.2.x we've introduced the ability to stretch L2 between NSX Managers paired to different vCenters. This new logical switch is known as a 'Universal Logical Switch' or 'ULS'. Any new ULS we create on the primary NSX Manager will be synced to the secondary.

I’ve created the following ULS in my Demo vPOD:

Universal Logical Switch (ULS)

Universal Distributed Logical Router (UDLR)

The concept of a Distributed Logical Router is still the same as it was before NSX 6.2.x. The new functionality added in this release allows us to configure a Universal Distributed Logical Router (UDLR). When we deploy a UDLR, it shows up in all NSX Managers that share the Universal Transport Zone.

The following UDLR was created:

Universal Distributed Logical Router (UDLR)

Universal Security Policy with Distributed Firewall (UDFW)

With version 6.2.x we've introduced the universal security group and the universal IP-Set.

Any firewall rule configured in the universal section must use IP-Sets, or security groups that contain IP-Sets.

When we configure or change a universal policy, a sync process automatically runs from the primary to the secondary NSX Manager.

The recommended way to work with an IP-Set is to add it to a universal security group.

The following universal security policy is an example that allows communication to a 3-tier application. The security policy is built from universal security groups; each group contains an IP-Set with the relevant IP addresses for its tier.

Universal Security Policy with Distributed Firewall (UDFW)
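Because universal rules are built from IP-Sets, creating the IP-Set on the universal scope of the primary manager is the first step. A minimal Python sketch, with a hypothetical manager address and example names and addresses:

    import requests

    NSX_PRIMARY = "https://nsxmgr-a.example.local"  # hypothetical
    AUTH = ("admin", "password")

    # A universal IP-Set for the Web tier, created on the universal scope
    # of the primary manager and synced automatically to the secondary.
    ipset = ("<ipset>"
             "<objectTypeName>IPSet</objectTypeName>"
             "<name>UIP-Web-Tier</name>"
             "<value>172.16.10.11,172.16.10.12</value>"
             "</ipset>")

    resp = requests.post(NSX_PRIMARY + "/api/2.0/services/ipset/universalroot-0",
                         data=ipset,
                         headers={"Content-Type": "application/xml"},
                         auth=AUTH, verify=False)
    resp.raise_for_status()
    print("Created universal IP-Set:", resp.text)  # the new object ID

The returned object ID can then be added to a universal security group, per the recommendation above.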

vRA

On the automation side we create two unique machine blueprints (MBPs) per site. The MBPs are based on a classic CentOS image that allows us to perform some connectivity tests.

The MBP named “Center-Site_A” will be deployed by vRA to Site A into the green ULS named: ULS_Green_Web-A.

The IP address pool configured for this ULS is 172.16.10.0/24.

The MBP named “Center-Site_B” will be deployed by vRA to Site B into the blue ULS named: ULS_Blue_Web-B.

The IP address pool configured for this ULS is 172.17.10.0/24.

vRA Catalog

Cloud Client:

To quote from VMware Official documentation:

“Typically, a vSphere hosted VM managed by vRA belongs to a reservation, which belongs to a compute resource (cluster), which in turn belongs to a vSphere Endpoint. The VMs reservation in vRA needs to be accurate in order for vRA to know which vSphere proxy agent to utilize to manage that VM in the underlying vSphere infrastructure. This is all well and good and causes few (if any) problems in a single site setup, as the VM will not normally move from the vSphere endpoint it is originally located on.

With a multi-site deployment utilizing Site Recovery Manager all this changes as part of the site to site fail over process involves moving VMs from one vCenter to another. This has the effect in vRA of moving the VM to a different endpoint, but the reservation becomes stale. As a result it becomes no longer possible to perform day 2 operation on the VMs until the reservation is updated.”

When we fail over VMs from Site A to Site B, Cloud Client runs the following actions behind the scenes to solve this challenge.

Process Flow for Planned Failover:

Process Flow for Planned Failover

The Conceptual Routing Design with Active/Active Datacenter

The main key point of this design is to run Active/Active for workloads in both datacenters.

The workloads reside in both Site A and Site B. In the modern datacenter the entry point is protected with a perimeter firewall.

In our design each site has its own independently running perimeter firewall: FW_A located in Site A and FW_B located in Site B.
Site A (shown in green) runs its own ESGs (Edge Services Gateways), Universal DLR (UDLR) and Universal Logical Switch (ULS).

Site B (shown in blue) has different ESGs, a Universal DLR (UDLR) and a Universal Logical Switch (ULS).

The main reason for the different ESGs, UDLR and ULS per site is to force a single ingress/egress point for workload traffic per site.

Without this deterministic ingress/egress traffic flow we may face asymmetric routing between the two sites. That means ingress traffic would enter via Site A through FW_A and egress via Site B through FW_B; this asymmetric traffic would be dropped by FW_B.

Note: The ESGs in this blog run in ECMP mode; as a consequence we turned off the firewall service on the ESGs.

The Green network will always be advertised via FW_A. For example, the control VM (IP 192.168.110.10) shown in the figure below needs to access the Green Web VM connected to ULS_Web_Green_A. The traffic from the client is routed via the Core router to FW_A, from there to one of the ESGs working in ECMP mode, then to the Green UDLR and finally to the Green Web VM itself.

Now assume the same client would like to access the Blue Web VM connected to ULS_Web_Blue_B. This traffic is routed via the Core router to FW_B, from there to one of the Blue ESGs working in ECMP mode, to the Blue UDLR and finally to the Blue VM itself.

Routing Design with Active/Active Datacenter

What is the issue with this design?

What will happen if we face a complete failure of one of our Edge clusters, or of FW_A?

For our scenario I've combined failures of the Green Edge cluster and FW_A in the image below.

In that case we will lose all our N-S traffic to all of the ULS behind this Green Edge cluster.

As a result, all clients outside the SDDC will immediately lose connectivity to all of the Green ULS.

Please note: forwarding traffic to the Blue ULS will continue to work regardless of the failure in Site A.

 

PIC7

If we had a stretched vSphere Edge cluster between Site A and Site B, we would be able to leverage vSphere HA to restart the failed Green ESGs at the remote Blue site (this is not the case here; in our design each site has its own local cluster and storage). But even with vSphere HA, the restart process can take a few minutes. Another way to recover from this failure is to manually deploy Green ESGs in Site B and connect them to Site B's FW_B; the recovery time of this option could also be a few minutes. Neither option is suitable for a modern datacenter design.

In the next paragraph I will introduce a new way to design the ESGs in an Active/Active datacenter architecture.

This design converges much faster and recovers more efficiently from such an event in Site A (or Site B).

Active/Active Datacenter with mirrored ESGs

In this design architecture we deploy mirrored Green ESGs in Site B, and mirrored Blue ESGs in Site A. Under normal datacenter operation the mirrored ESGs are up and running but do not forward traffic. Site A green ULS traffic from external clients will always enter via the Site A ESGs (E1-Green-A, E2-Green-A) for all Site A prefixes and leave through the same point.

Adding the mirrored ESGs adds some complexity to the single ingress/egress design, but improves the convergence time after any failure.

How does ingress traffic flow work in this design?

Now we will explain how the ingress traffic flow works in this architecture with mirrored ESGs. To simplify the explanation, we will focus only on the green flow in both datacenters and remove the blue components from the diagrams; the same explanation applies to the Blue Site B network as well.

The Site A Green UDLR Control VM runs eBGP with all Green ESGs (E1-Green-A to E4-Green-B). The UDLR redistributes all connected interfaces as Site A prefixes via eBGP. Note: "Site A prefix" represents any green segment that is part of the green ULS.

The Green ESGs (E1-Green-A to E4-Green-B) send out Site A's prefixes via BGP to both physical firewalls: FW_A located in Site A and FW_B located in Site B.

FW_B in Site B adds BGP AS-path prepending for Site A prefixes.

From the Core router's point of view, we have two different paths to reach Site A prefixes: one via FW_A (Site A) and the second via FW_B (Site B). Under normal operation this traffic flows only through Site A, because Site B prepends the AS path for Site A's prefixes.

PIC9

Egress Traffic

Egress traffic is handled by the UDLR Control VM with different BGP weight values.

The Site A ESGs E1-Green-A and E2-Green-A have mirrored ESGs, E3-Green-B and E4-Green-B, located at Site B. The mirrored ESGs provide availability. Under normal operation the UDLR Control VM will always prefer to route traffic via the higher BGP weight values of E1-Green-A and E2-Green-A. E3-Green-B and E4-Green-B will not forward any traffic and will wait for E1/E2 to fail.
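The weight scheme can be sketched as the UDLR Control VM's eBGP neighbour list. The fragment below follows the NSX-v Edge routing API element names (PUT /api/4.0/edges/{edge-id}/routing/config/bgp); the addresses, AS number and weight values are examples only.

    # Local (Site A) ESGs get a higher weight than their Site B mirrors,
    # so egress prefers E1/E2 while E3/E4 stand by.
    NEIGHBOURS = [
        ("E1-Green-A", "192.168.5.1", 60),  # preferred
        ("E2-Green-A", "192.168.5.2", 60),  # preferred
        ("E3-Green-B", "192.168.5.3", 30),  # standby mirror
        ("E4-Green-B", "192.168.5.4", 30),  # standby mirror
    ]

    def neighbour_xml(name, ip, weight):
        # 1s/3s keepalive/hold-down timers speed up convergence,
        # matching the values used later in this post.
        return ("<bgpNeighbour><!-- " + name + " -->"
                "<ipAddress>" + ip + "</ipAddress>"
                "<remoteAS>65001</remoteAS>"
                "<weight>" + str(weight) + "</weight>"
                "<keepAliveTimer>1</keepAliveTimer>"
                "<holdDownTimer>3</holdDownTimer>"
                "</bgpNeighbour>")

    # This fragment would be embedded in the full <bgp> document and PUT
    # back to the UDLR's routing config on the NSX Manager.
    body = "".join(neighbour_xml(*n) for n in NEIGHBOURS)
    print(body)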

In the figure below we can see a web workload running on Site A's ULS_Green_A initiating traffic to the Core. This egress traffic passes through the DLR kernel module, through the E1-Green-A ESG, and is then forwarded to FW_A in Site A.

PIC10

There are other options for ingress/egress within NSX 6.2:

A great new feature called 'Locale ID'. Hany Michael wrote a blog post covering this option.

In Hany's blog there is no firewall like in my design, so please pay attention to a few minor differences.

http://www.networkskyx.com/2016/01/06/introducing-the-vmware-nsx-vlab-2-0/

Anthony Burke wrote a blog post about how to use Locale ID with a physical firewall:

https://networkinferno.net/ingress-optimisation-with-nsx-for-vsphere

Routing updates

Below we demonstrate the routing updates for Site A, but the same mechanism works for Site B. The Core router connected to FW_A in Site A peers with FW_A via eBGP.

The Core sends out a 0/0 default route.

FW_A performs eBGP peering with both E1-Green-A and E2-Green-A. FW_A forwards the 0/0 default route to the Green ESGs and receives the Site A green prefixes from them. The Green ESGs E1-Green-A and E2-Green-A peer via eBGP with the UDLR Control VM.

The UDLR and the ESGs work in ECMP mode; as a result the UDLR gets 0/0 from both ESGs. The UDLR redistributes its connected interfaces (LIFs) to both Green ESGs.

The golden rule of iBGP is that each iBGP router must peer with all other iBGP neighbors unless we use route reflectors or confederations (currently not supported on NSX). If iBGP were used, this restriction would force us to peer between all ESGs and the UDLR Control VMs, resulting in exponential operational complexity. As a result, it is a better decision to go with eBGP between the ESGs and the UDLR Control VM.

To reduce the eBGP convergence time after a failure of the active UDLR Control VM, we configure floating static routes on all of the green side pointing to the UDLR forwarding address for the internal LIFs.

Routing filters are applied on all ESGs to prevent unwanted prefix advertisements and to keep the ESGs from becoming transit gateways.

PIC11

Failure of One Green ESG in Site A

The Green ESGs E1-Green-A and E2-Green-A work in ECMP mode. From the UDLR and FW_A point of view, both ESGs work in Active/Active mode.

As long as we have at least one active Green ESG in Site A, the Green UDLR and the Core router will always prefer to work with the Site A Green ESGs.

Let's assume we have an active traffic flow from the Green Web VM in Site A to an external client behind the Core router, and this traffic initially passes through E1-Green-A. In the event of a failure of the E1-Green-A ESG, the UDLR reroutes the traffic via E2-Green-A, because this ESG has a better weight than the Green ESGs in Site B (E3-Green-B and E4-Green-B).

FW_A still advertises a better AS path to the 'ULS_Web_Green_A' prefixes than FW_B (remember, FW_B always prepends the AS path for Site A prefixes).

We use low BGP timer settings (hello=1 sec, hold down=3 sec) to improve BGP routing convergence.

 

PIC12

Complete Edge cluster failure in Site A

In this scenario we face a failure of the entire Edge cluster in Site A (Green ESGs and Blue ESGs); this failure might also include FW_A.

The Core router will not receive any BGP updates from Site A, so the Core will prefer to go to FW_B in order to reach any Site A prefix.

From the UDLR's point of view there aren't any working Green ESGs in Site A, so the UDLR will work with the remaining Green ESGs in Site B (E3-Green-B, E4-Green-B).

Traffic initiated from the external client will be rerouted via the mirrored Green ESGs (E3-Green-B and E4-Green-B) to the green ULS in Site B. The reroute happens very quickly thanks to the BGP convergence timer settings (hello=1 sec, hold down=3 sec).

This solution is much faster than the other options mentioned before.

The same recovery mechanism exists for a failure in the Site B datacenter.

PIC13

Note: The Green UDLR control VM was deployed to the payload cluster and isn’t affected by this failure.

 

Complete Site A failure:

In this catastrophic scenario all components in Site A have failed, including the management infrastructure (vCenter, NSX Manager, Controllers, ESGs and the UDLR Control VM). Green workloads face an outage until they are recovered in Site B; the Blue workloads continue to work without any interference.

The recovery procedure for this event covers both the infrastructure management/control plane components and the workloads themselves.

Recovering the management/control plane:

  • Log in to the secondary NSX Manager and promote it to primary by assigning it the primary role.
  • Deploy a new Universal Controller Cluster and synchronize all objects.
  • The Universal Controller Cluster configuration is pushed to the ESXi hosts managed by the (formerly) secondary NSX Manager.
  • Redeploy the UDLR Control VM.

The recovery procedure for the workloads is to run the "Recovery Plan" from the SRM located in Site B.

PIC14

 

Summary:

In this blog post we demonstrated the great power of NSX to create an Active/Active datacenter with the ability to recover very quickly from many failure scenarios.

  • We showed how NSX simplifies the Disaster Recovery process.
  • NSX and SRM integration is the reasonable approach to DR where we can't use a stretched vSphere cluster.
  • NSX works in Cross-vCenter mode. Dual vCenters and NSX Managers improve our availability: even in the event of a complete site failure we were able to continue working immediately at our management layer (the secondary NSX Manager and vCenter are up and running).
  • In this design, half of our environment (the Blue segments) wasn't affected by a complete site failure. SRM recovered our failed Green workloads without the need to change our Layer 2/Layer 3 network topology.
  • We did not use any specific hardware to achieve our BCDR, and we were 100% decoupled from the physical layer.
  • With SRM and vRO we were able to protect any deployed VM from day 0.

 

I would like to thank:

Daniel Bakshi, who helped me a lot in reviewing this blog post.

Also thanks to Boris Kovalev and Tal Moran, who helped with the vRA/vRO demo vPOD.

 

 

 


NSX Service Composer: Methodology Concept

Background

Recently, in one of my NSX projects, I was asked by the customer to develop a flexible yet simple-to-use security methodology for working with NSX Service Composer.

The focus was to build the right constructs of security groups and security policies based on the following requirements:

  • The customer owns different environment types: Dev, Pre-Prod and Prod. Each environment requires a different security policy.
  • The customer would like to avoid creating specific blocking rules between the environments; such deny rules cause operational complexity. The security policy should be based on specific allow rules, while all other traffic is blocked by the last cleanup rule.
  • Minimizing the human error of connecting a workload to the wrong security group, which may give it unwanted access; for example, connecting a Prod workload to a Dev security group.
  • The customer would like to protect 3-tier applications (Web, App and DB); these applications run on 3 vSphere clusters (Dev, Pre-Prod and Prod).

 

To achieve the customer's requirements, we will build a security concept based on the NSX Service Composer.

We will demonstrate the implementation of security policies and security groups to protect a 3-tier application, based on the assumption that the application runs on the Pre-Prod cluster; the same concept applies to any cluster.

 

Security Level Concept:

We will use the concept of "security levels" to differentiate between firewall rules granularly.

Each level has a different firewall access policy, starting from zero (no access) up to the highest level (application access).

Level-1 Basic Security Groups (SG)

Level 1 (L1) SGs are used to create the building blocks for the firewall rules; we do not attach any security policy directly to Level 1 security groups.

The following security groups are created at Level 1:

Cluster SG

A Cluster security group represents a vSphere cluster.

In our example: Dev, Pre-Prod and Prod. (Some customers may have only two clusters, e.g. Dev and Prod, or a similar scenario.)

For each vSphere cluster we will have a dedicated security group. Any deployed VM is automatically and dynamically included in the relevant cluster security group.

For example: any VM from the Pre-Prod cluster will be included in the "SG-L1-CL-Pre-Prod" security group.

Picture1

By creating this dynamic criterion, we have eliminated the need for manual human action.

We can leverage this ability further and create security policy levels on top of this one.

By doing that we reduce the human error factor of connecting virtual machines to the wrong security groups and, as a result, giving them dangerous unwanted security access.

With a Level-1 SG, a machine is a member only of the cluster security group representing its vSphere cluster environment, and it does not get any dFW rules at this level.

Environment Security Group

This security group represents the environment that the machine belongs to.

For example: we might have a different Prod environment for IT, R&D and Sales.

This is very useful when we want to say that a machine is "Prod" and running in the "R&D" environment rather than "Sales".

Env-L1 could represent different departments or projects in the company, depending on the way you build your infrastructure.

For example, we will create a security group called "SG-L1-EN-R&D" to represent any machine owned by R&D.

The membership criterion in this example is a security tag called "L1-ST-ENV-R&D".

Picture2

Application Security Group

The Application security group indicates the application installed on the machine (for example: Web, App or DB).

Virtual machines are assigned to this group by an NSX security tag.

An example is a security group named "SG-L1-Web" whose match criterion is a security tag called "L1-ST-APP-Web".

Picture3

The full list of Level 1 security tags is shown in the image below:

Picture4

Level 1 Security Groups Building Blocks Concept:

Level 1 Security Groups Building Blocks Concept

Level 2 Infrastructure Security Group

Infrastructure rules are used by machines to get common system services like Active Directory, DNS, antivirus, and the different agents and services that manage the environment.

Combining both the L1-Cluster and L1-Env security groups (with a logical "AND") forms the "Infrastructure" security group at Level 2, which allows the virtual machines to get an infrastructure security policy based on the VM role.

For example, we will create a Level 2 security group called "SG-L2-INF-R&D"; this SG represents virtual machines that belong to the "SG-L1-CL-Pre-Prod" SG AND to the "SG-L1-EN-R&D" environment SG.

The match criteria are security groups; we call these nested security groups.

Picture6

The result of adding Level 2 security on top of Level 1 security is illustrated in the following diagram:

Level 2 security

 

Level 3 Application Security Group

Level-3 security groups are used for the application access level. Security groups in this level combine a Level-2 infrastructure SG AND a Level-1 application SG.

For example, we will create the security group "SG-L3-Pre-Pro-WEB" with dynamic membership criteria matching the L1 security group "SG-L1-WEB" AND the security group "SG-L2-INF-R&D":

Picture8

The relation between the different security groups is illustrated in the next diagram:

Level 3

For example, take a web VM from the web tier. This VM was deployed to the Pre-Prod cluster; as a result it automatically belongs to "SG-L1-CL-Pre-Prod".

The VM got the NSX security tag called "L1-ST-EN-Pre-Prod", and as a result it is a member of the security group called "SG-L1-EN-Pre-Prod".

The VM was attached with the NSX security tag "L1-ST-APP-Web" and is now a member of the security group called "SG-L1-APP-WEB".

Because of its membership in both security groups "SG-L1-CL-Pre-Prod" AND "SG-L1-EN-Pre-Prod", this VM is automatically a member of the security group called "SG-L2-EN-Pre-Prod".

As a result of the VM being a member of "SG-L2-EN-Pre-Prod" and "SG-L1-APP-WEB", it is automatically a member of "SG-L3-Pre-Prod-APP-WEB".

Please note we're demonstrating here just the WEB tier, but the same concept applies to the APP and DB tiers.
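To make the nesting explicit, here is a toy Python model of the three levels for the example above; it is purely illustrative and does not talk to NSX.

    # A VM's L1 memberships come from its cluster and its security tags;
    # L2 and L3 groups are pure AND-combinations of lower-level groups.
    def memberships(cluster, tags):
        groups = {"SG-L1-CL-" + cluster}
        if "L1-ST-EN-Pre-Prod" in tags:
            groups.add("SG-L1-EN-Pre-Prod")
        if "L1-ST-APP-Web" in tags:
            groups.add("SG-L1-APP-WEB")
        # Level 2: cluster AND environment.
        if {"SG-L1-CL-Pre-Prod", "SG-L1-EN-Pre-Prod"} <= groups:
            groups.add("SG-L2-EN-Pre-Prod")
        # Level 3: L2 infrastructure AND L1 application.
        if {"SG-L2-EN-Pre-Prod", "SG-L1-APP-WEB"} <= groups:
            groups.add("SG-L3-Pre-Prod-APP-WEB")
        return groups

    # The web VM from the example above ends up in all five groups:
    print(memberships("Pre-Prod", {"L1-ST-EN-Pre-Prod", "L1-ST-APP-Web"}))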

Service Composer Security Policy

Security policy for L2 infrastructure workloads in Service Composer includes generic firewall rules to enable workloads with infrastructure system connectivity like DNS, AD, antivirus, etc.

For example, a Level 2 security policy named "L2-SP-INF-R&D" contains 4 example firewall rules:

Picture10

Then we'll apply the "L2-SP-INF-R&D" security policy to the "SG-L2-INF-R&D" security group.

Service Composer Application Security Policy

Security policy for application workloads in Service Composer includes the firewall policy for the application. For example, the web tier security policy is called "L3-SP-R&D-WEB" and contains one firewall rule:

Picture11

Then we will apply the "L3-SP-R&D-WEB" security policy to the "SG-L3-R&D-WEB" security group.

An example of a security policy that allows the Web tier to talk to the App tier over the Tomcat service:

Picture12

Then we'll apply the "L3-SP-R&D-APP" security policy to the "SG-L3-R&D-APP" security group.

An example of a security policy that allows the App tier to talk to the DB tier over MySQL:

Picture13

Then we will apply the "L3-SP-R&D-DB" security policy to the "SG-L3-R&D-DB" security group.

Starting from NSX version 6.2.x we can enable the "Apply To" feature to automatically enforce a security policy on the security group object instead of the default Distributed Firewall scope (which applies the policy everywhere).

This is a great feature that helps us avoid "spamming" objects with unrelated dFW rules, so our system is more efficient.

To enable this feature we need to use the following steps. In Service Composer, click on "Edit Policy Firewall Settings":

Picture14

Then check the "Policy's Security Groups" checkbox instead of "Distributed Firewall".

Picture15

We can view the full Service Composer firewall result in the "Firewall" tab:

Picture16

To demonstrate the effective security policy combined with the different security levels, let's look at 'Monitor -> Service Composer' at a VM object level.

Here is a screenshot of the Web-02a VM, part of the WEB tier:

Picture17

The effective security policy for web-02a can be verified under Monitor -> Service Composer tab in the web client:

Picture23

The effective security policy for App-02a can be seen in the Monitor -> Service Composer tab:

Picture24

The effective security policy for DB-02a can be seen in the Monitor -> Service Composer tab:

Picture25

To recap, here is the complete security group list:

Picture26

The complete Security Policy list:

Picture27

I would like to thank Daniel Bakshi, who reviewed this blog post.

 

Reference blog posts by my colleagues:

Sean Howard wrote a great blog post on a Service Composer concept:

http://nsxperts.com/?p=65

Anthony Burke also covered the Service Composer subject:

https://networkinferno.net/service-composer-security-groups-and-security-tags

And Romain Decker wrote about Service Composer on the official VMware blog:

https://blogs.vmware.com/consulting/2015/01/automating-security-policy-enforcement-nsx-service-composer.html


NSX Distributed Firewall Deep Dive

The following topics will be covered in this NSX DFW deep dive:

NSX Distributed Firewall Overview:

The NSX DFW is a distributed firewall spread over the ESXi hosts, enforcing policy as close as possible to the source of the VMs' traffic (at each VM). The DFW runs as a kernel service inside the ESXi host.

With the NSX DFW we can enforce a stateful firewall service for VMs, with the enforcement point at the VM's virtual NIC (vNIC). Every packet that leaves the VM (before VTEP encapsulation) or enters the VM (after VTEP de-encapsulation) can be inspected with a firewall policy.

 

The DFW runs inside the ESXi host as a kernel-space module, resulting in impressive throughput.

What makes the DFW an amazing feature is that as we add more ESXi hosts to a vSphere cluster, we increase the DFW throughput capacity.

DFW rules can be based on Layer 2 up to Layer 4, and with 3rd-party vendor integration NSX can implement security features up to and including L7.

  • L2 rules are based on MAC addresses or L2 protocols like ARP, RARP, LLDP, etc.
  • L3 rules are based on source/destination IP addresses, and L4 rules use a TCP or UDP service port.

The policy is created at a centralized point, the vCenter Server, using the vCenter Web Client. The objects used are taken from the vCenter inventory.

How the NSX Distributed Firewall works:

This section is taken from the amazing NSX Design Guide:

The DFW instance on an ESXi host (one instance per VM vNIC) contains two separate tables:

Rule table: used to store all policy rules.

Connection tracker table: caches flow entries for rules with a permit action.

Note: a specific flow is identified by the 5-tuple: source IP address / destination IP address / protocol / L4 source port / L4 destination port. Note that by default the DFW does not perform a lookup on the L4 source port, but it can be configured to do so by defining a specific policy rule.

Before exploring the use cases for these two tables, let's first understand how DFW rules are enforced:

DFW rules are enforced in top-to-bottom order. Each packet is checked against the top rule in the rule table before moving down to the subsequent rules in the table. The first rule in the table that matches the traffic parameters is enforced. Because of this behavior, when writing DFW rules it is always recommended to put the most granular policies at the top of the rule table. This is the best way to ensure they will be enforced before any other rule.

The DFW default policy rule (the one at the bottom of the rule table) is a "catch-all" rule: a packet not matching any rule above the default rule will be enforced by the default rule. After the host preparation operation, the DFW default rule is set to the 'allow' action. The main reason is that VMware does not want to break any VM-to-VM communication during staging or migration phases. However, it is a best practice to change the default rule to the 'block' action and enforce access control through a positive control model (only traffic defined in the firewall policy is allowed onto the network).

Let’s now have a look at policy rule lookup and packet flow:

An IP packet (first packet, pkt1) that matches rule number 2 is sent by the VM. The order of operations is the following: a lookup is performed in the connection tracker table to check if an entry for the flow already exists.

As Flow 3 is not present in the connection tracker table (i.e. a miss result), a lookup is performed in the rule table to identify which rule is applicable to Flow 3. The first rule that matches the flow will be enforced.

Rule 2 matches Flow 3, and its action is set to 'Allow'.

Because the action is set to 'Allow' for Flow 3, a new entry is created inside the connection tracker table. The packet is then transmitted properly out of the DFW.

 

 

 

DFW policy rule lookup and packet flow – subsequent packets.

Subsequent packets are processed in this order:

A lookup is performed in the connection tracker table to check if an entry for the flow already exists.

An entry for Flow 3 exists in the connection tracker table, so the packet is transmitted properly out of the DFW.
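A simplified Python model of this two-table lookup, for illustration only: flows are keyed by the 5-tuple, the connection tracker is consulted first, and the rule table is only walked top-to-bottom on a miss.

    RULES = [  # (rule id, match predicate, action) in priority order
        (1, lambda f: f[1] == "10.0.0.5" and f[4] == 80, "allow"),
        (2, lambda f: f[4] == 443, "allow"),
        (9999, lambda f: True, "block"),  # default catch-all rule
    ]
    conntrack = set()

    def process_packet(flow):
        # flow = (src_ip, dst_ip, protocol, l4_src_port, l4_dst_port)
        if flow in conntrack:      # hit: the rule table is skipped entirely
            return "allow"
        for rule_id, match, action in RULES:
            if match(flow):
                if action == "allow":  # permitted flows are cached
                    conntrack.add(flow)
                return action

    flow3 = ("172.16.10.11", "10.0.0.8", "tcp", 1025, 443)
    print(process_packet(flow3))  # first packet: rule table walk, rule 2 allows
    print(process_packet(flow3))  # subsequent packet: connection tracker hit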

One important aspect to emphasize is that the DFW fully supports vMotion (automatic vMotion with DRS or manual vMotion). The rule table and the connection tracker table always follow the VM during a vMotion operation. The positive result is that there is no traffic disruption during workload moves, and connections initiated before the vMotion remain intact after the vMotion completes. The DFW brings VM movement freedom while ensuring continuous network traffic protection.

Note: this functionality does not depend on the Controllers or the NSX Manager being up and available.

NSX DFW brings a paradigm shift that was not possible before: security services are no longer dependent on the network topology. With DFW, security is completely decoupled from logical network topology.

In legacy environments, to provide security services to a server or set of servers, traffic from/to these servers must be redirected to a firewall using VLAN stitching method or L3 routing operations: traffic must go through this dedicated firewall in order to protect network traffic.

With NSX DFW, this is no longer needed as the firewall function is brought directly to the VM. Any traffic sent or received by this VM is systematically processed by the DFW. As a result, traffic protection between VMs (workload to workload) can be enforced if VMs are located on same Logical Switch (or VDS VLAN-backed port-group) or on different Logical switches.

 

NSX DFW architecture:

The vCenter, the NSX Manager and the ESXi hosts function as the three main components of this architecture.

DFW Architecture

NSX Manager: The NSX Manager provides the single point of configuration and the REST API entry point for NSX in a vSphere environment. The consumption of NSX can be driven directly via the NSX Manager UI; in a vSphere environment it is available via the vSphere Web Client itself. Typically, end users tie the network virtualization into their cloud management platform for deploying applications.

vCenter: VMware vCenter Server provides a centralized platform for managing your VMware vSphere environments so you can automate and deliver a virtual infrastructure with confidence.

ESXi host: VMware ESXi is the hypervisor running the virtual machines' guest OS. The DFW-related modules are:

  1. The vShield-Stateful-Firewall service daemon, running in user space.
  2. vSIP, running in kernel space.

vShield-Stateful-Firewall: a service daemon that runs constantly on the ESXi host and performs multiple tasks:

  1. Interacts with the NSX Manager to retrieve DFW policy rules.
  2. Gathers DFW statistics information and sends it to the NSX Manager.
  3. Sends audit log information to the NSX Manager.
  4. Receives configuration from the NSX Manager to create (or delete) DLR Control VMs and to create (or delete) ESGs.
  5. Performs SSL-related tasks from the NSX Manager as part of the host preparation process.

 

Message Bus Client: The NSX Manager communicates with the ESXi host using a secure protocol called AMQP.

"Advanced Message Queuing Protocol (AMQP) is an open standard application layer protocol for message-oriented middleware. The defining features of AMQP are message orientation, queuing, routing (including point-to-point and publish-and-subscribe), reliability and security."

Source: http://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol

RabbitMQ is the NSX AMQP implementation.

The vShield-Stateful-Firewall daemon acts as a RabbitMQ client on the ESXi host. It is a user-space service daemon and uses a TCP/5671 connection to the RabbitMQ server in the NSX Manager. The message bus is used by the NSX Manager to send various pieces of information to the ESXi hosts:

policy rules for the DFW module, Controller node IP addresses, the private key and host certificate used to authenticate the communication between the host and the Controller, and requests to create/delete DLR instances.

vSIP: VMware Internetworking Service Insertion Platform. This is the core kernel-space module of the distributed firewall. vSIP receives firewall rules from the NSX Manager (through vShield-Stateful-Firewall) and pushes them down to each VM's VMware-sfw.
Note: the VMware Internetworking Service-Insertion Platform is also a framework that provides the ability to dynamically introduce 3rd-party and VMware's own virtual as well as physical security and networking services into the VMware virtual network.

VPXA: the vCenter agent, installed on the ESXi host when vCenter communicates with the ESXi host for the first time. Through the VPXA, vCenter manages the ESXi host for vSphere-related tasks. Although it is not a direct part of the DFW architecture, the VPXA is used to report the VM IP addresses learned via VMware Tools.

 

IOChains: VMware has reserved IOChains that handle packet processing at the kernel level.

Slot 0: DVFilter (Distributed Virtual Filter):

The DVFilter is the VMkernel layer between the protected vNIC and its associated Distributed Virtual Switch (DVS) port, and is instantiated when a virtual machine with a protected virtual NIC gets created. It monitors the incoming and outgoing traffic on the protected virtual NIC and performs stateless filtering.

Slot 1: sw-sec (Switch Security): the sw-sec module learns the VMs' IP and MAC addresses. sw-sec is a critical component that captures DHCP ACK and ARP broadcast messages and forwards this information as unicast to the NSX Controller to implement the ARP suppression feature. sw-sec is also the layer where NSX SpoofGuard is implemented.

Slot 2: VMware-sfw: this is the place where DFW firewall rules are stored and enforced. VMware-sfw contains the rule table and the connection table.

NSX policy push process:

We create a security policy with the vCenter GUI; this configuration is then stored inside the NSX Manager. When we push the policy, the NSX Manager sends the firewall rules in protobuf format to all vSphere clusters.

Protocol Buffers are a method of serializing structured data. As such, they are useful in developing programs to communicate with each other over a wire or for storing data. The method involves an interface description language that describes the structure of some data and a program that generates from that description source code in various programming languages for generating or parsing a stream of bytes that represents the structured data.

Source: http://en.wikipedia.org/wiki/Protocol_Buffers

All ESXi hosts that are part of the vCenter cluster receive this policy via the vShield-Stateful-Firewall daemon over the message bus. At this point vShield-Stateful-Firewall needs to parse these rules and convert them from protobuf messages into the VMkernel vSIP IOCTL format; vSIP then applies the rules to every VM's Slot 2 VMware-sfw.

Note: this process flow describes the case where the "Applied To" field in the security policy is set to "Distributed Firewall", i.e. the rule is applied everywhere.

A security administrator can create firewall rules built from vCenter objects like:

cluster, datacenter, VDS port-group, logical switch, IPSet, resource pool, vApp, VM, vNIC and security group. However, the NSX firewall enforcement point, the VMware-sfw, can only understand IP addresses or MAC addresses.

The figure below shows a firewall rule that allows ICMP from source "Compute Cluster A" to destination "Compute Cluster B":

Policy push process

The NSX Manager needs to figure out which object IDs are represented by "Compute Cluster A" and "Compute Cluster B", and then resolve the IP addresses corresponding to those VMs.

The NSX firewall rules inside the ESXi host are created in vSIP IOCTL format and then applied on the VMware-sfw.

The NSX Manager relies on the vCenter internal database to get the object-ID/IP address mapping; we can view this data with the vCenter MOB (Managed Object Browser) using this URL: https://vCenter_IP/mob/

The NSX Manager keeps this info inside its internal database.

The vCenter Server represents any object with a unique ID. For example, "Compute Cluster A" in fact equals domain-c25 and "Compute Cluster B" equals domain-c26. Here are screenshots from the vCenter MOB:

MOB1
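The same object-ID/IP mapping can be pulled programmatically. A sketch using pyVmomi, with a hypothetical vCenter hostname and credentials:

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.local",
                      user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    content = si.RetrieveContent()

    # Walk every VM in the inventory; _moId is the MOB object ID (e.g. vm-36)
    # and the IP address is the one reported by VMware Tools.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        print(vm._moId, vm.name, vm.guest.ipAddress)
    Disconnect(si)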

NSX DFW and VMware Tools:

NSX (up to the current version, 6.1.3) relies on VMware Tools to learn the IP address of the guest OS. The IP address learned from VMware Tools is stored in the vCenter database and reported to the NSX Manager. We can tell whether VMware Tools reports the IP address in the VM Summary screen:

VMtools1

The vm-id can be retrieved using the vCenter MOB at the following path:
https://<vCenter server>/mob, then select content -> rootFolder -> childEntity -> vmFolder.

MOB2

To view a VM's IP address, click on the vm-id from the list above, for example vm-36 (web-sv-01a), then:

     GuestInfo -> Net

In vCenter MOB “web-sv-01a” has Object ID: vm-36 with IP address “172.16.10.11”.

MOB3

If VMware Tools is stopped or removed, vCenter removes the IP address entry immediately. An update notification is sent to the NSX Manager, causing the firewall module to send list updates in protobuf format to all the vShield-Stateful-Firewall processes. If we configure firewall rules using vCenter objects (not IP addresses), as shown in the screenshot below, the traffic will match the last firewall rule (usually called the catch-all rule).

MOB4

If this rule is configured to block (as in this example), the VM will be blocked from the network; but if this rule is set to permit, the VM gets full network access, which allows a user to bypass the security policy.

SpoofGuard: an NSX feature that we can use to eliminate the need for VMware Tools to learn the VM IP address; this will be addressed in a different blog post.

 

 

NSX Firewall and vMotion:

vMotion enables moving a VM from one ESXi host to another while the VM is powered on and connected to the network in a vSphere environment.

This feature can be managed automatically by the vSphere DRS mechanism or manually by the vSphere administrator.

As we saw in the NSX DFW architecture, each VM has two separate tables:

DFW rule table: contains the firewall rules.

DFW connection table: contains the live active (approved) connections passing through this VM. When the vMotion process for VM1 starts, these two tables follow VM1's movement over the vMotion link.

NSX dFW vMotion

When the vMotion process completes, VM1 lands on esxcomp-01b with the same firewall rules and the same connection table; as a result there is no traffic disruption for VM1.

NSX dFW after vMotion

Note: The NSX Manager is not involved in this vMotion since we are not using the "Applied To" feature (explained later).

NSX Firewall Applied To:

By default, when we create a firewall rule in NSX, the "Applied To" field is set to "Distributed Firewall". The firewall rule is stored in the NSX Manager's database and applied to all VM vNICs, regardless of the VMs' location. It's important to mention that even when a dFW rule is applied to all VMs, we still need a match on source/destination for the rule to take action.

The "Applied To" field can be set to vSphere objects: cluster, datacenter, vDS distributed port group, logical switch, Edge, host system, security group, VM, or even vNIC!

NSX Apply To option

When we start using the "Applied To" field, the NSX Manager maps the "Applied To" object to the corresponding vSphere cluster. Only the ESXi hosts in that cluster will receive the rule.

Each ESXi host that receives the rule uses the vShield-Stateful-Firewall daemon to parse it and figure out which VMs need to apply it. When using the "Applied To" field, the perimeter scope limit is the vSphere cluster.

The figure below shows "Rule ID" 1002, in which we've configured "Applied To" as Distributed Firewall (the default behavior).

NSX Apply To 3

When we push this firewall rule, the NSX Manager sends it to all vSphere clusters.

As a result, all VMs get rule ID 1002 at their vNIC level.

 NSX Apply To 1

Continuing our example, we add another rule, ID 1005, in which we use "Applied To" with web-sv-01a and web-sv-01b. "Rule ID" 1002 stays the same, with "Applied To" set to Distributed Firewall.

NSX Apply To 2

Assume we have the following machines: web-sv-01a, web-sv-02a, app-sv-01a and sec-mgr-01a, and we've configured the rules above.

web-sv-01a and app-sv-01a are part of "Compute Cluster A", web-sv-02a is part of "Compute Cluster B" and sec-mgr-01a runs in the "Management Edge Cluster".

Now when we push this policy, the NSX Manager needs to figure out the rule boundaries for each cluster.

NSX Apply To 4

Based on this rule calculation, the NSX Manager pushes the firewall update only to "Compute Cluster A" and "Compute Cluster B"; the "Management Edge Cluster" will not receive any update, because none of its vSphere objects is part of the "Applied To" field. When "Compute Cluster A" receives this firewall rule update, the vSIP kernel module needs to parse which VMs must apply rule 1005: only web-sv-01a will get the new rule ID 1005 in addition to the old rule ID 1002. The VM app-sv-01a will not get any firewall rule update. When "Compute Cluster B" receives the rule update, all ESXi hosts inside the cluster get the firewall update and the vSIP daemon parses it, but only the ESXi host running the VM web-sv-01b will apply rule ID 1005.

 

"Applied To" benefits:

  • Reduces the number of rules per VMware-sfw; this improves efficiency because the DFW has fewer rules to evaluate for every new session.
  • In case of overlapping IP addresses within a multi-tenant environment, we must use "Applied To" to distinguish one tenant from the others.

 

 

Cross-cluster vMotion with "Applied To":

When we do use the "Applied To" feature and a VM performs a vMotion across clusters, the NSX Manager is involved in the process, updating the destination VM's cluster with the relevant firewall rules. The NSX Manager must be up to complete this operation.

For example, the VM web-sv-01a needs to vMotion from Compute Cluster A to the Management Edge Cluster. vCenter sends a vMotion notification to the NSX Manager, and as a result the NSX Manager triggers a policy push to all ESXi hosts in the Management Edge Cluster. web-sv-01a will get the same rule it had before the vMotion, with just a change in the domain object from "domain-c25" to "domain-c7".

NSX Apply To 5

 

If the NSX Manager is down, no rule update will be pushed! When the VM lands on the destination cluster, no VM-specific rules will be applied to it. It is important to note that when the NSX Manager is down, the forwarding plane of all existing VMs with DFW rules continues to work; only "new" VMs cannot get firewall rules until the NSX Manager comes back.

The NSX DFW keeps the rule table as a ".dat" file on the ESXi host at the following path:

/etc/vmware/vsfwd/vsipfw_ruleset.dat


NSX L2 to L4 Firewall:

The VMware NSX DFW can enforce security policy from L2 (data link layer) to L4 (transport layer).

At L2 we can create DFW rules based on MAC addresses or L2 protocols like ARP, RARP and LLDP.

L3/L4 security rules can be enforced with source/destination IP addresses or TCP/UDP ports.

VMware has a list of 3rd-party vendors for higher-layer inspection (a constantly growing list).

 

Default Policy:

The DFW enforces L2 rules before L3 rules.

L2 default policy: fresh DFW installations have a default policy, which is an L2 policy with Source: Any, Destination: Any, Action: Allow.

 

L3 Default Policy:

We have a default L3 policy with Source: Any, Destination: Any, Action: Allow.

 

DFW Exclusion functionality:

Working on daily tasks with firewalls can sometimes lead to a situation where you end up blocking your own access to the firewall management.
Regardless of the vendor you are working with, this is a very challenging situation.
The end result of this scenario is that you are unable to access the firewall management to remove the rules that are blocking you from reaching it!
 

Think of a situation where you deploy a distributed firewall onto each of your ESXi hosts in a cluster, including the management cluster where your vCenter Server is located.

And then you change the default rule from the default "Allow" value to "Block" (as shown below):


What you've done by implementing this rule can be seen in the following figure:

cut tree you sit on

Like the poor guy above cutting down the branch he sits on, by implementing this rule you have blocked yourself from managing your vCenter.

 

Resilience:

OR: How can we protect ourselves from this situation?

Put your vCenter (and other critical virtual machines) in an exclusion list.
Any VM on that list will not receive any distributed firewall rules.
Go to the Network & Security tab and click on NSX Manager.


Exclusion VM list 1

 

Double click on the IP address object. In my example it is 192.168.110.42

Exclusion VM list 2

 

Click on Manage:

Exclusion VM list 3

Go to the “Exclusion List” tab and click on the green plus button.

Exclusion VM list 4

Choose your virtual machine.

Exclusion VM list 5

That’s it!  Now your VC is excluded from any enforced firewall rules.


Exclusion VM list 6

 

Restoring default firewall rules set:

We can use the NSX Manager REST API to revert to the default firewall rule set, to overcome a mistake when we do not yet have access to the VC.

Perform a configuration backup at this stage.
By default the NSX Manager is automatically excluded from DFW, so it is always possible to send API calls to it.
Using a REST Client or cURL:

https://addons.mozilla.org/en-US/firefox/addon/restclient

Submit a DELETE request to:

https://$nsxmgr/api/4.0/firewall/globalroot-0/config
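With cURL this is a one-liner (a sketch; the admin credentials are examples):

curl -k -u admin:VMware1! -X DELETE https://$nsxmgr/api/4.0/firewall/globalroot-0/config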

Exclusion VM list 7

After receiving the expected 204 status code, we revert to the default DFW policy with the default rule set to allow.

Exclusion VM list 8

Now we can access our VC! As we can see, we reverted to the default policy, but don't panic :-) as we saved the policy.

Exclusion VM list 9

Click on the “Load Saved Configuration” button.

Exclusion VM list 10

Load the saved configuration from just before the last save.

Note: every time we push a DFW policy, NSX automatically saves the policy. There is a limit of 50 versions.

Exclusion VM list 11

Accept the warning by clicking Yes.

Exclusion VM list 12
 

Now we have our last policy from before we blocked our VC; it's loaded but not applied.

Exclusion VM list 13

We will need to change the last rule from Block to Allow to fix the problem.

Exclusion VM list 14

And click “Publish the Changes”.

Exclusion VM list 15

 

  • It's not possible to disable the DFW functionality per vNIC; the Exclusion List only allows disabling DFW functionality per VM.
  • The following are automatically excluded from DFW functions by default: the NSX Manager, NSX Controllers, Edge Services Gateways and service VMs (a PAN FW, for instance).

 

NSX and Application Level Gateway (ALG):

Application Level Gateway (ALG) is the ability of a firewall or NAT device to allow or block applications that use dynamic ephemeral ports to communicate. In the absence of an ALG, administrators face a painful trade-off between communication and security: a network administrator may suggest opening a large range of ports, which poses a security threat to the network or the given server, while a security administrator may suggest blocking all but the well-known ports, which breaks the communication.

An ALG reads the network addresses found inside the application payload, opens the respective ports for the subsequent communication, and synchronizes data across multiple sessions on different ports. For example, FTP uses different ports for the session initiation/control connection and for the actual data transfers; an ALG manages the information passed on both the control connection and the data connection.

NSX-v acts as an ALG for a few protocols, such as FTP, CIFS, ORACLE TNS, MS-RPC and SUN-RPC.

 

NSX DFW logs:

In NSX we have four different log types: System Events, Rule Events, Audit Messages and host Flows.

NSX Manager System Events

NSX Manager System Events are related to NSX operations, for example: FW configuration applied, failure to publish FW configuration, filter created, filter deleted, VM added to a Security Group.

Each System Event has a severity level:

INFORMATIONAL(“Informational”),

LOW(“Low”),
MEDIUM(“Medium”),
MAJOR(“Major”),
CRITICAL(“Critical”),
HIGH(“High”)

To view the NSX System Events go to Network & Security -> NSX Managers, click on the NSX Manager IP address, then Monitor -> System Events.

Here is an example of a critical event polled by the NSX Manager.

This event indicates that the vsfwd daemon went down on the ESXi host with id “host-38”.

We can view system events from the ESXi host itself. Here is an example of FW configuration events that can be viewed in vsfwd.log; these events are caused by a policy push from the NSX Manager to the ESXi host. The file location is /var/log/vsfwd.log.

Example output:

2015-03-10T02:43:12Z vsfwd: [INFO] Received vsa message of RuleSet, length 67

2015-03-10T02:43:12Z vsfwd: [INFO] Processed vsa message RuleSet: 67

2015-03-10T02:43:12Z vsfwd: [INFO] L2 rule optimization is enabled

2015-03-10T02:43:12Z vsfwd: [INFO] applying firewall config to vnic list on host host-10

2015-03-10T02:43:12Z vsfwd: [INFO] Enabling TCP strict policy for default drop rule.

2015-03-10T02:43:12Z vsfwd: [INFO] sending event: applied vmware-sfw ruleset 1425955389291 for vnic 500e519a-87fd-4acd-cee2-c97c2c6291ad.000

2015-03-10T02:43:12Z vsfwd: [INFO] successfully saved config to file /etc/vmware/vsfwd/vsipfw_ruleset.dat

2015-03-10T02:43:12Z vsfwd: [INFO] Sending vsa reply of domain-c7 host host-10: 0

 

NSX Manager Audit Events:

This log contains all the events related to admin logins and FW configuration changes (pre and post change of the DFW rule).

To view the Audit Events Go to Network & Security -> NSX Managers Click on the NSX Manager IP address -> Monitor -> Audit Logs.

Here is an example of user bob logging in to the system:

 

ESXi DFW host Rules Messages:

The DFW has a dedicated log file (introduced in version 6.1) to view session start/termination and dropped/passed packets. This log contains the rule id and the associated vCenter objects.

File name is: dfwpktlogs.log

File location on the ESXi host:  /var/log/dfwpktlogs.log 

A log example: more /var/log/dfwpktlogs.log 

2015-03-10T03:22:22.671Z INET match DROP domain-c7/1002 IN 242 UDP 192.168.110.10/138->192.168.110.255/138

 

1002 is the DFW rule-id

domain-c7 is the cluster ID in the vCenter MOB.

192.168.110.10/138 is the source IP and source port

192.168.110.255/138 is the destination IP and destination port

 

To get entries in the log, we need to enable the “Log” option on the rule.

By default, when we create a DFW rule, no logging is enabled. Logging occurs only after we enable the Log field in the firewall rules table.

In order to see “Allow” or “Block” packets in the DFW log files we need to change the “Log” field from “Do not log” to “Log”.

In the following example we change the last rule, id 1002, from “Do not log” to “Log”:

In the next example of a DFW log event, we see the results of a ping from my Control VM management IP 192.168.110.10 to 172.16.10.12:

~ # tail -f /var/log/dfwpktlogs.log | grep 192.168.110.10

2015-03-10T03:20:31.274Z INET match DROP domain-c27/1002 IN 60 PROTO 1 192.168.110.10->172.16.10.12

2015-03-10T03:20:35.794Z INET match DROP domain-c27/1002 IN 60 PROTO 1 192.168.110.10->172.16.10.12

 

Live Flows:

With DFW we have the ability to view live flows. These flows are pulled by vsfwd from the vSIP kernel module and aggregated appropriately. The NSX Manager pulls normal flows from vsfwd every 5 minutes and real-time flows every 5 seconds.

Enable Flow Monitoring by going to “Flow Monitoring” -> Configuration and clicking Enable. Global Flow Collection should change to a green “Enabled” status.

To view the vNIC flows, go to the “Live Flow” tab and browse for a specific VM and vNIC.

Click the start button

Live flows will show up on the screen. The refresh rate is 5 seconds.

From the ESXi host's /var/log/vsfwd.log file we can see the related events:

2015-03-18T03:20:01Z vsfwd: [INFO] Received vsa message of FlowConfiguration, length 120

2015-03-18T03:20:01Z vsfwd: [INFO] Processed vsa message FlowConfiguration: 120

2015-03-18T03:20:01Z vsfwd: [INFO] Loaded flow config: [120]

2015-03-18T03:21:29Z vsfwd: [INFO] Received message in request queue of topic FlowRequest

2015-03-18T03:21:29Z vsfwd: [INFO] Received vsa message of FlowRequest, length 52

2015-03-18T03:21:29Z vsfwd: [INFO] Processed vsa message FlowRequest: 52

2015-03-18T03:21:29Z vsfwd: [INFO] rmqRealTimeFlowDataRetrieve started

2015-03-18T03:21:29Z vsfwd: [INFO] Done with configuring start of real time flows

2015-03-18T03:21:29Z vsfwd: [INFO] rmqRealTimeFlowDataPush started

 

Service Composer:

Service Composer helps you to provision and assign network and security services to applications in a virtual infrastructure.

You map these services to a security group, and the services are applied to the virtual machines in the security group

Security group

With NSX DFW we have the ability to group vCenter elements such as VMs into containers called security groups. Based on these security groups we can build DFW rules. The set of vSphere objects that are part of a security group can be dynamic or static.

Security groups can be consumed directly in the firewall tab, without using Service Composer.

 

Security Group = (Dynamic Inclusion + Static Inclusion) – Static Exclusion

Creating a security group based on VM name, for example:

Go to Network & Security -> Service Composer -> Security Groups

 

 

Dynamic inclusion:

A list of dynamic options for inclusion.

In the example below we chose “VM name” for any VM that contains the word “web”:

 

Static Inclusion:

We can select the object type that we want to (permanently) include as part of this security group.

With static inclusion we can create nested security groups.

  • Nested groups are group inside groups.

In the following example we will not exclude any object:

 Static exclusion:

We can select the object type we want to permanently exclude from this security group.

 

 

Summary of Security Group:

 

We can view the security group object members from the “Security Groups” tab.

In our example, the “Web Servers” security group has two VMs, web-sv-01a and web-sv-02a, and this group membership is handled dynamically because the criterion is that they have “web” as part of their VM name.

If a new VM called “web-sv-03a” is created in this vCenter, it will automatically become part of this security group.

 

Security policy using security groups:

We can create a firewall rule that leverages this new security group:

The source can be any, and the destination will be the “Web Servers” security group.

The service can be a single L4 service or a group of services; here we chose HTTPS.

 

We can apply this policy to any cluster running NSX “Distributed Firewall”.

We can also apply this policy on security groups, which results in the rule (policy) being applied only to the VMs that are part of these security groups.

The security rule:

vSphere environments are dynamic by nature. When new VMs join (or leave) a security group (e.g. “Web Servers”), vCenter updates its database, and as a result an update notification is sent to the NSX Manager. This notification triggers updates to the VMs that are part of this security group, because they fall under the “Applied To” scope. Security administrators do not need to constantly update the policy rules every time VMs join or leave the network; the NSX firewall policy automatically reflects these changes, and this is why using vCenter objects is so powerful.
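To verify the effective membership after such an update, the NSX REST API can list the VMs a security group currently translates to (a hedged sketch; the security group id securitygroup-10 and the credentials are examples):

curl -k -u admin:VMware1! -X GET https://$nsxmgr/api/2.0/services/securitygroup/securitygroup-10/translation/virtualmachines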

 

 

Security tag:

To keep the virtualization flexibility without compromising security, VMware invented the security tag.

By adding a new tag attribute to VMs we can apply security policy. Adding or removing a tag on a VM can be done dynamically by automation, by a 3rd party, or even manually.

We can reuse the example above, in which we created a security group called “Web Servers”, but rather than using a VM name containing “web” as the criterion for group membership, we can attach a security tag to the VMs.
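Before walking through the GUI flow below, note that tags can also be created and attached programmatically over the NSX REST API (a hedged sketch; the credentials, the tag id securitytag-10 and the VM moref vm-216 are examples, and the payload shape can vary slightly between NSX versions):

# create the security tag
curl -k -u admin:VMware1! -X POST -H "Content-Type: application/xml" -d '<securityTag><objectTypeName>SecurityTag</objectTypeName><type><typeName>SecurityTag</typeName></type><name>web servers</name></securityTag>' https://$nsxmgr/api/2.0/services/securitytags/tag

# attach the tag to a VM
curl -k -u admin:VMware1! -X PUT https://$nsxmgr/api/2.0/services/securitytags/tag/securitytag-10/vm/vm-216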

Create Manual Security tag:

 

Create the security tag name:

We have a user defined security tag with 0 VM count:

 

Manually apply the security tag:

Select the “web servers” security tag name and right-click.

 

Filter “web” from the list and choose web-sv-01a and web-sv-02a:

Now we can modify the “Web Servers” security group to use a dynamic inclusion criterion (the security tag):

 

After this change we have a VM count of two for the security tag “web servers”.

 

The security group “Web Servers” contains the VM names web-sv-01a and web-sv-02a.

 

 

In the VM's summary page we can see the security tags applied to this VM and the security groups this VM belongs to. From the “web-sv-01a” example:

Security Policy:

Using security policies we can create templates containing DFW policy approved by the security admin; this is “HOW you want to protect” your environment. We then apply these templates to security groups, which define “WHAT you want to protect”. A security policy may also contain traffic redirection rules to 3rd-party vendors for service chaining.

Security policy is part of the service composer building blocks.

We can apply a security policy to more than one security group.

In the example below we apply “Web Security Policy” to both “Security Group 1” and “Security Group 2”.

 

 

 

A different option is to apply two different security policies to the same security group.

This can result in a contradiction between the policies.

For example, we apply “Security Policy 1” and “Security Policy 2” to the “WEB Security Group”:

Security policy precedence is determined by a “Weight” value, configured by the security admin.

In the following example we demonstrate this by creating two different security policies, “Allow ICMP SP” and “Allow WEB SP”, and applying both to the previously created security group “Web Servers”.

Create “Allow ICMP SP”:

 

In the “Firewall Rules” creation step, the “Weight” is 4300.

  • The related action is Allow.
  • The source field is: any.
  • The destination is: “Policy Security Groups”.

Here is the interesting part: because this security policy works as a template, we may reuse it for different security groups, and our goal is to avoid tying the template to a specific security group.

Service: ICMP Echo Request, ICMP Echo Reply.

 

At “Ready to complete”, click Finish:

 

At the firewall tab we can note that at this stage:

  • We have not applied this security policy to any security group, so the policy is not activated yet. We can see it as the grayed-out policy in the firewall tab.
  • There is no security group in the destination:

 

Create “Allow WEB SP” the same way:

Notice that the “Weight” field is 1300, which is lower than the previous “Allow ICMP SP” weight of 4300.

Create the WEB rule (same flow as above):

 

The firewall policy order shows “Allow ICMP” before “Allow WEB”.

Now we apply both security policies on the same security group, using “Apply Policy”:

Choose “Allow ICMP Security Policy”:

And do the same for the second security policy called “Allow WEB SP”.

In the “Security Policy” tab view we can see the results of this action:

From the “Firewall” tab we can see that we now have two activated Service Composer security rules.

In the Service Composer canvas view we have an excellent summary of the security services applied to the security group:

 

3rd party vendor service integration

The NSX DFW can integrate with 3rd-party vendors to achieve a higher level of application security using different services. The partner needs to register their service with the NSX Manager.

We define the traffic redirection policy in service composer.

For example, we can redirect the traffic that leaves/enters the VMs to a third party partner product device for inspection.

We can define traffic redirection in two different places.

Using “Partner Security Service”:

In this example with a Palo Alto firewall, we define that “any” source traffic destined to the PAN-SG-WEB security group will be redirected to the PAN firewall:

 

Using security policy:

 

We follow the same policy definition construct as the DFW (i.e. the same options for the source field, destination field and services field); the only difference is in the action field: instead of Block/Allow/Reject, a user can select between redirect/no redirect, followed by a partner list (any partner that has been registered with NSX and successfully deployed on the platform). Finally, log options can be enabled for this traffic redirection rule.

Install NSX DFW:

NSX DFW Pre-requirements:

Table 1 lists the vSphere prerequisites for NSX DFW:

  • vCenter: 5.5 or later
  • ESXi host: 5.1, 5.5
  • NSX Manager: 6.0 or later
  • VMtools: must be installed and running on the VM guest OS if the DFW policy is based on vCenter objects; VMtools can be any version
  • vSphere switch: VMware Distributed Switch (vDS) version 5.1 or later; VSS is not supported

 

It's important to mention that NSX DFW can work on either VXLAN port-groups or VLAN port-groups. Enabling DFW on a vSS is not tested by VMware and is not supported, meaning that if you enable it, it may work but you are on your own.

An NSX Controller is not required for DFW. NSX Controllers are only required for VXLAN and the Logical Distributed Router.

The NSX DFW installation is done through the Host preparation process.

The NSX Manager triggers the NSX kernel module installation inside a vSphere cluster and builds the NSX control plane fabric.

Note: Before the host preparation process we need to complete the following:

  • Registering the NSX Manager with vCenter.
  • Deploying the NSX Controllers.

Three components are involved during the NSX host preparation: vCenter, NSX Manager, EAM (ESX Agent Manager).

Host Preperation1

vCenter Server:
Management of vSphere compute infrastructure.

NSX Manager:
Provides the single point of configuration and REST API entry-points in a vSphere environment for NSX.

EAM (ESX Agent Management):
The middleware component between the NSX Manager and the vCenter. The EAM is part of the vCenter and is responsible for installing the VIBs (vSphere Installation Bundles), which are software packages prepared to be installed inside an ESXi host.

 

The host preparation begins when we click the “Install” button in the vCenter GUI.

  • This process is done at the vSphere cluster level and not per ESXi host.
  • The EAM will create an agent to track the VIB installation process for each host. The VIBs are copied from the NSX Manager and cached in EAM. If the VIBs are not present on the ESXi host, the EAM will install them (an ESXi host reboot is not needed at the end of this process).
  • During an NSX software upgrade, the EAM also takes care of removing the old installed version of the VIBs, but an ESXi host reboot is then needed.

VIBs installed during host preparation:

  • esx-dvfilter-switch-security
  • esx-vsip
  • esx-vxlan

Once the host preparation has successfully completed, the ESXi host has a fully working control plane.

Two control plane channels will be created:

  • RabbitMQ (RMQ) message bus: provides communication between the vsfwd process on the ESXi hypervisor and the NSX Manager over TCP/5671.
  • User World Agent (UWA) process (netcpa on the ESXi hypervisor): establishes TCP/1234 connections over SSL communication channels to the Controller Cluster nodes.

Troubleshooting the DFW installation:

The NSX DFW installation is actually the host preparation process.

Here are a few examples of host preparation issues.

DNS Issues:

EAM fails to deploy VIBs due to misconfigured DNS or no DNS configuration on host.
We can verify if those DFW VIBs have been successfully installed by connecting to each ESXi host in the cluster and issuing the command “esxcli software vib list”.

~# esxcli software vib list | grep esx-vsip

esx-vsip                       5.5.0-0.0.2318233                     VMware  VMwareCertified   2015-01-24

~ # esxcli software vib list | grep dvfilter

esx-dvfilter-switch-security   5.5.0-0.0.2318233                     VMware  VMwareCertified   2015-01-24

 

In this case, we may get a status of “Not Ready”:

Not Ready

The message clearly indicates “Agent VIB module not installed” on one or more hosts.

We can check the vSphere ESX Agent Manager for errors:

vCenter home > vCenter Solutions Manager > vSphere ESX Agent Manager”

On “vSphere ESX Agent Manager”, check the status of “Agencies” prefixed with “_VCNS_153”. If any of the agencies has a bad status, select the agency and view its issues:

EAM

We need to check the associated log  /var/log/esxupdate.log (on the ESXi host) for more details on host preparation issues.
Log into the ESXi host in which you have the issue, run “tail /var/log/esxupdate.log” to view the log

esxupdate error1

From the log it becomes clear that the issue may be related to DNS name resolution.

Solution:
Configure the DNS settings in the ESXi host for the NSX host preparation to succeed.
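The same can be done from the ESXi CLI (a sketch; the DNS server IP is an example):

esxcli network ip dns server add --server=192.168.110.10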

 

TCP/80 from ESXi to vCenter is blocked:

The ESXi host is unable to connect to vCenter EAM on TCP/80.

This could be caused by a firewall blocking communication on this port. From the ESXi host /var/log/esxupdate.log file:

esxupdate: esxupdate: ERROR: MetadataDownloadError: ('http://VC_IP_Address:80/eam/vib?id=xxx-xxx-xxx-xxx', None, "('http://VC_IP_Address:80/eam/vib?id=xxx-xxx-xxx-xxx', '/tmp/tmp_TKl58', '[Errno 4] IOError: <urlopen error [Errno 111] Connection refused>')")

Solution:
The NSX-v has a list of ports that need to be open in order for the host preparation to succeed.
The complete list can be found in:
https://communities.vmware.com/docs/DOC-28142
 

Existing VIB’s Version

If an old VIB version exists on the ESXi host, EAM will remove the old VIBs, but the host preparation will not automatically continue.

Solution:
We will need to reboot the ESXi host to complete the process (this condition will be clearly indicated next to the host name on vCenter).

 

ESXi bootbank space issue:

If you try to upgrade ESXi 5.1u1 to ESXi 5.5 and then start the NSX host preparation, you may hit an issue; in the /var/log/esxupdate.log file you will see a message like:
“InstallationError: the pending transaction requires 240 MB free space, however the maximum supported size is 239 MB”
I faced this issue with a customer's IBM blade ISO, but it may appear with other vendors.

Solution:
Install a fresh ESXi 5.5 custom ISO (this is the version I upgraded to).

 

EAM TCP/80:

If the vCenter runs on a Windows machine, other applications can be installed and may already be using port 80, causing a conflict with the EAM port TCP/80.

For example: by default, an IIS server uses TCP/80.

Solution:
Use a different port for EAM:

Change the EAM port in eam.properties, located in \Program Files\VMware\Infrastructure\tomcat\webapps\eam\WEB-INF\.

 

Download VIBs link:

The NSX Manager has a direct link to download the VIBs as a zip file:

https://$nsxmgr/bin/vdn/vibs/5.5/vxlan.zip
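For example, with cURL (a minimal sketch):

curl -k -o vxlan.zip https://$nsxmgr/bin/vdn/vibs/5.5/vxlan.zip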

 

Reverting installation:

Reverting an NSX-prepared ESXi host requires the following steps:

  • Remove the host from the vSphere cluster.
  • Put the ESXi host in maintenance mode and remove it from the cluster. This will automatically uninstall the NSX VIBs.

Note: ESXi host must be rebooted to complete the operation.

Manually Uninstall VIBs:

The following commands can be entered directly on the ESXi host to remove the installed NSX VIBs:

esxcli software vib remove -n esx-vxlan

esxcli software vib remove -n esx-vsip

esxcli software vib remove -n dvfilter-switch-security

Note: The ESXi host must be rebooted to complete the operation

DFW (UWA) agent issues:

The VIB installation completes successfully, but on rare occasions one or both user world agents do not function correctly. This could manifest itself as either:

  • The firewall showing a bad status (an Error status, for example).
  • The control plane between the hypervisor(s) and the controllers being down.
     

UWA error

Validate Message bus service is active on NSX Manager:

Check the messaging bus user world agent status by running the command /etc/init.d/vShield-Stateful-Firewall status on the ESXi hosts.

vShield-Stateful-Firewall

 

vsfwd is the service daemon, part of the UWA (User World Agent), running on the ESXi host.

To check whether the vsfwd daemon is working properly, issue the following CLI command:

~ # ps | grep vsfwd

36169 36169 vsfwd                /usr/lib/vmware/vsfw/vsfwd

36170 36169 vsfwd                /usr/lib/vmware/vsfw/vsfwd

36171 36169 vsfwd                /usr/lib/vmware/vsfw/vsfwd

36172 36169 vsfwd                /usr/lib/vmware/vsfw/vsfwd

36173 36169 vsfwd                /usr/lib/vmware/vsfw/vsfwd

36174 36169 vsfwd                /usr/lib/vmware/vsfw/vsfwd

36175 36169 vsfwd                /usr/lib/vmware/vsfw/vsfwd

36176 36169 vsfwd                /usr/lib/vmware/vsfw/vsfwd

36178 36169 vsfwd                /usr/lib/vmware/vsfw/vsfwd

36179 36169 vsfwd                /usr/lib/vmware/vsfw/vsfwd

36909 36169 vsfwd                /usr/lib/vmware/vsfw/vsfwd

 

ESXi shows the threads this way; these are not processes but threads.

The activities vsfwd provides are performed by several threads:

 

  • Firewall rule publishing, flow monitoring, NetX config, heartbeat, threshold monitoring, IPFIX, netcpa proxy, etc. are all vsfwd activities that run in these threads.

Run the below command on the ESXi hosts to check for active messaging bus connection:

esxcli network ip connection list | grep 5671 (Message bus TCP connection)

network connection

Please ensure that port 5671 is open for communication in any external network firewall.

Logs are recorded under /var/log/vsfwd.log. If the message bus is communicating properly with the NSX Manager, you should see log entries like the following (heartbeats):

2015-03-10T14:10:34Z vsfwd: [INFO] Received HeartBeat Seq 2545, Sending response

2015-03-10T15:22:34Z vsfwd: [INFO] Received HeartBeat Seq 2569, Sending response

2015-03-10T16:34:34Z vsfwd: [INFO] Received HeartBeat Seq 2593, Sending response

2015-03-10T17:46:34Z vsfwd: [INFO] Received HeartBeat Seq 2617, Sending response

 

 

Since this is a module which operates at the kernel level, it is highly unlikely that it would fail, as it gets loaded as part of the boot image. However, in case of any failure of the distributed firewall functionality, for instance an ESXi host maxed out on CPU, the traffic would be blocked by default and packets would start dropping for the VMs which are protected.

In the event that vsfwd is down:

  • You would see “messaging infrastructure down on host” in System Events (with the host name).
  • New rules will not get pushed to the host. The DFW UI will indicate that the last publish operation is pending (as opposed to succeeded); this is true even if the push failed on only one host out of 100.
  • In all cases, enforcement of the already-programmed rules never stops.
  • If vsfwd crashes on the host, a watchdog process will restart it automatically. The downtime involved is not observable, as the restart is quick.
  • Every time a vsfwd restart occurs, the NSX Manager is contacted to sync all the rule info, to make sure the state is in sync between the NSX Manager and the host.
  • If vsfwd is stopped manually on the host (i.e. “/etc/init.d/vShield-Stateful-Firewall stop”), there is no attempt to restart the process.

  

esxcfg-advcfg -l | grep Rmq

Run this command on the ESXi hosts to show all the Rmq variables; there should be 16 variables in total.

esxcfg-advcfg -g /UserVars/RmqIpAddress

Run this command on the ESXi hosts, it should display the NSX Manager IP address

RmqIpAddress

 

DFW Kernel Space:

Verify that the Kernel Module was loaded to memory:

VSIP (VMware Internetworking Service Insertion Platform) is the distributed firewall kernel module component

Command to check if distributed firewall kernel module is successfully installed on the host:

 

~ # vmkload_mod -l | grep vsip

vsip                     13   452

summarize-dvfilter

This command displays all IOChains and firewall filters on the ESXi host.

 

Fast path = traffic filtered in the kernel module.

Slow path = traffic redirected to a 3rd-party vendor like Palo Alto. In this screenshot we can see there is no slow path.

Filters: shows, for each vNIC on this ESXi host, which slot it belongs to.

 

In the Fastpaths section we can see the filter order, starting from SLOT 0, dvfilter-faulter.

 

~ # summarize-dvfilter

Fastpaths:

agent: dvfilter-faulter, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: dvfilter

agent: ESXi-Firewall, refCount: 5, rev: 0x1010000, apiRev: 0x1010000, module: esxfw

agent: vmware-sfw, refCount: 2, rev: 0x1010000, apiRev: 0x1010000, module: vsip

agent: dvfilter-generic-vmwareswsec, refCount: 2, rev: 0x1010000, apiRev: 0x1010000, module: dvfilter-switch-security

agent: bridgelearningfilter, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: vdrb

agent: dvfilter-generic-vmware, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: dvfilter-generic-fastpath

agent: dvfg-igmp, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: dvfg-igmp

 

 

Filters:

port 50331654 vmk2

  vNic slot 0

   name: nic-0-eth4294967295-ESXi-Firewall.0

   agentName: ESXi-Firewall

   state: IOChain Attached

   vmState: Detached

   failurePolicy: failOpen

   slowPathID: none

   filter source: Invalid

 

port 50331655 vmk3

  vNic slot 0

   name: nic-0-eth4294967295-ESXi-Firewall.0

   agentName: ESXi-Firewall

   state: IOChain Attached

   vmState: Detached

   failurePolicy: failOpen

   slowPathID: none

   filter source: Invalid

 

world 35677 vmm0:web-sv-02a vcUuid:'50 26 b7 4d c5 6c 1e d9-47 c0 09 25 95 80 2f ad'

 port 50331656 web-sv-02a.eth0

 

vNic slot 2

   name: nic-35677-eth0-vmware-sfw.2

   agentName: vmware-sfw

   state: IOChain Attached

   vmState: Detached

   failurePolicy: failClosed

   slowPathID: none

   filter source: Dynamic Filter Creation

 

vNic slot 1

   name: nic-35677-eth0-dvfilter-generic-vmware-swsec.1

   agentName: dvfilter-generic-vmwareswsec

   state: IOChain Attached

   vmState: Detached

   failurePolicy: failClosed

   slowPathID: none

   filter source: Alternate Opaque Channel

 

Packet capture command:

 

When a VM connects to a logical switch, NSX inserts different IOChain security services into the VM packet flow. Each service is represented by a unique slot id.

In the figure shown below we have a VM named web-sv-01a on the ESXi host esxcomp-01a. web-sv-01a has three different IOChains before the VM packet gets to the vDS port-group. These IOChains handle VM packet processing at the kernel level.

 

Slot 0: DVFilter (Distributed Virtual Filter):

The DVFilter is the VMkernel component between the protected vNIC at SLOT 0 and the associated Distributed Virtual Switch (DVS) port; it is instantiated when a virtual machine with a protected virtual NIC is created. It monitors the incoming and outgoing traffic on the protected virtual NIC and performs stateless filtering.

Slot 1: sw-sec (Switch Security): the sw-sec module learns the VMs' IP and MAC addresses. sw-sec is a critical component that captures DHCP ACK and ARP broadcast messages and forwards this info as unicast to the NSX Controller to implement the ARP suppression feature. sw-sec is also the layer where NSX SpoofGuard is implemented.

 

 

Slot 2: vmware-sfw: this is the place where DFW firewall rules are stored and enforced; vmware-sfw contains the rules table and the connections table.

With vSphere we can capture VM traffic with the pktcap-uw command. For this example we send a continuous ping (ICMP echo request) from web-sv-01a. The capture command needs to be placed on IOChain SLOT 2 with the appropriate filter name for web-sv-01a.

To find the exact filter name we use the command summarize-dvfilter.

We can grep for the exact name with the -A 3 switch, which shows three more lines after each match.

From ESXi host name esxcomp-01a:

~ # summarize-dvfilter | grep -A 3 web-sv-01a

world 35682 vmm0:web-sv-01a vcUuid:'50 26 c7 cd b6 f3 f4 bc-e5 33 3d 4b 25 5c 62 77'

 port 50331656 web-sv-01a.eth0

  vNic slot 2

   name: nic-35682-eth0-vmware-sfw.2

   agentName: vmware-sfw

 

From this output we can see that the filter name is nic-35682-eth0-vmware-sfw.2 for SLOT 2
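With the filter name in hand, the capture itself looks roughly like this (a hedged sketch; capture point names and flag spelling can vary between ESXi builds, so verify with pktcap-uw -h on your host):

~ # pktcap-uw --capture PreDVFilter --dvfilter nic-35682-eth0-vmware-sfw.2 -o /tmp/web-sv-01a.pcap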

pktcap-uw command help with the -A option:

esxcomp-01a # pktcap-uw -A

Supported capture points:

        1: Dynamic — The dynamic inserted runtime capture point.

        2: UplinkRcv — The function that receives packets from uplink dev

        3: UplinkSnd — Function to Tx packets on uplink

        4: Vmxnet3Tx — Function in vnic backend to Tx packets from guest

        5: Vmxnet3Rx — Function in vnic backend to Rx packets to guest

        6: PortInput — Port_Input function of any given port

        7: IOChain — The virtual switch port iochain capture point.

        8: EtherswitchDispath — Function that receives packets for switch

        9: EtherswitchOutput — Function that sends out packets, from switch

        10: PortOutput — <

Related Blog post:

An introduction to Zero Trust virtualization-centric security

What is a Distributed Firewall?

Deep Dive: How does NSX Distributed Firewall work

Distributed Firewall (DFW) in NSX for vSphere, and “Applied To:”

Security-as-a-Service with NSX Service Composer

Configure and Administer Firewall Services

Service Composer – Resultant Set of Policy

Validating Distributed Firewall rulesets in NSX

Stateful Firewall and NSX

 

Thanks to Tiran Efrat and Francis Guillier for reviewing this document and answering some of the questions during creating this document.

Posted in Design, Firewall

vExpert 2015

It is a great honor to be selected as vExpert for 2015.

vexpert2015

My blog routetocloud.com focuses only on VMware NSX-v and reflects my passion for this product.
Thank you, and congratulations to all the 2015 vExperts.

Posted in Uncategorized

NSX-v Host Preparation

The information in this post is based on my NSX professional experience in the field and on a lecture by Kevin Barrass, an NSX solution architect.

Thanks to Tiran Efrat for reviewing this post.

Host preparation overview

Host preparation is the process in which the NSX Manager installs the NSX kernel modules inside a vSphere cluster and builds the NSX control plane fabric.

Before the host preparation process we need to complete:

  1. Register the NSX Manager in the vCenter. This process was covered in NSX-V Troubleshooting registration to vCenter.
  2. Deploy the NSX Controllers, covered in deploying-nsx-v-controller-disappear-from-vsphere-client

Three components are involved during the NSX host preparation:
vCenter, NSX Manager, EAM(ESX Agent Manager).

Host Preperation1

vCenter Server:
Management of vSphere compute infrastructure.

NSX Manager:
Provides the single point of configuration and REST API entry-points in a vSphere environment for NSX.

EAM (ESX Agent Management):
The middleware component between the NSX Manager and the vCenter. The EAM is part of the vCenter and is responsible for installing the VIBs (vSphere Installation Bundles), which are software packages prepared to be installed inside an ESXi host.

Host Preparation process

The host preparation begins when we click “Install” in the vCenter GUI.

host preparation

host preparation

This process is done at the vSphere cluster level and not per ESXi host. The EAM will create an agent to track the VIB installation process for each host. The VIBs are copied from the NSX Manager and cached in EAM.
If the VIBs are not present on the ESXi host, the EAM will install them (an ESXi host reboot is not needed).
The EAM will remove installed old-version VIBs, but then an ESXi host reboot is needed.

VIBs installed during host preparation:
esx-dvfilter-switch-security
esx-vsip
esx-vxlan

The ESXi host has a fully working control plane after the host preparation has successfully completed.

Two control plane channels will be created:

  • RabbitMQ (RMQ) message bus: provides communication between the vsfwd process on the ESXi hypervisor and the NSX Manager over TCP/5671.
  • User World Agent (UWA) process (netcpa on the ESXi hypervisor): establishes TCP/1234 connections over SSL communication channels to the Controller Cluster nodes.

Host Preperation2

Troubleshooting Host Preparation

DNS:

EAM fails to deploy VIBs due to misconfigured DNS or no DNS configuration on host.
We may get a status of “Not Ready”:

Not Ready

This indicates “Agent VIB module not installed” on one or more hosts.

We can check the vSphere ESX Agent Manager for errors:

“vCenter home > vCenter Solutions Manager > vSphere ESX Agent Manager”

On “vSphere ESX Agent Manager”, check the status of “Agencies” prefixed with “_VCNS_153” If any of the agencies has a bad status, select the agency and view its issues:

EAM

We need to check the associated log  /var/log/esxupdate.log (on the ESXi host) for more details on host preparation issues.
Log into host in which you have the issue, run “tail /var/log/esxupdate.log” to view the log

esxupdate error1

Solution:
Configure the DNS settings on the ESXi host for the NSX host preparation to succeed.

 

TCP/80 from ESXi to vCenter is blocked:

The ESXi host is unable to connect to vCenter EAM on TCP/80:

This could be caused by a firewall blocking this port. From the ESXi host /var/log/esxupdate.log file:

esxupdate: esxupdate: ERROR: MetadataDownloadError: ('http://VC_IP_Address:80/eam/vib?id=xxx-xxx-xxx-xxx', None, "('http://VC_IP_Address:80/eam/vib?id=xxx-xxx-xxx-xxx', '/tmp/tmp_TKl58', '[Errno 4] IOError: <urlopen error [Errno 111] Connection refused>')")

Solution:
The NSX-v has a list of ports that need to be open in order for the host preparation to succeed.
The complete list can be found in:
https://communities.vmware.com/docs/DOC-28142

 

Older VIB’s version:

If an old VIB version exists on the ESXi host, EAM will remove the old VIBs, but host preparation will not automatically continue.

Solution:
We will need to reboot the ESXi host to complete the process.

 

ESXi Bootbank Space issue:

If you try to upgrade ESXi 5.1u1 to ESXi 5.5 and then start the NSX host preparation, you may hit an issue; in the /var/log/esxupdate.log file you will see a message like:
“InstallationError: the pending transaction requires 240 MB free space, however the maximum supported size is 239 MB”
I faced this issue with a customer's IBM blade ISO, but it may appear with other vendors.

Solution:
Install a fresh ESXi 5.5 custom ISO (this is the version I upgraded to).

 

vCenter on Windows, EAM TCP/80 taken by other application:

If the vCenter runs on a Windows machine, other applications can be installed and may already be using port 80, causing a conflict with the EAM port TCP/80.

For example: by default, an IIS server uses TCP/80.

Solution:
Use a different port for EAM:

Change the EAM port in eam.properties, located in \Program Files\VMware\Infrastructure\tomcat\webapps\eam\WEB-INF\.

 

UWA Agent Issues:

In rare cases the installation of the VIBs succeeds, but for some reason one or both of the user world agents do not function correctly. This could manifest itself as:
the firewall showing a bad status, OR the control plane between the hypervisor(s) and the controllers being down.
UWA error

Validate that the message bus service is active on the NSX Manager:

Check the messaging bus userworld agent status on hosts by running the command /etc/init.d/vShield-Stateful-Firewall status on the ESXi hosts

vShield-Stateful-Firewall

Check Message bus userworld logs on hosts at /var/log/vsfwd.log

esxcfg-advcfg -l | grep Rmq

Run this command on the ESXi hosts to show all the Rmq variables; there should be 16 variables in total.

esxcfg-advcfg -g /UserVars/RmqIpAddress

Run this command on the ESXi hosts, it should display the NSX Manager IP address

RmqIpAddress

Run this command on the ESXi hosts to check for active messaging bus connection

esxcli network ip connection list | grep 5671 (Message bus TCP connection)

network connection

 

 

The NSX manager has a direct link to download the VIB’s as zip file:

https://$nsxmgr/bin/vdn/vibs/5.5/vxlan.zip

 

Reverting an NSX-prepared ESXi host:

Remove the host from the vSphere cluster:

Put the ESXi host in maintenance mode and remove it from the cluster. This will automatically uninstall the NSX VIBs.

Note: ESXi host must be rebooted to complete the operation.

 

Manually Uninstall VIB’s:

esxcli software vib remove -n esx-vxlan

esxcli software vib remove -n esx-vsip

esxcli software vib remove -n dvfilter-switch-security

Note: ESXi host must be rebooted to complete the operation

Posted in Install, Troubleshooting

Asymmetric routing with ECMP and Edge Firewall Enabled

What is Asymmetric Routing?

In Asymmetric routing, a packet traverses from a source to a destination in one path and takes a different path when it returns to the source.

Starting from version 6.1, the NSX Edge can work with ECMP (Equal Cost Multipath). ECMP traffic involves asymmetric routing between the Edges and the DLR, or between the Edges and the physical routers.

ECMP Consideration with Asymmetric Routing

ECMP with asymmetric routing is not a problem by itself, but it will cause problems when more than one NSX Edge is in place and stateful services are inserted in the path of the traffic.

Stateful services like firewall, load balancing and Network Address Translation (NAT) can't work with asymmetric routing.

The problem explained:

A user from outside tries to access a Web VM inside the data center; the traffic passes through the E1 Edge.

From E1 the traffic goes to the DLR, traverses the NSX distributed firewall and reaches the Web VM.

When the Web VM responds, the return traffic hits the DLR default gateway. The DLR has two options for routing the traffic: E1 or E2.

If the DLR chooses E2, the traffic will get to E2 and will be dropped!

The reason is that E2 is not aware of the session state that was created at E1; reply packets from the Web VM arriving at E2 do not match any existing session on E2.
From E2's perspective this is a new session that needs to be validated, and any new TCP session should start with a SYN; since this is not the beginning of the session, E2 will drop it!

Asymmetric Routing with Edge Firewall Enabled

Asymmetric Routing with Edge Firewall Enabled

Note: the NSX distributed firewall is not part of this problem. The NSX distributed firewall is implemented at the vNIC level, and all traffic goes in and out of the same vNIC.

There is no asymmetric routing at the vNIC level; by the way, this is the reason that when we vMotion a VM, the firewall rules and connection state move with the VM itself.

ECMP and Edge Firewall NSX

Starting from version 6.1, when we enable ECMP on an NSX Edge we get this message:

Enable ECMP in 6.1 version

The firewall service is disabled by default:

Enable ECMP in 6.1 version Firewall turnoff

Even if you try to enable it, you will get a warning message:

Firewall Service in 6.1 with ECMP

In version 6.1.2, when we enable ECMP we get the same message:

Enable ECMP in 6.1 version

But the BIG difference is that the firewall service is not disabled by default (you need to turn it off yourself).

Even if you have an “Any, Any” rule with an “Accept” action, we will still be subject to dropped packets because of the asymmetric routing problem!

Firewall Service Enable in 6.1.2

Even in Syslog or Log Insight you will not see these dropped packets!

The end-user experience will be that some sessions work just fine (these sessions are not asymmetric) while other sessions drop (the asymmetric sessions).

The place I found where we can learn that packets are dropped because of session state is the command show tech-support:

show tech-support
vShield Edge Firewall Packet Counters:
~~~~~~~~~~~~~~~ snip ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
rid    pkts bytes target     prot opt in     out     source               destination         
0        20  2388 ACCEPT     all  --  *      lo      0.0.0.0/0            0.0.0.0/0           
0        12   720 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            state INVALID
0        51  7108 block_out  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
0         0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            PHYSDEV match --physdev-in tap0 --physdev-out vNic_+
0         0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            PHYSDEV match --physdev-in vNic_+ --physdev-out tap0
0         0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            PHYSDEV match --physdev-in na+ --physdev-out vNic_+
0         0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            PHYSDEV match --physdev-in vNic_+ --physdev-out na+
0         0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
0        51  7108 usr_rules  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
0         0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0

From line 7 we can see the packets DROPped because of INVALID state.

Conclusion:

When you enable ECMP and you have more than one NSX Edge in your topology, go to the firewall service and disable it yourself; otherwise you will spend lots of troubleshooting hours 🙁

Posted in Design, Edge, Firewall, Troubleshooting

NSX Edge and DRS Rules

The NSX Edge Cluster Connects the Logical and Physical worlds and usually hosts the NSX Edge Services Gateways and the DLR Control VMs.

There are deployments where the Edge Cluster may contain the NSX Controllers as well.

In this section we discuss how to design an Edge cluster to survive the failure of an ESXi host or an entire physical chassis and lower the outage time.

In the figure below we deploy two NSX Edges, E1 and E2, in ECMP mode, where they run active/active from the perspective of both the control and data planes. The DLR Control VMs run active/passive, while both E1 and E2 run a dynamic routing protocol with the active DLR Control VM.

When the DLR learns a new route from E1 or E2, it will push this information to the NSX Controller cluster. The NSX Controller will update the routing tables in the kernel of each ESXi hosts, which are running this DLR instance.

 

1

 

In the scenario where the ESXi host, which contains the Edge E1, failed:

  • The active DLR will update the NSX Controller to remove E1 as next hop, the NSX Controller will update the ESXi host and as a result the “Web” VM traffic will be routed to Edge E2.
    The time it takes to re-route the traffic depends on the dynamic protocol converge time.

2

In the specific scenario where the failed ESXi or Chassis contained both the Edge E1 and the active DLR, we would instead face a longer outage in the forwarded traffic.

The reason for this is that the active DLR is down and cannot detect the failure of the Edge E1 and accordingly update the Controller. The ESXi will continue to forward traffic to Edge E1 until the passive DLR becomes active, learns that the Edge E1 is down and updates the NSX Controller.

3

The Golden Rule is:

We must ensure that when the Edge Services Gateway and the DLR Control VM belong to the same tenant they will not reside in the same ESXi host. It is better to distribute them between ESXi hosts and reduce the affected functions.

By default, when we deploy an NSX Edge or DLR in active/passive mode, the system takes care of creating a DRS anti-affinity rule; this prevents the active and passive VMs from running on the same ESXi host.

DRS anti affinity rules

DRS anti affinity rules

We need to build new DRS rules as these default rules will not prevent us from getting to the previous dual failure scenario.

The figure below describes the network logical view for our specific example. This topology is built from two different tenants where each tenant is being represented with a different color and has its own Edge and DLR.

Note connectivity to the physical world is not displayed in the figure below in order to simplify the diagram.

multi tenants

My physical Edge Cluster has four ESXi hosts which are distributed over two physical chassis:

Chassis A: esxcomp-01a, esxcomp-02a

Chassis B: esxcomp-01b, esxcomp-02b

4

Create DRS Host Group for each Chassis

We start by creating a container for all the ESXi hosts in Chassis A; this container is configured as a DRS Host Group.

Edge Cluster -> Manage -> Settings -> DRS Groups

Click on the Add button and call this group “Chassis A”.

The container type needs to be “Host DRS Group”; add the ESXi hosts running on Chassis A (esxcomp-01a and esxcomp-02a).

5

Create another DRS group called Chassis B that contains esxcomp-01b and esxcomp-02b:

6

 

VM’s DRS Group for Chassis A:

We need to create a container for the VMs that will run in Chassis A. At this point we just name it Chassis A; we are not actually placing the VMs in Chassis A yet.

This Container type is “VM DRS Group”:

7

VM DRS Group for Chassis B:

8

 

At this point we have four DRS groups:

9

DRS Rules:

Now we need to take the DRS objects we created before, “Chassis A” and “VM to Chassis A”, and tie them together. The next step is to do the same for “Chassis B” and “VM to Chassis B”.

* This configuration needs to be part of “DRS Rules”.

Edge Cluster -> Manage -> Settings -> DRS Rules

Click on the Add button in DRS Rules, in the name enter something like: “VM’s Should Run on Chassis A”

In the Type field select “Virtual Machines to Hosts” because we want to bind the VM group to the host group.

In the VM group name choose “VM to Chassis A” object.

Below the VM group selection we need to select the group & hosts binding enforcement type.

We have two different options:

“Should run on hosts in group” or “Must run on hosts in group”

If we choose the “Must” option, then in the event of a failure of all the ESXi hosts in this group (for example, if Chassis A had a critical power outage), the other ESXi hosts in the cluster (Chassis B) would not be considered by vSphere HA as a viable option for the recovery of the VMs. The “Should” option will consider the other ESXi hosts as a recovery option.

10

 

Same for Chassis B:

11

Now the problem with the current DRS rules and the VM placement in this Edge cluster is that an Edge and a DLR Control VM are actually running on the same ESXi host. We need to create anti-affinity DRS rules.

Anti-Affinity Edge and DLR:

An Edge and DLR that belong to the same tenant should not run in the same ESXi host.

For Green Tenant:

12

For Blue Tenant:

13

The Final Result:

In the case of a failure of one of the ESXi hosts, we no longer face the problem of the Edge and DLR being on the same ESXi host, even in the catastrophic event of a chassis A or B failure.

15

 

Note:

The Control VM can be moved to the compute cluster, which avoids this design consideration.

Thanks to Max Ardica and  Tiran Efrat for reviewing this post.

 

Posted in Design, DLR, Edge, Install

NSX-v Troubleshooting L2 Connectivity

In this blog post we describe the methodology to troubleshoot L2 connectivity within the same Logical switch L2 segment.

Some of the steps here can and should be done via the NSX GUI, vRealize Operations Manager 6.0 and vRealize Log Insight, so treat this as an educational post.

There are lots of CLI commands in this post :-). To view the full output of a CLI command you can scroll right.

 

High level approach to solve L2 problems:

1. Understand the problem.

2. Know your network topology.

3. Figure out if it is a configuration issue.

4. Check if the problem is within the physical space or the logical space.

5. Verify the NSX control plane from the ESXi hosts and the NSX Controllers.

6. Move the VM to a different ESXi host.

7. Start capturing traffic in the right spots.

 

Understand the Problem

VMs on the same logical switch (VXLAN 5001) are unable to communicate.

Demonstrating the problem:

web-sv-01a:~ # ping 172.16.10.12
PING 172.16.10.12 (172.16.10.12) 56(84) bytes of data.
^C
--- 172.16.10.12 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3023ms

 

Know your network topology:

TSHOT1

The VMs web-sv-01a and web-sv-02a reside on different compute hosts, esxcomp-01a and esxcomp-02a respectively.

web-sv-01a: IP: 172.16.10.11,  MAC: 00:50:56:a6:7a:a2

web-sv-02a: IP:172.16.10.12, MAC: 00:50:56:a6:a1:e3

 

Validate network topology

I know it sounds stupid, but let's make sure that the VMs actually reside on the right ESXi hosts and are connected to the right VXLAN.

Verify that VM “web-sv-01a” actually resides on “esxcomp-01a”:

From esxcomp-01a run the command esxtop then press “n” (Network):

esxcomp-01a # esxtop
   PORT-ID              USED-BY  TEAM-PNIC DNAME              PKTTX/s  MbTX/s    PKTRX/s  MbRX/s %DRPTX %DRPRX
  33554433           Management        n/a vSwitch0              0.00    0.00       0.00    0.00   0.00   0.00
  50331649           Management        n/a DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331650               vmnic0          - DvsPortset-0          8.41    0.02     437.81    3.17   0.00   0.00
  50331651     Shadow of vmnic0        n/a DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331652                 vmk0     vmnic0 DvsPortset-0          5.87    0.01       1.76    0.00   0.00   0.00
  50331653                 vmk1     vmnic0 DvsPortset-0          0.59    0.01       0.98    0.00   0.00   0.00
  50331654                 vmk2     vmnic0 DvsPortset-0          0.00    0.00       0.39    0.00   0.00   0.00
  50331655                 vmk3     vmnic0 DvsPortset-0          0.20    0.00       0.39    0.00   0.00   0.00
  50331656 35669:db-sv-01a.eth0     vmnic0 DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331657 35888:web-sv-01a.eth     vmnic0 DvsPortset-0          4.89    0.01       3.72    0.01   0.00   0.00
  50331658          vdr-vdrPort     vmnic0 DvsPortset-0          2.15    0.00       0.00    0.00   0.00   0.00

In line 12 we can see “web-sv-01a.eth0”; another important piece of information is its “Port-ID”.

The “Port-ID” is a unique identifier for each virtual switch port; in our example web-sv-01a.eth0 has Port-ID “50331657”.

Find the vDS name:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan list
VDS ID                                           VDS Name      MTU  Segment ID     Gateway IP     Gateway MAC        Network Count  Vmknic Count
-----------------------------------------------  -----------  ----  -------------  -------------  -----------------  -------------  ------------
3b bf 0e 50 73 dc 49 d8-2e b0 df 20 91 e4 0b bd  Compute_VDS  1600  192.168.250.0  192.168.250.2  00:50:56:09:46:07              4             1

From line 4, the vDS name is “Compute_VDS”.

Verify “web-sv-01a.eth0” connects to VXLAN 5001:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5001
Switch Port ID  VDS Port ID  VMKNIC ID
--------------  -----------  ---------
      50331657  68                   0
      50331658  vdrPort              0

From line 4 we see a VM connected to VXLAN 5001 on port ID 50331657; this is the same port ID as VM web-sv-01a.eth0.

Verification in esxcomp-01b:

esxcomp-01b esxtop
  PORT-ID              USED-BY  TEAM-PNIC DNAME              PKTTX/s  MbTX/s    PKTRX/s  MbRX/s %DRPTX %DRPRX
  33554433           Management        n/a vSwitch0              0.00    0.00       0.00    0.00   0.00   0.00
  50331649           Management        n/a DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331650               vmnic0          - DvsPortset-0          6.54    0.01     528.31    4.06   0.00   0.00
  50331651     Shadow of vmnic0        n/a DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331652                 vmk0     vmnic0 DvsPortset-0          2.77    0.00       1.19    0.00   0.00   0.00
  50331653                 vmk1     vmnic0 DvsPortset-0          0.59    0.00       0.40    0.00   0.00   0.00
  50331654                 vmk2     vmnic0 DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331655                 vmk3     vmnic0 DvsPortset-0          0.00    0.00       0.00    0.00   0.00   0.00
  50331656 35663:web-sv-02a.eth     vmnic0 DvsPortset-0          3.96    0.01       3.57    0.01   0.00   0.00
  50331657          vdr-vdrPort     vmnic0 DvsPortset-0          2.18    0.00       0.00    0.00   0.00   0.00

From line 11 we can see that “web-sv-02a.eth0” has Port-ID “50331656”.

Verify that “web-sv-02a.eth0” is connected to VXLAN 5001:

esxcomp-01b # esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5001
Switch Port ID  VDS Port ID  VMKNIC ID
--------------  -----------  ---------
      50331656  69                   0
      50331657  vdrPort              0

From line 4 we see the VM connected to VXLAN 5001 on port ID 50331656.

At this point we have verified that the VMs are located as drawn in the topology. Now we can start the actual troubleshooting steps.

Is the problem in the physical network?

Our first step will be to find out whether the problem is in the physical space or the logical space.

TSHOT2

The easy way to find out is to ping from the VTEP on esxcomp-01a to the VTEP on esxcomp-01b; before pinging, let's find the VTEP IP addresses.

esxcomp-01a # esxcfg-vmknic -l
Interface  Port Group/DVPort   IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type         
vmk0       16                  IPv4      192.168.210.51                          255.255.255.0   192.168.210.255 00:50:56:09:08:3e 1500    65535     true    STATIC       
vmk1       26                  IPv4      10.20.20.51                             255.255.255.0   10.20.20.255    00:50:56:69:80:0f 1500    65535     true    STATIC       
vmk2       35                  IPv4      10.20.30.51                             255.255.255.0   10.20.30.255    00:50:56:64:70:9f 1500    65535     true    STATIC       
vmk3       44                  IPv4      192.168.250.51                          255.255.255.0   192.168.250.255 00:50:56:66:e2:ef 1600    65535     true    STATIC

From line 6 we can tell that the VTEP IP address (vmk3, MTU 1600) is 192.168.250.51.

Another command to find the VTEP IP address:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=Compute_VDS
Vmknic Name  Switch Port ID  VDS Port ID  Endpoint ID  VLAN ID  IP              Netmask        IP Acquire Timeout  Multicast Group Count  Segment ID
-----------  --------------  -----------  -----------  -------  --------------  -------------  ------------------  ---------------------  -------------
vmk3               50331655  44                     0        0  192.168.250.51  255.255.255.0                   0                      0  192.168.250.0

The same command on esxcomp-01b:

esxcomp-01b # esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=Compute_VDS
Vmknic Name  Switch Port ID  VDS Port ID  Endpoint ID  VLAN ID  IP              Netmask        IP Acquire Timeout  Multicast Group Count  Segment ID
-----------  --------------  -----------  -----------  -------  --------------  -------------  ------------------  ---------------------  -------------
vmk3               50331655  46                     0        0  192.168.250.53  255.255.255.0                   0                      0  192.168.250.0

The VTEP IP for esxcomp-01b is 192.168.250.53. Now let's add this info to our topology.

 

TSHOT3

Check VXLAN routing:

NSX uses a dedicated IP stack for VXLAN traffic, so we need to verify that the default gateway is configured correctly for that stack.

From esxcomp-01a:

esxcomp-01a # esxcli network ip route ipv4 list -N vxlan
Network        Netmask        Gateway        Interface  Source
-------------  -------------  -------------  ---------  ------
default        0.0.0.0        192.168.250.2  vmk3       MANUAL
192.168.250.0  255.255.255.0  0.0.0.0        vmk3       MANUAL

From esxcomp-01b:

esxcomp-01b # esxcli network ip route ipv4 list -N vxlan
Network        Netmask        Gateway        Interface  Source
-------------  -------------  -------------  ---------  ------
default        0.0.0.0        192.168.250.2  vmk3       MANUAL
192.168.250.0  255.255.255.0  0.0.0.0        vmk3       MANUAL

In this lab, the VTEPs of my two ESXi hosts live on the same L2 segment and share the same default gateway.

Ping from the VTEP on esxcomp-01a to the VTEP on esxcomp-01b.

The ping is sourced from the VXLAN IP stack, with a packet size of 1570 bytes and the don't-fragment bit set.

esxcomp-01a #  ping ++netstack=vxlan 192.168.250.53 -s 1570 -d
PING 192.168.250.53 (192.168.250.53): 1570 data bytes
1578 bytes from 192.168.250.53: icmp_seq=0 ttl=64 time=0.585 ms
1578 bytes from 192.168.250.53: icmp_seq=1 ttl=64 time=0.936 ms
1578 bytes from 192.168.250.53: icmp_seq=2 ttl=64 time=0.831 ms

--- 192.168.250.53 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.585/0.784/0.936 ms

The ping is successful.

If the ping fails with “-d” but works without it, you have an MTU problem; check the MTU on the physical switches. If you want to repeat this check without typing the pings by hand, see the sketch below.
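The MTU check can also be scripted from a management machine. This is only a minimal sketch, assuming Python with the paramiko library and SSH enabled on the ESXi host (neither is required by the manual workflow); the hostname and credentials are lab placeholders.

import paramiko

# Lab placeholders - adjust to your environment
ESX_HOST = "esxcomp-01a.corp.local"
REMOTE_VTEP = "192.168.250.53"
USER, PASSWORD = "root", "VMware1!"

def run(cmd):
    # Run a command on the ESXi host over SSH and return its output
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(ESX_HOST, username=USER, password=PASSWORD)
    _, stdout, _ = client.exec_command(cmd)
    out = stdout.read().decode()
    client.close()
    return out

# Ping the remote VTEP from the VXLAN stack, with and without the
# don't-fragment bit (-d); -s 1570 forces a VXLAN-sized payload
with_df = run(f"ping ++netstack=vxlan {REMOTE_VTEP} -s 1570 -d -c 3")
without_df = run(f"ping ++netstack=vxlan {REMOTE_VTEP} -s 1570 -c 3")

# Note the leading space so "100% packet loss" does not match
if " 0% packet loss" in with_df:
    print("VTEP path OK, MTU is fine")
elif " 0% packet loss" in without_df:
    print("MTU problem: large frames only pass when fragmentation is allowed")
else:
    print("No VTEP reachability at all - check routing/VLANs first")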

Because the VTEPs in this example share the same L2 segment, we can view the ARP entries for the other VTEPs:

From esxcomp-01a:

esxcomp-01a # esxcli network ip neighbor list -N vxlan
Neighbor        Mac Address        Vmknic    Expiry  State  Type
--------------  -----------------  ------  --------  -----  -----------
192.168.250.52  00:50:56:64:f4:25  vmk3    1173 sec         Unknown
192.168.250.53  00:50:56:67:d9:91  vmk3    1171 sec         Unknown
192.168.250.2   00:50:56:09:46:07  vmk3    1187 sec         Autorefresh

It looks like the physical layer is not the issue.

 

Verify NSX control plane

During NSX host preparation, NSX Manager installs VIB agents called the User World Agent (UWA) on the ESXi hosts.

The process responsible for communicating with the NSX Controllers is called netcpad.

The ESXi host uses its management VMkernel interface to create this secure channel over TCP/1234; the traffic is encrypted with SSL.

Part of the information netcpad sends to the NSX Controllers:

VMs: MAC and IP addresses.

VTEPs: MAC and IP addresses.

VXLAN: the VXLAN IDs.

Routing: routes learned from the DLR Control VM (explained in the next post).

TSHOT4

Based on this information, the Controllers learn the network state and build their directory services.

To learn how the Controller cluster works and how to fix problems in the cluster itself, see NSX Controller Cluster Troubleshooting.

For two VMs to be able to talk to each other we need a working control plane. In this lab we have three NSX Controllers.

Verification commands need to be run from both the ESXi and the Controller side.

NSX controllers IP address: 192.168.110.201, 192.168.110.202, 192.168.110.203

Control Plane verification from ESXi point of view:

Verify that esxcomp-01a has ESTABLISHED connections to the NSX Controllers (grep 1234 shows only TCP port 1234):

esxcomp-01a # esxcli network ip  connection list | grep 1234
tcp         0       0  192.168.210.51:54153  192.168.110.202:1234  ESTABLISHED     35185  newreno  netcpa-worker
tcp         0       0  192.168.210.51:34656  192.168.110.203:1234  ESTABLISHED     34519  newreno  netcpa-worker
tcp         0       0  192.168.210.51:41342  192.168.110.201:1234  ESTABLISHED     34519  newreno  netcpa-worker

Verify that esxcomp-01b has ESTABLISHED connections to the NSX Controllers:

esxcomp-01b # esxcli network ip  connection list | grep 1234
tcp         0       0  192.168.210.56:16580  192.168.110.202:1234  ESTABLISHED     34517  newreno  netcpa-worker
tcp         0       0  192.168.210.56:49434  192.168.110.203:1234  ESTABLISHED     34678  newreno  netcpa-worker
tcp         0       0  192.168.210.56:12358  192.168.110.201:1234  ESTABLISHED     34516  newreno  netcpa-worker

Example of a communication problem between an ESXi host and the NSX Controllers:

esxcli network ip  connection list | grep 1234
tcp         0       0  192.168.210.51:54153  192.168.110.202:1234  TIME_WAIT           0
tcp         0       0  192.168.210.51:34656  192.168.110.203:1234  FIN_WAIT_2      34519  newreno
tcp         0       0  192.168.210.51:41342  192.168.110.201:1234  TIME_WAIT           0

If we can't see ESTABLISHED connections, check:

1. IP connectivity from the ESXi host to all NSX Controllers.

2. If there is a firewall between the ESXi hosts and the NSX Controllers, TCP/1234 needs to be open.

3. Whether netcpad is running on the ESXi host:

esxcomp-01a # /etc/init.d/netcpad status
netCP agent service is not running

If netcpad is not running, start it with:

esxcomp-01a # /etc/init.d/netcpad start
Memory reservation set for netcpa
netCP agent service starts

Verify again:

esxcomp-01a # /etc/init.d/netcpad status
netCP agent service is running

 

Verify on esxcomp-01a that the control plane is enabled and the controller connection is up for VXLAN 5001:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------
    5003  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.202 (up)            2                0                0
    5001  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.201 (up)            2                3                0
    5000  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.202 (up)            1                3                0
    5002  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.203 (up)            1                2                0

Verify on esxcomp-01b that the control plane is enabled and the controller connection is up for VXLAN 5001:

esxcomp-01b # esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------
    5001  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.201 (up)            2                3                0
    5000  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.202 (up)            1                0                0
    5002  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.203 (up)            1                2                0
    5003  N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.202 (up)            1                0                0

Check that esxcomp-01a has learned the ARP entries of remote VMs on VXLAN 5001:

esxcomp-01a # esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5001
IP            MAC                Flags
------------  -----------------  --------
172.16.10.12  00:50:56:a6:a1:e3  00001101

From this output we can see that esxcomp-01a has learned the ARP info of web-sv-02a.

Check that esxcomp-01b has learned the ARP entries of remote VMs on VXLAN 5001:

esxcomp-01b # esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5001
IP            MAC                Flags
------------  -----------------  --------
172.16.10.11  00:50:56:a6:7a:a2  00010001

From this output we can see that esxcomp-01b has learned the ARP info of web-sv-01a.

What we can tell at this point:

esxcomp-01a:

Knows web-sv-01a is a VM running on VXLAN 5001, with IP 172.16.10.11 and MAC address 00:50:56:a6:7a:a2.

Its connection to the Controller cluster is up for VXLAN 5001.

esxcomp-01b:

Knows web-sv-02a is a VM running on VXLAN 5001, with IP 172.16.10.12 and MAC address 00:50:56:a6:a1:e3.

Its connection to the Controller cluster is up for VXLAN 5001.

So why can't web-sv-01a talk to web-sv-02a?

The answer to this question is another question: what does the NSX Controller know?

Control Plane verification from NSX Controller point of view:

We have three active Controllers, and one of them is elected to manage VXLAN 5001. Remember slicing?

To find out which Controller manages VXLAN 5001, SSH to one of the NSX Controllers, for example 192.168.110.202:

nsx-controller # show control-cluster logical-switches vni 5001
VNI      Controller      BUM-Replication ARP-Proxy Connections VTEPs
5001     192.168.110.201 Enabled         Enabled   0           0

Line 3 says that 192.168.110.201 manages VXLAN 5001, so the next commands will be run from 192.168.110.201:

nsx-controller # show control-cluster logical-switches vni 5001
VNI      Controller      BUM-Replication ARP-Proxy Connections VTEPs
5001     192.168.110.201 Enabled         Enabled   6           4

From this output we learn that VXLAN 5001 has 4 VTEPs connected and a total of 6 active connections.

At this point I would like to point you to an excellent blogger with lots of information about what happens under the hood in NSX.

His name is Dmitri Kalintsev; the link to his post: NSX for vSphere: Controller “Connections” and “VTEPs”.

From Dmitri's post:

“ESXi host joins a VNI in two cases:

  1. When a VM running on that host connects to VNI’s dvPg and its vNIC transitions into “Link Up” state; and
  2. When DLR kernel module on that host needs to route traffic to a VM on that VNI that’s running on a different host.”

We are not routing traffic between the VMs, so the DLR is not part of the game here.

Find the VTEP IP addresses connected to VXLAN 5001:

nsx-controller # show control-cluster logical-switches vtep-table 5001
VNI      IP              Segment         MAC               Connection-ID
5001     192.168.250.53  192.168.250.0   00:50:56:67:d9:91 5
5001     192.168.250.52  192.168.250.0   00:50:56:64:f4:25 3
5001     192.168.250.51  192.168.250.0   00:50:56:66:e2:ef 4
5001     192.168.150.51  192.168.150.0   00:50:56:60:bc:e9 6

From this output we can see that both VTEPs, esxcomp-01a (line 5) and esxcomp-01b (line 3), are seen by the NSX Controller on VXLAN 5001.

The MAC addresses in this output are the VTEPs' MACs.

Verify that the VMs' MAC addresses have been learned by the NSX Controller:

nsx-controller # show control-cluster logical-switches mac-table 5001
VNI      MAC               VTEP-IP         Connection-ID
5001     00:50:56:a6:7a:a2 192.168.250.51  4
5001     00:50:56:a6:a1:e3 192.168.250.53  5
5001     00:50:56:8e:45:33 192.168.150.51  6

Line 3 shows the MAC of web-sv-01a; line 4 shows the MAC of web-sv-02a.

Verify that the VMs' ARP entries have been learned by the NSX Controller:

 

nsx-controller # show control-cluster logical-switches arp-table 5001
VNI      IP              MAC               Connection-ID
5001     172.16.10.11    00:50:56:a6:7a:a2 4
5001     172.16.10.12    00:50:56:a6:a1:e3 5
5001     172.16.10.10    00:50:56:8e:45:33 6

Lines 3 and 4 show the exact IP/MAC pairs of web-sv-01a and web-sv-02a.

To understand how the Controller learned this info, read my post NSX-V IP Discovery.

Sometimes restarting the netcpad process can fix problems between an ESXi host and the NSX Controllers:

esxcomp-01a # /etc/init.d/netcpad restart
watchdog-netcpa: Terminating watchdog process with PID 4273913
Memory reservation released for netcpa
netCP agent service is stopped
Memory reservation set for netcpa
netCP agent service starts

Summary of controller verification:

The NSX Controller knows where the VMs are located, along with their IP and MAC addresses. It seems the control plane is working just fine.

 

Move VM to different ESXi host

In NSX-v, each ESXi host has its own UWA service daemon as part of the management and control plane; sometimes, when the UWA is not working as expected, VMs on that ESXi host will have connectivity issues.

The fast way to check this is to vMotion the non-working VMs to a different ESXi host; if the VMs start to work, we need to focus on the control plane of the original ESXi host.

In this scenario, even after I vMotioned my VM to a different ESXi host, the problem didn't go away.

 

Capture in the right spots:

The pktcap-uw command allows us to capture traffic at many different points in an NSX environment.

Before we start capturing all over the place, let's think about where the problem is likely to be.

When a VM connects to a logical switch, there are a few security services that a packet traverses; each service is represented by a different slot ID.

TSHOT5

SLOT 0: implements the vDS access list.

SLOT 1: the switch security module (swsec) captures DHCP ACK and ARP messages; this info is then forwarded to the NSX Controller.

SLOT 2: NSX Distributed Firewall.

We need to check whether VM traffic successfully passes the NSX Distributed Firewall, meaning slot 2.

The capture command will need the slot 2 filter name for web-sv-01a.

From esxcomp-01a:

esxcomp-01a # summarize-dvfilter
~~~snip~~~~
world 35888 vmm0:web-sv-01a vcUuid:'50 26 c7 cd b6 f3 f4 bc-e5 33 3d 4b 25 5c 62 77'
 port 50331657 web-sv-01a.eth0
  vNic slot 2
   name: nic-35888-eth0-vmware-sfw.2
   agentName: vmware-sfw
   state: IOChain Attached
   vmState: Detached
   failurePolicy: failClosed
   slowPathID: none
   filter source: Dynamic Filter Creation
  vNic slot 1
   name: nic-35888-eth0-dvfilter-generic-vmware-swsec.1
   agentName: dvfilter-generic-vmware-swsec
   state: IOChain Attached
   vmState: Detached
   failurePolicy: failClosed
   slowPathID: none
   filter source: Alternate Opaque Channel

We can see on line 4 that the VM name is web-sv-01a, on line 5 that the filter is applied at slot 2, and on line 6 the filter name: nic-35888-eth0-vmware-sfw.2.

The pktcap-uw -A option lists the supported capture points:

esxcomp-01a # pktcap-uw -A
Supported capture points:
        1: Dynamic -- The dynamic inserted runtime capture point.
        2: UplinkRcv -- The function that receives packets from uplink dev
        3: UplinkSnd -- Function to Tx packets on uplink
        4: Vmxnet3Tx -- Function in vnic backend to Tx packets from guest
        5: Vmxnet3Rx -- Function in vnic backend to Rx packets to guest
        6: PortInput -- Port_Input function of any given port
        7: IOChain -- The virtual switch port iochain capture point.
        8: EtherswitchDispath -- Function that receives packets for switch
        9: EtherswitchOutput -- Function that sends out packets, from switch
        10: PortOutput -- Port_Output function of any given port
        11: TcpipDispatch -- Tcpip Dispatch function
        12: PreDVFilter -- The DVFIlter capture point
        13: PostDVFilter -- The DVFilter capture point
        14: Drop -- Dropped Packets capture point
        15: VdrRxLeaf -- The Leaf Rx IOChain for VDR
        16: VdrTxLeaf -- The Leaf Tx IOChain for VDR
        17: VdrRxTerminal -- Terminal Rx IOChain for VDR
        18: VdrTxTerminal -- Terminal Tx IOChain for VDR
        19: PktFree -- Packets freeing point

The capture command supports sniffing traffic at many interesting points; with PreDVFilter and PostDVFilter (lines 14 and 15) we can sniff traffic before or after the filtering action.

Capture after the slot 2 filter:

esxcomp-01a # pktcap-uw --capture PostDVFilter --dvfilter nic-35888-eth0-vmware-sfw.2 --proto=0x1 -o web-sv-01a_after.pcap
The session capture point is PostDVFilter
The name of the dvfilter is nic-35888-eth0-vmware-sfw.2
The session filter IP protocol is 0x1
The output file is web-sv-01a_after.pcap
No server port specifed, select 784 as the port
Local CID 2
Listen on port 784
Accept...Vsock connection from port 1049 cid 2
Destroying session 25

Dumped 0 packet to file web-sv-01a_after.pcap, dropped 0 packets.

PostDVFilter = capture after the named filter.

--proto=0x1 = capture only ICMP packets.

--dvfilter = the filter name as shown by the summarize-dvfilter command.

-o = the file the captured traffic is written to.

From line 12 of this output we can tell that ICMP packets are not passing this filter, because 0 packets were dumped.

We found our smoking gun 🙂

Now capture before the slot 2 filter:

esxcomp-01a # pktcap-uw --capture PreDVFilter --dvfilter nic-35888-eth0-vmware-sfw.2 --proto=0x1 -o web-sv-01a_before.pcap
The session capture point is PreDVFilter
The name of the dvfilter is nic-35888-eth0-vmware-sfw.2
The session filter IP protocol is 0x1
The output file is web-sv-01a_before.pcap
No server port specifed, select 5782 as the port
Local CID 2
Listen on port 5782
Accept...Vsock connection from port 1050 cid 2
Dump: 6, broken : 0, drop: 0, file err: 0
Destroying session 26

Dumped 6 packet to file web-sv-01a_before.pcap, dropped 0 packets.

Now we can see that packets were dumped (“Dumped 6 packet to file”). We can open the captured file web-sv-01a_before.pcap:

esxcomp-01a # tcpdump-uw -r web-sv-01a_before.pcap
reading from file web-sv-01a_before.pcap, link-type EN10MB (Ethernet)
20:15:31.389158 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18628, length 64
20:15:32.397225 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18629, length 64
20:15:33.405253 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18630, length 64
20:15:34.413356 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18631, length 64
20:15:35.421284 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18632, length 64
20:15:36.429219 IP 172.16.10.11 > 172.16.10.12: ICMP echo request, id 3144, seq 18633, length 64

Voilà, the NSX DFW is blocking the traffic.

And now from the NSX GUI:

TSHOT6

Looking back on this article, I intentionally skipped step 3, “Configuration issue”.

If we had checked the configuration settings, we would have noticed this problem immediately.

 

 

Summary of all CLI Commands for this post:

ESXi Commands:

esxtop
esxcfg-vmknic -l
esxcli network vswitch dvs vmware vxlan list
esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5001
esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=Compute_VDS
esxcli network ip route ipv4 list -N vxlan
esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS
esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5001
esxcli network ip connection list | grep 1234
ping ++netstack=vxlan 192.168.250.53 -s 1570 -d
/etc/init.d/netcpad (status|start|restart)
pktcap-uw --capture PostDVFilter --dvfilter nic-35888-eth0-vmware-sfw.2 --proto=0x1 -o web-sv-01a_after.pcap

 

NSX Controller Commands:

show control-cluster logical-switches vni 5001
show control-cluster logical-switches vtep-table 5001
show control-cluster logical-switches mac-table 5001
show control-cluster logical-switches arp-table 5001

 


Working with NSXv API

The NSX RESTful API allows us to interact with NSX Manager: read NSX state and change its configuration. Some NSX tasks can only be done via the API and not the GUI.

The protocol used to communicate with NSX Manager is HTTPS; over this secure channel, request and response data is sent in XML format.

In this blog post we will use the Mozilla Firefox web browser as a REST client, but the same calls can also be scripted (see the sketch below).
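For readers who prefer scripting over clicking through a browser, the client setup described below (basic authentication plus a Content-Type: application/xml header) translates directly into a few lines of Python. A minimal sketch, assuming the requests library; the hostname and credentials are lab placeholders:

import requests

NSX_MGR = "https://nsxmgr-l-01a.corp.local"

session = requests.Session()
session.auth = ("admin", "VMware1!")                         # NSX Manager credentials
session.headers.update({"Content-Type": "application/xml"})
session.verify = False   # NSX Manager ships with a self-signed certificate

# Any GET works as a sanity check; here we read the transport zone
# scopes, the same call used later in this post
resp = session.get(f"{NSX_MGR}/api/2.0/vdn/scopes")
print(resp.status_code)  # 200 means the request succeeded
print(resp.text)         # raw XML response body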

 

Install Mozilla REST Client:

The REST Client can be downloaded from:

https://addons.mozilla.org/en-US/firefox/addon/restclient/

To install the REST Client, click the “Open Menu” button and then click the “Add-ons” button:

API3

Click on “Install add-on from file”:

API4

Import the REST Client file from the folder you downloaded it to and click “Install”.

API5

Note: the browser will restart.

To open the REST Client click on the Red button:

API6

One-time configuration of the REST Client:

Choose “Basic authentication”

API75

Type in the NSX Manager credentials and click Okay.

API65

From the menu, select “Headers” and then “Custom Header”:

API7

In Name, type “Content-Type”.

In Value, type “application/xml”.

Click “Save to favorite” and then Okay.

API8

Now we can start working with the REST Client.

There are a few different methods to read and configure NSX Manager:

API111

The methods we will use most frequently with the REST Client are:

GET: to read information from NSX Manager.

POST: to change configuration in NSX Manager.

DELETE: to delete configuration in NSX Manager.

 

In the URL field we type the NSX Manager URL:

https://nsxmgr-l-01a.corp.local/

Our first exercise will be to configure a logical switch.

Create Logical Switch with API

To do this we need to know which transport zone this logical switch will belong to.

In my lab we have two transport zones: Global-Transport-Zone and DMZ-Transport-Zone.

API16

NSX uses a different scope ID to represent each transport zone.

Assuming we would like to create the logical switch in DMZ-Transport-Zone, we need to find out its transport zone scope ID.

Finding this scope can be done with the API call:

https://nsxmgr-l-01a.corp.local/api/2.0/vdn/scopes

Click on the red “SEND” button.

Status code 200 means we successfully retrieved the API information.

API17

After clicking on the Response Body we get the results; for DMZ-Transport-Zone the scope is vdnscope-2.

API18

For Global-Transport-Zone we get vdnscope-1.

API19
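This lookup can also be scripted by parsing the scopes response. A sketch reusing the session from the earlier example, assuming each transport zone is returned as a vdnScope element carrying objectId and name children (as the response bodies in the screenshots suggest):

import xml.etree.ElementTree as ET

resp = session.get(f"{NSX_MGR}/api/2.0/vdn/scopes")
resp.raise_for_status()

# Build a transport-zone-name -> scope-id map from the XML response
scopes = {}
for scope in ET.fromstring(resp.text).iter("vdnScope"):
    scopes[scope.findtext("name")] = scope.findtext("objectId")

print(scopes.get("DMZ-Transport-Zone"))     # expect "vdnscope-2"
print(scopes.get("Global-Transport-Zone"))  # expect "vdnscope-1"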

Since we want to create this logical switch in DMZ-Transport-Zone, we will use vdnscope-2 in the POST API call.

Create the logical switch in DMZ-Transport-Zone:

The Method: POST

URL: https://nsxmgr-l-01a.corp.local/api/2.0/vdn/scopes/vdnscope-2/virtualwires

The Body is the payload with the actual configuration.

Body:

<virtualWireCreateSpec>
<name>DMZ-Logical-Switch-01</name>
<description>Created via REST API</description>
<tenantId>virtual wire tenant</tenantId>
<controlPlaneMode>UNICAST_MODE</controlPlaneMode>
</virtualWireCreateSpec>

 

After clicking SEND, instead of success we get an error: “415 Unsupported Media Type”.

The reason for this is a typo in the “Content-Type” header.

API20

My typo: I wrote “Con: application/xml” instead of “Content-Type: application/xml”.

API13

Another issue can lead to the same error:

Ensure that the SSL certificate used by NSX Manager is accepted by the client (browser or fat client) before making the call. By default, NSX Manager uses a self-signed certificate that must be accepted by the user; in such cases, make sure the NSX Manager certificate has been accepted before making the REST API call.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2094596

 

The REST API uses HTTP response status codes to inform clients of a request's result (as defined in RFC 2616). Five categories are defined:

1xx: Informational. Communicates transfer-protocol-level information.

2xx: Success. Indicates that the client's request was accepted successfully.

3xx: Redirection. Indicates that the client must take some additional action to complete the request.

4xx: Client Error. This category of status codes points the finger at the client.

5xx: Server Error. The server takes responsibility for these status codes.
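When scripting, these categories map naturally onto a simple response check; a small sketch, continuing the session from the earlier examples:

resp = session.get(f"{NSX_MGR}/api/2.0/vdn/scopes")
if not 200 <= resp.status_code < 300:
    # The first digit of the status code gives the category from the list above
    print(f"{resp.status_code // 100}xx response: {resp.status_code} {resp.reason}")
    print(resp.text)      # the response body usually carries the error details
resp.raise_for_status()   # or simply raise on any 4xx/5xx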

 

After fixing the typo and clicking “SEND” we get status 201:

API14

Each logical switch has a unique virtual wire ID:

API21

From the NSX GUI Logical Switches tab we can see that we successfully created the logical switch:

API15
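The whole exercise can also be scripted end to end. A minimal sketch, reusing the authenticated session from the first example; the payload is the virtualWireCreateSpec shown above, and (as the response screenshot suggests) the new virtual wire ID comes back in the response body:

# XML payload - identical to the Body used in the browser client above
payload = """<virtualWireCreateSpec>
<name>DMZ-Logical-Switch-01</name>
<description>Created via REST API</description>
<tenantId>virtual wire tenant</tenantId>
<controlPlaneMode>UNICAST_MODE</controlPlaneMode>
</virtualWireCreateSpec>"""

resp = session.post(f"{NSX_MGR}/api/2.0/vdn/scopes/vdnscope-2/virtualwires",
                    data=payload)
print(resp.status_code)     # expect 201 Created
virtualwire_id = resp.text  # e.g. "virtualwire-5"
print(virtualwire_id)

# Cleanup would be a DELETE against the same object, for example:
# session.delete(f"{NSX_MGR}/api/2.0/vdn/virtualwires/{virtualwire_id}")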

Link to the official API guide:

http://pubs.vmware.com/NSX-61/topic/com.vmware.ICbase/PDF/nsx_61_api.pdf

 

 
