NSX Load Balancing

This next overview of Load Balancing  was taken from great work of Max Ardica and Nimish Desai in the official NSX Design Guide:

Overview

Load Balancing is another network service available within NSX that can be natively enabled on the NSX Edge device. The two main drivers for deploying a load balancer are scaling out an application (through distribution of workload across multiple servers), as well as improving its high-availability characteristics

NSX Load Balancing

NSX Load Balancing

The NSX load balancing service is specially designed for cloud with the following characteristics:

  • Fully programmable via API
  • Same single central point of management/monitoring as other NSX network services

The load balancing services natively offered by the NSX Edge satisfies the needs of the majority of the application deployments. This is because the NSX Edge provides a large set of functionalities:

  • Support any TCP applications, including, but not limited to, LDAP, FTP, HTTP, HTTPS
  • Support UDP application starting from NSX SW release 6.1.
  • Multiple load balancing distribution algorithms available: round-robin, least connections, source IP hash, URI
  • Multiple health checks: TCP, HTTP, HTTPS including content inspection
  • Persistence: Source IP, MSRDP, cookie, ssl session-id
  • Connection throttling: max connections and connections/sec
  • L7 manipulation, including, but not limited to, URL block, URL rewrite, content rewrite
  • Optimization through support of SSL offload

Note: the NSX platform can also integrate load-balancing services offered by 3rd party vendors. This integration is out of the scope for this paper.

In terms of deployment, the NSX Edge offers support for two types of models:

  • One-arm mode (called proxy mode): this scenario is highlighted in Figure below and consists in deploying an NSX Edge directly connected to the logical network it provides load-balancing services for.
One-Arm Mode Load Balancing Services

One-Arm Mode Load Balancing Services

The one-armed load balancer functionality is shown above:

  1. The external client sends traffic to the Virtual IP address (VIP) exposed by the load balancer.
  2. The load balancer performs two address translations on the original packets received from the client: Destination NAT (D-NAT) to replace the VIP with the IP address of one of the servers deployed in the server farm and Source NAT (S-NAT) to replace the client IP address with the IP address identifying the load-balancer itself. S-NAT is required to force through the LB the return traffic from the server farm to the client.
  3. The server in the server farm replies by sending the traffic to the LB (because of the S-NAT function previously discussed).

The LB performs again a Source and Destination NAT service to send traffic to the external client leveraging its VIP as source IP address.

The advantage of this model is that it is simpler to deploy and flexible as it allows deploying LB services (NSX Edge appliances) directly on the logical segments where they are needed without requiring any modification on the centralized NSX Edge providing routing communication to the physical network. On the downside, this option requires provisioning more NSX Edge instances and mandates the deployment of Source NAT that does not allow the servers in the DC to have visibility into the original client IP address.

Note: the LB can insert the original IP address of the client into the HTTP header before performing S-NAT (a function named “Insert X-Forwarded-For HTTP header”). This provides the servers visibility into the client IP address but it is obviously limited to HTTP traffic.

Inline mode (called transparent mode) requires instead deploying the NSX Edge inline to the traffic destined to the server farm. The way this works is shown in Figure below.

Two-Arms Mode Load Balancing Services

Two-Arms Mode Load Balancing Services

    1. The external client sends traffic to the Virtual IP address (VIP) exposed by the load balancer.
    2. The load balancer (centralized NSX Edge) performs only Destination NAT (D-NAT) to replace the VIP with the IP address of one of the servers deployed in the server farm.
    3. The server in the server farm replies to the original client IP address and the traffic is received again by the LB since it is deployed inline (and usually as the default gateway for the server farm).
    4. The LB performs Source NAT to send traffic to the external client leveraging its VIP as source IP address.

    This deployment model is also quite simple and allows the servers to have full visibility into the original client IP address. At the same time, it is less flexible from a design perspective as it usually forces using the LB as default gateway for the logical segments where the server farms are deployed and this implies that only centralized (and not distributed) routing must be adopted for those segments. It is also important to notice that in this case LB is another logical service added to the NSX Edge already providing routing services between the logical and the physical networks. As a consequence, it is recommended to increase the form factor of the NSX Edge to X-Large before enabling load-balancing services.

     

    In terms of scalability and throughput figures, the NSX load balancing services offered by each single NSX Edge can scale up to (best case scenario):

    • Throughput: 9 Gbps
    • Concurrent connections: 1 million
    • New connections per sec: 131k

     

    In below are some deployment examples of tenants with different applications and different load balancing needs. Notice how each of these applications is hosted on the same Cloud with the network services offered by NSX.

Deployment Examples of NSX Load Balancing

Deployment Examples of NSX Load Balancing

Two final important points to highlight:

  • The load balancing service can be fully distributed across This brings multiple benefits:
  • Each tenant has its own load balancer.
  • Each tenant configuration change does not impact other tenants.
  • Load increase on one tenant load-balancer does not impact other tenants load-balancers scale.
  • Each tenant load balancing service can scale up to the limits mentioned above.

Other network services are still fully available

  • The same tenant can mix its load balancing service with other network services such as routing, firewalling, VPN.

 

One Arm Load Balance Lab Topology

In this One Arm Load Balance Lab Topology we have a 3-tiers application built from:

Web servers: web-sv-01a (172.16.10.11), web-sv-02a (172.16.10.12)

App: app-sv-01a (172.16.20.11)

DB: db-sv-01a (172.16.30.11)

We will add to this lab NSX Edge service gateway (ESG) for load balancer function.

The ESG (highlighted with the red line) is deployed in one-arm mode and exposes the VIP 172.16.10.10 to load-balance traffic to the Web-Tier-01 segment.

One-Armed Lab topology

 

Configure One Arm Load Balance

Create NSX Edge gateway:

One-Arem-1

Select Edge Service Gateway (ESG):
One-Arem-2

Set the Admin password, enable SSH and Auto rule:

One-Arem-3

Install the ESG in Management Cluster:

One-Arem-4

In our lab appliance size is Compact, but we should choose the right size according to amount of traffic expected:

One-Arem-5

Configure the Edge interface and IP address; since this is one-arm mode we have only one interface:

One-Arem-6

Create default gateway

One-Arem-8

Configure default accept fw rule:

One-Arem-9

Complete the installation:

One-Arem-10

Verify ESG is deployed::

One-Arem-11

Enable Load Balance in the ESG, go to Load Balance and click Edit:

One-Arem-12

Check mark “Enable Load Balancer”

One-Arem-13

Create the application profile:

One-Arem-14

Add a name, in the Type select HTTPS and Enable SSL Passthrough:

One-Arem-15

Create the pool:

One-Arem-16

In the Algorithm select ROUND-ROBIN, monitor is default https, and add two servers member to monitor:

One-Arem-16h

To add Members click on the + icon, the port we monitor is 443:

One-Arem-17

We need then to create the VIP:

One-Arem-18

In this step we glue all the configuration parts, tie the application profile to pool and give it the Virtual IP address:

One-Arem-19

Now we can check that the load balancer is actually working by connecting to the VIP address with a client web browser.

In the web browser, we point to the VIP address 172.16.10.10.

The results is to hit 172.16.10.11 web-sv-01a:

One-Arem-verification-1

When we try to refresh our web browser client we see we hit 172.16.10.12 web-sv-02a :

One-Arem-verification-2

Troubleshooting One Arm Load Balance

General Loadbalancer troubleshooting workflow

Review the configuration through UI

Check the pool member status through UI

Do online troubleshooting via CLI:

  • Check LB engine status (L4/L7)
  • Check LB objects statistics (vips, pools, members)
  • Check Service Monitor status (OK, WARNING, CRITICAL)
  • Check system log message (# show log)
  • Check LB L4/L7 session table
  • Check LB L7 sticky-table status

 

Check the configuration through UI

 

 

One-Arem-TSHOT-1

 

  1. Check the pool member status through UI:

 

One-Arem-TSHOT-2

Possible errors discovered:

  1. 80/443 port might be used by other services (e.g. sslvpn);
  2. Member port and monitor port are miss configured hence health check failed.
  3. Member in WARNING state should be treated as DOWN.
  4. L4 LB is used when:
    a) TCP/HTTP protocol;
    b) no persistence settings and L7 settings;
    c) accelerateEnable is true;
  5. Pool is in transparent mode but Edge doesn’t sit in the return pat

Do online troubleshooting via CLI:

Check LB engine status (L4/L7)

# show service loadbalancer

Check LB objects statistics (vips, pools, members)

# show service loadbalancer virtual [vip-name]

# show service loadbalancer pool [poo-name]

Check Service Monitor status (OK, WARNING, CRITICAL)

# show service loadbalancer monitor

Check system log message

# show log

Check LB session table

# show service loadbalancer session

Check LB L7 sticky-table status

# show service loadbalancer table

 

 

One-Arm-LB-0> show service loadbalancer
<cr>
error Show loadbalancer Latest Errors information.
monitor Show loadbalancer HealthMonitor information.
pool Show loadbalancer pool information.
session Show loadbalancer Session information.
table Show loadbalancer Sticky-Table information.
virtual Show loadbalancer virtualserver information.

#########################################################

One-Arm-LB-0> show service loadbalancer
———————————————————————–
Loadbalancer Services Status:

L7 Loadbalancer : running
Health Monitor : running

#########################################################

One-Arm-LB-0> show service loadbalancer monitor
———————————————————————–
Loadbalancer HealthMonitor Statistics:

POOL                               MEMBER                                  HEALTH STATUS
Web-Servers-Pool-01  web-sv-02a_172.16.10.12   default_https_monitor:OK
Web-Servers-Pool-01  web-sv-01a_172.16.10.11   default_https_monitor:OK
One-Arm-LB-0>

##########################################################

One-Arm-LB-0> show service loadbalancer virtual
———————————————————————–
Loadbalancer VirtualServer Statistics:

VIRTUAL Web-Servers-VIP
| ADDRESS [172.16.10.10]:443
| SESSION (cur, max, total) = (0, 3, 35)
| RATE (cur, max, limit) = (0, 6, 0)
| BYTES in = (17483), out = (73029)
+->POOL Web-Servers-Pool-01
| LB METHOD round-robin
| LB PROTOCOL L7
| Transparent disabled
| SESSION (cur, max, total) = (0, 3, 35)
| BYTES in = (17483), out = (73029)
+->POOL MEMBER: Web-Servers-Pool-01/web-sv-01a_172.16.10.11, STATUS: UP
| | STATUS = UP, MONITOR STATUS = default_https_monitor:OK
| | SESSION (cur, max, total) = (0, 2, 8)
| | BYTES in = (8882), out = (43709)
+->POOL MEMBER: Web-Servers-Pool-01/web-sv-02a_172.16.10.12, STATUS: UP
| | STATUS = UP, MONITOR STATUS = default_https_monitor:OK
| | SESSION (cur, max, total) = (0, 1, 7)
| | BYTES in = (7233), out = (29320)

####################################################################
One-Arm-LB-0> show service loadbalancer pool
———————————————————————–
Loadbalancer Pool Statistics:

POOL Web-Servers-Pool-01
| LB METHOD round-robin
| LB PROTOCOL L7
| Transparent disabled
| SESSION (cur, max, total) = (0, 3, 35)
| BYTES in = (17483), out = (73029)
+->POOL MEMBER: Web-Servers-Pool-01/web-sv-01a_172.16.10.11, STATUS: UP
| | STATUS = UP, MONITOR STATUS = default_https_monitor:OK
| | SESSION (cur, max, total) = (0, 2, 8)
| | BYTES in = (8882), out = (43709)
+->POOL MEMBER: Web-Servers-Pool-01/web-sv-02a_172.16.10.12, STATUS: UP
| | STATUS = UP, MONITOR STATUS = default_https_monitor:OK
| | SESSION (cur, max, total) = (0, 1, 7)
| | BYTES in = (7233), out = (29320)

##########################################################################

One-Arm-LB-0> show service loadbalancer session
———————————————————————–
L7 Loadbalancer Current Sessions:

0x5fe50a2b230: proto=tcpv4 src=192.168.110.10:49392 fe=Web-Servers-VIP be=Web-Servers-Pool-01 srv=web-sv-01a_172.16.10.11 ts=08 age=8s calls=3 rq[f=808202h,i=0,an=00h,rx=4m53s,wx=,ax=] rp[f=008202h,i=0,an=00h,rx=4m53s,wx=,ax=] s0=[7,8h,fd=13,ex=] s1=[7,8h,fd=14,ex=] exp=4m52s
0x5fe50a22960: proto=unix_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=09 age=0s calls=2 rq[f=c08200h,i=0,an=00h,rx=20s,wx=,ax=] rp[f=008002h,i=0,an=00h,rx=,wx=,ax=] s0=[7,8h,fd=1,ex=] s1=[7,0h,fd=-1,ex=] exp=20s
———————————————————————–

 

Disconnect web-sv-01a_172.16.10.11 from the network

 

 

One-Arem-TSHOT-3

From the GUI we can see the effect in members pool status:

One-Arem-TSHOT-4

 

One-Arm-LB-0> show service loadbalancer virtual
———————————————————————–
Loadbalancer VirtualServer Statistics:

VIRTUAL Web-Servers-VIP
| ADDRESS [172.16.10.10]:443
| SESSION (cur, max, total) = (0, 3, 35)
| RATE (cur, max, limit) = (0, 6, 0)
| BYTES in = (17483), out = (73029)
+->POOL Web-Servers-Pool-01
| LB METHOD round-robin
| LB PROTOCOL L7
| Transparent disabled
| SESSION (cur, max, total) = (0, 3, 35)
| BYTES in = (17483), out = (73029)
+->POOL MEMBER: Web-Servers-Pool-01/web-sv-01a_172.16.10.11, STATUS: DOWN
| | STATUS = DOWN, MONITOR STATUS = default_https_monitor:CRITICAL
| | SESSION (cur, max, total) = (0, 2, 8)
| | BYTES in = (8882), out = (43709)
+->POOL MEMBER: Web-Servers-Pool-01/web-sv-02a_172.16.10.12, STATUS: UP
| | STATUS = UP, MONITOR STATUS = default_https_monitor:OK
| | SESSION (cur, max, total) = (0, 1, 7)
| | BYTES in = (7233), out = (29320)

NSX L2 Bridging

Overview

This next overview of L2 Bridging  was taken from great work of Max Ardica and Nimish Desai in the official NSX Design Guide:

There are several circumstances where it may be required to establish L2 communication between virtual and physical workloads. Some typical scenarios are (not exhaustive list):

  • Deployment of multi-tier applications: in some cases, the Web, Application and Database tiers can be deployed as part of the same IP subnet. Web and Application tiers are typically leveraging virtual workloads, but that is not the case for the Database tier where bare-metal servers are commonly deployed. As a consequence, it may then be required to establish intra-subnet (intra-L2 domain) communication between the Application and the Database tiers.
  • Physical to virtual (P-to-V) migration: many customers are virtualizing applications running on bare metal servers and during this P-to-V migration it is required to support a mix of virtual and physical nodes on the same IP subnet.
  • Leveraging external physical devices as default gateway: in such scenarios, a physical network device may be deployed to function as default gateway for the virtual workloads connected to a logical switch and a L2 gateway function is required to establish connectivity to that gateway.
  • Deployment of physical appliances (firewalls, load balancers, etc.).

To fulfill the specific requirements listed above, it is possible to deploy devices performing a “bridging” functionality that enables communication between the “virtual world” (logical switches) and the “physical world” (non virtualized workloads and network devices connected to traditional VLANs).

NSX offers this functionality in software through the deployment of NSX L2 Bridging allowing VMs to be connected at layer 2 to a physical network (VXLAN to VLAN ID mapping), even if the hypervisor running the VM is not physically connected to that L2 physical network.

L2 Bridge topology

 

Figure above shows an example of L2 bridging, where a VM connected in logical space to the VXLAN segment 5001 needs to communicate with a physical device deployed in the same IP subnet but connected to a physical network infrastructure (in VLAN 100). In the current NSX-v implementation, the VXLAN-VLAN bridging configuration is part of the distributed router configuration; the specific ESXi hosts performing the L2 bridging functionality is hence the one where the control VM for that distributed router is running. In case of failure of that ESXi host, the ESXi hosting the standby Control VM (which gets activated once it detects the failure of the Active one) would take the L2 bridging function.

Independently from the specific implementation details, below are some important deployment considerations for the NSX L2 bridging functionality:

  • The VXLAN-VLAN mapping is always performed in 1:1 fashion. This means traffic for a given VXLAN can only be bridged to a specific VLAN, and vice versa.
  • A given bridge instance (for a specific VXLAN-VLAN pair) is always active only on a specific ESXi host.
  • However, through configuration it is possible to create multiple bridges instances (for different VXLAN-VLAN pairs) and ensure they are spread across separate ESXi hosts. This improves the overall scalability of the L2 bridging function.
  • The NSX Layer 2 bridging data path is entirely performed in the ESXi kernel, and not in user space. Once again, the Control VM is only used to determine the ESXi host where a given bridging instance is active, and not to perform the bridging function.

 

 

Configure L2 Bridge

In this scenario we would like to Bridge Between App VM connected to VXLAN 5002 to virtual machine connected to VLAN 100.

Create Bridge 1

My current Logical Switch configuration:

Logical Switch table

We have pre-configured a VLAN-backed port group for VLAN 100:

Port group

Bridging configuration is done at the DLR level. In this specific example, the DLR name is Distributed-Router:

Double Click on the edge-1:

DLR1

 

Click on the Bridging and then green + button:

DLR2

Type Bridge Name, Logical Switch ID and Port-Group name:

DLR3

 

Click OK and Publish:

DLR4

 

Now VM on Logical Switch App-Tier-01 can communicate with Physical or virtual machine on VLAN 100.

 

Design Consideration

Currently in NSX-V 6.1 we can’t enable routing on the VXLAN logical switch that is bridged to a VLAN.

In other words, the default gateway for devices connected to the VLAN can’t be configured on the distributed logical router:

None working  L2 Bridge Topology

None working L2 Bridge Topology

So how can VM in VXLAN 5002 communicate with VXLAN 5001?

The big difference is VXLAN 5002 is no longer connected to the DLR LIF, but it is connected instead to the NSX Edge.

Working Bridge Topology

Redundancy

DLR Control VM can work in high availability mode, if the Active DLR control VM fails, the standby Control VM takes over, which means the Bridge instance will move to a new ESXi host location.

HA

 

Bridge Troubleshooting:

Most issues I ran into was that the bridged VLAN was missing on the trunk interface configured on the physical switch.

In the figure below:

  • Physical server is connected to VLAN 100, App VM connected to VXLAN 5002 in esx-01b.
  • Active DLR control VM is located at esx-02a, so the bridging function will be active in this ESXi host.
  • Both ESXi hosts have two physical nics: vmnic2 and vmnic3.
  • Transport VLAN carries all VNI (VXLAN’s) traffic and is forwarded on the physical switch in VLAN 20.
  • On physical switch-2 port E1/1 we must configure trunk port and allow both VLAN 100 and VLAN 20.

Bridge and Trunk configuration

Note: Port E1/1 will carry both VXLAN and VLAN traffic. 

 

 

 

Find Where Bridge is Active:

We need to know where the Active DLR Control VM is located (if we have HA). Inside this ESXi host the Bridging happens in kernel space. The easy way to find it is to look at “Configuration” section in the “Manage” tab.

Note: When we powered off the DLR Control VM (if HA is not enabled), the bridging function on this ESXi host will stop to prevent loop.

DLR5We can see that Control VM located in esx-02a.corp.local

SSH to this esxi host,  find the Vdr Name of the DLR Control VM:

xxx-xxx -I -l

VDR Instance Information :
—————————

Vdr Name: default+edge-1
Vdr Id: 1460487509
Number of Lifs: 4
Number of Routes: 5
State: Enabled
Controller IP: 192.168.110.201
Control Plane IP: 192.168.110.52
Control Plane Active: Yes
Num unique nexthops: 1
Generation Number: 0
Edge Active: Yes

Now we know that “default+edge-1” is the VDR name.

 

xxx-xxx -b –mac default+edge-1

###################################################################################################

~ # xxx-xxx -b –mac default+edge-1

VDR ‘default+edge-1’ bridge ‘Bridge_App_VLAN100’ mac address tables :
Network ‘vxlan-5002-type-bridging’ MAC address table:
total number of MAC addresses: 0
number of MAC addresses returned: 0
Destination Address Address Type VLAN ID VXLAN ID Destination Port Age
——————- ———— ——- ——– —————- —
Network ‘vlan-100-type-bridging’ MAC address table:
total number of MAC addresses: 0
number of MAC addresses returned: 0
Destination Address Address Type VLAN ID VXLAN ID Destination Port Age
——————- ———— ——- ——– —————- —

###################################################################################################

From this output we can see there is no any mac address learning ,

After connect VM to Logical Switch App-Tier-01 and ping VM in VLAN 100.

Now we can see mac address from both VXLAN 5002 and VLAN100:

Bridge TSHOOT

 

 

 

 

NSX Role Based Access Control

One of the most challenging problems in managing large networks is the complexity of security administration.

“Role-based access control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within an enterprise. In this context, access is the ability of an individual user to perform a specific task, such as view, create, or modify a file. Roles are defined according to job competency, authority, and responsibility within the enterprise”

Within NSX we have four built in roles, We can map User or Group to one of the NSX Role. but i think Instead of assigning roles to individual users the preferred way is to assigning role to group.

Organizations create user groups for proper user management. After integration with SSO, NSX Manager can get the details of groups to which a user belongs to.

NSX Roles

Within NSX Manager we have four pre built RBAC roles cover different nsx permission and area in NSX environment.

The four NSX built in roles are: Auditor, Security Administrator, NSX administrator and Enterprise Administrator:

NSX RBAC Diagram

NSX RBAC Diagram

Configure the Lookup Service in NSX Manager

Whenever we want to assign role on NSX, we can assign role to SSO User or Group. When Lookup service is not configured then the group based role assignment would not work i.e the user from that group would not be able to login to NSX.

The reason is we cannot fetch any group information from the SSO server. The group based authentication provider is only available when Lookup service is configured. User login where the user is explicitly assigned role on NSX will not be affected. This means that the customer has to individually assign roles to the users and would not be able to take advantage of SSO groups.

For NSX, vCenter SSO server is one of the identity provider for authentication. For authentication on NSX, prerequisite is that the user / group has to be assigned role on NSX.

NSX Manager Lookup Service

NSX Manager Lookup Service

Note: NTP/DNS must configure on the NSX Manager for lookup service to work.

Note: The domain account must have AD read permission for all objects in the domain tree. 

Configure Active Directory Groups

In this blog i will use Microsoft Active directory  as user Identity source.  in “Active Directory Users and Computers” i created four different groups. The groups will have the same name is the NSX roles to make life easier:

Auditor, Security Administrator, NSX Administrator, Enterprise Administrator.

AD Groups

AD Groups

We create four A/D users and Add each user to different A/D group. for example nsxadmin user:

the user nsxadmin is associate with the group NSX Administrator. the association done by the Add button:

AD user

AD user

Same way i will associate a others users to A/D groups:

username:     groups:

auditor1      ->  Auditor

secadmin ->   Security Administrator

nsxadmin ->  NSX Administrator

entadmin ->  Enterprise Administrator

Connect Directory Domain to NSX Manager.

Go to “Network & Security” tab double click on the “NSX Manager”

map ad to nsx manager role 1

map ad to nsx manager role 1

Double click on “192.168.110.42” icon:

map ad to nsx manager role 2

Note: Configure Domain is not needed for RBAC, only if we want to use identity firewall rules base of user or group.

Go to “Manage -> “Domains” -> Click on the green Plus button:

map ad to nsx manager role 8

Fill Name and NetBIOS name fields with appropriate information of your Domain Name and NetBIOS name:

In My example my domain name is corp.local:

map ad to nsx manager role 9

Enter LDAP (i.e AD) IP address or hostname and domain account (username and password):

map ad to nsx manager role 10

Configuring LDAP option task  can be done via direct API call to bypass the Event Log Access described in the next steps).

Click on next.

Event Log Access:

In case we need to create NSX firewall rule with user identity based on AD groups. We will need to allow the NSX Manager read Active Directory “Security Event Log”. This logs contain Active Directory users logon/logoff from to domain. We use this information to bind the AD user  to an IP address.

NSX need access to “Event Log” provide dFW with user identity in one of the two case:

  1. The user logon to VM that doesn’t running VMtools.
  2. The user logon to the domain from PC located on physical environment.

BTW users login to to VM with VMtools up and running , we do not need the “Security Event Log” to bind the user to IP.

Permissions for the user to read logon/logoff events:

Windows 2008 or later domain servers:

Add the account to the Event Log Readers group. If you are using the on-device

User-ID agent, the account must also be a member of the Distributed COM Users Group.

 

 Windows 2003 domain servers:

Assign Manage Auditing and

Security Logs permissions through group policy

In both of this cases NSX will need to access the AD with read permissions for security event logs, the protocol using to read this information are CIFS or WMI.

During this process NSX  collecting  the following microsoft event ID:

For windows 2008/2012 – Event ID: 4624

For Windows 2003 – Event ID: 540

NSX will “Copy” this Event access log and from A/D and parse the data inside the nsx manager appliance.

map ad to nsx manager role 11

Click Next and Finish:

map ad to nsx manager role 12

Mapping Active Directory  Groups to NSX Managers Roles

Note: This step is must for NSX RBAC to work. 

Now we can map Active Directory groups to pre-built NSX Manager roles.

Go to “Manage -> “Users” -> Click on the green Plus button:

map ad to nsx manager role 3

Here we can select if we want to map specific A/D user to NSX Role or A/D Group to Role.

map ad to nsx manager role 4

In this blog i will use A/D group, we create A/D group called auditor. The format to input here is:

“group_name”@domain.name.  let’s start with auditor group, this group is “Read Only” permission:

map ad to nsx manager role 5

Select one of the NSX Role, for Auditor A/D group we chose Auditor

map ad to nsx manager role 6

We can limit the scope this group can work inside nsx manager object, for this example there is no limit:

map ad to nsx manager role 7

Same way Map all others A/D groups to NSX Roles:

Auditor@corp.local                           – >  Auditor

Security Administrator@corp.local        -> Security Administrator

NSX Administrator@corp.local               -> NSX Administrator

Enterprise Administrator@corp.local     -> Enterprise Administrator

Try our first login with user Auditor1:

Login1

 The login successfull but where is the “Network & Security” tab gone ?

Login2

So far we configure all NSX Manager part but we didnt take care of the vCenter Configuration permission for that group. are you confusing ?

vCenter has is own Role for each group. we need to configure roles to etch A/D group we configured. These settings determine what the user can make the in vCenter environment.

Configure vCenter Roles:

Let’s start by configure the Auditor Role for Auditor A/D group. we know this group is for “Read Only” in the NSX Manager, so it will make sense to give this group “Read Only” to all other vCenter environment.

Go to vCenter -> Manage -> Permissions and click the green button:

vCenter Roles 1

We need to choose Roles from the Assigned Role, if we select No-Access we will not be able login to vCenter. So we need to choose something from “Read-Only” to “Administrator”

For Auditor Role “Read Only” is the Minimum.

Select “Read Only” from the Assigned Role drop down list and click on the “Add” button from “User and Group”:

vCenter Roles 2

From the Domain Select your Domain name, in our lab the domain is “CORP”, choose your Active Directory group from the list (Auditor for this example) and click the “Add” button:

vCenter Roles 3

Click Ok and Ok for Next Step:

vCenter Roles 4

Same way we need to configure all other groups roles:

vCenter Roles 5

Now we can try to login with auditor1 user:

auditor1

As we can see auditor1 is in “Read Only” role:

auditor2

We can  verify that auditor1 can’t change any other vCenter configuration:

auditor3

Test secadmin user map to “NSX Security” role, this user cannot Change any NSX infrastructure related task like create new  add new NSX Controller Node:

secadmin1

But secadmin can create new firewall rule:

secadmin2

When logging with nsxadmin user map to NSX Administrator Role we can see that the user can add new Controller Node:

nsxadmin1

But nsxadmin user cannot change or see any firewall rules configure :

nsxadmin2

What if the user member of two A/D Group ?

The user will gain combined permission access of both of the groups.

For example: the user memberof “Auditor” group and “NSX Security”, the results will be user will have read only permission on all nsx infrastructure and also gain access to all security related area in NSX.

Summery

In this post we demonstrate the NSX manager different roles. We configure Microsoft Active Directory as External database source for user’s identity.

VMware NSX Edge Scale Out with Equal-Cost Multi-Path Routing

This post was written by Roie Ben Haim and Max Ardica, with a special thanks to Jerome Catrouillet, Michael Haines, Tiran Efrat and Ofir Nissim for their valuable input

The modern data center design is changing, following a shift in the habits of consumers using mobile devices, the number of new applications that appear every day and the rate of end-user browsing which has grown exponentially. Planning a new data center requires meeting certain fundamental design guidelines. The principal goals in data center design are: Scalability, Redundancy and High-bandwidth.

In this blog we will describe the Equal Cost Multi-Path functionality (ECMP) introduced in VMware NSX release 6.1 and discuss how it addresses the requirements of scalability, redundancy and high bandwidth. ECMP has the potential to offer substantial increases in bandwidth by load-balancing traffic over multiple paths as well as providing fault tolerance for failed paths. This is a feature which is available on physical networks but we are now introducing this capability for virtual networking as well. ECMP uses a dynamic routing protocol to learn the next-hop towards a final destination and to converge in case of failures. For a great demo of how this works, you can start by watching this video, which walks you through these capabilities in VMware NSX.

 

https://www.youtube.com/watch?v=Tz7SQL3VA6c

 

Scalability and Redundancy and ECMP

To keep pace with the growing demand for bandwidth, the data center must meet scale out requirements, which provide the capability for a business or technology to accept increased volume without redesign of the overall infrastructure. The ultimate goal is avoiding the “rip and replace” of the existing physical infrastructure in order to keep up with the growing demands of the applications. Data centers running business critical applications need to achieve near 100 percent uptime. In order to achieve this goal, we need the ability to quickly recover from failures affecting the main core components. Recovery from catastrophic events needs to be transparent to end user experiences.

ECMP with VMware NSX 6.1 allows you to use upto a maximum of 8 ECMP Paths simultaneously. In a specific VMware NSX deployment, those scalability and resilience improvements are applied to the “on-ramp/off-ramp” routing function offered by the Edge Services Gateway (ESG) functional component, which allows communication between the logical networks and the external physical infrastructure.

ECMP Topology

ECMP Topology

 

External user’s traffic arriving from the physical core routers can use up to 8 different paths (E1-E8) to reach the virtual servers (Web, App, DB).

In the same way, traffic returning from the virtual server’s hit the Distributed Logical Router (DLR), which can choose up to 8 different paths to get to the core network.

How the Path is Determined

NSX for vSphere Edge Services Gateway device:

When a traffic flow needs to be routed, the round robin algorithm is used to pick up one of the links as the path for all traffic of this flow. The algorithm ensures to keep in order all the packets related to this flow by sending them through the same path. Once the next-hop is selected for a particular Source IP and Destination IP pair, the route cache stores this. Once a path has been chosen, all packets related to this flow will follow the same path.

There is a default IPv4 route cache timeout, which is 300 seconds. If an entry is inactive for this period of time, it is then eligible to be removed from route cache. Note that these settings can be tuned for your environment.

Distributed Logical Router (DLR):

The DLR will choose a path based on a Hashing algorithm of Source IP and Destination IP.

 

What happens in case of a failure on one of Edge Devices?

In order to work with ECMP the requirement is to use a dynamic routing protocol: OSPF or BGP. If we take OSPF for example, the main factor influencing the traffic outage experience is the tuning of the

OSPF timers.

OSPF will send hello messages between neighbors, the OSPF “Hello” protocol is used and determines the Interval as to how often an OSPF Hello is sent.

Another OSPF timer called “Dead” Interval is used, which is how long to wait before we consider an OSPF neighbor as “down”. The OSPF Dead Interval is the main factor that influences the convergence time. Dead Interval is usually 4 times the Hello Interval but the OSPF (and BGP) timers can be set as low as 1 second (for Hello interval) and 3 seconds (for Dead interval) to speed up the traffic recovery.

 

ECMP failed Edge

ECMP failed Edge

 

In the example above, the E1 NSX Edge has a failure; the physical routers and DLR detect E1 as Dead at the expiration of the Dead timer and remove their OSPF neighborship with him. As a consequence, the DLR and the physical router remove the routing table entries that originally pointed to the specific next-hop IP address of the failed ESG.

As a result, all corresponding flows on the affected path are re-hashed through the remaining active units. It’s important to emphasize that network traffic that was forwarded across the non-affected paths remains unaffected.

 

Troubleshooting and visibility

With ECMP it’s important to have introspection and visibility tools in order to troubleshoot optional point of failure. Let’s look at the following topology.

TSHOT

TSHOT

A user outside our Data Center would like to access the Web Server service inside the Data Center. The user IP address is 192.168.100.86 and the web server IP address is 172.16.10.10.

This User traffic will hit the Physical Router (R1), which has established OSPF adjacencies with E1 and E2 (the Edge devices). As a result R1 will learn how to get to the Web server from both E1 and E2 and will get two different active paths towards 172.16.10.10. R1 will pick one of the paths to forward the traffic to reach the Web server and will advertise the user network subnet 192.168.100.0/24 to both E1 and E2 with OSPF.

E1 and E2 are NSX for vSphere Edge devices that also establish OSPF adjacencies with the DLR. E1 and E2 will learn how to get to the Web server via OSPF control plane communication with the DLR.

From the DLR perspective, it acts as a default gateway for the Web server. This DLR will form an OSPF adjacency with E1 and E2 and have 2 different OSPF routes to reach the user network.
From the DLR we can verify OSPF adjacency with E1, E2.

We can use the command: “show ip ospf neighbor”

show ip ospf neighbor

show ip ospf neighbor

From this output we can see that the DLR has two Edge neighbors: 198.168.100.3 and 192.168.100.10.The next step will be to verify that ECMP is actually working.

We can use the command: “show ip route”

show ip route

show ip route

The output from this command shows that the DLR learned the user network 192.168.100.0/24 via two different paths, one via E1 = 192.168.10.1 and the other via E2 = 192.168.10.10.

Now we want to display all the packets which were captured by an NSX for vSphere Edge interface.

In the example below and in order to display the traffic passing through interface vNic_1, and which is not OSPF protocol control packets, we need to type this command:
“debug packet display interface vNic_1 not_ip_proto_ospf”

We can see an example with a ping running from host 192.168.100.86 to host 172.16.10.10

Capture traffic

Capture traffic

If we would like to display the captured traffic to a specific ip address 172.16.10.10, the command capture would look like: “debug packet display interface vNic_1 dst_172.16.10.10”

debug packet display interface vNic_1 dst

debug packet display interface vNic_1 dst

* Note: When using the command “debug packter display interface” we need to add underscore between the expressions after the interface name.

Useful CLI for Debugging ECMP

To check which ECMP path is chosen for a flow

  • debug packet display interface IFNAME

To check the ECMP configuration

  • show configuration routing-global

To check the routing table

  • show ip route

To check the forwarding table

  • show ip forwarding

 

Useful CLI for Dynamic Routing

  • show ip ospf neighbor
  • show ip ospf database
  • show ip ospf interface
  • show ip bgp neighbors
  • show ip bgp

ECMP Deployment Consideration

ECMP currently implies stateless behavior. This means that there is no support for stateful services such as the Firewall, Load Balancing or NAT on the NSX Edge Services Gateway.

Starting from 6.1.2 Edge Firewall not disabled automatic on ESG when ECMP is enabled, turn off Firewall when enable ECMP.

In the current NSX 6.1 release, the Edge Firewall and ECMP cannot be turned on at the same time on NSX edge device. Note however, that the Distributed Firewall (DFW) is unaffected by this.

 

About the authors:

Roie Ben Haim

Roie works as a professional services consultant at VMware, focusing on design and implementation of VMware’s software-defined data center products.  Roie has more than 12 years in data center architecture, with a focus on network and security solutions for global enterprises. An enthusiastic M.Sc. graduate, Roie holds a wide range of industry leading certifications including Cisco CCIE x2 # 22755 (Data Center, CCIE Security), Juniper Networks JNCIE – Service Provider #849, and VMware vExpert 2014, VCP-NV, VCP-DCV.

Max Ardica

Max Ardica is a senior technical product manager in VMware’s networking and security business unit (NSBU). Certified as VCDX #171, his primary task is helping to drive the evolution of the VMware NSX platform, building the VMware NSX architecture and providing validated design guidance for the software-defined data center, specifically focusing on network virtualization. Prior to joining VMware, Max worked for almost 15 years at Cisco, covering different roles, from software development to product management. Max owns also a CCIE certification (#13808).

 

 

vExpert 2014

I’m glad I was chosen to be vExpert 2014 program.

vEXPERT 2014

vEXPERT 2014

Each year, we bring together in the vExpert Program the people who have made some of the most important contributions to the VMware community.

These are the bloggers, book authors, VMUG leaders, speakers, tool builders, community leaders and general enthusiasts.

They work as IT admins and architects for VMware customers, they act as trusted advisors and implementors for VMware partners or as independent consultants, and some work for VMware itself. All of them have the passion and enthusiasm for technology and applying technology to solve problems.

They have contributed to the success of us all by sharing their knowledge and expertise over their days, nights, and weekends.

The list of vEXPERT for 2014 Q3 can be found here:

vExpert 2014 Q3 Announcement