NSX Cross-VC Extensibility kit

Overview:

The NSX Cross-VC Extensibility kit was created to enhance implementations of NSX in Cross-vCenter mode.

An introduction and deep dive into NSX Cross-VC can be found in the excellent work of Humair Ahmed at this link.

The package covers three main use cases around Cross-VC NSX deployment:

  • Recovery: Automates the recovery of NSX components for disaster avoidance and unplanned disaster events.
  • Security: Syncs local security policies, groups, and tags from the primary NSX Manager to the secondary.
  • Routing: Automates the local egress/ingress traffic switchover for disaster avoidance and unplanned failover.

Each use case is covered by separate workflows that can run independently or combined.

 

Note: This extensibility kit is released under “community support” mode and is provided as-is with no implied support or warranty.

This kit includes the “NSX Cross-VC Extensibility.package” file.

This package has been tested and validated with the following software versions:

  • NSX 6.2.1
  • vSphere 6

(It is expected to work with later versions of these software components.)

Prerequisites

For the recovery workflows to succeed, we must create two REST host objects (for the Primary and Secondary NSX Managers) in vRO.

For each REST host, make sure that the “Connection timeout” is increased from its default of 30 seconds to 300 seconds, and that the “Operation timeout” is increased from 60 to 600 seconds.

If these settings are not changed, the NSX Controller re-deployment process will fail.
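To illustrate why the longer timeouts matter: the controller re-deployment is a long-running REST call. The following minimal Python sketch (not part of the package, just an analogy to the vRO REST host settings) shows the equivalent connect/read timeouts when talking to the NSX API directly; the host name and credentials are placeholders.

```python
import requests
from requests.auth import HTTPBasicAuth

# Placeholder NSX Manager URL and credentials for a lab environment.
NSX_PRIMARY = "https://nsxmgr-01a.corp.local"
AUTH = HTTPBasicAuth("admin", "VMware1!")

session = requests.Session()
session.auth = AUTH
session.verify = False  # lab environments often use self-signed certificates

# (connect timeout, read timeout) in seconds - mirrors the 300/600 values
# we set on the vRO REST host ("Connection timeout" / "Operation timeout").
TIMEOUTS = (300, 600)

# Listing the controllers is a quick call, but the redeploy operations made
# by the workflows can run for several minutes, hence the generous timeouts.
resp = session.get(f"{NSX_PRIMARY}/api/2.0/vdn/controller", timeout=TIMEOUTS)
resp.raise_for_status()
print(resp.text)  # XML list of NSX Controllers
```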

The following screenshots contain the changed values:

HTTP ReST Host

vRO Configuration

We will need to initialize a few vRO attributes before running any workflows.

In the Configuration tab, click “NSX Cross-VC Extensibility” and configure the following attributes:

 

General Attributes (relevant to all use cases):

The following attributes are required for all use cases covered in this package.

We will need to define the username and password of the admin user for the NSX Managers.

vRO needs two RESTHost attributes: the Primary and Secondary NSX Managers.
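For readers who prefer to see it as code, here is a rough Python analogue of these general attributes; the class and field names are illustrative only and do not match the actual vRO attribute names.

```python
from dataclasses import dataclass

# A rough Python analogue of the package's general vRO attributes (the real
# attributes live in the vRO configuration element, not in code).
@dataclass
class NsxManagerEndpoint:
    url: str       # RESTHost base URL, e.g. https://nsxmgr-01a.corp.local
    username: str  # NSX Manager admin user
    password: str

# Placeholder values for a two-site lab.
PRIMARY_NSX = NsxManagerEndpoint("https://nsxmgr-01a.corp.local", "admin", "VMware1!")
SECONDARY_NSX = NsxManagerEndpoint("https://nsxmgr-01b.corp.local", "admin", "VMware1!")
```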

 

General Attributes:

General Attributes

Recovery NSX Components use case

The workflows in this use case automate the recovery of NSX components for both disaster avoidance and unplanned disaster events. The workflow changes the roles of the NSX Managers, deploys NSX Controllers, re-deploys the UDLR Control VM if needed, and updates the controller state.

In the initial state, before running this workflow, we assume that one NSX Manager holds the Primary role with running NSX Controllers, and another NSX Manager acts as Secondary at a different site.

The following figure shows an example of the initial state of the environment before running the recovery process.

Disaster Avoidance Initial status:

Recovery NSX Components

Disaster Avoidance NSX Cross-VC failover:

In this scenario, the user wants to swap the Primary and Secondary NSX Manager roles.

After this workflow finishes, the right-hand site will have the Primary NSX Manager with three NSX Controllers and the UDLR Control VM deployed.

In a scenario where the UDLR Control VM is already deployed at the secondary site (as shown in the figure below), we do not need to re-deploy it.

Cross-VC After

As part of the recovery process, we will need to deploy a new NSX Controller at the secondary site.
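Under the hood, deploying a controller comes down to a single NSX API call. The sketch below (Python, not the vRO workflow itself) shows roughly what that call looks like; all IDs, names, and passwords are placeholders, and the schema should be checked against the NSX 6.2 API guide.

```python
import requests
from requests.auth import HTTPBasicAuth

# Hedged sketch of what the workflow does under the hood: POSTing a
# controllerSpec to the secondary site's NSX Manager. All MoRef IDs, the IP
# pool ID and the password are placeholders.
SECONDARY_NSX = "https://nsxmgr-01b.corp.local"

controller_spec = """
<controllerSpec>
  <name>controller-siteB-1</name>
  <ipPoolId>ipaddresspool-2</ipPoolId>
  <resourcePoolId>domain-c41</resourcePoolId>
  <datastoreId>datastore-63</datastoreId>
  <networkId>dvportgroup-120</networkId>
  <password>Contr0ller-Passw0rd!</password>
</controllerSpec>
"""

resp = requests.post(
    f"{SECONDARY_NSX}/api/2.0/vdn/controller",
    data=controller_spec,
    headers={"Content-Type": "application/xml"},
    auth=HTTPBasicAuth("admin", "VMware1!"),
    verify=False,
    timeout=(300, 600),  # controller deployment is a long-running call
)
resp.raise_for_status()
print("Deployment job:", resp.text)  # NSX returns a job ID to poll
```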

The following attributes need to have a value set by the administrator before running the workflow:

Recovery attributes:

Recovery attributes

The workflow to run to achieve this goal is “Disaster Avoidance NSX Cross-VC – Main”; the following figure shows its building blocks:

Disaster Avoidance NSX Cross-VC – Main workflow

Unplanned Recovery:

In this scenario, the main site has completely failed and we need to recover the NSX components at the secondary site.

The workflow that covers this scenario is “Unplanned Recovery NSX Cross-VC – Main”.

We will need to update the same attributes shown before in order to successfully deploy the NSX Controllers.

Unplanned Recovery NSX Cross-VC – Main

After running this workflow, the Primary NSX Manager and NSX Controllers will run at the secondary site, as shown in the figure below:

Unplanned Recovery

Security use case:

In an NSX deployment using the Cross-VC feature, Universal security groups are automatically synced between the primary and secondary NSX Managers.

In a DR scenario where we want to work with local security groups whose classification criterion is an NSX security tag, we need to manually sync the groups between the NSX Managers.

The main goal of this workflow is to automatically sync local security objects such as NSX security tags, security policies, and security groups from the primary NSX Manager to the secondary.

This workflow works only for a DR scenario where all active workloads are located at the protected site and there are no active workloads at the recovery site. In other words, we cannot create NSX firewall rules between workloads at the protected site (whose security objects exist only in the protected site’s NSX Manager) and workloads at the recovery site (whose security objects exist only in the recovery site’s NSX Manager).

The input parameters for this workflow are the source vSphere folder where the source VMs are located and the destination vSphere folder where the target VMs are located. Normally, in an SRM deployment, these folders already exist as part of the resource mapping process.

sync local security objects

The workflow is built from two major sub-workflows: Sync Security Tags and Sync Service Composer.

Sync Security Tags and Sync Service Composer building blocks

Sync Security Tags:

This workflow first syncs all security tag names from the primary NSX Manager to the secondary NSX Manager.

If the security tag already exists on the secondary manager, the workflow will skip the sync for that security tag.

After completing this step, the workflow attaches the security tags to the destination machines.
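The sketch below outlines this tag-sync logic in Python against the NSX security tag API as I understand it; hosts, credentials, the example tag name, and the VM MoRef are placeholders, and the real workflow derives the VM mapping from the source/destination folders.

```python
import requests
import xml.etree.ElementTree as ET
from requests.auth import HTTPBasicAuth

# Placeholders for the two NSX Managers.
PRIMARY = "https://nsxmgr-01a.corp.local"
SECONDARY = "https://nsxmgr-01b.corp.local"
AUTH = HTTPBasicAuth("admin", "VMware1!")

def list_tags(base_url):
    """Return {tag_name: tag_id} from /api/2.0/services/securitytags/tag."""
    r = requests.get(f"{base_url}/api/2.0/services/securitytags/tag",
                     auth=AUTH, verify=False)
    r.raise_for_status()
    root = ET.fromstring(r.text)
    return {t.findtext("name"): t.findtext("objectId")
            for t in root.findall(".//securityTag")}

def create_tag(base_url, name):
    body = (f"<securityTag><objectTypeName>SecurityTag</objectTypeName>"
            f"<type><typeName>SecurityTag</typeName></type>"
            f"<name>{name}</name></securityTag>")
    r = requests.post(f"{base_url}/api/2.0/services/securitytags/tag",
                      data=body, headers={"Content-Type": "application/xml"},
                      auth=AUTH, verify=False)
    r.raise_for_status()
    return r.text  # object ID of the new tag

primary_tags = list_tags(PRIMARY)
secondary_tags = list_tags(SECONDARY)

for name in primary_tags:
    if name not in secondary_tags:          # skip tags that already exist
        secondary_tags[name] = create_tag(SECONDARY, name)

# Attaching a tag to a destination VM (the VM MoRef would come from the
# destination vSphere folder mapping; simplified to a literal here).
tag_id = secondary_tags.get("DR_Web_Tag")   # hypothetical tag name
vm_moref = "vm-1234"                        # placeholder destination VM MoRef
if tag_id:
    requests.put(
        f"{SECONDARY}/api/2.0/services/securitytags/tag/{tag_id}/vm/{vm_moref}",
        auth=AUTH, verify=False).raise_for_status()
```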

Sync Service Composer:

This workflow syncs Service Composer objects from the primary NSX Manager to the secondary.

service composer

The sync exports the current Service Composer security policies and security groups from the primary NSX Manager and then imports them into the secondary NSX Manager.

Security groups that we sync must use security tags as their dynamic membership criteria only.

The workflow syncs security groups and security policies that have a specific name prefix. The prefix is determined by the “SecurityDRPrefixName” attribute.

In this example, the workflow syncs security groups and security policies that start with “DR_” or “dr_”.

DR

Note: Before the import occurs, we delete all security groups and security policies on the secondary NSX Manager. This delete step is necessary for the import to succeed.
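A hedged sketch of this delete-then-import step is shown below; it filters by the configured prefixes and uses the security policy and security group endpoints as I recall them, so verify them (and the Service Composer hierarchy export/import endpoint) against the NSX API guide.

```python
import requests
import xml.etree.ElementTree as ET
from requests.auth import HTTPBasicAuth

# Placeholders for the secondary NSX Manager and the prefix attribute.
SECONDARY = "https://nsxmgr-01b.corp.local"
AUTH = HTTPBasicAuth("admin", "VMware1!")
PREFIXES = ("DR_", "dr_")   # value of the SecurityDRPrefixName attribute

def delete_prefixed_policies():
    # List all security policies and delete the ones matching the prefix.
    r = requests.get(f"{SECONDARY}/api/2.0/services/policy/securitypolicy/all",
                     auth=AUTH, verify=False)
    r.raise_for_status()
    for pol in ET.fromstring(r.text).findall(".//securityPolicy"):
        name, obj_id = pol.findtext("name", ""), pol.findtext("objectId")
        if name.startswith(PREFIXES):
            requests.delete(
                f"{SECONDARY}/api/2.0/services/policy/securitypolicy/{obj_id}?force=true",
                auth=AUTH, verify=False).raise_for_status()

def delete_prefixed_groups():
    # List all local security groups and delete the ones matching the prefix.
    r = requests.get(f"{SECONDARY}/api/2.0/services/securitygroup/scope/globalroot-0",
                     auth=AUTH, verify=False)
    r.raise_for_status()
    for grp in ET.fromstring(r.text).findall(".//securitygroup"):
        name, obj_id = grp.findtext("name", ""), grp.findtext("objectId")
        if name.startswith(PREFIXES):
            requests.delete(
                f"{SECONDARY}/api/2.0/services/securitygroup/{obj_id}?force=true",
                auth=AUTH, verify=False).raise_for_status()

delete_prefixed_policies()
delete_prefixed_groups()
# The export from the primary and the import into the secondary then go
# through the Service Composer hierarchy endpoint
# (/api/2.0/services/policy/securitypolicy/hierarchy) - see the NSX API guide.
```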

 

Routing use case

This workflow automates the north-south (egress/ingress) routing part of the recovery process.

The solution is based on the NSX Locale-ID feature and complements the recovery process in VMware SRM.

In the initial state, the Locale-ID on the UDLR at both the protected and recovery clusters is configured to match the protected site. The L2 segments span between the protected site and the recovery site; as a consequence, we need to control the advertised routes to ensure single-site ingress/egress.

Routing

We can control the ingress traffic by using allow/deny redistribution prefix lists on the UDLR Control VM.

At the protected site’s Control VM, we advertise routes from the protected site by creating an allow prefix list.

At the recovery site’s Control VM, we create a deny prefix list so that routes from the protected site are not advertised.

In this state, only the Control VM at the protected site has the Locale-ID value; at the recovery site, we have cleared this value.
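Conceptually, the routing workflows flip the redistribution rules that reference these prefix lists on each UDLR Control VM. The sketch below is one possible way to do that against the NSX routing API; the edge ID, prefix-list names, and the exact redistribution XML schema are assumptions to verify against the NSX 6.2 API guide.

```python
import requests
import xml.etree.ElementTree as ET
from requests.auth import HTTPBasicAuth

# Placeholders mirroring the routing attributes described below.
NSX = "https://nsxmgr-01b.corp.local"
AUTH = HTTPBasicAuth("admin", "VMware1!")
UDLR_ID = "edge-5"                         # Routing_UDLRID attribute
PREFIX_LISTS = ["Green-Web", "Green-App"]  # Routing_prefixListName attribute

url = f"{NSX}/api/4.0/edges/{UDLR_ID}/routing/config"
cfg = requests.get(url, auth=AUTH, verify=False)
cfg.raise_for_status()
root = ET.fromstring(cfg.text)

# Set the redistribution rules for our prefix lists to "deny" so this site's
# Control VM stops advertising them (use "permit" on the site that should
# attract the ingress traffic). The rule/prefixName/action element names are
# my reading of the schema and should be confirmed.
for rule in root.iter("rule"):
    prefix = rule.findtext("prefixName")
    action = rule.find("action")
    if prefix in PREFIX_LISTS and action is not None:
        action.text = "deny"

requests.put(url, data=ET.tostring(root),
             headers={"Content-Type": "application/xml"},
             auth=AUTH, verify=False).raise_for_status()
```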

Routing attributes:

Routing attributes

Routing_LocalID:

This attribute defines the NSX Locale ID used for routing. The ID can be any text in UUID format, for example XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX, where each X is a base-16 digit (0-F).
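If you need to generate or validate such a value, a one-liner does the job:

```python
import uuid

# A valid Routing_LocalID value is simply any UUID-formatted string.
locale_id = str(uuid.uuid4())
print(locale_id)       # e.g. 3f2504e0-4f89-41d3-9a0c-0305e82c3301

# Validating an existing value (raises ValueError if not a valid UUID):
uuid.UUID(locale_id)
```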

Routing_prefixListName:

Defines the array of routing prefix lists to be redistributed. Each name must match the name configured on the UDLR.

Routing_SiteA_Clusters:

Array of the NSX-prepared clusters in Site A.

Routing_SiteB_Clusters:

Array of the NSX-prepared clusters in Site B.

Routing_UDLRID:

This attribute defines the UDLR Control VM ID.

 

Routing Workflows:

“Disaster avoidance Egress via Site A – Main”

“Disaster avoidance Egress via Site B – Main”

“Unplanned Local Egress via Site A – Main”

“Unplanned Local Egress via Site B – Main”

The difference between planned and unplanned events:

Disaster avoidance event:

This workflow was created for disaster avoidance. In this scenario, both sites are up and running, but we would like to divert the north-south (ingress/egress) traffic to the other datacenter.

So instead of all traffic flowing in/out via Site A, we switch it to go via Site B.

Unplanned event:

In this scenario we face a complete site failure: we have lost our primary site and would like all north-south traffic to go via the recovery site.

 

Disaster Avoidance Local Egress via Site A/B – Main use case

Disaster Avoidance Local Egress via Site A/B – Main use case

 

Unplanned Local Egress via Site A/B – Main use case

Unplanned Local Egress via Site A/B – Main use case

 

The following demos show the NSX Cross-VC Extensibility kit in action.

A demonstration of the recoverability of the NSX components in a disaster avoidance scenario, with the north-south routing switched between the datacenters:

A demonstration of syncing the NSX local security objects between the NSX Managers:

 

Special thanks to Daniel Bakshi, who helped me a lot by reviewing this blog post.

NSX Dual Active/Active Datacenters BCDR

Overview

The modern data center design requires better redundancy and demands the ability to have Business Continuity (BC) and Disaster Recovery (DR) in case of a catastrophic failure in our datacenter. Planning a new data center with BCDR requires meeting certain fundamental design guidelines.

In this blog post I will describe an Active/Active datacenter design with the full VMware SDDC product suite.

NSX runs in Cross-vCenter mode, a capability introduced in VMware NSX release 6.2.x. In this blog post we will focus on networking and security.

An introduction and overview blog post can be found in this link:

http://blogs.vmware.com/consulting/2015/11/how-nsx-simplifies-and-enables-true-disaster-recovery-with-site-recovery-manager.html

The goals that we are trying to achieve in this post are:

  1. The ability to deploy workloads with vRA to both datacenters.
  2. Business continuity in case of a partial or full site failure.
  3. The ability to perform planned or unplanned migration of workloads from one datacenter to another.

To demonstrate the functionality of this design, I’ve created a demo ‘vPOD’ in the VMware internal cloud with the following products in each datacenter:

  • vCenter 6.0 with ESXi 6.0 hosts
  • NSX 6.2.1
  • vRA 6.2.3
  • vSphere Replication 6.1
  • SRM 6.1
  • Cloud Client 3.4.1

In this blog post I will not cover the recovery of the vRA/vRO components, but this could be achieved with a separate SRM instance for the management infrastructure.

Environment overview

I’m adding a short video to introduce the environment.

NSX Manager

The NSX manager in Site A will have the IP address of 192.168.110.15 and will be configured as primary.

The NSX Manager in site B will be configured with the IP 192.168.210.15 and is set as secondary.

Each NSX Manager pairs with its own vCenter and learns its local inventory. Any configuration change related to the cross-site deployment runs on the primary NSX Manager and is replicated automatically to the remote site.

 

Universal Logical Switch (ULS)

Creating logical switches (L2) between sites with VXLAN is not new to NSX; however, starting from version 6.2.x, we’ve introduced the ability to stretch L2 between NSX Managers paired to different vCenters. This new logical switch is known as a ‘Universal Logical Switch’ (ULS). Any new ULS we create on the Primary NSX Manager is synced to the secondary.
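Behind the scenes, creating a ULS is just creating a logical switch inside the universal transport zone on the primary NSX Manager; the sync to the secondary then happens automatically. The sketch below shows the idea; the transport zone scope ID and names are placeholders.

```python
import requests
from requests.auth import HTTPBasicAuth

# Placeholders: primary NSX Manager and the universal transport zone ID
# (look up the real scope ID with GET /api/2.0/vdn/scopes).
PRIMARY = "https://nsxmgr-01a.corp.local"
AUTH = HTTPBasicAuth("admin", "VMware1!")
UNIVERSAL_TZ = "universalvdnscope"

spec = """
<virtualWireCreateSpec>
  <name>ULS_Green_Web-A</name>
  <description>Green web tier - universal logical switch</description>
  <tenantId>virtual wire tenant</tenantId>
</virtualWireCreateSpec>
"""

resp = requests.post(
    f"{PRIMARY}/api/2.0/vdn/scopes/{UNIVERSAL_TZ}/virtualwires",
    data=spec, headers={"Content-Type": "application/xml"},
    auth=AUTH, verify=False)
resp.raise_for_status()
print("New virtual wire ID:", resp.text)   # e.g. universalwire-5
```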

I’ve created the following ULSs in my demo vPOD:

Universal Logical Switch (ULS)

Universal Distributed Logical Router (UDLR)

The concept of a Distributed Logical Router is still the same as it was before NSX 6.2.x. The new functionality added in this release allows us to configure a Universal Distributed Logical Router (UDLR). When we deploy a UDLR, it shows up in the Universal Transport Zone of all NSX Managers.

The following UDLR was created:

Universal Distributed Logical Router (UDLR)

Universal Security Policy with Distributed Firewall (UDFW)

With version 6.2.x we’ve introduced the universal security group and universal IP-Set.

Any firewall rule configured in the Universal Section must use an IP-Set, or a Security Group that contains IP-Sets.

When we configure or change a Universal policy, a sync process automatically runs from the primary to the secondary NSX Manager.

The recommended way to work with an IP-Set is to add it to a universal security group.

The following Universal security policy is an example that allows communication to a 3-tier application. The policy is built from universal security groups; each group contains an IP-Set with the relevant IP addresses for its tier.
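As a hedged example of how such a universal IP-Set could be created via the API (scope universalroot-0 on the primary NSX Manager); names and addresses are illustrative only:

```python
import requests
from requests.auth import HTTPBasicAuth

# Placeholder primary NSX Manager and credentials.
PRIMARY = "https://nsxmgr-01a.corp.local"
AUTH = HTTPBasicAuth("admin", "VMware1!")

ipset = """
<ipset>
  <objectTypeName>IPSet</objectTypeName>
  <type><typeName>IPSet</typeName></type>
  <name>U-IPSet-Web-Tier</name>
  <value>172.16.10.11,172.16.10.12</value>
  <inheritanceAllowed>false</inheritanceAllowed>
</ipset>
"""

# Creating the IP-Set under the universal scope makes it a universal object.
resp = requests.post(
    f"{PRIMARY}/api/2.0/services/ipset/universalroot-0",
    data=ipset, headers={"Content-Type": "application/xml"},
    auth=AUTH, verify=False)
resp.raise_for_status()
print("New IP-Set ID:", resp.text)  # add this object to a universal security group
```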

Universal Security Policy with Distributed Firewall (UDFW)

vRA

On the automation side, we create two unique machine blueprints (MBPs), one per site. The MBPs are based on a classic CentOS image that allows us to perform some connectivity tests.

The MBP named “Center-Site_A” will be deployed by vRA to Site A into the green ULS named ULS_Green_Web-A.

The IP address pool configured for this ULS is 172.16.10.0/24.

The MBP named “Center-Site_B” will be deployed by vRA to Site B into the blue ULS named ULS_Blue_Web-B.

The IP address pool configured for this ULS is 172.17.10.0/24.

vRA Catalog

Cloud Client:

To quote from the official VMware documentation:

“Typically, a vSphere hosted VM managed by vRA belongs to a reservation, which belongs to a compute resource (cluster), which in turn belongs to a vSphere Endpoint. The VMs reservation in vRA needs to be accurate in order for vRA to know which vSphere proxy agent to utilize to manage that VM in the underlying vSphere infrastructure. This is all well and good and causes few (if any) problems in a single site setup, as the VM will not normally move from the vSphere endpoint it is originally located on.

With a multi-site deployment utilizing Site Recovery Manager all this changes as part of the site to site fail over process involves moving VMs from one vCenter to another. This has the effect in vRA of moving the VM to a different endpoint, but the reservation becomes stale. As a result it becomes no longer possible to perform day 2 operation on the VMs until the reservation is updated.”

When we fail over VMs from Site A to Site B, Cloud Client runs the following actions behind the scenes to solve this challenge.

Process Flow for Planned Failover:

Process Flow for Planned Failover

The Conceptual Routing Design with Active/Active Datacenter

The key point of this design is to run workloads Active/Active in both datacenters.

The workloads will reside in both Site A and Site B. In the modern datacenter, the entry point is protected with a perimeter firewall.

In our design, each site has its own perimeter firewall running independently: FW_A located in Site A and FW_B located in Site B.

Site A (shown in green) runs its own ESGs (Edge Services Gateways), Universal DLR (UDLR), and Universal Logical Switches (ULS).

Site B (shown in blue) has its own ESGs, Universal DLR (UDLR), and Universal Logical Switches (ULS).

The main reason for having different ESGs, UDLRs, and ULSs per site is to force a single ingress/egress point for workload traffic per site.

Without this deterministic ingress/egress traffic flow, we may face asymmetric routing between the two sites: ingress traffic would enter via Site A through FW_A while egress traffic leaves via Site B through FW_B, and this asymmetric traffic would be dropped by FW_B.

Note: The ESGs in this blog run in ECMP mode; as a consequence, we turned off the firewall service on the ESGs.

The Green networks will always be advertised via FW_A. For example, the Control VM (IP 192.168.110.10) shown in the figure below needs to access the Green Web VM connected to ULS_Web_Green_A. The traffic from this client is routed via the Core router to FW_A, from there to one of the ESGs working in ECMP mode, then to the Green UDLR, and finally to the Green Web VM itself.

Now assume the same client would like to access the Blue Web VM connected to ULS_Web_Blue_B. This traffic is routed via the Core router to FW_B, from there to one of the Blue ESGs working in ECMP mode, then to the Blue UDLR, and finally to the Blue VM itself.

Routing Design with Active/Active Datacenter

What is the issue with this design?

What happens if we face a complete failure of one of our Edge Clusters or of FW_A?

For our scenario, I’ve combined failures of the Green Edge cluster and FW_A in the image below.

In that case, we lose all north-south traffic to all of the ULSs behind this Green Edge Cluster.

As a result, all clients outside the SDDC immediately lose connectivity to all of the Green ULSs.

Please note: forwarding traffic to the Blue ULSs continues to work in this event, regardless of the failure in Site A.

 

Failure of the Green Edge Cluster and FW_A in Site A

If we had a stretched vSphere Edge cluster between Site A and Site B, we would be able to leverage vSphere HA to restart the failed Green ESGs at the remote Blue site (this is not the case here; in our design each site has its own local cluster and storage), but even with vSphere HA the restart process can take a few minutes. Another way to recover from this failure is to manually deploy Green ESGs in Site B and connect them to Site B’s FW_B; the recovery time of this option could also be a few minutes. Neither of these options is suitable for a modern datacenter design.

In the next paragraph I will introduce a new way to design the ESGs in Active/Active datacenter architecture.

This design recovers much faster and more efficiently from such an event in Site A (or Site B).

Active/Active Datacenter with mirrored ESGs

In this design, we deploy mirrored Green ESGs in Site B and mirrored Blue ESGs in Site A. Under normal datacenter operation, the mirrored ESGs are up and running but do not forward traffic. Green ULS traffic from external clients will always enter via the Site A ESGs (E1-Green-A, E2-Green-A) for all of Site A’s prefixes and leave through the same point.

Adding the mirrored ESGs adds some complexity to the single ingress/egress design, but improves the convergence time after a failure.

Active/Active Datacenter with mirrored ESGs

How does ingress traffic flow work in this design?

Now we will explain how the ingress traffic flow works in this architecture with mirrored ESGs. To simplify the explanation, we focus only on the green flow in both datacenters and remove the blue components from the diagrams, but the same explanation applies to the Blue Site B network as well.

The Site A Green UDLR Control VM runs eBGP with all Green ESGs (E1-Green-A to E4-Green-B). The UDLR redistributes all connected interfaces as Site A prefixes via eBGP. Note: “Site A prefixes” represent any Green segments that are part of the Green ULSs.

The Green ESGs (E1-Green-A to E4-Green-B) advertise Site A’s prefixes via BGP to both physical firewalls: FW_A located in Site A and FW_B located in Site B.

FW_B in Site B adds BGP AS-path prepending for the Site A prefixes.

From the Core router’s point of view, there are two different paths to reach the Site A prefixes: one via FW_A (Site A) and the second via FW_B (Site B). Under normal operation, this traffic flows only through Site A because Site B prepends the AS path for the Site A prefixes.

Ingress traffic flow with mirrored ESGs

Egress Traffic

Egress traffic is handled by the UDLR Control VM using different BGP Weight values.

The Site A ESGs E1-Green-A and E2-Green-A have mirror ESGs, E3-Green-B and E4-Green-B, located at Site B. The mirror ESGs provide availability. Under normal operation, the UDLR Control VM always prefers to route traffic via E1-Green-A and E2-Green-A because of their higher BGP Weight value. E3-Green-B and E4-Green-B do not forward any traffic and wait for E1/E2 to fail.

In the figure below, we can see a Web workload running on Site A’s ULS_Green_A initiating traffic to the Core. This egress traffic passes through the DLR kernel module, through the E1-Green-A ESG, and is then forwarded to Site A’s FW_A.

Egress traffic flow via E1-Green-A

There are other options for ingress/egress within NSX 6.2:

A great new feature called ‘Locale-ID’. Hany Michael wrote a blog post covering this option.

Hany’s blog does not include a firewall like in my design, so please pay attention to a few minor differences.

http://www.networkskyx.com/2016/01/06/introducing-the-vmware-nsx-vlab-2-0/

Anthony Burke wrote a blog post about how to use Locale-ID with a physical firewall:

https://networkinferno.net/ingress-optimisation-with-nsx-for-vsphere

Routing updates

Below, we demonstrate routing updates for Site A, but the same mechanism works for Site B. The Core router connected to FW_A in Site A peers with FW_A via eBGP.

The Core advertises a 0/0 default route.

FW_A performs eBGP peering with both E1-Green-A and E2-Green-A. FW_A forwards the 0/0 default route to the Green ESGs and receives Site A’s green prefixes from them. The Green ESGs E1-Green-A and E2-Green-A peer via eBGP with the UDLR Control VM.

The UDLR and the ESGs work in ECMP mode; as a result, the UDLR receives 0/0 from both ESGs. The UDLR redistributes its connected interfaces (LIFs) to both Green ESGs.

We can work with iBGP, eBGP, or a mix of both from the UDLR -> ESGs -> physical routers.

To reduce the eBGP convergence time in case of an active UDLR Control VM failure, we configure floating static routes on all of the Green ESGs pointing to the UDLR forwarding address for the internal LIFs.

Routing filters are applied on all ESGs to prevent unwanted prefix advertisements and to prevent the ESGs from becoming transit gateways.

Routing updates for Site A

Failure of One Green ESG in Site A

The Green ESGs E1-Green-A and E2-Green-A work in ECMP mode. From the UDLR’s and FW_A’s point of view, both ESGs work in Active/Active mode.

As long as we have at least one active Green ESG in Site A, the Green UDLR and the Core router will always prefer to work with the Site A Green ESGs.

Let’s assume we have an active traffic flow from the Green Web VM in Site A to the external client behind the Core router, and this traffic initially passes through E1-Green-A. In the event of a failure of the E1-Green-A ESG, the UDLR reroutes the traffic via E2-Green-A because this ESG has a better weight than the Green ESGs at Site B (E3-Green-B and E4-Green-B).

FW_A still advertises a better AS-path for the ‘ULS_Web_Green_A’ prefixes than FW_B (remember, FW_B always prepends the Site A prefixes).

We use low BGP timer settings (hello = 1 sec, hold down = 3 sec) to improve BGP routing convergence.

 

Failure of one Green ESG in Site A

Complete Edge cluster failure in Site A

In this scenario we face a failure of the entire Edge cluster in Site A (Green ESGs and Blue ESGs); this failure might also include FW_A.

The Core router will not receive any BGP updates from Site A, so the Core prefers to go to FW_B in order to reach any Site A prefix.

From the UDLR’s point of view, there aren’t any working Green ESGs in Site A, so the UDLR works with the remaining Green ESGs in Site B (E3-Green-B, E4-Green-B).

Traffic initiated by the external client is rerouted via the mirrored Green ESGs (E3-Green-B and E4-Green-B) to the Green ULSs in Site B. The reroute happens very quickly thanks to the BGP timer settings (hello = 1 sec, hold down = 3 sec).

This solution is much faster than other options mentioned before.

The same recovery mechanism exists for a failure in the Site B datacenter.

Complete Edge cluster failure in Site A

Note: The Green UDLR control VM was deployed to the payload cluster and isn’t affected by this failure.

 

Complete Site A failure:

In this catastrophic scenario, all components in Site A have failed, including the management infrastructure (vCenter, NSX Manager, Controllers, ESGs, and UDLR Control VM). Green workloads face an outage until they are recovered in Site B; the Blue workloads continue to work without any interruption.

The recovery procedure for this event covers both the infrastructure management/control plane components and the workloads themselves.

Recovering the management/control plane:

  • Log in to the secondary NSX Manager and promote it to Primary by assigning the Primary role (see the sketch after this list).
  • Deploy a new Universal Controller Cluster and synchronize all objects.
  • The Universal Controller Cluster configuration is pushed to the ESXi hosts managed by the (now promoted) secondary NSX Manager.
  • Redeploy the UDLR Control VM.
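The sketch below illustrates the first step via the API. The universalsync role endpoints are my assumption of the NSX 6.2 API and must be confirmed against the API guide; in practice this step is usually done from the vSphere Web Client. The remaining steps reuse the controller-deployment call shown earlier and the UDLR redeploy from the UI.

```python
import requests
from requests.auth import HTTPBasicAuth

# Placeholder host and credentials for the surviving (secondary) NSX Manager.
SECONDARY = "https://nsxmgr-01b.corp.local"
AUTH = HTTPBasicAuth("admin", "VMware1!")

# Check the current role of the surviving manager (expected: SECONDARY).
role = requests.get(f"{SECONDARY}/api/2.0/universalsync/configuration/role",
                    auth=AUTH, verify=False)
role.raise_for_status()
print("Current role:", role.text)

# Assumed promote call - the action parameter is a placeholder; verify the
# exact path and parameters in the NSX API guide before using this.
promote = requests.post(
    f"{SECONDARY}/api/2.0/universalsync/configuration/role?action=set-as-primary",
    auth=AUTH, verify=False)
promote.raise_for_status()
```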

The recovery procedure for the workloads runs the SRM Recovery Plan at Site B.

Complete Site A failure

 

Summary:

In this blog post we demonstrated the power of NSX to create an Active/Active datacenter with the ability to recover very quickly from many failure scenarios.

  • We showed how NSX simplifies Disaster Recovery process.
  • NSX and SRM integration is a reasonable approach to DR where we cannot use a stretched vSphere cluster.
  • NSX works in Cross-vCenter mode. Dual vCenters and NSX Managers improve our availability; even in the event of a complete site failure, we were able to continue working immediately in our management layer (the secondary NSX Manager and vCenter are up and running).
  • In this design, half of our environment (the Blue segments) wasn’t affected by a complete site failure. SRM recovered our failed Green workloads without any need to change our Layer 2/Layer 3 network topology.
  • We did not use any specific hardware to achieve our BCDR, and we were 100% decoupled from the physical layer.
  • With SRM and vRO we were able to protect any deployed VM from Day 0.

 

I would like to thank:

Daniel Bakshi, who helped me a lot by reviewing this blog post.

Also thanks to Boris Kovalev and Tal Moran, who helped with the vRA/vRO demo vPOD.