The NSX Cross-VC Extensibility kit was created to enhance NSX implementations that run in Cross-vCenter mode.
The package covers three main use cases for Cross-VC NSX deployments:
- Recovery: Automate the recovery of NSX components for disaster avoidance and unplanned disaster events.
- Security: Sync local security policies, groups and tags from the primary NSX Manager to the secondary.
- Routing: Automate local egress/ingress traffic control for disaster avoidance and unplanned failover.
Each use case is covered by separate workflows, which can run independently or combined.
Note: This extensibility kit is released under “community support” mode and is provided as-is, with no implied support or warranty.
This kit includes the “NSX Cross-VC Extensibility.package” file.
This package has been tested and validated with the following software versions:
- NSX 6.2.1
- vSphere 6
(It is expected to work with later versions of these components as well.)
For the recovery workflow to succeed, we must create two REST host objects in vRO, one for the Primary and one for the Secondary NSX Manager.
For each REST host, increase the “Connection timeout” from its default of 30 seconds to 300 seconds, and the “Operation timeout” from 60 to 600 seconds.
If these settings are not changed, the NSX controller re-deployment process will fail.
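For a quick sanity check, the two timeout values can be expressed as a small validation routine. This is a minimal Python sketch, not part of the kit: `RestHostSettings` and `check_timeouts` are hypothetical names, and the actual values are set on the REST host objects in vRO, not in code.

```python
from dataclasses import dataclass

# Hypothetical model of the two vRO REST host timeout fields discussed above.
# Values are in seconds; the defaults mirror the vRO defaults quoted in the text.
@dataclass
class RestHostSettings:
    name: str
    connection_timeout: int = 30
    operation_timeout: int = 60

# Minimum values needed for the NSX controller re-deployment to succeed.
MIN_CONNECTION_TIMEOUT = 300
MIN_OPERATION_TIMEOUT = 600

def check_timeouts(host: RestHostSettings) -> list[str]:
    """Return a list of problems; an empty list means the host is ready."""
    problems = []
    if host.connection_timeout < MIN_CONNECTION_TIMEOUT:
        problems.append(
            f"{host.name}: connection timeout {host.connection_timeout}s "
            f"< {MIN_CONNECTION_TIMEOUT}s"
        )
    if host.operation_timeout < MIN_OPERATION_TIMEOUT:
        problems.append(
            f"{host.name}: operation timeout {host.operation_timeout}s "
            f"< {MIN_OPERATION_TIMEOUT}s"
        )
    return problems

if __name__ == "__main__":
    for host in (RestHostSettings("Primary NSX Manager"),
                 RestHostSettings("Secondary NSX Manager", 300, 600)):
        print(host.name, check_timeouts(host) or "OK")
```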
The following screenshots contain the changed values:
We need to initialize a few vRO attributes before running any of the workflows.
In the Configuration tab, click “NSX Cross-VC Extensibility” and configure the following attributes:
General Attributes (relevant to all use cases):
The following attributes are required for all use cases covered in this package.
We need to define the username and password of the NSX Managers' admin user.
vRO needs two RESTHost attributes, one for the Primary and one for the Secondary NSX Manager.
Recovery NSX Components use case
The workflows in this use case automate the recovery of NSX components for disaster avoidance and unplanned disaster events. The workflow takes care of changing the roles of the NSX Managers, deploying NSX controllers, re-deploying the UDLR control VM if needed, and updating the controller state.
Before running this workflow, we assume that one NSX Manager is in the Primary role with running NSX controllers, and another NSX Manager acts as Secondary at a different site.
The following figure shows an example of the initial state of the environment before running the recovery process.
Disaster Avoidance Initial status:
Disaster Avoidance NSX Cross-VC failover:
In this scenario, the user wants to swap the roles of the primary and secondary NSX Managers.
After this workflow finishes, the right-hand side will have the Primary NSX Manager with three NSX controllers and the UDLR control VM deployed.
If the UDLR control VM is already deployed at the secondary site (as shown in the figure below), we don’t need to re-deploy it.
As part of the recovery process, we need to deploy new NSX controllers at the secondary site.
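The recovery steps described above can be outlined as an ordered sequence. The Python sketch below is illustrative only: the `Site` model and `disaster_avoidance` function are hypothetical and make no NSX API calls; they only show the order of operations and the conditional UDLR control VM re-deployment.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of one site's NSX state (not an NSX API object).
@dataclass
class Site:
    name: str
    nsx_role: str                  # "PRIMARY" or "SECONDARY"
    controllers: int = 0           # running NSX controllers at this site
    udlr_control_vm: bool = False  # is a UDLR control VM deployed here?

def disaster_avoidance(old_primary: Site, new_primary: Site,
                       controller_count: int = 3) -> list[str]:
    """Outline of the recovery steps described above.
    Mutates both sites and returns the steps taken, in order."""
    steps = []
    # Step 1: swap the NSX Manager roles between the two sites.
    old_primary.nsx_role, new_primary.nsx_role = "SECONDARY", "PRIMARY"
    steps.append(f"roles: {new_primary.name}=PRIMARY, {old_primary.name}=SECONDARY")
    # Step 2: deploy NSX controllers at the new primary site up to the target count.
    while new_primary.controllers < controller_count:
        new_primary.controllers += 1
        steps.append(f"deploy controller {new_primary.controllers} at {new_primary.name}")
    # Step 3: re-deploy the UDLR control VM only if it is not already there.
    if not new_primary.udlr_control_vm:
        new_primary.udlr_control_vm = True
        steps.append(f"re-deploy UDLR control VM at {new_primary.name}")
    # Step 4: update the controller state.
    steps.append("update controller state")
    return steps

if __name__ == "__main__":
    site_a = Site("Site A", "PRIMARY", controllers=3, udlr_control_vm=True)
    site_b = Site("Site B", "SECONDARY")
    for step in disaster_avoidance(site_a, site_b):
        print(step)
```

Keeping the UDLR check as a separate conditional step mirrors the text: when the control VM already exists at the secondary site, that step is simply skipped.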
The following attributes need to have a value set by the administrator before running the workflow:
The workflow to run for this scenario is “Disaster Avoidance NSX Cross-VC – Main”. The following figure shows the workflow building blocks:
In this scenario, the main site has completely failed and we need to recover the NSX components at the secondary site.
The workflow that covers this scenario is “Unplanned Recovery NSX Cross-VC – Main”.
To deploy the NSX controllers successfully, we need to set the same attributes shown above.
After running this workflow, the Primary NSX Manager and the NSX controllers run at the secondary site, as shown in the figure below:
In an NSX deployment that uses the Cross-VC feature, universal security groups are automatically synced between the primary and secondary NSX Managers.
In a DR scenario where we want to work with local security groups whose classification criterion is an NSX security tag, we need to sync the groups between the NSX Managers manually.
The main goal of this workflow is to automatically sync local security objects, such as NSX security tags, NSX security policies and security groups, from the primary NSX Manager to the secondary.
This workflow only works for DR scenarios where all active workloads are located at the protected site and there are no active workloads at the recovery site. In other words, we can’t create NSX firewall rules between workloads at the protected site (whose security objects exist only in the protected site’s NSX Manager) and workloads at the recovery site (whose security objects exist only in the recovery site’s NSX Manager).
The input parameters for this workflow are the source vSphere folder where the source VMs are located, and the destination vSphere folder where the target VMs are located. In an SRM deployment, these folders are normally already defined as part of the resource mapping process.
The workflow is built from two major workflows: Sync Security Tags and Sync Service Composer.
Sync Security Tags:
This workflow first syncs all security tag names from the primary NSX Manager to the secondary NSX Manager.
If a security tag already exists on the secondary manager, the workflow skips it.
After completing this step, the workflow attaches the security tags to the destination machines.
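The tag sync logic above can be sketched with plain sets: create only the tag names missing on the secondary, then copy each source VM's tags to its destination VM. This is a hypothetical Python sketch; the function names are illustrative, and pairing source and destination VMs by name is an assumption, not something the kit is documented to do.

```python
# Hypothetical data shapes: tag names are strings, VMs are keyed by name.

def tags_to_create(primary_tags: set[str], secondary_tags: set[str]) -> set[str]:
    """Tag names that exist on the primary but not yet on the secondary.
    Tags already present on the secondary are skipped."""
    return primary_tags - secondary_tags

def tag_assignments(source_vm_tags: dict[str, set[str]],
                    destination_vms: set[str]) -> dict[str, set[str]]:
    """Map each destination VM to the tags of its source VM.
    ASSUMPTION: a destination VM is paired with the source VM of the same name."""
    return {
        vm: set(tags)
        for vm, tags in source_vm_tags.items()
        if vm in destination_vms and tags
    }

if __name__ == "__main__":
    primary = {"DR_Web", "DR_App", "DR_DB"}
    secondary = {"DR_Web"}
    print(sorted(tags_to_create(primary, secondary)))  # ['DR_App', 'DR_DB']
    print(tag_assignments({"web01": {"DR_Web"}, "db01": {"DR_DB"}}, {"web01"}))
```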
Sync Service Composer:
This workflow syncs Service Composer objects from the primary NSX Manager to the secondary.
The sync exports the current Service Composer security policies and security groups from the primary NSX Manager and then imports them into the secondary NSX Manager.
Security groups to be synced must use only security tags in their dynamic membership criteria.
The workflow syncs only the security groups and security policies whose names start with a specific prefix. The prefix is set by the “SecurityDRPrefixName” attribute.
In this example, the workflow syncs security groups and security policies whose names start with “DR_” or “dr_”.
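The two selection rules above (tag-only dynamic criteria and the name prefix) can be sketched as filters. In this hypothetical Python sketch, `SecurityGroup`, the `"SecurityTag"` criteria label and the helper names are assumptions; the prefix check treats the upper- and lower-case forms of the “SecurityDRPrefixName” value as matches, following the “DR_”/“dr_” example.

```python
from dataclasses import dataclass, field

# Hypothetical shape of a security group: dynamic criteria as (type, value) pairs.
@dataclass
class SecurityGroup:
    name: str
    dynamic_criteria: list[tuple[str, str]] = field(default_factory=list)

def has_dr_prefix(name: str, prefix: str = "DR_") -> bool:
    """True if the name starts with the prefix in upper or lower case ("DR_" or "dr_")."""
    return name.startswith(prefix.upper()) or name.startswith(prefix.lower())

def uses_only_tag_criteria(group: SecurityGroup) -> bool:
    """True if the group has dynamic criteria and all of them are security tags."""
    return bool(group.dynamic_criteria) and all(
        kind == "SecurityTag" for kind, _ in group.dynamic_criteria
    )

def groups_to_sync(groups: list[SecurityGroup], prefix: str = "DR_") -> list[str]:
    """Names of the security groups that satisfy both selection rules."""
    return [g.name for g in groups
            if has_dr_prefix(g.name, prefix) and uses_only_tag_criteria(g)]

def policies_to_sync(policy_names: list[str], prefix: str = "DR_") -> list[str]:
    """Security policies are selected by the name prefix only."""
    return [n for n in policy_names if has_dr_prefix(n, prefix)]
```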
Note: Before the import runs, the workflow deletes all security groups and security policies on the secondary NSX Manager. This “delete workflow” step is necessary for the “import workflow” to succeed.
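The export, delete and import order can be sketched against an in-memory stand-in for the NSX Managers. `FakeNsxManager` and `sync_service_composer` are hypothetical; in particular, the name-clash check in `import_objects` is only an assumed reason for why the delete step is needed.

```python
from typing import Callable

class FakeNsxManager:
    """In-memory stand-in for one NSX Manager's Service Composer objects (hypothetical)."""

    def __init__(self, groups=(), policies=()):
        self.groups = set(groups)
        self.policies = set(policies)

    def export(self, keep: Callable[[str], bool]) -> tuple[set[str], set[str]]:
        """Export the groups and policies whose names pass the filter."""
        return ({g for g in self.groups if keep(g)},
                {p for p in self.policies if keep(p)})

    def delete_all(self) -> None:
        """Delete every security group and security policy on this manager."""
        self.groups.clear()
        self.policies.clear()

    def import_objects(self, groups: set[str], policies: set[str]) -> None:
        # ASSUMPTION for this sketch: the import fails if an object name already
        # exists, which models why the secondary must be emptied first.
        clash = (groups & self.groups) | (policies & self.policies)
        if clash:
            raise ValueError(f"already present: {sorted(clash)}")
        self.groups |= groups
        self.policies |= policies

def sync_service_composer(primary: FakeNsxManager, secondary: FakeNsxManager,
                          keep: Callable[[str], bool]) -> None:
    groups, policies = primary.export(keep)     # 1. export from the primary
    secondary.delete_all()                      # 2. delete everything on the secondary
    secondary.import_objects(groups, policies)  # 3. import into the secondary
```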
This workflow automates the north-south (N-S) routing egress/ingress part of the recovery process.
The solution is based on the NSX Locale-ID feature and complements the recovery process in VMware SRM.
In the initial state, we have configured the Locale-ID of the UDLR on both the Protected and the Recovery clusters to be the same as that of the protected site. Because the L2 segment spans the protected and recovery sites, we need to control which routes are advertised to ensure single-site ingress/egress.
We can control ingress traffic by using allow/deny redistribution prefix lists on the UDLR control VM.
On the protected site’s control VM, we advertise the protected site’s routes by creating an allow prefix list.
On the recovery site’s control VM, we create a deny prefix list so that the protected site’s routes are not advertised.
In this state, only the control VM at the protected site has the Locale-ID value; at the recovery site we’ve cleared it.
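The initial routing state described above can be captured as a small model with an invariant check. This is a hypothetical Python sketch: `ControlVm`, `initial_state` and `is_single_site` are illustrative names, and the model only records the Locale-ID and prefix list action, not real UDLR configuration.

```python
from dataclasses import dataclass

# Hypothetical, simplified view of one UDLR control VM's routing settings.
@dataclass
class ControlVm:
    site: str
    locale_id: str           # "" means the Locale-ID has been cleared
    prefix_list_action: str  # "allow" or "deny" for the redistributed prefixes

def initial_state(protected_locale_id: str) -> list[ControlVm]:
    """Protected site allows and keeps the Locale-ID;
    recovery site denies and has it cleared."""
    return [
        ControlVm("protected", protected_locale_id, "allow"),
        ControlVm("recovery", "", "deny"),
    ]

def is_single_site(vms: list[ControlVm]) -> bool:
    """True when exactly one control VM allows the routes and it is
    the only one that carries a Locale-ID."""
    allowing = [vm for vm in vms if vm.prefix_list_action == "allow"]
    with_locale_id = [vm for vm in vms if vm.locale_id]
    return len(allowing) == 1 and with_locale_id == allowing
```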
This attribute defines the NSX Locale ID used for routing. The ID can be any text in UUID format, for example XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX, where each X is replaced with a base-16 digit (0-F).
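A value in this format can be checked, or generated, with Python's standard library. This is a minimal sketch: `is_valid_locale_id` and `new_locale_id` are hypothetical helpers, and accepting lower-case hex digits is an assumption.

```python
import re
import uuid

# Five groups of hex digits in the 8-4-4-4-12 layout described above.
# Both upper- and lower-case hex digits are accepted here (an assumption).
LOCALE_ID_PATTERN = re.compile(
    r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}"
)

def is_valid_locale_id(value: str) -> bool:
    """Return True if the whole value has the UUID text format."""
    return LOCALE_ID_PATTERN.fullmatch(value) is not None

def new_locale_id() -> str:
    """Generate a random value in the same format, upper-cased like the example."""
    return str(uuid.uuid4()).upper()
```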
Defines the array of routing prefixes to be redistributed. Each name must match the prefix name configured on the UDLR.
Array of the NSX-prepared clusters in Site A.
Array of the NSX-prepared clusters in Site B.
This attribute defines the UDLR control VM ID.
“Disaster avoidance Egress via Site A – Main”
“Disaster avoidance Egress via Site B – Main”
“Unplanned Local Egress via Site A – Main”
“Unplanned Local Egress via Site B – Main”
The difference between planned and unplanned events:
Disaster avoidance event:
This workflow was created for disaster avoidance. In this scenario, both sites are up and running, but we want to move the north/south (ingress/egress) traffic to the other datacenter.
So instead of all traffic flowing in and out via Site A, we switch it to go via Site B.
In the unplanned scenario, we are facing a complete site failure: we have lost our primary site and want all north/south traffic to go via the recovery site.
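Choosing among the four routing workflows listed above comes down to two questions: are both sites still up, and which site should carry the north/south traffic. The Python sketch below encodes that choice; `pick_egress_workflow` is a hypothetical helper, and only the workflow names come from the kit.

```python
# Workflow names as listed for the kit; the lookup table itself is illustrative.
WORKFLOWS = {
    ("planned", "A"): "Disaster avoidance Egress via Site A – Main",
    ("planned", "B"): "Disaster avoidance Egress via Site B – Main",
    ("unplanned", "A"): "Unplanned Local Egress via Site A – Main",
    ("unplanned", "B"): "Unplanned Local Egress via Site B – Main",
}

def pick_egress_workflow(both_sites_up: bool, target_site: str) -> str:
    """Disaster avoidance when both sites are up, otherwise an unplanned failover.
    target_site is the site that should carry the north/south traffic ("A" or "B")."""
    event = "planned" if both_sites_up else "unplanned"
    return WORKFLOWS[(event, target_site.upper())]

if __name__ == "__main__":
    print(pick_egress_workflow(True, "B"))
    print(pick_egress_workflow(False, "A"))
```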
Disaster Avoidance Local Egress via Site A/B – Main use case
Unplanned Local Egress via Site A/B use case
The following demos will show the NSX Cross-VC Extensibility kit in action.
Demonstrates the recoverability of NSX components in a disaster avoidance scenario, with north/south routing switched between the datacenters:
Demonstrates syncing NSX local security objects between NSX Managers:
Special thanks to Daniel Bakshi, who helped me a lot in reviewing this blog post.