NSX Dual Active/Active Datacenters BCDR

Overview

Modern data center design requires strong redundancy and demands Business Continuity (BC) and Disaster Recovery (DR) capabilities in case of a catastrophic failure in the datacenter. Planning a new data center with BCDR in mind requires meeting certain fundamental design guidelines.

In this blog post I will describe an Active/Active datacenter design built with the full VMware SDDC product suite.

NSX runs in Cross-vCenter mode, a capability introduced in VMware NSX release 6.2.x. In this blog post we will focus on the network and security aspects.

An introduction and overview blog post can be found in this link:

http://blogs.vmware.com/consulting/2015/11/how-nsx-simplifies-and-enables-true-disaster-recovery-with-site-recovery-manager.html

The goals that we are trying to achieve in this post are:

  1. Having the ability to deploy workloads with vRA to both datacenters.
  2. Providing Business Continuity in case of a partial or a full site failure.
  3. Having the ability to perform planned or unplanned migrations of workloads from one datacenter to the other.

To demonstrate the functionality of this design I’ve created a demo ‘vPOD’ in the VMware internal cloud with the following products in each datacenter:

  • vCenter 6.0 with ESXi 6.0 hosts
  • NSX 6.2.1
  • vRA 6.2.3
  • vSphere Replication 6.1
  • SRM 6.1
  • Cloud Client 3.4.1

In this blog post I will not cover the recovery of the vRA/vRO components themselves, but this could be achieved with a separate SRM instance for the management infrastructure.

Environment overview

I’m adding a short video to introduce the environment.

NSX Manager

The NSX manager in Site A will have the IP address of 192.168.110.15 and will be configured as primary.

The NSX Manager in Site B will be configured with the IP 192.168.210.15 and set as secondary.

Each NSX Manager pairs with its own vCenter and learns its local inventory. Any configuration change related to the cross-site deployment is made on the primary NSX Manager and is replicated automatically to the remote site.
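Before creating any universal objects it is worth confirming which manager holds which role. Below is a minimal Python sketch, assuming the NSX-V cross-vCenter sync role endpoint /api/2.0/universalsync/configuration/role and hypothetical lab credentials; verify both against your own environment and API guide.

```python
# Minimal sketch (not an official procedure): query the universal sync role of
# both NSX Managers over the REST API. The endpoint path and the lab credentials
# below are assumptions to verify against your NSX 6.2 environment.
import requests
import urllib3
from requests.auth import HTTPBasicAuth

urllib3.disable_warnings()  # lab only: NSX Manager uses a self-signed certificate

NSX_MANAGERS = {
    "Site A": "192.168.110.15",  # expected role: PRIMARY
    "Site B": "192.168.210.15",  # expected role: SECONDARY
}
AUTH = HTTPBasicAuth("admin", "VMware1!")  # hypothetical lab credentials

for site, ip in NSX_MANAGERS.items():
    url = f"https://{ip}/api/2.0/universalsync/configuration/role"
    resp = requests.get(url, auth=AUTH, verify=False)
    resp.raise_for_status()
    print(f"{site} NSX Manager ({ip}): {resp.text.strip()}")
```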

 

Universal Logical Switch (ULS)

Creating logical switches (L2) between sites with VXLAN is not new to NSX, however starting from version 6.2.x we’ve introduced the ability to stretch L2 between NSX Managers paired with different vCenters. This new logical switch is known as a ‘Universal Logical Switch’ or ‘ULS’. Any new ULS we create on the primary NSX Manager is synced to the secondary.

I’ve created the following ULS in my Demo vPOD:

Universal Logical Switch (ULS)
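The same switches can also be created through the REST API of the primary NSX Manager. The sketch below is a hedged example: the universal transport zone scope ID (“universalvdnscope-1”), credentials and switch name are assumptions for this lab, and the /api/2.0/vdn/scopes/{scopeId}/virtualwires endpoint should be checked against your NSX API guide.

```python
# Minimal sketch: create a Universal Logical Switch (ULS) on the primary NSX Manager.
# The universal transport zone scope ID below is a placeholder -- look it up under
# Installation > Logical Network Preparation or via GET /api/2.0/vdn/scopes.
import requests
import urllib3
from requests.auth import HTTPBasicAuth

urllib3.disable_warnings()  # lab only: self-signed certificate

PRIMARY_NSX = "192.168.110.15"
UNIVERSAL_TZ = "universalvdnscope-1"       # assumption: your universal TZ scope ID
AUTH = HTTPBasicAuth("admin", "VMware1!")  # hypothetical lab credentials

payload = """
<virtualWireCreateSpec>
    <name>ULS_Green_Web-A</name>
    <description>Green web tier, Site A ingress/egress</description>
    <tenantId>default</tenantId>
</virtualWireCreateSpec>
"""

url = f"https://{PRIMARY_NSX}/api/2.0/vdn/scopes/{UNIVERSAL_TZ}/virtualwires"
resp = requests.post(url, data=payload, auth=AUTH, verify=False,
                     headers={"Content-Type": "application/xml"})
resp.raise_for_status()
print("Created ULS, virtualwire ID:", resp.text.strip())  # e.g. universalwire-N
```

Because the switch is created in the universal transport zone on the primary NSX Manager, it appears automatically on the secondary shortly afterwards.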

Universal Distributed Logical Router (UDLR)

The concept of a Distributed Logical Router is still the same as it was before NSX 6.2.x. The new functionality added in this release allows us to configure a Universal Distributed Logical Router (UDLR). When we deploy a UDLR, it shows up in the Universal Transport Zone of all NSX Managers.

The following UDLR was created:

Universal Distributed Logical Router (UDLR)

Universal Security Policy with Distributed Firewall (UDFW)

With version 6.2.x we’ve introduced the universal security group and the universal IP set.

Any firewall rule configured in the Universal Section must use IP sets, or security groups that contain IP sets, as its source and destination objects.

When we configure or change a universal policy, a sync process automatically runs from the primary to the secondary NSX Manager.

The recommended way to work with an IP set is to add it to a universal security group.

The following universal security policy is an example that allows communication with a 3-tier application. The policy is built from universal security groups; each group contains an IP set with the relevant IP addresses for its tier.

Universal Security Policy with Distributed Firewall (UDFW)
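For completeness, here is a hedged sketch of how the same building blocks (a universal IP set wrapped in a universal security group) could be created via the REST API. The “universalroot-0” scope, the endpoints and the web-tier addresses are assumptions to validate against your NSX API guide.

```python
# Minimal sketch: create a universal IP set and a universal security group that
# contains it, on the primary NSX Manager. Endpoints and the "universalroot-0"
# scope are assumptions; the IP addresses are illustrative only.
import requests
import urllib3
from requests.auth import HTTPBasicAuth

urllib3.disable_warnings()  # lab only: self-signed certificate

PRIMARY_NSX = "192.168.110.15"
SCOPE = "universalroot-0"                  # assumption: scope used for universal objects
AUTH = HTTPBasicAuth("admin", "VMware1!")  # hypothetical lab credentials
HEADERS = {"Content-Type": "application/xml"}
BASE = f"https://{PRIMARY_NSX}/api/2.0/services"

# 1. Universal IP set covering the green web tier (example addresses).
ipset_xml = """
<ipset>
    <name>U-IPSET-Green-Web</name>
    <value>172.16.10.11,172.16.10.12</value>
</ipset>
"""
resp = requests.post(f"{BASE}/ipset/{SCOPE}", data=ipset_xml,
                     auth=AUTH, headers=HEADERS, verify=False)
resp.raise_for_status()
ipset_id = resp.text.strip()               # e.g. ipset-N

# 2. Universal security group with the IP set as its member, for use in the
#    universal firewall section.
sg_xml = f"""
<securitygroup>
    <name>U-SG-Green-Web</name>
    <description>Universal SG for the green web tier</description>
    <member><objectId>{ipset_id}</objectId></member>
</securitygroup>
"""
resp = requests.post(f"{BASE}/securitygroup/bulk/{SCOPE}", data=sg_xml,
                     auth=AUTH, headers=HEADERS, verify=False)
resp.raise_for_status()
print("Created universal security group:", resp.text.strip())
```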

vRA

On the automation side we’re creating two unique machine blueprints (MBPs), one per site. The MBPs are based on a classic CentOS image that allows us to perform some connectivity tests.

The MBP named “Center-Site_A” will be deployed by vRA to Site A into the green ULS named: ULS_Green_Web-A.

The IP address pool configured for this ULS is 172.16.10.0/24.

The MBP named “Center-Site_B” will be deployed by vRA to Site B into the blue ULS named: ULS_Blue_Web-B.

The IP address pool configured for this ULS is 172.17.10.0/24

vRA Catalog

Cloud Client:

To quote from VMware Official documentation:

“Typically, a vSphere hosted VM managed by vRA belongs to a reservation, which belongs to a compute resource (cluster), which in turn belongs to a vSphere Endpoint. The VMs reservation in vRA needs to be accurate in order for vRA to know which vSphere proxy agent to utilize to manage that VM in the underlying vSphere infrastructure. This is all well and good and causes few (if any) problems in a single site setup, as the VM will not normally move from the vSphere endpoint it is originally located on.

With a multi-site deployment utilizing Site Recovery Manager all this changes as part of the site to site fail over process involves moving VMs from one vCenter to another. This has the effect in vRA of moving the VM to a different endpoint, but the reservation becomes stale. As a result it becomes no longer possible to perform day 2 operation on the VMs until the reservation is updated.”

When we fail over VMs from Site A to Site B, Cloud Client runs the following actions behind the scenes to solve this challenge.

Process Flow for Planned Failover:

Process Flow for Planned Failover

The Conceptual Routing Design with Active/Active Datacenter

The key point of this design is to run workloads Active/Active in both datacenters.

The workloads will reside in both Site A and Site B. In the modern datacenter the entry point is protected by a perimeter firewall.

In our design each site has its own perimeter firewall running independently: FW_A located in Site A and FW_B located in Site B.
Site A (shown in green) runs its own ESGs (Edge Services Gateways), Universal DLR (UDLR) and Universal Logical Switches (ULS).

Site B (shown in blue) has its own, separate ESGs, UDLR and ULS.

The main reason for the separate ESGs, UDLR and ULS per site is to force a single ingress/egress point for workload traffic per site.

Without this deterministic ingress/egress traffic flow, we may face asymmetric routing between the two sites: ingress traffic enters via FW_A in Site A while egress traffic leaves via FW_B in Site B, and FW_B will drop this asymmetric traffic because it never saw the original session.

Note: The ESGs in this blog run in ECMP mode; as a consequence, the firewall service on the ESGs is turned off.

The green networks will always be advertised via FW_A. For example, the control VM (IP 192.168.110.10) shown in the figure below needs to access the green Web VM connected to ULS_Web_Green_A. The traffic from the client is routed via the Core router to FW_A, from there to one of the green ESGs working in ECMP mode, then to the green UDLR and finally to the green Web VM itself.

Now assume the same client would like to access the blue Web VM connected to ULS_Web_Blue_B. This traffic is routed via the Core router to FW_B, from there to one of the blue ESGs working in ECMP mode, then to the blue UDLR and finally to the blue Web VM itself.

Routing Design with Active/Active Datacenter

What is the issue with this design?

What happens if we face a complete failure of one of our Edge clusters or of FW_A?

For our scenario I’ve combined failures of the Green Edge cluster and FW_A in the image below.

In that case we lose all of our north-south traffic to every ULS behind the green Edge cluster.

As a result, all clients outside the SDDC immediately lose connectivity to all of the green ULS.

Please note: traffic forwarding to the blue ULS continues to work in this event, regardless of the failure in Site A.

 

PIC7

If we had a stretched vSphere Edge cluster between Site A and Site B, we would be able to leverage vSphere HA to restart the failed green ESGs in the remote blue site (this is not the case here; in our design each site has its own local cluster and storage). But even with vSphere HA, the restart process can take a few minutes. Another way to recover from this failure is to manually deploy green ESGs in Site B and connect them to FW_B; the recovery time of this option could also be a few minutes. Neither option is suitable for a modern datacenter design.

In the next paragraph I will introduce a new way to design the ESGs in an Active/Active datacenter architecture.

This design converges much faster and recovers more efficiently from such an event in Site A (or Site B).

Active/Active Datacenter with mirrored ESGs

In this design we deploy mirrored green ESGs in Site B and mirrored blue ESGs in Site A. Under normal datacenter operation the mirrored ESGs are up and running but do not forward traffic. Traffic from external clients to the Site A green ULS always enters via the Site A ESGs (E1-Green-A, E2-Green-A) for all Site A prefixes and leaves through the same point.

Adding the mirrored ESGs adds some complexity to the single ingress/egress design, but improves the convergence time after a failure.

PIC8

How Ingress Traffic Flow Works in This Design

Now we will explain how ingress traffic flows in this architecture with mirrored ESGs. To simplify the explanation we will focus only on the green flow in both datacenters and remove the blue components from the diagrams, but the same explanation applies to the blue Site B networks as well.

The Site A green UDLR control VM runs eBGP with all green ESGs (E1-Green-A to E4-Green-B). The UDLR redistributes all connected interfaces as Site A prefixes via eBGP. Note: “Site A prefixes” represent any green segments that are part of the green ULS.

The green ESGs (E1-Green-A to E4-Green-B) advertise Site A’s prefixes via BGP to both physical firewalls: FW_A located in Site A and FW_B located in Site B.

FW_B in Site B adds BGP AS-path prepending for the Site A prefixes.

From the Core router’s point of view there are two different paths to reach the Site A prefixes: one via FW_A (Site A) and the second via FW_B (Site B). Under normal operation this traffic flows only through Site A, because Site B prepends the AS path for the Site A prefixes.

PIC9

Egress Traffic

Egress traffic is handled by the UDLR control VM with different BGP weight values.

The Site A ESGs E1-Green-A and E2-Green-A have mirrored ESGs, E3-Green-B and E4-Green-B, located in Site B. The mirrored ESGs provide availability. Under normal operation the UDLR control VM always prefers to route traffic via E1-Green-A and E2-Green-A because of their higher BGP weight values. E3-Green-B and E4-Green-B do not forward any traffic and wait for E1/E2 to fail.
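To make this concrete, here is a hedged Python sketch of how those weights could be pushed to the UDLR over the NSX edge routing API. The edge ID, AS numbers, neighbour addresses and the exact weight values (60 for the local Site A ESGs, 30 for the mirrored Site B ESGs) are illustrative assumptions, as is the payload shape for /api/4.0/edges/{edgeId}/routing/config/bgp; check both against your NSX API guide.

```python
# Minimal sketch: configure the green UDLR's eBGP neighbours so the Site A ESGs
# (E1/E2) get a higher BGP weight than the mirrored Site B ESGs (E3/E4), and apply
# the aggressive keepalive/hold-down timers used in this design (1s / 3s).
# Edge ID, AS numbers, addresses and weights are placeholders.
import requests
import urllib3
from requests.auth import HTTPBasicAuth

urllib3.disable_warnings()  # lab only: self-signed certificate

PRIMARY_NSX = "192.168.110.15"
UDLR_ID = "edge-10"                        # hypothetical UDLR edge ID
AUTH = HTTPBasicAuth("admin", "VMware1!")  # hypothetical lab credentials


def neighbour(ip: str, weight: int) -> str:
    # On a (U)DLR each BGP neighbour also carries the uplink LIF's forwarding and
    # protocol addresses; the values below are placeholders for this lab.
    return f"""
    <bgpNeighbour>
        <ipAddress>{ip}</ipAddress>
        <remoteAS>65001</remoteAS>
        <forwardingAddress>192.168.5.2</forwardingAddress>
        <protocolAddress>192.168.5.3</protocolAddress>
        <weight>{weight}</weight>
        <keepAliveTimer>1</keepAliveTimer>
        <holdDownTimer>3</holdDownTimer>
    </bgpNeighbour>"""


neighbours = (
    neighbour("192.168.5.10", 60)    # E1-Green-A, Site A (preferred)
    + neighbour("192.168.5.11", 60)  # E2-Green-A, Site A (preferred)
    + neighbour("192.168.5.12", 30)  # E3-Green-B, Site B (standby path)
    + neighbour("192.168.5.13", 30)  # E4-Green-B, Site B (standby path)
)

bgp_xml = f"""
<bgp>
    <enabled>true</enabled>
    <localAS>65002</localAS>
    <bgpNeighbours>{neighbours}</bgpNeighbours>
</bgp>
"""

url = f"https://{PRIMARY_NSX}/api/4.0/edges/{UDLR_ID}/routing/config/bgp"
resp = requests.put(url, data=bgp_xml, auth=AUTH, verify=False,
                    headers={"Content-Type": "application/xml"})
resp.raise_for_status()
print("UDLR BGP configuration updated, HTTP", resp.status_code)
```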

In the figure below we can see a web workload running on the Site A ULS_Green_A initiating traffic towards the Core. This egress traffic passes through the DLR kernel module, through the E1-Green-A ESG, and is then forwarded to FW_A in Site A.

PIC10

There are other options for ingress/egress with NSX 6.2:

A great new feature called ‘Locale ID’ (local egress). Hany Michael wrote a blog post covering this option.

Note that Hany’s design does not include a perimeter firewall like mine does, so pay attention to a few minor differences.

http://www.networkskyx.com/2016/01/06/introducing-the-vmware-nsx-vlab-2-0/

Anthony Burke wrote a blog post about how to use Locale ID with a physical firewall:

https://networkinferno.net/ingress-optimisation-with-nsx-for-vsphere

Routing updates

Below we demonstrate the routing updates for Site A; the same mechanism applies to Site B. The Core router connected to FW_A in Site A peers with FW_A via eBGP.

The Core advertises a 0/0 default route.

FW_A performs eBGP peering with both E1-Green-A and E2-Green-A. FW_A forwards the 0/0 default route to the green ESGs and receives the Site A green prefixes from them. The green ESGs E1-Green-A and E2-Green-A peer via eBGP with the UDLR control VM.

The UDLR and the ESGs work in ECMP mode; as a result, the UDLR receives the 0/0 route from both ESGs. The UDLR redistributes its connected interfaces (LIFs) to both green ESGs.

We can work with iBGP, eBGP, or a mix of both between the UDLR -> ESGs -> physical routers.

To reduce the eBGP convergence time after a failure of the active UDLR control VM, we configure floating static routes on all of the green ESGs that point to the UDLR forwarding address for the internal LIF subnets.
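A hedged sketch of what that could look like via the edge routing API is below. The ESG ID, the green prefixes, the UDLR forwarding address and the administrative distance (anything higher than the BGP distance makes it a floating route) are placeholders; verify the /api/4.0/edges/{edgeId}/routing/config/static payload shape against your NSX API guide.

```python
# Minimal sketch: push floating static routes for the green internal (ULS) prefixes
# to a green ESG, with the UDLR forwarding address as next hop. These routes only
# take over if the BGP-learned routes are withdrawn (e.g. UDLR control VM failure),
# because their administrative distance is higher than BGP's.
import requests
import urllib3
from requests.auth import HTTPBasicAuth

urllib3.disable_warnings()  # lab only: self-signed certificate

PRIMARY_NSX = "192.168.110.15"
ESG_ID = "edge-11"                               # hypothetical ESG ID (repeat per green ESG)
UDLR_FORWARDING_ADDR = "192.168.5.2"             # hypothetical UDLR forwarding address
GREEN_PREFIXES = ["172.16.10.0/24", "172.16.20.0/24"]  # example green ULS subnets
AUTH = HTTPBasicAuth("admin", "VMware1!")        # hypothetical lab credentials

routes = "".join(
    f"""
    <route>
        <network>{prefix}</network>
        <nextHop>{UDLR_FORWARDING_ADDR}</nextHop>
        <adminDistance>240</adminDistance>
    </route>"""
    for prefix in GREEN_PREFIXES
)

static_xml = f"<staticRouting><staticRoutes>{routes}</staticRoutes></staticRouting>"

url = f"https://{PRIMARY_NSX}/api/4.0/edges/{ESG_ID}/routing/config/static"
resp = requests.put(url, data=static_xml, auth=AUTH, verify=False,
                    headers={"Content-Type": "application/xml"})
resp.raise_for_status()
print(f"Floating static routes pushed to {ESG_ID}")
```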

Routing filters are applied on all ESGs to prevent unwanted prefix advertisements and to keep the ESGs from becoming transit gateways.

PIC11

Failure of One Green ESG in Site A

The green ESGs E1-Green-A and E2-Green-A work in ECMP mode. From the UDLR’s and FW_A’s point of view, both ESGs work in Active/Active mode.

As long as we have at least one active green ESG in Site A, the green UDLR and the Core router will always prefer the Site A green ESGs.

Let’s assume we have an active traffic flow from the green web VM in Site A to an external client behind the Core router, and this traffic initially passes through E1-Green-A. In the event of a failure of the E1-Green-A ESG, the UDLR reroutes the traffic via E2-Green-A, because this ESG has a better weight than the green ESGs in Site B (E3-Green-B and E4-Green-B).

FW_A still advertises a better AS path for the ‘ULS_Web_Green_A’ prefixes than FW_B (remember FW_B always prepends the Site A prefixes).

We’ll use aggressive BGP timer settings (keepalive = 1 sec, hold down = 3 sec) to improve BGP routing convergence.

 

PIC12

Complete Edge cluster failure in Site A

In this scenario we face a failure of the entire Edge cluster in Site A (green ESGs and blue ESGs); this failure might also include FW_A.

The Core router no longer receives any BGP updates from Site A, so it prefers the path via FW_B to reach the Site A prefixes.

From the UDLR’s point of view there aren’t any working green ESGs left in Site A, so the UDLR works with the remaining green ESGs in Site B (E3-Green-B, E4-Green-B).

Traffic initiated from the external client is rerouted via the mirrored green ESGs (E3-Green-B and E4-Green-B) to the green ULS in Site B. This rerouting happens very quickly thanks to the aggressive BGP timer settings (keepalive = 1 sec, hold down = 3 sec).

This solution is much faster than the other options mentioned before.

The same recovery mechanism exists for a failure in the Site B datacenter.

PIC13

Note: The Green UDLR control VM was deployed to the payload cluster and isn’t affected by this failure.

 

Complete Site A failure:

In this catastrophic scenario all components in Site A have failed, including the management infrastructure (vCenter, NSX Manager, controllers, ESGs and the UDLR control VM). The green workloads face an outage until they are recovered in Site B; the blue workloads continue to work without any interruption.

The recovery procedure for this event covers both the management/control plane components and the workloads themselves.

Recovering the management/control plane:

  • Log in to the secondary NSX Manager and promote it to primary using “Assign Primary Role”.
  • Deploy a new Universal Controller Cluster and synchronize all universal objects.
  • The Universal Controller Cluster configuration is pushed to the ESXi hosts managed by the newly promoted NSX Manager.
  • Redeploy the UDLR control VM.

The recovery procedure for the workloads is to run the recovery plan from the SRM instance located in Site B.

PIC14

 

Summary:

In this blog post we demonstrated the power of NSX to create an Active/Active datacenter that can recover very quickly from many failure scenarios.

  • We showed how NSX simplifies the Disaster Recovery process.
  • NSX and SRM integration is a reasonable approach to DR where we can't use a stretched vSphere cluster.
  • NSX works in Cross-vCenter mode. Dual vCenters and NSX Managers improve our availability; even in the event of a complete site failure we were able to continue working immediately in our management layer (the secondary NSX Manager and vCenter are up and running).
  • In this design, half of our environment (the blue segments) wasn't affected by the complete site failure. SRM recovered our failed green workloads without any need to change our Layer 2/Layer 3 network topology.
  • We did not use any specific hardware to achieve our BCDR, and we were 100% decoupled from the physical layer.
  • With SRM and vRO we were able to protect any deployed VM from Day 0.

 

I would like to thank:

Daniel Bakshi, who helped me a lot by reviewing this blog post.

Thanks also to Boris Kovalev and Tal Moran, who helped with the vRA/vRO demo vPOD.

 

 

 

NSX Service Composer: Methodology Concept

Background

Recently, in one of my NSX projects, I was asked by the customer to develop a flexible yet simple-to-use security methodology for working with NSX Service Composer.

The focus was to build the right construct of security groups and security policies based on the following requirements:

  • The customer owns different environment types: Dev, Pre-Prod and Prod. Each environment requires a different security policy.
  • The customer would like to avoid creating specific blocking rules between the environments, since such deny rules cause operational complexity. The security policy should be based on specific allow rules, while all other traffic is blocked by the final cleanup rule.
  • Minimize the human error of connecting a workload to the wrong security group, which may give it unwanted access. For example, connecting a Prod workload to a Dev security group.
  • The customer would like to protect 3-tier applications (Web, App and DB); these applications run on three vSphere clusters (Dev, Pre-Prod and Prod).

 

To achieve the customer requirements, we will build a security concept that is based on the NSX Service Composer.

We will demonstrate the implementation of security policies and security groups to protect a 3-tier application, based on the assumption that the application runs on the Pre-Prod cluster; the same concept applies to any cluster.

 

Security Level Concept:

We will use the concept of “Security Levels” to differentiate between firewall rules granularly.

Each level has a different firewall access policy, starting from zero (no access) up to the highest level (application access).

Level-1 Basic Security groups (SG)

Level 1 (L1) security groups are used as the building blocks for the firewall rules; we do not apply any security policy directly to Level 1 security groups.

The following security groups are created at Level 1:

Cluster SG

A Cluster Security Group represents a vSphere cluster.

In our example: Dev, Pre-Prod and Prod. (Some customers may have only two clusters, e.g. Dev and Prod, or a similar scenario.)

For each vSphere cluster we will have a dedicated security group. Any deployed VM will be included automatically and dynamically in the relevant Cluster security group.

For example: any VM from the Pre-Prod cluster will be included in the “SG-L1-CL-Pre-Prod” security group.

Picture1

By creating this dynamic membership criterion, we have eliminated the need for manual human action.
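As an illustration of how little manual work this needs, the sketch below creates the Level 1 cluster security group via the REST API and adds the vSphere cluster object itself as a member, so membership follows cluster placement automatically. The cluster managed object reference and the credentials are assumptions to replace with your own values.

```python
# Minimal sketch: create SG-L1-CL-Pre-Prod and make the Pre-Prod vSphere cluster
# object its member, so every VM deployed to that cluster is picked up automatically.
# The cluster moref ("domain-c...") is environment specific and illustrative here.
import requests
import urllib3
from requests.auth import HTTPBasicAuth

urllib3.disable_warnings()  # lab only: self-signed certificate

NSX_MANAGER = "192.168.110.15"
CLUSTER_MOREF = "domain-c26"               # hypothetical Pre-Prod cluster moref
AUTH = HTTPBasicAuth("admin", "VMware1!")  # hypothetical lab credentials

sg_xml = f"""
<securitygroup>
    <name>SG-L1-CL-Pre-Prod</name>
    <description>Level 1 - all VMs in the Pre-Prod vSphere cluster</description>
    <member><objectId>{CLUSTER_MOREF}</objectId></member>
</securitygroup>
"""

url = f"https://{NSX_MANAGER}/api/2.0/services/securitygroup/bulk/globalroot-0"
resp = requests.post(url, data=sg_xml, auth=AUTH, verify=False,
                     headers={"Content-Type": "application/xml"})
resp.raise_for_status()
print("Created cluster security group:", resp.text.strip())
```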

We can leverage this ability further and create security policy levels on top of this one.

By doing that we reduce the human error factor of connecting virtual machines to the wrong security groups and, as a result, granting them dangerous, unwanted security access.

At Level 1, a machine is a member only of the cluster security group representing its vSphere cluster, and it does not get any dFW rules at this level.

Environment Security Group

This Security Group represents the environment that the machine belongs to.

For example: We might have a different Prod env for IT, R&D, and Sales.

This is very useful when we want to say that a machine is “Prod” and running in “R&D” env rather than “Sales”.

Env-L1 could represent different departments or projects in the company according to the way you build your infrastructure.

For example we will create a security group called “SG-L1-EN-R&D” to represent any machine owned by R&D.

The membership criterion in this example is a security tag called “L1-ST-ENV-R&D”.

Picture2
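For reference, the same tag and group could be created programmatically. In the sketch below the dynamic-criteria key VM.SECURITY_TAG, the “contains” operator and the endpoints are assumptions to confirm against the NSX API guide for your version; note that the ampersand in “R&D” must be XML-escaped.

```python
# Minimal sketch: create the "L1-ST-ENV-R&D" security tag and the SG-L1-EN-R&D
# security group whose dynamic membership matches that tag. Key/criteria strings
# for the dynamic criterion are assumptions to verify against your NSX API guide.
import requests
import urllib3
from requests.auth import HTTPBasicAuth

urllib3.disable_warnings()  # lab only: self-signed certificate

NSX_MANAGER = "192.168.110.15"
AUTH = HTTPBasicAuth("admin", "VMware1!")  # hypothetical lab credentials
HEADERS = {"Content-Type": "application/xml"}

# 1. Create the security tag (the "&" in R&D is XML-escaped as &amp;).
tag_xml = """
<securityTag>
    <objectTypeName>SecurityTag</objectTypeName>
    <type><typeName>SecurityTag</typeName></type>
    <name>L1-ST-ENV-R&amp;D</name>
    <description>Level 1 environment tag - R&amp;D</description>
</securityTag>
"""
resp = requests.post(f"https://{NSX_MANAGER}/api/2.0/services/securitytags/tag",
                     data=tag_xml, auth=AUTH, headers=HEADERS, verify=False)
resp.raise_for_status()

# 2. Create the environment security group with a dynamic criterion on that tag.
sg_xml = """
<securitygroup>
    <name>SG-L1-EN-R&amp;D</name>
    <description>Level 1 - any machine owned by R&amp;D</description>
    <dynamicMemberDefinition>
        <dynamicSet>
            <operator>OR</operator>
            <dynamicCriteria>
                <operator>OR</operator>
                <key>VM.SECURITY_TAG</key>
                <criteria>contains</criteria>
                <value>L1-ST-ENV-R&amp;D</value>
            </dynamicCriteria>
        </dynamicSet>
    </dynamicMemberDefinition>
</securitygroup>
"""
resp = requests.post(f"https://{NSX_MANAGER}/api/2.0/services/securitygroup/bulk/globalroot-0",
                     data=sg_xml, auth=AUTH, headers=HEADERS, verify=False)
resp.raise_for_status()
print("Created security group:", resp.text.strip())
```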

Application Security Group

The Application Security Group indicates the application installed on the machine (for example: Web, App or DB).

Virtual Machines will be assigned to this group by NSX security tag.

An example is a security group named “SG-L1-Web” with a match criterion of the security tag “L1-ST-APP-Web”.

Picture3
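Attaching the tag to a VM (whether by hand, by vRA/vRO during provisioning, or via a script) is what actually drops the machine into the group. A hedged sketch follows; the VM managed object ID is a placeholder and the tag endpoints should be confirmed against your NSX API guide.

```python
# Minimal sketch: attach the "L1-ST-APP-Web" security tag to a VM so it lands in
# SG-L1-Web automatically. The VM moref ("vm-...") is a placeholder taken from vCenter.
import xml.etree.ElementTree as ET

import requests
import urllib3
from requests.auth import HTTPBasicAuth

urllib3.disable_warnings()  # lab only: self-signed certificate

NSX_MANAGER = "192.168.110.15"
TAG_NAME = "L1-ST-APP-Web"
VM_MOID = "vm-123"                         # hypothetical VM moref from vCenter
AUTH = HTTPBasicAuth("admin", "VMware1!")  # hypothetical lab credentials

# 1. Resolve the tag name to its NSX object ID.
resp = requests.get(f"https://{NSX_MANAGER}/api/2.0/services/securitytags/tag",
                    auth=AUTH, verify=False)
resp.raise_for_status()
tag_id = next(tag.findtext("objectId")
              for tag in ET.fromstring(resp.content).iter("securityTag")
              if tag.findtext("name") == TAG_NAME)

# 2. Attach the tag to the VM; the VM then matches SG-L1-Web's dynamic criterion.
resp = requests.put(f"https://{NSX_MANAGER}/api/2.0/services/securitytags/tag/{tag_id}/vm/{VM_MOID}",
                    auth=AUTH, verify=False)
resp.raise_for_status()
print(f"Tagged {VM_MOID} with {TAG_NAME}")
```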

The full list of Level 1 security tags is shown in the image below:

Picture4

Level 1 Security Groups Building Blocks Concept:

Level 1 Security Groups Building Blocks Concept

Level 2 Infrastructure Security Group

Infrastructure rules allow machines to reach common system services such as Active Directory, DNS, antivirus, and the various agents and services that manage the environment.

Combining the L1 cluster and L1 environment security groups (with a logical AND) forms the “Infrastructure” security group at Level 2, which allows the virtual machines to receive the infrastructure security policy based on their role.

For example, we will create a Level 2 security group called “SG-L2-INF-R&D”; this SG represents virtual machines that are members of the “SG-L1-CL-Pre-Prod” SG AND of the “SG-L1-EN-R&D” environment SG.

The match criteria are security groups themselves; we call this nesting security groups.

Picture6
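A hedged sketch of the nested group as an API call is shown below. The “Entity belongs to” dynamic criterion (key ENTITY, criteria belongs_to, value = the Level 1 group’s object ID) is an assumption about how the UI selection is encoded in XML, and the Level 1 group IDs are placeholders; verify both against the NSX API guide before using.

```python
# Minimal sketch: create SG-L2-INF-R&D whose dynamic membership requires a VM to
# belong to BOTH Level 1 groups (Pre-Prod cluster AND R&D environment).
# The ENTITY/belongs_to key-criteria pair and the group IDs are assumptions.
import requests
import urllib3
from requests.auth import HTTPBasicAuth

urllib3.disable_warnings()  # lab only: self-signed certificate

NSX_MANAGER = "192.168.110.15"
AUTH = HTTPBasicAuth("admin", "VMware1!")  # hypothetical lab credentials
SG_L1_CLUSTER = "securitygroup-10"         # placeholder: objectId of SG-L1-CL-Pre-Prod
SG_L1_ENV = "securitygroup-11"             # placeholder: objectId of SG-L1-EN-R&D

sg_xml = f"""
<securitygroup>
    <name>SG-L2-INF-R&amp;D</name>
    <description>Level 2 - Pre-Prod cluster AND R&amp;D environment</description>
    <dynamicMemberDefinition>
        <dynamicSet>
            <operator>OR</operator>
            <dynamicCriteria>
                <operator>AND</operator>
                <key>ENTITY</key>
                <criteria>belongs_to</criteria>
                <value>{SG_L1_CLUSTER}</value>
            </dynamicCriteria>
            <dynamicCriteria>
                <operator>AND</operator>
                <key>ENTITY</key>
                <criteria>belongs_to</criteria>
                <value>{SG_L1_ENV}</value>
            </dynamicCriteria>
        </dynamicSet>
    </dynamicMemberDefinition>
</securitygroup>
"""

url = f"https://{NSX_MANAGER}/api/2.0/services/securitygroup/bulk/globalroot-0"
resp = requests.post(url, data=sg_xml, auth=AUTH, verify=False,
                     headers={"Content-Type": "application/xml"})
resp.raise_for_status()
print("Created nested security group:", resp.text.strip())
```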

The result of adding Level 2 security on top of Level 1 security is illustrated in the following diagram:

Level 2 security

 

Level 3 Application Security Group

Level 3 security groups are used for the application access level. Security groups at this level combine a Level 2 infrastructure SG AND a Level 1 application SG.

For example, we will create the security group “SG-L3-Pre-Prod-WEB” with dynamic membership criteria matching the L1 security group “SG-L1-WEB” AND the security group “SG-L2-INF-R&D”:

Picture8

The relation between the different security groups is illustrated in the next diagram:

Level 3

For example, take a web VM from the web tier. This VM was deployed to the Pre-Prod cluster; as a result it automatically belongs to “SG-L1-CL-Pre-Prod”.

The VM got the NSX security tag “L1-ST-EN-Pre-Prod” and as a result it becomes a member of the security group “SG-L1-EN-Pre-Prod”.

The VM was also tagged with the NSX security tag “L1-ST-APP-Web” and is now a member of the security group “SG-L1-APP-WEB”.

Because of its membership in both security groups “SG-L1-CL-Pre-Prod” AND “SG-L1-EN-Pre-Prod”, this VM is automatically a member of the Level 2 security group “SG-L2-EN-Pre-Prod”.

As a result of the VM being a member of both “SG-L2-EN-Pre-Prod” and “SG-L1-APP-WEB”, it automatically becomes a member of “SG-L3-Pre-Prod-WEB”.

Please note we’re demonstrating just the Web tier here, but the same concept applies to the App and DB tiers.

Service Composer Security Policy

The security policy for L2 infrastructure workloads in Service Composer includes generic firewall rules that give workloads infrastructure system connectivity such as DNS, AD, antivirus, etc.

For example, a Level 2 security policy named “L2-SP-INF-R&D” contains four example firewall rules:

Picture10

Then we’ll apply the “L2-SP-INF-R&D” security policy to the “SG-L2-INF-R&D” security group.
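For those who prefer automation over the UI, here is a heavily hedged sketch of creating a Service Composer security policy and binding it to the security group in one call. The /api/2.0/services/policy/securitypolicy endpoint, the firewallSecurityAction element names, the precedence value and the service object ID are all assumptions to check against the NSX API guide for your version.

```python
# Minimal sketch: create a Service Composer security policy with a single allow
# rule and bind it to SG-L2-INF-R&D. Element names, the precedence value and the
# service (application) object ID are placeholders/assumptions -- validate the
# payload against your NSX API guide before use.
import requests
import urllib3
from requests.auth import HTTPBasicAuth

urllib3.disable_warnings()  # lab only: self-signed certificate

NSX_MANAGER = "192.168.110.15"
AUTH = HTTPBasicAuth("admin", "VMware1!")  # hypothetical lab credentials
SG_L2_INF = "securitygroup-20"             # placeholder: objectId of SG-L2-INF-R&D
DNS_SERVICE = "application-30"             # placeholder: objectId of a DNS service object

policy_xml = f"""
<securityPolicy>
    <name>L2-SP-INF-R&amp;D</name>
    <description>Level 2 infrastructure policy - R&amp;D</description>
    <precedence>5000</precedence>
    <actionsByCategory>
        <category>firewall</category>
        <action class="firewallSecurityAction">
            <name>Allow DNS</name>
            <category>firewall</category>
            <action>allow</action>
            <direction>outbound</direction>
            <isEnabled>true</isEnabled>
            <applications>
                <application><objectId>{DNS_SERVICE}</objectId></application>
            </applications>
        </action>
    </actionsByCategory>
    <securityGroupBinding>
        <objectId>{SG_L2_INF}</objectId>
    </securityGroupBinding>
</securityPolicy>
"""

url = f"https://{NSX_MANAGER}/api/2.0/services/policy/securitypolicy"
resp = requests.post(url, data=policy_xml, auth=AUTH, verify=False,
                     headers={"Content-Type": "application/xml"})
resp.raise_for_status()
print("Created and bound security policy:", resp.text.strip())
```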

Service Composer Application Security Policy

The security policy for application workloads in Service Composer includes the firewall policy for the application itself. For example, the web tier security policy, called “L3-SP-R&D-WEB”, contains one firewall rule:

Picture11

Then we will apply the “L3-SP-R&D-WEB” security policy to the “SG-L3-R&D-WEB” security group.

An example of a security policy that allows the Web tier to talk to the App tier over the Tomcat service:

Picture12

Then we’ll apply the “L3-SP-R&D-APP” security policy to the “SG-L3-R&D-APP” security group.

An example of a security policy that allows the App tier to talk to the DB tier over MySQL:

Picture13

Then we will apply the “L3-SP-R&D-DB” security policy to the “SG-L3-R&D-DB” security group.

Starting with NSX version 6.2.x we can enable the “Applied To” feature to automatically enforce the security policy on the security group objects instead of the default Distributed Firewall scope (which applies the policy everywhere).

This is a great feature that helps us avoid “spamming” objects with unrelated dFW rules, making the system more efficient.

To enable this feature, follow these steps: in Service Composer, click “Edit Policy Firewall Settings”.

Picture14

Then select the “Policy Security Group” checkbox instead of “Distributed Firewall”.

Picture15

We can view the resulting Service Composer firewall rules in the “Firewall” tab:

Picture16

To demonstrate the effective security policy combined with the different security levels, let’s look at ‘Monitor -> Service Composer’ at the VM object level.

Here is a screenshot of the Web-02a VM, which is part of the Web tier:

Picture17

The effective security policy for web-02a can be verified under Monitor -> Service Composer tab in the web client:

Picture23

The effective security policy for App-02a can be shown in the Monitor -> Service Composer tab:

Picture24

The effective security policy for DB-02a can be shown in the Monitor -> Service Composer tab:

Picture25

To recap the complete Security Group list:

Picture26

The complete Security Policy list:

Picture27

I would like to thank Daniel Bakshi for reviewing this blog post.

 

Reference blogs post by my colleagues:

Sean Howard wrote a great blog post on the Service Composer concept:

http://nsxperts.com/?p=65

Anthony Burke also covered the Service Composer subject:

https://networkinferno.net/service-composer-security-groups-and-security-tags

And a post about Service Composer on the official VMware blog, written by Romain Decker:

https://blogs.vmware.com/consulting/2015/01/automating-security-policy-enforcement-nsx-service-composer.html