NSX – Distributed Logical Router Deep Dive

Overview

In today's data center, routing is essential to any workable network design, and we need to provide the same functionality in the virtual network. Routing between IP subnets can be performed in the logical space without traffic leaving the hypervisor for a physical router. This routing is performed in the hypervisor kernel with minimal CPU and memory overhead, providing an optimal data path for routing traffic within the virtual infrastructure. The distributed routing capability of the NSX-v platform offers an optimized and scalable way of handling East-West traffic, that is, communication between virtual machines inside the data center. The amount of East-West traffic in the data center keeps growing, as modern collaborative, distributed, and service-oriented application architectures demand ever more bandwidth for server-to-server communication.

If these servers are virtual machines running on hypervisors and connected to different subnets, the communication between them has to go through a router. If a physical router provides that routing, the traffic has to leave the hypervisor, reach the physical router, and come back in after the routing decision has been made. This is clearly not an optimal traffic flow and is often referred to as "hair-pinning".

Distributed routing on the NSX-v platform prevents this hair-pinning by providing routing functionality at the hypervisor level. Each hypervisor has a routing kernel module that performs routing between the Logical Interfaces (LIFs) defined on that distributed router instance.

The distributed logical router owns and manages its logical interfaces (LIFs). A LIF is similar in concept to a VLAN interface on a physical router, but on the distributed logical router the interfaces are called LIFs. LIFs connect to logical switches or distributed port groups. A single distributed logical router can have a maximum of 1,000 LIFs.
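
A quick way to see these constructs on a prepared host is the net-vdr utility in the ESXi shell. The snippet below is only a sketch: the instance name default+edge-1 is an example, and the exact option syntax can vary slightly between NSX-v builds (net-vdr --help lists the options on your host).

List every DLR instance present on this host:

net-vdr --instance -l

List the LIFs (IP address, vMAC, VXLAN) of one instance:

net-vdr --lif -l default+edge-1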

DLR Overview

DLR Interface Types

The DLR has three types of interfaces: Uplink, LIF, and Management.

Uplink: Used by the DLR Control VM to connect to the upstream router. In most documentation it is also referred to as the "transit" interface, because it is the transit between the logical space and the physical space. The DLR supports OSPF and BGP on its uplink interface, but cannot run both at the same time, and OSPF can be enabled on only a single uplink interface.

LIFs: LIFs exist at the kernel level on each ESXi host; they are the Layer 3 interfaces that act as the default gateway for all VMs connected to logical switches.

Management: The DLR management interface can serve several purposes. The first is remote access to the DLR Control VM, for example over SSH. Another use case is High Availability, and a third is sending syslog information to a syslog server. The management interface is part of the Control VM's routing table; there is no separate routing table for it. When we configure an IP address on the management interface, only devices on the same subnet as the management subnet can reach the DLR Control VM management IP; remote devices will not be able to reach it.

DLR Interface Type

Note: If we just need an IP address to manage the DLR remotely, we can SSH to the DLR "Protocol Address" (explained later in this chapter); there is no need to configure a new IP address on the management interface.

Logical Interfaces, Virtual MACs, and the Physical MAC

Logical Interfaces (LIFs), including their IP addresses, live in the DLR kernel module inside the ESXi host. Each LIF has an associated MAC address called the virtual MAC (vMAC). The vMAC is the MAC address of the LIF; it is identical across all ESXi hosts and is never seen by the physical network, only by virtual machines. The virtual machines use the vMAC as their default gateway MAC address. The physical MAC (pMAC) is the MAC address of the uplink through which traffic flows to the physical network; when the DLR needs to route traffic outside of the ESXi host, the pMAC address is used.

In the following figure, the ESXi host esxcomp-01a runs the DLR kernel module, and this DLR instance has two LIFs, each associated with a logical switch (VXLAN 5001 and 5002). From the perspective of VM1, the default gateway is LIF1 with IP address 172.16.10.1, while VM2's default gateway is LIF2 with IP address 172.16.20.1; the vMAC is the same MAC address for both LIFs.

The LIF IP addresses and the vMAC are the same across all NSX-v prepared hosts for a given DLR instance.
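
As a quick illustration from the guest side (a sketch assuming a Linux VM on VXLAN 5001; 02:50:56:56:44:52 is the well-known default vMAC used by NSX-v, but verify the value in your own deployment), the gateway MAC the VM resolves is the vMAC, no matter which host the VM runs on:

$ ip route | grep default
default via 172.16.10.1 dev eth0

$ ip neigh show 172.16.10.1
172.16.10.1 dev eth0 lladdr 02:50:56:56:44:52 REACHABLE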

DLR and vMotion

When VM2 is vMotioned from esxcomp-01a to esxcomp-01b, VM2 still has the same default gateway (LIF2), associated with the same vMAC, so from the perspective of VM2 nothing has changed.

 

DLR Kernel module and ARP table

The DLR does not query the NSX-v Controller to learn the MAC addresses of VMs. Instead, it sends an ARP request to all the ESXi host VTEPs that are members of that logical switch. The VTEPs that receive this ARP request forward it to the VMs on that logical switch.

In the following figure, when VM1 needs to communicate with VM2, the traffic is routed inside the DLR kernel module at esxcomp-01a, so this DLR needs to know the MAC addresses of VM1 and VM2. The DLR therefore sends an ARP request to all VTEP members of VXLAN 5002 to learn the MAC address of VM2. The DLR keeps the resulting ARP entry for 600 seconds, which is its aging time.

DLR Kernel module and ARP table

Note: The DLR instance may have different ARP entries between different ESXi hosts. Each DLR Kernel module maintains its own ARP table.
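
The per-host ARP cache of a DLR instance can be dumped directly in the ESXi shell (again a sketch, with default+edge-1 as an example instance name); running it on two different hosts will typically show different entries:

net-vdr --nbr -l default+edge-1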

DLR and local routing

Since the DLR instance is distributed, each ESXi host has a route-instance that can route traffic. When VM1 needs to send traffic to VM2, theoretically the DLR in either esxcomp-01a or esxcomp-01b could route it, as shown in the following figure. In NSX-v, the DLR always performs the routing locally, on the host where the VM traffic originates.

When VM1 sends a packet to VM2, the DLR in esxcomp-01a will route the traffic from VXLAN 5001 to VXLAN 5002 because VM1 has initiated the traffic.

DLR Local Routing

The following illustration shows that when VM2 replies to VM1, the DLR at esxcomp-01b routes the traffic, because the reply originates on esxcomp-01b.

Note: the actual traffic between the ESXi hosts flows via the VTEPs.

DLR Local Routing
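
A simple way to see the local routing from the guest side is a traceroute from VM1 to VM2. The output below is illustrative only, assuming VM1 is 172.16.10.11 and VM2 is 172.16.20.11; the single intermediate hop is the local LIF:

$ traceroute -n 172.16.20.11
 1  172.16.10.1  0.312 ms
 2  172.16.20.11  0.655 ms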

Multiple Route Instances

The Distributed Logical Router (DLR) has two components: the DLR Control VM, which is a virtual machine, and the DLR kernel module, which runs in every ESXi hypervisor. The kernel module, called a route-instance, holds an identical copy of the information on each ESXi host and works at the kernel level. Every prepared ESXi host carries at least one route-instance, and a host is not limited to a single route-instance.

The following figure shows two DLR Control VMs, with DLR Control VM1 on the right and DLR Control VM2 on the left. Each Control VM has its own route-instance in the ESXi hosts. In esxcomp-01a we have route-instance 1, which is managed by DLR Control VM1, and route-instance 2, which is managed by Control VM2; the same applies to esxcomp-01b. Each DLR instance manages its own range of LIFs: DLR Control VM1 manages the LIFs in VXLAN 5001 and 5002, and DLR Control VM2 manages the LIFs in VXLAN 5003 and 5004.

Multiple Route Instances

Logical Router Port

Regardless of the number of route-instances inside an ESXi host, there is a single special port called the "Logical Router Port" or "vdr port".

This port works like a "router on a stick": all routed traffic passes through it. We can think of a route-instance as a VRF-lite instance, because each route-instance has its own LIFs and its own routing table, and LIF IP addresses can even overlap between instances.

The following figure shows an example of an ESXi host with two route-instances, where route-instance 1 uses the same LIF IP address as route-instance 2, but on a different VXLAN.

Note: Different DLRs cannot share the same VXLAN
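
Because each route-instance keeps an independent table, this overlap does no harm. It can be checked per instance on the host (a sketch; default+edge-1 and default+edge-2 are example instance names):

net-vdr --route -l default+edge-1

net-vdr --route -l default+edge-2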

DLR vdr port

Routing Information Control Plane Update Flow

We need to understand how a route is configured and pushed from the DLR control VM to the ESXi hosts. Let’s look at the following figure to understand the flow.

Step 1: An end user configures a new DLR Control VM. This DLR will have LIFs (Logical Interfaces) and either static routes or a dynamic routing protocol peering with the NSX-v Edge Services Gateway.

Step 2: The DLR LIF configuration is pushed to all ESXi hosts that have been prepared by the NSX-v platform. If more than one route-instance exists, the LIF information is sent only to the relevant instance.

At this point, VMs in different VXLANs (East-West traffic) can communicate with each other.

Step 3: The NSX-v Edge Services Gateway (ESG) updates the DLR Control VM with new routes.

Step 4: The DLR Control VM updates the NSX-v Controller (via the UWA) with its Routing Information Base (RIB).

Step 5: The NSX-v Controller then pushes the RIB to all ESXi hosts prepared by the NSX-v platform. If more than one route-instance exists, the RIB is sent only to the relevant instance.

Step 6: The route-instance on each ESXi host builds its Forwarding Information Base (FIB) and handles the data-path traffic.
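
To follow this flow in the lab, the RIB held by the controller can be compared with the FIB programmed on a host. This is only a sketch: the LR-Id (1460487505, shown in the verification section later in this chapter) and the instance name default+edge-1 are lab-specific examples, and the controller routes view sits alongside the instance and interface-summary commands used later.

On the NSX-v Controller:

show control-cluster logical-routers routes 1460487505

On an ESXi host:

net-vdr --route -l default+edge-1

Once the update has been pushed, both should list the same prefixes.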

Routing Information Control Plane Update Flow

DLR Control VM communications

The DLR Control VM is a virtual machine that is typically deployed in the Management or Edge cluster. When an ESXi host is prepared by the NSX-v platform, one of the VIBs creates the control-plane channel between the ESXi host and the NSX-v Controllers. The service daemon inside the ESXi host responsible for this channel is called netcpad, more commonly referred to as the User World Agent (UWA).

The netcpad daemon handles the communication between the NSX-v Controller and the ESXi host: it learns MAC/IP/VTEP address information and supports the VXLAN control plane. The communication with the NSX-v Controller is secured with SSL. The UWA can connect to multiple NSX-v Controller instances and maintains its log at /var/log/netcpa.log.
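
A few quick host-side checks confirm that the UWA is running and talking to the controllers over TCP/1234 (a sketch; the service script and log path are the ones shipped with NSX-v host preparation):

/etc/init.d/netcpad status

esxcli network ip connection list | grep 1234

tail /var/log/netcpa.log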

 

Another service daemon, called vShield-Stateful-Firewall, is responsible for interacting with the NSX-v Manager. It receives configuration information from the NSX-v Manager to create (or delete) the DLR Control VM and the ESG. Besides that, this daemon also performs NSX-v firewall tasks: it retrieves the DFW policy rules, gathers DFW statistics and sends them to the NSX-v Manager, and sends audit logs and other information to the NSX-v Manager. As part of host preparation, it also processes SSL-related tasks from the NSX-v Manager.

The DLR Control VM maintains two VMCI sockets to the user-world agents on the ESXi host where it resides. The first VMCI socket connects to the vShield-Stateful-Firewall service daemon and is used to receive configuration updates from the NSX-v Manager; the second connects to netcpad for control-plane access to the Controllers.

VMCI provides local communication: a guest virtual machine can talk to the hypervisor on which it resides, but not to other ESXi hosts.

On this basis the routing update happens in the following manner:

  • Step (1): The DLR Control VM learns new route information (from dynamic routing, for example) that it needs to send to the NSX-v Controller.
  • Step (2): The DLR Control VM uses the internal channel inside the ESXi host, the "Virtual Machine Communication Interface" (VMCI), to transfer the learned routes as Routing Information Base (RIB) entries to the netcpad service daemon.
  • Step (3): The netcpad service daemon sends the RIB information to the NSX-v Controller. The routing information travels over the management VMkernel interface of the ESXi host, which means the NSX-v Controllers do not need a dedicated interface to communicate with the DLR Control VM. The protocol and port used for this communication is TCP/1234.
  • Step (4): The NSX-v Controller forwards the DLR RIB to the netcpad service daemons on all ESXi hosts.
  • Step (5): netcpad programs the forwarding information (FIB) into the DLR route-instance on each host.

DLR Control VM communications

DLR High Availability

DLR Control VM High Availability (HA) provides redundancy at the VM level. The HA mode is Active/Passive: the active DLR Control VM holds the IP address, and if it fails, the passive DLR Control VM takes ownership of the IP address (a failover event). The DLR route-instance, with its LIFs and IP addresses, exists on the ESXi hosts as a kernel module and is not part of this Active/Passive failover.

The active DLR Control VM synchronizes its forwarding table to the secondary DLR Control VM. If the active unit fails, the secondary continues to use that forwarding table until it re-establishes the routing adjacency with the upstream router.

The HA heartbeat messages are sent through the DLR management interface, so we must have L2 connectivity between the active and the secondary DLR Control VM. The Active/Passive pair is automatically assigned IP addresses from a /30 when HA is deployed. The default failover detection time is 15 seconds, but it can be lowered to 6 seconds. The heartbeat uses UDP port 694.

DLR High Availability

You can also verify the HA status by running the following commands:

DLR HA verification commands:

$ show service highavailability

$ show service highavailability connection-sync

$ show service highavailability link

Protocol Address and Forwarding Address

The protocol address is the IP address of the DLR Control VM; this control-plane address is what actually establishes the OSPF or BGP peering with the ESGs. The following figure shows OSPF as an example:

Protocol Address and Forwarding Address

The following figure shows that the DLR forwarding address is the IP address the ESG uses as its next hop toward the DLR.

Protocol Address and Forwarding Address

DLR Control VM Firewall

The DLR Control VM can protect its management and uplink interfaces with its built-in firewall. Any device that needs to communicate with the DLR Control VM itself requires a firewall rule that allows it.

For example, SSH to the DLR Control VM, or even OSPF adjacencies with the upstream router, needs a firewall rule. The DLR Control VM firewall can be enabled or disabled globally.

Note: do not confuse DLR Control VM firewall rules with NSX-v distributed firewall rules. The following image shows the firewall rules of the DLR Control VM.

DLR Control VM Firewall

Creating DLR

The first step is to create the DLR Control VM.

We need to go to Networking & Security -> NSX Edges and click on the green + button.

Here we need to select "Logical (Distributed) Router".

 

Creating DLR

Specify the username and password; we can also enable SSH access:

DLR CLI Credentials

We need to specify where we want to place the DLR Control VM:

Place the DLR Control VM

We need to specify the management interface and the Logical Interfaces (LIFs).

The management interface is used for SSH access to the Control VM.

The LIFs are configured in the second table, "Configure Interfaces of this NSX Edge".

Configure Interfaces of this DLR

Each LIF is configured by connecting the interface to a logical switch.

Connected Lif to DLR

Configure the Up-Link Transit Lif:

Configure Up-Link Lif

Configure the Web Lif:

Configure the Web Lif

Configure the App Lif:

Configure the App Lif

Configure the DB Lif:

Configure the DB Lif

Summary of all DLR Lif’s:

Summary of all DLR Lif's

The DLR Control VM can work in High Availability mode; in our lab we will not enable HA:

DLR High Availability

Summary of DLR configuration:

 

DLR Intermediate step

After completing the DLR deployment, we have created four different LIFs:

Transit-Network-01, Web-Tier-01, App-Tier-01, DB-Tier01

All these LIFs span all of our ESXi clusters.

So, for example, a virtual machine connected to the logical switch "App-Tier-01" will have a default gateway of 172.16.20.1 regardless of where the VM is located in the data center.
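
A quick check from any guest on App-Tier-01 shows the same picture on every host the VM may land on (a sketch assuming a Linux guest; the VM's own address depends on your IP scheme):

$ ip route | grep default
default via 172.16.20.1 dev eth0

$ ping -c 2 172.16.20.1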

DLR Intermediate step

 

DLR Routing verification

We can verify that the NSX-v Controller has received the DLR LIF IP addresses for each VXLAN logical switch.

From the NSX-v Controller, run this command: show control-cluster logical-routers instance all

DLR Routing verification

The LR-Id “1460487505” is the internal id of the DLR control VM.

To list all the DLR LIF interfaces, run this command: show control-cluster logical-routers interface-summary <LR-Id>.

In our lab:

show control-cluster logical-routers interface-summary 1460487505

DLR Routing verification

 

Configure OSPF on DLR

On the NSX Edges page, click on the DLR (type: Logical Router).

Configure OSPF on DLR

Go to Manage -> Routing -> OSPF and click "Edit".

Configure OSPF on DLR

Type in the Protocol Address and Forwarding Address.

Do not check the "Enable OSPF" checkbox yet!

Protocol Address and Forwarding Address

The protocol address is the IP address of the DLR Control VM; this control-plane address is what actually establishes the OSPF peering with the NSX Edge.

The forwarding address is the IP address the NSX Edge uses as its next hop to forward packets to the DLR:

DLR Forwarding Address

Click on “Publish Changes”:

Publish Changes

The results will look like this:

DLR

Go to “Global Configuration”:

Global Configuration

Type the default gateway for the DLR (the next hop is the NSX Edge):

Default Gateway

Enable the OSPF:

Enable the OSPF

Then click on "Publish Changes".

Go back to "OSPF", then to "Area to Interface Mapping", and add the Transit uplink to Area 51:

Area to Interface Mapping

Click on "Publish Changes".

Go to Route Redistribution and make sure OSPF is enabled:

Route Redistribution

Deploy NSX Edge

In our lab we use an NSX Edge as the next hop for the DLR, but it could also be a physical router.

The NSX Edge is a virtual appliance that offers L2 and L3 services, perimeter firewalling, load balancing, and other services such as SSL VPN and DHCP.

We will use this Edge for Dynamic Routing.

 

Go to "NSX Edges" and click on the green plus button.

Select "Edge Services Gateway" and fill in the name and hostname for this Edge.

If we want a redundant Edge, we need to check "Enable High Availability".

NSX Edge

Enter your username and password:

Username and password

Select the Size of the NSX Edge:

NSX Edge size

Select where to install the Edge:

Configure the Network Interfaces:

Configure the Network Interfaces

Configure the Mgmt interface:

Configure the Mgmt interface

Configure the Transit interface:

Configure the Transit interface (toward DLR)

Configure Default Gateway:

Edge Default Gateway

 

Set Firewall Default policy to permit all traffic:

Firewall Default policy to permit all traffic

Summary of Edge Configuration:

Summary of Edge Configuration

Configure OSPF at NSX Edge:

Configure OSPF at NSX Edge

Enable OSPF at “Global Configuration”:

Enable OSPF at "Global Configuration"

In the "Dynamic Routing Configuration" section, click "Edit".

For the "Router ID", select the interface whose IP address will be used as the OSPF router ID.

Check “Enable OSPF”:

 

Enable OSPF

Publish, then go to "OSPF" and add the Transit network to Area 51 in the "Area to Interface Mapping" section:

Map Interface to OSPF Area

 

Click “Publish”

Make sure the OSPF status is in the "Enabled" state and that the red button on the right now reads "Disable".

Getting the full picture

 

Getting the full picture

 

Dynamic OSPF Routing Verification

Open the Edge CLI.

The Edge has an OSPF neighbor adjacency with 192.168.10.3, which is the DLR Control VM protocol address.
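
The adjacency can be checked from the Edge CLI with the standard command below; the neighbor address should be 192.168.10.3 (the DLR protocol address) and the state should reach Full:

show ip ospf neighbor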

Edge OSPF verification

The NSX Edge received OSPF routes from the DLR.

From the Edge's perspective, the next hop toward the DLR is the forwarding address 192.168.10.2.
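
The routes themselves can be listed from the Edge CLI. The output below is illustrative: the Web and App prefixes come from this chapter, and 172.16.30.0/24 is assumed for the DB tier.

show ip route

O   172.16.10.0/24   [110/1]   via 192.168.10.2
O   172.16.20.0/24   [110/1]   via 192.168.10.2
O   172.16.30.0/24   [110/1]   via 192.168.10.2

The "O" entries are OSPF routes learned from the DLR, all pointing at the forwarding address 192.168.10.2.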

Edge OSPF Routing Verification

 

Related Posts:

NSX Manager

NSX Controller

Host Preparation

Logical Switch

Distributed Logical Router

 

Thanks to:

Shachar Bobrovskye, Michael Haines, and Prasenjit Sarkar for contributing to this post.

Offer Nissim for reviewing this post.

 

To find out more about distributed dynamic routing, I recommend reading these two blogs by colleagues of mine:

Brad Hedlund

http://bradhedlund.com/2013/11/20/distributed-virtual-and-physical-routing-in-vmware-nsx-for-vsphere/

Antony Burke

http://networkinferno.net/nsx-compendium

VMware has started beta NSX training:

VMware NSX: Install, Configure, Manage [V6.0]

VMware NSX

Overview:
This comprehensive, fast-paced training course focuses on installing, configuring, and managing VMware NSX™. NSX is a software networking and security virtualization platform that delivers the operational model of a virtual machine for the network. Virtual networks reproduce the layer 2–layer 7 network model in software, enabling complex multitier network topologies to be created and provisioned programmatically in seconds. NSX also provides a new model for network security where security profiles are distributed to and enforced by virtual ports and move with virtual machines.
For advanced course options, go to www.vmware.com/education.
Objectives:
•  Describe the evolution of the Software-Defined Data Center
•  Describe how NSX is the next step in the evolution of the Software-Defined Data Center
•  Describe data center prerequisites for NSX deployment
•  Configure and deploy NSX components for management and control
•  Describe basic NSX layer 2 networking
•  Configure, deploy, and use logical switch networks
•  Configure and deploy NSX distributed router appliances to establish East-West connectivity
•  Configure and deploy VMware® NSX Edge™ services gateway appliances to establish North-South connectivity
•  Configure and use all main features of the NSX Edge services gateway
•  Configure NSX Edge firewall rules to restrict network traffic
•  Configure NSX distributed firewall rules to restrict network traffic
•  Use role-based access to control user account privileges
•  Use activity monitoring to determine whether a security policy is effective
•  Configure service composer policies
Intended Audience: Experienced system administrators who specialize in networking
Prerequisites:
•  System administration experience on Microsoft Windows or Linux operating systems
•  Understanding of concepts presented in the VMware Data Center Virtualization Fundamentals course for VCA-DCV certification
Outline:
1  Course Introduction
•  Introductions and course logistics
•  Course objectives
2  VMware NSX Components for Management and Control
•  Evolution of the Software-Defined Data Center
•  Introduction to NSX
•  VMware® NSX Manager™
•  NSX Controller cluster
3  Logical Switch Networks
•  Ethernet fundamentals and basic NSX layer 2 networking
•  VMware vSphere® Distributed Switch™ overview
•  Switch link aggregation
•  Logical switch networks
•  VMware® NSX Controller® replication
4  Routing with VMware NSX Edge Appliances
•  Routing protocols primer
•  NSX logical router
•  NSX Edge services gateway
5  Features of the VMware NSX Edge Services Gateway
•  Network address translation
•  Load balancing
•  High availability
•  Virtual private networking
–  Layer 2 VPN
–  IPsec VPN
–  SSL VPN-Plus
•  VLAN-to-VXLAN bridging
6  VMware NSX Security
•  NSX Edge firewall
•  NSX distributed firewall
•  Role-based access control
•  NSX data endpoint
•  Flow Monitoring
•  Service Composer

More details can be found at:

http://mylearn.vmware.com/mgrreg/courses.cfm?ui=www_edu&a=one&id_subject=54990