In today's data center, the physical router is essential for building a workable network design, and virtual networking needs to provide similar functionality. Routing between IP subnets can be performed in the logical space, without traffic going out to the physical router. This routing is performed in the hypervisor kernel with minimal CPU and memory overhead, which provides an optimal data path for routed traffic within the virtual infrastructure. Distributed routing in the NSX-v platform provides an optimized and scalable way of handling East-West traffic within a data center. East-West traffic is the communication between virtual machines within the data center, and its volume keeps growing: newer collaborative, distributed, and service-oriented application architectures demand higher bandwidth for server-to-server communication.
If these servers are virtual machines running on a hypervisor and are connected to different subnets, the communication between them has to go through a router. If a physical router provides the routing service, the virtual machine traffic has to go out to the physical router and come back in to the server after the routing decision has been made. This is obviously not an optimal traffic flow, and it is sometimes referred to as “hair-pinning”.
Distributed routing on the NSX-v platform prevents hair-pinning by providing routing functionality at the hypervisor level. Each hypervisor has a routing kernel module that performs routing between the Logical Interfaces (LIFs) defined on that distributed router instance.
The distributed logical router owns and manages its logical interfaces (LIFs). A LIF is similar to a VLAN interface on a physical router. A LIF connects to a logical switch or to a distributed port group. A single distributed logical router can have a maximum of 1,000 LIFs.
DLR Interface Types
The DLR has three types of interfaces: Uplink, LIF, and Management.
Uplink: This interface is used by the DLR Control VM to connect to the upstream router. In most documentation it is also referred to as “transit”, because it is the transit interface between the logical space and the physical space. The DLR supports both OSPF and BGP on its Uplink interface, but cannot run both at the same time. OSPF can be enabled on only a single Uplink interface.
LIFs: LIFs exist on the ESXi host at the kernel level; they are the Layer 3 interfaces that act as the default gateway for all VMs connected to logical switches.
Management: The DLR management interface can be used for several purposes. The first is remote access to the DLR Control VM, such as SSH. Another is High Availability. The last is sending syslog information to a syslog server. The management interface is part of the Control VM's routing table; there is no separate routing table for it. When we configure an IP address for the management interface, only devices on the same subnet as the management interface can reach the DLR Control VM management IP; remote devices on other subnets will not be able to reach it.
Note: If we only need an IP address to manage the DLR remotely, we can SSH to the DLR “Protocol Address” (explained later in this chapter); there is no need to configure a new IP address for the management interface.
Logical Interfaces, Virtual MAC and Physical MAC
Logical Interfaces (LIFs) hold the IP addresses of the DLR kernel module inside the ESXi host. Each LIF has an associated MAC address called the virtual MAC (vMAC). The vMAC is the MAC address of the LIF; it is the same across all ESXi hosts and is never seen by the physical network, only by virtual machines, which use the vMAC as their default gateway MAC address. The physical MAC (pMAC) is the MAC address of the uplink through which traffic flows to the physical network; when the DLR needs to route traffic out of the ESXi host, the pMAC is used.
In the following figure, the ESXi host esxcomp-01a runs the DLR kernel module, and this DLR instance has two LIFs: LIF1 is associated with logical switch VXLAN 5001 and LIF2 with VXLAN 5002. From the perspective of VM1, the default gateway is LIF1 with IP address 172.16.10.1; VM2's default gateway is LIF2 with IP address 172.16.20.1. Both LIFs use the same vMAC.
The LIF IP addresses and the vMAC are the same across all NSX-v hosts for the same DLR instance.
When VM2 is vMotioned from esxcomp-01a to esxcomp-01b, VM2 keeps the same default gateway (LIF2) with the same vMAC; from VM2's perspective, nothing has changed.
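To make the vMAC behavior concrete, here is a minimal Python sketch that models one DLR instance whose LIF table is copied to every prepared host. The host names, LIF IPs and VXLAN IDs come from the figure; the vMAC value and the data model are assumptions for illustration, not how the kernel module actually stores LIFs.

```python
"""Sketch: every host's route-instance for one DLR carries identical LIFs.

Assumptions (illustrative only, not NSX-v code): the vMAC value and the
data model are hypothetical; the model only shows why a VM's default
gateway (IP + vMAC) is unchanged after vMotion.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Lif:
    name: str
    vxlan: int
    gateway_ip: str
    vmac: str  # same vMAC on every host, never seen on the physical network


# One DLR instance's LIF table (IPs/VXLANs from the figure, vMAC hypothetical).
DLR_LIFS = (
    Lif("LIF1", 5001, "172.16.10.1", "02:50:56:56:44:52"),
    Lif("LIF2", 5002, "172.16.20.1", "02:50:56:56:44:52"),
)

# The same LIF table is pushed to every prepared host.
hosts = {"esxcomp-01a": DLR_LIFS, "esxcomp-01b": DLR_LIFS}


def gateway_for(host: str, vxlan: int) -> tuple[str, str]:
    """Return (gateway IP, gateway MAC) that a VM on `vxlan` sees on `host`."""
    for lif in hosts[host]:
        if lif.vxlan == vxlan:
            return lif.gateway_ip, lif.vmac
    raise LookupError(f"no LIF for VXLAN {vxlan} on {host}")


# VM2 (VXLAN 5002) before and after vMotion from esxcomp-01a to esxcomp-01b.
before = gateway_for("esxcomp-01a", 5002)
after = gateway_for("esxcomp-01b", 5002)
print(before == after)  # True: nothing changes from VM2's point of view
```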
DLR Kernel module and ARP table
The DLR does not communicate with the NSX-v Controller to find out the MAC addresses of VMs. Instead, it sends an ARP request to all ESXi host VTEPs that are members of that logical switch. The VTEPs that receive this ARP request forward it to all VMs on that logical switch.
In the following figure, if VM1 needs to communicate with VM2, the traffic is routed inside the DLR kernel module on esxcomp-01a, so this DLR needs to know the MAC addresses of VM1 and VM2. The DLR sends an ARP request to all VTEP members of VXLAN 5002 to learn the MAC address of VM2. The DLR keeps each ARP table entry for 600 seconds, which is called its aging time.
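The 600-second aging behavior can be sketched as a small cache in Python. This is an illustrative model, not the DLR kernel module's data structure; time is passed in explicitly, and the VM2 IP and MAC values are hypothetical.

```python
"""Sketch of an ARP cache with a 600-second aging time.

Assumption: an illustrative model only, not the DLR kernel module's code.
Time is passed in explicitly so the aging is easy to follow.
"""
from typing import Optional

ARP_AGING_SECONDS = 600  # aging time stated for DLR ARP entries


class ArpCache:
    def __init__(self, aging: float = ARP_AGING_SECONDS) -> None:
        self.aging = aging
        self._entries: dict[str, tuple[str, float]] = {}  # ip -> (mac, learned_at)

    def learn(self, ip: str, mac: str, now: float) -> None:
        """Store (or refresh) an entry learned from an ARP reply."""
        self._entries[ip] = (mac, now)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        """Return the MAC if the entry is younger than the aging time.

        A miss (None) means the DLR must ARP again, i.e. send the request to
        the VTEPs that are members of the destination logical switch.
        """
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, learned_at = entry
        if now - learned_at >= self.aging:
            del self._entries[ip]  # aged out
            return None
        return mac


cache = ArpCache()
cache.learn("172.16.20.10", "00:50:56:aa:bb:02", now=0)  # hypothetical VM2 IP/MAC
print(cache.lookup("172.16.20.10", now=599))  # 00:50:56:aa:bb:02
print(cache.lookup("172.16.20.10", now=600))  # None -> re-ARP needed
```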
DLR and local routing
Since the DLR instance is distributed, each ESXi host has a route-instance that can route traffic. When VM1 needs to send traffic to VM2, in theory the DLR on either esxcomp-01a or esxcomp-01b could route it, as shown in the following figure. In NSX-v, however, the DLR always routes VM traffic locally, on the host where the sending VM resides!
When VM1 sends a packet to VM2, the DLR on esxcomp-01a routes the traffic from VXLAN 5001 to VXLAN 5002, because VM1, which initiated the traffic, resides on that host.
The following illustration shows that when VM2 replies to VM1, the DLR on esxcomp-01b routes the traffic, because VM2 is local to esxcomp-01b.
Note: The actual traffic between the ESXi hosts flows through the VTEPs.
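The local-routing rule fits in a few lines of Python: the route-instance on the sender's host makes the routing decision, and the routed frame then crosses hosts VTEP to VTEP. The VM placement table is a toy assumption based on the figures, not something read from NSX.

```python
"""Sketch: which host's route-instance routes a packet between two VMs.

Assumption: a toy placement table based on the figures; the rule modeled is
that the DLR routes locally, on the host where the sending VM runs.
"""

vm_host = {"VM1": "esxcomp-01a", "VM2": "esxcomp-01b"}
vm_vxlan = {"VM1": 5001, "VM2": 5002}


def routing_host(src_vm: str) -> str:
    """The route-instance on the sender's own host makes the routing decision."""
    return vm_host[src_vm]


def describe(src_vm: str, dst_vm: str) -> str:
    router = routing_host(src_vm)
    path = f"routed on {router}: VXLAN {vm_vxlan[src_vm]} -> VXLAN {vm_vxlan[dst_vm]}"
    if vm_host[dst_vm] != router:
        # After routing, the frame is carried VTEP-to-VTEP to the destination host.
        path += f", then VTEP on {router} -> VTEP on {vm_host[dst_vm]}"
    return path


print(describe("VM1", "VM2"))  # request routed on esxcomp-01a
print(describe("VM2", "VM1"))  # reply routed on esxcomp-01b
```

The asymmetry is intentional: each direction is routed on a different host, so neither host has to hair-pin the other's traffic.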
Multiple Route Instances
The Distributed Logical Router (DLR) has two components. The first is the DLR Control VM, which is a virtual machine; the second is the DLR kernel module, which runs in every ESXi hypervisor. The instance of this kernel module, called a route-instance, holds the same copy of information on each ESXi host and works at the kernel level. Each ESXi host has at least one route-instance, and a host is not limited to just one.
The following figure shows two DLR Control VMs: DLR Control VM1 on the right and DLR Control VM2 on the left. Each Control VM has its own route-instance in the ESXi hosts. In esxcomp-01a we have route-instance 1, managed by DLR Control VM1, and route-instance 2, managed by DLR Control VM2; the same applies to esxcomp-01b. Each DLR instance manages its own set of LIFs: DLR Control VM1 manages the LIFs in VXLAN 5001 and 5002, and DLR Control VM2 manages the LIFs in VXLAN 5003 and 5004.
Regardless of the number of route-instances inside an ESXi host, the host has one special port called the “Logical Router Port” or “vdr port”.
This port works like the “router on a stick” concept, which means all routed traffic passes through this port. We can think of a route-instance as VRF-lite, because each route-instance has its own LIFs and routing table; LIF IP addresses can even overlap between route-instances.
The following figure shows an example of an ESXi host with two route-instances, where route-instance-1 has the same IP address as route-instance-2, but on a different VXLAN.
Note: Different DLRs cannot share the same VXLAN
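The VRF-lite analogy can be sketched in Python: the host picks the route-instance from the VXLAN a frame arrives on, so overlapping LIF IPs never collide, and a VXLAN that already belongs to one DLR is rejected for another. The class, instance names, and the overlapping 172.16.10.1/24 addresses are hypothetical.

```python
"""Sketch: route-instances behave like VRF-lite tables on one ESXi host.

Assumptions: the Host class, instance names, VXLAN IDs and the overlapping
172.16.10.1/24 LIF addresses are illustrative. Lookups are keyed by VXLAN,
so overlapping IPs in different route-instances never collide, and a VXLAN
may belong to only one DLR.
"""
import ipaddress


class Host:
    def __init__(self) -> None:
        self.vxlan_owner: dict[int, str] = {}  # VXLAN -> route-instance
        self.lifs: dict[str, dict[int, ipaddress.IPv4Interface]] = {}

    def add_lif(self, instance: str, vxlan: int, address: str) -> None:
        owner = self.vxlan_owner.get(vxlan)
        if owner is not None and owner != instance:
            raise ValueError(f"VXLAN {vxlan} already belongs to {owner}")
        self.vxlan_owner[vxlan] = instance
        self.lifs.setdefault(instance, {})[vxlan] = ipaddress.IPv4Interface(address)

    def gateway(self, vxlan: int) -> tuple[str, ipaddress.IPv4Interface]:
        """Pick the route-instance from the ingress VXLAN, then return its LIF."""
        instance = self.vxlan_owner[vxlan]
        return instance, self.lifs[instance][vxlan]


host = Host()
host.add_lif("route-instance-1", 5001, "172.16.10.1/24")
host.add_lif("route-instance-2", 5003, "172.16.10.1/24")  # same IP, other VXLAN
print(host.gateway(5001))
print(host.gateway(5003))
```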
Routing Information Control Plane Update Flow
We need to understand how a route is configured and pushed from the DLR control VM to the ESXi hosts. Let’s look at the following figure to understand the flow.
Step 1: An end user configures a new DLR Control VM. This DLR has LIFs (Logical Interfaces) and either static routes or a dynamic routing protocol peering with the NSX-v Edge Services Gateway (ESG).
Step 2: The DLR LIF configuration is pushed to all ESXi hosts in the cluster that have been prepared by the NSX-v platform. If more than one route-instance exists, the DLR LIF information is sent to that instance only.
At this point, VMs on different VXLANs (East-West traffic) can communicate with each other.
Step 3: The NSX-v Edge Services Gateway (ESG) updates the DLR Control VM with new routes.
Step 4: The DLR Control VM updates the NSX-v controller (via the UWA) with its Routing Information Base (RIB).
Step 5: The NSX-v controller pushes the RIB to all ESXi hosts that have been prepared by the NSX-v platform. If more than one route-instance exists, the RIB information is sent to that instance only.
Step 6: The route-instance on each ESXi host creates a Forwarding Information Base (FIB) and handles the data-path traffic.
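Step 6 can be illustrated with a minimal Python sketch of a route-instance turning received RIB entries into a longest-prefix-match FIB. The routes, the LIF labels, and the 192.168.10.1 next hop (standing in for the ESG) are hypothetical; a real FIB is a kernel structure, and this only shows the lookup logic.

```python
"""Sketch of step 6: a route-instance turning RIB entries into a FIB.

Assumptions: routes, LIF labels and the 192.168.10.1 next hop (standing in
for the ESG) are illustrative; this only shows longest-prefix-match
forwarding over the installed routes, not the kernel implementation.
"""
import ipaddress

# RIB as received from the controller: (prefix, next hop or connected LIF)
rib = [
    ("0.0.0.0/0", "192.168.10.1"),    # default route learned via the ESG
    ("172.16.10.0/24", "LIF Web"),    # connected LIF
    ("172.16.20.0/24", "LIF App"),    # connected LIF
    ("10.0.0.0/8", "192.168.10.1"),
]


def build_fib(entries):
    """Parse prefixes and sort most-specific first for longest-prefix match."""
    fib = [(ipaddress.ip_network(prefix), nh) for prefix, nh in entries]
    fib.sort(key=lambda item: item[0].prefixlen, reverse=True)
    return fib


def lookup(fib, destination: str) -> str:
    addr = ipaddress.ip_address(destination)
    for network, next_hop in fib:  # first hit is the longest matching prefix
        if addr in network:
            return next_hop
    raise LookupError(f"no route to {destination}")


fib = build_fib(rib)
print(lookup(fib, "172.16.20.15"))  # LIF App
print(lookup(fib, "8.8.8.8"))       # 192.168.10.1 (default route)
```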
The DLR Control VM is a virtual machine that is typically deployed in the Management or Edge cluster. When the ESXi host is prepared by the NSX-v platform, one of the VIBs creates the control plane channel between the ESXi host and the NSX-v controllers. The service daemon inside the ESXi host that is responsible for this channel is called netcpad, and it is more commonly referred to as the User World Agent (UWA).
netcpad is responsible for communication between the NSX-v controllers and the ESXi host: through it, the host learns MAC/IP/VTEP address information, and it carries VXLAN control communication. The communication is secured with SSL. The UWA can connect to multiple NSX-v controller instances and keeps its logs at /var/log/netcpa.log.
Another service daemon, called vShield-Stateful-Firewall, is responsible for interacting with the NSX-v Manager. This daemon receives configuration information from the NSX-v Manager to create or delete the DLR Control VM and the ESG. It also performs NSX-v firewall tasks: it retrieves the DFW policy rules, gathers DFW statistics and sends them to the NSX-v Manager, and sends audit logs and other information to the NSX-v Manager. As part of host preparation, it also handles SSL-related tasks from the NSX-v Manager.
The DLR Control VM opens two VMCI sockets to user world agents on the ESXi host where it resides. The first goes to the vShield-Stateful-Firewall daemon, to receive configuration updates from the NSX-v Manager for the DLR Control VM itself; the second goes to netcpad, for control plane access to the controllers.
VMCI sockets provide local communication only: a guest virtual machine can communicate with the hypervisor it resides on, but not with other ESXi hosts.
On this basis the routing update happens in the following manner:
- Step (1): The DLR Control VM learns new route information (for example, from dynamic routing) that must be sent to the NSX-v controller.
- Step (2): The DLR uses an internal channel inside the ESXi01 host called the “Virtual Machine Communication Interface” (VMCI). VMCI opens a socket to transfer the learned routes, as Routing Information Base (RIB) information, to the netcpa service daemon.
- Step (3): The netcpa service daemon sends the RIB information to the NSX-v controller. The routing information flows through the Management VMkernel interface of the ESXi host, which means the NSX-v controllers do not need a new interface to communicate with the DLR Control VM. The protocol and port used for this communication are TCP/1234.
- Step (4): The NSX-v controller forwards the DLR RIB to the netcpa service daemons on all ESXi hosts.
- Step (5): netcpa forwards the FIBs to the DLR route-instance.
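The fan-out in steps (3) to (5) can be sketched in Python: a RIB for one DLR, identified by its LR-Id, reaches that DLR's route-instance on every host and leaves other route-instances alone. The data model and the second LR-Id are hypothetical; 1460487505 is the LR-Id shown later in the lab.

```python
"""Sketch of steps (3)-(5): the controller fanning a DLR's RIB out to hosts.

Assumptions: the data model, the second LR-Id (1460487506) and the route
string are hypothetical; this only illustrates that a RIB for one DLR
(LR-Id) reaches that DLR's route-instance on each host and no other.
"""

# Route-instances present on each host, keyed by LR-Id -> installed routes.
hosts = {
    "esxcomp-01a": {"1460487505": [], "1460487506": []},
    "esxcomp-01b": {"1460487505": [], "1460487506": []},
}


def controller_push(lr_id: str, rib: list[str]) -> int:
    """Deliver `rib` to every host's netcpa that has route-instance `lr_id`."""
    delivered = 0
    for instances in hosts.values():
        if lr_id in instances:  # netcpa hands routes to that instance only
            instances[lr_id][:] = list(rib)
            delivered += 1
    return delivered


# Steps (1)-(2): the Control VM of LR 1460487505 learned a route and sent it via VMCI/netcpa.
count = controller_push("1460487505", ["10.0.0.0/8 via 192.168.10.1"])
print(count)                               # 2 hosts updated
print(hosts["esxcomp-01a"]["1460487506"])  # [] : the other DLR is untouched
```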
DLR High Availability
DLR Control VM High Availability (HA) provides redundancy at the VM level. HA mode is Active/Passive: the active DLR Control VM holds the IP address, and if it fails, the passive DLR Control VM takes ownership of the IP address (a flip event). The DLR route-instance, its LIFs, and their IP addresses exist on the ESXi hosts as a kernel module and are not part of this Active/Passive flip event.
The active DLR Control VM synchronizes its forwarding table to the passive DLR Control VM. If the active VM fails, forwarding continues with the synchronized table on the passive unit until it re-establishes the adjacency with the upstream router.
The HA heartbeat messages are sent through the DLR management interface, so we must have L2 connectivity between the active and the passive DLR Control VMs. When we deploy HA, the two VMs are automatically assigned IP addresses from a /30 subnet. The default failover detection time is 15 seconds and can be lowered to 6 seconds. The heartbeat uses UDP port 694.
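The HA numbers above can be tied together in a short Python sketch: a /30 yields exactly two usable addresses, one per peer, and a dead-time timer decides when the passive VM takes over. The 169.254.1.0/30 block is a hypothetical example of the automatically assigned subnet, and the detector is a simplified model, not the HA daemon's code.

```python
"""Sketch of DLR Control VM HA: /30 heartbeat addressing and failover timing.

Assumptions: 169.254.1.0/30 is a hypothetical example of the automatically
assigned /30; FailoverDetector is a simplified dead-time model (15 s
default, 6 s minimum), not the HA daemon's implementation.
"""
import ipaddress

DEFAULT_DEAD_TIME = 15
MIN_DEAD_TIME = 6
HEARTBEAT_UDP_PORT = 694

# A /30 has exactly two usable host addresses: one per HA peer.
ha_net = ipaddress.ip_network("169.254.1.0/30")
active_ip, passive_ip = ha_net.hosts()


class FailoverDetector:
    def __init__(self, dead_time: float = DEFAULT_DEAD_TIME) -> None:
        if dead_time < MIN_DEAD_TIME:
            raise ValueError(f"dead time must be >= {MIN_DEAD_TIME} s")
        self.dead_time = dead_time
        self.last_heartbeat = 0.0

    def heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def peer_dead(self, now: float) -> bool:
        """True once no heartbeat arrived for dead_time: the passive VM takes over."""
        return now - self.last_heartbeat >= self.dead_time


print(active_ip, passive_ip)  # 169.254.1.1 169.254.1.2
d = FailoverDetector()
d.heartbeat(now=100)
print(d.peer_dead(now=110), d.peer_dead(now=115))  # False True
```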
You can verify the HA status on the DLR Control VM by running the following commands:
$ show service highavailability
$ show service highavailability connection-sync
$ show service highavailability link
Protocol Address and Forwarding Address
The Protocol Address is the IP address of the DLR Control VM. The Control VM, as the control plane, is what actually establishes the OSPF or BGP peering with the ESGs. The following figure shows OSPF as an example:
The following figure shows that the DLR Forwarding Address is the IP address that the ESGs use as the next hop.
DLR Control VM Firewall
The DLR Control VM can protect its Management and Uplink interfaces with its built-in firewall. Any device that needs to communicate with the DLR Control VM itself requires a firewall rule that permits it.
For example, SSH to the DLR Control VM, or even an OSPF adjacency with the upstream router, needs a firewall rule. The DLR Control VM firewall can be enabled or disabled globally.
Note: Do not confuse DLR Control VM firewall rules with NSX-v Distributed Firewall rules. The following image shows the firewall rules for the DLR Control VM.
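The idea that every flow to the Control VM itself needs a permitting rule can be sketched in Python as a top-down, first-match rule list with a default action. First-match processing, the rule names, the management and transit subnets, and the default deny are all assumptions for illustration; the protocol facts (SSH on TCP port 22, OSPF as IP protocol 89) are standard.

```python
"""Sketch: first-match evaluation of DLR Control VM firewall rules.

Assumptions: top-down, first-match processing with a default action is
assumed; rule names, subnets and the default "deny" are hypothetical.
SSH uses TCP port 22; OSPF runs directly over IP (protocol 89).
"""
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    source: str          # source CIDR
    protocol: str        # "tcp" or "ospf"
    port: Optional[int]  # None = any port
    action: str          # "accept" or "deny"


RULES = [
    Rule("allow-ssh-mgmt", "10.0.0.0/24", "tcp", 22, "accept"),
    Rule("allow-ospf-transit", "192.168.10.0/29", "ospf", None, "accept"),
]
DEFAULT_ACTION = "deny"


def evaluate(src: str, protocol: str, port: Optional[int] = None) -> str:
    addr = ipaddress.ip_address(src)
    for rule in RULES:  # first matching rule wins
        if (addr in ipaddress.ip_network(rule.source)
                and rule.protocol == protocol
                and (rule.port is None or rule.port == port)):
            return rule.action
    return DEFAULT_ACTION


print(evaluate("10.0.0.5", "tcp", 22))     # accept: SSH from management subnet
print(evaluate("192.168.10.1", "ospf"))    # accept: OSPF from the upstream router
print(evaluate("172.16.10.5", "tcp", 22))  # deny: no rule permits it
```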
The first step is to create the DLR Control VM.
Go to Networking & Security -> NSX Edges and click the green + button.
Here we need to select Logical (distributed) Router.
Specify the user name and password; we can also enable SSH access:
We need to specify where we want to place the DLR Control VM:
We need to specify the Management interface and the Logical Interfaces (LIFs).
The Management interface is used for SSH access to the Control VM.
The LIFs are configured in the second table, under “Configure Interfaces of this NSX Edge”.
Each LIF is configured by connecting the interface to a “Logical Switch”:
Configure the Uplink (Transit) LIF:
Configure the Web LIF:
Configure the App LIF:
Configure the DB LIF:
Summary of all DLR LIFs:
The DLR Control VM can run in High Availability mode; in our lab we will not enable HA:
Summary of DLR configuration:
DLR Intermediate step
After deploying the DLR, we have created four LIFs:
Transit-Network-01, Web-Tier-01, App-Tier-01, DB-Tier-01
All these LIFs span all of our ESXi clusters.
So, for example, a virtual machine connected to the logical switch “App-Tier-01” has a default gateway of 172.16.20.1 regardless of where the VM is located in the data center.
DLR Routing verification
We can verify that the NSX Controller has received the DLR LIF IP addresses for each VXLAN logical switch.
From the NSX Controller, run this command:
$ show control-cluster logical-routers instance all
The LR-Id “1460487505” is the internal ID of the DLR Control VM.
To verify all DLR LIFs, run this command with the LR-Id:
$ show control-cluster logical-routers interface-summary <LR-Id>
In our lab:
$ show control-cluster logical-routers interface-summary 1460487505
Configure OSPF on DLR
Under NSX Edges, click the DLR (the edge of type Logical Router).
Go to Manage -> Routing -> OSPF and click “Edit”.
Type in the Protocol Address and Forwarding Address.
Do not select the “Enable OSPF” check box here!
The Protocol Address is the IP address of the DLR Control VM; this control plane is what actually establishes the OSPF peering with the NSX Edge.
The Forwarding Address is the IP address that the NSX Edge uses as the next hop to forward packets to the DLR:
Click on “Publish Changes”:
The results will look like this:
Go to “Global Configuration”:
Type the default gateway for the DLR (the next-hop NSX Edge):
Enable OSPF:
Then click “Publish Changes”.
Go back to “OSPF”, then to “Area to Interface Mapping”, and add the Transit-Uplink to Area 51:
Click “Publish Changes”.
Go to Route Redistribution and make sure OSPF is enabled:
Deploy NSX Edge
In our lab we use an NSX Edge as the next hop for the DLR, but it could also be a physical router.
NSX Edge is a virtual appliance that offers L2, L3, perimeter firewall, load-balancing, and other services such as SSL VPN and DHCP.
We will use this Edge for Dynamic Routing.
Go to “NSX Edges” and click the green plus button.
Select “Edge Services Gateway” and fill in the Name and Hostname for this Edge.
If we want a redundant Edge, we need to check “Enable High Availability”.
Enter your username and password:
Select the Size of the NSX Edge:
Select where to install the Edge:
Configure the Network Interfaces:
Configure the Mgmt interface:
Configure the Transit interface:
Configure Default Gateway:
Set Firewall Default policy to permit all traffic:
Summary of Edge Configuration:
Configure OSPF at NSX Edge:
Enable OSPF at “Global Configuration”:
In “Dynamic Routing Configuration”, click “Edit”.
For the “Router ID”, select the interface whose IP address you want to use as the OSPF Router ID.
Check “Enable OSPF”:
Publish the changes, then go to “OSPF” and add the Transit network to Area 51 in the Area to Interface Mapping section:
Make sure the OSPF status is “Enabled” and the red button on the right reads “Disable”.
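Before expecting an adjacency, it can help to check that both ends of the transit link agree. This Python sketch checks the standard OSPF neighbor requirements of matching area ID, subnet, and hello/dead intervals. The ESG transit IP (192.168.10.1/29) and the /29 mask are hypothetical; 192.168.10.3 is the DLR Protocol Address from the lab, and 10/40 seconds are the common OSPF broadcast defaults.

```python
"""Sketch: sanity-check the OSPF transit link before expecting an adjacency.

Assumptions: the ESG transit IP (192.168.10.1/29) and the /29 mask are
hypothetical; 192.168.10.3 is the DLR Protocol Address from the lab. OSPF
neighbors must share the area ID and subnet and agree on hello/dead
intervals (10/40 s are the common broadcast-network defaults).
"""
import ipaddress
from dataclasses import dataclass


@dataclass
class OspfInterface:
    address: str  # interface IP with prefix length
    area: int
    hello: int = 10
    dead: int = 40


def adjacency_problems(a: OspfInterface, b: OspfInterface) -> list[str]:
    problems = []
    if a.area != b.area:
        problems.append(f"area mismatch: {a.area} vs {b.area}")
    if ipaddress.ip_interface(a.address).network != ipaddress.ip_interface(b.address).network:
        problems.append("interfaces are not on the same subnet")
    if (a.hello, a.dead) != (b.hello, b.dead):
        problems.append("hello/dead interval mismatch")
    return problems


esg_transit = OspfInterface("192.168.10.1/29", area=51)
dlr_uplink = OspfInterface("192.168.10.3/29", area=51)
print(adjacency_problems(esg_transit, dlr_uplink))  # []
print(adjacency_problems(esg_transit, OspfInterface("192.168.10.3/29", area=0)))
```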
Getting the full picture
Dynamic OSPF Routing Verification
Open the Edge CLI
The Edge has an OSPF neighbor adjacency with 192.168.10.3, which is the Control VM IP address (the Protocol Address).
The NSX Edge received OSPF routes from the DLR.
From the Edge's perspective, the next hop to the DLR is the Forwarding Address, 192.168.10.2.
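The two checks above can be expressed as a small Python sketch: the OSPF neighbor must be the Protocol Address, while every DLR-learned route must point at the Forwarding Address. The structures are hand-written stand-ins, not parsed NSX CLI output, and the LIF subnets are illustrative.

```python
"""Sketch: the lab's ESG-side verification expressed as data checks.

Assumptions: the neighbor list and route table are hand-written stand-ins,
not parsed NSX CLI output; the addresses are the lab's Protocol Address
(192.168.10.3) and Forwarding Address (192.168.10.2), and the LIF subnets
are illustrative.
"""
PROTOCOL_ADDRESS = "192.168.10.3"    # DLR Control VM: OSPF peer
FORWARDING_ADDRESS = "192.168.10.2"  # DLR data plane: next hop

esg_ospf_neighbors = ["192.168.10.3"]
esg_ospf_routes = {  # prefix -> next hop, as learned from the DLR
    "172.16.10.0/24": "192.168.10.2",
    "172.16.20.0/24": "192.168.10.2",
    "172.16.30.0/24": "192.168.10.2",
}


def verify(neighbors: list[str], routes: dict[str, str]) -> list[str]:
    issues = []
    if PROTOCOL_ADDRESS not in neighbors:
        issues.append("no adjacency with the DLR Control VM (Protocol Address)")
    for prefix, next_hop in routes.items():
        if next_hop != FORWARDING_ADDRESS:
            issues.append(f"{prefix} uses {next_hop}, expected the Forwarding Address")
    return issues


print(verify(esg_ospf_neighbors, esg_ospf_routes))  # []
```

The split matters because traffic should be forwarded to the kernel route-instances through the Forwarding Address, not to the Control VM, which only runs the routing protocol.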
Thanks to Shachar Bobrovskye, Michael Haines, and Prasenjit Sarkar for contributing to this post,
and to Offer Nissim for reviewing it.
To find out more about distributed dynamic routing, I recommend reading two blogs by colleagues of mine: