In November I had the opportunity to take an NSX advanced bootcamp with one of the brilliant PSO architects in the NSX field, Kevin Barrass.
This post is based on Kevin's lecture; I have added screenshots and my own experience.
Upgrading NSX can be very easy if it is planned right, or very frustrating if we try to take shortcuts in the process. In this post I will try to document all the steps needed for a complete NSX-v upgrade.
High level upgrade flow:
Before starting the upgrade procedure, the following pre-upgrade steps must be taken into consideration:
- Read the NSX release notes.
- Check upgrade MD5 file.
- Verify the state of the NSX and vSphere infrastructure.
- Preserve the NSX infrastructure.
Read the NSX release notes:
How many times have you faced an issue during an upgrade, wasted hours on troubleshooting while sure you worked exactly as guided, opened a support ticket and got the answer: you are hitting a known upgrade issue and the workaround is written in the release notes? RTFM. Feeling like a dummy…? 🙂
This line is written in blood: do not skip this step! Read the release notes.
Compare the MD5
Download your favorite MD5 tool; I'm using the free winMd5Sum.
Compare the MD5 sum you get from Calculate against the one published on the official VMware web site.
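If you prefer to script the comparison instead of using a GUI tool, here is a minimal sketch in Python using only the standard library. The function names `md5_of_file` and `md5_matches` are my own, and the published checksum is whatever you copy from the VMware download page.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a large upgrade bundle is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path, published_md5):
    """Compare the file's MD5 with the published value, ignoring case and spaces."""
    return md5_of_file(path) == published_md5.strip().lower()
```

Call `md5_matches` with the path of the downloaded bundle and the published checksum; anything other than `True` means the file should be downloaded again.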
The link to software:
Verify NSX working state
Again, this line comes from the field. The scenario: you complete the upgrade process and are now facing an issue. How do you know the issue wasn't there before you started the upgrade?
Do not assume everything is working before you start to touch the infrastructure. Check it!
- Note the current versions of NSX Manager, vCenter, ESXi and Edges.
- Verify you can log into:
  - NSX Manager Web UI
  - vCenter, and check that NSX Manager appears as a plugin
  - ESG and DLR control VMs
- Validate VXLAN is functional:
  - Ping between two VMs on the same logical switch (different hosts):
    - ping -l 1472 -f <dest VM>
  - Ping between two VTEPs (different hosts):
    - ping ++netstack=vxlan -d -s 1572 <dest VTEP IP>
- Validate north-south connectivity by pinging out from a VM
- Visual inspection of Host Prep, Logical Network Prep and Edges (check that everything is green)
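The payload sizes in the two ping commands above are not magic numbers. With the don't-fragment flag set, the largest ICMP payload that fits is the MTU minus the IPv4 and ICMP headers. The sketch below assumes IPv4 without options, a VM MTU of 1500 and a VTEP MTU of 1600; adjust the MTU values to whatever your environment actually uses.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


# Assumed MTUs: 1500 on the VM vNIC, 1600 on the VTEP vmkernel interface.
print(max_ping_payload(1500))  # 1472, the size used between the two VMs
print(max_ping_payload(1600))  # 1572, the size used between the two VTEPs
```

If the large ping fails but a small one works, the MTU somewhere along the path is lower than expected.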
Verify vSphere working state
Check DRS is enabled on clusters
Validate vMotion functions correctly
Check host connection state with vCenter
Check you have a minimum of 3 ESXi hosts in each NSX cluster.
During an NSX upgrade, in some situations an NSX cluster with 2 hosts or fewer can cause issues with DRS, admission control or anti-affinity rules. For a successful upgrade, my recommendation is to work with at least 3 hosts in each NSX cluster you plan to upgrade.
Preserve the NSX infrastructure
Do the upgrade during a maintenance window
Create a current backup of the NSX Manager, and check that you know the backup password 🙂
Backup Firewall Policy:
Export the Distributed Firewall rules and the Service Composer configuration:
Upgrade NSX Manager
Verify that the NSX Manager upgrade bundle file name ends with tar.gz
Some browsers may remove the gz extension. If the file looks like:
Change it to:
Otherwise you will get this error after uploading the file to NSX Manager:
“Invalid upgrade bundle file VMware-NSX-Manager-upgrade-bundle-6.0.x-xxxxx,gz, upgrade file name has extension tar.gz”
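The rename can also be checked in a script before uploading. This is only a sketch under one assumption: the browser dropped the trailing ".gz" and left a name ending in ".tar". The function name `fix_bundle_name` and the file name in the example are made up for illustration.

```python
def fix_bundle_name(filename):
    """Return a file name that ends with .tar.gz, as NSX Manager expects."""
    if filename.endswith(".tar.gz"):
        return filename  # already correct, nothing to do
    if filename.endswith(".tar"):
        return filename + ".gz"  # assumed case: the browser stripped ".gz"
    # Anything else is unexpected, so refuse to guess.
    raise ValueError("unexpected upgrade bundle name: " + filename)


# Hypothetical file name, for illustration only.
print(fix_bundle_name("upgrade-bundle.tar"))  # upgrade-bundle.tar.gz
```

Renaming only fixes the extension; it is still worth comparing the MD5 of the file afterwards.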
To upgrade NSX Manager, open the NSX Manager web interface and click on Upgrade:
Click on the Upgrade button:
Click “Browse”, select the upgrade file, and click Continue:
Note: NSX Manager will reboot during the upgrade process. The forwarding path of VM workloads is not affected during this step unless we are using user identity with the distributed firewall and a new user logs in while NSX Manager is down.
The upgrade process consists of two steps: validating the tar.gz image and then running the actual upgrade:
When NSX Manager finishes the validation, the upgrade process starts:
After the Manager upgrade completes, confirm the version from the Summary tab of the NSX Manager Web UI:
Upgrade the NSX controllers
During the controller upgrade, the upgrade file is downloaded to each node. The process upgrades node 1 first, then node 2, and finally node 3.
To start the upgrade process, click on “Upgrade Available”.
During the NSX Controller upgrade we will pass through this state:
Node1: Upgrade to 6.1 complete
Node2: Rebooting
Node3: In Normal state but still on version 6.0
Result: only one node is active on 6.1, and as a consequence the controller cluster loses majority due to the version mismatch.
What does it mean? -> Impact on the control plane
In a live virtual environment with DRS enabled, a vMotion can happen and a VM may change its current ESXi host. As a result we may face forwarding issues, because the other VTEPs will not learn about this update.
Another issue may occur if dynamic routing receives a topology update, for example a route being added or removed. To avoid this we need to keep routing unchanged.
To limit the exposure window for forwarding issues with workload VMs, my recommendation is to change the DRS setting to manual. This will limit vMotion of VMs in the NSX clusters during the controller upgrade!
Note: After the controller upgrade completes, change it back to the previous configuration.
If we make sure that VMs do not move and dynamic routes do not change, there is no impact on the data plane.
When controller node-2 completes its reboot, we have two controllers upgraded and on the same version. At that point we regain cluster majority; controller node-3 still needs to finish its upgrade and reboot.
When all three controller nodes have completed rebooting, the cluster is upgraded.
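To make the majority reasoning concrete, here is a simplified model of it in Python. It is not how the NSX controllers implement clustering; it only assumes, as described above, that nodes count toward majority when they are up and on the same version. The function name `has_majority` is my own.

```python
from collections import Counter


def has_majority(nodes):
    """nodes is a list of (version, is_up) tuples, one per controller node.

    Majority exists when more than half of all nodes are up and share
    the same version.
    """
    up_versions = Counter(version for version, is_up in nodes if is_up)
    if not up_versions:
        return False
    largest_group = max(up_versions.values())
    return largest_group > len(nodes) // 2


# Node1 upgraded, node2 rebooting, node3 still on the old version.
print(has_majority([("6.1", True), ("6.0", False), ("6.0", True)]))  # False
# Node2 is back on 6.1, node3 not upgraded yet.
print(has_majority([("6.1", True), ("6.1", True), ("6.0", True)]))  # True
```

The first call is the window in which VM moves and routing changes should be avoided.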
During the upgrade of NSX clusters, each ESXi host requires a reboot. There is no impact on the data plane for VMs because they are moved automatically by DRS.
If DRS is disabled, the vSphere admin will need to move the VMs manually and reboot each ESXi host.
This is the reason admission control with 2 hosts may prevent automatic host upgrade. My recommendation is to avoid 2-host clusters, or to manually evacuate a host and put it into maintenance mode.
If you have created anti-affinity rules for Controllers, a cluster with 3 hosts will prevent the upgrade.
Disable the anti-affinity rules by unchecking “Enable rule” to allow automatic host upgrade, and enable them again after the upgrade completes.
With the default anti-affinity rules for Edges/DLR, a cluster with 2 hosts will prevent the upgrade. Uncheck “Enable rule” on the Edge anti-affinity rules to allow automatic host upgrade, and enable it again after the upgrade completes.
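The host counts above follow from simple counting: while one host is in maintenance mode, the VMs that must stay on separate hosts need that many remaining hosts. Below is a rough sketch of that check. It deliberately ignores admission control, resource limits and any other DRS rules, and the function name `can_evacuate_one_host` is my own.

```python
def can_evacuate_one_host(host_count, separated_vm_count):
    """True if the anti-affinity VMs still fit on separate hosts
    while one host is in maintenance mode."""
    remaining_hosts = host_count - 1
    return remaining_hosts >= separated_vm_count


print(can_evacuate_one_host(3, 3))  # False: 3 controllers, 3 hosts
print(can_evacuate_one_host(2, 2))  # False: Edge pair, 2 hosts
print(can_evacuate_one_host(4, 3))  # True: one spare host
```

When the result is `False`, either disable the rule for the duration of the upgrade as described above or add a host to the cluster.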
Click Cluster Host “Update”
If an upgrade is available for the cluster, an “Update” link is available in the NSX UI. When the upgrade is initiated, NSX Manager updates the NSX VIBs on each host.
Click on “Update” to upgrade the cluster:
VIBs are updated on hosts
Hosts reboot during the upgrade:
The task view reveals what happens while the upgrade process runs:
Once all hosts are rebooted, the host update is completed.
Upgrade DLR and ESG’s
During the upgrade process a new ESG VM is deployed alongside the existing one. When the new ESG is ready, the old ESG's vNICs are disconnected and the new ESG's vNICs are connected. The new ESG then sends GARP.
This process can affect the forwarding plane; we can minimize the impact by running the Edges in ECMP mode.
Go to NSX Edges and Upgrade each one
Each ESG/DLR will then be upgraded
Check status is deployed and at correct version
Upgrade Guest Introspection / Data Security if required
NSX Guest Introspection / Data Security One Upgrade
If an upgrade is available for Guest Introspection / Data Security, an upgrade link is available in the NSX UI.
Click on upgrade if available
Follow NSX installation guide for specific details on upgrading Guest Introspection / Data Security.
Once upgrade is successful create new NSX Manager backup
The previous NSX Manager backup is only valid for the previous release
Don’t forget to Verify NSX working state
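One way to make the before and after comparison less error-prone is to record the result of each check before the upgrade and diff it against the results afterwards. The sketch below assumes you keep the results as plain dictionaries; the check names and values in the example are invented for illustration.

```python
def changed_checks(before, after):
    """Return {check_name: (before_value, after_value)} for every check
    whose result differs between the two snapshots."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }


# Hypothetical snapshots, recorded by hand or by your own scripts.
pre = {"vtep_ping_1572": "ok", "north_south_ping": "ok", "host_prep": "green"}
post = {"vtep_ping_1572": "ok", "north_south_ping": "fail", "host_prep": "green"}
print(changed_checks(pre, post))  # {'north_south_ping': ('ok', 'fail')}
```

An empty result means every check you recorded behaves the same as it did before the upgrade.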