NSX-v Controller deployment fails and the Controller disappears from the vSphere Client

Any of the following issues during the deployment of the NSX-v Controller cluster can cause the deployment to fail, after which the instantiated Controller nodes are deleted within a few minutes.

  1. A firewall blocking Controller communication with NSX Manager.
  2. Network connectivity problems between NSX Manager and the Controllers.
  3. DNS/NTP misconfiguration on NSX Manager, vCenter, or the ESXi hosts.
  4. Insufficient resources, such as disk space, in the datastore used for the deployment of the Controllers.

The first place to investigate is the “Task Console” in vCenter. The entries displayed there show that the Controller virtual machine is first “powered on”, but is then powered off and deleted. But why?

 

[Figure: View vCenter Tasks]

 

Troubleshooting steps:

  • Download the NSX Manager logs.
  • Right-click in the upper right corner of the NSX Manager GUI and choose “Download Tech Support Log”.

[Figure: Download NSX Manager Logs]

 

The tech support file can be a very large text file, so finding an issue is like looking for a needle in a haystack. What should you look for?

My best advice is to start with something we know: the name of the Controller node that was instantiated and then deleted. This name is assigned to the Controller node when the deployment wizard completes.

In my specific example it was “controller-2”.

Open the text file and search for this name:

[Figure: Search in Tech Support File]

 

Once you find the name, scroll down and read the log entries that follow:

[Figure: NSX Tech Support file]
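
If the file is too large to browse comfortably in an editor, the same search can be scripted. The following is only a minimal sketch in Python: it assumes the tech support log has been extracted to a plain text file, the path is whatever you pass on the command line, and `grep_with_context` is a hypothetical helper name, not part of any NSX tooling. It prints each line that mentions the Controller name together with the few lines that follow it.

```python
import sys


def grep_with_context(lines, needle, after=5):
    """Return (line_number, text) pairs for every line containing `needle`
    and for the `after` lines that follow each match."""
    hits = []
    remaining = 0  # how many more lines to keep after the latest match
    for number, line in enumerate(lines, start=1):
        if needle in line:
            remaining = after + 1  # the matching line plus `after` lines
        if remaining > 0:
            hits.append((number, line.rstrip("\n")))
            remaining -= 1
    return hits


if __name__ == "__main__":
    # Usage (file name and Controller name are examples only):
    #   python find_controller.py techsupport.log controller-2
    log_path, controller_name = sys.argv[1], sys.argv[2]
    # The file is read line by line, so a very large log is not loaded at once.
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        for number, text in grep_with_context(log_file, controller_name):
            print(f"{number}: {text}")
```

Reading the lines right after each match is usually enough to spot the first error that follows the deployment request.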

 

This error points to a connectivity problem. It appears that if the Controller node cannot connect to NSX Manager during deployment, it is automatically deleted.

The next question is: why do I have connectivity issues? In my case the NSX Controller and the NSX Manager run in the same IP subnet.

The answer lies in the static IP pool object that was manually created for the Controller cluster.

In this lab I use a class B subnet mask, 255.255.0.0 (a prefix length of 16), but in the IP pool object I mistakenly assigned a prefix length of 24.
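
To see why a wrong prefix length can break connectivity even when both machines sit on the same physical subnet, here is a small sketch using Python's standard `ipaddress` module. The addresses are made up for illustration and are not the ones from my lab; `same_subnet` is a hypothetical helper. With a /16 the Controller considers NSX Manager a local neighbor, but with a /24 it may treat the Manager as being on a different network and send the traffic to the gateway instead.

```python
import ipaddress


def same_subnet(local_ip, prefix_len, peer_ip):
    """Return True if `peer_ip` falls inside the network that `local_ip`
    believes it belongs to, given the configured prefix length."""
    network = ipaddress.ip_interface(f"{local_ip}/{prefix_len}").network
    return ipaddress.ip_address(peer_ip) in network


# Hypothetical addresses, both inside 172.16.0.0/16.
controller_ip = "172.16.10.21"
manager_ip = "172.16.20.5"

# Correct prefix: the Manager is seen as on-link.
print(same_subnet(controller_ip, 16, manager_ip))  # True
# Wrong prefix in the IP pool: the Manager looks like a remote network.
print(same_subnet(controller_ip, 24, manager_ip))  # False
```

Whether the mistake actually breaks communication depends on the addresses involved and on the gateway configured in the pool, so treat this as an illustration of the mechanism rather than a diagnosis.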

 

[Figure: Wrong IP Pool]

 

This was just one example of how to troubleshoot an NSX-v Controller node deployment; other causes can produce a similar problem:

  • A firewall blocks the Controller from talking to NSX Manager.
  • Network connectivity problems between NSX Manager and the Controllers.
  • Make sure NSX Manager, vCenter, and the ESXi hosts have DNS and NTP configured.
  • Make sure the datastore where you are deploying the Controllers has enough available resources, such as disk space.
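
The first items on this checklist can be roughly pre-checked from any machine on the Controller network. The sketch below uses only Python's standard `socket` module. The host names are placeholders, and TCP port 443 is my assumption for Controller-to-Manager traffic, so verify it against the port requirements for your NSX version. The functions `resolves` and `tcp_port_open` are hypothetical helpers, not NSX tools.

```python
import socket


def resolves(hostname):
    """Return True if the name (or IP literal) can be resolved."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    # Placeholder names: replace with your own NSX Manager and vCenter.
    for name in ("nsxmanager.lab.local", "vcenter.lab.local"):
        print(f"{name} resolves: {resolves(name)}")
    # Port 443 is an assumption; check the documented ports for your release.
    print("NSX Manager reachable on 443:", tcp_port_open("nsxmanager.lab.local", 443))
```

A failed check here does not tell you whether a firewall or routing is at fault, but it narrows down where to look before you redeploy the Controller.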
