Saturday, January 30, 2016

vCenter Server shows ESXi host as not responding

In our environment we have two or three hosts located at different sites, and every few days one or another of them was listed in the vCenter inventory as “not accessible”, with the VMs running on that host shown as disconnected. (These hosts run ESXi 5.5.)
As a first troubleshooting step, I pinged the host and accessed the VMs, both successfully. I then connected to the host over SSH using PuTTY, restarted the management agents, and waited some time for the host to respond in the vCenter console, but that didn’t happen.
Then I tried to reconnect the host to vCenter but ended up with the error “Cannot contact the specified host (EsxiHost0xxxx). The host may not be available on the network, a network configuration problem may exist, or the management service on this host may not be responding.” When I tried to connect to the host directly using the vSphere Client, I was able to connect without any issue.
As only the two or three remote hosts had this issue and no other host did, the problem was unlikely to be related to firewall or port blocking on the vCenter Server itself.
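The port-blocking reasoning can also be checked directly from the vCenter machine. The following is only a rough sketch: the function name is made up for illustration, and it tests TCP reachability only. The host-to-vCenter heartbeat itself travels over UDP port 902, which a TCP connect cannot verify, so a passing result rules out only part of the problem.

```python
import socket

def tcp_port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Illustrative helper, not part of any VMware tooling. It says nothing
    about UDP traffic such as the heartbeat on UDP 902.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
```

For example, `tcp_port_open("esxi-host.example", 443)` (a placeholder hostname) would show whether the host's management port is reachable from where the script runs.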
On checking the vpxd logs, I found a few missed heartbeat entries, along with other related entries, for the affected host.
As the vpxd log shows, this issue is related to vCenter-to-host connectivity, which could be due to a congested network. As a workaround, we can increase the host-to-vCenter heartbeat response timeout from 60 seconds to 120 seconds (by default an ESXi host sends a heartbeat to vCenter every 10 seconds, and vCenter has a 60-second window in which to receive it). Please remember that increasing the timeout is a short-term solution until the network issues can be resolved.
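To make the numbers concrete, here is a small sketch of the arithmetic behind the timeout. It is a simplified model with made-up function names, and it assumes a host is flagged once the gap since its last received heartbeat exceeds the timeout; vCenter's actual logic may differ in detail.

```python
# Simplified model of the heartbeat window described in this post.
# Assumption: a host is flagged once the time since its last received
# heartbeat exceeds the timeout; vCenter's real logic may differ.

HEARTBEAT_INTERVAL_S = 10   # ESXi sends a heartbeat every 10 seconds
DEFAULT_TIMEOUT_S = 60      # default window on the vCenter side
RAISED_TIMEOUT_S = 120      # workaround value used in this post

def heartbeats_in_window(timeout_s, interval_s=HEARTBEAT_INTERVAL_S):
    """Number of heartbeats the host sends during one timeout window."""
    return timeout_s // interval_s

def flagged_not_responding(gap_s, timeout_s):
    """True if gap_s seconds without a received heartbeat exceeds the timeout."""
    return gap_s > timeout_s
```

Under this model, a 90-second burst of congestion would mark the host as not responding with the 60-second default but not with the 120-second setting, which is why the change hides the symptom without fixing the network.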

To do so, using the vSphere Client:

Connect to vCenter and go to Administration => vCenter Server Settings => Advanced Settings.
In the Key field, type: config.vpxd.heartbeat.notRespondingTimeout
In the Value field, type: 120
Click Add and then OK.
Restart the VMware vCenter Server service for changes to take effect.

Using the vSphere Web Client:

Connect to vCenter Server using the vSphere Web Client and navigate to the vCenter Server instance.
Select the Manage tab, then select Advanced Settings and click Edit; this opens a new window.
In the Key field, type: config.vpxd.heartbeat.notRespondingTimeout
In the Value field, type: 120
Click Add and then OK.
Restart the VMware vCenter Server service for changes to take effect.

Reference: Related KB#1005757

That’s It… :)

2 comments:

  1. I tried this but now my VMware vSphere will not connect to the server. It gives me error 503. How do I undo this or connect back to the server?

    1. It is just to prevent the disconnect due to a latency issue... you might have had a different issue.
