Wednesday, October 17, 2018

VMware Update Manager connection / plug-in error and how to fix it

Recently we had to remove our VMware Update Manager server from the domain and then re-join it (you might ask why, but that’s a different story). Once the server had re-joined, we tried to connect to our vCenter Server and got the following pop-up,

Note: this is vCenter 6.0

We tried to enable the Update Manager plug-in, but to no avail.

When we then tried to connect to Update Manager from the Web Client, we got a similar error message.

So it seems that re-joining the Update Manager server to the domain broke the link between Update Manager and vCenter, and the question is how to fix it.

To fix these errors, we need to re-validate the VMware Update Manager configuration. To do that,
  • First stop the VMware Update Manager service.
  • Go to the Update Manager installation directory, C:\Program Files (x86)\VMware\Infrastructure\Update Manager
  • Find and open the vci-integrity.xml file, look for the “<vpxdLocation>” tag, and verify the vCenter connection URL / IP detail.


In my case the vCenter IP was correct but the port detail was not; also notice the use of http/https.
  • Change the port number mentioned here to 80, and now the URL should look like,


Now start the VMware Update Manager service.

You should now be able to enable the VUM plug-in in the C# client / access VMware Update Manager from the Web Client.

That’s it… 😊


A useful VMware learning resource for beginners

Today, while reading one of the VMware blog sites, I came across an interesting VMware learning resource called vSphere Central. It looks really useful, so I thought of making a note of it.

The site is a good place to find detailed information on some of the important features of the vSphere products, vCenter, ESXi and vRealize Operations Manager, with related configuration walkthroughs.



etc...

Hope it is useful for others... That's it :)


Sunday, July 15, 2018

How to check and verify the I/O device firmware/driver compatibility with VMware HCL

This is something we as VMware admins should be aware of, because when a related issue comes up this is where we check whether the device is supported and, if it is, which capabilities have been tested, along with the device driver and compatible firmware versions.

To check an I/O device, browse to the VMware Compatibility Guide site and select I/O Devices from the "What are you looking for" drop-down.

Or directly browse to https://www.vmware.com/resources/compatibility/search.php?deviceCategory=io


To look up an I/O card we need to have the following information handy:

VID: Vendor ID
DID: Device ID
SVID: Sub-Vendor ID
SSID: Sub-Device ID

We can get this detail on an ESXi host using the vmkchdev command as follows,

# vmkchdev -l | grep I/O_device_name

So, here in case of vmnic0,

VID:DID SVID:SSID
8086:100f 15ad:0750

Same is true for any other connected I/O device.
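If you look up several devices, splitting the two ID pairs by hand gets tedious. Here is a minimal sketch of a helper that does the split. The function name hcl_ids is hypothetical, and the assumed line layout (PCI address, VID:DID, SVID:SSID, owner, device name) should be checked against your own vmkchdev output.

```shell
#!/bin/sh
# Split each line of `vmkchdev -l` output (read on stdin) into the four IDs
# that the HCL I/O device search asks for.
# Assumed line layout: <PCI address> <VID:DID> <SVID:SSID> <owner> <device name>
hcl_ids() {
    awk '{
        split($2, a, ":")   # a[1] = VID,  a[2] = DID
        split($3, b, ":")   # b[1] = SVID, b[2] = SSID
        printf "%s VID=%s DID=%s SVID=%s SSID=%s\n", $NF, a[1], a[2], b[1], b[2]
    }'
}

# Sample line: the IDs are the vmnic0 values above; the PCI address and
# owner field are illustrative.
echo "0000:02:00.0 8086:100f 15ad:0750 vmkernel vmnic0" | hcl_ids
# prints: vmnic0 VID=8086 DID=100f SVID=15ad SSID=0750
```

On a host this would be used as: vmkchdev -l | grep vmnic | hcl_ids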

In case of an HBA, use

# vmkchdev -l | grep vmhba

Now use this detail on the VMware Compatibility Guide I/O devices page to get the required information.

Note: you may also use the # vmkchdev -l | more command to list the VID:DID SVID:SSID detail of all connected PCI devices, or filter the output with grep as shown above.


That's it... :)


Sunday, July 8, 2018

How to check FC hba driver & firmware version on ESXi host

Lately, while the storage team was planning to upgrade the storage system OS, their initial checks found some ESXi hosts in the environment running an old version of the HBA driver, so they recommended upgrading the HBA driver to the minimum supported version or later.

Here is the point: when planning to upgrade an HBA or any other device driver, always check the device firmware and upgrade it to a compatible version as well, otherwise you might face serious performance and related issues (it is better to upgrade the device driver and firmware at the same time).

One can check and verify I/O device firmware/driver compatibility and ESXi support information on the VMware Compatibility Guide site.

Here are the steps to check the installed firmware/driver version of any connected HBA device.

First check which HBA driver is in use on the server by running one of the following commands,

# esxcfg-scsidevs -a

Or

# esxcli storage core adapter list

The second column of the output shows the driver that is configured for the HBA.

For a native HBA driver, use the following commands to get the driver/firmware detail:

# /usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -d

Here you can see the names of the connected HBAs; suppose they are vmhba0 and vmhba2.


# /usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -l -i vmhba0/Emulex
(Type the HBA/vendor name exactly, as it is case sensitive.)

The output of the command will show you the installed HBA firmware and driver version.


To check the information when a legacy driver is in use:

Go to the /proc/scsi directory and look for lpfc (for Emulex), qla (for QLogic) or bfa (for Brocade, and sometimes for QLogic as well).

Now change to the appropriate HBA model directory; if it is lpfc, then:

# cd /proc/scsi/lpfc####

Where #### is the model of the Emulex HBA.

List the contents of this directory,

# ls -lia

Now check the files available here (their names are numbers). If there is a file named 6, then:

# cat 6

You would get an output similar to,

Chip Revision: Rev-D
Manufacturer: QLogic
Model Description: QLogic-###
Instance Num: 0
Serial Num: ALX0xxxxxxxx
Firmware Version: 5.4.x.x
Hardware Version: Rev-D
Bios Version: x.x.x.x
Optrom Version: x.x.x.x
Port Count: 1
WWNN: xxxxxxxxxxxxxxxxx
WWPN: xxxxxxxxxxxxxxxxxx

To quickly check the HBA Driver in use:

1. Open a console to the ESXi/ESX host.

2. Run this command to obtain the driver type that the Host Bus Adapter is currently using:

# esxcfg-scsidevs -a

Or

# esxcli storage core adapter list

Note: The second column shows the driver that is configured for the HBA.

3. Run this command to view the driver version in use:

# vmkload_mod -s HBADriver | grep Version 

For example, run this command to check the vmkata driver:

# vmkload_mod -s vmkata | grep Version 

Or you may also use the following command:

# esxcli software vib list | egrep vmkata

This will show you the driver version of hba.


To obtain the driver version for all HBAs in the system:

# for a in $(esxcfg-scsidevs -a |awk '{print $2}') ;do vmkload_mod -s $a |grep -i version ;done
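When several HBAs share one driver, the loop above prints the same version once per adapter. A small variation, sketched here under the assumption that sort is available in the host shell, reduces the adapter list to distinct driver names first. The function name unique_hba_drivers is hypothetical.

```shell
#!/bin/sh
# Reduce `esxcfg-scsidevs -a` output (read on stdin) to the distinct
# driver names found in the second column.
unique_hba_drivers() {
    awk '{print $2}' | sort -u
}

# On a host (not run here), each driver would then be queried once:
#   for d in $(esxcfg-scsidevs -a | unique_hba_drivers); do
#       vmkload_mod -s "$d" | grep -i version
#   done

# Illustrative sample input; every field except the driver column is a placeholder.
printf '%s\n' \
    "vmhba0 lpfc link-up placeholder" \
    "vmhba2 lpfc link-up placeholder" \
    "vmhba1 vmkata link-n/a placeholder" | unique_hba_drivers
# prints lpfc and vmkata, one per line
```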

That's all... :)


Monday, July 2, 2018

How to check ESXi vmnic driver and firmware detail

This is something you may need to check while troubleshooting a network card issue on an ESXi host, when you want to cross-verify the vmnic driver / firmware version against the VMware HCL.
To check the vmnic driver/firmware detail, first connect to the desired ESXi host over SSH using PuTTY, or you may also connect to the DCUI.

Now use the following command to get the detail of the connected network cards,

# esxcli network nic list

There is also a legacy command to get the same information,

# esxcfg-nics -l 

Once you have identified the required vmnic name, use one of the commands below to get the firmware and driver detail.

# ethtool -i vmnic_name

Or

# esxcli network nic get -n vmnic_name

Refer to the following screenshot to see the same in action.
Here the firmware version is listed as N/A only because the screenshot was taken in my nested lab.
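To pull just the three values needed for an HCL comparison out of the esxcli output, a filter like the following sketch may help. The function name nic_driver_info is hypothetical, and the field labels (Driver, Version, Firmware Version under a Driver Info block) are assumed from typical esxcli output, so confirm them on your build. The sample values are illustrative.

```shell
#!/bin/sh
# Print the driver name, driver version and firmware version from
# `esxcli network nic get -n <vmnic>` output read on stdin.
nic_driver_info() {
    awk -F': ' '
        { sub(/^[ \t]+/, "", $1) }            # drop indentation from the label
        $1 == "Driver"           { print "driver=" $2 }
        $1 == "Version"          { print "version=" $2 }
        $1 == "Firmware Version" { print "firmware=" $2 }
    '
}

# Illustrative sample of the assumed output layout.
nic_driver_info <<'EOF'
   Driver Info:
         Bus Info: 0000:02:00.0
         Driver: e1000
         Firmware Version: N/A
         Version: 8.0.3.1-NAPI
   Link Detected: true
EOF
```

On a host this would be used as: esxcli network nic get -n vmnic0 | nic_driver_info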

That's it... :)


Sunday, June 24, 2018

ESXi upgrade, ValueError: Cannot merge VIBs...

Recently, while upgrading one of the VM hosts from ESXi 6.0 U2 to U3 using VMware Update Manager, one of my colleagues first encountered the following error during remediation,

“Cannot execute upgrade script on host”

And when he tried to upgrade the VM host manually by booting it from the HPE custom ESXi 6.0 U3 image, he got stuck with the following error,


On checking the /var/log/vua.log file on the host, we found similar entries there.

Here it looks like Intel_bootbank_intelcim-provider_0.5-3.3 has been released more than once, with different sizes.

So, to fix this upgrade issue we need to remove the conflicting intelcim-provider vib. Before trying to remove it, make sure the CIM Server service is stopped, otherwise you will get an error like "Can't remove....device or resource busy".

Sometimes the service may show as Stopped but, because it is set to Start and Stop with Host, you still get the error when you try to remove the vib. In that case, set the service to Start and Stop Manually, and once that is done you will be able to remove the intelcim-provider vib.

Refer to following screenshots for steps about how to Stop the CIM Server service or change its Startup type.


Now you can remove the conflicting vib with the following command after connecting to the host over SSH using PuTTY:

# esxcli software vib remove -n intelcim-provider
(A reboot may be required; check the Reboot Required value in the command output.)

Once done, you can re-run the ESXi upgrade, and hopefully this time it will go through and install the correct version of intelcim-provider.

Note: Don't forget to change the Startup type of the CIM Server service back after the ESXi upgrade/host reboot.
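If the removal is scripted, the reboot flag can be read from the command output instead of by eye. This is a minimal sketch, assuming the output contains a line of the form "Reboot Required: true" or "Reboot Required: false". The function name reboot_required and the sample text are hypothetical.

```shell
#!/bin/sh
# Read `esxcli software vib remove` output on stdin and print the value of
# the "Reboot Required" field, or "unknown" if no such line is present.
reboot_required() {
    awk -F': *' '
        /^[ \t]*Reboot Required:/ { print $2; found = 1 }
        END { if (!found) print "unknown" }
    '
}

# Illustrative sample of the assumed output layout.
printf '%s\n' \
    "Removal Result" \
    "   Message: sample message" \
    "   Reboot Required: true" | reboot_required
# prints: true
```

On a host this would be used as: esxcli software vib remove -n intelcim-provider | reboot_required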

That's it... :)


Wednesday, May 9, 2018

Inconsistent LUN mapping related issues on ESXi hosts

Lately I came across an issue where, for some reason, the storage team unmapped and re-mapped a few RDM LUNs to the VM host group (on the storage array side), and the respective RDM disks connected to the VMs disappeared.
We had already re-scanned the hosts for the storage change and the LUNs were showing as mounted on all the hosts. After spending two hours with VMware support we had also rebooted the host, but that didn't make any difference.

Finally, when we rebooted the cluster nodes, I found that this had to do with whether the RDM LUNs were mapped consistently across the VM hosts (where the cluster nodes reside).

To check whether a LUN is consistently mapped on all VM hosts in the cluster, one needs to look at the vml.id that corresponds to the LUN's canonical name (naa.id).

One can check the vml.id for a naa.id by running the following command on the host (over SSH, using PuTTY),
# esxcli storage core device list -d naa.id

So, if the naa.id is naa.60060480000190104063533030353445 then the command would be,

esxcli storage core device list -d naa.60060480000190104063533030353445


For example, vml.02000500006006048000019010406353303035344553594d4d4554

One needs to look at the fifth and sixth digits of the vml.id (05 in the example above). This is a hexadecimal number that represents the LUN number; converted to decimal, it should match the actual LUN number.
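The conversion is easy to get wrong by eye once LUN numbers go past 9, so here is a minimal sketch that extracts and converts those two digits. The function name vml_lun_number is hypothetical.

```shell
#!/bin/sh
# Print the LUN number encoded in a vml.id: the 5th and 6th hex digits
# after the "vml." prefix, converted from hexadecimal to decimal.
vml_lun_number() {
    id=${1#vml.}                          # strip the "vml." prefix if present
    hex=$(printf '%s' "$id" | cut -c5-6)  # fifth and sixth hex digits
    printf '%d\n' "0x$hex"
}

vml_lun_number vml.02000500006006048000019010406353303035344553594d4d4554
# prints: 5
```

Running this against the vml.id reported by each host for the same naa.id gives a quick consistency check: every host should print the same number.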

To fix this issue, remove the affected RDM disk from both nodes and then delete the RDM pointer file from the datastore (this doesn’t affect the actual data on the LUN). After re-scanning the hosts for datastores, re-add the LUN as an RDM drive on both nodes. You should now be able to power on the affected node.

If for any reason the above doesn’t work, then, after removing the affected RDM drives from both nodes as above, follow these steps,
  1. Note the NAA_ID of the LUN.
  2. Detach the RDM using the vSphere Client.
  3. Un-present the LUN from the host on the storage array.
  4. Rescan the host storage.
  5. Remove the LUN from the detached list using these commands:

    # esxcli storage core device detached list
    # esxcli storage core device detached remove -d naa.id
  6. Rescan the host storage.
  7. Re-present the LUN to the host.
  8. Rescan the hosts for datastores again.

Now cross-check the vml.id on the hosts; it should be the same on each, and after adding the RDM drive to the nodes you will be able to power on the VM nodes.

Note: If the LUN has been flagged as perennially reserved, the removal cannot succeed and step 5 will fail.

Run this command to remove the flag:

# esxcli storage core device setconfig -d naa.id --perennially-reserved=false

Now the command to remove the device should work.

# esxcli storage core device detached remove -d naa.id


I faced a related issue in the past and discussed it in the following post,

After unexpacted host reboot, Powering on a RDM attached virtual machine fails with the error: Incompatible device backing specified for device '0

That's it... :)