Monday, October 1, 2018

A useful VMware learning resource for beginners

Today, while reading something on one of the VMware Blogs sites, I came across an interesting VMware learning resource called vSphere Central. It looks really useful, so I thought I'd make a note of it.

This site is a good place to find detailed information on some of the important features of the vSphere products, vCenter, ESXi and vRealize Operations Manager, along with related configuration walkthroughs.





Hope it is useful for others... That's it :)


Sunday, July 15, 2018

How to check and verify the I/O device firmware/driver compatibility with VMware HCL

This is something we, as VMware admins, should be aware of, because whenever a related issue comes up this is where we check whether the device is supported, which capabilities have been tested, and the details of the device driver and compatible firmware versions.

To look up an I/O device, browse to the VMware Compatibility Guide site and select I/O devices from the "What are you looking for" drop-down.

Or directly browse to https://www.vmware.com/resources/compatibility/search.php?deviceCategory=io


To look up an I/O card we need to have the following identifiers handy:

VID: Vendor ID
DID: Device ID
SVID: Sub-Vendor ID
SSID: Sub-Device ID

We can get these details on an ESXi host using the vmkchdev command as follows:

# vmkchdev -l | grep I/O_device_name

For example, for vmnic0:

VID:DID SVID:SSID
8086:100f 15ad:0750

The same applies to any other connected I/O device.

For an HBA, use:

# vmkchdev -l | grep vmhba

Now enter these details on the VMware Compatibility Guide I/O devices page to get the required information.

Note: You can also run # vmkchdev -l | more to list the VID:DID SVID:SSID details of all connected PCI devices, or filter the output with grep.
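As a rough sketch, the four IDs can be pulled out of a vmkchdev -l line with awk. The line layout used here (PCI address, VID:DID, SVID:SSID, owner, device alias) is an assumption about the usual output, the sample line simply reuses the vmnic0 values shown above, and the function name pci_ids is made up for illustration.

```shell
# Hypothetical helper: print the HCL lookup IDs for one device alias.
# Reads `vmkchdev -l`-style lines on stdin; the assumed layout is
#   <pci address> <VID:DID> <SVID:SSID> <owner> <alias>
pci_ids() {
    awk -v dev="$1" '$NF == dev {
        split($2, a, ":")   # a[1] = VID,  a[2] = DID
        split($3, b, ":")   # b[1] = SVID, b[2] = SSID
        printf "VID=%s DID=%s SVID=%s SSID=%s\n", a[1], a[2], b[1], b[2]
    }'
}

# Sample line (assumed format, values taken from the vmnic0 example)
sample='0000:02:00.0 8086:100f 15ad:0750 vmkernel vmnic0'
printf '%s\n' "$sample" | pci_ids vmnic0
```

On a host this would be used as vmkchdev -l | pci_ids vmnic0, and the four values can then be typed into the matching fields of the compatibility guide search.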


That's it... :)


Sunday, July 8, 2018

How to check FC HBA driver & firmware version on an ESXi host

Lately, while the storage team was planning to upgrade the storage system OS, their initial checks found that some ESXi hosts in the environment had an old version of the HBA driver, so they recommended upgrading the HBA driver to the minimum supported version or later.

The point here is that when planning to upgrade the HBA driver (or any other device driver), always make sure to check and upgrade the device firmware to a compatible version as well; otherwise you might face serious performance and related issues. It is better to upgrade the device driver and firmware at the same time.

You can check and verify I/O device firmware/driver compatibility and ESXi support information on the VMware Compatibility Guide site.

Here are the steps to check the installed firmware/driver version of any connected HBA.

First check which HBA driver is being used on the server by running one of the following commands:

# esxcfg-scsidevs -a

Or

# esxcli storage core adapter list

The second column of the output shows the driver that is configured for the HBA.

For a native HBA driver, use the following commands to get the driver/firmware details:

# /usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -d

Here you can see the names of the connected HBAs; suppose they are vmhba0 and vmhba2.


# /usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -l -i vmhba0/Emulex

(Type the HBA/vendor name exactly, as it is case sensitive.)

The output of the command shows the installed HBA firmware and driver versions.


To check the information when a legacy driver is in use:

Go to the /proc/scsi directory and look for lpfc (for Emulex), qla (for QLogic) or bfa (for Brocade and sometimes for QLogic as well).

Now change to the appropriate HBA model directory; if it's lpfc then:

# cd /proc/scsi/lpfc####

Where #### is the model of the Emulex HBA.

List the contents of this directory:

# ls -lia

Now check the files available here (named with a number); if there is a file named 6 then:

# cat 6

You would get output similar to the following (this sample is from a QLogic HBA):

Chip Revision: Rev-D
Manufacturer: QLogic
Model Description: QLogic-###
Instance Num: 0
Serial Num: ALX0xxxxxxxx
Firmware Version: 5.4.x.x
Hardware Version: Rev-D
Bios Version: x.x.x.x
Optrom Version: x.x.x.x
Port Count: 1
WWNN: xxxxxxxxxxxxxxxxx
WWPN: xxxxxxxxxxxxxxxxxx

To quickly check the HBA Driver in use:

1. Open a console to the ESXi/ESX host.

2. Run this command to obtain the driver type that the Host Bus Adapter is currently using:

# esxcfg-scsidevs -a

Or

# esxcli storage core adapter list

Note: The second column shows the driver that is configured for the HBA.

3. Run this command to view the driver version in use:

# vmkload_mod -s HBADriver | grep Version 

For example, run this command to check the vmkata driver:

# vmkload_mod -s vmkata | grep Version 

Alternatively, you can use the following command:

# esxcli software vib list | egrep vmkata

This shows the driver version of the HBA.


To obtain the driver version for all HBAs in the system:

# for a in $(esxcfg-scsidevs -a | awk '{print $2}'); do vmkload_mod -s "$a" | grep -i version; done

That's all... :)


Monday, July 2, 2018

How to check ESXi vmnic driver and firmware detail

You may need to check this while troubleshooting a network card related issue on an ESXi host, when you want to cross-verify the vmnic driver/firmware version compatibility with the VMware HCL.
To check the vmnic driver/firmware details, first connect to the desired ESXi host over SSH using PuTTY, or use the host console (DCUI).

Now use the following command to list the connected network cards:

# esxcli network nic list

There is also a legacy command to get the same information,

# esxcfg-nics -l 

Once you have identified the required vmnic name, use one of the commands below to get the firmware and driver details.

# ethtool -i vmnic_name

Or

# esxcli network nic get -n vmnic_name

Refer to the following screenshot to see this in action.
Here the firmware version is listed as N/A only because the screenshot was taken from my nested lab.
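If you only want the three values needed for the HCL lookup, a small awk filter over the esxcli output is one way to get them. This is a minimal sketch: the sample text is an assumed excerpt of the Driver Info section (with N/A firmware, as in the nested lab case), and the function name nic_driver_info is hypothetical.

```shell
# Hypothetical helper: pull driver name, driver version and firmware version
# out of `esxcli network nic get -n <vmnic>` output read from stdin.
nic_driver_info() {
    awk -F': ' '
        { sub(/^[ \t]+/, "", $1) }    # strip leading indentation from the key
        $1 == "Driver"           { driver = $2 }
        $1 == "Version"          { version = $2 }
        $1 == "Firmware Version" { firmware = $2 }
        END { printf "driver=%s version=%s firmware=%s\n", driver, version, firmware }
    '
}

# Assumed excerpt of the command output, for illustration only
sample='   Driver Info:
         Bus Info: 0000:02:00.0
         Driver: e1000
         Firmware Version: N/A
         Version: 8.0.3.1-NAPI'
printf '%s\n' "$sample" | nic_driver_info
```

On a host this would be run as esxcli network nic get -n vmnic0 | nic_driver_info.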

That's it... :)


Sunday, June 24, 2018

ESXi upgrade, ValueError: Cannot merge VIBs...

Recently, while upgrading one of the VM hosts from ESXi 6.0 U2 to U3 using VMware Update Manager, one of my colleagues first encountered the following error during remediation:

“Cannot execute upgrade script on host”

And when he tried to upgrade the VM host manually by booting it from the HPE custom ESXi 6.0 U3 image, he got stuck with the following error:


On checking the /var/log/vua.log file on the host, we found similar entries there.

It looks like the Intel_bootbak_intelcim_provider_0.5-3.3 VIB has been released more than once, with different sizes.

So, to fix this upgrade issue we need to remove the conflicting intelcim_provider VIB. Before trying to remove it, make sure the CIM Server service is stopped; otherwise you would get an error like "Can't remove....device or resource busy" while trying to remove it.

Sometimes the service may appear to be stopped, but because it is set to Start and stop with host, you still get an error when you try to remove the VIB. In that case, set the service to Start and stop manually; once done, you will be able to remove the intelcim_provider VIB.

Refer to the following screenshots for the steps to stop the CIM Server service or change its startup type.


Now you can remove the conflicting VIB using the following command, after connecting to the host over SSH using PuTTY:

esxcli software vib remove -n intelcim-provider

(A reboot may be required; check the Reboot Required value in the command output.)

Once done, you can re-run the ESXi upgrade, and hopefully this time it will go through and also install the correct version of intelcim-provider.

Note: Don't forget to change the startup type of the CIM Server service back after the ESXi upgrade/host reboot.

That's it... :)


Wednesday, May 9, 2018

Inconsistent LUN mapping related issues on ESXi hosts

Lately I came across an issue where, for some reason, the storage team unmapped and re-mapped a few RDM LUNs to the VM host group (from the storage array side), and the respective RDM disks connected to the VMs then disappeared.
We had already rescanned the hosts for the storage change, and the LUNs were showing as mounted on all the hosts. After spending two hours with VMware support we had also rebooted the host, but that didn't make any difference.

Finally, after we rebooted the cluster nodes, I found that this had to do with the RDM LUNs not being consistently mapped across the VM hosts (where the cluster nodes reside).

To check whether a LUN is consistently mapped on all VM hosts in a cluster, one needs to look at the vml.id that corresponds to the LUN's canonical name (naa.id).

One can check the vml.id corresponding to a naa.id by running the following command on the host (over SSH, using PuTTY):
esxcli storage core device list -d naa.id

So, if the naa.id is naa.60060480000190104063533030353445 then the command would be,

esxcli storage core device list -d naa.60060480000190104063533030353445


The vml.id is listed in the command output (under Other UIDs), for example, vml.02000500006006048000019010406353303035344553594d4d4554

One needs to look at the fifth and sixth digits of the vml.id (05 in this example); this is a hexadecimal number that represents the LUN number. Converted to decimal, it should match the actual LUN number, and it should be the same on every host.
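The conversion can be sketched in a few lines of shell. It assumes, as described above, that the fifth and sixth hex digits after the vml. prefix hold the LUN number; the function name lun_from_vml is made up for illustration.

```shell
# Hypothetical helper: decode the LUN number from a vml identifier,
# assuming hex digits 5-6 (after the "vml." prefix) hold the LUN number.
lun_from_vml() {
    id=${1#vml.}                            # drop the "vml." prefix
    hex=$(printf '%s' "$id" | cut -c5-6)    # fifth and sixth hex digits
    echo $((0x$hex))                        # hexadecimal -> decimal
}

# The vml.id from the example above decodes to LUN 5
lun_from_vml vml.02000500006006048000019010406353303035344553594d4d4554
```

Running the same check against the vml.id reported by each host in the cluster makes a mismatch easy to spot.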

To fix this issue, remove the affected RDM disk from both nodes and then delete the RDM pointer file from the datastore (this doesn't affect the actual data on the LUN). After rescanning the hosts for datastores, re-add the LUN as an RDM disk on both nodes. You should now be able to power on the affected node.

If for any reason the above doesn't work, then after removing the affected RDM disks from both nodes as above, follow these steps:
  1. Note the NAA_ID of the LUN.
  2. Detach the RDM using the vSphere client.
  3. Un-present the LUN from the host on the storage array.
  4. Rescan the host storage.
  5. Remove the LUN from the detached list using these commands:

    # esxcli storage core device detached list
    # esxcli storage core device detached remove -d naa.id
  6. Rescan the host storage.
  7. Re-present the LUN to the host.
  8. Rescan the hosts for datastores again.
Now cross-check the vml.id on the hosts; it should be the same on all of them, and after adding the RDM disk to the nodes you will be able to power on the VM nodes.

Note: If the LUN has been flagged as perennially reserved, this can prevent the removal from succeeding and step 5 would fail.

Run this command to remove the flag:

# esxcli storage core device setconfig -d naa.id --perennially-reserved=false

Now the command to remove the device should work.

# esxcli storage core device detached remove -d naa.id


I faced a related issue in the past and discussed it in the following post:

After unexpected host reboot, Powering on a RDM attached virtual machine fails with the error: Incompatible device backing specified for device '0

That's it... :)


Friday, April 6, 2018

AWS Public IP vs Elastic IP and how can we assign one to EC2 instance

In my previous post I mentioned that in order to make an EC2 instance accessible from the internet, it should have a Public or Elastic IP assigned. Anyone new to AWS may wonder what an Elastic IP is and how it differs from a Public IP.

In this post we will discuss the similarities and differences between the two, and how to assign a Public or Elastic IP to an EC2 instance.

From a functional point of view, both are publicly routable IP addresses and can be used to connect your instance to the internet, but they differ in how they persist and in how you assign one to your instance.

A public IP address is assigned to your instance from Amazon's pool of public IPv4 addresses, and is not associated with your AWS account. When a public IP address is disassociated from your instance, it is released back into the public IPv4 address pool, and you cannot reuse it.

To put it simply, public IP addresses are dynamic: if you stop and start your instance it is assigned a new public IP, but the address persists if you just reboot the EC2 instance.

At the time of writing, public IP addresses are free and you will not be charged anything for using them.

An Elastic IP address is a static public IPv4 address, designed for dynamic cloud computing. If your instance does not have a public IPv4 address, you can associate an Elastic IP address with your instance to enable communication with the internet.

An Elastic IP address is associated with your AWS account, and with it you can mask the failure of an instance or software by rapidly remapping the address to another instance in your account.
While your instance is running, you are not charged for one Elastic IP address associated with the instance, but you are charged for any additional Elastic IP addresses associated with it, and for any Elastic IP address that is allocated but not in use.

Now let’s look at the difference between these two IP types.

1. Elastic IPs are allocated to your AWS account, and you then attach them to instances. Public IPs are assigned to instances directly.
2. You cannot manually attach or detach a public IP from an instance; it is auto-allocated from the pool. An Elastic IP can be manually attached to and detached from an instance.

3. When an instance is stopped and started again, its public IP changes. If the instance has an Elastic IP, the address remains the same even after a stop and start.

4. If an Elastic IP is allocated to your account but not in use, you are charged for it on an hourly basis.

5. A public IP is released once your instance is stopped, so there is no question of being charged for not using it.

6. You cannot re-use the same public IP, since it is allocated from the free IP pool. You can always re-use an Elastic IP and re-attach it to another instance once it is released from the current instance.

7. By default you can have a maximum of 5 Elastic IPs per region in your account. But you can have as many public IPs as the EC2 instances you spin up.

8. An instance can have only one of the two. If you assign an Elastic IP to an instance, its currently assigned public IP is released back to the free pool.

How to assign a Public or Elastic IP to an instance:

Public IP: It can be assigned to an instance only at instance creation time, and there are two ways of doing that.

  • Edit the subnet settings and enable Auto-assign Public IP for any EC2 instance launched in this subnet.

To do so, from the AWS Console => under Networking & Content Delivery, select VPC => click on the Subnets tab => select the intended subnet and either right-click or open Actions => select Modify auto-assign IP address.

That will open the Modify auto-assign IP address pop-up, where you can enable auto-assigning a public IP address.

Now any EC2 instance launched in this subnet will have a public IP assigned.

  • You can also assign the public IP at EC2 instance launch time, where you can change the subnet's default public IP assignment for that instance.

Whatever you select here overrides the default IP assignment settings.
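The subnet-level option above can also be set from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials and a region; the function name enable_auto_public_ip is made up for illustration, and the subnet ID is whatever the caller passes in.

```shell
# Sketch: turn on auto-assign public IPv4 for a subnet with the AWS CLI.
# $1 is the subnet ID, supplied by the caller.
enable_auto_public_ip() {
    aws ec2 modify-subnet-attribute \
        --subnet-id "$1" \
        --map-public-ip-on-launch
}
```

It would be called with the ID of the intended subnet, for example enable_auto_public_ip subnet-xxxxxxxx, where the ID is a placeholder.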

Elastic IP: We need to allocate the Elastic IP address to our AWS account before making use of it.

You can get to the Elastic IPs window from either the EC2 or the VPC Dashboard; once you are there => select Elastic IPs => Allocate new address.


There is not much to discuss here; once you click Allocate on the next screen, an Elastic IP is allocated to your account.

Now if you want to assign this IP to an instance, just select it, either click on Actions or right-click on it, and select Associate address.


That opens the Associate address window, where you can select the intended EC2 instance or a specific network interface.


Once you select the intended instance, the Elastic IP is associated with it.

Note: As mentioned in the above screenshot, if you associate an Elastic IP address with an EC2 instance that already has a public IP assigned, the public IP is released.
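The same allocate-then-associate flow can be sketched with the AWS CLI. This assumes the CLI is installed and configured with credentials and a region; the function name attach_new_eip is hypothetical, and the instance ID is whatever the caller passes in.

```shell
# Sketch: allocate an Elastic IP in the account and associate it with an instance.
# $1 is the instance ID, supplied by the caller.
attach_new_eip() {
    # Allocate a new address for use in a VPC and capture its allocation ID
    alloc_id=$(aws ec2 allocate-address --domain vpc \
        --query AllocationId --output text) || return 1
    # Associate the allocated address with the instance
    aws ec2 associate-address --instance-id "$1" --allocation-id "$alloc_id"
}
```

Keeping the two steps separate mirrors the console flow: the address belongs to the account first, and only then is it attached to a particular instance.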

That's it ... :)