LinScsi: SCSILinuxAbortCommands: 1843: Failed, Driver MPT SAS Host, for vmhba2
Some of the VMs hosted on VMware ESXi 5.5 (DELL PowerEdge T310 host) occasionally become unresponsive.
Logging in to the host via VMware vSphere Client fails with the following error:
Could Not Connect
vSphere Client could not connect to "10.5.5.25". An unknown connection error occurred. (The request failed because the remote server took too long to respond. (The operation has timed out))
The vmkernel system log on the VMware console (ALT + F12) displays the following warnings:
mptscsih: ioc0: task abort: FAILED (sc=0x412e8182ecc0)
WARNING: LinScsi: SCSILinuxAbortCommands: 1843: Failed, Driver MPT SAS Host, for vmhba2
WARNING: ScsiPath: 6292: Set retry timeout for failed TaskMgmt abort for CmdSN 0x0, status
mptscsih: ioc0: attempting task abort! (sc=0x412e81718200)
Here vmhba2 refers to the RAID1 SATA array that hosted all of the affected virtual machines.
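To see how often the failure occurs and which adapter it affects, a saved copy of the log can be scanned for the two failure signatures quoted above. The following is only a minimal sketch: the function name, the variable names, and the regular expressions are my own assumptions derived from the log lines shown here, not part of any VMware tool.

```python
import re
from collections import Counter

# Patterns derived from the warnings quoted in this post (assumed formats).
ABORT_FAILED = re.compile(r"mptscsih: (ioc\d+): task abort: FAILED")
LINSCSI_FAILED = re.compile(
    r"SCSILinuxAbortCommands: \d+: Failed, Driver (.+?), for (vmhba\d+)"
)

def count_abort_failures(log_text):
    """Count failed task aborts per controller (ioc) and per adapter (vmhba)."""
    per_ioc = Counter()
    per_hba = Counter()
    for line in log_text.splitlines():
        match = ABORT_FAILED.search(line)
        if match:
            per_ioc[match.group(1)] += 1
        match = LINSCSI_FAILED.search(line)
        if match:
            per_hba[match.group(2)] += 1
    return per_ioc, per_hba

# Sample input built from the log lines shown above.
sample = """\
mptscsih: ioc0: task abort: FAILED (sc=0x412e8182ecc0)
WARNING: LinScsi: SCSILinuxAbortCommands: 1843: Failed, Driver MPT SAS Host, for vmhba2
mptscsih: ioc0: attempting task abort! (sc=0x412e81718200)
"""

per_ioc, per_hba = count_abort_failures(sample)
print(dict(per_ioc), dict(per_hba))
```

Lines that only announce an abort attempt are deliberately not counted. Only the explicit FAILED and Failed messages are, so the totals reflect aborts that actually failed.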
Rebooting the VMware host server fixed the problem temporarily. However, the same issue would recur a few days or a few weeks later.
The DELL RAID controller and SATA hard drives didn't report any issues. Upgrading the DELL motherboard and RAID controller BIOS didn't help. Installing the latest VMware ESXi patch (U3b) didn't make any difference either.
After some research, I found that similar issues can be caused by Interrupt Remapping, a feature used by VMware ESXi. In theory it should improve performance, but it can be incompatible with certain hardware configurations. To fix the issue, I had to disable Interrupt Remapping: