We are using VMware datastores backed by NFS version 3.x. The storage was routed, which is never a good thing to do because, let's face it, if your VMs all lose their storage simultaneously, that constitutes a disaster. Having a dependency on a router, which can lose its routing prefixes due to a maintenance or configuration problem, is architecturally deficient (a polite way of putting it). To avoid this, make sure there are no routing hops between the hypervisor and the storage: the storage array should sit on the same network segment as the hypervisor's storage interface.
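If you want a quick sanity check for this, here is a minimal sketch in Python. All of the addresses (the storage vmkernel IP/netmask and the NFS server addresses) are made-up assumptions; substitute your own. It just flags any NFS target that is not on the same subnet as the hypervisor's storage interface, which would mean a routing hop is in play:

```python
import ipaddress

# Assumed example values: replace with your hypervisor's storage
# vmkernel IP/netmask and the NFS server addresses of your datastores.
storage_vmk = ipaddress.ip_interface("10.20.30.11/24")
nfs_targets = {
    "datastore01": ipaddress.ip_address("10.20.30.50"),  # same segment
    "datastore02": ipaddress.ip_address("10.40.10.50"),  # routed!
}

for name, target in nfs_targets.items():
    if target in storage_vmk.network:
        print(f"{name}: {target} is on {storage_vmk.network} (no routing hop)")
    else:
        print(f"{name}: {target} is NOT on {storage_vmk.network} (routed storage!)")
```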
So, after our storage routers went AWOL due to a maintenance event, I noticed that some VMs came back and appeared to be fine: they had rebooted and were sitting at a login prompt. Other VMs, however, did not come back, and had some nasty things printing on the console (you could not log into these VMs at all).
What we noticed was that any Linux virtual machine with an XFS file system on its boot or root partition (/boot or /) was unrecoverable. VMs using ext3 or ext4 seemed to be able to recover and start running their services, although some were still echoing messages to the console.
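If you want to know ahead of time which of your Linux VMs are exposed, a quick audit is to check what file system / and /boot are mounted with. A minimal sketch (run inside the guest; reading /proc/self/mounts is standard Linux, nothing VMware-specific):

```python
# Minimal sketch: flag XFS on / or /boot inside a Linux guest.
# /proc/self/mounts lines look like: device mountpoint fstype options dump pass
WATCHED = {"/", "/boot"}

with open("/proc/self/mounts") as mounts:
    for line in mounts:
        device, mountpoint, fstype = line.split()[:3]
        if mountpoint in WATCHED:
            flag = "  <-- check this one" if fstype == "xfs" else ""
            print(f"{mountpoint}: {fstype} on {device}{flag}")
```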
There is a lesson here: the file system matters when it comes to resiliency in a virtualized environment.
I did some searching around for discussions of file system types, and of course there are many. This one in particular I found interesting: ext4-vs-xfs-vs-btrfs-vs-zfs-for-nas