Friday, August 18, 2023

The Linux XFS File System - How Resilient Is It?

We are using VMware datastores over NFS version 3. The storage was routed, which is never a good idea: if your VMs all lose their storage simultaneously, that constitutes a disaster. Depending on a router, which can lose its routing prefixes due to a maintenance or configuration problem, is architecturally deficient (a polite way of putting it). To solve this, make sure there are no routing hops between the hypervisor and the storage, meaning the storage sits on the same segment as the storage interface on the hypervisor.
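As a rough illustration of the "same segment" rule, here is a minimal Python sketch that checks whether an NFS server address falls inside the subnet of the hypervisor's storage interface. The function name and the addresses are made up for the example; they are not from our environment, and a real audit would pull these values from the hypervisor's own network configuration.

```python
import ipaddress


def is_same_segment(storage_iface_cidr: str, nfs_server_ip: str) -> bool:
    """Return True if the NFS server sits inside the storage interface's subnet.

    storage_iface_cidr: the hypervisor storage interface address with its
    prefix length, e.g. "10.10.20.11/24" (hypothetical value).
    nfs_server_ip: the address of the NFS export (hypothetical value).
    """
    iface = ipaddress.ip_interface(storage_iface_cidr)
    return ipaddress.ip_address(nfs_server_ip) in iface.network


# Hypothetical addresses, for illustration only.
print(is_same_segment("10.10.20.11/24", "10.10.20.50"))  # same subnet, no router needed
print(is_same_segment("10.10.20.11/24", "10.10.30.50"))  # different subnet, traffic is routed
```

If the second case describes your datastore, the storage path depends on a router, which is the situation described above.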

So, after our storage routers went AWOL due to a maintenance event, I noticed some VMs came back and appeared to be fine. They had rebooted and were at a login prompt. Other VMs, however, did not come back, and were printing some nasty messages on the console (you could not log into these VMs).


What we noticed was that any Linux virtual machine using XFS for its boot or root file system (/boot or /) was unrecoverable. VMs that were using ext3 or ext4 seemed to be able to recover and start running their services, although some were still echoing messages to the console.
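To see which VMs fall into the affected group, you can look at the file system type mounted on / and /boot. The sketch below parses text in the /proc/mounts format (device, mount point, type, options). The function name and the sample mount table are hypothetical, written only to show the idea; on a real Linux VM you would pass in the contents of /proc/mounts instead.

```python
def fs_types(mounts_text, mount_points=("/", "/boot")):
    """Map each mount point of interest to its file system type.

    mounts_text is expected to be in /proc/mounts format:
    <device> <mount point> <fs type> <options> <dump> <pass>
    """
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] in mount_points:
            # A later entry for the same mount point overrides an earlier one.
            found[fields[1]] = fields[2]
    return found


# Hypothetical sample, for illustration only.
SAMPLE_MOUNTS = """\
/dev/mapper/vg-root / xfs rw,relatime 0 0
/dev/sda1 /boot xfs rw,relatime 0 0
/dev/mapper/vg-data /data ext4 rw,relatime 0 0
"""

for mount_point, fs_type in fs_types(SAMPLE_MOUNTS).items():
    print(f"{mount_point} is {fs_type}")
```

On a live system, replacing SAMPLE_MOUNTS with open("/proc/mounts").read() should give the real answer for that VM.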

There is a lesson here: the file system matters when it comes to resiliency in a virtualized environment.

I did some searching around for discussions on file system types, and of course there are many. I found this one particularly interesting: ext4-vs-xfs-vs-btrfs-vs-zfs-for-nas


