We drink our own Kool-Aid at Puryear IT. I was reminded of this the other day when our NAS suddenly went offline because a disk in the RAID-5 set had failed. The NAS was attempting to rebuild, but didn’t have a spare disk, and... well, things were fun after that. Over the course of a few reboots and attempted rebuilds, we royally damaged one of the servers hosted on that NAS.
As an aside: The server is a Virtual Machine (VM) that runs under VMware, which in turn connects to the NAS via iSCSI to access the VM. This is a setup we commonly use at customer sites, and one we use ourselves. It’s amazing how much running the same setup you deploy at other sites helps you troubleshoot their problems.
So, getting back to the original issue, we had corrupted the server that runs as a VM. It would boot, but the AD NETLOGON service would not start and the applications running on the system were failing. So we wiped the VM out. There was no use wasting our time troubleshooting that server. Why? Because we also use AppAssure to take hourly disk images of our critical internal systems. Again, we provide AppAssure to our customers under an MSP license (we simply rent the license out as a monthly subscription).
With AppAssure, we were able to restore from backup to a new VM image. Sure enough, the E: drive in the restored image wasn’t working properly. We had used the most recent backup for the restore, and my guess is that a snapshot was taken after the drive corruption happened, so the corruption ended up in the backup. No problem. We went to an earlier backup (again, the snapshot runs once an hour) and had AppAssure restore just the E: contents rather than build a complete new VM.
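For what it's worth, the "go back one snapshot" reasoning is easy to sketch. The Python below is only an illustration of that logic, assuming you know roughly when the disk failed. It does not use or represent AppAssure's API; the function name `pick_recovery_point` and all of the timestamps are made up for the example.

```python
from datetime import datetime, timedelta


def pick_recovery_point(points, failure_time):
    """Return the newest recovery point taken strictly before failure_time.

    Returns None if every recovery point was taken at or after the failure.
    """
    candidates = [p for p in points if p < failure_time]
    return max(candidates, default=None)


# Hypothetical hourly recovery points from 08:00 through 13:00.
start = datetime(2024, 1, 1, 8, 0)
points = [start + timedelta(hours=h) for h in range(6)]

# Hypothetical estimate of when the disk failed.
failure_time = datetime(2024, 1, 1, 12, 30)

# The 13:00 image may contain the corruption, so this picks 12:00.
chosen = pick_recovery_point(points, failure_time)
print(chosen)
```

In our case we didn't know the exact failure time up front, so in practice this amounted to trying the latest image, seeing that E: was bad, and stepping back an hour.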
Whew! Long story short: It worked, and we were able to get the server back online quickly once we decided to devote time to it. All in all, the actual restore/rebuild process for that server took a few man-hours. Not bad at all considering the critical nature of that system and how painful it would have been to restore the server from a data-only backup.
One of our customers will most definitely experience this very problem in the next few months. It’s a given. So having not only standard training but actual experience with this type of restore, using the very software we offer our customers, is important.