Friday, February 08, 2008

Mindy RAID!

So, this weekend I set about a long-needed upgrade to Mindy. She was given the gift of:
  • 3ware 9650SE 8-port SATA-II RAID card
  • 2 additional 500GB Seagate Barracuda drives (well, 1 for now; the other still contains my backup)
  • Ubuntu Server 7.10 64-bit, replacing OpenSUSE 10.2 32-bit

Now, I picked the 3ware card because of its good reviews and good Linux support. Also, I was looking for a reliable way to do RAID-5 on the system. A friend of mine scared me because he had a 50% failure rate on the same Western Digital drives I was using. (I'm still a WD fan, just not of these specific drives.)

I assumed that when I launched into this project on Friday night it would be just a matter of bringing down the old system, backing up the VMs, installing the new hardware, reinstalling the software and off to the races.

Well, not quite ...

My first attempt seemed to be a complete failure. The I/O performance I was getting was terrible. It took 5 hours to copy 100GB of virtual hard disks to the internal drive from a USB 2.0 external drive. Before the upgrade, the same 5 hours had been enough to copy all 500GB of virtual hard disks. In addition, CPU utilization appeared to be tremendously high, so I knew something wasn't quite right.
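
To put those numbers in perspective, here is a quick back-of-the-envelope calculation of the average transfer rates. It is only a sketch: it assumes decimal units (1 GB = 1000 MB) and treats the copy times as exact, which they certainly weren't.

```python
def throughput_mb_per_s(gigabytes: float, hours: float) -> float:
    """Average transfer rate in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return gigabytes * 1000 / (hours * 3600)


# Figures from the post: 100 GB in 5 hours on the new array,
# versus 500 GB in the same 5 hours before the upgrade.
new_rate = throughput_mb_per_s(100, 5)
old_rate = throughput_mb_per_s(500, 5)

print(f"new array: {new_rate:.1f} MB/s")   # prints "new array: 5.6 MB/s"
print(f"old setup: {old_rate:.1f} MB/s")   # prints "old setup: 27.8 MB/s"
```

Roughly a fivefold drop, which is far too large to blame on the USB link alone.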

It would appear that my first pass was a failure for the following reasons (though I'm not 100% sure on this):

  • I had to set my motherboard's video settings to use only the integrated video (my RAID card is PCI Express x4)
  • Somehow, in trying to enable write caching, the array had been left in a weird state (the 3ware BIOS Manager kept saying INIT ARRAY ... whatever that means)
  • I had used the wrong block size (64k instead of 256k)

So, after about 18 hours of copying data and mucking about, I decided it was time to cut my losses and try again. I burned down the array and recreated it, this time enabling write caching and setting a 256k block size on the array.

This time, I seemed to have much better luck. I was able to copy the VMs at near the correct speed (about an hour for the 100GB of data, or something like that anyway), and I could do other things while the data was copying too. So it looked much more promising this time.

I finally got everything loaded and was ready to start copying my media (220GB worth) from the virtual hard disk stored on my external drive to a newly minted 500GB virtual drive on the array. Both drives were attached to Pesto, my Ubuntu 7.04 32-bit server.

This process took about 14 hours. Definitely not right, but hey, at least I could copy things, so it was progress. Unfortunately, the interactive performance of the system was still abysmal. After booting Bobby, my Windows XP VM, it was horrendously sluggish and it seemed that the drive array was in constant motion, something I hadn't seen in the old system.

After whining to 3ware and mucking about with system tuning, I realized 2 things:

  • I had VMware set to allow VMs to be swapped out
  • I had the "swappiness" of the Linux virtual memory system too high. More specifically, I needed to instruct the kernel not to swap out until absolutely necessary by adding the following line to /etc/sysctl.conf: "vm.swappiness = 0"
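
If you want to double-check that the setting actually took, something like the following sketch could do it. The function names are my own invention for illustration; the only things assumed about the system are the key = value format of sysctl.conf and the /proc/sys/vm/swappiness file that Linux exposes.

```python
from typing import Optional


def configured_swappiness(sysctl_conf_text: str) -> Optional[int]:
    """Return the last vm.swappiness value set in sysctl.conf-style text, or None."""
    value = None
    for raw_line in sysctl_conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments ("#" or ";" start a comment line).
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "vm.swappiness":
            value = int(val.strip())  # later entries override earlier ones
    return value


def running_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    """Read the value the running kernel is currently using (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


# Sample text standing in for /etc/sysctl.conf (not my actual file).
sample = """
# keep VM memory resident as long as possible
vm.swappiness = 0
"""
print(configured_swappiness(sample))  # prints 0
```

On a real box you would pass in the contents of /etc/sysctl.conf and compare the result against running_swappiness(). The file is only read at boot, so the two can disagree until it is reloaded (for example with "sysctl -p").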

That seemed to help, but I was still seeing unacceptable file transfer times from my Linux server VM to any machine on the network. It got so bad at one point that it would take almost 4 hours to transfer a 2.2GB ISO image from the VM to my local desktop.

More performance tuning ensued. This time, I took down all VMs and started with just the base system. Using that, the same 2.2GB ISO file copied in 10 minutes ... much more like it. So it appeared that the problem wasn't with the hardware (thank goodness) and it wasn't with the base install, but was in the VM itself.

So, I took a stab and started rebuilding the VM. Instead of doing a full rebuild and reinstall, though, I just rebuilt the .vmx file. And that seemed to work. On restarting the VM, transfers were back up to speed and the I/O wait states seemed to be much smaller. ("iostat -d 2 -x" is your friend.)

Once I get a chance to figure out what's different between the old and new .vmx files, I'll post what I found. Or if I can't figure out what's changed, I'll post the whole file to see if anyone can tell me what the deal is.
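
For the comparison itself, a small script along these lines might be all that's needed. It is a sketch that assumes a .vmx file is just a list of key = "value" lines; the sample settings below are made up for illustration and are not taken from my actual files.

```python
def parse_vmx(text: str) -> dict:
    """Parse key = "value" lines into a dict, skipping blanks and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line, ignore it
        settings[key.strip()] = value.strip().strip('"')
    return settings


def diff_vmx(old_text: str, new_text: str) -> dict:
    """Map each differing key to an (old value, new value) pair; None means absent."""
    old, new = parse_vmx(old_text), parse_vmx(new_text)
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes


# Made-up sample contents, purely illustrative.
old_sample = 'displayName = "Pesto"\nmemsize = "512"\n'
new_sample = 'displayName = "Pesto"\nmemsize = "1024"\nethernet0.present = "TRUE"\n'

changes = diff_vmx(old_sample, new_sample)
for key, (before, after) in changes.items():
    print(f"{key}: {before!r} -> {after!r}")
```

Sorting the keys keeps the output stable, so line order in the two files doesn't matter.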

I really hope that this is the last issue I have. I'd like to add the 4th drive to the array and use the additional 400GB of storage space to keep my lossless audio collection online.
