Friday, April 29, 2011

The little things...

One of the things that has annoyed me to no end is Chrome's theft of
window focus when you click on a link in XChat or some other
non-Chrome application.  It's not Chrome's fault.

Because I pull Twitter and Identi.ca into XChat (via Bip and Bitlbee), I
normally have a bunch of links waiting for me when I open my chat
client. Any attempt at opening more than one link at a time causes
frustration because you have to jump back to the XChat window to click
the next one. It adds a lot of mouse movement and extra clicks to
refocus.

In any case, I think I found the solution at about 4 a.m. today
(at the cost of a few spouse points). After I opened gconf-editor and
de-selected /apps/metacity/general/raise_on_click, Chrome no longer
jumps to the front when a URL is clicked on in XChat.
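
The same change can be scripted rather than clicked through in gconf-editor. This is a minimal sketch, assuming the gconftool-2 command-line tool that ships with GConf is installed; the helper names are made up for illustration.

```python
import subprocess

# GConf key named in the post
RAISE_ON_CLICK_KEY = "/apps/metacity/general/raise_on_click"


def raise_on_click_command(enabled):
    """Build the gconftool-2 command line (hypothetical helper)."""
    value = "true" if enabled else "false"
    return ["gconftool-2", "--type", "bool", "--set", RAISE_ON_CLICK_KEY, value]


def set_raise_on_click(enabled):
    """Apply the setting; raises CalledProcessError if gconftool-2 fails."""
    subprocess.run(raise_on_click_command(enabled), check=True)


# Show the command that would turn raise-on-click off, without running it.
print(" ".join(raise_on_click_command(False)))
```

Calling set_raise_on_click(True) should put the original behavior back if the side effect turns out to be too annoying.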

Please note that this isn't without an annoying side effect. You can
no longer click anywhere in a window to bring it to the front. You
must either click on the title bar at the top of the window, the
application's button in the window list, or use Alt-Tab to jump to the
desired app. It's something that I can live with, though.

Sunday, April 24, 2011

Lessons learned for ESXi home users

With apologies for rambling, the following is a collection of “lessons
learned”, garnered over the last two years while employing ESXi 4.x in
a home network.

Online file storage

NFS is extremely slow when run in a VM (think FreeNAS in a VM). Only
thing slower is connecting USB storage to a VM. Other storage
protocols are tolerable but they are noticeably slower. Note: this is
not to say that it should never be done. Sometimes it's unavoidable.
For us home users, it’s not uncommon to have FreeNAS running in a VM.

You get what you pay for

When you build out a server, it's better to use a new system than
reuse an existing one. However, the you-get-what-you-pay-for rule
applies. Buying the low end vanilla box, and adding non-standard
drivers, is likely to end in pain and sadness.  My HP a1540n has
chugged along for 2 years without complaint.  The eMachines
EL-1352-07e died a noisy death, involving a number of lockups and
PSODs.

The idea had been that I could replace the 5+ year old machine with
one with equivalent specs and smaller size.  I'd save $700 and have a
server that was small enough that it could travel to conferences.  I'm
hoping that it's just a power supply issue.

Make backups

Always (always!) make a backup before making any changes. This even
applies to simple patching and updates. It especially applies if you
are experimenting with software, even more so if that software is
packaged. All it takes is one wrong dependency and some of your
installed software either disappears or ceases to function. Making a
backup is easy, though it may take a little time. A 100 GB SATA
disk-to-SATA disk backup can take about 90 minutes to create but it's
less time than having to recover or rebuild inadvertently destroyed
data.
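
To put the 90-minute figure in perspective, here is a rough bit of arithmetic. It assumes decimal units (1 GB = 1000 MB) and a steady copy rate, which real disks won't quite deliver; the 250 GB size is just an example value, not something from my setup.

```python
def throughput_mb_per_s(size_gb, minutes):
    """Average copy rate in MB/s, using decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / (minutes * 60)


def estimated_minutes(size_gb, mb_per_s):
    """Time to copy size_gb at a steady rate of mb_per_s."""
    return size_gb * 1000 / mb_per_s / 60


# 100 GB in 90 minutes, as quoted above
rate = throughput_mb_per_s(100, 90)
print(f"{rate:.1f} MB/s")  # 18.5 MB/s

# Example only: how long a 250 GB disk would take at the same rate
print(f"{estimated_minutes(250, rate):.0f} minutes")
```

Backup time scales with disk size, so it pays to know the number before you start.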

Don’t use snapshots

Snapshots should never be used in production environments. Snapshots
can cause your VM to run slower, especially when you have multiple
large snapshots.  I'm of the belief that snapshots can remove any
speed advantage gained by using paravirtualization.

If you use snapshots and need to export a VM for any reason, there's
extra work involved in merging all snapshots back into their parents.
There are no tools, outside of the vSphere client, that handle ESXi
snapshots.  You need the flat file before you can export the VM to
some other hypervisor.  This is done by using the scary sounding
"Delete All" button in the snapshot manager.  What it actually does is
merge all snapshots back into the core disk, by merging snapshot #3
into #2, #2 into #1, and then #1 into the core.  For large VMs,
merging can require an obscene amount of storage (a couple TB of
storage can be consumed quickly).
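
The merge order is easier to see in a toy model. This sketch treats the disk and each snapshot as a dictionary of blocks; that is my simplification for illustration, not how ESXi actually stores snapshot data on disk.

```python
def consolidate(base, snapshots):
    """Merge a chain of snapshot deltas into the base disk, newest first.

    base:      dict of block number -> contents (the flat disk)
    snapshots: list of dicts, oldest first; each holds only the blocks
               written while that snapshot was the active layer
    Returns the merged disk and the (child, parent) merge order,
    where layer 0 is the core disk.
    """
    chain = [dict(base)] + [dict(s) for s in snapshots]
    order = []
    while len(chain) > 1:
        child = chain.pop()       # newest remaining layer
        parent = chain[-1]
        parent.update(child)      # newer blocks override older ones
        order.append((len(chain), len(chain) - 1))
    return chain[0], order


disk = {0: "a", 1: "b", 2: "c"}
snaps = [{1: "b1"}, {1: "b2", 2: "c2"}, {0: "a3"}]
merged, order = consolidate(disk, snaps)
print(order)   # [(3, 2), (2, 1), (1, 0)]
print(merged)  # {0: 'a3', 1: 'b2', 2: 'c2'}
```

Each parent grows as its child is folded in, which hints at why the merge can need so much working space.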

Use scratch VMs

Always install software in a test VM before installing the software in
a production VM, especially when handling packaged software with
dependencies.  You never know what you'll break.  Example:
KnowledgeTree requires the Zend server package to provide PHP vice the
standard PHP package.  Installing anything that requires the normal
PHP package breaks KnowledgeTree.  (Note: this also supports
the "Make backups" recommendation.)

Know your tools

Finally, become familiar with your tools before you need them.  Think
of it as continuity planning.  It minimizes anxiety.  If you're having
to look for tools to handle a problem, after the problem has already
occurred, you'll probably use the first cheesy tool that you can find,
vice the proper tool.

Hope this helps.

Monday, April 18, 2011

Learning about low-end systems, the hard way

Apologies for the delayed (and rambling) update.  Have been very busy.  Following is an update on the experiments with installing various virtualization technologies.  The common theme is: the video card on the older box isn't recognized by any of the software installs.  I believe this to be associated with the removal of the frame buffer as a default device on many install disks.  Ubuntu is only now adding it back.

The issue with Proxmox 1.7 turned out to be the video driver. The built-in video on the motherboard wasn't recognized by Proxmox. I got around this by putting the hard drive in a newer computer (have I said that I really like BlacX?), installing there, and moving the drive back to the original computer.

CentOS 5.5 just doesn't like my boxes, either of them. The install (net or DVD based) completes successfully but, upon reboot, hangs when udev starts up. I'm probably missing a boot option or two. Again, it's more work than I care to do at this point.

XenServer 5.6.1 installs nicely on the older hardware. One drawback is that the official management program (XenCenter) requires Windows to run. A decent alternative appears to be Open XenCenter. If I end up using this, I'll need to figure out how to load ISOs onto the server as there's no upload tool like what vSphere has.

Which brings me to a side topic: management software. One of the drawbacks for most commercial hypervisors is that you need Windows to run some sort of management software. For an all-Unix shop, this can have drastic effects on production networks (think required infrastructure to support that one Windows box). Fortunately, a number of non-Windows management pieces are available:

solution       | advantages                     | disadvantages
home grown     | easy to customize              | must be customized for each install; extremely limited feature set without a large investment of time
vSphere        | I'm familiar with it           | requires a Windows box; requires moderately powerful hardware
XenCenter      | similar to vSphere in function | requires a Windows box
Open XenCenter | doesn't require Windows        | somewhat limited feature set
What each needs most:

solution       | feature
vSphere        | A non-Windows version of vSphere
XenCenter      | A non-Windows version of XenCenter
Open XenCenter | A built-in means for uploading ISOs into local storage

The delay in posting was mostly caused by a hardware failure.  I'd been wanting to move the house ESXi server off of the main box and run it on a smaller system.  For this purpose, I had purchased an eMachines EL-1352-07e.  It's a 64-bit dual core AMD system with 4GB of memory and a 300 gig hard drive.  I successfully modified the ESXi install disk (I'm getting good at this) and had moved the VMs onto the new server.

To be on the safe side, I didn't erase anything from the old server, deciding to run the new server for 3 days, just in case of a failure.  Three days went by without a hiccup, so I downloaded and installed Fedora 14 with the idea that I would experiment with KVM.  That's when karma stepped in.  When I attempted to connect to the new server with the vSphere client, the connection would time out.  Checking the console, I discovered that it was frozen.

My only recourse was to hold the power button in to trigger a hard reboot.  The system returned to normal operation.  About two hours after that, the console froze again.  Then again, after about 30 minutes.  This time, the system complained about a corrupted file system and PSOD'd.

After a couple hours of panic (I'd erased the old server, the new one had a bad file system, and the last backup was done over a month ago), I remembered that ESXi sets up a number of partitions on the hard drive (the OS is separate from the datastore). I started researching what could be done to pull the VMs off of the corrupted disk.

The short version of a 2-week long story is that the VMs are now running on the old server, without any loss of data.  The month-old backup was not needed.  I'd discovered a number of tools which aided in the recovery or just made things interesting:
  • vmfs-fuse, part of vmfs-tools, allows you to mount VMFS formatted hard drives under Linux
  • qemu-img allows you to convert VMs to other formats (not just qemu's)
  • vde provides a distributed (even over the Internet) soft switch
For a while, I had the VMs running under KVM on my workstation. vmfs-fuse allowed me to mount the original data stores and qemu-img allowed me to convert the VMs to QCOW2 format. However, qemu-img could not include ESXi's snapshots in the file system conversion, so it was only useful for accessing data even older than the backup.
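
For anyone in the same spot, the two recovery steps look roughly like the sketch below. The device name, mount point, and file paths are placeholders I made up for illustration (yours will differ), and the helpers only build the command lines so you can check them before running anything against a damaged disk.

```python
import subprocess


def mount_vmfs_command(device, mount_point):
    """Command to mount a VMFS partition under Linux with vmfs-fuse."""
    return ["vmfs-fuse", device, mount_point]


def convert_to_qcow2_command(vmdk_path, qcow2_path):
    """Command to convert a VMDK disk image to QCOW2 with qemu-img."""
    return ["qemu-img", "convert", "-f", "vmdk", "-O", "qcow2",
            vmdk_path, qcow2_path]


def run_all(commands, dry_run=True):
    """Print each command; only execute when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Placeholder paths, not the ones from my server
steps = [
    mount_vmfs_command("/dev/sdX3", "/mnt/vmfs"),
    convert_to_qcow2_command("/mnt/vmfs/myvm/myvm.vmdk", "/srv/kvm/myvm.qcow2"),
]
run_all(steps)  # dry run: prints the commands without executing them
```

As noted above, the converted image reflects only the base disk, so anything that lived in ESXi snapshots will be missing from it.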

So, for now, the VMs are back on the old server, running under ESXi. They'll stay there at least until the "Build an Open Source Cloud Day" Friday at the SouthEast LinuxFest (SELF). Hopefully, I'll be learning a bit more about deploying/managing Xen servers (appears to be the currently supported "cloud" in CloudStack) then.

Saturday, April 16, 2011

What RAID isn't...

I periodically have to discuss what RAID is and what it isn't.  I have this discussion with management, IA, and even experienced system administrators.  Last year's Information Storage and Management class allowed me to order my thoughts and come up with a more concise statement:

Although RAID can support a back-up solution (as a separate storage solution), its main reasons for existence are high availability or high bandwidth.  It is not (in itself) a backup solution as it does not take snapshots in time.

If you're using RAID as your primary and you've configured it to mirror itself (on the same system), you're only one horrible accident away from losing everything.  The mirror will only save you from hardware failures.
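
A toy model makes the distinction concrete. The class below is purely illustrative (it is not how any real RAID controller works): a mirror applies every change to both disks at once, while a backup is a copy frozen at a point in time.

```python
import copy


class Mirror:
    """Toy RAID 1: every write and delete hits all working disks at once."""

    def __init__(self):
        self.disks = [{}, {}]  # two member disks holding identical data

    def write(self, name, data):
        for disk in self.disks:
            if disk is not None:
                disk[name] = data

    def delete(self, name):
        for disk in self.disks:
            if disk is not None:
                disk.pop(name, None)

    def fail_disk(self, index):
        self.disks[index] = None  # simulate a hardware failure

    def read(self, name):
        for disk in self.disks:
            if disk is not None and name in disk:
                return disk[name]
        return None

    def backup(self):
        """Point-in-time copy kept outside the mirror."""
        for disk in self.disks:
            if disk is not None:
                return copy.deepcopy(disk)
        return {}


mirror = Mirror()
mirror.write("thesis.doc", "draft 7")
saved = mirror.backup()

mirror.fail_disk(0)
print(mirror.read("thesis.doc"))  # draft 7 (the mirror covers hardware failure)

mirror.delete("thesis.doc")       # the horrible accident
print(mirror.read("thesis.doc"))  # None (the delete was mirrored too)
print(saved["thesis.doc"])        # draft 7 (only the backup still has it)
```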

Thoughts?  Arguments?