December 11th – National Backup Awareness Day

Something horrible just happened to Jeff Atwood aka CodingHorror.

“ugh, server failure at CrystalTech. And apparently their normal backup process silently fails at backing up VM images.”

“I had backups, mind you, but they were on the virtual machine itself :(”

It’s at times like these that we start wishing for a time machine, a cosmic undo button or reversible computing.

Jeff’s blog was read by tens of thousands of programmers and system administrators for many years. It contains information that is very valuable to these people, and represents an unthinkable number of hours spent by Jeff. An agency rate for somebody like Jeff is between $250 and $500 an hour, but this is like appraising a priceless family heirloom.

I am not going to go through the motions of telling everybody how to back things up, how important offsite backups are, how disk drives are fragile, how I don’t trust virtual servers, how RAID is not a backup strategy, how version control is not a backup strategy, etc, etc. JWZ wrote a good article about backups.

Here is what I do want to say. First, none of us are backed up sufficiently, and we have likely already lost data that we would want back.

I can’t find my grandmother’s recipe book (I still hope it’s only lost), my wife’s first email to me, my first web page through which she found me, my first job search web page that had a picture of the Twin Towers and said how I wanted to work there, my early school grading papers, a rare book about fishing in the Black Sea, a stamp from the Orange Republic that used to be in my father’s stamp album, the password to my very short-numbered ICQ account. A lot of stuff.

All of our digital information is susceptible to an electromagnetic pulse, fire, flood. Spinning platter hard drives are particularly bad – they have very short lifespans measured in low single digit years. CDs are even worse – the aluminum inside them rots (I have a CD with a lot of Outlook emails that reads as a blank filled with 1s).

So the first thing that I would like to mention is that if you never simulate a failure, you’ll never know if your stuff can be replaced. It’s not an easy thing to practice, though – restores and failovers are tricky to do.
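A restore drill does not have to be elaborate. Here is a minimal sketch in Python, assuming you have restored a backup into a scratch directory and still have the original files to compare against. The names `file_digest` and `verify_restore` are made up for this illustration, and comparing SHA-256 checksums is just one reasonable way to do the check.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(original: Path, restored: Path) -> list[str]:
    """Compare a restored directory tree against the original.

    Returns a list of problems: files missing from the restore and
    files whose contents differ. An empty list means everything matched.
    """
    problems = []
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        dst = restored / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"differs: {rel}")
    return problems
```

Something like `verify_restore(Path("/data"), Path("/tmp/restore-test"))` (both paths are placeholders) would then list anything the backup silently dropped or mangled. It only catches problems if you actually run the restore first, which is the whole point.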

A few jobs ago we were getting a fancy new load balancer set up. It was up and running, and supposedly we had failover: if one of the servers died, we would not even need to do anything, because the backup servers would pick up the slack. I suggested that we test it by pulling the network plug on one of the machines during off hours. My boss would not allow that, saying that we could possibly break things. My argument was that it would be better for something to break while we were ready for it than during an actual failure. When the actual failure did occur, the load balancer did not switch, and we had an outage that was a good deal longer than it needed to be (it happened at night).

Load balancers are not backup solutions, but this story highlights an irrational streak in system administration: nobody wants to practice failure. It’s just too nerve-wracking, and a lot of hard work. It’s much easier to assume that somebody up the line did everything correctly: set up and tested backups, startup scripts, firewalls and load balancers. Setting up and validating backups and testing security are thankless jobs.

This brings me to another point. The act of taking a backup is not risk free in itself. The biggest data losses that I have suffered happened in the process of setting up backups. As an example I’ll bring up the legendary story about Steve Wozniak (whom I met yesterday):

The Woz was creating a floppy driver under extreme time pressure, not sleeping much and feeling sick. The end result was a piece of software of unimaginable beauty: it bypassed a good deal of clunky hardware, and thanks to a special timing algorithm, was fast and quiet. While other disk drives sounded like a machine gun (I dealt with a few of those when I was young), Woz’s purred like a kitten. Finally he wrote the final copy onto a floppy, and decided to make a backup of it. Being dead tired, he confused the source and destination drives, and copied an empty floppy onto the one with the precious driver. Afterward he proceeded to burnish his place at the top of engineering Olympus by rewriting the thing from memory in an evening.

It’s really the easiest thing in the world to confuse the source and destination of a backup, destroying the original in the act of backup! The moral of the story?

Do as much backing up as possible, while being careful not to destroy your precious data in the process. Have an offsite backup. Print out your blog on paper if it’s any good. In fact, print out as much stuff as you can. Your backup strategy should be like a squirrel’s: bury stuff in as many places as possible (well, except sensitive information, which is a whole other story in itself).
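One cheap guard against the Woz-style mixup is to make the copy step refuse anything that looks destructive. The following is only a sketch under my own assumptions: `safe_backup` and `BackupRefused` are hypothetical names, and the rules (never overwrite an existing destination, never copy from an empty source) are one possible policy, not a complete backup tool.

```python
import shutil
from pathlib import Path


class BackupRefused(Exception):
    """Raised when a copy looks like it could destroy data."""


def safe_backup(source: Path, destination: Path) -> Path:
    """Copy source to destination, refusing to overwrite anything.

    Refuses when the source is missing or empty (a likely sign that
    source and destination were swapped) and when the destination
    already exists.
    """
    if not source.is_file():
        raise BackupRefused(f"source {source} is not a file")
    if source.stat().st_size == 0:
        raise BackupRefused(f"source {source} is empty; swapped arguments?")
    if destination.exists():
        raise BackupRefused(f"destination {destination} already exists")
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copies contents and metadata
    return destination
```

With these rules, copying a blank file over the precious one fails twice over: the source is empty and the destination already exists. The price is that every backup needs a fresh destination name, which fits the squirrel strategy anyway.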

What Up

The following will probably be only interesting to people who build their own computers (and probably not even them), so feel free to skip this post.

My little Shuttle XPC computer gave up the ghost (the onboard SATA raid controller got really messed up). It took me a good while to frankenstein together a reasonable machine out of all the parts that were stashed away in my apartment, so I am a-computin’ again. I am researching a purchase of a high powered replacement, but meanwhile, let me share some technical tidbits that I’ve learned along the way.

First of all, it’s really easy to actually fry a floppy drive. Fried floppy drives look like they are working, but they don’t. And without a floppy drive you can’t install (or repair) Windows 2000 or XP on a SATA or IDE raid array. Even if your motherboard claims that it can boot from a USB floppy drive, it probably can’t. Well, at least mine can’t. The moral of the story is that doing away with legacy hardware such as a floppy is not a good idea.

Anyhoo, it’s a good idea to have two separate drive arrays: one for data and one for the system and programs (it’s a good practice to point data directories such as the desktop to the data array). The data array should run raid 1 (mirroring) – that way if one drive dies, you will still have another. You can periodically back up onto a third drive and hold it in a remote location. With 250 gig SATA drives costing about 100 bucks there is no reason not to do this.

The system drive should run raid 0 (striping). Striping actually significantly speeds the system up. You can also keep the system drive small, say about 40 gig – and back it up onto the data drive via Norton Ghost. It might be a good idea to splurge on real SCSI drives and a card. In fact, that’s what I’ll probably do for my new computer, as drives seem to be more of a speed bottleneck than RAM or processors.
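To put numbers on the two arrays, here is a small Python sketch of the capacity and fault-tolerance arithmetic for raid 0 and raid 1. The function name `raid_summary` is made up for this example, and it assumes all drives in an array are the same size.

```python
def raid_summary(level: int, drives: int, size_gb: float) -> dict:
    """Usable capacity and tolerated drive failures for RAID 0 or RAID 1.

    Assumes `drives` identical drives of `size_gb` gigabytes each.
    """
    if drives < 2:
        raise ValueError("an array needs at least two drives")
    if level == 0:
        # Striping: capacity adds up, but losing any one drive loses the array.
        return {"usable_gb": drives * size_gb, "drive_failures_survived": 0}
    if level == 1:
        # Mirroring: every drive holds a full copy of the same data.
        return {"usable_gb": size_gb, "drive_failures_survived": drives - 1}
    raise ValueError("only RAID 0 and RAID 1 are covered in this sketch")


print(raid_summary(1, 2, 250))  # data array: two mirrored 250 gig drives
print(raid_summary(0, 2, 20))   # system array: two striped 20 gig drives
```

The mirrored pair gives 250 gig of space and survives one dead drive. The striped pair gives 40 gig and survives none, which is why the system drive still needs to be backed up onto the data drive.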

SATA drives can be mounted externally: it’s called eSATA. If you have a small computer such as an XPC it’s a very good idea, as the drives become much easier to cool. I am probably going to jerry rig an eSATA enclosure out of an old computer case and some hotswap thingies, but they are also available from these guys.

I am a big fan of dual monitors – I have two 17″ LCDs. To run a dual monitor setup you need either two video cards or a “dualhead” video card. Well, I have a dualhead Matrox P750 (bought it because it has 2 DVI outputs), and boy does it suck. Driver installation is a nightmare – there are several versions of the video card bios and a multitude of driver versions. Most don’t work and crash Windows. If you do get it to work, the stupid card can’t work well with low color/resolution settings: it shows lines and crazy patterns in bios screens. There’s a bios fix out, but it does not work. More than that, if you want to color match the monitors through a calibration cajigger, you can’t set up individual color profiles on the monitors. In short, I would be much better off with two separate cards.

Ad:


Did you know that you can color calibrate your monitor with a nifty usb powered gadget so that your digital photos will stop looking like crap? I’ve used one for a while, and it rocks!

GRETAGMACBETH Eye-One Display 2 (This is the one that I have. People say that it’s a little bit better than the cheaper ColorVision Spyder 2)

ColorVision Spyder 2