It's a question that most people don't ask unless they're at work, but it's been growing more and more important as everything we do and buy has gone digital. We purchase and download music, books, photos, and much more to our computing devices. We have gigabytes of photos which may only exist in binary format. We have our entire personal finances stored on our computer, with the convenient backups in a secondary directory on the same machine. Should something happen to our data, it could cost us a good deal to recover it.
So how much would you be willing to pay someone to extract your precious data from a dead hard drive? $50? $100? $500?
Depending on how much you need to get back, you could be asked to pay quite a bit more.
The reason I ask is not because I plan on recovering data as a side job, but because last week my primary hard drive failed. On that drive I keep thousands of personal photos, financial files, customer source code, government forms, historical documents, 10 years' worth of email (not the spam, though) ... the list goes on. This hard drive also had on it the applications that I use on a daily basis to write software, work with databases and everything else I expect my computer to do on a regular basis. When my drive seized, there was no warning ... nothing out of the ordinary. I use this computer for more than half a day, every day ... almost all 365 days in the year. The only time my computers don't receive this kind of attention is when I'm with Reiko.
So why am I just writing something about this tragic event now, when currently half my income is derived from the work I do on these very machines? Well ... probably because it wasn't all that tragic.
I keep an active spare of my notebook's hard drive. Should something happen to the drive on this machine, I can swap it out with an almost exact duplicate in the space of five minutes. The main difference is that my spare runs slower than the primary, and is only half the capacity. The same operating system is installed, and the same applications are installed. That's what I did last week.
As for all the irreplaceable files ... well, I have so many levels of backups that I seldom lose more than 6 hours of work. Even if my entire computer system were stolen (notebook, servers, external hard drives, stacks of CDs and DVDs), I wouldn't be out more than a day's work. I am a backup freak.
My system does incremental backups every 2 hours. When connected to the network, these backups are stored on my storage server. When on the road, the backups are held in queue until I'm either online to transfer the incremental file (rarely ever more than 10 meg, considering the rate at which it's done), or until I get home to do a mass upload. The 1.2 terabytes (stored RAID5) on my NAS is completely backed up once a month, with incrementals done daily and burned to DVD. My important data is further protected by being encrypted and stored on a server in Finland.
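For anyone curious what "incremental" means in practice, here is a minimal sketch in Python. It is not the tool I actually run; the function name `incremental_backup` and its parameters are made up for illustration, and it assumes that a file's modification time is a good enough signal that it has changed since the last run.

```python
import os
import shutil


def incremental_backup(source_dir, backup_dir, last_run):
    """Copy only files modified after `last_run` (seconds since the epoch).

    Hypothetical helper for illustration. Returns the sorted list of
    relative paths that were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            # Skip anything that has not changed since the previous backup.
            if os.path.getmtime(src) <= last_run:
                continue
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(backup_dir, rel)
            # Recreate the folder structure inside the backup location.
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            # copy2 keeps timestamps along with the file contents.
            shutil.copy2(src, dst)
            copied.append(rel)
    return sorted(copied)
```

Run every two hours with `last_run` set to the time of the previous run, something like this only ever moves the handful of files touched in between, which is why each incremental stays small.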
Unfortunately, I can't realistically keep everything on my server backed up in Finland ... but there's really only about 16 GB worth of files that are truly irreplaceable to me. The mp3s and other files can be completely lost, and it would only be a matter of a year or so to recover 80% of them from some online source.
But this does make me wonder about how realistic the hard drive's MTBF (Mean Time Between Failures) value is. The hard drive that died last week was a Seagate. I've had three Western Digitals and another Seagate die on me since 2000. Going by the MTBF rating on the drive, I could have expected roughly 1,000,000 hours of operation. Not bad considering there are only 8,760 hours in a year. So why did it seize after about 15,000?
Naturally things break down, but I've had six drives die on me in seven years. In seven years I have owned 28 hard drives ... that's a failure rate of more than 20%. Where is this 114-year lifespan, and where did these manufacturers come up with these numbers? How could they have possibly tested a drive to know that it could last 1,000,000 hours when the hard drive was only invented (approximately) 450,000 hours ago? (Anyone remember IBM's massive 350 disk storage unit used by RAMAC?)
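For anyone who wants to check the arithmetic, here is a quick sketch in Python. The function names are made up for illustration. The annualized failure rate uses the usual constant-failure-rate reading of MTBF, which is an assumption on my part and not something taken from a datasheet, and the 1956 date for IBM's 350 is my own starting point for the "hours since invention" figure.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760


def mtbf_to_years(mtbf_hours):
    """Naive reading of MTBF: how many years a single drive 'should' run."""
    return mtbf_hours / HOURS_PER_YEAR


def annualized_failure_rate(mtbf_hours):
    """Fraction of a large drive population expected to fail in one year.

    Assumes a constant failure rate (exponential lifetimes), which is an
    illustrative simplification.
    """
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


rated_mtbf = 1_000_000                      # hours, as quoted for the drive
observed_fraction = 6 / 28                  # dead drives / drives owned
hours_since_1956 = (2007 - 1956) * HOURS_PER_YEAR

print(f"Naive lifespan: {mtbf_to_years(rated_mtbf):.0f} years")
print(f"Implied yearly failure rate: {annualized_failure_rate(rated_mtbf):.2%}")
print(f"Share of my drives that died: {observed_fraction:.1%}")
print(f"Hours since the IBM 350: {hours_since_1956:,}")
```

Read this way, a 1,000,000-hour MTBF is less a promise that one drive lasts a century and more a claim that under 1% of a big batch should fail in any given year. My own tally is nowhere near that either.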
Apparently, I'm not the only one asking these questions. Three engineers from Google (I really need to find things to discuss that don't involve that company ... I swear it was coincidence, this time) recently presented some findings at FAST '07 to show some failure trending in their environment. This was done through the use of SMART (self-monitoring, analysis and reporting technology) on the hard drives and collected over a period of several months.
What I find really interesting is that some of the most commonly held beliefs regarding hard drive usage didn't matter when looking at the big picture. Heat wasn't much of a factor unless the temperature was below 20 degrees C, usage wasn't much of a factor, and manufacturing defects only made up a small portion of the issues. So why would a large company like Google have so many issues, considering they're working now with enterprise-grade equipment that costs quite a bit more than the standard ATA and SATA drives commonly used by consumers and small businesses?
Well, I guess this is one of the reasons the people at FAST have been all up in arms requesting that manufacturers review their MTBF rates. I agree that manufacturers shouldn't over-inflate their hardware's operational hours, and perhaps some other measurement of reliability needs to be introduced. Warranty doesn't really offer much unless data recovery is part of it, or you have lots of fail-safes in place.
To anyone who doesn't have a backup plan, I would urge you to make a copy of your "My Documents" folder at the very least. Put it on DVD and toss it in the back of the closet. Do this once a month. If the time ever comes for your equipment to fail, you don't want to shell out the $50 / gig that many professional data recovery places ask.