Hard drive failure

Posted by Pierre Igot in: Macintosh
June 6th, 2005 • 12:05 am

Last week I wrote about problems I was experiencing with Tiger, which led me to do a complete reinstall of the system.

This appeared to fix the problem at the time. Unfortunately, over the course of the following seven days, things gradually deteriorated again. The symptoms were quite simply that more and more things in Tiger were taking more and more time. As far as I could tell, the problem was not increased CPU usage. MenuMeters kept saying that my CPU levels were fine. Yet things were getting slower and slower.

Eventually, the entire system froze with the spinning pizza of death, and I ended up having to do a hard reset. Unfortunately, the same thing that had happened last weekend happened again: After the hard reset, the computer took forever to start up again. I waited for about ten minutes, and then decided that it was probably time for more drastic action again.

I tried booting from the new DiskWarrior 3.0.3 CD that I had just made using AlSoft’s updater last week. I was able to get to the DiskWarrior splash screen, but then things froze there as well.

I then tried booting from the original Mac OS X 10.4 DVD. The computer booted just fine, and I got to the first screen. I tried launching Disk Utility from there, and the application launched, but then my hard drives never did appear in the main Disk Utility window. The spinning wheel just kept going round and round. I had to give up on that as well.

The next step was to try and boot using the old Panther system that I still had on another volume (on another hard drive) in the machine. The system started up fine and got up to about two thirds of the progress bar — but then it got stuck at the “Waiting for local disks” stage.

In all these tests, when the computer would get stuck, I would hear some kind of on-going hard disk activity inside the machine, but it wasn’t the same noise as usual. It was more of a shuffling noise than the usual scratching noise.

It was time to dismantle my machine. Given the past history of this G4, I decided that it had to be either a problem with a hard drive or with the ATA bus that the hard drive was connected to.

So I took the 160 GB Seagate hard drive (purchased in May 2004) that had my defective Tiger partition on it and was on the ATA bus for the vertically mounted hard drive bay, and I hooked it up to the other ATA bus, for the horizontally mounted hard drive bay, where I already had the other 120 GB Seagate hard drive with the Panther partition on it, which still seemed to be working fine. I tried booting again with the Panther volume, and didn’t get farther than the “Waiting for local disks” stage either.

The last step was to disconnect the 160 GB Seagate entirely. I did that, and the computer booted from the Panther partition just fine. So clearly the problem was with the 160 GB hard drive itself, and not with the ATA bus. And the problem was such that, as soon as the 160 GB hard drive was connected to either bus, the computer was unable to boot from anything other than the Tiger DVD — and even then, I was unable to use Disk Utility on the Tiger DVD to try and repair the defective hard drive and its three partitions.

I decided to try one more thing with the 160 GB connected inside the machine: I tried booting in Mac OS 9. The 120 GB hard drive with the Panther partition also had Mac OS 9 on that same partition, and my G4 is old enough that it still supports booting in Mac OS 9 directly (as opposed to using it as Classic in Mac OS X). The reason I wanted to try this was that Panther was booting fine but getting stuck at the “Waiting for local disks” stage. I thought that maybe Mac OS 9 would not have that particular hurdle to overcome and I might be able to complete the booting in Mac OS 9 and actually see the partitions of the defective hard drive, without it causing my machine to freeze or become atrociously slow.

I was indeed able to boot in Mac OS 9. But then the partitions of the defective hard drive never did appear on the desktop. I tried launching DiskWarrior for Mac OS 9 (which I also had on the Panther partition), and it was able to see the unmounted partitions. I then tried to get DiskWarrior to fix them — but it too got stuck when it started reading from the partitions, with the pizza of death spinning endlessly.

There was simply nothing I could do to try and revive these partitions on the defective 160 GB hard drive. But at the same time the partitions were still accessible to a degree. They were not completely gone. So I wasn’t quite ready to give up on the hard drive.

Instead of trying to use it as an internal hard drive, I decided to try and put the defective hard drive in an external FireWire enclosure. I have this Kanguru QuickSilver external hard drive that I got a year and a half ago, which is really just a standard external FireWire/USB 2.0 enclosure with an IDE hard disk in it. I broke the warranty seals (the drive was more than one year old anyway) and removed the hard disk it contained, and put my defective Seagate 160 GB hard drive in instead.

At the same time, I also installed back in the G4 the original OEM 120 GB IBM hard drive that had come with the dual G4 (MDD) and that I had replaced with the two Seagate hard drives, back when I thought that the IBM drive itself was defective because of intermittent freezes that I had been getting. Further investigation had revealed that the freezes were probably not due to the hard drive itself, but rather to the ATA bus — so I had kept the hard drive just in case. Now it would come in handy. Clearly the 160 GB Seagate hard drive was defective and I was going to have to get it replaced — but I didn’t want to install Tiger on the other Seagate hard drive, which still had the Panther installation and lots of other important stuff on it. So I decided that I would reinstall the IBM drive as a replacement for the defective 160 GB Seagate hard drive, and install Tiger on that IBM drive. (The one negative point is that the IBM drive is significantly noisier than the Seagate drives, with a very high-pitched whine that really gets on my nerves. I am going to have to get it replaced quickly!)

Once Tiger was installed on the IBM drive and the defective Seagate was installed in the FireWire enclosure, I booted in Tiger. I then turned the FireWire enclosure on, waited for a bit (still hearing that constant shuffling noise coming from the defective Seagate), and then connected it to the computer.

It took a while, but the three partitions on the defective Seagate did finally appear on the desktop! Clearly the benefit of having the defective hard drive in an external FireWire enclosure rather than connecting it internally to one of the ATA buses was that the rest of the computer was still usable.

As far as I could tell, the partitions were not completely dead and inaccessible. But everything was excruciatingly slow. Opening Finder windows to display the contents of the partitions would take forever, but it would work eventually.

As a last resort, I tried to boot from an old Norton Utilities 8 CD that I had and see if I could repair the partitions as external FireWire volumes. I figured that I had nothing to lose, even if the Norton software was outdated. Unfortunately, here again, things were excruciatingly slow, to the point that, well, there was no point in trying to repair anything with Norton Utilities.

Now, you might wonder why I was so desperate to try and repair this clearly defective hard drive. Yes, I do have backups of all my really important stuff. On that defective hard drive, I normally have three partitions. One is my Tiger partition, which only contains Mac OS X 10.4 itself along with all the Apple and third-party software that cannot be run from another partition and has to be installed on the startup volume. Most of my third-party applications are still on a partition on the other 120 GB Seagate drive, so the applications on the defective Tiger partition were only the iLife and iWork stuff and a few other things that refuse to run from (or cannot easily be updated on) a separate partition. Nothing that could not be easily reinstalled on the IBM drive from DVD or CD.

This Tiger partition also contains my home folder, i.e. my customized user environment. Here again, however, most of the really important stuff — such as my Mail folder, my Address Book data, etc. — is actually on another partition on the 120 GB hard drive, and I only have symbolic links to that stuff in my home folder. But there are a few things that are actually in that home folder and are important customizations. Fortunately, I did a backup of the entire home folder last Friday — so I have a very recent copy of all the stuff.
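For readers unfamiliar with the arrangement just described (the real data lives on another volume, and the home folder only holds a symbolic link to it), here is a minimal Python sketch. It is an illustration only, not how the links on this machine were actually created; the function name `link_into_home` and any paths passed to it are hypothetical.

```python
import os
from pathlib import Path


def link_into_home(real_dir, link_path):
    """Create a symbolic link at link_path that points to real_dir.

    Hypothetical helper: real_dir is the folder that actually holds the
    data (for example on another volume), link_path is where the home
    folder should appear to contain it.
    """
    real_dir, link_path = Path(real_dir), Path(link_path)
    if not real_dir.is_dir():
        raise NotADirectoryError(f"{real_dir} is not an existing folder")
    # Refuse to overwrite anything already sitting at the link location.
    if link_path.is_symlink() or link_path.exists():
        raise FileExistsError(f"{link_path} already exists")
    os.symlink(real_dir, link_path, target_is_directory=True)
    return link_path
```

With a layout like this, losing the startup volume only loses the link itself; the folder it pointed to stays wherever it really lives.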

The problem is with the other two partitions on the defective Seagate hard drive. They do not contain anything “essential”, but they do contain a lot of non-essential stuff that I don’t absolutely need, but would like to rescue just the same. For example, they contain thousands of MP3 files that I have downloaded over the past few months from various MP3 blogs that I have been reading. It’s not essential stuff, but these files can no longer be downloaded from the blogs (they appear for a limited time only), and some of them I was really interested in.

I also have some other audio and video stuff on these partitions that I can actually recover from CDs and DVDs or recapture using Audio Hijack Pro. But it would be a pain to have to redo all the capturing and extracting. It’s the kind of “non-essential” stuff that accumulates over time and that you don’t back up because it’s so bulky and would take many DVD-Rs or CD-Rs.

At that stage, I thought that, since the partitions on the defective hard drive were not completely inaccessible, there might still be a chance that I might be able to recover some of the stuff before sending the defective hard drive for repair or replacement. I couldn’t afford to have my Finder locked up for extended periods of time while it was trying to copy stuff from these defective partitions — but I thought that, if I could use a separate piece of software to try and copy that stuff, slowly but surely, in the background without interfering with the rest of my work, I might have some kind of solution here.

I tried a couple of software FTP applications that allow you to browse your local volumes as well as external FTP servers. However, I found that both of them — Interarchy and Transmit — did actually rely on the Finder to access the local volumes, and trying to use them to access the partitions would lock up the Finder for extended periods of time just the same.

Then I thought that I might try one of these “Finder alternatives” that are available for Mac OS X and bypass the Finder entirely, offering their own access to your volumes instead. So I fired up the old copy of RBrowser Lite that I still had on my Panther partition, and tried to open one of the defective partitions in an RBrowser window. And it worked! It took a long time, of course, but it worked — and, most importantly, it worked without interfering with the rest of my computing environment, including the Finder itself.

And so I started trying to copy files from the defective partitions to a safe place on the IBM hard drive. I selected a bunch of files, and dragged them onto another RBrowser window showing the contents of my IBM volume. It took a long time, but the files did get copied eventually! And I tested the recovered files on the IBM drive, and they seemed to be working fine!

I did another bunch of files, and again, it took a long time, but it worked. I kept an eye on the disk activity in Activity Monitor, and could tell that things were sometimes painstakingly slow, with only a few kilobytes getting read (and then written) at a time. But sometimes there would be bursts of disk activity and a few megabytes would get copied in a second or two.

Overall, it was still much better than having nothing at all, and I decided that I could be patient and try to recover the files over the next few days, if necessary.

So here I am now, selecting three or four hundred files at a time and trying to copy them from my defective hard drive to a safe place. It’s taking a long time, but hey, it’s working — more or less. Unfortunately, I also get error messages in RBrowser from time to time, telling me that the copying process has failed. (No surprise here!) In those cases, I have to reselect the files in question and try again, and then it usually works. Sometimes I get a bunch of error messages in succession, and am only able to copy files one by one. But then other times RBrowser is able to copy dozens of them in “quick” succession without any user intervention — as long as you give it the time to do so.

I have about 5,000 files that I am trying to recover. So far I’ve already rescued about 1,500 of them. It’s painful, but it’s not so bad. I can still use my computer for other stuff at the same time, and it only requires my intervention from time to time, when I get an error message, or when I need to select a new bunch of files. And the files seem to be OK after that!
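The routine described above (copy a batch, then go back and retry whatever failed) could in principle be scripted rather than done by hand. The following is a minimal Python sketch of that idea, not what was used here, where the copying was done manually in RBrowser. The function name `rescue_files` and the `max_attempts` and `pause_seconds` parameters are hypothetical, and how a script like this would behave on a drive in this particular state is an open question.

```python
import shutil
import time
from pathlib import Path


def rescue_files(src_dir, dst_dir, max_attempts=3, pause_seconds=0.0):
    """Copy every file under src_dir into dst_dir, one file at a time.

    A failed copy is retried up to max_attempts times. Returns a pair
    (copied, failed) of lists of paths relative to src_dir.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    copied, failed = [], []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(src_dir)
        dst = dst_dir / rel
        for _ in range(max_attempts):
            try:
                # Skip files already rescued on an earlier run. A matching
                # size is only a rough check, not proof of identical content.
                already_done = (
                    dst.exists() and dst.stat().st_size == src.stat().st_size
                )
                if not already_done:
                    dst.parent.mkdir(parents=True, exist_ok=True)
                    shutil.copy2(src, dst)
                copied.append(rel)
                break
            except OSError:
                # Read errors on a failing drive may be intermittent,
                # so wait a little and try the same file again.
                time.sleep(pause_seconds)
        else:
            # Every attempt failed; record the file for a later pass.
            failed.append(rel)
    return copied, failed
```

Because files that already exist at the destination with the same size are skipped, the function can simply be called again later on the same folders, which mirrors the "reselect and try again" step; the `failed` list shows which files still need attention.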

It’s definitely a weird experience. In my work as a Mac tech support person, I am somewhat familiar with defective hard drives. But usually they go completely dead, and there’s nothing that can be done to recover the files. In this case, it doesn’t look like it’s the media that’s defective. It is as if there is some really bad bottleneck somewhere that causes file transfers to slow down to a crawl.

It might be a defect that’s repairable — but I wouldn’t know. I am not a hardware guy. And when I contact the vendor to try and get a replacement, I suspect that they won’t want to hear about trying to recover my data for me. It just wouldn’t be worth their time and money. If my drive is still under warranty (as I hope it is), they’ll probably just replace it with a new one.

So this is probably the only alternative I have if I want to recover these files. I wouldn’t want to spend hundreds of dollars on these files — they are not worth that much to me — but if I can rescue them like this in the background with minimal intervention, I might as well do it.

Of course, I have to hope that the defective hard drive will remain in this rather precarious state, and not die on me completely before I am able to rescue all the files I want to rescue. At this stage, it’s hard to predict.

I also cannot help but wonder if the fact that I used this partition for Tiger has anything to do with this at all. It could just be a coincidence. It could also be that the higher level of hard drive activity required by Tiger (and particularly by features such as Spotlight) was a precipitating factor. I probably will never know. But I definitely feel safer having all my important documents on a different hard drive — and not just on a different partition on the same hard drive.

Could I have avoided some of this trouble by backing up all my stuff more religiously? Of course I could have. But it’s simply not realistic to expect users to back up all their stuff all the time. Some files are more important than others, and I am quite sure that most users who do back up their stuff regularly are like me and only back up the really important stuff very regularly. Still, I have been a bit lazy lately, and am made to pay for it now — although I am really rather fortunate that the hard drive hasn’t simply gone kaput on me, and is still accessible to a degree. I guess I will have to organize myself a bit better in the future, and really make sure that I have recent backups even of the non-essential stuff — especially stuff that cannot be recovered any other way.

Still, what a way to spend your Sunday! I really hope that some day hard disks will disappear the way that diskettes have and be replaced by something much more reliable. It simply is not realistic to expect users to do backups of all their stuff all the time.

2 Responses to “Hard drive failure”

  1. Olivier says:

    Two things:

    I recently learned that a hard drive that has been used in vertical position will have trouble working in horizontal position. I think this is because the movement of the heads is not controlled by a closed-loop system and therefore a change in orientation will cause the heads to position themselves differently relative to the cylinders. So if you change the orientation of a drive, reformat it first. All of this to say that if you’re still trying to recover data, make sure the drive has the same orientation as when it was working.

    The other thing is that last time a drive almost died in my hands (it kept spinning up, down, up, down), I took it home to try and recover the not so important stuff. So I put it in the top-case of my motorbike and drove home, fifteen minutes in cold weather (0 °C). When I got home and plugged it in a FireWire enclosure, the disk worked just fine and it still does so (although I don’t use it anymore, I had it working for a month after the alert). So if it’s a mechanical problem, putting the drive in the refrigerator for a couple of minutes could be worth it.

  2. Pierre Igot says:

    Thanks for the suggestions. I am not sure the vertical/horizontal thing will make any difference, but I will try if nothing else works… Actually, right now I am getting a sustained 10 MB transfer rate, so I am able to recover most of my stuff pretty quickly. Very strange.

    I wouldn’t rule out temperature as a factor either. It certainly seems to me that such incidents tend to increase in the summer months, as if something that was “dormant” in the cooler temperatures is eventually exposed by the warm weather. Nothing scientific here, just an impression. But these drives sure do get hot in there.
