July 2nd, 2007 • 12:13 pm
Computing can be a very cruel experience.
Here I was bragging about how stable my computing environment with Mac OS X 10.4.9 was, with 23 consecutive days of uptime and no sign of degrading performance…
At the time, I noted that I really had little incentive to install the 10.4.10 update and the security update that had come out a week earlier, simply because they would force me to restart my machine and rebuild my current work environment.
Well, yesterday, after a few more days of uninterrupted computing bliss, I decided that I should install those updates after all. I had been monitoring the usual Mac news web sites and, apart from a strange audio popping problem on some laptops, I hadn’t noticed any significant issues mentioned in the forums. So I went ahead with the update.
And that’s when Mac OS X decided to punish me.
The update process itself went smoothly. I applied the combo update, which I usually do even when the current system is just one point release behind (10.4.9 in this case), simply because people usually report fewer problems with the combo updates. I also applied the security update at the same time, so that I wouldn’t have to do two restarts in a row.
As expected, the Mac Pro went through a double-reboot process, which has become a fairly common occurrence in recent times with Mac OS X 10.4 updates. (I do wish that Apple would explain why this is necessary and warn the user.)
After that, things appeared to be back to normal. I then tried to install the iTunes 7.3 update, and ran into a weird problem with the update package, which may or may not have been related to the system update.
I went to BBEdit to write a post about that, and at some point wanted to look up a word in the dictionary. So I right-clicked on the word in my BBEdit document and—bam! The application crashed. I relaunched BBEdit, tried again, and—I got a crash again. I tried with a blank document, with a new word. Same thing. Other aspects of BBEdit appeared to be working fine, but the contextual menus were suddenly unavailable, and trying to invoke them would cause a crash.
I tried removing the BBEdit prefs file and relaunching the application. It didn’t fix the problem.
Then for some reason I tried to launch Pages. Instead of the usual blank document, I got an alert saying, “Pages cannot open the ‘Blank.template’ template.” No explanation, no suggestion for troubleshooting, just a single “OK” button. Nice.
After clicking on “OK,” the Pages application stayed open, but trying to open any other Pages file—either document or template—would result in the same error message, with the same “OK” button and no alternative.
In other words, Pages was suddenly completely unusable.
Again, I tried trashing the prefs file, to no avail.
At that point, I was starting to suspect that something had gone very wrong with the system update. I tested a few other applications. Mail seemed to be working fine. So did iPhoto. Safari too appeared to be working. But when I tried to use Camino for web browsing, I noticed that the browser would systematically fail to complete its page downloads. There would be a flurry of downloading activity when first entering a web address or clicking on a link, but then the downloading activity would trickle down to 0 kbps and the page would never finish loading. I also noticed that Camino was crashing every time I tried to quit it.
Not good. I did a bit of research in the Apple Discussions forums (with Safari), but couldn’t find anything of note.
I finally tried reinstalling the Pages application from the original disc, and that didn’t fix the problem either. At that point, I was pretty sure that the system update had somehow hosed my system (at least in part), and decided that more drastic measures were required.
I rebooted from my DiskWarrior 4 CD (which takes an eternity, even on a Mac Pro), and repaired the startup volume’s directory. There were a few errors reported, but nothing major. I also repaired permissions on the startup volume. The repair process found a number of bad permissions, but as usual, it was hard to tell whether these had anything to do with the problem that I was experiencing. (There were no permission problems with the Pages application files, for example.) I checked the S.M.A.R.T. status of my hard drives, and everything appeared to be fine.
I rebooted from my startup volume, and—the problems in Pages and BBEdit were still there. Argh.
At that point, the obvious conclusion was that some system files had been damaged during the update process: files not crucial enough to bring the entire system down, but still vital for Pages, for BBEdit’s contextual menus (for some reason), and possibly for Camino’s downloading process. (A strange combination!)
The next step was unavoidable: I had to reinstall the entire system. But first I tried it on a separate hard drive partition that I had set aside a long time ago for future system testing. The partition was empty. I booted from my Mac Pro’s system disc (10.4.7) and installed a pared-down system (without all the bundled applications, except for iWork) on that partition.
I booted from that test partition and tried running Pages. Everything was fine. I tried opening existing Pages documents (on another partition) that I couldn’t open in my damaged 10.4.10 environment. They worked fine.
I then applied the 10.4.10 combo update to that test partition. Pages was still working fine after that. I applied the 2007-006 security update as well. Pages still worked. I finally updated Pages from 2.0.1 to 2.0.2, and it was still working after that.
This (lengthy) testing procedure was conclusive: In all likelihood, reinstalling Mac OS X 10.4 on my startup volume and reapplying the updates would solve the problem.
So I rebooted from the system disc and did an “Archive and Install” installation, with users and network settings preserved, and with a customized system that left out the bundled applications, since they were already on the partition and I was hoping they hadn’t been damaged. Everything went fine and I rebooted from the startup volume. Of course, a number of things (customizations and minor third-party hacks) were lost in the process (including my customized user icon, for some reason), but on the whole the system was working fine. It was pretty close to my normal work environment and, most important, Pages worked.
I then applied the 10.4.10 combo update. Worryingly, when the time came to reboot, I had a kernel panic. But I did a hard reset and things appeared to be working fine after that. I then applied the rest of the required updates, which went OK except for a couple of minor glitches (installer application hanging instead of showing the password dialog, some updates refusing to work the first time, etc.).
Pages still worked normally after all this, and so did the contextual menu in BBEdit.
All in all, it was a significant problem, but one which was not too painful to fix (although I still hate having wasted all that time on this). But the cruel irony, of course, is that it should have happened so soon after I had written a very positive post about Mac OS X’s stability and reliability.
It’s hard to know what happened exactly, and whether there is anything that I can do in the future to avoid a repeat occurrence. I did not repair permissions before applying the system update, but I have yet to find conclusive evidence that this does help. I was reasonably careful not to use the computer much while the update was in the “Optimizing” phase, after having read what the Unsanity folks had to say about it.
I guess it was probably just bad luck, or maybe the fact that I am using a few third-party hacks that are not fully supported by Mac OS X and were running at the time of the update. Who knows?
I did notice a number of unusual messages in the console while the problems were happening (although of course I do not have the required expertise to properly analyse those messages). Here, for example, is what the console said when I was trying to launch Pages:
2007-07-01 17:34:12.237 Pages *** Assertion failure in -[NSMenu itemAtIndex:], Menus.subproj/NSMenu.m:713
2007-07-01 17:34:12.271 Pages Failed to open document, but no unrecoverable error
Does it mean anything? I also see a number of things in the system log… In any case, I still have the logs, in case someone has any clue.
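For anyone who wants to sift through a saved console log for entries like the two Pages lines above, here is a minimal Python sketch. It assumes each line starts with a timestamp in the format shown, followed by the process name (optionally followed by a bracketed process ID, in case the log includes one) and then the message; process names containing spaces won’t match. The `entries_for` helper and `LogEntry` type are hypothetical names for illustration, not part of any Apple tool.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed line format, based on the Pages entries quoted above:
# "YYYY-MM-DD HH:MM:SS.mmm <Process>[optional pid] <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<process>[^\s\[]+)(?:\[\d+\])? (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    stamp: str
    process: str
    message: str


def entries_for(lines: Iterable[str], process: str) -> Iterator[LogEntry]:
    """Yield parsed entries logged by `process`; lines that don't fit the format are skipped."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("process") == process:
            yield LogEntry(m.group("stamp"), m.group("process"), m.group("message"))


if __name__ == "__main__":
    sample = [
        "2007-07-01 17:34:12.237 Pages *** Assertion failure in -[NSMenu itemAtIndex:], Menus.subproj/NSMenu.m:713",
        "2007-07-01 17:34:12.271 Pages Failed to open document, but no unrecoverable error",
    ]
    # Print only the assertion failures logged by Pages.
    for entry in entries_for(sample, "Pages"):
        if "Assertion failure" in entry.message:
            print(entry.stamp, entry.message)
```

In practice you would pass an open file (for example `open(path)`, with `path` pointing at the saved log) instead of the `sample` list, since `entries_for` accepts any iterable of lines.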