Xserve G4 running Panther Server 10.3.9: Didn’t like that security update… at all

Posted by Pierre Igot in: Macintosh
June 12th, 2006 • 10:10 am

Ouch.

Last Thursday, we had to install a new Sun server for our library management system, and took advantage of this opportunity to also move the Xserve G4 that hosts our web site and mail server to a different location in the basement.

Everything went well, and the Xserve was back up and running in no time.

On Friday morning, I went in to check a few other things, and, for some reason, decided that it would be a good idea to apply the latest system updates that Apple had released since the upgrade to Mac OS X Server 10.3.9.

I checked with Software Update and saw a couple of Java updates, which were not really relevant (we don’t use Java on that machine), a Safari update, a QuickTime update, and a security update. I decided to apply them all, as I didn’t think that the Java, QuickTime, and Safari updates would have any impact, and the security update had been out for a while and I hadn’t seen any reports about major problems with it.

So I launched the updates and let Mac OS X do its thing. As expected, at the end of it, it required a restart, so I restarted the machine.

And then—boom. The system appeared to launch properly, but then, after logging in automatically as administrator (which it is supposed to do, in order to be able to launch FileMaker Pro and open the databases that are accessible through the web site), the launch screen for FileMaker appeared, and then the menu bar disappeared as soon as it had appeared, and all I could see on the screen was the desktop picture.

I moved the mouse pointer to the bottom of the screen, and the Dock popped up as expected. No application was running, not even the Finder. I was able to launch Safari and load the web site in Safari, so obviously the Apache server was working. I checked with our users, and the mail server was working as well. But I was unable to launch the Finder, or FileMaker. I was able to launch System Preferences and use it. I seemed to be able to launch Terminal, but then after the window appeared, if I tried to type anything, the Terminal application would quit.

I tried restarting the Xserve, but it didn’t make any difference.

And then I noticed that there was all kinds of activity on the LED indicators for the two Ethernet connections on the front of the Xserve. And the activity didn’t subside even if I waited for a bit. What was particularly strange was that it looked like there was lots of activity on both Ethernet ports, even though the second Ethernet port on the Xserve is not even used and has nothing plugged into it!

The next step was to try and boot from the system CD. This worked just fine. I launched Disk Utility from the OS installer, and repaired the startup disk and the permissions on the volume. I tried restarting from the startup volume, but it still produced the same result.

At that stage, I decided I would try starting from the other hard drive that we have in the Xserve, which I use as a backup. I had mirrored the entire startup volume on the second hard drive on Thursday both before and after the move using Carbon Copy Cloner, so I had what I thought was an up-to-date bootable backup.

I selected the second hard drive as the startup disk and tried to restart. The Xserve started booting the OS on the second hard drive just fine. But then, near the end of the startup process, when the progress bar was almost completely full, the system got stuck, apparently at the stage where it is supposed to start Apple File Services. The startup window showed the message “Waiting for Apple File Services…” and stayed stuck at that stage for what seemed like forever.

Eventually, after many minutes, the system got past that stage, but then it took me to the login window, even though the system was supposed to log in automatically as admin. I tried to log in as admin, but all I got was a blank screen with the spinning beachball.

I tried waiting for a while longer, in case things would just take some time the first time around. But nothing ever happened. I tried a hard reset and booting again from that second hard drive, to no avail.

In other words, I had a startup disk that appeared to be toast, and a backup that didn’t seem to work. Great.

At that stage, my focus reverted to the startup disk, since I didn’t see what I could do with the backup created by Carbon Copy Cloner to get it to work. I tried repairing the hard drive directory, with DiskWarrior this time. The repair process got suspiciously stuck with the spinning beachball for a while about halfway through, but then resumed and completed without further difficulty. DiskWarrior only reported a couple of minor errors, which it corrected.

I tried repairing the second hard drive as well, for good measure. This time, the process didn’t get stuck, and the repair went just fine, with the same two minor errors reported and repaired by DiskWarrior.

Sadly, however, these repairs didn’t change anything. I was still unable to boot properly from either hard drive. The Carbon Copy Cloner backup appeared to be no good, and the startup disk had obviously been damaged by the seemingly innocuous system update that I had tried to apply.

I had external backups of most of the vital stuff on other machines, but not a complete bootable system that I could just restore from. I started from the system CD again and figured I would try an “archive and install” update. That’s when I discovered, however, that, with Mac OS X Server, you cannot do an “archive and install” update. The only option is to entirely erase the volume and start from scratch!

There are probably lots of good technical reasons for this, but it meant that I had no choice but to rebuild my entire system from scratch. And that took me pretty much the whole day, partly because the last time I had done this was more than 3 years ago…

Here are a few lessons that I learned during the process:

  • Even if you start the services that you need with Server Admin on the Xserve itself, the firewall still blocks most of them (if the firewall is one of the services that you use, of course). So you need to start the service and open the corresponding access in the firewall rules. This is different from the regular Mac OS X, where starting services in the “Sharing” control panel effectively opens the corresponding ports in the firewall at the same time.
  • The httpd.conf situation is rather confusing. (This is the file that contains the essential configuration for the web server, i.e. Apache.) There is one “master” httpd.conf file in /etc/httpd/, but that’s not the one you want to work on. The one you want is the site-specific file inside the /etc/httpd/sites/ subfolder that corresponds to your particular site. (Mac OS X Server can host multiple sites, but I am hosting only one.) This matters because, by default, Apache is configured to ignore .htaccess files, including any rewrite rules in them, and you need those rewrite rules to get user-friendly URLs for the dynamic pages generated by a blogging system such as WordPress. So you need to edit the appropriate file inside /etc/httpd/sites/. And you cannot do this with just any text editor in the graphical UI: you have to edit the file in Terminal with a text editor like pico, as superuser.
  • In order to install a blogging system like WordPress, you need an operational MySQL setup. The version of MySQL that comes with Mac OS X Server 10.3.9 is not the latest; in fact, it’s a pretty old version. I thought I’d try using the latest version, which is 5.0. This was a bad idea. Starting with version 4.1, MySQL uses a different password format, which is incompatible with older tools that try to access MySQL databases, including parts of Mac OS X 10.3.9. I learned this the hard way, i.e. by installing MySQL 5.0, discovering that things wouldn’t work, and scrambling to try and “uninstall” it. Eventually, I ended up installing MySQL 4.0.26 over 5.0, and this worked. (It automatically moved the existing MySQL 5.0 install to a backup folder.) Phew! That was probably the trickiest part of the whole ordeal. Even though MySQL comes in a Mac OS X-friendly .pkg package, it’s still a royal pain to install and configure.
  • Restoring an existing customized version of WordPress is relatively easy. My web site uses an older, heavily customized version of WordPress (1.2). I copied the entire web folder from my backup and ran the install script at /wp-admin/install.php. After that, I only had to restore the MySQL databases from my backup “dump,” and everything was back in order, except for the user-friendly URLs, which I had to restore by fiddling with the httpd.conf file mentioned above.
  • FileMaker is pretty dumb. By default, its “Web Companion” web server uses port 80, which of course conflicts with Apache, which uses the same port by default. So you have to first enable Web Companion, and then change the port it uses; traditionally, the port number used is 591. Then you have to create a firewall rule, by hand, that allows connections to that port from the outside, which is not exactly the most user-friendly thing either.
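On the httpd.conf point: whether Apache honors rewrite rules in a .htaccess file is governed by the AllowOverride directive for that directory. mod_rewrite rules in .htaccess only take effect when AllowOverride includes the FileInfo class (or All); with AllowOverride None, the .htaccess file is ignored entirely. (mod_rewrite also needs Options FollowSymLinks in that context, which this sketch does not check.) Here is a minimal Python sketch, not a tool that ships with Mac OS X Server, that lists the AllowOverride lines in a config file and flags whether each one would let .htaccess rewrite rules through. The function name `allow_override_report` is hypothetical.

```python
import re

# mod_rewrite rules in .htaccess are only honored when AllowOverride
# includes the FileInfo class; "All" includes FileInfo as well.
REWRITE_CLASSES = {"all", "fileinfo"}

# Matches an active AllowOverride directive (a leading "#" prevents a match)
# and captures its arguments.
_ALLOW_RE = re.compile(r"^\s*AllowOverride\s+(.+?)\s*$", re.IGNORECASE)


def allow_override_report(conf_text):
    """Return (line_number, arguments, rewrite_ok) for each AllowOverride line."""
    report = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        match = _ALLOW_RE.match(line)
        if match:
            args = match.group(1).split()
            rewrite_ok = any(arg.lower() in REWRITE_CLASSES for arg in args)
            report.append((lineno, args, rewrite_ok))
    return report
```

To use it, read the site’s file from /etc/httpd/sites/ (the exact file name depends on your setup) and pass its contents to `allow_override_report`; any entry with `rewrite_ok` set to `False` for your web folder is a likely reason why WordPress’s pretty URLs return errors.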
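The MySQL password problem comes from a change in MySQL 4.1: the old 16-character password hash was replaced by a 41-character hash that starts with “*”, and client libraries from before 4.1 cannot authenticate against it (they typically fail with “Client does not support authentication protocol requested by server”). The Python sketch below computes both formats to show the difference. The new hash is SHA-1 applied twice; the old one is modeled on the `hash_password()` routine in MySQL’s pre-4.1 client library. Treat both as illustrations, not as a way to manage accounts; the function names are hypothetical, and encoding the password as UTF-8 is an assumption (MySQL uses the connection’s character set).

```python
import hashlib


def mysql_old_password(password):
    """Pre-4.1 PASSWORD() hash (later OLD_PASSWORD()): 16 lowercase hex digits."""
    nr, add, nr2 = 1345345333, 7, 0x12345671
    for byte in password.encode("utf-8"):  # assumption: UTF-8 bytes
        if byte in (0x20, 0x09):  # the old scheme skips spaces and tabs
            continue
        nr ^= (((nr & 63) + add) * byte) + (nr << 8)
        # Keep values bounded; only the low 31 bits end up in the result.
        nr &= 0xFFFFFFFF
        nr2 = (nr2 + ((nr2 << 8) ^ nr)) & 0xFFFFFFFF
        add = (add + byte) & 0xFFFFFFFF
    return "%08x%08x" % (nr & 0x7FFFFFFF, nr2 & 0x7FFFFFFF)


def mysql41_password(password):
    """4.1+ PASSWORD() hash: '*' followed by uppercase hex of SHA1(SHA1(pw))."""
    stage1 = hashlib.sha1(password.encode("utf-8")).digest()
    return "*" + hashlib.sha1(stage1).hexdigest().upper()
```

A server running 4.1 or later can still store an old-style hash for a given account with OLD_PASSWORD() (or run with the --old-passwords option), which is the usual workaround when older clients have to connect.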
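The FileMaker problem is a plain port clash: only one program can listen on a given TCP port at a time, so Web Companion and Apache cannot both use port 80. Before choosing a port for Web Companion, it can help to check what is already listening on the server itself. Here is a minimal Python sketch; `port_is_listening` and `first_free_port` are hypothetical helper names. It only probes the local machine, so it says nothing about whether the firewall lets outside clients reach the port.

```python
import socket


def port_is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(candidates, host="127.0.0.1"):
    """Return the first candidate port nobody is listening on, or None."""
    for port in candidates:
        if not port_is_listening(port, host):
            return port
    return None
```

On a server where Apache holds port 80 and nothing holds 591, `first_free_port([80, 591])` would return 591.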

And after 8 hours of hard work, I was pretty much back up and running. (The mail service was restored much sooner, fortunately, so that my users could use their e-mail even during the day while I was repairing the rest.)

Now the question is: What’s the easiest way to create a bootable backup? Carbon Copy Cloner is obviously no good. I think I am going to try and use the asr command that is included in Mac OS X. But this time I am going to test it properly right away, to make sure that the backup works.
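The only real test of a bootable backup is booting from it, but a quick metadata comparison can catch gross problems first: missing files, or lost ownership, permissions, or BSD flags (the flags discussed in the comments below). Here is a rough Python sketch, not a feature of asr or of any cloning tool. It does not check ACLs, resource forks, or extended attributes, and a clone of a live system will legitimately differ in volatile files such as logs and caches. The function names are hypothetical.

```python
import os
import stat


def snapshot(root):
    """Map each path under root (relative) to its type, mode, owner, flags, size."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe symlinks, don't follow them
            size = st.st_size if stat.S_ISREG(st.st_mode) else 0
            entries[os.path.relpath(full, root)] = (
                stat.S_IFMT(st.st_mode),      # file type
                stat.S_IMODE(st.st_mode),     # permission bits
                st.st_uid,
                st.st_gid,
                getattr(st, "st_flags", 0),   # BSD file flags (BSD/macOS only)
                size,
            )
    return entries


def compare_trees(source, clone):
    """Return a sorted list of differences between two directory trees."""
    a, b = snapshot(source), snapshot(clone)
    problems = ["missing in clone: " + p for p in sorted(a.keys() - b.keys())]
    problems += ["extra in clone: " + p for p in sorted(b.keys() - a.keys())]
    problems += ["metadata differs: " + p
                 for p in sorted(a.keys() & b.keys()) if a[p] != b[p]]
    return problems
```

Run it as root against matching subtrees of both volumes, for example `compare_trees("/etc", "/Volumes/Backup/etc")`, where “Backup” is a placeholder for the clone’s volume name. Walking “/” directly would also descend into /Volumes and therefore into the clone itself.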

As for what triggered this whole disaster, I am really not sure. After I erased the hard drive and reinstalled the entire system, I applied the exact same system updates again, and everything worked smoothly. So it wasn’t the updates themselves; it was something that the updates accidentally triggered.

Could it be a hard drive failure? I certainly didn’t like that stage in the DiskWarrior repair process where things got stuck on the startup disk. Server Monitor now says the drive is fine, but we are not going to take any chances. We are going to order another Apple Drive Module for the Xserve and transfer everything from the current drive to it. There are few options these days for drive modules for the older Xserve G4 systems, which use Ultra ATA rather than Serial ATA. So we’ll have to buy a 250 GB drive at CDN$500, even though we don’t need all that space. I suppose we could buy a third-party hard drive and install it in the module enclosure ourselves, but I suspect that the Xserve requires a minimum standard of quality and that you cannot use just any drive. Based on what I have read online, people are quite divided on this issue. It all depends on how “critical” your server is.

Well, I think the server is critical enough not to take any chances, so we are going to end up paying the premium that Apple charges for its drive modules.

One last thing: After I did all this, when I got home to my own machine and tried to log in to the server remotely via SSH, I got a weird warning about a discrepancy in the server’s host key. I ended up having to trash the “known_hosts” file inside the (invisible) “.ssh” folder in my home folder, via Terminal. After that, I was able to use SSH properly again.
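Trashing known_hosts works, but it also discards the stored keys for every other server. A gentler approach is to delete only the stale line for the reinstalled host. Recent OpenSSH releases can do this with `ssh-keygen -R hostname`; the Python sketch below shows the same idea by hand. It assumes plain (unhashed) entries whose first field is a comma-separated list of host names and addresses, and `remove_host_entries` is a hypothetical helper.

```python
def remove_host_entries(known_hosts_text, host):
    """Drop known_hosts lines whose host field lists `host`.

    Returns (new_text, number_of_lines_removed). Comments, blank lines,
    and entries for other hosts are kept exactly as they were.
    """
    kept = []
    removed = 0
    for line in known_hosts_text.splitlines(keepends=True):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            kept.append(line)  # blank line or comment: keep unchanged
            continue
        # First field: comma-separated host names/addresses for this key.
        # Hashed entries ("|1|...") never match a plain name here.
        if host in fields[0].split(","):
            removed += 1
        else:
            kept.append(line)
    return "".join(kept), removed
```

You would read ~/.ssh/known_hosts, call `remove_host_entries` with the server’s name (and again with its IP address if that appears on a separate line), and write the result back. The next SSH connection then asks you to accept the new key, just as on a first connection.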


10 Responses to “Xserve G4 running Panther Server 10.3.9: Didn’t like that security update… at all”

  1. mricart says:

    We have been using “Synchronize! Pro X” for startup disk mirroring, and so far it worked.

  2. AlanY says:

    CarbonCopyCloner is indeed no good. See this article:
    http://blog.plasticsfuture.org/2006/04/23/mac-backup-software-harmful/
    and the earlier article in the series:
    http://blog.plasticsfuture.org/2006/03/05/the-state-of-backup-and-cloning-tools-under-mac-os-x/

    That author runs tests on almost all Mac backup options and finds most of them lacking, but identifies one (SuperDuper) that does everything perfectly.

  3. Pierre Igot says:

    Thanks for the links and pointers. I will take a closer look. I already use SuperDuper! on my home office computer to back up data, but I don’t use it to back up an entire startup volume. My main question would be whether it works well with Mac OS X Server. OS X Server obviously has more stuff going on behind the scenes and ensuring that all this is preserved is probably harder than with OS X Client. I’ll ask the developer.

    What he says about CCC doesn’t seem to explain why my backup failed, unless there are BSD flags on the volume that have to be preserved for the system to work properly.

  4. AlanY says:

    It could be the BSD flags, but it’s much more likely to be the missing ACLs that were the source of the problem.

  5. Pierre Igot says:

    My understanding was that ACLs were introduced in Tiger. Since I am running Panther Server (10.3.9), I don’t think they are the problem.

    Even the SuperDuper! developer himself says that my CCC clone should have worked.

  6. Andrew Aitken says:

    We use SuperDuper exclusively to deploy both OS X Client and Server. So I know it works well for OS X Server.

    Wherever possible, I try to have the boot volume of the Xserve mirrored to another internal drive. When I want to install a software update like this, I simply pop the drive out, thus breaking the mirror. If it all goes pear shaped, I just plug the other drive back in, and reboot as if nothing happened, while I work on sorting out the now broken drive.

    With regard to editing config files, I use TextWrangler a lot. It is free, and offers authenticated saves – so you don’t have to use the command line text editors to edit files that need superuser authority.

    With regard to the drive module question – a few months ago I had a customer who wanted me to replace the HD, rather than the whole module in his G4 Xserve. So far, there have been no problems. It’s a very simple procedure, and the drive (which I actually have in front of me now!) is a standard 60 Gig IBM Deskstar – with some silly red marker-pen marks on it…

    The drives are just standard drives, but have been ‘tested’ by Apple, and confirmed capable to run 24/7/365 in an Xserve. If cost is an issue, I wouldn’t worry too much about putting a 3rd party drive in, just make sure it’s a reputable make. The Maxtor MaxLine range seems good, so does the Seagate NL35 series (not sure if they come in non-SATA however)

  7. sjk says:

    I’d also recommend SuperDuper! for its integrity, plus support is fast and knowledgeable if you ever need it. I used CCC before SD!, but Mike Bombich has all but abandoned development in favor of NetRestore and CCC’s forum is too “risky” to rely on for support.

  8. Aapo Laitinen says:

    When you connect to an SSH server for the first time, the server’s public key is stored in known_hosts (the client shows it to you as a fingerprint, a cryptographic hash of the key). In subsequent connection attempts, the SSH client checks that the key the server sends matches the stored one.

    When you did a complete reinstall, the SSH server generated a new private and public key for itself. Since the SSH client can’t tell whether the new key was generated on purpose or if someone is attempting a man-in-the-middle attack, it stops you from connecting.

    If you ever receive the same warning about a computer you don’t administer yourself, contact the party responsible for the server by phone and ask whether the discrepancy is legitimate. If you receive the warning without a clear cause (such as a reinstall), be very scared.

  9. Pierre Igot says:

    Andrew and sjk: Thanks for the additional advice. I don’t know why I didn’t think of using BBEdit (which I own) to edit the httpd.conf file. Somehow I forgot that it (like TextWrangler) can open invisible files and edit root-owned ones. I will try SD! shortly to confirm that it works well to create a bootable clone of my OS X Server volume. (The developer has told me that the only issue might be the “live” MySQL databases. But it’s not really a problem for me, as I would do the backup during the night and the MySQL databases only change during the day. We don’t offer the ability to comment on posts. FileMaker would also probably complain about databases not having been closed properly, but it would still work. In any case, I keep separate backups of the databases remotely.)

    Aapo: Yes, that’s what I figured when I got the error message, and that’s how I figured out that trashing the known_hosts file would probably clear the problem. The problem is that it’s not very intuitive at all :). But I guess that’s what you get with server solutions. They are just never going to be user-friendly like a client Mac OS environment.

  10. sjk says:

    As Pierre and others already know, there can be integrity issues with backups/restores of “live” database and other “volatile” data using other software besides SD!. Just wanted to clarify that’s a general concern and not SD!-specific.
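Aapo’s explanation in comment 8 can be made concrete. In OpenSSH, each known_hosts line holds the server’s full public key in base64, and the fingerprint the client displays is a hash of the decoded key blob: an MD5 digest printed as colon-separated hex in OpenSSH releases of this era, and a SHA-256 digest in unpadded base64 (prefixed with “SHA256:”) in newer ones. Here is a small Python sketch that computes both from an unhashed known_hosts line, so you can compare them with a fingerprint read to you over the phone. The function names are hypothetical.

```python
import base64
import hashlib


def _key_blob(known_hosts_line):
    """Decode the key field of an unhashed line: hosts keytype base64-key [comment]."""
    return base64.b64decode(known_hosts_line.split()[2])


def md5_fingerprint(known_hosts_line):
    """Older OpenSSH style: MD5 of the key blob as colon-separated hex pairs."""
    digest = hashlib.md5(_key_blob(known_hosts_line)).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def sha256_fingerprint(known_hosts_line):
    """Newer OpenSSH style: 'SHA256:' plus unpadded base64 of the SHA-256 digest."""
    digest = hashlib.sha256(_key_blob(known_hosts_line)).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

When a server is reinstalled it generates a new key pair, so both fingerprints change. That change is exactly what the client warns about.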
