February 26th, 2007 • 3:11 pm
After another full day wasted driving to my Apple Service Provider and back (a 600-km round trip), I am happy to report that it looks like the problem with my Mac Pro has been fixed.
Last time I wrote, I said that the problem seemed to involve RAM interleaving, and I was waiting to hear back from an Apple engineer about this.
Well, it turned out that the problem was not with interleaving per se, but with the logic board, and probably, more specifically, with the lower slot for the second so-called “riser card.” (The Mac Pro has two removable riser cards with four RAM slots on each.)
This was determined in several stages. First of all, the fact that the Mac Pro worked fine with all four RAM modules in the top riser card seemed to rule out a problem with the RAM modules themselves (especially since these were already replacement RAM modules).
Then the Apple engineer got me to try swapping the two riser cards, and that seemed to solve the problem for a while, but then the kernel panics started occurring again soon after the end of our conversation. So I got back to him to report on this.
By that time, he had already started the process of shipping me a replacement riser card (he was trying hard to avoid a round trip to my ASP and was hoping that this might solve the problem). He said that I could keep it and try it just the same, but that it probably would not solve the problem. He said the next step was unfortunately… to bring the machine to my ASP.
My ASP was kind enough, once again, to order the replacements for the suspected parts (the logic board and the video card) in advance. They got the replacement logic board in time for my scheduled trip on Friday, February 16th. Unfortunately, due to bad weather and other scheduling issues, I didn’t make it there until 3:00 pm, and that didn’t give the technician enough time to do the swap before closing time (4:30 pm). But I was able to show them what the problem was, with kernel panics occurring repeatedly within a few minutes of starting the machine.
So I had to leave the machine with them, and I asked them to let me know how it went.
They got back to me on the following Monday (February 19th) to let me know that they had completed the logic board swap and that it appeared to have fixed the problem. I said I wouldn’t be able to return until the end of the week anyway, and asked them to try and use the machine for a while and also to swap riser cards, just to make sure that the machine was working in all acceptable configurations and under normal usage conditions.
Later in the week, I confirmed with them that the machine was still working as expected, and then I got back there last Friday (February 23rd) and got the machine back. I was able to test it this weekend, and indeed the problem appears to have been fixed. I have two RAM modules (2 x 512 MB) in the top riser card and two RAM modules (2 x 1 GB) in the bottom riser card, which is how the RAM modules are supposed to be arranged, and so far I haven’t had any kernel panics. Presumably interleaving is working as expected now (although I don’t really know how to determine this, since the overall performance does not seem to be much different for the tasks that I use the machine for).
Is this a happy conclusion to this latest ordeal with Apple hardware? I certainly hope so. I have had my share of those in recent times, and I would really like to have two Mac Pro machines working smoothly in my office for a long while. I have extended Apple Care coverage on both machines, but even with Apple Care coverage, hardware problems are still a major pain in the neck, especially when you live so far away from the closest reliable Apple Service Provider.
I must stress, however, that I have received nothing but great service from the Apple staff that I have dealt with on the phone and from my preferred Apple Service Provider, which is the Hardware Services department of Dalhousie University in Halifax, Nova Scotia. The Apple Care support representative was quick to put me in contact with an Apple engineer once she had determined that the problem was not easily solved with the normal procedures (which, of course, I had already applied before actually calling them), and the Apple engineer himself gave me his direct contact information and was quick to respond whenever I needed him to. And the Apple Service Provider staff was also really helpful, working hard to try and minimize the amount of travelling required for me.
After all these recent experiences, I cannot help but wonder, in one corner of my mind, about what happens to the defective parts that are returned to Apple. Do they do any further testing on them? Do they investigate the parts any further to try and determine what the exact source of the problem was? All the Apple Service Provider people do, after all, is swap parts. They don’t actually find the specific source of the problem. They are neither trained nor qualified to do so. (This was, in fact, the first Mac Pro logic board swap for this particular technician in Halifax.)
I cannot help but wonder whether the operations of a corporation such as Apple allow for further investigative work on such issues, or if they simply throw the defective parts in the trash. I know that, from an engineering point of view, I would like to know exactly what part of the logic board was defective (probably the RAM bus for the lower riser card) and what exactly was defective in it, in order to avoid such problems in future parts in manufacturing.
But I also wouldn’t be surprised to hear that this only happens after a significant number of defects have been noticed by whatever monitoring system they have in place. Until then, logic board swaps such as the one that was required for my machine are probably nothing but small statistical blips of little significance in the bigger picture.
It still cost me two days of travel time and a big gas bill, though! Definitely not just a small “blip” in my own computing life. Which is why I am really sincerely hoping that I will be spared further blips for a while…