Mail 2.0: The drawback of individual message files
Posted by Pierre Igot in: Mail • May 14th, 2005 • 4:03 am
Mac OS X 10.4’s version of Mail introduces a major architectural change when it comes to the way e-mail messages are stored on your hard drive. In Mail 1.x, e-mail messages are not stored as individual files, but as part of the mailbox that they are stored in. The mailbox itself is stored as a “.mbox” file, which is actually a package containing a handful of files (an index file, a text file containing the actual text of the messages stored in the mailbox, and a couple more things).
With Mail 2.0, the “.mbox” packages have become actual folders, which contain an info.plist file and then a folder called “Messages” in which each message stored in the mailbox is stored as a separate file with an “.emlx” suffix.
This architectural change is not visible from within the Mail application itself. You actually have to open your “Mail” folder manually and look inside the “Mailboxes” sub-folder to see the change.
But there is one aspect of the Mac OS X interface where this architectural change can most definitely be perceived by the end user — and that’s what happens when you try and create a backup of your “Mail” folder.
Since each message is now stored as a separate file, the actual number of discrete files stored in your “Mail” folder has now increased by a huge factor. Just imagine: If you had a mailbox in Mail 1.x containing a thousand messages, on your hard drive this mailbox would consist of a handful of files inside a package. But now in Mail 2.0, your mailbox on the hard drive consists of a folder containing a thousand files!
And the difference can most definitely be felt when you try to back up your “Mail” folder, which you should do on a regular basis. For example, my own “Mail” folder, which contains approximately 10 years of daily e-mail correspondence, now contains over 60,000 files. That’s a lot of files.
(Thank God for the HFS+ hard drive format and its ability to keep the minimum file size of small files reasonably low! Apple would definitely never have been able to develop this technology on regular HFS drives, where every single file on your hard drive, regardless of how small it was, would always take up at least the equivalent of your hard drive capacity divided by 65536. Divide 120 GB by 65536 and you’ll get an idea of how “small” such files would be.)
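To put a number on that aside, here is a rough back-of-the-envelope sketch in Python. It assumes, as stated above, that a classic HFS volume is divided into at most 65,536 allocation blocks, and that “120 GB” means 120 × 10^9 bytes. The function name is made up for illustration, and real HFS block sizes are rounded to disk sector boundaries, which this ignores.

```python
# Rough arithmetic for the minimum on-disk file size on a classic HFS volume.
# Assumptions (not measurements): at most 65,536 allocation blocks per volume,
# and "120 GB" taken as 120 * 10**9 bytes. Sector rounding is ignored.

HFS_MAX_BLOCKS = 65536


def min_file_size_bytes(volume_bytes: int, max_blocks: int = HFS_MAX_BLOCKS) -> int:
    """Smallest space any file occupies: one allocation block, rounded up."""
    return -(-volume_bytes // max_blocks)  # ceiling division


volume = 120 * 10**9
block = min_file_size_bytes(volume)

print(f"Minimum file size: {block / 10**6:.2f} MB")
print(f"60,000 tiny files would occupy about {60_000 * block / 10**9:.0f} GB")
```

Under these assumptions every file, however tiny, would take up roughly 1.8 MB, so 60,000 one-message files would come close to filling the whole drive.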
Unfortunately, this higher number of files also means a significant increase in overhead for the Finder in its file copying processes. So now if you try to copy your “Mail” folder to an external FireWire hard drive or your iPod, for example, be prepared to wait a lot longer than you used to. It all depends on how big your e-mail archive is, of course. Not everyone archives older e-mail as religiously as I do. But I can tell you that copying a folder containing 60,000 files in the Finder takes a significant chunk of time, even with a fast machine and fast hard drives.
In fact, it is so painful that I am now using a different approach. Instead of copying my “Mail” folder to an external FireWire drive in its original form, I now archive it first with the Finder’s “ ” command. The archiving process takes a long time too: On my machine, the Finder’s initial estimate is for over 30 minutes! In truth, this estimate is way off target and the process only actually takes about five minutes — but that’s still a long time for a folder that actually weighs approximately 600 MB.
The benefit of doing this (archiving the folder first and copying the archive) instead of copying the folder to an external hard drive is that Finder operations involving the internal hard drive only are much faster than Finder operations involving both the internal hard drive and an external FireWire hard drive — especially when there are lots of small files. While copying my 600 MB “Mail” folder directly to my external FireWire hard drive would take nearly 15 minutes, the archiving process takes about 5 minutes, and copying a 250 MB archive — a single file for the Finder — onto the external FireWire hard drive only takes a few seconds.
So if, like me, you have a large archive of e-mail and want to back it up regularly, I definitely recommend that you use this approach instead of trying to copy the folder directly.
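For readers who would rather script this than go through the Finder, the same two steps (archive locally, then copy the single archive) can be sketched in Python. This is only an illustration of the approach, not what the Finder does internally: the function name and the example paths are hypothetical, and a zip built this way may not preserve Mac-specific metadata the way the Finder’s archives do.

```python
import shutil
from pathlib import Path


def archive_then_copy(source_dir: Path, work_dir: Path, dest_dir: Path) -> Path:
    """Zip source_dir into work_dir, then copy the single archive to dest_dir.

    work_dir should be on the internal drive and outside source_dir;
    dest_dir would be a folder on the external drive.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: build one zip file locally (many small reads, one large write).
    archive_path = shutil.make_archive(
        str(work_dir / source_dir.name),
        "zip",
        root_dir=str(source_dir.parent),
        base_dir=source_dir.name,
    )

    # Step 2: copy that single file to the external drive.
    return Path(shutil.copy2(archive_path, dest_dir))


# Example call (hypothetical paths, adjust to your own setup):
# archive_then_copy(
#     Path.home() / "Library" / "Mail",
#     Path.home() / "MailBackupTemp",
#     Path("/Volumes/Backup"),
# )
```

The design mirrors the reasoning above: the expensive per-file work stays on the internal drive, and the external drive only ever sees one large file.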
May 14th, 2005 at 11:06 am
A few remarks:
1. I think you’re missing the irony with your comment on old HFS. The main problem with that wouldn’t be the large file size but the fact that you wouldn’t be able to store more than 2^16 files on any drive because the maximum number of files is so low.
2. Copying from an internal to an external drive will be faster than copying from the internal drive to itself. The way you chose is faster because the zip file is a single file and can almost be written ‘in one go’ rather than having to create thousands of new files. You will probably see even better performance when copying to a zip file on the external drive.
3. Depending on how optimised this is, it may be worth trying to copy the folder to a disk image, which has the advantage that you can use the files on it right away. I’m pretty sure that with some Unix magic you could get your machine to always mount a disk image where your Mail folders are and then have all of your Mail effectively in one place.
4. Yup, those gazillions of files suck.
May 16th, 2005 at 6:05 am
1. The irony was quite intentional :).
2. Interesting, but not in my experience. It all depends on the tech specs. On my PowerBook G4 (400 MHz, 384 MB of RAM), for example, with a 4-year-old external FireWire drive from OWC, it is much faster to copy a single 300 MB Zip file from the external drive to the PB’s internal drive than to copy 60,000 files! The fastest thing on my G4 desktop would probably be to archive on an external hard drive directly, but that is not possible with the Finder’s current “Create Archive of” command.
3. Good idea. Trouble is I’d have to create a disk image with a fixed size, and my Mail folder will continue to grow and grow over time…
4. There are pros and cons. The fast Spotlight searches are certainly a “pro”.
June 6th, 2005 at 6:09 am
One advantage to this new scheme: My old single mbox file was very large, several hundred MB. I use Retrospect to do incremental backups every night. Almost all the backup time and disk space was used backing up mbox, because it changes even if I receive only one short email. So every backup added another 200 MB to my backup file.
Since I didn’t know this change had been made to Mail, I haven’t checked to see if this will improve the backup efficiency. But it seems like it should.
June 6th, 2005 at 6:16 am
Indeed, it should make a significant difference for incremental backups.
June 28th, 2005 at 7:54 pm
How about using rsync? This only sends the changed files and would be significantly faster than copying the entire file each time.
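The idea behind this suggestion (only transfer what has changed) can be illustrated with a short Python sketch. To be clear, this is not rsync and does not use its algorithm: it is a simplified stand-in that compares file size and modification time, never deletes anything from the backup, and uses a made-up function name.

```python
import shutil
from pathlib import Path
from typing import List


def sync_changed(source_dir: Path, dest_dir: Path) -> List[Path]:
    """Copy only files that are new, or whose size or mtime differs from the backup."""
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = dest_dir / src.relative_to(source_dir)
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Same size and not newer than the backup copy: assume unchanged.
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        copied.append(dst)
    return copied
```

With one file per message, a run after receiving a single e-mail only has to copy that one new file, whereas a single large mailbox file would have to be copied again in full.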
June 29th, 2005 at 12:21 am
I tend to focus on user-friendly solutions :).