November 28th, 2003 • 7:39 am
A couple of days ago, I was visited by someone who needed help putting together a book on the genealogy of Acadian families. He was personally a Mac user and was in charge of putting the book together, but needed help on how to actually do it in a way that didn’t require an enormous amount of manual work.
When he first contacted me about it and told me that he had an iMac running Mac OS X, I naturally recommended my favourite page layout program, i.e. Adobe InDesign. He had a copy of PageMaker 7.0, but of course that would only run under Classic, and PageMaker had too many issues to mention anyway. (There is a reason why Adobe started again from scratch in order to come up with InDesign, its QuarkXPress killer.)
The major issue was not page layout per se, but the fact that the book needed to have an extensive index of people’s names. I wasn’t too familiar with InDesign’s indexing features, but based on what I had been able to gather, putting together such an index was certainly feasible. I wasn’t entirely sure how easy or tedious it might turn out to be, however.
This person came with a copy of the previous edition of the book (from 2 or 3 decades ago, I think) and a CD containing some sample files that would be used as the basis for the new book.
It turns out that these sample files were actually RTF (Rich Text Format) files produced by the genealogy software that they have been using to collect all the genealogy information. I am not sure which software it was, but it might have been Reunion.
In any case, I initially thought these were just regular documents containing formatted text that would have to be manually indexed in InDesign. So I tried placing one of them in a blank publication in InDesign 2.0, and opened InDesign’s “Index” palette.
To my great surprise, this “Index” palette already contained numerous index entries! After a bit of experimenting, I was able to determine that the RTF files produced by the genealogy software already contain indexing information, presumably in a standard format that is part of the RTF specification and that InDesign is able to import into its own indexing feature. I certainly wasn’t aware, prior to this, that the RTF document format could include such information.
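For the curious, here is a rough Python sketch of how one might list the index entries hiding in such a file. It assumes that the entries are stored as RTF `\xe` groups (the index-entry destination in the RTF specification), often with the `\v` hidden-text flag. I have not confirmed that this is exactly what the genealogy software writes, and the function name and sample string are made up for illustration.

```python
def find_index_entries(rtf: str) -> list[str]:
    """Return the raw contents of every {\\xe ...} group in an RTF string."""
    marker = "{\\xe"
    entries = []
    pos = rtf.find(marker)
    while pos != -1:
        after = pos + len(marker)
        # Skip longer control words that merely start with "xe".
        if after < len(rtf) and rtf[after].isalpha():
            pos = rtf.find(marker, after)
            continue
        # Walk forward, tracking brace depth, until the group closes.
        depth = 0
        i = pos
        while i < len(rtf):
            ch = rtf[i]
            if ch == "\\" and i + 1 < len(rtf) and rtf[i + 1] in "{}\\":
                i += 2  # escaped brace or backslash, not a real delimiter
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        entries.append(rtf[after:i].strip())
        pos = rtf.find(marker, i)
    return entries


# Made-up sample, not taken from the actual genealogy files.
sample = r"{\rtf1 Jean L\'e9ger{\xe\v L\'e9ger, Jean} was born{\xe {\v Cormier, Marie}} here}"
for entry in find_index_entries(sample):
    print(entry)
```

The output is still raw RTF, so accented characters show up as escape codes rather than as letters, which is exactly what makes them easy to inspect.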
Obviously, this meant that the indexing work might turn out to be much less demanding than previously thought. But a closer look at the index information after placing the document in InDesign 2.0 soon revealed some problems. Since we are talking about Acadian genealogy, even though the text itself was in English, there were many French names with accented characters. InDesign had preserved the accented characters in the text itself, but unfortunately all the accents in the names as they appeared in the index information were screwed up. There was either a problem in the RTF files themselves or with InDesign’s RTF importing filter.
I opened the RTF files in BBEdit in order to have a look at the raw RTF data. I was thus able to quickly determine that the encoding used for the accented characters in the (visible) text was the same as the encoding used for the accented characters in the (hidden) index information. So the problem didn’t seem to be with the RTF files themselves, but with InDesign’s filter.
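The same comparison could be done with a short script instead of by eye. The sketch below assumes that accented characters are written as `\'hh` hex escapes and that index entries are simple `\xe` groups with no nested braces. Both are guesses about the export format, and the names are hypothetical.

```python
import re

# A \'hh escape: backslash, apostrophe, two hex digits.
HEX_ESCAPE = re.compile(r"\\'[0-9a-fA-F]{2}")
# Assumption: index groups contain no nested braces.
INDEX_GROUP = re.compile(r"\{\\xe(?![a-z])[^{}]*\}")


def escapes_by_region(rtf: str) -> tuple[set[str], set[str]]:
    """Return (escapes in visible text, escapes in index entries)."""
    index_parts = INDEX_GROUP.findall(rtf)
    visible = INDEX_GROUP.sub("", rtf)
    in_index = set(HEX_ESCAPE.findall("".join(index_parts)))
    in_text = set(HEX_ESCAPE.findall(visible))
    return in_text, in_index


# Made-up sample, not taken from the actual genealogy files.
sample = r"{\rtf1 Jean L\'8eger{\xe\v L\'8eger, Jean} and Ren\'8e}"
in_text, in_index = escapes_by_region(sample)
print("only in text:", in_text - in_index)
print("only in index:", in_index - in_text)
```

If both differences come back empty, the text and the index use the same codes, which is what I found by reading the raw data.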
As an experiment, I created a new document in Word, typed a few accented characters, saved it as RTF, and opened it in BBEdit. The RTF data looked drastically different! More specifically, the codes for the accented characters were different. I then wrote down the code for the accented “é” in this new file, found an “é” in an index entry in the genealogy RTF file, and replaced its code with the new one. I then placed the document in InDesign, and the accented character in question was now correct in the index palette as well. In other words, I had a “manual” workaround for the problem with the InDesign filter. I could just do a batch search/replace in the RTF files using BBEdit before placing the files in InDesign.
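I did this in BBEdit, but the batch search/replace could just as well be scripted. Here is a minimal sketch. The code pairs in it are an assumption on my part (Mac Roman hex escapes on the left, Windows-1252 ones on the right); the real pairs have to be read from the two kinds of RTF files. It also rewrites the codes everywhere in the file, not just in the index entries, which I only tested by hand on a single entry.

```python
import re
from pathlib import Path

# Hypothetical mapping: code seen in the genealogy export -> code seen in
# the Word export. Assumes Mac Roman versus Windows-1252 hex escapes.
REPLACEMENTS = {
    r"\'8e": r"\'e9",  # é
    r"\'8f": r"\'e8",  # è
    r"\'8d": r"\'e7",  # ç
}

# One pattern for all codes, so every escape is rewritten at most once.
PATTERN = re.compile("|".join(re.escape(code) for code in REPLACEMENTS))


def fix_codes(rtf: str) -> str:
    """Replace each known escape code with its counterpart."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], rtf)


def fix_file(path: Path) -> None:
    """Rewrite one RTF file in place, leaving all other bytes untouched."""
    # latin-1 maps every byte to one character, so the round trip is lossless.
    raw = path.read_bytes().decode("latin-1")
    path.write_bytes(fix_codes(raw).encode("latin-1"))


print(fix_codes(r"{\xe\v L\'8eger, Jean}"))
```

To process a whole folder, one would call `fix_file` on each path returned by `Path("some_folder").glob("*.rtf")`, where the folder name is a placeholder. Working on copies of the files is the safer choice.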
But then, after placing a couple of files and trying to do an unrelated search/replace for something else in the text itself (not the index information) in InDesign 2.0, we started experiencing application crashes. InDesign would unexpectedly quit each time I tried to do the search/replace operation. Obviously, placing the indexed RTF files in a blank InDesign 2.0 document produced a highly crash-prone environment. So this wasn’t going to work. Being able to do search/replace operations on the imported text itself was essential.
Since I had just installed InDesign CS (a.k.a. InDesign 3.0) a few days before, I figured I might as well give it a try. I opened the InDesign 2.0 publication in InDesign CS and tried to do a search/replace on the text and… it worked!
So I tried to start from scratch and create a new blank InDesign 3.0 document and place a couple of sample indexed RTF files from the genealogy software. Much to my surprise, I saw that the accented characters in the index information in InDesign were actually correct! In other words, the RTF import filter in InDesign CS has been improved and now imports accented characters in hidden index information in RTF files properly as well.
This was obviously very good news, since it meant that we wouldn’t have to do any of that manual RTF data editing in BBEdit on the RTF files before placing them in InDesign.
Unfortunately, after working on this newly created file with RTF imports in InDesign CS for a while, I soon found out that things weren’t as rosy as first hoped. Yes, the accented characters in the index information were now imported properly, but working on the document once the RTF files had been imported, I still experienced far too many application crashes.
InDesign’s “file recovery” feature following an application crash is reasonably effective — it usually remembers everything except for the last few changes — but such instability is still unacceptable.
I don’t know if the crashes were due to the imported RTF documents or simply to the fact that I was working on indexed text in InDesign CS — or even simply to the fact that I was working in InDesign CS under Panther. I will have to use InDesign CS on a more regular basis in order to determine how stable it actually is.
The afternoon ended with mixed emotions. On the one hand, I had discovered that RTF files produced by genealogy software and InDesign can be a pretty powerful combination and quickly produce an extensive index of proper nouns without any manual intervention when it comes to the indexing process itself — and that InDesign CS actually includes significant improvements over InDesign 2.0 in that respect. On the other hand, InDesign CS turned out to be far too unstable for my taste. I hope that we’ll be able to resolve these stability issues in a satisfactory manner soon.