Mac OS X 10.6 (Snow Leopard): More Preview weirdness
Posted by Pierre Igot in: Macintosh • March 22nd, 2010 • 3:53 pm
I am just not having much luck with Mac OS X’s Preview application these days. After the problem with weird blocks of text rendered as black and grey boxes discussed last week, here’s another one I encountered this week-end while working with a French-language PDF document.
As with the last problem, I am able to reproduce Preview’s weird behaviour with a single page extracted from the PDF document in question, but it applies to the entire document. Here’s the sample page:
If I open this sample page in Preview and then double-click on a word—any word—that ends with an accented “é,” like so:
then when I copy the selection to the Clipboard and attempt to paste it elsewhere in Mac OS X, I get this:
qualite
In other words, I get the word with a regular “e” instead of the accented “é.”
I can reproduce this with any selection whose very last character is an accented “é.” If the accented character is not the very last character, then there is no problem and the text is copied and pasted faithfully. But if the last character of the selection is an “é,” then somehow the accent gets lost during the copy/paste process.
I cannot help but remember, of course, the problem I had with earlier versions of Preview for Snow Leopard where some accented characters were not displayed properly. This problem was fixed, but maybe there’s some underlying cause rooted in the very way that Preview handles accented characters. Since our computers and their software are still primarily designed by English-speaking people, the likelihood of such bugs—which primarily affect speakers of languages other than English—slipping through unnoticed is obviously significantly higher.
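To make that speculation a bit more concrete, here is a minimal Python sketch of one way an accent could vanish only when it sits at the very end of a selection. It rests on an assumption that is purely a guess and not a confirmed diagnosis of Preview: that the text is held in decomposed form, where “é” is stored as a plain “e” followed by a separate combining acute accent (U+0301), and that the end of the selection is computed in a way that leaves a trailing combining mark outside the copied range. The function name `buggy_copy` is hypothetical and only illustrates the idea.

```python
import unicodedata


def buggy_copy(selection: str) -> str:
    """Hypothetical model of a copy operation that loses a trailing accent.

    Assumption (not confirmed for Preview): the text is stored decomposed
    (NFD), and the selection range stops at the last base letter, so a
    combining mark attached to that letter is left behind.
    """
    # Decompose: "é" (U+00E9) becomes "e" followed by U+0301.
    decomposed = unicodedata.normalize("NFD", selection)

    # Simulated bug: drop a combining mark if it is the very last code point.
    if decomposed and unicodedata.combining(decomposed[-1]):
        decomposed = decomposed[:-1]

    # Recompose whatever is left before handing it to the Clipboard.
    return unicodedata.normalize("NFC", decomposed)


print(buggy_copy("qualit\u00e9"))        # prints "qualite"
print(buggy_copy("qualit\u00e9s"))       # prints "qualités" (accent not last)
print(buggy_copy("\u00e9t\u00e9 bien"))  # prints "été bien" (accent not last)
```

Under this model, accents inside the selection survive and only a final accented character is affected, which happens to match the behaviour described above. That match does not prove this is what Preview actually does; it only shows that such a bug is easy to produce and easy to miss when testing with unaccented text.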
This time, according to Acrobat Pro, the original PDF document from which this page is taken was authored with Adobe InDesign CS4 and produced with “Adobe PDF Library 9.0.” This is consistent with what I get when I export an InDesign CS4 publication as a PDF file myself from within InDesign, so it looks like the PDF in question was authored and produced directly in InDesign CS4. (I cannot tell whether it was the Mac or PC version, but I don’t suspect it makes much difference.)
More generally, though, I must say that the work I have been doing in the past couple of weeks, which involves downloading a lot of different PDF documents from a lot of different sources and having to sometimes copy/paste text taken from these documents, has been quite revealing. I have really seen all kinds of weird text selection and copy/paste behaviours.
If Apple’s ambition is really to offer a “smart” PDF viewer that provides useful tools for working with PDF files, then they clearly still have a way to go. At the same time, it definitely looks like there are many, many “flavours” of PDF out there, and I am not just talking about the different “versions” of PDF officially supported and documented by Adobe and other providers. I am also talking about the various programs used to author the documents that end up being exported as PDF files, and the various tools used to perform that exporting procedure.
Some people use Adobe’s Acrobat products, which include a “virtual printer” that takes your application’s output via its printing feature and turns it into a PDF file. (It’s similar to the functionality built into Mac OS X, except that of course you have to purchase an expensive third-party product and deal with the “virtual printer” metaphor.) Others use a professional page layout application such as InDesign and its own built-in PDF exporting feature.
And others still probably use all kinds of other third-party tools that include some sort of PDF exporting procedure that works more or less well and produces PDF files of varying degrees of quality. One way to estimate the “quality” of the PDF is precisely in how easy it is to select text within it and copy the text (when you are authorized to do so, that is). Text selectability—or the lack thereof—is an intriguing indicator of the underlying “readability” or “smartness” of the document. And PDF files can only be as smart as the tools used to create them and the file format itself.
If I had the opportunity to explore this issue further from an engineering point of view, I wouldn’t be surprised to discover that, underneath it all, the fact that PDF exporting and printing are two related processes probably accounts for at least a portion of the on-going problems with the format. After all, when you want to get something printed, the important thing is how things will look on the printed page. It is not how easy it will be to select and copy the text on the page! I wouldn’t be surprised to discover that printer drivers and other print-oriented software tools use all kinds of hacks that they can get away with in the context of the process of printing on paper, because there is no need for documents to be “smart” and user-friendly in that context. The only thing that matters is the printed output.
The use of PDF as a format for sharing documents electronically, however, adds a whole new dimension to the traditional “printing” process and makes these “smart” features increasingly important. The addition of a “smarter” selection tool in Snow Leopard’s Preview is a good improvement, but, as we’ve seen before, it still needs work and, in any case, it’s only a way to work around the limitations of the PDF file format. Ideally, the PDF file itself should “know” the way that text flows in a column-based layout and support smart text selection accordingly.
But it does not, and that limitation in the PDF file format itself is probably a sign of its print-oriented legacy. I suspect we’ll still have to deal with the frustrations associated with working with PDF files for many more years, until either PDF viewing tools such as Adobe Reader and Preview are improved to the point that they can work around all the limitations of the file format, or PDF as a file format itself is superseded by some other kind of user-friendly file format for sharing printable documents electronically.