July 13th, 2007 • 12:29 pm
This is a strange one.
I have this PDF file that I have to translate. I typically use Mac OS X’s Preview for viewing PDF files, simply because it is much more compact and efficient than Adobe’s bloated applications. It has its flaws, obviously, but this is a really strange problem.
Since I am translating the document, I frequently have to select phrases in the text and copy and paste them in other applications (usually a terminology database form or the Google search field in Safari). Most of the time, this task involves strings of text consisting of whole words. So I typically use a double-click on the PDF file to select the first word in the phrase that I want, and then drag my mouse to extend the selection word by word until I have everything.
The strange thing is that, in this particular document, whenever the word that I double-click on has a punctuation mark attached to it, be it a comma or a semicolon, Preview automatically selects not only the word that I am double-clicking on, but also the next word, even though there clearly is a space after the punctuation mark.
Here is an example. In the picture below, I have just double-clicked on “council.” This should only have selected the word “council” itself, and then maybe the comma that comes after it. (Unfortunately, text selection tools are not always smart enough to exclude punctuation marks from the selection.)
Yet here is what happened:
This is quite strange, in that there clearly is a space after the comma. I tried to reproduce this elsewhere in the same PDF file, and indeed, in Preview, whenever I double-click on a word followed by a comma or a semicolon, the application automatically selects the next word at the same time. Worse still: if I double-click on the first word in a series of words separated by commas, Preview selects the entire series at once!
To confirm that there is a problem in Preview with this file, I tried copying the selected phrase and pasting it into a word processor or text editor. When I did this, Mac OS X actually inserted “council,school,” i.e. without a space after the comma between “council” and “school.”
In other words, there clearly is something in this particular file that causes Preview to think that there is no space here. Of course, if it thinks that there is no space, it is logical that it also thinks that it is a single word. And so a single double-click on it selects the whole thing.
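The reasoning above can be illustrated with a small Python sketch. It assumes, purely for illustration, that a double-click selects the run of non-whitespace characters around the clicked position; this is not Preview's actual algorithm, and `select_word` is a hypothetical helper. Under that assumption, a missing space after each comma is enough to make a whole series of words behave like one word.

```python
def select_word(text, index):
    """Return the (start, end) span of the whitespace-delimited run containing index.

    Hypothetical model of a double-click: extend the selection left and
    right from the clicked character until whitespace is reached.
    """
    if text[index].isspace():
        return index, index  # clicking on whitespace selects nothing
    start = index
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    end = index
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end


# The text as it appears on the page, with a space after each comma.
with_spaces = "the council, school, and library"
# The text as the pasted result suggests Preview sees it: no space after commas.
without_spaces = "the council,school,and library"

click = 5  # a character inside "council" in both strings

start, end = select_word(with_spaces, click)
print(with_spaces[start:end])     # council,

start, end = select_word(without_spaces, click)
print(without_spaces[start:end])  # council,school,and
```

The second result matches what happens with a series of words separated by commas: without the spaces, the whole series is selected at once.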
Fortunately, Preview does not have the same problem with other PDF files—only this particular one.
Out of curiosity, I opened the same PDF file in Acrobat Professional, and tried to reproduce the same problem, to no avail. When I double-click on “council” in the same PDF in Acrobat Pro, Acrobat correctly selects only the word “council”:
So Acrobat clearly recognizes the spaces after commas and semicolons in this PDF file as normal spaces. But for some reason Preview doesn’t.
At this point, I started wondering whether the PDF file was created with some Windows authoring program that would have used non-standard space characters that Mac OS X’s Preview might not recognize. So I looked at the document information in Acrobat for this particular file, and here’s what I saw:
In other words, the PDF file in question was actually authored on a Mac, with Acrobat Distiller 6!
This makes the behaviour in Preview all the more puzzling. Surely PDF files generated by the Mac version of Acrobat Distiller should behave properly when opened in Mac OS X’s Preview… Yet this one clearly does not.
I am afraid there is not much point in investigating this any further. There is just no way that I can get a hold of the graphic designer who actually authored this document. There are too many intermediaries. I guess I’ll just have to use an Adobe application for this particular PDF file.
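For text that has already been copied out of Preview, the missing spaces could in principle be patched up after the fact. The following Python sketch is only a rough workaround under stated assumptions: `restore_spaces` is a hypothetical helper, and it assumes that any comma or semicolon immediately followed by a letter has lost its space. It deliberately leaves digits alone so that figures such as “1,000” are not broken up. It does nothing about the double-click selection itself.

```python
import re

# A comma or semicolon immediately followed by a letter.
# [^\W\d_] matches a word character that is neither a digit nor an underscore.
_MISSING_SPACE = re.compile(r"([,;])(?=[^\W\d_])")


def restore_spaces(text):
    """Insert a space after a comma or semicolon that is glued to the next word."""
    return _MISSING_SPACE.sub(r"\1 ", text)


print(restore_spaces("council,school,"))  # council, school,
print(restore_spaces("1,000 pupils"))     # 1,000 pupils
```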
But this particular incident shows that, unfortunately, the basic premise of the PDF file format, which is to provide a document format that can be read and used by everyone on all platforms with free reader applications, has yet to be fully realized. And Preview, as a PDF reader application, does have a number of annoying and unexplained issues with certain PDF files that Adobe’s applications don’t seem to have.