Pasting accented text from Preview to Word: How to get the accented characters right

Posted by Pierre Igot in: Macintosh
September 21st, 2004 • 10:55 pm

Copying and pasting text from a PDF file into a Word document has never been a very reliable proposition. The text selection tool in PDF viewers tends to be pretty poor, and then there can be all kinds of formatting issues.

With Mac OS X’s Preview, there can be an additional difficulty when attempting to copy French text which contains accented characters. Compare the following two lines in Word:

[Image: Accented characters from Preview]

As you can see, there is a slight problem with the acute accent on the “e” in “dépend” in the first line: the accent sits way too high above the letter. That line was copied from a PDF file in Preview and pasted directly into a Word document. Evidently, Word’s Unicode support does not go as far as rendering text pasted from Preview properly.

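Yesterday, I mentioned a hint at Mac OS X Hints that explains what’s going on here. In a nutshell, it’s a composed vs. decomposed Unicode issue: the text copied from Preview stores the accent as a separate character following the letter, rather than as part of a single accented character.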
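To make the distinction concrete, here is a minimal Python sketch. It is not part of the workflow described in this post; the sample word and the helper name `describe` are my own choices for illustration. It uses the standard `unicodedata` module to list the code points behind the two spellings of “dépend”.

```python
import unicodedata

def describe(text):
    """Return (code point, character name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Composed spelling: "é" is one character, U+00E9.
composed = "d\u00e9pend"
# Decomposed spelling: a plain "e" followed by U+0301, a combining acute accent.
decomposed = "de\u0301pend"

for label, text in (("composed", composed), ("decomposed", decomposed)):
    print(label, len(text))
    for code_point, name in describe(text):
        print("   ", code_point, name)

# The two strings look the same on screen but differ code point by code point.
print(composed == decomposed)
```

The composed string is six characters long and the decomposed one is seven, which is why an application that does not position combining accents properly ends up drawing the accent in the wrong place.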

In his comment on yesterday’s blog entry, ssp was kind enough to refer me to his own UnicodeChecker utility, indicating that it might provide a solution.

I am glad to report that it does. After launching UnicodeChecker, you just invoke the “New Utility Window” command in the “File” menu, and then click on the “Normalize” icon. Then you paste the text copied from a PDF file in Preview into the “Input” field, and choose the “Normalize” option in the menu underneath the field.

The “Normalize” option provides four different fields with four different copies of your text. The important thing to know is that the fields ending in “C” (“NFC” and “NFKC”) use composed Unicode, in which the accent is part of the accented letter character itself rather than a separate entity.
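For the curious, the same four normalization forms can be reproduced outside UnicodeChecker. The following is only a sketch, assuming the pasted text behaves like the hypothetical sample string below; `normalize_all` is a helper name I made up, while `unicodedata.normalize` is the standard Python library call.

```python
import unicodedata

FORMS = ("NFD", "NFC", "NFKD", "NFKC")

def normalize_all(text):
    """Return a dict mapping each normalization form name to the normalized text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}

# Hypothetical sample standing in for text copied from a PDF in Preview:
# "e" followed by a separate combining acute accent (U+0301).
pasted = "de\u0301pend"

results = normalize_all(pasted)
for form in FORMS:
    text = results[form]
    # The "C" forms come out one character shorter: the accent has been
    # merged into the letter.
    print(form, len(text), [f"U+{ord(ch):04X}" for ch in text])
```

For a simple accented word like this one, NFC and NFKC give the same result. NFKC additionally replaces compatibility characters such as the “ﬁ” ligature with their plain equivalents, so NFC is the more conservative choice if you want the rest of the text left alone.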

After that, you just need to copy the text from one of these “C” fields and paste it into Word, and you’ll get traditional accented characters that Word fully supports.

Thanks again to ssp.

