Character encoding in e-mail: having to deal with GroupWise crap in 2004
Posted by Pierre Igot in: Technology • December 9th, 2004 • 8:41 am
We are in 2004 and, unbelievably, I am still experiencing problems with accented characters in e-mail messages. This is the kind of issue that should have been fixed long ago, but due to lousy software and poor compliance with universal standards, it’s still with us today.
I work for our provincial government here and they all use (as far as I can tell) this horrible GroupWise thingy sold by Novell as some kind of “integrated solution”. I am sure it helps make all our government employees super efficient in their work. But of course, GroupWise is primarily an English-language product, designed by an American company, and used by millions of English-speaking users… In all likelihood, they never experience any major problems with character encoding.
I, on the other hand, am a translator, and constantly have to switch between English and French, including in my e-mail correspondence with these GroupWise users. And sure enough, while I tend to have very few character encoding problems with other people that I work for, I regularly encounter character encoding problems with these people.
It also does not help that I am a Mac user, whereas they are all running GroupWise under Windows. The default Windows character set (Windows-1252) is essentially ISO-Latin1 (ISO-8859-1) with a few extra characters, so every accented letter in ISO-Latin1 has the same code in both. The Mac Roman character set, on the other hand, uses different codes for accented characters, which requires additional translation. What this means is that, when the character set translation between two Windows users doesn’t work properly, they might not notice it, because there is enough in common between ISO-Latin1 and the default Windows character set to make it look as if it were working fine.
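To make the Latin-1/Windows point concrete, here is a small Python sketch that prints the byte each character set assigns to some common French letters. Python and its built-in `latin-1`, `cp1252` and `mac_roman` codecs are simply our choice of illustration, not anything GroupWise or Mail uses. Assuming those codec tables match the real character sets, ISO-8859-1 and Windows-1252 agree on every one of these letters, while Mac Roman puts all of them somewhere else.

```python
# Compare how a few common French letters are stored as single bytes in
# ISO-8859-1 (ISO-Latin1), Windows-1252 (the default Windows character set)
# and Mac Roman, using the codecs that ship with Python.
letters = "éèêëàâçîïôùû"

codes = {
    ch: (ch.encode("latin-1"), ch.encode("cp1252"), ch.encode("mac_roman"))
    for ch in letters
}

for ch, (latin1, cp1252, mac) in codes.items():
    print(f"{ch}  latin-1={latin1.hex()}  cp1252={cp1252.hex()}  mac_roman={mac.hex()}")

# Windows-1252 only differs from ISO-8859-1 in the 0x80-0x9F range; the upper
# half (0xA0-0xFF), where these accented letters live, decodes identically.
upper_half = bytes(range(0xA0, 0x100))
same_upper_half = upper_half.decode("latin-1") == upper_half.decode("cp1252")
print("0xA0-0xFF identical in latin-1 and cp1252:", same_upper_half)
```

This is why a mislabelled message between two Windows machines usually looks fine, while the same mistake between Windows and a Mac shows up immediately.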
When it doesn’t work properly between a Windows GroupWise user and a Mac user like me, on the other hand, it is immediately noticeable, with common French characters such as “é” and “à” getting screwed up right away.
Now, things used to be worse with Eudora. When I was using Eudora, the accented characters in messages to GroupWise users would often simply disappear altogether. It’s one thing to get “protÈgÈ” instead of “protégé”. It’s quite another to get “protg”.
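The “protÈgÈ” symptom can be reproduced in a couple of lines of Python, under the assumption that the garbling happens when ISO-8859-1 bytes are displayed with the Mac Roman table. The second case is only a guess: silently dropping every non-ASCII character is one way to end up with “protg”, not necessarily what Eudora or GroupWise actually did.

```python
word = "protégé"

# The sender stores the word as ISO-8859-1 bytes: é is the single byte 0xE9.
sent = word.encode("latin-1")

# A receiver that (wrongly) reads those bytes with the Mac Roman table maps
# 0xE9 to "È", which produces the familiar "protÈgÈ".
misread = sent.decode("mac_roman")

# Hypothetical: software that silently drops every non-ASCII character
# produces "protg" instead.
stripped = word.encode("ascii", errors="ignore").decode("ascii")

print(sent, misread, stripped)
```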
At least this no longer happens with Mac OS X’s Mail. Still, when you scratch the surface, you quickly find out that this particular problem is probably a combination of GroupWise idiocy and overzealous Mail behaviour.
For some reason, even after all these years, Mail still uses the dreaded quoted-printable encoding scheme to encode accented characters, even when it includes the proper ISO-Latin1 tags in the e-mail headers that it sends out. And for some reason, it’s still impossible to turn this quoted-printable encoding scheme off, and to force Mail to use another character set (such as Unicode) by default for e-mail composing. Mail always reverts to the default “ ” for text encoding, and this “ ” setting appears to always involve the use of quoted-printable.
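Quoted-printable on its own does not lose anything; it only spells out 8-bit bytes in plain ASCII. The sketch below uses Python’s standard `quopri` module (an illustration, not Mail’s own code) to show the two separate steps, charset first and transfer encoding second, and that the receiver gets the text back intact if it undoes both.

```python
import quopri

text = "protégé"

# Step 1: the charset (ISO-8859-1) turns characters into bytes.
latin1_bytes = text.encode("iso-8859-1")

# Step 2: quoted-printable wraps those bytes in 7-bit-safe ASCII:
# every byte above 127 becomes "=" followed by two hex digits.
qp = quopri.encodestring(latin1_bytes)

# The receiver undoes both steps in reverse order and gets the text back.
restored = quopri.decodestring(qp).decode("iso-8859-1")

print(qp, restored)
```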
At the other end of the line, though, GroupWise should be able to handle quoted-printable just fine, shouldn’t it? And ISO-Latin1 as well? Well, in my experience, it doesn’t. And since it handles it fine sometimes and not at other times, I suspect it’s either a bug in the product, or a setting that is not adjusted properly by default and has only been adjusted properly on some people’s machines, and not on others.
Today, for example, I was having a discussion about translation issues with two different English-speaking employees from the government. My text with accented characters, when quoted in one person’s e-mails, looked fine. In the other person’s e-mail, the accented characters in my quoted text were all replaced with… an asterisk.
I took a closer look at the raw headers of both messages: they both came from the same mail server, and both included this line in their headers:
X-Mailer: Novell GroupWise Internet Agent 6.5.2
To me, this indicates that they are both using the same version of the software. In one message, however, the header also included:
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
which is the same header information that Mail uses. In the other e-mail, there was only this:
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="=__Part5777A36E.1__="
Guess which one had the accented characters preserved properly?
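One way to see the difference between the two messages is to feed their top-level headers to Python’s standard `email` package. This is a sketch: the one-line body of the first message is hypothetical, and the parts of the second message are left out because only its top-level headers are shown above. In a multipart/alternative message, the charset would normally be declared separately in each part’s own headers, which we cannot see here.

```python
from email import message_from_string

# Top-level headers of the first message, plus a hypothetical one-line body.
first = message_from_string(
    "Mime-Version: 1.0\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "prot=E9g=E9\n"
)

# Top-level headers of the second message only; its parts are not shown.
second = message_from_string(
    "Mime-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="=__Part5777A36E.1__="\n'
    "\n"
)

# First message: the charset is declared right here, so the body can be
# decoded (quoted-printable first, then ISO-8859-1).
first_charset = first.get_content_charset()
first_text = first.get_payload(decode=True).decode(first_charset)

# Second message: no charset at the top level; any charset would have to be
# declared in the headers of each alternative part.
second_type = second.get_content_type()
second_charset = second.get_content_charset()

print(first_charset, repr(first_text))
print(second_type, second_charset)
```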
Now, I am pretty sure that the author of that second message did not put this header information there deliberately. So obviously there’s some configuration problem with his particular set-up. But of course, these are people working in an environment 300 km from here, and I am not supposed to do their technical trouble-shooting for them. They have their own technical support staff! (Needless to say, I am my own technical support staff, and a few other people’s as well…)
So here we are, in 2004, still struggling with stupid character encoding issues when we really shouldn’t have to. The thing is that, in a minority environment such as ours, it’s not a deal-breaker issue. Most English-speaking people here are used to accented characters being mangled by their computers. So I guess they just don’t bother trying to fix the issue. What else can I do? I’m in a double minority: French-speaking and Mac-using. It’s pretty much hopeless.
December 10th, 2004 | 5:29 am
Thanks for clarifying the role of quoted-printable. It’s kind of sad that there are still obsolete mail servers out there that can’t even handle 8bit messages.
In this case, however, I still suspect that the crappy part is GroupWise, either in its default configuration or in whatever config the person in question is using. I can’t really tell him to “get a decent client”, as GroupWise is the kind of thing that’s deployed department-wide by IT people and people have no choice but to use it.
December 10th, 2004 | 6:09 am
There is a way to set the outgoing mail encoding, but it is not documented:
defaults write com.apple.mail NSPreferredMailCharset name-of-charset
where name-of-charset should, I guess, be UTF-8 or UTF8 in your case.
December 10th, 2004 | 3:58 am
Don’t mix up charset and transfer encoding. The former is the encoding used to represent the characters, like ISO-8859-1 or UTF-8, and the latter is the mechanism used to transfer the bytes across the mail servers, like quoted-printable (which converts every byte above 127, as well as the “=” sign itself, to =XX in hex) or 8bit.
For example, I am using the UTF-8 charset with the quoted-printable transfer encoding (so an é is transmitted as =C3=A9), because some obsolete mail servers refuse 8bit messages. In the 8bit case, the message never arrives. With quoted-printable, the worst that can happen is that the recipient cannot read national characters, in which case I can always tell him to get a decent client; it’s easier than having his ISP change servers.
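The distinction between charset and transfer encoding can be checked with Python’s standard `quopri` module; this is an illustration, not what any particular mail client does internally. The sketch encodes the same é with two charsets and shows what travels over the wire with 8bit and with quoted-printable transfer encoding, including the =C3=A9 sequence mentioned above.

```python
import quopri

# The same character, é, under two charsets and two transfer encodings.
results = {}
for charset in ("iso-8859-1", "utf-8"):
    raw = "é".encode(charset)  # charset: characters -> bytes
    results[charset] = {
        "8bit": raw,  # bytes sent as-is (needs 8-bit-clean servers)
        "quoted-printable": quopri.encodestring(raw),  # bytes spelled out in ASCII
    }

for charset, encodings in results.items():
    print(charset, encodings)

# Quoted-printable also has to escape "=" itself, since "=" introduces an escape.
escaped_equals = quopri.encodestring(b"1+1=2")
print(escaped_equals)
```

Changing the charset changes which bytes represent é; changing the transfer encoding only changes how those same bytes are carried.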
December 10th, 2004 | 6:36 am
Cool. Thanks for the tip, Julik!