<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Unicode, WordPress, Panther Server and BBEdit: UTF-8 with or without BOM</title>
	<atom:link href="http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/</link>
	<description>Notes from an unfinished world…</description>
	<lastBuildDate>Thu, 15 Jul 2010 12:49:25 +0200</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: Character Encodings &#171; Mr Chimp Learns to Write</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-8558</link>
		<dc:creator>Character Encodings &#171; Mr Chimp Learns to Write</dc:creator>
		<pubDate>Wed, 09 Dec 2009 14:50:15 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-8558</guid>
		<description>[...] Here are some more links: Character Encoding Issues UTF-8 With or without BOM UTF/BOM [...]</description>
		<content:encoded><![CDATA[<p>[...] Here are some more links: Character Encoding Issues UTF-8 With or without BOM UTF/BOM [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Musings &#38; Meanderings &#187; Blog Archive &#187; Waylaid by the BOM in UTF8</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-7544</link>
		<dc:creator>Musings &#38; Meanderings &#187; Blog Archive &#187; Waylaid by the BOM in UTF8</dc:creator>
		<pubDate>Wed, 24 Oct 2007 17:53:31 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-7544</guid>
		<description>[...] Although I found many entries in help forums by webmasters waylaid by BOM, the only formal faq I’ve found on it is by Sun and Unicode. The Wikipedia entry refers to this being a problem with Unix and not Windows servers, and I’ve read that including the BOM in UTF-8 by default was one of those unilateral Microsoft decisions. Here also is a post by WordPress blogger Pierre, and a related issue post on translating character sets and collation in WordPress. [...]</description>
		<content:encoded><![CDATA[<p>[...] Although I found many entries in help forums by webmasters waylaid by BOM, the only formal faq I’ve found on it is by Sun and Unicode. The Wikipedia entry refers to this being a problem with Unix and not Windows servers, and I’ve read that including the BOM in UTF-8 by default was one of those unilateral Microsoft decisions. Here also is a post by WordPress blogger Pierre, and a related issue post on translating character sets and collation in WordPress. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Zach</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-1790</link>
		<dc:creator>Zach</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-1790</guid>
		<description>Agreed - the options should be &quot;UTF-8&quot; and &quot;UTF-8, with BOM&quot;. I wish they would have just completely left out the option for a BOM for UTF-8, cause it&#039;s caused me trouble in the past too. </description>
		<content:encoded><![CDATA[<p>Agreed &#8211; the options should be &#8220;UTF-8&#8221; and &#8220;UTF-8, with BOM&#8221;. I wish they would have just completely left out the option for a BOM for UTF-8, cause it&#8217;s caused me trouble in the past too.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pierre Igot</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-1791</link>
		<dc:creator>Pierre Igot</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-1791</guid>
		<description>Olivier and Zach: Thanks for the clarifications. I did suspect something like this, but don&#039;t know enough about PHP to understand the specifics. Still, the BOM doesn&#039;t seem to bother PHP in root level files…

I think what happens here is that the BOM doesn&#039;t seem to bother PHP for root level files, but it bothers PHP if it&#039;s included in the &quot;header&quot; (beginning) of an &quot;include&quot; file, because obviously an include file is actually inserted inside an existing file when building the dynamic pages. So we end up with a page that contains a BOM somewhere in the middle of it… and that doesn&#039;t work.

However, based on what you guys are saying, I should probably get rid of the BOM everywhere. Intuitively, it doesn&#039;t make much sense to have a BOM for UTF-8. I wonder why it even is an option.</description>
		<content:encoded><![CDATA[<p>Olivier and Zach: Thanks for the clarifications. I did suspect something like this, but don&#8217;t know enough about PHP to understand the specifics. Still, the BOM doesn&#8217;t seem to bother PHP in root level files…</p>
<p>I think what happens here is that the BOM doesn&#8217;t seem to bother PHP for root level files, but it bothers PHP if it&#8217;s included in the &#8220;header&#8221; (beginning) of an &#8220;include&#8221; file, because obviously an include file is actually inserted inside an existing file when building the dynamic pages. So we end up with a page that contains a BOM somewhere in the middle of it… and that doesn&#8217;t work.</p>
<p>However, based on what you guys are saying, I should probably get rid of the BOM everywhere. Intuitively, it doesn&#8217;t make much sense to have a BOM for UTF-8. I wonder why it even is an option.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Olivier</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-1792</link>
		<dc:creator>Olivier</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-1792</guid>
		<description>A quick guess:

The BOM (Byte Order Mark) is a single invisible character, U+FEFF, at the very beginning of the file. In UTF-16 it takes two bytes and is used to distinguish between big endian and little endian byte order. There is also a BOM for UTF-8 (three bytes), even though there is no byte order issue with it.

When a PHP script uses the header function to send special HTTP headers (like a graphical counter that sends a Content-Type: image/png header), it must happen before any content is produced. Therefore, the PHP opening tag must be the very first characters in the file. When there&#039;s a BOM, the invisible bytes composing it are treated as content and the header function doesn&#039;t work anymore.

I remember reading that using a BOM with UTF-8 is not recommended, for this reason among others. I agree it&#039;s not clear in BBEdit that UTF-8 no BOM is actually the &quot;standard&quot; encoding and UTF-8 tout court is the special one.</description>
		<content:encoded><![CDATA[<p>A quick guess:</p>
<p>The BOM (Byte Order Mark) is a single invisible character, U+FEFF, at the very beginning of the file. In UTF-16 it takes two bytes and is used to distinguish between big endian and little endian byte order. There is also a BOM for UTF-8 (three bytes), even though there is no byte order issue with it.</p>
<p>When a PHP script uses the header function to send special HTTP headers (like a graphical counter that sends a Content-Type: image/png header), it must happen before any content is produced. Therefore, the PHP opening tag must be the very first characters in the file. When there&#8217;s a BOM, the invisible bytes composing it are treated as content and the header function doesn&#8217;t work anymore.</p>
<p>I remember reading that using a BOM with UTF-8 is not recommended, for this reason among others. I agree it&#8217;s not clear in BBEdit that UTF-8 no BOM is actually the &#8220;standard&#8221; encoding and UTF-8 tout court is the special one.</p>
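<p>As a rough illustration of the byte sequences discussed above, here is a minimal Python sketch that checks which BOM, if any, starts a run of bytes. The helper name <code>sniff_bom</code> is hypothetical, and the sketch deliberately ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16 one.</p>

```python
import codecs

# Byte-order marks as defined in Python's standard codecs module.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]

def sniff_bom(data):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

print(sniff_bom("Hi!".encode("utf-8-sig")))                        # ('utf-8', 3)
print(sniff_bom(codecs.BOM_UTF16_BE + "Hi!".encode("utf-16-be")))  # ('utf-16-be', 2)
print(sniff_bom(b"Hi!"))                                           # (None, 0)
```

<p>Only the two UTF-16 marks say anything about byte order; the UTF-8 mark merely signals the encoding.</p>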
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike P.</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-1795</link>
		<dc:creator>Mike P.</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-1795</guid>
		<description>Yeah,

I&#039;ve had BOM problems when coding up PHP with Dreamweaver and utf-8, and found that Topstyle works well for hunting them down and getting rid of them. 

Maybe there&#039;s a better way in DW… but this works..

</description>
		<content:encoded><![CDATA[<p>Yeah,</p>
<p>I&#8217;ve had BOM problems when coding up PHP with Dreamweaver and utf-8, and found that Topstyle works well for hunting them down and getting rid of them. </p>
<p>Maybe there&#8217;s a better way in DW… but this works..</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ssp</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-1797</link>
		<dc:creator>ssp</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-1797</guid>
		<description>Huh, this is strange. I was under the impression that no BOM is needed in UTF-8 (as UTF-8 is, well, made up of 8-bit pieces). The only case where it might be needed is to give programs a hint that a file is Unicode. But I thought that&#039;s more of a dirty trick and not necessary if the software is otherwise informed of that fact. 

Being a traditional BBEdit &#039;disliker&#039;, I&#039;ll take it as another hint at that application&#039;s particular &#039;quality&#039;. ;)

Or take it from the horse&#039;s mouth: &lt;a href=&quot;http://www.unicode.org/faq/utf_bom.html&quot; title=&quot;http://www.unicode.org/faq/utf_bom.html&quot;&gt;http://www.unicode.org/faq/utf_bom.html&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>Huh, this is strange. I was under the impression that no BOM is needed in UTF-8 (as UTF-8 is, well, made up of 8-bit pieces). The only case where it might be needed is to give programs a hint that a file is Unicode. But I thought that&#8217;s more of a dirty trick and not necessary if the software is otherwise informed of that fact. </p>
<p>Being a traditional BBEdit &#8216;disliker&#8217;, I&#8217;ll take it as another hint at that application&#8217;s particular &#8216;quality&#8217;. ;)</p>
<p>Or take it from the horse&#8217;s mouth: <a href="http://www.unicode.org/faq/utf_bom.html" title="http://www.unicode.org/faq/utf_bom.html">http://www.unicode.org/faq/utf_bom.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Olivier</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-1802</link>
		<dc:creator>Olivier</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-1802</guid>
		<description>ssp: BBEdit has many ways of figuring out the encoding of text files when it first encounters them. It supports a number of methods that are likely to be used by different people using different programs, and I&#039;m glad that it does not limit itself to the ones *you* like to use. A BOM in a UTF-8 file can exist (and is legal, according to unicode.org) so BBEdit is able to read it when it encounters it. Reciprocally, it is capable of writing in any character set it reads and does not play cop by telling you how you should save your documents. If you like simpler applications that take you by the hand using a wizard whenever you save a document, fine, but it does not mean that a more powerful tool is necessarily of lower &#039;quality&#039;.

Pierre: I made a quick test with BOMs and PHP. The problem is indeed with the header() function. If you put anything before the php block in the file (a blank line, some text), PHP fails with the error you mentioned. If the file is a BOMmed UTF-8, PHP treats the BOM as text and fails the same way. It does not seem to like any flavour of UTF-16, though.</description>
		<content:encoded><![CDATA[<p>ssp: BBEdit has many ways of figuring out the encoding of text files when it first encounters them. It supports a number of methods that are likely to be used by different people using different programs, and I&#8217;m glad that it does not limit itself to the ones *you* like to use. A BOM in a UTF-8 file can exist (and is legal, according to unicode.org) so BBEdit is able to read it when it encounters it. Reciprocally, it is capable of writing in any character set it reads and does not play cop by telling you how you should save your documents. If you like simpler applications that take you by the hand using a wizard whenever you save a document, fine, but it does not mean that a more powerful tool is necessarily of lower &#8216;quality&#8217;.</p>
<p>Pierre: I made a quick test with BOMs and PHP. The problem is indeed with the header() function. If you put anything before the php block in the file (a blank line, some text), PHP fails with the error you mentioned. If the file is a BOMmed UTF-8, PHP treats the BOM as text and fails the same way. It does not seem to like any flavour of UTF-16, though.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pierre Igot</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-1804</link>
		<dc:creator>Pierre Igot</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-1804</guid>
		<description>Thanks to everyone for their input. It does look like the BOM can be an issue with various tools, and not just PHP scripts. I guess it&#039;ll still be years before the use of Unicode is so prevalent that most tools support it &quot;transparently&quot;. Maybe what the BBEdit developers could do is alter their UI a bit so that the user is made aware of the fact that the presence of a BOM in a UTF-8 file can cause problems with various tools.</description>
		<content:encoded><![CDATA[<p>Thanks to everyone for their input. It does look like the BOM can be an issue with various tools, and not just PHP scripts. I guess it&#8217;ll still be years before the use of Unicode is so prevalent that most tools support it &#8220;transparently&#8221;. Maybe what the BBEdit developers could do is alter their UI a bit so that the user is made aware of the fact that the presence of a BOM in a UTF-8 file can cause problems with various tools.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lachlan Hunt</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-1806</link>
		<dc:creator>Lachlan Hunt</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-1806</guid>
		<description>I had this exact problem when I started learning PHP. The problem is caused because PHP begins the content from the first non-whitespace character that is not part of any PHP code. The UTF-8 BOM, U+FEFF, is represented by the octets: 0xEF 0xBB 0xBF. Thus, when you save a file as UTF-8 with a BOM, the first few characters of the file will look like this, if each octet is interpreted as one character, as in ISO-8859-1:

&lt;pre&gt;ï»¿&lt;?php&lt;/pre&gt;

Since none of those octets represent white space, PHP assumes it is the beginning of the content, sends out all the default HTTP headers, and begins the content with those bytes. So, when you try to send out additional HTTP headers, the content has already begun, so it&#039;s too late - you can&#039;t bring back what you&#039;ve already sent.

When I found this out, I read in a forum somewhere, when I was looking for a solution, that this problem either has been, or would be, fixed in PHP 5. But I can&#039;t find that forum now, so I can&#039;t be certain. I just know that my ISP has an old version, and I was forced to locate an editor for Windows that allowed me to choose not to output the BOM.</description>
		<content:encoded><![CDATA[<p>I had this exact problem when I started learning PHP. The problem is caused because PHP begins the content from the first non-whitespace character that is not part of any PHP code. The UTF-8 BOM, U+FEFF, is represented by the octets: 0xEF 0xBB 0xBF. Thus, when you save a file as UTF-8 with a BOM, the first few characters of the file will look like this, if each octet is interpreted as one character, as in ISO-8859-1:</p>
<pre>ï»¿&lt;?php</pre>
<p>Since none of those octets represent white space, PHP assumes it is the beginning of the content, sends out all the default HTTP headers, and begins the content with those bytes. So, when you try to send out additional HTTP headers, the content has already begun, so it&#8217;s too late &#8211; you can&#8217;t bring back what you&#8217;ve already sent.</p>
<p>When I found this out, I read in a forum somewhere, when I was looking for a solution, that this problem either has been, or would be, fixed in PHP 5. But I can&#8217;t find that forum now, so I can&#8217;t be certain. I just know that my ISP has an old version, and I was forced to locate an editor for Windows that allowed me to choose not to output the BOM.</p>
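<p>For anyone without an editor that can drop the BOM, the three octets can also be removed after the fact. The following is only a sketch in Python, not something from the PHP toolchain; the function name <code>strip_utf8_bom</code> and the sample bytes are made up for illustration.</p>

```python
import codecs

def strip_utf8_bom(data):
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Hypothetical file contents: a BOM followed by ordinary text.
sample = codecs.BOM_UTF8 + b"Hi!"

# Read as ISO-8859-1, each BOM octet becomes one visible character (ï»¿).
mojibake = sample.decode("iso-8859-1")
print(mojibake == "\u00ef\u00bb\u00bfHi!")  # True

print(strip_utf8_bom(sample))   # b'Hi!'
print(strip_utf8_bom(b"Hi!"))   # b'Hi!' (unchanged when there is no BOM)
```

<p>Applied to the raw bytes of a script before it is written back to disk, this leaves the opening tag as the very first bytes of the file.</p>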
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Ingraham</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-1816</link>
		<dc:creator>Paul Ingraham</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-1816</guid>
		<description>A couple more relevant notes…

Here&#039;s an excerpt from BBEdit&#039;s manual which may help to further clarify the issue:

&lt;blockquote&gt;no BOM: When saving Unicode files, you should always include a byte-order mark (BOM) so that the reading application knows what byte order the file&#039;s data is in. For maximum compatibility, the BOM should be used whenever possible. &lt;b&gt;Use one of the &quot;no BOM&quot; options only if there is a specific reason to do so, such as providing compatibility with software that malfunctions when a BOM is present.&lt;/b&gt; (For purposes of recognition when you use this option, the UTF-16 BOM is FEFF, and the UTF-8 BOM is EFBBBF.) &lt;/blockquote&gt;

After reading this, I thought, &quot;Okay, sure thing, UTF-8 with a BOM for me, I sure am probably not one of those users with a &#039;specific reason&#039; to use the &#039;no BOM&#039; option.&quot;  So far so good, although from the sounds of the comments here I have probably come within a hair&#039;s breadth of running afoul of the BOM/PHP conflict that y&#039;all have been discussing.  Maybe I &lt;i&gt;am&lt;/i&gt; one of those users with specific reasons to use the no BOM option. :-)

I&#039;ve also discovered that PHP doesn&#039;t like UTF-16.  Lord knows why, I naively experimented with encoding some files as UTF-16, which broke rendering of PHP includes in those pages. Not really surprising in retrospect.</description>
		<content:encoded><![CDATA[<p>A couple more relevant notes…</p>
<p>Here&#8217;s an excerpt from BBEdit&#8217;s manual which may help to further clarify the issue:</p>
<blockquote><p>no BOM: When saving Unicode files, you should always include a byte-order mark (BOM) so that the reading application knows what byte order the file&#8217;s data is in. For maximum compatibility, the BOM should be used whenever possible. <b>Use one of the &#8220;no BOM&#8221; options only if there is a specific reason to do so, such as providing compatibility with software that malfunctions when a BOM is present.</b> (For purposes of recognition when you use this option, the UTF-16 BOM is FEFF, and the UTF-8 BOM is EFBBBF.) </p></blockquote>
<p>After reading this, I thought, &#8220;Okay, sure thing, UTF-8 with a BOM for me, I sure am probably not one of those users with a &#8216;specific reason&#8217; to use the &#8216;no BOM&#8217; option.&#8221;  So far so good, although from the sounds of the comments here I have probably come within a hair&#8217;s breadth of running afoul of the BOM/PHP conflict that y&#8217;all have been discussing.  Maybe I <i>am</i> one of those users with specific reasons to use the no BOM option. :-)</p>
<p>I&#8217;ve also discovered that PHP doesn&#8217;t like UTF-16.  Lord knows why, I naively experimented with encoding some files as UTF-16, which broke rendering of PHP includes in those pages. Not really surprising in retrospect.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Olivier</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-1818</link>
		<dc:creator>Olivier</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-1818</guid>
		<description>Surely PHP must have its own internal string encoding (UTF-8, UTF-16, what do I care as long as it can hold any character that exists in any language) and it only deals with other charsets on input and output. That is, it converts text to the internal encoding when it&#039;s read (the source encoding must be specified somehow, e.g. with the headers of a HTTP POST request) and also converts it to any encoding the developer sees fit when it&#039;s written somewhere (with a sensible encoding set by default). That&#039;s how software that has to deal with text data must be written. Then functions like strpos() must only be able to deal with strings encoded in PHP&#039;s internal charset.

As for a PHP script encoded in UTF-16, I see no reason why PHP fails to parse it. It has a BOM, PHP should recognise it and act accordingly. What&#039;s inside a PHP block should be converted to whatever encoding PHP likes to parse and the rest should be output as is.</description>
		<content:encoded><![CDATA[<p>Surely PHP must have its own internal string encoding (UTF-8, UTF-16, what do I care as long as it can hold any character that exists in any language) and it only deals with other charsets on input and output. That is, it converts text to the internal encoding when it&#8217;s read (the source encoding must be specified somehow, e.g. with the headers of a HTTP POST request) and also converts it to any encoding the developer sees fit when it&#8217;s written somewhere (with a sensible encoding set by default). That&#8217;s how software that has to deal with text data must be written. Then functions like strpos() must only be able to deal with strings encoded in PHP&#8217;s internal charset.</p>
<p>As for a PHP script encoded in UTF-16, I see no reason why PHP fails to parse it. It has a BOM, PHP should recognise it and act accordingly. What&#8217;s inside a PHP block should be converted to whatever encoding PHP likes to parse and the rest should be output as is.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Olivier</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-1820</link>
		<dc:creator>Olivier</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-1820</guid>
		<description>Why would (NULL)H(NULL)i(NULL)! be so confusing to PHP? All it has to do with it is send it as is to the output, it does not even need to parse it. It&#039;s not a big challenge, it&#039;s a matter of being aware that UTF-16 exists. Unfortunately, that seems to be the problem with many developers, those of PHP among them.

Unicode Ribbon Campaign - No ASCII, anywhere
&lt;a href=&quot;http://ithink.ch/unicode&quot; title=&quot;&lt;http://ithink.ch/unicode&gt;&quot;&gt;&lt;http://ithink.ch/unicode&gt;&lt;/a&gt;
</description>
		<content:encoded><![CDATA[<p>Why would (NULL)H(NULL)i(NULL)! be so confusing to PHP? All it has to do with it is send it as is to the output, it does not even need to parse it. It&#8217;s not a big challenge, it&#8217;s a matter of being aware that UTF-16 exists. Unfortunately, that seems to be the problem with many developers, those of PHP among them.</p>
<p>Unicode Ribbon Campaign &#8211; No ASCII, anywhere<br />
<a href="http://ithink.ch/unicode" title="&lt;http://ithink.ch/unicode&gt;">&lt;http://ithink.ch/unicode&gt;</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: J. King</title>
		<link>http://www.betalogue.com/2004/09/27/unicode-wordpress-panther-server-and-bbedit-utf-8-with-or-without-bom/comment-page-1/#comment-1828</link>
		<dc:creator>J. King</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=1278#comment-1828</guid>
		<description>&lt;blockquote&gt;I&#8217;ve also discovered that PHP doesn&#8217;t like UTF-16.  Lord knows why, I naively experimented with encoding some files as UTF-16, which broke rendering of PHP includes in those pages. Not really surprising in retrospect.&lt;/blockquote&gt;
Simply: because PHP isn&#039;t a Unicode application.  To PHP, a script encoded in UTF-16 looks something like this:
&lt;pre&gt;(NULL)H(NULL)i(NULL)!&lt;/pre&gt;
You can just imagine how confusing that would be.  PHP will work okay with UTF-8 because ASCII characters have the same octet values in UTF-8 as they do in ISO-8859-1.  If you try a sorting function, though, PHP will not see your é character as an accented letter, but as a garbled string of two nonsense characters (much like the three of the UTF-8 BOM).  UTF-8 is fine for most purposes in PHP (4), but it&#039;s certainly not fully supported, or even a good idea, to be honest.  PHP 5&#039;s default package contains some Unicode-compatible functions, but they don&#039;t cover everything, unfortunately.</description>
		<content:encoded><![CDATA[<blockquote><p>I&#8217;ve also discovered that PHP doesn&#8217;t like UTF-16.  Lord knows why, I naively experimented with encoding some files as UTF-16, which broke rendering of PHP includes in those pages. Not really surprising in retrospect.</p></blockquote>
<p>Simply: because PHP isn&#8217;t a Unicode application.  To PHP, a script encoded in UTF-16 looks something like this:</p>
<pre>(NULL)H(NULL)i(NULL)!</pre>
<p>You can just imagine how confusing that would be.  PHP will work okay with UTF-8 because ASCII characters have the same octet values in UTF-8 as they do in ISO-8859-1.  If you try a sorting function, though, PHP will not see your é character as an accented letter, but as a garbled string of two nonsense characters (much like the three of the UTF-8 BOM).  UTF-8 is fine for most purposes in PHP (4), but it&#8217;s certainly not fully supported, or even a good idea, to be honest.  PHP 5&#8217;s default package contains some Unicode-compatible functions, but they don&#8217;t cover everything, unfortunately.</p>
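<p>The byte-level picture can be checked outside PHP. This is a small Python sketch, offered only as an illustration of what a byte-oriented program sees; the sample strings are arbitrary.</p>

```python
# "Hi!" in big-endian UTF-16: every ASCII letter is preceded by a NUL byte.
utf16 = "Hi!".encode("utf-16-be")
print(utf16)  # b'\x00H\x00i\x00!'

# ASCII text has identical bytes in UTF-8 and ISO-8859-1.
same_bytes = "Hi!".encode("utf-8") == "Hi!".encode("iso-8859-1")
print(same_bytes)  # True

# A non-ASCII letter takes more than one byte in UTF-8, so a
# byte-oriented string function counts it as several characters.
e_acute = "\u00e9"
print(len(e_acute), len(e_acute.encode("utf-8")))  # 1 2
```

<p>A program that treats each byte as one character will therefore handle plain ASCII correctly in UTF-8 and miscount anything else.</p>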
]]></content:encoded>
	</item>
</channel>
</rss>
