November 7th, 2008 • 11:42 am
The official documentation in the WordPress Codex on restoring your database from a backup suffers from a shocking omission.
It fails to mention that, when your blog is encoded using the “utf8” Unicode encoding (which has been the default option in WordPress for a while now), you need to specify this with an additional argument in the mysql command used to restore the database.
In other words, instead of:
mysql --host=mysqlhostserver --user=mysqlusername --password=password databasename < blog.bak.sql
the command should actually be:
mysql --host=mysqlhostserver --user=mysqlusername --password=password --default_character_set=utf8 databasename < blog.bak.sql
Now, it could very well be that, with some configurations of MySQL, this is the default option anyway. But since most people have their blog hosted by a third-party provider and have no control over the configuration of its SQL servers, it seems to me that the WordPress codex page should at least mention that this might be a required option.
It was definitely a required option in my case, and I did not know, so when I first restored my blog from the backup, all the non-ASCII characters were screwed up. Since the whole process involved several steps (creating the backup, downloading, unzipping, editing, zipping, uploading and restoring), I could not be sure exactly where the problem lay.
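In hindsight, one way to narrow down where things went wrong would have been to check whether the backup file itself was still valid UTF-8 before restoring it. What follows is only a rough sketch, assuming the iconv tool is available on your machine; the is_valid_utf8 helper name is my own invention, and blog.bak.sql is simply the dump file from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the file decodes cleanly
# as UTF-8. iconv exits with a non-zero status when it meets an invalid
# byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Check the dump file given as the first argument (default: blog.bak.sql).
dump="${1:-blog.bak.sql}"
if [ -f "$dump" ]; then
    if is_valid_utf8 "$dump"; then
        echo "$dump: valid UTF-8, so the restore step is the likelier suspect"
    else
        echo "$dump: not valid UTF-8, so look at the backup or editing steps"
    fi
fi
```

Keep in mind that this is a coarse test: a file containing only ASCII, or one whose text was already mangled and then saved again as UTF-8, would also pass. A pass merely makes the earlier steps less likely to be the culprit.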
I suspected that it might have to do with the mysql command itself, but the MySQL documentation is not particularly user-friendly either, and I was afraid of having to spend countless more hours browsing and searching the hundreds of pages of documentation.
So I sent a plea for help to the people who had contacted me about my trouble with hackers, and I was very pleased to receive a quick reply from a friend in Sweden with exactly the information that I needed.
Sadly, WordPress, like so many other computer products, is designed by English-speaking Americans, and issues that primarily affect speakers of other languages tend to be treated with less care than other issues. That certainly seems to be the case with the utf8 character encoding in the database restoration process.