WordPress: Specifying charset when restoring database from backup

Posted by Pierre Igot in: Blogging
November 7th, 2008 • 11:42 am

The official documentation provided by the WordPress codex for the process of restoring your database from a backup suffers from a shocking omission.

It fails to specify that, when your blog is encoded using “utf8” Unicode encoding (which has been the default option in WordPress for a while now), you need to say so explicitly by adding an extra argument to the mysql command used to restore the database.

In other words, instead of:

mysql --host=mysqlhostserver --user=mysqlusername --password=password databasename < blog.bak.sql

the command should actually be:

mysql --host=mysqlhostserver --user=mysqlusername --password=password --default_character_set=utf8 databasename < blog.bak.sql
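Whether the extra argument matters also depends on the dump file itself. Dumps written by mysqldump normally begin with a line such as /*!40101 SET NAMES utf8 */; which sets the connection character set during the restore. Dumps produced by other backup tools may lack that line, and then the mysql client's own default character set applies. The function below is a minimal sketch, using only grep, head and sed, that reports which character set (if any) a dump declares. The name dump_charset is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the character set named in the first
# "SET NAMES" statement of a SQL dump, or "none" if there is no such line.
dump_charset() {
    dump_file="$1"
    # Grab the first SET NAMES line, e.g. "/*!40101 SET NAMES utf8 */;"
    line=$(grep -i 'SET NAMES' "$dump_file" | head -n 1)
    if [ -z "$line" ]; then
        echo "none"
    else
        # Print only the word that follows "SET NAMES"
        echo "$line" | sed -n -E 's/.*[Ss][Ee][Tt] [Nn][Aa][Mm][Ee][Ss] ([A-Za-z0-9_]+).*/\1/p'
    fi
}

# Example:
#   dump_charset blog.bak.sql
# If this prints "none", pass the character set explicitly when restoring.
```

The check only reads the dump; it does not tell you what the server's defaults are, so passing the character set explicitly remains the safer habit.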

Now, it could very well be that, with some MySQL configurations, utf8 is already the default character set anyway. But since most people have their blog hosted by a third-party provider and have no control over the configuration of its MySQL servers, it seems to me that the WordPress codex page should at least mention that this option might be required.

It was definitely required in my case, and I did not know it, so when I first restored my blog from the backup, all the non-ASCII characters were garbled. Since there were several steps involved in the process (creating the backup, downloading, unzipping, editing, zipping, uploading and restoring it), I could not be sure exactly where the problem lay.
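When accented characters come out garbled after a restore, one way to narrow down which step is at fault is to check whether the dump file is still valid UTF-8 before it is fed to mysql. Here is a minimal sketch, assuming the iconv utility is installed (it usually is on Linux and macOS). The function name is_valid_utf8 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the file is valid UTF-8.
# Converting UTF-8 to UTF-8 fails with a non-zero status on the
# first invalid byte sequence; the output itself is discarded.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example:
#   if is_valid_utf8 blog.bak.sql; then
#       echo "dump is valid UTF-8; suspect the restore step"
#   else
#       echo "dump was damaged before the restore"
#   fi
```

A file that passes can still contain double-encoded text (for example "Ã©" where "é" was meant), because that is itself valid UTF-8. The check only rules out byte-level damage, such as an editor silently saving the file as Latin-1.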

I suspected that it might have to do with the mysql command itself, but the MySQL documentation is not particularly user-friendly either, and I was afraid of having to spend countless more hours browsing and searching the hundreds of pages of documentation.

So I sent a plea for help to the people who had contacted me about my trouble with hackers, and I was very pleased to receive a quick reply from a friend in Sweden with exactly the information that I needed.

Sadly, WordPress, like so many other computer products, is designed by English-speaking Americans and issues that primarily affect speakers of other languages tend to be treated with less care than other issues. It certainly seems to be the case with the utf8 character encoding in the database restoring process.

One Response to “WordPress: Specifying charset when restoring database from backup”

  1. henryn says:

    Glad to hear you got help on this and the UTF 8 mystery is solved.

    I feel very fortunate to have access, at an excellent price (nothing), to MySQL and many other software packages. I’m not surprised that the documentation isn’t very good: in my experience, the techies doing the design and implementation simply don’t have the skills to produce good documentation, or even to recognize its importance. Maybe that’s good, because if they were more broadly skilled, they wouldn’t do such a good job on the software itself.

    I find it ironic that most software documentation has “traditionally” been terrible, no matter how much or how little one pays for the software. Paradoxically, in the big picture, for desktop application software, I feel that the proof of implementation quality is that no user ever consults the documentation.

    There’s no one “answer”. But … I recently found a very useful bit of freeware for transcribers of audio/video material, with very poor documentation. I volunteered to help with the documentation.
