May 302009

This plugin is based on the post “Convert MySQL Tables to UTF-8” and an existing plugin by g30rg3_x. The reason I modified their code is that they will convert all tables in your database to the UTF-8 charset, but what we need is to convert WP tables, so I changed the code "SHOW TABLES" to "SHOW TABLES LIKE " . $table_prefix . "%", which will guarantee other tables could stay untouched. Besides, g30rg3_x’s purpose was to alter the charset of old WP databases to new UTF-8 databases, but in fact we also need to change the charset after we moved our DB to a new host when the charset is not UTF-8 by default. Judging from my experience, the default charset/collation for many web hosts is latin1/latin1_swedish_ci (I don’t know why), whereas popular web-buidling systems often use utf8/utf8_general_ci, thus we need to change the charset before all content could be normally displayed. Without PHP and SHOW TALBES / SHOW COLUMNS, we will need to write endless code to change all tables and all columns.

mysql> select collation('asdf'); # default collation
+-------------------+
| collation('asdf') |
+-------------------+
| latin1_swedish_ci |
+-------------------+
1 row in set (0.00 sec)
Download the UTF-8 DB Converter for WordPress

The critical part of this plugin is:

....
$sql2 = "ALTER TABLE $table DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci";
....
$sql4 = "ALTER TABLE $table CHANGE `$field_name` `$field_name` $field_type
         CHARACTER SET utf8 COLLATE utf8_bin";
....

I don’t think I need to describe the installation again, but I sould warn you again about possible data lost during the conversion. Do back up early please.

Related Posts

31 Responses to “Convert MySQL Database to UTF-8 in WordPress”

Comments (27) Pingbacks (4)
  1. r-sn.com says:

    thank you ..
    plugin nice ..
    but links i have not Convert to UTF-8 .. :x

  2. r-sn.com says:

    Example:
    % d8% a7% d8% aa% d8% b5% d9% 84 -% d8% a8% d9% 86% d8% a7
    I want to convert:
    اتصل بنا

    • Yihui Xie says:

      That’s not a UTF-8 problem. You should use urldecode() to decode the string to your (Arabian?) characters. For example:

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
      <?php
      echo urldecode("%D8%A7%D8%AA%D8%B5%D9%84+%D8%A8%D9%86%D8%A7");
      ?>
  3. charlesss says:

    Love the blog mate, keep up the good work – I’ll definitely recommend your blog to some friends of mine ^^

  4. Saevar says:

    Thank you :D but i have 1 question, i have sql file that was using iso-8859-1 for 2 years and then the user change the code to utf-8, will this repair and convert iso-8859-1 to utf8 ?

  5. Derek says:

    Worked perfectly. You just saved me a lot of time! Thanks so much!

    Derek

  6. meandi says:

    thank you very much for the plugin!

  7. Andrew says:

    Thank you so much for this plugin. I just applied it to my installation, and it worked perfectly with no troubles.

  8. WS says:

    Dude, you rock! I’ve been trying to convert my db, which was a mess of latin and utf8. I found the previous WP plug-in and it butchered my posts, clipping them after any non-latin character (I write in English, but use em-dashes like their going out of style).

    Your plugin worked without a hitch! You have no idea how grateful I am! Thanks again for fixing the previous version :)

  9. Thank you! Plugin works great, and I can finally use UTF8 on my website

  10. Ben says:

    Thanks for the great plugin. I’m getting an error. Do you have any idea what this means?

    wp_bdprss_items changed to UTF-8 successfully.
    ---- item_feed_url changed to UTF-8 successfully.
    SQL Error:
    Specified key was too long; max key length is 1000 bytes

  11. anthony says:

    I first noticed a character set encoding problem ( see “Victim?s parents to speak on bullycase” here: http://carmel-indiana.us/page/6/ ). What was supposed to be an apostrophe was displaying as a question mark instead. I installed and activated the plugin. It indicated that it had fixed the encoding. But I don’t see that the post has been changed. Is the plugin supposed to fix posts that were posted prior to installation of the plugin?

  12. Joan Piedra says:

    You sir, are a lifesaver. Thank you very much for this plugin.

  13. Laura says:

    When I ran the plug-in I got the following:

    wp_posts changed to UTF-8 successfully.
    SQL Error:
    Column ‘post_content’ cannot be part of FULLTEXT index

    Any clue as to what I can do about the error and get rid of the charset issue?

    Thanks much

  14. peter says:

    hello!

    i made an upgrade from 2.0.7 to 3.1 but i have problems it will not work
    http://www.apfel.at/tagesgedanken1/?p=1259
    from december on i copy the new articel new into it, so from
    http://www.apfel.at/tagesgedanken1/?p=1
    to
    http://www.apfel.at/tagesgedanken1/?p=2222
    the article have the problem, befor it was iso-8859-1

    could you help?

    thank you
    regards
    peter

  15. peter says:

    hello!

    thank you for your wonderfull work of the plugin

    i tested it but in my case it will not work
    from http://www.apfel.at/tagesgedanken1/?p=2224 to the last
    http://www.apfel.at/tagesgedanken1/?p=2269 is ok
    i put it after upgrade from 2.0.7 into 3.1 into

    befor the old one i have iso-8859-1 so i did the upgrade with your plugin
    from http://www.apfel.at/tagesgedanken1/?p=1
    to
    http://www.apfel.at/tagesgedanken1/?p=2222
    but it will not change, what should i do?

    regards
    peter

    • Yihui Xie says:

      Sorry but I don’t think my plugin can be of help in your case, because your original database was not encoded in UTF-8. This plugin does not translate the encoding (say, between ISO-8859-1 and UTF-8) — it just tells the database that the “correct” encoding should be UTF-8. In other words, it does not modify the data in your database.

  16. peter says:

    thank you very much,

    >>In other words, it does not modify the data in your database

    to you know how i could modify the database?

    regards
    peter

    • Yihui Xie says:

      This is a painful problem for me, and I’m not good at the encoding problems at MySQL level. Maybe the manual page could help: http://dev.mysql.com/doc/refman/5.0/en/charset-convert.html

      To avoid such problems, I just use UTF-8 for everything.

      I have a better solution, which might save you a little bit time. If you have a backup of your original database or website, you can export your blog posts in WordPress into an XML file; then you can use some text editors to convert the encoding to UTF-8 (this is much easier than manipulating MySQL if you are unfamiliar with MySQL); in the end you can import the UTF8-encoded XML file into your WordPress.

      • peter says:

        hi!!

        i go back to 2.0.7 activate the plugin but it did not work the error
        SQL Error:
        You have an error in your SQL syntax near ‘DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci’ at line 1

        what could i do to get it work if could give an an login if you want to help me

        thank you
        :)
        peter

      • Yihui Xie says:

        As I said, this plugin does not help you really convert the encoding of the data — it only tells the MySQL database about the real encoding. You need to use other approaches to complete the encoding conversion (the possible one has been given above).

  17. Vlad says:

    Hello,

    I have made one fix to your excellent plugin. On line 132, replace

    $sql4 = “ALTER TABLE $table CHANGE `$field_name` `$field_name` $field_type CHARACTER SET utf8 COLLATE utf8_bin”;

    with

    $sql4 = “ALTER TABLE $table CHANGE `$field_name` `$field_name` $field_type CHARACTER SET utf8 COLLATE utf8_general_ci”;

    This is so that when you are manually editing/viewing the field within phpMyAdmin, the text that is stored in the field will be human readable. Without this change, it appears as binary-encoded data, which is very inconvenient to use.

Leave a Reply

(required)

(required)

WWW.YIHUI.NAME XIE@YIHUI.NAME © 2007 - 2012 by Yihui Xie