The problem
Ok. I did everything it said in the previous article, and both my database and my site are working in UTF-8. Still, anything in Hebrew I enter into phpMyAdmin comes up as question marks on my site. If I enter data from the website, it comes up as gibberish in phpMyAdmin. When I change the character encoding in my browser from UTF-8 to Hebrew-ISO (ISO 8859-8) or Hebrew-Windows (Windows-1255), the data is displayed correctly. Which one is handling UTF-8 incorrectly: my website or phpMyAdmin?
The solution – set the correct database connection character set
The database connection character set, or “client encoding”, is the character set that the client uses, which does not have to be the same as the server encoding. For example, one can have a website that is rendered in Hebrew-ISO characters, while the database uses UTF-8. If the client encoding is set to “hebrew”, the SQL queries will use Hebrew-ISO characters, which will be translated into UTF-8 characters on the server.
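To make the translation concrete, here is a small sketch of what the same Hebrew word looks like in both encodings. It does not touch the database; it only uses PHP's iconv() to perform the kind of conversion the server does between the client encoding and the server encoding. The sample word and the variable names are illustrative choices, not part of the original setup.

```php
<?php
// The Hebrew word "shalom", written out as explicit UTF-8 bytes so the
// example does not depend on how this file itself is saved.
$utf8 = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

// Re-encode as ISO 8859-8, the character set MySQL calls "hebrew".
$hebrewIso = iconv('UTF-8', 'ISO-8859-8', $utf8);

// UTF-8 uses two bytes per Hebrew letter, ISO 8859-8 uses one.
echo bin2hex($utf8), "\n";      // d7a9d79cd795d79d
echo bin2hex($hebrewIso), "\n"; // f9ece5ed

// Converting back recovers the original text.
$roundTrip = iconv('ISO-8859-8', 'UTF-8', $hebrewIso);

// The ISO 8859-8 bytes are not valid UTF-8, so a page that treats them as
// UTF-8 cannot display them. preg_match() with the /u modifier returns
// false for an invalid UTF-8 subject, so this evaluates to false.
$looksLikeUtf8 = preg_match('//u', $hebrewIso) === 1;
var_dump($looksLikeUtf8);       // bool(false)
```

When the bytes are declared as one encoding but read as another, nothing converts them, and the result is the question marks and gibberish described above.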
In PHP, mysql_client_encoding() returns the current client encoding, and mysql_set_charset() changes it. The client encoding is set independently of the database's own character set, and by default it is “latin1”, which is why a UTF-8 database and a UTF-8 site can still exchange mangled text. The possible values are listed at the end of the article.
Hence, if you want the site to be displayed with the Hebrew-ISO character set, after you connect to the database, you should call the following command in PHP:
mysql_set_charset('hebrew');
If the site is displayed using the UTF-8 character set, then you can use the following command instead:
mysql_set_charset('utf8');
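The mysql_* functions used above were removed in PHP 7.0. On newer PHP versions the same step can be done with the mysqli extension, roughly as sketched below. The helper name connectWithCharset() and the connection details in the usage comment are hypothetical placeholders; mysqli::set_charset() and mysqli::character_set_name() are the mysqli counterparts of mysql_set_charset() and mysql_client_encoding().

```php
<?php
// Hypothetical helper: opens a mysqli connection and sets the client
// encoding right after connecting.
function connectWithCharset(
    string $host,
    string $user,
    string $password,
    string $database,
    string $charset
): mysqli {
    $db = new mysqli($host, $user, $password, $database);

    // Depending on the mysqli error mode, a failed connection either
    // throws on its own or is reported through connect_errno.
    if ($db->connect_errno) {
        throw new RuntimeException('Connection failed: ' . $db->connect_error);
    }

    // Set the client encoding, for example 'utf8' or 'hebrew'.
    if (!$db->set_charset($charset)) {
        throw new RuntimeException('Could not set client encoding to ' . $charset);
    }

    return $db;
}

// Usage (placeholder credentials, not run here):
// $db = connectWithCharset('localhost', 'user', 'secret', 'mydb', 'utf8');
// echo $db->character_set_name(), "\n"; // shows the client encoding now in use
```

Setting the encoding once, immediately after connecting, keeps every later query on the same character set.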
Valid values for the database connection character set
(from http://bugs.php.net/bug.php?id=45921; the same list can be obtained from your own server with the SQL statement “SHOW CHARACTER SET”)
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                    | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                 | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |      1 |
| macce    | Mac Central European        | macce_general_ci    |      1 |
| macroman | Mac West European           | macroman_general_ci |      1 |
| cp852    | DOS Central European        | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |
| cp1251   | Windows Cyrillic            | cp1251_general_ci   |      1 |
| cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic              | cp1257_general_ci   |      1 |
| binary   | Binary pseudo charset       | binary              |      1 |
| geostd8  | GEOSTD8 Georgian            | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese   | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |      3 |
+----------+-----------------------------+---------------------+--------+
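Since the list depends on the server version and build, it can be safer to ask your own server than to rely on a copied table. Below is a minimal sketch, assuming an already open mysqli connection; the helper name listCharsets() is hypothetical, and the 'Charset' key is the column name from the table header.

```php
<?php
// Hypothetical helper: returns the character set names reported by the
// server for SHOW CHARACTER SET.
function listCharsets(mysqli $db): array
{
    $names = [];

    $result = $db->query('SHOW CHARACTER SET');
    if ($result === false) {
        // Reached only when mysqli is not configured to throw on errors.
        throw new RuntimeException('Query failed: ' . $db->error);
    }

    // Each row has the columns shown in the table header.
    while ($row = $result->fetch_assoc()) {
        $names[] = $row['Charset'];
    }
    $result->free();

    return $names;
}

// Usage (requires a live connection, not run here):
// if (in_array('hebrew', listCharsets($db), true)) {
//     $db->set_charset('hebrew');
// }
```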