postgresql 字符集

来源:互联网 发布:珍宝猫粮怎么样 知乎 编辑:程序博客网 时间:2024/06/05 14:34

postgresql 字符集

postgresql数据库支持多种字符集,在配置字符集时要分清楚服务器与客户端的字符集,字符集不一致尽管有时能够发生转换,但带来的问题也很头疼。语言环境的配置也很重要。
服务器字符集<来自文档>:

代码
复制代码
Name    Description    Language    Server?    Bytes/Char    AliasesBIG5    Big Five    Traditional Chinese    No    1-2    WIN950, Windows950EUC_CN    Extended UNIX Code-CN    Simplified Chinese    Yes    1-3     EUC_JP    Extended UNIX Code-JP    Japanese    Yes    1-3     EUC_JIS_2004    Extended UNIX Code-JP, JIS X 0213    Japanese    Yes    1-3     EUC_KR    Extended UNIX Code-KR    Korean    Yes    1-3     EUC_TW    Extended UNIX Code-TW    Traditional Chinese, Taiwanese    Yes    1-3     GB18030    National Standard    Chinese    No    1-2     GBK    Extended National Standard    Simplified Chinese    No    1-2    WIN936, Windows936ISO_8859_5    ISO 8859-5, ECMA 113    Latin/Cyrillic    Yes    1     ISO_8859_6    ISO 8859-6, ECMA 114    Latin/Arabic    Yes    1     ISO_8859_7    ISO 8859-7, ECMA 118    Latin/Greek    Yes    1     ISO_8859_8    ISO 8859-8, ECMA 121    Latin/Hebrew    Yes    1     JOHAB    JOHAB    Korean (Hangul)    No    1-3     KOI8    KOI8-R(U)    Cyrillic    Yes    1    KOI8RLATIN1    ISO 8859-1, ECMA 94    Western European    Yes    1    ISO88591LATIN2    ISO 8859-2, ECMA 94    Central European    Yes    1    ISO88592LATIN3    ISO 8859-3, ECMA 94    South European    Yes    1    ISO88593LATIN4    ISO 8859-4, ECMA 94    North European    Yes    1    ISO88594LATIN5    ISO 8859-9, ECMA 128    Turkish    Yes    1    ISO88599LATIN6    ISO 8859-10, ECMA 144    Nordic    Yes    1    ISO885910LATIN7    ISO 8859-13    Baltic    Yes    1    ISO885913LATIN8    ISO 8859-14    Celtic    Yes    1    ISO885914LATIN9    ISO 8859-15    LATIN1 with Euro and accents    Yes    1    ISO885915LATIN10    ISO 8859-16, ASRO SR 14111    Romanian    Yes    1    ISO885916MULE_INTERNAL    Mule internal code    Multilingual Emacs    Yes    1-4     SJIS    Shift JIS    Japanese    No    1-2    Mskanji, ShiftJIS, WIN932, Windows932SHIFT_JIS_2004    Shift JIS, JIS X 0213    Japanese    No    1-2     SQL_ASCII    unspecified (see text)    any    Yes    1     UHC    Unified Hangul Code    Korean    No    1-2    WIN949, Windows949UTF8    Unicode, 8-bit    all    Yes    1-4    UnicodeWIN866    Windows CP866    Cyrillic    Yes    1    ALTWIN874    Windows CP874    Thai    Yes    1     WIN1250    Windows CP1250    Central European    Yes    1     WIN1251    Windows CP1251    Cyrillic    Yes    1    WINWIN1252    Windows CP1252    Western European    Yes    1     WIN1253    Windows CP1253    Greek    Yes    1     WIN1254    Windows CP1254    Turkish    Yes    1     WIN1255    Windows CP1255    Hebrew    Yes    1     WIN1256    Windows CP1256    Arabic    Yes    1     WIN1257    Windows CP1257    Baltic    Yes    1     WIN1258    Windows CP1258    Vietnamese    Yes    1    ABC, TCVN, TCVN5712, VSCII
复制代码

 

常用的简体中文字符集是UTF8和EUC_CN两种。
可自动转换字符集<来自文档>:

代码
复制代码
Server Character Set    Available Client Character SetsBIG5    not supported as a server encodingEUC_CN    EUC_CN, MULE_INTERNAL, UTF8EUC_JP    EUC_JP, MULE_INTERNAL, SJIS, UTF8EUC_KR    EUC_KR, MULE_INTERNAL, UTF8EUC_TW    EUC_TW, BIG5, MULE_INTERNAL, UTF8GB18030    not supported as a server encodingGBK    not supported as a server encodingISO_8859_5    ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN866, WIN1251ISO_8859_6    ISO_8859_6, UTF8ISO_8859_7    ISO_8859_7, UTF8ISO_8859_8    ISO_8859_8, UTF8JOHAB    JOHAB, UTF8KOI8    KOI8, ISO_8859_5, MULE_INTERNAL, UTF8, WIN866, WIN1251LATIN1    LATIN1, MULE_INTERNAL, UTF8LATIN2    LATIN2, MULE_INTERNAL, UTF8, WIN1250LATIN3    LATIN3, MULE_INTERNAL, UTF8LATIN4    LATIN4, MULE_INTERNAL, UTF8LATIN5    LATIN5, UTF8LATIN6    LATIN6, UTF8LATIN7    LATIN7, UTF8LATIN8    LATIN8, UTF8LATIN9    LATIN9, UTF8LATIN10    LATIN10, UTF8MULE_INTERNAL    MULE_INTERNAL, BIG5, EUC_CN, EUC_JP, EUC_KR, EUC_TW, ISO_8859_5, KOI8, LATIN1 to LATIN4, SJIS, WIN866, WIN1250, WIN1251SJIS    not supported as a server encodingSQL_ASCII    any (no conversion will be performed)UHC    not supported as a server encodingUTF8    all supported encodingsWIN866    WIN866, ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN1251WIN874    WIN874, UTF8WIN1250    WIN1250, LATIN2, MULE_INTERNAL, UTF8WIN1251    WIN1251, ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN866WIN1252    WIN1252, UTF8WIN1253    WIN1253, UTF8WIN1254    WIN1254, UTF8WIN1255    WIN1255, UTF8WIN1256    WIN1256, UTF8WIN1257    WIN1257, UTF8WIN1258    WIN1258, UTF8
复制代码

 

以下针对客户端与服务器字符集配置问题作几个小测试。

测试一:服务器、客户端、语言环境一致的情况

代码
复制代码
[postgre@iss3984 ~]$ echo $LANGen_US.UTF-8[postgre@iss3984 ~]$ psql daduxiongWelcome to psql 8.3.11 (server 8.3.10), the PostgreSQL interactive terminal.Type:  \copyright for distribution terms       \h for help with SQL commands       \? for help with psql commands       \g or terminate with semicolon to execute query       \q to quitdaduxiong=# \l       List of databases   Name    |  Owner  | Encoding-----------+---------+---------- daduxiong | postgre | UTF8 postgres  | postgre | UTF8 template0 | postgre | UTF8 template1 | postgre | UTF8(4 rows)daduxiong=# show client_encoding; client_encoding----------------- UTF8(1 row)daduxiong=# insert into t1 values (1,'中国');INSERT 0 1daduxiong=# select * from t1; id |          name         ----+------------------------  1 | 中国                 (1 row)
复制代码

 

服务器与客户端字符集相同,在数据录入时不发生字符集转换;因语言环境也相同所以展现不会出现乱码。

测试二:客户端与服务器、语言环境不一致的情况

代码
复制代码
daduxiong=# \encoding GBKdaduxiong=# show client_encoding; client_encoding----------------- GBK(1 row)daduxiong=# insert into t1 values (2,'日本');INSERT 0 1daduxiong=# select * from t1; id |          name          ----+-------------------------  1 | �й�                   2 | 日本                (2 rows)
复制代码

 

客户端与服务器的字符集不一致,在数据录入时将发生字符集转换;当前展现的第二条记录非乱码形式是因为客户端字符集为GBK,在UTF8下同样出现乱码,在使用时需要语言环境进行配置。

测试三:服务器与客户端、语言环境不一致的情况

代码
复制代码
daduxiong=# \q[postgre@iss3984 ~]$ LANG=GBK export LANG[postgre@iss3984 ~]$ echo $LANGGBK[postgre@iss3984 ~]$ psql daduxiongWelcome to psql 8.3.11 (server 8.3.10), the PostgreSQL interactive terminal.Type:  \copyright for distribution terms       \h for help with SQL commands       \? for help with psql commands       \g or terminate with semicolon to execute query       \q to quitdaduxiong=# show client_encoding; client_encoding----------------- UTF8(1 row)daduxiong=# \encoding GBKdaduxiong=# select * from t1; id |          name          ----+-------------------------  1 | �й�                   2 | 日本                (2 rows)daduxiong=# insert into t1 values (3,'美国');INSERT 0 1daduxiong=# select * from t1; id |          name          ----+-------------------------  1 | �й�                   2 | 日本                  3 | 美国                (3 rows)daduxiong=# \q
复制代码

 

客户端、语言环境均配置为GBK字符集,在当前环境下展现的为非乱码形式,数据录入时将发生字符集转换。

测试四:服务器与客户端、语言环境恢复一致的情况

代码
复制代码
[postgre@iss3984 ~]$ LANG=en_US.UTF8 export LANG[postgre@iss3984 ~]$ psql daduxiongWelcome to psql 8.3.11 (server 8.3.10), the PostgreSQL interactive terminal.Type:  \copyright for distribution terms       \h for help with SQL commands       \? for help with psql commands       \g or terminate with semicolon to execute query       \q to quitdaduxiong=# show client_encoding; client_encoding----------------- UTF8(1 row)daduxiong=# select * from t1; id |          name          ----+-------------------------  1 | 中国                   2 | 鏃ユ湰                  3 | 缇庡浗                (3 rows)daduxiong=# 
复制代码
通过恢复原始的字符集状态,所有环境均为UTF8字符集,此时发现经过字符集转换后的内容为乱码。