Hibernate JDBC Mysql Emoj

来源:互联网 发布:安徽网络电视台直播 编辑:程序博客网 时间:2024/04/30 06:15

在向Mysql中插入Emoj表情时,会出现错误,类似于:

ncorrect string value: '\xF0\x9F\x98\x81' for column 'XXXXXX' at row X;
这是由于编码的问题。
比如使用python的MySQLdb连接MySQL时默认的charset是latin1,需要自己指定charset=’utf8′,即使是在服务器端的init-connect=’SET NAMES utf8′,MySQLdb也会使用latin1覆盖该选项;
hibernate中可以这样修改:
session.doReturningWork(new ReturningWork<Object>() {    @Override    public Object execute(Connection conn) throws SQLException    {        try(Statement stmt = conn.createStatement()) {            stmt.executeQuery("SET NAMES utf8mb4");        }        return null;    }});


emoji表情与utf8mb4

关于emoji表情的话mysql的utf8是不支持,需要修改设置为utf8mb4,才能支持,详细emoji表情与utf8mb4的关系。MYSQL 5.5 之前, UTF8 编码只支持1-3个字节,只支持BMP这部分的unicode编码区, BMP是从哪到哪,到http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters这里看,基本就是0000~FFFF这一区。 从MYSQL5.5开始,可支持4个字节UTF编码utf8mb4,一个字符最多能有4字节,所以能支持更多的字符集。

utf8mb4 is a superset of utf8

utf8mb4兼容utf8,且比utf8能表示更多的字符。

客户端

jdbc的连接字符串不支持utf8mb4,这个这种方式来解决的,如果服务器端设置了character_set_server=utf8mb4,则客户端会自动将传过去的utf-8视作utf8mb4。

  • Connector/J did not support utf8mb4 for servers 5.5.2 and newer.

    Connector/J now auto-detects servers configured withcharacter_set_server=utf8mb4 or treats the Java encoding utf-8 passed using characterEncoding=...as utf8mb4 in the SET NAMES= calls it makes when establishing the connection. (Bug #54175)

其他的client端,比如php、python需要看下client是否支持,如果不能在连接字符串中指定的话,可以在获取连接之后,执行”set names utf8mb4″来解决这个问题;

因为utf8mb4是utf8的超集,理论上即使client修改字符集为utf8mb4,也会不会对已有的utf8编码读取产生任何问题。

服务端:

1  修改database,table,column字符集
# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)

2  修改my.ini(linux下为my.cnf)

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect=’SET NAMES utf8mb4′

重新启动Mysql,检查字符集
mysql> SHOW VARIABLES WHERE Variable_name LIKE ‘character_set_%’ OR Variable_name LIKE ‘collation%';
+————————–+——————–+
| Variable_name | Value |
+————————–+——————–+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+————————–+——————–+
rows in set (0.00 sec)

如果是用java连接的mysql,需要升级mysql-connector-java.jar至少到5.1.22

init-connect 选项不起作用的原因

init-connect=’SET NAMES utf8′
SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;
这三个选项应该配置的是服务器端的,而我们设置的character_set_server=utf8,默认这三个选项就是utf8的,因此这个指定我觉得没有作用。

Note that the content of init_connect is not executed for users that have the SUPER privilege. This is done so that an erroneous value for init_connect does not prevent all clients from connecting. For example, the value might contain a statement that has a syntax error, thus causing client connections to fail. Not executing init_connect for users that have the SUPER privilege enables them to open a connection and fix the init_connect value.

通过python进行测试
conn=MySQLdb.connect(host=’127.0.0.1′,user=’admin2′,passwd=”,db=’test’,charset=’gb2312′)

mysql> show grants for ‘admin2′@’127.0.0.1′;
+—————————————————————–+
| Grants for admin2@127.0.0.1 |
+—————————————————————–+
| GRANT USAGE ON *.* TO ‘admin2′@’127.0.0.1′ |
| GRANT SELECT, INSERT ON `test`.`test` TO ‘admin2′@’127.0.0.1′ |
+—————————————————————–+
2 rows in set (0.00 sec)

但是测试脚本执行的结果仍然显示乱码,并且character_set_client、character_set_results为MySQLdb.connect连接方法中设置的参数,init-connect没有执行。猜测假如init-connect=’SET NAMES utf8′按照文档中所说的在连接连上server之后,执行set操作,session级别的参数character_set_client、character_set_results应该为utf8

但是将init-connect改为”insert into test.test values(‘hello’)”,执行结果显示插入了hello

将mysql的log开启之后发现,对于使用python下面的MySQLdb来说,其中set autocommit=0是MySQLdb默认的方式。
conn=MySQLdb.connect(host=’127.0.0.1′,user=’admin2′,passwd=”,db=’test’,charset=’gb2312′)
MySQLdb先执行init-connect的SET NAMES utf8,然后将charset=’gb2312′解释为SET NAMES gb2312执行,所以使用不同语言的客户端的时候最好都强制对字符集进行指定或者深入调查清楚默认的行为。
101118 0:27:52 1 Connect admin2@127.0.0.1 on test
1 Query SET NAMES utf8
1 Query SET NAMES gb2312
1 Query set autocommit=0
conn=MySQLdb.connect(host=’127.0.0.1′,user=’admin2′,passwd=”,db=’test’)
101118 0:27:52 1 Connect admin2@127.0.0.1 on test
1 Query SET NAMES utf8
1 Query set autocommit=0


参考链接:

http://afei2.sinaapp.com/?p=202

http://afei2.sinaapp.com/?p=518

0 0
原创粉丝点击