61.笔记 MySQL学习——字符串类型与字符集

来源:互联网 发布:淘宝店主自拍技巧 编辑:程序博客网 时间:2024/06/01 08:02

61.笔记 MySQL学习——字符串类型与字符集

字符串值分为两类:二进制串和非二进制串

二进制串是一组字节序列;非二进制串是一个字符序列。

非二进制串与字符集相关,字符集决定了哪些字符可以用。字符集有一种或多种排序规则。

默认的字符集和排序规则是latin1和latin1_swedish-ci

字符单位在占用存储空间方面存在差异。

查看支持哪些字符集,命令如下:

mysql> show character set;

+----------+---------------------------------+---------------------+--------+

| Charset | Description                    | Default collation   | Maxlen |

+----------+---------------------------------+---------------------+--------+

| big5    | Big5 Traditional Chinese       | big5_chinese_ci     |      2 |

| dec8    | DEC West European              | dec8_swedish_ci     |      1 |

| cp850   | DOS West European              | cp850_general_ci    |      1 |

| hp8     | HP West European                | hp8_english_ci      |     1 |

| koi8r   | KOI8-R Relcom Russian          | koi8r_general_ci    |      1 |

| latin1  | cp1252 West European           | latin1_swedish_ci   |      1 |

| latin2  | ISO 8859-2 Central European    | latin2_general_ci   |      1|

| swe7    | 7bit Swedish                   | swe7_swedish_ci     |      1 |

| ascii   | US ASCII                       | ascii_general_ci    |      1 |

| ujis    | EUC-JP Japanese                | ujis_japanese_ci    |      3 |

| sjis    | Shift-JIS Japanese              | sjis_japanese_ci    |     2 |

| hebrew  | ISO 8859-8 Hebrew              | hebrew_general_ci   |      1 |

| tis620  | TIS620 Thai                    | tis620_thai_ci      |      1 |

| euckr   | EUC-KR Korean                   | euckr_korean_ci     |     2 |

| koi8u   | KOI8-U Ukrainian               | koi8u_general_ci    |      1 |

| gb2312  | GB2312 Simplified Chinese      | gb2312_chinese_ci   |      2 |

| greek   | ISO 8859-7 Greek               | greek_general_ci    |      1 |

| cp1250  | Windows Central European       | cp1250_general_ci   |      1 |

| gbk     | GBK Simplified Chinese         | gbk_chinese_ci      |      2 |

| latin5  | ISO 8859-9 Turkish             | latin5_turkish_ci   |      1 |

| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |

| utf8    | UTF-8 Unicode                  | utf8_general_ci     |      3 |

| ucs2    | UCS-2 Unicode                  | ucs2_general_ci     |      2 |

| cp866   | DOS Russian                    | cp866_general_ci    |      1 |

| keybcs2 | DOS Kamenicky Czech-Slovak     | keybcs2_general_ci  |      1 |

| macce   | Mac Central European           | macce_general_ci    |      1 |

| macroman | Mac West European               | macroman_general_ci |      1 |

| cp852   | DOS Central European           | cp852_general_ci    |      1 |

| latin7  | ISO 8859-13 Baltic             | latin7_general_ci   |      1 |

| utf8mb4 | UTF-8 Unicode                  | utf8mb4_general_ci  |      4 |

| cp1251  | Windows Cyrillic               | cp1251_general_ci   |      1 |

| utf16   | UTF-16 Unicode                 | utf16_general_ci    |      4 |

| utf16le | UTF-16LE Unicode               | utf16le_general_ci  |      4 |

| cp1256  | Windows Arabic                 | cp1256_general_ci   |      1 |

| cp1257  | Windows Baltic                 | cp1257_general_ci   |      1 |

| utf32   | UTF-32 Unicode                 | utf32_general_ci    |      4 |

| binary  | Binary pseudo charset          | binary              |      1 |

| geostd8 | GEOSTD8 Georgian               | geostd8_general_ci  |      1 |

| cp932   | SJIS for Windows Japanese      | cp932_japanese_ci   |      2 |

| eucjpms | UJIS for Windows Japanese       |eucjpms_japanese_ci |      3 |

| gb18030 | China National Standard GB18030 | gb18030_chinese_ci  |     4 |

+----------+---------------------------------+---------------------+--------+

41 rows in set (0.01 sec)

显示排序

mysql> show collation;

+--------------------------+----------+-----+---------+----------+---------+

| Collation                | Charset  | Id  |Default | Compiled | Sortlen |

+--------------------------+----------+-----+---------+----------+---------+

| big5_chinese_ci          | big5    |   1 | Yes     | Yes     |       1 |

| big5_bin                 | big5     | 84 |         | Yes      |      1 |

| dec8_swedish_ci          | dec8     |  3 | Yes     | Yes      |      1 |

| dec8_bin                 | dec8     | 69 |         | Yes     |       1 |

| cp850_general_ci         | cp850    |   4| Yes     | Yes      |      1 |

….

排序规则有字符集名、语言名和一个附加的后缀构成。

可以使用函数CHARSET()或COLLATION()来确定某个字符串的字符集和排序规则。

如下:
mysql>select charset(x'0123'),collation(x'0123');

+------------------+--------------------+

| charset(x'0123') | collation(x'0123') |

+------------------+--------------------+

| binary           | binary             |

+------------------+--------------------+

1 row in set (0.00 sec)

默认,十六进制常量当做二进制串。

强制解释为某字符集的记法

使用引导符记法,两种:

如下:

_charset str

另一种是

N ‘str’ 等价于_utf8 ‘str’

引导符不会改变字符串的值,通过使用CONVERT函数可以改变字符串的值。

实例如下:

mysql> set @s1=_ucs2 'ABCD';

Query OK, 0 rows affected (0.00 sec)

mysql> set @s2=convert('ABCD' using ucs2);

Query OK, 0 rows affected (0.00 sec)

mysql> select char_length(@s1),length(@s1),char_length(@s2),length(@s2);

+------------------+-------------+------------------+-------------+

| char_length(@s1) | length(@s1) |char_length(@s2) | length(@s2) |

+------------------+-------------+------------------+-------------+

|                2 |           4 |                4 |           8 |

+------------------+-------------+------------------+-------------+

1 row in set (0.02 sec)

第二个语句把字符串的每个字符转换为相应的ucs2字节,结果包含4个字符的ucs2字符串。

 

PS:二进制串没有字符集的概念。使用率二进制排序规则的非二进制串,会被解释成字符,并且比较的是他们的数字字符值。

例如:

mysql> set @s1=binary 'abcd';

Query OK, 0 rows affected (0.00 sec)

mysql> set @s2=_latin1 'abcd' collate latin1_bin;

Query OK, 0 rows affected (0.00 sec)

mysql> select upper(@s1),upper(@s2);

+------------+------------+

| upper(@s1) | upper(@s2) |

+------------+------------+

| abcd      | ABCD       |

+------------+------------+

1 row in set (0.00 sec)

mysql> select @s1,upper(convert(@s1 usinglatin1));

+------+----------------------------------+

| @s1 | upper(convert(@s1 using latin1)) |

+------+----------------------------------+

| abcd | ABCD                             |

+------+----------------------------------+

1 row in set (0.00 sec)

二进制串没有字符集所以UPPER函数无法转换,需要转换需要先转换成非二进制串。

 

 

 

 

 

 

 

 

阅读全文
0 0