MySQL 全文索引基础操作

来源:互联网 发布:如何改变图片尺寸软件 编辑:程序博客网 时间:2024/05/18 00:37

当前测试的  mysql 版本: 5.0.95


MYSQL MYISAM引擎支持全文索引;MYSQL 5.6 后,InnoDB 也完全支持全文索引。
全文索引支持的字段类型包括 char ,varchar , text 。


创建测试表:(36行数据)

CREATE TABLE test.ft_tab (id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,charname CHAR(10),collatename VARCHAR(200),description TEXT(500),maxlen int) ENGINE=InnoDB;insert into test.ft_tab (charname,collatename,description,maxlen)select CHARACTER_SET_NAME,DEFAULT_COLLATE_NAME,DESCRIPTION,MAXLENfrom information_schema.character_sets ;

5.0 中 InnoDB 引擎创建失败,改为 MYISAM 再创建:

mysql> CREATE FULLTEXT INDEX idx_ft_tab ON test.ft_tab(description);ERROR 1214 (HY000): The used table type doesn't support FULLTEXT indexesmysql> mysql> ALTER TABLE test.ft_tab ENGINE = MYISAM;Query OK, 36 rows affected (0.02 sec)Records: 36  Duplicates: 0  Warnings: 0mysql> CREATE FULLTEXT INDEX idx_ft_tab ON test.ft_tab(description);Query OK, 36 rows affected (0.01 sec)Records: 36  Duplicates: 0  Warnings: 0

使用全文索引搜索数据:

mysql> explain select * from test.ft_tab where description like '%Windows%';+----+-------------+--------+------+---------------+------+---------+------+------+-------------+| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra       |+----+-------------+--------+------+---------------+------+---------+------+------+-------------+|  1 | SIMPLE      | ft_tab | ALL  | NULL          | NULL | NULL    | NULL |   36 | Using where | +----+-------------+--------+------+---------------+------+---------+------+------+-------------+1 row in set (0.00 sec)mysql> explain select * from test.ft_tab where match(description) against('Windows');+----+-------------+--------+----------+---------------+------------+---------+------+------+-------------+| id | select_type | table  | type     | possible_keys | key        | key_len | ref  | rows | Extra       |+----+-------------+--------+----------+---------------+------------+---------+------+------+-------------+|  1 | SIMPLE      | ft_tab | fulltext | idx_ft_tab    | idx_ft_tab | 0       |      |    1 | Using where | +----+-------------+--------+----------+---------------+------------+---------+------+------+-------------+1 row in set (0.00 sec)


使用 IN BOOLEAN MODE 设置特定搜索:

#至少包含一个单词(或)select * from test.ft_tab where match(description) against('Windows Central' IN BOOLEAN MODE);#括号,至少包含一个单词(或)select * from test.ft_tab where match(description) against('(Windows Central)' IN BOOLEAN MODE);select * from test.ft_tab where match(description) against('+(Windows Central)' IN BOOLEAN MODE);#括号/大于小于好,至少包含一个单词(或),大于号的排序最高,小于号的最低,同时存在的处中间select * from test.ft_tab where match(description) against('+(>Windows <Central)' IN BOOLEAN MODE);select * ,(match(description) against('+(>Windows <Central)' IN BOOLEAN MODE)) AS REFfrom test.ft_tab where match(description) against('+(>Windows <Central)' IN BOOLEAN MODE);#两个(多个)单词同时包含(与)select * from test.ft_tab where match(description) against('+Windows +Central' IN BOOLEAN MODE);#包含一个(多个)单词,且不包含另一个(多个)单词select * from test.ft_tab where match(description) against('+Windows -Central' IN BOOLEAN MODE);#包含一个(多个)单词,如果包含一个单词(如 Central),排序更高select * from test.ft_tab where match(description) against('+Windows Central' IN BOOLEAN MODE);select * ,(match(description) against('+Windows Central' IN BOOLEAN MODE)) AS REFfrom test.ft_tab where match(description) against('+Windows Central' IN BOOLEAN MODE);#双引号,内部作为一个词组查询select * from test.ft_tab where match(description) against('"Windows Central"' IN BOOLEAN MODE);#星号,模糊匹配,只在单词尾部select * from test.ft_tab where match(description) against('Windows*' IN BOOLEAN MODE);select * from test.ft_tab where match(description) against('Cen*' IN BOOLEAN MODE);#包含一个(多个)单词,如果包含一个单词(如 ~Central),排序更低select * from test.ft_tab where match(description) against('+Windows ~Central' IN BOOLEAN MODE);select * ,(match(description) against('+Windows ~Central' IN BOOLEAN MODE)) AS REFfrom test.ft_tab where match(description) against('+Windows ~Central' IN BOOLEAN MODE);

上面的一个查询为例,匹配度越高,匹配的结果值(如下字段 ref )越大。

mysql> select * ,(match(description) against('+(>Windows <Central)' IN BOOLEAN MODE)) AS REF    -> from test.ft_tab where match(description) against('+(>Windows <Central)' IN BOOLEAN MODE);+----+----------+---------------------+-----------------------------+--------+-------------------+| id | charname | collatename         | description                 | maxlen | REF               |+----+----------+---------------------+-----------------------------+--------+-------------------+|  7 | latin2   | latin2_general_ci   | ISO 8859-2 Central European |      1 | 0.666666686534882 | | 18 | cp1250   | cp1250_general_ci   | Windows Central European    |      1 |  1.16666674613953 | | 26 | macce    | macce_general_ci    | Mac Central European        |      1 | 0.666666686534882 | | 28 | cp852    | cp852_general_ci    | DOS Central European        |      1 | 0.666666686534882 | | 30 | cp1251   | cp1251_general_ci   | Windows Cyrillic            |      1 |               1.5 | | 31 | cp1256   | cp1256_general_ci   | Windows Arabic              |      1 |               1.5 | | 32 | cp1257   | cp1257_general_ci   | Windows Baltic              |      1 |               1.5 | | 35 | cp932    | cp932_japanese_ci   | SJIS for Windows Japanese   |      2 |               1.5 | | 36 | eucjpms  | eucjpms_japanese_ci | UJIS for Windows Japanese   |      3 |               1.5 | +----+----------+---------------------+-----------------------------+--------+-------------------+


现在查询一个单词,却找不出结果。 词语 'DEC'  不属于过滤的单词,为什么查不出?

mysql> select * from test.ft_tab where match(description) against('DEC');Empty set (0.00 sec)

查看全文索引变量,就知道了。最小的长度为 4 字符,4 个字符以下的都忽略不当作单词了!

mysql> show variables like '%ft%';+--------------------------+----------------+| Variable_name            | Value          |+--------------------------+----------------+| ft_boolean_syntax        | + -><()~*:""&| | | ft_max_word_len          | 84             | | ft_min_word_len          | 4              | | ft_query_expansion_limit | 20             | | ft_stopword_file         | (built-in)     | +--------------------------+----------------+

#变量说明
http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html

ft_boolean_syntax :IN BOOLEAN MODE 匹配的选项
ft_max_word_len :最长匹配的字符数
ft_min_word_len :最短匹配的字符数
ft_query_expansion_limit :使用WITH QUERY EXPANSION 搜索出的前N(20)个结果再次搜索
ft_stopword_file :停词表文件,用于设置不需要匹配的词表

#默认停止词表
http://dev.mysql.com/doc/refman/5.0/en/fulltext-stopwords.html


设置变量,允许查询3个长度的单词:

[root@localhost]# vi /etc/my.cnf[mysqld]ft_min_word_len=3#重启服务,修复表[root@localhost]# service mysqld restartmysql> repair table test.ft_tab;

再查询,结果出来了!

mysql> select * from test.ft_tab where match(description) against('DEC');+----+----------+-----------------+-------------------+--------+| id | charname | collatename     | description       | maxlen |+----+----------+-----------------+-------------------+--------+|  2 | dec8     | dec8_swedish_ci | DEC West European |      1 | +----+----------+-----------------+-------------------+--------+

而词语“for” 却查询不出!被忽略掉了!

mysql> select * from test.ft_tab where match(description) against('for');Empty set (0.00 sec)

变量 ft_stopword_file 没有指定文件(built-in),使用系统默认设置(参考:Full-Text Stopwords)

现在手动设置:

[root@localhost]# vi /etc/my.cnf[mysqld]ft_stopword_file=/usr/share/mysql/ft_stopword.txt[root@localhost]# service mysqld restartmysql> repair table test.ft_tab;mysql> show variables like '%ft%';
mysql> show variables like '%ft%';+--------------------------+----------------------------------+| Variable_name            | Value                            |+--------------------------+----------------------------------+| ft_boolean_syntax        | + -><()~*:""&|                   | | ft_max_word_len          | 84                               | | ft_min_word_len          | 3                                | | ft_query_expansion_limit | 20                               | | ft_stopword_file         | /usr/share/mysql/ft_stopword.txt | +--------------------------+----------------------------------+

再查询,结果出来了!
mysql> select * from test.ft_tab where match(description) against('for');+----+----------+---------------------+---------------------------+--------+| id | charname | collatename         | description               | maxlen |+----+----------+---------------------+---------------------------+--------+| 35 | cp932    | cp932_japanese_ci   | SJIS for Windows Japanese |      2 | | 36 | eucjpms  | eucjpms_japanese_ci | UJIS for Windows Japanese |      3 | +----+----------+---------------------+---------------------------+--------+

现在需要把 “for” 忽略掉,怎么设置呢?如果添加过滤词,编辑文件添加该字符:

[root@localhost]# vi /usr/share/mysql/ft_stopword.txt'for','in','at'[root@localhost]# service mysqld restart

再查询,找不到了!已经被忽略!~

mysql> select * from test.ft_tab where match(description) against('for');Empty set (0.00 sec)


参考: Full-Text Search Functions


0 0
原创粉丝点击