在MYSQL中运用全文索引(FULLTEXT index)
来源:互联网 发布:大数据分析工程师 编辑:程序博客网 时间:2024/04/29 00:16
http://blog.sina.com.cn/s/blog_64589b110101c31a.html
在MYSQL中使用全文索引(FULLTEXTindex)
MYSQL的一个很有用的特性是使用全文索引(FULLTEXTindex)查找文本的能力.目前只有使用MyISAM类型表的时候有效(MyISAM是默认的表类型,如果你不知道使用的是什么类型的表,那很可能就是MyISAM).全文索引可以建立在TEXT,CHAR或者VARCHAR类型的字段,或者字段组合上.我们将建立一个简单的表用来解释各种特性.
简单用法(MATCH()函数)对3.23.23以后的版本有效,复杂的用法(IN BOOLEANMODE修饰语)对4以后的版本有效,本文的第一部分着重简单用法,第二部分讲复杂用法.
一个简单的表
我们将在整个过程中使用下面的表.CREATE TABLE fulltext_sample(copy TEXT,FULLTEXT(copy))TYPE=MyISAM;
如果你没有把默认的表类型设置成MyISAM以外的类型那么TYPE=MyISAM可以省略.建表之后,向其中填充一些数据,例如:INSERT INTO fulltext_sample VALUES
('It appears good from here'),
('The here and the past'),
('Why are we hear'),
('An all-out alert'),
('All you need is love'),
('A good alert');
如果你已经建立好了一个表,你可以使用ALTER TABLE(就像CREATEINDEX语句一样)语句添加一个全文索引,例如:ALTER TABLE fulltext_sample ADDFULLTEXT(copy)
查找文本
全文索引搜索的语法很简单,你只要MATCH字段,AGAINST你要查找的文本,例如:mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy)AGAINST('love');
+----------------------+
| copy |
+----------------------+
| All you need is love |
+----------------------+
在全文索引上进行搜索是不区分大小写的,因此下面的语句也可以正常运行:mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy)AGAINST('LOVE');
+----------------------+
| copy |
+----------------------+
| All you need is love |
+----------------------+
全文索引通常用来搜索自然语言文本,例如报纸文章,网页内容等等.因此MySQL为这类搜索添加了很多特性.MySQL不索引任何长度小于等于3的文本,也不索引有50%机会出现的单词.这意味着如果你的表少于2条记录,基于全文索引的搜索不会返回任何东西.将来,MySQL会使这项功能更灵活,但是现在它应该可以适合大部分自然语言的使用.如果你的数据库中的大部分记录都包含”music”,你很可能不希望返回这些记录,你可以使用INBOOLEAN MODE修饰符来获得50%左右的阀值,见本文第二部分.
结果将按照关联性从高到底的顺序返回.
主要特性
下面是标准的全文索引搜索的主要特性:
1.排除重复词语
2.排除长度小于4的词语
3.排除在多于一半记录中出现的词语(就是说只要要有3条记录)
4.带连字符的词语被认为两个词语
5.结果按照关联度降序返回
6.忽略列表中的词语也被从搜索结果中排除.忽略列表基于普通的英文单词,因此如果你的数据用作不同的目的,你可能希望改变忽略列表.不幸的是,这样作并不容易.你需要编辑文件myisam/ft_static.c,重新编辑MySQL,并重建索引!这里有一个忽略列表.注意,这些在不同的版本里有所更改.
忽略列表"a", "a's", "able", "about", "above", "according","accordingly", "across", "actually", "after", "afterwards","again", "against", "ain't", "all", "allow", "allows", "almost","alone", "along", "already", "also", "although", "always", "am","among", "amongst", "an", "and", "another", "any", "anybody","anyhow", "anyone", "anything", "anyway", "anyways", "anywhere","apart", "appear", "appreciate", "appropriate", "are", "aren't","around", "as", "aside", "ask", "asking", "associated", "at","available", "away", "awfully", "b", "be", "became", "because","become", "becomes", "becoming", "been", "before", "beforehand","behind", "being", "believe", "below", "beside", "besides", "best","better", "between", "beyond", "both", "brief", "but", "by", "c","c'mon", "c's", "came", "can", "can't", "cannot", "cant", "cause","causes", "certain", "certainly", "changes", "clearly", "co","com", "come", "comes", "concerning", "consequently", "consider","considering", "contain", "containing", "contains","corresponding", "could", "couldn't", "course", "currently", "d","definitely", "described", "despite", "did", "didn't", "different","do", "does", "doesn't", "doing", "don't", "done", "down","downwards", "during", "e", "each", "edu", "eg", "eight", "either","else", "elsewhere", "enough", "entirely", "especially", "et","etc", "even", "ever", "every", "everybody", "everyone","everything", "everywhere", "ex", "exactly", "example", "except","f", "far", "few", "fifth", "first", "five", "followed","following", "follows", "for", "former", "formerly", "forth","four", "from", "further", "furthermore", "g", "get", "gets","getting", "given", "gives", "go", "goes", "going", "gone", "got","gotten", "greetings", "h", "had", "hadn't", "happens", "hardly","has", "hasn't", "have", "haven't", "having", "he", "he's","hello", "help", "hence", "her", "here", "here's", "hereafter","hereby", "herein", "hereupon", "hers", "herself", "hi", "him","himself", "his", "hither", "hopefully", "how", "howbeit","however", "i", "i'd", "i'll", "i'm", "i've", "ie", "if","ignored", "immediate", "in", "inasmuch", "inc", "indeed","indicate", "indicated", "indicates", "inner", "insofar","instead", "into", "inward", "is", "isn't", "it", "it'd", "it'll","it's", "its", "itself", "j", "just", "k", "keep", "keeps", "kept","know", "knows", "known", "l", "last", "lately", "later", "latter","latterly", "least", "less", "lest", "let", "let's", "like","liked", "likely", "little", "look", "looking", "looks", "ltd","m", "mainly", "many", "may", "maybe", "me", "mean", "meanwhile","merely", "might", "more", "moreover", "most", "mostly", "much","must", "my", "myself", "n", "name", "namely", "nd", "near","nearly", "necessary", "need", "needs", "neither", "never","nevertheless", "new", "next", "nine", "no", "nobody", "non","none", "noone", "nor", "normally", "not", "nothing", "novel","now", "nowhere", "o", "obviously", "of", "off", "often", "oh","ok", "okay", "old", "on", "once", "one", "ones", "only", "onto","or", "other", "others", "otherwise", "ought", "our", "ours","ourselves", "out", "outside", "over", "overall", "own", "p","particular", "particularly", "per", "perhaps", "placed", "please","plus", "possible", "presumably", "probably", "provides", "q","que", "quite", "qv", "r", "rather", "rd", "re", "really","reasonably", "regarding", "regardless", "regards", "relatively","respectively", "right", "s", "said", "same", "saw", "say","saying", "says", "second", "secondly", "see", "seeing", "seem","seemed", "seeming", "seems", "seen", "self", "selves", "sensible","sent", "serious", "seriously", "seven", "several", "shall", "she","should", "shouldn't", "since", "six", "so", "some", "somebody","somehow", "someone", "something", "sometime", "sometimes","somewhat", "somewhere", "soon", "sorry", "specified", "specify","specifying", "still", "sub", "such", "sup", "sure", "t", "t's","take", "taken", "tell", "tends", "th", "than", "thank", "thanks","thanx", "that", "that's", "thats", "the", "their", "theirs","them", "themselves", "then", "thence", "there", "there's","thereafter", "thereby", "therefore", "therein", "theres","thereupon", "these", "they", "they'd", "they'll", "they're","they've", "think", "third", "this", "thorough", "thoroughly","those", "though", "three", "through", "throughout", "thru","thus", "to", "together", "too", "took", "toward", "towards","tried", "tries", "truly", "try", "trying", "twice", "two", "u","un", "under", "unfortunately", "unless", "unlikely", "until","unto", "up", "upon", "us", "use", "used", "useful", "uses","using", "usually", "v", "value", "various", "very", "via", "viz","vs", "w", "want", "wants", "was", "wasn't", "way", "we", "we'd","we'll", "we're", "we've", "welcome", "well", "went", "were","weren't", "what", "what's", "whatever", "when", "whence","whenever", "where", "where's", "whereafter", "whereas", "whereby","wherein", "whereupon", "wherever", "whether", "which", "while","whither", "who", "who's", "whoever", "whole", "whom", "whose","why", "will", "willing", "wish", "with", "within", "without","won't", "wonder", "would", "would", "wouldn't", "x", "y", "yes","yet", "you", "you'd", "you'll", "you're", "you've", "your","yours", "yourself", "yourselves", "z","zero",
让我们看一下其中的一些词.如果你懒的输入,但是想查找”love”这个词,象下面这样:mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy)AGAINST('lov');
Empty set (0.00 sec)
什么都没返回,因为全文索引只包含完整的单词,不是部分单词.如果想得到返回,你必须把单词写完整,就像第一个例子里一样.
就像我们提过的,连字符单词在全文索引中被排除(它们被作为单独的单词索引),因此下面的语句什么都不返回:mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy)AGAINST('all-out');
Empty set (0.00 sec)
很不幸,两个单词都小于4个字符,因此单独搜索时也不会出现,而且通常的搜索中也不会出现.本文的第二部分中使用BOOLEANMODE搜索可以搜索部分的或者包含连字符的单词.
你也可以一次搜索多个单词,用逗号分隔.下面的例子查找包含”here”和”appears”的记录:mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy)AGAINST('here','appears');
Empty set (0.01 sec)
出乎意料这个语句没有返回.但是仔细看看忽略列表,这个词被列在其中,因此被从索引中排除了.忽略列表可能是人们解释MySQL全文索引没有生效的通常原因.如果你的查询返回了一个结果,那么你的版本的MySQL的忽略列表不包含”here”这个词.
关联度
下面的例子说明记录返回的优先级mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy)AGAINST('good,alert');
+---------------------------+
| copy |
+---------------------------+
| A good alert |
| It appears good from here |
| An all-out alert |
+---------------------------+
记录”A goodalert”首先出现,因为它同时包含要搜索的两个词.你不必相信我-只需要看看MySQL在结果中显示的优先级.简单的在字段列表中重复MATCH()函数,例如:mysql> SELECT copy,MATCH(copy) AGAINST('good,alert') ASrelevance
FROM fulltext_sample WHERE MATCH(copy) AGAINST('good,alert');
+---------------------------+------------------+
| copy | relevance |
+---------------------------+------------------+
| A good alert | 1.3551264824316 |
| An all-out alert | 0.68526663197496 |
| It appears good from hear | 0.67003110026735 |
+---------------------------+------------------+
关联度的计算非常复杂,它基于索引中单词的数量,记录中不同单词的个数,索引和返回结果中单词的总数,以及单词的重要程度.这个数字可能在你的MySQL版本中有所不同,MySQL偶尔会强化计算逻辑.
对大多数应用来说标准的全文索引搜索非常有用而充分,MySQL 4让它更加强大.
- 在MYSQL中运用全文索引(FULLTEXT index)
- 在MYSQL中运用全文索引(FULLTEXT index)(转)
- 在MYSQL中运用全文索引(FULLTEXT index)
- 在mysql中运用全文索引(FULLTEXT index)
- 在MYSQL中运用全文索引(FULLTEXT index)
- MySQL使用全文索引(fulltext index)
- MySQL使用全文索引(fulltext index)
- MySQL全文索引 Fulltext
- MySQL 全文索引(fulltext)
- mysql 全文索引 FULLTEXT
- mysql全文索引FULLTExt
- mysql全文索引:fulltext
- MySQL FULLTEXT全文搜索引擎
- MySQL-创建和使用全文索引(FullText)
- MYSQL的索引类型:PRIMARY, INDEX,UNIQUE,FULLTEXT,SPAIAL
- 在mysql中使用全文索引
- Mysql FULLTEXT索引说明
- mysql使用fulltext索引
- Effective Java之对所有对象通用的方法
- Observer模式
- 【Java 并发系列】深入分析Volatile的实现原理
- 后缀表达式
- CSDN博客等级规则
- 在MYSQL中运用全文索引(FULLTEXT index)
- Spring 简介
- unity for android 开发
- 块设备
- 定位uncaught exception类型的崩溃所在行数
- RasEnumAutodialAddresses
- SQL 触发器详解
- git pull 与 git push
- 利用PHPMailer 进行邮件发送