Elasticsearch Source Code Analysis 5 -- Calling the Lucene Query Interface: Fuzzy Query (Fuzzy)
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 07:01
- Introduction
- Query Syntax
- Source Code Analysis
Introduction
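A fuzzy query matches documents using an edit-distance algorithm. The edit distance is computed between the query term we supply and the terms of the documents being searched. This query is CPU-intensive. Appending the character "~" to a search term performs a fuzzy query.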
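To make "edit distance" concrete, here is a minimal sketch of the classic Levenshtein distance, where each insertion, deletion, or substitution costs 1. The class name EditDistance is hypothetical and used only for illustration. Lucene does not compare the query against every term this way (it relies on Levenshtein automata), so treat this as a definition of the metric rather than as Lucene's implementation.

```java
final class EditDistance {

    /** Levenshtein distance between a and b, computed with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the arrays for the next row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("think", "thinks"));   // 1
        System.out.println(levenshtein("think", "thank"));    // 1
        System.out.println(levenshtein("think", "thinking")); // 3
    }
}
```

The smaller the distance, the more similar the two terms, which is why a fuzzy query for think can also reach terms such as thinks or thank.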
Query Syntax
For example, the query "think~" returns all documents that contain terms similar to think. The minimum similarity (min_similarity) is 0.2.
{
  "query" : {
    "fuzzy" : {
      "title" : {
        "value" : "think~",
        "min_similarity" : 0.2
      }
    }
  }
}
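min_similarity: the minimum similarity a term must have to count as a match. For string fields, this value should be between 0 and 1, inclusive. For numeric fields, the value can be greater than 1; for example, if the query value is 20 and min_similarity is 3, values from 17 to 23 are returned. For date fields, the parameter can be set to 1d, 2d, 1m, and so on, meaning 1 day, 2 days, and 1 month respectively.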
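The two interpretations of min_similarity can be sketched as follows. The class MinSimilaritySketch and its methods are hypothetical names. toEdits is modelled on how FuzzyQuery.floatToEdits in Lucene 4.x is understood to behave (a similarity below 1 is scaled by the term length and capped at 2 edits); that behaviour, including the cap, is an assumption here and should be checked against the Lucene version in use. numericRange simply reproduces the 17 to 23 example above.

```java
final class MinSimilaritySketch {

    // Assumption: Lucene 4.x supports at most 2 edits for fuzzy matching.
    static final int MAX_EDITS = 2;

    /** Converts a string-field min_similarity into a maximum number of edits. */
    static int toEdits(float minSimilarity, int termLength) {
        if (minSimilarity >= 1f) {
            // Values >= 1 are read directly as an edit count.
            return (int) Math.min(minSimilarity, MAX_EDITS);
        } else if (minSimilarity == 0.0f) {
            // Assumption: 0 is treated as "exact match", not "unlimited edits".
            return 0;
        }
        return Math.min((int) ((1D - minSimilarity) * termLength), MAX_EDITS);
    }

    /** For numeric fields: the inclusive range [value - minSimilarity, value + minSimilarity]. */
    static long[] numericRange(long value, long minSimilarity) {
        return new long[] {value - minSimilarity, value + minSimilarity};
    }

    public static void main(String[] args) {
        System.out.println(toEdits(0.2f, "think".length()));
        long[] range = numericRange(20, 3);
        System.out.println(range[0] + ".." + range[1]); // 17..23
    }
}
```

Under these assumptions, a lower min_similarity on a string field permits more edits, but never more than the cap.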
Source Code Analysis
(1) Elasticsearch code

public class FuzzyQueryParser implements QueryParser {

    public static final String NAME = "fuzzy";

    @Override
    public Query parse(QueryParseContext parseContext) throws IOException, QueryParsingException {
        XContentParser parser = parseContext.parser();

        XContentParser.Token token = parser.nextToken();
        if (token != XContentParser.Token.FIELD_NAME) {
            throw new QueryParsingException(parseContext.index(), "[fuzzy] query malformed, no field");
        }
        String fieldName = parser.currentName();

        String value = null;
        float boost = 1.0f;
        //LUCENE 4 UPGRADE we should find a good default here I'd vote for 1.0 -> 1 edit
        String minSimilarity = "0.5";
        int prefixLength = FuzzyQuery.defaultPrefixLength;
        int maxExpansions = FuzzyQuery.defaultMaxExpansions;
        boolean transpositions = false;
        MultiTermQuery.RewriteMethod rewriteMethod = null;

        token = parser.nextToken();
        if (token == XContentParser.Token.START_OBJECT) {
            String currentFieldName = null;
            while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
                if (token == XContentParser.Token.FIELD_NAME) {
                    currentFieldName = parser.currentName();
                } else {
                    if ("term".equals(currentFieldName)) {
                        value = parser.text();
                    } else if ("value".equals(currentFieldName)) {
                        value = parser.text();
                    } else if ("boost".equals(currentFieldName)) {
                        boost = parser.floatValue();
                    } else if ("min_similarity".equals(currentFieldName) || "minSimilarity".equals(currentFieldName)) {
                        minSimilarity = parser.text();
                    } else if ("prefix_length".equals(currentFieldName) || "prefixLength".equals(currentFieldName)) {
                        prefixLength = parser.intValue();
                    } else if ("max_expansions".equals(currentFieldName) || "maxExpansions".equals(currentFieldName)) {
                        maxExpansions = parser.intValue();
                    } else if ("transpositions".equals(currentFieldName)) {
                        transpositions = parser.booleanValue();
                    } else if ("rewrite".equals(currentFieldName)) {
                        rewriteMethod = QueryParsers.parseRewriteMethod(parser.textOrNull(), null);
                    } else {
                        throw new QueryParsingException(parseContext.index(), "[fuzzy] query does not support [" + currentFieldName + "]");
                    }
                }
            }
            parser.nextToken();
        } else {
            value = parser.text();
            // move to the next token
            parser.nextToken();
        }

        if (value == null) {
            throw new QueryParsingException(parseContext.index(), "No value specified for fuzzy query");
        }

        Query query = null;
        MapperService.SmartNameFieldMappers smartNameFieldMappers = parseContext.smartFieldMappers(fieldName);
        if (smartNameFieldMappers != null) {
            if (smartNameFieldMappers.hasMapper()) {
                query = smartNameFieldMappers.mapper().fuzzyQuery(value, minSimilarity, prefixLength, maxExpansions, transpositions);
            }
        }
        if (query == null) {
            //LUCENE 4 UPGRADE we need to document that this should now be an int rather than a float
            int edits = FuzzyQuery.floatToEdits(Float.parseFloat(minSimilarity), value.codePointCount(0, value.length()));
            // Construct Lucene's FuzzyQuery object. The arguments are the query Term, the edit
            // distance, the length of the common prefix that must match, the maximum number of
            // terms the query may be expanded to, and the transpositions flag.
            query = new FuzzyQuery(new Term(fieldName, value), edits, prefixLength, maxExpansions, transpositions);
        }
        if (query instanceof MultiTermQuery) {
            QueryParsers.setRewriteMethod((MultiTermQuery) query, rewriteMethod);
        }
        query.setBoost(boost);

        return wrapSmartNameQuery(query, smartNameFieldMappers, parseContext);
    }
}

(2) Lucene code

// FuzzyQuery is a subclass of MultiTermQuery and uses the parent class's rewrite method.
// The query term "think~" has to be rewritten into terms that exist in the index, such as
// think and thinking, and the postings lists of those terms are then merged.
public class FuzzyQuery extends MultiTermQuery {
    ...
}

public abstract class MultiTermQuery extends Query {

    @Override
    public final Query rewrite(IndexReader reader) throws IOException {
        return rewriteMethod.rewrite(reader, this);
    }
    ...
}
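The rewrite step can be pictured with a small in-memory sketch: expand the fuzzy term into the indexed terms within the allowed edit distance, then union their postings lists. Everything here (the class FuzzyRewriteSketch, the sorted map standing in for the term dictionary, the document IDs) is hypothetical and only illustrates the idea. In particular, this sketch keeps the first maxExpansions matches in term order and scans every term, whereas Lucene selects and scores candidate terms differently and uses automata instead of a full scan.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeSet;

final class FuzzyRewriteSketch {

    /** Plain Levenshtein distance, used here in place of Lucene's automaton. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Expands the fuzzy term into matching indexed terms and merges their postings.
     * index maps each indexed term to the IDs of the documents that contain it.
     */
    static SortedSet<Integer> search(SortedMap<String, List<Integer>> index, String term,
                                     int maxEdits, int prefixLength, int maxExpansions) {
        // Candidates must share the first prefixLength characters with the query term.
        String prefix = term.substring(0, Math.min(prefixLength, term.length()));
        SortedSet<Integer> docs = new TreeSet<>();
        int expanded = 0;
        for (Map.Entry<String, List<Integer>> entry : index.entrySet()) {
            if (expanded == maxExpansions) {
                break; // stop once the expansion limit is reached
            }
            String candidate = entry.getKey();
            if (candidate.startsWith(prefix) && distance(term, candidate) <= maxEdits) {
                docs.addAll(entry.getValue()); // union of the postings lists
                expanded++;
            }
        }
        return docs;
    }
}
```

With an index containing think, thinks, thank, thinking, and apple, a search for think with at most 2 edits and no required prefix collects the documents of think, thinks, and thank; raising prefixLength to 3 drops thank because it does not start with "thi".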