Elasticsearch Source Code Analysis (5) -- Calling Lucene's Query API: Fuzzy Query

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 07:01
  • Introduction
  • Query syntax
  • Source code analysis

Introduction

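A fuzzy query matches documents using an edit-distance algorithm. The edit distance is computed between the query term we supply and the terms in the documents being searched. This query is CPU-intensive. In query-string syntax, you can run a fuzzy query by appending the character "~" to the search term.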
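To show what "edit distance" means here, the following is a minimal sketch of the classic Levenshtein distance (the number of single-character insertions, deletions, and substitutions needed to turn one string into another). The class name EditDistance is made up for this example; Lucene itself does not compute distances this way at query time but uses automata, so treat this only as an illustration of the metric.

```java
class EditDistance {
    // Dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("think", "thing"));    // prints 1
        System.out.println(levenshtein("think", "thinking")); // prints 3
    }
}
```

The cost grows with the product of the two string lengths, which gives a feel for why comparing a query term against many indexed terms is CPU-intensive.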

Query syntax

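For example, the query "think~" returns all documents containing terms similar to think. The equivalent fuzzy query below sets the minimum similarity (min_similarity) to 0.2. Note that the value is the plain term: the parser passes it to Lucene unchanged, so the "~" belongs only to query-string syntax.
{
  "query" : {
    "fuzzy" : {
      "title" : {
        "value" : "think",
        "min_similarity" : 0.2
      }
    }
  }
}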
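min_similarity: the minimum similarity a term must have to count as a match. For string fields, the value should be between 0 and 1, inclusive. For numeric fields, the value can be greater than 1; for example, if the query value is 20 and min_similarity is 3, values from 17 to 23 match. For date fields, the parameter can be set to values such as 1d, 2d, or 1m, meaning 1 day, 2 days, and 1 month respectively.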
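For string fields, the parser code in the source analysis section turns this 0-to-1 similarity into a number of edits by calling FuzzyQuery.floatToEdits. The sketch below approximates that conversion as I understand it for Lucene 4.x; the class name SimilarityToEdits, the method name toEdits, and the constant MAX_SUPPORTED_DISTANCE (assumed to be 2) are stand-ins for this example, so check the Lucene source of your version before relying on the exact rules.

```java
class SimilarityToEdits {
    // Assumption: Lucene 4.x Levenshtein automata support at most 2 edits.
    static final int MAX_SUPPORTED_DISTANCE = 2;

    // Approximation of the similarity-to-edits conversion.
    static int toEdits(float minSimilarity, int termLength) {
        if (minSimilarity >= 1f) {
            // Values of 1 or more are read directly as an edit count.
            return (int) Math.min(minSimilarity, MAX_SUPPORTED_DISTANCE);
        } else if (minSimilarity == 0.0f) {
            // Assumed special case: 0 is treated as an exact match.
            return 0;
        } else {
            // Fraction of the term that may differ, capped at the maximum.
            int edits = (int) ((1D - minSimilarity) * termLength);
            return Math.min(edits, MAX_SUPPORTED_DISTANCE);
        }
    }

    public static void main(String[] args) {
        // "think" has 5 code points.
        System.out.println(toEdits(0.5f, 5)); // prints 2
        System.out.println(toEdits(0.2f, 5)); // prints 2 (capped)
        System.out.println(toEdits(1f, 5));   // prints 1
    }
}
```

Under these assumptions, lowering min_similarity below a certain point has no further effect on short terms, because the edit count is capped.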

Source code analysis

(1) Elasticsearch code

public class FuzzyQueryParser implements QueryParser {

    public static final String NAME = "fuzzy";

    @Override
    public Query parse(QueryParseContext parseContext) throws IOException, QueryParsingException {
        XContentParser parser = parseContext.parser();

        XContentParser.Token token = parser.nextToken();
        if (token != XContentParser.Token.FIELD_NAME) {
            throw new QueryParsingException(parseContext.index(), "[fuzzy] query malformed, no field");
        }
        String fieldName = parser.currentName();

        String value = null;
        float boost = 1.0f;
        //LUCENE 4 UPGRADE we should find a good default here I'd vote for 1.0 -> 1 edit
        String minSimilarity = "0.5";
        int prefixLength = FuzzyQuery.defaultPrefixLength;
        int maxExpansions = FuzzyQuery.defaultMaxExpansions;
        boolean transpositions = false;
        MultiTermQuery.RewriteMethod rewriteMethod = null;
        token = parser.nextToken();
        if (token == XContentParser.Token.START_OBJECT) {
            String currentFieldName = null;
            while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
                if (token == XContentParser.Token.FIELD_NAME) {
                    currentFieldName = parser.currentName();
                } else {
                    if ("term".equals(currentFieldName)) {
                        value = parser.text();
                    } else if ("value".equals(currentFieldName)) {
                        value = parser.text();
                    } else if ("boost".equals(currentFieldName)) {
                        boost = parser.floatValue();
                    } else if ("min_similarity".equals(currentFieldName) || "minSimilarity".equals(currentFieldName)) {
                        minSimilarity = parser.text();
                    } else if ("prefix_length".equals(currentFieldName) || "prefixLength".equals(currentFieldName)) {
                        prefixLength = parser.intValue();
                    } else if ("max_expansions".equals(currentFieldName) || "maxExpansions".equals(currentFieldName)) {
                        maxExpansions = parser.intValue();
                    } else if ("transpositions".equals(currentFieldName)) {
                        transpositions = parser.booleanValue();
                    } else if ("rewrite".equals(currentFieldName)) {
                        rewriteMethod = QueryParsers.parseRewriteMethod(parser.textOrNull(), null);
                    } else {
                        throw new QueryParsingException(parseContext.index(), "[fuzzy] query does not support [" + currentFieldName + "]");
                    }
                }
            }
            parser.nextToken();
        } else {
            value = parser.text();
            // move to the next token
            parser.nextToken();
        }

        if (value == null) {
            throw new QueryParsingException(parseContext.index(), "No value specified for fuzzy query");
        }

        Query query = null;
        MapperService.SmartNameFieldMappers smartNameFieldMappers = parseContext.smartFieldMappers(fieldName);
        if (smartNameFieldMappers != null) {
            if (smartNameFieldMappers.hasMapper()) {
                query = smartNameFieldMappers.mapper().fuzzyQuery(value, minSimilarity, prefixLength, maxExpansions, transpositions);
            }
        }
        if (query == null) {
            //LUCENE 4 UPGRADE we need to document that this should now be an int rather than a float
            int edits = FuzzyQuery.floatToEdits(Float.parseFloat(minSimilarity),
                    value.codePointCount(0, value.length()));
            // Construct Lucene's FuzzyQuery object. The parameters are the query Term,
            // the edit distance, the length of the common prefix that must match,
            // the maximum number of terms the query may expand to, and the
            // transpositions flag.
            query = new FuzzyQuery(new Term(fieldName, value), edits, prefixLength, maxExpansions, transpositions);
        }
        if (query instanceof MultiTermQuery) {
            QueryParsers.setRewriteMethod((MultiTermQuery) query, rewriteMethod);
        }
        query.setBoost(boost);

        return wrapSmartNameQuery(query, smartNameFieldMappers, parseContext);
    }
}

(2) Lucene code

// FuzzyQuery is a subclass of MultiTermQuery and uses the parent's rewrite method.
// The fuzzy term "think" must be rewritten into terms that actually exist in the
// index, such as think or thing, and the posting lists of those terms are then merged.
public class FuzzyQuery extends MultiTermQuery {
   ...
}

public abstract class MultiTermQuery extends Query {
  @Override
  public final Query rewrite(IndexReader reader) throws IOException {
    return rewriteMethod.rewrite(reader, this);
  }
  ...
}
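To make the rewrite step more concrete, here is a minimal sketch of expanding a fuzzy term into the indexed terms it can match. It is not Lucene's implementation: it assumes the term dictionary is a plain in-memory list and scans it linearly, and it keeps the closest terms by edit distance, whereas Lucene intersects a Levenshtein automaton with the term dictionary and ranks candidates by score. FuzzyExpansionSketch and expand are hypothetical names; the parameters mirror maxEdits, prefixLength, and maxExpansions from the parser above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FuzzyExpansionSketch {
    // Plain Levenshtein distance (insert, delete, substitute).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns up to maxExpansions indexed terms that share the first
    // prefixLength characters with queryTerm and lie within maxEdits of it.
    static List<String> expand(List<String> indexedTerms, String queryTerm,
                               int maxEdits, int prefixLength, int maxExpansions) {
        String prefix = queryTerm.substring(0, Math.min(prefixLength, queryTerm.length()));
        List<String> matches = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix) && levenshtein(queryTerm, term) <= maxEdits) {
                matches.add(term);
            }
        }
        // Closest terms first; ties broken alphabetically for determinism.
        matches.sort((x, y) -> {
            int dx = levenshtein(queryTerm, x);
            int dy = levenshtein(queryTerm, y);
            return dx != dy ? Integer.compare(dx, dy) : x.compareTo(y);
        });
        return new ArrayList<>(matches.subList(0, Math.min(maxExpansions, matches.size())));
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
                "drink", "tank", "thank", "thick", "thin", "thing", "think", "thinking");
        // Conceptually, the result is OR-ed together as term queries.
        System.out.println(expand(terms, "think", 2, 0, 50));
        System.out.println(expand(terms, "think", 1, 2, 3));
    }
}
```

A longer prefixLength and a smaller maxExpansions both shrink the candidate set, which is the usual way to limit the CPU cost of a fuzzy query.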
