Elasticsearch global search: matching with multiple analyzers

Source: Internet. Editor: 程序博客网. Time: 2024/05/29 18:52

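A global-search requirement in ES often means matching a keyword with several analyzers at the same time. For example: find results whose product name, brand name, or category name contains the keyword “西”, where a document joins the result set as soon as any one field matches. Expressed in SQL: select * from item where item_name like '%西%' or brand_name like '%西%' or c_name like '%西%'

Here item_name, brand_name, and c_name are the product name, brand name, and category name respectively.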
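To make the intended semantics concrete, here is a minimal in-memory sketch of the same "any field contains the keyword" rule. The `Item` class and the sample rows are made up for illustration; only the three field names come from the SQL above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AnyFieldContains {
    // Hypothetical in-memory stand-in for one row of the item table.
    static final class Item {
        final String itemName;
        final String brandName;
        final String cName;

        Item(String itemName, String brandName, String cName) {
            this.itemName = itemName;
            this.brandName = brandName;
            this.cName = cName;
        }
    }

    // True as soon as any one of the three fields contains the keyword (the SQL "or").
    static boolean matches(Item item, String keyword) {
        return item.itemName.contains(keyword)
                || item.brandName.contains(keyword)
                || item.cName.contains(keyword);
    }

    // Keeps every item for which at least one field matches.
    static List<Item> search(List<Item> items, String keyword) {
        List<Item> result = new ArrayList<>();
        for (Item item : items) {
            if (matches(item, keyword)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not data from the article.
        List<Item> items = Arrays.asList(
                new Item("西门子冰箱", "西门子", "家电"),
                new Item("苹果手机", "苹果", "数码"),
                new Item("无籽瓜", "农场", "西瓜"));
        for (Item hit : search(items, "西")) {
            System.out.println(hit.itemName);
        }
    }
}
```

This is the behavior the ES query has to reproduce: a plain substring test per field, combined with "or".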

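Simple as it looks, this requirement is fairly hard to implement in ES, because ES analyzes (tokenizes) field content when it indexes data. The characteristics of several ES analyzers are listed below: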

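1) The standard analyzer

This is the default ES analyzer. It handles Chinese poorly: it splits Chinese text into single characters, so a term-level query for a multi-character word matches no doc. For Chinese fields, ik can be used instead.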

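2) The ik analyzer

It requires installing the ik plugin separately and offers two granularities, ik_smart and ik_max_word, of which ik_max_word is the finer. However, a word that ik does not recognize is not emitted as a token, which is why querying “西” in the global-search example above matches no data.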

3) The pinyin analyzer

It requires installing a plugin and supports queries by full pinyin, abbreviated pinyin, and initial letters.
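The difference between single-character and dictionary-based tokenization can be illustrated with a toy sketch. Neither method below is the real standard or ik implementation: `singleChars` only mimics the one-token-per-character effect described above, and `dictionary` is a simple forward-maximum-matching tokenizer over a hypothetical two-word dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToyTokenizers {
    // Mimics the effect of the standard analyzer on Chinese text: one token per character.
    static List<String> singleChars(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(String.valueOf(text.charAt(i)));
        }
        return tokens;
    }

    // Toy dictionary tokenizer (forward maximum matching). NOT the real ik algorithm.
    static List<String> dictionary(String text, Set<String> dict) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1; // fall back to a single character if no word matches
            for (int j = text.length(); j > i + 1; j--) {
                if (dict.contains(text.substring(i, j))) {
                    end = j;
                    break;
                }
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "西门子冰箱";
        // Hypothetical dictionary, chosen only for this example.
        Set<String> dict = new HashSet<>(Arrays.asList("西门子", "冰箱"));

        List<String> chars = singleChars(text);
        List<String> words = dictionary(text, dict);

        // An exact term lookup only succeeds if the term is itself one of the tokens.
        System.out.println(chars + " contains 西: " + chars.contains("西"));
        System.out.println(chars + " contains 西门: " + chars.contains("西门"));
        System.out.println(words + " contains 西: " + words.contains("西"));
    }
}
```

Under these assumptions the single character is found only in the per-character token list, while a multi-character word is found only in the dictionary token list, which is the gap that motivates combining several analyzers.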

Given these characteristics, a global search may need several analyzers working together. How can this be handled? The answer is multi_field.

The multi_field mapping is as follows:

{
        "item" : {
            "properties" : {
                "item_name" : {
                    "type" : "multi_field",
                    "fields" : {
                        "item_name_ik" : {"type" : "string", "analyzer" :"ik"},
                        "item_name_not" : {"type" : "string", "index" : "not_analyzed"},
                        "item_name_standard" : {"type" : "string"}
                    }
                },
                "brand_name" : {
                    "type" : "multi_field",
                    "fields" : {
                        "brand_name_ik" : {"type" : "string", "analyzer" :"ik"},
                        "brand_name_not" : {"type" : "string", "index" : "not_analyzed"},
                        "brand_name_standard" : {"type" : "string"}
                    }
                },
                "c_name" : {
                    "type" : "multi_field",
                    "fields" : {
                        "c_name_ik" : {"type" : "string", "analyzer" :"ik"},
                        "c_name_not" : {"type" : "string", "index" : "not_analyzed"},
                        "c_name_standard" : {"type" : "string"}
                    }
                }
            }
        }
    }
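Because the three fields follow the same pattern, the mapping can also be generated rather than typed by hand. The sketch below builds the same JSON with plain string concatenation; the class and method names are hypothetical, and the sub-field names and settings are copied from the mapping above. Note that the multi_field type and the string type belong to older Elasticsearch releases; newer versions express the same idea with a fields section on an ordinary field, so treat this as a sketch for the version the article targets.

```java
import java.util.Arrays;
import java.util.List;

class MultiFieldMapping {
    // Mapping for one field with the three sub-fields used in this article.
    static String fieldMapping(String field) {
        return "\"" + field + "\":{\"type\":\"multi_field\",\"fields\":{"
                + "\"" + field + "_ik\":{\"type\":\"string\",\"analyzer\":\"ik\"},"
                + "\"" + field + "_not\":{\"type\":\"string\",\"index\":\"not_analyzed\"},"
                + "\"" + field + "_standard\":{\"type\":\"string\"}}}";
    }

    // Wraps the per-field mappings in {"<type>":{"properties":{...}}}.
    static String typeMapping(String type, List<String> fields) {
        StringBuilder sb = new StringBuilder("{\"" + type + "\":{\"properties\":{");
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(fieldMapping(fields.get(i)));
        }
        return sb.append("}}}").toString();
    }

    public static void main(String[] args) {
        System.out.println(typeMapping("item", Arrays.asList("item_name", "brand_name", "c_name")));
    }
}
```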

Each field to be searched gets its own set of analyzers. The query JSON looks like this:

{"from" : 0,
"size" : 20,
"query" : {
    "bool" : {
      "should" : [ {
        "fuzzy" : {
          "item_name.item_name_ik" : {
            "value" : "西"
          }
        }
      }, {
        "fuzzy" : {
          "item_name.item_name_not" : {
            "value" : "西"
          }
        }
      }, {
        "fuzzy" : {
          "item_name.item_name_standard" : {
            "value" : "西"
          }
        }
      }, {
        "fuzzy" : {
          "brand_name.brand_name_ik" : {
            "value" : "西"
          }
        }
      }, {
        "fuzzy" : {
          "brand_name.brand_name_not" : {
            "value" : "西"
          }
        }
      }, {
        "fuzzy" : {
          "brand_name.brand_name_standard" : {
            "value" : "西"
          }
        }
      }, {
        "fuzzy" : {
          "c_name.c_name_ik" : {
            "value" : "西"
          }
        }
      }, {
        "fuzzy" : {
          "c_name.c_name_not" : {
            "value" : "西"
          }
        }
      }, {
        "fuzzy" : {
          "c_name.c_name_standard" : {
            "value" : "西"
          }
        }
      } ]
    }
}
}
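The nine should clauses differ only in the sub-field path, so the request body can be assembled in a loop. This is a minimal sketch that uses only the JDK; the class name, the `escape` helper, and the compact output format are assumptions for illustration, while the field names and suffixes come from the mapping above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FuzzyShouldQuery {
    // Sub-field suffixes taken from the mapping in this article.
    static final String[] SUFFIXES = {"_ik", "_not", "_standard"};

    // "item_name" -> "item_name.item_name_ik", "item_name.item_name_not", ...
    static List<String> subFieldPaths(List<String> fields) {
        List<String> paths = new ArrayList<>();
        for (String field : fields) {
            for (String suffix : SUFFIXES) {
                paths.add(field + "." + field + suffix);
            }
        }
        return paths;
    }

    // Minimal JSON escaping for the keyword (backslash and double quote only).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a bool query with one fuzzy should clause per sub-field path.
    static String build(List<String> fields, String keyword, int from, int size) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"from\":").append(from)
          .append(",\"size\":").append(size)
          .append(",\"query\":{\"bool\":{\"should\":[");
        List<String> paths = subFieldPaths(fields);
        for (int i = 0; i < paths.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"fuzzy\":{\"").append(paths.get(i))
              .append("\":{\"value\":\"").append(escape(keyword)).append("\"}}}");
        }
        return sb.append("]}}}").toString();
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("item_name", "brand_name", "c_name"), "西", 0, 20));
    }
}
```

Adding a searchable field or a new analyzer suffix then only changes the input lists, not the query-building code.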

This finds documents containing the keyword “西” under every tokenization. If this structure feels too verbose, multi_match can be used instead, as follows:

{
"multi_match" : {
"query" : "西",
"fields" : [ "brand_name.brand_name_standard", "item_name.item_name_standard", "c_name.c_name_standard" ....]
}
}

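In addition:

The client API can fetch mapping information by field name. For example, given the name item_name you can find the sub-field names beneath it, such as item_name_standard.

This simplifies building the query conditions. The code is as follows: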

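import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.elasticsearch.action.admin.indices.mapping.get.GetFieldMappingsRequest;
import org.elasticsearch.action.admin.indices.mapping.get.GetFieldMappingsResponse;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHits;

// Look up the "fields" configured under item_name and collect each sub-field name into a list.
// index, type, client and query are defined by the enclosing code.
List<String> list = new ArrayList<String>();
String fieldName = "item_name";
GetFieldMappingsRequest fieldMappingsRequest = new GetFieldMappingsRequest().indices(index).types(type).fields(fieldName);
GetFieldMappingsResponse response = client.admin().indices().getFieldMappings(fieldMappingsRequest).actionGet();
GetFieldMappingsResponse.FieldMappingMetaData fieldMappingMetaData = response.fieldMappings(index, type, fieldName);
Object field = fieldMappingMetaData.sourceAsMap().get(fieldName);
if (field == null) {
    return list;
}
@SuppressWarnings("unchecked")
Map<String, Object> fieldsMap = (Map<String, Object>) ((Map<String, Object>) field).get("fields");
if (fieldsMap == null) {
    return list;
}
for (String subFieldName : fieldsMap.keySet()) {
    System.out.println("Key = " + subFieldName);
    list.add(subFieldName);
}

// Build the query: one fuzzy should clause per sub-field.
// The loop variable is named subFieldName so that it does not clash with "field" above.
SearchRequestBuilder builder = client.prepareSearch(index).setTypes(type);
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
for (String subFieldName : list) {
    boolQueryBuilder.should(QueryBuilders.fuzzyQuery(query.getKey() + "." + subFieldName, query.getValue()));
}
builder.setQuery(boolQueryBuilder);
SearchResponse searchResponse = builder.execute().actionGet();
SearchHits hits = searchResponse.getHits();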
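The sub-field extraction step can be tried without a cluster. The sketch below runs the same lookup against a hand-built nested map that is assumed to resemble what sourceAsMap() returns for the mapping in this article; the exact shape may differ between Elasticsearch versions, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SubFieldNames {
    // Returns the names listed under "fields" for the given field, or an empty list.
    static List<String> subFieldNames(Map<String, Object> mappingSource, String fieldName) {
        List<String> names = new ArrayList<>();
        Object field = mappingSource.get(fieldName);
        if (!(field instanceof Map)) {
            return names;
        }
        Object fields = ((Map<?, ?>) field).get("fields");
        if (!(fields instanceof Map)) {
            return names;
        }
        for (Object key : ((Map<?, ?>) fields).keySet()) {
            names.add(String.valueOf(key));
        }
        return names;
    }

    // "item_name" + "item_name_ik" -> "item_name.item_name_ik"
    static List<String> qualifiedPaths(String fieldName, List<String> subFields) {
        List<String> paths = new ArrayList<>();
        for (String subField : subFields) {
            paths.add(fieldName + "." + subField);
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the mapping source (an assumption, not real client output).
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("item_name_ik", new HashMap<String, Object>());
        fields.put("item_name_not", new HashMap<String, Object>());
        fields.put("item_name_standard", new HashMap<String, Object>());
        Map<String, Object> itemName = new HashMap<>();
        itemName.put("type", "multi_field");
        itemName.put("fields", fields);
        Map<String, Object> source = new HashMap<>();
        source.put("item_name", itemName);

        System.out.println(qualifiedPaths("item_name", subFieldNames(source, "item_name")));
    }
}
```

Using instanceof checks instead of unchecked casts means an unexpected mapping shape yields an empty list rather than a ClassCastException.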
